Title: Wavelet-Based, Blur-Aware Decoupled Network for Video Deblurring
Cover Date: 2025-02-01
Cover Display Date: February 2025
DOI: 10.3390/app15031311
Description: Video deblurring faces a fundamental challenge: blur degrades frames comprehensively, not only causing detail loss but also severely distorting structural information. This dual degradation across the low- and high-frequency domains makes it difficult for existing methods to restore structural and detailed information simultaneously within a unified approach. To address this issue, we propose a wavelet-based, blur-aware decoupled network (WBDNet) that decouples structure reconstruction from detail enhancement. Our method decomposes features into multiple frequency bands and employs specialized restoration strategies for the different frequency domains. In the low-frequency domain, we construct a multi-scale feature pyramid with optical flow alignment, enabling accurate structure reconstruction through bottom-up progressive feature fusion. For the high-frequency components, we combine deformable convolution with a blur-aware attention mechanism, which allows us to precisely extract and merge sharp details from multiple frames. Extensive experiments on benchmark datasets demonstrate the superior performance of our method, particularly in preserving structural integrity and detail fidelity.
Citations: 0
Aggregation Type: Journal
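
Code Sketch: A minimal illustration of the frequency decoupling described above, using a single-level Haar wavelet transform in PyTorch. This is a hypothetical sketch, not the authors' code; the function name and tensor sizes are placeholders.

# Hypothetical sketch: split a feature map into the low-frequency (LL)
# sub-band handled by WBDNet's structure branch and the high-frequency
# (LH/HL/HH) sub-bands handled by its detail branch.
import torch

def haar_dwt(x: torch.Tensor):
    """Single-level orthonormal Haar transform of a (B, C, H, W) tensor."""
    a = x[:, :, 0::2, 0::2]   # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]   # top-right
    c = x[:, :, 1::2, 0::2]   # bottom-left
    d = x[:, :, 1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2  # low frequency: coarse structure
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, (lh, hl, hh)

feats = torch.randn(1, 64, 128, 128)
ll, highs = haar_dwt(feats)
# ll -> flow-aligned structure pyramid; highs -> blur-aware detail branch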
-------------------


Title: Deformable Attention Network for Efficient Space-Time Video Super-Resolution
Cover Date: 2025-01-01
Cover Display Date: January/December 2025
DOI: 10.1049/ipr2.70026
Description: Space-time video super-resolution (STVSR) aims to construct high space-time-resolution video sequences from low-frame-rate, low-resolution video sequences. While recent STVSR works combine temporal interpolation and spatial super-resolution in a unified framework, they face challenges in computational complexity across both temporal and spatial dimensions, particularly in achieving accurate intermediate frame interpolation and efficient temporal information utilisation. To address these challenges, we propose a deformable attention network for efficient STVSR. Specifically, we introduce a deformable interpolation block that employs hierarchical feature fusion to handle complex inter-frame motions at multiple scales, enabling more accurate intermediate frame generation. To fully utilise temporal information, we design a temporal feature shuffle block (TFSB) that efficiently exchanges complementary information across multiple frames. Additionally, we develop a motion feature enhancement block incorporating a channel attention mechanism to selectively emphasise motion-related features, further boosting the TFSB's effectiveness. Experimental results on benchmark datasets demonstrate that our proposed method achieves competitive performance on STVSR tasks.
Citations: 0
Aggregation Type: Journal
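
Code Sketch: The abstract does not detail the temporal feature shuffle block (TFSB), so this is a speculative reading of the idea: a channel shuffle applied along the time axis, so that each frame's features receive slices from neighbouring frames. All names and shapes are hypothetical.

# Hypothetical "temporal feature shuffle": interleave channel groups
# across time steps so frames exchange complementary information.
import torch

def temporal_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """x: (B, T, C, H, W). Mix channel groups across time steps so each
    frame's feature map carries slices from its neighbours."""
    b, t, c, h, w = x.shape
    assert c % groups == 0, "channels must split evenly into groups"
    x = x.view(b, t, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()  # swap the time and group axes
    return x.view(b, t, c, h, w)        # regroup into T frames again

frames = torch.randn(2, 4, 64, 32, 32)  # batch of 2, 4 frames, 64 channels
mixed = temporal_shuffle(frames, groups=4)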
-------------------


Title: A Comparison of Road Damage Detection Based on YOLOv8
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/ICMLC58545.2023.10327993
Description: With the rapid development of computer technology, intelligent driving has become a popular research field. Road damage detection is critical in intelligent driving and has been studied for a long time; early methods detected damage through sensors embedded in the car, and in recent years deep learning methods have gradually been applied to pavement damage detection. This paper addresses road damage detection with YOLOv8, comparing it to YOLOv5 and across multiple YOLOv8 pretrained models. It describes a solution using YOLO to detect the various types of road damage in the Crowdsensing-based Road Damage Detection Challenge (CRDDC'2022). The dataset is separated into training, validation, and test sets. The medium and large pretrained YOLOv8 models achieve the highest mAP and F1-score, at 0.62 and 0.61, respectively. The small and medium pretrained YOLOv8 models outperform YOLOv5. Among the YOLOv8 variants, the large pretrained model performs best, but only marginally better than the medium model; for this reason, the medium pretrained model may be better suited to real-time problems, as it takes less time to train.
Citations: 4
Aggregation Type: Conference Proceeding
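
Code Sketch: The model-size comparison reported above can be reproduced in outline with the public Ultralytics API; "crddc2022.yaml" is a placeholder for a dataset config pointing at the CRDDC'2022 splits, and the training settings are illustrative, not the paper's.

# Compare YOLOv8 model sizes on the road-damage data, as in the paper.
from ultralytics import YOLO

for weights in ("yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov8l.pt"):
    model = YOLO(weights)  # COCO-pretrained checkpoint
    model.train(data="crddc2022.yaml", epochs=100, imgsz=640)  # placeholder config
    metrics = model.val()  # per-size mAP for the comparison table
    print(weights, metrics.box.map50)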
-------------------


Title: Fusion Convolutional Recurrent Neural Networks for Thai and English Video Subtitle Recognition
Cover Date: 2022-12-01
Cover Display Date: December 2022
DOI: 10.24507/icicel.16.12.1331
Description: Subtitles are typically embedded into videos and placed along the bottom of the frame; locating the subtitle area and recognizing the text in the image is not simple. In this paper, we propose a fusion convolutional recurrent neural network (CRNN) to recognize multilingual (Thai and English) text from subtitle word images. We fused state-of-the-art convolutional neural networks (CNNs) with an additional fusion operation, followed by a bidirectional long short-term memory (BiLSTM) network. For decoding the output from the text images, we compared two decoding algorithms: connectionist temporal classification (CTC) and word beam search (WBS). We found that WBS outperformed CTC in accuracy; however, WBS is relatively slow and is not recommended for real-time applications. We evaluated our fusion CRNN architecture on the multilingual video subtitle dataset and achieved character error rates (CER) of 5.29% and 5.33% when decoding with the WBS and CTC algorithms, respectively.
Citations: 6
Aggregation Type: Journal
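
Code Sketch: A minimal CRNN skeleton (CNN features, then BiLSTM, then per-timestep logits for CTC training) in PyTorch. The paper's fusion of several CNN backbones is omitted, and the layer sizes are assumptions.

# Minimal CRNN: CNN feature extractor -> BiLSTM -> per-timestep logits.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.cnn = nn.Sequential(                 # collapse image height to 1
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # (B, 128, 1, W')
        )
        self.rnn = nn.LSTM(128, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, n_classes + 1)   # +1 for the CTC blank

    def forward(self, x):                         # x: (B, 1, H, W)
        f = self.cnn(x).squeeze(2).transpose(1, 2)  # (B, W', 128)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)       # CTC expects log-probs

logits = CRNN(n_classes=100)(torch.randn(2, 1, 32, 256))
# Train with nn.CTCLoss; decode greedily (CTC) or with word beam search.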
-------------------


Title: Preface
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Book Series
-------------------


Title: Thai Handwritten Recognition on BEST2019 Datasets Using Deep Learning
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1007/978-3-030-80253-0_14
Description: Handwriting recognition is a difficult task. The conventional technique relies on character segmentation, feature extraction, and classification. Segmentation is a tremendous challenge when character patterns and alignments vary within a sentence, for example the linking strokes between characters in the Thai language; a reliable segmentation outcome is desirable but not attainable in most applications. This work proposes a methodology for Thai handwriting recognition that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The first step is text localization before feeding the image to the network. The CNN extracts abstract features, which are fed to the RNN to learn the character sequence in the image. Optimization is performed with an integrated Connectionist Temporal Classification (CTC) module that arranges the final results. A standard Thai handwriting dataset (BEST2019), together with additional collected data, is used for the training and test sets. The experimental results show that the integration of CNN and RNN yields promising results on the test set, with a Character Error Rate (CER) of 1.58%. When testing on the seen and unseen data of the final round of the BEST2019 competition, the CER is 24.53%.
Citations: 5
Aggregation Type: Book Series
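
Code Sketch: The CER figures quoted above are the Levenshtein edit distance divided by the reference length; a dependency-free reference implementation for clarity (not from the paper).

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein(ref, hyp) / len(ref)."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        curr = [i]
        for j, h in enumerate(hypothesis, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

print(cer("สวัสดี", "สวัสด"))  # one deleted character -> 1/6 ~ 0.167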
-------------------


Title: One-vs-One classification for deep neural networks
Cover Date: 2020-12-01
Cover Display Date: December 2020
DOI: 10.1016/j.patcog.2020.107528
Description: For performing multi-class classification, deep neural networks almost always employ a One-vs-All (OvA) classification scheme with as many output units as there are classes in a dataset. The problem with this approach is that each output unit requires a complex decision boundary to separate the examples of one class from all other examples. In this paper, we propose a novel One-vs-One (OvO) classification scheme for deep neural networks that trains each output unit to distinguish between a specific pair of classes. This method increases the number of output units compared to the One-vs-All scheme but makes learning correct decision boundaries much easier. In addition to changing the neural network architecture, we changed the loss function, created a code matrix to transform the one-hot encoding to a new label encoding, and changed the method for classifying examples. To analyze the advantages of the proposed method, we compared the One-vs-One and One-vs-All classification methods on three plant recognition datasets (including a novel dataset that we created) and a dataset with images of different monkey species, using two deep architectures. The two deep convolutional neural network (CNN) architectures, Inception-V3 and ResNet-50, are trained from scratch or initialized with pre-trained weights. The results show that the One-vs-One classification method outperforms the One-vs-All method on all four datasets when training the CNNs from scratch. However, when using the two classification schemes for fine-tuning pre-trained CNNs, the One-vs-All method leads to the best performance, presumably because the CNNs had been pre-trained using the One-vs-All scheme.
Citations: 62
Aggregation Type: Journal
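
Code Sketch: A sketch of the code-matrix idea described above, assuming one plausible 1/0/ignore encoding for the n*(n-1)/2 pairwise output units and voting-based decoding; the paper's exact encoding may differ.

# Hypothetical OvO code matrix and vote-based prediction.
from itertools import combinations
import numpy as np

def ovo_code_matrix(n_classes: int) -> np.ndarray:
    """Rows: classes. Columns: class pairs (i, j). Entry is 1 if the class
    is i, 0 if it is j, and 0.5 (ignore) otherwise."""
    pairs = list(combinations(range(n_classes), 2))
    m = np.full((n_classes, len(pairs)), 0.5)
    for col, (i, j) in enumerate(pairs):
        m[i, col], m[j, col] = 1.0, 0.0
    return m

def ovo_predict(unit_outputs: np.ndarray, n_classes: int) -> int:
    """unit_outputs: sigmoid activations of the pairwise output units."""
    votes = np.zeros(n_classes)
    for col, (i, j) in enumerate(combinations(range(n_classes), 2)):
        votes[i if unit_outputs[col] > 0.5 else j] += 1
    return int(votes.argmax())

print(ovo_code_matrix(4).shape)  # (4, 6) -> 6 pairwise output units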
-------------------


Title: Deep Learning with Data Augmentation for Fruit Counting
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: 10.1007/978-3-030-61401-0_20
Description: Counting the number of fruits in an image is important for orchard management, but is complex due to challenges such as overlapping fruits and the difficulty of creating large labeled datasets. In this paper, we propose the use of a data-augmentation technique that creates novel images by adding a number of manually cropped fruits to original images. This increases the size of a dataset with new images containing more fruits and guarantees correct label information. Furthermore, two different approaches to fruit counting are compared: a holistic regression-based approach and a detection-based approach. The regression-based approach has the advantage that it only needs the number of fruits in an image as a target value, whereas the detection-based approach requires bounding boxes to be specified. We combine both approaches with different deep convolutional neural network architectures and object-detection methods. We also introduce a new dataset of 1500 images, named the Five-Tropical-Fruits dataset, and perform experiments to evaluate the usefulness of augmenting the dataset for the different fruit-counting approaches. The results show that the regression-based approaches profit a lot from the data-augmentation method, whereas the detection-based approaches are not aided by it. Although one detection-based approach still works best, this comes at the cost of much more labeling effort.
Citations: 2
Aggregation Type: Book Series
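
Code Sketch: A hypothetical sketch of the paste-style augmentation: crops of single fruits are pasted onto an orchard image at random positions, and the count label is updated so it stays correct. The file names and the RGBA-crop assumption are illustrative, not from the paper.

# Paste k fruit crops (RGBA, transparent background) onto a scene image.
import random
from PIL import Image

def paste_fruits(scene: Image.Image, crops: list, count: int, k: int):
    """Return an augmented scene and its updated fruit count."""
    scene = scene.copy()
    for _ in range(k):
        crop = random.choice(crops)
        x = random.randint(0, scene.width - crop.width)
        y = random.randint(0, scene.height - crop.height)
        scene.paste(crop, (x, y), mask=crop)  # alpha mask keeps the fruit shape
        count += 1
    return scene, count

# img, n = paste_fruits(Image.open("orchard.jpg"), fruit_crops, n, k=3)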
-------------------


Title: Comparative study between deep learning and bag of visual words for wild-animal recognition
Cover Date: 2017-02-09
Cover Display Date: 9 February 2017
DOI: 10.1109/SSCI.2016.7850111
Description: Most research in image classification has focused on applications such as face, object, scene, and character recognition. This paper presents a comparative study between deep convolutional neural networks (CNNs) and bag-of-visual-words (BOW) variants for recognizing animals. We developed two variants of the bag of visual words (BOW and HOG-BOW) and examined the use of gray and color information as well as different spatial pooling approaches. We combined the final feature vectors extracted from these BOW variants with a regularized L2 support vector machine (L2-SVM) to distinguish between the classes in our datasets. We modified the existing deep CNN architectures AlexNet and GoogleNet by reducing the number of neurons in each fully connected layer and in the last inception layer, for both scratch and pre-trained versions. Finally, we compared the existing CNN methods, our modified CNN architectures, and the proposed BOW variants on our novel wild-animal dataset (Wild-Anim). The results show that the CNN methods significantly outperform the BOW techniques.
Citations: 41
Aggregation Type: Conference Proceeding
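
Code Sketch: An outline of a bag-of-visual-words pipeline like the BOW baseline above, using scikit-learn. The descriptor-extraction step is assumed (e.g., HOG patches) and the codebook size is illustrative, not the paper's setting.

# BOW pipeline: descriptors -> k-means codebook -> histogram -> linear SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bow_histogram(descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    words = codebook.predict(descriptors)          # nearest visual word
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)               # normalised histogram

# 1) Learn the codebook from descriptors pooled over training images:
# codebook = KMeans(n_clusters=500).fit(np.vstack(all_train_descriptors))
# 2) Encode each image as a histogram of visual words:
# X_train = np.stack([bow_histogram(d, codebook) for d in train_descriptors])
# 3) Classify with an L2-regularised linear SVM, as in the paper:
# clf = LinearSVC(C=1.0).fit(X_train, y_train)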
-------------------


Title: Data augmentation for plant classification
Cover Date: 2017-01-01
Cover Display Date: 2017
DOI: 10.1007/978-3-319-70353-4_52
Description: Data augmentation plays a crucial role in increasing the number of training images, which often helps to improve the classification performance of deep learning techniques for computer vision problems. In this paper, we employ the deep learning framework and determine the effects of several data-augmentation (DA) techniques on plant classification problems. For this, we use two convolutional neural network (CNN) architectures, AlexNet and GoogleNet, trained from scratch or using pre-trained weights. These CNN models are then trained and tested on both original and data-augmented image datasets for three plant classification problems: Folio, AgrilPlant, and the Swedish leaf dataset. We evaluate the utility of six individual DA techniques (rotation, blur, contrast, scaling, illumination, and projective transformation) and several combinations of these techniques, resulting in a total of 12 data-augmentation methods. The results show that the CNN methods with particular data-augmented datasets yield the highest accuracies, which also surpass previous results on the three datasets. Furthermore, the CNN models trained from scratch profit a lot from data augmentation, whereas the fine-tuned CNN models hardly profit from it. Finally, we observed that data augmentation using combinations of rotation with different illuminations or different contrasts helped most in achieving high performance with the scratch CNN models.
Citations: 96
Aggregation Type: Book Series
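
Code Sketch: The six individual DA techniques listed above map naturally onto standard torchvision transforms; the parameters below are illustrative, since the abstract does not give the paper's settings.

# One standard transform per DA technique named in the abstract.
import torchvision.transforms as T

augmentations = {
    "rotation":     T.RandomRotation(degrees=30),
    "blur":         T.GaussianBlur(kernel_size=5),
    "contrast":     T.ColorJitter(contrast=0.5),
    "scaling":      T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    "illumination": T.ColorJitter(brightness=0.5),
    "projective":   T.RandomPerspective(distortion_scale=0.3),
}
# A combined variant, e.g. rotation + illumination, of the kind the paper
# found most helpful for scratch-trained CNNs:
combo = T.Compose([augmentations["rotation"], augmentations["illumination"]])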
-------------------


Title: Comparing local descriptors and bags of visual words to deep convolutional neural networks for plant recognition
Cover Date: 2017-01-01
Cover Display Date: 2017
DOI: 10.5220/0006196204790486
Description: The use of machine learning and computer vision methods for recognizing different plants from images has attracted a lot of attention from the community. This paper compares local feature descriptors and bags of visual words with different classifiers to deep convolutional neural networks (CNNs) on three plant datasets: AgrilPlant, LeafSnap, and Folio. To achieve this, we study both scratch and fine-tuned versions of the GoogleNet and AlexNet architectures and compare them to a local feature descriptor with k-nearest neighbors, and to the bag of visual words with the histogram of oriented gradients combined with either support vector machines or multi-layer perceptrons. The results show that the deep CNN methods outperform the hand-crafted features. The CNN techniques can also learn well on a relatively small dataset, Folio.
Citations: 100
Aggregation Type: Conference Proceeding
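
Code Sketch: A sketch of the scratch vs. fine-tuned setup compared above, using the torchvision implementation of one of the two architectures (torchvision >= 0.13 weights API); n_classes stands for the number of plant classes in the dataset at hand.

# Build AlexNet either from scratch or from ImageNet weights, then swap
# the final layer for the plant-classification head.
import torch.nn as nn
from torchvision import models

def make_alexnet(n_classes: int, pretrained: bool) -> nn.Module:
    net = models.alexnet(weights="DEFAULT" if pretrained else None)
    net.classifier[6] = nn.Linear(4096, n_classes)  # replace the output layer
    return net

scratch = make_alexnet(n_classes=10, pretrained=False)   # train from scratch
finetune = make_alexnet(n_classes=10, pretrained=True)   # fine-tune ImageNet weights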
-------------------


Title: Classifying peer-to-peer traffic using protocol hierarchy
Cover Date: 2014-07-30
Cover Display Date: 30 July 2014
DOI: 10.1109/ICCOINS.2014.6868391
Description: Detection and classification of peer-to-peer traffic are still difficult tasks for bandwidth shapers. First, peer-to-peer traffic is not easy to detect and can be a serious problem. Second, some peer-to-peer applications may be desirable while others are undesirable, so different peer-to-peer applications should be treated differently. Previous work on peer-to-peer traffic detection still faces both problems, so in this paper we propose new classification mechanisms to solve them. Our proposed solution has been implemented in Java and evaluated on a network test-bed. Experimental results demonstrate that our extended classification mechanism improves peer-to-peer traffic detection and classification.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------


Title: Valuable tourism information via mobile application
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.4028/www.scientific.net/AMR.1044-1045.1428
Description: The travel and tourism industry is one of the main sources of income in Thailand. People travel around the world to relax and enjoy their time, and they search for information from many resources before and during their travels. We grouped mobile tourism applications into two main groups in order to analyse the nature of the information they provide. Social networks offer rich collaborative user-generated information, but we found that most tourism applications require personal information and pre-existing associations before information can be obtained. We argue that, in tourism, a user requires instant and easy access to information, so a social network might not be an appropriate option. We therefore propose a simple collaborative user-generated content application with a location-aware chat system. We aim to provide an application with self-sufficient information that allows users to instantly share, search, and comment on information anytime and anywhere. The location-aware chat system provides instant first-hand information to users as well.
Citations: 4
Aggregation Type: Book Series
-------------------