Title: EPILEPTIC SEIZURES DETECTION FROM EEG SIGNALS USING HYBRID DEEP LEARNING MODEL WITH CONVOLUTIONAL BLOCK ATTENTION MODULE
Cover Date: 2025-06-01
Cover Display Date: June 2025
DOI: 10.24507/icicelb.16.06.679
Description: Identifying epileptic seizures from electroencephalography (EEG) signals remains a formidable task due to the variability of the recorded brain activity. We designed a hybrid deep learning model, CNN-BiGRU-CBAM, for accurate binary classification of seizure events. This model combines convolutional neural network (CNN) layers to automatically extract spatial-temporal features, bidirectional gated recurrent unit (BiGRU) layers to capture temporal dependencies, and a convolutional block attention module (CBAM) focusing on the most informative features. Our model achieved state-of-the-art performance when evaluated on a benchmark EEG dataset, with an accuracy and F1-score of 98.03%. The attention module is crucial in identifying the most relevant EEG channels and temporal segments associated with seizure events. Our visualizations highlight the periods and channels the model attends to during seizure and non-seizure events, enhancing the explainability of seizure characterization. By engineering channel and temporal attention mechanisms within the hybrid architecture, we attained higher performance and explainability for automated, patient-specific seizure detection from EEG data. This advancement in epileptic seizure characterization can facilitate prompt intervention and improve patient care.
Citations: 0
Aggregation Type: Journal
-------------------


Title: RESNET-CBAM: A DEEP CNN WITH CONVOLUTION BLOCK ATTENTION MODULE FOR SENSOR-BASED HUMAN ACTIVITY RECOGNITION
Cover Date: 2025-06-01
Cover Display Date: June 2025
DOI: 10.24507/icicelb.16.06.689
Description: The widespread adoption of wearable devices and sensors has made human activity recognition (HAR) using sensor data a vital research area with applications in healthcare, sports, and assisted living. Deep convolutional neural networks (CNNs) have demonstrated excellent performance in sensor-based HAR tasks by automatically learning discriminative features from raw sensor inputs. However, the feature maps learned by CNN layers contain relevant and irrelevant features. Attention mechanisms have proven effective in allowing CNNs to focus on the most informative features. This paper proposes ResNet-CBAM, a deep residual CNN architecture augmented with a convolutional block attention module (CBAM) for sensor-based HAR. The CBAM module incorporates channel and spatial attention to focus on “what” and “where” informative features. Channel attention utilizes average and max pooling operations to capture global and local context interactions. Spatial attention generates attention maps along spatial dimensions. ResNet architectures leverage identity shortcut connections to propagate signals, effectively addressing issues like vanishing gradients. Our ResNet-CBAM model integrates CBAM blocks into a ResNet to refine intermediate feature maps within residual blocks. We evaluate the ResNet-CBAM network on a public benchmark dataset containing accelerometer and gyroscope data from waist-mounted devices during various activity classes. We train the model using a train/test split and a 5-fold stratified cross-validation strategy using backpropagation and Adam optimization. Results demonstrate that ResNet-CBAM achieves an average F1-score of over 98.25%, outperforming baseline CNN, LSTM, and GRU models by a significant margin.
Citations: 0
Aggregation Type: Journal
-------------------


Title: ENHANCING PERSONALITY CHARACTERISTICS ANALYSIS WITH SMOTE AND ASSOCIATION RULE MINING: A CASE STUDY ON INTROVERTS AND EXTROVERTS
Cover Date: 2025-06-01
Cover Display Date: June 2025
DOI: 10.24507/icicel.19.06.597
Description: The classification of personality characteristics, typically divided into introverts and extroverts, differs from general public characteristics. Personality variation within teams significantly impacts team development and presents challenges for leaders in effective team management. Understanding how personality characteristics align with different types of work can enhance team potential. This research identifies variables relevant to analyzing co-worker personalities within organizations. An association rules model was constructed using questionnaire data to analyze introverted and extroverted characteristics. Imbalances in the data distribution were addressed using the synthetic minority oversampling technique, resulting in a balanced dataset with 3,198 extroverts and 3,512 introverts. The Apriori algorithm then generated association rules from this dataset, focusing on single-dimensional rules with high accuracy for each class. For the introvert class, the highest accuracy (96.52%) was associated with “Q81A: I am quiet around strangers (Agree)”, while the extrovert class achieved 68.81% accuracy with “Q82A: I do not talk a lot (Disagree)”. Optimal accuracy with two-rule associations reached 98.49% for introverts and 80.48% for extroverts.
Citations: 0
Aggregation Type: Journal
-------------------


Title: A DEEP NEURAL NETWORK WITH AGGREGATED RESIDUAL TRANSFORMATION FOR SMARTWATCH-BASED HUMAN ACTIVITY RECOGNITION IN REAL WORLD SITUATIONS
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicel.19.03.343
Description: The field of pervasive computing focuses on using sensors to identify human activities, a practice commonly known as Sensor-based Human Activity Recognition (S-HAR). The objective of S-HAR is to automatically evaluate and understand real-time events and their contextual information by utilizing sensor data. Activity identification has various applications, including surveillance systems, medical monitoring systems, and systems involving wearable intelligent devices like smartwatches. Contemporary HAR algorithms are typically developed and evaluated using controlled conditions, which limits their effectiveness in real-life scenarios where sensor data may be incomplete or corrupted and human actions are spontaneous and unscripted. This study aims to identify human behavior in real-world scenarios. To improve the efficiency of the action comprehension structure, we propose a novel deep neural network architecture called ResNeXt, which incorporates an aggregated residual transformation component. This component enables the framework to categorize different human actions effectively and accurately. We evaluated the proposed network using the publicly available IDLab Real-World dataset for human activity recognition. This dataset was utilized for training and testing the model, employing a 5-fold cross-validation approach. Based on extensive investigations, we found that ResNeXt achieved the highest accuracy rate of 98.32% and an F1-score of 87.90%.
Citations: 0
Aggregation Type: Journal
-------------------


Title: SENTIMENT ANALYSIS OF THAI LABORERS’ PERCEPTIONS OF WORKING ABROAD: A MACHINE LEARNING APPROACH USING YOUTUBE COMMENTS
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicelb.16.03.333
Description: The rising trend of Thai workers seeking employment overseas necessitates a nuanced understanding of their attitudes towards labor migration. This research paper delves into the perceptions of Thai workers about working overseas, focusing on five destinations: Australia, Japan, South Korea, Taiwan, and the United States. It applies machine learning methods to dissect sentiments embedded in 37,077 comments from 400 YouTube videos, covering preparation and law, lifestyle, and work experience. The study uses Python for computational analysis to sort these comments into positive, negative, and neutral sentiments. The Naïve Bayes Support Vector Machine (NBSVM) algorithm emerged as the most effective model for classifying these sentiments. Our findings indicate that Australia elicited the most positive responses (32.31%) and the least negative perceptions, whereas Japan registered the highest proportion of negative sentiments (15.43%) across various aspects. The results, illustrated through quantitative percentages and visual representations like bar charts and word clouds, underscore the potential of machine learning in providing actionable insights for policymakers and market analysts in labor migration.
Citations: 0
Aggregation Type: Journal
-------------------


Title: PREDICTION OF STONE TYPES USING CONVOLUTIONAL NEURAL NETWORKS TECHNIQUE
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicelb.16.03.343
Description: This paper investigates using Convolutional Neural Networks (CNNs), specifically the MobileNetV2 architecture, for predicting stone types. The research focused on classifying five stone categories – granite, marble, limestone, sandstone, and slate – using a dataset of 2,500 images. The CNN model was trained over 100 epochs, achieving a high training accuracy of 89.6%, demonstrating its capability to learn and identify distinct patterns within stone images. However, the model faced challenges with overfitting, as evidenced by the testing accuracy stabilizing around 60%, indicating difficulties in generalizing to unseen data. Evaluation of key performance metrics, including precision, recall, and F1 score, showed strong performance in identifying stone types like limestone and sandstone but highlighted areas needing improvement, such as distinguishing granite and marble. The study underscores the potential of CNNs for stone-type classification and proposes future enhancements through techniques like data augmentation, ensemble learning, and transfer learning to improve generalization and predictive accuracy. This research provides valuable insights into applying CNNs in material classification within geological contexts.
Citations: 0
Aggregation Type: Journal
-------------------


Title: FUSION-BASED CONVOLUTIONAL RECURRENT NEURAL NETWORK FOR IMPROVED DYNAMIC THAI FINGERSPELLING RECOGNITION
Cover Date: 2025-02-01
Cover Display Date: February 2025
DOI: 10.24507/icicelb.16.02.201
Description: Sign language recognition (SLR) has been an active research area due to the difficulty of interpreting hand and upper body movements in real life. Dynamic fingerspelling recognition is a very challenging task due to the problem associated with algorithms attempting to understand the meaning of fingerspelling from real-time videos. In this research, we propose the fusion-based convolutional recurrent neural network (CRNN) that fuses a three-dimensional convolutional neural network (3D-CNN) and CNN model for extracting robust spatiotemporal features from the sequential images in a video. The fusion-based CRNN framework was divided into deep feature extraction and sequence learning modules. In the deep feature extraction, the video was extracted and only 32 frames were selected. Additionally, we trained a YOLOv5 model for detecting or localizing the upper body of a human designed region of interest (ROI). After calculating the ROI, it was sent to 3D-CNN and CNN to extract the solid sequential features. Furthermore, an addition operator was used in merging the sequential features, and the resulting features were passed to a sequence learning mechanism (bidirectional long short-term memory) in creating a robust model for recognizing dynamic fingerspelling. In the experiments, we evaluated the fusion-based CRNN on the dynamic Thai fingerspelling dataset, including short videos of 42 classes from 3,025 videos. The experimental results indicated that the fusion-based CRNN achieved an accuracy of 91.73% on the dynamic Thai fingerspelling dataset and outperformed the existing method.
Citations: 0
Aggregation Type: Journal
-------------------


Title: ENHANCING IMAGE CAPTION PERFORMANCE WITH IMPROVED VISUAL ATTENTION MECHANISM
Cover Date: 2025-01-01
Cover Display Date: January 2025
DOI: 10.24507/icicelb.16.01.73
Description: Image captioning analyzes and translates images into text, requiring extensive data and often facing challenges in comprehending the diverse contents of images during text generation. This research enhances image captioning using a visual attention mechanism to improve image-to-text translation performance. We propose a neural network architecture comprising an encoder, decoder, and beam search. The encoder uses either dual convolutional neural networks (Dual-CNN) or a single CNN to extract visual features, which are then passed to the decoder. The decoder employs long short-term memory (LSTM) to learn temporal and sequential patterns, converting visual features into output probabilities. The resulting outputs are then processed by the beam search algorithm to generate the best captions. Three experiments were conducted. First, single CNN architectures (ResNet-101, EfficientNet-B0, and ResNeXt-101) were evaluated with visual attention mechanisms on the Flickr8K dataset using BLEU scores. ResNet-101 achieved the highest performance. Second, three Dual-CNNs combined with attention mechanisms were tested, with ResNet-101 and EfficientNet-B0 outperforming other combinations. Third, early stopping was used to determine the optimal training epoch, revealing that the Dual-CNN with visual attention mechanism yielded the best results. The proposed framework, tested on the Flickr8K dataset, achieved BLEU scores of 68.76%, 49.15%, 35.46%, and 24.71% in different scenarios, demonstrating superior performance compared to other approaches.
Citations: 0
Aggregation Type: Journal
-------------------


Title: MULTIMODAL EMOTION RECOGNITION BASED ON HYBRID ENSEMBLE DEEP LEARNING FRAMEWORK
Cover Date: 2025-01-01
Cover Display Date: January 2025
DOI: 10.24507/icicelb.16.01.93
Description: Understanding emotions is crucial for accurately predicting human behavior. By anticipating emotions, we can forecast decisions and respond effectively. Emotion recognition models can be applied to robots and computers to enhance various business environments. Recognizing emotions is challenging due to diverse sources such as facial expressions, audio, text, and electroencephalogram (EEG) signals. In this research, we propose a hybrid ensemble deep learning framework for multimodal emotion recognition using emotional facial images and audio. The framework involves extracting features from facial images (visual) and audio using well-known convolutional neural network (CNN) models, followed by processing these features with bidirectional long short-term memory (BiLSTM) networks. We employed DenseNet121-BiLSTM and ResNet50-BiLSTM, referred to as V-Emotion and A-Emotion, respectively. Additionally, audio and visual features were concatenated and fed into a BiLSTM, named VA-Emotion. The final step of the proposed framework integrates the outputs of the V-, A-, and VA-Emotion models using a weighted average ensemble learning method, assigning higher weights to models with greater classification accuracy. We evaluated the proposed framework on the RAVDESS dataset, achieving an accuracy of 91.67%. Our experimental results demonstrate that the proposed framework outperforms existing methods.
Citations: 0
Aggregation Type: Journal
-------------------


Title: MULTIMODAL FUSION OF CONVOLUTIONAL RECURRENT NEURAL NETWORK AND LANGUAGE MODELS FOR TEXT RECOGNITION
Cover Date: 2025-01-01
Cover Display Date: January 2025
DOI: 10.24507/icicelb.16.01.83
Description: Humans often encounter text in natural scenes daily, such as on traffic signs, billboards, and walls. From a computer vision perspective, two main learning paradigms (text detection and text recognition) are most commonly explored for localizing and predicting text in natural scenes. However, many traditional computer vision algorithms for text recognition in natural scenes struggle with prediction accuracy due to variations in font styles, colors, blurriness, and text distortion. To address these challenges, this paper proposes a text recognition architecture that employs a fusion of multimodal contexts (vision and language models) trained on a multi-language video subtitle dataset, aimed at recognizing text (English letters and Arabic numbers) from video frames (scene images). To achieve this goal, the vision model was developed using a convolutional recurrent neural network (CRNN) integrated with a connectionist temporal classification decoder for feature extraction and text prediction. The language model was created using a sequence-to-sequence model (bidirectional gated recurrent unit: BiGRU) that learns text sequence representations and produces readable text output. The resulting proposed fused modality, known as Fusion-based CRNN with Sequence-to-Sequence (Fusion-CRNN+Seq2Seq), is used for recognizing text from images. The proposed method outperforms all other approaches and achieves the lowest character error rates of 1.36 and 1.22 based on different BiGRU network configurations.
Citations: 0
Aggregation Type: Journal
-------------------


Title: Multi-layer adaptive spatial-temporal feature fusion network for efficient food image recognition
Cover Date: 2024-12-01
Cover Display Date: 1 December 2024
DOI: 10.1016/j.eswa.2024.124834
Description: Numerous deep learning methods have been developed to tackle the challenges of recognizing food images, including convolutional neural networks, deep feature extraction, and deep feature fusion methods. This research proposes a new architecture called ASTFF-Net that uses deep feature fusion to tackle various challenges in food recognition, including similarity patterns between two categories, multi-object problems, light conditions, camera position, noise objects, and blurred images. ASTFF-Net is a robust and adaptive spatial–temporal fusion network designed to address these challenges effectively. The ASTFF-Net architecture consisted of three networks. In the spatial feature extraction network, the ResNet50 architecture was used to extract robust spatial features, and the reduction operation was utilized to minimize parameter size. Subsequently, the spatial features were passed through a 1D convolution (Conv1D) to fit the features into the recurrent neural networks. In the temporal feature extraction network, the spatial features were given to the long short-term memory, allowing the network to learn from various long sequence patterns. In the adaptive feature fusion network, the robust spatial and temporal features were fused and assigned to the Conv1D, followed by the softmax function. The ASTFF-Net architecture is also intended to decrease the number of network parameters and prevent overfitting problems. Experimental results on four benchmark food image datasets: Food11, UEC Food-100, UEC Food-256, and ETH Food-101, demonstrate that the proposed ASTFF-Nets, particularly ASTFF-NetB3, were more competitive compared with other existing methods.
Citations: 4
Aggregation Type: Journal
-------------------


Title: SEMG-BASED MUSCULAR MOVEMENT RECOGNITION FOR HAND PROSTHESIS USING CNN-LSTM
Cover Date: 2024-12-01
Cover Display Date: December 2024
DOI: 10.24507/icicelb.15.12.1311
Description: Recent advancements in sensing technology have enabled the development of more sophisticated assistive devices. Real-time myoelectric interfaces use surface electromyography (sEMG) to capture muscular activities. These signals can be utilized to create myoelectric prosthetic hands for individuals with physical disabilities. Accurate classification of the acquired sEMG signals is critical for effectively controlling external devices. This study introduces deep learning techniques for classifying muscular activities based on sEMG data. The methodology involves data acquisition, pre-processing, generation, and model training/testing. The Ninapro-DB1 dataset of sEMG signals from 27 healthy participants performing 53 hand motions was utilized. Multiple experiments compared various deep learning architectures – convolutional neural networks (CNN), long short-term memory networks (LSTM), bidirectional LSTM (BiLSTM), gated recurrent units (GRU), and bidirectional GRU (BiGRU). A novel hybrid CNN-LSTM model is proposed to automatically extract spatial and temporal features from the raw sEMG data. Experimental results demonstrate the hybrid model achieves 99.27% accuracy and F1-score, outperforming other deep learning models. Therefore, this study shows deep learning, specifically a CNN-LSTM hybrid, can effectively classify muscle movements from sEMG data for assistive technology applications.
Citations: 1
Aggregation Type: Journal
-------------------


Title: Mulberry leaf dataset for image classification task
Cover Date: 2024-06-01
Cover Display Date: June 2024
DOI: 10.1016/j.dib.2024.110281
Description: This manuscript presents a mulberry leaf dataset collected from five provinces within three regions in Thailand. The dataset contains ten categories of mulberry leaves. We proposed this dataset due to the challenges of classifying leaf images taken in natural environments arising from high inter-class similarity and variations in illumination and background conditions (multiple leaves from a mulberry tree and shadows appearing in the leaf images). We highlight that our research team recorded mulberry leaves independently from various perspectives during our data acquisition using multiple camera types. The mulberry leaf dataset can serve as vital input data passed to computer vision algorithms (conventional deep learning and vision transformer algorithms) for creating image recognition systems. The dataset will allow other researchers to propose novel computer vision techniques to approach mulberry recognition challenges.
Citations: 2
Aggregation Type: Journal
-------------------


Title: SPECIAL ISSUE ON EMERGING TRENDS IN ARTIFICIAL INTELLIGENCE AND ITS APPLICATIONS
Cover Date: 2024-05-01
Cover Display Date: May 2024
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Journal
-------------------


Title: Preface
Cover Date: 2024-04-26
Cover Display Date: 26 April 2024
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Conference Proceeding
-------------------


Title: Vehicle image datasets for image classification
Cover Date: 2024-04-01
Cover Display Date: April 2024
DOI: 10.1016/j.dib.2024.110133
Description: Vehicle image recognition is a critical research area with diverse applications in traffic management, surveillance, and autonomous driving systems. Accurately classifying and identifying vehicles from images plays a crucial role in these domains. This work presents two vehicle image datasets: the vehicle type image dataset version 2 (VTID2) and the vehicle make image dataset (VMID). The VTID2 Dataset comprises 4,356 images of Thailand's five most used vehicle types, which enhances diversity and reduces the risk of overfitting problems. This expanded dataset offers a more extensive and varied collection for robust model training and evaluation. This dataset will be valuable for researchers focusing on vehicle image recognition tasks. With an emphasis on sedans, hatchbacks, pick-ups, SUVs, and other vehicles, the dataset allows for developing and evaluating algorithms that accurately classify different types of vehicles. The VMID Dataset contains 2,072 images of logos (called vehicle make) from eleven prominent vehicle brands in Thailand. The proposed dataset will facilitate the development of computer vision algorithms and the evaluation of learning algorithm model performance metrics. These two datasets provide valuable resources to the research community that will foster possible research advancements in vehicle recognition, vehicle logo detection or localization, and vehicle segmentation, contributing to the development of intelligent transportation systems.
Citations: 5
Aggregation Type: Journal
-------------------


Title: RECOGNITION OF EXERCISE ACTIVITY USING CNN AND LSTM BASED ON ACCELEROMETER DATA
Cover Date: 2024-01-01
Cover Display Date: January 2024
DOI: 10.24507/icicel.18.01.69
Description: Applying wearable sensors to recognize human activities has developed as an emergent topic in the field of research on artificial intelligence. This is because human activity recognition (HAR) implementations extend from intelligent and sophisticated healthcare applications to other areas like innovative home surveillance systems and exercise performance monitoring devices. Exercise activity recognition (EAR) is a subclass of HAR investigating complicated human movement sequences. Literature evaluations indicate that understanding multi-modal sensors of diverse data kinds has various obstacles. We investigated multi-modal EAR using deep learning techniques on sensor data from many body areas. Focusing on accelerometer data, we proposed the hybrid model with the combination of a deep convolutional neural network (CNN) and long short-term memory (LSTM) neural network (called CNN-LSTM) for effectively recognizing fitness activities. The trained deep learning classifier's accuracy, loss, and F1-score were determined using a public standard EAR dataset (MEx dataset) to assess the newly proposed classifier. We inferred from experimental findings that the proposed CNN-LSTM could classify exercise activities utilizing accelerometer data from object location with the most significant accuracy (97.23%) and F1-score (97.20%), surpassing existing baseline classifiers.
Citations: 2
Aggregation Type: Journal
-------------------


Title: Robust Model Selection for Plant Leaf Image Recognition Based on Evolutionary Ant Colony Optimization with Learning Rate Schedule
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/ACCESS.2024.3457753
Description: Selecting optimal deep learning models is often a time-consuming process. To address this challenge, we propose a novel variant of the ant colony optimization (ACO) algorithm. This approach is designed to enhance model selection across various deep learning architectures, with a particular focus on leaf classification tasks. We introduce a new ACO technique specifically tailored for selecting robust models within convolutional neural networks (CNNs). These models are then integrated into an ensemble learning framework known as ensemble CNNs. A distinguishing feature of our proposed evolutionary ACO algorithm is its ability to consistently identify a set of robust CNN models in each iteration. This capability is facilitated by an innovative fitness function and an adaptive learning rate schedule embedded within the ACO algorithm, which optimizes pheromone distribution. Unlike the original ACO algorithm, which consistently selects the same CNN model, our evolutionary approach enables the dynamic discovery of new CNN models. To validate our method, we conducted experiments on two plant leaf datasets: Mulberry and Turkey-plant. Our comparison with existing methods, specifically the ant colony system (ACS) and the max-min ant system (MMAS), demonstrated that the MMAS algorithm outperformed the ACS algorithm. Furthermore, we explored three ensemble learning techniques: unweighted average, weighted average, and cost-sensitive learning. The weighted average method emerged as the most effective ensemble approach, with its parameters determined through a grid search process. The results indicate that the evolutionary ACO algorithm not only facilitates the selection of robust deep learning models but also achieves superior performance compared to the original ACO algorithm when applied to the Mulberry leaf and Turkey-plant datasets.
Citations: 2
Aggregation Type: Journal
-------------------


Title: ROAD SURFACE CLASSIFICATION FOR INTELLIGENT VEHICLE PERCEPTION BASED ON INERTIAL SENSORS
Cover Date: 2024-01-01
Cover Display Date: January 2024
DOI: 10.24507/icicel.18.01.79
Description: In recent years, the necessity for several sources of situational information from the traffic environment has increased due to the growth of Intelligent Transport System (ITS) solutions, such as autonomous vehicles and enhanced driver support systems. Identifying Road Surface Type (RST) within this environmental information is essential and applicable throughout the ITS sector. The classification method must function successfully across various cars, driving behaviors, and situations in which a vehicle might operate. In this study, we use inertial sensors, such as accelerometers, gyroscopes, and magnetometers, which are reliable, non-polluting, and low-cost solutions appropriate for large-scale deployment, to develop a deep learning model that classifies road surface characteristics effectively. These sensor data were employed in three basic deep learning models, including our proposed RST-PyramidNet model: CNN-based, LSTM-based, and GRU-based models. A public benchmark dataset named Passive Vehicular Sensors (PVS) dataset based on the 5-fold cross-validation methodology is used to assess the effectiveness of these models. The experimental findings indicate that the proposed RST-PyramidNet surpasses previous benchmark deep learning models with an accuracy of 97.68% and an F1-score of 97.35%.
Citations: 3
Aggregation Type: Journal
-------------------


Title: Improving Neural Network-Based Multi-Label Classification with Pattern Loss Penalties
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/ACCESS.2024.3386841
Description: This research work introduces two novel loss functions, pattern-loss (POL) and label similarity-based instance modeling (LSIM), for improving the performance of multi-label classification using artificial neural network-based techniques. These loss functions incorporate additional optimization constraints based on the distribution of multi-label class patterns and the similarity of data instances. By integrating these patterns during the network training process, the trained model is tuned to align with the existing patterns in the training data. The proposed approach decomposes the loss function into two components: the cross entropy loss and the pattern loss derived from the distribution of class-label patterns. Experimental evaluations were conducted on eight standard datasets, comparing the proposed methods with three existing techniques. The results demonstrate the effectiveness of the proposed approach, with POL and LSIM consistently achieving superior accuracy performance compared to the benchmark methods.
Citations: 3
Aggregation Type: Journal
-------------------


Title: DEEP LEARNING FOR RECOGNIZING DAILY HUMAN ACTIVITIES USING SMART HOME SENSORS
Cover Date: 2023-12-01
Cover Display Date: December 2023
DOI: 10.24507/icicel.17.12.1375
Description: One of the vital purposes of health-related studies is to enhance people’s living conditions and well-being. Solutions for smart homes could offer occupants preventive care based on the identification of regular activities. Recent advancements and developments in sensor technology have raised the demand for intelligent household products and services. The rising volume of data necessitates the development of the deep learning domain for the automated identification of human motions. Moreover, networks with long short-term memory have been used to represent spatio-temporal sequences recorded by smart home sensors. This study proposed ResNeXt-based models that learn to identify human behaviors in smart homes to increase detection capability. Experiment findings generated on a publicly available benchmark dataset known as CASAS data demonstrate that ResNeXt-based techniques surpass conventional DL approaches, achieving improved outcomes compared to the existing research. ResNeXt outperformed the benchmark approach by an average of 84.81%, 93.57%, and 90.38% for the CASAS Cairo, CASAS Milan, and CASAS Kyoto3 datasets, respectively.
Citations: 4
Aggregation Type: Journal
-------------------


Title: SPECIAL ISSUE ON MACHINE LEARNING AND DIGITAL ENGINEERING
Cover Date: 2023-03-01
Cover Display Date: March 2023
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Journal
-------------------


Title: FUSION CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR THAI AND ENGLISH VIDEO SUBTITLE RECOGNITION
Cover Date: 2022-12-01
Cover Display Date: December 2022
DOI: 10.24507/icicel.16.12.1331
Description: Presently, subtitles are embedded into videos and placed on their bottom line. Locating the subtitle area and recognizing the text in the image is not simple. In this paper, we propose using the fusion convolutional recurrent neural network (CRNN) to recognize multi-language (Thai and English) text from subtitle word images. We fused the state-of-the-art convolutional neural networks (CNNs) with an additional fusion operation, followed by the bidirectional long short-term memory (BiLSTM) network. For decoding the output from the text images, we compared two decoding algorithms: connectionist temporal classification (CTC) and word beam search (WBS). We discovered that the WBS algorithm outperformed the CTC algorithm in accuracy. However, the WBS algorithm computed relatively slowly and is not recommended for real-time applications. We evaluated our fusion CRNN architecture on the multi-language video subtitle dataset and achieved character error rate (CER) values of 5.29% and 5.33% when decoding with the WBS and CTC algorithms, respectively.
Citations: 6
Aggregation Type: Journal
-------------------
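The character error rate (CER) reported in the record above is conventionally computed as the Levenshtein edit distance between the recognized text and the reference text, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names `levenshtein` and `character_error_rate` are illustrative assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Under this definition the CER can exceed 1.0 when the hypothesis is much longer than the reference, so it is an error rate rather than a bounded percentage.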


Title: SPECIAL ISSUE ON COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS
Cover Date: 2022-11-01
Cover Display Date: November 2022
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Journal
-------------------


Title: FUSION LIGHTWEIGHT CONVOLUTIONAL NEURAL NETWORKS AND SEQUENCE LEARNING ARCHITECTURES FOR VIOLENCE CLASSIFICATION
Cover Date: 2022-10-01
Cover Display Date: October 2022
DOI: 10.24507/icicelb.13.10.1027
Description: Stopping violent incidents in real life is dangerous for ordinary people and may put their lives at risk. Calling the police is the best choice to stop the violence. We should have an automatic system to recognize violence and alert the police in time. This paper proposes a method to classify violent incidents from video. However, classification of violent videos faces many challenging problems, such as video length, quality, angles and orientations of the recording devices. The proposed method is called fusion MobileNets-BiLSTM architecture. In the first part, we propose to use the lightweight MobileNetV1 and MobileNetV2 to extract robust deep spatial features from the video, with only 16 non-adjacent frames selected. The spatial features were passed to global average pooling, batch normalization, and time distribution layers. In the second part, the spatial features from the first part were concatenated and then used to create the deep temporal features with the bidirectional long short-term memory (BiLSTM). The proposed fusion MobileNets-BiLSTM architecture was evaluated on the hockey fight dataset. The experimental results showed that the proposed method provides better results than the existing methods. It achieved 95.20% accuracy on the test set of the hockey fight dataset.
Citations: 1
Aggregation Type: Journal
-------------------


Title: EFFECTIVE DATA RESAMPLING AND META-LEARNING CONVOLUTIONAL NEURAL NETWORKS FOR DIABETIC RETINOPATHY RECOGNITION
Cover Date: 2022-09-01
Cover Display Date: September 2022
DOI: 10.24507/icicelb.13.09.939
Description: Rapid diagnosis increases the chance of a patient being cured of symptoms. This applies especially to diabetic diseases where there is a high risk of diabetic retinopathy, which will lead to blindness if not treated promptly. Artificial intelligence techniques have been proposed to diagnose diabetic retinopathy. In this paper, we recognize diabetic retinopathy from retinal images using meta-learning Convolutional Neural Networks (CNNs). Before training state-of-the-art CNNs, data resampling methods were proposed to select training and validation sets, and then the CNNs were trained on the selected training data. Simple data augmentation techniques were applied when training the CNNs to increase the training data pattern. We compared two ensemble learning methods, meta-learner and unweighted average, to show that the ensemble methods always performed better than a single CNN. The results showed that training the CNN model with the random data method outperformed other data resampling methods. However, data augmentation techniques did not present an outstanding result on diabetic retinopathy. In conclusion, the ensemble learning method using the meta-learner resulted in the best accuracy when compared with the unweighted average method. The proposed meta-learner CNNs achieved an accuracy of 86.32%.
Citations: 2
Aggregation Type: Journal
-------------------


Title: DYNAMIC FINGERSPELLING RECOGNITION FROM VIDEO USING DEEP LEARNING APPROACH: FROM DETECTION TO RECOGNITION
Cover Date: 2022-09-01
Cover Display Date: September 2022
DOI: 10.24507/icicelb.13.09.949
Description: The World Health Organization found that more than 34 million people suffer from hearing loss and these people need to use sign language to communicate. Hence, a sign language recognition system is proposed to support communication between people with hearing loss and others. In this paper, we aim to propose an end-to-end system to recognize dynamic Thai fingerspelling from video. The proposed system includes two main processes. First, we use the YOLOv5 algorithm for the human detection task. Subsequently, a uniform distribution method is proposed to select the robust frames before applying robust frames to the detection algorithm. Second, we propose dynamic fingerspelling recognition that consists of two deep learning architectures: convolutional neural network (CNN) and long short-term memory (LSTM). We then combine CNN and LSTM, called CNN-LSTM architecture, followed by the recognition block. The recognition block comprises dropout, global average pooling, and softmax layers. For the CNN architectures, we evaluated three CNNs: MobileNetV2, ResNet50, and DenseNet201. We found that the proposed ResNet50-LSTM architecture achieved an accuracy of 88.42% on the test set of the dynamic Thai fingerspelling dataset and also prevented the overfitting problem.
Citations: 2
Aggregation Type: Journal
-------------------


Title: Effective Data Augmentation and Training Techniques for Improving Deep Learning in Plant Leaf Disease Recognition
Cover Date: 2022-07-01
Cover Display Date: 1 July 2022
DOI: 10.14416/j.asep.2021.01.003
Description: Plant disease is the most common problem in agriculture. Usually, the symptoms appear on leaves of the plants which allow farmers to diagnose and prevent the disease from spreading to other areas. An accurate and consistent plant disease recognition system can help to prevent the spread of diseases and to save maintenance costs. In this research, we present a plant leaf disease recognition system using two deep convolutional neural networks (CNNs): MobileNetV2 and NASNetMobile. These CNN architectures are designed to be suitable for smartphones due to the models being small. We experimented with three training techniques (online, offline, and mixed) on two plant leaf diseases. As for data augmentation techniques, we found that the combination of rotation, shift, and zoom techniques significantly increases the performance of the CNN architectures. The experimental results show that the most accurate algorithm for plant leaf disease recognition is the NASNetMobile architecture using transfer learning. Additionally, the most accurate result is obtained when combining the offline training technique with data augmentation techniques.
Citations: 56
Aggregation Type: Journal
-------------------


Title: MULTI-LANGUAGE VIDEO SUBTITLE RECOGNITION WITH CONVOLUTIONAL NEURAL NETWORK AND LONG SHORT-TERM MEMORY NETWORKS
Cover Date: 2022-06-01
Cover Display Date: June 2022
DOI: 10.24507/icicel.16.06.647
Description: Nowadays, many videos are published on Internet channels such as YouTube and Facebook. Many audiences, however, cannot understand the contents of the video, possibly due to language differences or hearing impairment. As a result, subtitles have been added to videos. In this paper, we proposed deep learning techniques that combine convolutional neural network (CNN) and long short-term memory (LSTM) networks, called CNN-LSTM, to recognize video subtitles. We created the simplified CNN architecture with 16 weight layers. The output of the last CNN layer was downsampled using max-pooling before being sent to the LSTM network. We first trained our CNN-LSTM architecture on printed text data which contained various font styles, diverse font sizes, and complicated backgrounds. The connectionist temporal classification was then used as a loss function to calculate the loss value and decode the output of the network. For the video subtitle dataset, we collected 24 videos from YouTube and Facebook, containing Thai, English, Arabic, and Thai numbers. The dataset also contained 157 characters. In this dataset, we extracted 4,224 subtitle images from the videos. The proposed CNN-LSTM architecture achieved an average character error rate of 9.36%.
Citations: 3
Aggregation Type: Journal
-------------------


Title: STACKING ENSEMBLE OF LIGHTWEIGHT CONVOLUTIONAL NEURAL NETWORKS FOR PLANT LEAF DISEASE RECOGNITION
Cover Date: 2022-05-01
Cover Display Date: May 2022
DOI: 10.24507/icicel.16.05.521
Description: The high-grade quality of agricultural goods can be affected by diseases. Therefore, farmers need to quickly stop the spread of diseases. This study proposes a stacking ensemble of lightweight convolutional neural network (CNN) framework to enhance the recognition accuracy of plant leaf disease images. In the proposed framework, we first used four lightweight CNN architectures (InceptionResNetV2, NASNetMobile, MobileNetV2, and EfficientNetB1) to train and create robust CNN models from images of plant leaf diseases. The experimental results showed that the EfficientNetB1 outperformed other CNN models. We then created the stacking ensemble by stacking the output probabilities of each CNN model and providing them as input to train a second model using a machine learning classifier. In this step, we experimented with five classifiers: logistic regression, support vector machine, K-nearest neighbors, random forest, and long short-term memory network. We found that the random forest method achieved a more accurate performance. As a result, we considered that all machine learning techniques could be involved in stacking ensemble learning.
Citations: 9
Aggregation Type: Journal
-------------------


Title: AN END-TO-END THAI FINGERSPELLING RECOGNITION FRAMEWORK WITH DEEP CONVOLUTIONAL NEURAL NETWORKS
Cover Date: 2022-05-01
Cover Display Date: May 2022
DOI: 10.24507/icicel.16.05.529
Description: The WHO reports that approximately 34 million people worldwide experience deafness and hearing loss. By 2050, this number is expected to increase to 900 million people. It is essential to communicate with the hearing impaired in hand sign language. This paper proposes an end-to-end fingerspelling recognition framework of the Thai sign language based on deep convolutional neural networks (CNNs). We divided our framework into two processes. In the first process, we focus on the detection of hands using the YOLOv3 object detection framework. In the second process, we propose using five CNN architectures, MobileNetV2, DenseNet121, InceptionResNetV2, NASNetMobile, and EfficientNetB2, to create the most robust model that provides high recognition performance. Hence, we evaluated the proposed framework to detect and recognize three Thai fingerspelling (TFS) datasets: TFS, KKU-TFS, and Unseen-TFS. We found that YOLOv3 showed a high precision value on the TFS dataset. However, the worst performance was found with the KKU-TFS and Unseen-TFS datasets. Also, our proposed framework could not detect hands from only one image on the KKU-TFS and Unseen-TFS datasets. Therefore, we also examined the CNN architectures to recognize the 1-stage Thai fingerspelling images. The experimental results showed that DenseNet121 obtained an accuracy of 93.99% on the TFS dataset and 90.40% on the KKU-TFS dataset.
Citations: 8
Aggregation Type: Journal
-------------------


Title: COMPARATIVE STUDY BETWEEN ENSEMBLE AND FUSION CONVOLUTIONAL NEURAL NETWORKS FOR DIABETIC RETINOPATHY CLASSIFICATION
Cover Date: 2022-04-01
Cover Display Date: April 2022
DOI: 10.24507/icicel.16.04.401
Description: In this paper, we have demonstrated the effectiveness of the fusion convolutional neural network (CNN) and ensemble CNN architectures for diabetic retinopathy classification. For the fusion and ensemble CNN architectures, we proposed to use five CNN architectures consisting of InceptionV3, ResNet50, ResNet50V2, Xception, and DenseNet121 to find the best CNN model. Two of the best CNN models were then selected for creating the fusion and ensemble CNN architectures. We also performed data augmentation techniques while training the CNN models. We found that the data augmentation technique can increase the accuracy of the CNNs. However, the data augmentation technique should not distort the retinal image. For the fusion CNNs, Xception and InceptionV3 were combined and then attached to two dense layers of 1,024 units each. We selected an optimal dropout value of 0.4. For the ensemble CNNs, the output probabilities calculated from the Xception and InceptionV3 models were sent to the ensemble learning method. Using ensemble learning methods, we also compared the weighted and unweighted average methods. The results showed that the weighted average method outperformed the unweighted average method in all ensemble CNNs. From our experiments, we found that the fusion CNN architecture slightly outperformed the ensemble CNN architecture.
Citations: 2
Aggregation Type: Journal
-------------------
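The weighted and unweighted average ensembles compared in the record above both combine the per-class output probabilities of several models; they differ only in whether every model contributes equally. The sketch below illustrates that general idea with plain Python lists. The function names and the example weights are hypothetical; the source does not state how the weights were chosen.

```python
def average_ensemble(probabilities, weights=None):
    """Combine per-model class probabilities into one probability vector.

    probabilities: list of per-model lists, one probability per class.
    weights: optional per-model weights; None means an unweighted average.
    """
    n_models = len(probabilities)
    if weights is None:
        weights = [1.0] * n_models  # unweighted: every model counts equally
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict(probabilities, weights=None):
    """Return the index of the class with the highest combined probability."""
    combined = average_ensemble(probabilities, weights)
    return max(range(len(combined)), key=combined.__getitem__)
```

For example, with two models that output `[0.6, 0.4]` and `[0.3, 0.7]`, the unweighted average favors the second class, while weighting the first model at 0.8 and the second at 0.2 (illustrative values) favors the first class.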


Title: CycleAugment: Efficient data augmentation strategy for handwritten text recognition in historical document images
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.14456/easr.2022.50
Description: Predicting the sequence pattern of handwritten text images is a challenging problem due to various writing styles, insufficient training data, and background noise appearing in the text images. The combination of a convolutional neural network (CNN) and a recurrent neural network (RNN), called the CRNN architecture, is the most successful sequence learning method for handwritten text recognition systems. For handwritten text recognition in historical Thai document images, we first trained nine different CRNN architectures with both training from scratch and transfer learning techniques to find the most powerful technique. We discovered that the transfer learning technique does not significantly outperform scratch learning. Second, we examined training the CRNN model by applying the basic transformation data augmentation techniques: shifting, rotation, and shearing. Indeed, the data augmentation techniques provided more accurate performance than without applying data augmentation techniques. However, the improvement was not significant. The original training strategy aimed to find the global minimum and did not always solve the overfitting problem. Third, we proposed a cyclical data augmentation strategy, called CycleAugment, to discover many local minima and prevent overfitting. In each cycle, it rapidly decreased the training loss to reach a local minimum. The CycleAugment strategy allowed the CRNN model to learn the input images with and without applying data augmentation techniques to learn from many input patterns. Hence, the CycleAugment strategy consistently achieved the best performance when compared with other strategies. Finally, we prevented image distortion by applying a simple technique to the short word images and achieved better performance on the historical Thai document image dataset.
Citations: 1
Aggregation Type: Journal
-------------------


Title: Fast and Accurate Deep Learning Architecture on Vehicle Type Recognition
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.55003/cast.2022.01.22.001
Description: Vehicle Type Recognition has a significant problem that happens when people need to search for vehicle data from a video surveillance system at a time when a license plate does not appear in the image. This paper proposes to solve this problem with a deep learning technique called Convolutional Neural Network (CNN), which is one of the latest advanced machine learning techniques. In the experiments, researchers collected two datasets of Vehicle Type Image Data (VTID I & II), which contained 1,310 and 4,356 images, respectively. The first experiment was performed with 5 CNN architectures (MobileNets, VGG16, VGG19, Inception V3, and Inception V4), and the second experiment with another 5 CNNs (MobileNetV2, ResNet50, Inception ResNet V2, Darknet-19, and Darknet-53) including several data augmentation methods. The results showed that MobileNets, when combined with the brightness augmentation method, significantly outperformed other CNN architectures, producing the highest accuracy rate at 95.46%. It was also the fastest model when compared to other CNN networks.
Citations: 7
Aggregation Type: Journal
-------------------


Title: DeblurGAN-CNN: Effective Image Denoising and Recognition for Noisy Handwritten Characters
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/ACCESS.2022.3201560
Description: Many problems can reduce handwritten character recognition performance, such as image degradation, light conditions, low-resolution images, and even the quality of the capture devices. However, in this research, we have focused on the noise in the character images that could decrease the accuracy of handwritten character recognition. Many types of noise penalties influence the recognition performance, for example, low resolution, Gaussian noise, low contrast, and blur. First, this research proposes a method that learns from the noisy handwritten character images and synthesizes clean character images using the robust deblur generative adversarial network (DeblurGAN). Second, we combine the DeblurGAN architecture with a convolutional neural network (CNN), called DeblurGAN-CNN. Subsequently, two state-of-the-art CNN architectures are combined with DeblurGAN, namely DeblurGAN-DenseNet121 and DeblurGAN-MobileNetV2, to address many noise problems and enhance the recognition performance of the handwritten character images. Finally, the DeblurGAN-CNN could transform the noisy characters to the new clean characters and recognize clean characters simultaneously. We have evaluated and compared the experimental results of the proposed DeblurGAN-CNN architectures with the existing methods on four handwritten character datasets: n-THI-C68, n-MNIST, THI-C68, and THCC-67. For the n-THI-C68 dataset, the DeblurGAN-CNN achieved above 98% and outperformed the other existing methods. For the n-MNIST, the proposed DeblurGAN-CNN achieved an accuracy of 97.59% when the AWGN+Contrast noise method was applied to the handwritten digits. We have evaluated the DeblurGAN-CNN on the THCC-67 dataset. The result showed that the proposed DeblurGAN-CNN achieved an accuracy of 80.68%, which is significantly higher, by approximately 10%, than the existing method.
Citations: 34
Aggregation Type: Journal
-------------------


Title: Ensemble multiple CNNs methods with partial training set for vehicle image classification
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.14456/sehs.2022.12
Description: Convolutional neural networks (CNNs) are now the state-of-the-art method for several types of image recognition. One challenging problem is vehicle image classification. However, applying only a single CNN model is difficult due to the weakness of each model. This problem can be solved by using the ensemble method. Using the power of multiple CNNs together helps increase the final output accuracy but is very time-consuming. This paper introduced a new ensemble multiple CNNs method with a partial training set. This method combined the advantages of the ensemble technique to increase the recognition accuracy and used the idea of a partial training set to decrease the time of the training process. It decreased the time taken by more than 60% compared to the full ensemble technique while still maintaining a high accuracy score of 96.01%. These properties made it a good choice to compete with other single CNN models.
Citations: 2
Aggregation Type: Journal
-------------------


Title: dropCyclic: Snapshot Ensemble Convolutional Neural Network Based on a New Learning Rate Schedule for Land Use Classification
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/ACCESS.2022.3180844
Description: The ensemble learning method is a necessary process that provides robustness and is more accurate than the single model. The snapshot ensemble convolutional neural network (CNN) has been successful and widely used in many domains, such as image classification, fault diagnosis, and plant image classification. The advantage of the snapshot ensemble CNN is that it combines the cyclic learning rate schedule in the algorithm to snap the best model in each cycle. In this research, we proposed the dropCyclic learning rate schedule, which is a step decay to decrease the learning rate value in every learning epoch. The dropCyclic can reduce the learning rate and find the new local minimum in the subsequent cycle. We evaluated the snapshot ensemble CNN method based on three learning rate schedules (cyclic cosine annealing, max-min cyclic cosine learning rate scheduler, and dropCyclic), using three backbone CNN architectures: MobileNetV2, VGG16, and VGG19. The snapshot ensemble CNN methods were tested on three aerial image datasets: UCM, AID, and EcoCropsAID. The proposed dropCyclic learning rate schedule outperformed the other learning rate schedules on the UCM dataset and obtained high accuracy on the AID and EcoCropsAID datasets. We also compared the proposed dropCyclic learning rate schedule with other existing methods. The results show that the dropCyclic method achieved higher classification accuracy compared with other existing methods.
Citations: 20
Aggregation Type: Journal
-------------------
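The record above names cyclic cosine annealing as one of the baseline schedules for snapshot ensembles. As a rough illustration, the sketch below implements the commonly used form of that baseline, in which the learning rate restarts at its initial value at the start of every cycle and decays along a half cosine within the cycle. It does not implement dropCyclic, whose exact formula is not given in the record; the function name and parameters are assumptions chosen for illustration.

```python
import math


def cyclic_cosine_lr(iteration, total_iterations, n_cycles, initial_lr):
    """Learning rate for a 0-indexed training iteration.

    The training run is split into n_cycles equal cycles. Within each cycle
    the rate decays from initial_lr toward zero, then restarts.
    """
    cycle_len = math.ceil(total_iterations / n_cycles)
    position = iteration % cycle_len  # position inside the current cycle
    return initial_lr / 2 * (math.cos(math.pi * position / cycle_len) + 1)
```

In a snapshot ensemble, a model snapshot is typically saved at the end of each cycle, when the learning rate is at its lowest, and the saved snapshots are later combined.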


Title: Data Mining Approaches in Personal Loan Approval
Cover Date: 2022-01-01
Cover Display Date: January-June 2022
DOI: 10.14456/mijet.2022.2
Description: The approval of a bank's credit for an individual loan requires the fulfillment of several requirements, such as bank credit policy, loan amount, the purpose of the loan, and repayment ability. However, every type of credit is subject to the risk of non-repayment and non-performing loans, which affect the liquidity of the bank's operation. This research studied the application of data mining techniques to identify key factors for the loan decisions of a bank. The main objective was to compare the data mining process for personal loan approval with and without feature selection techniques. For the experiments, the first step was to create the data mining models using three methods: support vector machine (SVM), multi-layer perceptron (MLP), and decision tree. The results showed that the SVM method outperformed the other data mining methods. Second, we experimented with feature selection techniques consisting of Chi-square and information gain. The Chi-square considered the ten factors, while information gain selected the best three factors. The experimental results showed that the Chi-square and information gain combined with the MLP method obtained an accuracy rate of 90.40% and 91.70%, respectively. Therefore, this research concluded that the SVM classifier without combining the feature selection method is the best method to use in personal credit evaluation.
Citations: 3
Aggregation Type: Journal
-------------------


Title: Preface
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: N/A
Description: N/A
Citations: 0
Aggregation Type: Book Series
-------------------


Title: Sentiment Analysis of Local Tourism in Thailand from YouTube Comments Using BiLSTM
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1007/978-3-031-20992-5_15
Description: Currently, social networks, where people can express their opinion through content and comments, are fast developing and affect various areas of daily life. In particular, some research on YouTube travel channels found that almost all tourists and audiences leave comments about their attitudes to that place. Thus, mining the emotional recognition of comments through artificial intelligence can bring knowledge about the tourists’ general view. This article analyzes the relationship(s) between social media use and its effect on community-based tourism in Thailand using the Social Media Sensing framework (S-Sense) as sentiment analysis and the Bidirectional Long Short-Term Memory (BiLSTM) methods to analyze the text comments. This research collected 51,280 comments on 114 YouTube videos, which are tourist attractions in various provinces in Thailand. The approach categorizes attractions based on sentiment analysis of 60% or more, including restaurants, markets, historical sites, temples, or natural attractions. The results show that 67.51% of the 19,391 clean-processed comments were satisfied with those attraction places. Therefore S-Sense and BiLSTM models can be sufficient to analyze the attitude of comments about attraction places with from 43 to remain 33 keywords of 1,603 comments. Furthermore, the offered sentiment analysis method has higher precision, recall, and F1 scores.
Citations: 4
Aggregation Type: Book Series
-------------------


Title: Water Quality Assessment in the Lam Pa Thao Dam, Chaiyaphum, Thailand with K-Means Clustering Algorithm
Cover Date: 2021-09-01
Cover Display Date: 1 September 2021
DOI: 10.1109/RI2C51727.2021.9559811
Description: Water resource management faces major challenges, such as a warming climate, arid land, and toxic chemicals in the water. It is essential to deal with water resource management urgently. In this article, researchers mainly focus on monitoring the water quality in the Lam Pa Thao dam, Chaiyaphum, Thailand. Farmers in that area are directly affected by the water quality in the dam because they raise fish in floating fish cages. To prevent losses from fish farming, they should have the ability to monitor and control the factors that affect the water quality. As a result, farmers can monitor the water quality and the monitoring system can report to them in time. In this case, to monitor the water quality, researchers designed buoys, which are Internet of Things devices, to collect data from the Lam Pa Thao dam. Researchers collected the water quality data from January to March 2021, including 13,608 instances. Five important parameters were obtained: dissolved oxygen, temperature, pH, total dissolved solids, and electric conductivity. Due to the number of parameters, researchers decided not to apply dimension reduction. In these experiments, researchers proposed using the K-means clustering algorithm to group the water data into appropriate clusters. For the K-means algorithm, we calculated the silhouette coefficient to analyze the effectiveness of cluster separation. The best clustering obtained with the K-means algorithm achieved a silhouette score of 0.6839. Furthermore, researchers evaluated the K-means algorithm on the Charles river and Fitzroy river datasets. It obtained silhouette scores of 0.5489 and 0.6589, respectively.
Citations: 2
Aggregation Type: Conference Proceeding
-------------------
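The silhouette coefficient used in the record above to judge cluster separation compares, for each point, its mean distance to the other points in its own cluster (a) with its mean distance to the nearest other cluster (b), giving (b - a) / max(a, b); the overall score is the mean over all points. The sketch below is a small illustration of that standard definition using Euclidean distance. It assumes cluster labels have already been produced (for example by K-means), and the function name is illustrative.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores close to 1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster.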


Title: Semi-Automated Mushroom Cultivation House using Internet of Things
Cover Date: 2021-07-01
Cover Display Date: July - December 2021
DOI: 10.14456/mijet.2021.24
Description: This research presents an application of the internet of things (IoT) technology. The technology is responsible for checking the temperature and humidity in a mushroom cultivation house and the operation of the IoT control box. It is a semi-automated system that does not rely on farmers' labor. The system can be checked and operated through an application that is installed on the farmer’s smartphone. In the case of offline operation, the system can be controlled manually by farmers. We designed the software and control system for the IoT control box with the needs of farmers in mind. Therefore, we could develop a suitable IoT control box that meets farmers' needs. The farmers used the application for four months before their satisfaction was evaluated. The results showed that the semi-automated system obtained a high satisfaction rate. However, when asked about “The value in using the internet of things technology to control the mushroom cultivation,” the satisfaction was at level 4 because of the high investment cost, including the monthly internet cost. That cost might increase the overall production cost. If farmers want to reduce the monthly internet cost, the application architecture can omit the data transmission via the cloud to smartphones. The application is designed to be controlled through the IoT control box. The control system will be able to work automatically and manually.
Citations: 6
Aggregation Type: Journal
-------------------


Title: Ensemble methods with deep convolutional neural networks for plant leaf recognition
Cover Date: 2021-06-01
Cover Display Date: June 2021
DOI: 10.24507/icicel.15.06.553
Description: Recognition of plant leaves and diseases from images is a challenging task in computer vision and machine learning. This is because various problems directly affect the performance of the system, such as the leaf structure, intra-class differences, inter-class similarity of shape, perspective of the image, and even recording time. In this paper, we propose the ensemble convolutional neural network (CNN) method to tackle these issues and improve plant leaf recognition performance. We trained five CNN models (MobileNetV1, MobileNetV2, NASNetMobile, DenseNet121, and Xception) to discover the best CNN-based model. Three ensemble methods (unweighted average, weighted average, and unweighted majority vote) were then applied to the CNN output probabilities of each model. We have evaluated these ensemble CNN methods on a mulberry leaf dataset and two leaf disease datasets: tomato and corn leaf disease. As a result, the individual CNN model shows that MobileNetV2 outperforms every CNN model with an accuracy of 91.19% on the mulberry leaf dataset. The Xception combined with data augmentation techniques (Height Shift+Vertical Flip+Fill Mode) obtains an accuracy of 91.77%. We achieved very high accuracy above 99% from the DenseNet121 and Xception models on the leaf disease datasets. For the ensemble CNNs method, we selected the base models according to the best CNN models and predicted the output of each CNN with the weighted average ensemble method. The results showed that 3-Ensemble CNNs (3-EnsCNNs) performed better on plant leaf disease datasets, while 5-EnsCNNs performed better on the mulberry leaf dataset. Surprisingly, the data augmentation technique did not affect the ensemble CNNs on the mulberry leaf and corn leaf disease datasets. On the other hand, applying data augmentation was slightly better than not applying it only on the tomato leaf disease dataset.
Citations: 30
Aggregation Type: Journal
-------------------


Title: Ensemble convolutional neural network architectures for land use classification in economic crops aerial images
Cover Date: 2021-06-01
Cover Display Date: June 2021
DOI: 10.24507/icicel.15.06.531
Description: The analysis of land use and land cover is a task of remote sensing and geographic information systems. Nowadays, deep learning techniques can analyze land use and land cover with high performance. In this paper, we focus on the classification of land use for Thailand's economic crops based on the convolutional neural network (CNN) technique. We evaluated the ensemble CNN framework on Thailand's economic crops aerial image dataset called the EcoCropsAID dataset. Five economic crops categories, including rice, sugarcane, cassava, rubber, and longan, were collected using the Google Earth program. Economic crops aerial images obtained between 2014 and 2018 were considered. There were 5,400 images with approximately 1,000 images per class. For the ensemble CNN framework, we first used eight pre-trained CNN models consisting of InceptionResNetV2, MobileNetV2, DenseNet201, Xception, ResNet152V2, NASNetLarge, VGG16, and VGG19 to discover the best baseline CNN model. Second, three simple data augmentation techniques (rotation, width shift, and height shift) were applied to increase the accuracy of the CNN models. As a result, we found that the three best models were the VGG16, VGG19, and NASNetLarge architectures, respectively. Finally, we created an ensemble CNN framework that consisted of 3 CNNs based on the best CNN models. We also compared three ensemble methods: weighted average, unweighted average, and unweighted majority vote. From our experiments, the results show that VGG16 outperforms the other CNN models. Moreover, the classification performance on Thailand's economic crops aerial image dataset was significantly improved when the weighted average ensemble method was employed.
Citations: 21
Aggregation Type: Journal
-------------------


Title: Enhancement of plant leaf disease classification based on snapshot ensemble convolutional neural network
Cover Date: 2021-06-01
Cover Display Date: June 2021
DOI: 10.24507/icicel.15.06.669
Description: Plant diseases are one of the most serious issues that can decrease the value and volume of plant goods. It is time-consuming for farmers to discover and identify the disease by observing the leaves of plants, even with specialist scientists and laboratory processes. This study proposed a deep learning approach to address the real-world problems that are contained in the PlantDoc dataset. The deep learning method aims to classify plant leaf disease images from the PlantDoc dataset. First, four state-of-the-art convolutional neural networks (CNNs), VGG16, MobileNetV2, InceptionResNetV2, and DenseNet201, were proposed to enhance the plant leaf disease classification performance. As a result, for the baseline CNN model, DenseNet201 showed better performance with an accuracy of 67.18%, while the second-best CNN model was InceptionResNetV2 with an accuracy of 61.75%. In addition, the data augmentation techniques (rotation, zoom, brightness, cutout, and mixup) were combined in the training process. InceptionResNetV2, when combined with the rotation technique, obtained an accuracy of 66.02% and outperformed all other CNNs. Importantly, based on our experimental results, the data augmentation techniques with brightness, cutout, and mixup were less satisfactory on the PlantDoc dataset. Second, we proposed the snapshot ensemble to improve the performance of the CNN models. We evaluated the classification performance by applying the snapshot ensemble with 4 and 5 cosine annealing cycles and optimized the learning rate using a stochastic gradient descent algorithm. We also examined the snapshot ensemble with the weighted and unweighted ensemble methods. The experimental results showed that DenseNet201, when trained with the snapshot ensemble method (4-cycle), obtained an accuracy of 69.51%.
Citations: 13
Aggregation Type: Journal
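
The cyclic cosine annealing schedule that snapshot ensembles commonly rely on can be sketched as follows. This uses the widely cited form lr(t) = lr_max / 2 * (cos(pi * ((t - 1) mod L) / L) + 1), where L is the cycle length. The total iteration count and the initial learning rate are assumptions chosen for illustration; the description only states that 4 and 5 cycles were tried, and the function names are hypothetical.

```python
import math


def snapshot_lr(iteration, total_iterations, n_cycles, lr_max):
    """Learning rate at a 1-based iteration under cyclic cosine annealing."""
    cycle_len = math.ceil(total_iterations / n_cycles)
    position = (iteration - 1) % cycle_len  # progress inside the current cycle
    return lr_max / 2 * (math.cos(math.pi * position / cycle_len) + 1)


def snapshot_iterations(total_iterations, n_cycles):
    """Iterations at which a model snapshot would be saved (end of each cycle)."""
    cycle_len = math.ceil(total_iterations / n_cycles)
    return [min(k * cycle_len, total_iterations) for k in range(1, n_cycles + 1)]


# Assumed values: 200 iterations, 4 cycles, initial learning rate 0.1.
schedule = [snapshot_lr(t, 200, 4, 0.1) for t in range(1, 201)]
save_points = snapshot_iterations(200, 4)
```

Each cycle restarts at the maximum learning rate and decays towards zero, so a snapshot taken at the end of a cycle is a model that has settled into a local minimum. The saved snapshots are then combined with an ensemble rule.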
-------------------


Title: Feature Extraction Efficient for Face Verification Based on Residual Network Architecture
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1007/978-3-030-80253-0_7
Description: Face verification systems have many challenges to address because human images are obtained in extensively variable conditions and in unconstrained environments. Problems occur when capturing the human face in low light conditions, at low resolution, when occlusions are present, and even at different orientations. This paper proposes a face verification system that combines the convolutional neural network and max-margin object detection, called MMOD + CNN, for robust face detection and a residual network with 50 layers, called the ResNet-50 architecture, to extract deep features from face images. First, we experimented with the face detection method on two face databases, LFW and BioID, to detect human faces from an unconstrained environment. We obtained face detection accuracy > 99.5% on the LFW and BioID databases. For deep feature extraction, we used the ResNet-50 architecture to extract 2,048 deep features from the human face. Second, we compared the query face image with the face images from the database using the cosine similarity function. Only similarity values higher than 0.85 were considered. Finally, the top-1 accuracy was used to evaluate the face verification. We achieved an accuracy of 100% and 99.46% on the IMM frontal face and IMM face databases, respectively.
Citations: 2
Aggregation Type: Book Series
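
The matching step in the description above (cosine similarity between a query feature vector and database feature vectors, keeping only similarities higher than 0.85 and taking the top-1 match) might look like the sketch below. The three-dimensional vectors stand in for the 2,048-dimensional deep features and are invented for illustration; `verify` and the gallery layout are hypothetical names, not the authors' code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def verify(query, gallery, threshold=0.85):
    """Return the identity of the most similar gallery entry, or None.

    Only similarities strictly higher than the threshold are considered.
    """
    best_identity, best_similarity = None, threshold
    for identity, features in gallery.items():
        similarity = cosine_similarity(query, features)
        if similarity > best_similarity:
            best_identity, best_similarity = identity, similarity
    return best_identity


# Toy gallery with made-up 3-D features in place of 2,048-D deep features.
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
accepted = verify([0.9, 0.1, 0.0], gallery)   # close to person_a
rejected = verify([1.0, 1.0, 0.0], gallery)   # equally far from both
```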
-------------------


Title: Deep feature extraction technique based on conv1d and lstm network for food image recognition
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.14456/easr.2021.60
Description: There is a global increase in health awareness. The awareness of changing eating habits and choosing foods wisely are key factors that make for a healthy life. In order to design a food image recognition system, many food images were captured from a mobile device, but these sometimes include non-food objects such as people, cutlery, and even food decoration styles, and are called noise food images. These issues decreased the performance of the system. Convolutional neural network (CNN) architectures are proposed to address this issue and obtain good performance. In this study, we proposed to use the ResNet50-LSTM network to improve the efficiency of the food image recognition system. The state-of-the-art ResNet architecture was used to extract robust features from food images, which were employed as the input data for the Conv1D combined with a long short-term memory (LSTM) network, called Conv1D-LSTM. Then, the output of the LSTM was assigned to the global average pooling layer before passing to the softmax function to create a probability distribution. While training the CNN model, mixed data augmentation techniques were applied and increased the accuracy by 0.6%. The results showed that the ResNet50+Conv1D-LSTM network outperformed the previous works on the Food-101 dataset. The best performance of the ResNet50+Conv1D-LSTM network achieved an accuracy of 90.87%.
Citations: 13
Aggregation Type: Journal
-------------------


Title: Optimal weighted parameters of ensemble convolutional neural networks based on a differential evolution algorithm for enhancing pornographic image classification
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.14456/easr.2021.58
Description: Use of ensemble convolutional neural networks (CNNs) has become a more robust strategy to improve image classification performance. However, the success of the ensemble method depends on appropriately selecting the optimal weighted parameters. This paper aims to automatically optimize the weighted parameters using the differential evolution (DE) algorithm. The DE algorithm is applied to the weighted parameters, and the optimal weights are then assigned to the ensemble method and the stacked ensemble method. For the ensemble method, the weighted average ensemble method is applied. For the stacked ensemble method, we use the support vector machine as the second-level classifier. In the experiments, we first searched for the baseline CNN models and found that the best model on the pornographic image dataset was NASNetLarge, with an accuracy of 93.63%. Additionally, three CNN models, EfficientNetB1, InceptionResNetV2, and MobileNetV2, also obtained an accuracy above 92%. Second, we generated two ensemble CNN frameworks: the ensemble learning method, called Ensemble-CNN, and the stacked ensemble learning method, called StackedEnsemble-CNN. In the framework, we optimized the weighted parameters using the DE algorithm with six mutation strategies: rand/1, rand/2, best/1, best/2, current to best/1, and random to best/1. The optimal weights were then used for classification with the ensemble and stacked ensemble methods. The results showed that the Ensemble-3CNN and StackedEnsemble-3CNN, when optimized using the best/2 mutation strategy, surpassed the other mutation strategies with an accuracy of 96.83%. The results indicated that we could create the learning method framework with only 3 CNN models: NASNetLarge, EfficientNetB1, and InceptionResNetV2.
Citations: 9
Aggregation Type: Journal
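
As an illustration of the best/2 mutation strategy mentioned in the description above, the sketch below uses the textbook form v = x_best + F * (x_r1 - x_r2) + F * (x_r3 - x_r4), with four distinct random population members that differ from the target. The scale factor F = 0.5, the population, and the clip-and-normalise step that turns a candidate vector into ensemble weights are all assumptions for illustration; the paper's actual settings are not given in the description.

```python
import random


def mutate_best_2(population, best, target_idx, f=0.5, rng=random):
    """DE best/2 mutation: perturb the best vector with two difference vectors.

    Needs at least five population members so that four distinct indices,
    all different from the target index, can be drawn.
    """
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3, r4 = rng.sample(candidates, 4)
    return [
        b
        + f * (population[r1][d] - population[r2][d])
        + f * (population[r3][d] - population[r4][d])
        for d, b in enumerate(best)
    ]


def to_ensemble_weights(vector):
    """Assumed post-processing: clip negatives to zero and normalise to sum 1."""
    clipped = [max(v, 0.0) for v in vector]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(vector)] * len(vector)  # fall back to equal weights
    return [v / total for v in clipped]


# Hypothetical population of candidate weight vectors for a 3-CNN ensemble.
rng = random.Random(0)
population = [[rng.random() for _ in range(3)] for _ in range(6)]
best = population[0]  # in a real run: the member with the best fitness
mutant = mutate_best_2(population, best, target_idx=1, rng=rng)
weights = to_ensemble_weights(mutant)
```

In a full DE loop the mutant would next be crossed over with the target vector, and the result would replace the target only if it scores better on validation data.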
-------------------


Title: Improving Recognition of Thai Handwritten Characters with Deep Convolutional Neural Networks
Cover Date: 2020-03-19
Cover Display Date: 19 March 2020
DOI: 10.1145/3388176.3388181
Description: For handwritten character recognition, a common problem is that each writer has unique handwriting for each character (e.g., stroke, head, loop, and curl). The similarities between handwritten characters in each language are also a problem. These similarities have led to recognition mistakes. This research compared deep Convolutional Neural Networks (CNNs) used for handwriting recognition in the Thai language. The CNNs were tested with the THI-C68 dataset. This research also compared two training methods, training from scratch and transfer learning, using the VGGNet-19 and Inception-ResNet-v2 architectures. The results showed that the VGGNet-19 architecture with transfer learning can reduce learning time. Moreover, it also increased recognition efficiency up to 99.20% when tested with 10-fold cross-validation.
Citations: 10
Aggregation Type: Conference Proceeding
-------------------


Title: Food Image Classification with Improved MobileNet Architecture and Data Augmentation
Cover Date: 2020-03-19
Cover Display Date: 19 March 2020
DOI: 10.1145/3388176.3388179
Description: Real-world food images are challenging for food image classification because they can be captured from different perspectives and with different patterns. Also, many objects can appear in the image, not just foods. To recognize food images, in this paper, we propose a modified MobileNet architecture that applies global average pooling layers to avoid overfitting the food images, followed by batch normalization, rectified linear unit, and dropout layers, with softmax as the last layer. The state-of-the-art and the proposed MobileNet architectures are trained according to the fine-tuned model. The experimental results show that the proposed version of the MobileNet architecture achieves significantly higher accuracies than the original MobileNet architecture. The proposed MobileNet architecture significantly outperforms other architectures when the data augmentation techniques are combined.
Citations: 57
Aggregation Type: Conference Proceeding
-------------------


Title: Comparative Study between Texture Feature and Local Feature Descriptors for Silk Fabric Pattern Image Recognition
Cover Date: 2020-03-19
Cover Display Date: 19 March 2020
DOI: 10.1145/3388176.3388201
Description: Thai silk fabrics have unique patterns in different regions of Thailand. The designers may have been inspired by, and taken ideas from, the natural environment to create new silk patterns. Hence, many new silk patterns are modified from the original silk pattern. It is challenging for people to recognize a pattern without any prior knowledge and expertise. This paper aims to present a comparative study between texture feature and local feature descriptors for silk pattern image recognition. First, two feature extraction techniques, texture feature and local feature descriptors, are proposed to create robust features from sub-regions that are divided by the grid-based method. Second, the robust features are then classified using the well-known and effective classifier algorithms: K-nearest neighbor (KNN) and support vector machine (SVM) with the radial basis function. We experimented with silk pattern image recognition on two silk fabric pattern image datasets: the Silk-Pattern and Silk-Diff-Pattern. The evaluation results show that the texture feature called the local binary pattern (LBP), when combined with the KNN and SVM algorithms, outperforms other feature extraction methods, even deep learning architectures.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------


Title: Instance Segmentation of Water Body from Aerial Image using Mask Region-based Convolutional Neural Network
Cover Date: 2020-03-19
Cover Display Date: 19 March 2020
DOI: 10.1145/3388176.3388184
Description: Land use is constantly changing, and water plays a critical role in the process. If changes are noticed quickly or are predictable, land use planning and policies can be devised to mitigate almost any problem. Accordingly, the researchers present a mask region-based convolutional neural network (Mask R-CNN) for water body segmentation from aerial images. The system was tested on the Aerial Image Water Resources (AIWR) dataset. The AIWR areas were agricultural and lowland areas that require rainwater for farming. Many wells were spotted throughout the agricultural areas. The AIWR dataset presents two types of data: natural water bodies and artificial water bodies. The two types appear in the aerial images with differences in color, shape, size, and similarity. A pre-trained Mask R-CNN model was used to reduce network learning time. ResNet-101 was used as the backbone architecture. The information gathered in the learning process is limited, and only 720 pictures were produced. The researchers used data augmentation to increase the amount of information for training by using affine image transformations, including scale, translation, rotation, and shear. The experiment found that the Mask R-CNN architecture can specify the position of the water surface. The evaluation metric in this case is the mAP value. The mAP value is 0.30 without data augmentation. However, when Mask R-CNN was used with data augmentation, the mAP value increased to 0.59.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------


Title: Plant Leaf Image Recognition using Multiple-grid Based Local Descriptor and Dimensionality Reduction Approach
Cover Date: 2020-03-19
Cover Display Date: 19 March 2020
DOI: 10.1145/3388176.3388180
Description: The identification of plant species is a significant and challenging problem. In this research area, many researchers have focused on identifying plant leaf images because the leaves of a plant are found almost all year round. Existing methods for plant leaf image recognition are based on extracting unique features from the plant leaf and using well-known machine learning algorithms for classification. As a result, recognition accuracy was often not very high. In order to improve recognition accuracy, we proposed a multiple grids technique based on local descriptors and dimensionality reduction. First, we divided the plant leaf image according to grid size and calculated the local descriptors from each grid. Second, dimensionality reduction is proposed to transform and decrease the correlated variables of the feature vector. Finally, the relatively low-dimensional feature vector is passed to the machine learning techniques, which are the support vector machine and multi-layer perceptron algorithms. We have evaluated and compared the proposed algorithm with the bag of visual words method and the deep convolutional neural network (including AlexNet and GoogLeNet architectures) on the Folio leaf image dataset. The experiments show that the proposed algorithm obtained very high accuracy on plant leaf image recognition.
Citations: 5
Aggregation Type: Conference Proceeding
-------------------


Title: Gender Recognition from Facial Images using Local Gradient Feature Descriptors
Cover Date: 2019-10-01
Cover Display Date: October 2019
DOI: 10.1109/iSAI-NLP48611.2019.9045689
Description: Local gradient feature descriptors have been proposed to calculate the invariant feature vector. These local gradient methods are very fast at computing the feature vector and achieve very high recognition accuracy when combined with the support vector machine (SVM) classifier. Hence, they have been proposed to solve many problems in image recognition, such as human face, object, plant, and animal recognition. In this paper, we propose the use of the Haar-cascade classifier for face detection and the local gradient feature descriptors combined with the SVM classifier to solve the gender recognition problem. We detected 4,624 face images from the ColorFERET dataset. The face image data used in gender recognition included 2,854 male and 1,770 female images, respectively. We divided the dataset into training and test sets using 2-fold and 10-fold cross-validation. First, with 2-fold cross-validation, the results showed that the histogram of oriented gradient (HOG) descriptor outperforms the scale-invariant feature transform (SIFT) descriptor when combined with the SVM algorithm. The accuracies of the HOG+SVM and the SIFT+SVM were 96.50% and 95.98%, respectively. Second, with 10-fold cross-validation, the SIFT+SVM showed high performance with an accuracy of 99.20%. We discovered that the SIFT+SVM method needed more training data when creating the model. On the other hand, the HOG+SVM method provided better accuracy when the training data was insufficient.
Citations: 8
Aggregation Type: Conference Proceeding
-------------------


Title: Develop the Framework Conception for Hybrid Indoor Navigation for Monitoring inside Building using Quadcopter
Cover Date: 2019-10-01
Cover Display Date: October 2019
DOI: 10.1109/iSAI-NLP48611.2019.9045445
Description: Building security is crucial, but guards and CCTV may be inadequate for monitoring all areas. A quadcopter (drone) with manual and autonomous control was used in a trial mission in this project. Generally, all drones can stream live video and take photos. They can also be adapted to assist better decision-making in emergencies that occur inside a building. In this paper, we show how to improve a quadcopter's ability to fly indoors, detect obstacles, and react appropriately. This paper presents a new conceptual framework of hybrid indoor navigation ontology that analyzes a regular indoor route, including detection and avoidance of obstacles for the auto-pilot. An experiment with the system demonstrates improvements in building surveillance and in maintaining real-time situational awareness. The immediate objective is to show that the drone can serve as a reliable tool in security operations in a building environment.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------


Title: Effective Face Verification Systems Based on the Histogram of Oriented Gradients and Deep Learning Techniques
Cover Date: 2019-10-01
Cover Display Date: October 2019
DOI: 10.1109/iSAI-NLP48611.2019.9045237
Description: In this paper, we propose a face verification method. We experiment with a histogram of oriented gradients descriptor combined with the linear support vector machine (HOG+SVM) for face detection. Subsequently, we apply a deep learning method, the ResNet-50 architecture, to face verification. We evaluate the performance of the face verification system on three well-known face datasets (BioID, FERET, and ColorFERET). The experimental results are divided into two parts: face detection and face verification. First, the results show that the HOG+SVM performs very well on the face detection part, without any detection errors. Second, the ResNet-50 and FaceNet architectures perform best and obtain 100% accuracy on the BioID and FERET datasets. They also achieved very high accuracy on the ColorFERET dataset.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------


Title: Tracking people and objects with an autonomous unmanned aerial vehicle using face and color detection
Cover Date: 2019-04-15
Cover Display Date: 15 April 2019
DOI: 10.1109/ECTI-NCON.2019.8692269
Description: We propose a people and object tracking algorithm for an autonomous unmanned aerial vehicle (UAV). The UAV is used as a surveillance camera that can move anywhere; unlike closed-circuit television, its camera is not fixed in one location. Face detection and object detection are applied to support our proposed algorithm. The UAV model used in this research was the AR-Drone 2.0. It has a constraint on the front camera because the viewing position is fixed and cannot be changed during flight. We designed two experiments. The first is face detection on images, for which we applied two popular face detection methods, a Haar-cascade classifier and max-margin object detection with convolutional neural network based features, because they have high precision. The second is a color detection system, which focuses only on the color of objects and can be developed into an obstacle detection system. The experimental results are acceptable for adapting the system to tracking people and objects in a smart city.
Citations: 11
Aggregation Type: Conference Proceeding
-------------------


Title: A machine learning approach for detecting distributed denial of service attacks
Cover Date: 2019-04-15
Cover Display Date: 15 April 2019
DOI: 10.1109/ECTI-NCON.2019.8692243
Description: This research aims to present a method for identifying distributed denial of service (DDoS) attacks. Two benchmark datasets, KDD Cup 1999 and NSL-KDD, were used. The datasets were checked and duplicate data were deleted. After this process, the number of records in the KDD Cup 1999 dataset decreased from 4,898,431 to 529,655, and the number of records in the NSL-KDD dataset decreased from 125,373 to only 12,354. The reduction in records occurred because of the characteristics of DDoS attacks, which send repeated data to the victims' server. The researchers converted alphabetic data to numeric data and then trained K-nearest neighbor (KNN), multi-layer perceptron, and support vector machine classifiers. The results showed that KNN was the best method for identifying DDoS attacks.
Citations: 13
Aggregation Type: Conference Proceeding
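
The two preprocessing steps named in the description above (deleting duplicate records, then converting alphabetic fields to numeric ones) can be sketched as below. The sample rows are invented and only loosely resemble connection records. The integer coding in first-seen order is an assumption, since the description does not say how the conversion was done, and the function names are hypothetical.

```python
def drop_duplicates(records):
    """Keep the first occurrence of every record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def encode_categorical(records):
    """Replace each string field with an integer code, assigned per column.

    Codes are handed out in first-seen order (an assumed scheme).
    Returns the encoded rows and the per-column code books.
    """
    code_books = {}
    encoded = []
    for record in records:
        row = []
        for column, value in enumerate(record):
            if isinstance(value, str):
                book = code_books.setdefault(column, {})
                row.append(book.setdefault(value, len(book)))
            else:
                row.append(value)
        encoded.append(row)
    return encoded, code_books


# Made-up rows: (duration, protocol, service, bytes sent).
raw_rows = [
    (0, "tcp", "http", 181),
    (0, "tcp", "http", 181),       # exact repeat, as a flood would produce
    (0, "udp", "domain_u", 105),
]
unique_rows = drop_duplicates(raw_rows)
numeric_rows, code_books = encode_categorical(unique_rows)
```

The numeric rows can then be passed to any distance-based or margin-based classifier. For KNN in particular, integer codes impose an artificial ordering on categories, so one-hot encoding is a common alternative.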
-------------------


Title: Recognizing pornographic images using deep convolutional neural networks
Cover Date: 2019-04-15
Cover Display Date: 15 April 2019
DOI: 10.1109/ECTI-NCON.2019.8692296
Description: In this paper, we propose to use deep convolutional neural network (CNN) architectures, namely the deep residual networks (ResNet), the GoogLeNet, and the AlexNet architectures, on a pornographic image dataset. Also, the local descriptors, namely the local binary patterns (LBP), the histogram of oriented gradients (HOG), and the scale invariant feature transform (SIFT), combined with a support vector machine (SVM), a multilayer perceptron (MLP), or a K-nearest neighbor (KNN) technique are proposed. Additionally, a bag of visual words (BOW) and the BOW using extracted HOG features (HOG-BOW) are compared. To classify the pornographic images, we compare the CNN architectures to well-known local descriptor techniques combined with the SVM, the MLP, and the KNN methods. Experimental results indicate that the ResNet architecture yields higher accuracies than all other approaches.
Citations: 14
Aggregation Type: Conference Proceeding
-------------------


Title: Factors Influencing the Adoption of Agricultural Management Information Systems in Thailand
Cover Date: 2018-07-02
Cover Display Date: 2 July 2018
DOI: 10.1109/TIMES-iCON.2018.8621831
Description: In order to implement information and communication technology (ICT) successfully, it is important to understand the underlying factors that influence its adoption in agriculture. Therefore, this research studies the factors that influence successful ICT adoption and related agricultural performance. Case study and survey methodologies were adopted for this research. Case studies in two Thai organizations were carried out. The results of the two main case studies suggested 22 factors that may have an impact on ICT adoption in agriculture in Thailand, which led to the development of the preliminary framework. Next, a survey instrument was developed based on the findings from the case studies. To test the research framework, survey questionnaires were gathered from 481 respondents in two large-scale surveys sent to selected members of Thailand farmer, and Thailand computer. The results indicate that the top five critical factors for ensuring ICT adoption in agriculture were: 1) cost of ICT, 2) software, 3) training and education, 4) farmer attitude to the use of IT, and 5) skill development in ICT. Therefore, it is now clear which factors influence ICT adoption and which of those factors are critical success factors for ensuring ICT adoption in agricultural organizations.
Citations: 8
Aggregation Type: Conference Proceeding
-------------------


Title: Comparative study between deep learning and bag of visual words for wild-animal recognition
Cover Date: 2017-02-09
Cover Display Date: 9 February 2017
DOI: 10.1109/SSCI.2016.7850111
Description: Most research in image classification has focused on applications such as face, object, scene, and character recognition. This paper presents a comparative study between deep convolutional neural networks (CNNs) and bag of visual words (BOW) variants for recognizing animals. We developed two variants of the bag of visual words (BOW and HOG-BOW) and examined the use of gray and color information as well as different spatial pooling approaches. We combined the final feature vectors extracted from these BOW variants with a regularized L2 support vector machine (L2-SVM) to distinguish between classes within our datasets. We modified existing deep CNN architectures, AlexNet and GoogleNet, by reducing the number of neurons in each layer of the fully connected layers and last inception layer for both scratch and pre-trained versions. Finally, we compared the existing CNN methods, our modified CNN architectures, and the proposed BOW variants on our novel wild-animal dataset (Wild-Anim). The results show that the CNN methods significantly outperform the BOW techniques.
Citations: 41
Aggregation Type: Conference Proceeding
-------------------


Title: Comparing local descriptors and bags of visual words to deep convolutional neural networks for plant recognition
Cover Date: 2017-01-01
Cover Display Date: 2017
DOI: 10.5220/0006196204790486
Description: The use of machine learning and computer vision methods for recognizing different plants from images has attracted a lot of attention from the community. This paper aims at comparing local feature descriptors and bags of visual words with different classifiers to deep convolutional neural networks (CNNs) on three plant datasets: AgrilPlant, LeafSnap, and Folio. To achieve this, we study the use of both scratch and fine-tuned versions of the GoogleNet and the AlexNet architectures and compare them to a local feature descriptor with k-nearest neighbors and the bag of visual words with the histogram of oriented gradients combined with either support vector machines or multi-layer perceptrons. The results show that the deep CNN methods outperform the hand-crafted features. The CNN techniques can also learn well on a relatively small dataset, Folio.
Citations: 100
Aggregation Type: Conference Proceeding
-------------------


Title: Evaluating automatically parallelized versions of the support vector machine
Cover Date: 2016-05-01
Cover Display Date: 1 May 2016
DOI: 10.1002/cpe.3413
Description: The support vector machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in machine learning and has been successfully used in applications such as image classification, protein classification, and handwriting recognition. However, the computational complexity of the kernelized version of the algorithm grows quadratically with the number of training examples. To tackle this high computational complexity, we have developed a directive-based approach that converts a gradient-ascent based training algorithm for the CPU to an efficient graphics processing unit (GPU) implementation. We compare our GPU-based SVM training algorithm to the standard LibSVM CPU implementation, a highly optimized GPU-LibSVM implementation, as well as to a directive-based OpenACC implementation. The results on different handwritten digit classification datasets demonstrate an important speed-up for the current approach when compared to the CPU and OpenACC versions. Furthermore, our solution is almost as fast and sometimes even faster than the highly optimized CUBLAS-based GPU-LibSVM implementation, without sacrificing the algorithm's accuracy.
Citations: 10
Aggregation Type: Journal
-------------------


Title: Recognition of handwritten characters using local gradient feature descriptors
Cover Date: 2015-10-01
Cover Display Date: 1 October 2015
DOI: 10.1016/j.engappai.2015.07.017
Description: In this paper we propose to use local gradient feature descriptors, namely the scale invariant feature transform keypoint descriptor and the histogram of oriented gradients, for handwritten character recognition. The local gradient feature descriptors are used to extract feature vectors from the handwritten images, which are then presented to a machine learning algorithm to do the actual classification. As classifiers, the k-nearest neighbor and the support vector machine algorithms are used. We have evaluated these feature descriptors and classifiers on three different language scripts, namely Thai, Bangla, and Latin, consisting of both handwritten characters and digits. The results show that the local gradient feature descriptors significantly outperform directly using pixel intensities from the images. When the proposed feature descriptors are combined with the support vector machine, very high accuracies are obtained on the Thai handwritten datasets (character and digit), the Latin handwritten datasets (character and digit), and the Bangla handwritten digit dataset.
Citations: 81
Aggregation Type: Journal
-------------------


Title: Robust face recognition by computing distances from multiple histograms of oriented gradients
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.1109/SSCI.2015.39
Description: The Single Sample per Person Problem is a challenging problem for face recognition algorithms. Patch-based methods have obtained some promising results for this problem. In this paper, we propose a new face recognition algorithm that is based on a combination of different histograms of oriented gradients (HOG), which we call Multi-HOG. Each member of Multi-HOG is a HOG patch which belongs to a grid structure. To recognize faces, we create a vector of distances computed by comparing train and test face images. After this, a distance calculation method is employed to calculate the final distance value between a test and a reference image. We describe here two distance calculation methods: mean of minimum distances (MMD) and a multi-layer perceptron based distance (MLPD) method. To cope with alignment difficulties, we also propose another technique which finds the most similar regions in the two images being compared. We call it the most similar region selection algorithm (MSRS). The regions found by MSRS are given to the algorithms we proposed. Our results show that, while MMD and MLPD contribute to obtaining much higher accuracies than the use of a single histogram of oriented gradients, combining them with the most similar region selection algorithm results in state-of-the-art performance.
Citations: 27
Aggregation Type: Conference Proceeding
-------------------


Title: In-plane rotational alignment of faces by eye and eye-pair detection
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.5220/0005308303920399
Description: In face recognition, face rotation alignment is an important part of the recognition process. In this paper, we present a hierarchical detector system using eye and eye-pair detectors combined with a geometrical method for calculating the in-plane angle of a face image. Two feature extraction methods, the restricted Boltzmann machine and the histogram of oriented gradients, are compared to extract feature vectors from a sliding window. Then a support vector machine is used to accurately localize the eyes. After the eye coordinates are obtained through our eye detector, the in-plane angle is estimated by calculating the arc-tangent of the horizontal and vertical components of the distance between the left and right eye center points. By using this calculated in-plane angle, the face is subsequently rotationally aligned. We tested our approach on three different face datasets: IMM, Labeled Faces in the Wild (LFW) and FERET. Moreover, to assess the effect of rotational alignment on face recognition performance, we ran a face recognition method on rotationally aligned and non-aligned face images from the IMM dataset. The results show that our method calculates the in-plane rotation angle with high precision and this leads to a significant gain in face recognition performance.
Citations: 2
Aggregation Type: Conference Proceeding
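
The angle step named in this record's description can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the eye centers are already available as (x, y) pixel coordinates with y increasing downward, and the function name `in_plane_angle` and the sample coordinates are hypothetical.

```python
import math

def in_plane_angle(left_eye, right_eye):
    """Return the in-plane rotation angle in degrees from two eye centers.

    Each eye is an (x, y) pixel coordinate. The angle is the arc-tangent of
    the vertical over the horizontal component of the inter-eye distance.
    """
    dx = right_eye[0] - left_eye[0]  # horizontal component
    dy = right_eye[1] - left_eye[1]  # vertical component
    return math.degrees(math.atan2(dy, dx))

# Hypothetical eye centers: level eyes, then a face tilted so that the
# right eye sits 40 pixels lower than the left eye.
level = in_plane_angle((30.0, 50.0), (70.0, 50.0))
tilted = in_plane_angle((30.0, 40.0), (70.0, 80.0))
```

Rotating the image by the negative of the returned angle would bring the eyes back onto a horizontal line; `atan2` is used instead of a plain division so that vertically stacked eye centers do not cause a division by zero.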
-------------------


Title: Recognizing handwritten characters with local descriptors and bags of visual words
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.1007/978-3-319-23983-5_24
Description: In this paper we propose the use of several feature extraction methods, which have been shown before to perform well for object recognition, for recognizing handwritten characters. These methods are the histogram of oriented gradients (HOG), a bag of visual words using pixel intensity information (BOW), and a bag of visual words using extracted HOG features (HOG-BOW). These feature extraction algorithms are compared to other well-known techniques: principal component analysis, the discrete cosine transform, and the direct use of pixel intensities. The extracted features are given to three different types of support vector machines for classification, namely a linear SVM, an SVM with the RBF kernel, and a linear SVM using L2-regularization. We have evaluated the six different feature descriptors and three SVM classifiers on three different handwritten character datasets: Bangla, Odia and MNIST. The results show that the HOG-BOW, BOW and HOG methods significantly outperform the other methods. The HOG-BOW method performs best with the L2-regularized SVM and obtains very high recognition accuracies on all three datasets.
Citations: 5
Aggregation Type: Book Series
-------------------


Title: A* Path Planning for Line Segmentation of Handwritten Documents
Cover Date: 2014-12-09
Cover Display Date: 9 December 2014
DOI: 10.1109/ICFHR.2014.37
Description: This paper describes the use of a novel A* path-planning algorithm for performing line segmentation of handwritten documents. The novelty of the proposed approach lies in the use of a smart combination of simple soft cost functions that allows an artificial agent to compute paths separating the upper and lower text fields. The use of soft cost functions enables the agent to compute near-optimal separating paths even if the upper and lower text parts are overlapping in particular places. We have performed experiments on the Saint Gall and Monk line segmentation (MLS) datasets. The experimental results show that our proposed method performs very well on the Saint Gall dataset, and also demonstrate that our algorithm is able to cope well with the much more complicated MLS dataset.
Citations: 26
Aggregation Type: Conference Proceeding
-------------------


Title: A comparison of feature and pixel-based methods for recognizing handwritten bangla digits
Cover Date: 2013-12-11
Cover Display Date: 2013
DOI: 10.1109/ICDAR.2013.40
Description: We propose a novel handwritten character recognition method for isolated handwritten Bangla digits. A feature is introduced for such patterns, the contour angular technique. It is compared to other methods, such as the hotspot feature, the gray-level normalized character image and a basic low-resolution pixel-based method. One of the goals of this study is to explore performance differences between dedicated feature methods and the pixel-based methods. The four methods are compared with support vector machine (SVM) classifiers on the collection of handwritten Bangla digit images. The results show that the fast contour angular technique outperforms the other techniques when few training examples are used. The fast contour angular technique captures aspects of curvature of the handwritten image and results in much faster character classification than the gray pixel-based method. Still, this feature obtains recognition performance similar to that of the gray pixel-based method when a large training set is used. To investigate further whether the different feature methods represent complementary aspects of shape, the effect of majority voting is explored. The results indicate that the majority voting method achieves the best recognition performance on this dataset.
Citations: 31
Aggregation Type: Conference Proceeding
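
The majority voting mentioned in this record's description can be illustrated with a short sketch. The description does not say how ties are resolved or how the votes are weighted, so the unweighted vote and the first-seen tie-break below are assumptions; the function name `majority_vote` and the sample predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one predicted label per classifier. When several
    labels share the top count, the one that appears first in the list wins
    (an assumed tie-break, not taken from the paper).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Hypothetical digit predictions from four classifiers, one per feature method.
winner = majority_vote([3, 8, 3, 5])
tie_break = majority_vote([7, 1, 1, 7])
```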
-------------------


Title: Handwritten character classification using the hotspot feature extraction technique
Cover Date: 2012-06-18
Cover Display Date: 2012
DOI: N/A
Description: Feature extraction techniques can be important in character recognition, because they can improve recognition performance compared to featureless or pixel-based approaches. This study investigates a novel feature extraction technique, the hotspot technique, for representing handwritten characters and digits. In the hotspot technique, the distances between each hotspot and the closest black pixel in each direction are used as the representation of a character. The technique has two parameters: the number of hotspots and the number of chain code directions. It is applied to three data sets: Thai handwritten characters (65 classes), Bangla numerals (10 classes), and MNIST (10 classes). The data sets are then classified by the k-nearest neighbors algorithm, using the Euclidean distance to compute distances between data points. The classification rates obtained with the hotspot, mark direction, and direction of chain code techniques are compared. The results show that the hotspot technique achieves the highest average classification rates.
Citations: 13
Aggregation Type: Conference Proceeding
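
A rough sketch of the hotspot idea in this record's description is given below. The description does not specify the hotspot layout, how distance is measured, or what value is used when no black pixel is found, so all of these are assumptions here: distance is counted in pixel steps along the direction, and a missing hit is recorded as `max_dist`. The function name `hotspot_features` and the toy image are hypothetical.

```python
def hotspot_features(image, hotspots, directions, max_dist):
    """Return one distance per (hotspot, direction) pair.

    image      : list of rows, where 1 marks a black (ink) pixel
    hotspots   : list of (row, col) positions
    directions : list of (d_row, d_col) unit steps, e.g. chain code directions
    max_dist   : value recorded when no black pixel is reached (assumption)
    """
    height, width = len(image), len(image[0])
    features = []
    for hot_row, hot_col in hotspots:
        for d_row, d_col in directions:
            dist = max_dist
            row, col, step = hot_row, hot_col, 0
            # Walk from the hotspot until ink is hit or the image edge is left.
            while 0 <= row < height and 0 <= col < width and step < max_dist:
                if image[row][col] == 1:
                    dist = step
                    break
                row, col, step = row + d_row, col + d_col, step + 1
            features.append(dist)
    return features

# Toy 5x5 image with a vertical stroke in column 3 and a single hotspot.
stroke = [[0, 0, 0, 1, 0] for _ in range(5)]
four_directions = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # right, left, up, down
features = hotspot_features(stroke, [(2, 1)], four_directions, max_dist=5)
```

The resulting vector could then be passed to a k-nearest neighbors classifier with the Euclidean distance, as the description states.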
-------------------


Title: Optimization of line segmentation techniques for Thai handwritten documents
Cover Date: 2009-12-28
Cover Display Date: 2009
DOI: 10.1109/SNLP.2009.5340921
Description: The purpose of this research is to study the optimization of line segmentation techniques for Thai handwritten documents. Only single-column Thai documents were considered. Two new techniques are proposed: comparing Thai characters, and sorting and distinguishing. In the experiments, these two techniques were used together with established techniques based on the projection profile (the horizontal projection profile and the stripe technique). The results suggest that the best technique for single-column Thai documents is the new sorting and distinguishing technique, which achieves an accuracy of 97.11%.
Citations: 7
Aggregation Type: Conference Proceeding
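
The horizontal projection profile that this record's description uses as a basis can be sketched as follows. This shows only the generic baseline, assuming a binarized page where 1 marks ink; the two new techniques proposed in the paper are not described in enough detail to reproduce. The function name `segment_lines` and the toy page are hypothetical.

```python
def segment_lines(image):
    """Return (first_row, last_row) pairs for each run of rows containing ink.

    image is a list of rows of a binarized page, where 1 marks an ink pixel.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    lines, start = [], None
    for row_index, ink_count in enumerate(profile):
        if ink_count > 0 and start is None:
            start = row_index                      # a text line begins
        elif ink_count == 0 and start is not None:
            lines.append((start, row_index - 1))   # a text line ends
            start = None
    if start is not None:                          # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines

# Toy page: ink in rows 1-2 and in row 4, separated by an empty row.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
text_lines = segment_lines(page)
```

A plain profile like this one splits lines only where a fully empty row exists, which is one reason handwritten text with overlapping lines calls for more elaborate techniques.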
-------------------


Title: Image segmentation of historical handwriting from palm leaf manuscripts
Cover Date: 2008-10-02
Cover Display Date: 2008
DOI: 10.1007/978-0-387-87685-6_23
Description: Palm leaf manuscripts were one of the earliest forms of written media and were used in Southeast Asia to store early written knowledge about subjects such as medicine, Buddhist doctrine and astrology. Historical handwritten palm leaf manuscripts are therefore important for people who want to study historical documents, because much can be learned from them. This paper presents an image segmentation method for historical handwriting from palm leaf manuscripts. The process consists of three steps: 1) background elimination, which separates text from background using Otsu's algorithm; 2) line segmentation; and 3) character segmentation using the image histogram. The end result is an image of each character. The results of this research may be applied to optical character recognition (OCR) in the future.
Citations: 17
Aggregation Type: Book Series
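
The first step in this record's description, background elimination with Otsu's algorithm, can be sketched in plain Python. This is the textbook form of Otsu's method, not the paper's code; it assumes 8-bit grayscale values and that the text is darker than the leaf background. The function name `otsu_threshold` and the sample pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    pixels is a flat list of integer gray values in [0, levels). Values <= t
    form one class and values > t form the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += hist[t]          # pixels in the dark class
        sum0 += t * hist[t]
        if weight0 == 0:
            continue
        weight1 = total - weight0   # pixels in the bright class
        if weight1 == 0:
            break
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        var_between = weight0 * weight1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Hypothetical gray values: three dark text pixels, three bright leaf pixels.
gray = [10, 12, 11, 200, 205, 210]
threshold = otsu_threshold(gray)
is_text = [value <= threshold for value in gray]
```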
-------------------


Title: Comparison of image analysis for Thai handwritten character recognition
Cover Date: 2006-12-01
Cover Display Date: 2006
DOI: N/A
Description: This paper proposes methods for Thai handwritten character recognition. The methods are Robust C-Prototype and Back-Propagation Neural Network. The objective of the experiments is the recognition of Thai handwritten characters. Both methods achieve an accuracy of more than 85%.
Citations: 2
Aggregation Type: Book Series
-------------------


Title: Comparison of image analysis for thai handwritten character recognition
Cover Date: 2006-12-01
Cover Display Date: 2006
DOI: N/A
Description: This paper proposes methods for Thai handwritten character recognition. The methods are Robust C-Prototype and Back-Propagation Neural Network. The objective of the experiments is the recognition of Thai handwritten characters. Both methods achieve an accuracy of more than 85%.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------