Title: UNIFIED PREDICTIVE MODEL FOR PREDICTING THE CLOSING PRICE OF VARIOUS CRYPTOCURRENCIES
Cover Date: 2025-06-01
Cover Display Date: June 2025
DOI: 10.24507/icicelb.16.06.589
Description: This study introduces a unified predictive model for forecasting cryptocurrency closing prices, utilizing advanced machine learning techniques such as Support Vector Regression (SVR), Random Forest, and Long Short-Term Memory (LSTM) networks. By integrating multiple cryptocurrency datasets through feature-level data fusion via concatenation, the model effectively captures the complex, nonlinear, and dynamic relationships characteristic of cryptocurrency markets. The experimental results reveal that the proposed model outperforms baseline models developed for individual cryptocurrencies, particularly with SVR. This enhanced performance is due to SVR’s ability to manage high-dimensional data and model intricate nonlinear patterns. While Random Forest and LSTM also demonstrate strong predictive capabilities, their effectiveness is more dependent on specific data characteristics and configurations. The integration of diverse data sources and the application of Min-Max normalization play a crucial role in enhancing prediction accuracy and model robustness. This approach allows the model to account for broader market dynamics, providing valuable insights for short-term and medium-term trading strategies and supporting informed decision-making for investors, traders, and analysts.
Citations: 0
Aggregation Type: Journal
-------------------
Title: ENHANCING SENTIMENT CLASSIFICATION: A COMPARATIVE ANALYSIS OF SUPERVISED AND UNSUPERVISED METHODS FOR IMPROVING TRAINING DATA QUALITY
Cover Date: 2025-05-01
Cover Display Date: May 2025
DOI: 10.24507/icicelb.16.05.471
Description: This study evaluates the effectiveness of supervised and unsupervised methods in enhancing data quality for binary sentiment classification. Two datasets of hotel reviews from TripAdvisor were utilized: one for training polarity correction models and the other containing noisy labels for experimental evaluation. Supervised methods, including SVM with a linear kernel, Random Forest (RF), and Convolutional Neural Network (CNN), consistently outperformed unsupervised methods such as Standard K-means, K-means++, and Spherical K-means. Following the development of sentiment classifier models using the improved training set, SVM demonstrated the highest performance, achieving an accuracy and F1 score of 0.85, followed by RF and CNN. Among the unsupervised approaches, K-means++ yielded the best results, with an accuracy of 0.75 and an F1 score of 0.74. These findings highlight the superiority of supervised learning in sentiment classification tasks and underscore the critical importance of training set quality in enhancing model performance.
Citations: 0
Aggregation Type: Journal
-------------------
Title: MULTICLASS CLASSIFICATION APPROACH FOR DETECTING SOFTWARE BUG SEVERITY LEVEL FROM BUG REPORTS
Cover Date: 2025-05-01
Cover Display Date: May 2025
DOI: 10.24507/icicelb.16.05.567
Description: This study focuses on developing multiclass classifiers to predict the severity levels of bug reports using three machine learning algorithms: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) with an RBF kernel. The research utilizes three datasets from the Mozilla bug tracking system – Core, Firefox, and Thunderbird – categorizing bug severity into five levels: blocker, critical, major, minor, and low. To address class imbalance and enhance model performance, a domain expert-based data augmentation method was applied, generating synthetic summaries from bug descriptions using cosine similarity. The augmented datasets, combined with undersampling techniques, ensure balanced class distributions, improving classifier robustness. The study leverages unigram and CamelCase features to build and evaluate the classifiers. Performance metrics, including accuracy, F1 score, and Matthews Correlation Coefficient (MCC), were used to assess model efficacy. The results demonstrate that LR outperforms RF and SVM, offering superior accuracy and interpretability, particularly for high-dimensional text data. LR’s efficiency, reduced overfitting risk, and effective handling of linear relationships make it well-suited for bug severity classification. This research provides a robust framework for improving bug triage processes, enhancing the prioritization and resolution of critical software issues.
Citations: 0
Aggregation Type: Journal
-------------------
Title: A Comparative Analysis of Machine Learning Models for Domain Adaptation in Multiclass Sentiment Classification
Cover Date: 2025-04-01
Cover Display Date: April 2025
DOI: 10.37936/ecti-cit.2025192.258824
Description: This study presents a comparative evaluation of machine learning models for domain adaptation in multiclass sentiment classification. While sentiment analysis aims to categorize opinions as positive, neutral, or negative, adapting models across domains remains a significant challenge due to differences in vocabulary, writing style, and sentiment expression. Models trained on a specific domain often fail to generalize effectively to others. To solve this problem, we evaluate how well six models (logistic regression, support vector machine (SVM) with a linear kernel, random forest, convolutional neural network (CNN), long short-term memory (LSTM), and BERT) perform on sentiment data from books, beauty & personal care, and automotive categories. The evaluation uses Amazon review data and measures performance via accuracy, F1 score, and Area Under the ROC Curve (AUC). Results indicate that BERT consistently outperforms all other models due to its attention-based transformer architecture, which captures nuanced contextual information across diverse domains. CNN and LSTM models also perform well, particularly in domain-specific settings, with CNN excelling in extracting local features and LSTM in modeling sequential relationships. Traditional models, such as logistic regression and SVM, show limitations in generalizability, while random forest demonstrates stable yet moderate performance. These findings highlight the strengths and trade-offs of each approach for effective cross-domain sentiment classification.
Citations: 0
Aggregation Type: Journal
-------------------
Title: ENSEMBLE CLUSTERING METHOD FOR ASSEMBLING OF THAI DECIDED CIVIL CASES INTO SPECIFIC CLUSTERS
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicel.19.03.271
Description: Civil cases often pertain to legal disputes between individuals or organizations. Following a judgment, civil cases are referred to as “decided cases” and the associated documents can be utilized for future legal determinations. One alternative method for managing these decided cases and making it easier to identify relevant decided cases that meet the user’s needs is to group relevant decided cases together. As a result, the purpose of this study was to offer an ensemble clustering method for finding and identifying the most relevant legal cases from a given collection that satisfy the needs of users. In our ensemble clustering, we employ well-known clustering methods such as k-means++, spherical k-means, and DBSCAN. Upon assessing the clustering quality measure (purity score), accuracy, and F1 score, the proposed method yielded good results. Furthermore, when comparing it to the baseline, the proposed method exhibits enhancements in the purity score, accuracy, and F1 score by 6.95%, 6.67%, and 6.95%, respectively.
Citations: 0
Aggregation Type: Journal
-------------------
Title: DEVELOPING OF MULTICLASS CLASSIFIER MODEL USING ENSEMBLE APPROACH FOR BUG REPORTS ANALYSIS
Cover Date: 2025-02-01
Cover Display Date: February 2025
DOI: 10.24507/icicel.19.02.149
Description: Prior research mostly concentrated on identifying actual-bug reports using binary classification. The information contained in those actual-bug reports can be utilized for software fixing purposes. Additional pertinent information is necessary to enhance and uphold software quality. This information from bug reports is referred to as “enhancement”. Conversely, bug reports that are relevant to the elimination, restructuring, substitution, activation, or deactivation of software functions, as well as other engineering tasks, are classified as “task”. Hence, bug report classification should encompass not only binary classification but also multiclass classification. This study therefore focused on the issue of multiclass classification for bug reports. The proposed approach attempted to categorize bug reports into three distinct classes: actual-bug, enhancement, and task. This study developed a multiclass classifier model using an ensemble method. The obtained model consists of five classifier models: Support Vector Machine (SVM) with a linear kernel, SVM with an RBF kernel, Logistic Regression, Multinomial Naïve Bayes, and eXtreme Gradient Boosting. The bug report features consist of unigrams and CamelCase words, whereas the term weighting algorithm employed is term frequency-inverse gravity moment (tf-igm). This study utilized two bug report datasets, specifically from Firefox and Thunderbird, which were acquired using the Bugzilla system. The proposed model was also compared to two prior models considered as the baselines. In comparison to the baseline models, the accuracy, F1, and AUC scores of the proposed model were marginally higher.
Citations: 0
Aggregation Type: Journal
-------------------
Title: Leveraging PubMed Abstracts for Identifying COVID-19 Treatment Modalities
Cover Date: 2025-01-01
Cover Display Date: 2025
DOI: 10.1007/978-3-031-90295-6_4
Description: Extracting relevant treatment strategies for COVID-19 from biomedical literature is crucial for efficient knowledge discovery. This study proposes a text mining approach to identify key treatment modalities from PubMed abstracts using one-cluster clustering and term frequency (tf) representation. The methodology involves sentence segmentation, text preprocessing, feature selection using Chi-Square (χ2), and constraint-based k-means clustering. The experimental results demonstrate that tf representation outperforms binary representation, achieving a recall score of 0.795 vs. 0.750, indicating improved identification of treatment-related insights. By employing constraint-based k-means clustering with k = 1, the model effectively consolidates all relevant information into a single cluster, preserving essential treatment details. The integration of Chi-Square feature selection further refines the extracted data by identifying the most significant terms. This approach provides a systematic and scalable method for biomedical text analysis.
Citations: 0
Aggregation Type: Book Series
-------------------
Title: Clustering-Based Approach for Identifying Key Information to Develop Short Video Prototypes in Science Communication for Aging Populations
Cover Date: 2025-01-01
Cover Display Date: 2025
DOI: 10.1007/978-3-031-90295-6_12
Description: This study proposes an innovative approach to designing short videos for science communication tailored to aging populations. By leveraging web scraping and text mining techniques, key elements such as content style, font selection, visuals, background music, color schemes, and video length are extracted and clustered using constraint-based k-means clustering. Domain experts contribute predefined centroids, enhancing the clustering process's relevance and accuracy. The methodology involves text pre-processing, vectorization using Term Frequency, and evaluation via recall analysis, achieving values between 0.71 and 0.75. Clusters focusing on visuals, color schemes, and video length show the highest recall, reflecting their well-defined nature, while broader topics like content style perform slightly lower. The results highlight the efficiency and scalability of combining automated methods with expert input, providing a robust framework for creating engaging and accessible science communication content. Future work includes integrating semantic embedding techniques to improve clustering outcomes and address broader categories more effectively.
Citations: 0
Aggregation Type: Book Series
-------------------
Title: Mapping sugarcane plantations in Northeast Thailand using multi-temporal data from multi-sensors and machine-learning algorithms
Cover Date: 2025-01-01
Cover Display Date: 2025
DOI: 10.1080/20964471.2025.2463730
Description: The effectiveness of machine learning algorithms and the limited reference data introduce uncertainty for sugarcane classification. To address these problems, our study classified sugarcane plantations at the field scale using multi-temporal and multi-sensor data together with a large number of ground truth datasets (>13,000 points) and compared the efficacy of ensemble and kernel classifier methods over 3 years (2021, 2022, and 2023) across Northeast Thailand. In the first step, land cover was generated from a random forest classifier, demonstrating excellent results for all years with an OA higher than 95%. In the second step, the discretization of sugarcane from non-sugarcane classes in the agricultural category was conducted using four efficient machine learning algorithms (decision tree (DT), random forest (RF), support vector machine (SVM), and one-class SVM). The RF classifier gave the optimal results with over 90% accuracy. Our results aligned with provincial statistics from the Office of the Cane and Sugar Board, thereby highlighting the efficacy and reliability of the RF method in mapping sugarcane in small fields and cloudy regions. A temporal evolution analysis of sugarcane cultivation spanning the preceding 3 years revealed a significant increase in the productive area. Our findings provide crucial information for sustainable management practices.
Citations: 2
Aggregation Type: Journal
-------------------
Title: Metadata-Driven Innovation in Smart Offices: A Study on the Impact of Standards on Digital Twins and Indoor Positioning Systems
Cover Date: 2024-05-15
Cover Display Date: 15 May 2024
DOI: 10.1145/3674558.3674594
Description: This study investigates the refinement of metadata from various sources for optimal integration with digital twins, aiming to enhance its applicability in smart office environments through the Internet of Things (IoT) and digital twin technologies. By developing a systematic framework for hypothesis testing, the research evaluates the metadata's performance in real-time operational dynamics, specifically in indoor tracking of wireless devices to assess data transmission accuracy. The analysis, supported by a performance evaluation with five reference sensors, confirms the metadata's effectiveness in ensuring rapid and precise information retrieval. These findings highlight the potential of customized metadata to improve the efficiency and accuracy of digital twin applications in smart offices.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Data-Driven Design for Educational Game: Leveraging Insights from Educational Game Reviews by the k-means Clustering
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/InCIT63192.2024.10810523
Description: This study proposes a data-driven approach to educational game design, utilizing user feedback and reviews to identify key factors contributing to the success or failure of educational games. By analyzing reviews of popular educational games from Common Sense Media, we employed k-means clustering to group similar reviews and extract meaningful insights. The clustering process, validated through Silhouette Scores and Davies-Bouldin Index metrics, revealed distinct themes in user feedback, highlighting areas for potential enhancement in educational game design. Our results demonstrate that clustering reviews can effectively differentiate between positive and mixed feedback, providing actionable guidance for game developers. This study underscores the value of incorporating systematic user feedback analysis into the educational game design process, offering a pathway to creating more engaging and educationally effective games. The findings contribute to a more structured and evidence-based approach to educational game development, ultimately enhancing learning outcomes and user satisfaction.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: A HYBRID METHOD OF ASPECT-BASED SENTIMENT ANALYSIS FOR HOTEL REVIEWS
Cover Date: 2024-01-01
Cover Display Date: January 2024
DOI: 10.24507/icicel.18.01.59
Description: The purpose of this study was to introduce a hybrid method of aspect-based sentiment analysis for hotel reviews. Hotel staff attentiveness, hotel cleanliness, value for money, and hotel location are all highly regarded hotel aspects. The proposed method is made up of two major components. BM25 is used in the first component to group the review sentences into the most relevant hotel aspect cluster. Word2Vec's skip-gram was utilized to generate the keywords relevant to each hotel aspect, which were then used as queries to organize review sentences into suitable hotel aspect clusters. Finally, hotel review sentences in each cluster are assigned a sentiment polarity as positive or negative using the sentiment polarity analyzer, which is an ensemble model comprised of five predictive models developed by C4.5 decision tree, Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) with linear kernel, SVM with RBF kernel, and Logistic Regression (LR). After evaluating the proposed hybrid method via recall, precision, F1, and accuracy, our proposed method yielded satisfactory outcomes at 0.820, 0.805, 0.810, and 0.815, respectively. Furthermore, we also compared our hybrid method to a baseline utilizing the same training and test sets. The recall and precision scores of our proposed method were marginally higher than the baseline, with enhanced recall and precision scores at 4.76% and 4.88%, respectively.
Citations: 1
Aggregation Type: Journal
-------------------
Title: Analyzing Machine Learning Techniques for Air Passenger Numbers Forecasting
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/JCSSE61278.2024.10613714
Description: The imperative of forecasting air passenger numbers is underscored by its utility in strategic planning and operational optimization within the aviation sector. As a pivotal technique for predicting continuous data, machine learning methodologies offer a sophisticated avenue for enhancing forecast accuracy. This study embarks on a detailed examination of machine learning approaches to predict air passenger volumes in Thailand, employing a suite of models: Random Forest, XGBoost, Gradient Boosting, LightGBM and CatBoost. Comparative analysis reveals the Gradient Boosting model as the preeminent performer, demonstrating superior forecasting capabilities with an RMSE of 8,081 and a MAPE of 2.57%. These findings underscore the potential of machine learning techniques in refining the precision of air passenger number forecasts, offering significant implications for the planning and management of aviation operations.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Advancements in Healthcare Monitoring: Implementing Hybrid Indoor Positioning Systems for Alzheimer's and Dementia Care
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.18178/wcse.2024.06.028
Description: This research explores the efficacy of integrating Wi-Fi and Bluetooth Low Energy (BLE) beacons in indoor positioning systems (IPS) to enhance care for Alzheimer's and dementia patients in healthcare settings. The study assesses the accuracy of these technologies in measuring indoor positions, with BLE beacons demonstrating superior precision of approximately ±1m, making them highly suitable for detailed patient tracking. Wi-Fi, while benefiting from existing infrastructure, displayed larger positioning errors ranging from ±10m to ±15m due to environmental interferences. The statistical analysis confirms that a hybrid approach, utilizing both Wi-Fi and BLE beacons, optimizes the balance between extensive coverage and high positional accuracy. This system significantly improves the monitoring and management of patient movements, thereby increasing safety and enhancing care delivery. The findings advocate for further development of IPS technologies, incorporating advanced algorithms and machine learning, to refine accuracy and reliability, aiming to substantially improve patient outcomes in healthcare environments.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: The Development of a Unified Predictive Model to Predict Closed Price for a Variety of Cryptocurrencies
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/RI2C64012.2024.10784451
Description: Predicting the closing price of a cryptocurrency typically involves utilizing an individual predictive model for each particular cryptocurrency. However, a unified model has the potential to predict the closing price of many cryptocurrencies, providing convenience and facilitating comparative research. Thus, this study utilized feature-based data fusion to integrate historical data of cryptocurrencies from the same time period using a data fusion technique called the average method. The predictive models were developed using two machine learning algorithms, i.e. Support Vector Regression (SVR) with RBF kernel and Random Forest (RF). This study used three types of cryptocurrencies: Bitcoin (BTC), Ethereum (ETH), and Litecoin (LTC). After evaluating the effectiveness of several predictive models utilizing MAE, RMSE, and MAPE for short-term (5-day) predictions, it was found that the unified predictive model yielded similar outcomes to the models specifically developed for each particular cryptocurrency. This method offers the ability to help investors in identifying general price movements among multiple cryptocurrencies and decreasing the amount of time needed for observations. However, the unified predictive models developed by the random forest algorithm surpass other models in successfully predicting the short-term (5-day) closing prices of cryptocurrencies.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Logistic Regression-based Sentiment Classification Approach for Identifying Undergraduate Student Sentiments in a Course Studied
Cover Date: 2024-01-01
Cover Display Date: 2024
DOI: 10.1109/RI2C64012.2024.10784409
Description: This study aimed to utilize sentiment classification to ascertain the sentiment of undergraduate students towards the course they have studied. This case study specifically examines the character design course given by the Department of Creative Media, Faculty of Informatics, Mahasarakham University. Unfortunately, our data collection exhibits an imbalance between the positive class and the negative class, with a greater likelihood for the data to belong to the positive class. This issue has the potential to result in sentiment classifiers that generate subpar outcomes. Consequently, this issue was also addressed in this study. To develop the binary-based sentiment classifiers, logistic regression methods were employed, specifically traditional logistic regression and logistic regression with class weights. The term weighting scheme is tf-idf. The results were determined to be satisfactory after being evaluated using the F1 score and AUC. However, it was found that the sentiment classifiers generated by LR with class weights showed better results in terms of average F1 score and AUC compared to the sentiment classifiers developed using traditional LR. The overall improvements of F1 score and AUC were 14.51% and 13.50%, respectively.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: COMPARATIVE STUDY OF SUPERVISED MACHINE LEARNING MODELS FOR MULTICLASSIFICATION IN BUG REPORT DOMAIN
Cover Date: 2023-12-01
Cover Display Date: December 2023
DOI: 10.24507/icicel.17.12.1365
Description: The challenge in this study was to multiclassify bug reports, and the proposed method attempted to assign bug reports into three categories: real-bug, enhancement, and task. The dataset that is used in this study was obtained from the Bugzilla system and was connected to the open-source Firefox browser. Our approach began with bug report pre-processing. It was driven by replacing contractions, tokenization, spelling correction, punctuation and stop-word removal, CamelCase processing, and stemming and lowercase conversion, in that order. We compared two features of bug reports (i.e., unigram words only and unigram together with CamelCase words). The pre-processed bug reports were afterwards formatted in a vector space model format, with each term weighted using a term weighting scheme. In addition, term frequency (tf) and term frequency-inverse gravity moment (tf-igm), used to assign a weight to each term, were examined in this research. Following that, the vector of bug reports was utilized to build the multi-classifier models. Logistic Regression, Multinomial Naïve Bayes, eXtreme Gradient Boosting, Linear Support Vector Machines, Random Forest, and Neural Networks were all evaluated. Finally, it was determined that the Linear Support Vector Machine classifier was the most suitable model for our dataset.
Citations: 1
Aggregation Type: Journal
-------------------
Title: Enhancing Indoor Positioning Accuracy: A Comprehensive Study on Euclidean Distance, Trilateration, Wi-Fi RTT and FTM Protocol Integration
Cover Date: 2023-11-25
Cover Display Date: 25 November 2023
DOI: 10.1145/3638209.3638235
Description: Indoor positioning is a critical technology with a broad spectrum of applications spanning from navigation systems in smart buildings to asset tracking in industrial environments. This research paper explores the effectiveness of four prominent techniques in indoor positioning: Euclidean Distance, Trilateration, Wi-Fi Round Trip Time (RTT), and the Fast Time Measurement (FTM) Protocol. In this study, we conducted extensive experiments and analysis to evaluate the accuracy and performance of these methods within diverse indoor settings. Our findings reveal that Euclidean Distance, when coupled with fingerprinting techniques, achieved an average positioning accuracy of ±1.5 meters in a real-world indoor environment. Trilateration, leveraging signals from strategically placed beacons, demonstrated even greater precision with an average accuracy of ±0.5 meters. Moreover, Wi-Fi RTT emerged as a promising approach, delivering an accuracy of ±0.3 meters in test scenarios. Furthermore, statistical analysis revealed that these techniques perform consistently across different indoor environments, regardless of factors such as signal obstructions and variations in signal strength. These results underscore the versatility and reliability of these indoor positioning methods, making them viable options for a wide range of applications. In conclusion, this research not only provides valuable insights into the capabilities of Euclidean Distance, Trilateration, Wi-Fi RTT, and FTM Protocol but also underscores their potential to transform indoor positioning accuracy. These findings pave the way for improved indoor navigation, asset tracking, and location-based services, with significant implications for industries such as logistics, healthcare, and smart building management.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: Edge Computing Based on Raspberry PI for People Counting in Smart Office
Cover Date: 2023-07-27
Cover Display Date: 27 July 2023
DOI: 10.3233/ATDE230064
Description: This research presents a technique for counting the number of people walking in and out of the door of a smart office room using an ordinary webcam. Because the camera is not high resolution, the picture is not clear and sharp, so the image color must be adjusted before the other processing steps. The designed system uses OpenCV and a Raspberry Pi for edge computing, comparing images from a front view and a top view. The front view was 58.44% accurate for entering the room and 30.15% accurate for leaving the room. The top view was clearly more accurate, at 87.82% for walking into the room and 80.75% (out of 130 tests) for walking out of the room. This system can also be used for counting people in other situations, such as counting people in factories, workers on a job site, and visitors at trade fairs.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Automatically Correcting Noisy Labels for Improving Quality of Training Set in Domain-specific Sentiment Classification
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.55003/cast.2022.02.23.006
Description: Classification model performance can be degraded by label noise in the training set. The sentiment classification domain also struggles with this issue, whereby customer reviews can be mislabeled. Some customers give a rating score for a product or service that is inconsistent with the review content. If business owners are only interested in the overall rating picture that includes mislabeling, this can lead to erroneous business decisions. Therefore, this issue became the main challenge of this study. If we assume that customer reviews with noisy labels in the training data are validated and corrected before the learning process, then the training set can generate a predictive model that returns a better result for the sentiment analysis or classification process. Therefore, we proposed a mechanism, called polarity label analyzer, to improve the quality of a training set with noisy labels before the learning process. The proposed polarity label analyzer was used to assign the polarity class of each sentence in a customer review, and then polarity class of that customer review was concluded by voting. In our experiment, datasets were downloaded from TripAdvisor and two linguistic experts helped to assign the correct labels of customer reviews as the ground truth. Sentiment classifiers were developed using the k-NN, Logistic Regression, XGBoost, Linear SVM and CNN algorithms. After comparing the results of the sentiment classifiers without training set improvement and the results with training set improvement, our proposed method improved the average scores of F1 and accuracy by 20.59%.
Citations: 2
Aggregation Type: Journal
-------------------
Title: Machine Learning-Based Methods for Identifying Bug Severity Level from Bug Reports
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1007/978-3-031-30474-3_17
Description: Bug reports constitute an important source of information that may be utilized to remedy bugs. One aspect of bug report analysis is determining the severity level of the reported problem. Nevertheless, manually assigning severity to a large number of bug reports is time-consuming, and human analysis of the severity of software bugs may be subject to bias. Consequently, automated analysis is required for detecting the severity of software defects concealed inside bug reports. Even though there have been several attempts to tackle the problem and a number of machine learning and text mining approaches have been applied, the performance of the solutions may vary depending on the datasets employed. The objective of our research is therefore to evaluate the efficacy of several machine learning approaches (i.e. Logistic Regression, Random Forest, Support Vector Machine, and Long Short-Term Memory) for severity analysis on multiple bug report datasets (i.e. Core, Firefox, Thunderbird, and Bugzilla) from the Mozilla bug tracking system. It should be emphasized that our research is based on a binary classification scheme in which bug reports are classified into two classes: severe class and non-severe class. The experimental results revealed that the LR algorithm performed well in analyzing the severity of bug reports in the Core, Firefox, and Bugzilla datasets, whereas the LSTM approach performed well for the Thunderbird dataset.
Citations: 4
Aggregation Type: Book Series
-------------------
Title: A Hybrid Approach for Aspect-based Sentiment Analysis: A Case Study of Hotel Reviews
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.55003/cast.2022.02.23.008
Description: This study presents a method of aspect-based sentiment analysis for customer reviews related to hotels. The considered hotel aspects are staff attentiveness, room cleanliness, value for money and convenience of location. The proposed method consists of two main components. The first component is used to assemble relevant sentences for each hotel aspect into relevant clusters of hotel aspects using BM25. We developed a corpus of keywords called the Keywords of Hotel Aspect (KoHA) Corpus, and the keywords of each aspect were used as queries to assemble relevant sentences of each hotel aspect into relevant clusters. Finally, customer review sentences in each cluster were classified into positive and negative classes using sentiment classifiers. Two algorithms, Support Vector Machines (SVM) with a linear and a RBF kernel, and Convolutional Neural Network (CNN) were applied to develop the sentiment classifier models. The model based on SVM with a linear kernel returned better results than other models with an AUC score of 0.87. Therefore, this model was chosen for the sentiment classification stage. The proposed method was evaluated using recall, precision and F1 with satisfactory results at 0.85, 0.87 and 0.86, respectively. Our proposed method provided an overview of customer feelings based on score, and also provided reasons why customers liked or disliked each aspect of the hotel. The best model from the proposed method was used to compare with a state-of-the-art model. The results show that our method increased recall, precision, and F1 scores by 2.44%, 2.50% and 1.84%, respectively.
Citations: 9
Aggregation Type: Journal
-------------------
Title: A Method of Identifying Treatment Modalities Related to COVID-19 from PubMed Abstracts
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/InCIT60207.2023.10413055
Description: In the past three to four years, COVID-19, a highly contagious disease caused by a novel coronavirus, has resulted in a significant number of fatalities. Numerous interventions have been carried out in an effort to preserve human life. Despite the current decline in the COVID-19 outbreak, the virus responsible for the disease has undergone a significant mutation, resulting in modifications to treatment protocols. As viruses endure mutations, disease severity tends to increase. Therefore, prior expertise treating patients is beneficial for diagnosing the disease and determining an effective treatment plan. This is the objective of this study, which proposes a text mining approach for identifying relevant information regarding COVID-19 treatment modalities in clinical trials. Constraint-based k-means clustering is the principal algorithm utilized in the proposed method. The datasets utilized in this study are PubMed abstracts related to COVID-19 in clinical trials. Three medical professionals will provide a ground truth for the dataset used in the study. Upon conducting an evaluation of the outcomes using the recall score, it is possible to conclude that the proposed methodology produced results that are deemed satisfactory.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Compare People Counting Accuracy with OpenCV on Raspberry Pi and Infrared Sensor
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/ICIKM59709.2023.00022
Description: This paper presents the results of a comparison of techniques for counting the number of people in a room in real time for use in smart office systems. The first technique counts people through a top-view webcam, with the image processed by OpenCV on the Raspberry Pi; the second counts people with an infrared sensor attached to the door. The test results revealed that the top-view camera method had the highest accuracy at 84.28%. However, this value was obtained from a test on a PC and will vary according to the performance of the device used. When run on the Raspberry Pi, the accuracy fell to only 75.63%. Hence, in actual use, an infrared sensor would be more appropriate, since it counts people at the door with an accuracy of 79.81%.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: A Comparative Study of Multi-class Sentiment Classification Models for Hotel Customer Reviews
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/RI2C60382.2023.10355942
Description: This research provides a comparative analysis of multi-class sentiment classification models for hotel customer reviews. This study utilized a dataset of hotel reviews downloaded from the TripAdvisor website. These hotel reviews were composed in English and were based on a five-star rating scale. The rating of each hotel reflected its class. Transformer-based sentiment classification algorithms (such as BERT) were compared with traditional sentiment classification algorithms. The traditional algorithms in this study comprised machine learning (e.g. Multinomial Naïve Bayes, Random Forest, and Support Vector Machines with a linear kernel function) and deep learning (e.g. Convolutional Neural Networks). After evaluation via recall, precision, F1, accuracy, and AUC, the BERT model outperformed the other models, namely Multinomial Naïve Bayes, Random Forest, Support Vector Machines, and Convolutional Neural Networks.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Machine Learning-based Multiclass Classification Methods for Sentiment Analysis
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/InCIT60207.2023.10413035
Description: Sentiment analysis, also known as opinion mining, is the process of identifying the sentiment or emotion conveyed in a textual review. This requires categorizing the expressed opinions into several sentiment classes, namely positive, negative, or neutral. Typically, machine learning algorithms are employed to construct a sentiment classifier, which is subsequently utilized to automatically assign appropriate sentiment to individual textual reviews. Numerous machine learning methods have been utilized for these purposes. Determining the most suitable algorithm for sentiment analysis is a challenge. One potential methodology is to compare the performance of the algorithms on the dataset under consideration and then choose the best sentiment classifier for adoption. In this study, we conducted a comparative analysis of several machine learning algorithms with a lexicon-based approach, including multinomial naïve Bayes, support vector machine, k-nearest neighbors, random forest and an ensemble approach combining these algorithms, with the purpose of developing a sentiment classifier model using the TripAdvisor dataset. The objective was to classify hotel customer reviews into three distinct categories: positive, neutral, and negative. After evaluation of recall, precision, F1, and accuracy metrics, it can be concluded that the ensemble approach yields superior outcomes compared to the other approaches.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Predicting the Close-price of Cryptocurrency Using the Kernel Regression Algorithm
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/RI2C60382.2023.10356032
Description: The aim of this work is to utilize the kernel regression (KR) approach to predict the close-price of cryptocurrencies. This study makes use of three datasets: Bitcoin (BTC), Litecoin (LTC), and Ethereum (ETH). The min-max normalization method was used to scale feature values to a common range, often between 0 and 1. Furthermore, support vector regression (SVR) and long short-term memory (LSTM) models were used for comparison with the KR-based prediction model. Evaluation using RMSE and MAPE demonstrated that the KR-based predictive model gave more satisfactory results.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: ASPECT-BASED SENTIMENT CLASSIFICATION FOR CUSTOMER HOTEL REVIEWS
Cover Date: 2022-12-01
Cover Display Date: December 2022
DOI: 10.24507/icicelb.13.12.1291
Description: Using only ratings to gauge public opinion about products and services is insufficient to improve product quality or understand the reasons for consumer preferences. This problem was addressed by employing feature/aspect-based sentiment analysis to examine the polarity of customer evaluations. An aspect-based sentiment analysis method was designed for hotel evaluations, taking account of staff attentiveness, room cleanliness, hotel facilities, value for money and location convenience. A collection of keywords for each hotel aspect was learned using Word2Vec as one of the three fundamental solution mechanisms. This corpus was then utilized to select hotel features while developing an aspect-based multiclassification model to categorize sentences containing customer evaluations into their specific aspect classes. A binary sentiment classifier was also developed to assign the sentiment polarity of each sentence in each aspect class. Term frequency-inverse gravity moment (tf-igm) was employed as a term weighting scheme, while the SVM algorithm was used to construct text classification models. Our proposed method gave superior results to the baseline, with improved average recall, precision, F1 and accuracy scores of 3.45%, 2.38%, 2.35% and 2.35%, respectively.
Citations: 1
Aggregation Type: Journal
-------------------
Title: Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction
Cover Date: 2022-11-01
Cover Display Date: November 2022
DOI: 10.34028/iajit/19/6/9
Description: In general, most existing bug report studies focus only on solving a single specific issue. Considering multiple issues at once is required for a more complete and comprehensive process of bug fixing. We took up this challenge and proposed a method to analyze two issues of bug reports based on text mining techniques. Firstly, dependent bug reports are assembled into an individual cluster, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is performed with threshold-based similarity analysis. Cosine similarity and BM25 are compared with term frequency (tf) weighting to obtain the most appropriate method. Meanwhile, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are utilized to model the bug severity predictor with four term weighting schemes, i.e., tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). After the experimentation process, BM25 was found to be the most appropriate for dependent bug report assembly, while for severity prediction, the RF method with tf-icf weighting yielded the best performance.
Citations: 4
Aggregation Type: Journal
-------------------
Title: COMPARATIVE STUDY FOR DATA NORMALIZATION METHODS ON PREDICTING CRYPTOCURRENCY PRICE
Cover Date: 2022-08-01
Cover Display Date: August 2022
DOI: 10.24507/icicelb.13.08.853
Description: Feature values on highly different scales can degrade the performance of models that predict cryptocurrency prices. Therefore, this work presents a comparative study of data normalization in order to identify the most appropriate normalization method for cryptocurrency price prediction. Three common data normalization methods often used in regression analysis, namely z-score, min-max and log scaling, were compared. These data normalization methods were applied in the data pre-processing step, with the scaled feature values used to develop predictive models based on Support Vector Regression (SVR) and Long Short-Term Memory (LSTM). Evaluation by Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) showed that the z-score method returned slightly better results than the min-max and log scaling methods. In terms of computational time, the z-score method required slightly longer because it calculates the mean and standard deviation before scaling the feature values.
Citations: 3
Aggregation Type: Journal
-------------------
Title: Mining Bug Report Repositories to Identify Significant Information for Software Bug Fixing
Cover Date: 2022-07-01
Cover Display Date: 1 July 2022
DOI: 10.14416/j.asep.2021.03.005
Description: Most studies relating to bug reports aim to automatically identify necessary information from bug reports for software bug fixing. Unfortunately, such studies usually focus on only one issue, whereas more complete and comprehensive software bug fixing would be facilitated by assessing multiple issues concurrently. This is the challenge taken up in this study, which aims to present a method of identifying bug reports at a severe level from a bug report repository, together with assembling their related bug reports to visualize the overall picture of a software problem domain. The proposed method is called “mining bug report repositories”. Two text mining techniques are applied as the main mechanisms in this method. First, classification is applied for identifying severe bug reports, called “bug severity classification”, while “threshold-based similarity analysis” is then applied to assemble bug reports that are related to a bug report at a severe level. Our datasets are obtained from three open-source projects, namely SeaMonkey, Firefox, and Core:Layout, downloaded from Bugzilla. Finally, the best model from the proposed method is selected and compared with two baseline methods. For identifying severe bug reports using the classification technique, the results show that our method improved accuracy, F1, and AUC scores over the baseline by 11.39%, 11.63%, and 19%, respectively. Meanwhile, for assembling related bug reports using the threshold-based similarity technique, the results show that our method improved precision and likelihood scores over the other baseline by 15.76% and 9.14%, respectively. This demonstrates that our proposed method may help to increase the chance of fixing bugs completely.
Citations: 6
Aggregation Type: Journal
-------------------
Title: AUTOMATICALLY IDENTIFYING OF PLAGIARIZED SUBJECTIVE ANSWERS FOR THAI USING TEXT-BASED SIMILARITY ANALYSIS METHOD
Cover Date: 2022-06-01
Cover Display Date: June 2022
DOI: 10.24507/icicel.16.06.639
Description: In the context of education, many researchers design and develop methods or tools to identify plagiarism and maintain study quality. Text-based plagiarism often occurs in the academic domain, including online subjective examinations. Each of the numerous proposed techniques has limitations in plagiarism detection. Here, a method is presented to identify plagiarized subjective answers in Thai when the subjective examination is performed online, using natural language processing techniques (e.g., POS tagging) and cosine similarity analysis. The proposed method is called “similarity analysis of linguistic syntax and words used”. The method achieved a true positive rate (TPR) of 0.81. Furthermore, when compared to the baseline, our proposed method improved the average TPR by 7.69%. This may demonstrate the success of our proposed method in identifying plagiarized subjective answers.
Citations: 0
Aggregation Type: Journal
-------------------
Title: A Comparative Study of Machine Learning Approaches for Predicting Close-Price Cryptocurrency
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/ICTKE55848.2022.9983453
Description: This study aimed to evaluate the effectiveness of several algorithms for predicting the close-price of various cryptocurrencies. Three algorithms employed in this comparative study were Support Vector Regression (SVR), Random Forest (RF), and Long Short-Term Memory (LSTM), while the three cryptocurrency datasets examined were Bitcoin, Ethereum, and Litecoin. Furthermore, in the stage of the data preparation, we compared two popular data normalization methods: min-max and z-score. After examining the close-price prediction results of each approach using Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE), it was revealed that the predictive model generated by the LSTM algorithm together with z-score normalization yielded the most effective results for each cryptocurrency dataset.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------
Title: Bug reports identification using multiclassification method
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.14456/sehs.2022.46
Description: Whenever software defects (or bugs) are detected, they must be fixed immediately to allow the software to perform properly. The classification task for bug reports includes not only binary classification but also multiclassification. Therefore, multiclassification for bug reports was chosen as the challenge in this study. The proposed method aimed to classify bug reports into three classes, namely real-bug, enhancement, and task. The method began with bug report pre-processing, and then the vector of bug reports was used to develop the multiclassifier models. Eight machine learning algorithms namely multinomial naïve Bayes, logistic regression, random forest, support vector machines, k-nearest neighbor, extreme gradient boosting, neural networks and decision trees were compared. Finally, the classifier was chosen as the best model for the proposed method, and compared with the baseline. The Matthews correlation coefficient, area under the curve, F1 and accuracy scores of the best classifier from the proposed method showed improvement from the baseline at 4.09%, 2.71%, 1.83% and 1.69%, respectively.
Citations: 6
Aggregation Type: Journal
-------------------
Title: A Comparative Study of Short Text Classification Methods for Bug Report Type Identification
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/RI2C56397.2022.9910299
Description: Previous related studies often used the 'summary' of bug reports because this part contains less noise. However, bug report summaries are often short, leading to short text classification issues which may have been overlooked. This study compares short text classification methods by categorizing bug reports into two classes, real-bug and non-bug, based on three major factors, namely bug report features, term weighting schemes and machine learning algorithms. Four bug report features (i.e. unigram, unigram + bigram, unigram + CamelCase, and all features), three term weighting schemes (i.e. tf, tf-idf and tf-igm) and three machine learning algorithms (i.e. random forest, support vector machine, and k-means clustering) are compared using bug reports relating to the Mozilla Firefox open-source project. Finally, unigram + CamelCase features along with tf-igm and support vector machine provide the best bug report classification performance.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: A Study of Comparative Methods for Closed-Price Cryptocurrency Prediction
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/InCIT56086.2022.10067551
Description: This work presents a comparative study of closed-price prediction methods for cryptocurrencies, namely machine learning-based and deep learning-based methods. Three data normalization techniques in the data pre-processing stage were also compared: log scaling, min-max, and z-score normalization. In the machine learning-based method, support vector regression (SVR) was used to develop the predictive model, whereas long short-term memory (LSTM) was used in the deep learning-based method. In addition, three datasets are used in this study, namely Bitcoin (BTC), Ethereum (ETH), and Litecoin (LTC). The results of evaluating the predictive models using RMSE and MAPE revealed that SVR with the RBF kernel produced slightly better results than LSTM. Compared to the other data normalization methods, log scaling normalization produced more satisfactory outcomes.
Citations: 5
Aggregation Type: Conference Proceeding
-------------------
Title: Identifying Significant Customer Opinion Information of Each Aspect from Hotel Reviews
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/JCSSE54890.2022.9836251
Description: Recognizing whether customers like or dislike a product or service from online reviews may not be sufficient for other customers to make decisions or for owners to improve their merchandising. This was taken up as a challenge in this study, which focused on finding significant sentiment information from customer reviews on each hotel aspect. The proposed framework first separated customer reviews into sentences, and then assembled all customer review sentences relating to each aspect using k-means clustering. Later, those sentences were classified into positive and negative sentiment polarity classes. The classifier was developed using Support Vector Machines (SVM). This can help other customers or the owner to understand why customers like or dislike a particular hotel aspect. The experimental results were evaluated using recall, precision, F1 and accuracy. The clustering method returned satisfactory results of 0.81, 0.80, 0.80 and 0.80, respectively. Meanwhile, the classification method also gave satisfactory results at 0.81, 0.79, 0.80 and 0.79, respectively. In terms of F1 and accuracy, our proposed method produces very similar experimental results to the baseline method but requires less computational time.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: Sentence-Level Sentiment Analysis for Student Feedback Relevant to Teaching Process Assessment
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1007/978-3-031-20992-5_14
Description: In the academic area, teaching process assessment conducted by students can be used as the main information to improve the teaching and learning process. However, when examination or consideration of the student feedback is conducted by teachers, the outcome may be a biased analysis. In the last decade, sentiment analysis has been applied to automatically evaluate the teaching process because it may help to reduce the problem of biased analysis when the sentiment analysis is performed by humans. This work presents a method of automatically analyzing student feedback relevant to teaching process assessment. The proposed method is called sentence-level sentiment analysis, and it is driven by processing steps such as pre-processing student comments and text representation, identifying aspect class for each sentence using the aspect analyzer, assigning sentence polarity for each sentence using the sentiment analyzer, and summarizing the overall sentiment polarity by considering student comments, respectively. The proposed method returns the recall, precision, F1, and accuracy scores of 0.835, 0.825, 0.825, and 0.825, respectively. These were satisfactory results.
Citations: 6
Aggregation Type: Book Series
-------------------
Title: Using the MQTT Broker as a Speech-Activated Medium to Control the Operation of Devices in the Smart Office
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/iSAI-NLP56921.2022.9960287
Description: This research applies the MQTT broker as a medium for various work orders in smart office management. It covers the experimentation with and development of all MQTT broker functions, namely publishing, chatting and subscribing, both globally and locally. The results show that all commands were performed correctly. In addition, a command procedure was added in which human speech is used to operate all MQTT broker functions. However, a weakness remains: responses to voice commands are delayed, which may not give a very good user experience. In this experiment, many functions were woven into the smart office. The bulb acts as an IoT bulb internally connected to the MQTT broker, and the camera performs face recognition and is also internally connected to the MQTT broker. Speech serves voice commands, with the lamp and feedback connected to the MQTT broker. The air conditioner acts as an IoT air conditioner switch externally connected to a cloud server. In addition, the dashboard acts as an IoT visual light switch that connects externally to the cloud.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: Development a Teleconsultation Platform for Outpatients during the COVID-19 Pandemic based on Cloud Firestore and Realtime Databases
Cover Date: 2022-01-01
Cover Display Date: 2022
DOI: 10.1109/BMEiCON56653.2022.10012077
Description: During the global Coronavirus Disease 2019 (COVID-19) epidemic, hospitals served not only patients with suspected symptoms of or sickness from COVID-19 but also other outpatients requiring treatment, which caused long patient queues and long waits to see a doctor. The researchers therefore developed a teleconsultation platform so that patients can talk to or seek advice from a doctor without going to the hospital and can schedule appointments to see a doctor. Patients can also talk to the doctor via video calling built into the system, and doctors can dispense medicines to patients by mail. To increase the efficiency of the system and to support a wide range of applications, any device, real-time data updates, and appointment notification via chatbot, the platform uses Cloud Firestore and Realtime Database, which are NoSQL databases, and the resulting performance was studied. The test results were satisfactory, with an average tracing server response of 107 ms + 0.14%, and an average handling latency in Thailand of 108 ms.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Improving of Imbalanced Data in Multiclass Classification for Sentiment Analysis using Supervised Term Weighting
Cover Date: 2021-09-01
Cover Display Date: 1 September 2021
DOI: 10.1109/RI2C51727.2021.9559797
Description: Sentiment classification (SC) is an ongoing field of research, which involves computing opinions, sentiments, and the subjectivity of a text. It has recently been shown that imbalanced classification is challenging for the SC research community. Most existing studies assume a balance between negative and positive samples, which may not hold in reality. This work describes a method to address the problem of imbalanced sentiment classification using supervised term weighting schemes and shows how these weighting schemes can improve the performance of sentiment classification with imbalanced data, especially in the domain of multi-class classification. To obtain the most appropriate term weighting scheme, five schemes are comparatively studied, namely tf-idf, tf-idf-icf, tf-rf, tf-igm, and sqrt_tf-igm. In addition to comparing several term weighting schemes, this work also compares four supervised machine learning algorithms to obtain an appropriate algorithm, namely k-Nearest Neighbor (k-NN), Multinomial Naïve Bayes (MNB), Support Vector Machines (SVM) with a linear kernel, and SVM with an RBF kernel. After evaluating by F1, the performance of sqrt_tf-igm was superior to all other weighting schemes. Overall, sqrt_tf-igm returned better results than the tf-idf, tf-idf-icf, and tf-rf methods, with improved F1 scores of 10.94%. Meanwhile, the result of sqrt_tf-igm was slightly better than that of tf-igm.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------
Title: Automatic dependent bug reports assembly for bug tracking systems by threshold-based similarity
Cover Date: 2021-09-01
Cover Display Date: September 2021
DOI: 10.11591/ijeecs.v23.i3.pp1620-1633
Description: Bug reports contain essential information for fixing problems that occur in software. Many studies have proposed methods for automatic analysis of bug reports. One issue that could affect the completion of software bug fixing is known as "bug dependency". Although this problem has been mentioned by many researchers, most of them discussed related bugs but did not really deal with the dependency issue in bug reports. One possible solution for addressing this issue is to assemble all relevant/dependent bug reports together before the next processing stages. This study presents a method of assembling dependent bug reports. The main mechanism is called "threshold-based similarity analysis", and three similarity techniques, namely cosine similarity (CS), multi aspect TF (MATF), and BM25, are compared with feedback, precision and likelihood value. As BM25 with a threshold of 0.5 gives the best results, it was compared with the state-of-the-art method. The results show that our method increases precision and likelihood values by 12% and 12.4%, respectively. Therefore, our results can be used to encourage developers to recognize all dependent bugs in the same problem domain.
Citations: 1
Aggregation Type: Journal
-------------------
Title: A method of non-bug report identification from bug report repository
Cover Date: 2021-08-01
Cover Display Date: August 2021
DOI: 10.1007/s10015-021-00681-3
Description: One of the most common issues addressed by bug report studies is misclassification when identifying and then filtering non-bug reports from the bug report repository. Having to filter out unrelated reports wastes time in identifying actual bug reports, and this escalates costs as extra maintenance and effort are required to triage and fix bugs. Therefore, this issue has been seriously studied and is addressed here. To tackle this problem, this study proposes a method of automatically identifying non-bug reports in the bug report repository using classification techniques. Three points are considered here. First, the bug report features used are unigram and CamelCase, where CamelCase words are used for feature expansion. Second, five term weighting schemes are compared to determine an appropriate term weighting scheme for this task. Lastly, the support vector machine (SVM) family, i.e. binary-class SVM, one-class SVM based on the Schölkopf methodology, and support vector data description (SVDD), is used as the main mechanism for modeling non-bug report identifiers. After testing by recall, precision, and F1, the results demonstrate the efficiency of identifying non-bug reports in the bug report repository. Our results appear acceptable when compared with previous well-known studies, and the non-bug report identifiers with tf-igm and modified tf-icf weighting schemes, for both the Schölkopf methodology and SVDD, yielded the best performance.
Citations: 9
Aggregation Type: Journal
-------------------
Title: Constraint-based clustering approach for retrieving of relevant decided civil cases in Thai
Cover Date: 2021-05-19
Cover Display Date: 19 May 2021
DOI: 10.1109/ECTI-CON51831.2021.9454946
Description: Civil cases usually refer to legal cases involving private disputes between persons or organizations. After judgment, the civil cases are termed "decided cases" and the documents may be used for subsequent legal decisions. The substantial number of cases pleaded to the court has caused information overload in the legal area and become a topic of discussion in knowledge management. Some kind of filtering is required to reduce complexity and ease the workload. One possible solution is to group relevant decided cases together. Therefore, our study proposed the automatic retrieval of similar decided civil cases conducted in the Thai language as one cluster. To do this, we utilized a method of constraint-based clustering. Two clustering algorithms, namely k-means and spherical k-means, were compared. Also, three weighting schemes, including tf-idf and BM25, were compared. The performance of the proposed method was tested by recall, precision, and F1. Results were satisfactory and acceptable, while spherical k-means clustering with BM25 term weighting outperformed the others. We then selected the best model generated from our method and compared the results to the Multinomial Naïve Bayes and Support Vector Machines classification methods. Our proposed method returned better results than the classification methods, with improved average scores of recall, precision and F1 at 5.33%, 5.48% and 5.41%, respectively.
Citations: 2
Aggregation Type: Conference Proceeding
-------------------
Title: Applying CMX Tracking for Identifying the Indoor Positioning of Wireless Devices
Cover Date: 2021-02-19
Cover Display Date: 19 February 2021
DOI: 10.1145/3459104.3459140
Description: Most of today's technology has begun to move towards wireless operation, including network technology itself. WiFi (Wireless Fidelity) has been used to replace the traditional network connection over LAN cable. For convenience, WiFi uses a signal to link data between a user's device and a device in the system. For this reason, most companies need to provide wireless network service to their users. Such a service can also help the provider know the location of users within the building, and allows it to offer free Internet access as a way to promote the location to customers. The researchers have therefore studied positioning technology for devices entering the coverage range of the access points. It uses the same positioning theory used in GPS so that service providers can monitor people who enter the building. It can also be extended to find users in different areas so that resources in those areas can be allocated appropriately. The technology selected for this purpose is CMX because it meets the needs of all clients and, most importantly, works with all devices in this work, whether they are access points, switches, or servers.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Monitoring and Controlling Electrical Appliances through Rule Engine in the Smart Office
Cover Date: 2021-01-15
Cover Display Date: 15 January 2021
DOI: 10.1145/3456172.3456219
Description: In recent years, there has been an increasing interest in the Internet of Things (IoT). Rule Engine is a developer tool to connect hardware devices to APIs (Application Programming Interfaces), a flow-based programming environment with a UI that developers use through a web browser. Node-RED runs on Node.js, making it ideal for use with Raspberry Pi as it consumes few resources. The file size is not large, and Node.js also acts as an intermediary for the Raspberry Pi to communicate with the web browser and other devices. This research develops a rule engine to monitor and operate office equipment. It also shows that electrical appliances can be more easily controlled and operated by sensors. The system can keep the office environment in the desired conditions for a smart office, covering the air conditioner, room temperature, and room climate, which includes moisture, carbon dioxide, TVOC, PM 2.5, and carbon monoxide values. The values are stored in the Cloud and Edge Computing, and the results are very satisfactory. The results of this study indicate that the Rule Engine is an effective medium for connecting sensors and office appliances. This finding has important implications for developing the smart office.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Concept-based one-class svm classifier with supervised term weighting scheme for imbalanced sentiment classification
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.14456/easr.2021.62
Description: Imbalanced sentiment is one of the key classification issues. Many studies have proposed imbalanced sentiment classification improvements, but the topic remains a major challenge. This paper proposes a method, called “concept-based one-class SVM classifier”, to address imbalanced sentiment classification that consists of three main techniques. First, we apply Word2Vec and PageRank algorithms to extract “concepts” and their related terms (called “members”) embedded in texts. The corpus of “concepts” is then used to prepare the dataset by replacing words with the “concepts”. This reduces term ambiguity and also the size of word vectors. Second, supervised term weighting (STW) schemes are applied to determine the importance of a word in a document of a specific class. This reflects the class distinguishing power of each term. Finally, the one-class support vector machine (SVM) algorithm is used for sentiment classifier modeling. This has proved useful for imbalanced data classification, especially when the minority class lacks structure and is predominantly composed of small disjuncts or outliers. By combining these techniques, our proposed method may be able to competently identify and distinguish between the characteristics of each class, especially in the context of an imbalanced data scenario. After validating the proposed method with the hotel review dataset, and running experiments with different imbalanced ratios, our proposed method returned satisfactory results of recall, precision, and F1. We then selected the best model generated from our method and compared the results to the state-of-the-art method. Our proposed method returned better results than the state-of-the-art method, with an improved F1 score of 3.19%. Moreover, in terms of computational processing time, our proposed method is faster than the state-of-the-art method.
Citations: 4
Aggregation Type: Journal
-------------------
Title: Comparison of Methods to Estimate Missing Values in Monthly Rainfall Data
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1109/ICSEC53205.2021.9684588
Description: Missing data is common in data analytics, including rainfall, and can reduce accuracy through biased estimates that lead to invalid conclusions. Many methods have been proposed to handle missing data, but the same method might not be suitable for every dataset. This challenge is taken up here through a comparative study of popular methods for handling missing values. Popular methods of handling missing data values include removal of instances with missing values, arithmetic mean (AM), normal ratio (NR), nearest neighbor (NN), and inverse distance (ID). We proposed a method of estimating missing values called step-5 simple moving average (step-5 SMA). This applies simple moving average (SMA) principles with consideration of the El Niño-Southern Oscillation (ENSO) phenomenon. After enhancing the training set, our method was used to model monthly rainfall forecasts utilizing two supervised machine learning algorithms, support vector machines (SVM) and k-nearest neighbor (k-NN). We used monthly precipitation data (in mm) gathered between 1953 and 2013 from 26 water measurement stations of the Meteorological Department located in Northeast Thailand. After evaluating by MAE and RMSE, results showed that monthly rainfall forecasters developed from the training set that removed observations with missing values returned the lowest performance. Enhancing the quality of the training set using our step-5 SMA gave a better performance than the other missing value estimation methods.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------
Title: Automatically Retrieving of Software Specification Requirements Related to Each Actor
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1007/978-3-030-79757-7_12
Description: It is well-known that success or failure of any software is dependent upon requirement analysis. The most significant problem hidden in natural language software requirements may be ambiguity. This can lead to poor design and performance of the final software product and can be time-consuming at the system analysis stage. Previous studies applied natural language processing (NLP) techniques to identify these ambiguities and reduce them to improve the requirement quality. However, this problem has not yet been fully resolved. This study applies NLP to automatically extract the software system requirement specification and visualize an overview of the software that is being developed. The proposed method clusters relevant software requirement sentences of each system user by text mining technique, and then extracts ‘actors’ and their ‘actions’ from sentences using NLP techniques. The recall technique is used to evaluate the efficacy of the proposed method. Response time and relevancy of the results are significant factors for software product satisfaction.
Citations: 1
Aggregation Type: Book Series
-------------------
Title: Comparing of Multi-class Text Classification Methods for Automatic Ratings of Consumer Reviews
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1007/978-3-030-80253-0_15
Description: Consumer reviews show inconsistent ratings when compared to their contents as a result of sarcastic feedback. Consequently, they cannot provide valuable feedback to improve products and services of the firms. One possible solution is to utilize consumer review contents to identify the true ratings. In this work, different multi-class classification methods were applied to assign automatic ratings for consumer reviews based on a 5-star rating scale, where the original review ratings were inconsistent with the content. Two term weighting schemes (i.e. tf-idf and tf-igm) and five supervised machine learning algorithms (i.e. k-NN, MNB, RF, XGBoost and SVM) were compared. The dataset was downloaded from the Amazon website, and language experts helped to correct the real rating for each consumer review. After verifying the effectiveness of the proposed methods, the multi-class classifier model developed by SVM along with tf-igm returned the best results for automatic ratings of consumer reviews, with average improved scores of accuracies and F1 over the other methods at 11.7% and 10.5%, respectively.
Citations: 10
Aggregation Type: Book Series
-------------------
Title: Evaluation of the Reliability of Heat Map Planner Software to Assist in Indoor Positioning
Cover Date: 2021-01-01
Cover Display Date: 2021
DOI: 10.1109/ICTKE52386.2021.9665700
Description: This research aims to learn and understand how to make a heat map for designing suitable access point placement, and then to compare the discrepancies between the heat map and the signal quality measured at the actual work site, including a statistical comparison of signal distribution through different types of walls for calculating the location of Wi-Fi devices in indoor positioning. It also studies how efficiently the Wi-Fi signal can be distributed evenly across the locations. The method is to design the installation locations of the access points, specify those locations in the Heat Map Planner software to measure the Wi-Fi signal, and then measure the signal in the actual area. Then, the measured values from the actual work site were compared statistically with the program values to assess the efficiency of the broadcasting. This analysis makes it possible to find the likely tolerances and then to produce the next heat map efficiently, with fewer mistakes. In conclusion, signal quality through the concrete wall, compared with the outdoor signal quality measurement, showed a reduction of -12 dBm. Signal quality through the glass wall with metal frame, compared with the outdoor signal quality measurement, showed an attenuation of -7 dBm.
Citations: 2
Aggregation Type: Conference Proceeding
-------------------
Title: Integration of Cloud Computing with Internet of Things for Network Management and Performance Monitoring
Cover Date: 2020-11-18
Cover Display Date: 18 November 2020
DOI: 10.1109/ICTKE50349.2020.9289876
Description: The past decade has seen the rapid development of the Internet in many areas. Nowadays, the Internet is used widely in every organization, and people use it 24 hours a day. Hence, it is important to monitor the working condition of network equipment all day long. This research applies Internet of Things and Cloud Computing technology to monitor the operation of computer network devices: keeping the network stable and secure, checking its operation, recording working status, and sending alerts to the administrator. In large-scale networks, various problems are often encountered, such as a server becoming unavailable due to too many users, or too many network users causing the network to slow down or stop working; for example, a device may be overused or an error may occur. The proposed platform can analyze problems that may occur in the future, which helps reduce the costs and damage incurred when equipment is damaged or unusable. It can also analyze the data to reduce risks that may affect the network. It can use this information to improve network performance, and it allows administrators to check and manage multiple machines across the network at the same time. In this testbed experiment, a total of 8 network devices are involved. Each device has approximately 48 interfaces, with each interface having an average of 5 values. Therefore, the amount of traffic on the network shown is enormous. The researchers study how to be sure that all data displayed in the system is accurate and reliable. In the process of retrieving a value, the OIDs (Object Identifiers) in the MIB (Management Information Base) hierarchy of each device differ depending on the model, model type, and brand of the device. How can one be sure that the value retrieved is displayed as the correct value? If the OID is incorrect, the value received will also be incorrect, and the system administrator's interpretation will also be wrong. Another issue is that the values stored in the MIB database are stored in raw ASCII format. When values are sent to a machine that acts as an SNMP manager, the file is in binary format, which is unreadable. Therefore, in this research, the values obtained must be correctly interpreted into readable values in the correct display units. The experiment uses a Raspberry Pi 3 board.
Citations: 2
Aggregation Type: Conference Proceeding
-------------------
Title: Applying image processing and edge computing for plant growth monitoring in smart farm
Cover Date: 2020-08-31
Cover Display Date: 31 August 2020
DOI: N/A
Description: In the history of development economics, smart farming has been thought of as a key factor in making agriculture more efficient. It is necessary to integrate various technologies and deploy them in the process of production and cultivation. The aim of this study was to apply and evaluate information technology techniques together with agricultural engineering in order to reduce the monitoring work for farmers. This research applied image processing techniques to assist in cultivation by measuring plant growth and plant health through a webcam. The plant image is processed with edge computing on a Raspberry Pi 3. The results are displayed as a percentage of plant growth. In addition, this process can support smart farming through Internet of Things (IoT) technology and edge computing. The system automatically controls the greenhouse environment, such as temperature, humidity, light, heat, and pH, so that it is appropriate for plant growth. The results of this study indicate that faster plant growth increases productivity and reduces workload for farmers. The current findings add substantially to our understanding of techniques for processing plant growth from low-resolution images so that they are suitable for real farmers' use. The results from the research are satisfactory. The evidence from this study indicates 80.95% accuracy for low-resolution images.
Citations: 1
Aggregation Type: Book
-------------------
Title: Applying Image Processing and Edge Computing for Plant Growth Monitoring in Smart Farm
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: N/A
Description: In the history of development economics, smart farming has been thought of as a key factor in making agriculture more efficient. It is necessary to integrate various technologies and deploy them in the process of production and cultivation. The aim of this study was to apply and evaluate information technology techniques together with agricultural engineering in order to reduce the monitoring work for farmers. This research applied image processing techniques to assist in cultivation by measuring plant growth and plant health through a webcam. The plant image is processed with edge computing on a Raspberry Pi 3. The results are displayed as a percentage of plant growth. In addition, this process can support smart farming through Internet of Things (IoT) technology and edge computing. The system automatically controls the greenhouse environment, such as temperature, humidity, light, heat, and pH, so that it is appropriate for plant growth. The results of this study indicate that faster plant growth increases productivity and reduces workload for farmers. The current findings add substantially to our understanding of techniques for processing plant growth from low-resolution images so that they are suitable for real farmers' use. The results from the research are satisfactory. The evidence from this study indicates 80.95% accuracy for low-resolution images.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Collecting Child Psychiatry Documents of Clinical Trials from PubMed by the SVM Text Classification Method with the MATF Weighting Scheme
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: 10.1007/978-3-030-19861-9_10
Description: Child psychiatry is a branch of psychiatry focused on the diagnosis, treatment, and prevention of mental health issues in children and their families. In many countries, the study of disorders such as ADHD (Attention-Deficit/Hyperactivity Disorder) by child and adolescent psychiatry is still in its infancy, with the result that children’s mental health issues can be the source of embarrassment for the family and of shame for many children. Misunderstanding, denying, and ignoring children’s mental health issues by parents are the main problem encountered in diagnosis and treatment of mental health issues in children. To help parents and extended families understand this problem better, and thus help them to better care for children with mental health issues, starting with seeking help from a psychiatrist without embarrassment, an easily accessible and reliable source of information is urgently needed. To develop such a single source of information, relevant documents need to be gathered together. This study presents a method of gathering reports of clinical trials from PubMed which describe diagnosis and treatment of child mental health issues. The main mechanism of the proposed method is a Support Vector Machine with a Multi Aspect TF (MATF) weighting scheme. After testing by recall, precision, and F1, it can return satisfactory results of 0.82, 0.79, and 0.80 respectively.
Citations: 0
Aggregation Type: Book Series
-------------------
Title: Identifying of Decision Components in Thai Civil Case Decision by Text Classification Technique
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: 10.1007/978-3-030-44044-2_2
Description: A Thai civil case decision document is typically presented in a semi-structured form. Generally, Thai civil case decision documents consist of four major components comprising the dispute, facts, decision, and judgment. To perform text summarization or information extraction on this document, the first process should recognize major components in the Thai civil case decision document. This has not been addressed previously and becomes the challenge for our study that aims to present a method of identifying the four major components utilizing the text classification technique. We employed two weighting schemes and three supervised machine learning algorithms and downloaded the dataset from the Supreme Court of Thailand website (http://www.supremecourt.or.th). After testing by recall, precision, and F1 satisfactory results were achieved for identifying major components in Thai civil case decision documents at 0.83, 0.80 and 0.81, respectively.
Citations: 1
Aggregation Type: Book Series
-------------------
Title: Feature Comparison for Automatic Bug Report Classification
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: 10.1007/978-3-030-19861-9_7
Description: Nowadays, various bug tracking systems (BTS) such as Jira, Trace, and Bugzilla have been developed and proposed to gather issues from users worldwide. This is because those issues, called bug reports, contain significant information for software quality maintenance and improvement. However, many bug reports of poor quality might have been submitted to the BTS. In general, the reported bugs in the BTS are first analyzed and filtered by bug triagers. However, with the increasing number of bug reports in the BTS, manually classifying bug reports is a time-consuming task. To address this problem, automatically distinguishing bugs from non-bugs is necessary. To the best of our knowledge, bug report classification is not an easy task because the problem of bug report misclassification still occurs to date. This problem may arise from using inappropriate or confusing features. Therefore, this work aims to study and discover the most appropriate features for binary bug report classification. This study compares seven feature sets: unigram, bigram, camel case, unigram+bigram, unigram+camel case, bigram+camel case, and all features together. The experimental results show that unigram+camel case is the most appropriate feature set for binary bug report classification, especially when used with the logistic regression algorithm. Consequently, unigram+camel case should be the feature set used to distinguish bug reports from non-bug ones.
Citations: 13
Aggregation Type: Book Series
-------------------
Title: A Form and API Data Management Platform for Progressive Web Application and Serverless Application Architecture
Cover Date: 2019-11-23
Cover Display Date: 23 November 2019
DOI: 10.1145/3372422.3372452
Description: In the new global economy, the web application has become a central issue for enterprise organizations with branches in many countries around the world. One major issue for those organizations is that developing a web application that supports both global and local services at the same time is very difficult, since at the local level each country has different languages, currencies, and regulations. Therefore, developing a system or web application to support business services around the world requires repeating the same work with some details changed; for example, the same invoice form follows the same process everywhere, but its detailed information differs in each country or language. The objective of this research is to develop a platform that makes it easier to share a Form design across many countries by automatically linking it to the database via an API. APIs can be embedded seamlessly into both the front end and the server side. The user interface can be designed smoothly. Exported code can be used with HTML. In addition, it can work with serverless applications. Files can be managed as JSON, csv, and txt files. The results indicate that the designed platform supports the intended work and meets the objectives. It makes Form development in a web application very convenient and removes many repetitive steps. In addition, data management is also effective. When creating a Form, it helps developers reduce the time it takes to create multiple Forms and reduces form creation errors in the settings. Customization helps to keep the original Form.
Citations: 4
Aggregation Type: Conference Proceeding
-------------------
Title: The Integration of File Server Function and Task Management Function to Replace Web Application on Cloud Platform for Cost Reduction
Cover Date: 2019-11-01
Cover Display Date: November 2019
DOI: 10.1109/APCCAS47518.2019.8953164
Description: The aim of this paper is to propose a new technique for operating a web application on a cloud platform based on the integration of multiple services, which could reduce the cost of renting a cloud. Two of the most important functions are the file server function and the task management function. The purpose of the study is to test this concept and technique. It is tested on the Microsoft Azure platform since it is a cloud platform that covers all services required. These services support storing and retrieving data files in the created file server. Data is then displayed in a web browser. The task management function sends notifications to all 3 involved applications, which are Trello, Microsoft To-Do, and Microsoft Planner. The operation starts with the user's data input being processed in the function to create a task list in the application. The test result is satisfactory, i.e. the designed technique functions completely as a normal website and web application. Its main feature is the reduction of the cloud rental fee, because when a service is kept open for users to access, the airtime fee is quite costly. With this new technique, the expense is calculated only when a user accesses data, which makes it much cheaper.
Citations: 7
Aggregation Type: Conference Proceeding
-------------------
Title: Integration of IoT, Edge Computing and Cloud Computing for Monitoring and Controlling Automated External Defibrillator Cabinets in Emergency Medical Service
Cover Date: 2019-05-14
Cover Display Date: 14 May 2019
DOI: 10.1109/INFOMAN.2019.8714717
Description: One of the most significant current discussions in Emergency Medical Service (EMS) concerns what happens when an accident occurs and someone is injured, or when patients with acute cardiac arrest need to call an ambulance for immediate assistance. A major problem in this kind of situation is that the ambulance or rescue team does not know the exact location of the victim and wastes time searching, which sometimes means the victim's life cannot be saved. Another problem is that sometimes the rescue team arrives without an AED (Automated External Defibrillator); if the team knew where an AED is, it could retrieve it in time to save the patient's life. Therefore, the objectives of this research are to develop an emergency platform that monitors the location of AEDs and controls the opening and closing of the AED cabinets through the App. The user can also check the status of the AED through the App. This platform also specifies the exact location of the accident via the GPS system. Further statistical tests revealed that the location of the AED has 95% accuracy and that the cabinets can be controlled with 96% accuracy, with an average response time through the App of 658 ms. The results indicate that this platform makes EMS services more effective.
Citations: 13
Aggregation Type: Conference Proceeding
-------------------
Title: Internet usage patterns mining from firewall event logs
Cover Date: 2019-03-30
Cover Display Date: 30 March 2019
DOI: 10.1145/3322134.3322155
Description: Understanding users' internet usage behavior is essential for quality of service (QoS) analysis on the internet. If internet providers can better understand their users, they may be able to provide better service and enhance its quality. In general, information about users' behavior is stored on the server as internet access log files, called event logs. To obtain the patterns of users' behavior from the event logs, this work aims to extract interesting patterns of inappropriate user behavior through internet usage pattern mining. The primary mechanism of the proposed method is the Generalized Sequential Pattern (GSP) algorithm, which is a sequential pattern mining algorithm. This study uses real event logs from an organization in Thailand. The results identified interesting findings that made it possible to propose some improvements and increase the QoS of the internet service.
Citations: 7
Aggregation Type: Conference Proceeding
-------------------
Title: Finding Clinical Knowledge from MEDLINE Abstracts by Text Summarization Technique
Cover Date: 2018-12-20
Cover Display Date: 20 December 2018
DOI: 10.23919/INCIT.2018.8584867
Description: Today, MEDLINE is an important repository containing more than 26 million citations and abstracts in the field of medicine, while PubMed provides free access to MEDLINE and links to full-text articles. MEDLINE abstracts have become a potential source of new knowledge in the medical field. However, it is time-consuming and labour-intensive to find knowledge from MEDLINE abstracts when a search returns many abstracts and each may contain a large volume of information. Therefore, this work aims to present a method of summarizing clinical knowledge from a MEDLINE abstract. The main mechanisms of the proposed method are based on natural language processing (NLP) and text filtering techniques. The case study of this work is to summarize the clinical knowledge from MEDLINE abstracts relating to cervical cancer in clinical trials. In the evaluation stage, the actual results obtained from a domain expert are compared with the predicted results. After testing by recall, precision, and F-measure, the method returns satisfactory results, where the averages of recall, precision, and F-measure are 0.84, 1.00, and 0.91, respectively.
Citations: 7
Aggregation Type: Conference Proceeding
-------------------
Title: Assembling Relevant Bug Report using the Constraint-based k-means Clustering
Cover Date: 2018-12-20
Cover Display Date: 20 December 2018
DOI: 10.23919/INCIT.2018.8584866
Description: Bug reports provide important information for improving software quality. Today, many bug tracking systems (BTS) such as Bugzilla, Jira, Mantis, and Trac have been developed for collecting bug reports from users around the world. Unfortunately, many tasks on the BTS are still performed manually by bug triagers. The process is time-consuming and error-prone. Although many studies on bug reports have been proposed, one problem may never have been truly investigated: bug dependency, which occurs when an unfixed bug 'a' affects bug 'b'. As a result, bug 'b' cannot be fixed if bug 'a' is not fixed. To address this problem, the relevant bug reports must be assigned to the same specific category in order to help developers recognize all bugs that point to the same problem domain. Resolving bug dependency is a time-consuming and labor-intensive process, and this is a challenging issue. Therefore, this work aims to present a method for assembling the relevant bug reports into specific clusters by a modified k-means clustering algorithm, called constraint-based k-means clustering. Furthermore, three weighting methods, tf, tf-idf, and BM25, are compared. After testing by recall, precision, and F-measure, the results reveal a good precision score, but the recall score should be improved. The method with tf returns better results than the tf-idf and BM25 methods because the tf method is based on a local weight that is oriented towards a specific cluster.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------
Title: Word2Vec approach for sentiment classification relating to hotel reviews
Cover Date: 2018-01-01
Cover Display Date: 2018
DOI: 10.1007/978-3-319-60663-7_29
Description: In general, existing works in sentiment classification concentrate only on the syntactic context of words and disregard the sentiment of the text. This work addresses this issue by applying Word2Vec to learn sentiment-specific words embedded in texts, and then grouping similar words into the same concept (or class) with sentiment information. Simply speaking, the aim of this work is to introduce a new task similar to word expansion or word similarity, where this approach helps to discover words sharing the same semantics automatically, and is then able to separate positive and negative sentiment in the end. The proposed method is validated through sentiment classification using the Support Vector Machine (SVM) algorithm. This approach may enable a more efficient solution for sentiment analysis because it can help to reduce the inherent ambiguity in natural language.
Citations: 18
Aggregation Type: Book Series
-------------------
Title: Concept-based sentiment analysis for opinion texts with multiple-languages
Cover Date: 2016-01-01
Cover Display Date: 2016
DOI: 10.1007/978-3-319-40415-8_4
Description: Today, millions of messages posted daily contain opinions of users in a variety of languages, including emoticons. Sentiment analysis thus becomes a very difficult task, and the understanding and knowledge of the problem and its solution are still preliminary. Therefore, this work presents a new methodology, called Concept-based Sentiment Analysis (C-SA). The main mechanism of the C-SA is Msent-WordNet (Multilingual Sentiment WordNet), which is used to improve the accuracy of sentiment analysis results. By using the Msent-WordNet, all words in opinion texts having a similar sense or meaning are denoted and considered as the same concept. Indeed, concept-level sentiment analysis aims to go beyond a mere word-level analysis of text and provide novel approaches to sentiment analysis that enable a more efficient solution for opinion text. This can help to reduce the inherent ambiguity and contextual nature of human languages. Finally, the proposed methodology is validated through sentiment classification.
Citations: 2
Aggregation Type: Book Series
-------------------
Title: Mining business rules from business process model repositories
Cover Date: 2015-07-06
Cover Display Date: 6 July 2015
DOI: 10.1108/BPMJ-01-2014-0004
Description: Purpose: – Business processes have become core assets of many organizations, and it is becoming increasingly common for most medium to large organizations to have collections of hundreds or even thousands of business process models. The purpose of this paper is to explore an alternative dimension to process mining in which the objective is to extract process constraints (or business rules) as opposed to business process models. It also focusses on an alternative data set – process models as opposed to process instances (i.e. event logs). Design/methodology/approach: – The authors present a new method of knowledge discovery to find business activity sequential patterns embedded in process model repositories. The extracted sequential patterns are considered as business rules. Findings: – The authors find significant knowledge hidden in business process model repositories. The hidden knowledge is considered as business rules. The business rules extracted from process models are significant and valid sequential correlations among business activities belonging to a particular organization. Such business rules represent business constraints that have been encoded in business process models. Experimental results have indicated the effectiveness and accuracy of the approach in extracting business rules from repositories of business process models. Social implications: – This research will assist organizations to extract business rules from their existing business process models. The discovered business rules are very important for any organization, where rules can be used to help organizations better achieve goals, remove obstacles to market growth, reduce costly mistakes, improve communication, comply with legal requirements, and increase customer loyalty. Originality/value: – There has been very little work in mining business process models as opposed to an increasing number of very large collections of business process models. This work has filled this gap with the focus on extracting business rules.
Citations: 10
Aggregation Type: Journal
-------------------
Title: The automatic Thai basketry detection and recognition on the local wisdom video
Cover Date: 2015-02-09
Cover Display Date: 9 February 2015
DOI: 10.1109/INFOS.2014.7036707
Description: To retrieve videos whose content relates to local Thai basketry, creating video indexes is vital because indexes allow users to find specific videos more quickly. This work aims to propose a novel methodology of video indexing. The proposed methodology consists of two main processing stages. The first stage is to capture video frames containing Thai basketry and then classify them into individual groups, because local Thai basketry can be classified into many types depending on its usability and pattern. This stage relies on color and texture features, which are processed through Artificial Neural Networks (ANNs). The second stage is to recognize the basketry shape by chain code and template matching analysis of the object's shape in order to use the shapes for video indexing. Finally, the proposed methodology is tested on 41 local wisdom videos comprising 76 shots and 4,196 frames. After testing by recall, precision, and F-measure, the results are satisfactory, with recall, precision, and F-measure of 78.64%, 83.88%, and 81.18%, respectively.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Concept-based text classification of Thai medicine recipes represented with ancient Isan language
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.1007/978-3-319-19024-2_12
Description: This work presents concept-based text classification for organizing traditional Thai medicine recipes. These recipes were translated from the Northeastern Thai palm leaf manuscripts. Each medicine recipe is written in the ancient Isan language. The proposed method is called ‘concept-based text classification’ because we utilize ‘concepts’ as document features, where a concept is a surrogate for a group of words having the same meaning. The main mechanisms in the method are the k-Nearest Neighbor algorithm and an ancient Isan dictionary, called Isan-Thai Markup Language (ITML). The objective of this work is to assign the Thai medicine recipes to five predefined groups: medicine recipes for headache and fever, stomachache and abdomen, skin, abscess, and faint and vertigo, respectively. After testing by recall, precision, and F-measure, the method returns satisfactory results for automatic text classification.
Citations: 1
Aggregation Type: Book Series
-------------------
Title: Multilingual sentiment classification on large textual data
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.1109/BDCloud.2014.15
Description: At present, Big Data has created a lot of buzz in the technology world. Sentiment analysis, or opinion mining, is one of the important applications of 'Big Data', where sentiment analysis is used for recognising the voice or response of the crowd regarding products and services. Such feedback describes the items in some detail and evaluates them as good/bad or preferred/not preferred. The results are very important for a company because customer feedback can yield extremely valuable insights about the company's customers. However, on a commercial website of product reviews, many customers describe the items in some detail and evaluate them in different languages. Therefore, many companies gather customer feedback in multiple languages. Feedback in multiple languages raises problems in analysing the material. This paper therefore proposes a solution to classify a product review dataset into two classes: positive and negative sentiments. The proposed methodology is called 'Multilingual Sentiment Classification (MSC)'. It consists of two main processing steps: lingual separation and sentiment classification. The first processing step is to classify online product reviews into language classes. The second processing step is to classify each textual dataset into two classes: positive and negative sentiments. Note that we concentrate and experiment on bilingual texts (Thai and English).
Citations: 10
Aggregation Type: Conference Proceeding
-------------------
Title: Concept-based cross language retrieval for Thai medicine recipes
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.1007/978-3-319-12823-8_33
Description: This work aims to present a new methodology to retrieve documents relating to traditional Thai medicine recipes translated from the ancient palm leaf manuscripts. This methodology is developed based on three main concepts: semantic data, latent semantic indexing (LSI), and cross language information retrieval (CLIR). Our methodology consists of four main processing steps: document indexing, document representation based on LSI, user’s query transformation, and document retrieval and ranking. After testing by the common performance measures for information retrieval systems, such as recall, precision, and F-measure, the results indicate that our methodology can achieve substantial improvements.
Citations: 1
Aggregation Type: Book Series
-------------------
Title: Ontology-based text classification for filtering cholangiocarcinoma documents from PubMed
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.1007/978-3-319-09891-3_25
Description: PubMed is a search engine used to access the MEDLINE database, which comprises massive amounts of biomedical literature. This volume can make it more difficult to find the relevant medical literature, and this work addresses that problem. We present a solution to retrieve the most relevant biomedical literature relating to Cholangiocarcinoma in clinical trials from PubMed. The proposed methodology is called ontology-based text classification (On-TC). We provide an ontology used as a semantic tool. It is called Cancer Technical Term Net (CCT-Net). This ontology is integrated into the methodology to support automatic semantic interpretation during text processing, especially in the case of synonyms or term variations. © 2014 Springer International Publishing.
Citations: 4
Aggregation Type: Book Series
-------------------
Title: Ontology-based knowledge discovery from unstructured text
Cover Date: 2013-08-02
Cover Display Date: 2013
DOI: 10.4156/ijipm.vol4.issue4.3
Description: Finding knowledge from unstructured textual data is a major unsolved problem in the area of knowledge discovery in databases (KDD). The problem becomes particularly acute due to ambiguity and lexical variations in natural language. This work seeks to address these problems by presenting a unified methodology, called the ontology-based knowledge discovery in unstructured text (ON-KDT) methodology, to discover knowledge from unstructured text. This approach leverages semantic information encoded in ontologies to improve the effectiveness of the knowledge extraction process.
Citations: 1
Aggregation Type: Journal
-------------------
Title: The cancerology ontology: Designed to support the search of evidence-based oncology from biomedical literatures
Cover Date: 2011-09-26
Cover Display Date: 2011
DOI: 10.1109/CBMS.2011.5999168
Description: This work proposes a new ontology, called the Cancerology. It addresses a problem of unclear analysis in biomedical text processing, which arises because existing ontologies such as the National Cancer Institute's Thesaurus and Ontology do not offer some information relating to domain-specific variations in terms that can be provided by the domain expert. The ontology is evaluated through a method of text classification that retrieves the relevant cervix cancer abstracts relating to clinical trials from PubMed. The experimental results show increased accuracy. This demonstrates that the Cancerology may also be effective for other areas of text processing and analysis, especially in the particular domain of oncology literature, such as intelligent search services, text mining, and knowledge extraction. © 2011 IEEE.
Citations: 3
Aggregation Type: Conference Proceeding
-------------------
Title: Thai E-san heritage images classification based on color features analysis: The normalized R/G ratio and edge histograms
Cover Date: 2010-12-01
Cover Display Date: 2010
DOI: 10.4156/ijipm.vol1.issue1.4
Description: In order to reduce the dilapidation of Thai E-san heritage caused by direct access and touching, a digital library can be a solution to represent, retrieve, and study Thai E-san cultural heritage. This digital library started with the organization of a heritage image collection through a classification technique. Therefore, this work is motivated by two drivers. First, it aims to apply an alternative dimension of Content-based Image Retrieval (CBIR) to classify a collection of Thai E-san heritage images into two classes: the class of heritage images with humans, and the class of heritage images without humans (e.g. images of ancient remains and antiques). Second, it also proposes a method of CBIR to automatically classify a heritage image collection based on color features. This approach is valuable for automatically classifying heritage images, which is a time-consuming and labour-intensive process if done manually. Two techniques are proposed to classify and organize the collection of Thai E-san heritage images: the Normalized R/G ratio and the Naïve Bayes Image classifier based on edge histogram. After testing, the experimental results show an average accuracy of 72.5% for the Normalized R/G ratio and an average accuracy of 86% for the Naïve Bayes Image classifier based on edge histogram. This would demonstrate that our approach can be sufficiently reliable for use.
Citations: 2
Aggregation Type: Journal
-------------------
Title: Business rules discovery from process design repositories
Cover Date: 2010-11-05
Cover Display Date: 2010
DOI: 10.1109/SERVICES.2010.73
Description: Traditional process mining approaches focus on extracting process constraints or business rules from repositories of process instances. In this context, process designs or process models tend to be overlooked although they contain information that is valuable for the process of discovering business rules. This paper proposes an alternative approach to process mining that uses process designs as the mining resources. We propose a number of techniques for extracting business rules from repositories of business process designs or models, leveraging the well-known Apriori algorithm. Such business rules are then used as prior knowledge for further analysing, verifying, and modifying process designs. © 2010 IEEE.
Citations: 10
Aggregation Type: Conference Proceeding
-------------------
Title: Thai heritage images classification by naïve Bayes image classifier
Cover Date: 2010-10-25
Cover Display Date: 2010
DOI: N/A
Description: A digital library is a way to represent, retrieve, and study Thai E-san cultural heritage without direct access and touching. It may help to reduce the dilapidation of Thai E-san cultural heritage. We commence our project by organizing a collection of images through a classification technique. Therefore, this work is motivated by two main drivers. Firstly, we aim to apply an alternative dimension of CBIR to classify a collection of Thai E-san heritage images into two classes: the class of heritage images which involve human activities, and the class of heritage images with non-human activities (e.g. images of ancient remains and antiques). Secondly, we also propose a new method of image classification, which applies Naive Bayes to produce an image classifier based on edge histogram features. This approach is valuable for automatically classifying heritage images, which is a time-consuming and labour-intensive process if done manually. After testing, the experimental results show an effective accuracy. This would demonstrate that our approach is sufficiently reliable for use.
Citations: 6
Aggregation Type: Conference Proceeding
-------------------
Title: An ontology-based text processing approach for simplifying ambiguity of requirement specifications
Cover Date: 2009-12-01
Cover Display Date: 2009
DOI: 10.1109/APSCC.2009.5394119
Description: In the last few years, several works in the software engineering literature have addressed the problem of requirement management. The majority of software errors are introduced during the requirements phase because much of the requirements specification is written in natural language format. As a result, it is hard to identify consistencies because this format is too ambiguous for specification purposes. Therefore, this paper aims to propose a method for simplifying the ambiguity of requirement specification documents through two concepts of ontology-based probabilistic text processing: text classification and text filtering. Text classification is used to analyze and classify requirement specifications having similar detail into the same class. This contributes to a better understanding of the impact of the requirements and helps to elaborate them. Meanwhile, text filters are used to leverage synopsis requirements in documents through a probabilistic text classification technique. After testing by F-measure, the experimental results return a satisfactory accuracy. These demonstrate that our method may be more effective for simplifying the ambiguity of requirement specifications. ©2009 IEEE.
Citations: 15
Aggregation Type: Conference Proceeding
-------------------
Title: An ontology-based sentiment classification methodology for online consumer reviews
Cover Date: 2008-12-01
Cover Display Date: 2008
DOI: 10.1109/WIIAT.2008.68
Description: This paper presents a method of ontology-based sentiment classification to classify and analyse online product reviews of consumers. We implement and experiment with a support vector machines text classification approach based on a lexical variable ontology. After testing, it could be demonstrated that the proposed method can provide more effectiveness for sentiment classification based on text content. © 2008 IEEE.
Citations: 58
Aggregation Type: Conference Proceeding
-------------------
Title: An automatic elaborate requirement specification by using hierarchical text classification
Cover Date: 2008-12-01
Cover Display Date: 2008
DOI: 10.1109/CSSE.2008.1393
Description: Ambiguity is a major cause of software errors because much of the requirements specification is written in a natural language format. Therefore, it is hard to identify consistencies because this format is too ambiguous for specification purposes. This paper aims to propose a method for handling requirement specification documents which have similar content to each other through hierarchical text classification. The method consists of two main processes of classification: heavy classification and light classification. Heavy classification groups together the requirement specification documents having similar content. Meanwhile, light classification elaborates requirement specification documents by using the Euclidean distance. Finally, slimming down the number of requirements specifications through hierarchical text classification may yield a specification which is easier to understand. That means the proposed method is more effective for reducing and handling requirements specifications. © 2008 IEEE.
Citations: 12
Aggregation Type: Conference Proceeding
-------------------
Title: A web pornography patrol system by content based analysis: In particular text and image
Cover Date: 2008-12-01
Cover Display Date: 2008
DOI: 10.1109/ICSMC.2008.4811326
Description: The exposure of children to pornographic web sites on the internet has raised safety concerns. To protect children from these inappropriate materials, an effective web filtering system is essential. Content-based web filtering is one of the important techniques to handle and filter inappropriate information on the web. In this paper, we examine a content-based analysis technique to filter pornographic web sites. Our system consists of two primary content-based filtering techniques: text and image. For text analysis, the Support Vector Machine (SVM) algorithm and an N-gram model based on Bayes' theorem are applied and tested to filter pornographic text for both Thai-language and English-language web sites. Meanwhile, we build and examine an image filtering system with a hierarchical image filtering method. It consists of two main processes: the normalized R/G ratio, which uses the pixel ratios (red and green color channels), and the human composition matrix (HCM) based on skin detection. The empirical results show that our analysis methods of text and image are more effective for pornographic web filtering. Finally, we have incorporated a pornographic web filter using content-based analysis into our Anti X system. © 2008 IEEE.
Citations: 25
Aggregation Type: Conference Proceeding
-------------------
Title: Synopsis information extraction in documents through probabilistic text classifiers
Cover Date: 2007-01-01
Cover Display Date: 2007
DOI: 10.1007/978-3-540-77094-7_70
Description: N/A
Citations: 0
Aggregation Type: Book Series
-------------------
Title: A web pornography patrol system based on hierarchical image filtering techniques
Cover Date: 2006-12-01
Cover Display Date: 2006
DOI: 10.2991/jcis.2006.268
Description: Due to the flood of pornographic web sites on the internet, content-based web filtering has become an important technique to detect and filter inappropriate information on the web. This is because pornographic web sites contain many sexually oriented texts, images, and other information that can be helpful to filter them. In this paper, we build and examine a system to filter web pornography based on image content. Our system consists of three main processes: (i) normalized R/G ratio, (ii) histogram, and (iii) human composition matrix (HCM) based on skin detection. The first process uses the pixel ratios (red and green color channels) for image filtering. The second process, histogram analysis, estimates the frequency intensities of an image. If an image falls within the range of training set results, it is likely to be a pornographic image. The last process is HCM based on human skin detection. The experimental results show an effective accuracy after testing. This would demonstrate that our hierarchical image filtering techniques can achieve substantial improvements.
Citations: 2
Aggregation Type: Conference Proceeding
-------------------
Title: Content-based text classifiers for pornographic web filtering
Cover Date: 2006-01-01
Cover Display Date: 2006
DOI: 10.1109/ICSMC.2006.384926
Description: Due to the flood of pornographic web sites on the internet, effective web filtering systems are essential. Web filtering based on content has become one of the important techniques to handle and filter inappropriate information on the web. We examine two machine learning algorithms (Support Vector Machines and Naïve Bayes) for pornographic web filtering based on text content. We focus initially on Thai-language and English-language web sites. In this paper, we aim to investigate whether machine learning algorithms are suitable for web site classification. The empirical results show that the classifier based on Support Vector Machines is more effective for pornographic web filtering than the Naïve Bayes classifier, especially with respect to the over-blocking problem. ©2006 IEEE.
Citations: 17
Aggregation Type: Conference Proceeding
-------------------
Title: An effective pornographic WEB filtering system using a probabilistic classifier
Cover Date: 2005-01-01
Cover Display Date: 2005
DOI: 10.1142/9789812701534_0136
Description: Due to the flood of pornographic web sites on the internet, an effective web filtering system is essential. Web filtering has become one of the important techniques to handle and filter inappropriate information on the web. In this paper, we introduce a web filtering system based on content. The system uses a probabilistic text classifier to filter pornographic information on the WWW. We focus initially only on Thai-language and English-language web sites. The first process is to parse the web site collection to extract unique words and to remove stop-words. Afterwards, these features are transformed into a structured "bag of words". The next process is calculating the probabilities of each category in the naïve Bayes classifier (as a pornographic web filter). Finally, we have implemented and tested our techniques. After testing by the F-measure, the experimental results of our system show high accuracy. This demonstrates that naïve Bayes can be effective for web filtering based on text content. © 2005 World Scientific Publishing Co. Pte. Ltd.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: Automotive market segmentation by machine learning
Cover Date: 2004-12-01
Cover Display Date: 2004
DOI: N/A
Description: Due to the competitive market, many firm operators are constantly searching for alternative methods to supplement their income. One such method is market segmentation analysis. The classical approaches to segmentation, such as demographic and behavioral segmentation schemes, are well known. Entrepreneurs may segment their market on the basis of demographic characteristics such as age, income, or gender and geographic concentrations of consumers with the desired attributes. Demographic variables are a critical component of market segmentation because demand for most products is related to factors such as age, income, and race. This research analyzes automobile data using backpropagation artificial neural networks, a machine learning technique, for determining marketing plans. The technique is used for classifying groups of geographic concentrations of consumers. Once the segmentation model is built, it provides a classification model for analyzing concentrations of consumers. The third automobile segment is favorable; most of group C consists of Japanese cars. The result from this analysis can be used to help determine marketing plans, particularly promotion, and to help companies retain and regain market share.
Citations: 0
Aggregation Type: Conference Proceeding
-------------------
Title: A Bayesian Classification Approach to Personal Automotive Marketing Analysis
Cover Date: 2002-01-01
Cover Display Date: 2002
DOI: N/A
Description: Due to the large number of automotive companies in Thailand, the automotive business is characterized by drastic competition. Therefore, information is a key to business function. Information provides competitive advantage. Generally, multiple regression is used for analyzing and predicting business data. In this paper we explore an alternative technique from the area of data mining to analyze business data for the purpose of prediction. We explore the use of Bayesian network learning techniques for determining marketing plans. Bayesian networks represent relationships among variables with conditional probabilities, so they are well suited to dealing with noisy and incomplete data.
Citations: 1
Aggregation Type: Conference Proceeding
-------------------