Title: A Comparative Analysis of Machine Learning Models for Domain Adaptation in Multiclass Sentiment Classification
Cover Date: 2025-04-01
Cover Display Date: April 2025
DOI: 10.37936/ecti-cit.2025192.258824
Description: This study presents a comparative evaluation of machine learning models for domain adaptation in multiclass sentiment classification. While sentiment analysis aims to categorize opinions as positive, neutral, or negative, adapting models across domains remains a significant challenge because of differences in vocabulary, writing style, and sentiment expression. Models trained on a specific domain often fail to generalize effectively to others. To address this problem, we evaluate six models on sentiment data from the books, beauty & personal care, and automotive categories: logistic regression, a support vector machine (SVM) with a linear kernel, random forest, a convolutional neural network (CNN), long short-term memory (LSTM), and BERT. The evaluation uses Amazon review data and measures performance via accuracy, F1 score, and Area Under the ROC Curve (AUC). Results indicate that BERT consistently outperforms all other models, which we attribute to its attention-based transformer architecture capturing nuanced contextual information across diverse domains. CNN and LSTM models also perform well, particularly in domain-specific settings, with CNN excelling at extracting local features and LSTM at modeling sequential relationships. Traditional models such as logistic regression and SVM show limited generalizability, while random forest demonstrates stable but moderate performance. These findings highlight the strengths and trade-offs of each approach for effective cross-domain sentiment classification.
Citations: 0
Aggregation Type: Journal
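The abstract above reports accuracy and F1 for a three-class (positive, neutral, negative) task but does not say how F1 is averaged across classes. The sketch below assumes macro averaging, the unweighted mean of per-class F1; the function names and toy labels are hypothetical, and in practice a library routine such as scikit-learn's `f1_score(..., average="macro")` would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels for a model trained on one category (e.g. books)
# and evaluated on another (e.g. automotive).
y_true = ["positive", "neutral", "negative", "positive", "neutral", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "neutral", "neutral"]
print(f"accuracy={accuracy(y_true, y_pred):.4f}")  # accuracy=0.6667
print(f"macro-F1={macro_f1(y_true, y_pred):.4f}")  # macro-F1=0.6556
```

Macro averaging weights a rarer class such as neutral as heavily as the others, which is one reason it can diverge from accuracy.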
-------------------
Title: Spatial Predictive Modeling of Liver Fluke Opisthorchis viverrine (OV) Infection under the Mathematical Models in Hexagonal Symmetrical Shapes Using Machine Learning-Based Forest Classification Regression
Cover Date: 2024-08-01
Cover Display Date: August 2024
DOI: 10.3390/sym16081067
Description: Infection with liver flukes (Opisthorchis viverrini) is partly due to the parasites' ability to thrive in habitats in sub-basin areas, which keeps the intermediate host in the watershed system throughout the year. Spatial modeling is used to predict water source infections, which involves designing appropriate area units as hexagonal grids. This allows a set of independent variables to be constructed, which are then modeled using machine learning techniques such as forest-based classification and regression. The independent variables were obtained from the local public health agency and used to establish a relationship with a mathematical model. The ordinary least squares (OLS) approach was used to screen the variables, and the most consistent set was used to create a new set of variables with principal component analysis (PCA). The results showed that the forest-based classification and regression (FCR) model predicted infection rates accurately, with the PCA factor yielding a reliability value of 0.915, followed by values of 0.794, 0.741, and 0.632. This article provides detailed information on the factors related to water body infection, including the length and density of water flow lines in hexagonal units, and traces each step of the process in depth.
Citations: 4
Aggregation Type: Journal
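The abstract above describes screening candidate variables with ordinary least squares (OLS) before building PCA factors, but it does not give the screening rule. The sketch below assumes a simple one: fit a single-variable OLS line of infection rate on each candidate and keep the variables whose R² reaches a threshold. The variable names, the 0.5 threshold, and the per-hexagon values are all hypothetical.

```python
def ols_r2(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def screen_variables(candidates, y, threshold=0.5):
    """Return (kept names sorted by descending R^2, all R^2 scores).

    The threshold rule is a hypothetical stand-in for the paper's screening criterion.
    """
    scores = {name: ols_r2(values, y) for name, values in candidates.items()}
    kept = sorted((name for name, r2 in scores.items() if r2 >= threshold),
                  key=lambda name: -scores[name])
    return kept, scores


# Hypothetical values for five hexagonal cells.
infection_rate = [1, 2, 3, 4, 5]
candidates = {
    "stream_length": [2, 4, 6, 8, 10],
    "stream_density": [1, 3, 2, 5, 4],
    "unrelated_noise": [3, 1, 4, 1, 5],
}
kept, scores = screen_variables(candidates, infection_rate)
print(kept)  # ['stream_length', 'stream_density']
```

The surviving variables would then be the input to the PCA step; a multivariate OLS fit would be needed to account for correlation between candidates, which this single-variable screen ignores.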
-------------------
Title: Estimates of PM2.5 Concentration Based on Aerosol Optical Thickness Data Using Ensemble Learning with Support Vector Machine and Decision Tree
Cover Date: 2023-12-22
Cover Display Date: 22 December 2023
DOI: 10.5755/j01.erem.79.4.33913
Description: Air pollution, particularly fine particulate matter with a diameter of 2.5 micrometers or less (PM2.5), is a significant public health concern in many regions worldwide, including northeastern Thailand. This study investigates the correlation between PM2.5 concentrations and meteorological spatial datasets in the region, such as surface relative humidity (SRH), surface wind speed (SPD), visibility (Vis), surface temperature (ST), and aerosol optical thickness (AOT). GIS techniques, including inverse distance weighting (IDW) interpolation, were used to create spatial maps of the meteorological datasets and the ground-station PM2.5 measurements. Pearson correlation analysis was performed to examine the relationship between PM2.5 and the meteorological datasets. Decision tree and support vector machine (SVM) algorithms were employed to estimate PM2.5 concentrations from the spatial datasets. The results showed that Vis and ST have a moderate positive linear relationship with PM2.5, while AOT has a moderate negative linear relationship; SRH and SPD have weak relationships with PM2.5. PM2.5 concentrations estimated by the decision tree and SVM algorithms showed a strong positive correlation with the measured concentrations. The study shows that machine learning algorithms can be effective tools for estimating PM2.5 concentration from AOT data and that feature selection can improve model performance. Ensemble learning could be employed to improve performance further, particularly in regions with high spatial variability. Overall, the study provides a promising approach for estimating PM2.5 concentration using machine learning algorithms and AOT data.
Citations: 3
Aggregation Type: Journal
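The abstract above uses inverse distance weighting (IDW) to turn ground-station PM2.5 readings into a continuous surface and Pearson correlation to relate PM2.5 to the other layers. A minimal standard-library sketch of both steps follows; the power parameter of 2 is a common default rather than a value reported in the abstract, and the station coordinates and readings are hypothetical.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, value) stations."""
    weighted_sum = weight_total = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value                 # target sits exactly on a station
        w = 1.0 / d ** power            # nearer stations get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


# Hypothetical PM2.5 readings (micrograms per cubic meter) at two stations on a line.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
print(idw(stations, (1.0, 0.0)))             # 20.0 (midway, equal weights)
print(round(idw(stations, (0.5, 0.0)), 6))   # 12.0 (pulled toward the nearer station)
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

In a gridded analysis, `pearson` would be applied across grid cells, pairing the interpolated PM2.5 value in each cell with the AOT, Vis, ST, SRH, or SPD value in the same cell. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.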
-------------------
Title: Machine Learning-based Multiclass Classification Methods for Sentiment Analysis
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/InCIT60207.2023.10413035
Description: Sentiment analysis, also known as opinion mining, is the process of identifying the sentiment or emotion conveyed in a textual review. This requires categorizing the expressed opinions into several sentiment classes, namely positive, negative, and neutral. Typically, machine learning algorithms are employed to construct a sentiment classifier, which is then used to automatically assign the appropriate sentiment to individual textual reviews. Numerous machine learning methods have been used for this purpose, and determining the most suitable algorithm for sentiment analysis is a challenge. One approach is to compare the performance of candidate algorithms on the dataset under consideration and then adopt the best-performing sentiment classifier. In this study, we conducted a comparative analysis of several machine learning algorithms and a lexicon-based approach, including multinomial naïve Bayes, support vector machine, k-nearest neighbors, random forest, and an ensemble approach combining these algorithms, with the purpose of developing a sentiment classifier model for the TripAdvisor dataset. The objective was to classify hotel customer reviews into three categories: positive, neutral, and negative. An evaluation of recall, precision, F1, and accuracy shows that the ensemble approach outperforms the other approaches.
Citations: 0
Aggregation Type: Conference Proceeding
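The abstract above reports that an ensemble combining the individual classifiers performed best, but it does not state how their outputs are combined. One common choice is hard majority voting, sketched below under that assumption; the tie-breaking order, function names, and per-classifier outputs are hypothetical.

```python
from collections import Counter

LABELS = ("neutral", "negative", "positive")  # hypothetical tie-break priority


def majority_vote(votes, tie_break=LABELS):
    """Return the label most base classifiers chose for one review.

    Ties are resolved by the position of the label in `tie_break` (an assumption).
    """
    counts = Counter(votes)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return min(tied, key=tie_break.index)


def ensemble_predict(model_outputs):
    """model_outputs: one list of labels per base classifier, aligned by review."""
    return [majority_vote(votes) for votes in zip(*model_outputs)]


# Hypothetical outputs of four base classifiers on three hotel reviews.
naive_bayes = ["positive", "negative", "neutral"]
svm = ["positive", "neutral", "neutral"]
knn = ["neutral", "negative", "positive"]
random_forest = ["positive", "neutral", "negative"]
print(ensemble_predict([naive_bayes, svm, knn, random_forest]))
# ['positive', 'neutral', 'neutral']
```

Soft voting, which averages predicted class probabilities instead of counting labels, is a frequent alternative when the base classifiers expose probabilities.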
-------------------
Title: Spatial Predictive Modeling of the Burning of Sugarcane Plots in Northeast Thailand with Selection of Factor Sets Using a GWR Model and Machine Learning Based on an ANN-CA
Cover Date: 2022-10-01
Cover Display Date: October 2022
DOI: 10.3390/sym14101989
Description: The main purpose of this study is to apply symmetry principles to general mathematical modelling based on a multi-criteria decision-making (MCDM) approach, used in conjunction with a geographically weighted regression (GWR) model, and to optimize an artificial neural network-cellular automaton (ANN-CA) model for forecasting the burned sugarcane plot area in Northeast Thailand. First, the service area boundaries for sugarcane transport associated with sugarcane burning were calculated from fire radiative power (FRP) values using spatial correlation analysis. Second, the spatial factors influencing sugarcane burning were analyzed. The study applies symmetry in designing an algorithm that finds the optimal service boundary distance (the cut-off) for hot-spot cluster analysis, using geographic information system (GIS) calculations; in the final stage, the screened independent variables are used to predict burned sugarcane plots in 2031. Of the seven spatial variables analyzed, the factors positively related to the percentage of burned cane plots in the sub-area units of each sugar plant's service area were the index of transport distance to sugarcane plots and the percentage of sugarcane plantation in the service area, while FRP differences and sugarcane yield density had negative coefficients. The best GWR models show local R² values of 0.902 to 0.961 in the Khonburi and Saikaw service zones.
An influential set of independent variables can increase the accuracy of the ANN-CA model's forecasts, with kappa statistics in the range of 0.81 to 0.85. The results of the study can be applied to other regions of Thailand, and to countries with similar sugarcane harvesting industries, to formulate policies that reduce the amount of sugarcane harvested by burning and that support transporting sugarcane within an appropriate service area, so that particulate matter smaller than 2.5 microns (PM2.5) can be reduced.
Citations: 11
Aggregation Type: Journal
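The abstract above reports kappa values of 0.81 to 0.85 for the ANN-CA forecasts. Assuming this is Cohen's kappa computed between observed and predicted burned/unburned cells (the abstract does not specify the variant), the statistic can be sketched as follows; the cell labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences corrected for chance.

    Undefined (division by zero) when the chance agreement p_e equals 1.
    """
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n   # observed agreement
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # Agreement expected if both labelings were independent with the same marginals.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grid cells: 1 = burned sugarcane plot, 0 = not burned.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(observed, predicted), 4))  # 0.5833
```

Here 80% of cells agree, but 52% agreement would be expected by chance from the class proportions alone, so kappa is (0.80 - 0.52) / (1 - 0.52), about 0.58, well below the raw agreement.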
-------------------