Title: ENHANCING PERSONALITY CHARACTERISTICS ANALYSIS WITH SMOTE AND ASSOCIATION RULE MINING: A CASE STUDY ON INTROVERTS AND EXTROVERTS
Cover Date: 2025-06-01
Cover Display Date: June 2025
DOI: 10.24507/icicel.19.06.597
Description: The classification of personality characteristics, typically divided into introverts and extroverts, differs from general public characteristics. Personality variation within teams significantly impacts team development and presents challenges for leaders in effective team management. Understanding how personality characteristics align with different types of work can enhance team potential. This research identifies variables relevant to analyzing co-worker personalities within organizations. An association rules model was constructed using questionnaire data to analyze introverted and extroverted characteristics. Imbalances in the data distribution were addressed using the synthetic minority oversampling technique, resulting in a balanced dataset with 3,198 extroverts and 3,512 introverts. The Apriori algorithm then generated association rules from this dataset, focusing on single-dimensional rules with high accuracy for each class. For the introvert class, the highest accuracy (96.52%) was associated with “Q81A: I am quiet around strangers (Agree)”, while the extrovert class achieved 68.81% accuracy with “Q82A: I do not talk a lot (Disagree)”. Optimal accuracy with two-rule associations reached 98.49% for introverts and 80.48% for extroverts.
Citations: 0
Aggregation Type: Journal
-------------------
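The rule accuracies reported above are confidence values: the fraction of respondents matching a rule's antecedent who belong to the predicted class. A minimal sketch of that computation, with a hypothetical `rule_confidence` helper and toy questionnaire records standing in for the SMOTE-balanced data:

```python
def rule_confidence(records, antecedent, target_class):
    """Confidence of the rule (antecedent -> class), i.e. the
    share of records matching the antecedent that carry the class."""
    matches = [r for r in records if antecedent(r)]
    if not matches:
        return 0.0
    hits = sum(1 for r in matches if r["class"] == target_class)
    return hits / len(matches)

# Toy responses; the study mined the full balanced questionnaire set.
records = [
    {"Q81A": "Agree", "class": "introvert"},
    {"Q81A": "Agree", "class": "introvert"},
    {"Q81A": "Agree", "class": "extrovert"},
    {"Q81A": "Disagree", "class": "extrovert"},
]

# Confidence of "Q81A = Agree -> introvert": 2 of 3 matches.
conf = rule_confidence(records, lambda r: r["Q81A"] == "Agree", "introvert")
```

Apriori additionally prunes candidate rules by minimum support before confidence is checked; only the confidence step is sketched here.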


Title: ENHANCING SENTIMENT CLASSIFICATION: A COMPARATIVE ANALYSIS OF SUPERVISED AND UNSUPERVISED METHODS FOR IMPROVING TRAINING DATA QUALITY
Cover Date: 2025-05-01
Cover Display Date: May 2025
DOI: 10.24507/icicelb.16.05.471
Description: This study evaluates the effectiveness of supervised and unsupervised methods in enhancing data quality for binary sentiment classification. Two datasets of hotel reviews from TripAdvisor were utilized: one for training polarity correction models and the other containing noisy labels for experimental evaluation. Supervised methods, including SVM with a linear kernel, Random Forest (RF), and Convolutional Neural Network (CNN), consistently outperformed unsupervised methods such as Standard K-means, K-means++, and Spherical K-means. Following the development of sentiment classifier models using the improved training set, SVM demonstrated the highest performance, achieving an accuracy and F1 score of 0.85, followed by RF and CNN. Among the unsupervised approaches, K-means++ yielded the best results, with an accuracy of 0.75 and an F1 score of 0.74. These findings highlight the superiority of supervised learning in sentiment classification tasks and underscore the critical importance of training set quality in enhancing model performance.
Citations: 0
Aggregation Type: Journal
-------------------
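The polarity-correction idea — using models trained on clean reviews to relabel a noisy set — can be illustrated with a nearest-centroid stand-in for the paper's classifiers. The feature vectors and the `correct_labels` helper below are hypothetical; the study used full SVM/RF/CNN and K-means variants on TripAdvisor text features:

```python
def centroid(vecs):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def correct_labels(clean, noisy):
    """Relabel noisy examples by the nearest class centroid
    learned from a small clean training set."""
    cents = {lab: centroid([v for v, l in clean if l == lab])
             for lab in {l for _, l in clean}}
    return [min(cents, key=lambda lab: sq_dist(v, cents[lab]))
            for v, _ in noisy]

clean = [([0.9, 0.1], "pos"), ([0.8, 0.2], "pos"),
         ([0.1, 0.9], "neg"), ([0.2, 0.8], "neg")]
noisy = [([0.85, 0.15], "neg"),   # mislabeled positive review
         ([0.15, 0.85], "pos")]   # mislabeled negative review
fixed = correct_labels(clean, noisy)
```

A cleaned training set produced this way is what the final sentiment classifiers were then fitted on.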


Title: MULTICLASS CLASSIFICATION APPROACH FOR DETECTING SOFTWARE BUG SEVERITY LEVEL FROM BUG REPORTS
Cover Date: 2025-05-01
Cover Display Date: May 2025
DOI: 10.24507/icicelb.16.05.567
Description: This study focuses on developing multiclass classifiers to predict the severity levels of bug reports using three machine learning algorithms: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) with an RBF kernel. The research utilizes three datasets from the Mozilla bug tracking system – Core, Firefox, and Thunderbird – categorizing bug severity into five levels: blocker, critical, major, minor, and low. To address class imbalance and enhance model performance, a domain expert-based data augmentation method was applied, generating synthetic summaries from bug descriptions using cosine similarity. The augmented datasets, combined with undersampling techniques, ensured balanced class distributions, improving classifier robustness. The study leverages unigram and CamelCase features to build and evaluate the classifiers. Performance metrics, including accuracy, F1 score, and Matthews Correlation Coefficient (MCC), were used to assess model efficacy. The results demonstrate that LR outperforms RF and SVM, offering superior accuracy and interpretability, particularly for high-dimensional text data. LR’s efficiency, reduced overfitting risk, and effective handling of linear relationships make it well-suited for bug severity classification. This research provides a robust framework for improving bug triage processes, enhancing the prioritization and resolution of critical software issues.
Citations: 0
Aggregation Type: Journal
-------------------
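The augmentation step scores candidate sentences from a bug description by cosine similarity to the report's summary. A minimal bag-of-words version of that scoring (the tokenization and example strings are illustrative, not the paper's exact pipeline):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

summary = Counter("crash on startup".split())
sentence = Counter("the app will crash on every startup".split())
sim = cosine(summary, sentence)   # shared terms: crash, on, startup
```

Description sentences scoring above a similarity threshold can then serve as synthetic summaries for the minority severity classes.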


Title: Cost-sensitive probability for weighted voting in an ensemble model for multi-class classification problems
Cover Date: 2021-07-01
Cover Display Date: July 2021
DOI: 10.1007/s10489-020-02106-3
Description: Ensemble learning is an algorithm that utilizes various types of classification models. This algorithm can enhance the prediction efficiency of the component models. However, the efficiency of combining models typically depends on the diversity and accuracy of the component models' predictions, and multi-class data remain a particular challenge. In the proposed approach, cost-sensitive learning was implemented to evaluate the prediction accuracy for each class, which was used to construct a cost-sensitivity matrix of the true positive (TP) rate. This TP rate can be used as a weight value and combined with a probability value to drive ensemble learning for a specified class. We proposed an ensemble model of the heterogeneous type, namely, a combination of various individual classification models (support vector machine, Bayes, K-nearest neighbour, naïve Bayes, decision tree, and multi-layer perceptron), in experiments on 3-, 4-, 5- and 6-classifier models. The efficiencies of the proposed models were compared to those of the individual classifier models and homogeneous models (Adaboost, bagging, stacking, voting, random forest, and random subspaces) on various multi-class data sets. The experimental results demonstrate that the cost-sensitive probability weighted voting ensemble model derived from 3 models provided the most accurate results for multi-class prediction. The objective of this study was to increase the efficiency and quality of classification results in multi-class classification tasks.
Citations: 21
Aggregation Type: Journal
-------------------
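The core combination rule above — per-class TP rates used as weights on each component model's class probabilities — can be sketched as follows. The three models and all numbers are hypothetical; the paper's ensembles drew on SVM, Bayes, KNN, naïve Bayes, decision tree, and MLP components:

```python
def cost_sensitive_vote(prob_rows, tp_rates):
    """Weight each model's class probabilities by that model's
    per-class true-positive rate; the class with the largest
    summed (weight * probability) score wins."""
    classes = prob_rows[0].keys()
    score = {c: sum(tp_rates[m][c] * probs[c]
                    for m, probs in enumerate(prob_rows))
             for c in classes}
    return max(score, key=score.get)

# Hypothetical 3-model ensemble on a 3-class problem.
probs = [{"a": 0.5, "b": 0.3, "c": 0.2},
         {"a": 0.2, "b": 0.6, "c": 0.2},
         {"a": 0.3, "b": 0.4, "c": 0.3}]
tp = [{"a": 0.9, "b": 0.4, "c": 0.5},     # model 0 is strong on "a"
      {"a": 0.5, "b": 0.9, "c": 0.5},     # model 1 is strong on "b"
      {"a": 0.6, "b": 0.6, "c": 0.6}]
winner = cost_sensitive_vote(probs, tp)
```

Because the weights are per class, a model that is reliable for only one class still contributes strongly exactly where it is trustworthy.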


Title: Probability-weighted voting ensemble learning for classification model
Cover Date: 2020-11-01
Cover Display Date: November 2020
DOI: 10.12720/jait.11.4.217-227
Description: Many research studies have investigated ensemble learning, but relatively few have proposed approaches for improving the ensemble combination itself. We propose an efficient method that uses probability weights to support the classifier models, called probability-weighted voting ensemble learning, which computes its own probability weight for each model from the training data. This research tested the proposed model on 5 UCI data sets of various dimensions and generated four models: the 3PW-Ensemble, 4PW-Ensemble, 5PW-Ensemble, and 6PW-Ensemble models. In the experiments, the proposed model yielded the highest accuracy; in the efficiency comparison, its accuracy exceeded those of the base classification models and the other ensemble models.
Citations: 12
Aggregation Type: Journal
-------------------
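Probability-weighted voting can be sketched in a few lines: each model votes with its predicted class and its confidence, and votes are summed per class. The predictions below are illustrative:

```python
def prob_weighted_vote(predictions):
    """Sum each model's confidence onto its predicted class;
    the class with the heaviest total wins."""
    score = {}
    for label, prob in predictions:
        score[label] = score.get(label, 0.0) + prob
    return max(score, key=score.get)

# One confident model can outweigh two uncertain ones,
# unlike plain majority voting.
preds = [("pos", 0.95), ("neg", 0.40), ("neg", 0.45)]
winner = prob_weighted_vote(preds)
```

This is what distinguishes the PW-Ensemble family from unweighted majority voting: the margin of each model's confidence, not just its vote, enters the tally.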


Title: Improved ensemble learning for classification techniques based on majority voting
Cover Date: 2016-07-02
Cover Display Date: 2 July 2016
DOI: 10.1109/ICSESS.2016.7883026
Description: This paper proposes a methodology for improving the performance of classification models over several base methods. The accuracy values obtained through experiments permit the evaluation of each method's performance. We propose a concept that applies ensemble learning to model classification in order to improve performance through majority voting, called M-Ensemble learning. The improved ensemble learning approach is divided into two main formats of combined methods: the 3-Ensemble model (combining an odd number of methods, such as Naïve Bayes, Decision Tree, and Multilayer Perceptron) and the 4-Ensemble model (combining an even number of methods, such as Naïve Bayes, Decision Tree, Multilayer Perceptron, and K-Nearest Neighbor). The most improved classification model resulted from the 3-Ensemble method, with an accuracy value of 83.13%, compared with the Multilayer Perceptron based classification model and the 4-Ensemble model, which yielded accuracy values of 80.67% and 81.86%, respectively.
Citations: 41
Aggregation Type: Conference Proceeding
-------------------
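The majority-voting combination in the M-Ensemble models is the simplest of the three voting schemes in this list. A minimal sketch, with illustrative labels standing in for the outputs of the three base classifiers (e.g. naïve Bayes, decision tree, MLP):

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent prediction among the base
    classifiers; an odd count (as in the 3-Ensemble) avoids ties."""
    return Counter(labels).most_common(1)[0][0]

pred = majority_vote(["spam", "ham", "spam"])
```

An odd number of components is why the 3-Ensemble edges out the 4-Ensemble here: with an even count, ties must be broken arbitrarily.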