Title: ENSEMBLE CLUSTERING METHOD FOR ASSEMBLING OF THAI DECIDED CIVIL CASES INTO SPECIFIC CLUSTERS
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicel.19.03.271
Description: Civil cases often pertain to legal disputes between individuals or organizations. Following a judgment, civil cases are referred to as “decided cases” and the associated documents can be utilized for future legal determinations. One alternative method for managing these decided cases and making it easier to identify relevant decided cases that meet the user’s needs is to group relevant decided cases together. As a result, the purpose of this study was to offer an ensemble clustering method for finding and identifying the most relevant legal cases from a given collection that satisfy the needs of users. In our ensemble clustering, we employ well-known clustering methods such as k-means++, spherical k-means, and DBSCAN. Upon assessing the clustering quality measure (purity score), accuracy, and F1 score, the proposed method yielded good results. Furthermore, when comparing it to the baseline, the proposed method exhibits enhancements in the purity score, accuracy, and F1 score by 6.95%, 6.67%, and 6.95%, respectively.
Citations: 0
Aggregation Type: Journal
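
The purity score named in the description above is not defined in this record. A minimal sketch, assuming the common definition (the share of items that fall in the majority class of their cluster), could look like the following. The function name `purity_score` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def purity_score(cluster_labels, true_labels):
    """Share of items that belong to the majority class of their cluster.

    Assumed definition: purity = (1 / N) * sum over clusters of the size
    of the largest true class inside that cluster.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one item is required")

    # Count how often each true class occurs inside each cluster.
    members = defaultdict(Counter)
    for cluster, truth in zip(cluster_labels, true_labels):
        members[cluster][truth] += 1

    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six cases, two clusters, two case types.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["contract", "contract", "tort", "tort", "tort", "contract"]
score = purity_score(clusters, classes)
```

Each cluster here has two items of its majority class out of three, so four of the six items count toward purity.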
-------------------
Title: A HYBRID METHOD OF ASPECT-BASED SENTIMENT ANALYSIS FOR HOTEL REVIEWS
Cover Date: 2024-01-01
Cover Display Date: January 2024
DOI: 10.24507/icicel.18.01.59
Description: The purpose of this study was to introduce a hybrid method of aspect-based sentiment analysis for hotel reviews. Hotel staff attentiveness, hotel cleanliness, value for money, and hotel location are all highly regarded hotel aspects. The proposed method is made up of two major components. BM25 is used in the first component to group the review sentences into the most relevant hotel aspect cluster. Word2Vec's skip-gram was utilized to generate the keywords relevant to each hotel aspect, which were then used as queries to organize review sentences into the suitable hotel aspect cluster. Finally, hotel review sentences in each cluster are assigned a positive or negative sentiment polarity by the sentiment polarity analyzer, an ensemble model comprising five predictive models built with the C4.5 decision tree, Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) with linear kernel, SVM with RBF kernel, and Logistic Regression (LR). Evaluated via recall, precision, F1, and accuracy, the proposed hybrid method yielded satisfactory outcomes of 0.820, 0.805, 0.810, and 0.815, respectively. Furthermore, we also compared our hybrid method to a baseline utilizing the same training and test sets. The recall and precision scores of our proposed method were marginally higher than the baseline, improving by 4.76% and 4.88%, respectively.
Citations: 1
Aggregation Type: Journal
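
The first component described above, where aspect keywords act as queries and review sentences are ranked with BM25, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the keyword lists, and the sample sentences are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for one query.

    k1 and b are conventional defaults (an assumption); the IDF uses the
    smoothed form log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical tokenized review sentences and aspect keyword queries.
sentences = [
    ["staff", "were", "friendly", "and", "helpful"],
    ["room", "was", "clean", "and", "tidy"],
    ["great", "location", "near", "the", "station"],
]
aspect_keywords = {
    "staff": ["staff", "friendly", "helpful"],
    "cleanliness": ["clean", "tidy"],
    "location": ["location", "near", "station"],
}

per_aspect = {a: bm25_scores(kw, sentences) for a, kw in aspect_keywords.items()}
# Assign each sentence to the aspect whose query scores it highest.
assigned = [
    max(per_aspect, key=lambda a: per_aspect[a][i]) for i in range(len(sentences))
]
```

In this sketch a sentence that matches no keyword would default to the first aspect in the dictionary; a real pipeline would presumably need an explicit rule for such sentences.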
-------------------
Title: ASPECT-BASED SENTIMENT CLASSIFICATION FOR CUSTOMER HOTEL REVIEWS
Cover Date: 2022-12-01
Cover Display Date: December 2022
DOI: 10.24507/icicelb.13.12.1291
Description: Using only ratings to gauge public opinion about products and services is insufficient to improve product quality or understand the reasons for consumer preferences. This problem was addressed by employing feature/aspect-based sentiment analysis to examine the polarity of customer evaluations. An aspect-based sentiment analysis method was designed for hotel evaluations, taking account of staff attentiveness, room cleanliness, hotel facilities, value for money and location convenience. A collection of keywords for each hotel aspect was learned using Word2Vec as one of the three fundamental solution mechanics. This corpus was then utilized to select hotel features while developing an aspect-based multiclassification model to categorize sentences containing customer evaluations into their specific aspect classes. A binary-based sentiment classifier was also developed to assign the sentiment polarity of each sentence in each aspect class. Term frequency-inverse gravity moment (tf-igm) was employed as a term weighting scheme, while the SVM algorithm was used to construct text classification models. Our proposed method gave superior results to the baseline, with improved average recall, precision, F1 and accuracy scores of 3.45%, 2.38%, 2.35% and 2.35%, respectively.
Citations: 1
Aggregation Type: Journal
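
The tf-igm weighting mentioned above is not spelled out in this record. A minimal sketch, assuming the commonly cited form (igm is the largest class frequency of a term divided by the rank-weighted sum of its class frequencies, and the weight is tf × (1 + λ × igm)), is shown below. The coefficient `lam=7.0` and the sample frequencies are assumptions chosen for illustration.

```python
def igm(class_freqs):
    """Inverse gravity moment of a term from its per-class frequencies.

    Frequencies are sorted in descending order and weighted by rank
    (1, 2, 3, ...); a term concentrated in one class scores close to 1.
    """
    ranked = sorted(class_freqs, reverse=True)
    gravity = sum(freq * rank for rank, freq in enumerate(ranked, start=1))
    return ranked[0] / gravity if gravity else 0.0


def tf_igm(tf, class_freqs, lam=7.0):
    """tf-igm weight: tf * (1 + lam * igm). lam=7.0 is an assumed default."""
    return tf * (1.0 + lam * igm(class_freqs))


# Hypothetical per-class frequencies of two terms over three aspect classes.
concentrated = igm([10, 0, 0])  # term seen in one class only
spread = igm([5, 5, 5])         # term spread evenly over all classes
weight = tf_igm(2, [10, 0, 0])
```

The concentrated term receives a larger weight than the evenly spread one, which is the class-discriminating behavior the scheme is intended to reward.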
-------------------
Title: AUTOMATICALLY IDENTIFYING OF PLAGIARIZED SUBJECTIVE ANSWERS FOR THAI USING TEXT-BASED SIMILARITY ANALYSIS METHOD
Cover Date: 2022-06-01
Cover Display Date: June 2022
DOI: 10.24507/icicel.16.06.639
Description: In the context of education, many researchers design and develop methods or tools to identify plagiarism and maintain study quality. Text-based plagiarism often occurs in the academic domain, including online subjective examinations. Each of the numerous proposed techniques has limitations in plagiarism detection. Here, a method is presented to identify plagiarized subjective answers in Thai when the subjective examination is performed online, using natural language processing techniques (e.g., POS tagging) and cosine similarity analysis. The proposed method is called “similarity analysis of linguistic syntax and words used”. The method achieved a true positive rate (TPR) of 0.81. Furthermore, when compared with the baseline, our proposed method improved the average TPR by 7.69%. This may demonstrate the success of our proposed method in identifying plagiarized subjective answers.
Citations: 0
Aggregation Type: Journal
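
The cosine similarity analysis mentioned above can be illustrated with a small sketch over token counts. It covers only the generic similarity measure; how the paper combines POS tags with the words used is not described in this record, so that part is left out. The function name and the sample answers are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both answers contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tokenized answers (English placeholders for Thai text).
answer_1 = ["a", "stack", "is", "a", "last", "in", "first", "out", "structure"]
answer_2 = ["a", "stack", "is", "a", "last", "in", "first", "out", "structure"]
answer_3 = ["queues", "serve", "items", "by", "arrival", "order"]

same = cosine_similarity(answer_1, answer_2)
different = cosine_similarity(answer_1, answer_3)
```

A detection pipeline would then flag answer pairs whose similarity exceeds some threshold; the threshold itself is not given in the description.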
-------------------
Title: A novel lightweight hybrid intrusion detection method using a combination of data mining techniques
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.14257/ijsia.2015.9.4.10
Description: Hybrid intrusion detection systems that use data mining techniques to improve effectiveness have been actively pursued in the last decade. However, building their detection models has become very expensive when confronted with large-scale datasets, making them unviable for real-time retraining. In order to overcome this limitation of the conventional hybrid method, we propose a new lightweight hybrid intrusion detection method that consists of a combination of feature selection, clustering and classification. Following our hypothesis that attack events differ in nature from one network protocol to another, the proposed method examines the data of each network protocol separately, using the same process for each. First, the training dataset is divided into training subsets, depending on their type of network protocol. Next, each training subset is reduced dimensionally by eliminating the irrelevant and redundant features through the feature selection process, and then broken down into disjoint regions, depending on their similar feature values, by K-Means clustering. Lastly, the C4.5 decision tree is used to build multiple misuse detection models for suspicious regions, which deviate from the normal and anomaly regions. As a result, each detection model is built from high-quality data, which are less complex and consist of relevant data. For better understanding of the enhanced performance, the proposed method was evaluated through experiments using the NSL-KDD dataset. The experimental results indicate that the proposed method is better in terms of effectiveness (F-value: 0.9957, classification accuracy: 99.52%, false positive rate: 0.26%), and efficiency (the training and testing times of the proposed method are approximately 33% and 25%, respectively, of the time required for its comparison) than the conventional hybrid method using the same algorithm.
Citations: 2
Aggregation Type: Journal
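
The first step described above, dividing the training data into one subset per network protocol, can be sketched as below. Only this partitioning step is shown; feature selection, K-Means clustering, and the C4.5 models are omitted. The field name `protocol_type` follows the NSL-KDD column of that name, while the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def split_by_protocol(records, key="protocol_type"):
    """Group connection records into one training subset per protocol."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record)
    return dict(subsets)


# Hypothetical, heavily simplified connection records.
training_set = [
    {"protocol_type": "tcp", "src_bytes": 181, "label": "normal"},
    {"protocol_type": "udp", "src_bytes": 105, "label": "normal"},
    {"protocol_type": "tcp", "src_bytes": 0, "label": "neptune"},
    {"protocol_type": "icmp", "src_bytes": 1032, "label": "smurf"},
]

subsets = split_by_protocol(training_set)
```

Each subset would then pass through the same downstream process independently, so the later models never mix records from different protocols.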
-------------------
Title: Symbolic data conversion method using the knowledge-based extraction in anomaly intrusion detection system
Cover Date: 2014-01-01
Cover Display Date: July 2014
DOI: N/A
Description: In anomaly intrusion detection systems, machine learning algorithms (e.g., KNN, SOM, and SVM) are widely used to construct a model of normal system activity, and they are designed to work with numeric data. Consequently, symbolic data (e.g., TCP, SMTP, FTP, OTH, etc.) need to be converted into numeric data prior to being analyzed. Previous works proposed different methods for handling the symbolic data, for example, excluding symbolic data, arbitrary assignment, and indicator variables. However, these methods may entail a very difficult classification problem, especially an increase in the dimensionality of the data that directly affects the computational complexity of the machine learning algorithm. Thus, this paper proposed a new symbolic conversion method to overcome the limitations of previous works by replacing the symbolic data with their risk values, obtained from knowledge-based extraction. The experiments affirmed that our proposed method was more effective in improving the classifier performance than the previous methods, and it did not increase the dimensionality of data. © 2005-2014 JATIT & LLS. All rights reserved.
Citations: 3
Aggregation Type: Journal
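
The idea of replacing each symbolic value with a single risk value, rather than expanding it into indicator variables, can be sketched as follows. The description does not say how the risk values are computed, so this sketch makes a loud assumption: the risk of a symbol is the fraction of training records carrying that symbol that are labeled as attacks. The function names and sample data are hypothetical.

```python
from collections import Counter


def risk_values(symbols, is_attack):
    """Map each symbol to an assumed risk value in [0, 1].

    Assumption (not from the paper): risk = attack records with the
    symbol / all records with the symbol.
    """
    totals, attacks = Counter(), Counter()
    for symbol, attack in zip(symbols, is_attack):
        totals[symbol] += 1
        if attack:
            attacks[symbol] += 1
    return {symbol: attacks[symbol] / totals[symbol] for symbol in totals}


def convert(symbols, risks, default=0.0):
    """Replace every symbol with its risk value; one numeric column results."""
    return [risks.get(symbol, default) for symbol in symbols]


# Hypothetical training column and attack flags.
protocols = ["tcp", "tcp", "udp", "icmp", "tcp", "udp"]
attack_flags = [True, False, False, True, True, False]

risks = risk_values(protocols, attack_flags)
numeric_column = convert(protocols, risks)
```

The converted column has the same width as the original, whereas indicator variables would add one column per distinct symbol.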
-------------------