Title: ENSEMBLE CLUSTERING METHOD FOR ASSEMBLING OF THAI DECIDED CIVIL CASES INTO SPECIFIC CLUSTERS
Cover Date: 2025-03-01
Cover Display Date: March 2025
DOI: 10.24507/icicel.19.03.271
Description: Civil cases often concern legal disputes between individuals or organizations. Once a judgment has been issued, a civil case is referred to as a “decided case”, and its documents can be used in future legal determinations. One way to manage these decided cases, and to make it easier to find those that meet a user’s needs, is to group related decided cases together. The purpose of this study was therefore to propose an ensemble clustering method for finding and identifying the legal cases in a given collection that best satisfy users’ needs. Our ensemble clustering employs well-known clustering methods: k-means++, spherical k-means, and DBSCAN. When evaluated by a clustering quality measure (purity score), accuracy, and F1 score, the proposed method yielded good results. Furthermore, compared with the baseline, the proposed method improved the purity score, accuracy, and F1 score by 6.95%, 6.67%, and 6.95%, respectively.
Citations: 0
Aggregation Type: Journal
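The abstract names purity as its clustering quality measure. The sketch below is a rough illustration and not the authors' code. It computes purity by assigning each cluster to its majority gold category and counting how many items that assignment covers. The toy case labels and categories are hypothetical. DBSCAN noise points (label -1) are simply treated as one more cluster, which is an assumption.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Purity = (1/N) * sum over clusters of the count of that cluster's majority class."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label lists must have the same length")
    members = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        members.setdefault(cluster, []).append(cls)
    # For each cluster, count how many items share its most frequent gold class.
    majority_total = sum(Counter(classes).most_common(1)[0][1] for classes in members.values())
    return majority_total / len(class_labels)


# Hypothetical example: six decided cases grouped into two clusters.
clusters = [0, 0, 0, 1, 1, 1]
gold = ["contract", "contract", "tort", "tort", "tort", "family"]
print(round(purity(clusters, gold), 3))  # 0.667: 4 of 6 cases match their cluster's majority class
```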
-------------------


Title: A HYBRID METHOD OF ASPECT-BASED SENTIMENT ANALYSIS FOR HOTEL REVIEWS
Cover Date: 2024-01-01
Cover Display Date: January 2024
DOI: 10.24507/icicel.18.01.59
Description: The purpose of this study was to introduce a hybrid method of aspect-based sentiment analysis for hotel reviews. Hotel staff attentiveness, hotel cleanliness, value for money, and hotel location are all highly regarded hotel aspects. The proposed method consists of two major components. The first component uses BM25 to group review sentences into the most relevant hotel aspect cluster: Word2Vec's skip-gram model was used to generate keywords relevant to each hotel aspect, and these keywords were then used as queries to organize review sentences into suitable hotel aspect clusters. Finally, the hotel review sentences in each cluster are assigned a positive or negative sentiment polarity by a sentiment polarity analyzer. This analyzer is an ensemble model comprising five predictive models built with the C4.5 decision tree, Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) with a linear kernel, SVM with an RBF kernel, and Logistic Regression (LR). When evaluated by recall, precision, F1, and accuracy, the proposed hybrid method yielded satisfactory outcomes of 0.820, 0.805, 0.810, and 0.815, respectively. We also compared our hybrid method with a baseline using the same training and test sets. The recall and precision of our proposed method were marginally higher than those of the baseline, improving by 4.76% and 4.88%, respectively.
Citations: 1
Aggregation Type: Journal
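The first component ranks review sentences against per-aspect keyword queries with BM25. Below is a minimal sketch of Okapi BM25 that assigns each sentence to its highest-scoring aspect. Several assumptions apply:

- The aspect keyword lists are hypothetical stand-ins for the Word2Vec skip-gram output.
- k1 = 1.5 and b = 0.75 are common defaults, not values from the paper.
- The idf uses the log(1 + ...) variant used by Lucene.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for one query."""
    n_docs = len(documents)
    avgdl = sum(len(doc) for doc in documents) / n_docs
    df = Counter()  # number of documents containing each term
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def assign_aspects(sentences, aspect_queries):
    """Put each sentence in the aspect whose keyword query gives it the highest BM25 score."""
    per_aspect = {aspect: bm25_scores(query, sentences) for aspect, query in aspect_queries.items()}
    return [max(per_aspect, key=lambda a: per_aspect[a][i]) for i in range(len(sentences))]


sentences = [s.split() for s in [
    "the room was very clean and tidy",
    "staff were friendly and helpful",
    "great location near the beach",
]]
aspect_queries = {  # hypothetical keyword lists
    "cleanliness": ["clean", "tidy", "dirty"],
    "staff": ["staff", "friendly", "helpful"],
    "location": ["location", "near", "beach"],
}
print(assign_aspects(sentences, aspect_queries))  # ['cleanliness', 'staff', 'location']
```

Only the first component is sketched here. The five-model sentiment ensemble is not shown.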
-------------------


Title: Predicting the Close-price of Cryptocurrency Using the Kernel Regression Algorithm
Cover Date: 2023-01-01
Cover Display Date: 2023
DOI: 10.1109/RI2C60382.2023.10356032
Description: The aim of this work is to use the kernel regression (KR) approach to predict the closing price of cryptocurrencies. This study uses three datasets: Bitcoin (BTC), Litecoin (LTC), and Ethereum (ETH). Min-max normalization was used to scale feature values to a common range, typically between 0 and 1. Support vector regression (SVR) and long short-term memory (LSTM) models were used for comparison with the KR-based prediction model. Evaluated by RMSE and MAPE, the KR-based predictive models gave more satisfactory results.
Citations: 1
Aggregation Type: Conference Proceeding
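The abstract does not say which kernel-regression form or features were used. The sketch below therefore assumes the common Nadaraya-Watson estimator with a Gaussian kernel and a hypothetical bandwidth of 0.1. It also assumes a single lag-1 feature (today's close predicts tomorrow's close). The price series is invented. Min-max scaling is fitted on the training split only, and RMSE and MAPE are computed as usually defined.

```python
import math


def min_max_scale(values, lo=None, hi=None):
    """Scale values into [0, 1] using lo/hi (default: the values' own min and max)."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.1):
    """Kernel regression estimate: Gaussian-weighted average of y_train around x_query."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily closing prices; feature = today's close, target = tomorrow's close.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0]
features, targets = closes[:-1], closes[1:]
train_x, test_x = features[:5], features[5:]
train_y, test_y = targets[:5], targets[5:]
lo, hi = min(train_x), max(train_x)  # scaling fitted on the training split only
scaled_train = min_max_scale(train_x, lo, hi)
scaled_test = min_max_scale(test_x, lo, hi)
preds = [nadaraya_watson(scaled_train, train_y, x) for x in scaled_test]
print(f"RMSE={rmse(test_y, preds):.2f}  MAPE={mape(test_y, preds):.2f}%")
```

A Nadaraya-Watson estimate is a weighted average of training targets, so it cannot predict beyond the range of prices seen in training.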
-------------------


Title: AUTOMATICALLY IDENTIFYING OF PLAGIARIZED SUBJECTIVE ANSWERS FOR THAI USING TEXT-BASED SIMILARITY ANALYSIS METHOD
Cover Date: 2022-06-01
Cover Display Date: June 2022
DOI: 10.24507/icicel.16.06.639
Description: In education, many researchers design and develop methods or tools to identify plagiarism and maintain study quality. Text-based plagiarism often occurs in the academic domain, including in online subjective examinations. Each of the numerous proposed techniques has limitations in plagiarism detection. Here, we present a method that identifies plagiarized Thai subjective answers in online subjective examinations using natural language processing techniques (e.g., POS tagging) and cosine similarity analysis. The proposed method is called “similarity analysis of linguistic syntax and words used”. It achieved a true positive rate (TPR) of 0.81. Compared with the baseline, our proposed method improved the average TPR by 7.69%. This suggests that the proposed method is effective at identifying plagiarized subjective answers.
Citations: 0
Aggregation Type: Journal
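A sketch of how cosine similarity can combine “linguistic syntax” and “words used”, under these assumptions:

- Answers are already word-segmented and POS-tagged. Thai segmentation and tagging are not shown, and romanized placeholder tokens are used.
- Syntax is compared through POS-tag bigrams.
- The two similarities are blended 50/50.

Both the bigram choice and the weighting are hypothetical choices, not the paper's settings.

```python
import math
from collections import Counter


def cosine_similarity(items_a, items_b):
    """Cosine of the angle between the frequency vectors of two token (or tag) lists."""
    va, vb = Counter(items_a), Counter(items_b)
    dot = sum(count * vb[item] for item, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bigrams(seq):
    return list(zip(seq, seq[1:]))


def answer_similarity(words_a, tags_a, words_b, tags_b, syntax_weight=0.5):
    """Blend syntactic similarity (POS-tag bigrams) with lexical similarity (words used)."""
    syntax = cosine_similarity(bigrams(tags_a), bigrams(tags_b))
    lexical = cosine_similarity(words_a, words_b)
    return syntax_weight * syntax + (1 - syntax_weight) * lexical


tags = ["NOUN", "VERB", "NOUN", "ADP", "NOUN"]
a = ["plants", "make", "food", "by", "photosynthesis"]
b = ["plants", "produce", "food", "by", "photosynthesis"]
print(round(answer_similarity(a, tags, b, tags), 3))  # 0.9
```

A pair of answers would then be flagged when the score exceeds some threshold. Any concrete threshold would be an assumption.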
-------------------


Title: Identifying of Decision Components in Thai Civil Case Decision by Text Classification Technique
Cover Date: 2020-01-01
Cover Display Date: 2020
DOI: 10.1007/978-3-030-44044-2_2
Description: A Thai civil case decision document is typically presented in a semi-structured form. Generally, Thai civil case decision documents consist of four major components: the dispute, facts, decision, and judgment. Before text summarization or information extraction can be performed on such a document, its major components should first be recognized. This problem has not been addressed previously, so our study presents a method of identifying the four major components using a text classification technique. We employed two weighting schemes and three supervised machine learning algorithms, using a dataset downloaded from the Supreme Court of Thailand website (http://www.supremecourt.or.th). When evaluated by recall, precision, and F1, the method achieved satisfactory results for identifying the major components in Thai civil case decision documents, at 0.83, 0.80, and 0.81, respectively.
Citations: 1
Aggregation Type: Book Series
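The abstract does not name its two weighting schemes. TF-IDF is shown here only as one widely used example, and it is an assumption that it was one of them. The paragraphs are hypothetical English placeholders for already word-segmented Thai text. The resulting weight vectors would then be passed to a supervised classifier, which is not shown.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight each term of each tokenized document by raw term frequency x log(N / df)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return [{term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
            for doc in documents]


# Hypothetical paragraphs from a decision document (already word-segmented).
paragraphs = [
    ["the", "plaintiff", "sued", "the", "defendant"],
    ["the", "court", "finds", "the", "facts"],
    ["the", "court", "dismissed", "the", "appeal"],
]
for weights in tf_idf(paragraphs):
    # "the" occurs in every paragraph, so its weight is log(3/3) = 0.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:2])
```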
-------------------


Title: Finding Clinical Knowledge from MEDLINE Abstracts by Text Summarization Technique
Cover Date: 2018-12-20
Cover Display Date: 20 December 2018
DOI: 10.23919/INCIT.2018.8584867
Description: MEDLINE is an important repository containing more than 26 million citations and abstracts in the field of medicine, and PubMed provides free access to MEDLINE along with links to full-text articles. MEDLINE abstracts have become a potential source of new knowledge in the medical field. However, finding knowledge in MEDLINE abstracts is time-consuming and labour-intensive when a search returns many abstracts and each may contain a large volume of information. This work therefore presents a method of summarizing clinical knowledge from a MEDLINE abstract. The main mechanisms of the proposed method are based on natural language processing (NLP) and text filtering techniques. The case study summarizes clinical knowledge from MEDLINE abstracts relating to cervical cancer in clinical trials. In the evaluation stage, the predicted results were compared with actual results obtained from a domain expert. When evaluated by recall, precision, and F-measure, the method returned satisfactory results, with average recall, precision, and F-measure of 0.84, 1.00, and 0.91, respectively.
Citations: 7
Aggregation Type: Conference Proceeding
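The F-measure is the harmonic mean of precision and recall. A quick check confirms that the reported averages are mutually consistent.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_measure(precision=1.00, recall=0.84), 2))  # 0.91, matching the reported F-measure
```

If the reported figures are averages of per-abstract scores, the average F-measure need not exactly equal the F-measure of the averaged precision and recall. A small mismatch would therefore not by itself indicate an error.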
-------------------


Title: The automatic Thai basketry detection and recognition on the local wisdom video
Cover Date: 2015-02-09
Cover Display Date: 9 February 2015
DOI: 10.1109/INFOS.2014.7036707
Description: To retrieve videos with content relating to local Thai basketry, creating video indexes is vital, because indexes allow users to find specific videos more quickly. This work proposes a novel video indexing methodology consisting of two main processing stages. The first stage captures video frames containing Thai basketry and classifies them into individual groups, because local Thai basketry can be classified into many types depending on its use and pattern. This stage uses color and texture features, which are processed by artificial neural networks (ANNs). The second stage recognizes the basketry shape by chain code and template matching analysis of the object's shape, and the results are used for video indexing. The proposed methodology was evaluated on 41 local wisdom videos comprising 76 shots and 4,196 frames. When evaluated by recall, precision, and F-measure, it achieved satisfactory results of 78.64%, 83.88%, and 81.18%, respectively.
Citations: 0
Aggregation Type: Conference Proceeding
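The second stage relies on chain codes. The sketch below encodes an already-traced, ordered boundary as a Freeman 8-direction chain code. It makes the code independent of start point and of 45-degree rotations via the first difference, and then compares it with a template. Several parts are assumptions:

- The coordinate convention.
- Exact-equality matching, which is a simplification of template matching.
- Boundary tracing from video frames, which is not shown.

```python
# Freeman 8-direction codes: 0 = east, increasing counter-clockwise in 45-degree steps.
# Assumed convention for this sketch: x grows to the right and y grows upward.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(boundary):
    """Encode a closed boundary given as an ordered list of 8-connected (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def first_difference(codes):
    """Counter-clockwise turn between consecutive moves; invariant to 45-degree rotations."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]


def normalize_start(codes):
    """Smallest cyclic rotation, so the result does not depend on the starting point."""
    return min(codes[i:] + codes[:i] for i in range(len(codes)))


def matches_template(boundary, template_boundary):
    shape = normalize_start(first_difference(chain_code(boundary)))
    template = normalize_start(first_difference(chain_code(template_boundary)))
    return shape == template


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
diamond = [(0, 0), (1, 1), (0, 2), (-1, 1)]  # a square rotated by 45 degrees
print(chain_code(square), chain_code(diamond))  # [0, 2, 4, 6] [1, 3, 5, 7]
print(matches_template(diamond, square))        # True
```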
-------------------


Title: Concept-based text classification of thai medicine recipes represented with ancient isan language
Cover Date: 2015-01-01
Cover Display Date: 2015
DOI: 10.1007/978-3-319-19024-2_12
Description: This work presents concept-based text classification for organizing traditional Thai medicine recipes. These recipes were translated from Northeastern Thai palm leaf manuscripts, and each medicine recipe is written in the ancient Isan language. The proposed method is called ‘concept-based text classification’ because it uses ‘concepts’ as document features, where a concept is a surrogate for a group of words with the same meaning. The main mechanisms of the method are the k-Nearest Neighbor algorithm and an ancient Isan dictionary called Isan-Thai Markup Language (ITML). The objective of this work is to assign the Thai medicine recipes to five predefined groups: medicine recipes for headache and fever, stomachache and abdomen, skin, abscess, and faint and vertigo. When evaluated by recall, precision, and F-measure, the method returned satisfactory automatic text classification results.
Citations: 1
Aggregation Type: Book Series
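A minimal sketch of the concept idea: words are first mapped to concepts through a dictionary, and k-NN then votes on the concept sets. The tiny word-to-concept table is a hypothetical stand-in for ITML. The romanized tokens are placeholders. Jaccard similarity and k = 3 are assumptions, since the abstract does not state the similarity measure or k.

```python
from collections import Counter

# Hypothetical word -> concept table standing in for an ancient Isan dictionary such as ITML.
CONCEPTS = {
    "fever": "FEVER", "hot_body": "FEVER",
    "headache": "HEAD_PAIN", "head_pain": "HEAD_PAIN",
    "stomach": "ABDOMEN", "belly": "ABDOMEN",
    "rash": "SKIN", "itch": "SKIN",
}


def to_concepts(tokens):
    """Replace each word by its concept; words missing from the table keep their own form."""
    return {CONCEPTS.get(token, token) for token in tokens}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(tokens, training, k=3):
    """training: list of (tokens, label); majority label among the k most similar recipes."""
    query = to_concepts(tokens)
    ranked = sorted(training, key=lambda item: jaccard(query, to_concepts(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


training = [
    (["fever", "herb"], "headache and fever"),
    (["headache", "root"], "headache and fever"),
    (["stomach", "bark"], "stomachache and abdomen"),
    (["rash", "leaf"], "skin"),
]
# Shares no words with the training recipes, but does share concepts.
print(knn_classify(["hot_body", "head_pain"], training))  # headache and fever
```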
-------------------


Title: Ontology-based text classification for filtering cholangiocarcinoma documents from PubMed
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.1007/978-3-319-09891-3_25
Description: PubMed is a search engine used to access the MEDLINE database, which contains a massive amount of biomedical literature. This can make it difficult to find relevant medical literature, and this problem is the challenge addressed in this work. We present a solution for retrieving the most relevant biomedical literature relating to cholangiocarcinoma in clinical trials from PubMed. The proposed methodology is called ontology-based text classification (On-TC). We provide an ontology, called Cancer Technical Term Net (CCT-Net), that serves as a semantic tool. This ontology is integrated into the methodology to support automatic semantic interpretation during text processing, especially in the case of synonyms or term variations. © 2014 Springer International Publishing.
Citations: 4
Aggregation Type: Book Series
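The abstract says the ontology helps with synonyms and term variations. The sketch below shows that idea only: variants are normalized and mapped to a preferred term before relevance filtering. The ontology fragment is hypothetical, because CCT-Net's actual structure is not described. Simple phrase matching also stands in for On-TC's classifier and is not a description of it.

```python
import re

# Hypothetical ontology fragment: preferred term -> synonyms and spelling variants.
ONTOLOGY = {
    "cholangiocarcinoma": ["bile duct cancer", "bile duct carcinoma"],
    "clinical trial": ["clinical trials", "randomized controlled trial"],
}


def normalize(text):
    """Lower-case and turn hyphens/punctuation into single spaces so variants line up."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def concepts_in(text):
    """Preferred terms whose name or any listed variant occurs as a whole phrase in text."""
    padded = f" {normalize(text)} "
    return {preferred for preferred, variants in ONTOLOGY.items()
            if any(f" {normalize(term)} " in padded for term in [preferred, *variants])}


def is_relevant(abstract, required=("cholangiocarcinoma", "clinical trial")):
    return set(required) <= concepts_in(abstract)


print(is_relevant("A randomized controlled trial of gemcitabine in bile-duct cancer."))  # True
print(is_relevant("Risk factors for hepatocellular carcinoma."))                         # False
```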
-------------------


Title: Thai E-san heritage images classification based on color features analysis: The normalized R/G ratio and edge histograms
Cover Date: 2010-12-01
Cover Display Date: 2010
DOI: 10.4156/ijipm.vol1.issue1.4
Description: To reduce the deterioration of Thai E-san heritage items caused by direct access and handling, a digital library can be a solution for representing, retrieving, and studying Thai E-san cultural heritage. This digital library began with organizing a heritage image collection through a classification technique. This work is motivated by two drivers. First, it aims to apply an alternative dimension of Content-based Image Retrieval (CBIR) to classify a collection of Thai E-san heritage images into two classes: heritage images with humans, and heritage images without humans (e.g., images of ancient remains and antiques). Second, it proposes a CBIR method to automatically classify a heritage image collection based on color features. This approach is valuable for automatically classifying heritage images, a process that is time-consuming and labour-intensive when done manually. Two techniques are proposed to classify and organize the collection of Thai E-san heritage images: the Normalized R/G ratio and the Naïve Bayes image classifier based on edge histograms. The experimental results show an average accuracy of 72.5% for the Normalized R/G ratio and 86% for the Naïve Bayes image classifier based on edge histograms. This suggests that our approach can be sufficiently reliable for use.
Citations: 2
Aggregation Type: Journal
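The abstract does not define the Normalized R/G ratio. One plausible reading, assumed here, is to test each pixel's red-to-green ratio against a skin-tone range and normalize the count of matching pixels by the image size. The ratio bounds (1.1 to 1.8), the 20% decision threshold, and the sample colors are all hypothetical.

```python
def rg_skin_fraction(pixels, lo=1.1, hi=1.8):
    """Share of (R, G, B) pixels whose R/G ratio lies in [lo, hi]; pixels with G = 0 never count."""
    if not pixels:
        return 0.0
    hits = sum(1 for r, g, _b in pixels if g > 0 and lo <= r / g <= hi)
    return hits / len(pixels)


def classify_heritage_image(pixels, min_fraction=0.2):
    return "with human" if rg_skin_fraction(pixels) >= min_fraction else "without human"


skin, stone, foliage = (220, 170, 140), (120, 120, 120), (60, 140, 60)
image_a = [skin] * 30 + [stone] * 70     # 30% skin-like pixels
image_b = [stone] * 90 + [foliage] * 10  # no skin-like pixels
print(classify_heritage_image(image_a), "/", classify_heritage_image(image_b))
```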
-------------------


Title: Thai heritage images classification by naïve bayes image classifier
Cover Date: 2010-10-25
Cover Display Date: 2010
DOI: N/A
Description: A digital library is a way to represent, retrieve, and study Thai E-san cultural heritage without directly accessing and touching it, which may help to reduce its deterioration. We began our project by organizing a collection of images through a classification technique. This work is motivated by two main drivers. Firstly, we aim to apply an alternative dimension of CBIR to classify a collection of Thai E-san heritage images into two classes: heritage images that involve human activities, and heritage images with non-human activities (e.g., images of ancient remains and antiques). Secondly, we propose a new image classification method that applies Naive Bayes to produce an image classifier based on edge histogram features. This approach is valuable for automatically classifying heritage images, a process that is time-consuming and labour-intensive when done manually. The experimental results show effective accuracy, which suggests that our approach is sufficiently reliable for use.
Citations: 6
Aggregation Type: Conference Proceeding
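A sketch of a Naive Bayes classifier over edge-histogram features, under these assumptions:

- Each image has already been reduced to counts in five edge-type bins. The MPEG-7-style bin layout is an assumption, and computing the histogram from pixels is not shown.
- The classifier is multinomial Naive Bayes with Laplace smoothing.
- The training histograms are invented.

```python
import math

# Bins in this sketch (assumed, MPEG-7-style): vertical, horizontal, 45-degree, 135-degree, non-directional.


def train_multinomial_nb(histograms, labels, alpha=1.0):
    """Return per-class log priors and Laplace-smoothed per-bin log likelihoods."""
    n_bins = len(histograms[0])
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        rows = [h for h, y in zip(histograms, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(histograms))
        totals = [sum(column) for column in zip(*rows)]
        denom = sum(totals) + alpha * n_bins
        log_lik[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_lik


def predict(histogram, log_prior, log_lik):
    """Pick the class maximizing log P(c) + sum over bins of count * log P(bin | c)."""
    def score(c):
        return log_prior[c] + sum(n * lp for n, lp in zip(histogram, log_lik[c]))
    return max(log_prior, key=score)


train = [[2, 2, 6, 6, 10], [1, 3, 5, 7, 9], [10, 9, 2, 2, 3], [9, 10, 1, 3, 2]]
labels = ["with human", "with human", "without human", "without human"]
model = train_multinomial_nb(train, labels)
print(predict([1, 2, 6, 5, 11], *model))   # with human
print(predict([12, 8, 1, 2, 2], *model))   # without human
```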
-------------------


Title: A web pornography patrol system by content based analysis: In particular text and image
Cover Date: 2008-12-01
Cover Display Date: 2008
DOI: 10.1109/ICSMC.2008.4811326
Description: Children's exposure to pornographic web sites on the internet raises safety issues. To protect children from this inappropriate material, an effective web filtering system is essential, and content-based web filtering is one of the important techniques for handling and filtering inappropriate information on the web. In this paper, we examine content-based analysis techniques for filtering pornographic web sites. Our system combines two primary content-based filtering techniques: text and image. For text analysis, the Support Vector Machine (SVM) algorithm and an N-gram model based on Bayes' theorem are applied to filter pornographic text on both Thai- and English-language web sites. For images, we build and examine a hierarchical image filtering method consisting of two main processes. The first is the normalized R/G ratio, which uses pixel ratios of the red and green color channels. The second is the human composition matrix (HCM) based on skin detection. The empirical results show that our text and image analysis methods are more effective for pornographic web filtering. Finally, we incorporated a pornographic web filter using content-based analysis into our Anti X system. © 2008 IEEE.
Citations: 25
Aggregation Type: Conference Proceeding
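Character N-grams avoid the need for word segmentation, which matters for Thai. The sketch below shows a character N-gram model based on Bayes' theorem: multinomial Naive Bayes over character trigrams with add-one smoothing. The value n = 3, the smoothing, and the toy training pages are assumptions, not the paper's settings.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams; no word segmentation needed, which suits Thai text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramBayesFilter:
    """Multinomial Naive Bayes over character n-grams with Laplace (add-alpha) smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        self.counts = {}       # label -> Counter of n-grams seen in that class
        self.docs = Counter()  # label -> number of training pages

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(char_ngrams(text, self.n))
            self.docs[label] += 1
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())

        def score(label):
            grams = self.counts[label]
            denom = sum(grams.values()) + self.alpha * (self.vocab_size + 1)  # +1 slot for unseen n-grams
            return math.log(self.docs[label] / total_docs) + sum(
                math.log((grams[g] + self.alpha) / denom) for g in char_ngrams(text, self.n))

        return max(self.counts, key=score)


texts = ["explicit adult content xxx", "adult xxx videos", "school homework help", "football match results"]
labels = ["block", "block", "allow", "allow"]
flt = NgramBayesFilter().fit(texts, labels)
print(flt.predict("free xxx adult clips"))      # block
print(flt.predict("homework help for school"))  # allow
```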
-------------------


Title: A web pornography patrol system based on hierarchical image filtering techniques
Cover Date: 2006-12-01
Cover Display Date: 2006
DOI: 10.2991/jcis.2006.268
Description: Due to the flood of pornographic web sites on the internet, content-based web filtering has become an important technique for detecting and filtering inappropriate information on the web, because pornographic web sites contain many sexually oriented texts, images, and other information that can help to identify them. In this paper, we build and examine a system that filters web pornography based on image content. Our system consists of three main processes: (i) normalized R/G ratio, (ii) histogram, and (iii) human composition matrix (HCM) based on skin detection. The first process uses pixel ratios of the red and green color channels for image filtering. The second process, histogram analysis, estimates the frequency of intensities in an image; if an image falls within the range of the training set results, it is likely to be pornographic. The last process is HCM based on human skin detection. The experimental results show effective accuracy, which suggests that our hierarchical image filtering techniques can achieve substantial improvements.
Citations: 2
Aggregation Type: Conference Proceeding
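A sketch of the hierarchical idea: an image is blocked only if it passes every stage in turn, so cheap stages clear most images early. Two stages are sketched:

- An R/G-ratio check, with hypothetical bounds.
- The histogram test described above, which checks whether an image's normalized intensity histogram falls within per-bin ranges learned from training images. The bin count and tolerance are assumptions.

The HCM stage is omitted because the abstract does not describe how it works. The training images are invented.

```python
def rg_stage(pixels, lo=1.1, hi=1.8, min_fraction=0.3):
    """Stage 1: enough pixels with a skin-like red/green ratio."""
    hits = sum(1 for r, g, _b in pixels if g > 0 and lo <= r / g <= hi)
    return hits / len(pixels) >= min_fraction


def intensity_histogram(pixels, bins=8):
    """Normalized grey-level histogram, grey = (R + G + B) / 3."""
    counts = [0] * bins
    for r, g, b in pixels:
        counts[min(int((r + g + b) / 3 * bins / 256), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def learn_ranges(training_images, bins=8):
    """Per-bin (min, max) over the histograms of known pornographic training images."""
    hists = [intensity_histogram(img, bins) for img in training_images]
    return [(min(col), max(col)) for col in zip(*hists)]


def histogram_stage(pixels, ranges, tolerance=0.05):
    """Stage 2: every histogram bin falls inside the training range (plus a tolerance)."""
    hist = intensity_histogram(pixels, len(ranges))
    return all(lo - tolerance <= h <= hi + tolerance for h, (lo, hi) in zip(hist, ranges))


def hierarchical_filter(pixels, stages):
    """Block only if every stage flags the image; all() stops at the first stage that clears it."""
    return all(stage(pixels) for stage in stages)


skin, shadow, white = (200, 150, 120), (40, 30, 20), (255, 255, 255)
ranges = learn_ranges([[skin] * 80 + [shadow] * 20, [skin] * 70 + [shadow] * 30])
stages = [rg_stage, lambda px: histogram_stage(px, ranges)]
print(hierarchical_filter([skin] * 75 + [shadow] * 25, stages))  # True (blocked)
print(hierarchical_filter([white] * 100, stages))                # False (cleared at stage 1)
```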
-------------------


Title: Content-based text classifiers for pornographic web filtering
Cover Date: 2006-01-01
Cover Display Date: 2006
DOI: 10.1109/ICSMC.2006.384926
Description: Due to the flood of pornographic web sites on the internet, effective web filtering systems are essential. Content-based web filtering has become one of the important techniques for handling and filtering inappropriate information on the web. We examine two machine learning algorithms (Support Vector Machines and Naïve Bayes) for pornographic web filtering based on text content, focusing initially on Thai-language and English-language web sites. In this paper, we investigate whether machine learning algorithms are suitable for web site classification. The empirical results show that the classifier based on Support Vector Machines is more effective for pornographic web filtering than the Naïve Bayes classifier, especially with respect to the over-blocking problem. ©2006 IEEE.
Citations: 17
Aggregation Type: Conference Proceeding
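The abstract credits SVM with better handling of over-blocking. This sketch takes over-blocking in its usual sense: legitimate pages that are wrongly blocked, i.e. the false-positive rate. It also reports under-blocking for contrast. The page labels and both prediction lists are invented purely to exercise the function and are not the paper's results.

```python
def blocking_rates(y_true, y_pred, blocked="porn"):
    """Return (over_blocking, under_blocking).

    over_blocking  = legitimate pages blocked / all legitimate pages   (false-positive rate)
    under_blocking = target pages let through / all target pages       (false-negative rate)
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != blocked and p == blocked)
    tn = sum(1 for t, p in pairs if t != blocked and p != blocked)
    fn = sum(1 for t, p in pairs if t == blocked and p != blocked)
    tp = sum(1 for t, p in pairs if t == blocked and p == blocked)
    over = fp / (fp + tn) if fp + tn else 0.0
    under = fn / (fn + tp) if fn + tp else 0.0
    return over, under


# Invented labels and predictions, only to exercise the function.
truth = ["porn"] * 3 + ["clean"] * 5
pred_a = ["porn", "porn", "clean", "clean", "clean", "clean", "clean", "porn"]
pred_b = ["porn", "porn", "porn", "porn", "porn", "clean", "clean", "porn"]
print(blocking_rates(truth, pred_a))  # (0.2, 0.3333333333333333)
print(blocking_rates(truth, pred_b))  # (0.6, 0.0)
```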
-------------------


Title: Automatic Thai character extraction from digital video
Cover Date: 2003-12-01
Cover Display Date: 2003
DOI: N/A
Description: Text appearing in a video often summarizes the video content, so it is reasonable to use this text to build keywords for video retrieval. Unfortunately, current Thai Optical Character Recognition (OCR) technology does not recognize text in digital video well, because the text appears against complex backgrounds and its location is unpredictable. This work consists of two phases. The first is Text Detection, which locates text in digital video using the Wavelet Transform. The second is Text Segmentation, which separates text from the complex background using color and intensity analysis. The experimental results show that the proposed approaches can detect text location with an accuracy of 89% for graphic text. Text segmentation results were 93.2% for less complex backgrounds and 83.8% for more complex backgrounds.
Citations: 0
Aggregation Type: Conference Proceeding
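Text strokes produce strong high-frequency wavelet coefficients, which is why a wavelet transform helps locate text. The sketch below applies one level of a 2-D Haar transform in its averaging form (divided by 4) to a small grey image. It then flags 2x2 blocks whose detail energy exceeds a threshold. The wavelet family, number of levels, and the threshold of 20 are assumptions, since the abstract gives no details. LH/HL naming conventions also vary between sources.

```python
def haar_2d(image):
    """One level of a 2-D Haar transform (averaging form, divided by 4) on a grey image.

    image: list of rows with even height and width. Returns the (LL, LH, HL, HH) sub-bands,
    each half the size of the input in both directions.
    """
    if len(image) % 2 or len(image[0]) % 2:
        raise ValueError("image height and width must be even")
    bands = ([], [], [], [])
    for i in range(0, len(image), 2):
        rows = ([], [], [], [])
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]          # top-left, top-right
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 4)  # LL: local average
            rows[1].append((a + b - c - d) / 4)  # LH: top minus bottom (horizontal edges)
            rows[2].append((a - b + c - d) / 4)  # HL: left minus right (vertical edges)
            rows[3].append((a - b - c + d) / 4)  # HH: diagonal detail
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def text_candidate_mask(image, threshold=20.0):
    """Flag 2x2 blocks whose total detail energy |LH| + |HL| + |HH| exceeds the threshold."""
    _, lh, hl, hh = haar_2d(image)
    return [[abs(x) + abs(y) + abs(z) > threshold for x, y, z in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(lh, hl, hh)]


frame = [
    [50, 50, 0, 255],
    [50, 50, 0, 255],
    [50, 50, 255, 0],
    [50, 50, 255, 0],
]
print(text_candidate_mask(frame))  # [[False, True], [False, True]]
```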
-------------------