Title: Improving Student Academic Performance Prediction Models using Feature Selection
Cover Date: 2020-06-01
Cover Display Date: June 2020
DOI: 10.1109/ECTI-CON49241.2020.9158286
Description: This paper presents methods to improve the prediction of student academic performance using feature selection together with the removal of misclassified instances and the Synthetic Minority Over-Sampling Technique (SMOTE). It compares the performance of seven student academic performance prediction models, namely Naïve Bayes, Sequential Minimal Optimization, Artificial Neural Network, k-Nearest Neighbor, REPTree, Partial decision trees, and Random Forest. The data were collected from 9,458 students at Rajabhat Maha Sarakham University, Thailand, during 2015-2018. Model performance was evaluated with precision, recall, and F-measure. The experimental results indicated that the Random Forest approach significantly improves the performance of student academic performance prediction models, with precision up to 41.70%, recall up to 41.40% and F-measure up to 41.60%.
Citations: 16
Aggregation Type: Conference Proceeding
-------------------
Title: Enhancing the performance of association rule models by filtering instances in colorectal cancer patients
Cover Date: 2017-04-01
Cover Display Date: April-June 2017
DOI: 10.14456/easr.2017.11
Description: Colorectal cancer data available from the SEER program is analyzed with the aim of using filtering techniques to improve the performance of association rule models. In this paper, it is proposed to improve the quality of the dataset by removing its outliers using the Hidden Naïve Bayes (HNB), Naïve Bayes Tree (NBTree) and Reduced Error Pruning Decision Tree (REPTree) algorithms. The Apriori and HotSpot algorithms are applied to mine the association rules between the 13 selected attributes and average survivals. Experimental results show that the HNB algorithm can improve the accuracy of the Apriori algorithm’s performance by up to 100% and support threshold up to 45%. It can also improve the accuracy of the HotSpot algorithm’s performance up to 93.38% and support threshold up to 80%. Therefore, the HotSpot rules with minimum support of 80% are selected for explanation. The HotSpot algorithm shows that colorectal cancer patients, who died from colon cancer and were not receiving radiation therapy, were associated with survival of less than 22 months. Our study shows that filtering techniques in the preprocessing stage are a useful approach in enhancing the quality of the data set. This finding could help researchers build models for better prediction and performance analysis. Although it is heuristic, such analysis can be very useful to identify the factors affecting survival. It can also aid medical practitioners in helping patients to understand risks involved in a particular treatment procedure.
Citations: 6
Aggregation Type: Journal
-------------------
Title: Colorectal cancer survivability prediction models: A comparison of six rule based classification techniques
Cover Date: 2015-12-01
Cover Display Date: 1 December 2015
DOI: N/A
Description: The objective of this study was to compare six data mining techniques for developing accurate survival prediction models for colorectal cancer. We used six popular data mining algorithms (Conjunctive Rule, Decision Table Naïve Bayes, Fuzzy Unordered Rule Induction Algorithm (FURIA), One Rule, PART and RIPPER) to develop the prediction models using the SEER data set. The data set was balanced by using the SMOTE filter. Cross-validation was also used to evaluate the models. Precision, recall and F-measure were employed to evaluate the correctness and effectiveness of the models. Performance comparisons indicated that, for colorectal cancer survivability, FURIA is more robust and balanced than the other classifiers.
Citations: 0
Aggregation Type: Journal
-------------------
Title: Enhancing decision tree with adaboost for predicting schizophrenia readmission
Cover Date: 2014-01-01
Cover Display Date: 2014
DOI: 10.4028/www.scientific.net/AMR.931-932.1467
Description: A psychiatric readmission is argued to be an adverse outcome because it is costly and occurs when a relapse of the illness is severe. An analysis of systematic models in readmission data can provide useful insight into the quicker and sicker patients with schizophrenia. This research aims to develop and investigate schizophrenia readmission prediction models using data mining techniques including decision tree, Random Tree, Random Forests, AdaBoost, Bagging and combinations of AdaBoost with decision tree, AdaBoost with Random Tree, AdaBoost with Random Forests, Bagging with decision tree, Bagging with Random Tree and Bagging with Random Forests. The experimental results showed that AdaBoost with decision tree has the highest precision, recall and F-measure, up to 98.11%, 98.79% and 98.41%, respectively. © (2014) Trans Tech Publications, Switzerland.
Citations: 3
Aggregation Type: Book Series
-------------------
Title: Web usage mining techniques and applications
Cover Date: 2012-11-01
Cover Display Date: November 2012
DOI: 10.4156/ijact.vol4.issue20.73
Description: The study of web usage and mining presents an interesting challenge for research. Web mining refers to a learning process about how users interact with various websites. The objective of mining is to automatically and quickly discover users' access patterns and use them as models that include access paths, access page groups and user clustering. Through this web usage process, the server log, registration information and other related information left by the user after access can be mined. These models provide a foundation for decision making in many organizations. This paper provides an understanding of current web usage mining processes, techniques and applications.
Citations: 0
Aggregation Type: Journal
-------------------
Title: Prostate cancer survivability prediction models via rule-based techniques
Cover Date: 2012-11-01
Cover Display Date: November 2012
DOI: 10.4156/ijact.vol4.issue20.11
Description: Rule-based classification is a popular technique in both data mining and machine learning, and its results are easy to interpret. It builds conditional models in the form of "if-else" rules for predicting new cases. In this paper, we investigated the capability and effectiveness of rule-based models in predicting the 5-year survivability of Asian prostate cancer patients from the SEER database. These models could assist medical professionals in monitoring survival rates and in obtaining up-to-date estimates of long-term survival rates. Accuracy, sensitivity and specificity were employed to evaluate the capability and effectiveness of models generated from rule-based techniques including Decision Table (DT), Repeated Incremental Pruning to Produce Error Reduction (RIPPER), PART decision lists (PART) and RIpple-DOwn Rule learner (RIDOR). Stratified 10-fold cross-validation was also used to reduce the bias of the experiments. Experimental results showed that PART achieved the highest accuracy, sensitivity and specificity, up to 92.15%, 91.91% and 92.37%, respectively.
Citations: 0
Aggregation Type: Journal
-------------------
Title: Toward breast cancer survivability prediction models through improving training space
Cover Date: 2009-12-01
Cover Display Date: December 2009
DOI: 10.1016/j.eswa.2009.04.067
Description: Owing to the difficulties posed by outliers and skewed data, the prediction of breast cancer survivability has presented many challenges in the field of data mining and pattern recognition, especially in medical research. To solve these problems, we have proposed a hybrid approach to generating higher quality data sets in the creation of improved breast cancer survival prediction models. This approach comprises two main steps: (1) utilization of an outlier filtering approach based on C-Support Vector Classification (C-SVC) to identify and eliminate outlier instances; and (2) application of an over-sampling approach using over-sampling with replacement to increase the number of instances in the minority class. In order to assess the capability and effectiveness of the proposed approach, several measurement methods including basic performance (e.g., accuracy, sensitivity, and specificity), Area Under the receiver operating characteristic Curve (AUC) and F-measure were utilized. Moreover, a 10-fold cross-validation method was used to reduce the bias and variance of the results of breast cancer survivability prediction models. Results have indicated that the proposed approach leads to improving the performance of breast cancer survivability prediction models by up to 28.34% due to the improved training data space. © 2009 Elsevier Ltd. All rights reserved.
Citations: 45
Aggregation Type: Journal
-------------------
Title: AdaBoost algorithm with random forests for predicting breast cancer survivability
Cover Date: 2008-11-24
Cover Display Date: 2008
DOI: 10.1109/IJCNN.2008.4634231
Description: In this paper we propose a combination of the AdaBoost and random forests algorithms for constructing a breast cancer survivability prediction model. We use random forests as a weak learner of AdaBoost for selecting the high weight instances during the boosting process, to improve accuracy and stability and to reduce overfitting problems. The capability of this hybrid method is evaluated using basic performance measurements (e.g., accuracy, sensitivity, and specificity), Receiver Operating Characteristic (ROC) curve and Area Under the receiver operating characteristic Curve (AUC). Experimental results indicate that the proposed method outperforms a single classifier and other combined classifiers for breast cancer survivability prediction. © 2008 IEEE.
Citations: 68
Aggregation Type: Conference Proceeding
-------------------
Title: Support vector machine for outlier detection in breast cancer survivability prediction
Cover Date: 2008-01-01
Cover Display Date: 2008
DOI: 10.1007/978-3-540-89376-9_10
Description: Finding and removing misclassified instances are important steps in data mining and machine learning that affect the performance of the data mining algorithm in general. In this paper, we propose a C-Support Vector Classification Filter (C-SVCF) to identify and remove the misclassified instances (outliers) in breast cancer survivability samples collected from Srinagarind Hospital in Thailand, to improve the accuracy of the prediction models. Only instances that are correctly classified by the filter are passed to the learning algorithm. The performance of the proposed technique is measured with accuracy and area under the receiver operating characteristic curve (AUC), and is compared with several popular ensemble filter approaches including AdaBoost, Bagging and ensembles of SVM with AdaBoost and Bagging filters. Our empirical results indicate that C-SVCF is an effective method for identifying misclassified outliers. This approach significantly benefits ongoing research into developing accurate and robust prediction models for breast cancer survivability. © 2008 Springer Berlin Heidelberg.
Citations: 70
Aggregation Type: Book Series
-------------------