Enhancing Classification Performance Through Employing Feature Selection Algorithms

Authors

  • Wisam A. Qader, Department of Computer Engineering, Faculty of Engineering, Tishk International University, Erbil, Iraq https://orcid.org/0000-0002-2626-0295
  • Musa M. Ameen, Department of Computer Engineering, Faculty of Engineering, Tishk International University, Erbil, Iraq https://orcid.org/0000-0003-2503-4808
  • Bilal I. Ahmed, Department of Information Technology, Faculty of Applied Science, Tishk International University, Erbil, Iraq

DOI:

https://doi.org/10.23918/eajse.v10i3p8

Keywords:

Data Classification, Feature Selection, Naïve Bayes, Decision Table, Sequential Minimal Optimization, Random Tree, Stacking

Abstract

Data classification is a pivotal area of research because of its critical importance across a wide range of applications, such as healthcare, finance, and predictive analytics. This study introduces an approach designed to enhance classification accuracy. The methodology first applies five feature selection algorithms, namely Quick Branch and Bound (QBB), Las Vegas Filter (LVF), Branch and Bound (B&B), Focus, and Sequential Floating Forward Search (SFFS), to identify the most relevant features. Five classification algorithms drawn from diverse classifier families, namely Sequential Minimal Optimization (SMO), Stacking, Random Tree (RT), Naïve Bayes (NB), and Decision Table (DT), are then applied to categorize the data. Each combination was evaluated on seven datasets spanning various domains. With feature selection, accuracies across the datasets ranged from 0.843 to 0.966 for Naïve Bayes, 0.837 to 0.964 for SMO, 0.806 to 0.962 for Decision Table, 0.783 to 0.941 for Random Tree, and 0.877 to 0.986 for Stacking, a significant improvement over the traditional methods without feature selection. Among the tested combinations, QBB feature selection paired with the Stacking classifier performed best, reaching an accuracy of 99.2% on certain datasets. The study thus identifies the best-performing pairing of feature selection and classification algorithms, demonstrating the versatility and robustness of the approach for practical applications across multiple fields.
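To make the two-stage design concrete, the sketch below chains greedy forward feature selection into a stacked ensemble. The classifier names used in the study (SMO, Random Tree, Decision Table) match WEKA implementations, so this scikit-learn version is only an illustrative analogue: the dataset, the estimator choices, and all parameters are assumptions, not the authors' configuration.

```python
# Illustrative sketch of the feature-selection-then-classification pipeline.
# Assumptions: scikit-learn estimators stand in for the WEKA classifiers
# named in the paper, and the breast-cancer dataset stands in for the
# seven datasets actually used.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: greedy forward selection approximates SFFS (scikit-learn has
# no floating variant); it keeps the 10 features that most help a simple
# wrapper model.
selector = SequentialFeatureSelector(
    GaussianNB(), n_features_to_select=10, direction="forward")

# Stage 2: a stacked ensemble over classifiers from different families,
# echoing the paper's NB / SMO / tree-based base learners.
stack = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("svm", SVC(kernel="linear")),           # SMO is an SVM trainer
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=RandomForestClassifier(random_state=0),
)

model = make_pipeline(StandardScaler(), selector, stack)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```

Note that the feature selection here is embedded in the pipeline and is therefore refit inside any cross-validation, which avoids the selection bias that arises when features are chosen on the full dataset before evaluation.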

Published

2025-02-25

How to Cite

Qader, W. A., Ameen, M. M., & Ahmed, B. I. (2025). Enhancing Classification Performance Through Employing Feature Selection Algorithms. Eurasian Journal of Science and Engineering, 10(3), 75-84. https://doi.org/10.23918/eajse.v10i3p8
