HCBST: An efficient hybrid sampling technique for class imbalance problems.


dc.contributor.author Sowah, R. A.
dc.contributor.author Kuditchar, B.
dc.contributor.author Mills, G.
dc.contributor.author Acakpovi, A.
dc.contributor.author Twum, R.
dc.contributor.author Osei, G.
dc.contributor.author Agboyi, R.
dc.date.accessioned 2023-01-17T13:04:56Z
dc.date.available 2023-01-17T13:04:56Z
dc.date.issued 2021
dc.identifier.other 10.1145/3488280
dc.identifier.uri https://dl.acm.org/doi/abs/10.1145/3488280
dc.identifier.uri http://atuspace.atu.edu.gh:8080/handle/123456789/2400
dc.description.abstract The class imbalance problem is prevalent in many real-world domains and has become an active area of research. In binary classification problems, imbalance learning refers to learning from a dataset with a high degree of skewness toward the negative class. This phenomenon causes classification algorithms to perform poorly when predicting the positive class on new examples. Data resampling, which involves manipulating the training data before applying standard classification techniques, is among the most commonly used approaches to dealing with the class imbalance problem. This article presents a new hybrid sampling technique that significantly improves the overall performance of classification algorithms on the class imbalance problem. The proposed method, called the Hybrid Cluster-Based Undersampling Technique (HCBST), combines a cluster-based undersampling technique to under-sample the majority instances with an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination to oversample the minority instances, addressing the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested using 11 datasets with varying degrees of imbalance from the National Aeronautics and Space Administration Metric Data Program data repository and the University of California Irvine Machine Learning data repository. Results were compared using classification algorithms such as K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Test results revealed that, for the same datasets, HCBST performed better, with average scores of 0.73, 0.67, and 0.35 for the performance measures of area under the curve, geometric mean, and Matthews Correlation Coefficient, respectively, across all the classifiers used in this study. HCBST thus has the potential to improve performance on the class imbalance problem and, by extension, the various applications that rely on it. en_US
dc.language.iso en_US en_US
dc.publisher ACM Transactions on Knowledge Discovery from Data en_US
dc.relation.ispartofseries vol.;16
dc.title HCBST: An efficient hybrid sampling technique for class imbalance problems. en_US
dc.type Article en_US
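
The abstract describes HCBST as pairing cluster-based undersampling of the majority class with convex-combination oversampling of the minority class (derived from Sigma Nearest Oversampling based on Convex Combination). The Python sketch below is only an illustrative approximation of that general hybrid scheme, not the authors' reference implementation; the function names, the k-means and nearest-neighbour choices, and the 50/50 balancing target are assumptions for illustration.

# Illustrative sketch (assumed, not the authors' code) of a hybrid resampling
# scheme in the spirit of HCBST: cluster-based undersampling of the majority
# class plus convex-combination oversampling of the minority class.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def cluster_undersample(X_maj, n_keep, n_clusters=10, random_state=0):
    # Group the majority class into k-means clusters and draw examples evenly
    # from each cluster so the retained subset still covers the majority region.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X_maj)
    rng = np.random.default_rng(random_state)
    per_cluster = max(1, n_keep // n_clusters)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        kept.append(rng.choice(idx, size=min(per_cluster, idx.size), replace=False))
    return X_maj[np.concatenate(kept)]


def convex_oversample(X_min, n_new, k=5, random_state=0):
    # Create synthetic minority points as convex combinations of a seed point
    # and one of its k nearest minority neighbours (a SMOTE-like step).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neigh = nn.kneighbors(X_min)
    rng = np.random.default_rng(random_state)
    seeds = rng.integers(0, X_min.shape[0], size=n_new)
    partners = neigh[seeds, rng.integers(1, k + 1, size=n_new)]
    lam = rng.random((n_new, 1))
    return X_min[seeds] + lam * (X_min[partners] - X_min[seeds])


def hybrid_resample(X, y, minority_label=1, random_state=0):
    # Balance a binary dataset (labels 0/1 assumed): shrink the majority class
    # and grow the minority class toward a roughly equal split.
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    target = (len(X_min) + len(X_maj)) // 2
    X_maj_u = cluster_undersample(X_maj, n_keep=target, random_state=random_state)
    X_min_o = np.vstack([X_min,
                         convex_oversample(X_min, n_new=target - len(X_min),
                                           random_state=random_state)])
    X_new = np.vstack([X_maj_u, X_min_o])
    y_new = np.concatenate([np.full(len(X_maj_u), 1 - minority_label),
                            np.full(len(X_min_o), minority_label)])
    return X_new, y_new

Under these assumptions, a classifier would be trained on the output of hybrid_resample(X_train, y_train) and evaluated on an untouched test split, mirroring how resampling methods are typically compared across classifiers such as those listed in the abstract.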

