Digging Deeper With Machine Learning for Unbalanced Multimedia Data Categorization

Abstract

Classifying unbalanced data is an important area of research because many real-world data sets have skewed class distributions, in which most data instances (examples) belong to one class and considerably fewer instances belong to the others. In many applications, the minority instances (e.g., fraud in banking operations or abnormal cells in medical data) represent the concept of interest. Yet a classifier induced from an imbalanced data set tends to be biased towards the majority class and often achieves very poor classification accuracy on the minority class. Despite substantial research efforts, unbalanced data classification, particularly for multimedia data, remains one of the most difficult problems in data mining and machine learning. In this research, we present an extended deep learning strategy to address this difficulty, and it obtains encouraging results in classifying skewed multimedia data sets. In particular, we conduct an in-depth empirical study of combining convolutional neural networks (CNNs), a state-of-the-art deep learning technique, with bootstrapping techniques. Because deep learning techniques such as CNNs are typically computationally expensive, we propose feeding low-level features to the CNNs. We show that this approach saves a significant amount of training time while still producing promising results. The experimental results demonstrate the effectiveness of our methodology in categorizing highly unbalanced data from the TRECVID data set.