Comparison of algorithms for building a cluster model based on a dataset obtained from Big Data
DOI: 10.31673/2412-9070.2025.013274
Abstract
MeanShift is a popular clustering algorithm used in a wide range of machine learning applications. Its major drawback is speed: a single iteration takes time quadratic in the number of points. We enhance MeanShift with a mode-merging method and justify this approach by showing that it admits a probabilistic clustering interpretation based on the affinity of kernel density weights. This integration also optimizes the weight kernels and allows kernel sizes to vary according to the local structure of the data. As a result, we achieve a significant speed improvement. Unlike classical MeanShift, the combined approach runs in time that is linear in the number of points and exponential in the dimensionality of the data. The aim of this article is to give an overview of how mean-shift clustering can be applied to model building and to highlight the advantages of a non-classical approach to mean-shift methodology over traditional methods. We attempt to build a generalized list of crypto transactions in order to provide users with risk analytics for a crypto wallet or an individual crypto transaction. We also compare the influence of different parameters and functions on cluster composition. The proposed method reduces computational cost while keeping clustering accuracy at an acceptable level, similar to that of the standard mean-shift procedure. We demonstrate the method's effectiveness on a sequence of vectors that are non-constant and change over time. This experiment shows that the mean-shift values obtained with our distance calculation method outperform those obtained with classical methods on non-obvious and unstructured data values. To clarify the relationships between clusters and improve sorting accuracy, we used parameters such as market capitalization and other fiat indicators, which can also be applied in future studies.
Keywords: clustering, machine learning, Big Data, blockchain, crypto transfer, Mean Shift Clustering.
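To make the cost argument in the abstract concrete, the following is a minimal sketch of the classical mean-shift update, not the accelerated method proposed in the article. It assumes a Gaussian kernel with a single fixed bandwidth. The function names (mean_shift_step, mean_shift, merge_modes), the greedy merging rule, and the synthetic two-blob data are illustrative assumptions and do not come from the article. The pairwise distance matrix built in mean_shift_step is what makes one iteration quadratic in the number of points.

```python
import numpy as np


def mean_shift_step(points, data, bandwidth):
    """One classical mean-shift update with a Gaussian kernel (assumed here).

    Every point is compared with every data sample, so one call costs
    O(n * m * d); this is quadratic in n when points and data coincide.
    """
    # Pairwise squared distances, shape (n_points, n_data).
    diff = points[:, None, :] - data[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Gaussian kernel weights for every (point, sample) pair.
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Each point moves to the kernel-weighted mean of the data.
    return weights @ data / weights.sum(axis=1, keepdims=True)


def mean_shift(data, bandwidth, n_iter=50, tol=1e-5):
    """Iterate the update until the largest shift falls below tol."""
    points = data.copy()
    for _ in range(n_iter):
        shifted = mean_shift_step(points, data, bandwidth)
        largest_shift = np.max(np.linalg.norm(shifted - points, axis=1))
        points = shifted
        if largest_shift < tol:
            break
    return points


def merge_modes(points, merge_radius):
    """Greedy mode merging (illustrative only, not the article's method).

    A converged point joins the first mode closer than merge_radius;
    otherwise it starts a new mode.
    """
    modes = []
    labels = np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_radius:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


if __name__ == "__main__":
    # Synthetic, well-separated toy data (an assumption for illustration).
    rng = np.random.default_rng(0)
    data = np.vstack([
        rng.normal(0.0, 0.3, size=(50, 2)),
        rng.normal(5.0, 0.3, size=(50, 2)),
    ])
    converged = mean_shift(data, bandwidth=1.0)
    modes, labels = merge_modes(converged, merge_radius=0.5)
    print("number of modes:", len(modes))
```

The sketch keeps the bandwidth fixed for simplicity. The variable-sized kernels mentioned in the abstract would replace the scalar bandwidth with a per-sample value, which is not attempted here.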