COMP 5704: Parallel Algorithms and Applications in Data Science
School of Computer Science
Data generation is advancing massively and rapidly, and data can now be collected almost anywhere. Data warehouses store many types of data in widely varying volumes, such as sales and production records, satellite orbit paths, disease data, and data from many other disciplines. Data mining is a technology that combines data analysis methods with algorithms for processing massive data sets. It is also used to find and analyze new information in data that was previously examined with other methods. Clustering is one of the most popular methods for data analysis and is prevalent in many disciplines, including image segmentation, bioinformatics, pattern recognition, and statistics. K-means is the most popular and simplest clustering algorithm because of its easy implementation, efficiency, and empirical success. However, real-world applications produce huge volumes of data, so handling this data efficiently in mining tasks is a challenging and significant issue. MPI (Message Passing Interface), a message-passing programming model, offers high performance, scalability, and portability. In this project, I intend to compare the performance of centroid-based clustering algorithms, namely K-means, K-means++, and parallel K-means implemented with the Message Passing Interface (MPI).
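To make the three variants concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not the project's implementation. All function names (`sq_dist`, `kmeans_pp_init`, `partial_sums`, `kmeans`) and the `n_ranks` parameter are hypothetical. The parallel variant is only simulated: the data is split into `n_ranks` chunks, each chunk produces per-cluster partial sums and counts, and the partial results are added together, which is the step a real MPI program would typically perform with `MPI_Allreduce` and `MPI_SUM`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: draw each new centroid with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def partial_sums(chunk, centroids):
    """Work of one (simulated) rank: per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        # Assign the point to its nearest centroid (ties go to the lowest index).
        j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans(points, k, n_ranks=1, init="random", max_iters=100, seed=0):
    """Lloyd's K-means. n_ranks > 1 simulates the data-parallel variant."""
    rng = random.Random(seed)
    dim = len(points[0])
    if init == "k-means++":
        centroids = kmeans_pp_init(points, k, rng)
    else:
        centroids = rng.sample(points, k)
    # Assumption: a strided split stands in for distributing data across ranks.
    chunks = [points[r::n_ranks] for r in range(n_ranks)]
    for _ in range(max_iters):
        results = [partial_sums(chunk, centroids) for chunk in chunks]
        updated = []
        for j in range(k):
            # Stand-in for an all-reduce: add up the partial results of all ranks.
            count = sum(counts[j] for _, counts in results)
            if count == 0:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
            else:
                updated.append(tuple(
                    sum(sums[j][d] for sums, _ in results) / count
                    for d in range(dim)))
        if updated == centroids:  # no centroid moved, so the run has converged
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(sorted(kmeans(data, 2)))
    print(sorted(kmeans(data, 2, init="k-means++")))
    print(sorted(kmeans(data, 2, n_ranks=4)))
```

Only the assignment step is split across the simulated ranks; every rank needs the same centroids at the start of each iteration, so the amount of data exchanged per iteration depends on the number of clusters and dimensions rather than on the number of points. Timing the three calls on larger inputs would be one way to begin the comparison described above, although a single process cannot show real speedup.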
Startup Paper(s):
Deliverables:
Relevant References: