COMP 5704: Parallel Algorithms and Applications in Data Science


School of Computer Science
Carleton University, Ottawa, Canada


Project Title: Performance Comparison of Centroid Based Clustering Algorithms

Name: Aagyapal Kaur

E-Mail: aagyapalkaur@cmail.carleton.ca


Project Outline:

Data generation has advanced rapidly and at massive scale, and data can now be collected almost anywhere. Data warehouses store many types of data in widely varying volumes, such as sales and production data, satellite orbit paths, disease data, and other data from many disciplines. Data mining is a technology that combines data analysis methods with algorithms for processing massive data sets. It is also used to find and analyze new information in data that has previously been examined with other methods.

Clustering is one of the most popular methods for data analysis and is prevalent in many disciplines, such as image segmentation, bioinformatics, pattern recognition, and statistics. K-means is the most popular and simplest clustering algorithm because of its easy implementation, efficiency, and empirical success. However, real-world applications produce huge volumes of data, so handling these data efficiently in a mining task is a challenging and significant issue. MPI (Message Passing Interface), a message-passing programming model, offers high performance, scalability, and portability. In this project, I intend to compare the performance of centroid-based clustering algorithms: K-means, K-means++, and parallel K-means implemented with the Message Passing Interface (MPI).
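As a rough illustration of the three variants named in the outline, the Python sketch below shows random seeding (plain K-means), distance-weighted seeding (K-means++), and a Lloyd iteration in which each "worker" computes partial sums over its own chunk of the data. All function names here (`init_random`, `init_kmeanspp`, `partial_sums`, `kmeans`) are hypothetical, and the workers are simulated in a single process. In an actual MPI program, each chunk would presumably live on its own rank and the partial sums would be combined with a collective such as `MPI_Allreduce`; that mapping is an assumption, not the startup paper's confirmed design.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_random(points, k, rng):
    """Plain K-means seeding: pick k distinct points uniformly at random."""
    return rng.sample(points, k)

def init_kmeanspp(points, k, rng):
    """K-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest existing seed."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

def partial_sums(chunk, centroids):
    """Work done by one (simulated) worker: assign each local point to its
    nearest centroid and accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def kmeans(points, centroids, n_workers=1, max_iter=100):
    """Lloyd's algorithm. With n_workers > 1 the data is split into chunks
    and the partial results are added together, which stands in for the
    reduction step an MPI version would perform (assumption)."""
    centroids = [tuple(c) for c in centroids]
    chunks = [points[w::n_workers] for w in range(n_workers)]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        results = [partial_sums(c, centroids) for c in chunks]
        new = []
        for j in range(k):
            n = sum(counts[j] for _, counts in results)
            if n == 0:
                new.append(centroids[j])  # keep an empty cluster's centroid
            else:
                new.append(tuple(
                    sum(sums[j][d] for sums, _ in results) / n
                    for d in range(dim)))
        if new == centroids:  # no centroid moved: converged
            break
        centroids = new
    return centroids
```

Calling `kmeans(points, init_kmeanspp(points, k, rng), n_workers=4)` and `kmeans(points, init_random(points, k, rng))` on the same data gives two of the configurations to be timed; the seeding functions and the worker count are the only things that change between variants.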

Startup Paper(s):

  • A Parallel K-means Clustering Algorithm with MPI
Deliverables:

Relevant References: