Abstract
Mining data items plays a crucial role today, as the world has to manage huge volumes of data. Existing parallel mining algorithms for frequent itemsets lack mechanisms such as automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. We therefore design FiDoop, a parallel frequent itemset mining algorithm that uses Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). FiDoop is designed to achieve compressed storage and to avoid building conditional pattern bases. We implement FiDoop on an in-house Hadoop cluster and show that its performance on the cluster is sensitive to data distribution and dimensionality, because itemsets of different lengths have different decomposition and construction costs. BIRCH also runs fast, scans the whole dataset only once, handles outliers well, and is superior to other algorithms in stability and scalability. To improve FiDoop's performance, we develop a workload balance metric that measures load balance across the cluster's computing nodes. We also develop FiDoop-HD, an extension of FiDoop that speeds up mining for high-dimensional data analysis. Extensive experiments on real-world celestial spectral data demonstrate that the proposed solution is efficient and scalable.
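The abstract does not define the workload balance metric, so the following is only an illustrative sketch of how such a metric could be computed. It assumes a normalized Shannon entropy over per-node workloads, where 1.0 means perfectly even load and values near 0 mean that one node carries almost all of the work. The function name `workload_balance` and the sample loads are hypothetical and are not taken from the paper.

```python
import math


def workload_balance(loads):
    """Normalized Shannon entropy of per-node workloads, in [0, 1].

    Assumption for illustration: `loads` holds one non-negative number
    per computing node (for example, the number of itemsets assigned).
    """
    n = len(loads)
    total = sum(loads)
    if n < 2 or total <= 0:
        # A single node, or no work at all, is treated as trivially balanced.
        return 1.0
    entropy = 0.0
    for load in loads:
        if load > 0:  # idle nodes contribute nothing to the entropy
            share = load / total
            entropy -= share * math.log(share)
    # Dividing by log(n), the maximum possible entropy, normalizes to [0, 1].
    return entropy / math.log(n)


# Hypothetical workloads for a four-node cluster.
balanced = workload_balance([100, 100, 100, 100])
skewed = workload_balance([370, 10, 10, 10])
print(f"balanced: {balanced:.3f}, skewed: {skewed:.3f}")
```

Under this assumed definition, a scheduler could compare the metric before and after repartitioning the itemsets to check whether the load became more even.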