Notion of cluster centers and cluster comparison in Density Based Algorithms

2017-07-17 11:41:20

I have done some research on clustering algorithms, since my goal is to cluster noisy data and identify outliers or small clusters as anomalies. I consider my data noisy because my main features can take widely varying values. Therefore, I have focused on density-based algorithms, with a fair amount of success.

However, I am unable to grasp how clusters can be compared in such algorithms, since the notion of a cluster center cannot be properly defined.

My dataset consists of network flows, and I split it into subsets based on an identifier. After clustering each subset, I want to compare the clusters created in each subset so that I can, in turn, compare the subsets themselves.
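
To make the setup concrete, here is a minimal sketch of the pipeline described above. It is only an illustration under assumptions: the flow records, the identifier values `"a"` and `"b"`, the `eps` and `min_samples` settings, and the small hand-written DBSCAN are all hypothetical stand-ins, not the actual data or implementation. It shows where the difficulty arises: each subset gets its own cluster labels, and those labels carry no meaning across subsets.

```python
from collections import defaultdict
from math import dist


def dbscan(points, eps, min_samples):
    """Tiny illustrative DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Neighborhoods include the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        k = 0
        while k < len(queue):
            j = queue[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def cluster_by_subset(flows, eps, min_samples):
    """Split (identifier, feature_vector) records by identifier, cluster each subset."""
    subsets = defaultdict(list)
    for ident, vec in flows:
        subsets[ident].append(vec)
    return {
        ident: (pts, dbscan(pts, eps, min_samples))
        for ident, pts in subsets.items()
    }


def summarize(labels):
    """Per-subset summary: number of clusters, their sizes, and the noise count."""
    sizes = defaultdict(int)
    for lab in labels:
        sizes[lab] += 1
    noise = sizes.pop(-1, 0)
    return {
        "n_clusters": len(sizes),
        "cluster_sizes": sorted(sizes.values(), reverse=True),
        "noise": noise,
    }


# Hypothetical toy flows: (identifier, two made-up numeric features).
FLOWS = [
    ("a", (0.0, 0.0)), ("a", (0.1, 0.0)), ("a", (0.0, 0.1)), ("a", (5.0, 5.0)),
    ("b", (1.0, 1.0)), ("b", (1.1, 1.0)), ("b", (1.0, 1.1)), ("b", (9.0, 9.0)),
]

RESULTS = cluster_by_subset(FLOWS, eps=0.5, min_samples=3)
SUMMARIES = {ident: summarize(labels) for ident, (_, labels) in RESULTS.items()}
```

Cluster `0` in subset `"a"` and cluster `0` in subset `"b"` are unrelated: the labels only record the order in which each run discovered its clusters. The summaries are comparable across subsets, but they say nothing about where the clusters lie in feature space, which is the part I do not know how to compare.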

I would appreciate some help from data science gurus on how to approach cluster comparison, or the notion of a cluster center, in such algorithms.

Thanks all!