
Classification Vs Clustering

I invariably confuse the two, mostly because of the classification of living beings (animals, insects, human beings and so on) that we were taught in primary science and biology classes. Clustering, on the other hand, I have always seen from the marketing perspective of segmentation. Do both simply mean the same kind of grouping in one form or another? Absolutely not.

Say we are given a population. When we find similar groups within this population, we are dividing the population into clusters. The similarity is based on a few characteristics or parameters, say age or disease, which might lead us to groups such as young, middle-aged and old. Before the clustering process starts, we do not know which characteristics will turn out to be decisive in finding the DIFFERENT SIMILAR GROUPS. There may also be a few outliers left over whose characteristics do not align with any group.
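To make the idea concrete, here is a minimal sketch of clustering on a single characteristic (age). It uses a very simple technique, splitting the sorted values wherever the gap between neighbours is large, rather than any particular library algorithm. The ages, the `max_gap` threshold and the function name `cluster_by_gap` are all made up for illustration.

```python
def cluster_by_gap(values, max_gap):
    """Group 1-D values: sort them, then start a new cluster whenever
    the gap to the previous value exceeds max_gap (an assumed threshold)."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > max_gap:
            clusters.append([value])      # gap too large: new cluster
        else:
            clusters[-1].append(value)    # close enough: same cluster
    return clusters


# Hypothetical population: only ages, no labels given up front.
ages = [21, 23, 24, 45, 47, 48, 50, 71, 73, 75, 99]

clusters = cluster_by_gap(ages, max_gap=5)

# Treat single-member clusters as outliers that fit no group.
groups = [c for c in clusters if len(c) > 1]
outliers = [c[0] for c in clusters if len(c) == 1]

print(groups)
print(outliers)
```

Note that nothing in the input says "young" or "old". The groups, and how many of them there are, come out of the data, and the value that sits far from everyone else is left over as an outlier.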

In classification the grouping is already done. We only need to determine which group a particular element of the population falls into. We start with a training set of data in which the group of each element is already given. From this dataset we work out the pattern, or the group/class function, which then tells us, given a new population, which class each of its members belongs to.
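A minimal sketch of that idea, again on age alone: the "class function" here is a nearest-mean rule, which is just one simple choice among many. The training data, the labels and the function names `train` and `classify` are assumptions made for illustration.

```python
from collections import defaultdict


def train(examples):
    """examples: list of (age, label) pairs with known labels.
    Returns a model mapping each label to the mean age of its examples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for age, label in examples:
        sums[label] += age
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(model, age):
    """Assign the label whose mean age is closest to the given age."""
    return min(model, key=lambda label: abs(model[label] - age))


# Hypothetical training set: the groups are already named.
training_set = [
    (21, "young"), (23, "young"), (24, "young"),
    (45, "middle-aged"), (48, "middle-aged"), (50, "middle-aged"),
    (71, "old"), (73, "old"), (75, "old"),
]

model = train(training_set)

# New members of the population, to be placed into the known classes.
for age in (30, 52, 65):
    print(age, classify(model, age))
```

The set of possible answers is fixed by the training set. A new person can only ever land in one of the three classes we started with.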

The figure below is a good illustration of a classification problem. We just need to decide whether a given book goes into the algorithms box, the science box or the philosophy box. We sort the books into known classes. These classes/groups, and how many of them there are, are known in advance (supervised learning).


In clustering, we simply have all the books at our disposal, and from their contents we work out what the various groups/classes of books in the library are. Only then do we label the boxes on the rack. The number of groups/classes is unknown when we start (unsupervised learning).

Another example is news items. Deciding whether an article falls into the Economy, Marketing or Technology news category, and so on, is classification. Finding all the categories under which news items fall is clustering. Likewise, deciding whether a mail is spam or not is classification, while grouping mails under different labels, one of which may turn out to be a spam label, is clustering.

So is clustering always done before classification?
