Saturday, July 16, 2016

Classification with k nearest neighbors

One of the simplest machine learning algorithms is the k-nearest neighbors (kNN) algorithm. It is simple both to understand and to implement. Let's implement it using Erlang.

Overview:

We have existing pieces of data whose labels or values we already know. For example, we have movies in a database and we know which categories they fall into: Terminator is an action movie, Back to the Future is science fiction, and so on. Now suppose a new movie is released; we can use the k-nearest neighbors algorithm to classify it. We measure features of the existing movies whose classification we know, such as the number of action sequences, comedy sequences, and so on. Then we calculate the distance from the new movie to each existing movie and label the new movie according to its k nearest neighbors.

Table 2.1. Movies with the number of action sequences and science fiction sequences shown for each movie, along with our assessment of the movie type
Movie title          # of action sequences   # of science fiction sequences   Type of movie
Terminator           20                      20                               Science Fiction/Action
Back to the Future   5                       15                               Science Fiction/Action
?                    15                      25                               ?
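As a starting point in Erlang, the rows of Table 2.1 can be written down as plain data. This is a minimal sketch: the module name `movie_data`, the `{Title, Features, Label}` tuple layout, and storing the labels as atoms are my own assumptions for illustration, not something the algorithm requires.

```erlang
%% Hypothetical module: the post does not name one.
-module(movie_data).
-export([training_set/0, unknown_movie/0]).

%% Each labeled movie is {Title, Features, Label}, where Features is
%% [ActionSequences, SciFiSequences] taken from Table 2.1.
training_set() ->
    [{"Terminator",         [20, 20], 'Science Fiction/Action'},
     {"Back to the Future", [5, 15],  'Science Fiction/Action'}].

%% The movie we want to classify: its features are known, its label is not.
unknown_movie() ->
    [15, 25].
```

Keeping the features in a list rather than a fixed-size tuple means the same code works if more features (comedy sequences, for instance) are added later.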
We don’t know what type of movie the question mark movie is, but we have a way of figuring it out. First, we calculate the distance from the unknown movie to every other movie, using the Euclidean distance between their feature values. The distances, along with the Erlang expression used to compute each one, are shown in Table 2.2.
Table 2.2. Distances between each movie and the unknown movie
Movie title          Distance to movie “?”   Erlang expression
Terminator           7.0711                  math:sqrt(math:pow(20-15, 2) + math:pow(20-25, 2))
Back to the Future   14.1421                 math:sqrt(math:pow(5-15, 2) + math:pow(15-25, 2))
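The distances in Table 2.2 are Euclidean distances. A small sketch that generalizes the hand-written `math:sqrt`/`math:pow` expressions to feature lists of any length might look like this; the module name `knn_distance` and the `{Title, Features, Label}` example format are hypothetical choices for illustration.

```erlang
%% Hypothetical module name, chosen for this sketch.
-module(knn_distance).
-export([euclidean/2, distances/2]).

%% Euclidean distance between two feature lists of equal length:
%% the square root of the sum of squared per-feature differences.
euclidean(A, B) when length(A) =:= length(B) ->
    SquaredDiffs = lists:zipwith(fun(X, Y) -> (X - Y) * (X - Y) end, A, B),
    math:sqrt(lists:sum(SquaredDiffs)).

%% Distance from Query to every labeled example {Title, Features, Label}.
%% Returns a list of {Distance, Title} pairs in the input order.
distances(Query, Examples) ->
    [{euclidean(Query, Features), Title} || {Title, Features, _Label} <- Examples].
```

For example, `knn_distance:euclidean([20, 20], [15, 25])` gives about 7.0711, matching the Terminator row above.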
Now that we have all the distances to our unknown movie, we need to find the k nearest movies by sorting the distances in increasing order. Let’s assume k=2. Then the two closest movies are Terminator and Back to the Future. The kNN algorithm says to take the majority vote from these two movies to determine the class of the mystery movie. Because both movies are labeled Science Fiction/Action, we predict that the mystery movie is a science fiction/action movie.
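Putting the steps together, here is a minimal sketch of the whole classifier: compute every distance, sort in increasing order, keep the first K, and take a majority vote. The module name `knn`, the function `classify/3`, and the `{Title, Features, Label}` input format are hypothetical; `maps:update_with/4` assumes OTP 19 or later.

```erlang
%% Hypothetical module: a sketch, not a definitive implementation.
-module(knn).
-export([classify/3]).

%% Classify Query by majority vote among its K nearest labeled examples.
%% Examples is a non-empty list of {Title, Features, Label}.
classify(Query, Examples, K) when K > 0 ->
    %% 1. Distance from the query to every example, paired with its label.
    Scored = [{euclidean(Query, F), Label} || {_Title, F, Label} <- Examples],
    %% 2. Sort by distance in increasing order (nearest first), keep K.
    Nearest = lists:sublist(lists:keysort(1, Scored), K),
    %% 3. Count the votes for each label among the K nearest.
    Votes = lists:foldl(fun({_D, L}, Acc) ->
                                maps:update_with(L, fun(N) -> N + 1 end, 1, Acc)
                        end, #{}, Nearest),
    %% 4. The label with the highest vote count wins.
    {Winner, _Count} = lists:last(lists:keysort(2, maps:to_list(Votes))),
    Winner.

%% Euclidean distance between two equal-length feature lists.
euclidean(A, B) ->
    math:sqrt(lists:sum(lists:zipwith(fun(X, Y) -> (X - Y) * (X - Y) end, A, B))).
```

For the movies above, `knn:classify([15, 25], Movies, 2)`, with `Movies` holding the two rows of Table 2.1, returns `'Science Fiction/Action'`. Ties in the vote are not handled specially in this sketch; which tied label wins depends on the order returned by `maps:to_list/1`.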