One of the simplest machine learning algorithms is the k-nearest neighbors (kNN) algorithm. It is easy to understand and to implement. Let's implement it in Erlang.
Overview:
We have existing pieces of data whose labels or values we already know. For example, we have movies in a database and we know which categories they fall into: Terminator as an action movie, Back to the Future as science fiction, and so on. Now suppose a new movie has been released. We can use the k-nearest neighbors algorithm to classify it. We measure features of the existing movies whose classification we know, such as the number of action sequences, comedy sequences, and so on. Then we calculate the distance from the new movie to each existing movie and label the new movie according to its k nearest neighbors.
Table 2.1. Movies with the number of action sequences and number of science fiction sequences shown for each movie, along with our assessment of the movie type
Movie title | # of action sequences | # of science fiction sequences | Type of movie |
---|---|---|---|
Terminator | 20 | 20 | Science Fiction/Action |
Back to the Future | 5 | 15 | Science Fiction/Action |
? | 15 | 25 | ? |
We don’t know what type of movie the question mark movie is, but we have a way of figuring that out. First, we calculate the
distance to all the other movies. I’ve calculated the distances and shown those in table 2.2. (Don’t worry about how I did these calculations right now. We’ll get into that in a few minutes.)
Table 2.2. Distances between each movie and the unknown movie
Movie title | Distance to movie “?” |
---|---|
Terminator | 7.0711, from `math:sqrt(math:pow(20-15, 2) + math:pow(20-25, 2))` |
Back to the Future | 14.1421, from `math:sqrt(math:pow(5-15, 2) + math:pow(15-25, 2))` |
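The distance calculations in the table can be wrapped in a small function. This is only a sketch: the module name `knn_distance` and the function `distance/2` are names I made up for illustration, and it assumes each movie is represented as a list of numeric features of the same length.

```erlang
-module(knn_distance).
-export([distance/2]).

%% Euclidean distance between two feature vectors.
%% Assumption: both vectors are lists of numbers with the same length,
%% e.g. [ActionSequences, SciFiSequences].
distance(A, B) when length(A) =:= length(B) ->
    Squares = [math:pow(X - Y, 2) || {X, Y} <- lists:zip(A, B)],
    math:sqrt(lists:sum(Squares)).
```

Calling `knn_distance:distance([20, 20], [15, 25])` reproduces the Terminator row, and `knn_distance:distance([5, 15], [15, 25])` reproduces the Back to the Future row.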
Now that we have all the distances to our unknown movie, we need to find the k nearest movies by sorting the distances in increasing order. Let’s assume k=2. Then the two closest movies are Terminator and Back to the Future. The kNN algorithm says to take the majority vote from these two movies to determine the class of the mystery movie. Because both movies are Science Fiction/Action, we forecast that the mystery movie is a science fiction/action movie.
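Putting the steps together (compute distances, sort them in increasing order, keep the first k, take a majority vote), a minimal sketch in Erlang might look like the following. The module name `knn`, the function `classify/3`, and the `{Features, Label}` tuple layout are assumptions made for this example, not a fixed API. Ties in the vote are broken arbitrarily here.

```erlang
-module(knn).
-export([classify/3]).

%% Classify Query by majority vote among its K nearest neighbours.
%% Assumption: Training is a non-empty list of {Features, Label} tuples,
%% where Features is a list of numbers with the same length as Query.
classify(Query, Training, K) when is_integer(K), K > 0, Training =/= [] ->
    %% 1. Distance from the query to every labelled example.
    Distances = [{distance(Query, Features), Label}
                 || {Features, Label} <- Training],
    %% 2. Sort by distance, smallest first, and keep the K nearest.
    Nearest = lists:sublist(lists:sort(Distances), K),
    %% 3. Count how many of the K neighbours carry each label.
    Counts = lists:foldl(
               fun({_Dist, Label}, Acc) ->
                       maps:update_with(Label, fun(N) -> N + 1 end, 1, Acc)
               end, #{}, Nearest),
    %% 4. Pick a label with the highest count (ties broken arbitrarily).
    [{Winner, _Count} | _] =
        lists:reverse(lists:keysort(2, maps:to_list(Counts))),
    Winner.

%% Euclidean distance between two equal-length feature lists.
distance(A, B) ->
    math:sqrt(lists:sum([math:pow(X - Y, 2) || {X, Y} <- lists:zip(A, B)])).
```

With the movies from table 2.1 encoded as `[{[20, 20], sci_fi_action}, {[5, 15], sci_fi_action}]`, calling `knn:classify([15, 25], Movies, 2)` returns `sci_fi_action`, matching the forecast above. The atom `sci_fi_action` is just a label chosen for this example.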