Hi, all you nice people out there. This is my first post, and I'm looking forward to hearing from you and connecting. I'm a lecturer in Dublin and would like to share some of the work I'm doing in my machine learning course. The full blog post can be found here.
My first job was as an automation tester and later a software developer in .NET, so I learned about the pleasures (sarcasm!!!) of C#, VB.NET, etc. Then I went to a research institute, where my first task was to migrate Perl code to Python (do you see where I'm going with this?). For my PhD I used lots of R for the stats part, but also Python again, and I kinda started to really like programming languages. So it went on with Haskell and Prolog (well, declarative languages sounded really cool to me), until I found Ruby one day. It was love at first sight, and I started to think in Ruby every time I wrote code. Now, we all know that you should think about a problem first and then pick the language best suited to solve it, but somehow Ruby really feels like home to me.
After all these adventures, I finally ended up as a lecturer in Dublin (oh, did I mention that I was born in Croatia and grew up in Austria?). I'm very passionate about teaching and working with students, but also about doing research and tackling new and exciting problems every day. My main courses are machine learning and the semantic web, so I try to teach my students through code in both, and what better language to demonstrate things than Ruby? Here is a k-nearest neighbours (k-NN) classifier, for example:
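```ruby
# Assumes the daru gem's DataFrame (the calls below match its API), a frame whose
# last column is the target feature, and a method named after distance_metric
# (e.g. euclidean) defined elsewhere.
require 'daru'

def knn(data_frame, k, query, distance_metric: :euclidean)
  data_frame = data_frame.dup # work on a copy so repeated calls don't stack Distance columns
  distances = []
  data_frame.each :row do |x|
    # Compare every feature except the last one (the target) with the query
    distances << send(distance_metric, x.to_a[0...-1], query)
  end
  data_frame[:Distance] = distances
  data_frame.sort!([:Distance])
  # With :Distance appended, the target is now the second-to-last column
  data_frame.first(k)[-2].mode
end
```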
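The snippet calls `send(distance_metric, x.to_a[0...-1], query)`, so a method named after the symbol (by default `euclidean`) has to exist somewhere; the post doesn't show it. Here is a minimal sketch of what such methods could look like, assuming each one takes two equal-length arrays of numbers. `manhattan` is a hypothetical extra metric, not part of the original code.

```ruby
# Hypothetical metric methods that `send(distance_metric, a, b)` could dispatch to.
# Assumes both arguments are equal-length arrays of numbers.

# Straight-line distance: square root of the summed squared differences.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# City-block distance: sum of absolute differences (hypothetical extra metric).
def manhattan(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# `send` looks the method up by name, so the metric can be swapped with a symbol.
puts send(:euclidean, [0, 0], [3, 4])  # prints 5.0
puts send(:manhattan, [0, 0], [3, 4])  # prints 7
```

Passing a symbol and dispatching with `send` keeps `knn` open to new metrics: defining another two-argument method is enough, e.g. `knn(df, 3, query, distance_metric: :manhattan)`.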
Now, you may or may not know a thing or two about machine learning, but even if you don't, it's easy to see how this works. You have some data in a `data_frame`, a `k` that defines how many neighbours to look at, a `query` (the instance whose target value you want to predict), and a distance metric. We go through the dataset, compare each instance to the query using the metric, and add the results as a new column to our frame.
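To see the "compare each instance and add a new column" step without a data-frame library, here is a sketch that treats the dataset as a plain array of rows with the target in the last position. `row[0...-1]` mirrors `x.to_a[0...-1]` in the snippet: it drops the target so only the features are measured. The toy data and the `add_distance_column` name are hypothetical, and unlike the daru version this returns new rows instead of modifying the input.

```ruby
# Hypothetical toy dataset: each row is [feature1, feature2, target].
rows = [
  [1.0, 1.0, "red"],
  [2.0, 2.0, "red"],
  [8.0, 9.0, "blue"]
]
query = [1.5, 1.5]

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Returns new rows with the distance to the query appended as a last column.
# row[0...-1] drops the target so only the features are compared.
def add_distance_column(rows, query)
  rows.map { |row| row + [euclidean(row[0...-1], query)] }
end

with_distance = add_distance_column(rows, query)
with_distance.each { |row| p row }
```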
Once this is done, all we have to do is sort the data by distance and return the mode (the most frequent value) of the target feature (the values we want to predict) among the `k` nearest neighbours. Voilà!
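The final step (sort by distance, keep the `k` closest rows, return the most frequent target) can be sketched in plain Ruby too. Each row is assumed to look like `[features..., target, distance]`, which mirrors the snippet where `[-2]` picks the target column once `:Distance` has been appended. The distances in this toy data are made up for illustration, `predict_from_distances` is a hypothetical name, and `tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical rows that already carry a distance: [feature1, feature2, target, distance].
rows = [
  [1.0, 1.0, "red",  0.7],
  [2.0, 2.0, "red",  0.9],
  [5.0, 5.0, "blue", 4.9],
  [8.0, 9.0, "blue", 9.9]
]

# Sort ascending by distance, keep the k nearest rows,
# and return the mode (most frequent value) of their targets.
def predict_from_distances(rows, k)
  nearest = rows.sort_by { |row| row[-1] }.first(k)
  targets = nearest.map { |row| row[-2] }
  # tally counts each value; max_by picks the most frequent one.
  # On a tie, max_by keeps the first maximum it meets, i.e. the tied value
  # seen earliest in distance order.
  targets.tally.max_by { |_value, count| count }.first
end

puts predict_from_distances(rows, 3)  # prints red
```

An odd `k` avoids ties in two-class problems; with `k = 4` above, "red" and "blue" tie at two votes each and the tie-break described in the comment decides.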
I hope this little intro and code demo explain my approach a bit; please let me know what you think! You can also check out the whole repository, which is still a work in progress but growing quickly. I'm also planning to do some fancy visualisations, so stay tuned, and I hope to hear from you!