It has been almost exactly a month since my last post regarding the new project I am working on, so I figure it is time for an update. First off, I was excited and encouraged by the responses I received via Twitter after my initial posting. One response in particular mentioned the related work that @silviocesare is doing with the SimSeer project, as well as a book he co-authored, "Software Similarity and Classification". Both appear to be excellent resources, and I plan to check them out in more detail as time allows.
Admittedly, it has been a while since I've applied math concepts like this; in my head I was secretly hoping for something more along the lines of this:
All joking aside though, if you are a self-paced learner, this is a great resource that is being made available for free. It is most definitely worth checking out what they have to offer.
The course uses the software package Octave (similar to MATLAB) to program solutions to exercises. The Octave language gives you command-line input and some pretty impressive graphics capabilities for plotting and modeling your data.
In general, tools and languages such as R and Octave would likely be used to rapidly prototype your machine learning theories against your data sets. They are great for visualizing and manipulating your data, and for quickly testing your hypotheses. However, once you are satisfied with the output of your learning algorithm, you will likely want to implement the solution in a more efficient language such as C or Java for use in your production environments. I don't know at this point where to draw that particular line in the sand, but it is something to keep in mind as you work towards your goals.
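To give a feel for what "rapid prototyping" means here, below is a minimal sketch of the kind of quick experiment I have in mind. It is written in plain Python rather than Octave or R, and everything in it is an assumption for illustration: the function names `fit_line` and `predict` are hypothetical, and the data points are made up. It simply fits a straight line to a handful of points with ordinary least squares, which is roughly the smallest "learning algorithm" you could test a hypothesis with.

```python
# Toy prototype: fit y = slope * x + intercept by ordinary least squares.
# The names and the data below are made up purely for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
print("slope, intercept:", model)
print("prediction for x = 5:", predict(model, 5.0))
```

Once an experiment like this behaves the way you expect on real data, the same few lines of arithmetic are what you would port to C or Java for production use.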
I am trying to balance this bootstrapping type of learning with my normal daily duties here at work, and there have already been times when I've had to put this stuff down to deal with the influx of "real work". Still, I'm quite excited about the things I'm picking up already, and I'm itching to get my hands dirty. My hope is that by the end of the course I will know enough to be dangerous and can start publishing some of my initial results right here. Stay tuned...