Simple recommendation system written in Ruby
Nothing fancy, just a simple tag/word-based recommendation algorithm implemented in Ruby.
Programming Collective Intelligence is an excellent book for learning this sort of thing. Its second chapter, "Making Recommendations", builds a recommendation engine! :)
I did something very similar in the past, except I used cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity). It let me give each tag a "weight", and when comparing two tag clouds, any tag missing from one of them was zeroed out. It works really well.
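For anyone curious what the cosine version looks like, here is a minimal sketch (not the commenter's actual code). It assumes each tag cloud is a Ruby Hash of tag => weight; the tags and weights below are made up. A tag present in only one cloud contributes 0 to the dot product, which is the "zero out" behaviour described above.

```ruby
# Cosine similarity between two weighted tag clouds (Hash of tag => weight).
def cosine_similarity(a, b)
  # Tags missing from b count as weight 0, so they drop out of the dot product.
  dot = a.sum { |tag, w| w * b.fetch(tag, 0) }
  norm_a = Math.sqrt(a.values.sum { |w| w * w })
  norm_b = Math.sqrt(b.values.sum { |w| w * w })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Hypothetical tag clouds with hand-picked weights.
article = { "ruby" => 3.0, "rails" => 2.0, "redis" => 1.0 }
other   = { "ruby" => 1.0, "sinatra" => 2.0 }
puts cosine_similarity(article, other).round(3) # prints 0.359
```

Unlike plain set overlap, a heavily weighted shared tag pulls the score up more than a lightly weighted one.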
Good stuff, and a nice writeup/explanation!
To impudently hijack the thread: for a very similar approach (Jaccard similarity coefficient, Ruby) with a nicely abstracted implementation for background workers, take a look at David Celis's 'recommendable'. Here he introduces the same system: http://davidcel.is/blog/2012/02/07/collaborative-filtering-w... and here is the gem itself: http://davidcel.is/recommendable/ I believe it's been discussed on HN before.
Redis is used to store the binary votes and to compute similarity coefficients. Since Redis is very good at set operations (intersections on sets with millions of members are crazy fast), it's a natural choice for the database backend. This is one of the cases where a NoSQL solution really does seem to be the right tool for the job!
I've used recommendable in the past (including in production code). It works very well, it's reliable and robust, and it's easy to hack for whatever you need. For example, it's meant to integrate with Rails, but it's quite simple to make it work in bare-bones Ruby, with something like Sinatra as a lightweight web app exposing the voting functionality.
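To make the Jaccard idea concrete, here is a small in-memory sketch. It is not recommendable's API: a plain Ruby `Set` stands in for the Redis sets, and the intersection/union that Redis would compute server-side (e.g. with SINTER/SUNION) are done locally. The catalog and the `recommend` helper are hypothetical.

```ruby
require "set"

# Jaccard similarity: (intersection size) / (union size) of two sets.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Hypothetical helper: rank catalog items by tag overlap with a target tag set.
def recommend(target_tags, catalog, limit: 3)
  catalog
    .map { |name, tags| [name, jaccard(target_tags, tags)] }
    .reject { |_, score| score.zero? }           # drop items with no overlap
    .sort_by { |name, score| [-score, name] }    # best first, ties by name
    .first(limit)
end

catalog = {
  "sinatra-guide" => Set["ruby", "sinatra", "web"],
  "redis-intro"   => Set["redis", "nosql", "sets"],
  "rails-recs"    => Set["ruby", "rails", "recommendations"]
}
p recommend(Set["ruby", "recommendations", "redis"], catalog)
# prints [["rails-recs", 0.5], ["redis-intro", 0.2], ["sinatra-guide", 0.2]]
```

The same formula works whether the sets hold tags of an item or the ids of users who liked it; only the data you put in the sets changes.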
You could use Levenshtein distance to get better results by taking word variations into account.
You can also enhance it by using semantic similarity scores for strings.
Here is a cool approach to the subject by LinkedIn: http://engineering.linkedin.com/open-source/cleo-open-source...
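A rough sketch of what that could look like: a standard dynamic-programming Levenshtein distance, plus a hypothetical `fuzzy_tag_overlap` helper that treats two tags as matching when they are at most `max_distance` edits apart. The threshold of 1 is an assumption; you'd tune it for your data.

```ruby
# Levenshtein (edit) distance via dynamic programming, keeping two rows.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,           # deletion
               curr[j - 1] + 1,       # insertion
               prev[j - 1] + cost].min # substitution
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: count tags in a_tags that have a near match in b_tags.
def fuzzy_tag_overlap(a_tags, b_tags, max_distance: 1)
  a_tags.count { |t| b_tags.any? { |u| levenshtein(t, u) <= max_distance } }
end

puts levenshtein("kitten", "sitting") # prints 3
```

With this, "recommendation" and "recommendations" count as the same tag, which an exact set intersection would miss.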
Here is my HN-obligatory, self-written golang version: https://github.com/jamra/gocleo
I went with this author's approach of using Jaccard to rank the results; however, I like this approach better: https://neil.fraser.name/writing/patch/ It basically takes the distance from the beginning of the text into account.
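One loose way to combine the two ideas, sketched here as my own interpretation and not the algorithm from the linked article: score by Jaccard overlap of words, then discount the score by how far into the text the first query word appears. `position_aware_score` and `distance_weight` are hypothetical names, and the tokenizer is a naive whitespace split (no punctuation handling).

```ruby
require "set"

# Hypothetical scoring: Jaccard overlap, discounted when the first matching
# word appears later in the text (earlier matches rank higher).
def position_aware_score(query, text, distance_weight: 0.5)
  words = text.downcase.split
  q = query.downcase.split.to_set
  return 0.0 if words.empty? || q.empty?
  t = words.to_set
  jac = (q & t).size.to_f / (q | t).size
  first = words.index { |w| q.include?(w) }
  return 0.0 if first.nil?
  # Penalty grows linearly with the relative position of the first match.
  jac * (1.0 - distance_weight * first.to_f / words.size)
end

early = position_aware_score("redis", "redis sets are fast")
late  = position_aware_score("redis", "fast sets in redis")
puts early, late
```

Both texts have the same Jaccard score against the query, but the one that mentions "redis" first ranks higher.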
I recommend reading http://nlp.stanford.edu/IR-book/ on this topic.
If you want something that scales better, use MinHash. It gives you an approximation of Jaccard similarity with a lower memory and CPU footprint.
How well does this perform compared to your PostgreSQL example?
Also, we are looking for a Ruby/Backbone.js developer. Drop me an email at josh@seriousfox.co.uk :)
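A minimal MinHash sketch, assuming a seeded hash family built from MD5 of "seed:item" (a convenient choice for illustration, not a tuned one). Each set is reduced to a fixed-size signature of `num_hashes` integers, and the fraction of positions where two signatures agree estimates their Jaccard similarity. Comparing two items then costs O(num_hashes) no matter how large the original sets were.

```ruby
require "set"
require "digest/md5"

# Assumed hash family: first 8 bytes of MD5("seed:item") as an unsigned 64-bit int.
def seeded_hash(seed, item)
  Digest::MD5.digest("#{seed}:#{item}").unpack1("Q>")
end

# MinHash signature: for each seed, keep the smallest hash over the set.
def minhash_signature(set, num_hashes: 256)
  raise ArgumentError, "empty set" if set.empty?
  (0...num_hashes).map { |seed| set.map { |item| seeded_hash(seed, item) }.min }
end

# Fraction of agreeing positions estimates intersection size / union size.
def estimated_jaccard(sig_a, sig_b)
  sig_a.zip(sig_b).count { |x, y| x == y }.to_f / sig_a.size
end

a = (1..100).map { |i| "tag#{i}" }.to_set
b = (51..150).map { |i| "tag#{i}" }.to_set
exact = (a & b).size.to_f / (a | b).size # 50 / 150
estimate = estimated_jaccard(minhash_signature(a), minhash_signature(b))
puts format("exact=%.3f estimate=%.3f", exact, estimate)
```

More hash functions give a tighter estimate at the cost of a larger signature; in practice you'd store the signatures and only compute exact Jaccard for the best candidates, if at all.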
For an approach using Neo4j, check out cadet (my project)! cadet is really just a JRuby wrapper around Neo4j, but you can use it to interact with Neo4j, and thus come up with recommendations, without touching a line of Java or even Cypher.
It's still in progress, and I'd love any input! http://github.com/karabijavad/cadet
I remember seeing a nice little system that involved taking the square root of something, but I can't remember what it was. Anybody know it?