Building a Search Engine in 200ish Lines of Ruby
Sau Sheong Chang works at Yahoo!'s Singapore office. Yahoo! isn't implemented in Ruby, of course, but Sau's made an attempt at implementing a basic search engine in Ruby and has written a pretty interesting, in-depth article about the whole process. Sau's search engine is formed of a crawler, indexer, and query system, and uses Hpricot, DataMapper, and Sinatra to get things done. Lots of code, lots of explanations - go read it.
If you want to grab Sau's code for yourself, check out the saushengine repository on GitHub. You can also try a live version of the engine for yourself at http://saushengine.saush.net/ - it's down at the time of writing, though, and Sau warns that its availability will be poor.
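To give a rough feel for the crawler / indexer / query split mentioned above, here is a minimal sketch in plain Ruby. It is not Sau's code and it uses none of Hpricot, DataMapper, or Sinatra: the crawler stage is replaced by a hard-coded hash of made-up pages, the index lives in memory, and the names PAGES, tokenize, build_index, and search are all hypothetical, chosen just for this illustration.

```ruby
require "set"

# Hypothetical stand-in for a crawler's output: URL => page text.
# A real crawler would fetch and parse these pages over HTTP.
PAGES = {
  "http://example.com/ruby"   => "Ruby is a dynamic programming language",
  "http://example.com/search" => "A search engine needs a crawler and an indexer",
  "http://example.com/both"   => "Writing a search engine in Ruby"
}

# Split text into lowercase alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Indexer: build an inverted index mapping each word to the set of
# URLs whose text contains that word.
def build_index(pages)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  pages.each do |url, text|
    tokenize(text).each { |word| index[word] << url }
  end
  index
end

# Query: return the sorted URLs that contain every word in the query
# (AND semantics). Unknown words match nothing.
def search(index, query)
  words = tokenize(query)
  return [] if words.empty?
  words.map { |word| index.fetch(word, Set.new) }.reduce(:&).to_a.sort
end

INDEX = build_index(PAGES)
```

With the sample pages above, search(INDEX, "ruby search") returns only "http://example.com/both", since that is the one page containing both words. A real engine would also need ranking, persistence, and a web front end, which is where the article's extra lines go.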
March 21, 2009 at 3:27 am
What would be the location of these Yahoo! offices again? (Sorry, I've worked as a proof reader. Can't control myself now.)
Thanks for the link and your great service to the spreading of knowledge in the community.
March 21, 2009 at 3:35 am
...and yes, I have a funny way of writing proofreader.
March 21, 2009 at 5:40 am
Good catch - thanks :)
March 23, 2009 at 9:50 pm
Nice. It's a nice article. I wish I had read this a couple of months ago when I built my own crawler for a web application... it would have saved me lots of time!
March 26, 2009 at 4:45 am
My tweetjobsearch ( http://github.com/feedbackmine/tweetjobsearch/tree/master ) is an open source Twitter job search engine. It is a good example of building a search engine in Ruby. To make it more interesting, it uses libsvm to classify text.