Fast HTML parsing in Ruby with Hpricot
Ruby legend why the lucky stiff has developed a new HTML parser called Hpricot. It's easy to install and use, and it parses HTML in a liberal fashion. It does, however, require a compiler to install (as it's written in C), so it should be okay on Linux and Mac OS X, though not necessarily on Windows (yet).
Here's some demo code:
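require 'hpricot'

# Hpricot.parse expects the HTML itself, so read the file's contents first
doc = Hpricot.parse(File.read("index.html"))

# Find every <a> element nested inside a <p> and print its attributes
(doc/:p/:a).each do |link|
  p link.attributes
end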
This is a good alternative to RubyfulSoup if you're finding RubyfulSoup too slow (though RubyfulSoup is certainly worth a try!).
July 11, 2006 at 3:33 pm
Does Hpricot install under Cygwin? I'll probably find this out for myself in a few minutes/hours but maybe someone else would want to know as well so the question might help.