libxml-ruby 0.8.0 Released: Ruby Gets Fast, Reliable XML Processing At Last
Ruby is not known for its deftness with XML. On RubyFlow, I considered calling the community to arms over it, and solicited twenty responses on what the problem is and what we could do about it. Robert Fischer was lamenting the state of Ruby's libxml library, and didn't seem to like REXML much either. Tim Bray has also had a few complaints about REXML. It seemed there was a problem to be fixed; a gap in the market, as it were, for a decent XML parser for Ruby. Hpricot, despite really being an HTML parser, would have to get us by in the meantime.
Today, however, libxml-ruby 0.8.0 has been released, and Charlie Savage explains why this is such a big deal. libxml-ruby now runs on Windows (thanks to Charlie), doesn't segfault all the time, and the bindings have all been fixed over the past year (thanks to Dan Janowski). You can get going with it right now with a simple gem install libxml-ruby
libxml-ruby is known for its performance, and the latest release doesn't disappoint. For a range of simple tasks, libxml clocks in at ten times quicker than Hpricot like-for-like and between 30 and 60 times faster than REXML. Charlie adds:
In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:
- SAX
- DOM
- XMLReader (streaming interface)
- XPath
- XPointer
- XML Schema
- DTDs
- XSLT (split into the libxslt-ruby bindings)
Charlie is planning to write a proper tutorial in the next week, covering some of the key features, but suggests referring to the API documentation in the meantime. The test suite (located in the test directory that comes with libxml-ruby) also looks like a great resource for code examples; it is very clean and straightforward. If you have any libxml-ruby tutorials or resources of your own, please post them in the comments here.
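Speed figures like the ones quoted earlier depend heavily on document shape and hardware, so it is worth measuring on your own data. What follows is a rough sketch of such a comparison, not the benchmark behind those numbers. The helper names (sample_xml, compare, PARSERS) are made up for illustration, the document is synthetic, and the libxml-ruby calls (LibXML::XML::Parser.string, parse, find) are assumed from recent releases of the gem and may differ in 0.8.x. libxml-ruby is loaded optionally so the script still runs, with REXML only, if the gem is not installed.

```ruby
require 'benchmark'
require 'rexml/document'

# libxml-ruby is optional: without the gem, only REXML is timed.
begin
  require 'libxml'
  HAVE_LIBXML = true
rescue LoadError
  HAVE_LIBXML = false
end

# Hypothetical helper: builds a synthetic catalog with the given number of items.
def sample_xml(items)
  body = (1..items).map { |i| "<item id=\"#{i}\"><name>item #{i}</name></item>" }.join
  "<catalog>#{body}</catalog>"
end

# Each entry parses the string into a DOM and returns the number of <item> elements.
PARSERS = {
  'REXML' => lambda { |xml| REXML::XPath.match(REXML::Document.new(xml), '//item').size }
}
if HAVE_LIBXML
  # Assumed API of recent libxml-ruby releases; check your version's RDoc.
  PARSERS['libxml-ruby'] = lambda { |xml| LibXML::XML::Parser.string(xml).parse.find('//item').to_a.size }
end

# Times each available parser over the same document; returns { name => seconds }.
def compare(xml, runs = 3)
  PARSERS.each_with_object({}) do |(name, parser), timings|
    timings[name] = Benchmark.realtime { runs.times { parser.call(xml) } }
  end
end

if __FILE__ == $PROGRAM_NAME
  compare(sample_xml(500)).each do |name, seconds|
    puts format('%-12s %.3fs', name, seconds)
  end
end
```

Wall-clock timings from a small synthetic document are only a rough indication; swapping sample_xml for a file you actually process gives a more meaningful comparison.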
Congratulations to all of those involved in libxml-ruby's long history, and especially to Charlie Savage for giving it the final push to this mature state. Ruby's XML woes are tempered, for now at least.
July 17, 2008 at 12:51 am
You might want to hold off on the upgrade if you use aws-s3 gem, they don't play nice together.
July 17, 2008 at 3:57 am
Hi Don,
Hmm, thought we had got that fixed. If you get a chance can you post a bug on RubyForge (http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse)?
I'll try and see what's up with aws-s3, but since we're not using it, not sure how easy it will be to test.
July 17, 2008 at 5:45 am
I've verified the aws-s3 error and fixed the issue. Kudos to the aws-s3 team for including a nice test suite complete with mock objects that made it easy to track down.
Fix is included in libxml 0.8.1 which was just pushed to RubyForge.
Thanks for the report Don.
July 17, 2008 at 6:47 am
the API documentation link is broken
July 17, 2008 at 7:33 am
The link to the API website is broken (it says http://.). Otherwise, thanks for the posting, very interesting.
July 17, 2008 at 10:05 am
I am finding it difficult to install libxml on my Ubuntu. However I was able to install it on my CentOS. Is there a forum to ask this question further? I couldn't find one.
----------this is what my console says------------------
sudo gem install libxml-ruby
Building native extensions. This could take a while...
ERROR: Error installing libxml-ruby:
ERROR: Failed to build gem native extension.
/usr/bin/ruby1.8 extconf.rb install libxml-ruby
checking for socket() in -lsocket... no
checking for gethostbyname() in -lnsl... yes
checking for atan() in -lm... no
checking for atan() in -lm... yes
checking for inflate() in -lz... yes
checking for iconv_open() in -liconv... no
checking for libiconv_open() in -liconv... no
checking for libiconv_open() in -llibiconv... no
checking for iconv_open() in -llibiconv... no
checking for iconv_open() in -lc... yes
checking for xmlParseDoc() in -lxml2... no
checking for xmlParseDoc() in -llibxml2... no
checking for xmlParseDoc() in -lxml2... no
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.
-
-
-----------------------
July 17, 2008 at 12:12 pm
subbu, make sure you install the libxml headers. If you're on Debian, sudo apt-get install libxml-dev, or else compile and install libxml.
July 17, 2008 at 1:59 pm
@subbu: sudo apt-get install libxml2-dev
July 17, 2008 at 2:51 pm
@subbu Like the error message says, you're probably missing some libraries. Looks like you're missing libxml2
July 17, 2008 at 3:19 pm
Fixed the link. Thanks!
July 17, 2008 at 4:29 pm
"ibxml-ruby now runs on Windows"
is that relevant anymore? ;-)
In all seriousness, this is good news. Thanks, Charlie.
July 17, 2008 at 6:10 pm
Would it be possible to write some bindings between the new libxml-ruby and REXML so old REXML-based code can use it?
July 18, 2008 at 2:11 am
P
July 18, 2008 at 2:13 am
subbu - Report bugs at RubyForge:
http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse
Peter - REXML bindings are definitely possible. Anyone want to volunteer? If they pass the REXML test suite (is there one?), then I'd be happy to include them in the distribution.
July 18, 2008 at 6:11 am
Just installed libxml-ruby to get a feel for it, and one thing immediately stands out: there's no README file included!
It may be nit-picky, but I've gotten so used to a friendly README file being included in a gem that I've kind of come to expect it. Libraries near and dear to my heart like Hpricot and Net::SSH both include these.
Or, in lieu of a readme, I usually expect to be able to generate an RDoc for a library and have some good info right there in index.html.
Just my 0.02c,
S.
July 18, 2008 at 9:44 am
That is really good news.
I have been working with libxml in the past with big xml files, 20mb to 100mb.
To parse the 100mb xml file and save to database was taking like 30 minutes and huge amount of ram.
Let's see how much faster it is now...
July 18, 2008 at 4:36 pm
Fred: I hope you're using a streaming API for that...
July 18, 2008 at 7:03 pm
Sebastian - How did you install? The gem package most definitely includes a readme file, and it's the main page of the RDocs.
Fred - Try out the xmlReader class for a streaming api.
July 18, 2008 at 8:33 pm
@Charlie : I just did a regular "gem install libxml-ruby" - maybe I should have specified the version?
July 19, 2008 at 7:00 pm
Sebastian,
What you did looks fine. What version got installed? And see if the gem directory has a README file and a doc directory with RDocs (both should exist). If not, could you submit a bug? Thanks.
July 20, 2008 at 5:16 am
@Charlie: hm...this is odd. It says version 0.8.1 was installed?
I think I found the issue, sorry to raise such a fuss: I'm using the "gemdoc" shortcut that was posted on RubyInside a while back, and it looks like it's going into a different directory than where the "good" documentation lives. On my Mac, the correct directory that has the files you mentioned is in:
file:///Library/Ruby/Gems/1.8/gems/libxml-ruby-0.8.1/doc/rdoc/index.html
But "gemdoc" looks for rdocs in:
file://localhost/Library/Ruby/Gems/1.8/doc/libxml-ruby-0.8.1/rdoc/index.html
Sorry for not catching that the first time and raising such a stink!
July 20, 2008 at 5:18 am
s/on RubyInside/on RailsEnvy/
July 21, 2008 at 6:21 pm
Sebastian - Ah, didn't know about the gemdoc shortcut. Will have to look into it.
July 25, 2008 at 8:05 pm
Rick: I did use streaming api. maybe not so optimized.
I just discovered that i was wrong, the memory footprint and the long time to do the job was only due to libxml, but loading the whole XML file and split the elements into arrays, then use libxml to parse each element at a time and save to database...
it is still a lot faster now even though my algorithm is the slow culprit. shit...
hehe
Thanks for the tips!
and this awesome work on libXML.
it rocks!
July 25, 2008 at 8:06 pm
i mistyped previous post.
replace "was only due to libxml"
with "was not really due to libxml"
sorry
August 2, 2008 at 6:46 am
Maybe this is just a windows issue, but I had to use
require 'xml/libxml' to get it to work
http://www.concept47.com/austin_web_developer_blog/ruby/fixing-the-libxml-ruby-gem-error-uninitialized-constant-xml-nameerror/