How to Sanitize HTML and CSS in Ruby
If you've developed an application that displays user-supplied text in a Web browser, there's always a chance that a user will enter some crazy HTML (or even CSS) that breaks your site's layout. Stripping all HTML from a piece of text is easy, but you might want to let users format their content with a limited subset of HTML, and that means sanitizing the user-supplied HTML and CSS rather than removing it outright. Luckily, two Ruby libraries have been released in the last couple of days to sanitize HTML and CSS respectively.
HTML
Sanitize (or GitHub repo) by Ryan Grove is a new HTML sanitization library for Ruby. Install the sanitize gem and it's crazily simple from there:
require 'rubygems'
require 'sanitize'

html = %{<strong><a href="http://foo.com/">foo</a></strong><img src="http://foo.com/bar.jpg" alt="" />}

Sanitize.clean(html) # => 'foo'
As Ryan explains in his blog post, Sanitize removes all HTML by default, but you can pass options to allow specific elements, attributes, protocols, and so forth. Read his post to get the full scoop. Sanitize also closes tags that are left open. Excellent!
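To make the "allow certain elements, attributes, and protocols" idea concrete, here's a toy sketch in plain Ruby. It is not Sanitize's API or implementation (Sanitize works on a real HTML parser, and this sketch doesn't close open tags). Every name in it (toy_sanitize, rebuild_tag, ALLOWED_ELEMENTS and friends) is made up for illustration. The approach is to rebuild each permitted tag from scratch and escape everything else, so treat it as a way to see the moving parts rather than something to deploy.

```ruby
# Hypothetical allowlists: which tags, attributes, and URL schemes survive.
ALLOWED_ELEMENTS   = %w[a em strong].freeze
ALLOWED_ATTRIBUTES = { 'a' => %w[href] }.freeze
ALLOWED_PROTOCOLS  = %w[http https mailto].freeze

# Matches an opening or closing tag, treating quoted attribute values as units.
TAG = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)((?:[^>"']|"[^"]*"|'[^']*')*)>/

# Neutralize markup characters in text and attribute values.
def escape_markup(text)
  text.gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Strict on purpose: the value must start with an allowed scheme,
# so relative URLs and obfuscated schemes are both rejected.
def allowed_url?(value)
  ALLOWED_PROTOCOLS.any? { |scheme| value.downcase.start_with?("#{scheme}:") }
end

# Rebuild an allowed tag from its name and permitted attributes only;
# anything not on the allowlist becomes an empty string.
def rebuild_tag(closing, name, raw_attrs)
  return '' unless ALLOWED_ELEMENTS.include?(name)
  return "</#{name}>" if closing

  permitted = ALLOWED_ATTRIBUTES.fetch(name, [])
  attrs = raw_attrs.scan(/([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/).map do |key, dq, sq|
    key   = key.downcase
    value = dq || sq
    next unless permitted.include?(key)
    next if key == 'href' && !allowed_url?(value)
    " #{key}=\"#{escape_markup(value)}\""
  end
  "<#{name}#{attrs.compact.join}>"
end

# Walk the input, escaping text between tags and rebuilding or dropping each tag.
def toy_sanitize(html)
  out = String.new
  pos = 0
  while (m = TAG.match(html, pos))
    out << escape_markup(html[pos...m.begin(0)])
    out << rebuild_tag(m[1] == '/', m[2].downcase, m[3])
    pos = m.end(0)
  end
  out << escape_markup(html[pos..-1])
end

html = %{<strong><a href="http://foo.com/">foo</a></strong><img src="http://foo.com/bar.jpg" alt="" />}
puts toy_sanitize(html)
# prints: <strong><a href="http://foo.com/">foo</a></strong>
```

Note that the text inside a dropped element is kept (only the tags go), and that requiring an explicit scheme on href is a deliberately blunt choice made to keep the sketch short.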
CSS
Allowing users to specify custom CSS can be... interesting (see MySpace) but potentially damaging if that CSS is served to third parties. In many circumstances browsers will execute JavaScript embedded in CSS, or can otherwise be fed nefarious CSS to parse. Courtenay Gasking's css_file_sanitize (or GitHub repo) helps prevent some of these issues by sanitizing the CSS provided. It's still in its earliest stages and the README contains no documentation (though if you look at his test file, you'll get the idea), and Courtenay's open to feedback, patches, etc.
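Since css_file_sanitize has no documented interface yet, the following is not its API. It's a hypothetical plain-Ruby sketch of the general idea, with invented names (toy_sanitize_css, ALLOWED_PROPERTIES, SAFE_VALUE). It only handles a flat list of declarations, as you'd find in a style attribute, and it keeps a declaration only when the property is on an allowlist and the value is made of a small set of harmless characters. That rules out parentheses, slashes, colons, and backslashes, and with them things like expression(...) and url(javascript:...).

```ruby
# Hypothetical allowlist of properties users may set.
ALLOWED_PROPERTIES = %w[color background-color font-size font-weight text-align width].freeze

# Values may contain only letters, digits, '#', '%', '.', ',', spaces, and '-'.
SAFE_VALUE = /\A[a-zA-Z0-9#%., -]+\z/

# Keep only "property: value" pairs that pass both checks.
def toy_sanitize_css(declarations)
  kept = declarations.split(';').map do |declaration|
    property, value = declaration.split(':', 2).map { |part| part.strip }
    next if property.nil? || value.nil?
    next unless ALLOWED_PROPERTIES.include?(property.downcase)
    next unless value.match?(SAFE_VALUE)
    "#{property.downcase}: #{value}"
  end
  kept.compact.join('; ')
end

css = 'color: red; background: url(javascript:alert(1)); width: expression(alert(1)); font-size: 12px'
puts toy_sanitize_css(css)
# prints: color: red; font-size: 12px
```

An allowlist of characters is far more restrictive than a real CSS sanitizer would be (it rejects legitimate values such as rgb(0, 0, 0)), but it errs in the safe direction, which is the point of the sketch.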
January 1, 2009 at 8:54 am
Happy New Year!
January 1, 2009 at 8:35 pm
I was in the process of converting an HTML sanitizer I wrote in Perl to Ruby and had to shelve it. This is noice.
January 2, 2009 at 8:35 am
Happy New Year. This is a very useful small library. Thanks for showing.
I just played a little with it and it works like a gem :-)
January 3, 2009 at 12:23 am
The CSS sanitizer might come in handy. I just had a customer come up with an idea of allowing custom CSS per user.
January 3, 2009 at 8:20 am
I prefer a lovingly massaged version of the sanitize.rb library, originally by Jacques Distler. I've taken the liberty of adding a few convenient string mixins:
http://pastie.org/351431
This library takes the slightly unconventional approach of parsing the input using the html5 gem. Genius!
I've tested it against most of the hacks on this page of XSS vectors. I consider it a litmus test for declaring your shit "secure":
http://ha.ckers.org/xss.html
Your mileage will vary. I like this because it doesn't try to process every string; you call it explicitly on what you need.
January 3, 2009 at 4:51 pm
The Rails sanitize helper (from my old whitelist_helper plugin) was written using that same ha.ckers.org page. The nice thing about this lib is that it uses Hpricot and is probably a lot faster than any Ruby-based HTML parser.