The Pug Automatic

Rails truncate helper that handles HTML tags and entities

Written January 30, 2008. Tagged Ruby, Ruby on Rails, Hpricot.

I needed a Rails helper like truncate, but one that doesn't cut in the middle of a tag or entity and doesn't leave start tags without their end tags. So

truncate_html('foo &amp; bar baz', 9)
truncate_html('foo <b class="x">ar</b> baz', 8)

gives

'foo &amp; ...'
'foo <b class="x">a</b>...'

and not something like

'foo &a...'
'foo <...'
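
To make the idea concrete, here is a minimal plain-Ruby sketch of tag- and entity-aware truncation. It is not the helper from this post (which is built on Hpricot); the names truncate_html_sketch, TOKEN and VOID_TAGS are made up for illustration, and it assumes properly nested tags rather than coping with malformed markup.

```ruby
# Hypothetical sketch, not the Hpricot-based helper from this post.
# Assumes tags are properly nested; comments containing ">" are not handled.
TOKEN = /<[^>]+>|&#?\w+;|./m           # a tag, an entity, or any single character
VOID_TAGS = %w[br hr img input].freeze # tags that never get an end tag

def truncate_html_sketch(html, max_length, ellipsis = "...")
  tokens = html.scan(TOKEN)
  tag = ->(t) { t.length > 1 && t.start_with?("<") }
  # Tags count as zero characters, entities as one.
  return html if tokens.count { |t| !tag.call(t) } <= max_length

  budget = [max_length - ellipsis.length, 0].max
  out = +""
  open_tags = []
  tokens.each do |t|
    break if budget.zero?
    out << t
    if tag.call(t)
      name = t[/\A<\/?([a-zA-Z][a-zA-Z0-9]*)/, 1]
      if t.start_with?("</")
        open_tags.pop if open_tags.last == name
      elsif name && !VOID_TAGS.include?(name.downcase) && !t.end_with?("/>")
        open_tags.push(name)
      end
    else
      budget -= 1
    end
  end
  # Close whatever is still open, innermost first, then add the ellipsis.
  out + open_tags.reverse.map { |n| "</#{n}>" }.join + ellipsis
end

truncate_html_sketch('foo &amp; bar baz', 9)
truncate_html_sketch('foo <b class="x">ar</b> baz', 8)
```

Because whole tags and whole entities are single tokens, the cut can only ever fall between them, and the stack of open tags is what lets the sketch emit the missing end tags.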

Also, I made it aware of HTML in the ellipsis text, so e.g.

truncate_html(@object.description, 25, link_to("more", @object))

uses up only four characters of the truncated string (for "more"), rather than the full length of the link HTML.
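
As a rough illustration of what "aware of HTML in the ellipsis" means, the visible length of a string can be measured by ignoring tags and counting each entity as one character. The visible_length helper and the link markup below are assumptions for the example, not code taken from the helper.

```ruby
# Hypothetical helper: tags contribute nothing, each entity counts as one character.
def visible_length(html)
  html.gsub(/<[^>]+>/, "").gsub(/&#?\w+;/, "x").length
end

# Roughly the kind of markup a link_to call might render (assumed for illustration).
more_link = '<a href="/objects/1">more</a>'

visible_length(more_link) # counts only the characters of "more"
more_link.length          # counts the markup too
```

Budgeting with the visible length rather than the raw string length is what keeps a long link from eating most of the 25 characters.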

Oh, and being based on Hpricot, which has a robust parser, it handles crap HTML well:

>> truncate_html(%{<i><b>foo</i></b> bar<p>baz<p>boink}, 14)
=> "<i><b>foo</b></i> bar<p>baz<p>b</p></p>..."


Get it here (highlighted source).

Save it as app/helpers/text_helper.rb and make sure you're doing

helper :all

or

helper TextHelper

in your ApplicationController.

Previous work

I was inspired by Mike Burns' "Truncating HTML in Ruby" and Joakim Andersson's response, "Rails + tidy + REXML". Joakim uses tidy to keep REXML from choking on malformed HTML. I suggested cleaning the HTML with Hpricot, and then figured it'd be fun to write my own helper using just Hpricot, not REXML.

My helper behaves just like the Rails helper in regard to what arguments it takes and how it interprets the max length – if the string is within limits, the full string is returned; if it's longer, the truncated string including the ellipsis is limited to the max length.
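
The length rule is easiest to see on plain text. The following is a plain-Ruby illustration of the rule as described above, with a made-up name (truncate_plain); it is neither the Rails implementation nor my helper.

```ruby
# Hypothetical illustration of the length rule on plain text.
def truncate_plain(text, max_length, ellipsis = "...")
  # Within limits: return the string untouched.
  return text if text.length <= max_length
  # Too long: the kept text plus the ellipsis together fit in max_length.
  text[0, [max_length - ellipsis.length, 0].max] + ellipsis
end

truncate_plain("foo bar", 10)    # => "foo bar"
truncate_plain("foo bar baz", 9) # => "foo ba..."
```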