Using gsub with blocks to strip attributes from HTML tags

Written . Tagged Ruby.

I love Ruby’s gsub used with blocks. Stripping specified attributes from HTML tags becomes almost too easy:

html = 'Getting <a href="#" id="foo">rid</a> of <code id="bar">id</code> attributes, but not in text: id="not this".'

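# The outer gsub yields each whole tag (angle brackets included) to the block;
# the inner gsub removes a quoted id attribute from that tag only.
html.gsub(/<(.*?)>/) {|innards| innards.gsub(/ id=("|').*?\1/, '') }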

# => Getting <a href="#">rid</a> of <code>id</code> attributes, but not in text: id="not this".
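
The same trick extends to several attributes at once. Below is a minimal sketch, assuming a helper named strip_attributes (a hypothetical name, not part of Ruby or of the snippet above) that builds the inner pattern from a list of attribute names:

```ruby
# Hypothetical helper for illustration: removes the named attributes
# from inside tags only, leaving the text between tags untouched.
def strip_attributes(html, *names)
  # Escape each name and join into an alternation such as "id|class".
  alternation = names.map { |name| Regexp.escape(name) }.join('|')
  # Leading whitespace, the attribute name, then a value in matching quotes.
  attribute = /\s+(?:#{alternation})=("|').*?\1/

  # The outer gsub yields each whole tag; the inner gsub cleans it.
  html.gsub(/<[^>]*>/) { |tag| tag.gsub(attribute, '') }
end

html = 'A <a href="#" id="foo" class="x">link</a>, but not in text: id="not this".'
strip_attributes(html, 'id', 'class')
# => A <a href="#">link</a>, but not in text: id="not this".
```

Like the one-liner, this only handles quoted attribute values, and a literal > inside a value would cut the tag match short, so it is best suited to markup you already know is well behaved.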