Partial wrapper for libxml with REXML fallback

Written . Tagged Ruby.

I’m currently working on Ruby code that parses XMLTV data, applies a set of “favorite shows” rules, and exports the matching programmes as an iCalendar file with alarms, for use in iCal.
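
To make that pipeline a little more concrete, here is a minimal sketch of the favorites step using only REXML. It is not the actual project code: the sample listing, the `FAVORITES` list of exact titles, and the helpers `favorite_programmes` and `to_ical` are all hypothetical. The iCalendar output is a bare-bones VCALENDAR with one VEVENT per match and a display VALARM ten minutes before it starts.

```ruby
require "rexml/document"
require "time"

# Hypothetical XMLTV snippet; real listings contain many channels and programmes.
XMLTV = <<~XML
  <tv>
    <programme start="20060724200000 +0200" channel="svt1.svt.se">
      <title>Simpsons</title>
    </programme>
    <programme start="20060724210000 +0200" channel="svt1.svt.se">
      <title>Rapport</title>
    </programme>
  </tv>
XML

# Assumed rule format: a plain list of exact favorite titles.
FAVORITES = ["Simpsons"]

# Returns [title, start_time] pairs for programmes whose title is a favorite.
def favorite_programmes(xml, favorites)
  doc = REXML::Document.new(xml)
  matches = []
  doc.elements.each("tv/programme") do |prog|
    title = prog.elements["title"].text
    next unless favorites.include?(title)
    # XMLTV timestamps look like "20060724200000 +0200".
    start = Time.strptime(prog.attributes["start"], "%Y%m%d%H%M%S %z")
    matches << [title, start]
  end
  matches
end

# Builds a minimal iCalendar document: one VEVENT with a 10-minute alarm per match.
def to_ical(matches)
  stamp = Time.now.utc.strftime("%Y%m%dT%H%M%SZ")
  lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//xmltv-favorites//EN"]
  matches.each_with_index do |(title, start), i|
    lines += [
      "BEGIN:VEVENT",
      "UID:xmltv-#{i}@example.invalid",
      "DTSTAMP:#{stamp}",
      "DTSTART:#{start.utc.strftime('%Y%m%dT%H%M%SZ')}",
      "SUMMARY:#{title}",
      "BEGIN:VALARM",
      "ACTION:DISPLAY",
      "DESCRIPTION:#{title}",
      "TRIGGER:-PT10M",
      "END:VALARM",
      "END:VEVENT"
    ]
  end
  lines << "END:VCALENDAR"
  lines.join("\r\n") + "\r\n"   # iCalendar lines are CRLF-terminated
end

puts to_ical(favorite_programmes(XMLTV, FAVORITES))
```

Real rules would likely be richer than exact title matches (channels, substrings, time windows), but the shape stays the same: select programmes, then emit one event per match.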

This might ship with the Swedish Xmltv widget somewhere down the road, so ideally it should not require any module that isn’t bundled with OS X or Ruby. However, REXML, which ships with Ruby, is roughly ten times slower than libxml, which doesn’t: parsing the data with REXML takes around 11 seconds on my computer, whereas libxml does the same job in about one second.
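
If you want to get a feel for REXML’s parse time on your own machine, a minimal sketch with Ruby’s standard `Benchmark` module is shown below. The synthetic listing produced by the hypothetical `synthetic_xmltv` helper only stands in for real XMLTV data, so the numbers will not match the 11 s figure above. Only REXML is timed, since libxml may not be installed; if it is, you could wrap its parse call in the same `Benchmark.realtime` block for a side-by-side comparison.

```ruby
require "benchmark"
require "rexml/document"

# Builds a synthetic XMLTV-like document with n programmes
# (hypothetical data standing in for a real listings file).
def synthetic_xmltv(n)
  progs = (1..n).map do |i|
    %(<programme start="20060724200000 +0200" channel="ch#{i % 10}"><title>Show #{i}</title></programme>)
  end
  "<tv>#{progs.join}</tv>"
end

# Returns the wall-clock seconds REXML needs to parse the document.
def rexml_parse_seconds(xml)
  Benchmark.realtime { REXML::Document.new(xml) }
end

xml = synthetic_xmltv(2_000)
printf("REXML parsed %d bytes in %.3f s\n", xml.bytesize, rexml_parse_seconds(xml))
```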

My solution was to write a wrapper that uses libxml when it’s available and falls back to REXML when it isn’t.
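
The core of that fallback is just catching `LoadError` from `require`. Stripped of everything XML-specific, the pattern might look like the sketch below; `optional_require` and `BACKEND` are hypothetical names for illustration, not part of LEXML itself.

```ruby
# Returns true if the library could be loaded, false if it is not installed.
def optional_require(name)
  require name
  true
rescue LoadError
  false
end

# Prefer the fast library, otherwise use the one bundled with Ruby.
BACKEND =
  if optional_require("xml/libxml")
    :libxml
  elsif optional_require("rexml/document")
    :rexml
  else
    raise LoadError, "no XML parser available"
  end

puts "Using #{BACKEND}"
```

Deciding once at load time and storing the result in a constant keeps the per-call check cheap, which is the same choice the wrapper makes with `GOT_LIBXML`.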

This wrapper is tailored to my needs for this project, so it only covers the very basics of reading XML data, just a small part of what libxml and REXML can do. Still, I’m posting it here in case it’s of use to someone else.

The interface is part libxml, part REXML, and part whatever I figured was nicer than what either of them provided.

Usage examples are included as comments in the code.

# LEXML by Henrik Nyh <http://henrik.nyh.se> 2006-07-24
# Free to modify, but please credit.
#
# Very simple wrapper that uses the fast libxml
# <http://libxml.rubyforge.org/> if available, otherwise the slower
# but bundled-with-Ruby-1.8+ REXML
# <http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/>. Handles the
# very basics of reading XML data.

class LEXML

  begin
    require "rubygems"
    require "xml/libxml"
    GOT_LIBXML = true
    XML::Parser.default_warnings = false
  rescue LoadError  # Fall back to REXML
    require "rexml/document"
    GOT_LIBXML = false
  end

  def self.libxml?
    # e.g.:
    # puts LEXML::libxml? ? "Using libxml" : "Using REXML"
    GOT_LIBXML
  end

  def initialize(file)
    # e.g.:
    # xml = LEXML.new("file.xml")
    if LEXML::libxml?
      @handle = XML::Document.file(file)
    else  # Fall back to REXML
      # Block form closes the file once REXML has parsed it
      @handle = File.open(file) { |f| REXML::Document.new(f) }
    end
  end

  def root
    Node.new(@handle.root)
  end

  class Nodeset < Array
  end

  class Node < String
    # For a node <dog size="small" cute="true">pug</dog>:
    #   node => "pug"
    #   node[:size] => "small"; node["size"] => "small"
    #   size, cute = node[:size, :cute]
    # For a node <animals><dogs><dog>Fabulous</dog><dog>Spanko</dog></dogs></animals>:
    #   node.child("dogs") => the dogs node
    #   node.children("dogs/dog") => both dog nodes
    def initialize(node)
      @node = node
      super((LEXML::libxml? ? @node.content : @node.text) || "")
    end
    def [](*attributes)
      list = attributes.map { |attribute| LEXML::libxml? ? @node[attribute.to_s] : @node.attributes[attribute.to_s]  }
      list.size == 1 ? list.first : list
    end
    def children(type)
      if LEXML::libxml?
        LEXML::Nodeset.new(@node.find(type.to_s).map{ |e| Node.new(e)})
      else
        kids = []
        @node.elements.each(type.to_s) {|e| kids << Node.new(e)}
        LEXML::Nodeset.new(kids)
      end
    end
    def child(type)
      children(type).first
    end
  end

end
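
For anyone curious what the REXML branch of `LEXML::Node` boils down to, the sketch below performs the commented dog examples with plain REXML calls (`elements.each`, `elements[]`, `attributes[]`, `text`). It is an illustration under an assumed sample document, not part of the wrapper.

```ruby
require "rexml/document"

# Assumed sample document combining the two examples from the Node comments.
doc = REXML::Document.new(<<~XML)
  <animals>
    <dogs><dog size="small" cute="true">pug</dog><dog>Spanko</dog></dogs>
  </animals>
XML

animals = doc.root

# Roughly what LEXML::Node does on the REXML branch:
dogs = animals.elements["dogs"]                          # node.child("dogs")
dog_nodes = []
animals.elements.each("dogs/dog") { |e| dog_nodes << e } # node.children("dogs/dog")
pug = dog_nodes.first
size, cute = %w[size cute].map { |a| pug.attributes[a] } # node[:size, :cute]

puts pug.text                    # the string value a Node wraps
puts size, cute
puts dog_nodes.map(&:text).inspect
```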