Ruby XML (REXML)

XML (eXtensible Markup Language) is a markup language like HTML. It lets programmers produce data that other applications can read, irrespective of the operating system and development language used.

It is useful for keeping track of small to medium amounts of data without requiring an SQL-based backend.

REXML is a pure Ruby XML processor. A REXML Document represents a full XML document, including processing instructions (PIs), the doctype, and so on. A document has a single root element, which can be accessed with root(). If you want a created document to have an XML declaration, you must add one yourself; REXML does not write a default declaration for you.
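As a minimal sketch of the two points above, the following builds a document in memory, adds an XML declaration explicitly, and reads the root element back through root. The element name library and the helper build_document are made up for illustration.

```ruby
require "rexml/document"

# Hypothetical helper: builds a tiny document, optionally with a declaration.
def build_document(with_declaration)
  doc = REXML::Document.new
  # REXML writes no declaration unless one is added explicitly.
  doc << REXML::XMLDecl.new("1.0", "UTF-8") if with_declaration
  doc.add_element("library") # becomes the single root element
  doc
end

plain    = build_document(false)
declared = build_document(true)

puts plain.root.name # name of the root element
puts plain.to_s      # serialized without a declaration
puts declared.to_s   # serialized with the declaration first
```

Comparing the last two lines of output shows that the declaration only appears when it was added by hand.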

REXML was inspired by the Electric XML library for Java. Its API is small and easy to use, and it follows Ruby conventions for method naming and code flow.

It supports both tree and stream parsing. Stream parsing is about 1.5 times faster than tree parsing. However, stream parsing does not give you access to some features, such as XPath.


REXML features:

  • It is written 100 percent in Ruby.
  • It contains fewer than 2000 lines of code, so it is lightweight.
  • Its methods and classes are easy to understand.
  • It ships with the Ruby installation (as a bundled gem since Ruby 3.0), so it usually does not need to be installed separately.
  • It can be used for both DOM-like (tree) and SAX-like (stream) parsing.

Parsing XML and accessing elements

Let's start with parsing an XML document:
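A minimal sketch of this step, assuming the XML lives in a file. The helper name parse_xml_file and the sample content are made up; the snippet writes a throwaway file so that it can run on its own.

```ruby
require "rexml/document"
require "tmpdir"

# Hypothetical helper: parses the XML file at path and returns the document.
def parse_xml_file(path)
  File.open(path) do |file|
    REXML::Document.new(file) # the file is parsed here
  end
end

# Demo with a temporary file; the content is a made-up sample.
Dir.mktmpdir do |dir|
  path = File.join(dir, "trial.xml")
  File.write(path, "<library><book>Ruby Basics</book></library>")
  doc = parse_xml_file(path)
  puts doc.root.name
end
```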

Passing the opened file to Document.new is the step that parses the supplied file.

Example:
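The original listing is not reproduced here, so the following is only a sketch of what such a program might look like. The file content in SAMPLE_XML and the helper show_document are assumptions; the actual trial.xml may differ.

```ruby
require "rexml/document"
require "tmpdir"
include REXML # lets us write Document instead of REXML::Document

# Made-up sample content standing in for trial.xml.
SAMPLE_XML = '<library><book id="1">Ruby Basics</book></library>'

# Hypothetical helper: parses the file and returns its serialized form.
def show_document(path)
  doc = File.open(path) { |file| Document.new(file) }
  doc.to_s
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "trial.xml")
  File.write(path, SAMPLE_XML) # create the file to parse
  puts show_document(path)     # show the document on the screen
end
```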

In the above code, the require statement loads the REXML library. The include REXML statement means we don't have to use qualified names like REXML::Document. The trial.xml file we created is parsed, and the resulting document is shown on the screen.

Output: [screenshot: Ruby XML 1]

The Document.new method takes an IO, String, or Document object as its argument. This argument specifies the source from which the XML document is read.

If the constructor is given a Document, that document's context and attributes are cloned into the new Document object; its child nodes are not copied. If the constructor is given a String, the string is expected to contain an XML document.
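The three kinds of source can be compared side by side. This is a sketch under assumptions: the sample XML and the helper documents_from are made up, and StringIO stands in for any IO object such as an open file. Per the REXML documentation, a document created from another Document copies its context and attributes but not its children, so it starts out with no child nodes.

```ruby
require "rexml/document"
require "stringio"

xml = "<library><book>Ruby Basics</book></library>" # made-up sample

# Hypothetical helper returning one document per kind of source.
def documents_from(xml)
  from_string   = REXML::Document.new(xml)
  from_io       = REXML::Document.new(StringIO.new(xml))
  from_document = REXML::Document.new(from_string)
  { string: from_string, io: from_io, document: from_document }
end

documents_from(xml).each do |kind, doc|
  puts "#{kind}: #{doc.children.size} child node(s)"
end
```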


XML with "Here Document"

A here document is a way to specify a block of text while preserving its line breaks, whitespace, and indentation.

A here document is constructed using a command followed by "<<" followed by a token string; the text runs until a line containing only that token.

In Ruby, there must be no space between "<<" and the token string.

Example:
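A minimal sketch, assuming a small made-up XML snippet as the text block. The helper first_book_title is hypothetical and only shows that the here document can be handed straight to REXML.

```ruby
require "rexml/document"

# Everything between <<EOF and the closing EOF, newlines included,
# becomes the string stored in info. The XML is a made-up sample.
info = <<EOF
<library>
  <book>Ruby Basics</book>
</library>
EOF

# Hypothetical helper: parses the text and returns the first book title.
def first_book_title(xml_text)
  REXML::Document.new(xml_text).root.elements["book"].text
end

puts first_book_title(info)
```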

Here, we use the here document info. All the characters between <<EOF and EOF, including newlines, are part of info.

For the XML parsing examples, we will use the following XML file as input:

file trial.xml


Ruby XML DOM-Like Parsing

We will parse our XML data in tree fashion. The trial.xml file above is taken as input.
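Tree parsing can be sketched as follows. The content of SAMPLE_XML is a made-up stand-in for trial.xml, and book_summaries is a hypothetical helper; the XPath expressions assume that particular structure.

```ruby
require "rexml/document"

# Made-up stand-in for trial.xml; the actual file may differ.
SAMPLE_XML = '<library>' \
  '<book id="1"><title>Ruby Basics</title></book>' \
  '<book id="2"><title>XML in Practice</title></book>' \
  '</library>'

# Hypothetical helper: walks the tree and collects "id: title" strings.
def book_summaries(source)
  doc = REXML::Document.new(source)
  summaries = []
  # XPath works here because the whole tree is held in memory.
  doc.elements.each("library/book") do |book|
    summaries << "#{book.attributes['id']}: #{book.elements['title'].text}"
  end
  summaries
end

puts "Root element: #{REXML::Document.new(SAMPLE_XML).root.name}"
puts book_summaries(SAMPLE_XML)
```

To read from a file instead, an open File object can be passed to book_summaries in place of the string.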

Output: [screenshot: Ruby XML 2]

Ruby XML SAX-Like Parsing

We will parse our XML data in stream fashion. The trial.xml file above is taken as input. Here, we define a listener class whose methods the parser calls back as it reads the document.

It is advisable not to use SAX-like parsing for a small file.
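A listener of this kind can be sketched with REXML::StreamListener and Document.parse_stream. The class name BookListener, the helper stream_events, and the sample XML are assumptions made for illustration; the actual trial.xml may differ.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Made-up stand-in for trial.xml; the actual file may differ.
SAMPLE_XML = '<library><book id="1">Ruby Basics</book>' \
             '<book id="2">XML in Practice</book></library>'

# Listener whose methods the parser calls back as it reads the input.
class BookListener
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for each opening tag; attrs holds the tag's attributes.
  def tag_start(name, attrs)
    @events << [:start, name, attrs.to_h]
  end

  # Called for character data; whitespace-only text is skipped.
  def text(data)
    @events << [:text, data] unless data.strip.empty?
  end

  # Called for each closing tag.
  def tag_end(name)
    @events << [:end, name]
  end
end

# Hypothetical helper: streams the source and returns the recorded events.
def stream_events(source)
  listener = BookListener.new
  REXML::Document.parse_stream(source, listener)
  listener.events
end

stream_events(SAMPLE_XML).each { |event| puts event.join(" ") }
```

No document tree is built here; the listener only sees events in the order the parser meets them.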

Output: [screenshot: Ruby XML 3]



