XML: A Very Brief Overview

In this workshop, we will focus on a markup language (and file format) called XML. XML stands for eXtensible Markup Language. It is similar in some ways to HTML and was designed to describe, store, and communicate information in a way that computers can process while remaining readable to humans.

Why XML markup?

If a document is described well (structured and tagged properly), it becomes much easier to transform it into other document types, to add directions for display and styling, and to perform computational analysis on it.

Those attending the workshop may already be familiar with this. The section below provides a very brief intro to XML for those who are unfamiliar with it so that you can follow along with the Oxygen XML Editor demo.

XML by Example

To illustrate XML, TEI by Example provides a fun example of encoding (another word for markup). (Note that we will not discuss TEI in this workshop.)

image

Here is an example of a list of items. Let’s assume it’s a shopping list someone made: with a bit of contextual information, a person would be able to understand it as a list of grocery items. We can also tell that the items are separate because each one sits on its own line.

To encode this so that a computer can read it, you need to describe not only the content but also the structure, using elements and tags. Elements are the building blocks of XML: the containers that give structure to the content. Tags mark the start and end of an element in a machine-readable form. They are enclosed in angle brackets and almost always appear in pairs (an opening tag and a closing tag). For instance:

Element: <paragraph>Hello!</paragraph>
Opening tag: <paragraph>
Closing tag: </paragraph>
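
Elements can also contain other elements, which is how XML expresses structure. The snippet below is a minimal sketch of that idea; the element names (note, heading, paragraph) are invented for illustration and are not part of any standard vocabulary.

```xml
<!-- Hypothetical example: element names are invented for illustration -->
<note>
  <!-- Each opening tag has a matching closing tag -->
  <heading>Reminder</heading>
  <!-- Inner elements are closed before the outer element closes -->
  <paragraph>Hello!</paragraph>
</note>
```

The outer note element wraps the other two, so a program reading the file can tell that the heading and the paragraph belong together.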

XML does not tell you how to encode: it is a metalanguage, that is, a language used to describe other languages. For example, the words “noun,” “verb,” “adverb,” and “adjective” describe how many languages operate, but they are not the languages themselves. Similarly, XML provides you with a kind of grammar. How you encode (which elements you choose) is up to you and depends on the object you’re encoding. Looking at the example from TEI by Example, there are multiple ways you could encode it, depending on your research purpose and how you view the list.

You can encode them as a list of items, for instance:

image
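
In case the image is not available, here is a minimal sketch of what a list-style encoding could look like. The element names (list, item) and the grocery items are placeholders chosen for illustration; they are not the text used by TEI by Example.

```xml
<!-- Hypothetical shopping list: names and items are placeholders -->
<list>
  <item>apples</item>
  <item>bread</item>
  <item>milk</item>
</list>
```

Each item element marks one entry, so the separation a person infers from line breaks is now stated explicitly.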

Or, more structurally, as lines of text, without focusing on their nature as “items” (below, the author used “lg” to indicate a line group).

image
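
A rough sketch of this line-based approach follows. Only lg (line group) comes from the example described above; the element name l for a single line and the placeholder text are assumptions made for illustration.

```xml
<!-- Hypothetical sketch: "l" and the line text are assumed placeholders -->
<lg>
  <l>apples</l>
  <l>bread</l>
  <l>milk</l>
</lg>
```

Here the markup says nothing about what the lines mean; it records only that they are lines grouped together.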

Or, as TEI by Example asks, could this be a poem, given its meter, structure, and rhyme? If so, you might want to encode it differently so that the markup describes your object more accurately. We can see the rhyme in every four lines, so you might do something like:

image
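
One way to sketch a poem-style encoding is to group the lines in fours. Everything in the snippet below is an assumption for illustration: the element names poem and l, the type attribute (an attribute is a name="value" pair written inside an opening tag), and the placeholder line text. It is not the encoding shown by TEI by Example.

```xml
<!-- Hypothetical sketch: element names, attribute, and text are placeholders -->
<poem>
  <!-- First group of four lines -->
  <lg type="stanza">
    <l>placeholder line one</l>
    <l>placeholder line two</l>
    <l>placeholder line three</l>
    <l>placeholder line four</l>
  </lg>
  <!-- Second group of four lines -->
  <lg type="stanza">
    <l>placeholder line five</l>
    <l>placeholder line six</l>
    <l>placeholder line seven</l>
    <l>placeholder line eight</l>
  </lg>
</poem>
```

Nesting line groups inside a poem element records an interpretation (this is verse with stanzas) that the plain list encoding does not.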

That’s all we’ll cover about XML in this workshop. XML is non-proprietary and does not depend on a specific piece of software. You can create and edit XML documents in pretty much any text editor. Oxygen XML Editor is an industry-standard, robust editor with features that will facilitate work on your project, such as validation tools and tools for transforming XML into other document types like HTML. Licenses can be cost-prohibitive for individuals, but UBC Library has copies installed on computers in the Digital Scholarship Lab (the lab we’re in right now) as well as on the virtual lab machines.

