Anatomy

Let’s look at the details of what happens in a XML file. Possible tags in a text document

It all relies on identifiers: whether it is a line, a place or an author: we want to annotate (encode!) what information can be of interest in the respective file.

Formalization

An XML document needs a neat form: Formalization

Think of the Russian Matryoshka dolls: they need a top and a bottom part in order to function. That’s how we make sure information is contained in the right place under the right tag.

XML Anatomy

This image will make more sense in 5 minutes. Look at the opening and closing tag! Just like the Matryoshka doll, they open and close the information we encoded.

XML anatomy

Formalization: Box it!

In the image, the pink line delineates the document with the opening tag <div> and closing tag </div> The value type=”act” specifies the genre! This is what we call the commandline or element. <div type=”act”> </div>

XML anatomy

Task #2

What other opening and closing tags can you identify comparing the scan of the manuscrupt and the XML code? Focus on the circled information – we’ll get to those other tags in a bit!

Solutions

Input

<head> </head>
  <stage> </stage>
    <speaker> </speaker>
      <div> </div>
      <p> </p>

Task #3

What tags would you want to single out in this newspaper clipping?

CavalierDaily

Solutions

Input

<name>UNIVERSITY OF VIRGINIA, CHARLOTTESVILLE</name>
<num>NUMBER 92</num>
<head>THE CAVALIER DAILY</head>
<date>TUESDAY, MARCH 11, 1969</date>

etc.

Attributes

Input

<div1 type="nameplate">

<head>THE CAVALIER DAILY</head>
<ab><name type="place">UNIVERSITY OF VIRGINIA, CHARLOTTESVILLE</name>,
<date value="1969-03-11">TUESDAY, MARCH 11, 1969</date>
<num type="number" value="92">NUMBER 92</num></ab>
</div1>

The attribute name specifies the tag. Its matching value and type follows the = sign and is marked by “”

Divisions

Input

Big chunks
<div> big divisions
<head>
<ab> anonymous block
Mid size
<p> paraphraphs
Small pieces
<title>
<head>
<l>

A well-formed XML file

It specifies which source and thereby data set it orients with one root element Input

<TEI XMLns:html="http://www.w3.org/1999/xhtml"
     XMLns="http://www.tei-c.org/ns/1.0" >

It has a header Input

<teiHeader>
 	<fileDesc>
            <titleStmt>
                <title>Bike Ride</title>
            </titleStmt>
            <publicationStmt>
                <p>not published</p>
            </publicationStmt>
            <sourceDesc>
                <p>D.Leesing</p>
            </sourceDesc>
        </fileDesc>
  </teiHeader>

</TEI>

Comparable quoting and signatures “ … “ It is always symmetric <> </>

A valid XML file

Plays by the rule book (depending which markup rule book is used (e.g. TEI, xhtml (turning it into a webpage))

Task #4

Find the lemons! (validity)