XML 2006

Syllabus: in HTML | in XML | in XML + XSLT


Lectures:

  1. August 23: What is Markup? | XML Boot Camp
  2. August 30: XML Boot Camp Part 2 | The Backstory
  3. September 6: SGML for the Web
  4. September 13: Consuming XML
  5. September 20: DTDs
  6. September 27: TEI
  7. October 4: More XSLT/More TEI
  8. October 11: Publishing / Schematron
  9. November 1: Metadata and Semantics
  10. November 8: Uses of RDF
  11. November 15: XML and Digital Libraries
  12. November 29: XML Schemas and other things

Tools:

References:

These are useful documents worth browsing. They may or may not show up in the syllabus.

Projects

We'll discuss these in detail starting in class #4. The projects are meant to take about six weeks and should be chosen from the list below. I'm open to suggestions if there's something not on this list that you'd really like to work on, and a couple of other possibilities may shape up before class #4, when I'll ask you to commit to one.

XML Book

In my day job at Lulu, I'm working on a project to enable the creation of textbooks and teaching materials using a free online collaborative process. If you're interested, you can contribute by writing a chapter for an XML book. The project will allow both free online browsing and the assembly and printing of full books from sections available on the site. The printed books will be purchasable through Lulu. If you do this, you will have to agree to license your writing under a Creative Commons Attribution license. Some ideas for sections are listed below:

Publishing with XML

[Aaron Brubaker]
[Carl Harris]
[Daniel Lucas]
[Janhavi Sheode]
[Bendte Fagge]

We have a huge repository of XML documents on campus (Documenting the American South) that are marked up using the TEI guidelines. These are transformed into HTML for web presentation, but a toolset for transforming them into PDF would be nice too. There are XSLT stylesheets for producing XSL Formatting Objects written for generic TEI, and XSL-FO documents can be turned into PDF using free tools like FOP.

Searching XML

[Sean Chen]

How do you search an XML repository? DocSouth could use its own full text search capability (right now they use Google). This project might involve an investigation of the possibilities with the goal being a recommendation, or it could be an experimental implementation using a free search engine tool like Lucene. Or both.
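
To make the experimental option a little more concrete, here is a minimal sketch of the idea behind a full text index, written in plain Python rather than Lucene. The two sample documents and the function names are made up for illustration; real DocSouth files are much larger TEI documents, and a real search engine adds ranking, stemming, and much more.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-ins for repository documents (not real DocSouth files).
SAMPLE_DOCS = {
    "doc1": "<TEI><text><body><p>A journey through North Carolina.</p></body></text></TEI>",
    "doc2": "<TEI><text><body><p>Letters from Carolina and <name>Virginia</name>.</p></body></text></TEI>",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, xml_source in docs.items():
        root = ET.fromstring(xml_source)
        # itertext() yields the character data and skips the markup itself.
        for word in tokenize(" ".join(root.itertext())):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(SAMPLE_DOCS)
print(sorted(search(INDEX, "north carolina")))
```

Note that this index throws the markup away. Part of the investigation could be whether searches should be able to use it, for example to find "Virginia" only where it is tagged as a name.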

XML and Natural Language Processing

If we have anyone who's also taking Stephanie Haas's NLP class, there is the possibility of double-dipping, i.e., working on a project that straddles both classes. This might involve developing XML Schemas for customizing Named Entity Recognition using GATE.

XML and Epigraphy

[Sara Gault]
[Marcos Rodriguez]

I've been involved with a project called EpiDoc for some years now that aims to provide guidelines for marking up ancient inscriptions using TEI XML. The guidelines will use a combination of Schematron and XSLT 2.0 to "unit test" entries. The core technology is in place, but the work of adding test rules to the individual guideline pages has stalled. You can help apply bleeding edge technology to the study of the ancient world!

XML and Bibliography

[Tim Baldwin]
[Ric Simmons]

The Pleiades project is an initiative coming out of the Ancient World Mapping Center at UNC to develop online mapping tools for the Ancient World. They need access to new publications in areas they are interested in. Tom Elliott, the lead developer of the project, sends this request:

  1. Code to scrape one or more of the following web pages, and similar pages from the same organizations:
  2. Code to turn the scraped results into a valid rss or atom feed with embedded tei bibliographic tags
  3. Add Dublin Core metadata, vel sim., for origin, processing, status, etc.
  4. Make it all run persistently/automatically/updatefully under apache/tomcat/cocoon/exist, as appropriate

We would use this to feed bibliographic data into Pleiades for subsequent use, refinement, etc. by collaborators. We would also expect to expose the feeds perpetually for the use of others.
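
As a rough illustration of steps 2 and 3 in the request above, here is a sketch that builds an Atom feed with a Dublin Core element on each entry, using only the Python standard library. The scraping step is skipped, the sample entries and example.org URLs are invented, and the embedded TEI bibliographic markup is left out; treat it as a starting point, not as the design Pleiades expects.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialize Atom as the default namespace and Dublin Core with a dc: prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("dc", DC_NS)


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, updated, author, items):
    """Build an Atom feed string from a list of already-scraped items."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author_el = ET.SubElement(feed, atom("author"))
    ET.SubElement(author_el, atom("name")).text = author
    for item in items:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = item["url"]
        ET.SubElement(entry, atom("title")).text = item["title"]
        ET.SubElement(entry, atom("updated")).text = item["updated"]
        ET.SubElement(entry, atom("link"), href=item["url"])
        # dc:source records where the entry was scraped from (its origin).
        ET.SubElement(entry, f"{{{DC_NS}}}source").text = item["source"]
    return ET.tostring(feed, encoding="unicode")


# Hypothetical scraped results; every value here is a placeholder.
SAMPLE_ITEMS = [
    {"url": "http://example.org/pubs/1", "title": "First new publication",
     "updated": "2006-10-01T00:00:00Z", "source": "http://example.org/new-titles"},
    {"url": "http://example.org/pubs/2", "title": "Second new publication",
     "updated": "2006-10-02T00:00:00Z", "source": "http://example.org/new-titles"},
]

FEED_XML = build_feed("http://example.org/feed", "New publications",
                      "2006-10-02T00:00:00Z", "Feed generator", SAMPLE_ITEMS)
print(FEED_XML)
```

Whatever you produce should still be run through a feed validator, since well-formed output is not necessarily a valid feed.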

Assignments

All assignments are to be delivered via email and are due before the following class.

  1. Due September 13th
    Part 1: There are at least 3 problems with the HTML version of the syllabus. Find as many of them as you can. You will want to use validation tools, and reading http://www.hixie.ch/advocacy/xhtml will help.
    Part 2: Create a valid RSS 2.0 or Atom feed XML file for the syllabus. In a sentence or two explain how you did it.
  2. Due October 4th
    Take the TEI document here and use XSLT to transform it into an HTML document with the same title as the source document, listing (i.e. using HTML list tags) all of the persons and places named in the document.
    Extra credit: explain why the document doesn't display properly in Firefox.
    Solution: XSLT and resulting HTML.
  3. Due November 1st
    Part 1: Develop a Schematron schema to perform the following checks on this document from Documenting the American South:
    1. check that all name keys are consistent (i.e. personal name keys start with 'pn', all others start with 'name')
    2. check that editor is represented in a respStmt
    3. check that all note references point to a real note
    4. check that all notes have a ref
    5. check that the text id matches the start of all the page break ids
    Part 2: Transform the schema into a stylesheet using the reference implementation (more details here), run the report, and turn the results in along with the schema. UPDATE: the skeleton XSLT only produces text output, so you may want to use this in addition to it. The schematron-report.xsl makes nice HTML output, which may be easier to read. If you use it, you need both XSLTs in the same directory. UPDATE 2: Here is the test Schematron schema we did in class, and the XSLT generated from it. The new due date is Friday, November 3 by 5:00 pm.
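
If the logic behind checks 3 and 4 is unclear, it may help to see it spelled out procedurally first. The sketch below is plain Python, not Schematron, and is not a substitute for the schema. It assumes a simplified sample in which references are `ref` elements with a `target` attribute and notes are `note` elements with an `id` attribute; the actual DocSouth document may use different names, so check the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real document's elements may differ.
SAMPLE = """<text>
  <body>
    <p>First claim<ref target="n1"/> and second claim<ref target="n3"/>.</p>
    <note id="n1">Source for the first claim.</note>
    <note id="n2">An orphaned note.</note>
  </body>
</text>"""


def check_notes(xml_source):
    """Report refs that point at no note and notes that no ref points at."""
    root = ET.fromstring(xml_source)
    note_ids = {n.get("id") for n in root.iter("note") if n.get("id")}
    targets = {r.get("target") for r in root.iter("ref") if r.get("target")}
    problems = []
    # Check 3: every reference must point to a real note.
    for target in sorted(targets - note_ids):
        problems.append(f"ref points to missing note: {target}")
    # Check 4: every note must have a reference.
    for note_id in sorted(note_ids - targets):
        problems.append(f"note is never referenced: {note_id}")
    return problems


PROBLEMS = check_notes(SAMPLE)
for problem in PROBLEMS:
    print(problem)
```

In Schematron, each of these set comparisons becomes an assertion with an XPath test evaluated in the context of a `ref` or a `note`.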

Final Exam

The final is due by 5:00 pm on December 14th. It consists of two parts. You should plan to spend less than 3 hours to complete it. There isn't a strict time limit, but if you're taking longer than that, then you're doing too much work. The exam is open book/internet/etc. You are expected to deploy all the means available to you to complete the test, with the exception that, as usual, you are not allowed to give or receive help on the test. The exam should be delivered to me via email before the due date, and you will receive confirmation of its receipt from me. If you don't receive confirmation within a reasonable amount of time, please follow up with me to make certain I have received your exam.

  1. Create an XML vocabulary for the music metadata contained in this file. The file is tab-separated text, and should be openable in Excel or the spreadsheet application of your choice. Produce an example file containing the data, and a schema (either XML Schema or RELAX NG) for the vocabulary. Write an XSLT that will display the example file as a list of songs.
  2. On your first day as digital librarian at XYZ College, your boss tells you they have a grant to digitize an archive relating to the founding of the college. The archive consists of images of the founder and his associates, the first faculty members, buildings, documents including meeting minutes, letters, and essays, plans for buildings and the layout of the campus, and relevant newspaper articles. Your job is to digitize this material, organize it, and create a searchable, browseable, web front end to present it to the public. Describe how you plan to tackle this assignment and discuss the technologies you intend to employ to make this work. How will XML technologies fit into your plans? Provide citations for materials you reference in your answer.
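
For part 1, getting from tab-separated text to XML can be done by hand, in a spreadsheet, or with a few lines of script. The sketch below shows one scripted possibility in Python. The column names and the `library` and `song` element names are placeholders I made up for the example; the real file has its own columns, and designing the vocabulary is the point of the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample rows with assumed column names (not the exam file).
SAMPLE_TSV = (
    "Title\tArtist\tAlbum\n"
    "So What\tMiles Davis\tKind of Blue\n"
    "Naima\tJohn Coltrane\tGiant Steps\n"
)


def tsv_to_xml(tsv_text):
    """Turn each row into a song element with one child per column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    library = ET.Element("library")
    for row in reader:
        song = ET.SubElement(library, "song")
        for column, value in row.items():
            # Element names cannot contain spaces, so normalize the header.
            name = column.strip().lower().replace(" ", "-")
            ET.SubElement(song, name).text = value
    return library


LIBRARY = tsv_to_xml(SAMPLE_TSV)
print(ET.tostring(LIBRARY, encoding="unicode"))
```

A mechanical one-element-per-column conversion like this is only a first draft. Decide which values are better modeled as attributes, which need data types, and which repeat, and let the schema reflect those decisions.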