The Information About The Information Age: E-commerce and XML

Introduction
History
Components
Implications for E-Commerce
XML and EDI
Community Standards
Predictions and Recommendations
Sources

Introduction

The introduction of the Extensible Markup Language, or XML as it is commonly known, created a buzz in the business world, particularly within the e-commerce community. It "provides both a standards-based way to identify the information that is of importance in a particular application, and the ability to process information tagged according to highly user-specific requirements with general-purpose software, such as editing tools, composition engines, and electronic browsers" (Usdin & Graham, 1998, p.125). In simpler terms, XML allows users to customize a markup language and apply it to an information object that can then be interpreted to determine its contents, whether it is an order form, a newspaper or an advertisement. Given these descriptions, it becomes apparent that XML is a tool, an enabling technology that can be used in conjunction with other tools to provide powerful Web applications. How this tool can be customized and utilized by the business community is the subject of this white paper.

History

XML's roots lie in the Standard Generalized Markup Language, or SGML. SGML was developed 20 years ago as a formal method of annotating documents to describe their meaning and structure, but its complexity and cost hindered widespread acceptance. However, an application of SGML called the Hypertext Markup Language, or HTML, is a phenomenon that has enabled the rapid growth of the Web over the past decade. Used primarily for stylistic and formatting purposes, HTML has frustrated many users who wanted to use its tag set for more complex presentation control, data processing and programming (Treese, 1998). Because of these issues, the World Wide Web Consortium, or W3C, started a working group for a new subset of SGML, XML, in January 1997. The group "proposed a markup language that could work in concert with existing Web technologies, using some of the tools developed for use with HTML, while moving forward with more manageable techniques" (St. Laurent, 1999, p.11). A year later, in February 1998, the XML specification was ratified as a W3C standard.

While XML has its foundation in SGML, its philosophy differs and is based on four fundamental principles (Usdin & Graham, 1998).

  1. Separation of Content from Format: What a piece of information is should be managed separately from how the information is presented. Information should be identifiable by its appearance, its use in a particular application, its role in the document in which it is contained, and its nature. For example, "knowing that a phrase is in italic is useful; knowing that it is the title of a subsection of a paper is more useful; and knowing that it is a genus and species name is potentially more useful still" (p.126).
  2. Hierarchical Data Structures: In XML, the data is assumed to be hierarchically organized, that is, a piece of information may contain other pieces of information and may be contained by yet another piece of information. Textual documents often exemplify this type of structure. For example, a book contains several chapters, each of which contains sections. Each section may have a heading, paragraphs and subsections, which also contain a heading and paragraphs.
  3. Embedded Tags: The data marked up with XML contains tags, words or phrases enclosed in angle brackets, which identify where the data structures begin and end. These tags can also have attributes, which provide information about the data enclosed by the tags. Example: <tag attribute="value">content</tag>
  4. User-Definable Structures: As mentioned above, XML is a tool, and it defines a method of customized tag creation. "XML assumes that users will create new tags as they create and work with documents, and that software such as browsers will have to display or process the content of these novel tags." As such, XML provides flexibility and extensibility by not providing a standard tag set like HTML.
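
The four principles can be seen working together in a short sketch. The example below uses Python and its standard xml.etree.ElementTree module to parse a small book document and print its containment hierarchy. The tag names (BOOK, CHAPTER, SECTION, HEADING, PARA) and the outline helper are invented for illustration; they are not part of any standard tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical, user-defined tags: the names say what each piece is, not how it looks.
BOOK_XML = """
<BOOK title="XML Basics">
  <CHAPTER number="1">
    <SECTION>
      <HEADING>Why markup?</HEADING>
      <PARA>Tags say what the content is.</PARA>
    </SECTION>
  </CHAPTER>
</BOOK>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing the containment hierarchy."""
    attrs = " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    label = f"{element.tag} {attrs}".strip()
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(BOOK_XML)
print("\n".join(outline(root)))
```

Nothing in the parser knows about books or chapters; the hierarchy comes entirely from the embedded tags.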

Components of XML

Note: All examples are derived from St. Laurent (1999).

The basic components of XML are similar to those of HTML: tags, elements and their attributes. A tag is a piece of markup such as an opening tag <P> or a closing tag </P>. When combined, these tags are used in the composition of elements. For example,

<P align="center"> This text is part of a paragraph element. It includes the <B> bold </B> element and the <I> italics </I> element. </P>

The entire paragraph has 6 tags comprising 3 elements, 2 of which are contained within the paragraph element. The paragraph element also contains an attribute specifying that the paragraph should be centered on the page. This style of markup is used in the creation of XML documents, which can be of two types: well-formed and valid. A well-formed document is syntactically correct and can be interpreted by the computer but need not refer to a Document Type Definition (DTD) that specifies tag requirements and allows the document to be validated. Syntactical correctness includes a single root element, a closing tag for every opening tag, properly nested elements, and quoted attribute values. Microsoft's Internet Explorer 5 will check the document for its form and return its contents to the screen. The code for a well-formed catalog entry in which <CATITEM> is the root element is shown below (its contents will be discussed momentarily).

<?xml version="1.0"?>
<CATITEM CATEGORY="rugs">
<ITEMNAME>South Shore decorator rug</ITEMNAME>
<DESCRIPTION><STORY>This rug will add a new dimension to any room in your home and protect those hardwood floors from life's daily activities.</STORY>
<FEATURES>Resilient, textural sisal, complemented by canvas band in dark green, black, blue or natural.</FEATURES></DESCRIPTION>
<MANUFACTURER NAME="PB-1">Pottery Barn</MANUFACTURER>
<ITEM><PRODNAME>South Shore decorator rug</PRODNAME>
<LENGTH>5</LENGTH><WIDTH>7</WIDTH>
<PRICE>$99.95</PRICE>
<AIRSHIP>$14.00</AIRSHIP>
<GROUNDSHIP>$7.00</GROUNDSHIP></ITEM>
</CATITEM>
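
Well-formedness can also be checked programmatically. The following is a minimal sketch in Python, assuming the standard xml.etree.ElementTree parser as a stand-in for the browser's check; the is_well_formed helper and the shortened catalog fragments are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<CATITEM CATEGORY="rugs"><ITEMNAME>South Shore decorator rug</ITEMNAME></CATITEM>'
# The closing tags are in the wrong order, so the elements are not properly nested.
bad_nesting = '<CATITEM><ITEMNAME>South Shore decorator rug</CATITEM></ITEMNAME>'
# The attribute value is missing its quotation marks.
bad_attribute = '<CATITEM CATEGORY=rugs></CATITEM>'

for sample in (good, bad_nesting, bad_attribute):
    print(is_well_formed(sample))
```

Only the first fragment is accepted. The parser rejects the other two without knowing anything about catalogs, because well-formedness is purely a matter of syntax.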

A valid XML document is well formed and complies with the guidelines of a DTD, which defines a tag set like the one used in the catalog entry example above. The DTD can be part of the XML document, or it can be referred to by the XML document. An example of a DTD for a catalog entry and the code that refers to it can be seen below.

Document Type Definition
<!ELEMENT CATITEM (ITEMNAME,DESCRIPTION,MANUFACTURER)>
<!ATTLIST CATITEM CATEGORY CDATA #REQUIRED>
<!ELEMENT ITEMNAME (#PCDATA)>
<!ELEMENT DESCRIPTION (STORY,FEATURES,PRICE)>
<!ELEMENT STORY (#PCDATA)>
<!ELEMENT FEATURES (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT MANUFACTURER (#PCDATA)>
<!ATTLIST MANUFACTURER NAME CDATA #REQUIRED>

XML Document
<?xml version="1.0"?>
<!DOCTYPE CATITEM SYSTEM "catalog.dtd">
<CATITEM CATEGORY="rugs">
<ITEMNAME>South Shore decorator rug</ITEMNAME>
<DESCRIPTION><STORY>This rug will add a new dimension to any room in your home and protect those hardwood floors from life's daily activities.</STORY>
<FEATURES>Resilient, textural sisal, complemented by canvas band in dark green, black, blue or natural.</FEATURES>
<PRICE>$99.95</PRICE></DESCRIPTION>
<MANUFACTURER NAME="PB-1">Pottery Barn</MANUFACTURER>
</CATITEM>
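
To make the idea of validation concrete, the sketch below checks a catalog entry against a simplified content model written as Python dictionaries. This is an illustrative stand-in for DTD validation, not a DTD parser: the CONTENT_MODEL and REQUIRED_ATTRIBUTES tables and the validation_errors function are assumptions that mirror the structure of the sample XML document, and a real validating parser handles far more (optional and repeated elements, entities, and so on).

```python
import xml.etree.ElementTree as ET

# Simplified content models: each element maps to the exact sequence of
# child elements it must contain. An empty list means text only.
CONTENT_MODEL = {
    "CATITEM": ["ITEMNAME", "DESCRIPTION", "MANUFACTURER"],
    "ITEMNAME": [],
    "DESCRIPTION": ["STORY", "FEATURES", "PRICE"],
    "STORY": [],
    "FEATURES": [],
    "PRICE": [],
    "MANUFACTURER": [],
}
REQUIRED_ATTRIBUTES = {"CATITEM": ["CATEGORY"], "MANUFACTURER": ["NAME"]}

def validation_errors(element):
    """Collect readable problems found in an element and its descendants."""
    if element.tag not in CONTENT_MODEL:
        return [f"undeclared element {element.tag}"]
    errors = []
    children = [child.tag for child in element]
    expected = CONTENT_MODEL[element.tag]
    if children != expected:
        errors.append(f"{element.tag}: expected children {expected}, found {children}")
    for name in REQUIRED_ATTRIBUTES.get(element.tag, []):
        if name not in element.attrib:
            errors.append(f"{element.tag}: missing required attribute {name}")
    for child in element:
        errors.extend(validation_errors(child))
    return errors

catalog = ET.fromstring(
    '<CATITEM CATEGORY="rugs">'
    "<ITEMNAME>South Shore decorator rug</ITEMNAME>"
    "<DESCRIPTION><STORY>Adds a new dimension to any room.</STORY>"
    "<FEATURES>Resilient, textural sisal.</FEATURES>"
    "<PRICE>$99.95</PRICE></DESCRIPTION>"
    '<MANUFACTURER NAME="PB-1">Pottery Barn</MANUFACTURER>'
    "</CATITEM>"
)
print(validation_errors(catalog))  # an empty list means no problems were found
```

A document can be well formed and still fail this kind of check, for example if the CATEGORY attribute is left out; that is the difference between a well-formed document and a valid one.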

Together, the XML document and the DTD provide content for the browser (in this case, Internet Explorer 5) to interpret and display. The <?xml version="1.0"?> and the <!DOCTYPE CATITEM SYSTEM "catalog.dtd"> statements make up the prolog of the XML document, or "the glue that binds DTDs to the code that applies to them" (St. Laurent, 1999, p.117). The first statement tells the browser the version of XML in use, and the second statement names the root element, states whether the DTD is a system or public DTD, and gives its location or file name on the system. A system DTD is one that has been developed for a particular Web site or business, while a public DTD has been developed for use by types of organizations (e.g. advertising, newspapers, etc.).

The elements and attributes make up the logical structure of the XML document. The DTD defines the available elements and attributes, and these specifications can be incorporated by a single XML document or by groups of documents. In the example above, <CATITEM> is the root element and contains the attribute CATEGORY. The value of the attribute, rugs, is enclosed in quotation marks. <DESCRIPTION> is also an element, and it is the parent element of the <STORY>, <FEATURES>, and <PRICE> elements. Another element, <MANUFACTURER>, also contains an attribute, NAME, which requires the name of the manufacturer.

Notice that the contents of the XML documents presented above are not formatted; formatting requires the use of a stylesheet such as CSS (Cascading Style Sheets) or XSL (Extensible Stylesheet Language). Using a stylesheet adds another layer of complexity to the XML document display process. In the XML document, a line is added below the <?xml version="1.0"?> line that contains a reference to the CSS formatting file, such as <?xml-stylesheet href="xml.css" type="text/css"?>. The contents of this CSS file are shown below.

/* Root element: sets the font inherited by everything inside it */
CATITEM {
	display:block;
	font-family:arial;
}
ITEMNAME {
	display:block;
	font-size:16px;
	font-weight:bold;
}
DESCRIPTION {
	display:block;
	font-size:12px;
}
STORY {
	display:block;
	font-size:12px;
}
FEATURES {
	display:block;
	font-size:12px;
}
/* Each declaration must end with a semicolon before the next one begins */
PRICE {
	display:block;
	font-weight:bold;
	color:red;
}
MANUFACTURER {
	display:block;
	font-weight:bold;
	color:blue;
}

In this example, each element of the DTD, and hence of the resulting XML document, is displayed according to formatting properties such as display, font-size, font-weight, and color. The display property determines whether the contents of an element will be displayed as a separate paragraph or within an existing paragraph. Font-size, font-weight and color all refer to the style of the text.

Implications for E-Commerce

The most ubiquitous and general effect of XML is the integration of different data sources and the consequences of that integration. Until now, the logistics of integrating data sources have been limiting. Legacy systems are difficult to replace or to integrate into seamless new systems that process their data jointly. The departments within an organization may have developed their own databases and processes without coordinating with other departments, often duplicating efforts. Recent efforts have emphasized sharing knowledge within an organization, and large custom-built systems were preferred over boxed applications. The rise of the Internet and the World Wide Web created pressure to present organizational data to the customer through a seamless interface. Back-end systems could be displayed via HTML, but HTML could not transport or define the organizational information. XML offers that missing link.

XML facilitates integration of data from multiple sources that are dispersed and/or incompatibly formatted while retaining the meaning of the data through each step of processing. The value added by XML lies in retrieving data from several sources, combining and customizing it, and passing it on to the next process. Aggregating information from multiple databases allows organizations to personalize the data and deliver it to browsers while the original information stays in its original database in various formats. "Without XML, data retrieval particular to each database would have to be implemented. The problem with that is that you can not easily change what information you want or how it should be combined" (SoftQuad White Paper).

In other words, XML simply tags relevant data with explanatory information, saying what the data is and allowing the data to be manipulated.
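
As a rough illustration of this kind of integration, the Python sketch below merges two differently formatted sources into a single XML catalog. Everything specific in it is assumed for the example: the dictionary standing in for a marketing database, the CSV text standing in for an accounting export, the SKU value, and the build_catalog function.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Source 1: product names, as a marketing database might return them.
descriptions = {"PB-1-57": "South Shore decorator rug"}

# Source 2: prices exported from an accounting system as CSV text.
prices_csv = "sku,price\nPB-1-57,99.95\n"

def build_catalog(descriptions, prices_csv):
    """Merge both sources into one XML catalog, keyed on the shared SKU."""
    reader = csv.DictReader(io.StringIO(prices_csv))
    prices = {row["sku"]: row["price"] for row in reader}
    catalog = ET.Element("CATALOG")
    for sku, name in sorted(descriptions.items()):
        item = ET.SubElement(catalog, "CATITEM", SKU=sku)
        ET.SubElement(item, "ITEMNAME").text = name
        # Items with no price in the second source simply omit the PRICE element.
        if sku in prices:
            ET.SubElement(item, "PRICE").text = prices[sku]
    return catalog

catalog = build_catalog(descriptions, prices_csv)
print(ET.tostring(catalog, encoding="unicode"))
```

The two sources keep their own formats; only the combined view is expressed in XML, and the tags carry the meaning of each value to whatever process reads the result next.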

Companies have been "pushing HTML to its limits" (SoftQuad), attempting to use HTML to provide more information than its tags were designed to hold. The next wave of business web sites will be shaped by their data, not by their format alone. Organizations can now utilize all of the data and information available to them to make their processes much more robust and flexible. Managing updates will be much easier because data can be changed at the interface and stored in the proper databases without the user having to understand multiple sources. Companies can offer their users a variety of "customizable slices of display data." Developers will have a new tool at their disposal, and small to medium-sized companies will be able to afford a demanding web presence without discarding entire back-end legacy systems.

In addition to internal data integration, XML can perform a similar function for business-to-business (B2B) transactions, and it meets many of the requirements for exchange between businesses. B2B transactions via the web are experiencing exponential growth. Any technology that facilitates these transactions will also ease the integration of the entire supply chain. For example, a vendor would be able to use the information in the systems of its suppliers and/or manufacturers without physically transferring that information and matching it up with its own. As long as all of the members of the supply chain use the same XML tags, the transactions among them are mutually intelligible. "Instant availability transforms rigid supply chains into 'supply Webs,' in which participants transact business spontaneously" (Glusko, et al). With XML, supply chain integration can be implemented and incorporated into the organization with far less effort.
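
The benefit of shared tags can be sketched in a few lines of Python. In the example below, two hypothetical suppliers publish catalog fragments using the same agreed tag names, and a buyer's program compares them without any supplier-specific code. The supplier names, prices, and the cheapest_offer function are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Catalog fragments from two hypothetical suppliers that agreed on the same tags.
SUPPLIER_FEEDS = {
    "Supplier A": "<CATALOG><CATITEM><ITEMNAME>Sisal rug</ITEMNAME>"
                  "<PRICE>99.95</PRICE></CATITEM></CATALOG>",
    "Supplier B": "<CATALOG><CATITEM><ITEMNAME>Sisal rug</ITEMNAME>"
                  "<PRICE>89.50</PRICE></CATITEM></CATALOG>",
}

def cheapest_offer(feeds, item_name):
    """Return (supplier, price) for the lowest-priced matching item, or None."""
    offers = []
    for supplier, xml_text in feeds.items():
        for item in ET.fromstring(xml_text).iter("CATITEM"):
            if item.findtext("ITEMNAME") == item_name:
                offers.append((float(item.findtext("PRICE")), supplier))
    if not offers:
        return None
    price, supplier = min(offers)
    return supplier, price

print(cheapest_offer(SUPPLIER_FEEDS, "Sisal rug"))
```

If one supplier had used different tag names, the same loop would silently find nothing from that feed, which is why agreement on tags matters as much as the use of XML itself.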

XML and EDI

Electronic Data Interchange (EDI) has been the means of communicating B2B transactions for many organizations with different equipment and connections. Although efficient at transferring data, EDI implies direct computer-to-computer transactions using private networks and EDI-specific data formats (TechEncyclopedia).

EDI systems are intrinsically complex and expensive, they run on proprietary networks, and their brittle syntax necessitates a custom integration solution between partners in a supply chain (Glusko). Formal EDI standards were developed twenty-five years ago, but new business practices, the development of global economies, and advancements in computer technologies are only some of the factors that have made those standards unworkable and impracticable for many organizations (Laplante). Since the rise of the Internet, EDI has also begun to appear rather inflexible for data transfer across the preferred Internet protocol, TCP/IP. With the Internet, a universal platform for multi-directional information exchange permeated the business world. Small to mid-sized companies that could not afford EDI systems will be able to get into the e-commerce game; others that were able to afford EDI systems will be able to keep using their existing systems. EDI/XML systems can make the supply chain flexible, increasing the circles of businesses that can interoperate.

XML enables businesses to connect, and increases the viability of EDI systems. EDI can be carried by XML over TCP/IP. "EDI's attraction to XML lies in their shared love of specificity" (Weiss, 42). Without XML, using EDI in coordination with the Internet was like "square pegs and round holes" (Laplante).
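
One way to picture EDI data being carried by XML is a small translator that turns delimited segments into tagged elements. The Python sketch below is an assumption-laden illustration only: the message, its segment names (ORD, LIN), the field layout, and the SEGMENT_MAP table are hypothetical and do not correspond to any real EDI transaction set.

```python
import xml.etree.ElementTree as ET

# A made-up message in an EDI-like style: "~" ends a segment, "*" separates fields.
MESSAGE = "ORD*1001*2000-01-15~LIN*PB-1-57*2*99.95~"

# Hypothetical mapping from segment name to (XML element name, field names).
SEGMENT_MAP = {
    "ORD": ("ORDER", ["NUMBER", "DATE"]),
    "LIN": ("LINEITEM", ["SKU", "QUANTITY", "PRICE"]),
}

def edi_to_xml(message):
    """Translate each delimited segment into a tagged XML element."""
    root = ET.Element("MESSAGE")
    # Splitting on the terminator leaves an empty trailing string; skip it.
    for segment in filter(None, message.split("~")):
        name, *fields = segment.split("*")
        tag, field_names = SEGMENT_MAP[name]
        element = ET.SubElement(root, tag)
        for field_name, value in zip(field_names, fields):
            ET.SubElement(element, field_name).text = value
    return root

print(ET.tostring(edi_to_xml(MESSAGE), encoding="unicode"))
```

The positional fields of the original message become named elements, so a receiving system no longer needs to know the segment layout in advance to tell a quantity from a price.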

Community Standards

Many XML projects are currently under way in the business community. Open Buying on the Internet (OBI) provides a way to define the interactions between trading partners. The Open Trading Protocol (OTP) provides a framework for consumer electronic commerce that can incorporate different kinds of purchasing and payment protocols (Treese). The OTP is a consortium with over thirty member companies that have developed an XML standard for information exchange on the Internet to enable a framework for multiple forms of electronic commerce (Usdin and Graham).

Other communities of business and organizational partners are working together on common XML tags and data elements. The Astronomical Markup Language, the Legal XML Working Group and Genealogical Data in XML are only a few examples of communities working on common terms.

Predictions and Recommendations

While some may wonder if XML will live up to all of the hype, several players in the Internet world have already prepared for this evolution. Many of the browsers, specifically Internet Explorer 5.0 and Netscape Communicator 5, have already extended support for XML. XML parsers for several programming languages are now available, including Java and C++. Larry Wall, the inventor of Perl, announced that it would also soon support XML.

Creating Web pages that act like database records is a logical next step in the digitizing of data. Companies should, at the very least, be thinking about XML and the possibilities of integrating XML into their existing sites. Those who do not will be left playing catch-up with a competitor, and may even risk losing business opportunities if they cannot easily conform to a community of standards. Continuing to develop proprietary data forms will limit future growth and inclusion into supply chain circles.

Companies will be using XML documents for publishing product catalogs and bank statements, placing orders and scheduling shipments. The integration of data will permanently transform the use of the Internet in business. The need for custom interfaces with every customer and supplier will be gone, empowering buyers to compare products across vendors and formats. Sellers will be able to publish their catalog one time and reach several potential buyers. Online businesses will build on each other's content and services to create a new level of virtual markets and trading. Those who fear that XML-tagged information will make it too easy for buyers to compare products and prices, or will expose their data to competitors, will realize that opportunities will be lost as e-commerce proliferates.

XML will provide businesses with benefits, but it will not be without its difficulties. It will be more difficult than HTML because there are no ubiquitous rules or manuals. Communities developing XML standards and libraries will overlap and conflict. Each supply chain cannot invent its own XML tags for products and catalogs, or "the web would be scarcely more usable as a platform for agents and other automated processes than it is today." The need for standards is obvious to many, as seen by all of the XML working groups and communities. However, they are acting independently. The W3C will be pressured to standardize certain building blocks that companies can mix and match to assemble XML applications quickly while preserving the ability to appropriately customize them.

XML will fundamentally change the future of e-commerce. HTML was the simple and powerful tool that made the first wave of e-commerce, the business-to-consumer phase, possible. XML is the tool that will make the second phase of e-commerce, business-to-business, universally achievable. XML will facilitate enterprise integration. While the infrastructure of the Internet filled in the communication gaps that limited e-commerce, XML will fill in the information gap between participants in the supply chain.

Cited Sources

Reference Sources