A new set of rules - XML Under the Hood
Information Outlook, Dec, 2002 by Davida Scharf
The concept of XML (Extensible Markup Language) is very familiar to librarians. It is a set of rules for organizing information that will enable complex search, retrieval, and manipulation of data in many ways. It is simply the latest in a long line of tools and techniques for creating "cataloging" or "metadata." It is an important tool, with wide-ranging implications for how information will be handled in the future. While few of us will likely be called upon to work directly with XML, we should understand something about what's under the hood of these emerging systems.
Automated library catalogs using the MARC format have been great at handling traditional library resources in a standard way--saving time and money, and facilitating communication about these resources across institutions for more than 30 years. Indeed, the MARC standard was ahead of its time. MARC, along with another library standard, the Z39.50 retrieval protocol, helped put lumbering libraries on the leading edge of networked computer applications and made them serious users of the earliest incarnations of the Internet. However, we now find that our inflexible MARC records and current online library catalogs are poorly suited to providing integrated access to a wide variety of electronic content, much of it dynamic in nature. Today, a large percentage of library "holdings" are licensed electronic sources that lie outside the online catalog. XML holds the key that will enable libraries and all Web users to unlock the door to 21st century "bibliographic control" and resource-sharing, regardless of location or format. XML documents can be used for print, the Web, or any other document medium. XML's flexibility enables designers to adopt one set of standards, tools, and methods for processing documents, regardless of their various distribution targets.
XML Only Looks Like HTML
XML is a set of rules for structuring documents and data on the Web. (1) With bracketed start and end tags, XML looks similar to HTML (Hypertext Markup Language). But it differs in that the tags delimit the content in a meaningful semantic way rather than describing the presentation of the content. The XML 1.0 specification provides a means of describing documents that is independent of medium. Similar to database field tags, the tags in XML delimit the pieces of data in a systematic way so that they can be manipulated and presented later in various ways. Presentation and manipulation of the content are handled separately, by other software. Like HTML, XML is a text format, which means it can be looked at with any text editor rather than just with the program that produced it.
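To make the separation of content from presentation concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The element names (article, title, author, year) and the two formatting functions are invented for illustration; they do not come from any standard. The same tagged content is rendered twice, once as a plain citation and once as an HTML fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tags say what each piece of data IS,
# not how it should look on the page.
RECORD = """<article>
  <title>XML Under the Hood</title>
  <author>Davida Scharf</author>
  <year>2002</year>
</article>"""


def as_citation(xml_text):
    """Present the record as a plain-text citation."""
    article = ET.fromstring(xml_text)
    return "{}. {} ({}).".format(
        article.findtext("author"),
        article.findtext("title"),
        article.findtext("year"),
    )


def as_html(xml_text):
    """Present the very same record as an HTML fragment."""
    article = ET.fromstring(xml_text)
    return "<p><i>{}</i> by {}</p>".format(
        article.findtext("title"),
        article.findtext("author"),
    )


print(as_citation(RECORD))
print(as_html(RECORD))
```

Neither function changes the XML itself. The decision about how the content looks is made entirely by the software that reads it.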
The word "extensible" means it can be extended in any dimension or direction. While XML is called a meta-language, it isn't really a language as much as a grammar, or a set of rules for creating a language. These rules are used to create markup languages for specific purposes. Each application of XML can be unique, but it doesn't have to be. Indeed, what is happening is that standardized XML applications are being developed for broad categories of use. XML is an open standard that is license-free and platform-independent. It is a natural for librarians, because it can behave like a database to facilitate structuring content for improved "understanding." XML is not just for Web pages. It can be used to store any kind of structured information and to enclose or encapsulate information in order to pass it between different computing systems that would otherwise be unable to communicate.
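The idea of enclosing information in XML so that two systems can exchange it might be sketched as follows. This is only an illustration under assumed names: the record element, the field names, and the to_xml and from_xml helpers are hypothetical, and a real exchange would follow an agreed vocabulary.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Sending side: wrap each field of a flat record in its own element."""
    root = ET.Element("record")  # hypothetical wrapper element
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Receiving side: recover the fields without knowing who sent them."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


sent = {"title": "Information Outlook", "year": "2002"}
wire = to_xml(sent)       # plain text that any system can read
received = from_xml(wire)
print(wire)
print(received == sent)
```

Because the text that travels between the two sides is ordinary tagged text, the receiving program needs only an XML parser, not the sender's software.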
XML tags describe the content in a standardized and consistent way, which makes them similar to database field names. A pair of tags including their content is called an "element," regardless of the granularity. Unlike database fields, tags can be nested, thereby enhancing meaning through a description of hierarchical relationships. Librarians, who truly understand the importance of the semantic nature of information, can appreciate the need for a technology that enables the developer to retain semantic relationships.
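Nesting is easiest to see by walking a small document. The sketch below assumes a made-up catalog vocabulary (catalog, record, title, creator, name, role) and uses Python's standard xml.etree.ElementTree module to list every element with its depth in the hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: "creator" is itself an element that contains
# smaller elements, which a flat database field could not express.
CATALOG = """<catalog>
  <record>
    <title>Sample Title</title>
    <creator>
      <name>Sample Author</name>
      <role>author</role>
    </creator>
  </record>
</catalog>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(CATALOG)
for depth, tag in outline(root):
    print("  " * depth + tag)
```

The indented listing mirrors the hierarchy: name and role belong to creator, which in turn belongs to a single record.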
DTDs, Schemas, XSL, and More
XML is modular, and the term usually refers to a family of standards and tools that comprise new methods for organizing and presenting information on the Web. Basic XML documents must rely on other types of XML documents to define the specifics for a particular application. A DTD (Document Type Definition) is typically used to define elements that are allowed in the group of XML documents that refer to it. A DTD is not required for documents that are considered "well formed," but it is useful as a way to describe and validate information in related XML documents. A well-formed document follows XML's basic syntax rules: for example, it has a single root element, every start tag has a matching end tag, and elements are properly nested. Many DTDs already exist in journalism, law, e-commerce, and other fields.
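The difference between "well formed" and "valid" can be illustrated with a short check. The sketch below uses Python's standard xml.etree.ElementTree parser, which tests only well-formedness; it does not validate against a DTD, so checking a document against its DTD would require a validating parser. The is_well_formed helper and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


GOOD = "<note><to>Reader</to><body>Hello</body></note>"
BAD_NESTING = "<note><to>Reader</body></to></note>"  # end tags out of order
BAD_UNCLOSED = "<note><to>Reader</to>"               # root element never closed

for sample in (GOOD, BAD_NESTING, BAD_UNCLOSED):
    print(is_well_formed(sample), sample)
```

A document that passes this check is syntactically sound, but nothing here says whether note, to, or body are the elements a particular application expects. That is the job of a DTD or schema.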
A schema is similar to a DTD in that it expresses shared vocabularies and allows machines to execute the rules we create. Schemas provide a means for defining the structure, content, and semantics of XML documents and were designed to overcome some of the limitations of the DTD.