XML Validation
Validation is an important feature of XML that allows one to verify that a
particular structure conforms both to the syntax and the semantics
of a DTD or an XML Schema. This ensures that:
- the XML data set will be rendered properly
- any scripts, CGI, or other programs that process the structure
are guaranteed to work (assuming that the processing programs
have been designed to conform to the DTD)
Validation consists of determining that the XML structure is both
well-formed and valid. Well-formed
means that it conforms to a strict tree structure, such that there
are no overlapping elements, each element has one parent node, etc.
Valid means that it conforms to the exact specifications of the DTD
or schema.
For example, the following document is not well-formed:
<SHAPE> rect <SIZE> </SHAPE> 3x8 </SIZE>
because the elements overlap. However, if the DTD defines
the elements SIZE and SHAPE, this document fragment is
well-formed but not valid:
<Shape> rect <Size> 3x8 </Size> </Shape>
because, although the elements are properly nested, all definitions
are case sensitive and "SHAPE" does not equal "Shape" nor
does "SIZE" equal "Size" according to the DTD.
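The distinction between the two fragments above can be sketched in Python using only the standard library. This is a minimal illustration, not a validator: xml.etree.ElementTree checks well-formedness only and does not read a DTD, so the set DECLARED and the helper undeclared_elements are hypothetical stand-ins for the names a DTD would declare.

```python
import xml.etree.ElementTree as ET

# Element names the DTD is assumed to declare (names are case-sensitive).
DECLARED = {"SHAPE", "SIZE"}

def is_well_formed(text):
    """Return True if the text parses as a properly nested XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def undeclared_elements(text, declared=DECLARED):
    """Return the sorted element names that are not in the declared set."""
    root = ET.fromstring(text)
    return sorted({el.tag for el in root.iter()} - set(declared))

overlapping = "<SHAPE> rect <SIZE> </SHAPE> 3x8 </SIZE>"
wrong_case = "<Shape> rect <Size> 3x8 </Size> </Shape>"

print(is_well_formed(overlapping))      # False: the elements overlap
print(is_well_formed(wrong_case))       # True: properly nested
print(undeclared_elements(wrong_case))  # ['Shape', 'Size']
```

The second fragment passes the nesting check but uses names outside the declared set, which is the sense in which it is well-formed but not valid.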
Validating XML Files Defined by DTDs
A DTD largely controls the syntax and semantics of an XML file by:
- specifying the sequence in which elements appear
- selecting one out of a group of elements or attributes, using the
alternation "|" character
- specifying whether or not an attribute is required
- specifying the number of times an element can appear: "+" means
1 or more times, "*" means 0 or more times, "?" means 0 or 1 time,
and no occurrence indicator means exactly once
- specifying the type of data an item can hold: #PCDATA (Parsed
Character Data) means text that the parser scans for markup, so
characters such as "<" and "&" must be escaped, whereas CDATA
(Character Data) is taken as plain character data rather than as
markup
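The sequence, alternation, and occurrence controls listed above behave much like a regular expression over the names of an element's children. The sketch below relies on that similarity. It is an illustration under stated assumptions rather than a DTD processor: the content model string, the ITEM and COLOR element names, and the helpers model_to_regex and children_match are all hypothetical, and only element-only content models are handled (no #PCDATA or attributes).

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")

def model_to_regex(model):
    """Turn an element-only content model into a regex over 'NAME;' tokens."""
    # Wrap every element name in a non-capturing group that ends with ';'.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", model)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The characters ( ) | + * ? carry the same meaning in both notations.
    return re.sub(r"[\s,]", "", pattern)

def children_match(parent, model):
    """Check the child element names of parent against the content model."""
    names = "".join(child.tag + ";" for child in parent)
    return re.fullmatch(model_to_regex(model), names) is not None

MODEL = "(SHAPE, SIZE+, COLOR?)"  # hypothetical content model for ITEM

ok = ET.fromstring("<ITEM><SHAPE/><SIZE/><SIZE/></ITEM>")
wrong_order = ET.fromstring("<ITEM><SIZE/><SHAPE/></ITEM>")
missing_size = ET.fromstring("<ITEM><SHAPE/><COLOR/></ITEM>")

print(children_match(ok, MODEL))                        # True
print(children_match(wrong_order, MODEL))               # False
print(children_match(missing_size, MODEL))              # False
print(children_match(wrong_order, "(SHAPE | SIZE)*"))   # True
```

The last call shows the alternation character combined with "*": any mix of SHAPE and SIZE children, in any order, is accepted.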
A validation program reads in both the XML and the controlling DTD and then
determines whether any of these rules are violated. If so, it can report
the first of these errors: its line and character number, the offending
line, and a description of the error type.
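The kind of report described above can be sketched with the Python standard library. Note the assumption: xml.etree.ElementTree does not validate against a DTD, so this example reports only the first well-formedness error. The function name first_error and the shape of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

def first_error(text):
    """Return None if text is well-formed, else a dict describing the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = text.splitlines()
        offending = lines[line - 1] if 0 < line <= len(lines) else ""
        return {
            "line": line,
            "column": column,
            "source": offending,
            "message": str(err),
        }
    return None

document = "<SHAPE>\n  <SIZE> 3x8 </SHAPE>\n</SIZE>"
report = first_error(document)
if report is not None:
    print("line %(line)d, column %(column)d: %(message)s" % report)
    print(report["source"])
```

A validating parser would produce a similar report for rule violations; only the error messages would differ.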
Validating XML Files Defined by XML Schemas
XML Schemas offer the same control as DTDs along with the
following extensions (not all of which, however, are implemented at this
time):
- Sequence: in addition to allowing different enumerations of elements
(the "+", "*", and "|" controls of DTDs), schemas also allow elements
to occur in any order
- Data types: in addition to the types
#PCDATA and CDATA, schemas let you
restrict data types to strings, numbers (integer, floating point, fixed
point), date and time formats, boolean, hexadecimal, and ASCII, among others
- Data values: schemas let you set boundaries on data values using
controls such as "dt:max", "dt:min", "dt:maxlength", etc.
- Element contents: with schemas, elements can be restricted to
containing other elements only, text only, a mixture of text
and elements, or nothing at all (empty).
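To make the data type and data value checks concrete, here is a small Python sketch of what a schema processor might do for a single piece of element text. Everything in it is hypothetical: the type names in CONVERTERS, the function check_value, and its minimum, maximum, and max_length parameters are illustrations and do not correspond to any particular schema vocabulary.

```python
from datetime import date

def to_boolean(text):
    """Convert 'true' or 'false' to a bool, rejecting anything else."""
    if text not in ("true", "false"):
        raise ValueError(text)
    return text == "true"

# Hypothetical converters keyed by type name; each raises ValueError on bad input.
CONVERTERS = {
    "string": str,
    "int": int,
    "float": float,
    "date": date.fromisoformat,
    "boolean": to_boolean,
}

def check_value(text, type_name, minimum=None, maximum=None, max_length=None):
    """Return a list of problems found; an empty list means the value passes."""
    text = text.strip()
    convert = CONVERTERS[type_name]  # an unknown type name raises KeyError
    if max_length is not None and len(text) > max_length:
        return ["longer than %d characters" % max_length]
    try:
        value = convert(text)
    except ValueError:
        return ["not a valid %s" % type_name]
    problems = []
    if minimum is not None and value < minimum:
        problems.append("less than minimum %s" % minimum)
    if maximum is not None and value > maximum:
        problems.append("greater than maximum %s" % maximum)
    return problems

print(check_value("42", "int", minimum=0, maximum=100))   # []
print(check_value("3x8", "int"))                          # ['not a valid int']
print(check_value("150", "int", maximum=100))             # ['greater than maximum 100']
```

A DTD could only say that the element holds text, whereas checks of this kind reject text that is present but has the wrong type or falls outside the permitted range.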
This program will validate any
XML file defined by a DTD or an XML Schema.