What is XML?

  • XML (Extensible Markup Language) is a markup language, like HTML, used to store and transmit data.

  • XML is widely used in web services to transport data over the network.

  • XML has no predefined tags, unlike HTML.

  • XML is straightforward to parse and generate, and parser libraries are available for most platforms.

  • XML provides strong support for Unicode characters. When a document does not declare an encoding, parsers assume UTF-8 (or UTF-16 if a byte order mark is present).

  • XML defines a set of rules for encoding documents in a format that is human-readable as well as machine-readable.

  • XML is widely used in SOA (Service-Oriented Architecture).

  • XML files use the extension .xml, and the media types for XML are application/xml and text/xml.

  • Almost all major programming languages support XML because it is a language-independent data format.
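
As a rough illustration of generating and parsing XML, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names person, name, and city are made up for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Build a small document in memory. The tag names are illustrative only;
# XML has no predefined tags, so any well-formed name works.
root = ET.Element("person")
ET.SubElement(root, "name").text = "John Snow"
ET.SubElement(root, "city").text = "Winterfell"

# Serialize the tree to UTF-8 encoded bytes, ready to store or transmit.
data = ET.tostring(root, encoding="utf-8")

# Parse the bytes back into a tree and read one value out of it.
parsed = ET.fromstring(data)
name = parsed.find("name").text
print(name)
```

The same round trip could be done with any other XML library; ElementTree is used here only because it ships with Python.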

Structure of an XML document:

  1. An XML document contains exactly one root element, which encloses all other elements.

  2. XML documents may begin with a prolog that appears before the root element. It carries metadata about the XML document, such as the character encoding, document structure, and style sheets. For example:

    <?xml version="1.0" encoding="UTF-8"?>
  3. A tag in XML is a case-sensitive markup construct that begins with < and ends with >. A tag can be:

    • A start-tag, such as <name>;
    • An end-tag, such as </name>;
    • An empty-element tag, such as <name/>.
  4. An element in XML consists of a start-tag, a matching end-tag, and the characters between them. For example, <name>John Snow</name>. It can also consist of only an empty-element tag. For example, <name/>.

  5. XML elements can have attributes, which appear within a start-tag or an empty-element tag. An attribute consists of a name-value pair. For example, <img src="screenshot.png" alt="screenshot" />. Here the names of the attributes are src and alt, and their values are screenshot.png and screenshot respectively.
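
The pieces described above (prolog, root element, nested elements, attributes, and an empty-element tag) can be seen together in one small document. The sketch below parses such a document with Python's standard xml.etree.ElementTree module. The gallery and owner element names are invented for this example.

```python
import xml.etree.ElementTree as ET

# A complete document: prolog, one root element, a child element with
# text, and an empty-element tag carrying two attributes.
document = """<?xml version="1.0" encoding="UTF-8"?>
<gallery>
  <owner>John Snow</owner>
  <img src="screenshot.png" alt="screenshot" />
</gallery>
"""

# Parse from bytes so the encoding named in the prolog is honored.
root = ET.fromstring(document.encode("utf-8"))

root_tag = root.tag                # name of the single root element
owner = root.find("owner").text    # characters between start- and end-tag
img = root.find("img")
attributes = dict(img.attrib)      # attribute name-value pairs
is_empty = img.text is None and len(img) == 0

print(root_tag, owner, attributes, is_empty)
```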

Syntax Rules:

  1. Each start-tag in XML must have a matching end-tag, and all tags must be properly nested, with none missing and none overlapping. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~ or a space character, and cannot begin with "-", ".", or a digit.

  2. The characters < and & have special meaning in XML. They are key syntax characters and must not appear literally in element content or attribute values; inside a CDATA section they may be used as is. XML provides escape facilities to handle these special characters. For example:

    • &lt; represents <;
    • &amp; represents &.

    XML has three other predefined entities:

    • &gt; represents >;
    • &apos; represents ';
    • &quot; represents ".
  3. An XML document must not contain any whitespace before the XML declaration; otherwise the declaration is read as a processing instruction with the reserved name xml, and the parser reports an error. XML processors preserve all whitespace in element content, while whitespace characters within attribute values are reported as single spaces.

  4. As in HTML, a comment in XML begins with <!-- and ends with -->.
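
Several of these rules can be checked mechanically. The sketch below uses Python's standard library: xml.sax.saxutils.escape to escape the special characters, and xml.etree.ElementTree to test well-formedness. The helper is_well_formed is a hypothetical name introduced for this example, not a library function.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Escape & and < (and >) before placing text inside an element.
raw = "Tom & Jerry < 3"
escaped = escape(raw)

# After parsing, the original characters come back unchanged.
round_trip = ET.fromstring("<title>" + escaped + "</title>").text


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


checks = {
    "nested": is_well_formed("<a><b></b></a>"),
    "overlapping": is_well_formed("<a><b></a></b>"),
    "case mismatch": is_well_formed("<Name></name>"),
    "bare ampersand": is_well_formed("<a>Tom & Jerry</a>"),
    "space before declaration": is_well_formed(' <?xml version="1.0"?><a/>'),
    "comment": is_well_formed("<a><!-- a comment --></a>"),
}
print(escaped, round_trip, checks)
```

Only the "nested" and "comment" cases are expected to pass; each of the others breaks one of the rules listed above.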