What is XML?


  • XML (Extensible Markup Language) is a markup language like HTML for storage or transmission of data.

  • XML is widely used in web services to transport data over the network.

  • XML has no predefined tags, unlike HTML.

  • XML is very easy to parse and generate.

  • XML provides strong support for Unicode characters. XML documents are assumed to be encoded in UTF-8 unless an encoding declaration or byte order mark says otherwise.

  • XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

  • XML is widely used in SOA (Service-Oriented Architecture).

  • XML files have the extension .xml, and the media types for XML are application/xml and text/xml.

  • Almost all major programming languages support XML, because it is a language-independent data format.
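
To make the "parse and generate" point concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. The element names (person, name, city) and their values are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Generate: build a small tree in memory and serialize it to a string.
# The element names and values here are made up for this example.
person = ET.Element("person")
ET.SubElement(person, "name").text = "John Snow"
ET.SubElement(person, "city").text = "Winterfell"
xml_text = ET.tostring(person, encoding="unicode")

# Parse: turn the string back into a tree and read values out of it.
parsed = ET.fromstring(xml_text)
name = parsed.find("name").text
city = parsed.findtext("city")

print(xml_text)
print(name, city)
```

Any other language with an XML library would follow the same two steps: build or serialize a tree, then parse it back and query it.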
 

Structure of an XML document:



  1. An XML document contains exactly one root element. Its start-tag opens the document's content, and it encloses all other elements.

    <root>
    	<section>
    		<sub-section></sub-section>
    		<sub-section></sub-section>
    	</section>
    
    	<section>
    		<sub-section></sub-section>
    		<sub-section></sub-section>
    	</section>
    </root>
     
  2. XML documents may begin with a prolog that appears before the root element. It carries metadata about the XML document, such as the character encoding, the document structure, and style sheets. For example, the XML declaration:

    <?xml version="1.0" encoding="UTF-8"?>
     
  3. A tag in XML is a case-sensitive markup construct that begins with < and ends with >. A tag can be:

    • A start-tag, such as <name>;
    • An end-tag, such as </name>;
    • An empty-element tag, such as <name/>.
     
  4. An element in XML consists of a start-tag, a matching end-tag, and the content between them. For example, <name>John Snow</name>. It can also consist of only an empty-element tag. For example, <name/>.

  5. XML elements can have attributes, which appear within a start-tag or an empty-element tag. An attribute consists of a name–value pair. For example, <img src="screenshot.png" alt="screenshot" />. Here the names of the attributes are src and alt, and their values are screenshot.png and screenshot respectively.
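
The structural pieces described above (prolog, root element, nested elements, empty-element tags, attributes, case-sensitive names) can be inspected with a short sketch. It assumes Python's standard-library xml.etree.ElementTree; the sample document is hypothetical and only reuses the names from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration; bytes input lets the parser
# honor the encoding named in the XML declaration.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <section>
        <name>John Snow</name>
        <img src="screenshot.png" alt="screenshot" />
    </section>
    <Section/>
</root>"""

root = ET.fromstring(doc)

# The single root element encloses every other element.
root_tag = root.tag
child_tags = [child.tag for child in root]

# An element with character content between its start-tag and end-tag.
name_text = root.find("section/name").text

# An empty-element tag: it carries attributes but has no content.
img = root.find("section/img")
img_attributes = dict(img.attrib)
img_has_content = img.text is not None or len(img) > 0

# Tag names are case-sensitive: <section> and <Section> are different names.
lowercase_matches = len(root.findall("section"))

print(root_tag, child_tags)
print(name_text, img_attributes, img_has_content, lowercase_matches)
```

Here find and findall take simple paths relative to the element they are called on, so "section/name" means a name child of a section child of the root.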


Syntax Rules:



  1. Each start-tag in XML must have a matching end-tag, and all tags must be properly nested, with none missing and none overlapping. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~ or a space character, and cannot begin with "-", ".", or a digit.

  2. The characters < and & hold special meaning in XML. They are key syntax characters and must not appear literally in element content outside a CDATA section. XML provides escape facilities to handle these special characters. For example:

    • &lt; represents <;
    • &amp; represents &.

    XML has three other predefined entities:

    • &gt; represents >;
    • &apos; represents ';
    • &quot; represents ".

  3. An XML document cannot contain any whitespace before the XML declaration; otherwise the parser sees a processing instruction with the reserved name xml that is not at the start of the document and rejects the document as not well-formed. XML processors preserve all whitespace in element content, while whitespace characters such as tabs and line breaks within attribute values are normalized to spaces.

  4. Similar to HTML, a comment in XML begins with <!-- and ends with -->.
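
The syntax rules above can be checked against a real parser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and xml.sax.saxutils; the helper is_well_formed and the sample strings are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Nesting and naming: tags must match, must not overlap, and names
# must not start with a digit.
nested_ok = is_well_formed("<a><b></b></a>")
overlapping = is_well_formed("<a><b></a></b>")
digit_name = is_well_formed("<1name/>")

# Escaping: < and & must be written as &lt; and &amp; in element content,
# or wrapped in a CDATA section.
raw = "5 < 6 & 7 > 2"
escaped = escape(raw)
round_trip = ET.fromstring("<expr>" + escaped + "</expr>").text
cdata_text = ET.fromstring("<expr><![CDATA[5 < 6 & 7]]></expr>").text
unescaped = is_well_formed("<expr>5 < 6 & 7</expr>")

# Whitespace: nothing may precede the XML declaration; element content
# keeps its whitespace; tabs and newlines in attribute values become spaces.
leading_space = is_well_formed(' <?xml version="1.0"?><a/>')
content_text = ET.fromstring("<a>  two  spaces\n</a>").text
attr_value = ET.fromstring('<a title="one\ttwo\nthree"/>').get("title")

# Comments: the default parser skips them, so only the text remains.
comment_text = ET.fromstring("<a><!-- note -->text</a>").text

print(nested_ok, overlapping, digit_name, unescaped, leading_space)
print(escaped, repr(content_text), attr_value, comment_text)
```

Other parsers may word their errors differently, but a conforming parser should accept and reject the same inputs.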