XML character entity references

by gowtham 2010-02-11 12:51:34

Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:

* &amp; → & (ampersand, U+0026)
* &lt; → < (less-than sign, U+003C)
* &gt; → > (greater-than sign, U+003E)
* &quot; → " (quotation mark, U+0022)
* &apos; → ' (apostrophe, U+0027)
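The five references above are enough to escape any text for XML. Here is a minimal Python sketch of such an escaper. The names `XML_ENTITIES` and `escape_xml` are hypothetical and used only for illustration. The sketch is conservative because it escapes all five characters everywhere, even though a given context (character data, or an attribute value in single or double quotes) strictly requires only some of them.

```python
# Hypothetical mapping from each markup-sensitive character to its
# predefined XML entity reference.
XML_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

def escape_xml(text: str) -> str:
    """Replace the five markup-sensitive characters with predefined entities.

    The input is scanned one character at a time, so an "&" produced by a
    replacement is never scanned again within this single pass.
    """
    return "".join(XML_ENTITIES.get(ch, ch) for ch in text)

print(escape_xml('Tom & "Jerry" <cat> \'n\' mouse'))
# Tom &amp; &quot;Jerry&quot; &lt;cat&gt; &apos;n&apos; mouse
```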

All other named character entity references must be declared before they can be used. For example, using &eacute; (which gives é, LATIN SMALL LETTER E WITH ACUTE, U+00E9 in Unicode) in an XML document is a well-formedness error unless the entity has been declared, for instance in the document's DTD. Numeric character references such as &#233; need no declaration. XML also requires that the x in hexadecimal numeric references be lowercase: for example &#xA1B; rather than &#XA1B; (the hexadecimal digits themselves may be in either case). XHTML, which is an XML application, supports the HTML 4 entity set plus XML's &apos; entity, which does not appear in HTML 4.
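These rules can be checked with Python's standard `xml.etree.ElementTree`, which is built on the expat parser. The sketch below is only an illustration. `try_parse` is a hypothetical helper, and the exact wording of the error messages depends on the Python and expat versions in use.

```python
import xml.etree.ElementTree as ET

def try_parse(doc: str) -> str:
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return f"error: {err}"

# Undeclared named entity: the document is not well-formed.
print(try_parse("<p>&eacute;</p>"))

# Decimal and hexadecimal numeric references need no declaration.
print(try_parse("<p>&#233; &#xE9;</p>"))  # é é

# An uppercase X in a hexadecimal reference is rejected.
print(try_parse("<p>&#XE9;</p>"))

# Declaring the entity in the internal DTD subset makes it usable.
print(try_parse('<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>&eacute;</p>'))  # é
```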

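However, &apos; should generally be avoided in XHTML for compatibility reasons. It is not defined in HTML 4, so user agents that treat XHTML as HTML 4 may not recognize it. &#39; or &#x27; may be used instead.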
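To an XML parser, all three forms produce the same character. The numeric forms have the advantage that HTML 4 user agents also understand them. The following short Python check illustrates this, and it also shows that the standard library's `html.escape` emits the numeric form rather than &apos;.

```python
import html
import xml.etree.ElementTree as ET

# Three ways to write an apostrophe in an XML/XHTML document.
forms = ["&apos;", "&#39;", "&#x27;"]
for ref in forms:
    # Each reference is parsed to the same U+0027 character.
    print(ref, "->", ET.fromstring(f"<p>{ref}</p>").text)

# html.escape (with its default quote=True) uses &#x27; for the apostrophe.
print(html.escape("it's"))  # it&#x27;s
```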

&amp; has the special problem that it begins with the very character it escapes. A simple Internet search finds thousands of HTML pages containing sequences such as &amp;amp;amp;, produced when the routine that replaces an ampersand with its character entity reference was applied more than once.
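The following Python sketch uses `xml.sax.saxutils.escape` and `unescape` from the standard library to reproduce this failure. These functions handle &, < and > by default. Escaping is not idempotent: every extra pass adds another "amp;", and each call to `unescape` strips exactly one layer.

```python
from xml.sax.saxutils import escape, unescape

once = escape("R&D")
twice = escape(once)     # escaping text that was already escaped
thrice = escape(twice)

print(once)    # R&amp;D
print(twice)   # R&amp;amp;D
print(thrice)  # R&amp;amp;amp;D

# Each unescape removes exactly one layer of escaping.
print(unescape(thrice))            # R&amp;amp;D
print(unescape(unescape(thrice)))  # R&amp;D
```

The usual remedy is to escape text exactly once, at the point where it is written into markup, rather than at several stages of a pipeline.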