Ken Ward's HTML Tutorial ...

Summary of XHTML

This page is a cookbook for making HTML 4.01 pages into XHTML pages, rather than a tutorial.

  1. What is XHTML

  2. Why Use XHTML

  3. A Basic XHTML Document

  4. Doctype

  5. Content Type

  6. Differences between HTML and XHTML

  7. All Lowercase

    1. Tags are Lowercase

    2. Attributes and Events are Lowercase

  8. All Tags Must Be Closed

  9. Attribute Values Must Be Quoted (in quotation marks)

  10. XHTML Elements Must Be Properly Nested and Well-formed

  11. The ID Attribute Replaces the Name Attribute

  12. Required Attributes

  13. HTML in JavaScript

  14. Invalid Characters

What is XHTML?

XHTML stands for EXtensible HyperText Markup Language, which is intended to replace HTML. It is very similar to HTML 4.01, but is stricter and more logical.

XHTML is based on HTML and XML (eXtensible Markup Language). It is compatible with older browsers, and is the 'HTML' of the future.

Put crudely, XHTML is the new HTML. You don't need to change your pages to be ".xhtml". They remain ".htm" or ".html", but they are written in a way that makes them less likely to fail in current browsers and more likely to be compatible with future ones.

Why Use XHTML?

Browsers are presently very forgiving and support the old HTML. In the future, however, browsers and programs dealing with HTML will become more strict. A correct XHTML document is easier for a program to examine and to check.

When a browser opens a valid XHTML page with a doctype, it can render the page in standards mode and does not have to go into "quirks" mode to figure out non-standard code.

Following the XHTML guidelines makes documents easier to transport to other media and increases accessibility in, for instance, talking browsers and text-only browsers. In the future, the number of clients will increase, with telephones and other media reading internet documents. Using standard documents gives your pages forward compatibility.

Converting to XHTML now will make your pages compatible with future technology.

Basic XHTML Document

The following is an example of a basic XHTML document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "">
<html xmlns="" xml:lang="en" lang="en">
<head>
<title>Greetings</title>
</head>
<body>
<p>Hello!</p>
</body>
</html>

The above document defines the character encoding in the first line. Instead of UTF-8, you can use another encoding. For example:

<?xml version="1.0" encoding="iso-8859-1"?>

The next one defines the character set in a meta tag:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">

<title>My Page</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />





A minimal document must have a doctype declaration at the top. The Content Type needs to be defined, although this can be done on the server. The page also needs the html, head, title, and body tags.
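The tag requirement above can be checked mechanically. The following is a minimal sketch in Python, assuming the page is already well-formed and uses no XML namespace; the sample page and the function name missing_required_tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the doctype and namespace are left out to keep it short.
PAGE = """<html>
<head><title>Greetings</title></head>
<body><p>Hello!</p></body>
</html>"""

def missing_required_tags(source):
    """Return the names of the required tags that are absent from the page."""
    root = ET.fromstring(source)  # raises ET.ParseError if the page is not well-formed
    missing = []
    if root.tag != "html":
        missing.append("html")
    # Each path is looked up relative to the root element.
    for path in ("head", "head/title", "body"):
        if root.find(path) is None:
            missing.append(path.split("/")[-1])
    return missing

print(missing_required_tags(PAGE))  # []
```

This only looks for the four tags; it does not validate the page against a DTD.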


Doctype

There are three possible doctypes:

  1. Strict
    • <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "">

  2. Transitional
    • <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">

  3. Frameset
    • <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "">

An XHTML document should start with one of the above. More information from W3C.

You would use the Strict DTD when you do not use any old or deprecated HTML attributes, and use style sheets (not HTML) to format your pages.

You would use the Transitional DTD when you have not completely adopted the standards of XHTML and wish to format your pages with HTML, using "font", "align", "size", "width", etc. Also, the Strict DTD does not accept the target attribute, although the Transitional DTD does. This page, for example, follows the transitional standard.

The Frameset DTD is used when you use frames on your page.
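The rules of thumb for choosing a doctype can be written down as a small decision function. This is only a sketch: the public identifiers are copied from the list above, while the function name choose_doctype and its two parameters are invented for illustration.

```python
# Public identifiers copied from the three doctypes listed above.
DOCTYPES = {
    "strict": "-//W3C//DTD XHTML 1.0 Strict//EN",
    "transitional": "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "frameset": "-//W3C//DTD XHTML 1.0 Frameset//EN",
}

def choose_doctype(uses_frames, uses_html_formatting):
    """Pick a doctype name following the rules of thumb in the text."""
    if uses_frames:
        return "frameset"
    if uses_html_formatting:  # e.g. font, align, or the target attribute
        return "transitional"
    return "strict"

name = choose_doctype(uses_frames=False, uses_html_formatting=True)
print(name, DOCTYPES[name])
```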

XML Namespace

You can also define the XML namespace as follows:

<html xmlns="" xml:lang="en" lang="en">

Content Type

In the example basic page, the Content Type was defined with the utf-8 character set:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

Another common character set is ISO-8859-1:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

There are a number of valid character sets and implementations. The one above is a common one for those writing in Roman letters. The page must have its Content Type defined.
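To see what the content attribute carries, the sketch below splits a Content-Type value into its media type and character set with Python's standard email package, then shows how the same letter is stored differently in the two character sets mentioned above. The helper name split_content_type is made up for illustration.

```python
from email.message import Message

def split_content_type(value):
    """Return (media type, charset) parsed from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

print(split_content_type("text/html;charset=utf-8"))
print(split_content_type("text/html; charset=ISO-8859-1"))

# The same character is one byte in ISO-8859-1 and two bytes in UTF-8,
# which is why the declared character set must match how the file was saved.
print("\u00e9".encode("iso-8859-1"))  # b'\xe9'
print("\u00e9".encode("utf-8"))       # b'\xc3\xa9'
```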

Differences between HTML and XHTML

Next we need to look at the differences between HTML and XHTML, or what changes we must make to produce XHTML documents.

All Lowercase

Tags must be lowercase

HTML is not case sensitive, so you can write <a Href, <A hRef, etc. In XHTML, however, all tags must be lowercase. Therefore, the following is the only correct way to write tags:

<a href= ....  (correct)
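A quick way to find names that break the lowercase rule is to walk the parsed page. The sketch below assumes a well-formed fragment without namespaces; the function name non_lowercase_names is hypothetical. It reports attribute names as well as tag names, since the same rule applies to both.

```python
import xml.etree.ElementTree as ET

def non_lowercase_names(source):
    """List element and attribute names that contain uppercase letters."""
    found = []
    for element in ET.fromstring(source).iter():
        if element.tag != element.tag.lower():
            found.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                found.append(name)
    return found

print(non_lowercase_names('<p><A Href="myURL.htm">link</A></p>'))  # ['A', 'Href']
```

Because XML is case sensitive, a fragment such as <P>Hello</p> is rejected by the parser outright, since the end tag no longer matches the start tag.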

Attributes and Events Must Be Lowercase

The surprising thing here is that Events, which we think of as JavaScript, must be lowercase. So while we could write onClick=, or ONCLICK=, we must now write:

onclick=""  (correct)

Similarly, attribute names must be lowercase:

<a href="myURL.htm" id="myID"></a>  (correct)

<img src="myImage.gif" height="20" width="30" align="left" alt="My Image" />  (correct)

The <img> tag, by the way, must have an "alt" attribute in XHTML, which is logical because some browsers are text browsers and others are talking browsers, which need the "alt" text.

The forward slash at the end of the img tag leads us on to the next rule:

All Tags Must Be Closed

In HTML, it wasn't necessary to close some tags, and the following is OK in HTML:

<p>Hello<p>From me  (incorrect in XHTML)

However, in XHTML, the tags must be closed:

<p>Hello</p><p>From me</p>  (correct)

Which, after all, is logical.

Some tags in HTML are empty tags and it wasn't necessary to close them. The famous example is:

<br>  (incorrect in XHTML)

In XHTML we must close this tag. One way to do this, which is compatible with older browsers, is to add a space and a forward slash:

<br />  (correct)

So in XHTML we would write the empty tags as follows:

<img src="Back.jpg" border="0" width="127" height="44" alt="My Message" />
<input type="button" value="button" />
<input type="checkbox" name="C1" id="C1" value="" checked="checked" />
<input type="radio" name="r1" id="r1" value="" checked="checked" />
<link rel="stylesheet" href="menu.css" type="text/css" />
<meta name="keywords" content="" />
<meta name="description" content="" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<base target="_top" />

These tags may span several lines in XHTML, just as they could in HTML, because XML allows line breaks between attributes. Only the closing slash is new.
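Whether every tag is closed can be tested with any XML parser, because an unclosed tag makes the document ill-formed. The following sketch uses Python's standard ElementTree parser; the helper name is_well_formed is made up, and the fragments are wrapped in a single outer element because XML requires one root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<div><p>Hello<p>From me</div>"))          # False
print(is_well_formed("<div><p>Hello</p><p>From me</p></div>"))  # True
print(is_well_formed("<p>Hello<br>From me</p>"))                # False
print(is_well_formed("<p>Hello<br />From me</p>"))              # True
```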

Attribute Values Must Be Quoted

In HTML, we could write:

<input type=radio name=r1 id=r1 value="" checked>  (incorrect in XHTML)

However, this breaks three rules: the values are not quoted, the checked attribute is minimised, and the tag is not closed. The correct way to write the above is:

<input type="radio" name="r1" id="r1" value="" checked="checked" />  (correct)

That is, all the values are in quotes, and we do not have minimisation, where we could write simply checked when we meant checked="checked". (If the name attribute is present, the id attribute is required, but they don't need to have the same value.)
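Quoting values and expanding minimised attributes is mechanical enough to automate. The sketch below uses Python's standard html.parser module to read an HTML-style start tag and write it back in XHTML style; the class name StartTagRewriter and the short EMPTY_TAGS set are assumptions for illustration, not a complete converter.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative subset of the empty tags discussed in the previous section.
EMPTY_TAGS = {"br", "hr", "img", "input", "link", "meta", "base"}

class StartTagRewriter(HTMLParser):
    """Collect each start tag rewritten with quoted, unminimised attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:  # minimised attribute such as "checked"
                value = name
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        end = " />" if tag in EMPTY_TAGS else ">"
        self.tags.append("<" + " ".join(parts) + end)

rewriter = StartTagRewriter()
rewriter.feed('<INPUT TYPE=radio name=r1 value="" CHECKED>')
print(rewriter.tags[0])
# <input type="radio" name="r1" value="" checked="checked" />
```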

XHTML elements must be properly nested and well-formed

This means that we should not write:

<b><i>Hello</b></i>  (incorrect in XHTML)

But we should write this:

<b><i>Hello</i></b>  (correct)

That is, we close the tags in the opposite order to the way we opened them.

Also, we should not write block tags (such as "<p>" or "<table>") inside inline elements (such as "<a>", "<span>", or "<font>"). The following is wrong:

<font size="3"><p>Hello</p></font>  (incorrect in XHTML)

because we have the block (p) element inside an inline element (font). Instead write:

<p><font size="3">Hello</font></p>  (correct)
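An XML parser catches tags closed in the wrong order, but it does not know which elements are block and which are inline. A rough check for the second rule might look like the sketch below. The INLINE and BLOCK sets are small illustrative subsets rather than the full lists from the DTDs, and the function name block_inside_inline is hypothetical.

```python
import xml.etree.ElementTree as ET

# Small illustrative subsets, not the complete lists from the XHTML DTDs.
INLINE = {"a", "b", "i", "span", "font"}
BLOCK = {"p", "table", "div"}

def block_inside_inline(source):
    """Return (inline, block) pairs where a block element is a direct child of an inline one."""
    problems = []
    # Badly nested tags such as <b><i>Hello</b></i> raise ET.ParseError here.
    for parent in ET.fromstring(source).iter():
        if parent.tag in INLINE:
            for child in parent:
                if child.tag in BLOCK:
                    problems.append((parent.tag, child.tag))
    return problems

print(block_inside_inline('<font size="3"><p>Hello</p></font>'))  # [('font', 'p')]
print(block_inside_inline('<p><font size="3">Hello</font></p>'))  # []
```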

The id attribute replaces the name attribute

In XHTML 1.0, the name attribute of the a, applet, form, frame, iframe, img, and map elements is deprecated in favour of the id attribute. In transitional documents both name and id can be used, but a name should not be used without an id.

The name and id values must be one word, with no spaces. So not:

<a name="a nice day" id="a nice day"></a>  (incorrect in XHTML)

but:

<a name="a_nice_day" id="a_nice_day"></a>  (correct)
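The one-word rule can be expressed as a pattern. The sketch below uses the HTML 4 definition of an ID token (a letter followed by letters, digits, hyphens, underscores, colons, or periods); the helper names is_valid_id and to_id are made up for illustration.

```python
import re

# HTML 4 ID and NAME tokens: a letter, then letters, digits, "-", "_", ":" or ".".
ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_valid_id(value):
    """Return True if the whole value is a single ID token."""
    return ID_TOKEN.fullmatch(value) is not None

def to_id(text):
    """Replace runs of whitespace with underscores to make a one-word value."""
    return re.sub(r"\s+", "_", text.strip())

print(is_valid_id("a nice day"))         # False
print(to_id("a nice day"))               # a_nice_day
print(is_valid_id(to_id("a nice day")))  # True
```

Note that to_id only deals with spaces; a value that starts with a digit, for example, would still need fixing by hand.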

Required Attributes

Some elements require an attribute. For instance, the form element requires an action attribute:

<form name="form1" id="form1" action="">

An empty action, it appears, is better than no action!

An image requires both an "alt" attribute and a "src" attribute:

<img src="myImage.jpg" alt="" />

JavaScript and CSS tags need a type attribute:

<script type="text/javascript">
<style type="text/css">
<link rel="stylesheet" href="menu.css" type="text/css" />

In XHTML 1.0 Strict, the language attribute of the script tag is not allowed (it is deprecated in Transitional). So we write:

<script type="text/javascript">

Rather than:

<script language="javascript" type="text/javascript">
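The required attributes mentioned in this section can be checked with a small lookup table. This is a sketch only: the REQUIRED table lists just the cases named above (the DTDs define more), the function name missing_attributes is hypothetical, and the input is assumed to be well-formed and free of namespaces.

```python
import xml.etree.ElementTree as ET

# Required attributes named in the text above; the DTDs define more.
REQUIRED = {
    "form": ["action"],
    "img": ["src", "alt"],
    "script": ["type"],
    "style": ["type"],
}

def missing_attributes(source):
    """Return (tag, attribute) pairs for required attributes that are absent."""
    problems = []
    for element in ET.fromstring(source).iter():
        for name in REQUIRED.get(element.tag, []):
            if name not in element.attrib:
                problems.append((element.tag, name))
    return problems

page = '<div><img src="myImage.jpg" /><form id="form1"></form></div>'
print(missing_attributes(page))  # [('img', 'alt'), ('form', 'action')]
```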

HTML in JavaScript

Inside a script element, the characters "</" can be taken as the start of an end tag, so when writing HTML with JavaScript the forward slashes in closing tags should be preceded with an escape character ("\", backward slash).

document.write("<h1>Hello<\/h1>");

The backward slash precedes any forward slashes.
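If scripts are generated by a program, the escaping can be applied automatically. The sketch below is a one-line Python helper, with the made-up name escape_end_tags, that rewrites every "</" in a piece of JavaScript source.

```python
def escape_end_tags(script_text):
    # Put a backward slash before the forward slash of every "</" sequence.
    return script_text.replace("</", "<\\/")

print(escape_end_tags('document.write("<h1>Hello</h1>");'))
# document.write("<h1>Hello<\/h1>");
```

JavaScript treats "\/" inside a string as a plain "/", so the text written to the page is unchanged.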

Invalid Characters

Some characters may not be recognised in the character set chosen. For instance, © may not be recognised, and needs to be replaced by &copy;

XML itself predefines only five special character names: &amp;, &lt;, &gt;, &quot;, and &apos;. The XHTML DTDs also define the HTML 4 names, such as &nbsp; and &copy;. The most common ones are:

&amp; - ampersand ( & )
&lt; - less than, open bracket ( < )
&gt; - greater than, close bracket ( > )
&quot; - double quote ( " )
&nbsp; - non-breaking space (hard space)
(See special characters.)
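A program that reads the page with a plain XML parser and no DTD will reject names such as &copy; and &nbsp;. One hedged workaround is to replace them with numeric references, which need no DTD. The sketch below uses Python's html.entities table; the function name to_numeric_references is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five names every XML parser knows without reading a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def to_numeric_references(markup):
    """Replace HTML entity names such as &copy; with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave the reference as written
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

fixed = to_numeric_references("<p>&copy;&nbsp;1998 &amp; later</p>")
print(fixed)  # <p>&#169;&#160;1998 &amp; later</p>

# The converted text now parses without a DTD.
print(ET.fromstring(fixed).text)
```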


Most Recent Revision: 18-Oct-98.
Copyright © 1998
