"<>"<tagname>…</tagname><tagname/><hr> is legal in HTML<hr/> or <hr></hr><X>…<Y>…</Y></X> is legal…<X>…<Y>…</X></Y> is not"<" and ">"
&name;
| Sequence | Character | Description |
|---|---|---|
| &lt; | < | Less than |
| &gt; | > | Greater than |
| &quot; | " | Double quote |
| &apos; | ' | Apostrophe |
| &amp; | & | Ampersand |
| &Aring; | Å | Angstrom |
| &nbsp; |   | Non-breaking space |
| &lambda; | λ | Greek small letter lambda |
| &Lambda; | Λ | Greek capital letter lambda |
Table 1: XML Character Escape Samples
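As a rough illustration of how escape sequences like those in Table 1 are produced and decoded, here is a minimal sketch using Python's standard library. It is written in Python 3 syntax, and the sample strings are made up for the example.

```python
import html
from xml.sax.saxutils import escape

# Escape the characters that are special in XML text content.
raw = 'if a < b & b > c'
escaped = escape(raw)
print(escaped)

# html.unescape turns named sequences back into the characters they stand for.
decoded = html.unescape('&lambda; &Lambda; &Aring;')
print(decoded)
```

Note that `escape` only handles `&`, `<`, and `>` by default; quotes need extra care when the text is going into an attribute value.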
| Tag | Usage |
|---|---|
| <html> | Root element of entire HTML document. |
| <body> | Body of page (i.e., visible content). |
| <h1> | Top-level heading. Use <h2>, <h3>, etc. for second- and third-level headings. |
| <p> | Paragraph. |
| <em> | Emphasized text; browser or editor will usually display it in italics. |
| <address> | Address of document author (also usually displayed in italics). |
Table 2: Basic XHTML Tags
<html>
<body>
<h1>Software Carpentry</h1>
<p>This course will introduce <em>essential software development skills</em>, and show where and how they should be applied.</p>
<address>Greg Wilson (gvwilson@third-bit.com)</address>
</body>
</html>
Figure 1: Simple Page Rendered by Firefox
<h1/> (level-1 heading) is semantic (meaning)
<i/> (italics) is display (formatting)
<h1 align="center">A Centered Heading</h1>
<p id="disclaimer" align="center">This planet provided as-is.</p>
<p align="left" align="right">…</p> is illegal
<p align=center>…<p>, but modern parsers will reject it
<head/> element as well as a <body/>
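The well-formedness rules mentioned so far (proper nesting, quoted attribute values, no repeated attributes) can be checked mechanically. The sketch below uses Python 3 and the standard `xml.dom.minidom` parser; the helper name `is_well_formed` is hypothetical, chosen only for this example.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False

samples = [
    '<p align="center">quoted attribute</p>',
    '<p align="left" align="right">repeated attribute</p>',
    '<p align=center>unquoted attribute</p>',
    '<X><Y>badly nested</X></Y>',
]
results = [is_well_formed(s) for s in samples]
for sample, ok in zip(samples, results):
    print(ok, sample)
```

Only the first sample is accepted; the parser rejects the other three.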
Comments start with <!--, and end with -->
<html>
<head>
<title>Comments Page</title>
<meta name="author" content="aturing"/>
</head>
<body>
<!-- House style puts all titles in italics -->
<h1><em>Welcome to the Comments Page</em></h1>
<!-- Update this paragraph to describe the forum. -->
<p>Welcome to the Comments Forum.</p>
</body>
</html>
<ul/> for an unordered (bulleted) list, and <ol/> for an ordered (numbered) one
<li/>
<table/> for tables
<tr/> (for “table row”)
<td/> (for “table data”)
<html>
<head>
<title>Lists and Tables</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<table cellpadding="3" border="1">
<tr>
<td align="center"><em>Unordered List</em></td>
<td align="center"><em>Ordered List</em></td>
</tr>
<tr>
<td align="left" valign="top">
<ul>
<li>Hydrogen</li>
<li>Lithium</li>
<li>Sodium</li>
<li>Potassium</li>
<li>Rubidium</li>
<li>Cesium</li>
<li>Francium</li>
</ul>
</td>
<td align="left" valign="top">
<ol>
<li>Helium</li>
<li>Neon</li>
<li>Argon</li>
<li>Krypton</li>
<li>Xenon</li>
<li>Radon</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
Figure 2: Lists and Tables
<meta/> elements in document head
<img/> tag
src attribute specifies where to find the image file
<html>
<head>
<title>Images</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<h1>Our Logo</h1>
<img src="../../.swc/lec/img/sc_powered.jpg" alt="[Powered by Software Carpentry]"/>
</body>
</html>
Figure 3: Images in Pages
alt attribute to specify alternative text
<a/> element to create a link
href attribute specifies what the link is pointing at
<html>
<head>
<title>Links</title>
<meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>
<h1>A Few of My Favorite Places</h1>
<ul>
<li><a href="http://www.google.com">Google</a></li>
<li><a href="http://www.python.org">Python</a></li>
<li><a href="http://www.nature.com/index.html">Nature Online</a></li>
<li>Examples in this lecture:
<ul>
<li><a href="comments.html">Comments</a></li>
<li><a href="image.html">Images</a></li>
<li><a href="list_table.html">Lists and Tables</a></li>
</ul>
</li>
</ul>
</body>
</html>
Figure 4: Links in Pages
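Because a well-formed XHTML page is also XML, its links can be pulled out by a program. The following is a minimal sketch in Python 3 using `xml.dom.minidom` (the library this lecture turns to later). The shortened page and the helper name `list_links` are assumptions made for the example, and it assumes every link's first child is its text.

```python
import xml.dom.minidom

page = '''<html>
<body>
<ul>
<li><a href="http://www.google.com">Google</a></li>
<li><a href="http://www.python.org">Python</a></li>
<li><a href="comments.html">Comments</a></li>
</ul>
</body>
</html>'''

def list_links(text):
    """Return (label, target) pairs for every <a> element in text."""
    doc = xml.dom.minidom.parseString(text)
    links = []
    for anchor in doc.getElementsByTagName('a'):
        # Assumes the first child of each <a> is its text label.
        label = anchor.firstChild.data
        links.append((label, anchor.getAttribute('href')))
    return links

links = list_links(page)
for label, target in links:
    print(label, '->', target)
```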
style attribute
<p style="text-align: center">
<p style="text-align:center; font-weight:bold;">
<style/> tags
<head/> section:
<style type="text/css" media="all">
followed by a number of CSS instructions
<head/> section:
<link rel="stylesheet" href="css/slides.css" type="text/css" media="projection" id="slideProj" />
selector {property1:value1; property2:value2;...}
element.class, where class is the value of the class attribute, and element is either an HTML element or an element you've "invented".
a:hover can be used to change the style when the pointer is over a link
p:first-letter can be used to change the style for the first letter of a paragraph
p#paragraph1 would refer to the paragraph whose ID attribute is "paragraph1"
ul.inc li.active would refer to <li/> elements with a class attribute of "active" that are descendants of <ul/> elements with a class attribute of "inc".
Example style:
<style type="text/css">
body {font-family: arial;}
p.example {font-family: courier; margin-left: 5em; margin-right: 5em; background-color: LightBlue;}
.center {text-align: center;}
myTitle {font-weight: bold; display: block; color: green; text-align: center; font-size: 150%;}
</style>
Example input:
<body>
<myTitle>This is our header</myTitle>
<p>We will now introduce an example. This is a standard paragraph, with all of the default styles set up by the browser. Can you think of a way you might be able to override at least one of those defaults? Back to our example, we now want to highlight a section of text, which might be a quote or some other kind of example</p>
<p class="example">This is our example. Note that the margins have been adjusted and we also now have a background color. We could also have drawn a box around our example, or we could have made other adjustments.</p>
<p>Now we're back to normal text.</p>
</body>
Figure 5: Simple CSS Example Rendered by Firefox
<i/>, <span/>, and <b/> that can be laid out within a line (no line break)
display: inline
<html>
<body>
This is a sentence with a
<myStyle style="display:inline; border: thin red solid">"myStyle" element</myStyle>
embedded in it.
</body>
</html>
<p/>, <div/> and <li/> that cause the line of text to break
display: block
<html>
<body>
This is a sentence with a
<myStyle style="display:block; border: thin red solid">"myStyle" element</myStyle>
embedded in it.
</body>
</html>
margin-top, -bottom, -left, and -right: the transparent area around the element
border-top, -bottom, -left, and -right: the area for the border that will be painted around the element
padding-top, -bottom, -left, and -right: the area between the actual content and the border
minidom
Figure 6: A DOM Tree
<root>
<first>element</first>
<second attr="value">element</second>
<third-element/>
</root>
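To make the tree structure concrete, the sketch below parses this small document with `xml.dom.minidom` and summarizes each child element of the root. It uses Python 3 syntax, and the variable name `summary` is introduced only for illustration.

```python
import xml.dom.minidom

src = ('<root><first>element</first>'
       '<second attr="value">element</second>'
       '<third-element/></root>')
doc = xml.dom.minidom.parseString(src)
root = doc.documentElement

# Collect (tag name, attributes, text) for each element child of the root.
summary = []
for child in root.childNodes:
    if child.nodeType == child.ELEMENT_NODE:
        attrs = dict(child.attributes.items())
        text = ''.join(n.data for n in child.childNodes
                       if n.nodeType == n.TEXT_NODE)
        summary.append((child.tagName, attrs, text))

for entry in summary:
    print(entry)
```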
ElementTree
use dictionaries instead
<?xml version="1.0" encoding="utf-8"?>
<planet name="Mercury">
<period units="days">87.97</period>
</planet>
import xml.dom.minidom
doc = xml.dom.minidom.parse('mercury.xml')
print doc.toxml('utf-8')
<?xml version="1.0" encoding="utf-8"?>
<planet name="Mercury">
<period units="days">87.97</period>
</planet>
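For comparison, the same kind of document can be read with ElementTree, which exposes attributes through dictionary-style access. This is a sketch in Python 3 that parses from a string rather than from `mercury.xml`; the `record` dictionary is a made-up structure for the example.

```python
import xml.etree.ElementTree as ET

src = '<planet name="Mercury"><period units="days">87.97</period></planet>'
planet = ET.fromstring(src)
period = planet.find('period')

# Attributes behave like a dictionary; element text is a plain string.
record = {
    'name': planet.get('name'),
    'period': float(period.text),
    'units': period.attrib['units'],
}
print(record)
```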
toxml method can be called on the document, or on any element node, to create text
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
for the details
import xml.dom.minidom
my_xml = '''<name>Donald Knuth</name>'''
my_doc = xml.dom.minidom.parseString(my_xml)
name = my_doc.documentElement.firstChild.data
print 'name is:', name
print 'but name in full is:', repr(name)
name is: Donald Knuth
but name in full is: u'Donald Knuth'
u in front of the string the second time it is printed
print statement converts the Unicode string to ASCII for display
import xml.dom.minidom
src = '''<planet name="Venus">
<period units="days">224.7</period>
</planet>'''
doc = xml.dom.minidom.parseString(src)
print doc.toxml('utf-8')
<?xml version="1.0" encoding="utf-8"?>
<planet name="Venus">
<period units="days">224.7</period>
</planet>
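The examples here use the Python 2 print statement. As a hedged aside, the sketch below shows how the same `toxml` call behaves under Python 3, where text is Unicode by default and passing an encoding produces bytes. The variable names are chosen for the example.

```python
import xml.dom.minidom

src = '<planet name="Venus"><period units="days">224.7</period></planet>'
doc = xml.dom.minidom.parseString(src)

as_text = doc.toxml()          # no encoding: a Unicode str
as_bytes = doc.toxml('utf-8')  # explicit encoding: bytes

print(type(as_text).__name__)
print(type(as_bytes).__name__)
print(as_bytes.decode('utf-8'))
```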
import xml.dom.minidom
impl = xml.dom.minidom.getDOMImplementation()
doc = impl.createDocument(None, 'planet', None)
root = doc.documentElement
root.setAttribute('name', 'Mars')
period = doc.createElement('period')
root.appendChild(period)
text = doc.createTextNode('686.98')
period.appendChild(text)
print doc.toxml('utf-8')
<?xml version="1.0" encoding="utf-8"?> <planet name="Mars"><period>686.98</period></planet>
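The construction steps above can be wrapped in a small function so that several planets can be built the same way. This is a sketch in Python 3; the helper `make_planet` and its `units` parameter are hypothetical additions, not part of the original example.

```python
import xml.dom.minidom

def make_planet(name, period, units='days'):
    """Build a <planet> document with one <period> child."""
    impl = xml.dom.minidom.getDOMImplementation()
    doc = impl.createDocument(None, 'planet', None)
    root = doc.documentElement
    root.setAttribute('name', name)
    node = doc.createElement('period')
    node.setAttribute('units', units)
    node.appendChild(doc.createTextNode(str(period)))
    root.appendChild(node)
    return doc

mars = make_planet('Mars', 686.98)
print(mars.documentElement.toxml())
```

Calling `toxml` on the root element rather than on the document leaves out the XML declaration.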
xml.dom.minidom is really just a wrapper around other platform-specific XML libraries
document node
createDocument specifies the type of the document's root node
createDocument are
setAttribute(attributeName, newValue)
<experimenter/> nodes, extract names, and print a sorted list
getElementsByTagName method to do this
import xml.dom.minidom
src = '''<heavenly_bodies>
<planet name="Mercury"/>
<planet name="Venus"/>
<planet name="Earth"/>
<moon name="Moon"/>
<planet name="Mars"/>
<moon name="Phobos"/>
<moon name="Deimos"/>
</heavenly_bodies>'''
doc = xml.dom.minidom.parseString(src)
for node in doc.getElementsByTagName('moon'):
    print node.getAttribute('name')
Moon
Phobos
Deimos
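To go one step further and produce a sorted list of names, the matching nodes can be fed through `sorted`. The sketch below is in Python 3 syntax; the helper name `sorted_names` is an assumption made for the example.

```python
import xml.dom.minidom

src = '''<heavenly_bodies>
<planet name="Mercury"/>
<planet name="Venus"/>
<planet name="Earth"/>
<moon name="Moon"/>
<planet name="Mars"/>
<moon name="Phobos"/>
<moon name="Deimos"/>
</heavenly_bodies>'''

def sorted_names(doc, tag):
    """Return the name attribute of every <tag> element, sorted."""
    nodes = doc.getElementsByTagName(tag)
    return sorted(node.getAttribute('name') for node in nodes)

doc = xml.dom.minidom.parseString(src)
planets = sorted_names(doc, 'planet')
moons = sorted_names(doc, 'moon')
print(planets)
print(moons)
```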
nodeType
ELEMENT_NODE, TEXT_NODE, ATTRIBUTE_NODE, DOCUMENT_NODE
childNodes
data
import xml.dom.minidom
src = '''<solarsystem>
<planet name="Mercury"><period units="days">87.97</period></planet>
<planet name="Venus"><period units="days">224.7</period></planet>
<planet name="Earth"><period units="days">365.26</period></planet>
</solarsystem>
'''
def walkTree(currentNode, indent=0):
    spaces = ' ' * indent
    if currentNode.nodeType == currentNode.TEXT_NODE:
        print spaces + 'TEXT' + ' (%d)' % len(currentNode.data)
    else:
        print spaces + currentNode.tagName
        for child in currentNode.childNodes:
            walkTree(child, indent+1)
doc = xml.dom.minidom.parseString(src)
walkTree(doc.documentElement)
solarsystem
 TEXT (1)
 planet
  period
   TEXT (5)
 TEXT (1)
 planet
  period
   TEXT (5)
 TEXT (1)
 planet
  period
   TEXT (6)
 TEXT (1)
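The one-character text nodes in that output are the newlines between elements. A variant of the walk that skips whitespace-only text might look like the sketch below. It is written in Python 3 and collects lines in a list instead of printing as it goes; the function name `outline` and the shortened input are assumptions made for the example.

```python
import xml.dom.minidom

src = '''<solarsystem>
<planet name="Mercury"><period units="days">87.97</period></planet>
<planet name="Venus"><period units="days">224.7</period></planet>
</solarsystem>
'''

def outline(node, indent=0, lines=None):
    """Return an indented outline, ignoring whitespace-only text nodes."""
    if lines is None:
        lines = []
    if node.nodeType == node.TEXT_NODE:
        text = node.data.strip()
        if text:
            lines.append(' ' * indent + 'TEXT ' + repr(text))
    elif node.nodeType == node.ELEMENT_NODE:
        lines.append(' ' * indent + node.tagName)
        for child in node.childNodes:
            outline(child, indent + 1, lines)
    return lines

doc = xml.dom.minidom.parseString(src)
lines = outline(doc.documentElement)
print('\n'.join(lines))
```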
Figure 7: Modifying the DOM Tree
<em/> element whose only child is a text node containing that word
<em/>
getElementsByTagName, and iterate over them
import re
import xml.dom.minidom

def emphasize(doc):
    # Emphasize the first word of every paragraph that starts with text.
    paragraphs = doc.getElementsByTagName('p')
    for para in paragraphs:
        first = para.firstChild
        if first.nodeType == first.TEXT_NODE:
            emphasizeText(doc, para, first)

def emphasizeText(doc, para, textNode):
    # Look for optional spaces, a word, and the rest of the paragraph.
    m = re.match(r'^(\s*)(\S*)\b(.*)$', str(textNode.data))
    if not m:
        return
    leadingSpace, firstWord, restOfText = m.groups()
    if not firstWord:
        return
    # If there's text after the first word, re-save it.
    if restOfText:
        restOfText = doc.createTextNode(restOfText)
        para.insertBefore(restOfText, para.firstChild)
    # Emphasize the first word.
    emph = doc.createElement('em')
    emph.appendChild(doc.createTextNode(firstWord))
    para.insertBefore(emph, para.firstChild)
    # If there's leading space, re-save it.
    if leadingSpace:
        leadingSpace = doc.createTextNode(leadingSpace)
        para.insertBefore(leadingSpace, para.firstChild)
    # Get rid of the original text.
    para.removeChild(textNode)

if __name__ == '__main__':
    src = '''<html><body>
<p>First paragraph.</p>
<p>Second paragraph contains <em>emphasis</em>.</p>
<p>Third paragraph.</p>
</body></html>'''
    doc = xml.dom.minidom.parseString(src)
    emphasize(doc)
    print doc.toxml('utf-8')
<?xml version="1.0" encoding="utf-8"?>
<html><body>
<p><em>First</em> paragraph.</p>
<p><em>Second</em> paragraph contains <em>emphasis</em>.</p>
<p><em>Third</em> paragraph.</p>
</body></html>