Web Client Data Model

(XML/XHTML/CSS/DOM)

John "Scooter" Morris

April 8, 2013

Portions Copyright © 2005-06 Python Software Foundation.

Overview

  • Switching gears: it's about the user!
  • XML (eXtensible Markup Language)
  • XHTML (eXtensible HyperText Markup Language)
  • CSS (Cascading Style Sheets)
  • DOM (Document Object Model)

It's about the user!

  • Scientific software has users, but they are often overlooked
  • The focus of the database is on the data
  • The focus of the analysis is on the algorithms and the programming
  • The focus of the interface (and the system as a whole) must be on the user
  • In general, you are not the user, so how do you design with the user in mind?
    • Use cases/task definitions
    • User testing
    • Personas

Use cases/task definitions

  • Questions you need to answer:
    • What is the purpose of your system?
    • What are users going to do with the system?
  • Your UI design determines the how
  • Use case:
    • Defines the task the user is trying to achieve
    • Should include the inputs and outputs
    • Often will include subtasks before using the system and after using the system
    • Clearly specify the role of the system in the user's task
    • Often are very elaborate and complicated...
      • ...but don't need to be. It's better to keep it simple

User Testing

  • Ideal:
    • Get user feedback on:
      • use cases and tasks
      • wire-frame (white board) prototypes
      • early functional prototypes
      • final system
    • Incorporate user feedback into system...
    • ...retest
  • Real world:
    • Get user input when you can:
      • from fellow students
      • from lab mates
      • from postdocs
      • from friends and neighbors
    • Do not take critiques personally!
    • Do not explain where the user went wrong!

Personas

  • Sometimes getting a cross-section of users is not possible
    • then what?
  • Your team must substitute for the users -- how?
    • Characterize your users using personas
    • Each persona has a name, background, gender, ethnicity, specific set of desired tasks, etc.
    • Define a small set of personas that encompasses your user community
    • Run through use cases/user interfaces as each persona
      • look for issues from the viewpoint of the persona
      • essentially, you are role playing
    • Incorporate feedback from personas into your design

XML

  • XML is becoming the standard way to store everything from web pages to astronomical data
    • Bewildering variety of tools for dealing with it
    • And more appearing every day
  • This lecture describes how to process and modify XML
    • Warning: the standards are more complex than they should have been
  • Reading:

In the Beginning

  • 1969-1986: Standard Generalized Markup Language (SGML)
    • Developed by Charles Goldfarb and others at IBM
    • A way of adding information to medical and legal documents so that computers could process them
    • Very complex specification (over 500 pages)
  • 1989: Tim Berners-Lee creates HyperText Markup Language (HTML) for the World Wide Web
    • Much (much) simpler than SGML
    • Anyone could write it, so everyone did

The Modern Era

  • Problem: HTML had a small, fixed set of tags
    • Everyone wanted to add new ones
    • Solution: create a standard way to define a set of tags, and the relationships between them
  • First version of XML standardized in 1998
    • A set of rules for defining markup languages
    • Much more complex than HTML, but still simpler than SGML
  • New version of HTML called XHTML was also defined
    • Like HTML, but obeys all XML rules
    • Still a lot of non-XML compliant HTML out there
  • HTML 5 working its way through the W3C standards process
    • In part, a reaction to the complexity of XHTML 2 proposals
    • Extends HTML 4.01 with new APIs and Elements

Formatting Rules

  • A basic XML document contains elements and text
    • Full spec allows for external entity references, processing instructions, and other fun
  • Elements are shown using tags
    • Must be enclosed in angle brackets "<>"
    • Full form: <tagname>…</tagname>
    • Short form (if the element doesn't contain anything): <tagname/>
    • Note that tags must be closed in XML:
      • <hr> is legal in HTML
      • in XML or XHTML it must be closed: <hr/> or <hr></hr>

Document Structure

  • Elements must be properly nested
    • If Y starts inside X, Y must end before X ends
    • So <X>…<Y>…</Y></X> is legal…
    • …but <X>…<Y>…</X></Y> is not
  • Every document must have a single root element
    • I.e., a single element must enclose everything else
  • Specific XML dialects may restrict which elements can appear inside which others
    • XHTML is very liberal
    • MathML (Mathematical Markup Language) is stricter
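
The nesting and single-root rules above can be checked with any XML parser. A minimal sketch using Python's standard-library minidom; the helper name is_well_formed is made up for illustration:

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False

print(is_well_formed('<X><Y>text</Y></X>'))  # properly nested
print(is_well_formed('<X><Y>text</X></Y>'))  # Y ends after X ends
print(is_well_formed('<X/><Y/>'))            # two root elements
print(is_well_formed('<hr>'))                # tag never closed
print(is_well_formed('<hr/>'))               # short form of an empty element
```

Only the first and last calls should report True; the parser rejects the other three before any tree is built.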

Text

  • Text is normal printable text
  • Must use escape sequences to represent "<" and ">"
    • In XML, written &name;
    • Sequence   Character   Description
      &lt;       <           Less than
      &gt;       >           Greater than
      &quot;     "           Double quote
      &apos;     '           Apostrophe
      &amp;      &           Ampersand
      &Aring;    Å           Angstrom
      &nbsp;                 Non-breaking space
      &lambda;   λ           Greek small letter lambda
      &Lambda;   Λ           Greek capital letter lambda
      Table 1: XML Character Escape Samples
    • Only the first five are predefined by XML itself; the others are defined by the XHTML DTD
    • See List of XML and HTML character entity references for the complete list
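
In practice the escaping is rarely done by hand. A small sketch using the standard library's xml.sax.saxutils; the sample string is invented for illustration, and the extra dictionary is needed because escape() only handles &, < and > by default:

```python
from xml.sax.saxutils import escape, unescape

raw = 'if a < b && b > c: print("ok")'

# Also escape double quotes, which matters inside attribute values
escaped = escape(raw, {'"': '&quot;'})
print(escaped)

# unescape() reverses the mapping when given the inverse dictionary
restored = unescape(escaped, {'&quot;': '"'})
print(restored == raw)
```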

XHTML

  • Most common use of XML is still XHTML (the XML version of hypertext)
  • Basic tags:
    • Tag         Usage
      <html>      Root element of entire HTML document.
      <body>      Body of page (i.e., visible content).
      <h1>        Top-level heading. Use <h2>, <h3>, etc. for second- and third-level headings.
      <p>         Paragraph.
      <em>        Emphasized text; browser or editor will usually display it in italics.
      <address>   Address of document author (also usually displayed in italics).
      Table 2: Basic XHTML Tags

Sample XHTML Page

<html>
<body>
<h1>Software Carpentry</h1>

<p>This course will introduce <em>essential software 
development skills</em>,
and show where and how they should be applied.</p>

<address>Greg Wilson (gvwilson@third-bit.com)</address>
</body>
</html>
		
[Simple Page Rendered by Firefox]

Figure 1: Simple Page Rendered by Firefox

Critique of HTML/XHTML

  • HTML and XHTML mix semantics and display
    • <h1/> (level-1 heading) is semantic (meaning)
    • <i/> (italics) is display (formatting)
  • Now generally considered a bad thing
    • Modern HTML/XHTML documents contain semantic tags only
    • Control display using Cascading Style Sheets (CSS)
      • We will only cover a little of the syntax and the CSS Box Model

Attributes

  • Elements can be customized by giving them attributes
    • Enclosed in the opening tag
    • <h1 align="center">A Centered Heading</h1>
    • <p id="disclaimer" align="center">This planet provided as-is.</p>
  • An attribute name may appear at most once in any element
    • Like keys in a dictionary
    • So <p align="left" align="right">…</p> is illegal
  • Values must be quoted
    • Old-style browsers accepted <p align=center>…</p>, but XML parsers will reject it
    • Must use escape sequences for angle brackets, quotes, etc. inside values
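
A short sketch of these rules as an XML parser sees them, using minidom from the standard library. The helper parse_error is hypothetical, and the exact wording of the error messages depends on the underlying parser:

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

doc = xml.dom.minidom.parseString(
    '<p id="disclaimer" align="center">This planet provided as-is.</p>')
p = doc.documentElement

# Attributes behave like dictionary entries: one value per name
attrs = dict(p.attributes.items())
print(p.getAttribute('id'), attrs)

def parse_error(text):
    """Return the parser's complaint, or None if text is well-formed."""
    try:
        xml.dom.minidom.parseString(text)
        return None
    except ExpatError as err:
        return str(err)

print(parse_error('<p align="left" align="right">text</p>'))  # repeated name
print(parse_error('<p align=center>text</p>'))                # unquoted value
```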

Attributes Vs. Elements

  • Use attributes when:
    • Each value can occur at most once for any element
    • The order of the values doesn't matter
    • Those values have no internal structure
  • In all other cases, use nested elements
    • If you have to parse an attribute's value to figure out what it means, use an element instead
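
To make the last rule concrete, here is a sketch comparing two ways of recording a planet's orbital period. The "packed" form with a period attribute is invented for illustration; the nested form matches the planet examples used later in this lecture:

```python
import xml.dom.minidom

# Packed into one attribute: the program has to split the string itself
packed = xml.dom.minidom.parseString(
    '<planet name="Mercury" period="87.97 days"/>')
value, units = packed.documentElement.getAttribute('period').split()
print(value, units)

# Nested element: the structure is visible to the XML parser
nested = xml.dom.minidom.parseString(
    '<planet name="Mercury"><period units="days">87.97</period></planet>')
period = nested.getElementsByTagName('period')[0]
print(period.firstChild.data, period.getAttribute('units'))
```

Both versions carry the same information, but only the second lets generic XML tools find the units without knowing the attribute's private format.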

More XHTML Tags

  • Well-written HTML pages have a <head/> element as well as a <body/>
    • Contains metadata about the page
  • Well-written pages also use comments (just like code)
    • Introduce with <!--, and end with -->
    • <html>
      <head>
        <title>Comments Page</title>
        <meta name="author" content="aturing"/>
      </head>
      <body>
      
      <!-- House style puts all titles in italics -->
      <h1><em>Welcome to the Comments Page</em></h1>
      
      <!-- Update this paragraph to describe the forum. -->
      <p>Welcome to the Comments Forum.</p>
      
      </body>
      </html>
      
    • Unfortunately, comments cannot be nested

Lists and Tables

  • Use <ul/> for an unordered (bulleted) list, and <ol/> for an ordered (numbered) one
    • Each list item is wrapped in <li/>
  • Use <table/> for tables
    • Each row is wrapped in <tr/> (for “table row”)
    • Within each row, column items are wrapped in <td/> (for “table data”)
    • Note: tables are often used to force multi-column layout, as well as for tabular data

Example

<html>
<head>
  <title>Lists and Tables</title>
  <meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>

<table cellpadding="3" border="1">
  <tr>
    <td align="center"><em>Unordered List</em></td>
    <td align="center"><em>Ordered List</em></td>
  </tr>
  <tr>
    <td align="left" valign="top">
      <ul>
        <li>Hydrogen</li>
        <li>Lithium</li>
        <li>Sodium</li>
        <li>Potassium</li>
        <li>Rubidium</li>
        <li>Cesium</li>
        <li>Francium</li>
      </ul>
    </td>
    <td align="left" valign="top">
      <ol>
        <li>Helium</li>
        <li>Neon</li>
        <li>Argon</li>
        <li>Krypton</li>
        <li>Xenon</li>
        <li>Radon</li>
      </ol>
    </td>
  </tr>
</table>

</body>
</html>

Example

[Lists and Tables]

Figure 2: Lists and Tables

    • Note how RCS keywords have been put in <meta/> elements in document head
      • Automatically updated each time the document is committed to version control

Images

  • How to put an image in a page?
    • XML documents can only contain text, so you can't store an image or audio clip directly in a page
  • Usual solution is to store a reference to the external file using the <img/> tag
    • The src attribute specifies where to find the image file

Images

<html>
<head>
  <title>Images</title>
  <meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>

<h1>Our Logo</h1>

<img src="../../.swc/lec/img/sc_powered.jpg" alt="[Powered by Software Carpentry]"/>

</body>
</html>
[Images in Pages]

Figure 3: Images in Pages

  • Always use the alt attribute to specify alternative text
    • Screen readers for people with visual impairments use this instead of the image
    • And it's good documentation for search engines

Links

  • Links to other pages are what make it “hypertext”
  • Use the <a/> element to create a link
    • The text inside the element is displayed and (usually) underlined for clicking
    • The href attribute specifies what the link is pointing at
    • Both local filenames and URLs are supported

Links

<html>
<head>
  <title>Links</title>
  <meta name="svn" content="$Id: xml.html,v 1.15 2010/04/23 20:41:32 scooter Exp $"/>
</head>
<body>

<h1>A Few of My Favorite Places</h1>

<ul>
  <li><a href="http://www.google.com">Google</a></li>
  <li><a href="http://www.python.org">Python</a></li>
  <li><a href="http://www.nature.com/index.html">Nature Online</a></li>
  <li>Examples in this lecture:
    <ul>
      <li><a href="comments.html">Comments</a></li>
      <li><a href="image.html">Images</a></li>
      <li><a href="list_table.html">Lists and Tables</a></li>
    </ul>
  </li>
</ul>

</body>
</html>
	

Links

[Links in Pages]

Figure 4: Links in Pages

HTML5 - Differences from HTML 4.01

  • New Elements:
    • article, aside, audio, canvas, command, datalist, details, embed, figcaption, figure, footer, header, hgroup, keygen, mark, meter, nav, output, progress, rp, rt, ruby, section, source, summary, time, video
  • Inline SVG and MathML
  • New form controls:
    • dates and times, email, url, search
  • New form methods:
    • PUT and DELETE
  • Parsing rules similar to HTML (loose vs. strict)
  • New APIs

HTML5 - New APIs

  • Canvas
  • Timed media playback (SMIL)
  • Offline storage
  • Document editing
  • Drag-and-drop
  • Cross-document messaging
  • Browser history management

HTML5 - Summary

Questions on XML or HTML?

Cascading Style Sheets (CSS)

  • Style sheets provide a way to change the look (style) of a document without changing its structure
  • CSS can be used to:
    • change font style, color, size, and spacing; adjust margins or padding; do positioning of content either relative to other content or absolute; and provide a variety of different decorations for XML elements
    • turn elements on or off, or dynamically change the look of an element

Using Cascading Style Sheets (CSS)

  • CSS instructions can be specified in the style attribute
    • For example, a centered paragraph might be written <p style="text-align: center">
    • CSS declarations are separated by semicolons: <p style="text-align:center; font-weight:bold;">
  • CSS instructions can also be specified as part of a style sheet
    • Style sheets can be in the document itself
      • Within <style/> tags
      • For example this document has in its <head/> section:
        <style type="text/css" media="all"> followed by a number of CSS instructions
    • Style sheets can be loaded from external files
      • This document also has in its <head/> section:
        <link rel="stylesheet" href="css/slides.css" type="text/css" media="projection" id="slideProj" />
      • The file "slides.css" contains a number of CSS instructions relevant for the slide layout
  • CSS instructions

    • The general syntax for a CSS instruction is:
    • selector {property1:value1; property2:value2;...}
    • The selector tells the style system which elements the instruction refers to
    • See http://www.w3schools.com/Css/ for a list of properties
    • The most common use of the selector is: element.class, where class is the value of the class attribute, and element is either an HTML element or an element you've "invented".
    • Selectors are actually much, much more complicated:
      • A selector can be a pseudo-class. For example, a:hover can be used to change the style while the mouse is over a link
      • A selector can be a pseudo-element. For example, p:first-letter can be used to change the style of the first letter of a paragraph
      • A selector can refer to an ID. For example, p#paragraph1 would refer to the paragraph whose id attribute is "paragraph1"
      • A selector can include parent-child relationships. For example, "ul.inc li.active" would refer to <li/> elements with a class attribute of "active" that are descendants of <ul/> elements with a class attribute of "inc".
      • A selector can include pattern matching, attribute matching, and much, much more...

    CSS example

    Example style:

    <style type="text/css">
    body {font-family:arial;}
    p.example {font-family:courier; margin-left:5em; margin-right:5em; background-color:LightBlue;}
    .center {text-align:center;}
    myTitle {font-weight:bold; display:block; color:green; text-align:center; font-size:150%;}
    </style>

    Example input:

    Figure 5: Simple CSS Example Rendered by Firefox

    <body>
    <myTitle>This is our header</myTitle>
    
    <p>We will now introduce an example.  This 
    is a standard paragraph, with all of the default 
    styles set up by the browser.  Can you think of 
    a way you might be able to override at least one 
    of those defaults?  Back to our example, we now 
    want to highlight a section of text, which might 
    be a quote or some other kind of example</p>
    
    <p class="example">This is our example.  Note that 
    the margins have been adjusted and we also now have a 
    background color.  We could also have drawn a box 
    around our example, or we could have made other 
    adjustments.</p>
    
    <p>Now we're back to normal text.</p>
    </body> 

    CSS example

    • Notes:
      • Elements don't have to be HTML. Can introduce your own, if it helps clarify the semantics of the document
      • If you had a large document with 20 examples, all you would need to do to change them all is change the style sheet
      • Concept is identical to Styles in Word
      • Using Javascript, can switch between loaded stylesheets
        • That's how the "0" works in our slide program

    CSS Layout Model

    • Won't go over all of the CSS syntax and tips and tricks
    • Two key things to get a handle on:
      • Inline vs. block layout
      • CSS Box model

    CSS Inline vs. Block

    • display: inline vs. display: block
      • Inline layouts are things like <i/>, <span/>, and <b/> that can be laid out within a line (no line break)
        • Inline layouts can be specified with the CSS property display: inline
        <html>
          <body>
            This is a sentence with a 
               <myStyle style="display:inline; border: thin red solid">"myStyle" element</myStyle> 
            embedded in it.
          </body>
        </html>
        This is a sentence with a "myStyle" element embedded in it.
      • Block layouts are things like <p/>, <div/> and <li/> that cause the line of text to break
        • Block layouts can be specified with the CSS property display: block
        <html>
          <body>
            This is a sentence with a 
               <myStyle style="display:block; border: thin red solid">"myStyle" element</myStyle> 
            embedded in it.
          </body>
        </html>
        This is a sentence with a "myStyle" element embedded in it.
      • Important to know if you are creating a custom element

    CSS Box Model

    • CSS Box model: margins, borders, and padding
      • CSS uses three values for each side of the box when laying out an element:
        • margin-top, -bottom, -left, and -right: the transparent area around the element
        • border-top, -bottom, -left, and -right: the area for the border that will be painted around the element
        • padding-top, -bottom, -left, and -right: the area between the actual content and the border
      • Gives you detailed control of the spacing of elements relative to each other
      • Box width and height are specified by width and height, respectively
      • Units can be in % of surrounding element, ems, or px (pixels)
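
The arithmetic behind the box model can be sketched in a few lines. This assumes the default sizing rule, in which width covers the content only, and equal padding, border, and margin on the left and right; it ignores margin collapsing between neighbouring boxes. The function name outer_width is made up for illustration:

```python
def outer_width(width, padding=0, border=0, margin=0):
    """Horizontal space (in px) occupied by a box under the assumptions above."""
    return width + 2 * (padding + border + margin)

# A 200px-wide element with 10px padding, a 1px border and 5px margins
print(outer_width(200, padding=10, border=1, margin=5))  # 232
```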

    CSS Summary

    • Best way to learn CSS:
      • Find something you like on the web
      • Figure out how they did it (use View→Page Source)
      • Set up a small example and try it!
      • Use Firefox, check out the many helpful extensions:
        • Tools→Web Developer→Error Console in Firefox
        • Tools→Web Developer→Firebug in Firefox

    Questions on CSS?

    The Document Object Model

    • The Document Object Model (DOM) is a cross-language standard for representing XML documents as trees
      • One node for each element, attribute, or text
    • Pro:
      • Much easier to manipulate trees than strings
      • Same basic model in many different languages (which lowers the learning cost)
    • Con:
      • Needs a lot of memory for large documents
      • Generic standard doesn't take advantage of the more advanced features of some languages
    • Python's standard library includes a simple implementation of DOM called minidom
      • Fast, sturdy, and well documented…
      • …if you understand all the terminology, and know more or less what you're looking for

    The Basics

    • Every DOM tree has a single root representing the document as a whole
      • Doesn't correspond to anything that's actually in the document
    • This node has a single child, which is the root element of the document
    • It, and other element nodes, may have three types of children:
      • Other elements
      • Text nodes
      • Attribute nodes (minidom exposes these through an element's attributes property, not its childNodes list)

    DOM Tree Example

    [A DOM Tree]

    Figure 6: A DOM Tree

    <root>
      <first>element</first>
      <second attr="value">element</second>
      <third-element/>
    </root>
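
The same document can be explored with minidom to see the node types described above. A minimal sketch; the variable names are arbitrary:

```python
import xml.dom.minidom

src = '''<root>
  <first>element</first>
  <second attr="value">element</second>
  <third-element/>
</root>'''

doc = xml.dom.minidom.parseString(src)

# The document node sits above the root element
print(doc.nodeType == doc.DOCUMENT_NODE, len(doc.childNodes))
root = doc.documentElement
print(root.tagName)

second = root.getElementsByTagName('second')[0]

# The text inside <second> is a node of its own
text = second.firstChild
print(text.nodeType == text.TEXT_NODE, text.data)

# So is the attribute, but it is reached separately from the children
attr = second.getAttributeNode('attr')
print(attr.nodeType == attr.ATTRIBUTE_NODE, attr.value, len(second.childNodes))
```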

    More On Tree Structure

    • Every node keeps track of what its parent is
      • Allows programs to search up the tree, as well as down
    • Note: it's easy to forget that text and attributes are stored in nodes of their own
      • Other Python libraries like ElementTree use dictionaries instead
      • Pro: makes simple things a little simpler
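      • Con: its API is Python-specific rather than the cross-language DOM (ElementTree has shipped in the standard library as xml.etree.ElementTree since Python 2.5)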
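
A sketch contrasting the two styles on one small document: following parentNode upward in minidom, then reading the same data through ElementTree's dictionary and string attributes. Both modules are in the standard library; the sample document is the planet record used elsewhere in this lecture:

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

src = '<planet name="Mercury"><period units="days">87.97</period></planet>'

# DOM: start at the text node and climb to the top of the tree
doc = xml.dom.minidom.parseString(src)
node = doc.getElementsByTagName('period')[0].firstChild
path = []
while node is not None:
    path.append(node.nodeName)
    node = node.parentNode
print(path)  # ['#text', 'period', 'planet', '#document']

# ElementTree: attributes are a plain dictionary, text is a plain string
period = ET.fromstring(src).find('period')
print(period.attrib, period.text)
```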

    Creating a Tree

    • Usual way to create a DOM tree is to parse a file
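    • Input file mercury.xml:
      <?xml version="1.0" encoding="utf-8"?>
      <planet name="Mercury">
        <period units="days">87.97</period>
      </planet>
      
    • Program:
      import xml.dom.minidom
      doc = xml.dom.minidom.parse('mercury.xml')
      # toxml('utf-8') returns encoded bytes; decode them before printing
      print(doc.toxml('utf-8').decode('utf-8'))
      
    • Output (toxml() puts no line break after the XML declaration):
      <?xml version="1.0" encoding="utf-8"?><planet name="Mercury">
        <period units="days">87.97</period>
      </planet>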
      

    Converting To Text

    Other Ways To Create Documents

    Other Ways To Create Documents

    The Details

    Finding Nodes

    Walking a Tree

    Recursive Tree Walker

    import xml.dom.minidom
    
    src = '''<solarsystem>
    <planet name="Mercury"><period units="days">87.97</period></planet>
    <planet name="Venus"><period units="days">224.7</period></planet>
    <planet name="Earth"><period units="days">365.26</period></planet>
    </solarsystem>
    '''
    
    def walkTree(currentNode, indent=0):
        # Print one line per node, indented one space per level of depth
        spaces = ' ' * indent
        if currentNode.nodeType == currentNode.TEXT_NODE:
            # Text nodes have no tag name, so show the length of their text
            print(spaces + 'TEXT' + ' (%d)' % len(currentNode.data))
        else:
            print(spaces + currentNode.tagName)
            for child in currentNode.childNodes:
                walkTree(child, indent+1)
    
    doc = xml.dom.minidom.parseString(src)
    walkTree(doc.documentElement)
    	
    solarsystem
     TEXT (1)
     planet
      period
       TEXT (5)
     TEXT (1)
     planet
      period
       TEXT (5)
     TEXT (1)
     planet
      period
       TEXT (6)
     TEXT (1)
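
The TEXT (1) lines in this output are the newline characters between elements, which the parser keeps as text nodes. One possible way to ignore them is sketched below; the name walk_tree and the choice to return a list of lines are illustrative, not part of minidom. Like the walker above, it assumes the tree holds only element and text nodes.

```python
import xml.dom.minidom

src = '''<solarsystem>
<planet name="Mercury"><period units="days">87.97</period></planet>
<planet name="Venus"><period units="days">224.7</period></planet>
</solarsystem>
'''

def walk_tree(node, indent=0, lines=None):
    """Collect one line per node, skipping whitespace-only text nodes."""
    if lines is None:
        lines = []
    spaces = ' ' * indent
    if node.nodeType == node.TEXT_NODE:
        if node.data.strip():  # empty after stripping means layout whitespace
            lines.append(spaces + 'TEXT (%d)' % len(node.data))
    else:
        lines.append(spaces + node.tagName)
        for child in node.childNodes:
            walk_tree(child, indent + 1, lines)
    return lines

doc = xml.dom.minidom.parseString(src)
print('\n'.join(walk_tree(doc.documentElement)))
```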
    	

    Modifying the Tree

    [Modifying the DOM Tree]

    Figure 7: Modifying the DOM Tree

    Complications

    Solution

    Solution

    Not Finished Yet

    Summary

    Questions?