Web Server Programming

John "Scooter" Morris

April 11, 2013

Portions Copyright © 2005-06 Python Software Foundation.

Web Programming

The Server as a Client

Fetching Pages

urllib Example

Building A Spider

$ python spider.py http://www.google.ca
http://groups.google.ca/grphp?hl=en&tab=wg&ie=UTF-8
http://news.google.ca/nwshp?hl=en&tab=wn&ie=UTF-8
http://scholar.google.com/schhp?hl=en&tab=ws&ie=UTF-8
http://www.google.ca/fr
			
import sys, urllib, re

url = sys.argv[1]
instream = urllib.urlopen(url)
page = instream.read()
instream.close()

links = re.findall(r'href=\"[^\"]+\"', page)
temp = set()
for x in links:
    x = x[6:-1]    # strip off 'href="' and '"'
    if x.startswith('http://'):
        temp.add(x)
links = list(temp)
links.sort()
for x in links:
    print x
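
The listing above targets Python 2, where `urllib.urlopen` and the `print` statement exist. A rough Python 3 sketch of the same idea follows. The names `extract_links`, `fetch`, and `main` are illustrative and not part of the original script, and the page is assumed to decode as UTF-8.

```python
import re
import sys
import urllib.request


def extract_links(page):
    """Return the sorted, de-duplicated absolute http:// links found in page."""
    hrefs = re.findall(r'href="([^"]+)"', page)
    return sorted({h for h in hrefs if h.startswith('http://')})


def fetch(url):
    """Download url and decode it as text (assumes UTF-8, replacing bad bytes)."""
    with urllib.request.urlopen(url) as instream:
        return instream.read().decode('utf-8', errors='replace')


def main(argv):
    if len(argv) != 2:
        print('usage: spider.py URL')
        return
    for link in extract_links(fetch(argv[1])):
        print(link)


if __name__ == '__main__':
    main(sys.argv)
```

Keeping the link extraction in its own function makes it possible to try it on a string without touching the network.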

Passing Parameters

Special Characters

Encoding Example

Screen Scraping (And Why Not)

Web Services

[Web Services]

Figure 5: Web Services

Web Services (REST)

The Server As A Client

Questions?

Server Programming

The CGI Protocol

From Server To CGI

From CGI To Server

MIME Types

Hello, CGI

Invoking a CGI

[Basic CGI Output]

Figure 6: Basic CGI Output

Generating Dynamic Content

#!/usr/bin/env python

import os, cgi

# Headers and an extra blank line
print 'Content-type: text/html'
print

# Body
print '<html><body>'
keys = os.environ.keys()
keys.sort()
for k in keys:
    print '<p>%s: %s</p>' % (cgi.escape(k), cgi.escape(os.environ[k]))
print '</body></html>'

[Environment Variable Output]

Figure 7: Environment Variable Output

A Simple Form (reprise)

[A Simple Form]

Figure 4: A Simple Form

<html>
  <body>
    <form action="/bmi280/cgi-bin/print_params.py">
      <p>Sequence: <input type="text" name="sequence"/>
      Search type:
      <select name="match">
        <option>Exact match</option>
        <option>Similarity match</option>
        <option>Sub-match</option>
      </select>
      </p>
      <p>Programs: 
      <input type="checkbox" name="frog"/>
        FROG (version 1.1)
      <input type="checkbox" name="frog2"/>
        FROG (2.0 beta)
      <input type="checkbox" name="bayeshart"/>
        Bayes-Hart
      </p>
      <p>
        <input type="submit" value="Submit Query"/>
        <input type="reset" value="Reset"/>
      </p>
    </form>
  </body>
</html> 

Parameter Names

Handling Forms

Form Handling Example

Development Tips

Maintaining State

Maintaining State in Files

AJAX

XMLHttpRequest Example

// Handle the XMLHttpRequest
function sendRequest(sql)
{ 
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // Encode the SQL so spaces, quotes and '=' survive in the query string
    xmlhttp.open("GET", "/cgi-bin/getBmi280Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes.
function getData()
{ 
  // Are we complete? 
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table
      tableDiv.appendChild(newChild);
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
} 

XMLHttpRequest Methods

Method | Description
abort() | Cancels the current request
getAllResponseHeaders() | Returns the complete set of HTTP headers as a string
getResponseHeader("headername") | Returns the value of the specified HTTP header
open("method", "URL", async, "username", "password") | Specifies the method, URL, and other optional attributes of a request

    The method parameter can have a value of "GET", "POST", or "PUT". Use "GET" when requesting data and "POST" when sending data, especially if the data is longer than 512 bytes.

    The URL parameter may be either a relative or complete URL.

    The async parameter specifies whether the request should be handled asynchronously. true means that script processing carries on after the send() method without waiting for a response. false means that the script waits for a response before continuing.

send(content) | Sends the request
setRequestHeader("label", "value") | Adds a label/value pair to the HTTP headers to be sent

Table 7: XMLHttpRequest Methods

XMLHttpRequest Properties

Property | Description
onreadystatechange | An event handler for an event that fires at every state change
readyState | Returns the state of the object: 0 = uninitialized, 1 = loading, 2 = loaded, 3 = interactive, 4 = complete
responseText | Returns the response as a string
responseXML | Returns the response as XML. This property returns an XML document object, which can be examined and parsed using W3C DOM node tree methods and properties
status | Returns the HTTP status as a number (e.g. 404 for "Not Found" or 200 for "OK")
statusText | Returns the HTTP status as a string (e.g. "Not Found" or "OK")

Table 8: XMLHttpRequest Properties

AJAX Server Side

#! /usr/bin/python
import cgi
import sys

print "Content-type: text/xml"
print ""
# We want this to be interpreted as HTML by the client
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

print '<html xmlns="http://www.w3.org/1999/xhtml">'

Putting it together

bmi280.svg

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xml:lang="en" lang="en">
<head>
<script type="text/javascript" src="js/bmi280.js"></script>
<link rel="stylesheet" type="text/css" href="css/bmi280.css"/>
</head>
<body>
<h3>BMI280 - AJAX Example</h3>
<svg:svg id="svg-root" width="100%" viewBox="0 0 800 100" version="1.1" >
  <!-- Surrounding Rectangle -->
  <svg:rect x="0" y="0" width="800" height="100" style="stroke: blue; fill: none;"/>
  <!-- Recipe Entity -->
  <svg:rect x="40" y="30" width="60" height="40" class="entity" onclick="showInput('recipe_input', this);"/>
  <svg:text x="50" y="52" class="label1">Recipe</svg:text>
  <svg:line x1="100" y1="50" x2="330" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Fragment Entity -->
  <svg:rect x="330" y="30" width="60" height="40" class="entity" onclick="showInput('fragment_input', this);"/>
  <svg:text x="334" y="52" class="label1">Fragment</svg:text>
  <svg:line x1="390" y1="50" x2="630" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Gene Entity -->
  <svg:rect x="630" y="30" width="60" height="40" class="entity" onclick="showInput('gene_input', this);"/>
  <svg:text x="647" y="52" class="label1">Gene</svg:text>
  <!-- Produces relationship -->
  <svg:rect x="200" y="30" width="40" height="40" class="relationship" transform="rotate(-45,220,50)" onclick="showInput('recipe_input_join', this);"/>
  <svg:text x="201" y="52" class="label2">Produces</svg:text>
  <!-- Contains relationship -->
  <svg:rect x="500" y="30" width="40" height="40" class="relationship" transform="rotate(-45,520,50)" onclick="showInput('gene_input_join', this);"/>
  <svg:text x="501" y="52" class="label2">Contains</svg:text>
  <!-- Links and orders -->
</svg:svg>

<!-- This is the form: Note that each <span> has an ID and a class that we will use to 
     control whether we show the containing input field or not.  Also note specifically
     the way we call getTable with the arguments we want. -->
<form>
  <span id="recipe_input" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE','RECIPE.NAME', this, 'Name,File,Owner',null);"/>
  </span>
  <span id="recipe_input_join" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE,PRODUCES,FRAG','RECIPE.NAME', this, 'RECIPE.Name,RECIPE.Owner,PRODUCES.Date,FRAG.Name,FRAG.Sequence','RECIPE.RCP=PRODUCES.RCP and PRODUCES.FRAG=FRAG.FRAG');"/>
  </span>
  <span id="fragment_input" class="hidden" style="position: absolute; left: 35%;">
    Fragment Name: <input type="text" onchange="getTable('FRAG','FRAG.NAME', this, 'Name,Sequence,Circular',null);"/>
  </span>
  <span id="gene_input_join" class="hidden">
    Gene Name: <input type="text" onchange="getTable('FRAG,CONTAINS,GENE','GENE.NAME', this, 'FRAG.Name,FRAG.Sequence,GENE.Name,CONTAINS.Start,CONTAINS.End','FRAG.FRAG=CONTAINS.FRAG and GENE.ID=CONTAINS.GENE');"/>
  </span>
  <span id="gene_input" class="hidden" style="position: absolute; left: 70%;">
    Gene Name: <input type="text" onchange="getTable('GENE','GENE.NAME', this, 'Name,Protein,StartNum',null);"/>
  </span>
</form>

<!-- We'll write a header into this <h3> when we get the data -->
<h3 id="table_header" class="table_header"> </h3>
<!-- We'll write the results table into this when we get the data -->
<div id="results_div">
</div>
</body>
</html>

bmi280.css

rect.entity { fill: purple; stroke-width: 2px;}
rect.relationship { fill: lightgreen; stroke-width: 2px;}
text.label1 {fill:white; font-size:8pt; font-family: arial; font-weight: bold;}
text.label2 {fill:blue; font-size:6pt; font-family: arial; font-weight: bold;}
span.hidden {visibility: hidden; }
span.shown {visibility: visible; }
tr.table-header {font-weight: bold; text-align: center; color: green; font-family: arial;}
h3.table_header {font-family: arial; text-align: center;}
table {font-family: arial; font-size: 80%;}

bmi280.js

// Handle the XMLHttpRequest
function sendRequest(sql)
{
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // Encode the SQL so spaces, quotes and '=' survive in the query string
    xmlhttp.open("GET", "/bmi280/cgi-bin/getBmi280Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes
function getData()
{
  // Are we complete?
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table
      tableDiv.appendChild(newChild);
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
}

// Shared state for the functions in this file
var elementShown = null;
var xmlhttp = null;
var selectedRect = null;

// showInput just controls the presentation of the name
// of the row we are looking for
function showInput(elementID, rect) {
  // Look up the input wrapper we were asked to show
  var element = document.getElementById(elementID);

  // Do we already have a text input element showing?
  if (elementShown != null)
    elementShown.className = "hidden"; // Yes, hide it

  // Do we already have a rectangle highlighted?
  if (selectedRect != null)
    selectedRect.setAttributeNS(null, "stroke", "none"); // Yes, remove its outline

  // Show the text input
  element.className = "shown";
  elementShown = element;

  // Outline the element the user clicked on
  // Note that we need to use setAttributeNS for SVG attributes
  rect.setAttributeNS(null, "stroke", "black");
  selectedRect = rect;
}


// This is the method that gets called when a text field is changed
function getTable(tableName, column, textField, fields, where) {
    var text = textField.value; // This contains the value the user entered

    // Now, create the SELECT statement
    var sql = 'SELECT '+fields+' from '+tableName;
    if (text.length >= 2 || where != null) {
      sql += ' where ';
      if (text.length >= 2) {
        sql += column+' = "'+text+'"';
        if (where != null) {
          sql += ' AND '+where;
        }
      } else {
        sql += where;
      }
    } 
    sql += ';';
  
    // Uncomment the next line to see what we pulled together
    // alert(sql);
  
    // Issue the request.  Because our XMLHttpRequest call is
    // asynchronous, this will return immediately
    sendRequest(sql);
  
    // Clear the text field
    textField.value = "";
  
    // Add a header
    var header = document.getElementById("table_header");
    header.innerHTML = tableName;
  
    // Clear the old table
    var tableDiv = document.getElementById("results_div");
    while (tableDiv.firstChild) {
      tableDiv.removeChild(tableDiv.firstChild);
    }
} 

getBmi280Table.py

#! /usr/local/bin/python

import cgi
import cgitb
import sys
import sqlite3

def returnError(errorString): 
  print """<html xmlns="http://www.w3.org/1999/xhtml">
    <body> <h3 id="results_table" style="color:red;">%s</h3> </body>
  </html>"""%errorString

cgitb.enable()

print "Content-type: text/xml"
print ""
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

# Get the form data
form = cgi.FieldStorage()
if not (form.has_key("sql")):
  returnError("No SQL string?")
  sys.exit(0)

sqlStatement = form["sql"].value
rows = None

try:
  conn = sqlite3.connect ("/home/socr/b/bmi280/bmi280.db")
  cursor = conn.cursor()
  cursor.execute(sqlStatement)
  rows = cursor.fetchall()
  cursor.close()
  conn.commit()
  conn.close()

except sqlite3.Error, e:
  # sqlite3 errors carry only a message, so it is args[0]
  returnError(e.args[0])
  sys.exit(0)

print '<html xmlns="http://www.w3.org/1999/xhtml">'
print '<body>'
print   '<table id="results_table" border="1" width="80%" align="center">'
print     '<tr class="table-header">',
for column in cursor.description:
  # Escape &, < and > so the reply stays well-formed XML
  print '<td>'+cgi.escape(column[0])+'</td>',
print     '</tr>'

for row in rows:
  print     '<tr>',
  for cell in row:
    print '<td>'+cgi.escape(str(cell))+'</td>',
  print     '</tr>'
print   '</table>'
print '</body>'
print '</html>'
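
Note that the script above runs whatever SQL the browser sends, so any visitor can read or change the whole database. One common alternative, sketched here in Python 3 and not taken from the original course code, is to keep the SQL text on the server and pass only the user's value as a bound parameter. The `RECIPE` column layout, the sample row, and the `find_recipes` helper are assumptions made for illustration.

```python
import sqlite3


def find_recipes(conn, name):
    """Look up recipes by name using a bound parameter instead of string pasting."""
    cursor = conn.execute(
        'SELECT Name, File, Owner FROM RECIPE WHERE Name = ?', (name,))
    return cursor.fetchall()


# Hypothetical in-memory database standing in for bmi280.db.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE RECIPE (Name TEXT, File TEXT, Owner TEXT)')
conn.execute("INSERT INTO RECIPE VALUES ('pcr', 'pcr.txt', 'scooter')")

print(find_recipes(conn, 'pcr'))
# A value that contains SQL is treated as plain data and matches nothing.
print(find_recipes(conn, 'pcr" OR "1"="1'))
```

With this shape the client would send only the table choice and the search value, and the server would map them onto a fixed set of queries.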

AJAX - Questions?

  • Questions about CGI or AJAX?

HTML Templating

  • A lot of this program is devoted to copying values into an HTML template
    • There are lots of good systems out there, in many languages, for doing this
    • Kid in Python
    • Java Server Pages (JSPs) in Java
    • Please do not write one of your own
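
To illustrate the idea without installing Kid, here is a minimal Python 3 sketch that uses the standard library's `string.Template` as a stand-in. It is far simpler than a real templating system. The template text and the `render_message_page` name are hypothetical.

```python
import html
from string import Template

# The page layout lives in one place, separate from the program logic.
PAGE = Template("""<html><body>
<h3>$title</h3>
<ul>
$items
</ul>
</body></html>""")


def render_message_page(title, messages):
    """Fill the template, escaping every value that came from outside."""
    items = '\n'.join('<li>%s</li>' % html.escape(m) for m in messages)
    return PAGE.substitute(title=html.escape(title), items=items)


print(render_message_page('Messages', ['hello', 'a < b']))
```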

What About Concurrency?

  • What happens if two users try to save messages at the same time?
    • I/O is typically slower than processing
    • So most web servers try to overlap operations
  • Race condition:
    • First instance of message_form.py opens messages.txt, reads lines, closes file
    • Second instance opens messages.txt, reads the same lines, closes file
    • First instance re-opens file, writes out original data plus one new line
    • Second instance re-opens file, writes out original plus a different new line
    • First instance's message has been lost!
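
The interleaving above can be replayed deterministically. In this Python 3 sketch a dictionary stands in for messages.txt, and the two "instances" are plain function calls made in the problematic order. `read_all` and `write_all` are hypothetical helpers, not part of message_form.py.

```python
# A one-entry dict stands in for the file system.
disk = {'messages.txt': ['old message']}


def read_all(name):
    """Open, read every line, close."""
    return list(disk[name])


def write_all(name, lines):
    """Re-open and overwrite the whole file."""
    disk[name] = list(lines)


first = read_all('messages.txt')     # first instance reads
second = read_all('messages.txt')    # second instance reads the same lines
write_all('messages.txt', first + ['from first'])
write_all('messages.txt', second + ['from second'])

print(disk['messages.txt'])
```

The final contents hold the old message and the second instance's line only, because the second write was built from a copy that never saw the first instance's addition.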

File Locking

  • Solution is to lock the file
    • As the name implies, gives one process exclusive rights to the file
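    • After the first process acquires the lock, any other process that tries to acquire it is suspended until the first releases it
    • Unix locks such as flock are advisory: every cooperating program must ask for the lock before it reads or writes the file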
  • Mechanics are different on different operating systems
    • But the Python Cookbook includes a generic file locking function that works on both Unix and Windows

Implementing Locking

    import cgi, fcntl

    # Get existing messages.  ('r+' requires that the file already exists.)
    msgfile = open('messages.txt', 'r+')
    fcntl.flock(msgfile.fileno(), fcntl.LOCK_EX)
    lines = [x.rstrip() for x in msgfile.readlines()]
    
    # Add more data?
    form = cgi.FieldStorage()
    if form.has_key('newmessage'):
        lines.append(form.getfirst('newmessage'))
        msgfile.seek(0)
        msgfile.truncate()
        for line in lines:
            print >> msgfile, line
        # Flush buffered output while we still hold the lock.
        msgfile.flush()
    
    # Unlock and close.
    fcntl.flock(msgfile.fileno(), fcntl.LOCK_UN)
    msgfile.close()
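
The same read, modify, write cycle can be wrapped in a context manager so the lock is always released. This is a Python 3 sketch under stated assumptions: it is Unix-only because it relies on `fcntl`, and the names `locked_file` and `append_message` are hypothetical.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_file(path, mode='r+'):
    """Open path and hold an exclusive advisory lock until the block exits."""
    f = open(path, mode)
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            yield f
            f.flush()  # push buffered writes out before unlocking
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    finally:
        f.close()


def append_message(path, message):
    """Rewrite the message file with one extra line, under the lock."""
    with locked_file(path) as msgfile:
        lines = [x.rstrip() for x in msgfile.readlines()]
        lines.append(message)
        msgfile.seek(0)
        msgfile.truncate()
        for line in lines:
            msgfile.write(line + '\n')
```

A CGI script would call `append_message('messages.txt', text)` once per request. The file must already exist, since it is opened in 'r+' mode.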

Who Are You?

  • How to maintain state on the client?
    • Need to know which shopping cart to display for a particular user
  • HTTP is a stateless protocol
    • If a client makes a second (or third, or fourth…) request, server has no reliable way of connecting it to the first one
  • Can guess based on client address, elapsed time, etc.
    • But it's just a guess

Cookies

  • Solution is for the server to create a cookie
    • A string that is sent to the client in an HTTP response header
  • Client saves it (either in memory or on disk)
    • [Cookies]

      Figure 10: Cookies

  • The next time the client sends a request to the site, it sends the cookie back to the server
    • Like giving someone a claim check for their luggage

Creating Cookies

  • Represent cookies in Python using Cookie.SimpleCookie
    • Do not use SmartCookie: it is potentially insecure
  • When creating, add values to a cookie as if it were a dictionary
    • Convert it to a string (e.g., by printing it) to create the required HTTP header
  • When the cookie comes back:
    • Get the value of the environment variable "HTTP_COOKIE"
    • Create a SimpleCookie
    • Pass the "HTTP_COOKIE" value to the cookie's load method
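
The round trip described above can be sketched as follows. This is Python 3, where the class lives in `http.cookies` rather than the Python 2 `Cookie` module. The `'count=3'` string is a made-up stand-in for what a CGI script would find in `os.environ['HTTP_COOKIE']`.

```python
from http.cookies import SimpleCookie

# Server side: build the cookie and turn it into a response header line.
outgoing = SimpleCookie()
outgoing['count'] = 3
header_line = outgoing.output()
print(header_line)

# A later request: the web server hands the browser's Cookie header to the
# CGI script through the HTTP_COOKIE environment variable.
http_cookie = 'count=3'  # stand-in for os.environ['HTTP_COOKIE']
incoming = SimpleCookie()
incoming.load(http_cookie)
print(int(incoming['count'].value))
```

Values always come back as strings, which is why the count is converted with `int` after loading.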

Cookie Example

  • Example: count the number of times a user has visited a web site
    • If there's no cookie, create one with a count of 1
    • Otherwise, increment the count
    • Create a new cookie to send back to the user
    • Display the count

    import os, Cookie

    # Get old count.
    count = 0
    if os.environ.has_key('HTTP_COOKIE'):
        cookie = Cookie.SimpleCookie()
        cookie.load(os.environ['HTTP_COOKIE'])
        if cookie.has_key('count'):
            count = int(cookie['count'].value)
    
    # Create new count.
    count += 1
    cookie = Cookie.SimpleCookie()
    cookie['count'] = count
    
    # Display.
    print 'Content-Type: text/html'
    print cookie
    print
    print '<html><body>'
    print '<p>Visits: %d</p>' % count
    print '</body></html>'

Cookie Tips

  • Can control how long a cookie is valid by setting an expiry value
    • Either the number of seconds from now
    • Or the time it should expire (in UTC)
      • Use time.asctime(time.gmtime(t)) to create the value, where t is the expiry time in seconds since the epoch
  • Do not put sensitive information in cookies
    • Browsers store them in files on disk
    • Villains can watch network traffic, and steal data
  • Cookies should instead be random values that act as keys into server-side information
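
A minimal Python 3 sketch of that last tip: the cookie carries only a random key, and the real data stays on the server. The `SESSIONS` dictionary, the `start_session` helper, and the one-hour lifetime are assumptions for illustration. A real site would keep the data in a file or database rather than in memory.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store, keyed by session ID.
SESSIONS = {}


def start_session(data, lifetime_seconds=3600):
    """Store data on the server and return a cookie holding only a random key."""
    session_id = secrets.token_urlsafe(16)
    SESSIONS[session_id] = data
    cookie = SimpleCookie()
    cookie['session'] = session_id
    cookie['session']['max-age'] = lifetime_seconds  # lifetime in seconds
    return cookie


cookie = start_session({'cart': ['frog-1.1']})
print(cookie.output())
```

On a later request the server would load the cookie, read `cookie['session'].value`, and use it to look up the entry in `SESSIONS`.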

Assignment

  • Before class tomorrow, get started hooking your front end to your back end:
    1. Write stubs on the back-end to handle any CGI or AJAX calls