PC204 Lecture 4

Tom Ferrin

tef@cgl.ucsf.edu

Homework Answers

3.1 - Encapsulate the given code into a function, write code to call it, and test the resulting values.

epsilon = 0.0000001
def square_root(a):
    x = a / 2.0
    while True:
        y = (x + a / x) / 2
        if abs(y - x) < epsilon:
            break
        x = y
    return y

def test_square_root():
    import math
    for i in range(1, 10):
        mine = square_root(i)
        theirs = math.sqrt(i)
        delta = abs(mine - theirs)
        print("%3.1f %-13.11g %-13.11g %.11g" % (float(i),  mine, theirs, delta))

test_square_root()

1.0 1             1             1.1102230246e-15
2.0 1.4142135624  1.4142135624  2.2204460493e-16
3.0 1.7320508076  1.7320508076  0
4.0 2             2             0
5.0 2.2360679775  2.2360679775  0
6.0 2.4494897428  2.4494897428  8.881784197e-16
7.0 2.6457513111  2.6457513111  0
8.0 2.8284271247  2.8284271247  4.4408920985e-16
9.0 3             3             0

(For details about the formatting used in the print statement, see String Formatting Operations, which was discussed in last week's lecture.)

3.2 - Read a Protein Data Bank file and compute the center of atoms by averaging the atomic coordinates.

def average_coord(code):
    f = open(code, 'r')
    sum_x = sum_y = sum_z = 0.0
    num_atoms = 0
    for line in f:
        if not line.startswith("ATOM"):
            continue
        sum_x += float(line[30:38])    # get only the X coordinate of the record
        sum_y += float(line[38:46])    # ditto for Y
        sum_z += float(line[46:54])    # ditto for Z
        num_atoms += 1
    if num_atoms == 0:
        print("no atoms were found")
    else:
        avg_x = sum_x / num_atoms
        avg_y = sum_y / num_atoms
        avg_z = sum_z / num_atoms
        print("average coordinates of ", num_atoms, "atoms:", avg_x, avg_y, avg_z)
    f.close()

average_coord(input('PDB entry: ').strip())

average coordinates of  1102 atoms: 21.8288139746 5.7510508167 95.3145789474

Assignment 3.2, take two

Recall that we've been saying there's already a ton of previously written Python modules that you can take advantage of? Well, the urllib.request module and the io.TextIOWrapper class are a couple of these. Using them, we could easily have our program fetch the needed coordinate file automatically. Our previous solution changes only slightly...

def open_pdb(code):
    url = "https://files.rcsb.org/view/%s.pdb" % code
    from urllib.request import urlopen
    from io import TextIOWrapper
    # urlopen returns a binary stream, so we convert it for text I/O
    return TextIOWrapper(urlopen(url))

def average_coord(code):
    f = open_pdb(code)
    # Everything else is the same as before

Quick Review

Python programs can be decomposed into modules, statements, and objects:
- Programs are composed of modules;
- Modules contain statements;
- Statements create and process objects.
"Objects" are also known as "data structures" in some programming languages. Python calls them objects to set them apart: the low-level data structure manipulation functions needed in many other programming languages aren’t needed in Python.

Python has several built-in object types. These are...

Object Type	Examples
Numbers	3.1416, 42, 123456789
Strings	'pc204', "Joe's story"
Files	text = open('eggs', 'r').read()
Lists	[1, [2, 'three'], 4]
Tuples	(1, 'spam', 4, 'U')
Dictionaries	{key: value, 'food': 'spam'}
Sets	{key, 'food'}

Numbers can be any of several types...

Constant	Interpretation
1234, -24, 0	Integers
1.23, 3.14e-10, 0.0	Floating point number
0o177, 0x9ff	Octal and hexadecimal constants (Python 3 requires the 0o prefix for octal)
3+4j, 3.0+4.0j, 3J	Complex number constants

Python 3 integers have unlimited precision
Floating point numbers can range from +/- 4.9e-324 to +/- 1.8e+308 and have approximately 16 digits of precision
Python 2 uses integers that can be in the range of -2,147,483,648 to 2,147,483,647, and supports "long integers" which have unlimited precision
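
The limits quoted above can be checked from the interpreter. This is a minimal sketch using only the standard sys module; the variable names are just for illustration, and the float values shown assume the usual IEEE 754 double-precision hardware.

```python
import sys

# Python 3 integers grow as needed; there is no overflow.
big = 2 ** 100
print(big)                       # 1267650600228229401496703205376

# Limits of the floating-point type on this machine.
largest = sys.float_info.max     # about 1.8e+308
spacing = sys.float_info.epsilon # gap between 1.0 and the next float, about 2.2e-16
smallest = 5e-324                # smallest positive value that can be stored

print(largest, spacing)
print(smallest / 2)              # 0.0, too small to represent
```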

More about floating point numbers

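Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has the value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has the value 0/2 + 0/4 + 1/8. Both represent the same number; the only difference is that the first is written in base 10 and the second in base 2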
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine
Consider the fraction 1/3. You can approximate that as a base 10 fraction as 0.3333333333333. But no matter how many digits you specify, the result is never exactly 1/3
In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100110011...

Floating Point Numbers Continued

It's easy to forget that the stored value is an approximation to the original decimal fraction because of the way that floats are displayed by Python. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine
If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display 0.1000000000000000055511151231257827021181583404541015625
Since this is more digits than most people find useful, Python keeps the number of digits manageable by displaying a rounded value instead, so that 0.1 prints as 0.1
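
You can ask Python to show the stored value for yourself. The sketch below uses the standard decimal and fractions modules, both of which convert a float exactly, without rounding; the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored binary value exactly.
exact = Decimal(0.1)
print(exact)   # 0.1000000000000000055511151231257827021181583404541015625

# The same value as a ratio of two integers; the denominator is a power of 2.
ratio = Fraction(0.1)
print(ratio)   # 3602879701896397/36028797018963968

# What print() normally shows: a short, rounded form.
print(0.1)     # 0.1
```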

Floating Point Numbers Continued

But it’s important to realize that this is, in a very real sense, an illusion: the value in the machine is not exactly 1/10, you’re simply rounding the display of the true machine value. This fact becomes apparent as soon as you try to do arithmetic with these values...
```
print("%.20f" % (0.1 + 0.2))
0.30000000000000004441
```

This can lead to some very unexpected results...

>>> if (0.1 + 0.2) != 0.3:
...     print("How can this be?")
...
How can this be?

Floating Point Numbers Continued

Note that this is in the very nature of binary floating-point. This is not a bug in Python, and it’s not a bug in your code either. You’ll see the same kind of thing in all computer languages that support floating-point arithmetic.
This issue comes up most often when testing floats for equality. So instead of testing for equality, you can just test to see if the absolute value of the difference is a very small number...
```
>>> epsilon = 1.0e-14
>>> if abs((0.1 + 0.2) - 0.3) > epsilon:
...     print("How can this be?")   # not printed: the difference is below epsilon
...
```
Binary floating-point arithmetic holds several surprises like this. You should read The Perils of Floating Point by Bruce Bush for a more complete account of other common surprises.
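
The standard library already has a helper for this kind of comparison. The sketch below contrasts the fixed-epsilon test with math.isclose(), which by default uses a relative tolerance and so also behaves sensibly for very large values. The names total, big_a, and big_b are made up for this illustration.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                  # False: the two stored values differ
print(abs(total - 0.3) < 1.0e-14)    # True: absolute tolerance, as above
print(math.isclose(total, 0.3))      # True: relative tolerance (default 1e-09)

# A fixed absolute tolerance stops being useful when the values are large.
big_a = 1.0e20
big_b = big_a + 1.0e5                # tiny compared with 1e20
print(abs(big_a - big_b) < 1.0e-14)  # False
print(math.isclose(big_a, big_b))    # True
```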

Lists in Python

Lists are ordered collections of arbitrary objects that can be accessed by offsets (just like strings), can vary in length, and can contain other lists (i.e. are nestable). Unlike strings, lists are mutable sequences because they can be modified in place, which means they support operations like deletion, index assignment, and methods. Lists contain, technically, zero or more references to other Python objects.

Common List Expressions and Methods

Operation	Interpretation
L1 = [ ]	Creates an empty list
L2 = [0, 1, 2, 3]	A four element (or item) list
L3 = ['one', 'two', [1, 2]]	Nested sublists
L2[2]	Third item in a list (like string offsets, list offsets begin at 0)
L3[2][0]	First sublist item in the third list item
L2[i:j]	Slice (just like in strings)
len(L3)	Length (just like in strings)
L1 + L2	Concatenation
L1 * 4	Repetition
for x in L2:	Iteration
'two' in L3	Membership test

More List Expressions and Methods

Operation	Interpretation
L2.append(4)	Grow list at end by 1 item (the integer 4)
L2.extend([1, 2, 3])	Grow list at end by multiple items
L2.sort()	Sort the list
L2.index(n)	Find index of 'n' in list
L2.reverse()	Reverse items in the list
del L2[k]	Remove the item at offset k from the list
L2[i:j] = [ ]	Remove the items at offsets i through j-1
L2[2] = 42	Replace the third item in the list (index assignment)
L2[1:3] = [0, 0]	Replace the second and third list items with zeros (slice assignment)
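
Here is a short sketch that applies the operations from the table one after another, with the list contents shown after each step. Note that all of these change the list in place; methods such as sort() and reverse() return None rather than a new list.

```python
L2 = [0, 1, 2, 3]

L2.append(4)             # [0, 1, 2, 3, 4]
L2.extend([1, 2, 3])     # [0, 1, 2, 3, 4, 1, 2, 3]
L2.sort()                # [0, 1, 1, 2, 2, 3, 3, 4]
position = L2.index(2)   # 3, the offset of the first 2
L2.reverse()             # [4, 3, 3, 2, 2, 1, 1, 0]
del L2[0]                # [3, 3, 2, 2, 1, 1, 0]
L2[0:2] = []             # [2, 2, 1, 1, 0]
L2[2] = 42               # [2, 2, 42, 1, 0]
L2[1:3] = [0, 0]         # [2, 0, 0, 1, 0]
print(L2, position)

# A common mistake: sort() returns None, not the sorted list.
nothing = [3, 1, 2].sort()
print(nothing)           # None
```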

Examples

Just like with strings, items in a list are fetched by indexing; i.e., providing the numeric offset of the desired item in the list. In other words, indexing a list begins at 0 and ends at one less than the length of the list. You can also fetch items from a list using negative offsets, which just count backwards from the end of the list.

>>> L1=[1, 2, 3, 4]
>>> L2=[5, 6]
>>> len(L1)
4

>>> L1[1:3]		# indexing begins at 0!
[2, 3]

>>> L1[:3]		# missing first index means from the beginning
[1, 2, 3]

>>> L1[2:]		# missing second index means through end of the list
[3, 4]

>>> L1[0:-1]	# -1 is the offset of the last item, which the slice excludes
[1, 2, 3]

>>> L1 + L2
[1, 2, 3, 4, 5, 6]

>>> L1.append(L2)
>>> L1
[1, 2, 3, 4, [5, 6]]

Indexing and Slicing

Indexing and Slicing of lists are very common operations, so just remember...
Indexing = L[i]:
- Fetch items at offsets (the first item is at offset zero)
- Negative indexes mean to count from the end of the list
- L[0] fetches the first item
- L[-2] fetches the second item from the end
Slicing = L[i:j]:
- Extracts contiguous section of items from a list
- Slice boundaries default to zero and the list length
- L[1:3] fetches from offset 1 up to but not including offset 3
- L[1:] fetches from offset 1 through the end of the list
- L[:-1] fetches from offset 0 up to but not including the last item

Example Usage in a Function

Consider this function...

def extract(s):
    """extract space-separated words from string 's' and
    return each word in a list"""
    result = [ ]        # begin with empty list
    while s:
        k = 0           # k will be the end of a slice
        for c in s:     # inspect chars in string one at a time
            if c == ' ':
                break   # exit the loop if character is a space
            k = k + 1   # increment slice limit by 1
        result.append(s[0:k])  # add a new word to the list
        s = s[k+1:]     # save the remaining string
    return result       # done - return the list to the caller

It produces this...

>>> print(extract('Now is the time for all good people'))
['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'people']

>>> print(extract('Now'))       # how about with just one word in s?
['Now']

It's always important to test your code, especially at the "boundaries" of where it's designed to work!

Boundary Conditions

So, for example, what if we call extract() with an argument that doesn't have any words in it?
```
>>> print(extract(''))	# test for correct result with null string
[]
```
That seems right. But what about a string that only has spaces and no words?
```
>>> print(extract('   '))	# a string with three spaces
['', '', '']
```
Hmm. Is a list of three items the correct result?

Boundary Conditions Continued

At least for strings, most would argue that it doesn't matter how many spaces there are between words. So, it's one or more spaces that separate words in a string. Similarly, if there aren't any words at all in the string, even if there are some spaces, then the right answer for our function to return should be an empty list. So we need to fix our code and re-test it...

def extract(s):
    """extract space-separated words from string 's' and
    return each word in a list"""
    result = [ ]        # begin with empty list
    while s:
        k = 0           # k will be the end of a slice
        for c in s:     # inspect chars in string one at a time
            if c == ' ':
                break   # exit the loop if character is a space
            k = k + 1   # increment slice limit by 1
        if k != 0:      # ONLY ADD A NEW WORD IF ONE WAS FOUND
            result.append(s[0:k])  # THERE WAS, SO ADD IT
        s = s[k+1:]     # save the remaining string
    return result       # done - return the list to the caller

Produces...

>>> print(extract('   '))   # a string with three spaces
[]

>>> print(extract('   Now  is   the time     '))   # spaces sprinkled here and there
['Now', 'is', 'the', 'time']

Boundary Conditions Continued

The point of all this is that as a programmer you have to decide how you want your code to behave and then test to make sure it does what is desired. "Boundary cases" such as this come up all the time in programming and you must get in the habit of testing your code for these cases. For example, does the code do the right thing if a loop never gets executed at all? How about if it's only executed once? How about if it's executed multiple times? You should also get in the habit of documenting the correct behavior -- especially with unusual cases -- with comments so that if you or someone else later looks at your code, it's clear that you've considered the boundary cases.
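
One way to make boundary testing a habit is to keep the cases in a small table and check them all every time the code changes. The sketch below repeats the corrected extract() so that it runs on its own; the test_extract() helper and its list of cases are hypothetical additions, not part of the original solution.

```python
def extract(s):
    """Return the space-separated words in string 's' as a list.

    Boundary cases: a run of spaces counts as one separator, and a
    string with no words (empty, or spaces only) yields an empty list.
    """
    result = []
    while s:
        k = 0
        for c in s:
            if c == ' ':
                break
            k = k + 1
        if k != 0:
            result.append(s[0:k])
        s = s[k+1:]
    return result

def test_extract():
    cases = [
        ('', []),                       # the while loop never runs
        ('   ', []),                    # spaces only, no words
        ('Now', ['Now']),               # exactly one word
        ('Now is the time', ['Now', 'is', 'the', 'time']),
        ('   Now  is   the time     ', ['Now', 'is', 'the', 'time']),
    ]
    for text, expected in cases:
        actual = extract(text)
        assert actual == expected, (text, actual, expected)
    return len(cases)

print(test_extract(), "cases passed")
```

If a later change breaks one of the cases, the assert reports the input along with the actual and expected results.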

More String Functions

Let's build on our previous example by adding another function...

def instring(w, s):     # is word w contained in s?
    words = extract(s)  # 'words' is now a list of words
    for x in words:     # consider each, one at a time
        if x == w:      # test for a match
            return True
    return False

text = '   Now is the    time for all good people   '
print(instring('time', text))
True

print(instring('tom', text))
False

print(instring('', text))
False

print(instring('', ''))
False

The last two cases are again testing boundary conditions. Is an empty string contained in text? (The variable is named text rather than str so that it doesn't hide Python's built-in str type.) For our example here (where our code is designed to operate on words), an empty string is not a word, so it can never be contained in text, and our function is therefore working correctly. The point is, you're the programmer and you need to decide, test, and document what the correct behavior of your code should be.

A Little Optimization

Our instring() function on the previous slide can be improved. For one thing, the "for" loop can be eliminated and the return statement simplified...
```
for x in words:
    if x == w:
        return True
return False
```
...and replaced with a membership test...
```
return w in words    # return True if w in words
```
And we may be able to use some built-in list functions:

Functions	Return Value
all(L1)	True if all elements in L1 are True
any(L1)	True if any element in L1 is True
min(L1)	Smallest element in L1
max(L1)	Largest element in L1
reversed(L1)	Elements of L1 in reverse order
sorted(L1)	Elements of L1 in sorted order
sum(L1)	Sum of elements in L1
sum(L1, start)	Sum of elements in L1 + start
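
A quick sketch of these functions applied to one small list. The values are arbitrary and chosen so that the list contains a zero, which counts as false.

```python
L1 = [3, 10, 7, 0]

print(all(L1))             # False, because 0 counts as false
print(any(L1))             # True, at least one element is nonzero
print(min(L1), max(L1))    # 0 10
print(sorted(L1))          # [0, 3, 7, 10], a new list; L1 is unchanged
print(list(reversed(L1)))  # [0, 7, 10, 3]; reversed() returns an iterator
print(sum(L1))             # 20
print(sum(L1, 100))        # 120, the sum plus the start value
```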

Another built-in function that is very useful:

# Loop through list using indices
for index in range(len(L1)):
    value = L1[index]
    print(index, value)

Here's the equivalent using the enumerate() function:

for index, value in enumerate(L1):
    print(index, value)

And Finally There Is "List Comprehension"

Example:

# Apply an operation to each element in a list
# and keep the results in a second list
L2 = [ ]
for value in L1:
    L2.append(value + 10)

# same using list comprehension
L2 = [ value + 10 for value in L1 ]

# and you can even skip some elements if you want
L3 = [ value + 10 for value in L1 if value < 100 ]

You don’t need to use list comprehension in your programs, but you should recognize it when you see it. Sometimes people get too clever with list comprehension and the code ends up being very difficult to read. Its best use is for simple mapping from one set of values to another.

Tuples

Tuples are just like Python lists, but they are immutable. All the same operations that worked on lists work on tuples except tuples don't provide the content-altering methods that lists do (e.g. append(), sort(), reverse()). Like lists, concatenation, repetition, and slicing applied to tuples return the results in new tuples. The immutability of tuples provides object integrity; you can be sure that a tuple can't be changed inadvertently somewhere else in your program.

Examples

>>> t1 = (1, 2, 3, 4)	# this is a tuple
>>> t2 = 5, 6, 7, 8		# this too (syntactically unambiguous)
>>> t3 = (9,)	# a one-item tuple (comma required to avoid ambiguity with expressions)
>>> t4 = (9)	# an expression that evaluates to 9
>>> t5 = ('abc', (1, 2, 3), 'def')    # nested tuples

A common use of tuples is in function return values:

def trivial():
    x = 2.71828
    y = 3.14159
    return (x, y)

m, n = trivial()
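
The sketch below pulls these points together: a function returning a tuple, unpacking the result, operations that build new tuples, and what happens when you try to modify one. The min_max() function is a made-up example.

```python
def min_max(values):
    """Return the smallest and largest items as a 2-tuple."""
    return (min(values), max(values))

low, high = min_max([4, 8, 15, 16, 23, 42])   # tuple unpacking
print(low, high)                               # 4 42

t1 = (1, 2, 3, 4)
t6 = t1 + (5, 6)        # concatenation builds a new tuple
print(t6[1:3])          # (2, 3), slicing also returns a tuple

try:
    t1[0] = 99          # index assignment is not allowed
except TypeError as err:
    print("cannot modify a tuple:", err)
```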

Python Dictionaries

Dictionaries are collections of arbitrary values that are accessed via an associated "key" rather than by offset. Keys are unique, i.e., each key can appear at most once in a dictionary. Since Python 3.7, dictionaries keep their keys in the order they were inserted; in older versions the order was arbitrary.
Dictionaries are of the category mutable mapping, which means they can be modified in place (like lists), but don’t support sequence operations such as slicing and concatenation (unlike strings and lists). An item is a (key, value) pair.

Common Dictionary Constants and Operations

Operation	Interpretation
D1 = { }	Creates an empty dictionary
D2 = {'tom':1, 'conrad':2}	A two item dictionary
D3 = {'tom':1, 'conrad':{'greg':3, 'eric':4}}	Nesting
D2['conrad']	Retrieval of value by key
D3['conrad']['eric']	Nested retrieval
'tom' in D2	Membership test (the Python 2 method D2.has_key('tom') no longer exists in Python 3)
D2.keys()	View of all keys in the dictionary (use list(D2.keys()) to get a list)
D2.values()	View of all values in the dictionary
D2.items()	View of all (key, value) pairs
D2.get(k, v)	Value with key "k" if k in D2, otherwise "v"
D2.setdefault(k, v)	Like D2.get(), but also adds item to dictionary D2
len(D2)	Number of keys in the dictionary
D2[key] = value	Add or change an item
del D2[key]	Delete an item
D4 = D2.copy()	Create a (shallow) copy of D2

Examples


>>> d1 = {'tom':1, 'conrad':2, 'greg':3, 'eric':4}
>>> d1['greg']		# given a key, fetch associated value
3

>>> len(d1)		# return number of items in the dictionary
4

>>> 'eric' in d1		# test for the presence of a key
True

>>> list(d1.keys())		# return list of all keys
['tom', 'conrad', 'greg', 'eric']

>>> d1['tom'] = 42	# assign a new value for key 'tom'
>>> d1['conrad'] = [3, 4, 5]	# items can be arbitrary objects
>>> d1		# Python 3.7+ keeps keys in insertion order
{'tom': 42, 'conrad': [3, 4, 5], 'greg': 3, 'eric': 4}

>>> del d1['greg']	# delete an entry
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4}

>>> d1['al'] = 'good man'	# assigning to a new key adds a new entry
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4, 'al': 'good man'}

>>> d1[101] = 'test'	# keys can be any immutable object
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4, 'al': 'good man', 101: 'test'}
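
The get() and setdefault() methods from the table are easiest to appreciate in a loop. Here's a small sketch that counts words and records where each one occurs; the word list and the variable names are invented for the example.

```python
words = ['spam', 'eggs', 'spam', 'ham', 'spam']

# Count occurrences: get() supplies 0 the first time a word is seen.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
print(counts)            # {'spam': 3, 'eggs': 1, 'ham': 1}

# Group offsets by word: setdefault() inserts an empty list when the key
# is missing and returns the stored list either way.
positions = {}
for index, w in enumerate(words):
    positions.setdefault(w, []).append(index)
print(positions)         # {'spam': [0, 2, 4], 'eggs': [1], 'ham': [3]}

# get() never adds a key; a plain lookup of a missing key raises KeyError.
print(counts.get('toast', 0))   # 0
print('toast' in counts)        # False
```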

Sets

Sets are unordered collections of unique items. Like dictionaries, they never hold the same key twice, but unlike dictionaries they only have keys -- no values. They are ideal for testing for membership because lookups using sets are much more efficient than using lists.


>>> s1 = {1, 2, 3, 4}		# this is a set
>>> s1
{1, 2, 3, 4}

>>> 3 in s1      # membership test (much faster than a list)
True

>>> if 3 in s1:  # similar to above
...    print('True')
True

>>> s1.add(5)	# add a new element
>>> s1
{1, 2, 3, 4, 5}

>>> s1.add(5)	# does nothing since "5" is already a member of the set
>>> s1.remove(4)	# remove an element
>>> s1
{1, 2, 3, 5}

>>> s1.discard(4)    # like remove but won't cause an exception
>>> s2 = {1, 3}
>>> s1 | s2		# returns union of s1 and s2
{1, 2, 3, 5}

>>> s1 & s2		# returns intersection of s1 and s2
{1, 3}

>>> s1 - s2		# returns difference of s1 and s2
{2, 5}
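
To see the difference in lookup speed for yourself, you can time the same membership test on a list and on a set. This is only a rough sketch using the standard timeit module; the sizes are arbitrary and the actual times depend on your machine, so no particular numbers are claimed here.

```python
import timeit

numbers_list = list(range(100000))
numbers_set = set(numbers_list)
target = 99999          # worst case for the list: the last element

# Both containers give the same answer...
print(target in numbers_list, target in numbers_set)

# ...but the list is scanned item by item, while the set uses hashing.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)
print("list: %.6f s   set: %.6f s" % (list_time, set_time))

# Sets also make removing duplicates easy (the order is not preserved).
unique = set([3, 1, 3, 2, 1])
print(sorted(unique))   # [1, 2, 3]
```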

Strings, Lists and Dictionaries Example

Here's a more real-world example: jumble_example.py

Variables, objects, and values

Variables are just named references to objects. Objects have types and categories and may be mutable, but names don’t have these properties. Thus, all the following are valid...

>>> x = 42         # binds the name "x" to an integer object
>>> x = "pc204"    # binds the name "x" to the string object "pc204"
>>> x = [1, 2, 3]  # binds the name "x" to a list
>>> y = ['a', x, 'c']    # binds the name "y" to a list which includes an embedded reference to another list
>>> y
['a', [1, 2, 3], 'c']

>>> x[1] = 'b'     # modify the second item in the original list
>>> y              # ...which of course also changes the object that "y" references
['a', [1, 'b', 3], 'c']

A reference assigned to another reference is still just a reference...

>>> x = [1, 2, 3]
>>> z = x          # both x and z reference the *SAME OBJECT*
>>> y = ['a', z, 'c']
>>> y
['a', [1, 2, 3], 'c']

>>> x[1] = 'b'     # this still changes the object that "y" references
>>> y
['a', [1, 'b', 3], 'c']

Creating Copies of Objects

If you don't want x and y to share the same object, you need to create an explicit copy of the object...


>>> x = [1, 2, 3]
>>> y = x[:]    # assigning a slice of the entire list creates a copy of the object
>>> y
[1, 2, 3]

>>> # so now if I modify the original object, like this...
>>> x[1] = 'b'
>>> x           # of course x is changed...
[1, 'b', 3]

>>> y           # but y is unchanged...
[1, 2, 3]

>>> # this works for embedded references as well...
>>> x = [1, 2, 3]
>>> y = ['a', x[:], 'c']
>>> x[1] = 'b'
>>> y           # the copy of the original object did not change
['a', [1, 2, 3], 'c']

To create a copy of a dictionary, use the copy() method


D1 = {'tom':1, 'conrad':2, 'eric':3, 'greg':4}
D2 = D1.copy()
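
Keep in mind that copy() makes a shallow copy: the new dictionary is a separate object, but any mutable values inside it are still shared with the original. The sketch below contrasts this with copy.deepcopy() from the standard copy module; the names shallow and deep are just for illustration.

```python
import copy

D1 = {'tom': 1, 'conrad': {'greg': 3, 'eric': 4}}

shallow = D1.copy()           # new outer dictionary, same inner dictionary
deep = copy.deepcopy(D1)      # new outer and new inner dictionaries

D1['tom'] = 42                # changing a top-level item affects only D1
D1['conrad']['greg'] = 99     # modifying the shared inner dictionary

print(shallow)   # {'tom': 1, 'conrad': {'greg': 99, 'eric': 4}}
print(deep)      # {'tom': 1, 'conrad': {'greg': 3, 'eric': 4}}
```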

Creating Copies of Objects Continued

Why is the following example different? (Hint: what's the object?)


>>> x = 5
>>> y = ['a', x, 'c']
>>> y
['a', 5, 'c']

>>> x = 10
>>> y
['a', 5, 'c']

Answer: The object "5" is an immutable integer. Once this object is created it cannot be modified (it's immutable!). The assignment x = 10 does not change that object; it binds the name x to a different integer object, while the original object still exists and is what y still refers to.
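
The "is" operator tells you whether two names refer to the same object, which makes the difference between modifying an object and rebinding a name easy to check. A minimal sketch, with variable names chosen only for this example:

```python
x = [1, 2, 3]
z = x                   # a second name for the same list object
y = x[:]                # a copy: equal contents, different object

print(z is x)           # True
print(y is x, y == x)   # False True

x.append(4)             # modifies the object in place...
print(z)                # [1, 2, 3, 4]  (z sees the change)
print(y)                # [1, 2, 3]     (the copy does not)

n = 5
m = n                   # both names refer to the same integer object
n = n + 1               # builds a new integer and rebinds n only
print(n, m)             # 6 5
```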

Quick Review Of The First 4 Weeks

Objects in Python:
- Numbers - immutable numeric
- Strings - immutable sequence of characters
- Lists - mutable sequence of objects
- Tuples - immutable sequence of objects
- Dictionaries - mutable mapping of objects
- Sets - mutable collection of objects
- Files - mutable sequence of characters used for long-term storage
- Functions - immutable sequence of Python statements

Four Week Review Continued

Python Operators:

x or y	Logical ‘or’ (y evaluated only if x is false)
x and y	Logical ‘and’ (y evaluated only if x is true)
not x	Logical negation
<, <=, >, >=, ==, !=, is, is not, in, not in	Comparison operators, identity tests, sequence membership
x | y	Bitwise or
x ^ y	Bitwise exclusive or
x & y	Bitwise and
x<<y, x>>y	Shift x left or right by y bits
x + y, x - y	Addition/concatenation, subtraction
x * y, x / y, x % y	Multiplication/repetition, division, remainder/format
-x, +x, ~x	Unary negation, identity, bitwise complement
x[i], x[i:j], x.y, x(...)	Indexing, slicing, qualification, function calls
(...), [...], {...}, "..."	Tuple, list, dictionary or set, string

Four Week Review Continued

Python compares objects as follows:
- Numbers are compared by relative magnitude;
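- Strings are compared lexicographically;
- Lists and tuples are compared by comparing each component in turn;
- Dictionaries can only be tested for equality (== and !=): two dictionaries are equal if they hold the same (key, value) pairs. Ordering comparisons such as < raise a TypeError in Python 3;
- Sets are equal if they hold the same elements; <, <=, >, and >= test for subsets and supersets.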
- Any empty object (a string, list, tuple, dictionary, set or the 'None' special object) always evaluates as false, while nonempty objects are true.
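
A few of these rules in action, as they behave in Python 3. This is just a sketch; the values are arbitrary, and the dictionary and set lines use only equality and subset tests.

```python
print(3 < 3.5)                      # True, numbers compare by magnitude
print('apple' < 'banana')           # True, compared character by character
print('Zebra' < 'apple')            # True, uppercase letters sort before lowercase
print([1, 2, 3] < [1, 2, 4])        # True, the first differing item decides
print((1, 2) < (1, 2, 0))           # True, a shorter prefix is smaller

print({'a': 1, 'b': 2} == {'b': 2, 'a': 1})   # True, same items
print({1, 2} <= {1, 2, 3})                    # True, subset test

# Truth values of empty and nonempty objects
for obj in ('', [], (), {}, set(), None, 0, 'x', [0]):
    print(repr(obj), bool(obj))
```
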
Python Keywords:

False	await	else	import	pass
None	break	except	in	raise
True	class	finally	is	return
and	continue	for	lambda	try
as	def	from	nonlocal	while
assert	del	global	not	with
async	elif	if	or	yield

(Words in bold have already been discussed in class. Most of the rest will be within the next week or two.)

Homework

4.1 - Write a couple of functions that check for duplicates in a list
4.2 - Solve a Car Talk Puzzler problem
Look over The Object Model by Grady Booch
Bonus Exercise - Challenge 1: The Case of the Watermarked Bacteria