PC204 Lecture 4

Tom Ferrin

tef@cgl.ucsf.edu

Homework Answers

  • 3.1 - Encapsulate the given code into a function, write code to call it, and test the resulting values.
    epsilon = 0.0000001
    def square_root(a):
        x = a / 2.0
        while True:
            y = (x + a / x) / 2
            if abs(y - x) < epsilon:
                break
            x = y
        return y
    
    def test_square_root():
        import math
        for i in range(1, 10):
            mine = square_root(i)
            theirs = math.sqrt(i)
            delta = abs(mine - theirs)
            print("%3.1f %-13.11g %-13.11g %.11g" % (float(i),  mine, theirs, delta))
    
    test_square_root()
    
    1.0 1             1             1.1102230246e-15
    2.0 1.4142135624  1.4142135624  2.2204460493e-16
    3.0 1.7320508076  1.7320508076  0
    4.0 2             2             0
    5.0 2.2360679775  2.2360679775  0
    6.0 2.4494897428  2.4494897428  8.881784197e-16
    7.0 2.6457513111  2.6457513111  0
    8.0 2.8284271247  2.8284271247  4.4408920985e-16
    9.0 3             3             0
    
    (For details about the formatting used in the print statement see String Formatting Operations that was discussed in last week's lecture.)
  • 3.2 - Read a Protein Data Bank file and compute the center of atoms by averaging the atomic coordinates.
    def average_coord(code):
        f = open(code, 'r')
        sum_x = sum_y = sum_z = 0.0
        num_atoms = 0
        for line in f:
            if not line.startswith("ATOM"):
                continue
            sum_x += float(line[30:38])    # get only the X coordinate of the record
            sum_y += float(line[38:46])    # ditto for Y
            sum_z += float(line[46:54])    # ditto for Z
            num_atoms += 1
        if num_atoms == 0:
            print("no atoms were found")
        else:
            avg_x = sum_x / num_atoms
            avg_y = sum_y / num_atoms
            avg_z = sum_z / num_atoms
            print("average coordinates of ", num_atoms, "atoms:", avg_x, avg_y, avg_z)
        f.close()
    
    average_coord(input('PDB entry: ').strip())
    
    average coordinates of  1102 atoms: 21.8288139746 5.7510508167 95.3145789474
    

Assignment 3.2, take two

  • Recall we've been saying how there are already a ton of previously written Python modules that you can take advantage of? Well, the urllib.request module and the io.TextIOWrapper class are a couple of these. Using them, we could easily have our program fetch the needed coordinate file automatically. Our previous solution changes only slightly...
    def open_pdb(code):
        url = "https://files.rcsb.org/view/%s.pdb" % code
        from urllib.request import urlopen
        from io import TextIOWrapper
        # urlopen returns a binary stream, so we convert it for text I/O
        return TextIOWrapper(urlopen(url))
    
    def average_coord(code):
        f = open_pdb(code)
        # Everything else is the same as before
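  • One way to fill in "everything else" so that the same code works with either open() or open_pdb() is to move the arithmetic into a helper that accepts any sequence of lines. This is only a sketch: the name average_from_lines is made up for illustration, and the sample records are synthetic rather than taken from a real PDB entry.

```python
def average_from_lines(lines):
    """Return (num_atoms, avg_x, avg_y, avg_z) for the ATOM records in 'lines'."""
    sum_x = sum_y = sum_z = 0.0
    num_atoms = 0
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        sum_x += float(line[30:38])    # X coordinate columns
        sum_y += float(line[38:46])    # Y coordinate columns
        sum_z += float(line[46:54])    # Z coordinate columns
        num_atoms += 1
    if num_atoms == 0:
        return (0, None, None, None)   # boundary case: no atoms found
    return (num_atoms, sum_x / num_atoms, sum_y / num_atoms, sum_z / num_atoms)

# Synthetic records for illustration only (not a real PDB entry)
sample = [
    "HEADER    made-up example",
    "ATOM".ljust(30) + "%8.3f%8.3f%8.3f" % (1.0, 2.0, 3.0),
    "ATOM".ljust(30) + "%8.3f%8.3f%8.3f" % (3.0, 4.0, 5.0),
]
result = average_from_lines(sample)
print(result)    # (2, 2.0, 3.0, 4.0)
```

  • Because a file object and the TextIOWrapper returned by open_pdb() both yield lines when looped over, either one could be passed in place of the sample list.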
    
    

Quick Review

  • Python programs can be decomposed into modules, statements, and objects:
    • Programs are composed of modules;
    • Modules contain statements;
    • Statements create and process objects.

     
  • "Objects" are also known as "data structures" in some programming languages. They’re called objects in Python to distinguish them because the low-level data structure manipulation functions often needed with many programming languages aren’t needed in Python.
  • Python has several built-in object types. These are...

 
Object Type     Examples
Numbers         3.1416, 42, 123456789
Strings         'pc204', "Joe's story"
Files           text = open('eggs', 'r').read()
Lists           [1, [2, 'three'], 4]
Tuples          (1, 'spam', 4, 'U')
Dictionaries    {key: value, 'food': 'spam'}
Sets            {key, 'food'}
  • Numbers can be any of several types...
     
    Constant                   Interpretation
    1234, -24, 0               Integers
    1.23, 3.14e-10, 0.0        Floating point numbers
    0o177, 0x9ff               Octal and hexadecimal constants
    3+4j, 3.0+4.0j, 3J         Complex number constants

     
  • Python 3 integers have unlimited precision
  • Floating point numbers can range from +/- 4.9e-324 to +/- 1.8e+308 and have approximately 16 digits of precision
  • Python 2 uses integers that can be in the range of -2,147,483,648 to 2,147,483,647, and supports "long integers" which have unlimited precision
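  • These limits are easy to check for yourself. A small sketch, assuming Python 3 on a machine with standard 64-bit floats (the variable names are made up for illustration):

```python
import sys

big = 2 ** 100                 # far outside the 32-bit range quoted for Python 2
print(big)                     # 1267650600228229401496703205376

largest = sys.float_info.max   # largest finite float, about 1.8e+308
smallest = 5e-324              # smallest positive float, about 4.9e-324
print(largest, smallest)
print(smallest / 2)            # 0.0, because the result is too small to represent
```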

More about floating point numbers

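  • Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has the value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has the value 0/2 + 0/4 + 1/8. Both fractions have the same value; the only difference is that the first is written in base 10 and the second in base 2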
  • Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine
  • Consider the fraction 1/3. You can approximate that as a base 10 fraction as 0.3333333333333. But no matter how many digits you specify, the result is never exactly 1/3
  • In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100110011...
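  • You can see which fraction is actually stored with the float method as_integer_ratio(). A short sketch (the variable names are made up for illustration):

```python
exact_eighth = (0.125).as_integer_ratio()
stored_tenth = (0.1).as_integer_ratio()

print(exact_eighth)    # (1, 8): 0.125 is exactly 1/8, a power of 2
print(stored_tenth)    # (3602879701896397, 36028797018963968)
print(stored_tenth[1] == 2 ** 55)    # True: the denominator is a power of 2
```

  • The stored value for 0.1 is therefore a fraction with a power-of-2 denominator that is close to, but not equal to, 1/10.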

Floating Point Numbers Continued

  • It's easy to forget that the stored value is an approximation to the original decimal fraction because of the way that floats are displayed by Python. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine
  • If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display 0.1000000000000000055511151231257827021181583404541015625
  • Since this is more digits than most people find useful, Python keeps the number of digits manageable by displaying a rounded value instead, so that 0.1 prints as 0.1
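  • The standard decimal module can show the exact value that is stored. A minimal sketch (the variable name is made up for illustration):

```python
from decimal import Decimal

exact = Decimal(0.1)    # converts the stored binary value exactly, without rounding
print(exact)            # 0.1000000000000000055511151231257827021181583404541015625
print(0.1)              # 0.1 (the rounded display)
```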

Floating Point Numbers Continued

  • But it’s important to realize that this is, in a very real sense, an illusion: the value in the machine is not exactly 1/10, you’re simply rounding the display of the true machine value. This fact becomes apparent as soon as you try to do arithmetic with these values...
    print("%.20f" % (0.1 + 0.2))
    0.30000000000000004441
    
  • This can lead to some very unexpected results...
    >>> if ((0.1 + 0.2) != 0.3):
    ...    print("How can this be?")
    ...
    How can this be?
    

Floating Point Numbers Continued

  • Note that this is in the very nature of binary floating-point. This is not a bug in Python, and it’s not a bug in your code either. You’ll see the same kind of thing in all computer languages that support floating-point arithmetic.
  • This issue comes up most often when testing floats for equality. So instead of testing for equality, you can just test to see if the absolute value of the difference is a very small number...
    >>> epsilon = 1.0e-14
    >>> if (abs((0.1 + 0.2) - 0.3) > epsilon):
    ...    print("How can this be?")
    ...
    
    
  • Binary floating-point arithmetic holds several surprises like this. You should read The Perils of Floating Point by Bruce Bush for a more complete account of other common surprises.
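  • The epsilon test described above can be compared with the standard library's math.isclose(), which does a similar job using a relative tolerance. A short sketch (the variable names are made up for illustration):

```python
import math

total = 0.1 + 0.2
exactly_equal = (total == 0.3)
within_epsilon = abs(total - 0.3) < 1.0e-14
close = math.isclose(total, 0.3)    # default relative tolerance is 1e-09

print(exactly_equal)     # False
print(within_epsilon)    # True
print(close)             # True
```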

Lists in Python

  • Lists are ordered collections of arbitrary objects that can be accessed by offsets (just like strings), can vary in length, and can contain other lists (i.e. are nestable). Unlike strings, lists are mutable sequences because they can be modified in place, which means they support operations like deletion, index assignment, and methods. Lists contain, technically, zero or more references to other Python objects.

Common List Expressions and Methods

Operation                      Interpretation
L1 = [ ]                       Creates an empty list
L2 = [0, 1, 2, 3]              A four element (or item) list
L3 = ['one', 'two', [1, 2]]    Nested sublists
L2[2]                          Third item in a list (like string offsets, list offsets begin at 0)
L3[2][0]                       First sublist item in the third list item
L2[i:j]                        Slice (just like in strings)
len(L3)                        Length (just like in strings)
L1 + L2                        Concatenation
L1 * 4                         Repetition
for x in L2:                   Iteration
'two' in L3                    Membership test

More List Expressions and Methods

Operation                Interpretation
L2.append(4)             Grow list at end by 1 item (the integer 4)
L2.extend([1, 2, 3])     Grow list at end by multiple items
L2.sort()                Sort the list
L2.index(n)              Find index of 'n' in list
L2.reverse()             Reverse items in the list
del L2[k]                Remove kth item from the list
L2[i:j] = [ ]            Remove ith through (j-1)th items
L2[2] = 42               Replace 3rd item in the list (index assignment)
L2[1:3] = [0, 0]         Replace 2nd & 3rd list items with zeros (slice assignment)

Examples

  • Just like with strings, items in a list are fetched by indexing; i.e., providing the numeric offset of the desired item in the list. In other words, indexing a list begins at 0 and ends at one less than the length of the list. You can also fetch items from a list using negative offsets, which just count backwards from the end of the list.
>>> L1=[1, 2, 3, 4]
>>> L2=[5, 6]
>>> len(L1)
4

>>> L1[1:3]		# indexing begins at 0!
[2, 3]

>>> L1[:3]		# missing first index means from the beginning
[1, 2, 3]

>>> L1[2:]		# missing second index means through end of the list
[3, 4]

>>> L1[0:-1]	# -1 is the offset of the last item, which the slice leaves out
[1, 2, 3]

>>> L1 + L2
[1, 2, 3, 4, 5, 6]

>>> L1.append(L2)
>>> L1
[1, 2, 3, 4, [5, 6]]
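  • Note that append() added L2 as a single nested item. If the intent is to add the items of L2 one by one, extend() does that instead. A short sketch (the names nested and flat are made up for illustration):

```python
L1 = [1, 2, 3, 4]
L2 = [5, 6]

nested = L1[:]       # work on copies so that L1 itself is left alone
nested.append(L2)    # adds L2 as one item
flat = L1[:]
flat.extend(L2)      # adds each item of L2 separately

print(nested)        # [1, 2, 3, 4, [5, 6]]
print(flat)          # [1, 2, 3, 4, 5, 6]
print(len(nested), len(flat))    # 5 6
```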

Indexing and Slicing

  • Indexing and Slicing of lists are very common operations, so just remember...
  • Indexing = L[i]:
    • Fetch items at offsets (the first item is at offset zero)
    • Negative indexes mean to count from the end of the list
    • L[0] fetches the first item
    • L[-2] fetches the second item from the end
  • Slicing = L[i:j]:
    • Extracts contiguous section of items from a list
    • Slice boundaries default to zero and the list length
    • L[1:3] fetches from offset 1 up to but not including offset 3
    • L[1:] fetches from offset 1 through the end of the list
    • L[:-1] fetches from offset 0 up to but not including the last item
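  • Each of the rules above can be tried on a small list. A sketch using a made-up list of five letters:

```python
L = ['a', 'b', 'c', 'd', 'e']

print(L[0])      # a                      first item
print(L[-2])     # d                      second item from the end
print(L[1:3])    # ['b', 'c']             offset 1 up to but not including 3
print(L[1:])     # ['b', 'c', 'd', 'e']   offset 1 through the end
print(L[:-1])    # ['a', 'b', 'c', 'd']   everything except the last item
print(L[:])      # ['a', 'b', 'c', 'd', 'e']   both boundaries defaulted
```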

Example Usage in a Function

  • Consider this function...
    def extract(s):
        """extract space-separated words from string 's' and
        return each word in a list"""
        result = [ ]        # begin with empty list
        while s:
            k = 0           # k will be the end of a slice
            for c in s:     # inspect chars in string one at a time
                if c == ' ':
                    break   # exit the loop if character is a space
                k = k + 1   # increment slice limit by 1
            result.append(s[0:k])  # add a new word to the list
            s = s[k+1:]     # save the remaining string
        return result       # done - return the list to the caller
    
  • It produces this...
    >>> print(extract('Now is the time for all good people'))
    ['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'people']
    
    >>> print(extract('Now'))       # how about with just one word in s?
    ['Now']
    
  • It's always important to test your code, especially at the "boundaries" of where it's designed to work!

Boundary Conditions

  • So, for example, what if we call extract() with an argument that doesn't have any words in it?
    
    >>> print(extract(''))	# test for correct result with an empty string
    []
    
    
  • That seems right. But what about a string that only has spaces and no words?
    
    >>> print(extract('   '))	# a string with three spaces
    ['', '', '']
    
    
  • Hmm. Is a list of three items the correct result?

Boundary Conditions Continued

  • At least for strings, most would argue that it doesn't matter how many spaces there are between words. So, it's one or more spaces that separate words in a string. Similarly, if there aren't any words at all in the string, even if there are some spaces, then the right answer for our function to return should be an empty list. So we need to fix our code and re-test it...
    def extract(s):
        """extract space-separated words from string 's' and
        return each word in a list"""
        result = [ ]        # begin with empty list
        while s:
            k = 0           # k will be the end of a slice
            for c in s:     # inspect chars in string one at a time
                if c == ' ':
                    break   # exit the loop if character is a space
                k = k + 1   # increment slice limit by 1
            if k != 0:      # ONLY ADD A NEW WORD IF ONE WAS FOUND
                result.append(s[0:k])  # THERE WAS, SO ADD IT
            s = s[k+1:]     # save the remaining string
        return result       # done - return the list to the caller
    
  • Produces...
    >>> print(extract('   '))   # a string with three spaces
    []
    
    >>> print(extract('   Now  is   the time     '))   # spaces sprinkled here and there
    ['Now', 'is', 'the', 'time']
    

Boundary Conditions Continued

  • The point of all this is that as a programmer you have to decide how you want your code to behave and then test to make sure it does what is desired. "Boundary cases" such as this come up all the time in programming and you must get in the habit of testing your code for these cases. For example, does the code do the right thing if a loop never gets executed at all? How about if it's only executed once? How about if it's executed multiple times? You should also get in the habit of documenting the correct behavior -- especially with unusual cases -- with comments so that if you or someone else later looks at your code, it's clear that you've considered the boundary cases.
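  • One way to document the boundary cases is to record them in a small test function. The sketch below repeats the corrected extract() so that it runs on its own; the name test_extract is made up for illustration, and the assert statement simply stops the program if the condition is false.

```python
def extract(s):
    """extract space-separated words from string 's' and
    return each word in a list"""
    result = []
    while s:
        k = 0
        for c in s:
            if c == ' ':
                break
            k = k + 1
        if k != 0:                  # skip the empty "words" between spaces
            result.append(s[0:k])
        s = s[k+1:]
    return result

def test_extract():
    assert extract('') == []                 # outer loop never runs
    assert extract('   ') == []              # spaces only, no words
    assert extract('Now') == ['Now']         # outer loop runs exactly once
    assert extract('Now is the time') == ['Now', 'is', 'the', 'time']
    assert extract('   Now  is   the time     ') == ['Now', 'is', 'the', 'time']
    # the built-in split() method gives the same answer for these inputs
    assert extract('  a  b ') == '  a  b '.split()
    return True

print(test_extract())    # True
```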

More String Functions

  • Let's build on our previous example by adding another function...
    def instring(w, s):     # is word w contained in s?
        words = extract(s)  # 'words' is now a list of words
        for x in words:     # consider each, one at a time
            if x == w:      # test for a match
                return True
        return False
    
    str = '   Now is the    time for all good people   '
    print(instring('time', str))
    True
    
    print(instring('tom', str))
    False
    
    print(instring('', str))
    False
    
    print(instring('', ''))
    False
    
  • The last two cases are again testing boundary conditions. Is an empty string contained in str? For our example here (where our code is designed to operate on words), since an empty string is not a word then it can never be contained in str and our function is therefore working correctly. The point is, you're the programmer and you need to decide, test, and document what the correct behavior of your code should be.

A Little Optimization

  • Our instring() function on the previous slide can be improved. For one thing, the "for" loop and its two return statements...
    for x in words:
        if x == w:
            return True
    return False
    
    ...and replaced with a membership test...
    return w in words    # return True if w in words
    
  • And we may be able to use some built-in list functions:
Functions          Return Value
all(L1)            True if all elements in L1 are True
any(L1)            True if any element in L1 is True
min(L1)            Smallest element in L1
max(L1)            Largest element in L1
reversed(L1)       Elements of L1 in reverse order
sorted(L1)         Elements of L1 in sorted order
sum(L1)            Sum of elements in L1
sum(L1, start)     Sum of elements in L1 + start
  • Another very useful built-in function is enumerate(). To see why, first consider a loop that uses indices:
    # Loop through list using indices
    for index in range(len(L1)):
        value = L1[index]
        print(index, value)
    
  • Here's the equivalent using the enumerate() function:
    for index, value in enumerate(L1):
        print(index, value)
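  • The built-in functions in the table above can be tried on a small list. A sketch with made-up values; note that reversed() returns an iterator, so it is wrapped in list() here to display the result:

```python
L1 = [3, 1, 4, 1, 5]

print(min(L1), max(L1))      # 1 5
print(sum(L1))               # 14
print(sum(L1, 100))          # 114
print(sorted(L1))            # [1, 1, 3, 4, 5]
print(list(reversed(L1)))    # [5, 1, 4, 1, 3]
print(all(L1))               # True: every element is nonzero
print(all([3, 0, 4]))        # False: 0 counts as false
print(any([0, 0, 4]))        # True: at least one element is true
print(L1)                    # [3, 1, 4, 1, 5]: sorted() and reversed() leave L1 alone
```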
    

And Finally There Is "List Comprehension"

  • Example:
    # Apply an operation to each element in a list
    # and keep the results in a second list
    L2 = [ ]
    for value in L1:
        L2.append(value + 10)
    
    # same using list comprehension
    L2 = [ value + 10 for value in L1 ]
    
    # and you can even skip some elements if you want
    L3 = [ value + 10 for value in L1 if value < 100 ]
    
  • You don’t need to use list comprehension in your programs, but you should recognize it when you see it. Sometimes people get too clever with list comprehension and the code ends up being very difficult to read. Its best use is for simple mapping from one set of values to another.
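  • As an illustration of the readability trade-off, here is a made-up example in which a single comprehension loops over nested lists and filters at the same time, followed by explicit loops that compute the same result:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Compact, but the reader has to untangle two loops and a test
squares = [v * v for row in matrix for v in row if v % 2 == 0]

# The same result with explicit loops
squares2 = []
for row in matrix:
    for v in row:
        if v % 2 == 0:
            squares2.append(v * v)

print(squares)     # [4, 16, 36]
print(squares2)    # [4, 16, 36]
```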

Tuples

  • Tuples are just like Python lists, but they are immutable. All the same operations that worked on lists work on tuples except tuples don't provide the content-altering methods that lists do (e.g. append(), sort(), reverse()). Like lists, concatenation, repetition, and slicing applied to tuples return the results in new tuples. The immutability of tuples provides object integrity; you can be sure that a tuple can't be changed inadvertently somewhere else in your program.
  • Examples
    >>> t1 = (1, 2, 3, 4)	# this is a tuple
    >>> t2 = 5, 6, 7, 8		# this too (syntactically unambiguous)
    >>> t3 = (9,)	# a one-item tuple (comma required to avoid ambiguity with expressions)
    >>> t4 = (9)	# an expression that evaluates to 9
    >>> t5 = ('abc', (1, 2, 3), 'def')    # nested tuples
    
  • A common use of tuples is in function return values:
    def trivial():
        x = 2.71828
        y = 3.14159
        return (x, y)
    
    m, n = trivial()
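  • The immutability described above is easy to demonstrate. A short sketch; the try/except statement is used here only to catch the error so the example keeps running, and the variable names are made up for illustration:

```python
t = (1, 2, 3)
try:
    t[0] = 99                 # index assignment is not allowed on a tuple
    changed = True
except TypeError:
    changed = False
print(changed)                # False

u = t + (4,)                  # concatenation returns a new tuple
print(t, u)                   # (1, 2, 3) (1, 2, 3, 4)

a, b = 1, 2
a, b = b, a                   # tuple assignment also gives a one-line swap
print(a, b)                   # 2 1
```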
    

Python Dictionaries

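  • Dictionaries are collections of arbitrary values that are accessed via an associated "key" rather than by position. (Since Python 3.7 a dictionary remembers the order in which its items were inserted; earlier versions made no promise about order.) Keys are unique, i.e., each key can appear at most once in a dictionary.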
  • Dictionaries are of the category mutable mapping, which means they can be modified in place (like lists), but don’t support sequence operations (like strings and lists). An item is a (key, value) pair.

Common Dictionary Constants and Operations

Operation                                        Interpretation
D1 = { }                                         Creates an empty dictionary
D2 = {'tom':1, 'conrad':2}                       A two item dictionary
D3 = {'tom':1, 'conrad':{'greg':3, 'eric':4}}    Nesting
D2['conrad']                                     Retrieval of value by key
D3['conrad']['eric']                             Nested retrieval
'tom' in D2                                      Membership test (D2.has_key('tom') exists only in Python 2)
D2.keys()                                        View of all keys in the dictionary
D2.values()                                      View of all values in the dictionary
D2.items()                                       View of all (key, value) pairs
D2.get(k, v)                                     Value with key "k" if k in D2, otherwise "v"
D2.setdefault(k, v)                              Like D2.get(), but also adds the item to D2 if k is missing
len(D2)                                          Number of keys in the dictionary
D2[key] = value                                  Add or change an item
del D2[key]                                      Delete an item
D4 = D2.copy()                                   Create a (shallow) copy of D2

Examples


>>> d1 = {'tom':1, 'conrad':2, 'greg':3, 'eric':4}
>>> d1['greg']		# given a key, fetch associated value
3

>>> len(d1)		# return number of items in the dictionary
4

>>> 'eric' in d1		# test for the presence of a key
True

>>> list(d1.keys())		# return list of all keys
['tom', 'conrad', 'greg', 'eric']

>>> d1['tom'] = 42	# assign a new value for key 'tom'
>>> d1['conrad'] = [3, 4, 5]	# items can be arbitrary objects
>>> d1		# items are shown in insertion order (Python 3.7 and later)
{'tom': 42, 'conrad': [3, 4, 5], 'greg': 3, 'eric': 4}

>>> del d1['greg']	# delete an entry
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4}

>>> d1['al'] = 'good man'	# assigning to a new key adds a new entry
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4, 'al': 'good man'}

>>> d1[101] = 'test'	# keys can be any immutable object
>>> d1
{'tom': 42, 'conrad': [3, 4, 5], 'eric': 4, 'al': 'good man', 101: 'test'}
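  • The get() and setdefault() methods from the table of dictionary operations are handy when a key may not be present yet. A sketch that counts words and records where each one occurs (the word list and variable names are made up for illustration):

```python
words = ['spam', 'eggs', 'spam', 'ham', 'spam']

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1     # 0 is used the first time w is seen
print(counts)       # {'spam': 3, 'eggs': 1, 'ham': 1}

positions = {}
for index, w in enumerate(words):
    # setdefault() stores an empty list the first time w is seen,
    # then returns the stored list so we can append to it
    positions.setdefault(w, []).append(index)
print(positions)    # {'spam': [0, 2, 4], 'eggs': [1], 'ham': [3]}
```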

Sets

  • Sets are unordered collections of unique items. They are like dictionaries that only have keys -- no values. They are ideal for testing for membership because lookups using sets are much more efficient than using lists.

>>> s1 = {1, 2, 3, 4}		# this is a set
>>> s1
{1, 2, 3, 4}

>>> 3 in s1      # membership test (much faster than a list)
True

>>> if 3 in s1:  # similar to above
...    print('True')
True

>>> s1.add(5)	# add a new element
>>> s1
{1, 2, 3, 4, 5}

>>> s1.add(5)	# does nothing since "5" is already a member of the set
>>> s1.remove(4)	# remove an element
>>> s1
{1, 2, 3, 5}

>>> s1.discard(4)    # like remove but won't cause an exception
>>> s2 = {1, 3}
>>> s1 | s2		# returns union of s1 and s2
{1, 2, 3, 5}

>>> s1 & s2		# returns intersection of s1 and s2
{1, 3}

>>> s1 - s2		# returns difference of s1 and s2
{2, 5}
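  • Two details from the examples above are worth trying yourself: remove() raises an exception for a missing element while discard() does not, and converting a list to a set drops duplicates. A sketch with made-up data (try/except is used only to catch the error):

```python
s = {1, 2, 3}
s.discard(99)                # element is missing: silently does nothing
try:
    s.remove(99)             # element is missing: raises KeyError
    raised = False
except KeyError:
    raised = True
print(raised)                # True

names = ['tom', 'conrad', 'tom', 'eric', 'conrad']
unique = set(names)          # duplicates are dropped
print(len(names), len(unique))    # 5 3
print(sorted(unique))        # ['conrad', 'eric', 'tom']
```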

Strings, Lists and Dictionaries Example

Variables, objects, and values

  • Variables are just named references to objects. Objects have types and categories and may be mutable, but names don’t have these properties. Thus, all the following are true...
    >>> x = 42         # binds the name "x" to an integer object
    >>> x = "pc204"    # binds the name "x" to the string object "pc204"
    >>> x = [1, 2, 3]  # binds the name "x" to a list
    >>> y = ['a', x, 'c']    # binds the name "y" to a list which includes an embedded reference to another list
    >>> y
    ['a', [1, 2, 3], 'c']
    
    >>> x[1] = 'b'     # modify the second item in the original list
    >>> y              # ...which of course also changes the object that "y" references
    ['a', [1, 'b', 3], 'c']
    
  • A reference assigned to another reference is still just a reference...
    >>> x = [1, 2, 3]
    >>> z = x          # both x and z reference the *SAME OBJECT*
    >>> y = ['a', z, 'c']
    >>> y
    ['a', [1, 2, 3], 'c']
    
    >>> x[1] = 'b'     # this still changes the object that "y" references
    >>> y
    ['a', [1, 'b', 3], 'c']
    

Creating Copies of Objects

  • If you don't want x and y to share the same object, you need to create an explicit copy of the object...
    
    >>> x = [1, 2, 3]
    >>> y = x[:]    # assigning a slice of the entire list creates a copy of the object
    >>> y
    [1, 2, 3]
    
    >>> # so now if I modify the original object, like this...
    >>> x[1] = 'b'
    >>> x           # of course x is changed...
    [1, 'b', 3]
    
    >>> y           # but y is unchanged...
    [1, 2, 3]
    
    >>> # this works for embedded references as well...
    >>> x = [1, 2, 3]
    >>> y = ['a', x[:], 'c']
    >>> x[1] = 'b'
    >>> y
    ['a', [1, 2, 3], 'c']  # the copy of the original object did not change
    
    
  • To create a copy of a dictionary, use the copy() method
    
    D1 = {'tom':1, 'conrad':2, 'eric':3, 'greg':4}
    D2 = D1.copy()
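  • Both x[:] and the copy() method make shallow copies: the outer object is new, but any objects nested inside it are still shared. When that matters, the standard copy module provides deepcopy(). A sketch with made-up data:

```python
import copy

x = [1, [2, 3]]
shallow = x[:]               # new outer list, but the inner list is shared
deep = copy.deepcopy(x)      # the inner list is copied as well

x[1][0] = 'b'                # modify the nested list through x
print(shallow)               # [1, ['b', 3]]
print(deep)                  # [1, [2, 3]]

D1 = {'tom': [1, 2]}
D2 = D1.copy()               # also a shallow copy
D1['tom'].append(3)
print(D2)                    # {'tom': [1, 2, 3]}
```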
    
    

Creating Copies of Objects Continued

  • Why is the following example different? (Hint: what's the object?)
    
    >>> x = 5
    >>> y = ['a', x, 'c']
    >>> y
    ['a', 5, 'c']
    
    >>> x = 10
    >>> y
    ['a', 5, 'c']
    
    
  • Answer: The object "5" is an immutable integer. Once this object is created it cannot be modified (it's immutable!). Assigning "10" to x does not change that object; it rebinds the name x to a different integer object, while the original object still exists and is what y still refers to.
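  • The "is" operator tests whether two references point to the same object, which makes the difference between rebinding a name and modifying an object visible. A short sketch (the variable names ending in before/after are made up for illustration):

```python
x = 5
y = ['a', x, 'c']
same_before = y[1] is x      # True: both refer to the same integer object
x = 10                       # rebinds the name x; the object 5 is untouched
same_after = y[1] is x       # False: x now refers to a different object
print(same_before, same_after, y)    # True False ['a', 5, 'c']

x = [1, 2, 3]
y = ['a', x, 'c']
x[1] = 'b'                   # modifies the list object in place
print(y[1] is x, y)          # True ['a', [1, 'b', 3], 'c']
```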

Quick Review Of The First 4 Weeks

  • Objects in Python:
    • Numbers - immutable numeric
    • Strings - immutable sequence of characters
    • Lists - mutable sequence of objects
    • Tuples - immutable sequence of objects
    • Dictionaries - mutable mapping of objects
    • Sets - mutable collection of objects
    • Files - mutable sequence of characters used for long-term storage
    • Functions - immutable sequence of Python statements

Four Week Review Continued

  • Python Operators:
    x or y                                          Logical ‘or’ (y evaluated only if x is false)
    x and y                                         Logical ‘and’ (y evaluated only if x is true)
    not x                                           Logical negation
    <, <=, >, >=, ==, !=, is, is not, in, not in    Comparison operators, identity tests, sequence membership
    x | y                                           Bitwise or
    x ^ y                                           Bitwise exclusive or
    x & y                                           Bitwise and
    x << y, x >> y                                  Shift x left or right by y bits
    x + y, x - y                                    Addition/concatenation, subtraction
    x * y, x / y, x // y, x % y                     Multiplication/repetition, division, floor division, remainder/format
    -x, +x, ~x                                      Unary negation, identity, bitwise complement
    x[i], x[i:j], x.y, x(...)                       Indexing, slicing, qualification, function calls
    (...), [...], {...}, "..."                      Tuple, list, dictionary or set, string
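  • The notes "y evaluated only if x is false/true" describe short-circuit evaluation. A sketch that records which operands actually get evaluated; the helper function noisy() and the list calls are made up for illustration:

```python
calls = []

def noisy(value):
    """Record that 'value' was evaluated, then return it."""
    calls.append(value)
    return value

r1 = noisy(0) or noisy(5)     # 0 is false, so noisy(5) is evaluated too
r2 = noisy(3) or noisy(5)     # 3 is true, so noisy(5) is skipped
r3 = noisy(0) and noisy(5)    # 0 is false, so noisy(5) is skipped

print(r1, r2, r3)             # 5 3 0
print(calls)                  # [0, 5, 3, 0]
```

  • Note that "or" and "and" return one of their operands, not necessarily True or False.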

Four Week Review Continued

  • Python compares objects as follows:
    • Numbers are compared by relative magnitude;
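    • Strings are compared lexicographically;
    • Lists and tuples are compared by comparing each component in turn;
    • Dictionaries can only be compared for equality (== and !=); two dictionaries are equal if they hold the same (key, value) pairs. (Python 2 also ordered them as though comparing sorted (key, value) lists.)
    • Sets are equal if they have the same elements; for sets, <, <=, > and >= test for subsets and supersets.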
    • Any empty object (a string, list, tuple, dictionary, set or the 'None' special object) always evaluates as false, while nonempty objects are true.
  • Python Keywords:
False     await      else      import     pass
None      break      except    in         raise
True      class      finally   is         return
and       continue   for       lambda     try
as        def        from      nonlocal   while
assert    del        global    not        with
async     elif       if        or         yield
(Words in bold have already been discussed in class. Most of the rest will be within the next week or two.)
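  • The comparison rules and the keyword list above can both be checked from Python itself. A sketch assuming Python 3; the standard keyword module lists the keywords of the version you are running, and the variable names are made up for illustration:

```python
import keyword

print(keyword.iskeyword('lambda'))    # True
print(keyword.iskeyword('print'))     # False: print is a function in Python 3

print('abc' < 'abd')                  # True: strings compare character by character
print([1, 2, 3] < [1, 2, 4])          # True: lists compare component by component
print({'a': 1, 'b': 2} == {'b': 2, 'a': 1})    # True: same (key, value) pairs
print({1, 2} < {1, 2, 3})             # True: {1, 2} is a proper subset

empties = ['', [], (), {}, set(), None]
truth_values = [bool(obj) for obj in empties]
print(truth_values)                   # [False, False, False, False, False, False]
```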

Homework