PC204 Lecture 3

Tom Ferrin

tef@cgl.ucsf.edu

Homework Answers

2.1.1 - Write a recursive function that prints the range of numbers from start to end.

def print_range(start, end):
    print(start)
    if start == end:
        return
    elif start > end:
        print_range(start - 1, end)
    else:
        print_range(start + 1, end)

2.1.2 - Write a test program that uses input() to fetch the starting and ending numbers from the user.

start = int(input("Enter start of range: "))
end = int(input("Enter end of range: "))
print_range(start, end)

2.2 - Write a function that returns the greatest common divisor of two numbers, a and b.

def gcd(a, b):
    """Return the greatest common divisor of a and b.
    Returns None on bad input.
    This code is based on Euclid's algorithm, where gcd(a, b) = gcd(b, r) and r = a % b."""

    if a <= 0 or b <= 0:
        return None		# limit our solution to only positive numbers
    if a < b:
        return gcd(b, a)	# if a is less than b, just reverse the args
    r = a % b
    if r == 0:
        return b         # if no remainder then we found the gcd
    else:
        return gcd(b, r)	# otherwise recursively call gcd using the remainder as the new divisor

def test(a, b):
    print("gcd(%s, %s) = %s" % (a, b, gcd(a, b)))

if __name__ == "__main__":
    print("testing gcd() function...")
    test(10, 15)
    test(24, 16)
    test(-1, 34)
    test(17, 15)

% python gcd.py
testing gcd() function...
gcd(10, 15) = 5
gcd(24, 16) = 8
gcd(-1, 34) = None
gcd(17, 15) = 1

Alternative solution to 2.2:

def gcd(a, b):
    r = a % b
    if r == 0:
        return b
    else:
        return gcd(b, r)

print("gcd of 12 and 8 is", gcd(12, 8))
print("gcd of 10 and 5 is", gcd(10, 5))
print("gcd of 20 and 24 is", gcd(20, 24))

gcd of 12 and 8 is 4
gcd of 10 and 5 is 5
gcd of 20 and 24 is 4

The tests for illegal parameter values and for a < b are gone. Why?

Iteration

Iteration is the repetition of a block of statements. In Python, iteration is accomplished with either a "while" loop or a "for" loop.

While Loops

A "while" statement repeatedly executes a block of indented statements as long as the test expression evaluates to "true."

Syntax:

while test_expression_1:
    statement_block_1    # these statements are executed as long as test_expression_1 evaluates to true
    if test_expression_2: break
    if test_expression_3: continue
else:
    statement_block_2

Notes:
1. The "break," "continue," and "else" components are all optional.
2. When "test_expression_1" is no longer true, control passes to the "else" component (if present) or to the next statements after the end of the "while" loop.
3. If "test_expression_1" is never true then statement_block_1 is never executed.
4. The "else" part of the loop is executed only if the looping part is exited normally or never executed at all.
5. "break" causes control to jump out of the nearest enclosing loop. If this happens, the "else" part is not executed.
6. "continue" causes control to jump to the top of the nearest enclosing loop.
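The sketch below exercises every optional part of a "while" loop at once. The function collect() and its parameters are made up for illustration: it gathers the numbers 1 through stop, uses "continue" to skip multiples of 3, uses "break" to quit early when it reaches stop_at, and reports whether the "else" part ran.

```python
def collect(stop, stop_at):
    """Return (numbers, finished) where finished tells whether the else part ran."""
    found = []
    finished = False
    n = 0
    while n < stop:
        n = n + 1
        if n == stop_at:
            break             # jump out of the loop; the else part is skipped
        if n % 3 == 0:
            continue          # jump back to the test at the top of the loop
        found.append(n)
    else:
        finished = True       # runs when the test becomes false (or was never true)
    return found, finished

print(collect(7, 100))   # ([1, 2, 4, 5, 7], True)
print(collect(7, 5))     # ([1, 2, 4], False)
print(collect(0, 100))   # ([], True): the loop body never ran, but else did
```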

Examples

a = 0
b = 10
while a < b:
    print(a, end=' ')
    a = a + 1
print()

0 1 2 3 4 5 6 7 8 9

The end=' ' argument to the print function indicates that a space character should be printed following the value of a, rather than the usual newline ('\n') character. The final print function call terminates the line by printing a newline character.

y = 67
x = y // 2
while x > 1:
    if y % x == 0:
        print(y, "has factor", x)
        break
    x = x - 1
else:
    print(y, "is prime")

67 is prime

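If y is 2 or 3, then x starts at 1, so the test expression is never true and the "else" part of the loop is executed, which is what we want. (The same thing happens for y = 0 and y = 1, though, and those are not prime.)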
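Wrapping the loop in a function makes it easy to test many values. This is an adaptation rather than the slide's code: it returns True or False instead of printing, and it adds a guard for values below 2 (an assumption about how such values should be treated), since the loop by itself would call 0 and 1 prime.

```python
def is_prime(y):
    """Return True if y is prime, using the same trial-division loop as above."""
    if y < 2:
        return False          # 0, 1 and negative numbers are not prime
    x = y // 2
    while x > 1:
        if y % x == 0:
            return False      # found a factor (plays the role of break above)
        x = x - 1
    else:
        return True           # the loop ended without finding a factor

primes = []
for n in range(20):
    if is_prime(n):
        primes.append(n)
print(primes)   # [2, 3, 5, 7, 11, 13, 17, 19]
```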

For Loops

"for" loops are used to step through the items of any iterable object, such as a string, list, tuple, range, or file.

Syntax:

for target_name in object:
    statement_block_1
    if test_expression_1: break
    if test_expression_2: continue
else:
    statement_block_2

Notes: All the same rules that applied to "while" loops apply to "for" loops too.

Examples

word = "hello"
for x in word:
    print(x)

h
e
l
l
o

for n in range(10):
    print(n, end=' ')
print()

0 1 2 3 4 5 6 7 8 9

For the perfectionists: There is a space character at the end of the output line. How do we get rid of it?

Avoiding Trailing Space Characters

first = True
for n in range(10):
    if first:
        first = False
    else:
        print(end=' ')
    print(n, end='')
print()

0 1 2 3 4 5 6 7 8 9
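Another common approach, offered here as an alternative rather than the slide's method, is to build the whole line first with the string method join() (it appears in the list of string methods later in this lecture) and print it once. join() only puts the separator between items, so no trailing space appears.

```python
# Convert each number to a string, then glue the pieces together with single spaces.
pieces = []
for n in range(10):
    pieces.append(str(n))
line = ' '.join(pieces)
print(line)   # 0 1 2 3 4 5 6 7 8 9   (no space at the end)
```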

Iteration vs. Recursion

Some of you are probably thinking "for and while loops seem a lot like recursive functions". It's certainly true that iteration and recursion are closely related, and, in general, one may be converted to the other.

Iteration is stated as "repeat this procedure until the problem is solved", while the idea of recursion is that a problem is either (a) solved, or (b) solvable using the same algorithm after some simplification.
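To make the connection concrete, here is the recursive print_range() from homework 2.1.1 rewritten with a while loop. This is one possible conversion, not an official answer: each recursive call with start + 1 or start - 1 becomes an update of the loop variable n.

```python
def print_range(start, end):
    """Print the numbers from start to end (inclusive), counting up or down."""
    if start <= end:
        step = 1
    else:
        step = -1
    n = start
    while True:
        print(n)
        if n == end:      # same stopping test as the recursive version
            break
        n = n + step      # replaces the recursive call print_range(start +/- 1, end)

print_range(3, 6)   # prints 3, 4, 5, 6 on separate lines
```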

As an exercise, try re-implementing Euclid's algorithm using a while loop.

In general (again), recursion is slightly more expensive in terms of memory use and execution time, but some problems are much more easily expressed as recursion (e.g., traversing a graph or visiting all nodes in a tree) because much of the bookkeeping is handled implicitly (through passing of parameters) and you do not need to construct an elaborate system to track what data has been used and what still needs to be done.

Strings in Python

Strings in Python are an ordered collection of characters. Strings are immutable sequences, which means they respond to standard Python sequence operations but can't be changed in place.
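Because strings are immutable, "changing" one really means building a new string and binding a name to it. A small illustration (the string is just sample data):

```python
s = "tom's office"
# s[0] = 'T'     # would fail with a TypeError: strings can't be changed in place
t = 'T' + s[1:]  # instead, build a new string from pieces of the old one
print(s)         # tom's office   (the original is unchanged)
print(t)         # Tom's office
```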

Operation	Interpretation
str = ""	An empty string
str = "tom's office"	Double-quoted string
strblk = """lots-of-text"""	Triple-quoted block
str1 + str2	Concatenation
str * i	Repetition
str[i]	Index
str[i:j]	Slice
len(str)	Length
"this is a number: %d" % 100	Formatting
for x in str:	Iteration
"m" in str	Membership test

Examples

s1 = "Now is the time for all"
s2 = " good people to cast their vote."
s1 + s2

'Now is the time for all good people to cast their vote.'

len(s1)
23

len(s1+s2)
55

s1[0]    # index: note that string offsets begin at zero
'N'

s2[6:12]    # slice: up to, but not including upper bound
'people'

s1[1:-1]    # slice: negative offsets count from the end
'ow is the time for al'

for x in s1:
    print(x, end=' ')

N o w   i s   t h e   t i m e   f o r   a l l

"N" in s1    # test membership for character N in s1
True

s1 == s2    # test for equality
False

if 'vote' in s1+s2:
    print('found word "vote"')

found word "vote"
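The table also lists repetition, and negative offsets work for single indexes as well as slices. A few more one-liners, redefining s1 so the snippet stands alone:

```python
s1 = "Now is the time for all"
line = '-' * 10        # repetition
last = s1[-1]          # index -1 is the last character
tail = s1[-3:]         # the last three characters
twice = s1[:3] * 2     # slicing and repetition combine
print(line)            # ----------
print(last)            # l
print(tail)            # all
print(twice)           # NowNow
```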

Built-In String Functions

Since manipulating text strings is a frequent operation in most programs, Python has several built-in functions for doing just that.

capitalize()	isalpha()	partition()	startswith()
center()	isdigit()	replace()	strip()
count()	islower()	rfind()	swapcase()
encode()	isspace()	rindex()	title()
endswith()	istitle()	rjust()	translate()
expandtabs()	isupper()	rpartition()	upper()
find()	join()	rsplit()	zfill()
format()	ljust()	rstrip()
index()	lower()	split()
isalnum()	lstrip()	splitlines()

See the "String Methods" section of the Python documentation to find out what all of these do.

Examples

All of the functions on the previous slide are a special kind of Python function called a "method." Methods operate directly on single objects, in this case string objects.

s = 'Now iS tHe TIme for ALL gOOd PeoPLe'
s.lower()

'now is the time for all good people'

uc = s.upper()
print(uc)

NOW IS THE TIME FOR ALL GOOD PEOPLE

# find a substring within a string
s = 'The stock market has tanked this week.'
s.find('has')

17

s.find('have')
-1

s.find('the')
-1

s2 = s.lower()
s2.find('the')
0

s.lower().find('the')	# equivalent
0
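A few more of the methods from the list above, applied to the same sample sentence. Note that each call returns a new string (or list) and leaves s unchanged:

```python
s = 'The stock market has tanked this week.'
words = s.split()                    # split on white space into a list of words
print(words)       # ['The', 'stock', 'market', 'has', 'tanked', 'this', 'week.']
print(s.count('e'))                  # 5
print(s.startswith('The'))           # True
print(s.replace('tanked', 'soared')) # The stock market has soared this week.
print('-'.join(words))               # The-stock-market-has-tanked-this-week.
print('  padded  '.strip())          # padded
```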

Strings vs Binary Data

In Python 3, strings contain characters, not binary data. The bytes data type is used to store binary data. To convert between str and bytes, one needs an encoding scheme.
Python 3 uses the Unicode character set for its strings. (For history buffs, Unicode is a superset of ASCII.) This means each known character is assigned a particular numerical value, e.g., 'A' = 65 and ✁ = 9985.
The most common encoding scheme used to convert Python 3 strings to binary data (and vice versa) is UTF-8, which represents each character as a sequence of one to four 8-bit bytes. Since the most commonly used characters (e.g., the unaccented Latin alphabet, digits, and punctuation) are assigned values less than 128 and take just one byte each, UTF-8 is a very space-efficient scheme for such text. Examples of other encodings include UTF-16 for Unicode, and CP1252 for the Windows-1252 character set.

s = 'Now iS tHe TIme for ALL gOOd PeoPLe'
print(repr(s))
b = s.encode("utf-8")       # "utf-8" may be omitted since it is the default
print(repr(b))
print(repr(b.decode("utf-8")))
print(repr(s.encode("utf-16")))

'Now iS tHe TIme for ALL gOOd PeoPLe'
b'Now iS tHe TIme for ALL gOOd PeoPLe'
'Now iS tHe TIme for ALL gOOd PeoPLe'
b'\xff\xfeN\x00o\x00w\x00 \x00i\x00S\x00 \x00t\x00H\x00e\x00 \x00T\x00I
\x00m\x00e\x00 \x00f\x00o\x00r\x00 \x00A\x00L\x00L\x00 \x00g\x00O\x00O
\x00d\x00 \x00P\x00e\x00o\x00P\x00L\x00e\x00'
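To see the numeric values and the variable length of UTF-8 for yourself, the built-in functions ord() and chr() convert between a character and its Unicode value, and len() of the encoded bytes shows how many bytes each character needs. Note that 'é' has a value below 256 but still takes two bytes.

```python
for ch in ['A', 'é', '✁']:
    data = ch.encode('utf-8')
    print(ch, ord(ch), len(data), data)
# A 65 1 b'A'
# é 233 2 b'\xc3\xa9'
# ✁ 9985 3 b'\xe2\x9c\x81'

print(chr(65))   # A  (chr() is the inverse of ord())
```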

Python "help" Documentation

Many Python objects and functions come with built-in documentation, which you can read with the help() function. This is especially useful for built-in functions and methods.

s = "text string"    # s now references a string object
help(s.find)

Help on built-in function find:

find(...)
    S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within s[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

END    <== type q to quit help

Printing

In Python 3, the print function writes strings to the computer screen with some default formatting. By default, print adds a space between items separated by commas, and a newline after all output. To change the final output character, provide an end argument, e.g., end=' '. To suppress the space between items, build up the output string using the concatenation operator or formatting commands.

Example:

print('a', 'b')
a b

print('a' + 'b')
ab

s1 = 'Now is the time for all good people'
s2 = 'to cast their vote.'
print(s1, s2)
Now is the time for all good people to cast their vote.
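print() also accepts a sep argument that replaces the space placed between items, which is a third way to control that space. A quick sketch of sep and end together:

```python
print('a', 'b', 'c')              # a b c
print('a', 'b', 'c', sep='')      # abc
print('a', 'b', 'c', sep=', ')    # a, b, c
print('no newline', end='|')      # end replaces the final '\n' ...
print('next')                     # ... so this line reads: no newline|next
```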

String Formatting

The arguments to the print function are converted to strings automatically using a default format. To output the data in a particular format, you must first create a string from your data using a "format specification" that you provide. The general form of a format specification is...

format_specification_string % values

...where % in this case is the format operator.
Example...
```
'The answer for question %d is %s.' % (2, 'wrong')

The answer for question 2 is wrong.
```
Here %d and %s are "formatting codes" and (2, 'wrong') is a tuple of values (an integer and a string in this example).

Common Formatting Codes:

Code	Conversion
%s	string
%c	single character
%d	decimal integer
%f	floating-point (e.g., 3.141590)
%e	floating-point (e.g., 3.141590e+00)
%%	a literal '%'

Common String "Backslash" Escape Sequences:

Character	Meaning
\n	newline
\t	horizontal tab
\'	single quote
\"	double quote
\\	backslash (just one, not two)
\ooo	character with octal value ooo (e.g., \000 for a null character)

Examples

str1 = "tom's book"
print('Did you know that %s\nis lost.' % str1)

Did you know that tom's book
is lost.

import math
n = math.pi
print('pi approximately equal to %f or %e using scientific notation' % (n, n))

pi approximately equal to 3.141593 or 3.141593e+00 using scientific notation
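A few of the other codes and escapes from the tables, with made-up values. Note that %c accepts either a one-character string or an integer character code, and %d truncates a float to an integer:

```python
grade = 'A'
score = 97.5
print('Grade: %c (%s%%)' % (grade, score))   # Grade: A (97.5%)
print('%c' % 65)                             # A  (65 is the code for 'A')
print('%d items' % 3.9)                      # 3 items
print('%s' % [1, 2])                         # [1, 2]  (%s works on any object)
print('%s\t%s' % ('name', 'value'))          # a tab separates the two words
```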

In format strings with multiple formatting codes and corresponding values, order is important...

'The answer for question %d is %s.'  %  ('wrong', 2)

Traceback (most recent call last):
  File "test.py", line 1, in <module>
TypeError: %d format: a number is required, not str

Remember our Function Example from Week 1?

def printProduct(i, j):
    print(i * j, end=' ')

def printRow(n):
    for i in range(1, 10):   # this is a "for" loop (more about this in week 3)
        printProduct(i, n)   # call function printProduct 9 times (i = 1 through 9)
    print()                  # start a new line

def printTable():
    for i in range(1,10):
        printRow(i)

printTable()	# function printTable must be defined before here

Produces:

1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18 
3 6 9 12 15 18 21 24 27 
4 8 12 16 20 24 28 32 36 
5 10 15 20 25 30 35 40 45 
6 12 18 24 30 36 42 48 54 
7 14 21 28 35 42 49 56 63 
8 16 24 32 40 48 56 64 72 
9 18 27 36 45 54 63 72 81

Here it is again, using string formatting:

def printProduct(i, j):
    print("%2d" % (i * j), end=' ')

...(everything else is the same as before)

Produces:

 1  2  3  4  5  6  7  8  9
 2  4  6  8 10 12 14 16 18 
 3  6  9 12 15 18 21 24 27 
 4  8 12 16 20 24 28 32 36 
 5 10 15 20 25 30 35 40 45 
 6 12 18 24 30 36 42 48 54 
 7 14 21 28 35 42 49 56 63 
 8 16 24 32 40 48 56 64 72 
 9 18 27 36 45 54 63 72 81

More String Formatting

What we've covered here are just the basics of string formatting. The String Formatting Operations section of the online Python 2 documentation covers string formatting in much greater detail, and it applies to the % format operator in Python 3 as well. For example, you can control the field width used when printing values, the precision of floating-point numbers, whether numbers are zero padded, whether a blank space should be left before a positive number, and more.
Python 3 also provides a format method for strings. Instead of % placeholders, the format method looks for {} placeholders.

"hello %d %s" % (3, "world")
'hello 3 world'

"hello {} {}".format(3, "world")
'hello 3 world'

"hello {howmany} {who}".format(who="world", howmany=3)
'hello 3 world'
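Here is a short sketch of the width, precision, and padding options mentioned above, in both styles. The square brackets are only there to make the padding visible:

```python
import math

n = math.pi
print('[%8.3f]' % n)           # width 8, 3 decimals:    [   3.142]
print('[%-8.3f]' % n)          # '-' left-justifies:     [3.142   ]
print('[%05d]' % 42)           # zero padded to width 5: [00042]
print('[% d]' % 42)            # blank before positive:  [ 42]
print('[{:8.3f}]'.format(n))   # format() version of %8.3f: [   3.142]
print('[{:05d}]'.format(42))   # format() version of %05d:  [00042]
```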

print: Python 2 vs 3

In Python 3 print() is a function which means the parentheses are required...
```
print('pi is a very special number.  It equals %f.' % n)
```
In Python 2, print is a statement, and the parentheses are optional. In fact, if you include the parentheses when trying to print multiple comma-separated values, it would be interpreted as trying to print "a single tuple of values", rather than "several values one after the other".
In Python 3, the end argument is used to control the final character printed, while in Python 2, the only control one has is to supply a trailing comma to suppress the output of the newline.
```
print("Here's some text", end="")
print(" and some more.")

Here's some text and some more.
```
Must be done like this in Python 2...
```
print "Here's some text",
print "and some more."

Here's some text and some more.
```

Files in Python

Files provide named long-term storage 'compartments' for data. You can open files, read data from them, write data to them, and close them (i.e. inhibit further access).
Files may be opened for reading, writing, or both. Reading from a write-only file and writing to a read-only file result in errors.
Python distinguishes between text and binary files.
- Text files consist of lines of characters.
- Binary files consist of arbitrary bytes of data.
Python provides libraries on top of basic file input/output for more complex formats such as CSV (comma-separated values), XML (eXtensible Markup Language) and Excel. We'll cover some of these later in the course.

Basic Text File Operations

Operation	Interpretation
i = open('filename', 'r')	Open an existing text file for reading
o = open('filename', 'w')	Create a new (or overwrite an existing) text file for writing
i = open('filename', 'r+')	Open an existing text file for reading and writing
str = i.read()	Read entire file into string 'str'
str = i.read(N)	Read N characters into 'str'
str = i.readline()	Read next line into 'str'. Returns an empty string upon end-of-file.
for str in i:	Read next line into 'str'. Exit loop upon end-of-file.
L1 = i.readlines()	Read all lines into list 'L1'.
o.write(str)	Write string 'str' into the file
o.writelines(L1)	Write strings in list L1 into the file
i.seek(0)	Set current read pointer to beginning of the file
o.close()	Close the file

Basic Binary File Operations

Operation	Interpretation
i = open('filename', 'rb')	Open an existing binary file for reading
o = open('filename', 'wb')	Create a new (or overwrite an existing) binary file for writing
i = open('filename', 'rb+')	Open an existing binary file for reading and writing
bytes = i.read()	Read entire file into 'bytes'
bytes = i.read(N)	Read N bytes into 'bytes'
o.write(bytes)	Write 'bytes' into the file
i.seek(0)	Set current read pointer to beginning of the file
o.close()	Close the file
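A short round trip through a binary file, as a sketch. Instead of a fixed path it writes into a fresh directory from the standard tempfile module, and the file name data.bin is arbitrary. Since the file holds bytes, the text is encoded before writing and decoded after reading.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                   # a fresh, empty directory
path = os.path.join(folder, 'data.bin')       # arbitrary file name for the example

o = open(path, 'wb')
o.write('Now is the time ✁'.encode('utf-8'))  # write bytes, not str
o.close()

i = open(path, 'rb')
data = i.read()             # data is a bytes object
i.close()
print(len(data))            # 19: the 17 characters took 19 bytes in UTF-8
print(data.decode('utf-8')) # Now is the time ✁
```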

Examples

out = open('/tmp/test', 'w')
s1 = "Now is the time for all good men\n"
s2 = "to come to the aid of their country.\n"
out.writelines([s1, s2])
out.close()

% cat /tmp/test
Now is the time for all good men
to come to the aid of their country.

i = open('/tmp/test', 'r')
s = i.read()	# read entire file at once
print(repr(s))

'Now is the time for all good men\nto come to the aid of their country.\n'

i.close()
i = open('/tmp/test', 'r')
i.readline()	# read a line at a time
'Now is the time for all good men\n'
i.readline()
'to come to the aid of their country.\n'
i.readline()
''	<== empty string means end-of-file
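The remaining operations from the text-file table (readlines(), iterating with for, and seek(0)) can be tried the same way. This sketch assumes a file in a temporary directory from the tempfile module as a stand-in for /tmp/test; the name test.txt is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')   # stand-in for /tmp/test

out = open(path, 'w')
out.writelines(["Now is the time for all good men\n",
                "to come to the aid of their country.\n"])
out.close()

i = open(path, 'r')
lines = i.readlines()     # list of all lines, each still ending in '\n'
i.seek(0)                 # rewind to the beginning of the file
count = 0
for line in i:            # read the file again, one line at a time
    count = count + 1
    print(count, line.strip())
i.close()
```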

Converting a Binary File into Text

Python 3 modules for handling web connections deal strictly in binary data, mainly because there is no way to know a priori whether the returned data is text or binary.
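The io.TextIOWrapper class may be used to convert a binary data stream into a text stream. If no encoding argument is given, io.TextIOWrapper uses the platform's preferred (locale) encoding, which is often but not always UTF-8, so it is safest to name the encoding explicitly, e.g., encoding='utf-8'.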

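import urllib.request, io
# url is a string holding the address of the web page to fetch
binary_stream = urllib.request.urlopen(url)
text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8')   # assumes UTF-8 data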

This will be necessary for the homework.
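Since a live web connection is hard to demonstrate offline, this sketch substitutes an in-memory binary stream (io.BytesIO) for the object urlopen() would return; the wrapping step is the same. The sample data is made up.

```python
import io

# Pretend this is the raw binary data a web server sent back.
binary_stream = io.BytesIO('line one\nline two ✁\n'.encode('utf-8'))

# Wrap it so that reading yields str (text) instead of bytes.
text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8')
lines = []
for line in text_stream:
    lines.append(line.strip())
print(lines)   # ['line one', 'line two ✁']
```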

Keyboard Input

Python 3 provides a built-in function called input() for reading from the keyboard. When input() is called, it collects the characters typed on the keyboard until the user presses the Return or Enter key, then returns those characters (without the trailing newline) as a text string. It's a good idea to prompt the user for input: if a text string is supplied as an argument to input(), that string is printed before any input characters are collected. For example...
```
myString = input('Enter reaction constant: ')
reactK = float(myString)
print(reactK)
```
Produces...
```
Enter reaction constant: 22.59
22.59
```

Keyboard Input Continued

Sometimes the user may not provide the input you expect, and this can lead to a runtime error...
```
Enter reaction constant: unknown
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'unknown'
```
This type of error is easy to handle, but not until we learn about Python exception handling using "try" and "except" statements in week 5.

Python 2 Keyboard Input

In Python 2, the function is called raw_input() instead of input().
What's a little confusing is that Python 2 also has a function named input(), but it does something different and is of little value for our purposes.

Case Study #2: Word Play

Exercise 9.1

All the Python code for this case study is available on the course web site here.
For all of the exercises in this case study, we'll be using a list of words found in the file words.txt. This file contains 113,809 words considered valid in crossword puzzles and other word games.

The first exercise is pretty easy: write a program that reads words.txt and prints only the words with more than 20 characters, not counting white space. Here's the solution:

fin = open('words.txt', 'r')
for line in fin:
    word = line.strip()    # remove any leading or trailing white space characters like space and \n
    if len(word) > 20:
        print(word)

Exercise 9.2

In 1939 Ernest Wright published Gadsby, a 50,000-word novel that does not contain the letter 'e'. Since 'e' is the most common letter in the English language, that's not easy. For this exercise we'll print the words that have no e's and compute what percentage of all the words they make up.

First we'll need a function that checks if the letter 'e' is found in a given word:

def has_no_e(word): 
    if word.find('e') == -1:
        return True
    return False

Now we just need to read all the words and keep track of how many have no e's:

fin.seek(0)    # reset file pointer to beginning of the file
numWords = 0
numNoEs = 0
print("\nThe words containing no e's are:")
for line in fin:
    numWords += 1
    word = line.strip()
    if has_no_e(word):
        numNoEs += 1
        print(word)

print("Percentage of words with no e's = %f" % (100.0 * numNoEs / numWords))

Exercise 9.4

Find words having only the letters "acefhlo". We'll need another function for that:

def uses_only(word, letters):
    for c in word:
        if c not in letters:
            return False
    return True

Now just read the words and check them for the given letters:

fin.seek(0)
print("\nThe words containing only the letters 'acefhlo' are:")
for line in fin:
    word = line.strip()
    if uses_only(word, 'acefhlo'):
        print(word)

Exercise 9.6

Find all the words in which the letters appear in alphabetical order (double letters are ok), and compute what percentage of all the words they make up.

Of course this requires a new function...

def is_abecedarian(word):
    i = 0
    while i < len(word)-1:
        if word[i+1] < word[i]:
            return False
        i += 1
    return True

Now the rest is just like exercise 9.2...

fin.seek(0)    # reset file pointer to beginning of the file
numWords = 0
numAbeced = 0
print("\nThe abecedarian words are:")
for line in fin:
    numWords += 1
    word = line.strip()
    if is_abecedarian(word):
        numAbeced += 1
        print(word)

print("Percentage of abecedarian words = %f" % (100.0 * numAbeced / numWords))

Lessons Learned

Often problems that at first seem difficult are relatively easy to solve just by breaking them down into manageable chunks. This important concept is called problem decomposition. Python functions are especially helpful for this and a good approach is to design each function to do one thing. This helps keep your functions easy to design, write, debug and understand.
Code is often reusable, so you don't always have to write everything from scratch. In this case, exercise 9.6 was very similar to 9.2 with the only real difference being a different function call. So it was easy to "cut & paste" working code from one place to another, saving time.
Python provides a bunch of Built-in Functions and it's always a good idea to familiarize yourself with these, since then you won't have to write the code to implement these functions yourself.
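Taking the reuse idea one step further: since 9.2 and 9.6 differ only in which test function they call, that function could itself be passed in as a parameter. The helper percent_matching() below is a hypothetical name and a sketch, not part of the case study code. It accepts any sequence of lines (an open file works, and so does a list), and has_no_e() is written more compactly with the "in" operator but behaves the same as before.

```python
def has_no_e(word):
    return 'e' not in word

def is_abecedarian(word):
    i = 0
    while i < len(word) - 1:
        if word[i+1] < word[i]:
            return False
        i += 1
    return True

def percent_matching(lines, test):
    """Print each word for which test(word) is True; return the percentage matched."""
    numWords = 0
    numMatches = 0
    for line in lines:
        numWords += 1
        word = line.strip()
        if test(word):
            numMatches += 1
            print(word)
    if numWords == 0:
        return 0.0            # avoid dividing by zero on empty input
    return 100.0 * numMatches / numWords

sample = ['apple\n', 'bcd\n', 'moon\n', 'almost\n']   # made-up stand-in for words.txt
print(percent_matching(sample, has_no_e))        # bcd, moon, almost, then 75.0
print(percent_matching(sample, is_abecedarian))  # bcd, almost, then 50.0
```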

Homework

3.1 - Encapsulate the given code into a function, write code to call it, and test the resulting values
3.2 - Read a Protein Data Bank file and compute the center of atoms by averaging the atomic coordinates