PC204 Lecture 6

Conrad Huang

conrad@cgl.ucsf.edu

Office: GH-N453A

Topics

Homework review
Exceptions
Module syntax
Using modules
Class syntax and using classes

Homework Review

5.1 - use os.walk to count files in a directory tree
5.2 - retrieve data from RCSB web service

Exception Handling

Exceptions are "thrown" (Python's own term is "raised") when Python encounters something it cannot handle
Exceptions can indicate one of several conditions
- Fatal error in code
- Recoverable error in execution, e.g., bad data
- Expected (but usually rare) execution state

Fatal Errors

Some exceptions are generated by faulty code that never works correctly

>>> data = "this is not an integer"
>>> "%d" % data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a real number is required, not str
>>> "%d %d" % (1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

Solution: Fix the code

Recoverable Errors

Some exceptions are generated by code that works only some of the time
- For example, code that reads a data file might throw an exception if the file is missing
- But perhaps we can recover from the error by regenerating the data from the original text
To recover from an exception, we have to "catch" the thrown exception and run some recovery code by using a try statement

try … except … else

The general form for a try statement is:

try:
	statements that may throw an exception
except exception_type1 [as exception_data1]:
	recovery statements for exception_type1 errors
except exception_type2 [as exception_data2]:
	recovery statements for exception_type2 errors
else:
	statements executed if no exception was thrown
finally:
	statements always executed whether an exception was thrown or not

There must be at least one except or finally clause
The else and finally clauses are optional, but an else clause is allowed only if there is at least one except clause
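A small sketch of the general form in action. The function name parse_count is made up for illustration; it records which clauses run so the order is visible.

```python
def parse_count(text):
    """Convert text to an int, recording which clauses of the try statement run."""
    trace = []
    try:
        value = int(text)                       # may raise ValueError
        trace.append("try")                     # skipped if int() raised
    except ValueError as err:                   # runs only if int() failed
        trace.append("except " + type(err).__name__)
        value = None
    else:                                       # runs only if the try clause raised nothing
        trace.append("else")
    finally:                                    # always runs, and runs last
        trace.append("finally")
    return value, trace

print(parse_count("42"))     # (42, ['try', 'else', 'finally'])
print(parse_count("forty"))  # (None, ['except ValueError', 'finally'])
```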

`try` Statements

A try statement is executed as follows:
- The statements in the try clause are executed first
- If they do not generate an exception, the statements in the else clause are executed
- Otherwise, the remaining statements in the try clause are skipped, and the thrown exception is matched against the exception_types listed in the except clauses, in order; the recovery statements of the first matching clause are executed
- If an exception is thrown but no matching except clause is found, the exception is handled "normally" (as if the statements were not in a try statement) and neither the else clause nor any except clause statements are executed

`try` Statements (cont.)

Note that the statements inside the try may be function calls
- The called function may also have try statements
- When an exception is raised, the last function called has the first opportunity to catch it
- If a function does not catch an exception, the "stack is unwound" and its caller gets a chance to catch the exception
- This continues until the main program has a chance to catch the exception
- Finally, the Python interpreter catches and reports any uncaught exceptions
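The stack unwinding described above can be sketched with three nested calls. The function names are hypothetical; only main has a try statement, so a KeyError raised two calls down travels up until main catches it.

```python
def read_value(d, key):
    # No try statement: a KeyError passes straight up to our caller
    return d[key]

def lookup(d, key):
    # No try statement here either, so the stack keeps unwinding
    return read_value(d, key) * 2

def main(d, key):
    # The first function up the call stack with a matching except clause handles it
    try:
        return lookup(d, key)
    except KeyError:
        return "missing " + key

print(main({"a": 1}, "a"))  # 2
print(main({"a": 1}, "b"))  # missing b
```

If main did not catch KeyError either, the interpreter would report the exception with a traceback listing all three calls.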

`try` Statements (cont.)

Statements in the finally clause are always executed
- If no exception is thrown, the finally statements are executed after the else clause (if there is one)
- If an exception is thrown and is caught by an except clause, the finally statements are executed after that except clause
- If an exception is thrown and is not caught, the exception is temporarily caught, the finally statements are executed, and the exception is rethrown
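A sketch of the last case: the exception does not match the except clause, yet the finally clause still runs before the exception reaches the caller. The names risky and log are illustrative only.

```python
log = []

def risky():
    try:
        log.append("start")
        return 1 / 0               # raises ZeroDivisionError
    except KeyError:
        log.append("KeyError")     # does not match, so this is skipped
    finally:
        log.append("cleanup")      # runs before the exception is rethrown

try:
    risky()
except ZeroDivisionError:
    log.append("caught by caller")

print(log)  # ['start', 'cleanup', 'caught by caller']
```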

Recovery Example

Example of recovering from missing data file:

try:
    f = open("bee.tsv")
except IOError:
    f = open("bee.txt")
    ...  # Regenerate data directly from bee.txt
    f.close()  # closes bee.txt file
else:
    ...  # Read cached data from bee.tsv
    f.close()  # closes bee.tsv file

Note that if both bee.tsv and bee.txt are missing, an IOError exception will still be thrown, because the open call in the except clause is not itself inside a try statement (in Python 3, IOError is another name for OSError, which includes FileNotFoundError)

Recovery Example (cont.)

We want to close files in the presence of exceptions
- We cannot put f.close() in a finally clause because we may not have opened the file successfully (f is not defined)

A modern version using context managers (with statement):

try:
    with open("bee.tsv") as f:
        ...  # Read cached data from bee.tsv
except IOError:
    try:
        with open("bee.txt") as f:
            ...  # Regenerate data directly from bee.txt
    except IOError:
        ...  # Complain

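We now catch IOError when we try to open either file, and each with statement closes its file automatically, even if an exception occurs inside the block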
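To see that the with statement really closes the file when an exception escapes the block, here is a self-contained sketch. It writes a throwaway file (demo.txt, a made-up name) instead of relying on bee.tsv, and raises a ValueError on purpose to stand in for bad data.

```python
import os
import tempfile

# Write a small throwaway file so the example does not depend on bee.tsv
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("first line\n")

try:
    with open(path) as f:
        first = f.readline()
        raise ValueError("pretend the data was bad")
except ValueError:
    print("recovered from bad data")

# The with statement closed the file even though an exception left the block
print(f.closed)  # True
```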

Expected Exceptions

try statements can be used in anticipation of rare or unusual conditions
Suppose d is a histogram (a dictionary mapping keys to counts) and we want to update the count for a key

The action is different for the first time a key is seen than for all subsequent times:

# Approach A: LBYL - Look Before You Leap
if my_key in d:       # Check first
    d[my_key] += 1    # Guaranteed safe
else:
    d[my_key] = 1     # First time, rare case

# Approach B: EAFP - Easier to Ask for Forgiveness than Permission
try:
    d[my_key] += 1    # Just do it
except KeyError:
    d[my_key] = 1     # First time, rare case
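Both approaches build the same histogram. A runnable sketch wrapping each one in a function (the function names and sample sentence are illustrative):

```python
def histogram_lbyl(words):
    d = {}
    for w in words:
        if w in d:          # look before you leap
            d[w] += 1
        else:
            d[w] = 1        # first time this word is seen
    return d

def histogram_eafp(words):
    d = {}
    for w in words:
        try:
            d[w] += 1       # assume the key is already there
        except KeyError:
            d[w] = 1        # first time this word is seen
    return d

words = "the cat saw the other cat".split()
print(histogram_lbyl(words))  # {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
print(histogram_lbyl(words) == histogram_eafp(words))  # True
```

For this particular job Python also offers d.get(w, 0) + 1 and collections.Counter, but the try/except form generalizes to cases with no such shortcut.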

LBYL vs. EAFP

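EAFP is endorsed by many Python experts because it tends to be more efficient when the exceptional case is rare, and the code is generally easier to read
- There are fewer tests being performed: LBYL checks for the key on every update, while EAFP pays the (relatively high) cost of raising and catching an exception only the first time a key is seen
- The unusual conditions are distinctly and explicitly labeled
  - A reader treats an except clause as the unusual case, whereas the two branches of an if/else look equally likely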

Pitfalls of `try` Statements

It is possible to use a bare except: clause (without specifying exception types) in a try statement
- The bare except clause matches any exception type
- It is tempting to use this because it enables our programs to continue executing in the presence of all types of errors
- Unless we plan to handle all types of exceptions, this is a bad idea because it tends to intercept errors from any "higher level" try statements that may properly recover from specific types of errors
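A sketch of how a bare except hides a bug. Both functions contain the same deliberate typo (valuse); the bare except turns the resulting NameError into a silently wrong answer, while the specific except lets it surface. Note that a bare except also catches KeyboardInterrupt and SystemExit.

```python
def average_bare(values):
    try:
        return sum(values) / len(valuse)   # deliberate typo: raises NameError
    except:                                # bare except: hides the typo
        return 0.0

def average_specific(values):
    try:
        return sum(values) / len(valuse)   # same deliberate typo
    except ZeroDivisionError:              # only the error we planned for (empty list)
        return 0.0

print(average_bare([1, 2, 3]))   # 0.0, a silently wrong answer

exposed = None
try:
    average_specific([1, 2, 3])
except NameError as err:         # the typo surfaces instead of being swallowed
    exposed = err
print("bug exposed:", exposed)
```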

Writing Modules

Although Think Python only spends one very brief section on "Writing Modules", there is actually quite a bit more to say
Syntax for using multiple modules in a single program is very straightforward
The reasons for using modules, and how code should be organized, are more complex
- Avoid code duplication in multiple programs
- Help organize related functions and data

Module Syntax

Python treats any file with the .py suffix as a module, with the caveat that the part of the file name preceding .py must form a legal Python identifier
- So wc.py is a legal Python module name but 0wc.py is not
- But both are legal script names because Python does not use script names in code

For example, wc.py

def linecount(filename):
    count = 0
    for line in open(filename):
        count += 1
    return count

print(linecount("wc.py"))

Module Syntax (cont.)

To use the wc module, we need to import it

>>> import wc
7
>>> print(wc)
<module 'wc' from 'wc.py'>
>>> import wc
>>> wc.linecount("wc.py")
7
>>> wc.linecount("bee.tsv")
75
>>> import 0wc
  File "<stdin>", line 1
    import 0wc
           ^
SyntaxError: invalid syntax

Importing a Module

Executing import wc the first time:
- Creates a new module object
- Executes the code in wc.py within the context of the new module
- In the importing module, creates a variable named wc, which references the module object, for accessing the contents of the module
Executing import wc again does only the very last step, i.e., the code in wc.py is not executed more than once
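A self-contained sketch of the "executed only once" rule. It writes a hypothetical module, once_demo.py, into a throwaway folder, changes a module variable after the first import, and shows that the second import does not re-run the file (which would reset the variable).

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical module, once_demo.py, into a throwaway folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "once_demo.py"), "w") as f:
    f.write("print('executing once_demo.py')\n")
    f.write("value = 1\n")

sys.path.insert(0, folder)       # make the folder searchable by import
importlib.invalidate_caches()    # forget any cached directory listings

import once_demo                 # first import: runs the file (prints the message)
once_demo.value = 99             # change a variable inside the module object
import once_demo                 # second import: the file is NOT run again

print(once_demo.value)                        # 99, so "value = 1" was not re-executed
print(sys.modules["once_demo"] is once_demo)  # True: the module object is cached
```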

Module Context

Python has the concept of contexts or namespaces for modules
- Each module keeps track of its own set of variable names, so the same variable name in different modules refers to different variables
- The def linecount(…) statement in wc.py creates a function named linecount in the wc module
To access a function or variable in another module, we need to specify both the module and function/variable name, e.g., wc.linecount
Another example: each module has a variable named __name__ which contains the name of the module
- For the main program it has value "__main__"
- For our wc module, wc.__name__ has value "wc"
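One common use of __name__ is to keep test code in wc.py from running on import (recall that import wc printed 7). A sketch of a revised wc.py, assuming we want the main-program behavior to be "count the lines of each file named on the command line"; that behavior is an illustrative choice, not the original script's.

```python
import sys

def linecount(filename):
    count = 0
    with open(filename) as f:      # close the file when done
        for line in f:
            count += 1
    return count

if __name__ == "__main__":
    # True only when wc.py is the main program; False after "import wc",
    # where __name__ is "wc"
    for name in sys.argv[1:]:
        print(name, linecount(name))
```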

Importing a Module (cont.)

Remember that import wc creates two things:
- A module object that contains the module variables and functions. The module name is derived from the file name and is stored as the __name__ attribute of the module.
- A variable in the importing module/script. The variable name is the same as the module name. This is why module names must be legal Python identifiers.
We can only refer to modules through variables
We can do things like: zzz = wc and refer to zzz.linecount, but the module name is still wc (as witnessed by zzz.__name__)

We now have two variables, wc (from the import) and zzz (from the assignment) that refer to the same module, wc

Importing a Module (cont.)

There are other forms of the import statement
- import module as myname
  - This does the same thing as import module except the variable created in the importing module is named myname instead of module
- from module import name
  - This creates a variable name in the importing module that refers to the same object as module.name at the time when the import statement is executed
  - This is mainly used to avoid having the imported module name appear many times in the code (either to reduce typing or to improve code readability)
  - You should only use this form with constants and functions, i.e., items that do not change value over time
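A sketch of both forms, using a hypothetical module settings_demo.py written to a throwaway folder. It shows that import ... as only renames the variable, and that from ... import copies the binding at import time, which is why the form is best reserved for values that do not change.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module settings_demo.py with one module-level variable
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "settings_demo.py"), "w") as f:
    f.write("level = 1\n")
sys.path.insert(0, folder)
importlib.invalidate_caches()

import settings_demo as sd          # our variable is sd; the module name is unchanged
from settings_demo import level     # binds our "level" to the current value

sd.level = 2                        # rebind the variable inside the module

print(sd.__name__)   # settings_demo
print(sd.level)      # 2
print(level)         # 1: our name still refers to the old value
```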

Importing a Module (cont.)

from module import *
- For every variable or function (whose name does not begin with _) in the imported module, a corresponding variable of the same name is created in the importing module
- This was done frequently in the early days of Python to minimize typing
- It is generally accepted that this is a bad practice to be avoided when possible because it destroys the name-clash protection of multiple namespaces, and makes it difficult to track down where variables come from
- (This is an example of what not to do)

Referencing Variables

When a function executes, it looks for variables using the LSGB rule (more commonly called LEGB, with E for Enclosing)
- L(ocal) variables defined in the function
- S(cope) variables defined in enclosing functions
- G(lobal) variables defined in the module
- B(uilt-in) variables defined by Python
- scope.py shows some examples
The global variables refer to variables in the module where the function is defined, not the module where the function is called
Use of scope variables is infrequent because most code does not use nested functions (functions defined within functions)
- There is a design pattern for using nested function definition for "callback functions," but that is advanced Python
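A small sketch in the spirit of scope.py (its actual contents are not shown here), with one name from each level of the LSGB rule:

```python
x = "global"                # G: defined at module level

def outer():
    x = "enclosing"         # S: local to outer, a scope variable for inner
    def inner():
        y = "local"         # L: local to inner
        # x is not local to inner, so Python finds outer's x before the global one;
        # len is found last, among the built-ins (B)
        return y, x, len(y)
    return inner()

def no_enclosing():
    return x                # no local x and no enclosing function: the global x

print(outer())         # ('local', 'enclosing', 5)
print(no_enclosing())  # global
```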

Referencing Variables (cont.)

LSGB rule only applies to variables without explicit module reference
Variables of form module.name always refer to global variables from module

# Using gmod.print_var
>>> var = 20
>>> import gmod
10
>>> gmod.print_var()
10
>>> from gmod import print_var
>>> print_var()
10

# Contents of gmod.py
var = 10

def print_var():
    print(var)

print_var()

Module Location

Where does Python look for module source files?
- Python is shipped with many modules ("batteries included") and they are all part of the Python installation
- Modules that go with your main program should be in the same folder as the main program itself
- If you have modules that are shared among multiple programs, you can either install them in the Python installation location, or set up your own module folder and modify sys.path or PYTHONPATH (see notes from previous week)
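A minimal sketch of the sys.path approach. The folder ~/pc204/modules is a hypothetical location; substitute your own.

```python
import os
import sys

# Hypothetical folder holding modules shared by several programs
shared = os.path.expanduser("~/pc204/modules")

if shared not in sys.path:
    sys.path.append(shared)   # import searches this folder after the default locations

# From here on, "import some_module" would also look in the shared folder
print(shared in sys.path)     # True
```

The PYTHONPATH alternative sets the same search list from outside Python, e.g. export PYTHONPATH=$HOME/pc204/modules in a bash shell before starting the program, so no code change is needed.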

Using Modules

Why use modules?
Modules are an organizational tool

Put related functions together into the same file
Avoid having multiple copies of the same code

Functional decomposition (a.k.a, divide and conquer)
- Put all code related to one task into a single file
  - markov2_prep.py for preprocessing text into histograms
  - markov2_use.py for using histograms to generate text
- Main drawback is code duplication if the same subtask needs to be done in multiple modules, e.g. shift
- What if other programs also read the data files? Do we replicate read_grams in all of them?

Modular Programming

How do we avoid duplicating code?
- Put common code into files shared by multiple programs
Modular programming
- Put all code related to a subtask into a single file
  - markov3_io.py for reading and writing histogram data files (along with shift so that it is not duplicated)
  - markov3_prep.py for preprocessing text into histograms, using markov3_io.py functions to write histograms
  - markov3_use.py for using histograms to generate text, using markov3_io.py functions to read histograms
- (How do you choose the extent of a subtask?)

Modular Programming (cont.)

On the plus side:
- There is only one copy of the shift function
- We no longer need to change either markov3_prep.py or markov3_use.py if we decide to use a different storage format; we just change markov3_io.py
On the minus side:
- read_grams returns two dictionaries with specific key and value types, e.g., dictionary values are redundant lists of words (a suffix appears once per occurrence)
- We would have to change all the files if we decide to use a different data structure for the prefix-suffix mapping, e.g., a histogram instead of a redundant list of words
Can we apply the shared module concept further to minimize work when changing code?

Object-Oriented Programming

Move from syntactic (focusing on programming language statements) to semantic (focusing on the intent of the statements)
In markov3_use.py:
```
next_word = random.choice(m2[prefix])
```
How do we interpret this statement?
- Syntactically: choose a random value from the list of values that appear for key prefix in dictionary m2
- Semantically: choose a random value from the list of words that follow the two-word prefix using bigram-suffix mapping m2

Object-Oriented Programming (cont.)

Assuming we define:

def random_suffix(m, prefix):
    return random.choice(m[prefix])

We can use the statement:
```
next_word = random_suffix(m2, prefix)
```
instead of:
```
next_word = random.choice(m2[prefix])
```

Why bother?
- The reader gets a clearer idea of what is happening ("Oh, we’re retrieving a random word following prefix.")
- We can change how random_suffix is implemented (e.g., bias the word choice by the length of the word) without changing any other code in the program
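A sketch of the length-biased variant mentioned above, assuming (as in markov3) that m maps a prefix to a redundant list of words, and assuming tuple prefixes for the sample data. Only the body of random_suffix changes; the caller's line stays exactly the same.

```python
import random

def random_suffix(m, prefix):
    """Return a random word that follows prefix in mapping m.

    Hypothetical variant: longer words are proportionally more likely.
    """
    words = m[prefix]
    weights = [len(w) for w in words]
    return random.choices(words, weights=weights)[0]

# The calling code is unchanged
m2 = {("the", "cat"): ["sat", "ate", "slept"]}
prefix = ("the", "cat")
next_word = random_suffix(m2, prefix)
print(next_word in m2[prefix])  # True
```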

Object-Oriented Programming (cont.)

Select a concept that can be represented as a collection of data structures
Group it together with the operations (functions) associated with the concept
Put the data structures and operations together and call the combination a "class" for the concept

Object-Oriented Programming (cont.)

Our markov3_*.py example has three files
- markov3_prep.py reads a text file and generates two mappings: unigram-to-suffix and bigram-to-suffix
- markov3_use.py uses the precomputed mappings to generate a partial sentence
- markov3_io.py reads and writes the mappings
What is a concept (and therefore candidate class) that spans the three files?

Object-Oriented Programming (cont.)

Concept: prefix-suffix mapping
- We could have chosen to use two concepts: unigram-suffix mapping and bigram-suffix mapping
We extract all data structures and operations on prefix-suffix mapping and put them into markov4_gram.py
markov4_prep.py and markov4_use.py are the same as their markov3 counterparts, but rewritten to use functions from markov4_gram.py (instead of accessing dictionaries directly)

Interfaces and Implementations

Once the prep and use programs no longer directly access the mapping data, we are free to change how we represent the mapping data
This is the separation of interface from implementation (aka data abstraction or data encapsulation)

Interface (aka API or application programming interface) is what callers of a module use, e.g., functions and variables
Implementation is all the code within the module that makes using the interface work, e.g., module variables and function definitions
As long as the module interface remains the same, the implementation may be changed at will

Interfaces and Implementations (cont.)

Another way to look at it:
- An API or interface is the border between the caller and the callee
- It defines what can be done semantically with a concept
- An implementation is the underlying code that makes the semantic operations possible
- A caller should only care about the semantics and never about the underlying code
- A callee should only care about what changes to make in the class/module and never about how the return values will be used
- The underlying code may be changed as long as it re-implements the same or a superset of the API
  - Adding new functionality is fine
  - Removing or changing functionality is not

Interfaces and Implementations (cont.)

In our example, markov4_gram.py uses a redundant word list to represent possible suffixes for a given prefix
We can change the implementation to use a word histogram and save a lot of memory, since each distinct suffix word is stored once along with its count
In the new set of programs, notice that only markov5_gram.py differs from markov4_gram.py; markov5_prep.py and markov5_use.py are essentially identical to their markov4 counterparts
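A sketch of what a histogram-based gram module might look like. The name add_suffix and the tuple prefixes are assumptions; random_suffix keeps the signature used earlier. Where a redundant list would store ["sat", "sat", "sat", "ran"], the histogram stores {"sat": 3, "ran": 1}, and weighting by count gives the same odds as random.choice over the list.

```python
import random

def add_suffix(m, prefix, word):
    """Record one occurrence of word following prefix (hypothetical name)."""
    counts = m.setdefault(prefix, {})       # prefix -> {word: count}
    counts[word] = counts.get(word, 0) + 1

def random_suffix(m, prefix):
    """Pick a suffix with probability proportional to its count."""
    counts = m[prefix]
    words = list(counts)
    return random.choices(words, weights=[counts[w] for w in words])[0]

m2 = {}
for w in ["sat", "sat", "sat", "ran"]:
    add_suffix(m2, ("the", "cat"), w)
print(m2)  # {('the', 'cat'): {'sat': 3, 'ran': 1}}
print(random_suffix(m2, ("the", "cat")))
```

Callers that only use add_suffix and random_suffix cannot tell which representation is underneath, which is the point of separating interface from implementation.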

Class Syntax and Using Classes

Note that in our example, we used only functions and modules to do object-oriented programming (OOP)
Python (and many other languages such as C++ and Java) supports OOP by providing some extra constructs that aid bookkeeping
For example, each of our mappings is implemented using a single dictionary; there is no code to guarantee that we do not mistakenly use a unigram as the prefix for the bigram mapping
We can implement each mapping as a 2-tuple, with element 0 being the prefix length and element 1 being the dictionary, but this makes the code harder to read

Class Syntax

Python provides a class syntax that allows us to group data together and access them by name

class ClassName:
	"""Documentation string"""

instance1 = ClassName()
instance1.first_attribute = first_value
print(instance1.first_attribute)
instance2 = ClassName()
instance2.second_attribute = second_value
print(instance2.second_attribute)

Class Syntax (cont.)

We can switch from dictionary to class syntax very easily
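A sketch of the switch for one prefix-suffix mapping. The class name PrefixMap and its attribute names are hypothetical. The prefix length travels with the data under a readable name (instead of element 0 of a 2-tuple), so a helper can reject a unigram prefix given to the bigram mapping.

```python
class PrefixMap:
    """Prefix-suffix mapping (hypothetical class for illustration)"""

def add_suffix(m, prefix, word):
    """Record that word follows prefix, refusing prefixes of the wrong length."""
    if len(prefix) != m.prefix_len:
        raise ValueError("expected a %d-word prefix" % m.prefix_len)
    m.suffixes.setdefault(prefix, []).append(word)

# Bigram mapping: attributes replace the elements of a 2-tuple
m2 = PrefixMap()
m2.prefix_len = 2          # instead of element 0
m2.suffixes = {}           # instead of element 1

add_suffix(m2, ("the", "cat"), "sat")
print(m2.suffixes)         # {('the', 'cat'): ['sat']}

rejected = None
try:
    add_suffix(m2, ("cat",), "sat")    # a unigram prefix is a mistake here
except ValueError as err:
    rejected = str(err)
print("rejected:", rejected)           # rejected: expected a 2-word prefix
```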

Class Syntax (cont.)

Classes offer much more than just bookkeeping
Next two weeks, more on classes and OOP
- attributes and methods
- initialization (constructor) and termination (destructor)
- inheritance and polymorphism

Steps in Software Design

1. Figure out what problem you are solving
2. Analyze the problem to identify concepts (divide and conquer)
3. Figure out what data and functions are needed

Steps in Programming

1. Write the simplest code that solves the problem
2. Write test code and debug
  - Go to 1
3. Measure performance
4. Optimize
  - Speed up hotspots
  - Change algorithms
  - Go to 3

Homework

6.1 - rectangles
- Copy some code that uses classes
- Write some code that implements additional operations
6.2 - more rectangles
- Write some code that calls the rectangle code
What would you change to make the code more object-oriented?
