PC204 Lecture 6

Conrad Huang

conrad@cgl.ucsf.edu

Office: GH-N453A

Topics

  • Homework review
  • Exceptions
  • Module syntax
  • Using modules
  • Class syntax and using classes

Homework Review

  • 5.1 - use os.walk to count files in a directory tree
  • 5.2 - retrieve data from RCSB web service

Exception Handling

  • Exceptions are "thrown" when Python encounters something it cannot handle
  • Exceptions can indicate one of several conditions
    • Fatal error in code
    • Recoverable error in execution, e.g., bad data
    • Expected (but usually rare) execution state

Fatal Errors

  • Some exceptions are generated by faulty code that never works correctly
      >>> data = "this is not an integer"
      >>> "%d" % data
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      TypeError: int argument required
      >>> "%d %d" % (1, 2, 3)
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      TypeError: not all arguments converted during string formatting
  • Solution: Fix the code

Recoverable Errors

  • Some exceptions are generated by code that works only some of the time
    • For example, this code might throw an exception if an expected data file is missing
    • >>> f = open('bee.tsv')
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      IOError: [Errno 2] No such file or directory: 'bee.tsv'
    • But perhaps we can recover from the error by regenerating the data from the original text
  • To recover from an exception, we have to "catch" the thrown exception and run some recovery code by using a try statement

try … except … else

  • The general form for a try statement is:
    try:
    	statements that may throw an exception
    except exception_type1 [as exception_data1]:
    	recovery statements for exception_type1 errors
    except exception_type2 [as exception_data2]:
    	recovery statements for exception_type2 errors
    else:
    	statements executed if no exceptions thrown
    finally:
    	statements always executed whether exception thrown or not
  • There must be at least one except or finally clause
  • The else and finally clauses are optional
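  • A minimal sketch of the general form, assuming a hypothetical helper parse_count (not part of the lecture code) that converts text to an integer and records which clauses ran:

```python
def parse_count(text):
    """Convert text to an int, recording which clauses of the try ran."""
    trace = []
    value = None
    try:
        value = int(text)            # may throw ValueError
    except ValueError as exc:        # exc holds the exception data
        trace.append("except: " + type(exc).__name__)
    else:
        trace.append("else")         # only when nothing was thrown
    finally:
        trace.append("finally")      # always executed
    return value, trace


print(parse_count("42"))    # (42, ['else', 'finally'])
print(parse_count("bee"))   # (None, ['except: ValueError', 'finally'])
```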

try Statements

  • A try statement is executed as follows:
    • Statements in the try clause are first executed
    • If they do not generate an exception, statements in the else clause are executed
    • Otherwise, the thrown exception is matched against the exception_types listed in the except clauses and the corresponding recovery statements are executed
    • If an exception is thrown but no matching except clause is found, the exception is handled "normally" (as if the statements were not in a try statement) and neither else nor any except clause statements are executed

try Statements (cont.)

  • Note that the statements inside the try may be function calls
    • The called function may also have try statements
    • When an exception is raised, the last function called has the first opportunity to catch it
    • If a function does not catch an exception, the "stack is unwound" and its caller gets a chance to catch the exception
    • This continues until the main program has a chance to catch the exception
    • Finally, the Python interpreter catches and reports any uncaught exceptions
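  • The unwinding described above can be sketched as follows; the function names read_value, middle, and main are made up for illustration:

```python
def read_value(data, key):
    # No try statement here, so a KeyError propagates to the caller.
    return data[key]


def middle(data, key):
    # Also no try statement; the stack keeps unwinding through this frame.
    return read_value(data, key) * 2


def main(data, key):
    try:
        return middle(data, key)
    except KeyError:
        # The exception thrown two calls down is caught here.
        return "missing"


print(main({"a": 1}, "a"))   # 2
print(main({"a": 1}, "b"))   # missing
```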

try Statements (cont.)

  • Statements in the finally clause are always executed
    • If no exception is thrown the finally is executed after the else clause (if there is one)
    • If an exception is thrown and is caught by an except clause, the statements are executed after the except clause
    • If an exception is thrown and is not caught, the exception is temporarily caught, the finally statements executed, and the exception rethrown
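  • A small sketch of the uncaught case, assuming a hypothetical function risky whose except clause does not match the exception thrown; the events list records the order of execution:

```python
events = []


def risky():
    try:
        events.append("try")
        raise ValueError("bad data")
    except KeyError:
        events.append("except KeyError")   # does not match ValueError
    else:
        events.append("else")              # skipped: an exception was thrown
    finally:
        events.append("finally")           # runs before the exception propagates


try:
    risky()
except ValueError:
    events.append("caught by caller")

print(events)   # ['try', 'finally', 'caught by caller']
```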

Recovery Example

  • Example of recovering from missing data file:
    try:
        f = open("bee.tsv")
    except IOError:
        f = open("bee.txt")
        ... # Regenerate data directly from bee.txt
        f.close() # closes bee.txt file
    else:
        ... # Read cached data from bee.tsv
        f.close() # closes bee.tsv file
  • Note that if both bee.tsv and bee.txt are missing, an IOError exception will still be thrown because the open call in the recovery statements is not in a try statement

Recovery Example (cont.)

  • We want to close files in the presence of exceptions
    • We cannot put f.close() in a finally clause because we may not have opened the file successfully (f is not defined)
  • A modern version using context managers (with statement):
    try:
        with open("bee.tsv") as f:
            ... # Read cached data from bee.tsv
    except IOError:
        try:
            with open("bee.txt") as f:
                ... # Regenerate data directly from bee.txt
        except IOError:
            ... # Complain
  • We now catch IOError when we try to open either file
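  • A runnable sketch of the same pattern, assuming a hypothetical helper load_lines that simply returns the file's lines in place of the real read and regenerate steps; the files live in a temporary folder so the example does not touch real data:

```python
import os
import tempfile


def load_lines(cache_path, source_path):
    """Return (origin, lines), preferring the cache file over the source."""
    try:
        with open(cache_path) as f:
            return "cache", f.read().splitlines()
    except IOError:
        try:
            with open(source_path) as f:
                return "source", f.read().splitlines()
        except IOError:
            return "none", []            # both files are missing


with tempfile.TemporaryDirectory() as folder:
    tsv = os.path.join(folder, "bee.tsv")
    txt = os.path.join(folder, "bee.txt")
    first = load_lines(tsv, txt)         # neither file exists yet
    with open(txt, "w") as f:
        f.write("example line\n")
    second = load_lines(tsv, txt)        # only bee.txt exists

print(first)    # ('none', [])
print(second)   # ('source', ['example line'])
```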

Expected Exceptions

  • try statements can be used in anticipation of rare or unusual conditions
  • Suppose d is a histogram and we want to update the count for some keys
  • The action is different for the first time a key is seen than for all subsequent times:
    # Approach A: LBYL - Look Before You Leap
    if my_key in d:       # Check first
        d[my_key] += 1    # Guaranteed safe
    else:
        d[my_key] = 1     # First time, rare case
    
    # Approach B: EAFP - Easier to Ask for Forgiveness than Permission
    try:
        d[my_key] += 1    # Just do it
    except KeyError:
        d[my_key] = 1     # First time, rare case
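  • A runnable sketch of the two approaches applied to a whole word list; the function names histogram_lbyl and histogram_eafp and the sample sentence are illustrative assumptions:

```python
def histogram_lbyl(words):
    d = {}
    for w in words:
        if w in d:           # check first
            d[w] += 1
        else:
            d[w] = 1         # first time this key is seen
    return d


def histogram_eafp(words):
    d = {}
    for w in words:
        try:
            d[w] += 1        # just do it
        except KeyError:
            d[w] = 1         # first time this key is seen
    return d


words = "to be or not to be".split()
print(histogram_lbyl(words))                            # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(histogram_eafp(words) == histogram_lbyl(words))   # True
```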
    

LBYL vs. EAFP

  • EAFP is endorsed by many Python experts because, when the unusual case is rare, it tends to be more efficient and the code is generally easier to read
    • There are fewer tests being performed
    • The unusual conditions are distinctly and explicitly labeled
      • except clauses are generally considered more unusual than else clauses

Pitfalls of try Statements

  • It is possible to use a bare except: clause (without specifying exception types) in a try statement
    • The bare except clause matches any exception type
    • It is tempting to use this because it enables our programs to continue executing in the presence of all types of errors
    • Unless we plan to handle all types of exceptions, this is a bad idea because the bare except tends to intercept errors that a "higher level" try statement could have recovered from properly
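  • One way to see the danger, sketched with two hypothetical lookup functions that share the same typo (tabel instead of table): the bare except silently swallows the resulting NameError, while the specific except lets it surface:

```python
def lookup_bare(table, key):
    try:
        return tabel[key]        # typo: 'tabel' raises NameError
    except:                      # bare except hides the bug
        return None


def lookup_specific(table, key):
    try:
        return tabel[key]        # same typo
    except KeyError:             # only the anticipated error is caught
        return None


print(lookup_bare({"a": 1}, "a"))    # None, the typo goes unnoticed
try:
    lookup_specific({"a": 1}, "a")
except NameError as exc:
    message = str(exc)
    print("bug exposed:", message)
```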

Writing Modules

  • Although Think Python only spends one very brief section on "Writing Modules", there is actually quite a bit more to say
  • Syntax for using multiple modules in a single program is very straightforward
  • Reasons for using modules and how code should be organized are more complex
    • Avoid code duplication in multiple programs
    • Help organize related functions and data

Module Syntax

  • Python treats any file with the .py suffix as a module, with the caveat that the part of the file name preceding .py must form a legal Python identifier
    • So wc.py is a legal Python module name but 0wc.py is not
    • But both are legal script names because Python does not use script names in code
  • For example, wc.py
    def linecount(filename):
        count = 0
        for line in open(filename):
            count += 1
        return count
    
    print(linecount("wc.py"))

Module Syntax (cont.)

  • To use the wc module, we need to import it
    >>> import wc
    7
    >>> print(wc)
    <module 'wc' from 'wc.py'>
    >>> import wc
    >>> wc.linecount("wc.py")
    7
    >>> wc.linecount("bee.tsv")
    75
    >>> import 0wc
      File "<stdin>", line 1
        import 0wc
               ^
    SyntaxError: invalid syntax
    

Importing a Module

  • Executing import wc the first time:
    • Creates a new module object
    • Executes the code in wc.py within the context of the new module
    • In the importing module, creates a variable named wc, which references the module object, for accessing the contents of the module
  • Executing import wc again does only the very last step, i.e., the code in wc.py is not executed more than once
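  • A self-contained sketch of this behavior, assuming a throwaway module named once_demo (a made-up name) written to a temporary folder; its body prints a line, and the captured output shows the body ran only once:

```python
import contextlib
import io
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny module whose body prints when it is executed.
    with open(os.path.join(folder, "once_demo.py"), "w") as f:
        f.write('print("running once_demo body")\n')
    sys.path.insert(0, folder)           # let Python find the module
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            import once_demo             # first import: body is executed
            import once_demo as again    # second import: only binds a name
    finally:
        sys.path.remove(folder)

output = captured.getvalue()
print(output.count("running once_demo body"))   # 1
print(once_demo is again)                       # True
```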

Module Context

  • Python has the concept of contexts or namespaces for modules
    • Each module keeps track of its own set of variable names, so the same variable name in different modules refers to different variables
    • The def linecount(…) statement in wc.py creates a function named linecount in the wc module
  • To access a function or variable in another module, we need to specify both the module and function/variable name, e.g., wc.linecount
  • Another example: each module has a variable named __name__ which contains the name of the module
    • For the main program it has value "__main__"
    • For our wc module, wc.__name__ has value "wc"

Importing a Module (cont.)

  • Remember that import wc creates two things:
    • A module object that contains the module variables and functions. The module name is derived from the file name and is stored as the __name__ attribute of the module.
    • A variable in the importing module/script. The variable name is the same as the module name. This is why module names must be legal Python identifiers.
  • We can only refer to modules through variables
  • We can do things like: zzz = wc and refer to zzz.linecount, but the module name is still wc (as witnessed by zzz.__name__)
    • We now have two variables, wc (from the import) and zzz (from the assignment) that refer to the same module, wc
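  • The same idea can be tried without wc.py; in this sketch the standard json module stands in for wc, which is an assumption made only to keep the example self-contained:

```python
import json          # any module works; json stands in for wc here

zzz = json           # a second variable referring to the same module object
print(zzz is json)           # True
print(zzz.__name__)          # json
print(zzz.dumps([1, 2]))     # [1, 2]
```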

Importing a Module (cont.)

  • There are other forms of the import statement
    • import module as myname
      • This does the same thing as import module except the variable created in the importing module is named myname instead of module
    • from module import name
      • This creates a variable name in the importing module that refers to the same object as module.name at the time when the import statement is executed
      • This is mainly used to avoid having the imported module name appear many times in the code (either to reduce typing or to improve code readability)
      • You should only use this form with constants and functions, i.e., items that do not change value over time
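  • A sketch of why this form suits only unchanging items, assuming a hypothetical module settings_demo that is built in memory (rather than read from a file) so the example is self-contained:

```python
import sys
import types

# Build a hypothetical module in memory and register it for import.
settings = types.ModuleType("settings_demo")
settings.limit = 10
sys.modules["settings_demo"] = settings

from settings_demo import limit    # limit now refers to the object 10
import settings_demo

settings_demo.limit = 99           # rebind the name inside the module
print(limit)                       # 10, the imported name is unchanged
print(settings_demo.limit)         # 99
```

  • The name limit was bound when the import statement ran, so later changes to settings_demo.limit are not seen through it.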

Importing a Module (cont.)

  • from module import *
    • For every variable or function (whose name does not begin with _) in the imported module, a corresponding variable of the same name is created in the importing module
    • This was done frequently in the early days of Python to minimize typing
    • It is generally accepted that this is a bad practice to be avoided when possible because it destroys the name-clash protection of multiple namespaces, and makes it difficult to track down where variables come from
    • (This is an example of what not to do)

Referencing Variables

  • When a function executes, it looks for variables using the LSGB rule
    • L(ocal) variables defined in the function
    • S(cope) variables defined in enclosing functions
    • G(lobal) variables defined in the module
    • B(uilt-in) variables defined by Python
    • scope.py shows some examples
  • The global variables refer to variables in the module where the function is defined, not the module where the function is called
  • Use of scope variables is infrequent because most code does not use nested functions (functions defined within functions)
    • There is a design pattern for using nested function definition for "callback functions," but that is advanced Python
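  • A short sketch of the lookup order; this is not the scope.py file mentioned above, and all function names are made up for illustration:

```python
name = "global"              # G: module-level variable


def outer():
    name = "scope"           # S: variable in an enclosing function

    def inner():
        name = "local"       # L: variable defined in this function
        return name

    def inner_no_local():
        return name          # no local, so the enclosing value is found

    return inner(), inner_no_local()


def no_enclosing():
    return name              # no local or enclosing value: global is used


def uses_builtin():
    return len("abc")        # len is found among the built-ins


print(outer())          # ('local', 'scope')
print(no_enclosing())   # global
print(uses_builtin())   # 3
```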

Referencing Variables (cont.)

  • LSGB rule only applies to variables without explicit module reference
  • Variables of form module.name always refer to global variables from module
  • # Using gmod.print_var
    >>> var = 20
    >>> import gmod
    10
    >>> gmod.print_var()
    10
    >>> from gmod import print_var
    >>> print_var()
    10
    
    # Contents of gmod.py
    var = 10
    
    def print_var():
        print(var)
    
    print_var()
    

Module Location

  • Where does Python look for module source files?
    • Python is shipped with many modules ("batteries included") and they are all part of the Python installation
    • Modules that go with your main program should be in the same folder as the main program itself
    • If you have modules that are shared among multiple programs, you can either install them in the Python installation location, or set up your own module folder and modify sys.path or PYTHONPATH (see notes from previous week)

Using Modules

  • Why use modules?
  • Modules are an organizational tool
    • Put related functions together into the same file
    • Avoid having multiple copies of the same code
  • Functional decomposition (a.k.a. divide and conquer)
    • Put all code related to one task into a single file
    • Main drawback is code duplication if the same subtask needs to be done in multiple modules, e.g. shift
    • What if other programs also read the data files? Do we replicate read_grams in all of them?

Modular Programming

  • How do we avoid duplicating code?
    • Put common code into files shared by multiple programs
  • Modular programming
    • Put all code related to a subtask into a single file
      • markov3_io.py for reading and writing histogram data files (along with shift so that it is not duplicated)
      • markov3_prep.py for preprocessing text into histograms, using markov3_io.py functions to write histograms
      • markov3_use.py for using histograms to generate text, using markov3_io.py functions to read histograms
    • (How do you choose the extent of a subtask?)

Modular Programming (cont.)

  • On the plus side:
    • There is only one copy of the shift function
    • We no longer need to change either markov3_prep.py or markov3_use.py if we decide to use a different storage format; we just change markov3_io.py
  • read_grams returns two dictionaries with specific key and value types, e.g., dictionary values are redundant lists of words
  • We would have to change all the files if we decide to use a different data structure for the prefix-suffix mapping, e.g., use a histogram instead of a redundant list of words
  • Can we apply the shared module concept further to minimize work when changing code?

Object-Oriented Programming

  • Move from syntactic (focusing on programming language statements) to semantic (focusing on the intent of the statements)
  • In markov3_use.py:
    next_word = random.choice(m2[prefix])
  • How do we interpret this statement?
    • Syntactically: choose a random value from the list of values that appear for key prefix in dictionary m2
    • Semantically: choose a random value from the list of words that follow the two-word prefix using bigram-suffix mapping m2

Object-Oriented Programming (cont.)

  • Assuming we define:
      def random_suffix(m, prefix):
         return random.choice(m[prefix])
    • We can use the statement:
      next_word = random_suffix(m2, prefix)
    • instead of:
      next_word = random.choice(m2[prefix])
  • Why bother?
    • The reader gets a clearer idea of what is happening ("Oh, we’re retrieving a random word following prefix.")
    • We can change how random_suffix is implemented (e.g., bias the word choice by the length of the word) without changing any other code in the program

Object-Oriented Programming (cont.)

  • Select a concept that can be represented as a collection of data structures
  • Group it together with the operations (functions) associated with the concept
  • Put the data structures and operations together and call the combination a "class" for the concept

Object-Oriented Programming (cont.)

  • Our markov3_*.py example has three files
    • markov3_prep.py reads a text file and generates two mappings: unigram-to-suffix and bigram-to-suffix
    • markov3_use.py uses the precomputed mappings to generate a partial sentence
    • markov3_io.py reads and writes the mappings
  • What is a concept (and therefore candidate class) that spans the three files?

Object-Oriented Programming (cont.)

  • Concept: prefix-suffix mapping
    • We could have chosen to use two concepts: unigram-suffix mapping and bigram-suffix mapping
  • We extract all data structures and operations on prefix-suffix mapping and put them into markov4_gram.py
  • markov4_prep.py and markov4_use.py are the same as their markov3 counterparts, but rewritten to use functions from markov4_gram.py (instead of accessing dictionaries directly)

Interfaces and Implementations

  • Once the prep and use programs no longer directly access the mapping data, we are free to change how we represent the mapping data
  • This is the separation of interface from implementation (aka data abstraction or data encapsulation)
    • Interface (aka API or application programming interface) is what callers of a module use, e.g., functions and variables
    • Implementation is all the code within the module that makes using the interface work, e.g., module variables and function definitions
    • As long as the module interface remains the same, the implementation may be changed at will

Interfaces and Implementations (cont.)

  • Another way to look at it:
    • An API or interface is the border between the caller and the callee
    • It defines what can be done semantically with a concept
    • An implementation is the underlying code that makes the semantic operations possible
    • A caller should only care about the semantics and never about the underlying code
    • A callee should only care about what changes to make in the class/module and never about how the return values will be used
    • The underlying code may be changed as long as it re-implements the same or a superset of the API
      • Adding new functionality is fine
      • Removing or changing functionality is not

Interfaces and Implementations (cont.)

  • In our example, markov4_gram.py uses a redundant word list to represent possible suffixes for a given prefix
  • We can change the implementation to use a word histogram and save a lot of memory
  • In the new set of programs, notice that only markov5_gram.py differs from markov4_gram.py; markov5_prep.py and markov5_use.py are essentially identical to their markov4 counterparts
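  • The contents of the markov4 and markov5 files are not shown here, so the following is only a sketch of the idea with hypothetical function names: two implementations of the same add and random-choice operations, and caller code (build) that works with either one:

```python
import random


# Implementation A: redundant word list (one entry per occurrence).
def add_suffix_list(m, prefix, word):
    m.setdefault(prefix, []).append(word)


def random_suffix_list(m, prefix):
    return random.choice(m[prefix])


# Implementation B: word histogram (one entry per distinct word).
def add_suffix_hist(m, prefix, word):
    counts = m.setdefault(prefix, {})
    counts[word] = counts.get(word, 0) + 1


def random_suffix_hist(m, prefix):
    counts = m[prefix]
    words = list(counts)
    return random.choices(words, weights=[counts[w] for w in words], k=1)[0]


def build(add_suffix, pairs):
    """Caller code: identical whichever implementation is passed in."""
    m = {}
    for prefix, word in pairs:
        add_suffix(m, prefix, word)
    return m


pairs = [("the", "bee"), ("the", "bee"), ("the", "hive")]   # made-up data
as_list = build(add_suffix_list, pairs)
as_hist = build(add_suffix_hist, pairs)
print(as_list)   # {'the': ['bee', 'bee', 'hive']}
print(as_hist)   # {'the': {'bee': 2, 'hive': 1}}
print(random_suffix_list(as_list, "the") in ("bee", "hive"))   # True
print(random_suffix_hist(as_hist, "the") in ("bee", "hive"))   # True
```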

Class Syntax and Using Classes

  • Note that in our example, we used only functions and modules to do object-oriented programming (OOP)
  • Python (and many other languages such as C++ and Java) supports OOP by providing some extra constructs that aid bookkeeping
  • For example, each of our mappings is implemented using a single dictionary; there is no code to guarantee that we do not mistakenly use a unigram as the prefix for the bigram mapping
  • We can implement each mapping as a 2-tuple, with element 0 being the prefix length and element 1 being the dictionary, but this makes the code harder to read

Class Syntax

  • Python provides a class syntax that allows us to group data together and access them by name
    class ClassName:
    	"""Documentation string"""
    instance1 = ClassName()
    instance1.first_attribute = first_value
    print(instance1.first_attribute)
    instance2 = ClassName()
    instance2.second_attribute = second_value
    print(instance2.second_attribute)
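  • A concrete sketch of this syntax applied to the prefix-suffix mappings; the class name Mapping, the attribute names, and the helper check_prefix are assumptions for illustration, not the course's code:

```python
class Mapping:
    """Prefix-suffix mapping with its prefix length stored alongside."""


unigram = Mapping()
unigram.prefix_length = 1
unigram.suffixes = {("the",): ["bee"]}             # made-up sample data

bigram = Mapping()
bigram.prefix_length = 2
bigram.suffixes = {("the", "bee"): ["buzzes"]}     # made-up sample data


def check_prefix(mapping, prefix):
    """Return True if prefix has the length this mapping expects."""
    return len(prefix) == mapping.prefix_length


print(check_prefix(bigram, ("the", "bee")))   # True
print(check_prefix(bigram, ("the",)))         # False
```

  • Compared with a 2-tuple, the names prefix_length and suffixes say what each piece of data is for.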

Class Syntax (cont.)

  • Classes offer much more than just bookkeeping
  • Next two weeks, more on classes and OOP
    • attributes and methods
    • initialization (constructor) and termination (destructor)
    • inheritance and polymorphism

Steps in Software Design

  • Figure out what problem you are solving
  • Analyze the problem to identify concepts (divide and conquer)
  • Figure out what data and functions are needed

Steps in Programming

  1. Write simplest code that solves the problem
  2. Write test code and debug
    • Go to 1
  3. Measure performance
  4. Optimize
    • Speed up hotspots
    • Change algorithms
    • Go to 3

Homework

  • 6.1 - rectangles
    • Copy some code that uses classes
    • Write some code that implements additional operations
  • 6.2 - more rectangles
    • Write some code that calls the rectangle code
  • What would you change to make the code more object-oriented?