PC204 Lecture 7

Conrad Huang

conrad@cgl.ucsf.edu

Office: GH-N453A

Topics

Homework review
Object-oriented design
Object-oriented programming with Python

Homework Review

6.1 - Rectangle functions
6.2 - area_difference
Object-oriented perspective
- 6.1 defines and implements rectangle API
  - Methods: create, convert to string, shift and offset
  - Attributes: width, height, corner (x, y)
- 6.2 uses the API
  - Area difference requires two rectangles and is not a method of a single rectangle (usually)

Object-Oriented Design and Programming

Designing programs around objects instead of functions or data
Conceptually, an object is something we think of as a single unit (i.e., state and behavior)
Collectively, all objects that share the same behavior (but not necessarily the same data) form a class
Programmatically, an object is represented by attributes (data/state, generally unique for each object) and methods (function/behavior, generally shared by all objects of the same class)

OOD and OOP Criteria

Object-oriented design (OOD) focuses on what the attributes and methods of objects/classes are (the interface), while object-oriented programming (OOP) focuses on how to make the interface functional (the implementation).
Abstraction: semantics defined by the interfaces of one or more classes. The quality of abstraction is discussed in terms of:
- Cohesion: how closely the attributes and methods in a single class are related to each other (higher is better).
- Coupling: how many interdependencies there are among a set of classes (lower is better).
- Goal is to design a set of cohesive classes with low coupling.
Encapsulation: concealment of implementation for a class. The degree of encapsulation is measured by how much internal data structures are accessible externally. (Appropriate degree of encapsulation depends on the application.)

Divide and Conquer

Abstraction and encapsulation allow us to separate the implementation and use of classes
- When implementing a class, we do not need to worry about how methods are used, e.g. "will the area of a rectangle be used for comparing sizes or computing the total area?"
- When using a class, we do not need to worry about how methods are implemented, e.g. "for calculating the rectangle area, are the width and height stored as a tuple or two attributes?"
This separation leads to more organized programming and testing
- Unit test - does this class work by itself?
- Integration - do all these classes work together?
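
As a minimal sketch of this separation, the two hypothetical classes below (RectangleA and RectangleB are illustrative names, not part of the homework) store width and height differently, yet the same unit test passes for both because it relies only on the interface:

```python
class RectangleA:
    """Stores width and height as two attributes."""
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height

class RectangleB:
    """Stores width and height together in one tuple."""
    def __init__(self, w, h):
        self.size = (w, h)
    def area(self):
        return self.size[0] * self.size[1]

def check_area(rectangle_class):
    # Unit test: uses only the interface, never the internal storage
    assert rectangle_class(10, 20).area() == 200

check_area(RectangleA)
check_area(RectangleB)
```

Code that calls area() does not need to change if one implementation is swapped for the other.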

Why Is OOD/OOP a Good Paradigm?

An object-oriented approach is one of the commonly used programming methodologies.
OOD/OOP are good for similar reasons that modules are a good idea:
- Divide-and-conquer approach allows developers to focus on smaller problems instead of everything at once.
- Small, independent units of code (whether a module or class) are easier to write and debug.
- Complete solution is developed by integrating multiple tested units into a complete system. Integration is simplified because we think in terms of unit interface semantics, not unit implementation details.

Why Is OOD/OOP a Good Paradigm? (cont.)

… and because:
- Classes may inherit from other classes to form class hierarchies with a low degree of code duplication.
- Classes may be polymorphic so that they may be used interchangeably in generic algorithms.
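
The two points above can be sketched as follows. This is only an illustration under assumed names: Shape, Circle and total_area are hypothetical and not part of the homework API.

```python
import math

class Shape:
    def __init__(self, origin):
        self.origin = origin
    def shift(self, dx, dy):
        # Written once here, inherited by every subclass
        self.origin = (self.origin[0] + dx, self.origin[1] + dy)

class Rectangle(Shape):
    def __init__(self, w, h, origin):
        super().__init__(origin)
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height

class Circle(Shape):
    def __init__(self, radius, origin):
        super().__init__(origin)
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

def total_area(shapes):
    # Generic algorithm: works with any object that has an area() method
    return sum(s.area() for s in shapes)

shapes = [Rectangle(10, 20, (0, 0)), Circle(1, (5, 5))]
print(total_area(shapes))
```

Neither subclass repeats the shift code, and total_area never needs to know which kind of shape it is handling.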

Designing "Object-Oriented-ly"

Good design is still an art. People with different ideas come up with different abstractions.
Programmatic objects (classes) often correspond to real-world objects.
Frequently, enumerating a variety of subtasks helps identify classes. Nouns are class candidates. Nouns that occur in multiple contexts are better candidates because creating such a class reduces code duplication through increased sharing by multiple subtasks.
When a design may be achieved using several different sets of classes, compare cohesion (high is better) and coupling (low is better) among the solutions.
In general, large classes that do many things should be reviewed to see whether they can be logically split into smaller independent classes. Several small classes that refer to each other very frequently should be reviewed to see whether they can be combined into a single class with a smaller cohesive interface.
Multiple design iterations are commonplace. Don't be too attached to early decisions.

OOP in Python

Implementing a Rectangle class in Python is like writing a module, but indented:

class Rectangle:  
  NUMBER_OF_SIDES = 4
  def __init__(self, w, h, o):
    self.width = w
    self.height = h
    self.origin = o
  def area(self):
    return self.width * self.height
  def shift(self, dx, dy):
    self.origin = (self.origin[0] + dx, self.origin[1] + dy)

Using the class:

r1 = Rectangle(10, 20, (0,0))
r2 = Rectangle(20, 10, (-10,5))
print(r1.origin)
r1.shift(5, 12)
print("r1", r1.origin, r1.area())
print("r2", r2.origin, r2.area())

Objects, Classes, Instances

In Python, everything is an object, which may be an instance of a class.
- In Python 2, an object has a type and, if of type instance, a class
- In Python 3, an object is always an instance and its type is its class.
A class object defines, among other things, a collection of methods (verbosely instance methods) that define how instances of the class behave.
An instance is an object that is associated with a class object. There may be many instances associated with the same class object.
- All instances of the same class have the same shared methods defined by their class object.
- Each instance has its own attribute values that are distinct from attribute values of other instances.
- When a method is defined in a class, it can refer to instance attributes.
- An instance method cannot be called unless it is bound to an instance. This is because references to instance attributes are undefined unless there is a definitive instance in which to search for attribute values.
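
A small sketch of these relationships, assuming Python 3 and the Rectangle class from the earlier example:

```python
class Rectangle:
    def __init__(self, w, h, o):
        self.width = w
        self.height = h
        self.origin = o
    def area(self):
        return self.width * self.height

r1 = Rectangle(10, 20, (0, 0))
r2 = Rectangle(20, 10, (-10, 5))

# Python 3: the type of an instance is its class
print(type(r1) is Rectangle)
# Each instance owns its own attribute values
print(vars(r1))
print(vars(r2))
# Both instances find the same function object, stored once in the class
print(r1.area.__func__ is r2.area.__func__)
```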

Classes

A class object is defined by a class statement:
- A class statement is introduced by the class keyword:
```
class Rectangle:
```
- In Python 2, the above statement creates an old-style class while the preferred new-style class is defined with:
```
class Rectangle(object):
```
- In Python 3, both forms define the same kind of class.
def statements indented in a class statement define instance methods.
- Method definitions look exactly like function definitions, except the first argument of a method always refers to the bound instance and is conventionally named self.
Assignment statements indented in class statement define class attributes (as opposed to instance attributes).
- Class attributes are associated with the class object, not instances.
- Unlike instance attributes (which are unique for each instance), all instances of the class share the same class attribute.

Class Attributes

Although used infrequently, class attributes come in handy for keeping track of information about the class object (not instances) and for defining shared constants.
In our example:
```
class Rectangle:  
  NUMBER_OF_SIDES = 4
  [...]
```
we define a class attribute NUMBER_OF_SIDES. There is only one NUMBER_OF_SIDES shared by the class object and all instances of Rectangle, i.e., changing the value of NUMBER_OF_SIDES:
```
Rectangle.NUMBER_OF_SIDES = 6
```
will change it for all Rectangle instances as well as the Rectangle class object.
(NUMBER_OF_SIDES is all caps to conform with Python naming convention. Google for "Python PEP 8".)
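
A short sketch of this sharing, using a stripped-down Rectangle with no instance attributes:

```python
class Rectangle:
    NUMBER_OF_SIDES = 4

r1 = Rectangle()
r2 = Rectangle()
before = (r1.NUMBER_OF_SIDES, r2.NUMBER_OF_SIDES)

# Assigning through the class object changes the single shared value
Rectangle.NUMBER_OF_SIDES = 6
after = (r1.NUMBER_OF_SIDES, r2.NUMBER_OF_SIDES)

print(before, after)
```

Both existing instances see the new value because neither has an attribute of its own with that name.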

Methods

Defining a method is exactly analogous to defining a function, except the def statement is indented inside a class statement.
All methods take at least one argument. The first argument refers to the instance in which to look for any referenced instance attributes. By convention, the first argument is named self.
In our example:
```
class Rectangle:  
  [...]
  def area(self):
    return self.width * self.height
  [...]
```
we define the method area. When area is invoked, the first argument, self, is bound (assigned) to a Rectangle instance, and values for the width and height attributes from that instance are used to calculate the return value.
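In Python 2, Python makes sure that the first argument is an instance of the correct class. For example, if we have two classes, Rectangle and City, that both define area methods, Python 2 raises a TypeError if Rectangle.area is called with an instance of City as self. Python 3 does not perform this check: there, Rectangle.area is a plain function, so the call fails only if the City instance lacks an attribute that the method uses.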

Class Namespace

Each class has its own namespace. Different classes may use the same method and class attribute names without worrying about ambiguity. You can think of classes as something like mini-modules.
Methods and class attributes share the same namespace within a class.
- If there are multiple def statements defining methods with the same name, last one wins.
- If there are multiple assignment statements defining class attributes with the same name, last one wins.
- If there are definitions of methods and class attributes with the same name, last one wins.
The moral of the story: use any attribute and method names in different classes, but use unique names within a class.
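
The "last one wins" rules can be seen in a deliberately bad class (Demo is a hypothetical name used only for this sketch):

```python
class Demo:
    def name(self):
        return "first def"
    def name(self):
        return "second def"    # replaces the first method definition

    value = 1
    value = 2                  # replaces the first assignment

    both = "class attribute"
    def both(self):
        return "method"        # replaces the class attribute of the same name

d = Demo()
print(d.name(), d.value, d.both())
```

Python accepts all of these redefinitions silently, which is why unique names within a class matter.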

Instances

A class statement defines a single class object. To create instances of a class, the class object is used as if it were a function:

class Rectangle:  
  NUMBER_OF_SIDES = 4
  def __init__(self, w, h, o):
    self.width = w
    self.height = h
    self.origin = o
  [...]
r1 = Rectangle(10, 20, (0,0))
r2 = Rectangle(20, 10, (-10,5))
[...]

When a class object is called as a function:
1. it creates an instance of the class;
2. the __init__ method, if defined, is invoked with the first argument (self) set to the newly created instance, and all function call arguments are passed as additional arguments to __init__.
As its name suggests, __init__ is where instance initialization should occur, e.g., setting instance attributes according to passed arguments. All methods and class attributes may be used as part of instance initialization.

Instances (cont.)

An instance has access to:
- its own instance attributes (data unique to the instance and shared with neither its class object nor other instances of the class),
- class attributes (data associated with the class object), and
- methods (functions defined in the class object).
Instance attributes, class attributes, and methods are all accessed using the same syntax of instance.name, where name is the name of an attribute or method.

Instances (cont.)

In this example:

class Rectangle:  
  NUMBER_OF_SIDES = 4
  def __init__(self, w, h, o):
    self.width = w                      # set instance attribute
    self.height = h
    self.origin = o
  def area(self):
    return self.width * self.height     # use instance attributes
  [...]
r1 = Rectangle(10, 20, (0,0))           # create instance
print(r1.width, r1.height)              # use instance attribute
print(r1.NUMBER_OF_SIDES)               # use class attribute
print(r1.area())                        # call method

Rectangle

When we create r1, __init__ is implicitly called with self set to the new instance and w, h and o set to 10, 20 and (0,0), respectively.
The three print statements consecutively access instance attributes (set in the call to __init__), a class attribute (defined by the class statement), and call a method. Even though the syntax for all three statements is similar, they access different types of data associated with r1.

Instance Attributes

Because an expression instance.name may potentially refer to several types of data associated with the instance, we need some precedence rules to take care of possible ambiguities.
For retrieving data:
1. If name matches an instance attribute, the value of the instance attribute is used;
2. If name matches a class attribute or method in the class object, use that value (note that the name cannot match both a class attribute and a method since defining one overrides the other);
3. If neither #1 nor #2 applies, an AttributeError exception is raised.
For defining data (usually an assignment statement):
1. If name matches an instance attribute, the attribute is updated with the new value;
2. If name does not match an instance attribute, create an instance attribute with the new value.
Note the asymmetry of accessing and setting attributes.
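
The rules above can be observed with the built-in vars function, which exposes the namespace of an instance or a class. This is a sketch using a cut-down Rectangle; the color attribute is hypothetical:

```python
class Rectangle:
    NUMBER_OF_SIDES = 4
    def __init__(self, w, h):
        self.width = w
        self.height = h

r = Rectangle(10, 20)
print("width" in vars(r))                     # found on the instance
print("NUMBER_OF_SIDES" in vars(r))           # not on the instance ...
print("NUMBER_OF_SIDES" in vars(Rectangle))   # ... but found on the class

try:
    r.color                                   # in neither place
except AttributeError as e:
    print("AttributeError:", e)

r.color = "red"                               # assignment only touches the instance
print("color" in vars(r), "color" in vars(Rectangle))
```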

Instance Attributes (cont.)

Each instance is its own namespace.
For retrieving an attribute value, the instance namespace is checked first for the attribute name. If the name is not found in the instance namespace, the class object namespace is checked. (This is similar to the idea of LEGB (local, enclosing, global, built-in) scoping, where a name is searched in progressively less specific namespaces.)
For assigning an attribute value, only the instance namespace is used. The assignment either replaces an existing value, or creates a new value. There are two ramifications:
- New attributes may be created for each instance, independently of the class object or other instances. So different instances may have different attributes. (That is not generally considered a good idea, but Python allows it.)
- Instance attributes can shadow class attributes (as the example below shows).

Instance Attributes (cont.)

The following example illustrates the pitfalls of undisciplined use of instance attribute names:

class Rectangle:  
  NUMBER_OF_SIDES = 4
  def __init__(self, w, h, o):
    self.width = w
    self.height = h
    self.origin = o
  def area(self):
    return self.width * self.height

r = Rectangle(20, 10, (0,0))
print(r.NUMBER_OF_SIDES)        # use class attribute
r.NUMBER_OF_SIDES = 6           # define _instance_ attribute
print(r.NUMBER_OF_SIDES)        # uses new instance attribute!
print(r.area())                 # call method
r.area = 300                    # define instance attribute
print(r.area())                 # call 300 as function?!
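
The last line of the example above fails at run time. A minimal sketch of what happens, and of how deleting the instance attribute makes the method visible again:

```python
class Rectangle:
    NUMBER_OF_SIDES = 4
    def __init__(self, w, h, o):
        self.width = w
        self.height = h
        self.origin = o
    def area(self):
        return self.width * self.height

r = Rectangle(20, 10, (0, 0))
r.area = 300                         # instance attribute now hides the method
try:
    r.area()
except TypeError as e:
    print("TypeError:", e)           # an int cannot be called

del r.area                           # remove the instance attribute
print(r.area())                      # the method from the class is found again

r.NUMBER_OF_SIDES = 6                # hides the class attribute for r only
print(r.NUMBER_OF_SIDES, Rectangle.NUMBER_OF_SIDES)
```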

Calling Methods

Calling a method is slightly more complicated than calling a function. The following example shows two different ways of calling a method:

class Rectangle:  
  NUMBER_OF_SIDES = 4
  def __init__(self, w, h, o):
    self.width = w
    self.height = h
    self.origin = o
  def area(self):
    return self.width * self.height
  def shift(self, dx, dy):
    self.origin = (self.origin[0] + dx, self.origin[1] + dy)

r = Rectangle(20, 10, (0,0))
print(r.area())                 # Call _bound_ method
print(Rectangle.area(r))        # Call _unbound_ method

Calling Methods (cont.)

A method may be found in two ways:
- Associated with an instance, e.g., r.area. In this case, the found method is called a bound method because there is already an instance, r, associated with how the method was found.
- Associated with a class, e.g., Rectangle.area. In this case, the found method is called an unbound method because there is no instance associated with how the method was found.
When a bound method is called, Python implicitly inserts the instance used to find the method as the method's first argument, self. That is why even though area is defined to take one argument, the call r.area() passes zero arguments.
When an unbound method is called, Python has no idea what instance should be used and therefore does not insert the first argument. That is why we must explicitly pass r as the first argument in Rectangle.area(r).
Bound methods are the preferred way to call methods because the class is not named explicitly. The advantage of this will be discussed in the inheritance and polymorphism topics.
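
A short sketch of the difference, assuming Python 3. There, Rectangle.area is simply the function defined in the class (the term "unbound method" comes from Python 2), while r.area is a bound method object that remembers its instance:

```python
class Rectangle:
    def __init__(self, w, h, o):
        self.width = w
        self.height = h
        self.origin = o
    def area(self):
        return self.width * self.height

r = Rectangle(20, 10, (0, 0))

bound = r.area                  # looked up through the instance
print(bound.__self__ is r)      # the bound method remembers r
print(bound())                  # self is supplied implicitly

unbound = Rectangle.area        # looked up through the class
print(unbound(r))               # self must be passed explicitly
```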

Calling Methods (cont.)

A very common error with using methods is shown below:

>>> class C:
...     def m(self, v):
...             print(self, v)
...
>>> i = C()
>>> i.m(10)
<__main__.C object at 0x102f4f950> 10
>>> i.m()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: m() takes exactly 2 arguments (1 given)
>>> i.m(i, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: m() takes exactly 2 arguments (3 given)

Remember to account for the implicit first argument in a call to a bound method.

Special Methods

In addition to executing methods when they are explicitly called, Python also calls "special methods" (if they exist) under certain conditions. All special methods have names that begin and end with __ (double underscore).
- __ prefixes get special treatment from Python and you should not use them unless you know exactly what you are doing.
For example, when an instance is created, Python calls a method __init__ if it is defined in the class.
Another special method is __str__ which is called to convert an instance into its string representation.

__str__

__str__ overrides the default string representation that Python uses for instances:

>>> class Rectangle:
...  pass
...
>>> r = Rectangle()
>>> print(r)
<__main__.Rectangle object at 0x100599fd0>
>>> class Rectangle2:
...  def __str__(self):
...    return "I am a Rectangle2 instance"
...
>>> r2 = Rectangle2()
>>> print(r2)
I am a Rectangle2 instance

A better choice may be to include the rectangle origin and size in the output.
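
One possible version of that suggestion, as a sketch. The exact output format is an arbitrary choice made here for illustration:

```python
class Rectangle:
    def __init__(self, w, h, o):
        self.width = w
        self.height = h
        self.origin = o
    def __str__(self):
        # Report size and origin instead of a fixed message
        return "Rectangle(%g x %g at %s)" % (self.width, self.height, self.origin)

r = Rectangle(10, 20, (0, 0))
print(r)
```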

Operator Overloading

Python can call methods when standard operators (e.g., +, -, * and /) are used.
If one of the operands of the operator is an instance of a class that defines a corresponding special method, then the method is called with the operand(s) as arguments. The return value of the special method is expected to "make sense".

For example:

>>> class Vector:
...  def __init__(self, x, y):
...    self.x = x
...    self.y = y
...  def __str__(self):
...    return "(%g,%g)" % (self.x, self.y)
...  def __add__(self, other):
...    "Overload + operator"
...    return Vector(self.x + other.x, self.y + other.y)
...
>>> print(Vector(1, 2) + Vector(2, 4))
(3,6)

The __add__ method is called automatically when the left operand of + is an instance of Vector.

Operator Overloading (cont.)

Other operators that can be overloaded include list or map lookup ([]), function call (()), comparison (<, >, etc.) and even getting or setting attributes (.).
See the Data model page (Python 2, Python 3) for the full list of operators that you can overload and the names of their corresponding special methods.
Operator overloading should only be used when the operator "makes sense", e.g. overloading + for vectors. Gratuitous use of operator overloading can easily lead to completely inscrutable code.
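
A few more overloads for the Vector class, as a sketch. The choice of operators and their meanings (index 0 is x, index 1 is y, * scales by a number) are assumptions made for illustration:

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return "(%g,%g)" % (self.x, self.y)
    def __eq__(self, other):
        "Overload == operator"
        return self.x == other.x and self.y == other.y
    def __getitem__(self, index):
        "Overload [] lookup: v[0] is x and v[1] is y"
        return (self.x, self.y)[index]
    def __mul__(self, scale):
        "Overload * operator for scaling by a number"
        return Vector(self.x * scale, self.y * scale)

v = Vector(1, 2)
print(v == Vector(1, 2))
print(v[0], v[1])
print(v * 3)
```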

Debugging

Invariants are data consistency requirements.
In general, we can use assert statements in our methods to make sure that the invariants hold.
Two special cases are pre-conditions, which are invariant tests made just after a function is called (before any change has been made), and post-conditions, which are tests made just before a function returns (after all changes have been made).
Pre- and post-conditions are particularly useful in the context of objects. When placed at the start and end of methods, they (try to) guarantee that an object is always in a consistent state (both when we receive it and when we give it back).
Wide use of pre- and post-conditions helps developers detect inconsistencies early, and minimize red herrings that derive from propagation of bad data.
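
A minimal sketch of pre- and post-conditions on a method. The invariants chosen here (non-negative sizes, two-element origin) and the check_invariants and resize methods are hypothetical, added only for illustration:

```python
class Rectangle:
    def __init__(self, w, h, o):
        self.width = w
        self.height = h
        self.origin = o
        self.check_invariants()
    def check_invariants(self):
        # Assumed invariants: sizes are non-negative, origin is an (x, y) pair
        assert self.width >= 0 and self.height >= 0, "negative size"
        assert len(self.origin) == 2, "origin must be an (x, y) pair"
    def resize(self, factor):
        self.check_invariants()      # pre-condition: object is consistent on entry
        self.width *= factor
        self.height *= factor
        self.check_invariants()      # post-condition: still consistent on exit

r = Rectangle(10, 20, (0, 0))
r.resize(2)
try:
    r.resize(-1)                     # produces negative sizes
except AssertionError as e:
    print("AssertionError:", e)
```

The failure is reported inside resize, at the point where the bad data is created, rather than later in unrelated code that happens to use the rectangle.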

Homework

7.1 - convert 6.1 to use classes
7.2 - implement and use special method __add__