Originally Published: Wednesday, 13 September 2000 Author: Jason Tackaberry

Programming with Python - Part 2: The Real World



Programming with Python series
1. Baby Steps
2. The Real World
3. Extending Python

In Part 1, I introduced you to the basics of Python, including its syntax, the basic constructs, classes, and exceptions. If you haven't read part 1 yet, or need a refresher, it might be a good idea to have a look. As with the previous installment of this tutorial, I'll draw on some of your past programming experience, especially with Perl. If you're still a beginner with little or no experience, you needn't worry too much. In most cases you shouldn't need to thoroughly understand the comparisons I make with other languages.

The title of the first part is Baby Steps, and that's really all it is. It didn't arm you with enough knowledge to jump into the deep end and get coding with Python. Hopefully, though, it piqued your curiosity. In this part, The Real World, we're going to put Python to work, doing useful things that you might need to do in any real-world project.

Before we look at any useful code, though, we need to build on your Python foundation just a bit more...

Down Periscope!

Understanding the rules of scope is a necessary evil, especially because scope rules work differently in Python than in languages like Perl or C. A scope is a "region" where attributes (or any name mapped to an object) are directly accessible. Before defining the semantics of scopes in Python, it might be worth looking at an example. Consider the following Perl code:

  my $a = 1;
  if ($a) {
    print "$a\n";
    my $a = 2;
    if ($a) {
      print "$a\n";
      my $a = 3;
      print "$a\n";
    }
    print "$a\n";
  }
  print "$a\n";

This code will output:

  1
  2
  3
  2
  1

The above code shows that, in Perl, you can nest scopes. In the innermost scope, $a is given the value 3. As we traverse back up through the nested scopes, $a has the value given to it in that scope. The key here is that $a doesn't get reassigned values as it moves back through the scopes. Each of the three is a different variable altogether.

Now let's rewrite that code in Python, and have a look at what it displays:

  a = 1
  if a:
    print a
    a = 2
    if a:
      print a
      a = 3
      print a
    print a
  print a

And the output is:

  1
  2
  3
  3
  3

In the Python code, the a variable in the innermost scope is the same as a in the scope above it, which is the same as a in the scope above that. These are the same variable name, and the same object. There are no nested scopes in this way; the code above is executed in only one scope. For you Perl programmers, the Python version works the same way as if the my keyword were removed from all the $a declarations. In short, there are no nested scopes for local variables.

While you can't introduce nested scopes in Python, there are already three implicitly nested scopes. At the top is the scope containing all of Python's built-in names (functions and exceptions); next, the current module's global names; and finally, the innermost scope, the local names. These scopes are searched outwardly beginning from the innermost scope, as you might expect.
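
Here's a minimal sketch of that lookup order. The function names are made up for illustration, and print is written with parentheses around a single value so the snippet also runs on newer interpreters.

```python
x = "global"                 # a name in the module's global scope

def show_local():
    x = "local"              # a new name in this function's local scope
    return x

def show_global():
    return x                 # no local x, so the global one is found

def show_builtin():
    return len("abc")        # len is neither local nor global; it's a built-in

if x:
    y = "set inside if"      # an if block does not open a new scope

print(show_local())          # local
print(show_global())         # global
print(show_builtin())        # 3
print(y)                     # set inside if
```

Notice that calling show_local doesn't disturb the global x; the assignment inside the function only ever touched the local name.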

In order to access a scope which is not one of the three above, we must explicitly reference its namespace. Namespaces contain mappings for names to objects. For example, in part 1, we made use of string.split. The split function is part of the string module. To access it, we must prepend the module name.

To drive home the notion of scopes and namespaces, let's look at this little snippet:

  
  msg = "5 out of 4 people have trouble with fractions."
  print len(msg)
  len = 3
  print len
  import __builtin__
  print __builtin__.len(msg)

Outputs:

  46
  3
  46

The first line merely assigns a string object to the variable msg. Next, we call the function len, which calculates the length of an object (such as a tuple, list, or string). Since len is in neither the current nor the global scope, the built-in namespace is checked. Lo and behold, len is a built-in function, and this is what is invoked. In the line that follows, however, we define len to be 3. We don't redefine len, per se. We are introducing a new attribute into the local namespace with an integer value. So, as we'd expect, in the next line, when we print len, we see the value 3. What do we do if we need to access the built-in len function? Well, the smart thing to do would be to choose a different variable name so that we avoid this conflict. Barring that, however, we must access the namespace directly. All built-ins are part of the __builtin__ module. When we import __builtin__, we add this module to our local namespace under the name __builtin__. As we see in the next line, calling __builtin__.len accesses the built-in len function.

I know, you're getting sick of scopes and namespaces, but we're almost finished. When a function is called, the variables used in that function are local only to that function (because they are in the local scope). But what happens when you need to access a global variable from a function? You need to use the global keyword to indicate that the variable exists in the global scope. With any luck, this next example should make this point clear:


  a = 1             # a exists in the global scope

  def set_local():
    a = 2           # Define a new local variable; the global a is untouched
    print a         # This will output "2"

  def set_global():
    global a        # Indicate that a is a global variable
    print a         # This will output "1"
    a = 3           # The global a is now set to 3

  set_local()
  set_global()
  print a           # This will output "3"

The Meek Shall Inherit

In Part 1, we looked at Python's object-oriented features, and touched on classes. I conveniently skated around inheritance though, because I felt it was too much to throw on your shoulders this early in the game. But the time has come. In order for a language to be object-oriented, it must support (among other things) inheritance.

In object-oriented programming, inheritance introduces the notion of parent or base classes, and child or derived classes. Child classes are derived from parent classes, and in the process, inherit all of the parents' attributes and methods. This allows a parent to implement a skeleton with functionality that is common to a group of classes, and a child class can derive from its parent and flesh out functionality specific to the child.

An example I commonly use when describing inheritance is a shape class. There are many shapes: circles, triangles, rectangles, squares, ovals, polygons, etc. One could create a base class called Shape, and derive each of the specific shapes from this class. But this example gets even more interesting because you'll notice that a square is just a special case of a rectangle, and a circle is just a special case of an oval. So we could derive Circle from Oval, and Square from Rectangle.

If we're going to start with the base Shape class, we need to think about what operations are common to all shapes. How about two operations: area and perimeter. Do the implementations of the operations for these shapes have any commonality? If you passed grade 7 math, you answered no. At this point, if this were a C++ tutorial, we'd start to build an abstract base class Shape. Or, if we were talking about Java, we might prefer to use an interface. But because Python is dynamically typed and polymorphism comes so easily, we don't need a base class Shape for this example to work. If there were some commonality in the implementation of each of the shapes we might choose to create a base Shape class. (And even if there weren't, one could make a strong case in favor of it simply because it allows us to perform type checking. This is more of a software engineering issue, however.)

All this talk and no code makes things pretty confusing, so let's not waste any more time. Let's implement a square, whose parent class will be a rectangle:

  class Rectangle:
    def __init__(self, w, h):
      (self.w, self.h) = (w, h)

    def area(self):
      return self.w * self.h

    def perimeter(self):
      return (2 * self.w) + (2 * self.h)

  class Square(Rectangle):
    def __init__(self, side):
      Rectangle.__init__(self, side, side)

We see that class Square is defined differently than Rectangle. The class named in the parens specifies the base class for this class. Notice that the parameter lists for the constructors of the two classes are different. They needn't be the same, and in this case, it doesn't make sense to specify more than one side for the Square object. The constructor for Square calls its parent class's constructor, passing the side argument for both the width and height. And because the formulas for calculating the area and perimeter for a square are the same as for a rectangle (where w = h), the Square class needn't reimplement these methods. It should be easy for you to imagine what the implementation of Oval and Circle might be.
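
For the curious, here is one way Oval and Circle might look. This is only a sketch under a couple of assumptions of mine: an oval is described by its two semi-axes (named a and b here), and since an ellipse has no simple exact perimeter formula, the perimeter uses Ramanujan's approximation, which happens to be exact for a circle.

```python
import math

class Oval:
    def __init__(self, a, b):
        # a and b are the semi-axes: half the width and half the height
        (self.a, self.b) = (a, b)

    def area(self):
        return math.pi * self.a * self.b

    def perimeter(self):
        # Ramanujan's approximation; it reduces to 2 * pi * r when a == b
        (a, b) = (self.a, self.b)
        return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

class Circle(Oval):
    def __init__(self, radius):
        # A circle is an oval whose two semi-axes are equal
        Oval.__init__(self, radius, radius)

c = Circle(2)
print(c.area())
print(c.perimeter())
```

Just as with Square, Circle only supplies a constructor and inherits everything else.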

Java, Objective-C, and many other languages introduce the notion of interfaces, or protocols. A class that implements the protocol must implement all the methods defined in that protocol. In our example, it might make sense to define a protocol that declares the area and perimeter methods. For a statically typed language, this sort of practice (or something similar to it, like an abstract base class) is required. In a language that uses late binding, like Python, this practice isn't required. It's left to the programmer to ensure that classes conform to some protocol.

Still, even in our example, it is possible to validate that an instance implements certain methods. We won't worry about that right now, since it requires using some special attributes that we haven't looked at yet. Let's write a function that returns the area of any of our shapes:

  def get_shape(shape):
    return shape.area()

Admittedly, this is a silly example. After all, why would anyone call get_shape(myshape) when they can just do myshape.area() themselves? It does at least help to illustrate the magic of late binding and polymorphism.
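
A slightly less silly sketch of the same idea: summing the areas of a mixed list of shapes. The Triangle class and the total_area function are hypothetical additions; Triangle deliberately shares no base class with the others, to show that only the presence of an area method matters.

```python
class Rectangle:
    def __init__(self, w, h):
        (self.w, self.h) = (w, h)

    def area(self):
        return self.w * self.h

class Square(Rectangle):
    def __init__(self, side):
        Rectangle.__init__(self, side, side)

class Triangle:
    # Not derived from Rectangle at all; it simply provides area()
    def __init__(self, base, height):
        (self.base, self.height) = (base, height)

    def area(self):
        return 0.5 * self.base * self.height

def total_area(shapes):
    total = 0
    for shape in shapes:
        # Which area() runs is decided at call time, per object
        total = total + shape.area()
    return total

print(total_area([Rectangle(2, 3), Square(4), Triangle(6, 2)]))   # 28.0
```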

As for multiple inheritance: yes, Python supports it, to some degree. In most cases, it's best to avoid it, because it's easy to fall victim to its complexities. Besides, if you're ready to start using multiple inheritance, you won't need me to explain it to you.

Getting in Touch with Your Inner Self

If you're coming from strictly a C++ background, the concept of introspection may seem a little foreign to you. Introspection, also called reflection, is a way of querying objects about their state or metadata. Introspection is commonly used for discovering what methods an instance implements, or determining the inheritance tree of some instance. Introspection is an amazingly useful tool. After using Python's introspective capabilities, you will find yourself missing them dearly if you have cause to use C++.

Most objects have special attributes that are used for introspective purposes. These attributes usually begin and end with two underscores (__), for example, __dict__. __dict__ is a commonly used attribute, which is available in objects for modules, classes, and instances. __dict__ is a dictionary that holds all of an object's attributes, or in the case of a module, its namespace. Let's have a look at the output of an interactive Python session, after importing the code from the shapes example in the last section:

  >>> Square.__dict__
  {'__init__': <function __init__ at ...>, '__doc__': None, '__module__': 't'}

Notice that Square's dictionary doesn't contain entries for the area and perimeter methods. That's because those methods are really defined in the Rectangle class:

  >>> Rectangle.__dict__
  {'area': <function area at ...>, '__init__': <function __init__ at ...>, '__doc__': None, '__module__': 't', 'perimeter': <function perimeter at ...>}

Given an instance of a Square class, how could we determine all the methods of its class and its parent classes? Instance objects have a special attribute __class__ that returns the class object from whence it was created. And class objects have a special attribute __bases__ that returns a tuple of all the base classes of that class. (This will normally be a tuple of size one unless multiple inheritance is used.) So, putting all this together, a function to gather a list of all methods available to an instance might look like:

  def get_methods(o):
    import types

    # If a class object is passed, compile a list of all functions in
    # this class all the way up the hierarchy tree.
    if type(o) == types.ClassType:
      functions = []
      # Loop through all attributes in this class
      for name in o.__dict__.keys():
        attr = o.__dict__[name]
        # Append the attribute to the list if it's a function
        if type(attr) == types.FunctionType:
          functions.append(attr)
      # Work through this class's base classes and add to the list
      for base in o.__bases__:
        functions = functions + get_methods(base)
      return functions

    # If it's an instance object passed ...
    elif type(o) == types.InstanceType:
      # First fetch a list of the functions for the instance's class
      functions = get_methods(o.__class__)
      # Now compile a list of methods
      methods = []
      for function in functions:
        # Get the instance's method object for the function name
        method = getattr(o, function.__name__)
        # Add it to the list if it doesn't already exist (we need to do
        # this because the functions list will contain duplicates for
        # overridden methods such as __init__)
        if methods.count(method) == 0:
          methods.append(method)
      # Return the list
      return methods

The above function may look a little lengthy and confusing. The truth is that whenever you deal with Python's special object attributes, things look a little bizarre at first. The notation (__somename__) was chosen for good reason, however: when you refer to an attribute in this form, it's pretty clear that something "out of the ordinary" is going on. As for the length, this function could be trimmed down quite a bit, but I felt it was necessary to keep it as explicit as possible for this example.
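
As a rough idea of what a trimmed-down variant could look like, here is a sketch that collects method names rather than method objects. It is not a drop-in replacement: the method_names function is hypothetical, it skips any name beginning with an underscore (my own simplification, so that special methods are left out), and it uses the built-in callable instead of the types module.

```python
class Rectangle:
    def __init__(self, w, h):
        (self.w, self.h) = (w, h)

    def area(self):
        return self.w * self.h

    def perimeter(self):
        return (2 * self.w) + (2 * self.h)

class Square(Rectangle):
    def __init__(self, side):
        Rectangle.__init__(self, side, side)

def method_names(cls):
    names = []
    # Callable attributes defined directly on this class
    for name in cls.__dict__.keys():
        if name[:1] != "_" and callable(cls.__dict__[name]):
            names.append(name)
    # Then walk up through the base classes, skipping overridden names
    for base in cls.__bases__:
        for name in method_names(base):
            if name not in names:
                names.append(name)
    names.sort()
    return names

sq = Square(5)
print(method_names(sq.__class__))   # ['area', 'perimeter']
```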

One thing you may have noticed was the use of the types module. In Python, while implicit type checking is not performed, you have the ability to check an object's type by using the built-in function type, and comparing it to one of the values in the types module.
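
A small sketch of explicit type checking along these lines. The describe function is hypothetical. For strings, integers, and lists it compares against the type of a sample value (type(""), type(0), type([])), which avoids depending on exactly which names a given version of the types module provides.

```python
import types

def describe(o):
    if type(o) == types.FunctionType:
        return "function"
    elif type(o) == type(""):
        return "string"
    elif type(o) == type(0):
        return "integer"
    elif type(o) == type([]):
        return "list"
    return "something else"

print(describe(describe))   # function
print(describe("spam"))     # string
print(describe(42))         # integer
print(describe([1, 2]))     # list
print(describe(4.2))        # something else
```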

Special Class Methods

Classes have a number of special methods which can be used to override the behavior for certain actions, such as getting or assigning attributes, instantiation and deletion, and others. These special methods are in the form -- yep, you guessed it -- __method__. Let's suppose we have a class Foo, and we want to override the methods for getting and assigning attributes to change the way its answer attribute is handled:

  class Foo:
    def __getattr__(self, name):
      # This is only called when the normal attribute lookup fails
      if name == "answer":
        return 42
      raise AttributeError, name

    def __setattr__(self, name, value):
      if name == "answer":
        raise "Eeegads!", "You can't change The Answer! There's only one!"
      else:
        self.__dict__[name] = value

When one tries to assign a value to answer for any Foo instance, it will raise an exception. After all, everyone knows The Answer is always 42, so there's no point in trying to change it.
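
Here's a sketch of how such a class behaves in use. It is a variant of Foo rather than the exact class above: I've assumed AttributeError for both failure cases so that the snippet also runs on newer interpreters, and the question attribute is made up for the demonstration.

```python
class Foo:
    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails
        if name == "answer":
            return 42
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == "answer":
            raise AttributeError("You can't change The Answer!")
        self.__dict__[name] = value

foo = Foo()
foo.question = "unknown"     # ordinary attributes are stored as usual
print(foo.question)          # unknown
print(foo.answer)            # 42

try:
    foo.answer = 43
except AttributeError:
    print("refused")         # refused
```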

Other special methods include: __call__, invoked when the instance is called as if it were a function; __cmp__, invoked when a comparison operation is performed on the object; __hash__, invoked when the object is used as a dictionary key; and many more. Naturally, one can also override operators: __mul__, for multiplication; __pow__, for power operations; __invert__, for bitwise inversion; and all the others.
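
To make a couple of these concrete, here is a minimal sketch using a made-up Scaler class that implements __call__ and __mul__. Nothing about the class comes from the standard library; it only exists to show when each special method is invoked.

```python
class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Invoked when the instance is called like a function
        return value * self.factor

    def __mul__(self, other):
        # Invoked for: scaler * other
        return Scaler(self.factor * other.factor)

double = Scaler(2)
triple = Scaler(3)
print(double(5))             # 10

sextuple = double * triple   # calls double.__mul__(triple)
print(sextuple(1))           # 6
```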

A Python Quick Reference is a very useful page, which briefly lists all these operators, and pretty much everything else you'd need to quickly look up. Any Python programmer would benefit from bookmarking this page.

That's a Dilly of a Pickle!

If you've made it this far, you already know a great deal about Python. Now you're ready to make Python work for you. If you're impressed with Python at this point, you're about to be even more amazed. The area where Python really shines is its vast library of modules.

We're going to be sampling but a few useful modules next. The first pick of the litter is Python's pickle module. The first time I read about pickling in Python, I exclaimed aloud, "That's so cool!" The pickle module lets us convert almost any Python object to a stream of bytes that we can store to a file or database, or send over a network. This process is also commonly called serializing or marshalling. In this way, pickling provides a somewhat rudimentary way to implement persistent objects.

The best part about the pickle module is the magical way in which it can handle nearly any object you throw at it. It's smart enough to deal with recursive object references (for example, self.me = self), user-defined classes and instances, and multiple references to the same object (so it won't marshal multiple copies). There are only a few caveats (pickle is unable to handle instance methods, for instance), but fortunately these are usually easy to get around.

The Python library offers a module called cPickle, which is an implementation of pickle written in C. According to the documentation, cPickle can be up to 1000 times faster than its native Python counterpart. As an example, let's pickle an instance of our Square class, from above:

  square = Square(5)
  import cPickle
  cPickle.dump(square, open("squareobj", "w"))
  new_square = cPickle.load(open("squareobj"))

First we create an instance of Square that we want to pickle. After importing the cPickle module, we invoke its dump function. dump takes 3 arguments: the object to be pickled, a file object where the output is directed, and an optional flag used to indicate whether the output should be text or binary (in its absence, it will default to text). Once we have the file that holds the pickled data, we can unpickle it by calling the load function, which requires a file object pointing to the pickled data as an argument.

This module also has two other functions, dumps and loads, which can pickle and unpickle data from strings. These functions would be useful if, for example, you want to store the pickled data to a database or send it over a network.
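
Here is a small sketch of dumps and loads at work, which also shows the handling of shared and recursive references mentioned earlier. The try/except around the import is an assumption for newer interpreters, where the C implementation is reached through the plain pickle module; the list contents are arbitrary.

```python
try:
    import cPickle as pickle     # the C implementation described above
except ImportError:
    import pickle                # assumption: newer interpreters only ship this name

inner = ["spam", "eggs"]
outer = [inner, inner]           # two references to the same list
outer.append(outer)              # and a reference to itself

data = pickle.dumps(outer)       # a string of bytes, ready to store or send
copy = pickle.loads(data)

print(copy[0] == inner)          # True: same contents
print(copy[0] is inner)          # False: it's a new object
print(copy[0] is copy[1])        # True: the sharing is preserved
print(copy[2] is copy)           # True: so is the self-reference
```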

Nothing Regular About It!

While Perl has regular expressions built directly into its syntax, Python offers this functionality as a module. Python's re module provides regular expression operations that will make the Perl programmer feel right at home.

With the re module, regular expressions may be compiled before they are used for matching. First compiling the expressions improves performance in cases where the expression is frequently used. Compiling results in a regular expression object. Regular expression objects then provide methods for matching or searching (a subtle distinction explained below), splitting, or substituting. The matching and searching methods return match objects, which you can then query to determine matched subgroups.
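
As a quick sketch of the splitting and substituting methods on a compiled expression (the pattern and the sample text are made up for illustration): split returns a list and sub returns a new string, while search returns a match object.

```python
import re

# Commas, with any amount of whitespace around them
exp = re.compile(r"\s*,\s*")
text = "spam , eggs,ham"

print(exp.split(text))           # ['spam', 'eggs', 'ham']
print(exp.sub("; ", text))       # spam; eggs; ham

match = exp.search(text)
print(match.group(0))            # the first separator found, including its spaces
```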

While the expression syntax itself is the same as Perl's, actually using regular expressions is much different than in Perl, as you can see. So, we'll first have a look at an example in Perl, and then reimplement it in Python. First, the Perl code:

  $text = "This is a sample message";
  print $1 if $text =~ /is a (\w+)/;

The output of this snippet is sample. Let's see what that looks like in Python:

  text = "This is a sample message"
  import re
  exp = re.compile("is a (\w+)")
  match = exp.search(text)
  if match:
    print match.group(1)

This also outputs sample. Once we compile the expression, we apply its search method against the text. The result, if the search succeeded, is a match object. The match object provides several methods, but of particular interest to us is the group method, which returns the substrings captured by the parenthesized groups, numbered starting at 1. Group 0 is the entire matched result.

You can see using regular expressions in Python is quite a bit more involved than in Perl. This is to be expected when you consider that Perl incorporates this ability directly into the language (and that Perl is generally very succinct). In our example, though, we went through the trouble of compiling the expression first. This isn't required; the re module offers the same methods as a regular expression object, except that they take an additional argument containing the expression. We could rewrite the above example as:

  match = re.search("is a (\w+)", text)

This is only recommended if the expression is used just once; otherwise, subsequent uses will need to recompile the expression. It is particularly inefficient to compile the same expression in a loop, especially if the expression is complicated. Instead, compile the expression once, and use the regular expression object in the loop.
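
A minimal sketch of that advice, with a made-up list of lines to search. The expression is compiled once, outside the loop, and is written as a raw string (r"...") so the backslash reaches the re module untouched.

```python
import re

exp = re.compile(r"is a (\w+)")          # compiled once, outside the loop

lines = ["This is a sample message",
         "That is a test",
         "Nothing to see here"]

found = []
for line in lines:
    match = exp.search(line)             # reuse the compiled object each time
    if match:
        found.append(match.group(1))

print(found)                             # ['sample', 'test']
```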

Regular expression objects also provide a match method. For all practical purposes, match is equivalent to a search whose regular expression begins with a ^. (By default, ^ denotes the beginning of the string.) So, these two lines return the same result:

  re.match("This is a (\w+)", text)
  re.search("^This is a (\w+)", text)

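Similarly, these lines capture the same group:

  re.match(".*is a (\w+)", text)
  re.search("is a (\w+)", text)

The difference is in how the two approaches are implemented under the covers, and in what the resulting match object covers: in the second pair, group(1) is the same for both lines, but group(0) of the match version also includes the leading text consumed by the .* prefix. When the expression is anchored at the start of the text, as in the first pair, match is the more direct choice; when it can occur anywhere, as in the second pair, search says what you mean without the leading .* prefix.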
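
Here is a short sketch that makes the distinction visible, using the same sample text. The expressions are written as raw strings (r"...") so the backslashes are passed through to the re module as-is.

```python
import re

text = "This is a sample message"

# match only succeeds at the very start of the text
print(re.match(r"is a (\w+)", text))     # None

m = re.match(r".*is a (\w+)", text)
s = re.search(r"is a (\w+)", text)

print(m.group(1))                        # sample
print(s.group(1))                        # sample

# The captured group agrees, but the overall matched text does not
print(m.group(0))                        # This is a sample
print(s.group(0))                        # is a sample
```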

The Python website has a great HOWTO on how to use regular expressions. If there's something the library documentation doesn't make clear to you, check it out.

If a Packet Hits a Pocket on a Socket On a Port ...

Dr. Seuss humor aside, any application that wants to work on the Internet needs to use a socket API of some sort. Python has an assortment of modules that can interface with many protocols, including FTP, HTTP, IMAP, gopher, NNTP, SMTP, POP, and the list goes on. There are also lower level APIs for dealing with sockets, including asyncore, and the socket module, which provides all the Unix system calls for dealing with sockets.

The asyncore module is worth taking a closer look at because it provides enough access to sockets in order to implement your own protocol, but is high-level enough to abstract many of the annoyances one normally has to deal with when using the low-level sockets API, such as asynchronous sockets. asyncore implements non-blocking I/O and multiplexes so that it is able to handle multiple connections without implementing threads or forking. (It uses select to do this.)
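
To give a feel for what select does on asyncore's behalf, here is a rough sketch that uses select directly. It makes a few assumptions: socket.socketpair is available on your platform (it gives us two already-connected sockets, so no network or port is needed), the demo function name is made up, and the b"ping" notation for the payload needs a reasonably recent interpreter.

```python
import select
import socket

def demo():
    (a, b) = socket.socketpair()         # two sockets connected to each other

    # Nothing has been sent yet, so with a timeout of 0 select reports
    # that no socket is ready for reading
    (readable, writable, errored) = select.select([a, b], [], [], 0)
    idle = len(readable)

    b.send(b"ping")

    # Now a has data waiting; select tells us which sockets to read from
    (readable, writable, errored) = select.select([a, b], [], [], 1.0)
    ready = (readable == [a])
    data = a.recv(1500)

    a.close()
    b.close()
    return (idle, ready, data)

result = demo()
print(result[0])                         # number of readable sockets before sending
print(result[1])                         # whether only a became readable
```

asyncore wraps this kind of loop for you and calls methods such as handle_read and handle_accept on the right object when its socket becomes ready.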

In our example, we'll create a simple time server that accepts connections and returns the time. As a twist (and granted, not a very practical one), we'll use the cPickle module we talked about, and pickle the time value (which will be a float object) over the network.

  import asyncore, cPickle, socket, time

  # Create a new class derived from the asyncore module's dispatcher
  class TimeServer(asyncore.dispatcher):

    def __init__(self):
      asyncore.dispatcher.__init__(self)
      # Create a socket and listen on port 4567
      self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
      self.bind( ('', 4567) )
      # Queue up to 5 pending connection attempts
      self.listen(5)

    # This function is called by the asyncore dispatcher when a
    # connection comes in
    def handle_accept(self):
      # Accept the connection. accept() returns a tuple containing the
      # socket object and the address the connection came from. The
      # socket is named conn so it doesn't shadow the socket module.
      (conn, addr) = self.accept()
      # Pickle the time into a string
      data = cPickle.dumps(time.time())
      # Send the pickled data over the wire
      conn.send(data)
      # Now close the socket for this connection
      conn.close()

  # Create a new instance of the time server. This registers itself with
  # the dispatcher
  TimeServer()
  # Begin the asyncore main loop.
  asyncore.loop()

Starting the server causes it to sit and accept connections on port 4567. For each connection, it sends the pickled object holding the current time, and closes the connection. Now let's see the client side:

  import asyncore, cPickle, socket, time

  class TimeClient(asyncore.dispatcher):

    def __init__(self, host):
      asyncore.dispatcher.__init__(self)
      # Create a socket and connect to the specified host on port 4567
      self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
      self.connect( (host, 4567) )

    # Override the log method to suppress logging messages. The asyncore
    # module calls this method when certain events happen; it's useful
    # for debugging purposes.
    def log(self, msg):
      pass

    # This function is called by the asyncore dispatcher when data is
    # waiting to be read on the socket.
    def handle_read(self):
      try:
        # Read up to 1500 bytes (which is more than adequate for our purposes)
        data = self.recv(1500)
        # Unpickle the time
        t = cPickle.loads(data)
        # Display the result in a human-readable format
        print "The time server returned:", time.ctime(t)
      except:
        print "Time server disconnected unexpectedly"
      self.close()

  # Create the client object; again, this registers itself with the asyncore
  # dispatcher
  TimeClient("localhost")
  # Start the asyncore main loop
  asyncore.loop()

While not complete in every detail, the client and server above are perfectly functional. The server may handle multiple connections at once, and all the details are handled by the asyncore dispatcher. This example may be a bit over-simplified; almost every protocol requires some sort of continuous chat-like communication between the client and server, which isn't required by this protocol. You should be able to see how you might implement this sort of functionality from the code above. It's not much more complicated at all.

The Magic XML Pixie Dust

XML is one of the buzzwords of the year. Many companies these days seem to tout XML as "a technology that will enable us to deliver premium solutions leveraging open standards." XML won't solve hunger in third world countries or promote world peace (well okay, it might promote world peace), but it is proving itself to be very useful and is gaining considerable popularity.

I once read someone say in sarcasm that to solve almost any problem, all one needs to do is wave the magic XML pixie dust on it. Python provides one way for you to wave that magic pixie dust in the form of xmllib. Python 2.0 will provide much more flexible XML support, including a SAX parser (finally!), but since 1.6 is the latest release as of this writing, we'll focus on what it supports, which happens to be xmllib.

In order to parse some XML, we'll need an example to work with. Because XML lets us use our own tags and attributes, let's make one up:

  <people>
    <person name="Homer Simpson">
      <address label="home">742 Evergreen Terrace</address>
      <address label="work">Springfield Nuclear Power Plant</address>
      <occupation>Safety Inspector</occupation>
      <relative name="Bart Simpson" relation="son"/>
    </person>

    <person name="Bart Simpson">
      <address label="home">742 Evergreen Terrace</address>
      <address label="school">Springfield Elementary</address>
      <relative name="Homer Simpson" relation="father"/>
    </person>
  </people>

The xmllib module provides a class called XMLParser that must be subclassed (derived) and built to handle certain XML tags. When an opening tag is encountered, the start_tagname method is called, where tagname is the name of the tag, such as person, or address. Similarly, when a tag ends, the end_tagname method is invoked. If these methods don't exist for the tag in question, then the unknown_starttag and unknown_endtag methods are called. For the character data between tags, the method handle_data is called with each chunk of text as it is parsed.

xmllib won't build a tree of nodes for you; how you handle the data it parses is up to you. The code below merely displays the data handled by xmllib:


  import xmllib, string

  class PeopleParser(xmllib.XMLParser):
    def __init__(self):
      xmllib.XMLParser.__init__(self)
      self.data = ""

    def start_person(self, attrs):
      print "Begin new person:", attrs["name"]

    def start_address(self, attrs):
      print "New address for", attrs["label"]

    def end_address(self):
      print " ", string.strip(self.data)
      self.data = ""

    def handle_data(self, char):
      self.data = self.data + char

    def unknown_endtag(self, tag):
      self.data = string.strip(self.data)
      if len(self.data):
        print "Tag", tag, "has data:", self.data
        self.data = ""

    def unknown_starttag(self, tag, attrs):
      if len(attrs):
        print "Start", tag
      for key in attrs.keys():
        print " ", key, "=", attrs[key]


  # Create a new instance of the parser
  parser = PeopleParser()
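  # Open the xml file and feed it in, line by line
  for line in open("people.xml").readlines():
    parser.feed(line)
  # Tell the parser there is no more data, so that it processes
  # anything still buffered
  parser.close()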

The output generated by this example is:

  Begin new person: Homer Simpson
  New address for home
    742 Evergreen Terrace
  New address for work
    Springfield Nuclear Power Plant
  Tag occupation has data: Safety Inspector
  Start relative
    name = Bart Simpson
    relation = son
  Begin new person: Bart Simpson
  New address for home
    742 Evergreen Terrace
  New address for school
    Springfield Elementary
  Start relative
    name = Homer Simpson
    relation = father

This parser explicitly handles the person and address tags. When any of the other tags are encountered, the unknown_starttag method is called, which prints the name of the tag and the attributes passed. In order to be useful, this parser would need to keep track of what it read. For example, we may want to create a Person class to represent an XML person tag. When a start tag is encountered for person, it could create a new instance of this class, and append this instance to a list.
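
Here is a rough sketch of that Person idea. One caveat up front: xmllib may not be available on newer interpreters, so this sketch swaps in the SAX parser mentioned earlier (the xml.sax module of later Python releases) instead of xmllib; the overall shape, with one handler for start tags, one for end tags, and one for character data, is much the same. The Person and PeopleHandler classes are hypothetical, and the XML is a cut-down version of our example.

```python
import xml.sax

PEOPLE_XML = """<people>
  <person name="Homer Simpson">
    <address label="home">742 Evergreen Terrace</address>
    <address label="work">Springfield Nuclear Power Plant</address>
  </person>
  <person name="Bart Simpson">
    <address label="school">Springfield Elementary</address>
  </person>
</people>"""

class Person:
    def __init__(self, name):
        self.name = name
        self.addresses = {}              # maps a label to an address

class PeopleHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.people = []
        self.label = None
        self.data = ""

    def startElement(self, tag, attrs):
        if tag == "person":
            # Create a new Person and append it to the list
            self.people.append(Person(attrs["name"]))
        elif tag == "address":
            self.label = attrs["label"]
            self.data = ""

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        self.data = self.data + content

    def endElement(self, tag):
        if tag == "address":
            person = self.people[-1]
            person.addresses[self.label] = self.data.strip()
            self.label = None

handler = PeopleHandler()
xml.sax.parseString(PEOPLE_XML.encode("utf-8"), handler)

for person in handler.people:
    print(person.name)
```

Once parsing is finished, handler.people holds one Person per person tag, ready for the rest of the program to use.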

The XML parser provided by xmllib doesn't do a great deal of hand-holding, but it provides enough convenience to be quite useful. The XML support in Python 2.0 should prove to be much more extensive and flexible.

But Wait, There's More!

Parts 1 and 2 covered quite a bit of ground. If you've managed to swallow everything you've read so far (and keep it down), then you should be in pretty good shape for parts 3 and 4. With any luck, you're now eager to throw your next problem at Python. I'm sure it's up to the challenge, if you are.

The next part of this series will cater to the hacker in you. We're going to slice open Python and see how it ticks inside, at least from the perspective of extending it. You'll receive an overview of reference counting, how to create new Python types, and an introduction to the Python/C API. We'll also wrap parts of gnome-xml, and expose some functionality to Python as a practical example.

Of course, if you have specific questions or suggestions for me to address in the coming parts, please let me know.

Jason Tackaberry (tack@linux.com) works in Ontario, Canada as a Unix/Network administrator. He is the author of ORBit-Python, Python bindings for ORBit, and several soon to be released projects. Having over 12 years of development experience in C and C++, and hacking with Perl for 4 years, he has turned to Python as his new favorite language.




