Dynamic Programming Languages (DYPL)

Getting Started With Python

Please note that this introduction is written for Python 2.7.

By popular demand from previous students, we have put together a small collection of exercises to get you started with Python and Ruby. We start with some of the basic stuff like getting to know the interactive environments and converting between different data types, and then move on to simple functions, classes, tools and a graphical user interface.

While you don't need to do exactly these exercises to pass the course, they are a good starting point to learn the language.

Familiarise Yourself with the Interactive Environments

Start the interactive Python interpreter by typing python on the command line. If installed, you might also want to try the IPython shell, started by issuing ipython at the terminal. (One nice IPython feature is tab completion of method names.)

When Python starts, it will look something like this:

			Python 2.4.3 (#1, Dec  9 2006, 21:49:24) 
			[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
			Type "help", "copyright", "credits" or "license" for more 
			information.
			>>>

Now it is time to play around with the interactive language. If you have not seen something like this before, take ample time to be blown away. If your computer is set up properly, you should have readline support, which means Emacs-style editing commands should work fine.

Let's start by evaluating a few simple expressions and watch how the terminal behaves.

			1. 42
			2. "Hello World"
			3. x (this should raise an error)
			4. [] 
			5. [1, 2, 3, 4] 
			6. x = "Deeo" 
			7. x 
			8. y = { 1 : "First", "Second" : 2 }

Ok, those were a few Python literals and expressions. Use Google or your books if you have trouble understanding what happens. If all else fails, note down questions for the Python tutoring session (T1). Then, let's do something a little more complicated.

			 1. 4+47 
			 2. 3245434*23423434 
			 3. 10/3 
			 4. 10/2.0 
			 5. 4**2 
			 6. "==" * 20 
			 7. 20 * "==" 
			 8. y

If y raises an error, type y = { 1 : "First", "Second" : 2 } again. This creates a dictionary object (a hash table) with 1 and "Second" as keys, mapped to the values "First" and 2 respectively. To get a feel for what you can do with dictionaries, use the dir BIF on it to list its methods (BIF is a common abbreviation for built-in function). Go ahead:

			 1. dir(y)

The result should look something like this:

			 ['__class__', '__cmp__', '__contains__', '__delattr__',
			 '__delitem__', '__doc__', '__eq__', '__ge__',
			 '__getattribute__', '__getitem__', '__gt__', '__hash__',
			 '__init__', '__iter__', '__le__', '__len__', '__lt__', 
			 '__ne__', '__new__', '__reduce__', '__reduce_ex__', 
			 '__repr__', '__setattr__', '__setitem__', '__str__',
			 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 
			 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 
			 'popitem', 'setdefault', 'update',
			 'values']

So, let's go ahead and use it.

			 1. y.keys()
			 2. y.values()
			 3. y[1]
			 4. y["Second"]
			 5. y["Second"] += 1
			 6. y["Second"]
			 7. y

This simple piece of code iterates over all keys of y and prints the corresponding values:

			 for key in y.keys():
			   print y[key]

A nicer way of printing the hashmap would be something like this:

			 for key in y.keys(): 
			   print key, " --> ", y[key]

Even though in this case it will make the code messier, let's take this opportunity to show off Python's string interpolation mechanism:

			   for key in y.keys(): 
			     print "%s --> %s" % ( key, y[key] )
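The % operator can also take a dictionary instead of a tuple, in which case each placeholder names a key. A small sketch (the function names are just for illustration; print() is used so that it also runs under Python 3):

```python
from __future__ import print_function

def format_entry(key, value):
    # Positional interpolation: the tuple supplies the values in order
    return "%s --> %s" % (key, value)

def format_entry_named(key, value):
    # Named interpolation: %(k)s looks up "k" in the dictionary
    return "%(k)s --> %(v)s" % {"k": key, "v": value}

y = {1: "First", "Second": 2}
for key in sorted(y, key=str):   # key=str makes the mixed-type keys sortable
    print(format_entry(key, y[key]))
```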

Python lists and hash maps are extremely powerful, lightweight data structures. You'll find that a lot of the time there is no need to create classes to hold data; lists and hash maps will do fine.
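For example, a handful of records can live in a list of dictionaries and be ordered with the sorted BIF. A minimal sketch with made-up data:

```python
# Each "record" is a plain dictionary; no class definition is needed
students = [
    {"name": "Ada", "points": 42},
    {"name": "Tobias", "points": 37},
    {"name": "Grace", "points": 45},
]

# Sort the list of dictionaries on one of their keys, highest first
by_points = sorted(students, key=lambda s: s["points"], reverse=True)

for s in by_points:
    print("%-8s %3d" % (s["name"], s["points"]))
```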

Before ending this installment, let's do this:

			 1. y.keys

What you get back when you omit the parentheses is the method itself, bound to the object y, rather than the result of calling it. Thus you can do this:

			 1. m = y.keys
			 2. m()
			 3. len(m())

Last, let's examine the __doc__ attribute. Many Python objects have a __doc__ attribute that contains a documentation string (docstring) for the object. For example, the docstring for the keys() method is "D.keys() -> list of D's keys".

			 1. m.__doc__
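Your own functions get a docstring when their body starts with a string literal; typing help(square) in the interpreter shows the same text. A minimal sketch (square is just an illustrative name):

```python
def square(x):
    """square(x) -> x * x"""
    return x * x

print(square.__doc__)
```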

Now you are ready to start converting between data types. Move along.

Converting Between Different Data Types

Python has the following built-in functions (BIFs) for data conversion. Conversion yields a new object; it does not modify the original object in place.

			 
			   * ord(char) (ordinal value for a character)
			   * chr(int) (character for an ordinal value)
			   * str(obj) (converts anything to a string, equivalent to 
			   Java's toString())
			   * repr(obj) (returns a string representation, usually 
			   valid Python code)
			   * int(str, [base]) (converts a string to an int, e.g., 
			   from "23" to 23)
			   * float(str) (as above but for floats)
			   * long(str, [base]) (as above but for longs)
			   * hex(int) (converts an integer to a hexadecimal string)
			   * oct(int) (converts an integer to an octal string)
			   * eval(str) (convert from string to Python object, with

Here, [...] marks optional parameters.
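A few of the conversions above, written as assertions so you can paste them into the interpreter. oct() is left out because its output differs between Python 2.7 ('010' for 8) and Python 3 ('0o10'), and long() exists only in Python 2:

```python
# Characters and their ordinal values
assert ord("A") == 65
assert chr(65) == "A"

# Strings to numbers; int() takes an optional base
assert int("23") == 23
assert int("ff", 16) == 255
assert float("2.5") == 2.5

# Numbers to strings
assert str(42) == "42"
assert hex(255) == "0xff"

# str() is meant for humans, repr() looks like Python source
text = "it's"
print(str(text))
print(repr(text))
```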

Ordinals and Characters

Using the for loop construct outlined above, write a small loop, still in the interpreter, that prints the ASCII values of the letters A-Z. The Python BIF range comes to our aid; it works like this:

			 
			   >>> range(0,10)
			   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The for loop we used above is a for-each loop, so with ord, chr, range and print you should be able to produce output like the following from a two-line program, but for capital letters:

			 
			a has ord 97
			b has ord 98
			c has ord 99
			...
			y has ord 121
			z has ord 122
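One way to produce the lowercase listing shown above, sketched with a list comprehension instead of an explicit for loop; adapting it to capital letters is the exercise:

```python
# range(ord("a"), ord("z") + 1) covers the ordinal values 97..122
lines = ["%s has ord %d" % (chr(c), c) for c in range(ord("a"), ord("z") + 1)]
print("\n".join(lines))
```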

String Conversion and Eval

The eval BIF is powerful and dangerous. It takes a string, evaluates it as a Python expression and returns the result. This means that you could, for example, read an expression from a text file on disk and evaluate it:

			   f = open("textfile")  # given that the file is
			                         # called exactly "textfile"
			   eval(f.read())        # the file must hold a single expression;
			                         # use exec for statements
			   f.close()
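Because eval runs arbitrary code, it should never be fed untrusted input. When the string is only supposed to contain data, ast.literal_eval from the standard library (available since Python 2.6) is a safer choice: it accepts literals only and raises ValueError for anything else. A small sketch:

```python
import ast

# eval() evaluates any expression, including function calls
assert eval("[1, 2] + [3]") == [1, 2, 3]

# ast.literal_eval() only accepts literals: strings, numbers, tuples,
# lists, dicts, booleans and None
data = ast.literal_eval("{'name': 'Tobias', 'langs': ['Python', 'Ruby']}")
print(data["langs"])

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("literal_eval refused to run a function call")
```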

Simple Functions

Still in the interactive Python environment, type the following (make sure not to omit the indentation of the third line):

			
			   MESSAGE = "Hello "
			   def greet(person):
			     print MESSAGE + person + "!"

This defines a function greet that takes a single argument, person, and prints the standard greeting message. Now go ahead and call the function:

			
			   greet("Tobias")

The dynamic typing allows even this:

			   greet([])

which of course makes the program blow up in our faces. (Can you see how you can use the tricks of the previous section to make the code always work?)

Let's now define a classic method -- factorial:

			def fac(n):
			  if n:
 			    return n * fac(n-1)
			  else:
			    return 1
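Since every recursive call uses a stack frame, fac eventually hits CPython's recursion limit (see sys.getrecursionlimit(), typically 1000), so fac(5000) fails with a RuntimeError in Python 2.7 (RecursionError in Python 3). An iterative version avoids that; fac_iter is a hypothetical name for this sketch:

```python
def fac_iter(n):
    """Iterative factorial for integers n >= 0 (no recursion involved)."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fac_iter(27))
print(len(str(fac_iter(5000))))   # number of decimal digits in 5000!
```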

Try it with really big numbers. You'll be surprised. The greet function is still around, so let's do this:

			
			   MESSAGE = "The result is: "
			   greet(fac(27))

This won't work unless you have fixed the greet function to deal with whatever kind of data you feed it.
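One possible fix uses str() from the conversion section, so that any object can be glued into the message. In this sketch, greeting is a hypothetical helper that builds the string (which makes it easy to test) and greet only prints it:

```python
MESSAGE = "Hello "

def greeting(person):
    # str() turns lists, numbers, None, ... into text before concatenation
    return MESSAGE + str(person) + "!"

def greet(person):
    print(greeting(person))

greet("Tobias")
greet([])
```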

A very nice Python feature, which reminds me of Smalltalk in its readability, is the use of parameter names in function calls (keyword arguments), which allows arguments to be passed in any order:

			   def example(fst, snd, thrd):
			     print fst, snd, thrd

			   example(snd = "dole", fst = "Ole", thrd = "doff")
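Keyword arguments combine well with default values, which let callers leave a parameter out entirely. A sketch of the same example with a default for thrd, returning the string instead of printing it:

```python
def example(fst, snd, thrd="doff"):
    # thrd has a default value, so callers may leave it out
    return "%s %s %s" % (fst, snd, thrd)

print(example("Ole", "dole"))                   # default used for thrd
print(example(snd="dole", fst="Ole"))           # keywords in any order
print(example("Ole", thrd="doff", snd="dole"))  # positional first, then keywords
```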

Exception handling works much as you would expect if you know Java:

			   def example(a, b):
			     try:
			       return a / b
			     except ZeroDivisionError:
			       print "Division by zero"
			       return 0

			   example(4,5)
			   example(4,0)
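Like Java's try/catch/finally, Python's try statement also has an else clause that runs only when no exception was raised, and raise plays the role of throw. A sketch with illustrative function names:

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        # Only the error we expect is handled; anything else propagates
        print("Division by zero")
        return 0
    else:
        # Runs only if the try block raised no exception
        return result
    finally:
        # Runs in every case, even after a return
        print("finished dividing %r by %r" % (a, b))

def checked_sqrt(x):
    # raise is Python's counterpart of Java's throw
    if x < 0:
        raise ValueError("negative input: %r" % x)
    return x ** 0.5
```

Note that in Python 2.7, safe_divide(4, 5) returns 0, since / on two ints is integer division; pass 4.0 (or use from __future__ import division) to get 0.8.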

Try to find out how to:

* Define an empty method (one that does nothing, i.e., has an empty body)
* Handle an arbitrary number of arguments (like e.g., print does)

Simple Utils

cat (simple file open/read)

The cat utility reads files sequentially, writing them to the standard output. Write your own cat utility that takes a number of file names as command line arguments and writes each file to standard output. If the -n option is specified, the lines should be numbered starting from 1 for each file, with the numbers right-aligned: if the file is 800 lines long, line 1 should be numbered __1 (where _ are spaces).

Feel free to play around more with cat, or run man cat for more information on the utility.
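A possible sketch of the -n numbering: the width of the largest line number decides how far each number is padded. The argument handling is deliberately naive (any -n anywhere turns numbering on), and the single space between number and text is an arbitrary choice:

```python
import sys

def number_lines(lines):
    """Prefix each line with its number, right-aligned to the widest number."""
    width = len(str(len(lines)))
    return ["%*d %s" % (width, i, line) for i, line in enumerate(lines, 1)]

def cat(filenames, numbered=False):
    for name in filenames:
        with open(name) as f:
            lines = f.read().splitlines()
        if numbered:
            lines = number_lines(lines)   # numbering restarts for each file
        for line in lines:
            print(line)

if __name__ == "__main__":
    args = sys.argv[1:]
    cat([a for a in args if a != "-n"], numbered="-n" in args)
```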

grep (regular expressions)

The command line tool grep is invoked like this:

			  grep PATTERN [FILE...]

When run, grep searches the named input FILEs (or standard input if no files are named, or if the file name - is given) for lines containing a match to the given PATTERN. By default, grep prints the matching lines. The pattern is a regular expression.

Re-using the skeleton code from cat, implement a grep utility of your own. Feel free to implement the additional bells and whistles that you find in grep too.

(What happens if PATTERN is a faulty regular expression?)
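A sketch of how the pieces might fit together. re.compile raises re.error when PATTERN is not a valid regular expression, so that is the place to catch it; the usage and error messages are made up for this example:

```python
from __future__ import print_function
import re
import sys

def grep(regex, lines):
    """Return the lines in which the compiled regex matches anywhere."""
    return [line for line in lines if regex.search(line)]

def main(args):
    if not args:
        print("usage: grep.py PATTERN [FILE...]", file=sys.stderr)
        return
    try:
        regex = re.compile(args[0])
    except re.error as e:
        # A faulty PATTERN such as "(" or "[a-" ends up here
        print("grep: invalid pattern %r: %s" % (args[0], e), file=sys.stderr)
        return
    for name in args[1:] or ["-"]:
        # "-" (or no file names at all) means standard input, as in grep
        f = sys.stdin if name == "-" else open(name)
        for line in grep(regex, f):
            sys.stdout.write(line)
        if f is not sys.stdin:
            f.close()

if __name__ == "__main__":
    main(sys.argv[1:])
```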

OS interaction

Rename

(This exercise will drill you more in the use of regular expressions, but also in the OS libraries for moving files etc.)

On many *nix systems there is a file-renaming utility called rename, written in Perl, which allows more advanced batch renaming. For example, to rename all files matching "*.bak" so that the extension is stripped, you might say

			    rename 's/\.bak$//' *.bak

To translate uppercase names to lower, you'd use

			    rename 'y/A-Z/a-z/' *

Read up on regular expressions if the above commands are not clear to you. Hints: s/a/b/ substitutes the first occurrence of a with b (s/a/b/g substitutes all occurrences). Further, y/ab/12/ transliterates every a to 1 and every b to 2. $ matches the end of the line (or the position just before a newline at the end). A good place to read is the man pages:

			    man rename
			    man perlre
			    man perlop
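In Python, the re module and os.rename cover both examples. A sketch with hypothetical helper names; substitute mirrors s/// without the g flag, and transliterate does not expand ranges such as A-Z, so the characters are spelled out via the string module:

```python
import os
import re
import string

def substitute(name, pattern, replacement):
    """Like Perl's s/pattern/replacement/: replace the first match only."""
    return re.sub(pattern, replacement, name, count=1)

def transliterate(name, source, target):
    """Like Perl's y/source/target/, with every character listed explicitly."""
    table = dict(zip(source, target))
    return "".join(table.get(c, c) for c in name)

def lowercase(name):
    """y/A-Z/a-z/ for ASCII letters."""
    return transliterate(name, string.ascii_uppercase, string.ascii_lowercase)

def rename_all(paths, new_name):
    """Apply new_name to the file name part of each path and rename."""
    for old in paths:
        directory, base = os.path.split(old)
        new = os.path.join(directory, new_name(base))
        if new != old and not os.path.exists(new):   # never clobber a file
            os.rename(old, new)

# rename 's/\.bak$//' *.bak  roughly becomes:
#     rename_all(paths, lambda n: substitute(n, r"\.bak$", ""))
# rename 'y/A-Z/a-z/' *      roughly becomes:
#     rename_all(paths, lowercase)
```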

Comparing File Trees

Using the Python libraries, create a utility to compare the contents of two directories (including subdirectories). The utility should be invoked like this:

			    beefheart$ ./dircmp OPTION Directory1 Directory2

where OPTION is either -b or --both, for finding the set of files that are in both directories, or -u or --unique, for finding the set of files in Directory2 that are not in Directory1.

Files are considered equivalent if they have the same name and size. If you want, you can extend the tool to also compute checksums for files of the same size and compare them to detect differences in content, or use some other method.
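A sketch of the name-and-size comparison built on os.walk. It assumes that "same name" means the same path relative to each directory's root; the function names are hypothetical, and the standard library's filecmp module (filecmp.dircmp) is also worth a look:

```python
import os

def file_index(root):
    """Map each file's path, relative to root, to its size in bytes."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = os.path.getsize(full)
    return index

def in_both(dir1, dir2):
    """-b/--both: relative paths present in both trees with equal size."""
    a, b = file_index(dir1), file_index(dir2)
    return sorted(p for p in a if p in b and a[p] == b[p])

def unique_to_second(dir1, dir2):
    """-u/--unique: relative paths in dir2 with no equivalent in dir1."""
    a, b = file_index(dir1), file_index(dir2)
    return sorted(p for p in b if p not in a or a[p] != b[p])
```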

Navigating the Python libraries is quite simple. Start by looking at the module index and go from there. As static type information is lacking, you will find that the library documentation looks a bit different from, say, JavaDoc. If you are interested in how well dynamic libraries can be documented, the Apple Objective-C Cocoa libraries are a good place to start.

Regular Expressions

The Windows .ini file format is very simple. Each file has 0 or more clauses. A clause starts with a name in the form [NAME-OF-CLAUSE], followed by 0 or more key-value pairs in the form key: value. Keys need not be unique, but they have a precedence order: a key that appears again overrides all previous occurrences.

Your task is to write a program that converts from .ini files to XML. For example:

			    [JAN GUILLOU]
			    arn: bad
			    hamilton: worse
			    fagerhult: great
			    arn: worst
			  
			    [PETER BRATT]
			    book: IB och hotet mot vår säkerhet
			    collaborations: Jan Guillou

Your program should generate this XML file (or equivalent):

Note that the duplicate key has been removed.
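A possible sketch using regular expressions. The element names ini, clause and entry are made up for this example; a repeated key keeps the position of its first occurrence but takes the value of the last one, and lines that are neither clause headers nor key: value pairs are silently skipped:

```python
import re
from collections import OrderedDict
from xml.sax.saxutils import escape, quoteattr

CLAUSE_RE = re.compile(r"^\[(.+)\]$")       # [NAME-OF-CLAUSE]
ENTRY_RE = re.compile(r"^([^:]+):\s*(.*)$")  # key: value

def parse_ini(lines):
    """Return an ordered mapping: clause name -> ordered key -> value."""
    clauses = OrderedDict()
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = CLAUSE_RE.match(line)
        if m:
            current = clauses.setdefault(m.group(1), OrderedDict())
            continue
        m = ENTRY_RE.match(line)
        if m and current is not None:
            # Assigning again overwrites the value: the last occurrence wins
            current[m.group(1).strip()] = m.group(2)
    return clauses

def to_xml(clauses):
    out = ["<ini>"]
    for name, entries in clauses.items():
        out.append("  <clause name=%s>" % quoteattr(name))
        for key, value in entries.items():
            out.append("    <entry key=%s>%s</entry>"
                       % (quoteattr(key), escape(value)))
        out.append("  </clause>")
    out.append("</ini>")
    return "\n".join(out)
```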

Classes

TBW

Class Dependency Graph

TBW

Parse Java code with regular expressions and create a dependency graph in DOT format.
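A rough sketch that only treats extends and implements as dependencies. A regular expression is not a Java parser: generics, nested classes and classes mentioned inside comments or strings will confuse it. The function names are hypothetical:

```python
import re

# Matches e.g. "class Foo extends Bar implements Baz, Qux {"
CLASS_RE = re.compile(
    r"\bclass\s+(\w+)"                    # class name
    r"(?:\s+extends\s+(\w+))?"            # optional superclass
    r"(?:\s+implements\s+([\w\s,]+?))?"   # optional list of interfaces
    r"\s*\{")

def dependency_edges(java_source):
    """Return (class, dependency) pairs found in the source text."""
    edges = []
    for m in CLASS_RE.finditer(java_source):
        name, parent, interfaces = m.group(1), m.group(2), m.group(3)
        if parent:
            edges.append((name, parent))
        if interfaces:
            for iface in interfaces.split(","):
                edges.append((name, iface.strip()))
    return edges

def to_dot(edges):
    """Render the edges as a DOT digraph, one line per edge."""
    lines = ["digraph dependencies {"]
    for src, dst in edges:
        lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)
```

The output can be rendered with Graphviz, for example dot -Tpng deps.dot -o deps.png.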

GUI: Editor & Previewer

TBW

Build an editor and previewer for reStructuredText (http://docutils.sourceforge.net/rst.html) that renders to HTML and shows the preview in a separate window. Try to handle errors too.

Misc

Write a Python program that parses a number of text files for words and counts the word frequency. The program should be started as follows:

			    ./wordfreq.py infile1 infile2 infileN

The program should then keep track of how many times each word occurs in the files (you need not keep track of which files the words were found in). The output of the program should be the unique set of all words encountered, each word on its own line, followed by its frequency:

Sample input (contents of a text file):

Sample output:

			    Donec 1
			    Lorem 2
			    adipiscing 1
			    amet 2
			    consectetuer 1
			    dolor 2
			    elit 1
			    hendrerit 1
			    ipsum 2
			    sit 2
			    tellus 1
			    tempor 1
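A minimal sketch of the word counter. It assumes a word is a run of letters, digits or underscores (\w+), counts case-sensitively, and sorts the output with plain string ordering, which puts capitalised words first as in the sample output:

```python
import re
import sys

WORD_RE = re.compile(r"\w+")   # assumption about what counts as a word

def count_words(texts):
    counts = {}
    for text in texts:
        for word in WORD_RE.findall(text):
            counts[word] = counts.get(word, 0) + 1
    return counts

def report(counts):
    # Plain string sorting puts "Lorem" before "amet"
    return ["%s %d" % (word, counts[word]) for word in sorted(counts)]

if __name__ == "__main__":
    texts = []
    for name in sys.argv[1:]:
        with open(name) as f:
            texts.append(f.read())
    for line in report(count_words(texts)):
        print(line)
```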

Addition

Rewrite the program to not use a standard Python collection, but one of your own. I suggest a binary tree, where the tree is ordered on the words and the frequency is the "data". Make sure your class supports all the methods and operators you used before (in Python this means defining special methods such as __getitem__ and __setitem__), so that ideally you only have to change the line where you first create the collection.
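A sketch of such a collection: an unbalanced binary search tree (there is no rebalancing, so sorted input degrades it to a linked list). The class name TreeMap is made up. It implements __getitem__, __setitem__, __contains__, __len__, __iter__ and get, so a counting loop written as counts[word] = counts.get(word, 0) + 1 should keep working when counts = {} becomes counts = TreeMap():

```python
class TreeMap(object):
    """Minimal unbalanced binary search tree mapping keys to values."""

    class _Node(object):
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def __init__(self):
        self._root = None
        self._size = 0

    def _find(self, key):
        node = self._root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def __getitem__(self, key):
        node = self._find(key)
        if node is None:
            raise KeyError(key)
        return node.value

    def get(self, key, default=None):
        node = self._find(key)
        return default if node is None else node.value

    def __contains__(self, key):
        return self._find(key) is not None

    def __setitem__(self, key, value):
        if self._root is None:
            self._root = self._Node(key, value)
            self._size += 1
            return
        node = self._root
        while True:
            if key == node.key:
                node.value = value          # existing key: update in place
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, self._Node(key, value))
                self._size += 1
                return
            node = child

    def __len__(self):
        return self._size

    def __iter__(self):
        # In-order traversal yields the keys in sorted order
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right
```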

Other additions

You may want to add command line arguments to your program. A -o flag would perhaps be appropriate; this flag would specify an output file other than stdout. Something like this:

			    ./wordfreq.py -o outfile infile1 infile2 infileN

Another flag could be one to have the output printed in ascending or descending order.

Hint: check out the getopt module.
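A sketch of the option parsing with getopt. Only -o comes from the text above; --output and -r/--reverse (for descending order) are hypothetical names chosen for this example:

```python
from __future__ import print_function
import getopt
import sys

def parse_args(argv):
    """Return (output file or None, descending flag, list of input files)."""
    opts, files = getopt.getopt(argv, "o:r", ["output=", "reverse"])
    output, descending = None, False
    for opt, value in opts:
        if opt in ("-o", "--output"):
            output = value          # write here instead of stdout
        elif opt in ("-r", "--reverse"):
            descending = True       # hypothetical flag: descending order
    return output, descending, files

if __name__ == "__main__":
    try:
        print(parse_args(sys.argv[1:]))
    except getopt.GetoptError as e:
        print("wordfreq: %s" % e, file=sys.stderr)
```

getopt raises getopt.GetoptError for unknown options or a missing argument to -o, which is why the call is wrapped in try.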

Links

The following links might also be helpful (both for beginners and advanced users). Links collected by Louppe Gilles.

			    - (Official) Python documentation:
			      http://docs.python.org/index.html
			    - (Official) Python tutorial:
			      http://docs.python.org/tutorial/index.html
			    - (Official) Python standard library reference:
			      http://docs.python.org/library/
			    - PEP8 (Style Guide for Python):
			      http://www.python.org/dev/peps/pep-0008/
			    - Dive Into Python (a very practical introduction to Python):
			      http://www.diveintopython.org/