
practical-python

Contents | Previous (6.1 Iteration Protocol) | Next (6.3 Producer/Consumer)

6.2 Customizing Iteration

This section looks at how you can customize iteration using a generator function.

A problem

Suppose you wanted to create your own custom iteration pattern.

For example, a countdown.

>>> for x in countdown(10):
...   print(x, end=' ')
...
10 9 8 7 6 5 4 3 2 1
>>>

There is an easy way to do this.

Generators

A generator is a function that defines iteration.

def countdown(n):
    while n > 0:
        yield n
        n -= 1

For example:

>>> for x in countdown(10):
...   print(x, end=' ')
...
10 9 8 7 6 5 4 3 2 1
>>>

A generator is any function that uses the yield statement.

A generator function behaves differently from a normal function. Calling it creates a generator object; it does not immediately execute the function body.

def countdown(n):
    # Added a print statement
    print('Counting down from', n)
    while n > 0:
        yield n
        n -= 1

>>> x = countdown(10)
# Nothing is printed: the function body has not run yet
>>> x
# x is a generator object
<generator object at 0x58490>
>>>

The function only starts executing on the first __next__() call.

>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>> x.__next__()
Counting down from 10
10
>>>

yield produces a value and then suspends the function. The function resumes, with its local variables intact, on the next call to __next__().

>>> x.__next__()
9
>>> x.__next__()
8
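The lazy start and the suspend/resume behavior can be made visible by recording when the function body actually runs. The following is a minimal sketch, not part of the original exercise: the log parameter is a hypothetical addition used only for tracing, and the built-in next(gen) is used as the usual way to call gen.__next__().

```python
def countdown(n, log):
    # `log` is a hypothetical extra parameter that records when the body runs
    log.append('started')
    while n > 0:
        yield n
        log.append(f'resumed after {n}')
        n -= 1

log = []
gen = countdown(3, log)
before = list(log)         # snapshot: creating the generator ran none of the body

first = next(gen)          # runs the body up to the first yield
after_first = list(log)

second = next(gen)         # resumes right after the yield, then runs to the next one
after_second = list(log)

print(before)              # []
print(first, after_first)  # 3 ['started']
print(second, after_second)  # 2 ['started', 'resumed after 3']
```

Note that n keeps its value between calls; the generator object holds the suspended function's local state.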

When the generator function finally returns, the next call to __next__() raises a StopIteration exception.

>>> x.__next__()
1
>>> x.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Observation: A generator function implements the same low-level protocol that the for statement uses on lists, tuples, dicts, files, etc.
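To make the observation concrete, here is a rough sketch of what a for statement does behind the scenes. The helper name manual_for is hypothetical and exists only for illustration; it works the same way on a generator and on a list.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

def manual_for(iterable):
    """Collect items the way a for loop would visit them (illustrative helper)."""
    items = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # signals that iteration is finished
            break
        items.append(item)
    return items

print(manual_for(countdown(3)))      # [3, 2, 1]
print(manual_for(['a', 'b']))        # ['a', 'b']
```

The for statement catches StopIteration for you, which is why you never see the exception in ordinary loops.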

Exercises

Exercise 6.4: A Simple Generator

If you ever find yourself wanting to customize iteration, think of generator functions first. They’re easy to write: make a function that carries out the desired iteration logic and use yield to emit values.

For example, try this generator that searches a file for lines containing a matching substring:

>>> def filematch(filename, substr):
        with open(filename, 'r') as f:
            for line in f:
                if substr in line:
                    yield line

>>> for line in open('Data/portfolio.csv'):
        print(line, end='')

name,shares,price
"AA",100,32.20
"IBM",50,91.10
"CAT",150,83.44
"MSFT",200,51.23
"GE",95,40.37
"MSFT",50,65.10
"IBM",100,70.44
>>> for line in filematch('Data/portfolio.csv', 'IBM'):
        print(line, end='')

"IBM",50,91.10
"IBM",100,70.44
>>>
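A generator like filematch() can feed anything that consumes an iterable, not just a for-loop. The sketch below is self-contained so it can run without Data/portfolio.csv: it writes a few sample rows (assumed stand-in data in the same layout) to a temporary file, then passes the generator to list() and to sum().

```python
import os
import tempfile

def filematch(filename, substr):
    with open(filename, 'r') as f:
        for line in f:
            if substr in line:
                yield line

# Stand-in data in the same layout as Data/portfolio.csv (assumed for this sketch)
sample = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n"IBM",100,70.44\n'
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)

try:
    # list() drives the generator to completion and collects the lines
    matches = list(filematch(tmp.name, 'IBM'))
    # A generator expression can consume the matching lines one at a time
    total_shares = sum(int(line.split(',')[1])
                       for line in filematch(tmp.name, 'IBM'))
finally:
    os.remove(tmp.name)

print(matches)        # ['"IBM",50,91.10\n', '"IBM",100,70.44\n']
print(total_shares)   # 150
```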

This is an interesting idea: you can hide a bunch of custom processing in a function and use it to feed a for-loop. The next example looks at a more unusual case.

Exercise 6.5: Monitoring a streaming data source

Generators can be an interesting way to monitor real-time data sources such as log files or stock market feeds. In this part, we’ll explore this idea. To start, follow the next instructions carefully.

The program Data/stocksim.py simulates stock market data. As output, it continuously writes real-time data to the file Data/stocklog.csv. In a separate command window, go into the Data/ directory and run this program:

bash % python3 stocksim.py

If you are on Windows, just locate the stocksim.py program and double-click on it to run it. Now, forget about this program (just let it run). Using another window, look at the file Data/stocklog.csv being written by the simulator. You should see new lines of text being added to the file every few seconds. Again, just let this program run in the background—it will run for several hours (you shouldn’t need to worry about it).

Once the above program is running, let’s write a little program to open the file, seek to the end, and watch for new output. Create a file follow.py and put this code in it:

# follow.py
import os
import time

f = open('Data/stocklog.csv')
f.seek(0, os.SEEK_END)   # Move file pointer 0 bytes from end of file

while True:
    line = f.readline()
    if line == '':
        time.sleep(0.1)   # Sleep briefly and retry
        continue
    fields = line.split(',')
    name = fields[0].strip('"')
    price = float(fields[1])
    change = float(fields[4])
    if change < 0:
        print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')

If you run the program, you’ll see a real-time stock ticker. Under the hood, this code is kind of like the Unix tail -f command that’s used to watch a log file.

Note: Using the readline() method is not the usual way of reading lines from a file (normally you would just use a for-loop). However, in this case, we are using it to repeatedly probe the end of the file to see if more data has been added (readline() returns either new data or an empty string).
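The probing behavior described in the note can be checked in isolation. This is a small sketch using a temporary file as a stand-in for Data/stocklog.csv; the variable names are illustrative only.

```python
import os
import tempfile

# Create a file that already has some content
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('old line\n')
path = tmp.name

reader = open(path)
reader.seek(0, os.SEEK_END)        # skip everything already in the file
first_probe = reader.readline()    # nothing new yet, so an empty string

# Simulate another program appending to the file
with open(path, 'a') as writer:
    writer.write('new line\n')

second_probe = reader.readline()   # the appended line is now visible
third_probe = reader.readline()    # back at the end of the file again

reader.close()
os.remove(path)

print(repr(first_probe))    # ''
print(repr(second_probe))   # 'new line\n'
print(repr(third_probe))    # ''
```

Because an empty string only means "no data right now", the loop in follow.py sleeps briefly and tries again instead of stopping.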

Exercise 6.6: Using a generator to produce data

If you look at the code in Exercise 6.5, the first part of the code is producing lines of data whereas the statements at the end of the while loop are consuming the data. A major feature of generator functions is that you can move all of the data production code into a reusable function.

Modify the code in Exercise 6.5 so that the file-reading is performed by a generator function follow(filename). Make it so the following code works:

>>> for line in follow('Data/stocklog.csv'):
          print(line, end='')

... Should see lines of output produced here ...

Modify the stock ticker code so that it looks like this:

if __name__ == '__main__':
    for line in follow('Data/stocklog.csv'):
        fields = line.split(',')
        name = fields[0].strip('"')
        price = float(fields[1])
        change = float(fields[4])
        if change < 0:
            print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')

Exercise 6.7: Watching your portfolio

Modify the follow.py program so that it watches the stream of stock data and prints a ticker showing information for only those stocks in a portfolio. For example:

if __name__ == '__main__':
    import report

    portfolio = report.read_portfolio('Data/portfolio.csv')

    for line in follow('Data/stocklog.csv'):
        fields = line.split(',')
        name = fields[0].strip('"')
        price = float(fields[1])
        change = float(fields[4])
        if name in portfolio:
            print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')

Note: For this to work, your Portfolio class must support the in operator. See Exercise 6.3 and make sure you implement the __contains__() method.
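As a reminder of what supporting the in operator involves, here is a minimal stand-in, not the Portfolio class from the earlier exercises. It assumes, purely for illustration, that holdings are stored as a list of dictionaries with a 'name' key; your own class may store them differently.

```python
class Portfolio:
    # Hypothetical minimal stand-in for the course's Portfolio class
    def __init__(self, holdings):
        self._holdings = holdings      # assumed: list of dicts with a 'name' key

    def __iter__(self):
        return iter(self._holdings)

    def __contains__(self, name):
        # Called by the expression `name in portfolio`
        return any(s['name'] == name for s in self._holdings)

portfolio = Portfolio([
    {'name': 'IBM', 'shares': 50},
    {'name': 'GE', 'shares': 95},
])

print('IBM' in portfolio)    # True
print('AAPL' in portfolio)   # False
```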

Discussion

Something very powerful just happened here. You moved an interesting iteration pattern (reading lines at the end of a file) into its own little function. The follow() function is now a completely general-purpose utility that you can use in any program. For example, you could use it to watch server logs, debugging logs, and other similar data sources. That’s kind of cool.
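For reference, one possible shape for follow() is sketched below, together with a small self-contained demonstration that does not need the stock simulator. Treat it as one way to write the function rather than the definitive solution. The poll_interval parameter, the demo() helper, and the background writer thread are all assumptions added for illustration.

```python
import itertools
import os
import tempfile
import threading
import time

def follow(filename, poll_interval=0.1):
    # poll_interval is an illustrative parameter, not part of the exercise
    with open(filename) as f:
        f.seek(0, os.SEEK_END)             # start at the current end of the file
        while True:
            line = f.readline()
            if line == '':
                time.sleep(poll_interval)  # no new data yet: wait and retry
                continue
            yield line

def demo():
    """Follow a temporary file while a background thread appends to it."""
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        tmp.write('existing line\n')       # present before follow() starts, so skipped
    stop = threading.Event()

    def writer():
        n = 0
        while not stop.is_set():
            with open(tmp.name, 'a') as out:
                out.write(f'event {n}\n')
            n += 1
            time.sleep(0.02)

    thread = threading.Thread(target=writer, daemon=True)
    thread.start()
    gen = follow(tmp.name, poll_interval=0.01)
    try:
        # follow() never ends on its own, so take only the first three lines
        return list(itertools.islice(gen, 3))
    finally:
        gen.close()                        # closes the file held by the generator
        stop.set()
        thread.join()
        os.remove(tmp.name)

if __name__ == '__main__':
    print(demo())
```

Since follow() loops forever, the consumer decides when to stop. Here itertools.islice() takes three lines, and gen.close() then shuts the generator down so the with block can close the file.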
