Introducing Programming a Different Way

Our quick introduction to Python is the module I'm least happy with, so I've been thinking about how to re-design it. I've included a new outline below; comments would be very welcome.


Programming is what you do when you can't find an off-the-shelf tool to do what you want

Why is programming hard to teach/learn?

We will teach basic programming by example

We will use Python

Will not start with multimedia programming, 3D graphics, etc.

We assume that you've done some programming, in some language, at some point

Before we dive in, what is a program?


Programs store data and do calculations

Put the following in a text file (not Word) and run it

# Convert temperature in Fahrenheit to Kelvin.
temp_in_f = 98.6
temp_in_k = (temp_in_f - 32.0) * (5.0 / 9.0) + 273.15
print "body temperature in Kelvin:", temp_in_k
body temperature in Kelvin: 310.15

Variable is a name that labels a value (picture)

Created by assignment

Usual rules of arithmetic: * before +, parentheses

Print displays values


Need to know it: what happens if we use "5/9" instead of "5.0/9.0"

# Convert temperature in Fahrenheit to Kelvin.
temp_in_f = 98.6
temp_in_k = (temp_in_f - 32.0) * (5 / 9) + 273.15  # this line is different
print "body temperature in Kelvin:", temp_in_k
body temperature in Kelvin: 273.15

Run interpreter, try 5/9, get 0

Integer vs. float, and what division does

Automatic conversion: 5.0/9 does the right thing

[Box] Why are so many decimal places shown in 5.0/9?
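The integer versus float distinction can be checked directly. This is a minimal sketch that assumes Python 3, where / always gives a float and // reproduces the truncating result that Python 2's / gives for two integers in the listing above. The names int_ratio, float_ratio, wrong_k and right_k are illustrative only.

```python
# Integer (floor) division versus floating-point division.
# Assumption: Python 3, where 5 // 9 behaves like Python 2's 5 / 9.
int_ratio = 5 // 9        # both operands are integers: the fraction is discarded
float_ratio = 5.0 / 9     # one float operand is enough to get a float result

temp_in_f = 98.6
wrong_k = (temp_in_f - 32.0) * int_ratio + 273.15
right_k = (temp_in_f - 32.0) * float_ratio + 273.15

print(int_ratio)
print(float_ratio)
print(wrong_k)
print(right_k)
```

Multiplying by an integer ratio of 0 wipes out the Fahrenheit term, which is why the broken program reports 273.15.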


Need to know it: sometimes Python doesn't know what to do

# Try adding numbers and strings.
print "2 + 3:", 2 + 3
print "two + three:", "two" + "three"
print "2 + three:", 2 + "three"
2 + 3: 5
two + three: twothree
2 + three:
Traceback (most recent call last):
  File "add-numbers-strings.py", line 5, in <module>
    print "2 + three:", 2 + "three"
TypeError: unsupported operand type(s) for +: 'int' and 'str'

In this case, "2three" would be sensible

But what about "1" + 2?

On your own, try "two" * 3
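One way to resolve the ambiguity is to say explicitly which conversion is wanted. The sketch below uses the built-in str and int functions and is written to run under Python 3 (print as a function); the variable names are made up for illustration.

```python
# Explicit conversion removes the ambiguity between numbers and text.
as_text = str(2) + "three"     # convert the number to text, then concatenate
as_number = int("1") + 2       # convert the text to a number, then add

print(as_text)
print(as_number)

# Without a conversion, Python refuses to guess.
try:
    "1" + 2
except TypeError as error:
    print("TypeError: " + str(error))
```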


Back to useful things

Computers are useful because they can do lots of calculations on lots of data

Which means we need a concise way to represent multiple values and multiple steps

# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
number = 0
for value in data:
    total = total + value
    number = number + 1
mean = total / number
print "mean is", mean
mean is 2

Use list to store multiple values

Use loop to perform multiple operations

Can trace execution step by step manually or in a debugger
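Tracing by hand can be imitated in code by recording the variables after each pass through the loop. This is only a sketch: the shortened data list and the trace variable are assumptions made to keep the output small, and print is called as a function so the code runs under Python 3.

```python
# Trace the loop by recording the state after every iteration.
data = [1, 4, 2]          # a shorter list keeps the trace readable
total = 0
number = 0
trace = []
for value in data:
    total = total + value
    number = number + 1
    trace.append((value, total, number))   # (value, total so far, count so far)

for step in trace:
    print(step)
```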


Did you notice that the result in the example above is wrong?

Problem is that total starts as an integer and we only ever add integers to it, so we wind up doing int/int (again)

Could fix it by initializing total to 0.0

Or use a function to do the conversion explicitly

# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
number = 0
for value in data:
    total = total + value
    number = number + 1
mean = float(total) / number   # this line has changed
print "mean is", mean
mean is 2.77777777778

Functions do what they do in mathematics

Spend a whole chapter on them, since they're key to building large programs

Right now, most important lesson is that just because a program runs, doesn't mean it's correct
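The first fix mentioned above, initializing total to 0.0, is not shown in the listings. A minimal sketch of it follows; it should behave the same under Python 2 and Python 3, and the print call is written with a single argument so that it runs on either.

```python
# Find the mean, starting the running total as a float.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0.0               # a float from the start, so every later sum is a float
number = 0
for value in data:
    total = total + value
    number = number + 1
mean = total / number     # float / int, so no truncation
print("mean is " + str(mean))
```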


Need to know it: the len function

# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
for value in data:
    total = total + value
mean = float(total) / len(data) # this line has changed
print "mean is", mean
mean is 2.77777777778

Need to know it: lists are mutable

# Calculate running sum by creating new list.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
result = []
current = 0
for value in data:
    current = current + value
    result.append(current)
print "running total:", result
running total: [1, 5, 7, 10, 13, 17, 20, 24, 25]

Start with the empty list

result.append is a method
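A small sketch of what "mutable" and "method" mean in practice: append is called on one particular list and changes that list in place. The lengths list is an illustrative helper, and print is used as a function so the code runs under Python 3.

```python
# append is called on a particular list and changes that list in place.
result = []                  # start with the empty list
lengths = [len(result)]      # record the length before and after each call
result.append(10)
lengths.append(len(result))
result.append(20)
lengths.append(len(result))

print(result)
print(lengths)
```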


How to double the values in place?

# Try to double the values in place.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
for value in data:
    value = 2 * value
print "doubled data is:", data
doubled data is: [1, 4, 2, 3, 3, 4, 3, 4, 1]

New values are being created, but never assigned to list elements

Easiest to understand with a picture
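Until the picture is drawn, the same point can be made in code by keeping the new values somewhere visible. In this sketch the doubled list is an addition for illustration; it shows that the doubled values do exist while the original list is left alone. It assumes Python 3 for the print calls.

```python
# Assigning to the loop variable does not change the list.
data = [1, 4, 2]
doubled = []
for value in data:
    value = 2 * value      # rebinds the name 'value'; the list element is untouched
    doubled.append(value)  # keep the new value so we can see it existed

print(doubled)
print(data)
```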


Need to know it: list indexing

Mathematicians use subscripts, we use square brackets

Index from 0..N-1 rather than 1..N for reasons that made sense in 1970 and have become customary since
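A short sketch of what 0..N-1 means for a three-element list: index 0 is the first value, index N-1 is the last, and index N is an error. The names first, last and past_end are illustrative, and the code assumes Python 3 for the print calls.

```python
# Valid indices for a list of length N run from 0 to N-1.
data = [1, 4, 2]
first = data[0]                # indices start at 0
last = data[len(data) - 1]     # so the last valid index is N-1

# Index N is one past the end of the list.
try:
    data[len(data)]
    past_end = "no error"
except IndexError as error:
    past_end = "IndexError: " + str(error)

print(first)
print(last)
print(past_end)
```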

# Double the values in place, one index at a time.
data = [1, 4, 2]
data[0] = 2 * data[0]
data[1] = 2 * data[1]
data[2] = 2 * data[2]
print "doubled data is:", data
doubled data is: [2, 8, 4]

Clearly doesn't scale...

Need to get all the indices for a list of length N


The range function produces a list of numbers from 0..N-1

You will almost never be the first person to need something

# Double the values in a list in place
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
length = len(data) # 9
indices = range(length) # [0, 1, 2, 3, 4, 5, 6, 7, 8]
for i in indices:
    data[i] = 2 * data[i]
print "doubled data is:", data
doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]

Fold this together by combining function calls (like \sqrt{\sin(x)})

# Double the values in a list in place.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
for i in range(len(data)):
    data[i] = 2 * data[i]
print "doubled data is:", data
doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]

Usually won't type in our data

Store it outside program

# Count the number of lines in a file
reader = open("data.txt", "r")
number = 0
for line in reader:
    number = number + 1
reader.close()
print number, "lines in file"
9 lines in file

What about mean?

# Find the mean.
reader = open("data.txt", "r")
total = 0.0
number = 0
for line in reader:
    total = total + line
    number = number + 1
reader.close()
print "mean is", total / number
Traceback (most recent call last):
  File "mean-read-broken.py", line 7, in <module>
    total = total + line
TypeError: unsupported operand type(s) for +: 'float' and 'str'

Data in file is text, so we need to convert


# Find the mean.
reader = open("data.txt", "r")
total = 0.0
number = 0
for line in reader:
    value = float(line)
    total = total + value
    number = number + 1
reader.close()
print "mean is", total / number
mean is 2.77777777778

Notice that we're using the original program as an oracle
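The oracle idea can be made explicit by running both versions and comparing the answers. The sketch below makes several assumptions: the file holds one value per line, it is written to a temporary directory rather than read from an existing data.txt, and the code targets Python 3.

```python
import os
import tempfile

# Oracle: the mean computed from the in-memory list.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
expected_total = 0.0
for value in data:
    expected_total = expected_total + value
expected = expected_total / len(data)

# Assumption: the data file holds one value per line.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.txt")
writer = open(path, "w")
for value in data:
    writer.write(str(value) + "\n")
writer.close()

# Same loop as the file-based program.
reader = open(path, "r")
total = 0.0
number = 0
for line in reader:
    total = total + float(line)   # float() tolerates the trailing newline
    number = number + 1
reader.close()
mean = total / number

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(folder)

print("file mean matches oracle: " + str(abs(mean - expected) < 1e-9))
```

Comparing with a small tolerance rather than == is a deliberate choice, since floating-point results can differ in the last digit.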


Real-world data is never clean

Count how many scores were not between 0 and 5

# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if value < 0:
        num_outliers = num_outliers + 1
    if value > 5:
        num_outliers = num_outliers + 1
print num_outliers, "values out of range"
3 values out of range

Need to know it: combine tests using and and or

# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if (value < 0) or (value > 5):
        num_outliers = num_outliers + 1
print num_outliers, "values out of range"
3 values out of range
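The listing above uses or; the complementary test uses and. As a sketch, counting the values that are in range and subtracting from the total should give the same number of outliers. The name num_in_range is introduced here for illustration, and the print call assumes Python 3.

```python
# Count values inside the range with 'and', then derive the outliers.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_in_range = 0
for value in data:
    if (value >= 0) and (value <= 5):    # both tests must be true
        num_in_range = num_in_range + 1
num_outliers = len(data) - num_in_range
print(str(num_outliers) + " values out of range")
```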

Need to know it: in-place operators

# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if (value < 0) or (value > 5):
        num_outliers += 1
print num_outliers, "values out of range"
3 values out of range

Don't actually "need" to know it

But it's a common idiom in many languages
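The same shorthand exists for the other arithmetic operators. A minimal sketch, with made-up variable names and print written for Python 3:

```python
# Each in-place operator is shorthand for "apply the operator, then reassign".
count = 0
count += 1        # same as count = count + 1
scale = 3
scale *= 2        # same as scale = scale * 2
remaining = 10
remaining -= 4    # same as remaining = remaining - 4

print(count)
print(scale)
print(remaining)
```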


Data cleanup

# Report where values are not monotonically increasing.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
for i in range(1, len(data)):   # start at 1 so the first pair (0, 1) is checked
    if data[i] < data[i-1]:
        print "failure:", i
failure: 8

Group by threes

# Combine successive triples of data.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
result = []
for i in range(0, len(data), 3):
    sum = data[i] + data[i+1] + data[i+2]
    result.append(sum)
print "grouped data:", result
Traceback (most recent call last):
  File "group-by-threes-fails.py", line 6, in <module>
    sum = data[i] + data[i+1] + data[i+2]
IndexError: list index out of range

13 values = 4 groups of 3 and 1 left over

First question must be, what's the right thing to do scientifically?
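To see where the failure comes from, it helps to list the starting indices that the three-argument form of range (start, stop, step) produces. This sketch assumes Python 3, where range must be wrapped in list() to display its values; starts, last_start and values_left are illustrative names.

```python
# Where does each group of three start, and how many values remain at the end?
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
starts = list(range(0, len(data), 3))     # start at 0, stop before 13, step by 3
last_start = starts[len(starts) - 1]
values_left = len(data) - last_start      # values available to the last group

print(starts)
print(values_left)
```

The last group starts at index 12 with only one value left, so data[i+1] is already out of range.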


Let's assume, "Add up as many as are there"

# Combine successive triples of data.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
result = []
for i in range(0, len(data), 3):
    sum = data[i]
    if (i+1) < len(data):
        sum += data[i+1]
    if (i+2) < len(data):
        sum += data[i+2]
    result.append(sum)
print "grouped data:", result
grouped data: [5, 11, 16, 20, 8]

But this is clumsy


How do we add up the first three, or as many as are there?

Don't want to have to keep modifying the list as we try out ideas

So use a list of lists.

# Add up the first three, or as many as are there.
test_cases = [[],                     # no data at all
              [10],                   # just one value
              [10, 20],               # two values
              [10, 20, 30],           # three
              [10, 20, 30, 40]]       # more than enough

for data in test_cases:
    print data
[]
[10]
[10, 20]
[10, 20, 30]
[10, 20, 30, 40]

Can now try all our tests by running one program


Back to our original problem: sum of at most the first three

# Sum up at most the first three values.
test_cases = [[],                     # no data at all
              [10],                   # just one value
              [10, 20],               # two values
              [10, 20, 30],           # three
              [10, 20, 30, 40]]       # more than enough

for data in test_cases:
    limit = min(3, len(data))
    sum = 0
    for i in range(limit):
        sum += data[i]
    print data, "=>", sum
[] => 0
[10] => 10
[10, 20] => 30
[10, 20, 30] => 60
[10, 20, 30, 40] => 60

That looks right


Need one more tool: nested loops

# Loops can run inside loops.
for i in range(4):
    for j in range(i):
        print i, j
1 0
2 0
2 1
3 0
3 1
3 2

Easiest to understand with a picture
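In place of the picture, the pairs visited by the nested loops can be collected into a list and inspected. The pairs list is an addition for illustration, and the code assumes Python 3 for the print call.

```python
# Record every (i, j) pair the nested loops visit.
pairs = []
for i in range(4):
    for j in range(i):        # the inner loop runs i times, so not at all when i is 0
        pairs.append((i, j))

print(pairs)
```

Nothing is recorded for i equal to 0 because range(0) is empty, which is why the output above starts at 1 0.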


Final step: instead of starting at zero every time, start at 0, 3, 6, 9, etc.

Don't need to test everything (which is why we skip from 40 to 60 to 80)

# Sum up in groups of three.
test_cases = [[],
              [10],
              [10, 20],
              [10, 20, 30],
              [10, 20, 30, 40],
              [10, 20, 30, 40, 50, 60],
              [10, 20, 30, 40, 50, 60, 70, 80]]

for data in test_cases:
    result = []
    for i in range(0, len(data), 3):
        limit = min(i+3, len(data))
        sum = 0
        for j in range(i, limit):
            sum += data[j]
        result.append(sum)
    print data, "=>", result
[] => []
[10] => [10]
[10, 20] => [30]
[10, 20, 30] => [60]
[10, 20, 30, 40] => [60, 40]
[10, 20, 30, 40, 50, 60] => [60, 150]
[10, 20, 30, 40, 50, 60, 70, 80] => [60, 150, 150]

Understand this in pieces

Outer for loop is selecting a test case

Middle loop is going in strides of three

Innermost loop is adding up the values in the current group

limit is as far as we can go toward three values up from i
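The role of limit is easier to see when the start and end of each group are printed side by side. A sketch for the eight-value test case, where bounds is an illustrative name and print is called as a function for Python 3:

```python
# Show the (start, limit) pair computed for each group of three.
data = [10, 20, 30, 40, 50, 60, 70, 80]
bounds = []
for i in range(0, len(data), 3):
    limit = min(i + 3, len(data))   # three past i, unless the list ends first
    bounds.append((i, limit))

print(bounds)
```

Only the last group is cut short: it would run from 6 to 9, but the list has just 8 values.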


Human beings can only keep a few things in working memory at once

How we actually understand this program is:

for data in test_cases:
    result = sum_by_threes(data)
    print data, "=>", result

to sum_by_threes given a list data:
    result = []
    for i in range(0, len(data), 3):
        limit = min(i+3, len(data))
        sum = sum_from(data, i, limit)
        result.append(sum)

to sum_from given a list data, and start and end indices:
    sum = 0
    for i in range(start, end):
        sum += data[i]
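As a preview only, the pseudocode above could be written as real Python along the following lines. The def and return syntax is an assumption here, since the outline leaves it to the next chapter; the shortened test_cases list and the name total (used instead of sum) are also choices made for this sketch, which targets Python 3.

```python
def sum_from(data, start, end):
    """Add up data[start] through data[end-1]."""
    total = 0
    for i in range(start, end):
        total += data[i]
    return total


def sum_by_threes(data):
    """Sum successive groups of three values, or as many as are there."""
    result = []
    for i in range(0, len(data), 3):
        limit = min(i + 3, len(data))
        result.append(sum_from(data, i, limit))
    return result


test_cases = [[], [10], [10, 20, 30, 40], [10, 20, 30, 40, 50, 60, 70, 80]]
for data in test_cases:
    print(str(data) + " => " + str(sum_by_threes(data)))
```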

The computer doesn't care one way or another

But what we need is a way to write our programs in pieces, then combine the pieces

That's the subject of the next chapter

Originally posted 2011-08-08 by Greg Wilson in Content.
