Our quick introduction to Python is the module I'm least happy with, so I've been thinking about how to re-design it. I've included a new outline below; comments would be very welcome.
Programming is what you do when you can't find an off-the-shelf tool to do what you want
Why is programming hard to teach/learn?
We will teach basic programming by example
We will use Python
Will not start with multimedia programming, 3D graphics, etc.
We assume that you've done some programming, in some language, at some point
Before we dive in, what is a program?
Programs store data and do calculations
Put the following in a text file (not Word) and run it
# Convert temperature in Fahrenheit to Kelvin.
temp_in_f = 98.6
temp_in_k = (temp_in_f - 32.0) * (5.0 / 9.0) + 273.15
print "body temperature in Kelvin:", temp_in_k
body temperature in Kelvin: 310.15
Variable is a name that labels a value (picture)
Created by assignment
Usual rules of arithmetic: * before +, parentheses
Print displays values
Need to know it: what happens if we use "5/9" instead of "5.0/9.0"
# Convert temperature in Fahrenheit to Kelvin.
temp_in_f = 98.6
temp_in_k = (temp_in_f - 32.0) * (5 / 9) + 273.15 # this line is different
print "body temperature in Kelvin:", temp_in_k
body temperature in Kelvin: 273.15
Run interpreter, try 5/9, get 0
Integer vs. float, and what division does
Automatic conversion: 5.0/9 does the right thing
[Box] Why are so many decimal places shown in 5.0/9?
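The integer/float distinction can be checked in a few lines. This is a minimal sketch that assumes Python 3, where `print` is a function and `/` always produces a float; the `//` operator reproduces what `/` does to two integers in the Python 2 used in the rest of this outline. The names `int_ratio`, `mixed_ratio`, `wrong_k`, and `right_k` are illustrative.

```python
# Assumes Python 3: `/` is float division, `//` is integer (floor) division.
int_ratio = 5 // 9       # what Python 2 computes for 5/9
mixed_ratio = 5.0 / 9    # the integer 9 is converted to a float first

temp_in_f = 98.6
wrong_k = (temp_in_f - 32.0) * int_ratio + 273.15
right_k = (temp_in_f - 32.0) * mixed_ratio + 273.15

print("5 // 9 =", int_ratio)
print("5.0 / 9 =", mixed_ratio)
print("with integer division:", wrong_k)
print("with float division:", round(right_k, 2))
```

The integer ratio is zero, so the Fahrenheit value drops out of the calculation entirely and only the 273.15 offset survives.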
Need to know it: sometimes Python doesn't know what to do
# Try adding numbers and strings.
print "2 + 3:", 2 + 3
print "two + three:", "two" + "three"
print "2 + three:", 2 + "three"
2 + 3: 5
two + three: twothree
2 + three: Traceback (most recent call last):
  File "add-numbers-strings.py", line 5, in <module>
    print "2 + three:", 2 + "three"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
In this case, "2three" would be sensible
But what about "1" + 2?
On your own, try "two" * 3
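One way out of the ambiguity is to convert explicitly, so the program says which meaning is intended. A minimal sketch, assuming Python 3 `print` syntax; `str` and `int` are built-in conversion functions, and the variable names are illustrative:

```python
# Assumes Python 3 print syntax.
as_text = str(2) + "three"    # treat both operands as text
as_number = int("1") + 2      # treat both operands as numbers

print("as text:", as_text)
print("as number:", as_number)
```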
Back to useful things
Computers are useful because they can do lots of calculations on lots of data
Which means we need a concise way to represent multiple values and multiple steps
# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
number = 0
for value in data:
    total = total + value
    number = number + 1
mean = total / number
print "mean is", mean
mean is 2
Use list to store multiple values
Use loop to perform multiple operations
Can trace execution step by step manually or in a debugger
Did you notice that the result in the example above is wrong?
Problem is that total starts as an integer, we're adding integers, we wind up doing int/int (again)
Could fix it by initializing total to 0.0
Or use a function to do the conversion explicitly
# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
number = 0
for value in data:
    total = total + value
    number = number + 1
mean = float(total) / number # this line has changed
print "mean is", mean
mean is 2.77777777778
Functions do what they do in mathematics
Spend a whole chapter on them, since they're key to building large programs
Right now, most important lesson is that just because a program runs, doesn't mean it's correct
[1, 4] produces 2 instead of 2.5
Need to know it: the len function
# Find the mean.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
total = 0
for value in data:
    total = total + value
mean = float(total) / len(data) # this line has changed
print "mean is", mean
mean is 2.77777777778
Need to know it: lists are mutable
# Calculate running sum by creating new list.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
result = []
current = 0
for value in data:
    current = current + value
    result.append(current)
print "running total:", result
running total: [1, 5, 7, 10, 13, 17, 20, 24, 25]
Start with the empty list
result.append is a method
How to double the values in place?
# Try to double the values in place.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
for value in data:
    value = 2 * value
print "doubled data is:", data
doubled data is: [1, 4, 2, 3, 3, 4, 3, 4, 1]
New values are being created, but never assigned to list elements
Easiest to understand with a picture
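Until the picture is drawn, the same point can be seen by printing the loop variable next to the list after each assignment. A minimal sketch, assuming Python 3 `print` syntax and a shorter list than the one above; `unchanged` is an illustrative name:

```python
# Assumes Python 3 print syntax.
data = [1, 4, 2]
for value in data:
    value = 2 * value    # rebinds the name `value`; the list is not touched
    print("value is now", value, "but data is still", data)

unchanged = (data == [1, 4, 2])
print("list unchanged:", unchanged)
```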
Need to know it: list indexing
Mathematicians use subscripts, we use square brackets
Index from 0..N-1 rather than 1..N for reasons that made sense in 1970 and have become customary since
# Try to double the values in place.
data = [1, 4, 2]
data[0] = 2 * data[0]
data[1] = 2 * data[1]
data[2] = 2 * data[2]
print "doubled data is:", data
doubled data is: [2, 8, 4]
Clearly doesn't scale...
Need to get all the indices for a list of length N
The range function produces a list of numbers from 0..N-1
You will almost never be the first person to need something
# Double the values in a list in place
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
length = len(data) # 9
indices = range(length) # [0, 1, 2, 3, 4, 5, 6, 7, 8]
for i in indices:
    data[i] = 2 * data[i]
print "doubled data is:", data
doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]
Fold this together by combining function calls (like \sqrt{sin(x)})
# Double the values in a list in place.
data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
for i in range(len(data)):
    data[i] = 2 * data[i]
print "doubled data is:", data
doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]
Usually won't type in our data
Store it outside program
# Count the number of lines in a file
reader = open("data.txt", "r")
number = 0
for line in reader:
    number = number + 1
reader.close()
print number, "lines in file"
9 lines in file
What about mean?
# Find the mean.
reader = open("data.txt", "r")
total = 0.0
number = 0
for line in reader:
    total = total + line
    number = number + 1
reader.close()
print "mean is", total / number
Traceback (most recent call last):
  File "mean-read-broken.py", line 7, in <module>
    total = total + line
TypeError: unsupported operand type(s) for +: 'float' and 'str'
Data in file is text, so we need to convert
# Find the mean.
reader = open("data.txt", "r")
total = 0.0
number = 0
for line in reader:
    value = float(line)
    total = total + value
    number = number + 1
reader.close()
print "mean is", total / number
mean is 2.77777777778
Notice that we're using the original program as an oracle
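The oracle idea can be made explicit by computing the mean both ways in one program and comparing. A minimal sketch, assuming Python 3 `print` syntax and assuming that data.txt holds the same nine values, one per line (the outline never shows the file, so this sketch writes its own copy to a temporary directory). The built-in `sum` and the names `expected` and `actual` are not from the outline.

```python
# Assumes Python 3 print syntax; the file contents are an assumption.
import os
import tempfile

data = [1, 4, 2, 3, 3, 4, 3, 4, 1]

# Oracle: the mean computed from the in-memory list.
expected = float(sum(data)) / len(data)

# Write one value per line to a temporary data.txt.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
writer = open(path, "w")
for value in data:
    writer.write(str(value) + "\n")
writer.close()

# Re-read the file exactly as the program above does.
reader = open(path, "r")
total = 0.0
number = 0
for line in reader:
    total = total + float(line)
    number = number + 1
reader.close()
actual = total / number

print("oracle:", expected, "from file:", actual)
print("agree:", abs(expected - actual) < 1e-9)
```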
Real-world data is never clean
Count how many scores were not between 0 and 5
# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if value < 0:
        num_outliers = num_outliers + 1
    if value > 5:
        num_outliers = num_outliers + 1
print num_outliers, "values out of range"
3 values out of range
Need to know it: combine tests using and and or
# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if (value < 0) or (value > 5):
        num_outliers = num_outliers + 1
print num_outliers, "values out of range"
3 values out of range
Need to know it: in-place operators
# Count number of values out of range.
data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
num_outliers = 0
for value in data:
    if (value < 0) or (value > 5):
        num_outliers += 1
print num_outliers, "values out of range"
3 values out of range
Don't actually "need" to know it
But it's a common idiom in many languages
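The same shorthand exists for the other arithmetic operators. A minimal sketch, assuming Python 3 `print` syntax and a short illustrative list:

```python
# Assumes Python 3 print syntax.
data = [1, 4, 2]
for i in range(len(data)):
    data[i] *= 2         # same as data[i] = data[i] * 2

total = 0
for value in data:
    total += value       # same as total = total + value

print("doubled:", data, "total:", total)
```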
Data cleanup
# Report where values are not monotonically increasing.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
for i in range(1, len(data)):
    if data[i] < data[i-1]:
        print "failure:", i
failure: 8
Group by threes
# Combine successive triples of data.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
result = []
for i in range(0, len(data), 3):
    sum = data[i] + data[i+1] + data[i+2]
    result.append(sum)
print "grouped data:", result
Traceback (most recent call last):
  File "group-by-threes-fails.py", line 6, in <module>
    sum = data[i] + data[i+1] + data[i+2]
IndexError: list index out of range
13 values = 4 groups of 3 and 1 left over
First question must be, what's the right thing to do scientifically?
Let's assume, "Add up as many as are there"
# Combine successive triples of data.
data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
result = []
for i in range(0, len(data), 3):
    sum = data[i]
    if (i+1) < len(data):
        sum += data[i+1]
    if (i+2) < len(data):
        sum += data[i+2]
    result.append(sum)
print "grouped data:", result
grouped data: [5, 11, 16, 20, 8]
But this is clumsy
How do we add up the first three, or as many as are there?
Don't want to have to keep modifying the list as we try out ideas
So use a list of lists.
# Add up the first three, or as many as are there.
test_cases = [[],                 # no data at all
              [10],               # just one value
              [10, 20],           # two values
              [10, 20, 30],       # three
              [10, 20, 30, 40]]   # more than enough
for data in test_cases:
    print data
[]
[10]
[10, 20]
[10, 20, 30]
[10, 20, 30, 40]
Can now try all our tests by running one program
Back to our original problem: sum of at most the first three
# Sum up at most the first three values.
test_cases = [[],                 # no data at all
              [10],               # just one value
              [10, 20],           # two values
              [10, 20, 30],       # three
              [10, 20, 30, 40]]   # more than enough
for data in test_cases:
    limit = min(3, len(data))
    sum = 0
    for i in range(limit):
        sum += data[i]
    print data, "=>", sum
[] => 0
[10] => 10
[10, 20] => 30
[10, 20, 30] => 60
[10, 20, 30, 40] => 60
That looks right
Need one more tool: nested loops
# Loops can run inside loops.
for i in range(4):
    for j in range(i):
        print i, j
1 0
2 0
2 1
3 0
3 1
3 2
Easiest to understand with a picture
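A text substitute for the picture is to print, for each value of i, the values that j will take. A minimal sketch, assuming Python 3 `print` syntax; the names `inner` and `pairs` are illustrative:

```python
# Assumes Python 3 print syntax.
pairs = []
for i in range(4):
    inner = list(range(i))    # the values j takes on this pass
    print("i =", i, "-> j takes", inner)
    for j in inner:
        pairs.append([i, j])

print("total pairs:", len(pairs))
```

When i is 0 the inner loop has nothing to do, which is why the output above starts at 1 0.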
Final step: instead of starting at zero every time, start at 0, 3, 6, 9, etc.
Don't need to test everything (which is why we skip from 40 to 60 to 80)
# Sum up in groups of three.
test_cases = [[],
              [10],
              [10, 20],
              [10, 20, 30],
              [10, 20, 30, 40],
              [10, 20, 30, 40, 50, 60],
              [10, 20, 30, 40, 50, 60, 70, 80]]
for data in test_cases:
    result = []
    for i in range(0, len(data), 3):
        limit = min(i+3, len(data))
        sum = 0
        for j in range(i, limit):
            sum += data[j]
        result.append(sum)
    print data, "=>", result
[] => []
[10] => [10]
[10, 20] => [30]
[10, 20, 30] => [60]
[10, 20, 30, 40] => [60, 40]
[10, 20, 30, 40, 50, 60] => [60, 150]
[10, 20, 30, 40, 50, 60, 70, 80] => [60, 150, 150]
Understand this in pieces
Outer for loop is selecting a test case
Inner loop is going in strides of three
limit is as far as we can go toward three values up from i
range(i, limit) guaranteed to be valid indices for list
Human beings can only keep a few things in working memory at once
How we actually understand this program is:
for data in test_cases:
    result = sum_by_threes(data)
    print data, "=>", result
to sum_by_threes given a list data:
    result = []
    for i in range(0, len(data), 3):
        limit = min(i+3, len(data))
        sum = sum_from(data, i, limit)
        result.append(sum)
to sum_from given a list data, and start and end indices:
    sum = 0
    for i in range(start, end):
        sum += data[i]
The computer doesn't care one way or another
But what we need is a way to write our programs in pieces, then combine the pieces
That's the subject of the next chapter
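As a preview, the pseudocode above might translate into Python roughly as follows. This is a sketch, not the next chapter's version: it assumes Python 3 `print` syntax, takes the names `sum_by_threes` and `sum_from` from the pseudocode, adds the `return` statements the pseudocode leaves implicit, and uses `total` rather than `sum` as the accumulator name.

```python
# Assumes Python 3 print syntax.
def sum_from(data, start, end):
    # Add up data[start] through data[end-1].
    total = 0
    for i in range(start, end):
        total += data[i]
    return total


def sum_by_threes(data):
    # Sum successive groups of three, or as many as are left.
    result = []
    for i in range(0, len(data), 3):
        limit = min(i + 3, len(data))
        result.append(sum_from(data, i, limit))
    return result


test_cases = [[], [10], [10, 20, 30, 40], [10, 20, 30, 40, 50, 60, 70, 80]]
for data in test_cases:
    print(data, "=>", sum_by_threes(data))
```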
Originally posted 2011-08-08 by Greg Wilson in Content.