A Question and Answer Matrix for Software Carpentry
Following up on yesterday's post about applying educational principles to this course, here's a not-yet-completed Q&A matrix for this course. The section headings are questions people ask (or equivalently, tasks they want to perform). The headings underneath are the major topics Software Carpentry covers, and below each of those is my attempt to relate those topics to the questions. "TBD" means "I haven't written it yet", while "N/A" means "I can't think of any relationship." This matrix is going to be the basis of our next big reorganization of material (which should start this fall), so we would be very grateful for your input:
What have we missed?
What's in the wrong place?
Most importantly, can we reframe our key questions to divide things up more usefully or more logically, and if so, how?
Thanks for your help!
Q01: How can I manage this data?
Q02: How can I process it?
Q03: How can I tell if I've processed it correctly?
Q04: How can I find and fix bugs when I haven't?
Q05: How can I keep track of what I've done?
Q06: How can I find and use other people's work?
Q07: How can other people find and use mine?
Q08: How can I do all these things faster?
Q01: How can I manage this data?
The Shell
Use directories and sub-directories with meaningful names.
Use filenames that can easily be matched with wildcards.
Use filename extensions that indicate the type of data in the file.
Use text unless there's a powerful reason to use something else.
Version Control
If it's megabytes or less, put it under version control.
Basic Programming
Create and use data formats that are easy for programs to parse.
Functions and Libraries
TBD
Databases
Store it in a relational database.
Store each atom of information in its own field.
Make sure each record has a unique key.
Make sure that information is never duplicated.
Use foreign keys and joins to combine information from different tables.
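As a minimal sketch of the database guidelines above, the following uses Python's built-in sqlite3 module. The table names, column names, and sample rows (person, reading, and so on) are invented for illustration and do not come from the course material.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE person (
            person_id     INTEGER PRIMARY KEY,  -- unique key for each record
            family_name   TEXT NOT NULL,        -- one atom of information per field
            personal_name TEXT NOT NULL
        );
        CREATE TABLE reading (
            reading_id INTEGER PRIMARY KEY,
            person_id  INTEGER NOT NULL REFERENCES person(person_id),
            quantity   TEXT NOT NULL,
            value      REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, "Dyer", "William"), (2, "Lake", "Anderson")],
    )
    # Names are stored once in person; readings refer to them by key.
    conn.executemany(
        "INSERT INTO reading VALUES (?, ?, ?, ?)",
        [(1, 1, "salinity", 0.13), (2, 1, "radiation", 9.82), (3, 2, "salinity", 0.05)],
    )
    conn.commit()
    return conn


def readings_with_names(conn):
    """Join the two tables to combine each reading with the person who took it."""
    rows = conn.execute("""
        SELECT person.family_name, reading.quantity, reading.value
        FROM reading JOIN person ON reading.person_id = person.person_id
        ORDER BY reading.reading_id
    """)
    return rows.fetchall()
```

Because each name lives in exactly one row of person, correcting a misspelling means changing one record rather than hunting for every copy.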
Number Crunching
Represent it as a matrix, because that's easy to process.
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
Format it as HTML (or XML, or some other widely-used format).
Separate content from presentation (e.g., use CSS for styling).
Q02: How can I process it?
The Shell
Use Unix commands that manipulate lines of text.
Combine those commands using pipes and redirection.
Use loops to perform the same operations on many files.
Version Control
N/A
Basic Programming
Write programs that use loops, file I/O, and string splitting to read data.
Use floating-point numbers unless you are sure all values (including calculated values) will always be integers.
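The two points above might look like this in Python. The "name,value" line format and the sample data are assumptions made for the example, and io.StringIO stands in for a real file so that the sketch is self-contained.

```python
import io


def read_readings(stream):
    """Parse lines of the form 'name,value' into (name, float) pairs."""
    results = []
    for line in stream:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name, raw_value = line.split(",")
        # Convert to float even if the text looks like an integer,
        # so that later calculations such as means behave as expected.
        results.append((name.strip(), float(raw_value)))
    return results


# A hypothetical data file, held in memory for the example.
sample = io.StringIO("# site,temperature\nA,20\nB,21.5\n\nC,19\n")
readings = read_readings(sample)
mean = sum(value for _, value in readings) / len(readings)
```

Replacing the StringIO object with open("some_file.csv") would read the same format from disk.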
Functions and Libraries
TBD
Databases
Write SQL queries to select, filter, aggregate, and sort data.
Use a general-purpose programming language for everything else.
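One way to split the work, sketched with Python's sqlite3 module: the SQL query does the selecting, filtering, aggregating, and sorting, and Python handles everything around it. The measurement table and its contents are hypothetical.

```python
import sqlite3


def mean_value_by_site(conn, minimum):
    """Return (site, mean value) pairs for values >= minimum, largest mean first."""
    query = """
        SELECT site, AVG(value) AS mean_value   -- select and aggregate
        FROM measurement
        WHERE value >= ?                        -- filter
        GROUP BY site
        ORDER BY mean_value DESC                -- sort
    """
    return conn.execute(query, (minimum,)).fetchall()


# Hypothetical data, including one negative value to be filtered out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    [("A", 1.0), ("A", 3.0), ("B", 5.0), ("B", -1.0), ("C", 1.5)],
)
summary = mean_value_by_site(conn, 0.0)
```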
Number Crunching
Use a linear algebra package like NumPy.
Quality
N/A
Sets and Dictionaries
TBD
Development
Use the right data structures.
Web Programming
Use an HTTP library to fetch it.
Use an XML or JSON library to parse it.
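A minimal sketch of both steps using only Python's standard library (urllib for HTTP, json for parsing). The record layout in the example string is made up, and fetch_text is defined but not called here because no real URL is assumed.

```python
import json
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a resource over HTTP and decode it as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_records(text):
    """Turn a JSON array of objects into a list of dictionaries."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array")
    return records


# Hypothetical payload of the kind a data service might return.
example = '[{"site": "A", "value": 20.0}, {"site": "B", "value": 21.5}]'
records = parse_records(example)
```

In practice the two functions would be chained, as in parse_records(fetch_text(url)), with url pointing at wherever the data actually lives.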
Q03: How can I tell if I've processed it correctly?
The Shell
N/A
Version Control
N/A
Basic Programming
Test your programs with small data sets whose results can be checked by hand.
Functions and Libraries
TBD
Databases
Build queries in small steps.
Run queries against small data sets whose output can be checked manually.
Number Crunching
Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs.
Use tolerances when comparing results.
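To illustrate why tolerances matter, here is a small sketch using math.isclose from Python's standard library. The particular tolerance values are arbitrary defaults chosen for the example; appropriate values depend on the calculation.

```python
import math


def close_enough(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two floating-point results using relative and absolute tolerances."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# Adding 0.1 ten times does not give exactly 1.0 in binary floating point.
total = sum([0.1] * 10)

exact_match = (total == 1.0)            # False: the values differ in the last bits
tolerant_match = close_enough(total, 1.0)  # True: the difference is within tolerance
```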
Quality
Create simple data sets for which the right answer can be calculated by hand.
Compare the results produced by the new program to results produced by older programs.
Sets and Dictionaries
TBD
Development
Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes.
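One possible reading of this advice, sketched in Python: the function that does the arithmetic takes the function that reads the data as a parameter, so a test can swap in a stand-in that needs no file. All names here are hypothetical.

```python
def read_values_from_file(path):
    """Production reader: one number per line in a text file."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def mean_of_source(source, reader=read_values_from_file):
    """Compute the mean of whatever values the reader returns for this source."""
    values = reader(source)
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)


def fake_reader(source):
    """Test replacement: ignores the source and returns known values."""
    return [1.0, 2.0, 3.0]


# In a test, the file-reading function is replaced, so no file is needed.
result = mean_of_source("ignored.txt", reader=fake_reader)
```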
Web Programming
N/A
Q04: How can I find and fix bugs when I haven't?
The Shell
N/A
Version Control
N/A
Basic Programming
N/A
Functions and Libraries
TBD
Databases
N/A
Number Crunching
N/A
Quality
Write test cases that fail when the bug is present, but pass when the bug is fixed.
Add assertions to programs to check their internal consistency.
Use a debugger.
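The first two points can be sketched together. The bug described in the comments (forgetting to sort before taking the middle element) is a hypothetical example, not one taken from the course.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    # Assertion checking a precondition.
    assert len(values) > 0, "median of an empty list is undefined"
    ordered = sorted(values)  # the hypothetical bug was omitting this sort
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        result = ordered[middle]
    else:
        result = (ordered[middle - 1] + ordered[middle]) / 2
    # Assertion checking internal consistency: the median lies within the data.
    assert ordered[0] <= result <= ordered[-1]
    return result


def test_median_of_unsorted_input():
    # Fails if the sort is missing (the middle element of [3, 1, 2] is 1),
    # passes once the bug is fixed.
    assert median([3, 1, 2]) == 2


def test_median_of_even_length_input():
    assert median([4, 1, 3, 2]) == 2.5
```

Keeping the first test in the suite after the fix guards against the same bug creeping back in.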
Sets and Dictionaries
TBD
Development
Write tests.
Web Programming
N/A
Q05: How can I keep track of what I've done?
The Shell
N/A
Version Control
Keep your work under version control.
Check in whenever you've completed a significant change.
Write meaningful check-in comments.
Basic Programming
Put version control IDs in programs (and data files), and copy them forward to results.
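A rough sketch of what copying IDs forward could look like. The version strings, the header format, and the function name are all assumptions; in practice the IDs would come from whatever version control system is in use.

```python
import io

# Hypothetical identifier; a real one would be filled in by or from version control.
PROGRAM_VERSION = "analysis.py r1234 (hypothetical ID)"


def write_results(stream, data_version, rows):
    """Write results preceded by the versions of the program and data that produced them."""
    stream.write(f"# program: {PROGRAM_VERSION}\n")
    stream.write(f"# data: {data_version}\n")
    for name, value in rows:
        stream.write(f"{name},{value}\n")


# io.StringIO stands in for an output file.
output = io.StringIO()
write_results(output, "readings.csv r1200 (hypothetical ID)", [("A", 20.0), ("B", 21.5)])
```

With headers like these in every results file, it is possible to tell later which version of the code and data produced a given output.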
Functions and Libraries
TBD
Databases
Store queries in files (just like programs).
Number Crunching
N/A
Quality
Turn bug fixes into assertions and test cases.
Use a coverage analyzer to see what code is and isn't being tested.
Sets and Dictionaries
TBD
Development
N/A
Web Programming
Use meta headers in your HTML/XML data files.
Q06: How can I find and use other people's work?
The Shell
N/A
Version Control
Get it from their version control repositories.
Basic Programming
N/A
Functions and Libraries
TBD
Databases
N/A
Number Crunching
N/A
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
Ask them to use well-formed URLs.
Ask them to format it according to well-defined machine-readable standards (e.g., XML or JSON).
Q07: How can other people find and use mine?
The Shell
N/A
Version Control
Put your work in a publicly-accessible version control repository.
Basic Programming
N/A
Functions and Libraries
TBD
Databases
Raise exceptions to signal errors so that other people can handle them as they think best.
Number Crunching
N/A
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
Put it on the web at a stable URL.
Format it according to well-defined machine-readable standards (e.g., XML or JSON).
Include meta-data.
Q08: How can I do all these things faster?
The Shell
Put commands in shell scripts so that they can be re-used.
Version Control
N/A
Basic Programming
Use appropriate variable names so that people will waste less time trying to read programs.
Functions and Libraries
TBD
Databases
N/A
Number Crunching
Use a linear algebra package like NumPy.
Quality
Design code for testing.
Write test cases before writing new code.
Sets and Dictionaries
TBD
Development
Use a profiler to figure out why code is slow before trying to optimize it.
Build code so that parts can be replaced easily.
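As an illustration of profiling before optimizing, the sketch below uses cProfile and pstats from Python's standard library. The function being profiled is a deliberately simple stand-in, and profile_call is a hypothetical helper written for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """A stand-in for code whose speed is in question."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under the profiler and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum_of_squares, 1000)
```

Printing report shows where time was spent, which is the information needed before deciding what, if anything, to replace or rewrite.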
Web Programming
N/A
Originally posted 2012-08-14 by Greg Wilson in Content, Education.