Thanks to everyone for their suggestions regarding what we should teach about computer security if we only have one hour (the usual constraint for topics in this course). The outline below is based in part on the lecture on security from Version 3 of this course, in part on Rick Wash's excellent study of folk models of computer security, and in part on mistakes I've seen (or made) myself in the past five years. Feedback would be very welcome, but remember: we're teaching scientists and engineers who are programming as a way to do science, not as an end in itself.
Introduction
"steal your data" is the Hollywood threat
correlating your data is just as big a threat, but one we all ignore
injecting data (corrupting your database with evil intent)
steal your credentials to attack something else
denial of service attacks
botnet: use your computer for spam, click fraud, DDOS
not after your data in particular, so infection isn't particularly visible
more compromised machines is always better, so attackers aren't just after big fish
Overview
a hacker is a human criminal (yes, the word used to have another meaning, get over it)
not geeky teenage graffiti artists (though some kids run canned warez)
live real-time attacks by human beings are rare because they're not cost-effective
social engineering attacks are a much "better" use of human criminals' time
a virus is a piece of software that infects a target computer and tries to propagate itself
what "infection" means (usually relies on access control failure)
not the same as "buggier than usual"
although bugs in software are often the targets of attack
what anti-virus software does
no, your Mac/Linux machine is not magically immune to all viruses
ways that viruses can spread
"download and run this program"
"open this attachment" (you explicitly run the program)
"put this USB into your computer" (computer may run it automatically)
"open this text file in your editor" almost certainly can'tinfect your machine
unless there's a bug in the text editor — see below on stack smashing
so keep patches up to date
but those aren't the only ways to spread in a networked world
"click this link" may run some JavaScript in your browser
honestly, viruses have nothing to do with pop-ups
many more programs (services) running on your computer these days than you realize
all of which can be attacked, even if you don't click on anything
send them data to trick them into doing things — see below on stack smashing (again)
'ps' command or equivalent shows what they are (there are lots)
port scanning (e.g., with nmap, or netstat/ss on the machine itself) shows how many of them are listening for data (see the sketch after this list)
what a firewall does
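For readers who want to see this for themselves, here is a minimal sketch of what "listening for data" means, assuming you only probe a machine you own; real tools such as nmap, netstat, or ss do this far better, and the port range and timeout below are arbitrary.

```python
# A toy port scan of the local machine: a "listening" service is just a program
# willing to accept a connection on some port.
import socket

def scan_localhost(ports=range(1, 1025)):
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.2)                               # don't hang on closed ports
            if s.connect_ex(("127.0.0.1", port)) == 0:      # 0 means a service answered
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    print("Ports with something listening:", scan_localhost())
```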
Framework and Examples
need a framework for thinking about this
authentication (something you know, something you have, something you are)
authorization (who's allowed to do what)
access control (enforcement of rules)
usability (can/will people actually understand and use the above)
digital signatures have been around for years...
...and almost nobody uses them ("Why Johnny Can't Encrypt")
running example: WebDTR is a password-protected web interface to a database of drug trial results
example: someone steals your laptop
you really should have encrypted the answers you downloaded and saved...
but "password at reboot" isn't really much security (I haven't rebooted my machine in a month)
example: easily-guessed password is an authentication and usability failure
dictionary attacks on hashed passwords (never store passwords as plain text; see the sketch below)
the XKCD "password strength" point: a short, hard-to-remember jumble of characters is weaker than a long, memorable passphrase
telling users which part of a bad login was wrong is a bad idea: "wrong password" tells the attacker "this is a valid ID"
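A minimal sketch of the "never store passwords as plain text" point, using only Python's standard library; the function names, iteration count, and example passwords (borrowed from the XKCD cartoon) are illustrative, not a recommendation for any particular system.

```python
# Store a salted, slow hash of the password instead of the password itself,
# so that a stolen database still has to be attacked one guess at a time.
import hashlib
import hmac
import os

def hash_password(password, salt=None, rounds=100_000):
    """Return (salt, digest) for storage; the password itself is never stored."""
    if salt is None:
        salt = os.urandom(16)                   # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return salt, digest

def check_password(password, salt, stored_digest, rounds=100_000):
    _, digest = hash_password(password, salt, rounds)
    return hmac.compare_digest(digest, stored_digest)   # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, digest))  # True
print(check_password("Tr0ub4dor&3", salt, digest))                   # False
```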
example: listening to unencrypted network traffic to steal password
access control failure
replay attacks
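The real fix for eavesdropping is to encrypt the connection (e.g., HTTPS), but here is a toy sketch of one defence against replay: sign each request with a shared secret, a timestamp, and a random nonce, and reject anything seen before. All names and values are made up.

```python
# Reject replayed requests: old timestamps and reused nonces are refused,
# and the HMAC stops an attacker from forging new requests.
import hashlib, hmac, os, time

SECRET = b"shared-secret-between-client-and-server"   # assumption: a pre-shared key
seen_nonces = set()

def sign_request(body: bytes) -> dict:
    nonce = os.urandom(8).hex()
    timestamp = str(int(time.time()))
    mac = hmac.new(SECRET, body + nonce.encode() + timestamp.encode(), hashlib.sha256)
    return {"body": body, "nonce": nonce, "timestamp": timestamp, "mac": mac.hexdigest()}

def verify_request(req: dict, max_age=60) -> bool:
    if abs(time.time() - int(req["timestamp"])) > max_age:
        return False                                   # too old: possible replay
    if req["nonce"] in seen_nonces:
        return False                                   # nonce reused: definite replay
    expected = hmac.new(SECRET, req["body"] + req["nonce"].encode() + req["timestamp"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, req["mac"]):
        return False                                   # request was tampered with or forged
    seen_nonces.add(req["nonce"])
    return True

req = sign_request(b"GET /results/trial-17")
print(verify_request(req))   # True the first time
print(verify_request(req))   # False: the replayed copy is rejected
```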
example: file retrieval is defeated by a "../.." path traversal attack
authorization: the web server shouldn't have read permission
access control: program shouldn't be able to reach that file
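A minimal sketch of the access-control fix, assuming all downloadable files live under a single base directory (the directory name is made up); requires Python 3.9+ for Path.is_relative_to.

```python
# Normalize the requested path and refuse anything that escapes the base directory.
from pathlib import Path

BASE = Path("/srv/webdtr/files").resolve()

def safe_open(requested: str):
    target = (BASE / requested).resolve()        # collapses any "../" components
    if not target.is_relative_to(BASE):
        raise PermissionError(f"{requested!r} escapes the allowed directory")
    return open(target, "rb")

# safe_open("report.csv")          # allowed
# safe_open("../../etc/passwd")    # raises PermissionError
```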
example: using a user ID in the URL to keep track of who's logged in
authentication failure (is this the actual user?)
authorization failure: not checking that this person (possibly logged in as someone else) is actually allowed to do things
example: burying the user ID in a hidden form field
same as above: someone can craft an HTTP POST
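A framework-free sketch of the fix: work out who the user is from the server-side session and check authorization there, ignoring any user ID that arrives in the URL or a hidden form field. Every name and token below is invented.

```python
# Never trust a client-supplied user ID: anyone can edit a URL or craft a POST.
SESSIONS = {"a3f9c2d1": "alice"}                      # session token -> authenticated user
OWNERS = {"trial-17": "alice", "trial-42": "bob"}     # which user owns which record

def get_results(session_token: str, record: str, form_user_id: str):
    user = SESSIONS.get(session_token)                # authentication: who is this really?
    if user is None:
        raise PermissionError("not logged in")
    # form_user_id is ignored on purpose: it is attacker-controlled data.
    if OWNERS.get(record) != user:                    # authorization: may they see this?
        raise PermissionError("not your record")
    return f"results for {record}"

print(get_results("a3f9c2d1", "trial-17", form_user_id="bob"))  # works: alice owns it
```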
example: SQL injection
authorization: you're not supposed to be able to run code
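A small self-contained demonstration using Python's built-in sqlite3 module; the table and the malicious input are invented, but the pattern (string formatting is dangerous, parameterized queries are safe) is the real lesson.

```python
# SQL injection in miniature: user input that rewrites the query itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id TEXT, result TEXT)")
conn.execute("INSERT INTO trials VALUES ('trial-17', 'secret result')")

user_input = "trial-17' OR '1'='1"   # a malicious value typed into a web form

# Vulnerable: the attacker's text becomes part of the SQL statement.
query = f"SELECT result FROM trials WHERE id = '{user_input}'"
print(conn.execute(query).fetchall())     # returns every row in the table

# Safe: the value is passed separately and can never become SQL code.
print(conn.execute("SELECT result FROM trials WHERE id = ?", (user_input,)).fetchall())  # []
```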
example: displaying stack trace for exceptions
useful for debugging
but now the attacker knows some of the libraries you're using, and can look up exploits that target them
log the stack trace instead of displaying it
but remember: security by obscurity doesn't work
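A sketch of "log it instead of displaying it"; the log file name and the failing function are stand-ins for real application code.

```python
# Record the full traceback server-side; show the user (and any attacker) nothing useful.
import logging

logging.basicConfig(filename="webdtr-errors.log", level=logging.ERROR)

def do_the_real_work(request):
    raise ValueError(f"cannot parse {request!r}")         # stand-in for a real failure

def handle_request(request):
    try:
        return do_the_real_work(request)
    except Exception:
        logging.exception("request failed: %r", request)  # full traceback goes to the log
        return "Sorry, something went wrong."             # generic message goes to the user

print(handle_request("GET /results/???"))
```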
example: flood the application with login requests
no information lost, but no service provided
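A toy per-client rate limiter to show one mitigation; real deployments usually do this in the web server or a proxy, and the window and limit below are arbitrary.

```python
# Allow at most MAX_REQUESTS per client per WINDOW seconds; reject the rest.
import time
from collections import defaultdict, deque

WINDOW = 60           # seconds
MAX_REQUESTS = 30     # per client per window
recent = defaultdict(deque)

def allow(client_ip: str) -> bool:
    now = time.time()
    q = recent[client_ip]
    while q and now - q[0] > WINDOW:     # forget requests older than the window
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False                     # too many recent requests: turn this one away
    q.append(now)
    return True

for i in range(35):
    if not allow("203.0.113.9"):
        print(f"request {i} rejected")
```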
example: phishing
the text displayed with a link has nothing to do with where the link sends you
what a page looks like tells you nothing about where that page is actually hosted
example: smashing the stack
wave hands
really use this example to show people that code is data
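A real stack-smashing demo needs C, but the underlying "code is data" point can be made in Python too; this is an analogue, not the C exploit itself, and the attacker's string is invented.

```python
# A program that "just reads a value" ends up executing whatever the attacker typed.
user_input = "__import__('os').system('echo pretend this just deleted your files')"

# Dangerous: eval() treats the incoming data as a program to run.
eval(user_input)

# Safe: a parser that accepts only the data you actually expect.
try:
    value = float(user_input)
except ValueError:
    print("rejected: not a number")
```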
Keep Calm and Carry On
how does this all apply to scientists?
have to do everything that regular people do to stay safe
plus everything programmers do when creating web services, sharing code libraries, etc.
are you sure that FFT library you downloaded doesn't contain an attack?
is its author sure that the compiler she used doesn't inject attacks without her knowing about them?
plus everything IT departments do when managing data
patient records and other sensitive information are obvious
ClimateGate: if your science actually matters, someone will want to cast doubt on it, honestly or otherwise
it's easy to be crippled by fear
or to use fear as an excuse for clutching at power
which would be a tragedy, since the web has so much potential to accelerate science
the bigger picture (or, please help us engineer a more secure world)
computer security is a matter of economics: extent of damage vs. cost to prevent or clean up
keep usability in mind
facial recognition software to spot terrorists
1% false positive rate, 300K passengers per day in an airport, equals one false alarm every 30 seconds
do you think the guards will still be paying attention to the alarms on Tuesday?
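The arithmetic behind that claim:

```python
# 1% of 300,000 passengers a day is 3,000 false alarms a day,
# which is one alarm roughly every 30 seconds around the clock.
passengers_per_day = 300_000
false_positive_rate = 0.01
false_alarms_per_day = passengers_per_day * false_positive_rate   # 3,000
print(24 * 60 * 60 / false_alarms_per_day)                        # 28.8 seconds between alarms
```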
Risk | Importance | Discussion
Denial of service | Minor | Researchers can wait until the system comes back up
Data in database destroyed | Minor | Restore from backup
Unauthorized data access | Major | If competitors access data, competitive advantage may be lost
Backups corrupted, so that data is permanently lost | Major | Redoing trials may cost millions of dollars
Data corrupted, and corruption not immediately detected | Critical | Researchers may make recommendations or diagnoses that lead to injury or death
what to do?
top-down mandates of precautions against specific threats haven't worked, and won't
criminals are smart people who adapt quickly
best model is how we deal with credit card fraud
make credit card companies liable for losses, then let the free market figure out the balance
need changes to legislation so that:
creators of software vulnerabilities are liable for losses
whoever first collects data is liable for its loss, no matter where it is when it's stolen
on their own, such changes would stifle open science
scientists are broke
university bureaucrats don't like risk
result is no sharing ever
so we need:
equivalent of copyright's "fair use" provisions
meaningful academic credit ("points toward tenure") for creating data and software that people use
in the short term, get professional help!
the only time in this course that we've said that
so please take it seriously
Originally posted 2011-09-02 by Greg Wilson.