Team Project Presentations

Each team will present for 30 minutes, including ~10 minutes of Q&A. All team members must participate and be prepared to discuss any aspect of the project. Each presentation should include:

  • Team member introductions, including your role on the project
  • Project description
  • Entity-Relationship diagram
  • Use cases and personas
  • Web site walk-through
  • Architectural features of your project
  • Future plans
  • What were the hardest parts to implement?
  • Suggestions to improve the course

Security

Tom Ferrin

April 18, 2017

Portions Copyright © 2005-06 Python Software Foundation.

Evil Exists

  • Computer security is a collective responsibility
    - A system is only as strong as its weakest component.
    - If you are creating CGI scripts, or sending data over the web, you are putting others at risk as well as yourself.
  • Impossible to cover anything more than the basics in this lecture
    - Bruce Schneier provides broad non-alarmist coverage. See his website or his book Secrets and Lies.

You Can Skip This Lecture If...

  1. You understand the tradeoff between convenience and security
  2. You know that computer security is not primarily a technological problem
  3. You know what authentication, authorization, and access control are
  4. You never trust user input
  5. You know what public-key cryptography, HTTPS, and SSH are

What Are We Trying to Do?

  • Goal: let everyone who should be able to do something do it easily…
    - while blocking people who shouldn't be able to…
    - and gathering information about their attempts.
  • Most people are trustworthy most of the time.
    - Preventing legitimate users from doing things annoys them.
    - If people are sufficiently annoyed, they'll turn security off, or find ways around it.
  • But we also must account for the villainous minority.
    - Any system that relies on trust will attract abuse.
  • Keeping track of how villains are trying to break in is (almost) as important as preventing them.
    - You can't fix holes unless you know they exist.
    - Often need an audit trail in order to take legal or disciplinary action.

Technology Alone Is Not A Solution

  • Many successful attacks rely on social engineering.
    - Call up your bank, and see if you can get your credit card balance without your PIN.
    - Helps if you sound like a grandmother who is close to tears because her poodle has just been hit by a car.
  • Second way to attack a system is to get a job with the company running it.
    - Many companies choose not to press charges, rather than deal with bad publicity after a security failure.
    - So burn an extra copy of credit card data while backing up the server…
    - or take notes of all the “to be fixed later” points that come up during the security audit of the web site.

More Ways Security Can Fail

  • And then there's carelessness:
    - Many people don't bother to change the default password on their wireless router.
    - Many more choose easily-guessed passwords:
    • Where “easy” means “can be found by a clever program running for a couple of hours”
    • Remember: once one villain builds a tool, they can all use it
  • In fact, technology can make systems less secure:
    - Imagine a facial recognition system that works correctly 99% of the time, so one person in a hundred is mistakenly identified as a potential terrorist.
    - 300,000 passengers a day in a busy airport means one false alarm every 30 seconds.
    - Do you think the guards will still be paying attention to the alarms on Tuesday?
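
The arithmetic behind the airport example can be checked in a few lines. This is a minimal sketch that assumes passengers arrive evenly around the clock, which the slide implies but does not state; the variable names are illustrative.

```python
# Figures taken from the slide above.
passengers_per_day = 300_000
misidentified_one_in = 100          # "works correctly 99% of the time"

false_alarms_per_day = passengers_per_day // misidentified_one_in
seconds_per_day = 24 * 60 * 60

# Assumption: passengers are spread evenly over 24 hours.
seconds_between_alarms = seconds_per_day / false_alarms_per_day

print(false_alarms_per_day, "false alarms per day")
print("one every", seconds_between_alarms, "seconds")
```

That works out to 3,000 false alarms a day, or one roughly every 29 seconds, which matches the slide's "every 30 seconds".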

How to Think About Security

  • Security systems are responsible for:
    - Authentication: who are you?
    - Authorization: who is allowed to do what?
    - Access control: how are authorization rules enforced?
  • When analyzing security, look for ways to compromise the three A's:
    - Convince the system you are:
    • Some other regular user (if you're trying to buy stuff with someone else's credit card)
    • An administrator (or someone else with special privileges)
    - Convince it that you're allowed to do something you're not:
    • E.g., give yourself administrative privileges
    - Circumvent its enforcement of the rules:
    • E.g., take advantage of a browser bug that lets JavaScript in a page make copies of your cookies

Risk Assessment

First step is always risk assessment!

  • What could an attacker do?
  • How much would it cost?

Example:

WebDTR is a password-protected web interface to a database of drug trial results.

Risk                                                     Importance  Discussion
Denial of service                                        Minor       Researchers can wait until the system comes back up
Data in database destroyed                               Minor       Restore from backup
Unauthorized data access                                 Major       If competitors access data, competitive advantage may be lost
Backups corrupted, so that data is permanently lost      Major       Redoing trials may cost millions of dollars
Data corrupted, and corruption not immediately detected  Critical    Researchers may make recommendations or diagnoses that lead to injury or death

Thinking Like A Villain

  • Good judgment comes from experience.
    - But experience is just the name we give to our mistakes when talking to our grandchildren
  • The books listed in the introduction describe attacks that have worked in the past.
    - Use these to guide your analysis of your system...
[Web Project Architecture]

Example: Don't Trust Your Input

  • Anyone who knows the URL of a web application can send it data
    ...and can study its HTTP requests and responses.
  • There is therefore no guarantee that the HTTP request you receive was generated from your form:
    - The input provided for a selection list may not be one of the values you offered.
    - The input for a text field may be longer than the maximum you specified.
    - Some parameters may be missing from QUERY_STRING, while unexpected ones may be present.
    - QUERY_STRING may not even be formatted according to the HTTP specification.
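
The checks listed above can be sketched with the standard library. This is a minimal sketch written for Python 3; the form itself (a `species` selection list and a `comment` text field), the allowed values, and the length limit are all hypothetical and stand in for whatever your own form defines.

```python
from urllib.parse import parse_qs

# Hypothetical form definition: one selection list and one text field.
ALLOWED_SPECIES = {"human", "mouse", "rat"}
MAX_COMMENT_LENGTH = 200
EXPECTED_FIELDS = {"species", "comment"}


def validate_query(query_string):
    """Return a dict of checked form values, or raise ValueError."""
    # strict_parsing=True makes parse_qs raise ValueError on malformed input.
    fields = parse_qs(query_string, strict_parsing=True)

    # Reject missing and unexpected parameters.
    if set(fields) != EXPECTED_FIELDS:
        raise ValueError("unexpected or missing parameters")

    # Each parameter must appear exactly once.
    if any(len(values) != 1 for values in fields.values()):
        raise ValueError("repeated parameter")

    species = fields["species"][0]
    comment = fields["comment"][0]

    # The selection list value must be one of the values we offered.
    if species not in ALLOWED_SPECIES:
        raise ValueError("species is not one of the offered values")
    # The text field must respect the maximum we specified.
    if len(comment) > MAX_COMMENT_LENGTH:
        raise ValueError("comment is too long")

    return {"species": species, "comment": comment}
```

A CGI script would call `validate_query(os.environ.get("QUERY_STRING", ""))` and reject the request on `ValueError` before touching any other code.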

Attacking URLs

  1. Attacker looks at WebDTR URLs
    - Before logging in: http://www.webdtr.com
    - After logging in: http://www.webdtr.com/display.py?user=cdarwin
  2. Look for a cookie from webdtr.com. If none present…
  3. Conclusion: user ID is being stored in the URL
    - Try surfing to http://www.webdtr.com/display.py?user=bmcclintock
    - Yup, we've broken in…

Leaking Information

  • Now try http://www.webdtr.com/display.py?user=nobody
    - Result is an error page saying “no such user”.
    - Which means we have a way to see who's authorized to use the system.
    • I.e., whose password it might be worth cracking
  • What about the URL http://www.webdtr.com/display.py?user=?
    - Result is a page containing a stack trace.
    • Developer left cgitb (or its equivalent) enabled in the production system:
    - Doesn't help normal users: stack trace doesn't tell them what they did wrong.
    - But it does help attackers by telling them what functions are being called, what libraries are in use, etc.
  • Every piece of information that leaks out of the application helps attackers find vulnerabilities.
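
One way to stop this kind of leak is to log the details on the server and send the same bland message for every failure. The following is a minimal sketch written for Python 3; the logger name, the message text, and the `display` stand-in function are hypothetical and not part of WebDTR.

```python
import logging

logger = logging.getLogger("webdtr")  # hypothetical logger name

GENERIC_ERROR = "The request could not be completed."


def run_safely(handler, *args):
    """Call handler(*args); on any error, log details and return a bland message."""
    try:
        return handler(*args)
    except Exception:
        # logger.exception records the full traceback in the server log only.
        logger.exception("request failed")
        return GENERIC_ERROR


def display(user):
    """Stand-in for display.py: raises KeyError for unknown users."""
    known = {"cdarwin": "results for cdarwin"}
    return known[user]


print(run_safely(display, "cdarwin"))
print(run_safely(display, "nobody"))
```

With this arrangement an unknown user and a crash look identical from outside, so neither reveals who is authorized or which functions and libraries are in use.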

SQL Injection

  • OK, your new version of WebDTR uses secure connections and encrypted cookies to close the holes identified above.
  • But the URL used to look up a result is...
    http://www.webdtr.com/display.py?testid=178923
    - What if the CGI looks something like this:
    
    form = cgi.FieldStorage()
    test_id = form.getvalue('testid')
    
    query = "SELECT date,result FROM Results WHERE (id=%s)" % test_id
    
    cursor = connection.cursor()
    cursor.execute(query)
    results = cursor.fetchall()
    cursor.close()
    			

But now what if…

  • A villain sets testid to...
    "1);UPDATE Results SET result=FALSE WHERE (id=id"
  • The whole query is then...
    "SELECT date,result FROM Results WHERE (id=1);UPDATE Results SET result=FALSE WHERE (id=id)"
  • Oops!
  • Someone just set result to false for every entry in the database
  • Mistake #1: CGI program has a capability (updating the database) it doesn't actually need!
    - Good applications use the principle of least privilege

  • Mistake #2: application failed to validate its input!
    - Should have checked that value of testid was an integer, and within range of acceptable values

Defensive coding

  • Instead of this…
    
    form = cgi.FieldStorage()
    test_id = form.getvalue('testid')
    query = "SELECT date,result FROM Results WHERE (id=%s)" % test_id
    cursor = connection.cursor()
    cursor.execute(query)
    results = cursor.fetchall()
    cursor.close()
            
  • Do this…
    
    form = cgi.FieldStorage()
    test_id = form.getvalue('testid')
    cursor = connection.cursor()
    # Query parameters must be passed as a sequence, here a one-element tuple
    cursor.execute("SELECT date,result FROM Results WHERE (id=?)", (test_id,))
    results = cursor.fetchall()
    cursor.close()
            
  • Passing the value as a separate argument lets the “execute” method bind it as data rather than as SQL text, which prevents SQL injection!
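
Both fixes, validating the input and passing it as a query parameter, can be combined. The following is a minimal sketch written for Python 3 that uses the standard `sqlite3` module with an in-memory database; the schema, the nine-digit limit, and the function name are assumptions made for illustration, and other database drivers may use a placeholder other than `?`.

```python
import re
import sqlite3


def lookup_result(connection, raw_test_id):
    """Return the (date, result) rows for one test id, or raise ValueError."""
    # Validate first: accept only a short run of ASCII digits.
    if not isinstance(raw_test_id, str) or not re.fullmatch(r"[0-9]{1,9}", raw_test_id):
        raise ValueError("testid must be a non-negative integer")
    test_id = int(raw_test_id)

    cursor = connection.cursor()
    try:
        # Parameters are passed as a sequence, here a one-element tuple.
        cursor.execute("SELECT date, result FROM Results WHERE id = ?", (test_id,))
        return cursor.fetchall()
    finally:
        cursor.close()


# Demonstration with an in-memory database (the schema is an assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Results (id INTEGER PRIMARY KEY, date TEXT, result INTEGER)")
connection.execute("INSERT INTO Results VALUES (?, ?, ?)", (178923, "2017-04-18", 1))

print(lookup_result(connection, "178923"))
```

Calling `lookup_result` with the villain's string raises `ValueError` before any SQL is run, and the table is left unchanged.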

Pathnames in Hidden Form Fields

This one occurred in a BMI219 Python script from last year and was discovered during a recent security audit by a UCOP contractor. The CGI script was using a value from the request that was sent using a "hidden" form field. The script then simply concatenated the upload directory path with the form value, without checking whether the value contained path separators, e.g., “../../../../../../etc/passwd”. Even though parsing of the target file failed, the generated error messages included parts of the target file:

Blah blah blah.
Offending line: root:x:0:0:root:/root:/bin/bash

The fix for this is two-fold:

  1. Explicitly check for “/” in the form value and reject the request if present
  2. Only return error messages like tracebacks or file contents if a “debug” flag is set (which you leave unset once your code is debugged and put into production)
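
The first fix can be sketched as follows. This is a minimal sketch written for Python 3; the upload directory and the function name are hypothetical, and the extra check on the resolved path goes slightly beyond the fix described above as a second line of defense.

```python
import os

UPLOAD_DIR = "/var/www/uploads"  # hypothetical upload directory


def safe_upload_path(form_value, upload_dir=UPLOAD_DIR):
    """Return the full path for an uploaded file name, or raise ValueError."""
    # Reject empty names, path separators, NUL bytes, and the special names.
    if (not form_value
            or "/" in form_value
            or "\\" in form_value
            or "\0" in form_value
            or form_value in (".", "..")):
        raise ValueError("illegal file name")

    base = os.path.realpath(upload_dir)
    candidate = os.path.realpath(os.path.join(base, form_value))

    # Second check: after resolving symbolic links, the path must still
    # lie inside the upload directory.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the upload directory")
    return candidate
```

A request carrying “../../../../../../etc/passwd” is rejected by the first test, before any file is opened.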

In General…

There is no Python package for checking CGI input because it really depends on how the values will be used. A “/” should be illegal in path names but is probably just fine in dates. To verify whether a form value is malicious or even legal, the CGI script writer must apply domain-specific knowledge. For example, checking a date is fairly standard (and there exists code to do it in the “datetime” module); checking whether a protein sequence is legal is pretty specialized (but maybe Biopython has something to do that). The point is that there is no single call a script writer can use to guarantee all supplied form values are safe to use.
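
To make the point concrete, here are two domain-specific checks. This is a minimal sketch written for Python 3: the date format (month/day/year) is an assumption, and the protein check is a hand-rolled stand-in that accepts only the 20 standard one-letter amino acid codes in upper case, not a Biopython call.

```python
from datetime import datetime

# The 20 standard amino acid one-letter codes (an assumption about what
# counts as "legal" for this hypothetical application).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_date(value):
    """Return a date for a month/day/year string, or raise ValueError."""
    # A "/" is legal here, but only in the positions the format allows.
    return datetime.strptime(value, "%m/%d/%Y").date()


def is_protein_sequence(value):
    """Return True if value is a non-empty string of standard residues."""
    return len(value) > 0 and set(value) <= AMINO_ACIDS


print(parse_date("04/18/2017"))
print(is_protein_sequence("MKTAYIAKQR"))
```

The same “/” that `parse_date` accepts would be rejected by a path-name check, which is why no single call can validate every form value.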

Attacking Defaults and Denial of Service

  • Another attack is to see if default accounts or passwords are still enabled:
    - Try logging in as user “admin” with password “admin”, or “guest” and “guest”, etc.
    - Better yet, write a small Python program to try this.
    - Helps (the attacker) if the results distinguish between “no such user” and “invalid password”
  • Can use a script like this to run a denial of service (DoS) attack.
    - Flood the server with login requests, so that legitimate users can't get access.
    - Or their connections time out even if they do.

Phishing

  • Phishing is increasingly common.
    - Trick users into giving away sensitive information
  • Email someone you believe is a user of the system with...
    - “System crashed last night, click here to reset your password”
  • The link actually sends them to http://www.webbdtr.com
    - Did you notice the difference in the host name? (the extra "b")
  • Phony site shows them the same login page as the real one, but…
    - Records their password, then automatically redirects them to the correct web page.
    - Real web page just asks them for the same login and password information, and the user may just think they mistyped it the first time.

Attacking Data Entry

  • How is the database updated?
    1. Files mailed in by clinicians are formatted and concatenated by a Python script.
    2. Results temporarily stored in /tmp/webdtr/0001.tmp, /tmp/webdtr/0002.tmp, etc.
    3. Administrator periodically runs another Python script to load this data into the database.
    4. Backups are run twice a week.
  • Attack #1: mail in a file full of fake data.
    - Administrator “authenticates” messages just by looking at sender address (which is very easy to fake!)
  • Attack #2: modify or replace one or the other Python script
  • Attack #3: create a file /tmp/webdtr/9999.tmp
    - Does the script that loads the database check that sequence numbers are consecutive?
    - Does it check who created or owns the file?

Timed Attacks

  • See a message on the WebDTR mailing list saying that the program now checks for attack #3 above.
  • 
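    import os
    from stat import ST_UID

    class SecurityException(Exception):
        '''Raised when a file fails a security check.'''

    def read_file(filename, required_uid):
        '''Read submission data from a file, checking that the file
        is owned by the specified user.'''

        # Check ownership by path name...
        owner = os.stat(filename)[ST_UID]
        if owner != required_uid:
            raise SecurityException('%s has incorrect owner' % filename)
        # ...then open by path name again, as a separate step.
        stream = open(filename, 'r')
        data = stream.read()
        stream.close()
        return data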
    			
  • There's a tiny window of opportunity between when the program checks ownership, and when it opens the file.
    - Write a script that loops over files, deleting them and creating new ones in their place.
    - Low chance of success on any one try, but computers are very patient.
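
One common way to close this window is to open the file first and then check the ownership of the file that was actually opened. The following is a minimal sketch written for Python 3 on a Unix-like system; it keeps the function and exception names from the example above and is one possible repair, not necessarily the one WebDTR would use.

```python
import os


class SecurityException(Exception):
    """Raised when a file fails an ownership check."""


def read_file(filename, required_uid):
    """Read submission data, checking ownership of the file actually opened."""
    with open(filename, 'r') as stream:
        # fstat inspects the open file descriptor, not the path name, so the
        # file cannot be swapped between the check and the read.
        owner = os.fstat(stream.fileno()).st_uid
        if owner != required_uid:
            raise SecurityException('%s has incorrect owner' % filename)
        return stream.read()
```

Because the check and the read both use the same open descriptor, deleting and recreating the path name in between no longer changes which file is read.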

Securing HTTP

  • HTTP sends data as “clear” (unencrypted) text!
    - As is too often the case in software design, security was ignored in HTTP's original design.
  • Netscape later developed HTTPS (HTTP over SSL, now TLS) to protect confidential information.
    - Uses a different network port # (443 instead of 80) and URL scheme (https instead of http).
    - Encrypts data between the browser and the web server.
    - Does not guarantee secure storage on the server.
    • Far too many web sites store sensitive information in databases as cleartext.
    • Gives villains another point of attack.

Cryptography 101

  • Encryption is the process of obscuring information so that it can't be read without special knowledge.
    - Recovering the information is called decryption.
  • An algorithm for encrypting and decrypting is called a cipher.
  • Original and encrypted messages are called plaintext and ciphertext respectively.
  • All classical (pre-1970s) ciphers are symmetric:
    - Same key is used for both encryption and decryption.
    - Which also means that the key can only be shared among trusted parties.

Public-Key Cryptography

  • Asymmetric ciphers have two keys:
    - Each undoes the other's effects
    - Practically impossible to determine one given the other
  • Asymmetric systems are often called public key cryptography systems.
    - Publish one key (called the public key)
    - Keep the private key secret
  • Symmetric encryption is typically many times faster than asymmetric encryption, so...
    - Usual scheme these days is to use asymmetric encryption (slow) to exchange a one-time symmetric key,
    - then use the symmetric key (fast) for the rest of the conversation.

Sending and Receiving

  • Anyone who wants to send a message to you encrypts it using your public key
    - You're the only one who can decrypt it (with your private key).
    - Look up their public key in order to encrypt your reply.
[Secure Communication with Asymmetric Keys]

Digital Signatures

  • Key pairs can also be used to sign messages.
    - Encrypt message using your private key, and append the result to the original message
    - Recipients use your public key to decrypt the signature
    - If it matches the message, you must have been the sender
    - Also guarantees that the clear text was not changed
  • In practice, encrypt a digest of the original message.
    - Practically impossible for someone to construct a message that has a given digest
[Signing a Message]
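
The digest step can be tried with the standard library. This is a minimal sketch written for Python 3 that uses SHA-256 as the digest function, which is a choice made here for illustration; the example messages are made up. The signing step itself (encrypting the digest with a private key) needs a third-party cryptography library and is not shown.

```python
import hashlib


def digest(message):
    """Return the SHA-256 digest of a bytes message as a hex string."""
    return hashlib.sha256(message).hexdigest()


original = b"Trial 178923: result positive"
tampered = b"Trial 178923: result negative"

# Any change to the message produces a completely different digest,
# so a signature over the digest also protects the message text.
print(digest(original))
print(digest(tampered))
print(digest(original) == digest(tampered))
```

The digest has a fixed, short length regardless of message size, which is why it is signed in place of the whole message.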

Securing Login

  • Another flaw in HTTP is its built-in password handling (called basic authentication).
    - Sends the user name and password as cleartext
  • Solution is simple: never use HTTP basic authentication over an unencrypted connection.
    - And never have users submit ID and password via a form over plain HTTP, since the form data isn't encrypted!
  • Alternative:
    - Have user provide ID and password over secure connection.
    - Use a random (unpredictable) number as a cookie.
    • Do not just use a sequence of integer session IDs: too easy for attackers to fabricate.
    - Give that to the client to track the session.
    • When it comes back, use it as a key into a dictionary of active sessions.
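
The session scheme above can be sketched with the standard `secrets` module. This is a minimal sketch written for Python 3; the function names and the in-memory dictionary are hypothetical, and a real application would also expire sessions and send the cookie only over a secure connection.

```python
import secrets

# Dictionary of active sessions: session ID -> user ID.
active_sessions = {}


def start_session(user_id):
    """Create an unguessable session ID for a user who has just logged in."""
    # token_urlsafe draws from the operating system's secure random source,
    # unlike a counter or the ordinary random module.
    session_id = secrets.token_urlsafe(32)
    active_sessions[session_id] = user_id
    return session_id


def lookup_session(session_id):
    """Return the user ID for a session cookie, or None if it is unknown."""
    return active_sessions.get(session_id)


cookie = start_session("cdarwin")
print(lookup_session(cookie))
print(lookup_session("1"))
```

An attacker who guesses small integers such as "1" or "2" gets nothing back, because the valid keys are long random strings.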

Example HTTPS Authentication

  • https://bmi219.rbvi.ucsf.edu
    • Provides basic authentication using SSL for all users with plato accounts.
    • Enabled through the Apache configuration file.
    • Users cannot access web pages through this URL without first authenticating.
    • "REMOTE_USER" environment variable will be set to user's login name.
  • https://bmi219.rbvi.ucsf.edu/cgi-bin/test-cgi.py
    
       CGI/1.0 test Python script report:
       
       argc is 1
       argv is ['/usr/local/www/rbvi/preview/cgi-bin/test-cgi.py']
       You are apache on watson.cgl.ucsf.edu
       
       cwd is /usr/local/www/rbvi/preview/cgi-bin
       
       encrypted channel
       authenticated as tef
       
       AUTH_TYPE=Basic
       SERVER_SOFTWARE=Apache/2.2.15 (Red Hat)
       SCRIPT_NAME=/cgi-bin/test-cgi.py
       SERVER_SIGNATURE=Apache/2.2.15 (Red Hat) Server at bmi219.rbvi.ucsf.edu Port 443
       
       REQUEST_METHOD=GET
       REMOTE_USER=tef
       SERVER_PROTOCOL=HTTP/1.1
       QUERY_STRING=
       PATH=/bin:/sbin:/usr/bin:/usr/sbin
       SSL_TLS_SNI=bmi219.rbvi.ucsf.edu
       HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
       HTTP_CONNECTION=keep-alive
       SERVER_NAME=bmi219.rbvi.ucsf.edu
       REMOTE_ADDR=169.230.11.112
       SERVER_PORT=443
       SERVER_ADDR=169.230.27.19
       ...
    	

Python code to implement this:


#!/usr/bin/python

print "Content-type: text/plain"
print

print "CGI/1.0 test Python script report:"
print

import sys
print "argc is", len(sys.argv)
print "argv is", sys.argv

import pwd, os, socket
print "You are", pwd.getpwuid(os.geteuid()).pw_name, "on", socket.gethostname()
print

print "cwd is", os.getcwd()
print

if "HTTPS" in os.environ and os.environ["HTTPS"] == "on":
	print "encrypted channel"
else:
	print "unencrypted channel"
if "REMOTE_USER" in os.environ:
	print "authenticated as", os.environ["REMOTE_USER"]
else:
	print "unauthenticated"
print 

for key, value in os.environ.items():
	print "%s=%s" % (key, value)
print

import cgi
data = cgi.FieldStorage()
for key in data.keys():
	print "%s: %s" % (key, data[key].value)

What About Authorization?

  • In this example it's simple: Is user "tef" authorized to perform whatever operation is being attempted?
    • In your code, test to see if "REMOTE_USER" is in the list of authorized users.
    • In a more complicated scenario, you could store user names and their "level" of authorization in a database and test against that information.
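
Both variants can be sketched in a few lines. This is a minimal sketch written for Python 3; the permission table and operation names are hypothetical, and in the more complicated scenario the table would be loaded from a database instead of being written into the script.

```python
# Hypothetical permission table: user name -> operations they may perform.
PERMISSIONS = {
    "tef": {"read", "update"},
    "guest": {"read"},
}


def is_authorized(environ, operation):
    """Return True if the authenticated user may perform the operation."""
    user = environ.get("REMOTE_USER")   # set by the web server after login
    if user is None:
        return False                    # not authenticated at all
    return operation in PERMISSIONS.get(user, set())


print(is_authorized({"REMOTE_USER": "tef"}, "update"))
print(is_authorized({"REMOTE_USER": "guest"}, "update"))
```

In a CGI script the first argument would be `os.environ`, and the script would refuse the request whenever the function returns False.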

Red Queen Race

  • If villains can snoop on network traffic, they can hijack sessions.
    - By inserting a copy of your cookie into their message, they get remote systems to mistakenly trust their messages.
  • Snooped network traffic also makes systems vulnerable to replay attacks.
    - Copy the cookie (or an entire message) and re-send it later.
    - Useful if the message means “open the vault door”.
  • And no security measures you implement are useful if there is spyware on the client machine.
    - Nowadays, this is much more likely than someone sniffing network traffic.
    - Keep your anti-virus protection and spyware monitors up to date.

It Isn't Just The Web

  • Security isn't just a feature of web-based applications.
    - How do you know the software you've installed on your machine is reliable?
    - How would you find out if it had been tampered with during production?
  • C and C++ have vulnerabilities that other languages don't.
    - Best-known are buffer overflow attacks:
    1. Attacker sends more data than the program has allocated memory to receive.
    2. “Extra” bytes overwrite adjacent memory, such as return addresses or other program data.
    3. If those bytes' values correspond to machine instructions, the attacker can change a program's behavior.

Summary

  • Remember that technology doesn't solve security problems: it just moves them around
  • Never rely on keeping your techniques secret to ensure security
  • Never design your own ciphers
    - Use AES for symmetric encryption (NIST standard - very secure); 3DES has since been deprecated by NIST
    - Use RSA, DSA, or ECDSA for public-key
  • Most important: security has to be designed in from the start!
    - And tested, tested, tested.