Static Code Checking:
Security and Concurrency
Ben Watson
The George Washington University
CS 297 Security and Programming Languages
June 9, 2005
The Video
The Problem
How to discover errors in code without running it
Code can run for weeks or months without
displaying the error
Many errors are caused by pieces of code that
are very difficult to test

Device drivers – manufacturers aren’t always good at
this, and one OS company can’t possibly test all the
tens of thousands of devices out there
The Windows 98 crash was caused by a bad scanner driver

Concurrent code—debugging complicated
concurrency problems is a nightmare x n.
The Scope
Lines of Code (estimated)
Windows 3.1
Windows NT 3.51
Windows 95
RedHat Linux 7.1
Windows 2000
Windows XP
Debian Linux 2.2
Debian Linux 3.1
3,000,000
4,000,000
15,000,000
30,000,000
35,000,000
40,000,000
56,000,000
213,000,000
The Real Problem
We’re only human


No person, no group of people can possibly
manually debug anything as complicated as
an OS and its related pieces
Good tools are not enough
Can’t rely on thorough annotations of entire code
base
Can’t rely on manual directions: the more
automated the better
The Solutions
MC Security checking system
RacerX: Race condition and Deadlock
detection
General rule inference from source code
MECA: Statically Checking
Security Properties
Checks low-level properties (pointer safety,
etc.)
Relies on annotations that propagate through
the analysis
Goals



Expressiveness
Low manual overhead—programmers only have to
type in a relatively few number of annotations
Low false-positives
How MC Works
Uses a modified GCC compiler
Parses source along with abstract syntax
tree generated by compiler
AST used to build a control-flow graph
Annotation propagator uses CFG to
propagate annotations through entire
graph
Checkers are run on the completed graph
Results are ranked and filtered
An example
Rule: OS kernel may not access a userpointer (there are “paranoid” functions to
access the data pointed to by a userpointer)

Referred to as a “tainted” pointers
Annotate:


Tainted variables, parameters, and fields
Functions that produce tainted values
Source annotations
struct myStruct {
/*@ tainted */ int*p;
};
/*@ tainted */ int *foo(/*@ tainted
*/int *p);
void memcpy(/*@ !tainted */void *dst,
/*@ !tainted */void *src, unsigned
nbytes);
Source annotations
//Binding:
/*@ set_length($ret, sz) */
void* malloc(unsigned sz);
//Global: all sys_* calls
//are tainted
/*@ global $param
${!strncmp(current_fn,”sys_”,4)} ==>
tainted */
Propagation
void bar(/*@ tainted */void *p);
struct S{char* buf;}
//Before analysis
void foo(char** p, struct S* s)
{
char *r;
struct S* ss;
r=*p;
bar(r);
//taints r and *p
ss =s;
bar(ss->buf);
//taints ss and s
}
//At the end of analysis:
Foo(/*@ tainted (*p) */char **p, [email protected](s->buf)
*/struct S* s);s
MECA results
On average, one manual annotation led to 682 checks
Linux 2.5.63 Bugs:
Type
Arbitrary write
Arbitrary read
Fault at will
Always fail
Total
False
Positives
Warnings
11
8
19
6
44
8
Fixed
11
8
17
3
39
RacerX
Static detection of race conditions and
deadlocks
Designed to find errors in large, multithreaded systems
Sorts errors by severity (the hard part)
They checked Linux, FreeBSD, and a
mystery OS that has only 500,000 lines of
code
Deadlock
Deadlock





Thread 1 has locked resource A
Thread 2 has locked resource B
Thread 1 needs resource B to complete
Thread 2 needs resource A to complete
Neither can proceed—these threads are
deadlocked
Race condition
Multiple threads access the same memory
If memory is unprotected:



Two threads can simultaneously write to same
memory (bad)
One thread can read, another can write
simultaneously (bad)
Two threads can simultaneously read from same
memory (probably ok)
It’s a race because final value is nondeterministically chosen by who gets there first.
Avoiding the Problem
If data is never accessed by more than
one thread, you don’t have to worry about
concurrency
If program logic ensures that only one
thread accesses data, you don’t need to
worry about locking the data
If you’re writing a shared component, you
almost always have to worry about
concurrency
Algorithm
“Lockset” algorithm detects both types of
problems
Lockset - A pair of



Lock()/Unlock()
InterruptDisable()/InterruptEnable()
Etc.
Algorithm
Top-down analysis of control-flow graph
Add/remove locks as needed
Check for race/deadlock on each
statement
Cache results to ease exponential graph
size
Deadlock Check
Basically, finds if there are cycles in the
lockset dependencies

If lock a is obtained, then lock b, we have:
ab

Following this line of reasoning, we can
discover cases that look like this:
abca
Deadlock Check
Deciding how important the cycle is, is
non-trivial.
Basically, rank higher according to:



Global locks vs. local locks
Small depth difference vs. big depth
difference
Fewer threads vs. more threads
Race Checking
This is even harder than deadlock detection
Must answer:



Is lockset valid (if not, you will have LOTS of false
positives)
Can the unprotected memory be accessed more than
one thread?
Does the access need to be protected?
Two reads do not a wrong make

Must annotate API functions that require locks
Race Checking
Deciding if code is multithreaded:


Inferred from “programmer belief” – if a piece
of code contains concurrency-related
statements, the code is probably multithreaded
Annotations—designate API functions as
requiring locks
Race Checking
Does memory need to be protected?





If it’s never written to, no.
If it’s only written on initialization, no.
On a certain code path, if there are a high-number of
variables that are potentially written to concurrently,
probably.
Anything that can’t be written atomically, yes.
(although, this is pretty much anything, especially if
you have more than 1 CPU)
If a variable is statistically likely to be protected by
locking code (“Programmer Belief”)
RacerX: Results
Confirmed
Unconfirmed
Benign
False
2
3
7
Linux 2.5.62 4
8
6
FreeBSD
2
3
6
7
4
13
14
Linux 2.5.62 3
2
2
6
Deadlock
System X
Race
System X
Pop Quiz – Question 1
If you have read the 3rd paper, you may not
answer this question.
Find the bug:
if (card==NULL) {
printk(KERN_ERR “capidrv-%d: …
%d!\n”,
card->contrnr, id);
}
Pop Quiz – Answer 1
if (card==NULL) {
printk(KERN_ERR “capidrv-%d: …
%d!\n”,
card->contrnr, id);
}
Pop Quiz – Question 2
If you have read the 3rd paper, you may
not answer this question.
Find the bug:
struct mxser_struct *info =
tty->driver_data;
unsigned long flags;
if (!tty || !info->xmit_buf)
return 0;
Pop Quiz – Answer 2
struct mxser_struct *info =
tty->driver_data;
unsigned long flags;
if (!tty || !info->xmit_buf)
return 0;
General Methodology
Take advantage of programmer beliefs
Statistics are our friend
If something is usually done a certain way,
then instances that violate that should be
examined
Check internal consistency


Discover rules that are built-in to the code
Minimal to no annotation
Conclusion
The methods tonight provide some of the
best ways to find errors:

Millions of lines of code can be checked with
at most hundreds of lines of annotations
The bugs these methods find are fairly
specific in nature (revolve around wellstructured code constructs)
References
Junfeng Yang, Ted Kremenek, Yichen Xie, and Dawson Engler.
MECA: an Extensible, Expressive System and Language for
Statically Checking Security Properties. ACM CCS, 2003.
Dawson Engler and Ken Ashcraft. RacerX: Effective, Static
Detection of Race Conditions and Deadlocks. SOSP 2003.
Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and
Benjamin Chelf. Bugs as Deviant Behavior: A General Approach to
Inferring Errors in Systems Code. OSDI 2000.
Source Lines of Code, http://www.answers.com/topic/source-linesof-code
Concurrency – Part 2: Avoiding the Problem,
http://blogs.msdn.com/larryosterman/archive/2005/02/15/373460.as
px
Descargar

Document