LIFT: A Low-Overhead Practical
Information Flow Tracking System for
Detecting Security Attacks
Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim,
Yuanyuan Zhou, Youfeng Wu
University of Illinois at Urbana-Champaign
Intel Corporation
The Ohio State University
Information Flow Tracking

Taint Analysis
To detect / prevent security attacks
• For attacks that corrupt control data
• General: not limited to specific types of software vulnerabilities
• Works even for unknown attacks
Approach

1. Tag (label) input data from unsafe channels, e.g., the network
2. Propagate the data tags through the computation
• Any data derived from unsafe data is also tagged as unsafe
3. Detect unexpected uses of the unsafe data
• E.g., switching program control to a location given by unsafe data
A Simple Example



a is unsafe
Information flows from a to b (e.g., b = a): b becomes unsafe
If c is unsafe, jumping to the location pointed to by c fails (see the sketch below)
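A minimal C-style sketch of the idea; the shadow tags and helper names are hypothetical and purely illustrative, since LIFT actually operates on x86 binaries rather than source code:

```c
#include <stdlib.h>

/* Hypothetical one-bit tags kept alongside the data (0 = safe, 1 = unsafe). */
static int tag_a, tag_b, tag_c;

void example(long a, long c, void (*table[])(void)) {
    tag_a = 1;                    /* a was read from the network: unsafe */

    long b = a;                   /* information flows from a to b ...   */
    tag_b = tag_a;                /* ... so b inherits a's tag           */
    (void)b;

    /* Before the indirect jump through c, check that the target
     * does not depend on unsafe data.                                   */
    if (tag_c)
        abort();                  /* unexpected use of unsafe data: stop */
    table[c]();                   /* otherwise the dispatch is allowed   */
}
```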
Three Ways <1>

Language-based
• For programs written in special type-safe programming languages
• Tracks information flow at compile time
Good: no runtime overhead
Bad: only works for specific programming languages
Not practical
Three Ways <2>

Instrumentation
• Tracks the information flow and detects exploits at runtime
Source-code instrumentation
• Lower overhead
• Cannot track into third-party library code
• Requires a specification of library calls
  • Complex, error-prone, and subject to side effects
Binary-code instrumentation
• Runtime overhead: 37x
Three Ways <3>

Hardware-based
• E.g., RIFLE
Good: low overhead
Bad: requires non-trivial hardware extensions
Overview of LIFT

Dynamically instruments the binary code to
• (1) track information flow
• (2) detect security exploits
Advantages:
• Low overhead, software-only, no source code required
Built on top of StarDBT
• A dynamic binary translator by Intel
Design of LIFT

Basic design
• Tag management
• Information flow tracking
• Exploit detection
• Protection of the tag space
Optimizations
Tag Management: Design

Associate a one-bit tag with each byte of data in memory and with each general data register
• 0: safe; 1: unsafe
At the beginning, all tags are cleared to zero
Data is tagged with 1 when
• it is read from the network or standard input, or
• information flows to it from other unsafe data
Unsafe data can become safe again if it is overwritten with safe data
Tag Management: Storage

For memory data
• Storage: a special memory region (the tag space)
• Look-up: one-to-one mapping between a tag bit and a memory byte in the virtual address space (see the sketch below)
• Space overhead: 12.5% (one bit per byte)
• Compression: memory data located near each other usually have similar tag values
For general registers
• Store tags in a dedicated spare register (64-bit) to reduce overhead
• If no spare register is available: use a special memory area
  • No significant overhead, since it stays in the L1 cache
  • Hardware support?
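A sketch of how the one-bit-per-byte tag-space lookup could work; the base address and helper names below are assumptions for illustration, not LIFT's actual layout:

```c
#include <stdint.h>

/* Assumed base of the reserved tag-space region (placeholder value). */
#define TAG_BASE 0x200000000000ULL

/* One tag bit per application byte => 1/8 = 12.5% space overhead.
 * The one-to-one mapping makes the lookup a shift plus a mask.       */
static inline int tag_load(uint64_t addr) {
    uint8_t *tag_byte = (uint8_t *)(uintptr_t)(TAG_BASE + (addr >> 3));
    return (*tag_byte >> (addr & 7)) & 1;         /* 0 = safe, 1 = unsafe */
}

static inline void tag_store(uint64_t addr, int unsafe) {
    uint8_t *tag_byte = (uint8_t *)(uintptr_t)(TAG_BASE + (addr >> 3));
    if (unsafe)
        *tag_byte |=  (uint8_t)(1u << (addr & 7));
    else
        *tag_byte &= (uint8_t)~(1u << (addr & 7));
}
```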
Information Flow Tracking <1>


Dynamically instruments instructions
• Each instruction is instrumented once at runtime and the result is executed many times
• The tracking code is inserted before the corresponding instruction in the original program
Tracks information flow based on data dependencies, not control dependencies
Information Flow Tracking <2>

For data-movement instructions
• E.g., MOV, PUSH, POP
• Tag propagation: source operand → destination
For arithmetic instructions
• E.g., ADD, OR
• Tag propagation: both source operands → destination (the destination is unsafe if either source is unsafe)
For instructions that involve only one operand
• E.g., INC
• The tag does not change
Information Flow Tracking <3>

Special cases
• XOR reg, reg and SUB reg, reg always reset reg to zero
• These zeroing idioms produce a constant result, so the corresponding tag is cleared (see the sketch below)
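The propagation rules above, written as C pseudocode over the hypothetical tag_load/tag_store helpers from the earlier tag-space sketch (LIFT itself emits equivalent x86 instructions inline):

```c
#include <stdint.h>

/* Hypothetical helpers from the tag-space sketch above. */
extern int  tag_load(uint64_t addr);
extern void tag_store(uint64_t addr, int unsafe);

/* MOV/PUSH/POP dst, src: the tag simply moves with the data.        */
void track_mov(uint64_t dst, uint64_t src) {
    tag_store(dst, tag_load(src));
}

/* ADD/OR/... dst, src: the destination is unsafe if either source
 * operand is unsafe, i.e., the tags are OR-ed together.             */
void track_binop(uint64_t dst, uint64_t src) {
    tag_store(dst, tag_load(dst) | tag_load(src));
}

/* INC/DEC and other single-operand instructions: the tag does not
 * change, so no tracking code is needed.                            */

/* Special case: "XOR reg, reg" and "SUB reg, reg" always produce
 * zero, so the result is a constant and its tag is cleared.         */
void track_zero_idiom(uint64_t dst) {
    tag_store(dst, 0);
}
```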
Exploit Detection

Instructions are also instrumented to detect exploits
• Unsafe data must not be used as a return address or as the destination of an indirect jump instruction
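A sketch of the detection check, again in C over the hypothetical tag helper; in LIFT the equivalent check is emitted as inline x86 code before RET and indirect jump/call instructions:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

extern int tag_load(uint64_t addr);    /* hypothetical helper from above */

/* Conceptually runs just before RET: the saved return address on the
 * stack must not be derived from unsafe (e.g., network) data.          */
void check_return_address(uint64_t stack_slot_addr) {
    if (tag_load(stack_slot_addr)) {
        fprintf(stderr, "alert: return address derived from unsafe data\n");
        abort();                       /* stop before control is hijacked */
    }
}

/* The same check before an indirect jump or call through memory.       */
void check_indirect_target(uint64_t target_slot_addr) {
    if (tag_load(target_slot_addr))
        abort();
}
```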
Protection of Tag Space and Code

Both the tag space and the LIFT code must be protected
To protect the LIFT code
• Make the memory pages that store the LIFT code read-only
To protect the tag space
• Turn off the access permission of the pages that store the tag values of the tag space itself
• Any access by the original program or hijacked code to the tag space also touches the corresponding (protected) tag and therefore triggers a fault
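A sketch of the protection step using the standard mprotect call; the region pointers and sizes are placeholders, and LIFT's actual handling of the tag space's own tags is more involved than this outline:

```c
#include <stddef.h>
#include <sys/mman.h>

/* LIFT code pages: read + execute only, so hijacked code cannot
 * overwrite the translator or the tracking logic.                    */
int protect_lift_code(void *code_pages, size_t len) {
    return mprotect(code_pages, len, PROT_READ | PROT_EXEC);
}

/* Pages holding the tags *of the tag space itself*: no access at all.
 * Because every instrumented access also touches the corresponding
 * tag, any access to the tag space by the original or hijacked code
 * lands on one of these protected pages and raises a fault.           */
int protect_tag_space_tags(void *meta_tag_pages, size_t len) {
    return mprotect(meta_tag_pages, len, PROT_NONE);
}
```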
Optimizations


The basic design alone incurs roughly 47x runtime overhead
Three binary optimizations reduce this
Fast Path (FP): Motivation

Observation: in most server applications, the majority of tag propagations are zero-to-zero
• I.e., from safe data sources to a safe destination
FP: Approach <1>

Before a code segment, insert a check
• Check whether all of its live-in and live-out registers and memory data are safe
If all are safe, no tracking is needed inside the code segment
• Run the fast binary version (check version)
• Otherwise, run the slow version (track version)
FP: Approach <2>

Live-in: data used as a source operand in the segment
Live-out: data that would change from unsafe to safe after execution if it was unsafe before
Others: data that is
• (a) not used in the code segment, or
• (b) dead at the beginning or end of the code segment
(A sketch of the resulting dispatch follows below.)
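A sketch of the fast-path dispatch; register tags are modeled here as a simple bitmask and the helper names are assumptions, whereas LIFT emits the check and both code versions as x86 through the dynamic translator:

```c
#include <stdint.h>

extern int tag_load(uint64_t addr);    /* hypothetical helper from above   */
extern uint64_t reg_tag_mask;          /* one assumed tag bit per register */

/* Nonzero if any live-in/live-out register or memory byte of the
 * upcoming code segment carries an unsafe tag.                        */
static int segment_has_taint(const uint64_t *live_addrs, int n_addrs,
                             uint64_t live_reg_mask) {
    if (reg_tag_mask & live_reg_mask)          /* any live register unsafe? */
        return 1;
    for (int i = 0; i < n_addrs; i++)          /* any live memory unsafe?   */
        if (tag_load(live_addrs[i]))
            return 1;
    return 0;
}

/* All live data safe => run the check-only (fast) version with no tag
 * propagation; otherwise fall back to the tracking (slow) version.    */
void run_segment(const uint64_t *live_addrs, int n_addrs,
                 uint64_t live_reg_mask,
                 void (*check_version)(void), void (*track_version)(void)) {
    if (segment_has_taint(live_addrs, n_addrs, live_reg_mask))
        track_version();
    else
        check_version();
}
```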
FP: More Technique Details

Difficult to know the addresses of all memory units at the beginning
• Run the check version first
• Postpone the check until the memory address is known
• Jump to the track version when the check fails
Granularity of code segments
• Basic blocks
• Hot traces
Remove unnecessary checks
• E.g., in the network processing component
Merged Check (MC): Motivation

Temporal / spatial locality
• Recently accessed data is likely to be accessed again in the near future
• After an access to a location, nearby memory locations are also likely to be accessed in the near future
Idea: combine multiple checks into one
• Merge the temporally and spatially nearby checks
Merged Check: Approach

Cluster the memory references into groups
• Scan all the instructions and build a data dependency graph for each memory reference
• Introduce a version number to represent the timing attribute
• Cluster based on spatial / temporal distance (illustrated below)
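An illustration of what a merged check buys, reusing the hypothetical TAG_BASE mapping and tag_load helper from the tag-space sketch: eight per-byte checks over a contiguous, 8-byte-aligned region collapse into a single tag-byte load. This shows the payoff of the clustering, not LIFT's actual clustering algorithm:

```c
#include <stdint.h>

#define TAG_BASE 0x200000000000ULL    /* placeholder, as in the earlier sketch */

extern int tag_load(uint64_t addr);   /* hypothetical per-byte check           */

/* Unmerged: one tag lookup per byte.                                  */
int any_taint_separate(uint64_t addr) {
    for (int i = 0; i < 8; i++)
        if (tag_load(addr + i))
            return 1;
    return 0;
}

/* Merged: the eight tag bits of an 8-byte-aligned region share one
 * tag-space byte, so spatially nearby checks need one memory access.  */
int any_taint_merged(uint64_t addr) {
    const uint8_t *tag_byte =
        (const uint8_t *)(uintptr_t)(TAG_BASE + (addr >> 3));
    return *tag_byte != 0;            /* assumes addr is 8-byte aligned */
}
```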
Fast Switch (FS)

When execution switches between the original binary code and the instrumented tracking code, the context must be saved and restored
• This introduces large runtime overhead because the switch points are inserted at many locations
Optimization: use cheaper instructions and remove unnecessary saves / restores
Evaluation


Effectiveness
Performance
Evaluation: Effectiveness
Evaluation: Performance <1>

Throughput and response time of Apache
• Throughput degradation: 6.2% (StarDBT alone: 3.4%)
• Response time: 90.9%

Evaluation: Performance <2>

SPEC2000: 3.6 times on average
Conclusion




A “practical” information flow tracking system
• Low overhead
• No hardware extensions required
• No source code required
Discussions

Source-code instrumentation
• 81% overhead on average for CPU-intensive C programs
• 5% overhead on average for IO-intensive (server) programs
• If similar optimization techniques could be applied to source-code instrumentation, its performance could also be “practical”
Binary-code instrumentation
• CPU-bound: 24x
• Apache server: worst case 25x, most cases 5~10x
More Discussions

The paper focuses on the basic design and the three optimizations
• Not many details about the taint analysis itself
Evaluation
• Effectiveness: false positives / false negatives
• Performance
  • IO-intensive vs. CPU-intensive workloads
  • More benchmarks
A formal model to analyze the taint analysis