Input Sanitization
COEN 225
All Input is Evil

All input is evil:
- At least potentially
- Input can be (a random collection):
  - Files
  - Web forms
  - Cookies
  - Registry entries
  - Database contents
  - Command line arguments
  - Environment variables
  - HTTP requests
  - Named pipes
  - E-mail
  - …

Finding Common Entry Points

Files
- Contain data specified by users
- Contain data supplied by the application
- Can be intentionally or unintentionally corrupted
- Attacker can also attack file metadata:
  - Extension
  - Path
  - File system attributes
  - …
Finding Common Entry Points

Sockets
- Easy to connect to sockets, so data needs to be filtered
- Attacker can:
  - Monitor data
  - Send malformed data to client or to server
  - Intercept data in the middle of a request and replace it (a.k.a. man-in-the-middle attack)
Finding Common Entry Points

HTTP requests
- Almost always pass through firewalls
- Using a web proxy, users have complete control over what is sent to the server

Named pipes
- See sockets
- But programmers might forget how named pipes work and trust input
  - E.g., SQL Server 2000 vulnerability
  - See http://www.blakewatts.com/namedpipepaper.html
Finding Common Entry Points

Pluggable Protocol Handler
- Examples:
  - http, ftp, https in URL
  - mailto:[email protected]?subject=WrongPerson
- Tell the system which application handles the data when a hyperlink is clicked
- A maliciously crafted link irc://[~900 characters] caused a buffer overflow in the mIRC protocol handler that allowed arbitrary code execution
Finding Common Entry Points

Programmatic Interfaces
- RPC
- COM
- DCOM
- ActiveX
- Managed code entry points (Windows)
- .NET Remoting
Finding Common Entry Points

SQL
- Improperly filtered input strings can lead to execution of powerful SQL commands

Registry

User Interfaces
- Win95 machines were used in libraries
- Attacker could remove the "Start" button for free entertainment
Finding Common Entry Points

Command line arguments
- Attacker provides a helpful link with arguments embedded
- Example: cross-site scripting attacks

Environment variables
- Can be used by programs to make decisions

Canonicalization
- Authentication decision made by one module
- Access done by another module

Input Validation

Input: anything controlled by an outsider
- User command line input
- Configuration files that could be manipulated
- HTTP requests
- Packets under consideration by a firewall
- …
Input Validation

Security Strategies
- Black list
  - List all things that are NOT allowed
    - List is difficult to create
    - Adding insecure constructs on a continuous basis means that the previous version was unsafe
    - Testing is based on known attacks
    - List from others might not be trustworthy
- White list
  - List of things that are allowed
    - List might be incomplete and disallow good content
    - Adding exceptions on a continuous basis does not imply security holes in previous versions
    - Testing can be based on known attacks
    - List from others can be trusted if the source can be trusted
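To make the contrast concrete, here is a minimal C sketch; the function names and both character sets are hypothetical, chosen only for illustration. The blacklist forgets the double quote, so an attribute-injection string slips through; the whitelist rejects it because '"', '=' and '(' were never allowed in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical blacklist: reject input containing a character known to be
   dangerous. Anything missing from the list (here: '"') is accepted. */
static bool blacklist_ok(const char *s)
{
    return strpbrk(s, "<>'") == NULL;
}

/* Hypothetical whitelist: accept input only if every character is in the
   allowed set. A forgotten character is rejected, so the list fails safe.
   Empty input is rejected as well. */
static bool whitelist_ok(const char *s)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_ ";
    return s[0] != '\0' && strspn(s, allowed) == strlen(s);
}
```

Usage: both functions agree on "<script>", but only the whitelist catches `" onmouseover=alert(1)`.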
Input Validation

Principal problem
- Location of check differs from location of use

Principal solution
- Canonicalization of input
  - Transform input into a canonical form
- Decision is made on input in the same form that the program uses
Canonicalization

Two major program errors:
- Misunderstanding the definition of the canonical form
- Stopping the canonicalization process too early
Canonicalization:
Dealing with Metacharacters

Meta-information can be attached:
- Out-of-band
- In-band
  - Often more readable
  - Often more compact
  - Has security implications
- Potential for overlapping trust domains:
  - There exists a logical boundary between data and metadata
  - The parser needs to identify the difference between data and metadata correctly
Canonicalization:
Dealing with Metacharacters

Example: NULL character for termination of strings
Canonicalization:
Dealing with Metacharacters

Simplest vulnerability:
- Users can embed metacharacters into input that is not filtered
- Instance of a second-order injection attack
  - The attack happens when the metacharacter is evaluated
- Example: password update (next slide)
Canonicalization:
Dealing with Metacharacters
Password update with no input sanitization:

    use CGI;
    my $query = CGI->new;
    # … verify session details …
    my $new_password = $query->param('password');   # no input sanitization!
    open(IFH, "</opt/passwords.txt") || die("$!");
    open(OFH, ">/opt/passwords.txt.tmp") || die("$!");
    while (<IFH>) {
        chomp;                                # drop the record's newline
        my ($user, $pass) = split /:/;
        if ($user ne $session_username) {
            print OFH "$user:$pass\n";        # copy other users unchanged
        } else {
            print OFH "$user:$new_password\n";
        }
    }
    …
    close(IFH);
    close(OFH);

User bob inputs the password test\njim:npwd (where \n is an actual newline character).
Bob's record in OFH becomes:

    bob:test
    jim:npwd

Bob just added a new user.
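The flaw is that the password field may contain the record format's own metacharacters. A minimal C sketch of the missing check, assuming the one-record-per-line "user:password" format above (password_field_ok is a hypothetical name): accept only printable ASCII other than ':', which rules out the embedded newline Bob used.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical validator for the password field of a "user:password\n" file.
   Whitelist: printable ASCII (0x20..0x7E) only, which excludes '\n' and '\r';
   ':' is excluded too because it separates the two fields. */
static bool password_field_ok(const char *pw)
{
    if (*pw == '\0')
        return false;                       /* empty password rejected */
    for (const unsigned char *p = (const unsigned char *)pw; *p; p++) {
        if (*p < 0x20 || *p > 0x7e || *p == ':')
            return false;
    }
    return true;
}
```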
Canonicalization:
Dealing with Metacharacters

Discovering attacks like this:
1. Identify code that deals with metacharacter strings
2. Identify all delimiter characters that are specially handled and put them into a list
3. Identify filtering performed on input
4. Eliminate from the list the potentially hazardous delimiter characters that the filtering handles
5. Remaining characters on the list indicate a vulnerability
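Steps 2 to 5 amount to a set difference. Below is a hypothetical C helper (remaining_delimiters is not from any library) that takes the specially handled delimiters and the ones the filter neutralizes and returns what is left. Multi-character delimiters such as ".." are approximated here by their single character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical audit helper: copies into 'out' every character of 'handled'
   (delimiters the code treats specially) that does not appear in 'filtered'
   (delimiters the input filter removes or rejects). Whatever remains points
   at a potential vulnerability. 'out' is always NUL-terminated if outsz > 0. */
static void remaining_delimiters(const char *handled, const char *filtered,
                                 char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0)
        return;
    for (const char *p = handled; *p && n + 1 < outsz; p++) {
        if (strchr(filtered, *p) == NULL)   /* not eliminated by any filter */
            out[n++] = *p;
    }
    out[n] = '\0';
}
```

Usage: for the upload example that follows, `remaining_delimiters("/\\.%", "/\\.", out, sizeof out)` leaves "%".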
Canonicalization:
Dealing with Metacharacters
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    BOOL HandleUploadedFile(char *filename)
    {
        char buf[MAX_PATH], pathname[MAX_PATH];
        char *fname = filename, *tmp1, *tmp2;
        DWORD rc;
        HANDLE hFile;

        /* keep only the part after the last '/' or '\' */
        tmp1 = strrchr(filename, '/');
        tmp2 = strrchr(filename, '\\');
        if (tmp1 || tmp2)
            fname = (tmp1 > tmp2 ? tmp1 : tmp2) + 1;
        if (!*fname)                      /* nothing left after the last separator */
            return FALSE;
        if (strstr(fname, ".."))          /* reject directory traversal */
            return FALSE;

        /* "%%" prints a literal '%', so buf holds \\?\%TEMP%\<fname> */
        _snprintf(buf, sizeof(buf), "\\\\?\\%%TEMP%%\\%s", fname);
        rc = ExpandEnvironmentStrings(buf, pathname, sizeof(pathname));
        if (rc == 0 || rc > sizeof(pathname))
            return FALSE;

        hFile = CreateFile(pathname, …);
        /* … read bytes into the file … */
    }

1. The input string is formatted a number of ways before it becomes a file name: it is added to a statically sized buffer and prefixed with \\?\%TEMP%\
2. Set of delimiter characters that are specially handled: '/', '\' and "..". The string is also passed to ExpandEnvironmentStrings(), where environment variables are delimited by '%' characters, so '%' belongs on the list too.
4. Filtering: strrchr() finds the last occurrence of '/' and '\' and the code advances past it; strstr() rejects names containing "..".
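5. However, '%' remains on the list. The client can supply the values of a number of environment variables, such as QUERY_STRING. A file name such as %QUERY_STRING% passes every check above, because the '/', '\' and ".." characters only appear after ExpandEnvironmentStrings() runs: if QUERY_STRING holds ..\..\..\any\pathname\file.txt, the client can write to arbitrary locations in the file system.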
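One way to close the hole, sketched in C under the assumption that uploaded names never legitimately need path separators or '%': whitelist the base name before it is formatted or expanded. upload_name_ok and its 64-character limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical whitelist for an uploaded file's base name. Only letters,
   digits, '_', '-' and '.' are allowed, so '%', '/', '\\' and ':' are all
   rejected before any path is built or any variable is expanded. */
static bool upload_name_ok(const char *name)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_-.";
    size_t len = strlen(name);

    if (len == 0 || len > 64)                     /* arbitrary example limit */
        return false;
    if (strspn(name, allowed) != len)             /* a character is not whitelisted */
        return false;
    if (name[0] == '.' || name[len - 1] == '.')   /* ".", "..", trailing dot */
        return false;
    return strstr(name, "..") == NULL;            /* no ".." anywhere */
}
```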
Canonicalization:
Dealing with Metacharacters

NULL character injection:

    open(FH, ">$username.txt") || die("$!");
    print FH $data;
    close(FH);

- NULL characters are necessary to terminate strings when calling C routines from the OS and many APIs
- Perl and other languages do not use NULL for termination
- Example:
  - A Perl application programmer tests that the file name ends in ".txt"
  - Attacker inputs the sequence "%00" in CGI input
    - Decoded as a NUL character
    - Can be used to cut off the file name, including the extension
Canonicalization:
Dealing with Metacharacters: NULL

NUL metacharacter is used to end C strings, but not in Perl, Java, PHP, …
- This is a canonicalization issue:
  - C-based modules canonicalize strings differently than the non-C/non-Unix world
- Issues arise when strings cross boundaries between these worlds
Canonicalization:
Dealing with Metacharacters: NULL

Possible results:
- Memory corruption because strlen returns a different value
- Truncation of strings, leading to false decisions
  - Especially for FILE NAMES (a C routine reads the second string as just "BOB"):
    B O B . T X T \0
    B O B \0 . T X T \0
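The mismatch can be checked explicitly. In this C sketch a length-counted buffer stands in for a Perl, Java or PHP string (txt_name_ok is a hypothetical name). The name is accepted only if it has no embedded NUL and really ends in ".txt", so the length-counted view and the C view agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check on a length-counted name such as "bob\0.txt" decoded
   from "bob%00.txt". The suffix test looks at all 'len' bytes, but a C API
   would stop at the first NUL, so embedded NULs are rejected first. */
static bool txt_name_ok(const char *buf, size_t len)
{
    static const char ext[] = ".txt";
    size_t extlen = sizeof ext - 1;

    if (memchr(buf, '\0', len) != NULL)   /* C would see a shorter name */
        return false;
    return len > extlen && memcmp(buf + len - extlen, ext, extlen) == 0;
}
```

Without the memchr() test, "bob\0.txt" would pass the suffix check while the operating system opens "bob".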
Path Metacharacters

Windows File Names:
- C:\WINDOWS\system32\calc.exe
  - Optional device
  - Followed by path
- NOT UNIQUE: depending on the current directory, all of these can name the same file:
  - C:\WINDOWS\system32\drivers\..\calc.exe
  - calc.exe
  - .\calc.exe
  - ..\calc.exe
  - \\?\C:\WINDOWS\system32\calc.exe
- File system uses file canonicalization
  - But the system is less than canonical
Path Metacharacters

Issues:
- File squatting (in Windows)
  - Need to use CreateFile carefully so as not to open an existing file that an attacker has already placed at the canonical path of the file name (e.g., pass CREATE_NEW, which fails if the file already exists)
- CreateFile canonicalization
  - Eliminates any directory traversal components before validating whether each path segment exists
  - C:\nonexistent\path\..\..\blah.txt accesses C:\blah.txt
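To see why the traversal components disappear, here is a rough C sketch of purely lexical canonicalization. canon_path is hypothetical and far simpler than what Windows does: it ignores \\?\ prefixes, device names, short names and streams. Like CreateFile above, it never checks whether a directory exists.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lexical canonicalizer for an absolute Windows-style path such
   as "C:\\a\\b\\..\\c" ('\\' or '/' as separators). Removes "." and ".."
   components without touching the file system. Returns 0, or -1 if 'out'
   is too small. */
static int canon_path(const char *in, char *out, size_t outsz)
{
    size_t n = 0, root;
    const char *p = in;

    if (outsz < 4)
        return -1;
    if (p[0] != '\0' && p[1] == ':') {       /* copy a drive prefix such as "C:" */
        out[n++] = p[0];
        out[n++] = ':';
        p += 2;
    }
    root = n;
    while (*p) {
        while (*p == '\\' || *p == '/')      /* skip separators */
            p++;
        const char *start = p;
        while (*p && *p != '\\' && *p != '/')
            p++;
        size_t len = (size_t)(p - start);
        if (len == 0 || (len == 1 && start[0] == '.'))
            continue;                        /* empty or "." component */
        if (len == 2 && start[0] == '.' && start[1] == '.') {
            while (n > root && out[n - 1] != '\\')
                n--;                         /* drop the last component... */
            if (n > root)
                n--;                         /* ...and its leading separator */
            continue;
        }
        if (n + 1 + len + 1 > outsz)
            return -1;
        out[n++] = '\\';
        memcpy(out + n, start, len);
        n += len;
    }
    if (n == root)
        out[n++] = '\\';                     /* bare root, e.g. "C:\\" */
    out[n] = '\0';
    return 0;
}
```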
File-like Objects

CreateFile can open objects that are treated like files but are not files:
- \\host\object
- type\name

Device Files
- Reside in the file hierarchy
- But are canonicalized differently
  - COM1-9, LPT1-9, CON, CONIN$, CONOUT$, PRN, AUX, CLOCK$, NUL
- Programmers are often not aware of the rules
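A check for reserved names, sketched in C with the names from the slide (is_dos_device_name is hypothetical). It compares the part before the first '.' or ':' case-insensitively, because an extension does not stop CON or LPT1 from naming a device. Other Windows rules, such as trailing-space handling, are not modeled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: is the stem of 'name' (text before the first '.' or
   ':') one of the reserved device names, ignoring case? */
static bool is_dos_device_name(const char *name)
{
    static const char *const fixed[] = {
        "CON", "PRN", "AUX", "NUL", "CLOCK$", "CONIN$", "CONOUT$"
    };
    char stem[16];
    size_t len = strcspn(name, ".:");

    if (len == 0 || len >= sizeof stem)
        return false;
    for (size_t i = 0; i < len; i++)
        stem[i] = (char)toupper((unsigned char)name[i]);
    stem[len] = '\0';

    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (strcmp(stem, fixed[i]) == 0)
            return true;
    /* COM1-COM9 and LPT1-LPT9 */
    return len == 4
        && (memcmp(stem, "COM", 3) == 0 || memcmp(stem, "LPT", 3) == 0)
        && stem[3] >= '1' && stem[3] <= '9';
}
```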
Path Metacharacters

CreateFile() (Windows) idiosyncrasies
- Strips out trailing spaces in file names
  - Example attack:
    - Programmer attaches ".txt" to a user-provided name
    - Attacker provides "helloworld.exe" followed by trailing spaces
    - If the appended ".txt" is cut off (e.g., by truncation in a fixed-size buffer), the trailing spaces are stripped and helloworld.exe is created
- Case sensitivity
  - Windows file names are not case sensitive; UNIX file names are (HFS+ on macOS is case-insensitive by default, though a case-sensitive variant exists)
- DOS 8.3 names
  - A short file name is created by the file system if the file name is too long
  - The file can be referred to by its short file name
- Use \\?\ before a file name to disable DOS file name parsing
- Ensure that files are normal files by checking for FILE_ATTRIBUTE_NORMAL, or face access to named pipes, …
- Alternate Data Streams are created with a ":" separator
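A sketch of the ".txt" example done defensively in C (make_txt_name is a hypothetical helper): refuse trailing spaces and dots, refuse ':' (alternate data streams), and treat snprintf truncation as an error instead of silently dropping the suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<user>.txt" in a fixed-size buffer.
   Returns 0 on success, -1 if the name ends in ' ' or '.', contains ':',
   or the result would not fit (so ".txt" can never be cut off). */
static int make_txt_name(const char *user, char *out, size_t outsz)
{
    size_t len = strlen(user);

    if (len == 0 || user[len - 1] == ' ' || user[len - 1] == '.')
        return -1;                        /* CreateFile would strip these */
    if (strchr(user, ':') != NULL)
        return -1;                        /* alternate data stream syntax */
    int rc = snprintf(out, outsz, "%s.txt", user);
    if (rc < 0 || (size_t)rc >= outsz)
        return -1;                        /* output was truncated */
    return 0;
}
```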
Path Metacharacters

Registry keys
- Naming similar to files
- Similar issues
- Worthy of its own presentation
Canonicalization:
Dealing with Metacharacters

Shell Metacharacter Injection
- Attack vector
  - User controls input to an argument for execve(), popen(), … that ends up being interpreted by a shell (popen() and system() pass the command to /bin/sh)
- Dangerous shell characters
  - ; | & < > ` ! - * / ? ( ) . [space] [ ] "\t" ^ ~ \ "\\" quotes "\r" "\n" $
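When a shell cannot be avoided, one common encoding for POSIX sh is single-quoting. The C sketch below (sh_quote is hypothetical) wraps the argument in single quotes, inside which none of the listed characters is special, and rewrites each embedded quote as '\''. Passing an argument vector to execve() so that no shell parses the input remains the safer design.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical POSIX-sh quoting: 'arg' -> 'arg', with each embedded '
   replaced by '\'' (close quote, escaped quote, reopen quote).
   Returns 0, or -1 if 'out' is too small. */
static int sh_quote(const char *arg, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz < 3)
        return -1;
    out[n++] = '\'';
    for (const char *p = arg; *p; p++) {
        if (*p == '\'') {
            if (n + 4 + 2 > outsz)        /* room for 4 bytes + quote + NUL */
                return -1;
            memcpy(out + n, "'\\''", 4);
            n += 4;
        } else {
            if (n + 1 + 2 > outsz)
                return -1;
            out[n++] = *p;
        }
    }
    out[n++] = '\'';
    out[n] = '\0';
    return 0;
}
```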
Canonicalization:
Dealing with Metacharacters

SQL Injection attack
- Attack vector:
  - User controls part of the SQL query string
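A minimal C sketch of that attack vector (the users table and name column are hypothetical): the input is pasted into the query text, so a quote in the input closes the string literal and the remainder is parsed as SQL. Parameterized queries, such as SQLite's sqlite3_prepare_v2() with sqlite3_bind_text(), keep data out of the query text instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Deliberately vulnerable: builds the query by string concatenation.
   Returns 0, or -1 if the query does not fit in 'out'. */
static int build_login_query(const char *user, char *out, size_t outsz)
{
    int rc = snprintf(out, outsz,
                      "SELECT * FROM users WHERE name = '%s'", user);
    return (rc < 0 || (size_t)rc >= outsz) ? -1 : 0;
}
```

Usage: the input `x' OR '1'='1` yields a WHERE clause that is true for every row.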
Canonicalization
Meta Character Filtering

Three basic options:
1. Detect erroneous input and reject what appears to be an attack
2. Detect and strip dangerous characters
3. Detect and encode dangerous characters with a metacharacter escape sequence
Canonicalization
Meta Character Filtering

Eliminating Metacharacters
- Whitelisting: allow only good strings

    if ($input_data =~ /[^A-Za-z0-9_ ]/) {
        exit;
    }

- Whitelisting: strip away anything that is not good

    $input_data =~ s/[^A-Za-z0-9]//g;

  - Stripping is vulnerable to mistakes
- Blacklisting: make decisions based on dangerous characters (not recommended)
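The two Perl whitelist idioms translate to C roughly as follows, assuming the default "C" locale so that isalnum() means [A-Za-z0-9]. The third function is deliberately buggy: a hypothetical single-pass "../" stripper that shows why stripping is vulnerable to mistakes, since removing one match can create another.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Reject: true only if every character is in [A-Za-z0-9_ ]. */
static bool only_good_chars(const char *s)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (!(isalnum(c) || c == '_' || c == ' '))
            return false;
    }
    return true;
}

/* Strip in place: keep only [A-Za-z0-9], like s/[^A-Za-z0-9]//g. */
static void strip_bad_chars(char *s)
{
    char *w = s;
    for (const char *r = s; *r; r++)
        if (isalnum((unsigned char)*r))
            *w++ = *r;
    *w = '\0';
}

/* Buggy on purpose: removes each "../" in one left-to-right pass, so
   "....//" collapses into a brand-new "../". */
static void strip_dotdot_once(char *s)
{
    char *w = s;
    const char *r = s;
    while (*r) {
        if (strncmp(r, "../", 3) == 0) {
            r += 3;
            continue;
        }
        *w++ = *r++;
    }
    *w = '\0';
}
```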
Canonicalization
Meta Character Filtering

Escaping Metacharacters
- Non-destructive: metacharacters are preserved in the string
- Goal: the receiving module receives a safe string
- Attack vectors:
  - Metacharacter evasion
    - An encoded metacharacter can be used to avoid other filtering
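A minimal sketch of non-destructive escaping in C (escape_meta is hypothetical; which characters count as metacharacters depends on the receiving module). The escape character itself must always be escaped: if only quotes were escaped, the input \' would become \\', which the receiver reads as an escaped backslash followed by a bare quote.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical escaper: prefixes every character found in 'meta', and every
   backslash, with a backslash. Returns 0, or -1 if 'out' is too small. */
static int escape_meta(const char *in, const char *meta,
                       char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return -1;
    for (const char *p = in; *p; p++) {
        int needs = (*p == '\\') || strchr(meta, *p) != NULL;
        if (n + (size_t)needs + 1 + 1 > outsz)   /* char(s) + final NUL */
            return -1;
        if (needs)
            out[n++] = '\\';
        out[n++] = *p;
    }
    out[n] = '\0';
    return 0;
}
```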
Canonicalization
Meta Character Filtering

Escaping Metacharacters
- Filtering does not detect encoded metacharacters
  - Example: ..%2F..%2Fetc%2Fpasswd
- Double encoding attacks:

    dXNlcj1wYXNzd2QmaG9tZWRpcj0uLiUyNSUzMiU0Ni4uJTI1JTMyJTQ…
        -> Base 64 decoder ->
    user=passwd&homedir=..%25%32%46..%25%32%46etc
        -> Hexadecimal decoder, pass 1 ->
    user=passwd&homedir=..%2F..%2Fetc
        -> Hexadecimal decoder, pass 2 ->
    user=passwd&homedir=../../etc
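The fix is to canonicalize before checking: decode until the string stops changing, then validate. Below is a C sketch of %XX decoding (the function names and the pass limit are hypothetical). url_decode_once reproduces one hexadecimal pass from the diagram; url_decode_fully repeats it until a fixed point is reached.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* One pass of %XX decoding, in place; malformed escapes are kept as-is. */
static void url_decode_once(char *s)
{
    char *w = s;
    for (const char *r = s; *r; ) {
        int hi, lo;
        if (r[0] == '%' && (hi = hexval((unsigned char)r[1])) >= 0
                        && (lo = hexval((unsigned char)r[2])) >= 0) {
            *w++ = (char)(hi * 16 + lo);
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}

/* Decode until nothing changes (each real decode shortens the string).
   Returns 0 when canonical, -1 if still changing after max_passes. */
static int url_decode_fully(char *s, int max_passes)
{
    for (int i = 0; i < max_passes; i++) {
        size_t before = strlen(s);
        url_decode_once(s);
        if (strlen(s) == before)
            return 0;
    }
    return -1;
}
```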
Canonicalization
Meta Character Filtering

Character Sets
- Example vulnerabilities
  - Wide characters (Unicode): C-style wide strings are terminated with a 16-bit NUL, normal character strings with an 8-bit NUL
    - String length calculations need to take the character set into account (wide characters vs. normal characters)
  - Homographic attacks
    - Different characters look the same
    - “Microsoft” vs. “Microsoft” in Unicode: one ‘o’ is Cyrillic
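Both problems show up as soon as byte counts and characters diverge. A C sketch assuming UTF-8 input (the function names are hypothetical): a look-alike "Microsoft" spelled with a Cyrillic U+043E is one byte longer than the ASCII one, has the same number of code points, and compares unequal. is_ascii is only a crude homograph guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
   Assumes valid UTF-8; no validation is performed. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Crude homograph guard: flags any byte outside 7-bit ASCII. */
static bool is_ascii(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if (*p > 0x7F)
            return false;
    return true;
}
```

Usage: `"Micr\xD0\xBEsoft"` (UTF-8 for U+043E in place of the second letter o) has strlen() 10 but 9 code points.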