Towards Eradicating Phishing Attacks
Stefan Saroiu
University of Toronto
Today’s anti-phishing tools
have done little to stop the
proliferation of phishing
Many Anti-Phishing Tools Exist
Phishing is Gaining Momentum
Current Anti-Phishing Tools Are Not Effective

Let’s look at new approaches & new insights!

Part 1: new approach: user-assistance
Part 2: need new measurement system

Part 1
iTrustPage: A User-Assisted Anti-Phishing Tool
The Problems with Automation
Many anti-phishing tools use automatic detection
  Automatic detection makes tools user-friendly
  But it is subject to false negatives
Each false negative puts a user at risk
What are False Negatives & False Positives?
Example of a false negative: phishing e-mail not detected by filter heuristics
Example of a false positive: legitimate e-mail dropped by filter heuristics
Current Anti-Phishing Tools Are Not Effective
Most anti-phishing tools use automatic detection
  Automatic detection makes tools user-friendly
  But it is subject to false negatives
Each false negative puts a user at risk
Can false negatives be
eliminated?
Case Study: SpamAssassin


SpamAssassin: one way to stop phishing
Methodology

Two e-mail corpora:
Phishing: 1,423 e-mails (Nov. 05 -- Aug. 06)
 Legitimate: 478 e-mails from our Sent Mail folders


SpamAssassin version 3.1.8

Various levels of aggressiveness
False Negatives Can’t Be Eliminated
[Bar chart: false negatives by SpamAssassin configuration -- Default: 29.7%, More Aggressive: 21.1%, Most Aggressive: 8.7%]
Trade-off btw. False Negatives and False Positives
[Bar chart: false positives by SpamAssassin configuration -- Default: 0.0%, More Aggressive: 0.4%, Most Aggressive: 15.5%]
Reducing false negatives increases false positives
Summary: Automatic Detection
False negatives put users at risk
Hard to eliminate false negatives
Making automatic detection more aggressive increases rate of false positives
Appears to be a fundamental trade-off
Let’s look at new approaches
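The trade-off can be made concrete with a toy score-threshold filter (an invented stand-in, not SpamAssassin itself; the scores and corpus below are made up for the sketch): lowering the flagging threshold trades false negatives for false positives.

```python
# Toy score-threshold filter: a mail is flagged as phishing when its
# score reaches the threshold; lowering the threshold ("more
# aggressive") trades false negatives for false positives.
def evaluate(threshold, scored_mail):
    """scored_mail: list of (score, is_phishing) pairs.
    Returns (false-negative rate, false-positive rate)."""
    fn = sum(1 for s, phish in scored_mail if phish and s < threshold)
    fp = sum(1 for s, phish in scored_mail if not phish and s >= threshold)
    phishing = sum(1 for _, phish in scored_mail if phish)
    return fn / phishing, fp / (len(scored_mail) - phishing)

# Synthetic corpus: phishing tends to score high, legitimate mail low,
# but the two distributions overlap -- the overlap is the trade-off.
corpus = [(7.0, True), (5.5, True), (4.2, True), (2.9, True),
          (1.0, False), (2.5, False), (3.1, False), (4.8, False)]

for t in (5.0, 3.0, 2.0):  # default -> more aggressive -> most aggressive
    fn_rate, fp_rate = evaluate(t, corpus)
    print(f"threshold {t}: FN {fn_rate:.0%}, FP {fp_rate:.0%}")
```

Driving false negatives toward zero flags ever more legitimate mail, mirroring the shape of the SpamAssassin numbers on the preceding slides.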
New Approach: User-Assistance
Involve user in the decision-making process
Benefits: false positives unlikely and more tolerable
1. Combine with conservative automatic detection
2. Use detection that is hard-for-computers but easy-for-people
Outline




Motivation
Design of iTrustPage
Evaluation of iTrustPage
Summary of Part 1
Two Observations about Phishing
1. Users intend to visit a legitimate page, but they are misdirected to an illegitimate page
2. If two pages look the same, one is likely phishing the other [Florêncio & Herley - HotSec ‘06]
Idea: use these observations to detect phishing
Involving Users
Determine “intent”: ask user to describe the page as if entering search terms
Determine whether pages “look alike”: ask user to detect visual similarity between two pages
Tasks are hard-for-computers but easy-for-people
iTrustPage’s Validation
When user enters input on a Web page, a two-step validation process begins:
1. Conservative automatic validation
  Simple whitelist -- top 500 most popular Web sites
  Cache -- avoid “re-validation”
2. Flag page “suspicious”; rely on user-assistance
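The two-step validation can be sketched as follows (a hypothetical illustration: the function names, data structures, and example sites are assumptions, not the extension's actual code):

```python
# Hypothetical sketch of iTrustPage's two-step validation.
POPULAR_WHITELIST = {"google.com", "paypal.com", "ebay.com"}  # stands in for the top-500 list
validated_cache = set()  # sites the user has already validated

def on_user_input(host):
    """Called when the user starts entering input into a form on `host`."""
    # Step 1: conservative automatic validation (whitelist + cache).
    if host in POPULAR_WHITELIST or host in validated_cache:
        return "validated"  # no interruption
    # Step 2: flag the page "suspicious" and rely on user assistance:
    # the user describes the intended page with search terms and
    # visually compares it against the top search results.
    return "suspicious"

def user_validated(host):
    """Record a successful user-assisted validation to avoid re-validation."""
    validated_cache.add(host)

print(on_user_input("paypal.com"))            # -> validated (whitelist hit)
print(on_user_input("paypa1-login.example"))  # -> suspicious (user assistance)
```

The cache is what makes the interruption rate fall over time: once a user validates a site, later visits take the fast path in step 1.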
iTrustPage: Validating Site
Step 1: Filling Out a Form
Step 2: Page Validated
iTrustPage: Avoid Phishing Site
Step 1: Filling Out a Suspicious Page
Step 2: Visual Comparison
Step 3: Attack Averted
Two Issues: Revise & Bypass
What if users can’t find the page on Google?
  Visiting an un-indexed page
  Wrong/ambiguous keywords for search
iTrustPage supports two options:
  Revise search terms
  Bypass validation process
    Similar to false negatives in automatic tools
Outline




Motivation
Design of iTrustPage
Evaluation of iTrustPage
Summary of Part 1
Methodology
Instrumented code sends anonymized logs: info about iTrustPage usage
High-Level Stats (June 27th, 2007 -- August 9th, 2007):
  5,184 unique installations
  2,050 users with 2+ weeks of activity
Evaluation Questions
How disruptive is iTrustPage?
Are users willing to help iTrustPage’s validation?
Did iTrustPage prevent any phishing attacks?
How many searches until a page is validated?
How effective are the whitelist and cache?
How often do users visit pages accepting input?
How disruptive is iTrustPage?
iTrustPage is not disruptive
[Line chart: % of pages that iTrustPage asked to be validated (0--2.5%) vs. # of days since iTrustPage’s install (0--30)]
Users interrupted on less than 2% of pages
After first day of use, 50+% of users never interrupted
Are users willing to help
iTrustPage’s validation?
Many Users are Willing to Participate
[Stacked bar of users: Validate Only: 44.93%, Mix: 34.86%, Bypass Only: 20.21%]
Half the users willing to assist the tool in validation
Did iTrustPage prevent any phishing attacks?
An Upper Bound
Anonymization of logs prevents us from measuring iTrustPage’s effectiveness
291 visually similar pages chosen instead
  1/3 occurred after two weeks of use
Summary of Evaluation
Not disruptive; disruption rate decreasing over time
Half the users are willing to participate in validation
Pages with input are very common on the Internet
iTrustPage is easy to use
Summary of Part 1
An alternative approach to automation: have user assist tool to provide better protection
Our evaluation has shown our tool’s benefits while avoiding pitfalls of automated tools
iTrustPage protects users who always participate in page validation
What is the Take-Away Point?
[Diagram: a usability-vs-security spectrum -- Automatic Detection, used by many of today’s tools, sits at the usability end; User-Assistance, used by iTrustPage, sits at the security end]
Part 2
Bunker: A System for Gathering
Anonymized Traces
Motivation
Two ways to anonymize network traces:
  Offline: anonymize trace after raw data is collected
  Online: anonymize while it is collected
Today’s traces require deep packet inspection
  Privacy risks make offline anonymization unsuitable
Phishing involves sophisticated analysis
  Performance needs make online anonymization unsuitable
Simple Tasks are Very Slow
Regular expression for phishing:
"((password)|(<form)|(<input)|(PIN)|(username)|(<script)|(user id)|(sign in)|(log in)|(login)|(signin)|(log on)|(signon)|(passcode)|(logon)|(account)|(activate)|(verify)|(payment)|(personal)|(address)|(card)|(credit)|(error)|(terminated)|(suspend))[^A-Za-z]"
libpcre: 5.5 s to scan 30 MB = 44 Mbps max
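The flavor of this measurement can be reproduced with Python's `re` module (a sketch only: the payload buffer below is synthetic, and absolute throughput depends on the engine and data, so it will differ from the 44 Mbps measured with libpcre):

```python
import re, time

# The slide's keyword expression, applied to a synthetic ~1 MB payload.
PATTERN = re.compile(
    rb"((password)|(<form)|(<input)|(PIN)|(username)|(<script)|"
    rb"(user id)|(sign in)|(log in)|(login)|(signin)|(log on)|"
    rb"(signon)|(passcode)|(logon)|(account)|(activate)|(verify)|"
    rb"(payment)|(personal)|(address)|(card)|(credit)|(error)|"
    rb"(terminated)|(suspend))[^A-Za-z]")

payload = b"x" * (1 << 20) + b"please verify your account now"
start = time.perf_counter()
hits = len(PATTERN.findall(payload))   # matches "verify " and "account "
elapsed = time.perf_counter() - start
print(f"{hits} hits, {len(payload) * 8 / elapsed / 1e6:.0f} Mbit/s")
```

Even this simple keyword scan falls far short of line speed, which is why deep packet inspection pushes tracing away from pure online anonymization.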
Motivation
Need a new tool to combine the best of both worlds
Threat Model
Accidental disclosure:
  Risk is substantial whenever humans are handling data
Subpoenas:
  Attacker has physical access to tracing system
  Subpoenas force researchers and ISPs to cooperate, as long as cooperation is not “unduly burdensome”
Implication: nobody can have access to raw data
Is Developing Bunker Legal? It Depends on Intent of Use
Developing Bunker is like developing encryption
Must consider purpose and uses of Bunker
  Developing Bunker for user privacy is legal
  Misuse of Bunker to bypass law is illegal
Our solution: Bunker
Combines best of both worlds:
  Same privacy benefits as online anonymization
  Same engineering benefits as offline anonymization
Pre-load analysis and anonymization code
Lock it and throw away the key (tamper-resistance)
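At an assumption level, the split might look like the sketch below (illustrative only, not Bunker's code; the XOR keystream stands in for real encryption such as AES-GCM): the online path only captures and encrypts, while parsing and keyed anonymization happen offline inside the closed box.

```python
import hashlib, hmac, os

ENC_KEY = os.urandom(32)    # never leaves the closed box
ANON_KEY = os.urandom(32)   # keyed pseudonymization secret

def online_capture(packet: bytes) -> bytes:
    """Fast path: no parsing, just encrypt-and-store
    (XOR keystream as a stand-in for real encryption)."""
    keystream = ENC_KEY * (len(packet) // len(ENC_KEY) + 1)
    return bytes(b ^ k for b, k in zip(packet, keystream))

def offline_anonymize(field: bytes) -> str:
    """Slow path, inside the closed box: keyed one-way pseudonym."""
    return hmac.new(ANON_KEY, field, hashlib.sha256).hexdigest()[:16]

# The same identifier always maps to the same pseudonym, so analyses
# still work, but the mapping cannot be inverted without ANON_KEY.
assert offline_anonymize(b"alice@example.com") == offline_anonymize(b"alice@example.com")
```

Because only pseudonyms cross the one-way interface, nobody outside the closed box ever sees raw data, matching the threat model above.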
Outline




Motivation
Design of Bunker
Evaluation of Bunker
Summary of Part 2
Logical Design
[Diagram: Capture Hardware feeds an online capture stage; offline, the trace is assembled, parsed, and anonymized with the anonymization key; only anonymized data leaves through a one-way interface]
VM-based Implementation
[Diagram: capture hardware and an open-box NIC feed a closed-box VM running on the hypervisor. Online path: capture -> encrypt (encryption key) -> encrypted raw data. Offline path: decrypt -> assemble -> parse -> anonymize (anonymization key). Anonymized data crosses a one-way socket into an open-box VM that saves the trace and handles logging and maintenance]
Outline




Motivation
Design of Bunker
Evaluation of Bunker
Summary of Part 2
Software Engineering Benefits
[Bar chart: lines of code -- UW: 63,382 (C), Toronto: 53,995 (C), Bunker: 5,512 (Python) + 1,350 (C)]
One order of magnitude btw. online and offline
Development time: Bunker - 2 months, UW/Toronto - years
Summary of Part 2
Bunker combines:
  Privacy benefits of online anonymization
  Software engineering benefits of offline anonymization
Ideal tool for characterizing phishing
Our Current Use of Bunker
Few “hard facts” known about phishing:
  Banks have no incentive to disclose info
  Must focus on victims rather than on phishing attacks
Preliminary study of Hotmail users:
  How often do people click on links in their e-mails?
  Do the same people fall victim to phishing?
  How cautious are people who click on links in e-mails?
Our Contributions

iTrustPage: new approach to anti-phishing

Bunker: system for gathering anonymized traces
Acknowledgements
Graduate students at Toronto: Andrew Miklas, Troy Ronda
Researchers: Alec Wolman (MSR Redmond)
Faculty: Angela Demke Brown (Toronto)
Questions?
iTrustPage: https://addons.mozilla.org
http://www.cs.toronto.edu/~stefan
Research Interests
Building Systems Leveraging Social Networks
  Exploiting social interactions in mobile systems
  Rethinking access control for Web 2.0
Making the Internet more secure
  Characterizing spread of Bluetooth worms
  iTrustPage + Bunker
Characterizing network environments in the wild
  Characterizing residential broadband networks
  Evaluating emerging “last-meter” Internet apps
Circumventing iTrustPage
“Google bomb”: increasing a phishing page’s rank
  This is not enough to circumvent iTrustPage
Breaking into a popular site that is already in iTrustPage’s whitelist or cache
Compromising a user’s browser
Problems with Password Managers
When password field present:
  Ask user to select from a list of passwords
  Remember password selection for re-visits
Challenges:
  Automatic detection of password fields can be “fooled”
  Such tools increase the amount of confidential info
  Don’t assist users on how to handle phishing
Downloads
[Line chart: number of daily fresh installs, 13-Jun-07 through 31-Jul-07 (0--700); installs jump after release on Mozilla.org]
Most Searches Don’t Need Revision
[Stacked bar: number of searches performed until page found -- One Search: 81.1%, 2 Searches: 12.2%, 3+ Searches: 6.8%]
Users can find their page the majority of the time
Outcomes of Validation Process
[Chart: outcomes of iTrustPage’s user-assisted validation -- page validated by navigating to it from Google Search: 61.6%; users chose to bypass iTrustPage’s validation: 33.6%; page validated by searching and found in top-10 answers: 4.6%; visually similar page chosen instead: 0.2%]
1/3 of the time, users choose to bypass validation
Forms and Scripts are Prevalent
[Bar chart: fraction of browsed pages -- Scripts: 80.3%, Forms: 63.3%, Passwords: 7.7%]
Many Web pages have multiple forms
Whitelist’s Hit Rate
[Line chart: hit rate of iTrustPage’s whitelist vs. # of days since iTrustPage’s install (0--30)]
Hit rate remains flat at 55%
Cache’s Hit Rate
[Line chart: hit rate of iTrustPage’s cache vs. # of days since iTrustPage’s install (0--30)]
Hit rate reaches 65% after one week
Our solution
Combines best of both worlds:
  Stronger privacy benefits than online anonymization
  Same engineering benefits as offline anonymization
Experimenter must commit to an anonymization process before trace begins
Illustrating the Arms Race
[Bar chart: false negatives -- Older dataset: SpamAssassin as of 2004: 45.6%, as of 2007: 30.7%; Newer dataset: as of 2004: 51.2%, as of 2007: 45.4%]
SpamAssassin is adapting to phishing attacks
Attackers are also adapting to SpamAssassin
Offline Anonymization
Trace anonymized after raw data is collected
  Privacy risk until raw data is deleted
Today’s traces require deep packet inspection
  Headers insufficient to understand phishing
  Payload traces pose a serious privacy risk
Risk to user privacy is too high
  Two universities rejected offline anonymization
Online Anonymization
Trace anonymized online
  Raw data resides in RAM only
Difficult to meet performance demands
  Extraction and anonymization must be done at line speeds
Code is frequently buggy and difficult to maintain
  Low-level languages (e.g., C) + “home-made” parsers
  Small bugs cause large amounts of data loss
    Introduces consistent bias against long-lived flows
Motivation
Two ways to anonymize traces:
  Offline: trace anonymized after raw data is collected
  Online: trace anonymized while raw data is collected
Deep packet inspection killed us with phishing -- a game changer
Motivation: try to get the best of both worlds
Before I tell you about the design, let me elaborate on the security concerns
Related Work: iTrustPage
Spam filters and blacklists
  Exchange, Outlook, SpamAssassin
  IE7, Firefox, Opera
New Web authentication tools
  Out-of-band [JDM06, PKA06] (MITM)
  Password managers [HWF05, RJM+05, YS06]
New Web interfaces
  Passpet, WebWallet, CANTINA
Centralized approaches
  Central server for password similarity [FH06]
  Central server for valid sites [LDHF05]
Related Work: User Studies
Web password habits [FH07]
  Huge password management problems
People fall for simple attacks [DTH06]
Warnings more effective than passive cues [WMG06]
Personalized attacks are very successful [JJJM06]
Security tools must be intuitive and simple to use [CO06]
Related Work: Bunker
Network tracing systems:
  Httpdump [WWB96], BLT [Fe00], UWTrace [Wo02], CoMo [Ia05]
Anonymization schemes:
  Prefix-preserving [XFA+01]
  High-level anonymization languages [PV03]
Secure VMs:
  Tamper-resistant hardware [LTH03]
  Small VMMs + formal verification [Ka06, Ru08]
  PL techniques for memory safety + control flow [KBA02, CLD+07]
  Hardware memory protection [SLQ07]