Virus vs Anti-Virus: The Arms Race
Patrick Graydon
Qiuhua Cao
A virus is “a program that can ‘infect’ other
programs by modifying them to include a
possibly evolved copy of itself.” - Fred Cohen
Fred Cohen seems to have been the first to
define the term virus, but the concept had
been discussed earlier and there were some
viruses out in the wild before he began his
research.
Example of a virus
In his Turing Award lecture to the ACM
(published in 1984 as “Reflections on Trusting
Trust”), Ken Thompson related the story
of how he modified the C compiler to insert a
backdoor into the UNIX login program and to
insert his modifications into any C compiler
compiled using his modified compiler.
Slick—no trace of the backdoor remains in any
source code!
Virus example
The WM.Nuclear Microsoft Word macro virus
infects Word documents during opening,
saving, and printing by adding a set of
macros to them. On April 5th it attempts to
overwrite critical system files, and it
occasionally adds the text "STOP ALL
PACIFIC!" to the current document.
(Information from Symantec’s security website.)
Worms are not viruses
The “Anna Kournikova”
malware is a worm, not a virus, because it emails copies of itself but does not infect any
other documents. (Information about
this worm from Symantec’s security website.)
Malware terminology
We found a web site listing 56 different terms
related to viruses and malware, including:
Boot sector virus
Encrypted virus
Macro virus
Virus statistics
Here are some statistics from 2000 we found on
the web:
Over 85% of all the known viruses are for Microsoft
platforms (nearly all the self-propagating worms are as well)
Slightly less than 52,000 are viruses for
DOS/Windows/NT platforms
- about 6000 of these are Word macro viruses
- about 150-200 of these are known to be widespread
"in the wild"
- in 1999, approximately 650 new viruses were
reported each month (more than 20 a day)
Virus statistics
More statistics from the same site
A few hundred are for JavaScript, HyperCard, Perl, and
other scripting languages. Few of these can spread
beyond a few machines without active support of the users.
150 are for the Atari
31 are native to the Macintosh, and only two of
them are known to exist anymore
2 or 3 are viruses native to OS/2
Virus statistics (cont’d)
More statistics from the same site
About 5 are for Linux/Unix/etc, but none have
been found in quantity "in the wild", nor would
they be likely to spread very far if they were
None are for BeOS, ErOS, or other small-population systems.
Question: can we reduce the risk of getting
a virus infection by not using Microsoft platforms?
Example virus
Fred Cohen’s example virus:
program virus := { 1234567;
  subroutine infect-executable := {
    loop: file = get-random-executable-file;
    if first-line-of-file = 1234567 then goto loop;
    prepend virus to file; }
  subroutine do-damage := { whatever damage is to be done }
  subroutine trigger-pulled := { return true if some condition holds }
  main-program := {
    infect-executable;
    if trigger-pulled then do-damage;
    goto next; }
next: }
More about viruses
Viruses aren’t necessarily hard to write
Cohen reports that his first virus took an
experienced programmer only 8 hours to write.
Viruses aren’t necessarily big
Cohen reports on a UNIX shell script virus that
was only 7 lines long
Viruses aren’t necessarily malware
Cohen describes a hypothetical virus that
compresses executables to conserve disk space.
Viruses can be malicious in many ways
Virus payloads could:
Carry out a denial of service attack
Crash the machine
Randomly destroy data
Install a trojan horse program
Perform password cracking
… and basically any other nasty thing you can
think of.
Making matters worse…
Virus payloads may not trigger immediately.
If a virus has few detectable side effects, it
could spread without notice and become
widespread before the payload is triggered.
Question: is it possible that there are viruses in
the wild today that have infected large numbers of
systems but have gone unnoticed because they
have few if any side effects and have not yet
triggered their destructive payloads?
One way to protect against infection is to
isolate systems, users, and/or information to
make it difficult or impossible for a virus to
spread widely.
Total isolation is a sure cure.
Total isolation probably isn't practical for most users.
Imagine life without Google … without BitTorrent
… without …
If we can’t isolate systems and users from
each other completely, maybe we can erect
partitions to limit the spread of malware.
It was thought that the Bell-LaPadula model
might help limit the spread of viruses, but
Cohen reports that “viruses demonstrated the
ability to cross user boundaries and move
from a given security level to a higher
security level.”
Partitioning (continued)
According to Cohen, the Biba and Bell-LaPadula models, if combined, would tend to
create partitions.
Unfortunately: “When we mix the Biba and Bell-LaPadula models, we find that the resulting
isolationism secures us from viruses, but doesn’t
permit any user to write programs that can be
used throughout the system.” – Cohen
Bad news about partitioning
Transitivity is a problem:
“If there is a path from user A to user B, and there
is a path from user B to user C, then there is a
path from user A to user C with the witting or
unwitting cooperation of user B.” – Cohen
The military uses a category system in which
users can only access information needed for
their current duties. But some users have
simultaneous access to multiple categories…
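Cohen's transitivity point can be made concrete by treating "user X can pass information to user Y" as a directed edge and computing the transitive closure, here with Warshall's algorithm. This is a minimal sketch: the user names and flows are hypothetical, and the closure simply lists every user a virus starting at a given user could eventually reach.

```python
def transitive_closure(users: list[str], flows: set[tuple[str, str]]) -> dict[str, set[str]]:
    """Return, for each user, the set of users its information can eventually reach.

    flows holds pairs (x, y) meaning "x can pass information to y"
    (for example, y runs a program that x is allowed to write).
    """
    # Start with the direct flows only.
    reach = {u: {y for (x, y) in flows if x == u} for u in users}
    # Warshall's algorithm: also allow paths that pass through user k.
    for k in users:
        for i in users:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


# Hypothetical example: A writes programs that B runs, and B writes
# programs that C runs. A has no direct path to C.
users = ["A", "B", "C"]
flows = {("A", "B"), ("B", "C")}
closure = transitive_closure(users, flows)
print(sorted(closure["A"]))  # ['B', 'C']: A's virus can reach C through B
print(sorted(closure["C"]))  # []: nothing flows out of C
```

A user with access to several categories acts like B here: one extra edge through them can connect partitions that look separate on paper.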
More bad news…
According to Cohen “a precise system for
integrity is NP-complete” and “any non-NP
complete solution must tend toward
If a system restricts users’ actions unnecessarily,
it will be unpopular…
And the hits just keep on coming…
Cohen notes that flow distance and flow list
models may limit virus spread.
Flow distance restrictions limit how far information
can travel.
Flow lists allow more arbitrary expressions for
accessibility based on the list of users who have
had an effect on an object.
BUT: “tracing exact information flow requires NP-complete time, and maintaining markings requires
large amounts of space.”
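A flow distance limit can be sketched as a breadth-first search that stops once information has passed through a fixed number of sharing steps. This is a simplified reading of the idea, not Cohen's exact formalism; the user names and the `reachable_within` helper are hypothetical.

```python
from collections import deque


def reachable_within(flows: dict[str, list[str]], source: str, max_distance: int) -> dict[str, int]:
    """Breadth-first search that stops once information has travelled max_distance hops.

    Returns each user the information may reach, mapped to its flow distance.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if distance[user] == max_distance:
            continue  # the limit forbids passing this information any further
        for nxt in flows.get(user, []):
            if nxt not in distance:
                distance[nxt] = distance[user] + 1
                queue.append(nxt)
    return distance


# Hypothetical chain of users: A -> B -> C -> D
flows = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(reachable_within(flows, "A", max_distance=2))  # {'A': 0, 'B': 1, 'C': 2}
```

On a small explicit graph like this the check is cheap. Cohen's objection concerns tracing exact information flow through real programs and keeping a marking on every object, which this toy model sidesteps.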
Prevention by law
Couldn’t we just make it against the law?
“By simply telling users not to launch attacks,
little is accomplished; users who can be
trusted will not launch attacks; but users who
would do damage cannot be trusted, so only
legitimate work is blocked.” - Cohen
Limited interpretation
If a given document is interpreted, and the
interpreter lacks commands like “write file,” it
may be impossible for the document to carry a virus.
Graphics files are probably immune
Except AnnaKournikova.jpg.vbs, which is really a VBScript file
Documents that can hold scripts probably aren’t immune
Word documents can contain macro viruses such as
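What matters is which interpreter actually receives the file. AnnaKournikova.jpg.vbs is handled by the VBScript host, not an image viewer; on a system that hides known extensions the user sees only AnnaKournikova.jpg. A minimal sketch of a check for this double-extension trick follows. The extension lists are illustrative assumptions, not a complete policy.

```python
import os

# Illustrative extension lists (assumptions, not a complete policy).
HARMLESS_LOOKING = {".jpg", ".gif", ".png", ".txt", ".doc", ".pdf"}
SCRIPT_OR_EXECUTABLE = {".vbs", ".js", ".exe", ".scr", ".bat", ".com", ".pif"}


def is_disguised(filename: str) -> bool:
    """True if a harmless-looking extension is followed by a script/executable one."""
    stem, last = os.path.splitext(filename)   # "AnnaKournikova.jpg", ".vbs"
    _, previous = os.path.splitext(stem)      # "AnnaKournikova", ".jpg"
    return last.lower() in SCRIPT_OR_EXECUTABLE and previous.lower() in HARMLESS_LOOKING


for name in ["AnnaKournikova.jpg.vbs", "AnnaKournikova.jpg", "setup.exe"]:
    print(name, is_disguised(name))
```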
If we can’t limit the spread of a virus, maybe
we can find it and quarantine infected files…
Unfortunately, no general algorithm for detecting
virus behavior is possible.
Cohen argues this by proposing a virus that infects only
when the detection algorithm thinks it isn’t a virus.
Anti-virus programs must make do with more limited
solutions, such as scanning for a virus signature.
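Cohen's argument is a diagonal construction: given any detector, build a program that asks the detector about itself and then does the opposite. The sketch below models "programs" as Python functions that only report what they would do (nothing touches real files). It assumes the detector always halts with a yes/no answer. It can only demonstrate the failure for the detectors you pass in; the proof covers every possible detector.

```python
from typing import Callable

# A "program" is modeled as a zero-argument function reporting what it would do.
# Nothing here touches real files; "spreads" is just a string.
Program = Callable[[], str]
Detector = Callable[[Program], bool]  # True means "I judge this program to be a virus"


def make_contrary(detector: Detector) -> Program:
    """Build the contradictory program for one particular detector."""
    def contrary() -> str:
        if detector(contrary):      # the detector calls it a virus...
            return "does nothing"   # ...so it behaves harmlessly
        return "spreads"            # judged clean, so it (notionally) infects
    return contrary


def detector_fails(detector: Detector) -> bool:
    """True if the detector misjudges the program built against it."""
    program = make_contrary(detector)
    verdict = detector(program)
    actually_spreads = program() == "spreads"
    return verdict != actually_spreads


# Hypothetical detectors: each one is wrong about its own contrary program.
candidates = [
    lambda p: True,                     # flag everything
    lambda p: False,                    # flag nothing
    lambda p: "virus" in p.__name__,    # judge by name
]
print([detector_fails(d) for d in candidates])  # [True, True, True]
```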
Virus detection problems
According to Cohen, the following are undecidable:
Detection of a virus by its appearance
Detection of a virus by its behavior
Detection of an evolution of a known virus
Detection of a triggering mechanism by its appearance
Detection of a triggering mechanism by its behavior
Detection of an evolution of a known triggering mechanism
Detection of a virus detector by its appearance
Detection of a virus detector by its behavior
Detection of an evolution of a known viral detector
Detection by signature
Rather than implement a general solution,
virus scanners look for virus signatures.
These signatures could be as small as a few
bytes or as large as the entire virus code.
If a virus scanner uses the whole virus code as a
signature, it may not be able to find simple
variants of a virus.
However, if a virus scanner uses a very small signature, it
may incorrectly report infections that aren’t there.
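The signature-length trade-off can be seen with a plain substring scanner. The samples and signatures below are hypothetical byte strings, not real malware. The whole-body signature misses a one-byte variant, the tiny signature fires on an innocent file, and a distinctive middle-sized pattern does neither.

```python
def scan(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of every signature that occurs anywhere in data."""
    return [name for name, sig in signatures.items() if sig in data]


# Hypothetical samples: two variants of a toy "virus" and an innocent file.
original = b"MARK1234:copy-self;payload-A"
variant = b"MARK1234:copy-self;payload-B"   # one byte changed
innocent = b"meeting notes: MARK12 agenda"

signatures = {
    "whole-body": original,         # matches only the exact original
    "tiny": b"MARK12",              # very short pattern
    "core": b"MARK1234:copy-self",  # distinctive middle-sized pattern
}

for label, sample in [("original", original), ("variant", variant), ("innocent", innocent)]:
    print(label, scan(sample, signatures))
# original ['whole-body', 'tiny', 'core']
# variant ['tiny', 'core']
# innocent ['tiny']
```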
Updated signatures
Anti-virus companies must release new
signatures each time a new virus is discovered.
A virus’s spread is unimpeded for a while…
According to Andreas Marx of, it took
Symantec 25h 5m to release an updated
signature file in response to the W32/Sober.C
worm attack.
The arms race
In order to make it hard for virus scanners to
detect their viruses, virus writers can add
morphing behavior to their creations:
“A polymorphic virus ‘morphs’ itself in order to
evade detection. … Metamorphic viruses attempt
to evade heuristic detection techniques by using
more complex obfuscations.”
– Christodorescu and Jha
More bad news…
Cohen argues that no general solution for
proving the equivalence of two programs is possible.
His argument follows the same form as his
argument against a general algorithm for virus
detection: he proposes a virus in which two
different infection instances will behave differently
when a watching antivirus program believes they
are the same.
A virus may morph itself by:
Encrypting part of itself using a different key for each infection
Changing variable names (in a script virus)
Binary obfuscation techniques (more on this later)
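Why per-copy encryption defeats a fixed signature, and one way a scanner can respond, can be sketched with a single-byte XOR on a harmless string. The scanner simply tries all 256 keys before matching. This is an illustrative assumption about the simplest case; simply encrypted viruses also carry a plaintext decryption routine that scanners can match, and polymorphic viruses vary that routine too, which is what makes them harder.

```python
from typing import Optional


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are the same)."""
    return bytes(b ^ key for b in data)


def xray_scan(sample: bytes, signature: bytes) -> Optional[int]:
    """Try all 256 single-byte keys; return one that reveals the signature, else None."""
    for key in range(256):
        if signature in xor_bytes(sample, key):
            return key
    return None


body = b"HARMLESS-DEMO-BODY"   # stands in for the part a virus would encrypt
signature = b"DEMO-BODY"
copies = [xor_bytes(body, key) for key in (0x11, 0x5A)]  # two "infections"

print(copies[0] != copies[1])                                  # True: stored bytes differ
print([signature in c for c in copies])                         # [False, False]: plain scan misses
print([xray_scan(c, signature) is not None for c in copies])    # [True, True]
```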
Polymorphic virus examples:
Chameleon -- first polymorphic virus, 90’s
A partial list of the viruses that can be called 100 percent
polymorphic (late 1993): Bootache, CivilWar (four versions),
Crusher, Dudley, Fly, Freddy, Ginger, Grog, Haifa,
Moctezuma (two versions), MVF, Necros, Nukehard, PcFly
(three versions), Predator, Satanbug, Sandra, Shoker,
Todor, Tremor, Trigger, Uruguay (eight versions).
Arming the virus writers
If a virus author knew what anti-virus programs
look for, he or she could design a virus that they
wouldn’t find…
Example: in the early 90s there were a few MS-DOS
'stealth' viruses that could interrupt a virus-scanning
program's attempt to read the boot record and show it a
clean version rather than what was really there.
See Symantec’s description of the Stealth_boot virus.
"Frodo.4096" virus, first Stealth virus
“Beast.512” Stealth virus, less than a year after Frodo.4096
More on this at Virus-Scan-Software
Extracting signatures
Christodorescu and Jha report on a
technique for extracting the signature used by
a given antivirus program.
Basically they obfuscate parts of the program and
determine what has to remain unobfuscated for
the antivirus program to find the virus.
FYI there is a typo in the paper: the conditions on the
loop in the SignatureExtraction function cause it to never terminate.
They say it “was successful in many cases.”
Binary obfuscation techniques
The goal of binary obfuscation is to make it
difficult to obtain an assembly-language
description of a program from its raw bytes
You need to turn raw bytes back into assembly
code before you can decompile
You can obfuscate by:
Garbage insertion (more in a minute)
Variable renaming
Code reordering
Encapsulating/encrypting code or data
x86 binary obfuscation
If you create unused regions in the
executable and fill them with garbage bytes,
the variable-length nature of the x86
instruction set can cause disassemblers to
think that the legitimate instructions following
the garbage are in fact operands.
You can use a conditional branch whose condition
always holds as an unconditional jump. Disassemblers
assume either path may execute, so they decode the
bytes at the target and after the branch as real
instructions, even when those bytes are garbage.
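The desynchronization effect can be reproduced with a toy variable-length instruction set. The opcodes below are hypothetical (not x86). A single garbage byte placed after a jump happens to look like the start of a 4-byte instruction, so a linear sweep swallows the real PUSH and NOP as operands.

```python
# Toy variable-length instruction set (hypothetical, for illustration only):
# opcode -> (mnemonic, length in bytes including the opcode)
ISA = {
    0x00: ("RET", 1),
    0x01: ("NOP", 1),
    0x02: ("PUSH", 2),   # PUSH imm8
    0x03: ("JMP", 2),    # JMP addr8 (absolute)
    0x05: ("LOAD", 4),   # LOAD with three operand bytes
}


def linear_sweep(code: bytes) -> list[tuple[int, str]]:
    """Decode instructions back to back from offset 0, ignoring control flow."""
    out = []
    pc = 0
    while pc < len(code):
        name, length = ISA.get(code[pc], ("DB", 1))  # unknown byte: emit as data
        operands = code[pc + 1 : pc + length]
        out.append((pc, f"{name} {operands.hex(' ')}".strip()))
        pc += length
    return out


program = bytes([
    0x03, 0x03,  # 0: JMP 3   (jumps over the garbage byte)
    0x05,        # 2: garbage byte that happens to be the LOAD opcode
    0x02, 0x07,  # 3: PUSH 7  (real instruction)
    0x01,        # 5: NOP     (real instruction)
    0x00,        # 6: RET     (real instruction)
])
for addr, text in linear_sweep(program):
    print(addr, text)
# 0 JMP 03
# 2 LOAD 02 07 01   <- real PUSH 7 and NOP were consumed as operands
# 6 RET
```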
Better obfuscation
Linn and Debray describe obfuscation using
a branch function
This function in turn branches to another target
depending on where it is called from.
This makes determining which parts of the program are
real by following the branch instructions difficult.
The function can return to an instruction one or more
bytes after the usual return point, opening up a region to
insert more garbage bytes into.
Advances in disassembly
Kruegel, Robertson, Valeur and Vigna
describe a disassembler that is able to
correctly disassemble most instructions from
a program obfuscated by the obfuscator that Linn
and Debray describe.
Disassembly in detail
Static analysis techniques
Linear sweep
Recursive traversal following control flow
GNU's objdump uses linear sweep
Gets confused by garbage bytes in unreachable areas
Recursive traversal's drawback: indirect jumps
Doesn’t always “see” the whole binary
Speculative disassembly
Hybrid approach
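Recursive traversal can be sketched on a toy instruction set (hypothetical opcodes, not x86). Because it only decodes bytes reached by following jumps, it skips garbage that a linear sweep would decode. But it stops at an indirect jump whose target is unknown statically, so code reachable only that way is never seen. The speculative and hybrid approaches try to fill such gaps.

```python
# Toy variable-length instruction set (hypothetical):
# opcode -> (mnemonic, length in bytes including the opcode)
ISA = {
    0x00: ("RET", 1),
    0x01: ("NOP", 1),
    0x02: ("PUSH", 2),   # PUSH imm8
    0x03: ("JMP", 2),    # JMP addr8, absolute target
    0x04: ("JZ", 2),     # conditional jump: target or fall through
    0x05: ("LOAD", 4),   # LOAD with three operand bytes
    0x06: ("JMPR", 1),   # indirect jump through a register: target unknown statically
}


def recursive_traversal(code: bytes, entry: int = 0) -> dict[int, str]:
    """Decode only bytes reachable by following control flow from entry."""
    decoded: dict[int, str] = {}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in decoded:
            name, length = ISA.get(code[pc], ("DB", 1))
            operands = code[pc + 1 : pc + length]
            decoded[pc] = f"{name} {operands.hex(' ')}".strip()
            if name in ("JMP", "JZ") and len(operands) == 1:
                worklist.append(operands[0])  # visit the branch target later
                if name == "JMP":
                    break  # no fall-through after an unconditional jump
            elif name in ("RET", "JMPR"):
                break  # no successor (RET) or an unknown one (JMPR)
            pc += length
    return decoded


program = bytes([
    0x03, 0x03,  # 0: JMP 3
    0x05,        # 2: garbage byte (LOAD opcode)
    0x02, 0x07,  # 3: PUSH 7
    0x06,        # 5: JMPR (at run time this goes to 7)
    0x05,        # 6: garbage byte
    0x01,        # 7: NOP, reachable only through the indirect jump
    0x00,        # 8: RET
])
print(recursive_traversal(program))  # {0: 'JMP 03', 3: 'PUSH 07', 5: 'JMPR'}
```

Note that `JZ` is followed on both edges. That is exactly the assumption the always-true conditional branch trick exploits: garbage placed on the never-taken path still gets decoded.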
Now for some good news
“This arms race is usually in favor of the deobfuscator. The obfuscator has to devise
techniques that transform the program
without seriously impacting the run-time
performance or increasing the binary's size or
memory footprint while there are no
such constraints for the de-obfuscator.”
- Kruegel et al
AV tool resistance to obfuscation
Christodorescu and Jha claim “the state of
the art for malware detectors is dismal!”
They propose a testing technique and then use it
to show that the tested virus scanners were not
generally able to identify the sampled viruses
when they were obfuscated by code reordering or
AV tool resistance to obfuscation (cont’d)
This doesn’t mean that these products aren’t
capable of detecting morphing viruses—the
viruses in the sample set did not perform
these morphs in the wild.
This does mean that in order to protect
against a new virus that is just a simple
modification of one of these existing viruses
the AV companies would have to release a
new signature file.
Known clean system
Some virus detection techniques require you
to start from a clean system.
DOS users used clean boot disks to defeat stealth viruses
But is it always possible to get to a known clean system?
What if every UNIX vendor had been infected with Ken
Thompson’s C compiler virus? Even their “clean”
distribution media would be infected…
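Integrity checking is one technique that depends on a known clean starting point: record a cryptographic hash of each file while the system is trusted, then report anything that changes. This is a minimal sketch with hypothetical in-memory files. If the baseline is taken from binaries that are already infected (as in Thompson's scenario), or the checker itself is compromised, it will report nothing.

```python
import hashlib


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Record a SHA-256 digest for every file (taken while the system is known clean)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def changed_files(baseline: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """List files that were added, removed, or modified since the baseline."""
    current = snapshot(files)
    names = baseline.keys() | current.keys()
    return sorted(n for n in names if baseline.get(n) != current.get(n))


# Hypothetical file contents held in memory so the example is self-contained.
clean = {"login": b"login v1", "cc": b"compiler v1"}
baseline = snapshot(clean)

later = {"login": b"login v1", "cc": b"compiler v1 + backdoor", "extra": b"new"}
print(changed_files(baseline, later))  # ['cc', 'extra']
```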
Obfuscation vs deobfuscation, who can win?
Discussion (cont’d)
Can anti-virus win in the future?
