Cryptography on Graphics
Processors
Francisco Chinchilla
COMP 290-GPGP Presentation
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Papers
“Brute Force Attack on UNIX Passwords
with SIMD Computer”
Gershon Kedem and Yuriko Ishihara, Computer Science
Department, Duke University. Usenix ’99 Security
Symposium
"CryptoGraphics: Secret Key Cryptography
Using Graphics Cards"
Debra L. Cook, John Ioannidis, Angelos D. Keromytis,
and Jake Luck, Columbia University. To appear in the
Proceedings of the RSA Conference, Cryptographer's
Track (CT-RSA). February 2005, San Francisco, CA.
Remotely-Keyed Cryptographics
Debra L. Cook, Ricardo Baratto, and Angelos D.
Keromytis, Columbia University.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Outline
Quick Tour of Cryptography
Brute force password hacking using
PixelFlow
Implementing Cryptographic systems
on a GPU
Using GPU based cryptographic
systems in remote keyed systems in
unfriendly environments
Conclusions
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Cryptography Basics
Two functions of two variables
E(keye, text) encrypts
D(keyd, E(keye, text)) decrypts
If keye==keyd cryptosystem is
called symmetric, otherwise it’s
asymmetric
We’ll deal with symmetric
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Cipher Types
Block Cipher
Encrypts fixed-size blocks of data
Stream Cipher
Encrypts input data one bit (sometimes one byte)
at a time
Useful for data with unknown length
Stream ciphers are faster in
hardware, but slower in software
than block ciphers
Both use XOR heavily
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Attacks to Break Encryption
Techniques used
Differential
Linear
Brute Force
Underlying methods
Chosen Plaintext
Ciphertext-only
Known Plaintext
Goals
Find a password
Find the key
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
DES
Block cipher
Developed by IBM, made a FIPS in ‘76
Uses only 56-bit keys, not 64, not 128,
possibly due to NSA
Designed against differential cryptoanalysis,
a ‘new’ (not to IBM nor NSA) field of
cryptography in the late 1980’s. Takes 2^47
chosen plaintexts to break.
Not designed against linear cryptoanalysis.
Takes 2^39-2^44 known plaintexts
Broken in 56 hours by EFF $250k machine in
1998
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
DES Encryption Details
The Feistel (F) function
Key Schedule
Subkey
(48 bits)
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
DES – Other Boxes
http://en.wikipedia.org/wiki/Substitution_box
http://en.wikipedia.org/wiki/DES_supplementary_material
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Crypt
A variation of the DES algorithm
Used to encrypt Unix passwords
Sample invocations:
crypt(password)=pajXG3p8vEaok
crypt(God)=GoqTni0y.7CH2
crypt(sex)=se8lvCoF9FQE.
crypt(money)=moIzVjU7RTRr6
crypt(love)=loYMPnrPlj9gk
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
AES
Competition to replace DES FIPS
Competition won by Rijndael
AES is fast in both SW and HW
Relatively easy to implement
Requires little memory
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
AES 128-bit Details
AES operates on a state, a 4×4 array of
bytes.
For encryption, each round of a total of 10
consists of four stages:
SubBytes — a non-linear substitution step where each
byte is replaced with another according to a lookup
table, an S-block. This is the non-linear step.
ShiftRows — a transposition step where each row of the
state is shifted cyclically a certain number of steps.
MixColumns — a mixing operation which operates on
the columns of the state, combining the four bytes in
each column using a linear transformation. This is the
“Diffusion” step and is omitted in the final round!
AddRoundKey — each byte of the state is combined
with the round key; each round key is derived from the
cipher key using a key schedule.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
RC4
Stream cipher
Designed by Ron Rivest in ’87
In 1994 a description was anonymously
posted, which lead to ARCFOUR
In 2001 it was found that the first 256 bytes
of the output were non-random (this leads
to one big vulnerability in WEP)
Unusually simple
Requires only byte-length manipulations
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
RC4 - details
RC4
Key Scheduling Algorithm (KSA)
Encrypt(bits, key){
for i from 0 to 255
S[i] := i
j: = 0
for i from 0 to 255
j := (j + S[i] + key[i mod l]) mod 256
swap(S[i],S[j])
S = new byte[256];
KSA(key,S);
randomBits = PRGA(S);
for i=0 to length(bits)-1{
bits[i] = XOR(bits[i],randomBits[i])
pseudo-random gen. algorithm (PRGA)
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap(S[i],S[j])
output S[(S[i] + S[j]) mod 256]
}
}
Decrypt(bits, key){ Encrypt(bits, key) }
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
PixelFlow
Unit
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Processing Element
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
PixelFlow
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Password Hacking
Two full chassis=
18 SIMD Arrays=
147,456 PE’s
Find the key used to encrypt a
stream of bytes using RC4
Check Unix passwords
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Password Hacking Results
Each PE checks 1,000
RC4 keys in 3.75s
=
38.8M/s for the system
Each PE checks 1,000
crypt passwords in 6s
=
24.6M/s for the system
System encrypts 614M
DES passwords/second
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Secret Key Cryptography
Test capability of XOR on different
cards using different ciphers
RC4
AES
Used OpenGL and GLUT, no CG
Did not use ‘the latest greatest’ cards
Basic steps
Disable dithering!
Copy pixels between coordinates, with
colormapping and XOR on or off
Why colormaps?
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Test systems
Note: 256KB RAM = 256 MB RAM?
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
RC4 on a GPU
Random bit stream must be
precomputed
Simply load the bit stream and
the data to the GPU and then
perform the XOR
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
RC4 Comparisons
(includes all data transfers)
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
AES on a GPU
Key schedule is not done on GPU
Only store 1 byte of data in each
pixel
(De/En)crypts 4*n blocks
simultaneously, given that the
data is stored in n pixels
Tested against reference and
optimized CPU implementations
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
AES on GPU data layout
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
AES Encryption Rates
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Cryptography on GPU Conclusions
Performance isn’t great, now what?
GPU’s can still offload some cycles off the CPU?
• Not really, the CPU was still fully utilized
Look for a new application, like video system
described in their next paper
Resolution
Required
MB/s
GeForce3
RC4
Radeon 7500
RC4
TNT2 RC4
[email protected]
1.92
pass
pass
pass
[email protected]
2.88
pass
pass
pass
[email protected]
39.25
pass
maybe
fail
[email protected]
58.9
pass
fail
fail
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Remotely Keyed Cryptographics
Extend their previous work
RC4 can be used to sustain rates that are ok for
live video
The video is not susceptible to attacks from within
the OS
Finally? explained why they didn’t use CG:
“Higher level languages do not allow the developer to
specify which OpenGL commands are utilized when there
are multiple ways of implementing a function via OpenGL
commands and do not even guarantee the operations will
be transformed into OpenGL commands but instead may
transform it into C code… that may be executed on the
OS and not using OpenGL.”
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Regarding compression and
transmission
Encrypted data doesn’t compress
100 Mbps = 16.7 500x500 RGB
frames/s
If you know about the RFP protocol
(used in VNC) or MPG video, you
know that the entire screen isn’t
necessarily updated from frame to
frame, only the sections that need it
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
System Overview
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Session Overview
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Test systems (again)
Note: 256KB RAM = 256 MB RAM?
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Test results
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Remote Keyed CryptographyConclusion and Comments
99% of time taken on GPU is
writing the keystream
Rates still good for thin-clients
Great for encrypting very
sensitive images in a system that
is not completely trusted
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Presentation Conclusion
GPU’s can have lots and lots of FLOPs
Can we find applications that use
them as the main PU?
As a co-processor to offload cycles?
If we fail, can we apply a different
twist or find a special case/market of
the setting where we had failed
previously?
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Discussion
Could make the performance of
these algorithms better if we
could use fragment programs for
example?
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Additional References
Wikipedia
lots and lots of info and images taken from here
PixelFlow Website
In particular:
Eyles, John, Steven Molnar, John Poulton, Trey
Greer, Anselmo Lastra, and Nick England,
"PixelFlow: The Realization", Proceedings of the
Siggraph/Eurographics Workshop on Graphics
Hardware, Los Angeles, CA, August 3-4, 1997, 5768.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Descargar

COMP 238 -- Lecture 1, Intro