By Trevor Tonn
CS147 Spring 2009
II. The SPARC Architecture
III. SPARC in the Marketplace
IV. SPARC Today and Tomorrow
RISC: Definition & Background
Reduced Instruction Set Computing, also
called “load-store architecture” (due to the
load-store instructions for accessing memory).
In the 1970s, researchers noticed that a
promising alternative to giving the ISA
(instruction set architecture) a large set of
instructions would be to support only the most
frequently used instructions, leaving the rarely
used ones to be implemented as sequences of
those instructions.
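As an illustration of "implemented as instruction sequences", here is a minimal sketch of an unsigned multiply built only from simple operations (add, shift, bitwise AND, compare). The function name `multiply_shift_add` is hypothetical, and the sketch assumes non-negative integers; it is not taken from any particular ISA.

```python
def multiply_shift_add(a, b):
    """Multiply two non-negative integers using only add, shift, and AND.

    Each loop iteration stands in for a handful of simple instructions
    that a reduced instruction set would still provide.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative operands")
    product = 0
    while b != 0:
        if b & 1:          # low bit of the multiplier set: add the multiplicand
            product += a
        a <<= 1            # shift multiplicand left by one bit
        b >>= 1            # shift multiplier right by one bit
    return product
```

The loop runs once per bit of the multiplier, so one "complex" operation becomes a short sequence of simple ones.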
RISC: Which instructions are used most?
Instruction type: % of total
Conditional branch: 20
Move register: 4
Developers at IBM in
the 1970s found that
80% of a typical
program's computations required only
about 20% of the
instructions available in
the processor's ISA.
Focus: few, well-chosen, simple
instructions and an
optimizing compiler.
RISC: Design Principles
Simple instructions and few addressing modes
Instructions conform to a simple format
Extra stuff left out saves space for other things
Reduces decoding delays
Load-store design
Everything operates on registers only; load from
memory to the large number of registers first,
then manipulate the registers only; store to
memory when done.
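The load-store discipline can be sketched with a toy machine. This is a minimal illustration under assumed names (`LoadStoreMachine`, `ld`, `st`, `add` are hypothetical, not a real simulator API): only `ld` and `st` touch memory, and arithmetic works on registers alone.

```python
class LoadStoreMachine:
    """Toy load-store machine: only ld/st access memory."""

    def __init__(self, memory, num_regs=32):
        self.memory = list(memory)
        self.regs = [0] * num_regs
        self.memory_accesses = 0   # counts ld and st operations

    def ld(self, addr, rd):
        """Load memory[addr] into register rd."""
        self.memory_accesses += 1
        self.regs[rd] = self.memory[addr]

    def st(self, rs, addr):
        """Store register rs into memory[addr]."""
        self.memory_accesses += 1
        self.memory[addr] = self.regs[rs]

    def add(self, rs1, rs2, rd):
        """Register-to-register add; never touches memory."""
        self.regs[rd] = self.regs[rs1] + self.regs[rs2]


# memory[2] = memory[0] + memory[1], written load-store style
machine = LoadStoreMachine([5, 7, 0])
machine.ld(0, 1)      # load first operand into r1
machine.ld(1, 2)      # load second operand into r2
machine.add(1, 2, 3)  # r3 = r1 + r2, registers only
machine.st(3, 2)      # store the result when done
```

Any further arithmetic on r1, r2 or r3 would cost no additional memory accesses, which is the point of keeping many registers on chip.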
RISC: Design Principles (cont)
Hardwired control—no microcode
Goal: Execute one instruction per clock cycle
No translation from machine instructions into
microcode, freeing the CPU cycles that
translation would otherwise need.
Also frees up chip space.
Uses pipelining and other features/principles
described to get there.
Simplicity facilitates the use of higher
clock frequencies
RISC: Advantages
The smaller number of instructions requires
relatively little on-chip control logic, leaving
space on the chip for other functions, which
enhances the performance & versatility of the
processor.
One example is the use of a large Register File
(array of registers defined by the ISA) to allow
for the register-window approach utilized in
SPARC; lots of registers available.
Fast cycle time coupled with a high
performance memory hierarchy can yield
incredible processing power.
RISC: Advantages (cont)
The execution time for a
large, compute-bound
program can be
expressed as the
product of three terms:
Execution time = Ip × Cp × T
Ip = # of instr executed by program
Cp = avg # of CPU clock cycles per
instr executed by program
T = time per cycle
The second equation is a
simplified version of the first:
Execution time = Ip / (MIPSp × 10^6)
MIPSp = million instr per second
= 1 / (Cp × T × 10^6)
RISCs achieve high levels
of performance by
minimizing Ip and
maximizing MIPSp.
Ip – lots of registers (> 16)
Cp – simple instructions
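A small worked example of the execution-time relation, as a sketch. The function names and the sample numbers (2 million instructions, 1.5 cycles per instruction, a 10 ns cycle) are made up for illustration; T is taken in seconds.

```python
def execution_time(ip, cp, t):
    """Execution time in seconds: instructions x cycles/instr x seconds/cycle."""
    return ip * cp * t


def mips_rating(cp, t):
    """Millions of instructions per second for a given Cp and cycle time T."""
    return 1.0 / (cp * t * 1e6)


def execution_time_from_mips(ip, mips_p):
    """Simplified form: instructions divided by instructions per second."""
    return ip / (mips_p * 1e6)


# Hypothetical program: 2,000,000 instructions, 1.5 cycles each, 10 ns cycle
ip, cp, t = 2_000_000, 1.5, 10e-9
time_three_terms = execution_time(ip, cp, t)
time_from_mips = execution_time_from_mips(ip, mips_rating(cp, t))
```

Both forms give the same result (about 0.03 s here), since MIPSp already folds Cp and T together.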
RISC: Disadvantages
Lacks the more powerful instructions of CISC;
requires many clock cycles to execute the
many simple instructions that make up the
equivalent instruction sequences.
Execution of a lot of small instructions causes
a lot of instruction traffic—more than CISC.
The advantages of a high clock
frequency are at least partially
offset by these characteristics.
RISC: Solutions & Impure RISC
Instead of focusing on the most commonly
used ops, add in some sets of complex
instructions whose equivalent instruction
sequences bring RISC to its knees.
If you need fast floating point performance, add
in some complex fp instructions, for example.
Use free chip space for techniques that
ease the instruction traffic problem.
Register-window technique used in SPARC.
These & other solutions go against some
of the original design principles—impure RISC.
Most implementations start with pure
RISC and enhance it to fulfill requirements.
RISC: Register-window
Each assembly procedure has a “window” of
registers available to it, with an area of overlap
between the procedure and the calling
procedure to facilitate efficient parameter
passing—no need to save or reload registers.
The windows change dynamically on
procedure entry/exit.
Reduces instruction traffic.
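The overlap can be sketched as follows. This is a simplified model, assuming the usual SPARC layout of 8 global registers plus 8 "in", 8 "local" and 8 "out" registers per window, with the caller's outs being the callee's ins. The class `RegisterFile` and its method names are hypothetical, and window overflow/underflow traps are ignored.

```python
class RegisterFile:
    """Toy model of overlapping register windows.

    Register numbers: 0-7 globals, 8-15 outs, 16-23 locals, 24-31 ins.
    """

    def __init__(self, nwindows=8):
        self.nwindows = nwindows
        self.globals = [0] * 8
        self.windowed = [0] * (16 * nwindows)  # 16 new registers per window
        self.cwp = 0                            # current window pointer

    def total_registers(self):
        return 8 + 16 * self.nwindows

    def _index(self, r):
        # Ins of window w land on the same slots as outs of window w + 1.
        return (self.cwp * 16 + (r - 8)) % (16 * self.nwindows)

    def read(self, r):
        if r == 0:
            return 0                 # global register 0 always reads as zero
        if r < 8:
            return self.globals[r]
        return self.windowed[self._index(r)]

    def write(self, r, value):
        if r == 0:
            return                   # writes to register 0 are discarded
        if r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def save(self):
        """Procedure entry: switch to a fresh window."""
        self.cwp = (self.cwp - 1) % self.nwindows

    def restore(self):
        """Procedure exit: return to the caller's window."""
        self.cwp = (self.cwp + 1) % self.nwindows


rf = RegisterFile()
rf.write(8, 42)    # caller puts an argument in its first out register
rf.save()          # enter the callee; no registers are copied
arg = rf.read(24)  # callee sees the argument in its first in register
rf.write(24, 7)    # callee leaves a return value in the same slot
rf.restore()
result = rf.read(8)
```

Under this layout the register count is 8 + 16 × (number of windows), so 2 windows give 40 registers and 32 windows give 520.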
SPARC: What is it?
Scalable Processor ARChitecture
Designed by Sun
Microsystems 1984-1987.
Based on RISC work done
at UC Berkeley in 1980-82.
An architecture with many
families of processors
created by several companies.
SPARC: Brief History
3 major revisions to the architecture
1) SPARC-V7, 32-bit, 1986
2) SPARC-V8, 32-bit, 1990
3) SPARC-V9, 64-bit, 1993
UltraSPARC extension, 1995
Backward binary compatibility between all revisions
SPARC: Brief History (cont)
V9 greatly improves upon V8:
64-bit integer mul & div instructions
load/store floating-point quadword instructions
Load & store 128 bits at a time
Software-settable branch prediction
Branch on register value
Conditional move instruction
Reduces total number of instructions to execute
Allows you to remove branch instructions.
Improved support for very large-scale multiprocessors
Relaxed memory ordering model
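To make the conditional move point concrete, here is a toy interpreter that computes max(a, b) two ways: with a branch and with a conditional move. The mnemonics (`cmp`, `ble`, `mov`, `movg`) are only loosely modeled on SPARC assembly, operands are written source first, and delay slots and real condition codes are ignored. Everything here is an assumed, simplified model.

```python
def run(program, regs):
    """Run a toy program; return (instructions executed, branches executed)."""
    pc = 0
    executed = 0
    branches = 0
    greater = False  # single condition flag set by cmp
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        executed += 1
        if op == "cmp":
            greater = regs[args[0]] > regs[args[1]]
        elif op == "mov":
            regs[args[1]] = regs[args[0]]
        elif op == "movg":            # move only if the last cmp was "greater"
            if greater:
                regs[args[1]] = regs[args[0]]
        elif op == "ble":             # branch if less than or equal
            branches += 1
            if not greater:
                pc = args[0]
        else:
            raise ValueError(f"unknown op: {op}")
    return executed, branches


MAX_WITH_BRANCH = [
    ("mov", "b", "max"),
    ("cmp", "a", "b"),
    ("ble", 4),             # skip the next move when a <= b
    ("mov", "a", "max"),
]

MAX_WITH_CMOV = [
    ("mov", "b", "max"),
    ("cmp", "a", "b"),
    ("movg", "a", "max"),   # no branch needed
]
```

With a = 9 and b = 3, the branching version executes four instructions including one branch, while the conditional-move version executes three and no branch at all.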
SPARC: Architecture Features
Integer unit (IU), floating-point unit (FPU),
optional implementation-defined coprocessor
(CP), each with its own set of registers.
Allows for maximum concurrency between
integer, floating-point & coprocessor
instruction execution.
All IU & FPU registers are 32 bits wide
Instructions operate on single, pairs and
quads of registers.
SPARC: Integer & Floating-point Units
IU may contain between 40 and 520 general
purpose registers. FPU has 32 registers.
Groups of 2 to 32 overlapping register windows
No direct path between FPU & IU; data must be
passed through memory with load/store instructions.
FPU can have several multipliers & adders
Register windows perform well with LISP and
OO languages like Smalltalk.
Implementation dependent
FPU: Concurrent execution of add/mul &
SPARC: Core Pipeline
SPARC: Multiprocessor instructions
Two special instructions support tightly coupled
multiprocessors:
• swap
Exchanges contents of an IU register with a
word from memory while preventing other
memory accesses from intervening on the
memory or I/O bus.
Can be used with a CP to perform other
synchronization techniques.
Can be used to create semaphores.
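How an atomic swap yields a binary semaphore can be sketched in software. In this illustrative model a `threading.Lock` stands in for the hardware holding the memory bus during the exchange. `Memory`, `acquire`, `release` and `run_demo` are hypothetical names, and this is a simulation of the idea rather than SPARC code.

```python
import threading
import time


class Memory:
    """Toy memory whose swap() is atomic, standing in for the swap instruction."""

    def __init__(self, size):
        self._words = [0] * size
        self._bus = threading.Lock()  # models exclusive use of the bus

    def swap(self, addr, reg_value):
        """Exchange reg_value with memory[addr]; return the old memory word."""
        with self._bus:
            old = self._words[addr]
            self._words[addr] = reg_value
            return old


LOCK_WORD = 0  # address of the semaphore word: 0 = free, 1 = taken


def acquire(mem):
    # Keep swapping in 1 until the old value was 0, meaning we took the lock.
    while mem.swap(LOCK_WORD, 1) != 0:
        time.sleep(0)  # yield to other threads while spinning


def release(mem):
    mem.swap(LOCK_WORD, 0)


def run_demo(num_threads=4, increments=100):
    """Several threads bump a shared counter under the swap-based lock."""
    mem = Memory(size=1)
    counter = [0]

    def worker():
        for _ in range(increments):
            acquire(mem)
            counter[0] += 1   # critical section
            release(mem)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]
```

Because only one thread can ever observe the old value 0, only one thread at a time enters the critical section, and `run_demo` returns exactly `num_threads * increments`.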
SPARC: Firsts...
register windows (1987)
32-way server on a chip (UltraSPARC T1,
64-way SuperServer (SuperSPARC XDBus,
Major 64bit architecture (UltraSPARC, 1995)
IBM POWER in 1998, x86_64 in 2003
Many more...
SPARC in the Marketplace
Sun developed Solaris (a UNIX variant) and
sold hardware, selling the two together.
Licensed implementations by other companies
like Fujitsu, LSI Logic, and Texas Instruments
Open sourced the design of UltraSPARC,
allowing grass-roots development and new
OpenSPARC T1 and T2 processors, 2005 &
2008 respectively
“open source processors”
TODAY: Architecture
Although SPARC V9 allows its implementations
freedom in their MMU (memory management
unit) designs, SPARC JPS1 defines a common
MMU architecture with some specifics left to
each implementation.
Current products like the Sun SPARC64 series
are based on this specification.
Still binary compatible with V8 implementations.
TODAY: Current Features
Characteristics of current V9 implementations:
Ex: Fujitsu SPARC Enterprise M9000
Quad-core SPARC64 VII processors scalable to 256
cores (64 processors)
2.52GHz maximum clock frequency
SPARC V9/JPS2 architecture implementation
Multiple threads per core
Multi-threading technology minimizes CPU core
wait times and increases CPU core utilization.
SMT (Simultaneous Multithreading) enables two
threads to run in parallel.
Sun's involvement is uncertain as they've
recently claimed “4 years to go” in their
transition to x86.
Sun is still developing processors based on
V9 architecture and its updates
'Rock' processor to be released in 2009
OpenSPARC continues to build on the
UltraSPARC architecture, which is a
descendant of, and fully compatible with, V9.
Active community involvement, new “open”
processor implementations of architecture.

SPARC Architecture - SJSU Computer Science Department