Number eight of a series
Drinking from the Firehose
Many chips from one –
Specification in the Mill™ CPU Architecture
2014-05-14
Mill Computing
1
Patents pending
The Mill CPU
The Mill is a new general-purpose commercial CPU family.
The Mill has a 10x single-thread power/performance gain
over conventional out-of-order superscalar architectures,
yet runs the same programs, without rewrite.
This talk will explain:
• configurable architecture strategy
• attributed specification
• operation set specification
• component configuration at core/chip/board levels
• automatic tool generation
Talks in this series
1. Encoding
2. The Belt
3. Memory
4. Prediction
5. Metadata and speculation
6. Execution
7. Security
8. Specification (you are here)
9. …
Slides and videos of other talks are at:
MillComputing.com/docs
The Mill Architecture
Specification and configuration
New with the Mill:
Family members built from specifications
Reusable components
Instruction set built by composing attributes
Fully regular instruction set
Mechanically generated bit-level encoding
Entropy-optimal encoding throughout
Configuration-specific generated tool sets
Asm, sim, debugger, compiler, …
Generated hardware
Verilog from specification
Caution!
Gross over-simplification!
This talk tries to convey an intuitive
understanding to the non-specialist.
The reality is more complicated.
Specification
Unlike other talks in this series…
This talk does not describe
the Mill architecture
It describes how the operation set and particular
family member micro-architectures are specified.
It describes, and demonstrates, some of the software
tools built from the specifications.
It describes how the specification supports manual
creation of Mill hardware.
The specification tools are for internal use in creation of
Mill CPUs; the tools are not intended to be products.
By use of these tools, we can create new Mill chip
products more quickly and at lower cost than usual.
The intended audience includes tool designers and
software developers interested in advanced design.
Specification
The Mill is a family of member CPUs sharing an
abstract operation set and micro-architecture.
[Diagram: abstract Mill CPU architecture -> specification-driven family members: Tin, Copper, Silver, Gold]
Members differ in concrete operation set and micro-architecture.
Designers describe a concrete member by writing a specification.
Specification
Software automatically creates system software, verification
tests, documentation, and a hardware framework for the
new member from the specification.
[Diagram: abstract Mill CPU architecture -> specification-driven family members (Tin, Copper, Silver, Gold) -> data-driven tools (compiler, asm, debugger, sim, HWgen)]
C++ compiler masquerade
assembly
language
The Mill assembler syntax is C++.
Suitably disguised.
Two-pass assemblers
Traditional assemblers have two passes.
The first pass treats the source as a program in a
meta-language, usually a macro language, and
interprets that program to produce a different source
program in machine language.
The second pass translates the program in machine
language to binary and produces the executable file.
[Diagram: source file (macro language) -> first pass -> machine language -> second pass -> binary load module]
The Mill assembler uses the C++ compiler
The first pass is the C++ compiler, which translates the
assembly language source program into an executable program.
The second pass is the execution of that program,
which emits binary and produces the load module.
[Diagram: Mill assembler: source file (C++) -> C++ compiler -> C++ program -> execution -> binary load module. Traditional assembler, for comparison: source file (macro language) -> first pass -> machine language -> second pass -> binary load module]
C++ is Mill assembly language?
Each assembler operation is a C++ function call.
conventional assembler    Mill assembler
add b3,b5                 add(b3,b5)
jump loop                 br("loop")
loop:                     L("loop")
A Mill instruction is a C++ statement
comprising operations separated by commas:
add(b0, 3), store(*b5, b7), br("loop");
Each call is an operation; the whole statement is one instruction.
C++ as meta-language
Each call of an asm function emits that operation.
for(int i = 1; i < 5; ++i)
add(b0, i);
gives the same machine code as:
add(b0,
add(b0,
add(b0,
add(b0,
1);
2);
3);
4);
As in a macro assembler, in Mill assembler you
can meta-program what your program will be.
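The equivalence claimed above can be sketched in ordinary C++. This is a minimal illustration only: the `emitted` vector, the string form of each operation, and the `b0` constant are hypothetical stand-ins, not the Mill assembler's actual internals.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the assembler's output stream:
// one string per emitted operation.
static std::vector<std::string> emitted;

// Hypothetical belt-operand name; the real assembler defines its own.
static const char* const b0 = "b0";

// Calling an operation function emits that operation.
void add(const char* belt, int imm) {
    emitted.push_back(std::string("add ") + belt + ", " + std::to_string(imm));
}

// The meta-programmed form: a C++ loop that emits four operations.
std::vector<std::string> emitWithLoop() {
    emitted.clear();
    for (int i = 1; i < 5; ++i)
        add(b0, i);
    return emitted;
}

// The hand-written form: four explicit calls.
std::vector<std::string> emitUnrolled() {
    emitted.clear();
    add(b0, 1);
    add(b0, 2);
    add(b0, 3);
    add(b0, 4);
    return emitted;
}
```

Both functions return the same sequence of four operations, because the loop runs when the assembly program is executed, not on the target machine.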
Demo: extend a core
The test case contains this code fragment:
con(w(fpemu::st2bin32("3.0")));
con(w(fpemu::st2bin32("5.0")));
addb(b0, b1);
nop();
nop();
nop();
However:
the Tin CPU does not support native floating-point
Demo: extend a core
Build for Tin – fail
Demo: create new “Demo” member like Tin
Copy code tree Tin -> Demo
Clear old files from new tree
Tell builder tool to use new member
Create new specification from Tin spec
Demo: update Demo member with FPU
Populate execution pipeline slot with floating point
ISA design by composition
attributes
The semantic pieces of operations
Operation attributes
A Mill operation invocation comprises a core operation
and values for some number of attributes.
addus(b3, 17)
core operation: add
attribute    value
domain       unsigned integer
overflow     saturating
opand0       b3 – third belt position
imm0         17 – literal
Specific attribute values are supplied by the operation
mnemonic or by an argument to the operation function.
There are ~50 attributes. Only a handful are
meaningful for any particular operation.
Attribute values
Most attributes are enumerations:
enum directionCode {leftward, rightward};
enum condSenseCode {allSense, falseSense, trueSense};
enum overflowCode {excepting, modulo, saturating,
widening};
enum domainCode{binFloat, boolean, decFloat, logical,
pointers, signedInt, unsignedInt};
Attribute values can be specified individually, or as bitsets
with a selection of values of the same attribute.
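One way to picture a bitset of values of a single attribute is sketched below. The `overflowCode` enum is taken from the slide; `overflowSet`, `setOf`, and `permits` are hypothetical names, and the use of `std::bitset` is an assumption rather than the specification software's actual representation.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

enum overflowCode { excepting, modulo, saturating, widening };

// Number of enumerators in overflowCode (kept in step by hand in this sketch).
constexpr std::size_t overflowCount = 4;

// One bit per possible value of the attribute.
using overflowSet = std::bitset<overflowCount>;

// Builds a set from a selection of values of the same attribute.
overflowSet setOf(std::initializer_list<overflowCode> values) {
    overflowSet s;
    for (overflowCode v : values)
        s.set(static_cast<std::size_t>(v));
    return s;
}

// Tests whether a single value is in the set.
bool permits(const overflowSet& s, overflowCode v) {
    return s.test(static_cast<std::size_t>(v));
}
```

For example, `setOf({modulo, saturating})` describes an operation that is defined for those two overflow behaviors and no others.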
Mnemonics
Each opcode and attribute value has a text string nick.
value          nick
leftward       l
rightward      r
signedInt      s
unsignedInt    u
Spec software concatenates the nicks of the opcode and
attributes to make the assembler mnemonic automatically.
shiftrs
operation is shift, right, signed
There are ~120 core ops and ~1000 mnemonics.
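Concatenating nicks into a mnemonic can be sketched as follows. The enum names and the l/r/s/u nicks come from the slides; the `nick` and `mnemonic` helpers, and the "?" placeholder for nicks the slides do not list, are assumptions made for illustration.

```cpp
#include <string>

enum directionCode { leftward, rightward };
enum domainCode { binFloat, boolean, decFloat, logical,
                  pointers, signedInt, unsignedInt };

// Nicks for direction values, as listed on the slide.
std::string nick(directionCode d) {
    return d == leftward ? "l" : "r";
}

// Nicks for domain values; only two are listed on the slide.
std::string nick(domainCode d) {
    switch (d) {
        case signedInt:   return "s";
        case unsignedInt: return "u";
        default:          return "?";  // placeholder: nick not given in the slides
    }
}

// The mnemonic is the opcode nick followed by the attribute nicks.
std::string mnemonic(const std::string& opcodeNick,
                     directionCode dir, domainCode dom) {
    return opcodeNick + nick(dir) + nick(dom);
}
```

With these definitions, `mnemonic("shift", rightward, signedInt)` yields the `shiftrs` mnemonic shown above.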
Attribute semantics
Besides its type, each attribute has three choices:
how values are expressed in assembler source
• by mnemonic, based on the function name
• by parameter, based on explicit argument
• derived from other attributes, not in source
how values are encoded in target binary code
• pinned in a single bit field in all formats
• direct in different bit fields in different formats
• merged into an opcode super-field
• uncoded, for internal use only
how the set of permitted values is determined
• universal, same for all slots for all members
• by member, same for all slots on a given member
• by slot, may vary based on available entropy
A candidate operation
Semantics of the new operation:
#define N 7
uint16_t NEWOP (uint16_t a, uint16_t b) {
return (a << N) + b + 1; }
Assembler: shift, add, increment
Any ALU can do this in one cycle.
Pick a name:
Pick a value for N:
Demo: define a new opcode
Add new opcode – opAttr.hh
Add printname – opAttr.cc
Add traits – attrTraits.cc
Argument signatures
Some attributes get their value from the function
arguments in the operation, rather than the mnemonic.
arg kind    meaning
exuArg      exu-side belt position
immArg      small immediate constant
bitArg      bit number
offArg      load/store offset
Argument nicks are concatenated into signatures.
signature           arguments, in order
exuBitSig           belt position, bit number
baseOffWidthfSig    address base, offset, width
exuExuExuSig        three belt positions
Ops are uniquely identified by their mnemonic and signature.
Operation patterns
Operations are defined as patterns, not individually.
An operation pattern comprises:
• the core operation and its encoding block
• the argument list
• all meaningful values for all mnemonic attributes
Each pattern defines all the operations that result from
the cross-product of attribute values: the models.
opPattern(exuBlock, addOp) << floats << roundings << exuArg
<< exuArg;
This defines 12 models: six different rounding modes
for each of binary and decimal floating point.
There are around a thousand models.
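The cross-product expansion can be sketched as below. This is an illustration under stated assumptions: the `expand` helper is hypothetical, models are represented simply as mnemonic strings, and the attribute nicks ("b", "d", "r0" through "r5") are invented placeholders, since the slides do not list the floating-point or rounding nicks.

```cpp
#include <string>
#include <utility>
#include <vector>

// Expands a core operation into one model per combination of attribute
// values, appending one nick per attribute.
std::vector<std::string> expand(
        const std::string& core,
        const std::vector<std::vector<std::string>>& attributes) {
    std::vector<std::string> models{core};
    for (const auto& values : attributes) {
        std::vector<std::string> next;
        for (const auto& prefix : models)
            for (const auto& value : values)
                next.push_back(prefix + value);
        models = std::move(next);
    }
    return models;
}

// Two floating-point domains times six rounding modes.
std::vector<std::string> addFloatModels() {
    // Placeholder nicks (assumptions, not the real Mill nicks).
    const std::vector<std::string> floats = {"b", "d"};
    const std::vector<std::string> roundings =
        {"r0", "r1", "r2", "r3", "r4", "r5"};
    return expand("add", {floats, roundings});
}
```

`addFloatModels()` returns 12 entries, matching the count given for the pattern above.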
What attributes for our new operation?
What domain?
signedInt?
unsignedInt?
What about overflow?
ignore it?
mark result as an error?
saturate to maximal value?
produce a double-width result?
Where to encode it?
exuBlock?
What arguments?
exuArg, exuArg?
Demo: define a new operation
Add specification
– opSpecs.cc
Add sim implementation
build sim
Say how – or say what?
specification
Hardware development made easy.
Abstract Mill-ness
The Mill is a family of member CPUs sharing an abstract
operation set and micro-architecture.
abstract Mill
operation set
microarchitecture
Specifications make concrete from abstract
[Diagram: specifications turn the abstract Mill (operation set, microarchitecture) into concrete Mill chips: Monocore, Crimson, ...]
Why specification/configuration?
Creating a CPU by hand is fabulously expensive.
Much of CPU implementation is repetitive, error-prone,
tedious and wasteful.
Often the design winds up sub-optimal because it’s too
much trouble to change it yet again.
The Mill team knew it lacked the resources to implement
– and re-implement – a moving target from scratch.
So we got the software to do it.
Result:
• can address multiple markets efficiently
• fast pivots for new chips
• economy for company and customers
Concrete Mill chips
Each concrete chip is specified as a set of components,
including cores, caches, memory controllers, etc.
[Diagram: the “Crimson” chip, one of the concrete Mill chips (Monocore, Crimson, ...), contains a Copper core, a Silver core, caches, ...]
Concrete Mill cores
The component cores in turn specify still more nested
components.
[Diagram: the “Copper” core contains caches, specRegs, Belt, ALUs, decoders, ...]
Recursive specification
Big parts have little parts,
within each to excite ’em;
and little parts have smaller parts,
and so ad infinitum
Apologies to
Jonathan Swift
It’s components, all the way down!
Component parameters
Components have parameters to define their function.
belt:         size = 16
predictor:    exit table size = 2048, latency = 2, …
cache:        bank count = 4, line width = 64, evict policy = LRU, way count = 4, …
Components of the same kind but different parameter
values can be collected in palettes for reuse in
designing other Mills.
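A parameterized component and a palette might look like the sketch below. The cache parameter names and the first set of values mirror the slide; the `CacheParams` struct, the `CachePalette` map, the entry names, and the second entry's values are hypothetical and only illustrate the idea of reusable named parameter sets.

```cpp
#include <map>
#include <string>

// Only the policy named on the slide is listed here.
enum class EvictPolicy { LRU };

// Parameters that define one cache component.
struct CacheParams {
    int bankCount;
    int lineWidth;
    EvictPolicy evictPolicy;
    int wayCount;
};

// A palette: components of the same kind with different parameter
// values, kept under names so other designs can reuse them.
using CachePalette = std::map<std::string, CacheParams>;

CachePalette makeCachePalette() {
    CachePalette palette;
    // Values from the slide.
    palette["slideExample"] = CacheParams{4, 64, EvictPolicy::LRU, 4};
    // Invented values, for illustration only.
    palette["hypotheticalLarger"] = CacheParams{8, 64, EvictPolicy::LRU, 8};
    return palette;
}
```

A member specification could then pick `makeCachePalette().at("slideExample")` instead of restating each parameter.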
Behind components
Behind each component kind is hand-written software:
• An emulation function sits in the simulator.
It defines what the component does in the machine.
• A generator function sits in the generator.
It emits the Verilog starting point for hardware.
The emulation function is definitive; if the hardware
doesn’t match the simulator then the simulator is right.
Clock domains
The Mill sim is event-driven with picosecond resolution.
All components reside in a clock domain. By default
sub-components reside in the domain of their parent.
Xtal components create top-level clock domains.
PLL components link different domains. The ratio
registers are in MMIO space for program control.
A simulated Mill program can use simulated MMIO
to control the simulated hardware and change the
simulated clock rate that it itself is running under.
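The general idea of an event-driven simulation with several clock domains on a picosecond time base can be sketched as follows. This is not the Mill simulator: `ClockDomain`, `TickEvent`, and `runUntil` are hypothetical names, and Xtal components, PLL ratios, and MMIO control are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

using Picoseconds = std::uint64_t;

struct ClockDomain {
    Picoseconds period;   // time between ticks; must be nonzero
    std::uint64_t ticks;  // ticks delivered so far
};

struct TickEvent {
    Picoseconds time;     // when the tick happens
    std::size_t domain;   // index of the domain that ticks
};

// Orders the queue so that the earliest event is on top.
struct Later {
    bool operator()(const TickEvent& a, const TickEvent& b) const {
        return a.time > b.time;
    }
};

// Delivers every tick with a timestamp up to and including `until`.
void runUntil(std::vector<ClockDomain>& domains, Picoseconds until) {
    std::priority_queue<TickEvent, std::vector<TickEvent>, Later> pending;
    for (std::size_t i = 0; i < domains.size(); ++i)
        pending.push(TickEvent{domains[i].period, i});
    while (!pending.empty() && pending.top().time <= until) {
        TickEvent e = pending.top();
        pending.pop();
        ++domains[e.domain].ticks;
        // The period is re-read here, so a changed period would apply
        // from the next scheduled tick onward.
        pending.push(TickEvent{e.time + domains[e.domain].period, e.domain});
    }
}
```

With periods of 1000 ps and 2500 ps, running until 10000 ps delivers 10 ticks to the first domain and 4 to the second.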
Memory hierarchy
Components that derive from the memLevel type can be
hooked together to model the memory hierarchy.
The connections are streams of requests and
responses. Each component only deals with the stream.
It does not know or care what is on the other end.
The streams use predictive throttling for congestion
control, similar to network message methods.
Streams run at full speed, without handshaking delay.
Demo: try it out
run sim – ivan/build/testAsm.sim
Other roads…
There are other architectures that provide operation
specification. These differ significantly from the Mill.
                  other architectures                         the Mill
purpose:          add special-purpose embedded operations     form optimal subsets for family members
encoding:         reserved bit patterns, manually selected    automatically generated optimal-entropy
specification:    one-at-a-time manual process                pattern-based orthogonal generation
Summary:
The Mill:
Defines operations by composing attributes
Tool produces cross-product of attributes
Defines members by component lists
Recursive composition – mix and match
Compact notation expresses clock, memory
Says what connects to what, tool creates “how”.
Shameless plug
For technical info about the Mill CPU architecture:
MillComputing.com/docs
To sign up for future announcements, white papers etc.
MillComputing.com/mailing-list