```Lecture 11

Circuits for floating-point operations
multiplication
division (only sketchy)
Oct 12

Consider a 4-digit decimal example


1. Align decimal points



9.999 × 101 + 0.016 × 101 = 10.015 × 101
3. Normalize result & check for over/underflow


Shift number with smaller exponent
9.999 × 101 + 0.016 × 101


9.999 × 101 + 1.610 × 10–1
1.0015 × 102
4. Round and renormalize if necessary

1.002 × 102

Now consider a 4-digit binary example


1. Align binary points



1.0002 × 2–1 + –0.1112 × 2–1 = 0.0012 × 2–1
3. Normalize result & check for over/underflow


Shift number with smaller exponent
1.0002 × 2–1 + –0.1112 × 2–1


1.0002 × 2–1 + –1.1102 × 2–2 (0.5 + – 0.4375)
1.0002 × 2–4, with no over/underflow
4. Round and renormalize if necessary

1.0002 × 2–4 (no change) = 0.0625


Much more complex than integer adder
Doing it in one clock cycle would take too
long



Much longer than integer operations
Slower clock would penalize all instructions
FP adder usually takes several cycles

Can be pipelined
Start

1. Compare the exponents of the two numbers.
Shift the smaller number to the right until its
exponent would match the larger exponent
3. Normalize the sum, either shifting right and
incrementing the exponent or shifting left
and decrementing the exponent
Overflow or
underflow?
Yes
No
4. Round the significand to the appropriate
number of bits
No
Still normalized?
Yes
Done
Exception
Step 1
Step 2
Step 3
Step 4




Exponents of both operands must be equal before adding or
subtracting significands
Significands aligned by shifting the significand of the smaller
operand |E1-E2| base- positions to the right, increasing its
exponent, until exponents are equal
E1E2 -
Exponent of larger number not decreased - this will result in a
significand larger than 1 - a larger significand adder required




Addition - resultant significand M (sum of two
aligned significands) is in range 1/  M < 2
If M >1 - a postnormalization step - shifting
significand to the right to yield M3 and increasing
exponent by one - is required (an exponent
overflow may occur)
Subtraction - Resultant significand M is in range 0 
|M|< 1 - postnormalization step - shifting significand
to left and decreasing exponent - is required if
M<1/ (an exponent underflow may occur)
In extreme cases, the postnormalization step may
require a shift left operation over all bits in
significand, yielding a zero result
Example




F1=(0.100000)16  163 ; F2=(0.FFFFFF)16  162
Short IBM format ; calculate F1 – F2
Significand of smaller number (F2) is shifted to the
right - least-significant digit lost
Shift is time consuming - result is wrong
Example - Continued




Correct result (with “unlimited" number of
significand digits)
Error (also called loss of significance) is
0.1  16-2 - 0.1 
16-3 = 0.F  16-3
Solution to problem - guard digits - additional
digits to the right of the significand to hold shiftedout digits
In example - a single (hexadecimal) guard digit is
sufficient
Floating-Point Numbers





Step 1: Calculate difference d of the two exponents
- d=|E1 - E2|
Step 2: Shift significand of smaller number by d base positions to the right
Step 3: Add aligned significands and set exponent
of result to exponent of larger operand
Step 4: Normalize resultant significand and adjust
exponent if necessary
Step 5: Round resultant significand and adjust
exponent if necessary
Floating Point Complexities

Operations are somewhat more complicated

In addition to overflow we can have “underflow”

Accuracy can be a big problem



IEEE 754 keeps two extra bits, guard and round

four rounding modes

positive divided by zero yields “infinity”

zero divide by zero yields “not a number”

other complexities
Implementing the standard can be tricky
Not using the standard can be even worse
 see text for description of 80x86 and Pentium bug!
FP Arithmetic Hardware

FP multiplier is of similar complexity to FP


FP arithmetic hardware usually does



But uses a multiplier for significands instead
reciprocal, square-root
FP  integer conversion
Operations usually takes several cycles

Can be pipelined
Floating-Point Dividers
( s1  b e1) / ( s2  b e2) = ( s1 / s2 )  b e1-e2
Floating-point operands
s1 / s2  (0.5, 2): may need postshifting
Unpack
Overflow or underflow can occur
during division or normalization
Rounding considerations
XOR
Subt ract
Exponents
Divide
Signifi cands
Exponent
Quotient must be produced with two
extra bits (G and R), in case of the
need for a normalizing left shift
Normalize
Round
Exponent
Normalize
P ack
Quotient
Block diagram of a floating-point divider
round and sticky bits
Original source: CMU 18-447 S’09 L4-21© 2009 J. C. Hoe
Original source: CMU 18-447 S’09 L4-21© 2009 J. C. Hoe
FP Instructions in MIPS
FP hardware is coprocessor 1

Adjunct processor that extends the ISA
Separate FP registers


32 single-precision: \$f0, \$f1, … \$f31
Paired for double-precision: \$f0/\$f1, \$f2/\$f3, …
FP instructions operate only on FP registers


Programs generally don’t do integer ops on FP data, or vice
versa
More registers with minimal code-size impact

lwc1, ldc1, swc1, sdc1
 e.g., ldc1 \$f8, 32(\$sp)
FP Instructions in MIPS

Single-precision arithmetic



Double-precision arithmetic



e.g., mul.d \$f4, \$f4, \$f6
Single- and double-precision comparison


c.xx.s, c.xx.d (xx is eq, lt, le, …)
Sets or clears FP condition-code bit


e.g. c.lt.s \$f3, \$f4
Branch on FP condition code true or false

bc1t, bc1f

e.g., bc1t TargetLabel
Floating-Point Instructions
Floating-point arithmetic instructions for MIPS:
sub.d
mul.d
div.s
neg.s
31
F
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8
op
25
ex
#
#
#
#
#
20
set
set
set
set
set
ft
\$f0
\$f0
\$f0
\$f0
\$f0
15
to
to
to
to
to
fs
(\$f8) +fp
(\$f8) –fp
(\$f8) fp
(\$f8) /fp
–(\$f8)
10
fd
(\$f10)
(\$f10)
(\$f10)
(\$f10)
fn
5
0
0 1 0 0 0 1 0 0 0 0 x 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 x x x
Floating-point
instruction
s=0
d=1
Source
register 2
Source
register 1
Destination
register
sub.* = 1
mul.* = 2
div.* = 3
neg.* = 7
The common floating-point instruction format for MIPS and
components for arithmetic instructions. The extension (ex) field
distinguishes single (* = s) from double (* = d) operands.
Floating-Point Format Conversions
Floating-Point Data Transfers
MIPS instructions for floating-point load, store, and move:
lwc1
swc1
mov.s
mov.d
mfc1
mtc1
31
F
\$f8,40(\$s3)
\$f8,A(\$s3)
\$f0,\$f8
\$f0,\$f8
\$t0,\$f12
\$f8,\$t4
op
25
20
ft
store (\$f8) into mem[A+(\$s3)]
15
fs
10
fd
5
fn
0
0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
31
R
ex
#
#
#
#
#
#
op
s=0
d=1
25
rs
Unused
20
rt
Source
register
15
rd
Destination
register
10
sh
mov.* = 6
5
fn
0
0 1 0 0 0 1 0 0 x 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Floating-point
instruction
mfc1 = 0
mtc1 = 4
Source
register
Destination
register
Unused
Unused
Instructions for floating-point data movement in MIPS.
Floating-Point Branches and Comparisons
MIPS instructions for floating-point load, store, and move:
bc1t
bc1f
c.eq.*
c.lt.*
c.le.*
31
I
L
L
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
op
25
20
branch on fp flag true
branch on fp flag false
if (\$f0)=(\$f8), set flag to “true”
if (\$f0)<(\$f8), set flag to “true”
if (\$f0)(\$f8), set flag to “true”
rt
operand / offset
15
0
0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
Floating-point
instruction
31
F
rs
#
#
#
#
#
op
bc1? = 8
25
ex
true = 1
false = 0
20
ft
Offset
15
fs
10
fd
5
fn
0
0 1 0 0 0 1 0 0 0 0 x 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
s=0
d=1
Source
register 2
Source
register 1
Unused
c.eq.* = 50
c.lt.* = 60
c.le.* = 62
Floating-point branch and comparison instructions in MIPS.
Floating-Point
Instructions
MIPS
Copy
Arithmetic
*
s/d for single/double
# 0/1 for single/double
Conversions
Memory access
Control transfer
Instruction
Usage
Move s/d registers
Move fm coprocessor 1
Move to coprocessor 1
Subtract single/double
Multiply single/double
Divide single/double
Negate single/double
Compare equal s/d
Compare less s/d
Compare less or eq s/d
Convert integer to single
Convert
integer
to
double
Convert single to double
Convert double to single
Convert single to integer
Convert
double
to
integer
Store word coprocessor 1
Branch coproc 1 true
Branch coproc 1 false
mov.* fd,fs
mfc1 rt,rd
mtc1 rd,rt
sub.* fd,fs,ft
mul.* fd,fs,ft
div.* fd,fs,ft
neg.* fd,fs
c.eq.* fs,ft
c.lt.* fs,ft
c.le.* fs,ft
cvt.s.w fd,fs
cvt.d.w fd,fs
cvt.d.s fd,fs
cvt.s.d fd,fs
cvt.w.s fd,fs
cvt.w.d fd,fs
lwc1 ft,imm(rs)
swc1 ft,imm(rs)
bc1t L
bc1f L
ex fn
#
0
4
#
#
#
#
#
#
#
#
0
0
1
1
0
1
rs
rs
8
8
6
0
1
2
3
7
50
60
62
32
33
33
32
36
36
FP Example: °F to °C

C code:
float f2c (float fahr) {
return ((5.0/9.0)*(fahr - 32.0));
}
fahr in \$f12, result in \$f0, constants in addresses
specified.

Compiled MIPS code:
f2c: lwc1
lwc2
div.s
lwc1
sub.s
mul.s
jr
\$f16,
\$f18,
\$f16,
\$f18,
\$f18,
\$f0,
\$ra
const5
const9
\$f16, \$f18
const32
\$f12, \$f18
\$f16, \$f18
FP Example: Array Multiplication

X=X+Y×Z


All 32 × 32 matrices, 64-bit double-precision elements
C code:
void mm (double x[][],
double y[][], double z[][]) {
int i, j, k;
for (i = 0; i! = 32; i= i + 1)
for (j = 0; j! = 32; j = j + 1)
for (k = 0; k! = 32; k = k + 1)
x[i][j] = x[i][j]
+ y[i][k] * z[k][j];
}

Addresses of x, y, z in \$a0, \$a1, \$a2, and
i, j, k in \$s0, \$s1, \$s2
FP Example: Array Multiplication

MIPS code:
li
li
L1: li
L2: li
sll
sll
l.d
L3: sll
sll
l.d
…
\$t1, 32
\$s0, 0
\$s1, 0
\$s2, 0
\$t2, \$s0, 5
\$t2, \$t2, \$s1
\$t2, \$t2, 3
\$t2, \$a0, \$t2
\$f4, 0(\$t2)
\$t0, \$s2, 5
\$t0, \$t0, \$s1
\$t0, \$t0, 3
\$t0, \$a2, \$t0
\$f16, 0(\$t0)
#
#
#
#
#
#
#
#
#
#
#
#
#
#
\$t1 = 32 (row size/loop end)
i = 0; initialize 1st for loop
j = 0; restart 2nd for loop
k = 0; restart 3rd for loop
\$t2 = i * 32 (size of row of x)
\$t2 = i * size(row) + j
\$t2 = byte offset of [i][j]
\$t2 = byte address of x[i][j]
\$f4 = 8 bytes of x[i][j]
\$t0 = k * 32 (size of row of z)
\$t0 = k * size(row) + j
\$t0 = byte offset of [k][j]
\$t0 = byte address of z[k][j]
\$f16 = 8 bytes of z[k][j]
FP Example: Array Multiplication
…
sll \$t0, \$s0, 5
sll
\$t0, \$t0, 3
l.d
\$f18, 0(\$t0)
mul.d \$f16, \$f18, \$f16
bne
\$s2, \$t1, L3
s.d
\$f4, 0(\$t2)
bne
\$s1, \$t1, L2
bne
\$s0, \$t1, L1
#
#
#
#
#
#
#
#
#
#
#
#
#
#
\$t0 = i*32 (size of row of y)
\$t0 = i*size(row) + k
\$t0 = byte offset of [i][k]
\$t0 = byte address of y[i][k]
\$f18 = 8 bytes of y[i][k]
\$f16 = y[i][k] * z[k][j]
f4=x[i][j] + y[i][k]*z[k][j]
\$k k + 1
if (k != 32) go to L3
x[i][j] = \$f4
\$j = j + 1
if (j != 32) go to L2
\$i = i + 1
if (i != 32) go to L1
A simple example – with MIPS
implementation
Example: Write a MIPS program to load the
smallest positive real number into the coprocessor, multiply it by 10 and display the
result.
involves using the instruction
l.s \$fxx num
# num is a memory location
Code in MIPS
.text
.globl
main
main:
l.s
\$f6, small
l.s \$f8, ten
mul.s \$f12, \$f6, \$f8
exit:
li
\$v0, 4
la
\$a0, msg2
syscall
# print message
li
\$v0,2
syscall
# print sum
li
\$v0,4
la
\$a0, cr
syscall
li
\$v0,10
syscall
# print an end of line
.data
small: .word 0x00000001
ten:
.float 10.0
msg2:
cr:
# exit
# this is number 2^(-140)
.asciiz "Ten times 2^(-140) =
.asciiz "\n"
"
MARS output
Example of FP underflow
In the previous example, suppose we multiply
the result by 0.5 what will be the result?
We can try this in MARS.
This is the correct answer, since the operation
has caused FP underflow.

Parallel programs may interleave operations in
unexpected orders
 Assumptions of associativity may fail
(x+y)+z
x+(y+z)
-1.50E+38
x -1.50E+38
y 1.50E+38 0.00E+00
z
1.0
1.0 1.50E+38
1.00E+00 0.00E+00

Need to validate parallel programs under
varying degrees of parallelism
§3.6 Parallelism and Computer Arithmetic: Associativity
Associativity
Concluding Remarks

ISAs support arithmetic



Bounded range and precision


Signed and unsigned integers
Floating-point approximation to reals
Operations can overflow and underflow
MIPS ISA

Core instructions: 54 most frequently used


100% of SPECINT, 97% of SPECFP
Other instructions: less frequent
```