ACT 7000SC
64-Bit Superscalar Microprocessor
Features
■ Full militarized QED RM7000 microprocessor
■ Dual Issue symmetric superscalar microprocessor with instruction prefetch optimized for system level price/performance
  ● 150, 200, 210, 225 MHz operating frequency (Consult Factory for latest speeds)
  ● MIPS IV Superset Instruction Set Architecture
■ High performance interface (RM52xx compatible)
  ● 600 MB per second peak throughput
  ● 75 MHz max. freq., multiplexed address/data
  ● Supports 1/2 clock multipliers (2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9)
  ● IEEE 1149.1 JTAG (TAP) boundary scan
■ Integrated primary and secondary caches - all are 4-way set associative with 32 byte line size
  ● 16KB instruction
  ● 16KB data: non-blocking and write-back or write-through
  ● 256KB on-chip secondary: unified, non-blocking, block writeback
■ MIPS IV instruction set
  ● Data PREFETCH instruction allows the processor to overlap cache miss latency and instruction execution
  ● Floating point combined multiply-add instruction increases performance in signal processing and graphics applications
  ● Conditional moves reduce branch frequency
  ● Index address modes (register + register)
■ Integrated memory management unit (ACT52xx compatible)
  ● Fully associative joint TLB (shared by I and D translations)
  ● 48 dual entries map 96 pages
  ● 4 entry DTLB and 4 entry ITLB
  ● Variable page size (4KB to 16MB in 4x increments)
■ Embedded application enhancements
  ● Specialized DSP integer Multiply-Accumulate instruction (MAD/MADU) and three-operand multiply instruction (MUL/U)
  ● Per line cache locking in primaries and secondary
  ● Bypass secondary cache option
  ● I&D Test/Break-point (Watch) registers for emulation & debug
  ● Performance counter for system and software tuning & debug
  ● Ten fully prioritized vectored interrupts - 6 external, 2 internal, 2 software
  ● Fast Hit-Writeback-Invalidate and Hit-Invalidate cache operations for efficient cache management
■ High-performance floating point unit - 600 M FLOPS maximum
  ● Single cycle repeat rate for common single-precision operations and some double-precision operations
  ● Single cycle repeat rate for single-precision combined multiply-add operations
  ● Two cycle repeat rate for double-precision multiply and double-precision combined multiply-add operations
■ Fully static CMOS design with dynamic power down logic
  ● Standby reduced power mode with WAIT instruction
  ● 4 watts typical @ 2.5V Int., 3.3V I/O, 200MHz
■ Embedded supply de-coupling capacitors and additional PLL filter components
■ 208-lead CQFP, cavity-up package (F17)
■ 208-lead CQFP, inverted footprint (F24), with the same pin rotation as the commercial QED RM5261
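As a quick check of the memory management figures in the feature list, the following sketch enumerates the page sizes implied by "4KB to 16MB in 4x increments" and the page count implied by 48 dual entries. The function and constant names are illustrative only and do not come from the datasheet.

```python
# Illustrative arithmetic only; names are hypothetical, values come from the feature list.
KB = 1024
MB = 1024 * KB

def page_sizes(smallest=4 * KB, largest=16 * MB, step=4):
    """Return the page sizes from smallest to largest, multiplying by step each time."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= step
    return sizes

JTLB_DUAL_ENTRIES = 48                 # each dual entry maps two pages
PAGES_MAPPED = JTLB_DUAL_ENTRIES * 2   # "48 dual entries map 96 pages"

sizes = page_sizes()
```

Under these assumptions the list runs 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, which gives seven selectable page sizes.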
BLOCK DIAGRAM
[Block diagram not reproduced. Labeled blocks: On-Chip 256K Byte Secondary Cache, 4-Way Set Associative, with Secondary Tags for Sets A, B, C, and D; Primary Data Cache, 4-Way Set Associative, with DTag and DTLB; Primary Instruction Cache, 4-Way Set Associative, with ITag and ITLB; Store, Write, Read, Pad, Address, and Prefetch Buffers; Instruction Dispatch Unit with F Pipe Register and M Pipe Register; Floating-Point Load/Align, Floating-Point Register File, Packer/Unpacker, Comparator, Floating-Point MultAdd/Add/Sub/Cvt/Div/Sqrt, Multiplier Array, and Floating-Point Control; Joint TLB, Coprocessor 0, and System/Memory Control; Program Counter, PC Incrementer, Branch PC Adder, ITLB Virtuals, and DTLB Virtuals; Integer Register File, Load Aligner, M Pipe Adder, store align/shifter, and Logicals, F Pipe Adder, Shifter, and Logicals; Int Mult. Div. Madd; Integer Control; PLL/Clocks. Labeled buses: A/D Bus, Pad Bus, F-Pipe Bus, M-Pipe Bus, D Bus, FA Bus, DVA, IVA.]
Aeroflex Circuit Technology – MIPS RISC Microprocessors © SCD7000SC REV B 7/30/01
DESCRIPTION
The ACT 7000SC is a highly integrated symmetric
superscalar microprocessor capable of issuing two
instructions each processor cycle. It has two high
performance 64-bit integer units as well as a high
throughput, fully pipelined 64-bit floating point unit. To
keep its multiple execution units running efficiently,
the ACT 7000SC integrates not only 16KB 4-way set
associative instruction and data caches but backs
them up with an integrated 256KB 4-way set
associative secondary as well. For maximum
efficiency, the data and secondary caches are
writeback and nonblocking. An RM52XX family
compatible, operating system friendly memory
management unit with a 64/48-entry fully associative
TLB and a high-performance 64-bit system interface
supporting hardware prioritized and vectored
interrupts round out the main features of the
processor.
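The interface figures quoted in the feature list can be cross-checked with simple arithmetic. The sketch below assumes that peak throughput is the bus clock times the 64-bit bus width (with MB meaning 10^6 bytes) and that the processor clock equals the bus clock times the selected multiplier. Both relationships are assumptions made for illustration, and the function names are hypothetical.

```python
# Values are taken from the feature list; the relationships are assumptions.
BUS_WIDTH_BYTES = 64 // 8      # 64-bit system interface
MAX_BUS_MHZ = 75               # "75 MHz max. freq."
MULTIPLIERS = [2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9]

def peak_throughput_mb_s(bus_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Peak transfer rate in MB/s (10^6 bytes), one bus-width transfer per bus clock."""
    return bus_mhz * width_bytes

def bus_options(core_mhz, max_bus_mhz=MAX_BUS_MHZ):
    """Return (multiplier, bus MHz) pairs whose bus clock stays within the limit."""
    return [(m, core_mhz / m) for m in MULTIPLIERS if core_mhz / m <= max_bus_mhz]
```

With these assumptions, a 75 MHz bus gives 600 MB per second, and a 225 MHz part needs a multiplier of at least 3 to keep the bus at or below 75 MHz.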
The ACT 7000SC is ideally suited for high-end
embedded control applications such as
internetworking, high performance image
manipulation, high speed printing, and 3-D
visualization.
CPU Registers
Like all MIPS ISA processors, the ACT 7000SC
CPU has a simple, clean user-visible state consisting
of 32 general purpose registers, or GPRs, two special
purpose registers for integer multiplication and
division, and a program counter; there are no
condition code bits. Figure 1 shows the user-visible
state.
Superscalar Dispatch
The ACT 7000SC has an efficient symmetric
superscalar dispatch unit which allows it to issue up to
two instructions per cycle. For purposes of instruction
issue, the ACT 7000SC defines four classes of
instructions: integer, load/store, branches, and
floating-point. There are two logical pipelines, the
function, or F, pipeline and the memory, or M,
pipeline. Note, however, that the M pipe can execute
integer as well as memory type instructions.
Table 1 – Instruction Issue Rules
F Pipe                              M Pipe
one of:                             one of:
integer, branch, floating-point,    integer, load/store
integer mul, div
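The issue rules in Table 1 can be expressed as a small lookup. The sketch below is illustrative only: the dictionary, the class names used as keys, and the function name are hypothetical, and the check covers pipe assignment alone. It does not model data dependencies or resource conflicts, which can also prevent dual issue.

```python
# Pipes each instruction class may use, following Table 1 (names are illustrative).
ALLOWED_PIPES = {
    "integer": {"F", "M"},
    "branch": {"F"},
    "floating-point": {"F"},
    "integer mul/div": {"F"},
    "load/store": {"M"},
}

def can_dual_issue(first, second):
    """True if one instruction can take the F pipe while the other takes the M pipe."""
    a = ALLOWED_PIPES[first]
    b = ALLOWED_PIPES[second]
    return ("F" in a and "M" in b) or ("M" in a and "F" in b)
```

For example, a floating-point instruction can pair with a load/store, and two integer instructions can pair because either pipe accepts them, while two load/store instructions cannot issue together.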
HARDWARE OVERVIEW
The ACT 7000SC offers a high level of integration
targeted at high-performance embedded
applications. The key elements of the ACT 7000SC
are briefly described below.
Figure 2 is a simplification of the pipeline section
and illustrates the basics of the instruction issue
mechanism.
[Figure 1: General Purpose Registers r0 (shown as 0) through r31, the Multiply/Divide Registers HI and LO, and the Program Counter (PC). Each register is 64 bits wide, bits 63 to 0.]
Figure 1 – CPU Registers
Aeroflex Circuit Technology
SCD7000SC REV B 7/30/01 Plainview NY (516) 694-6700
[Figure 2: the Instruction Cache feeds the Dispatch Unit, which drives the F Pipe IBus and the M Pipe IBus. Labeled execution blocks: FP F Pipe, FP M Pipe, Integer F Pipe, Integer M Pipe.]
Figure 2 – Instruction Issue Paradigm
The figure illustrates that one F pipe instruction and
one M pipe instruction can be issued concurrently but
that two M pipe or two F pipe instructions cannot be
issued. Table 2 specifies more completely the
instructions within each class.
Table 2 – Dual Issue Instruction Classes
integer             load/store       floating-point       branch
add, sub, or, xor,  lw, sw, ld, sd,  fadd, fsub, fmult,   beq, bne,
shift, etc.         ldc1, sdc1,      fmadd, fdiv, fcmp,   bCzT, bCzF, j,
                    mov, movc,       fsqrt, etc.          etc.
                    fmov, etc.
The symmetric superscalar capability of the ACT
7000SC, in combination with its low latency integer
execution units and high-throughput fully pipelined
floating-point execution unit, provides unparalleled
price/performance in computationally intensive
embedded applications.
Pipeline
The logical length of both the F and M pipelines is
five stages with state committing in the register write,
or W, pipe stage. The physical length of the
floating-point execution pipeline is actually seven
stages but this is completely transparent to the user.
Figure 3 shows instruction execution within the
ACT 7000SC when instructions are issuing
simultaneously down both pipelines. As illustrated in
the figure, up to ten instructions can be executing
simultaneously. This figure presents a somewhat
simplistic view of the processor's operation, however,
since the out-of-order completion of loads, stores, and
long latency floating-point operations can result in
there being even more instructions in process than
what is shown.
I0  1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I1  1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I2        1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I3        1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I4              1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I5              1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I6                    1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I7                    1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I8                          1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I9                          1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
(one cycle spans two phases, for example 1I and 2I)
1I-1R:  Instruction cache access
2I:     Instruction virtual to physical address translation
2R:     Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A:     Issue or slip decision, Branch decision
1A:     Data virtual address calculation
1A-2A:  Integer add, logical, shift
2A:     Store Align
2A-2D:  Data cache access and load align
1D:     Data virtual to physical address translation
2W:     Register file write
Figure 3 – Pipeline
Note that instruction dependencies, resource
conflicts, and branches result in some of the
instruction slots being occupied by NOPs.
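The figure of ten simultaneous instructions follows from five logical stages in each of two pipelines. The sketch below counts the instructions in flight under an idealized assumption that a pair issues every cycle with no slips, NOPs, or long latency operations. The stage letters follow the Figure 3 labels, and the function name is hypothetical.

```python
# Idealized occupancy model; it ignores slips, NOP slots, and out-of-order completion.
STAGES = ["I", "R", "A", "D", "W"]   # five logical stages per pipeline
PIPES = 2                            # F pipe and M pipe

def in_flight(cycle, pairs_issued):
    """Count instructions occupying a stage at `cycle` when one pair issues
    on each cycle from 0 through pairs_issued - 1."""
    count = 0
    for issue_cycle in range(pairs_issued):
        if issue_cycle <= cycle < issue_cycle + len(STAGES):
            count += PIPES
    return count
```

With five pairs issued on consecutive cycles, the count peaks at ten on the fifth cycle and then drains by two instructions per cycle.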
Table 3 – ALU Operations
Unit      F Pipe                            M Pipe
Adder     add, sub                          add, sub, data address add
Logic     logic, moves, zero shifts (nop)   logic, moves, zero shifts (nop)
Shifter   non zero shift                    non zero shift, store align
Integer Unit
Like the ACT 52xx family, the ACT 7000SC
implements the MIPS IV Instruction Set Architecture,
and is therefore fully upward compatible with
applications that run on processors such as the
R4650 and R4700 that implement the earlier
generation MIPS III Instruction Set Architecture.
Additionally, the ACT 7000SC includes two
implementation-specific instructions that are not found
in the baseline MIPS IV ISA but are useful in the
embedded marketplace. Described in detail in a later
section of this datasheet, these instructions are
integer multiply-accumulate and three-operand
integer multiply.
The ACT 7000SC integer unit includes thirty-two
general purpose 64-bit registers, the HI/LO result
registers for the two-operand integer
multiply/divide operations, and the program counter,
or PC. There are two separate execution units, one of
which can execute function, or F, type instructions
and one which can execute memory, or M, type
instructions. See above for a description of the
instruction types and the issue rules. As a special
case, integer multiply/divide instructions as well as
their corresponding MFHI and MFLO instructions can
only be executed in the F type execution unit. Within
each execution unit the operational characteristics
are the same as on previous QED designs with single
cycle ALU operations (add, sub, logical, shift), one
cycle load delay, and an autonomous multiply/divide
unit.
Register File
The ACT 7000SC has thirty-two general purpose
registers, with register location 0 (r0) hardwired to a
zero value. These registers are used for scalar integer
operations and address calculation. In order to
service the two integer execution units, the register
file has four read ports and two write ports and is fully
bypassed both within and between the two execution
units to minimize operation latency in the pipeline.
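A minimal software model of the register file behavior described above is sketched here. The class and method names are hypothetical, and the model assumes that a write to r0 is simply discarded. It does not represent the four read ports, two write ports, or bypass paths.

```python
MASK64 = (1 << 64) - 1

class IntegerRegisterFile:
    """Thirty-two 64-bit general purpose registers with r0 fixed at zero (sketch)."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        if index == 0:
            return                          # assumption: writes to r0 have no effect
        self._regs[index] = value & MASK64  # keep only the low 64 bits

regs = IntegerRegisterFile()
regs.write(0, 5)
regs.write(1, (1 << 64) | 7)
```

After these two writes, r0 still reads as zero and r1 holds 7, since the bit above position 63 is dropped.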
Integer Multiply/Divide
The ACT 7000SC has a single dedicated integer
multiply/divide unit optimized for high-speed multiply
and multiply-accumulate operations. The
multiply/divide unit resides in the F type execution
unit. Table 4 shows the performance of the
multiply/divide unit on each operation.
Table 4 – Integer Multiply / Divide Operations
Opcode          Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U   16 bit         4         3             0
                32 bit         5         4             0
MUL             16 bit         4         3             2
                32 bit         5         4             3
DMULT, DMULTU   any            9         8             0
DIV, DIVU       any            36        36            0
DDIV, DDIVU     any            68        68            0
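One common reading of Table 4, assumed here, is that latency is the number of cycles until a result is available and repeat rate is the minimum spacing between successive operations of the same type. Under that assumption, the time for a stream of independent operations can be estimated as shown below. The dictionary keys and the function name are illustrative, and the Stall Cycles column is not modeled.

```python
# (latency, repeat rate) in processor cycles, copied from Table 4.
MULDIV_TIMING = {
    ("MULT", 16): (4, 3),
    ("MULT", 32): (5, 4),
    ("DMULT", "any"): (9, 8),
    ("DIV", "any"): (36, 36),
    ("DDIV", "any"): (68, 68),
}

def cycles_for_stream(op, size, count):
    """Estimate cycles for `count` independent back-to-back operations."""
    if count == 0:
        return 0
    latency, repeat = MULDIV_TIMING[(op, size)]
    # The first result takes the full latency; each later one adds one repeat interval.
    return latency + (count - 1) * repeat
```

For instance, ten 16-bit multiplies come to 4 + 9 x 3 = 31 cycles under this model, while two 32-bit divides come to 72 cycles because the divider is not pipelined.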
ALU
The ACT 7000SC has two complete integer ALUs,
each consisting of an integer adder/subtractor, a logic
unit, and a shifter. Table 3 shows the functions
performed by the ALUs for each execution unit. Each
of these units is optimized to perform all operations in
a single processor cycle.
The baseline MIPS IV ISA specifies that the results
of a multiply or divide operation be placed in the Hi
and Lo registers. These values can then be
transferred to the general purpose register file using
the Move-from-Hi and Move-from-Lo (MFHI/MFLO)
instructions.
In addition to the baseline MIPS IV integer multiply
instructions, the ACT 7000SC also implements the
3-operand multiply instruction, MUL. This instruction
specifies that the multiply result go directly to the
integer register file rather than the Lo register. The
portion of the multiply that would have normally gone
into the Hi register is discarded. For applications
where it is known that the upper half of the multiply
result is not required, using the MUL instruction
eliminates the necessity of executing an explicit
MFLO instruction.
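The difference between the baseline multiply and the three-operand MUL can be illustrated with a small behavioral sketch. It assumes signed 32-bit operands and returns plain 32-bit halves; it does not model how results are extended into the 64-bit registers. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mult(rs, rt):
    """Baseline multiply: return the (hi, lo) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return (product >> 32) & MASK32, product & MASK32

def mul(rs, rt):
    """Three-operand multiply: keep the low half, discard what would go to Hi."""
    _hi, lo = mult(rs, rt)
    return lo
```

In this model, `mul(3, 4)` yields 12 directly, where the baseline sequence would need `mult` followed by a separate move from Lo. A product such as 0x10000 x 0x10000 shows the cost: its upper half is lost with `mul`.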
Also included in the ACT 7000SC are the
multiply-add instructions MAD/MADU. These
instructions multiply two operands and add the
resulting product to the current contents of the Hi and
Lo registers. The multiply-accumulate operation is the
core primitive of almost all signal processing
algorithms, allowing the ACT 7000SC to eliminate the
need for a separate DSP engine in many embedded
applications.
By pipelining the multiply-accumulate function and
dynamically determining the size of the input
operands, the ACT 7000SC is able to maximize
throughput while still using an area efficient
implementation.
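As an illustration of why multiply-accumulate suits signal processing, the sketch below models a signed MAD as adding a 32-bit by 32-bit product into a 64-bit Hi:Lo pair and uses it for a dot product. This is a behavioral assumption for illustration only; the function names are hypothetical and no timing is modeled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mad(hi, lo, rs, rt):
    """Add the signed product rs * rt to the 64-bit Hi:Lo pair; return new (hi, lo)."""
    acc = (hi << 32) | lo
    acc = (acc + to_signed(rs, 32) * to_signed(rt, 32)) & MASK64
    return acc >> 32, acc & MASK32

def dot_product(xs, ys):
    """Accumulate x * y over both sequences, one MAD per element pair."""
    hi, lo = 0, 0
    for x, y in zip(xs, ys):
        hi, lo = mad(hi, lo, x, y)
    return to_signed((hi << 32) | lo, 64)
```

Each loop iteration corresponds to a single multiply-accumulate, so an N-element dot product needs N such operations plus one final read of Hi and Lo.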
Table 5 – Floating Point Latencies and Repeat Rates
Operation     Latency (single/double)   Repeat Rate (single/double)
fadd          4                         1
fsub          4                         1
fmult         4/5                       1/2
fmadd         4/5                       1/2
fmsub         4/5                       1/2
fdiv          21/36                     19/34
fsqrt         21/36                     19/34
frecip        21/36                     19/34
frsqrt        38/68                     36/66
fcvt.s.d      4                         1
fcvt.s.w      6                         3
fcvt.s.l      6                         3
fcvt.d.s      4                         1
fcvt.d.w      4                         1
fcvt.d.l      4                         1
fcvt.w.s      4                         1
fcvt.w.d      4                         1
fcvt.l.s      4                         1
fcvt.l.d      4                         1
fcmp          1                         1
fmov, fmovc   1                         1
fabs, fneg    1                         1
Floating-Point Coprocessor
The ACT 7000SC incorporates a high-performance
fully pipelined floating-point coprocessor which
includes a floating-point register file and autonomous
execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a
tightly coupled co-execution unit, decoding and
executing instructions in parallel with, and in the case
of floating-point loads and stores, in cooperation with
the M pipe of the integer unit. As described earlier, the
superscalar capabilities of the ACT 7000SC allow
floating-point computation instructions to issue
concurrently with integer instructions.
Floating-Point Unit
The ACT 7000SC floating-point execution unit
supports single and double precision arithmetic, as
specified in the IEEE Standard 754. The execution
unit is broken into a separate divide/square root unit
and a pipelined multiply/add unit. Overlap of
divide/square root and multiply/add is supported.
The ACT 7000SC maintains fully precise
floating-point exceptions while allowing both
overlapped and pipelined operations. Precise
exceptions are extremely important in object-oriented
programming environments and highly desirable for
debugging in any environment.
The floating-point unit’s operation set includes
floating-point add, subtract, multiply, multiply-add,
divide, square root, reciprocal, reciprocal square root,
conditional moves, conversion between fixed-point
and floating-point format, conversion between
floating-point formats, and floating-point compare.
Table 5 gives the latencies of the floating-point
instructions in internal processor cycles.
To support superscalar operations, the FGR has
four read ports and two write ports, and is fully
bypassed to minimize operation latency in the
pipeline. Three of the read ports and one write port
are used to support the combined multiply-add
instruction while the fourth read and second write ports
allow a concurrent floating-point load or store and
conditional moves.
Floating-Point General Register File
The floating-point general register file, FGR, is
made up of thirty-two 64-bit registers. With the
floating-point load and store double instructions,
LDC1 and SDC1, the floating-point unit can take
advantage of the 64-bit wide data cache and issue a
floating-point coprocessor load or store double-word
instruction in every cycle.
The floating-point control register file contains two
registers; one for determining configuration and
revision information for the coprocessor and one for
control and status information. These registers are
primarily used for diagnostic software, exception
handling, state saving and restoring, and control of
rounding modes.
System Control Coprocessor (CP0)
The system control coprocessor (CP0) in the MIPS
architecture is responsible for the virtual memory
sub-system, the exception control system, and the
diagnostics capability of the processor. In the MIPS
architecture, the system control coprocessor (and
thus the kernel software) is implementation
dependent. For memory management, the ACT
7000SC CP0 is logically identical to that of the
RM5200 Family and R5000. For interrupt exceptions
and diagnostics, the ACT 7000SC is a superset of the
RM5200 Family and R5000 implementing additional
features described later in the sections on Interrupts,
the Test/Breakpoint facility, and the Performance
Counter facility.
The memory management unit controls the virtual
memory system page mapping. It consists of an
instruction address translation buffer, or ITLB, a data