Features
• 3000 Dhrystone 2.1 MIPS at 1.3 GHz
• Selectable Bus Clock (30 CPU Bus Dividers up to 28x)
• Selectable MPx/60x Interface Voltage (1.8V, 2.5V)
• PD Typically 18W at 1.33 GHz at VDD = 1.3V; 8.0W at 1 GHz at VDD = 1.1V, Full Operating Conditions
• Nap, Doze and Sleep Power Saving Modes
• Superscalar (Four Instructions Fetched Per Clock Cycle)
• 4 GB Direct Addressing Range
• Virtual Memory: 4 Petabytes (2^52)
• 64-bit Data and 36-bit Address Bus Interface
• Integrated L1: 32 KB Instruction and 32 KB Data Cache
• Integrated L2: 512 KB
• 11 Independent Execution Units and Three Register Files
• Write-back and Write-through Operations
• fINT Max = 1167 MHz
• fBUS Max = 133 MHz/166 MHz
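The address-range figures in the list above follow directly from the address widths. The sketch below reproduces that arithmetic; the helper name format_pow2_bytes and the use of binary prefixes (1 KB = 2^10 bytes) are assumptions made for illustration, not part of the datasheet.

```python
# Address-space sizes implied by the address widths quoted in the feature list.
# Binary prefixes (1 KB = 2**10 bytes) are an assumption of this sketch.
PREFIXES = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def format_pow2_bytes(bits: int) -> str:
    """Return 2**bits bytes as a small value with a binary prefix."""
    prefix_index, remainder = divmod(bits, 10)
    if prefix_index >= len(PREFIXES):
        raise ValueError("address width too large for this prefix table")
    return f"{2 ** remainder} {PREFIXES[prefix_index]}"


effective = format_pow2_bytes(32)  # 32-bit effective (direct) addressing
physical = format_pow2_bytes(36)   # 36-bit address bus
virtual = format_pow2_bytes(52)    # 52-bit virtual address

print(effective, physical, virtual)
```

Under these assumptions a 32-bit effective address spans 4 GB, the 36-bit address bus reaches 64 GB of physical memory, and a 52-bit virtual address spans 4 PB.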
PowerPC 7447A RISC Microprocessor
PC7447A
Description
The PC7447A host processor is a high-performance, low-power, 32-bit implementation
of the PowerPC® Reduced Instruction Set Computer (RISC) architecture combined with
a full 128-bit implementation of Freescale’s AltiVec technology. This microprocessor
is ideal for leading-edge embedded computing and signal processing applications. The
PC7447A features 512 KB of on-chip L2 cache. The PC7447A microprocessor has no
backside L3 cache, allowing for a smaller package designed as a pin-for-pin
replacement for the PC7447 microprocessor. This device benefits from a
silicon-on-insulator (SOI) CMOS process technology, engineered to help deliver
tremendous power savings without sacrificing speed. A low-power version of the
PC7447A microprocessor is also available.
Figure 1-1 shows a block diagram of the PC7447A. The core is a high-performance
superscalar design supporting a double-precision floating-point unit and a SIMD
multimedia unit. The memory storage subsystem supports the MPX bus protocol and a
subset of the 60x bus protocol to the main memory and other system resources.
Note that the PC7447A is a footprint-compatible, drop-in replacement in a PC7447
application if the core power supply is 1.3V.
Screening
• Full Military Temperature Range (TJ = -55°C to +125°C)
• Industrial Temperature Range (TJ = -40°C to +110°C)
GH suffix HITCE 360
Rev. 5387D–HIREL–07/06
1. Block Diagram
Figure 1-1. PC7447A Microprocessor Block Diagram
[Figure: block diagram showing the instruction unit (fetcher, branch processing unit
with 128-entry BTIC and 2048-entry BHT, 12-word instruction queue, dispatch unit),
the instruction and data MMUs (128-entry ITLB and DTLB, IBAT and DBAT arrays), the
32-Kbyte L1 instruction and data caches, the issue queues (VR 4-entry/2-issue, GPR
6-entry/3-issue, FPR 2-entry/1-issue), the execution units and the GPR, FPR and VR
files with 16 rename buffers each, the 16-entry completion queue, the memory
subsystem with the 512-Kbyte unified L2 cache controller, and the system bus
interface (36-bit address bus, 64-bit data bus).]
Additional Features
• Time Base Counter/Decrementer
• Clock Multiplier
• JTAG/COP Interface
• Thermal/Power Management
• Performance Monitor
• Dynamic Frequency Switching (DFS)
• Temperature Diode
Notes: The castout queue and push queue share resources such that they have a combined total of 10 entries.
The castout queue itself is limited to 9 entries, ensuring 1 entry will be available for a push.
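The resource sharing described in the note can be pictured with a small model. This is a minimal sketch, not a description of the actual hardware: the class name SharedCastoutPushQueue and its methods are hypothetical, and the pool size of 10 is an assumption inferred from the note's 9 + 1 wording.

```python
class SharedCastoutPushQueue:
    """Toy model: castouts and pushes draw from one shared pool of entries."""

    def __init__(self, total_entries: int = 10, castout_limit: int = 9):
        # total_entries = 10 is an assumption inferred from the note (9 + 1).
        self.total_entries = total_entries
        self.castout_limit = castout_limit
        self.castouts = 0
        self.pushes = 0

    def free_entries(self) -> int:
        """Entries not currently held by a castout or a push."""
        return self.total_entries - self.castouts - self.pushes

    def add_castout(self) -> bool:
        """Accept a castout unless the castout cap or the pool is exhausted."""
        if self.castouts >= self.castout_limit or self.free_entries() == 0:
            return False
        self.castouts += 1
        return True

    def add_push(self) -> bool:
        """Accept a push whenever any shared entry is free."""
        if self.free_entries() == 0:
            return False
        self.pushes += 1
        return True


queue = SharedCastoutPushQueue()
accepted_castouts = sum(queue.add_castout() for _ in range(12))
push_accepted = queue.add_push()
print(accepted_castouts, push_accepted)
```

In this model, a burst of castouts stops at the cap, so one entry remains free for a push as long as no pushes are already pending.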
2. Features
This section summarizes features of the PC7447A implementation of the PowerPC architecture.
Major features of the PC7447A are as follows:
• High-performance, superscalar microprocessor
– Up to four instructions can be fetched from the instruction cache at a time
– Up to 12 instructions can be in the instruction queue (IQ)
– Up to 16 instructions can be at some stage of execution simultaneously
– Single-cycle execution for most instructions
– One instruction per clock cycle throughput for most instructions
– Seven-stage pipeline control
• Eleven independent execution units and three register files
– Branch processing unit (BPU) features static and dynamic branch prediction
    • 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC),
      a cache of branch instructions that have been encountered in branch/loop code
      sequences. If a target instruction is in the BTIC, it is fetched into the instruction
      queue a cycle sooner than it can be made available from the instruction cache.
      Typically, a fetch that hits the BTIC provides the first four instructions in the target
      stream.
    • 2048-entry branch history table (BHT) with two bits per entry for four levels of
      prediction: not taken, strongly not taken, taken, and strongly taken
    • Up to three outstanding speculative branches
    • Branch instructions that do not update the count register (CTR) or link register (LR)
      are often removed from the instruction stream
    • Eight-entry link register stack to predict the target address of Branch Conditional to
      Link Register (BCLR) instructions
– Four integer units (IUs) that share 32 GPRs for integer operands
    • Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions except
      multiply, divide, and move to/from special-purpose register instructions.
    • IU2 executes miscellaneous instructions including the CR logical operations, integer
      multiplication and division instructions, and move to/from special-purpose register
      instructions.
– Five-stage FPU and a 32-entry FPR file
    • Fully IEEE® 754-1985-compliant FPU for both single- and double-precision
      operations
    • Supports non-IEEE mode for time-critical operations
    • Hardware support for denormalized numbers
    • Thirty-two 64-bit FPRs for single- or double-precision operands
– Four vector units and 32-entry vector register file (VRs)
    • Vector permute unit (VPU)
    • Vector integer unit 1 (VIU1) handles short-latency AltiVec™ integer instructions,
      such as vector add instructions (for example, vaddsbs, vaddshs, and vaddsws).
    • Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions, such
      as vector multiply add instructions (for example, vmhaddshs, vmhraddshs, and
      vmladduhm).
    • Vector floating-point unit (VFPU)
– Three-stage load/store unit (LSU)
    • Supports integer, floating-point, and vector instruction load/store traffic
    • Four-entry vector touch queue (VTQ) supports all four architected AltiVec
      data stream operations
    • Three-cycle GPR and AltiVec load latency (byte, half word, word, vector) with one-cycle
      throughput
    • Four-cycle FPR load latency (single, double) with one-cycle throughput
    • No additional delay for misaligned access within double-word boundary
    • Dedicated adder calculates effective addresses (EAs)
    • Supports store gathering
    • Performs alignment, normalization, and precision conversion for floating-point data
    • Executes cache control and TLB instructions
    • Performs alignment, zero padding, and sign extension for integer data
    • Supports hits under misses (multiple outstanding misses)
    • Supports both big- and little-endian modes, including misaligned little-endian
      accesses
• Three issue queues, FIQ, VIQ, and GIQ, can accept as many as one, two, and three
instructions, respectively, in a cycle. Instruction dispatch requires the following:
– Instructions can only be dispatched from the three lowest IQ entries: IQ0, IQ1, and
IQ2
– A maximum of three instructions can be dispatched to the issue queues per clock
cycle
– Space must be available in the CQ for an instruction to dispatch (this includes
instructions that are assigned a space in the CQ but not in an issue queue)
• Rename buffers
– 16 GPR rename buffers
– 16 FPR rename buffers
– 16 VR rename buffers
• Dispatch unit
– Decode/dispatch stage fully decodes each instruction
• Completion unit
– The completion unit retires an instruction from the 16-entry completion queue (CQ)
when all instructions ahead of it have been completed, the instruction has finished
execution, and no exceptions are pending
– Guarantees sequential programming model (precise exception model)
– Monitors all dispatched instructions and retires them in order
– Tracks unresolved branches and flushes instructions after a mispredicted branch
– Retires as many as three instructions per clock cycle
• Separate on-chip L1 instruction and data caches (Harvard Architecture)
– 32-Kbyte, eight-way set-associative instruction and data caches
– Pseudo least-recently-used (PLRU) replacement algorithm
– 32-byte (eight-word) L1 cache block
– Physically indexed/physical tags
– Cache write-back or write-through operation programmable on a per-page or per-
block basis
– Instruction cache can provide four instructions per clock cycle; data cache can
provide four words per clock cycle
– Caches can be disabled in software
– Caches can be locked in software
– MESI data cache coherency maintained in hardware
– Separate copy of data cache tags for efficient snooping
– Parity support on cache and tags
– No snooping of instruction cache except for icbi instruction
– Data cache supports AltiVec LRU and transient instructions
– Critical double- and/or quad-word forwarding is performed as needed. Critical quad-
word forwarding is used for AltiVec loads and instruction fetches. Other accesses
use critical double-word forwarding.
• Level 2 (L2) cache interface
– On-chip, 512-Kbyte, eight-way set-associative unified instruction and data cache
– Fully pipelined to provide 32 bytes per clock cycle to the L1 caches
– A total nine-cycle load latency for an L1 data cache miss that hits in L2
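The cache parameters listed above determine how a physical address splits into tag, set index, and byte offset. The sketch below works through that arithmetic. The CacheGeometry class is a hypothetical helper written for this illustration. The L1 figures (32 Kbytes, eight-way, 32-byte block) and the 36-bit physical address come from this document, while the 64-byte L2 line (two 32-byte blocks per line) is an assumption based on the block diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Derive set count and address field widths for a set-associative cache."""

    size_bytes: int
    ways: int
    line_bytes: int
    address_bits: int = 36  # physical address width of the bus interface

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        return (self.sets - 1).bit_length()

    @property
    def tag_bits(self) -> int:
        return self.address_bits - self.index_bits - self.offset_bits

    def split(self, physical_address: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for a physical address."""
        offset = physical_address & (self.line_bytes - 1)
        index = (physical_address >> self.offset_bits) & (self.sets - 1)
        tag = physical_address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


L1 = CacheGeometry(size_bytes=32 * 1024, ways=8, line_bytes=32)
# Assumption: 64-byte L2 line made of two 32-byte blocks, per the block diagram.
L2 = CacheGeometry(size_bytes=512 * 1024, ways=8, line_bytes=64)

print(L1.sets, L1.tag_bits, L2.sets, L2.tag_bits)
```

Under these assumptions each L1 cache has 128 sets (7 index bits, 5 offset bits, 24 tag bits), and the L2 has 1024 sets (10 index bits, 6 offset bits, 20 tag bits).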