Features
• 3000 Dhrystone 2.1 MIPS at 1.3 GHz
• Selectable Bus Clock (30 CPU Bus Dividers up to 28x)
• 13 Selectable Core-to-L3 Frequency Divisors
• Selectable MPx/60x Interface Voltage (1.8V, 2.5V)
• Selectable L3 Interface of 1.8V or 2.5V
• PD Typical 12.6W at 1 GHz at VDD = 1.3V; 8.3W at 1 GHz at VDD = 1.1V, Full Operating
  Conditions
• Nap, Doze and Sleep Modes for Power Saving
• Superscalar (Four Instructions Fetched Per Clock Cycle)
• 4 GB Direct Addressing Range
• Virtual Memory: 4 Petabytes (2^52 bytes)
• 64-bit Data and 36-bit Address Bus Interface
• Integrated L1: 32 KB Instruction and 32 KB Data Cache
• Integrated L2: 512 KB
• 11 Independent Execution Units and Three Register Files
• Write-back and Write-through Operations
• fINT Max = 1 GHz (1.2 GHz to be Confirmed)
• fBUS Max = 133 MHz/166 MHz
PowerPC 7457 RISC Microprocessor
PC7457
Description
The PC7457 is an implementation of the PowerPC® microprocessor family of reduced
instruction set computer (RISC) microprocessors. This document describes pertinent
electrical and physical characteristics of the PC7457.
The PC7457 is the fourth implementation of the fourth-generation (G4) microprocessors
from Freescale. The PC7457 implements the full PowerPC 32-bit architecture and is
targeted at networking and computing systems applications. The PC7457 consists of a
processor core, a 512-Kbyte L2 cache, and an internal L3 tag and controller that
support a glueless backside L3 cache through a dedicated high-bandwidth interface.
The core is a high-performance superscalar design supporting a double-precision
floating-point unit and a SIMD multimedia unit. The memory storage subsystem supports
the MPX bus interface to main memory and other system resources. The L3 interface
supports 1, 2, or 4 Mbytes of external SRAM for L3 cache and/or private memory data.
For systems implementing 4 Mbytes of SRAM, a maximum of 2 Mbytes may be used as
cache; the remaining 2 Mbytes must be private memory.
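The L3 SRAM sizing rule above can be sketched as a small check. This is only an illustration: the function name, the constants, and the assumption that the split is expressed in whole Mbytes are hypothetical and are not taken from the PC7457 documentation.

```python
# Hypothetical helper; names and the whole-Mbyte granularity are assumptions.
SUPPORTED_SRAM_MB = (1, 2, 4)  # external SRAM sizes named in the description
MAX_L3_CACHE_MB = 2            # no more than 2 Mbytes may be used as cache


def split_l3_sram(total_mb, cache_mb):
    """Return (cache_mb, private_mb) for a requested split, or raise ValueError."""
    if total_mb not in SUPPORTED_SRAM_MB:
        raise ValueError(f"unsupported SRAM size: {total_mb} Mbytes")
    if cache_mb < 0 or cache_mb > total_mb:
        raise ValueError("cache portion must lie between 0 and the SRAM size")
    if cache_mb > MAX_L3_CACHE_MB:
        raise ValueError("at most 2 Mbytes of SRAM may be used as cache")
    # Whatever is not used as cache is treated as private memory.
    return cache_mb, total_mb - cache_mb
```

For example, `split_l3_sram(4, 2)` describes the 4-Mbyte case in the text (2 Mbytes of cache, 2 Mbytes of private memory), while `split_l3_sram(4, 4)` is rejected.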
Note that the PC7457 is a footprint-compatible, drop-in replacement in a PC7455
application if the core power supply is 1.3V.
Rev. 5345D–HIREL–07/06
Screening
• CBGA Upscreenings Based on Atmel Standards
• Full Military Temperature Range (TJ = -55°C, +125°C),
  Industrial Temperature Range (TJ = -40°C, +110°C)
• HiTCE Package for the 7457
G suffix: CBGA 483, Ceramic Ball Grid Array
GH suffix: HiTCE 483, Ceramic Ball Grid Array
1. Block Diagram
Figure 1-1. PC7457 Microprocessor Block Diagram
[Figure not reproduced. The diagram shows the instruction unit (fetcher, branch
processing unit with 2048-entry BHT and 128-entry BTIC, 12-word instruction queue,
dispatch unit); the instruction and data MMUs (128-entry ITLB and DTLB, IBAT and DBAT
arrays, segment registers); the 32-Kbyte instruction and data caches; the VR, GPR, and
FPR issue queues (4-entry/2-issue, 6-entry/3-issue, and 2-entry/1-issue); the VR, GPR,
and FPR files, each with 16 rename buffers; the execution units with their reservation
stations; the completion unit (16-entry completion queue, completes up to three
instructions per clock); the load/store unit with vector touch engine; and the memory
subsystem (512-Kbyte unified L2 cache controller, L3 cache controller(1) with external
SRAM of 1, 2, or 4 Mbytes, castout and push queues(2), and system bus interface with
36-bit address bus and 64-bit data bus). Additional features listed: time base
counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management,
performance monitor.]
Notes:
1. The L3 cache interface is not implemented on the PC7447.
2. The Castout Queue and Push Queue share resources, for a combined total of 10 entries.
The Castout Queue itself is limited to 9 entries, ensuring that 1 entry is always available for a push.
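The shared-entry rule in note 2 can be illustrated with a toy occupancy model. This is a sketch under assumptions: the class and method names are hypothetical, and the model only tracks entry counts; it does not represent how the hardware allocates or drains entries.

```python
class CastoutPushQueues:
    """Toy occupancy model of the shared castout/push entries (illustrative only)."""

    TOTAL_ENTRIES = 10  # combined castout + push entries (from note 2)
    MAX_CASTOUTS = 9    # castout queue limit, so one entry is never used by a castout

    def __init__(self):
        self.castouts = 0
        self.pushes = 0

    def free_entries(self):
        return self.TOTAL_ENTRIES - self.castouts - self.pushes

    def try_add_castout(self):
        # A castout needs a free shared entry and must respect the 9-entry limit.
        if self.castouts < self.MAX_CASTOUTS and self.free_entries() > 0:
            self.castouts += 1
            return True
        return False

    def try_add_push(self):
        # Assumption: a push may use any free shared entry.
        if self.free_entries() > 0:
            self.pushes += 1
            return True
        return False
```

In this model, ten consecutive castout requests fill only nine entries, so a following push request still finds one free entry.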
2. General Parameters
Table 2-1 provides a summary of the general parameters of the PC7457.

Table 2-1. Device Parameters
Parameter            Description
Technology           0.13 µm CMOS, nine-layer metal
Die size             9.1 mm × 10.8 mm
Transistor count     58 million
Logic design         Fully-static
Packages             PC7447: surface mount 360 ceramic ball grid array (CBGA)
                     PC7457: surface mount 483 ceramic ball grid array (CBGA) + HiTCE CBGA
Core power supply    1.3V ±500 mV DC nominal or 1.1V ±50 mV (nominal, see
                     "Recommended Operating Conditions(1)" on page 12)
I/O power supply     1.8V ±5% DC, or 2.5V ±5% for recommended operating conditions
3. Overview
This section summarizes features of the PC7457 implementation of the PowerPC architecture.
Major features of the PC7457 are as follows:
• High-performance, superscalar microprocessor
– As many as 4 instructions can be fetched from the instruction cache at a time
– As many as 3 instructions can be dispatched to the issue queues at a time
– As many as 12 instructions can be in the instruction queue (IQ)
– As many as 16 instructions can be at some stage of execution simultaneously
– Single-cycle execution for most instructions
– One instruction per clock cycle throughput for most instructions
– Seven-stage pipeline control
• Eleven independent execution units and three register files
– Branch processing unit (BPU) features static and dynamic branch prediction
    - 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC),
      a cache of branch instructions that have been encountered in branch/loop code
      sequences. If a target instruction is in the BTIC, it is fetched into the instruction
      queue a cycle sooner than it can be made available from the instruction cache.
      Typically, a fetch that hits the BTIC provides the first four instructions in the
      target stream
    - 2048-entry branch history table (BHT) with two bits per entry for four levels of
      prediction: not-taken, strongly not-taken, taken, and strongly taken
    - Up to three outstanding speculative branches
    - Branch instructions that do not update the count register (CTR) or link register (LR)
      are often removed from the instruction stream
    - Eight-entry link register stack to predict the target address of Branch Conditional
      to Link Register (BCLR) instructions
– Four integer units (IUs) that share 32 GPRs for integer operands
    - Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions
      except multiply, divide, and move to/from special-purpose register instructions
    - IU2 executes miscellaneous instructions including the CR logical operations, integer
      multiplication and division instructions, and move to/from special-purpose register
      instructions
– Five-stage FPU and a 32-entry FPR file
    - Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
    - Supports non-IEEE mode for time-critical operations
    - Hardware support for denormalized numbers
    - Thirty-two 64-bit FPRs for single- or double-precision operands
– Four vector units and 32-entry vector register file (VRs)
    - Vector permute unit (VPU)
    - Vector integer unit 1 (VIU1) handles short-latency AltiVec integer instructions,
      such as vector add instructions (vaddsbs, vaddshs, and vaddsws, for example)
    - Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions,
      such as vector multiply add instructions (vmhaddshs, vmhraddshs, and vmladduhm,
      for example)
    - Vector floating-point unit (VFPU)
– Three-stage load/store unit (LSU)
    - Supports integer, floating-point, and vector instruction load/store traffic
    - Four-entry vector touch queue (VTQ) supports all four architected AltiVec data
      stream operations
    - Three-cycle GPR and AltiVec load latency (byte, half-word, word, vector) with
      one-cycle throughput
    - Four-cycle FPR load latency (single, double) with one-cycle throughput
    - No additional delay for misaligned access within double-word boundary
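The 2048-entry branch history table listed under the branch processing unit, with two bits per entry and four prediction levels, can be pictured as a table of two-bit saturating counters. The sketch below is a generic textbook model, not the PC7457 implementation: the state encoding, the initial state, the update rule, and the indexing by low-order instruction-address bits are all assumptions made for illustration.

```python
BHT_ENTRIES = 2048  # from the feature list: 2048-entry BHT, two bits per entry

# Four prediction levels; the numeric encoding is an assumption.
STRONGLY_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONGLY_TAKEN = range(4)


class BranchHistoryTable:
    """Generic two-bit saturating-counter predictor (illustrative model only)."""

    def __init__(self):
        # Assumed initial state for every entry.
        self.counters = [NOT_TAKEN] * BHT_ENTRIES

    @staticmethod
    def index(address):
        # Assumed indexing: drop the two byte-offset bits of a 4-byte
        # instruction address, then keep the low 11 bits (2048 entries).
        return (address >> 2) % BHT_ENTRIES

    def predict(self, address):
        """Return True if the branch at this address is predicted taken."""
        return self.counters[self.index(address)] >= TAKEN

    def update(self, address, taken):
        """Move the counter one step toward the actual outcome, saturating."""
        i = self.index(address)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONGLY_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONGLY_NOT_TAKEN)
```

With this kind of counter, a branch that has reached the strongly-taken state must be mispredicted twice in a row before the prediction flips, which is why a single loop exit does not disturb the prediction for the next pass through the loop.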