A high-frequency custom CMOS S/390 microprocessor
IBM Journal of Research and Development, Jul-Sep 1997 by Webb, C F, Liptay, J S
The S/390(R) Parallel Enterprise Server Generation 4 processor is an implementation of the IBM ESA/390(TM) architecture on a single custom CMOS chip. It was designed on a blank slate after consideration of remapping either a prior CMOS design or a prior bipolar design. It uses a straightforward pipeline both to achieve a fast cycle time and to speed the design cycle. The complex instructions are implemented using highly privileged subroutines called millicode. To achieve high data integrity while maintaining a high clock frequency, the chip contains duplicate I- and E-units which perform the same operations each cycle and have their results compared.
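The duplicate-and-compare idea in the last sentence can be pictured in software terms. The sketch below is only an analogy, not the chip's actual checking logic: the names `checked_execute`, `add32`, `add32_stuck_bit`, and `CompareError` are hypothetical, and what the hardware does after a miscompare is not described here, so the sketch simply raises an exception.

```python
from typing import Callable

Unit = Callable[[int, int], int]


class CompareError(Exception):
    """Raised when the two copies of a unit produce different results."""


def checked_execute(unit_a: Unit, unit_b: Unit, x: int, y: int) -> int:
    """Run the same operation on two copies of a unit and compare the results.

    Illustrative only: the real design compares hardware outputs each cycle;
    here the 'units' are plain functions and the response to a mismatch
    (an exception) is an assumption.
    """
    result_a = unit_a(x, y)
    result_b = unit_b(x, y)
    if result_a != result_b:
        raise CompareError(f"units disagree: {result_a:#x} != {result_b:#x}")
    return result_a


def add32(x: int, y: int) -> int:
    """A fault-free 32-bit adder (hypothetical stand-in for an E-unit operation)."""
    return (x + y) & 0xFFFFFFFF


def add32_stuck_bit(x: int, y: int) -> int:
    """The same adder with a simulated fault: result bit 0 stuck at 1."""
    return add32(x, y) | 0x1
```

With two healthy copies, `checked_execute(add32, add32, 2, 3)` returns the sum. Pairing `add32` with `add32_stuck_bit` on inputs 2 and 2 raises `CompareError`, because the faulty copy yields 5 instead of 4. A fault that happens not to change a particular result (for example 2 + 3 with the same stuck bit) goes unnoticed for that operation, since comparison only flags errors that make the outputs differ.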
Introduction
A major element of the transformation of mainframe computing is the transition from bipolar emitter-coupled logic (ECL) to complementary metal-oxide semiconductor (CMOS) logic. This change is nowhere more conspicuous than in the design of the central processing unit (CPU) of the S/390* Generation 4 (G4) CMOS system. CMOS technology has a tremendous advantage in circuit density, such that a high-performance CPU may be contained in a single CMOS chip, rather than hundreds of ECL chips. In addition, the low switching current of CMOS greatly reduces power consumption and eliminates the need for complex and expensive water-cooled packaging. The result is an S/390 G4 CMOS system with the CPUs, L2 caches, and bus-switching logic packaged in a single multichip module, which delivers performance comparable to that of a 9021 9X2 system in which the corresponding logic occupied 52 modules on 12 boards in six frames.
Designing a CPU, particularly for an architecture as rich as ESA/390*, is a major undertaking. One of the first questions to answer is whether the design should be based on an existing design or done on a clean slate. An existing design has the advantage that it is tested and proven, but the disadvantage that it embodies a set of design assumptions which may not be best for the new design. It is necessary to balance the characteristics of existing designs against the requirements for a new design and against the opportunities offered by a clean-slate design. The considerations which shaped the S/390 G4 CMOS design are discussed next.
Design considerations
Requirements
The requirements for this design were as follows:
Full support for the ESA/390 architecture.
Ability to fit on a single chip in CMOS 5X technology to allow cost-effective packaging of a 10-way symmetric multiprocessor system.
Extendibility to exploit future CMOS technology improvements.
Overall performance comparable to that of the 9021 9X2.
Mainframe-class reliability, availability, and serviceability (RAS).
Mid-1997 delivery to the marketplace.
Existing CMOS design point
The existing design point used in the first three generations of CMOS S/390 systems was intended to optimize chip area and cost rather than high-end performance [1]. Successive refinement of the design, culminating in the S/390 G3 CMOS system, has yielded significant performance improvements. However, this design utilizes a relatively long cycle time, shallow pipeline, and narrow dataflow, which limits the potential for further extension and, in particular, for exploiting custom high-frequency CMOS design.
Existing bipolar design point
The existing bipolar S/390 CPU designs presented different barriers to their use in this project [2]. These designs had been created for high-end server use, but the design points had been optimized for ECL technology. ECL and CMOS circuits have significantly different electrical characteristics. For example, ECL logic gates allow high fan-in and fan-out with little performance loss, whereas CMOS gates lose performance rapidly with more than three inputs and have much more limited drive capability. As a result, complex controls in ECL can have significantly fewer stages of logic than the equivalent controls in CMOS. Also, greater sensitivity to output loading in CMOS affects the ability to drive signals from one section of the chip to another.
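The link between fan-in and the number of logic stages can be made concrete with a small calculation. The sketch below counts the gate levels needed to combine a given number of signals into one output when each gate accepts a limited number of inputs. The function name `logic_levels` is hypothetical. The fan-in of 3 follows the CMOS figure mentioned above, while the fan-in of 8 used for comparison is an assumed value chosen purely for illustration, not a figure from the text.

```python
def logic_levels(num_inputs: int, max_fan_in: int) -> int:
    """Return the number of gate levels needed to reduce num_inputs signals
    to a single output using gates with at most max_fan_in inputs each.

    Models a simple tree of gates (e.g. a wide AND or OR); real control
    logic is less regular, so treat the result as a rough lower bound.
    """
    if num_inputs < 1:
        raise ValueError("num_inputs must be at least 1")
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")

    levels = 0
    signals = num_inputs
    while signals > 1:
        # Each level groups the remaining signals into gates of max_fan_in
        # inputs; ceiling division gives the number of gate outputs.
        signals = -(-signals // max_fan_in)
        levels += 1
    return levels


# Compare a narrow fan-in (3, per the CMOS limit in the text) with a
# wider one (8, an assumed illustrative value) for a 16-input function.
narrow = logic_levels(16, 3)
wide = logic_levels(16, 8)
```

For a 16-input function, the fan-in-3 tree needs three levels (16 to 6 to 2 to 1 signals), whereas the assumed fan-in-8 tree needs two. The gap widens as the functions get wider, which is consistent with the observation that controls built from high-fan-in gates need fewer stages.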
Another difference is that ECL technology is much less dense than CMOS, which means that a processor designed in ECL is spread across multiple chips on multiple modules. Such a design must allow for the different delays involved in crossing such packaging boundaries.
These factors lead to differences between CMOS and ECL designs in overall processor structure, the partitioning of the pipeline into machine cycles, and the organization of dataflow and control logic.
Opportunity for custom design
The availability of custom design techniques for CMOS circuits adds another dimension to the design process. Custom design allows CMOS logic to be built in a smaller area, with faster timing, than is possible using a library of standard logic gates. This is done by designing at the device (transistor) level and carefully tuning each element.
However, custom design requires considerable designer time and is difficult to modify once completed. Therefore, it is best suited to regular structures, such as arrays and arithmetic logic, where large functions can be built by repeated use of smaller components, and pieces can be physically arranged to minimize wiring delays between and within functional elements. The logic which controls the sequencing and execution of instructions is less suited to custom design because of its irregularity and the need to change it late in the design cycle to correct problems found during functional verification. Since this control logic cannot be custom-designed but must run at the same clock cycle as the custom logic, it must be less complex than the control logic in recent bipolar S/390 processors.