Bokai Gu

Posted on Feb 13

AI Infra HPC — CPU

#ai #hpc #infrastructure #productivity

The Basic Components of a CPU

Modern CPUs integrate large-scale, complex circuits. A CPU can be regarded as a complex machine assembled from many working units. However, no matter how the specific implementation changes or how many times the transistor count grows, a CPU can be roughly divided into three main parts from a functional perspective: the arithmetic logic unit, the storage unit, and the control unit.

Of course, the image below is only a simplified diagram. In a real CPU, the wiring around the central Arithmetic and Logic Unit (ALU), the I/O paths, and the detailed control flow are all extremely complex. The following sections introduce these units and how they collaborate with each other.

Arithmetic and Logic Unit (ALU)

The main function of a CPU is to perform calculations, which is implemented through the Arithmetic and Logic Unit (ALU). An ALU circuit is internally composed of an arithmetic unit (AU) and a logic unit (LU), and it performs arithmetic or logical operations on two input values (operands) to produce an output value.

The arithmetic unit is responsible for performing mathematical operations such as addition and subtraction on binary numbers, while the logic unit performs logical operations such as AND, OR, and NOT, as well as comparing two operands. In addition, the ALU has a shifting function, which shifts the input operand to the left or right to obtain a new operand. The ALU is therefore the most basic component not only of CPUs but also of almost all other microprocessors, such as graphics processing units (GPUs).

Besides addition and subtraction, an ALU can also handle the multiplication of two integers: it is designed for integer calculations, and the product of two integers is also an integer. Division, however, is generally not performed by a basic ALU, because the quotient may not be an integer. Instead, the floating-point unit (FPU) typically handles such divisions; the FPU can also perform other non-integer calculations. (Many processors do also offer integer division, which yields an integer quotient and a remainder, usually in a dedicated and slower divider circuit.)

Although the ALU is a major component of a processor, its design and functionality can vary across different processors. For example, some ALUs are designed to perform only integer calculations, while others are designed for floating-point operations. Some processors contain a single arithmetic logic unit (ALU) to perform operations, while others may contain multiple ALUs to complete calculations. The operations performed by the ALU are:

Logical operations: These include NOR, NOT, AND, NAND, OR, XOR, etc.
Shift operations: These shift bits to the right or left by a certain number of positions. Shifting left by n positions is equivalent to multiplying by 2^n, which is why shifts are sometimes described as multiplication operations.
Arithmetic operations: This mainly refers to bit addition and bit subtraction. An ALU may also perform multiplication and division, but these are more expensive (in terms of logic complexity and chip area). Repeated addition can be used as a substitute for multiplication, and repeated subtraction as a substitute for division.
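The operation list above can be sketched in a few lines of Python. This is only an illustrative model, not how any real ALU is built: the 8-bit width, the opcode names, and the helper `multiply_by_repeated_addition` are assumptions made for the example.

```python
MASK = 0xFF  # assumption: an 8-bit ALU, so every result is truncated to 8 bits


def alu(opcode: str, a: int, b: int = 0) -> int:
    """Apply one logical, shift, or arithmetic operation to 8-bit operands."""
    a &= MASK
    b &= MASK
    if opcode == "AND":
        result = a & b
    elif opcode == "OR":
        result = a | b
    elif opcode == "XOR":
        result = a ^ b
    elif opcode == "NOT":
        result = ~a
    elif opcode == "NAND":
        result = ~(a & b)
    elif opcode == "NOR":
        result = ~(a | b)
    elif opcode == "SHL":  # shift left by b positions
        result = a << b
    elif opcode == "SHR":  # shift right by b positions
        result = a >> b
    elif opcode == "ADD":
        result = a + b
    elif opcode == "SUB":
        result = a - b
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    return result & MASK  # keep only the low 8 bits


def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Multiply using only the ALU's ADD operation, b times."""
    product = 0
    for _ in range(b):
        product = alu("ADD", product, a)
    return product


print(alu("SHL", 3, 2))                     # 3 * 2^2
print(multiply_by_repeated_addition(6, 7))  # 6 added 7 times
```

Note that the shift-left call gives the same value as multiplying by 2^2, and that the multiplication helper needs one ADD per step, which hints at why a dedicated multiplier costs extra logic.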

As shown in the diagram, the ALU has various input and output connections that carry digital signals between the ALU and external circuitry. External circuitry applies signals to the ALU inputs, and in response the ALU produces signals on its outputs for the external circuitry to read.

Data: The ALU has three parallel data buses: two for the input operands and one for the result. All three buses usually carry the same number of signals (bits).
Opcode: The opcode input tells the ALU which arithmetic or logical operation it is about to perform.
Status outputs: The status outputs are individual signals that convey supplementary information about the result of the current operation. A general-purpose ALU typically provides status signals such as carry-out, zero, negative, and overflow. When the ALU completes an operation, these signals are usually stored in external registers so that they are available to later ALU operations.
Status inputs: The status inputs make additional information available to the ALU while it performs an operation. Typically this is a single carry-in bit, which is the stored carry-out from a previous ALU operation.
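To make the status signals concrete, here is a minimal sketch of an 8-bit add that reports carry, zero, negative, and overflow, and then reuses the stored carry as the carry-in of a second add. The names `AluResult`, `add_with_carry`, and `add16` are hypothetical, and the 8-bit width is an assumption for the example.

```python
from typing import NamedTuple

WIDTH = 8                    # assumption: 8-bit data buses
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)      # the most significant (sign) bit


class AluResult(NamedTuple):
    value: int       # data output
    carry: bool      # carry-out of the top bit
    zero: bool       # result is all zeros
    negative: bool   # sign bit of the result is set
    overflow: bool   # signed result does not fit in WIDTH bits


def add_with_carry(a: int, b: int, carry_in: bool = False) -> AluResult:
    a &= MASK
    b &= MASK
    total = a + b + int(carry_in)
    value = total & MASK
    # Signed overflow: both operands share a sign that differs from the result's.
    overflow = bool(~(a ^ b) & (a ^ value) & SIGN)
    return AluResult(value, total > MASK, value == 0, bool(value & SIGN), overflow)


def add16(x: int, y: int) -> int:
    """Add two 16-bit numbers with two 8-bit ALU operations chained by the carry."""
    low = add_with_carry(x & MASK, y & MASK)
    high = add_with_carry(x >> WIDTH, y >> WIDTH, carry_in=low.carry)
    return (high.value << WIDTH) | low.value


print(add_with_carry(0xFF, 0x01))
print(hex(add16(0x01FF, 0x0001)))
```

The second call shows why the carry is saved externally: the low-byte add overflows its 8 bits, and the saved carry is what lets the high-byte add produce the right 16-bit sum.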

An ALU is a combinational logic circuit, meaning its output changes asynchronously with its inputs. During normal operation, a stable signal is applied to all ALU inputs, and the result of the ALU operation appears at the ALU output after the signal has propagated through the ALU circuit for a sufficient time (called the “propagation delay”). External circuitry connected to the ALU is responsible for ensuring the stability of the ALU input signals throughout the operation and allowing sufficient time for the signal to propagate through the ALU before sampling the result.

External circuitry controls the ALU by applying signals to its inputs. Typically, that circuitry uses sequential logic driven by a clock signal whose frequency is low enough that the ALU outputs have time to stabilize even in the worst case.
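The relationship between propagation delay and clock frequency can be illustrated with a small calculation. The sketch below assumes a simple ripple-carry adder model, in which the carry may have to pass through every bit stage, and the delay figures are made-up numbers for illustration only.

```python
def worst_case_delay_ns(bits: int, full_adder_delay_ns: float) -> float:
    """Ripple-carry model: in the worst case the carry crosses every bit stage."""
    return bits * full_adder_delay_ns


def max_clock_mhz(propagation_delay_ns: float, margin_ns: float = 0.0) -> float:
    """Highest clock rate whose period still covers the delay plus a margin."""
    period_ns = propagation_delay_ns + margin_ns
    return 1000.0 / period_ns  # a 1 ns period corresponds to 1000 MHz


# Hypothetical numbers: 8 stages of 0.5 ns each, plus 1 ns of safety margin.
delay = worst_case_delay_ns(8, 0.5)
print(delay, "ns worst-case delay")
print(max_clock_mhz(delay, margin_ns=1.0), "MHz maximum clock")
```

Clocking the surrounding logic any faster than this limit would risk sampling the ALU output before it has settled.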

Memory Unit

The memory unit (MU) of a CPU consists of registers. Registers exist because programs are loaded into main memory and run by the CPU, whose primary function is data processing, and this inevitably involves reading and writing data in memory. Each access means sending a request over the bus and then transferring the data over the bus as well. This process is cumbersome and takes a significant amount of time. Furthermore, frequently used data would otherwise be fetched from memory again and again unnecessarily. Registers were therefore created and placed inside the CPU.

Without registers, the CPU would need to constantly read and write data from memory, severely degrading computer performance. Because registers are faster than memory, they accelerate computer operations and calculations. Additionally, registers can store intermediate results and operands, simplifying the CPU’s internal calculations.

Registers are mainly divided into two types: instruction registers and data registers. They temporarily store instructions, the operands required by the ALU, and the results calculated by the ALU. During a calculation, the ALU reads its operands from registers and saves the result to the accumulator (also a type of register). The instruction currently being executed is held in the instruction register.

For example, when adding two numbers, one number is placed in register A and the other in register B. After the ALU performs the addition, the result is placed in the accumulator. For logical operations, the data to be compared is placed in the input registers, and the result (1 or 0) is placed in the accumulator. Whether the operation is logical or arithmetic, its result is held in the accumulator.

The storage capacity of a register is determined by its bit width. Different registers have different bit widths and can store different amounts of data. For example, an 8-bit register such as the input register (INPR) can store 2^8 = 256 different values.
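The register A, register B, and accumulator example, together with the bit-width rule, can be sketched as follows. The `Register` class and its method names are hypothetical and exist only for this illustration; real registers are hardware, not objects.

```python
class Register:
    """A fixed-width register: it keeps only the low `bits` bits of any value."""

    def __init__(self, name: str, bits: int) -> None:
        self.name = name
        self.bits = bits
        self.value = 0

    @property
    def capacity(self) -> int:
        """Number of distinct values the register can hold: 2^bits."""
        return 1 << self.bits

    def load(self, value: int) -> None:
        self.value = value & (self.capacity - 1)


a = Register("A", 8)
b = Register("B", 8)
ac = Register("AC", 8)   # the accumulator

a.load(200)
b.load(100)
ac.load(a.value + b.value)   # the sum 300 does not fit in 8 bits

print(Register("INPR", 8).capacity)
print(ac.value)
```

Because 300 exceeds the 256 values an 8-bit register can represent, the accumulator keeps only the low 8 bits; in a real ALU the lost bit would show up as the carry status signal.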

Furthermore, the storage capacity of a register can also be determined by its purpose and design. For example, in a CPU, the instruction register typically stores only one instruction, while data registers can store multiple data items. Registers enable the CPU to quickly store, access, and manipulate instructions and data, thereby improving the overall performance and efficiency of the computer.

As shown in the diagram above, there are many types of registers. Let’s look at the functions of some common registers:

Data Register (DR): Also known as the data buffer register, the data register stores operands. Its bit width should cover the numerical range of most data types. It temporarily holds an instruction or a data word read from main memory; conversely, an instruction or a data word being written to main memory is also held here first. Its functions are: a. to act as a relay station for information transfer between the CPU and main memory or peripherals; b. to bridge the speed difference between the CPU and main memory or peripherals; c. in a single-accumulator arithmetic unit, to serve as an operand register.
Address Register (AR): The address register stores the address of the main memory location currently accessed by the CPU. Address registers can be general-purpose or used for special addressing modes, such as segment pointers (storing the base address) for base addressing, index registers for indexed addressing, and stack pointers for stack addressing. The address register must be long enough to accommodate the maximum address range. Due to the speed difference between main memory and the CPU, address registers must be used to temporarily store main memory address information until the main memory access operation is complete.
Accumulator Register (AC): The accumulator register, often simply called the accumulator (AC), is a general-purpose register. Its function is to provide a working area for the arithmetic logic unit (ALU) when performing arithmetic or logical operations, temporarily storing an operand or the result of the operation. Clearly, the arithmetic unit must have at least one accumulator register.
Program Counter (PC): The program counter has both registering and counting functions and generally stores the main memory address of the next instruction. Before a program executes, its starting address (the address of the main memory location containing the first instruction) must be loaded into the PC, so the PC initially holds the address of the first instruction to be fetched. While executing an instruction, the CPU automatically increments the PC so that it always holds the address of the next instruction to be executed, ready for the next fetch. When a branch instruction is encountered, however, the address of the next instruction is taken from the branch instruction’s address code field rather than obtained by incrementing the PC as usual.
Instruction Register (IR): The instruction register stores the instruction currently being executed. An instruction is first read from main memory into the data register and then transferred to the instruction register. An instruction has two fields: the opcode and the address code. To execute an instruction, its opcode must be examined to identify the required operation; this is the job of the instruction decoder. The opcode field of the instruction register is the input of the instruction decoder, which decodes it into the control levels the instruction requires and sends them to the micro-operation control circuitry. Under the timing signals of the timing unit, that circuitry then generates the specific operation control signals.

Besides this, there are many other types of registers; interested readers can consult relevant materials for further study.
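To see how the PC, AR, DR, and IR described above cooperate, here is a minimal sketch of an instruction fetch. The 16-bit instruction word split into a 4-bit opcode and a 12-bit address code is an assumption chosen for the example, as are the class and method names.

```python
from dataclasses import dataclass

OPCODE_BITS = 4     # assumption: 4-bit opcode field
ADDRESS_BITS = 12   # assumption: 12-bit address code field


@dataclass
class Cpu:
    memory: list
    pc: int = 0   # program counter: address of the next instruction
    ar: int = 0   # address register: address currently being accessed
    dr: int = 0   # data register: word just read from memory
    ir: int = 0   # instruction register: instruction being executed

    def fetch(self) -> None:
        self.ar = self.pc                # send the instruction address to memory
        self.dr = self.memory[self.ar]   # the word arrives in the data register
        self.ir = self.dr                # then moves to the instruction register
        self.pc += 1                     # PC now points at the next instruction

    def decode(self) -> tuple:
        """Split the instruction register into (opcode, address code)."""
        opcode = self.ir >> ADDRESS_BITS
        address = self.ir & ((1 << ADDRESS_BITS) - 1)
        return opcode, address

    def branch(self, address: int) -> None:
        """A taken branch overwrites the PC instead of incrementing it."""
        self.pc = address


cpu = Cpu(memory=[0x1005, 0x2003])
cpu.fetch()
print(cpu.decode(), cpu.pc)
```

After one fetch, the IR holds the first word, the decoder sees opcode 1 with address code 5, and the PC already points at the second instruction.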

The registers described above are the internal storage units of the CPU. They store the data and instructions the CPU is working on, as well as the intermediate results of calculations. The final results of processing are also saved to these storage units and then sent to output devices for the user. However, the capacity of the CPU’s internal storage is extremely limited; large amounts of data must be stored in RAM (Random Access Memory) chips outside the CPU, which is what we usually call main memory. The memory unit is responsible for fetching data from main memory, holding it temporarily, and managing the data flow between the CPU and main memory. Although both registers and main memory store data, they are vastly different. Below is a brief summary of the differences between registers and memory:

Function: Registers are components within the central processing unit (CPU) used to temporarily store instructions and data. They can be used to quickly store operands and intermediate results, and serve as a buffer for data exchange between the CPU’s internal and external memory or with input/output devices. Memory’s primary function is to store data processed by the CPU, as well as data exchanged with external storage devices such as hard drives.
Speed: Registers are located inside the CPU and are extremely fast; a register access typically completes within a single clock cycle, which is well under a nanosecond on a multi-gigahertz CPU. Main memory is considerably slower, with access times typically on the order of tens of nanoseconds, but it is still one of the fastest types of storage and much faster than a hard drive.
Capacity: A register typically holds only a few bytes to a few tens of bytes. Memory’s capacity is much larger, reaching several gigabytes or more.

So, given how important and fast registers are, why not make them larger? The reason is cost: fast on-chip storage is expensive to build, so main memory is the more cost-effective option for holding large amounts of data.

Control Unit

The main function of the Control Unit (CU) can be summarized as directing how the rest of the system carries out its tasks. The CU fetches instructions from main memory, decodes them, and then issues appropriate control signals to guide other components of the computer to execute the required operations. The CU itself does not execute program instructions; it merely outputs signals to instruct other parts of the system on how to do things.

If the CPU is the brain of the computer, then the CU is the brain of the CPU and its most important part. The CU’s tasks can be divided into decoding instructions, generating control signals, and sending these signals to other components, such as the ALU, MU, memory, and input/output devices. The following sections will describe the CU’s tasks in detail and provide examples.

Instruction Decoding: The CU is responsible for reading instructions from memory and decoding them. Instruction decoding is the process of converting binary instructions into control signals for various computer components. Through decoding, the CU can identify the instruction type, operands, and execution method, and prepare for subsequent execution steps. For example, suppose there is an instruction “ADD R1, R2, R3”, which means adding the values in registers R2 and R3 and storing the result in register R1. The control unit decodes the instruction, identifies it as an addition instruction, and generates corresponding control signals to instruct the arithmetic unit to read data from R2 and R3 and write the result to R1.
Control Signal Generation: Based on the decoded instruction type and operands, the control unit generates appropriate control signals to control the operation of various components in the computer. These control signals include clock signals, read/write signals, address selection signals, operand selection signals, etc., and they ensure that each component operates as the instruction requires. For example, the load instruction “LOAD R1, 2000” means loading the data at memory address 2000 into register R1. The control unit generates a control signal to read the data, sends address 2000 to memory, and writes the data that was read to R1.
Instruction Execution Order: The control unit is also responsible for managing the execution order of instructions. It schedules the execution of instructions one by one according to the instruction sequence, ensuring that the operation of each instruction is completed within the correct clock cycle. The CU can also change the flow of instructions, including jumps, branches, and loops, according to the needs of different instructions. For example, a program may contain the conditional branch instruction “IF R1 == R2 THEN GOTO 100”, which means that if the value of register R1 equals the value of R2, execution jumps to the instruction labeled 100. The CU generates the corresponding control signal based on the result of the condition to determine whether to jump to the instruction labeled 100.
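The three tasks above can be tied together in a toy interpreter that handles the ADD, LOAD, and conditional branch examples. This is a software sketch of the behavior only: the tuple encoding of instructions, the `BEQ` mnemonic standing in for “IF Ra == Rb THEN GOTO”, and the memory contents are all assumptions for illustration.

```python
def run(program: list, memory: dict, max_steps: int = 100) -> dict:
    """Decode and execute instructions one by one, as a control unit would direct."""
    regs = {"R1": 0, "R2": 0, "R3": 0}
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]   # "decode": split opcode from operands
        pc += 1                   # default order: the next instruction in sequence
        if op == "LOAD":          # LOAD Rd, addr: read memory, write register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ADD":         # ADD Rd, Ra, Rb: Rd = Ra + Rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "BEQ":         # IF Ra == Rb THEN GOTO target
            ra, rb, target = args
            if regs[ra] == regs[rb]:
                pc = target       # a taken branch overrides the default order
        else:
            raise ValueError(f"unknown opcode: {op}")
        steps += 1
    return regs


memory = {2000: 7, 2001: 5}          # hypothetical memory contents
program = [
    ("LOAD", "R2", 2000),
    ("LOAD", "R3", 2001),
    ("ADD", "R1", "R2", "R3"),
    ("BEQ", "R1", "R1", 5),          # condition always true: skip the next line
    ("LOAD", "R1", 2001),            # skipped by the branch
]
result = run(program, memory)
print(result)
```

The branch at index 3 is taken, so the final LOAD never runs and R1 keeps the sum computed by ADD.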

The task of the CU is to receive instructions and direct their execution. Let’s look at the CU’s working process as shown in the diagram below. The CU receives three kinds of input: step signals from the step counter, decoded opcodes from the operation decoder, and flag (condition) signals.

Step Counter: The clock sends pulses of a fixed frequency to the step counter, which cyclically sends step signals to the CU according to the number of pulses received. If each machine cycle consists of twelve clock cycles, then when the step counter receives the thirteenth clock pulse, it wraps around and issues step signal “1” again. Based on this signal, the CU enters the next machine cycle: it “lights up” one of the four machine cycle flip-flops, Fetch (FE), Indirect (IND), Execute (EX), and Interrupt (INT), which are in fact integrated inside the CU, to indicate which machine cycle the system is currently in.
Operation Decoder: The IR sends the n-bit binary opcode of the instruction to the operation decoder. Since n bits correspond to 2^n states, the decoder connects to the CU via 2^n lines, one line for each state, allowing the CU to recognize the opcode.
Flag Signals: Flags are feedback signals — whether the number processed by the ALU is positive or negative, whether there has been an overflow, whether the mouse has been clicked, which key has been pressed, etc., are all feedback signals.

After receiving these three external parameters, the CU can send control signals — micro-commands — to instruct the CPU to perform micro-operations.
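The step counter and operation decoder inputs described above can be modeled in a few lines. The function names are hypothetical, and pulses are assumed to be numbered starting from 1.

```python
def step_signal(pulse_number: int, steps_per_cycle: int = 12) -> int:
    """Map the k-th clock pulse to a step signal in 1..steps_per_cycle."""
    return (pulse_number - 1) % steps_per_cycle + 1


def decode_one_hot(opcode: int, n_bits: int) -> list:
    """Drive 2^n output lines; only the line matching the opcode is 1."""
    lines = 1 << n_bits
    if not 0 <= opcode < lines:
        raise ValueError("opcode does not fit in n_bits")
    return [1 if line == opcode else 0 for line in range(lines)]


print([step_signal(k) for k in (1, 12, 13)])
print(decode_one_hot(2, n_bits=2))
```

With twelve steps per machine cycle, the thirteenth pulse produces step signal 1 again, and a 2-bit opcode needs 2^2 = 4 decoder lines, exactly one of which is active at a time.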

Summary

The image below shows an architecture diagram of an Intel Core CPU. Even within its complex control flow, we can still clearly see the ALU, MU, and CU that we have just learned about. Of course, designing such an architecture requires far more knowledge than is covered here.

DEV Community