DEV Community

WHY, not HOW is the question!

Custom RISC CPU: practical principles learned while implementing one for the first time

I just completed a small ALU implementation for a custom RISC-like CPU that I am designing as a software emulator.

Here are several principles you need to consider when implementing a RISC CPU, along with what I learned about each while building mine:

  • Simple Instructions: RISC CPUs prioritize a small and simple set of instructions that perform basic operations, such as arithmetic, logical, and memory operations. Each instruction is typically designed to execute in a single clock cycle.

I ended up with 32 instructions.

  • Uniform Instruction Format: RISC CPUs often use a fixed-length instruction format, where each instruction occupies the same number of bits. This simplifies instruction decoding and allows for efficient pipelining.

In my case the core part of the instruction is always 8 bits: 5 bits for the opcode and 3 bits for the addressing mode of that opcode. That works fine. The CPU then decodes this byte, but the data payload that follows is still variable in length. A memory operand is always 32 bits. A register operand is an index stored in a single 8-bit value, which makes "mov reg, reg" shorter than "mov reg, mem".
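The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the emulator's actual code: placing the opcode in the high 5 bits and the mode in the low 3 bits, the little-endian byte order of the 32-bit address, the function names, and the opcode number used below are all hypothetical.

```python
import struct

OPCODE_BITS = 5  # 2**5 = 32 possible opcodes
MODE_BITS = 3    # 2**3 = 8 possible addressing modes


def pack_core(opcode, mode):
    """Pack opcode and addressing mode into the single core byte.

    Assumption: opcode sits in the high 5 bits, mode in the low 3 bits.
    """
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 5 bits")
    if not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("mode does not fit in 3 bits")
    return (opcode << MODE_BITS) | mode


def unpack_core(byte):
    """Split the core byte back into (opcode, mode)."""
    return byte >> MODE_BITS, byte & ((1 << MODE_BITS) - 1)


def encode_reg_reg(opcode, mode, dst, src):
    """Core byte followed by two 8-bit register indexes: 3 bytes."""
    return bytes([pack_core(opcode, mode), dst, src])


def encode_reg_mem(opcode, mode, dst, address):
    """Core byte, one 8-bit register index, one 32-bit address: 6 bytes."""
    return bytes([pack_core(opcode, mode), dst]) + struct.pack("<I", address)


MOV = 0x01  # hypothetical opcode number, for illustration only
reg_reg = encode_reg_reg(MOV, 0, 2, 3)
reg_mem = encode_reg_mem(MOV, 1, 2, 0x1000)
```

Under these assumptions `reg_reg` is 3 bytes long and `reg_mem` is 6 bytes long, which is the size difference between "mov reg, reg" and "mov reg, mem" mentioned above.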

I could make every instruction the same size, but the binary code would then carry padding zeroes that I have already optimized away. I may change this later, but first I need to test how it works in practice and in the FPGA implementation. For now this is software code, not yet turned into LU signaling.

  • Load-Store Architecture: RISC CPUs commonly adopt a load-store architecture, which means that data transfers between memory and the CPU registers occur explicitly through load and store instructions. Only load and store instructions can access memory, while arithmetic and logical operations are performed only on registers. This simplifies the instruction set and improves memory access efficiency.

This principle requires a complete revision of my design. I started out wrong, with as many direct and indirect addressing options as fit in the 3 bits I allocated for them. The ALU handles every addressing mode for every instruction, which works but is perhaps unnecessarily complex. I am keeping it for further testing, but I would have to fork my CPU design if I wanted, or were required, to comply with this principle. The change is already expensive, which is a perfect example of what happens when you outline a design wrongly or inconsistently: either you break the whole system with significant changes, or you carry the legacy forever, just like Intel does, by the way...
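To make the load-store principle concrete, here is a minimal Python sketch of what a compliant design would look like. It is not the design described in this post: the class name, the two-operand `add`, the 32-bit word size and the word-indexed memory list are all assumptions chosen for illustration.

```python
class LoadStoreCPU:
    """Toy model: only load() and store() touch memory; add() sees registers only."""

    WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit registers and memory words

    def __init__(self, mem_words=256, num_regs=16):
        self.mem = [0] * mem_words   # each list cell stands for one 32-bit word
        self.regs = [0] * num_regs

    def load(self, rd, addr):
        """Copy a memory word into register rd."""
        self.regs[rd] = self.mem[addr]

    def store(self, rs, addr):
        """Copy register rs into a memory word."""
        self.mem[addr] = self.regs[rs]

    def add(self, rd, rs):
        """rd = rd + rs, wrapping at 32 bits. No memory operand is allowed."""
        self.regs[rd] = (self.regs[rd] + self.regs[rs]) & self.WORD_MASK


cpu = LoadStoreCPU()
cpu.mem[0x10] = 40
cpu.mem[0x14] = 2

# "mem[0x18] = mem[0x10] + mem[0x14]" takes four instructions:
cpu.load(0, 0x10)
cpu.load(1, 0x14)
cpu.add(0, 1)
cpu.store(0, 0x18)   # mem[0x18] now holds 42
```

The trade-off is visible here: the program gets longer, but the ALU no longer needs to understand any memory addressing mode.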

  • Register-Rich Design: RISC CPUs typically have a large number of general-purpose registers, allowing more data to be held in registers instead of memory. This reduces memory access operations and improves performance.

I started with 16 general-purpose registers and a few special-purpose registers such as stack, IP and CPUFlags. General-purpose registers are indexed by an 8-bit value at the opcode encode/decode level, so I can extend the set up to 256 registers without breaking backward compatibility. I feel positive about that.
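The reason an 8-bit index leaves room to grow can be shown with a small register-file sketch in Python. The class and method names are hypothetical, and the 32-bit register width is an assumption; only the 16 implemented registers and the 8-bit index come from the text above.

```python
class RegisterFile:
    """General-purpose registers addressed by an 8-bit index."""

    INDEX_BITS = 8           # encoding allows 2**8 = 256 register numbers
    WORD_MASK = 0xFFFFFFFF   # assumption: registers are 32 bits wide

    def __init__(self, implemented=16):
        if not 1 <= implemented <= (1 << self.INDEX_BITS):
            raise ValueError("an 8-bit index can address at most 256 registers")
        self.values = [0] * implemented

    def _check(self, index):
        # Indexes 16..255 are encodable but not implemented yet.
        if not 0 <= index < len(self.values):
            raise IndexError(f"register r{index} is not implemented")

    def read(self, index):
        self._check(index)
        return self.values[index]

    def write(self, index, value):
        self._check(index)
        self.values[index] = value & self.WORD_MASK


today = RegisterFile()        # 16 registers, as in the current design
later = RegisterFile(256)     # same encoding, larger register file
later.write(255, 7)
```

Code that only uses r0 to r15 encodes identically for both `today` and `later`, which is why growing the register file would not break existing binaries.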

  • Pipelining: RISC CPUs are designed with pipelining in mind. Pipelining allows for the overlap of instruction execution stages, improving performance by allowing multiple instructions to be processed simultaneously.

I am still on the learning curve here and need to learn more about pipelining. I know how to take advantage of it from my past as an assembly coder, but I don't yet know how to design a CPU that effectively processes several instructions simultaneously. It stays a single-core, single-pipeline design until I learn.
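As a starting point for that learning, the overlap of stages can be modelled in a few lines of Python. This is an idealized sketch, not a CPU design: it assumes a hypothetical three-stage pipeline (fetch, decode, execute), one cycle per stage, and no hazards, stalls or branches.

```python
STAGES = ("fetch", "decode", "execute")  # assumed three-stage pipeline


def pipeline_timeline(program):
    """Return one dict per clock cycle mapping stage name -> instruction (or None).

    Instruction i enters 'fetch' in cycle i and moves one stage per cycle.
    """
    n, depth = len(program), len(STAGES)
    total_cycles = n + depth - 1 if n else 0
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # which instruction occupies this stage now
            row[stage] = program[i] if 0 <= i < n else None
        timeline.append(row)
    return timeline


timeline = pipeline_timeline(["i0", "i1", "i2", "i3"])
# In cycle 2 all three stages are busy with different instructions:
# {'fetch': 'i2', 'decode': 'i1', 'execute': 'i0'}
```

In this ideal model four instructions finish in 4 + 3 - 1 = 6 cycles, compared with 4 × 3 = 12 cycles if each instruction had to pass through all three stages before the next one started. Real pipelines lose some of that gain to hazards, which this sketch deliberately ignores.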

  • Single-Cycle Execution: RISC CPUs aim to execute most instructions in a single clock cycle, ensuring predictable and uniform execution time. This simplifies the design and improves performance.

Well, I started with a software emulator, so clock cycles are not part of the exercise yet. I am trying to learn the logical structure of the CPU first, and then perhaps repeat the exercise as an FPGA/Verilog project, where single-cycle execution can be tested in practice. In a software emulator it is easy, since timing and clock setup are fully algorithmic, but as I have only just completed the ALU, I still need to add a lot to the control unit and the CPU-bus integration to fully implement the timing functions. I'll cover that in a separate article when I have more insight to share.

  • Reduced Addressing Modes: RISC CPUs usually have a limited number of addressing modes, focusing on simplicity and regularity. Common addressing modes include immediate, register-direct, and register indirect.

3 bits to encode addressing gives me 8 options, more than RISC recommends, and I will perhaps trim the unnecessary fat during testing. I want to keep it simple; to me the Motorola 68k is powerful, yet a RISC anti-pattern.
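A reduced mode set could be sketched in Python like this. The three modes are the common ones named in the bullet above; their numeric values, the position of the mode field in the low 3 bits of the core byte, and the choice to treat the remaining five codes as reserved are assumptions for illustration.

```python
from enum import IntEnum

MODE_BITS = 3  # 2**3 = 8 encodable addressing modes


class AddrMode(IntEnum):
    """Hypothetical numbering; codes 3..7 are left reserved."""
    IMMEDIATE = 0
    REGISTER_DIRECT = 1
    REGISTER_INDIRECT = 2


def decode_mode(core_byte):
    """Extract the addressing mode from the low 3 bits of the core byte."""
    raw = core_byte & ((1 << MODE_BITS) - 1)
    try:
        return AddrMode(raw)
    except ValueError:
        raise ValueError(f"reserved addressing mode {raw}") from None


mode = decode_mode(0b00001_010)  # opcode bits 00001, mode bits 010
```

Here `mode` is `AddrMode.REGISTER_INDIRECT`, while a core byte whose mode bits are 3 to 7 raises `ValueError`. Rejecting reserved codes keeps the decoder small and leaves room to add modes later without redefining existing ones.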

  • Compiler-Friendly: RISC CPUs are designed to be compiler-friendly, with an instruction set that maps well to high-level language constructs. The simplicity and orthogonality of the instruction set make it easier for compilers to generate efficient code.

As I have just finished the ALU, one of the next steps is to create a simple assembler that builds BIOS/ROM code from source, and that will finally answer the question. Stay tuned!
