Intro:
The Central Processing Unit (CPU) is one of the major components of a computer. It handles logic flow and other control-type operations. A modern CPU contains many different parts, and how they work together is what we are after today.
Assumptions:
I will be referring to a RISC-style architecture throughout this article. There are five basic steps in the architecture: Instruction Fetch, Instruction Decode (Register Fetch), ALU/Execution, Memory (Read & Write), and Write-back (Register Write). RISC stands for Reduced Instruction Set Computer. For our purposes, it means that every piece of data the CPU wants to operate on needs to be in the register file. If you need to work with data at address 0xFF56, you must first load it into a register and then work with it. Because of this, these types of CPUs are called load-store machines. x86 CPUs are CISC (Complex Instruction Set Computer). They can operate on memory directly, but their instructions vary in length and need more logic to decode. RISC instructions are all the same size. I will also assume that instructions and data are all aligned in memory, and that fetching either one takes exactly one cycle.
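To make the load-store idea concrete, here is a minimal sketch of a toy machine in Python. The helper names (`load`, `add_imm`, `store`), the four-entry register file, and the starting value at 0xFF56 are all made up for illustration and do not correspond to any real instruction set.

```python
# Toy load-store machine. Everything here is a hypothetical illustration,
# not a real ISA: memory is a dict, the register file is a small list.
memory = {0xFF56: 10}      # assumed starting value at address 0xFF56
registers = [0] * 4        # assumed four general-purpose registers


def load(rd, addr):
    """Copy a value from memory into register rd."""
    registers[rd] = memory[addr]


def add_imm(rd, rs, imm):
    """Arithmetic only ever touches registers, never memory."""
    registers[rd] = registers[rs] + imm


def store(rs, addr):
    """Copy a value from register rs back to memory."""
    memory[addr] = registers[rs]


# Adding 5 to the value at 0xFF56 takes three separate instructions.
load(1, 0xFF56)
add_imm(1, 1, 5)
store(1, 0xFF56)
print(hex(0xFF56), "->", memory[0xFF56])
```

A CISC machine could express the same update as a single instruction that operates on memory, at the cost of more complicated decode logic.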
This article will not go into how cache helps alleviate bottleneck problems. I will leave that topic, along with concepts such as virtual memory, to a later article.
We will measure performance in execution time. Exectime = t x InstrCount x CPI
t=Clock Period
CPI=Cycles per Instruction
You will also commonly see people refer to IPC (Instructions per Cycle), particularly for superscalar CPUs, where more than one instruction can complete in a single cycle. In this case Exectime = t x InstrCount x (1/IPC).
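The two forms of the execution time formula can be sketched in Python as follows. The function names and the sample numbers (a 1 ns clock period, 1,000 instructions) are assumptions chosen only to show that CPI and IPC are reciprocals of each other.

```python
def exec_time_from_cpi(clock_period_s, instr_count, cpi):
    """Exectime = t x InstrCount x CPI."""
    return clock_period_s * instr_count * cpi


def exec_time_from_ipc(clock_period_s, instr_count, ipc):
    """Exectime = t x InstrCount x (1/IPC)."""
    return clock_period_s * instr_count * (1.0 / ipc)


# Hypothetical numbers: 1 ns clock period, 1,000 instructions.
t = 1e-9
n = 1000

# A CPI of 0.5 and an IPC of 2 describe the same machine.
print(exec_time_from_cpi(t, n, 0.5))
print(exec_time_from_ipc(t, n, 2))
```

Both calls describe a CPU that completes two instructions per cycle, so they should agree up to floating-point rounding.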
Simple Case - Single-Cycle CPU:
The simplest CPU is a single-cycle CPU. You feed an instruction in, and within one clock cycle the result is produced. Think of going to the laundromat to get your clothes clean. In a single-cycle CPU, you take over the washing machine (30 minutes), the dryer (40 minutes) and the folding table (20 minutes) all at once. No one can use the washing machine until you have completed your entire task. In a single-cycle CPU, each instruction takes one cycle no matter which instruction is executing, so the clock period must be long enough for the slowest instruction, and that instruction sets the clock speed. The logic in a single-cycle CPU is easy: the design takes a largely combinational approach, and there is no need to worry about data dependencies or hazards. Today's single-cycle CPUs are typically called microcontrollers.
Since a single-cycle CPU effectively locks out the entire datapath for each instruction, it is considered to waste resources. Going back to the example, once you finished your washing machine load and moved on to the dryer, someone else could start using the washing machine right away. That is the idea behind a multi-cycle or pipelined CPU. Most (if not all) server and desktop CPUs are pipelined, including the x86 CPUs. In the single-cycle version of the example, a person exits the laundromat every 90 minutes (30+40+20), which is the execution time of a single instruction.
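The laundromat numbers can be worked through with a short Python sketch. The dictionary and function names are invented for illustration; the stage times are the ones from the analogy.

```python
# Stage times from the laundromat analogy, in minutes.
stage_minutes = {"wash": 30, "dry": 40, "fold": 20}


def single_cycle_period(stages):
    """One 'cycle' must cover every stage back to back."""
    return sum(stages.values())


def single_cycle_total(stages, customers):
    """Customers cannot overlap, so total time scales linearly."""
    return customers * single_cycle_period(stages)


print(single_cycle_period(stage_minutes))    # 90
print(single_cycle_total(stage_minutes, 3))  # 270
```

Note that during the 40 minutes a customer spends at the dryer, the washing machine and folding table sit idle, which is the wasted resource described above.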
Since CPI is equal to 1, the execution time for a single-cycle CPU is simply t x InstrCount. Let's say that a typical benchmark is 1,000,000,000 instructions and the clock speed of this particular CPU is 200 MHz, which gives a clock period of t = 5 ns. This CPU would take 5 seconds to execute the benchmark.
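The benchmark arithmetic can be checked with a few lines of Python. The function name is an assumption made for this sketch; it works from the clock frequency, using the fact that the clock period t is 1 divided by the frequency.

```python
def single_cycle_exec_time(instr_count, clock_hz, cpi=1):
    """Exectime = t x InstrCount x CPI, with t = 1 / clock_hz."""
    return instr_count * cpi / clock_hz


# 1,000,000,000 instructions on a 200 MHz single-cycle CPU (CPI = 1).
seconds = single_cycle_exec_time(1_000_000_000, 200_000_000)
print(seconds)  # 5.0
```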