What happens when we increase cache memory size?
How does it impact my system performance?
Does increasing cache size/RAM always improve my system performance?
The answers to these questions are not straightforward: they depend on several factors, such as the type of application we run and the processor architecture. Trying out different hardware combinations (in this case, cache memory sizes) on actual systems is expensive and often impractical. In such cases, we can use computer architecture simulators to analyze various aspects of computer systems.
A computer architectural simulator helps users to:
- Perform detailed analysis of hardware components like CPUs, caches, memory, and interconnects by emulating them.
- Tweak system parameters (e.g., cache sizes, clock speeds, pipeline stages) to study their impact on performance.
- Execute real or synthetic workloads to measure system performance, energy consumption, and other metrics.
- Explore unconventional architectures or emerging technologies.
- Track system behavior and issues using debugging tools.
Architectural simulators are commonly grouped into the following types:
- Functional Simulators: Focus on the correctness and behavior of the system rather than performance.
- Cycle-Accurate Simulators: Model the timing of every hardware component cycle-by-cycle.
- Full-System Simulators: Simulate an entire system, including processors, peripherals, memory, and OS.
- Trace-Driven Simulators: Use traces (logs) of past executions to replay system behavior and analyze performance.
- Event-Driven Simulators: Focus on discrete events (e.g., cache misses, branch predictions) and simulate their effects.
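To make the trace-driven idea concrete, here is a toy sketch in Python. It replays a synthetic address trace through a fully associative LRU cache and reports the miss rate for a few cache sizes. The workload (uniform random accesses over 256 blocks) and the function names are my own assumptions for illustration; real simulators model far more detail than this.

```python
from collections import OrderedDict
import random


def lru_miss_rate(trace, num_lines):
    """Replay a trace of block addresses through a fully associative LRU cache."""
    cache = OrderedDict()  # keys are block addresses, ordered oldest -> newest
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace)


def make_trace(working_set, length, seed=0):
    """Hypothetical synthetic workload: uniform random accesses over a working set."""
    rng = random.Random(seed)
    return [rng.randrange(working_set) for _ in range(length)]


if __name__ == "__main__":
    trace = make_trace(working_set=256, length=20_000)
    for num_lines in (32, 64, 128, 256, 512):
        rate = lru_miss_rate(trace, num_lines)
        print(f"{num_lines:4d} lines -> miss rate {rate:.3f}")
```

With this workload the miss rate falls as the cache grows, but only until the cache can hold the whole working set. Beyond that point a larger cache makes no difference, which is one reason the answer to "does a bigger cache always help?" depends on the application.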
Some of the popular simulators are:
- SimpleScalar: A widely used simulator for basic processor modeling, known for its simplicity and ease of use.
- Sniper: A fast and accurate x86 multi-core simulator optimized for performance estimation.
- MARSSx86: A full-system x86 simulator focused on detailed modeling of CPUs and memory systems.
- QEMU: Primarily an emulator but often used for system-level simulations due to its speed.
- ZSim: A simulator aimed at high-speed simulation of large-scale multi-core systems.
- Simics: A commercial simulator that provides detailed full-system modeling with support for various ISAs.
- Synopsys Platform Architect Ultra: A commercial tool for system-level performance and power modeling, focused on architecture exploration and optimization of SoC designs. It is widely used in industry for early-stage design and analysis.
- SPARTA: A high-performance, modular, event-driven framework for detailed microarchitecture modeling. It is particularly useful for cycle-accurate simulations and fine-grained performance analysis.
- gem5: A flexible, modular, and open-source simulator capable of detailed modeling of CPUs, GPUs, memory systems, and full-system simulations, supporting multiple ISAs such as ARM, x86, RISC-V, and MIPS.
In this blog, we will discuss gem5 further, as it stands out among these simulators due to several key advantages:
- Modularity and Flexibility: gem5’s object-oriented design and modular components make it highly extensible for custom hardware-software co-design research.
- Comprehensive Support: It supports multiple ISAs, including ARM, x86, RISC-V, MIPS, and SPARC, making it versatile for cross-architecture studies.
- Detailed Modeling: gem5 provides accurate microarchitectural details for CPUs, GPUs, and memory subsystems, which is crucial for in-depth research.
- Full-System Simulation: It allows full-system simulation, enabling the study of hardware and software interactions, unlike some simulators that are limited to user-level simulations.
- Active Community: gem5 has a large, active community contributing to its development, ensuring continuous improvement, bug fixes, and support.
- Open Source: As an open-source tool, it is freely available and can be modified to meet specific research needs, unlike commercial options such as Simics.
Introduction to gem5
gem5 is a state-of-the-art open-source computer architecture simulator widely used in academia and industry for modeling and evaluating computer systems. It provides a flexible and modular framework for simulating diverse architectures, from simple single-core systems to complex multi-core and heterogeneous setups, and is primarily used for research and development in computer architecture, system software, and hardware-software co-design.
gem5 is written primarily in C++ and Python. It can run in two modes: full-system mode (FS mode), which simulates a system with devices and an operating system, and syscall emulation mode (SE mode), which runs user-space-only programs with system services provided directly by the simulator. gem5 supports executing Alpha, ARM, MIPS, Power, SPARC, RISC-V, and 64-bit x86 binaries on CPU models including two simple single-CPI models, an out-of-order model, and an in-order pipelined model. It can also run precompiled binaries for performance evaluation.
Memory models
gem5 provides two memory models for simulating memory systems: Classic and Ruby. The table below summarizes their key features.
Feature | Classic Model | Ruby Model |
---|---|---|
Cache Coherence Protocols | Predefined (MOESI, MESI) | Fully customizable |
Ease of Use | Simple to configure and use | Complex, requires expertise |
Simulation Speed | Faster | Slower |
Flexibility | Limited | High |
Custom Protocol Support | No | Yes |
Use Case | General-purpose simulations | Advanced research and experiments |
Step 1: Install dependencies
Step 2: Clone gem5 repo
Step 3: Build gem5
We can build for any supported ISA; here I am taking RISC-V as an example:
scons build/RISCV/gem5.opt -j9
Experimenting with gem5
Now, let’s try out some experiments with the RISC-V system we just built and analyze its performance.
Let’s start by measuring level 2 (L2) cache misses and IPC (instructions per cycle). We will vary the L2 cache size and observe the impact on the L2 miss rate and IPC.
For this, we need to select an application and build its binary for the required ISA, in this case RISC-V.
I used Canneal from the PARSEC benchmark suite. The binaries need to be built for RISC-V using the RISC-V toolchain.
You can find the source code and the steps to build RISC-V binaries for various applications, including Canneal, at the link below:
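For reference, both metrics are simple ratios. The short Python sketch below spells them out; the counter values in the example are made up purely to show the arithmetic and are not measurements.

```python
def ipc(instructions, cycles):
    """Instructions per cycle: committed instructions divided by elapsed CPU cycles."""
    return instructions / cycles


def miss_rate(misses, accesses):
    """Fraction of cache accesses that missed; the hit rate is 1 minus this value."""
    return misses / accesses


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only
    print("IPC:", ipc(1_500_000, 2_000_000))
    print("L2 miss rate:", miss_rate(200, 1_000))
    print("L2 hit rate:", 1 - miss_rate(200, 1_000))
```

A higher IPC means the CPU is stalling less, so a lower L2 miss rate usually (but not always) shows up as a higher IPC.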
https://github.com/RALC88/riscv-vectorized-benchmark-suite
After building the benchmark binaries, run them with different cache sizes; here, I am varying the L2 cache size. To run the Canneal benchmark with an L2 cache size of 512 KB, use the command below:
./build/RISCV/gem5.opt configs/deprecated/example/se.py --cmd=/home/siva/gem5/canneal_serial.exe --options="1 15000 2000 input_can/200000.nets 64" --caches --l2cache --l2_size=512kB --cpu-type=RiscvO3CPU
The available command-line arguments are defined in the gem5/configs/common/Options.py file.
Once the simulation ends, you can check the stats file (gem5/m5out/stats.txt) for the required parameters. I have given the values of the L2 hit rate and IPC for different L2 cache sizes as a reference. The results demonstrate the impact of cache size on IPC and hit rate for a given application.
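To repeat the run for several L2 sizes, the command line can be generated by a small script. The following is a minimal sketch, not part of gem5 itself: the list of sizes and the per-run output directory names are my own choices, and it relies on gem5's --outdir option, which must appear before the config script on the command line.

```python
import shlex

GEM5_BINARY = "./build/RISCV/gem5.opt"
CONFIG_SCRIPT = "configs/deprecated/example/se.py"
BENCHMARK = "/home/siva/gem5/canneal_serial.exe"
BENCHMARK_ARGS = "1 15000 2000 input_can/200000.nets 64"


def build_command(l2_size):
    """Return the gem5 argument list for one run with the given L2 size."""
    outdir = f"m5out_l2_{l2_size}"  # hypothetical naming scheme, one directory per run
    return [
        GEM5_BINARY,
        f"--outdir={outdir}",  # gem5 option, so it goes before the config script
        CONFIG_SCRIPT,
        f"--cmd={BENCHMARK}",
        f"--options={BENCHMARK_ARGS}",
        "--caches",
        "--l2cache",
        f"--l2_size={l2_size}",
        "--cpu-type=RiscvO3CPU",
    ]


if __name__ == "__main__":
    # Example sweep; the sizes are an assumption, pick whatever range you want to study
    for size in ("256kB", "512kB", "1MB", "2MB"):
        print(shlex.join(build_command(size)))
        # To launch the run instead of printing it:
        # subprocess.run(build_command(size), check=True)
```

Because each run writes to its own output directory, the stats file for a given size ends up in that directory rather than in the default m5out.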
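Reading the numbers by hand gets tedious across many runs, so here is a small sketch of a stats.txt parser in Python. The stat names I look up (system.cpu.ipc and system.l2.overallMissRate::total) are assumptions: the exact names differ between gem5 versions and configurations, so check your own stats.txt first. The sample text in the script is invented to show the file layout and is not a real result.

```python
def parse_stats(text):
    """Map stat name -> float for every 'name value ...' line in a stats dump."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines such as '---------- Begin ...'
    return stats


# Made-up sample in the layout of stats.txt; the values are NOT real measurements
SAMPLE = """
---------- Begin Simulation Statistics ----------
simSeconds                          0.012345    # Number of seconds simulated (Second)
system.cpu.ipc                      0.750000    # IPC: Instructions Per Cycle
system.l2.overallMissRate::total    0.250000    # miss rate for overall accesses
---------- End Simulation Statistics   ----------
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    # Assumed stat names; adjust them to match your gem5 version
    print("IPC:", stats["system.cpu.ipc"])
    print("L2 miss rate:", stats["system.l2.overallMissRate::total"])
```

For a real run, pass the file contents instead of SAMPLE, for example parse_stats(open("m5out/stats.txt").read()). If the file contains more than one statistics dump, this sketch keeps the last value seen for each name.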