Chip Design · Microarchitecture

RISC-V RV32I Processor

Three-stage pipelined RV32I core with operand forwarding and direct-mapped caches, synthesized and placed-and-routed in SkyWater 130nm.

Status
Completed December 2025 · UC Berkeley EECS 251A · Solo
Stack
Verilog · Berkeley Hammer / Chipyard flow · Cadence Genus (synthesis) · Cadence Innovus (place & route) · Calibre (DRC / LVS) · SkyWater 130nm PDK · riscv-tests (verification)

Overview

A pipelined RISC-V processor implementing the RV32I base ISA, written in Verilog and pushed through synthesis and place-and-route in SkyWater 130nm. The core uses a three-stage pipeline: fetch (S1), decode/execute (S2), and memory/writeback (S3), fronted by direct-mapped instruction and data caches built on sky130 SRAM macros and connected to a 128-bit main-memory interface through an arbiter.

The three-stage organization is a deliberate middle ground for this target. It is deep enough, unlike a single-cycle core, to close timing at the course’s clock constraint, yet shallow enough that the only hazard machinery needed is a single forwarding path plus memory stalls, rather than a full multi-stage hazard unit.

What I did

Specs

| Item | Value |
| --- | --- |
| ISA | RV32I base + machine CSRs |
| Pipeline | 3 stages, IF / ID-EX / MEM-WB |
| Hazard handling | S3 → S2 forwarding (operands + store data); memory stalls |
| Branch resolution | In S2 (low branch penalty) |
| Caches | Direct-mapped I/D (two-way variant implemented), sky130 SRAM |
| Clock constraint | 9.0 ns (≈111 MHz target), sky130, 1.8 V |
| Flow | RTL → Genus (synth) → Innovus (P&R) → Calibre (DRC/LVS) |
| Verification | riscv-tests ISA suite + benchmark programs |

Approach and key decisions

Forwarding over stalling. With only three stages, the single read-after-write hazard worth optimizing is the producer-in-S3 / consumer-in-S2 case. A dedicated forwarding decoder handles it combinationally for ALU operands and store data, so back-to-back dependent instructions don’t bubble. The decoder explicitly excludes x0 and non-writing opcodes. Missing either case is an easy correctness bug in forwarding logic: x0 must always read as zero, and stores and branches reuse the rd bit field for immediate bits.
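The forwarding decision described above can be sketched as a small behavioral model. This is an illustrative Python model, not the core's Verilog: the function names (`field`, `writes_rd`, `forward_selects`) are hypothetical, and it assumes the decision is made from the raw instruction words held in S2 and S3. It also does not check whether the consumer actually reads rs1 or rs2; forwarding into an unused operand is assumed to be harmless.

```python
# Behavioral model of an S3 -> S2 forwarding decision (illustrative only).
# All names are hypothetical; the real core implements this in Verilog.

# RV32I major opcodes (instruction bits [6:0]) that always write rd.
WRITING_OPCODES = {
    0b0110111,  # LUI
    0b0010111,  # AUIPC
    0b1101111,  # JAL
    0b1100111,  # JALR
    0b0000011,  # loads
    0b0010011,  # OP-IMM
    0b0110011,  # OP
}
SYSTEM = 0b1110011  # CSR instructions (funct3 != 0) write rd; ECALL/EBREAK do not


def field(instr, hi, lo):
    """Extract instruction bits [hi:lo]."""
    return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)


def writes_rd(instr):
    """True if the instruction writes a destination register."""
    opcode = field(instr, 6, 0)
    if opcode == SYSTEM:
        return field(instr, 14, 12) != 0
    return opcode in WRITING_OPCODES


def forward_selects(s2_instr, s3_instr):
    """Return (fwd_rs1, fwd_rs2): forward the S3 result into each S2 operand?"""
    rd = field(s3_instr, 11, 7)
    # Stores and branches reuse bits [11:7] for an immediate, so the opcode
    # check is required; x0 is hardwired to zero and must never be forwarded.
    producer = writes_rd(s3_instr) and rd != 0
    rs1 = field(s2_instr, 19, 15)
    rs2 = field(s2_instr, 24, 20)
    return (producer and rs1 == rd, producer and rs2 == rd)
```

For example, with `add x1, x2, x3` (0x003100B3) in S3 and `add x4, x1, x5` (0x00508233) in S2, only the rs1 operand is forwarded. With `sw x1, 4(x2)` (0x00112223) in S3, nothing is forwarded even though its bits [11:7] equal 4, because a store does not write a register.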

Branch resolution in S2. Resolving the branch condition and computing the target in S2 (decode/execute) keeps the misprediction/redirect penalty to a single pipeline stage, which matters for the branch-heavy benchmarks in the test suite.
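As a rough illustration of what gets resolved in S2, the sketch below models the RV32I branch comparison and next-PC selection in Python. The funct3 encodings come from the RISC-V base ISA; the function names and the idea of passing a pre-decoded, sign-extended immediate are assumptions made for the example, not a description of the core's actual decoder structure.

```python
# Behavioral model of RV32I branch resolution (illustrative only).
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(funct3, rs1_val, rs2_val):
    """Evaluate the branch condition selected by funct3."""
    a_u, b_u = rs1_val & MASK32, rs2_val & MASK32
    a_s, b_s = to_signed(a_u), to_signed(b_u)
    conditions = {
        0b000: a_u == b_u,  # BEQ
        0b001: a_u != b_u,  # BNE
        0b100: a_s < b_s,   # BLT  (signed)
        0b101: a_s >= b_s,  # BGE  (signed)
        0b110: a_u < b_u,   # BLTU (unsigned)
        0b111: a_u >= b_u,  # BGEU (unsigned)
    }
    return conditions[funct3]  # raises KeyError for the reserved encodings


def next_pc(pc, funct3, rs1_val, rs2_val, imm):
    """Select the branch target (pc + imm) if taken, else pc + 4."""
    if branch_taken(funct3, rs1_val, rs2_val):
        return (pc + imm) & MASK32
    return (pc + 4) & MASK32
```

The signed and unsigned comparisons differ only in how the operands are interpreted: 0xFFFFFFFF is less than 1 for BLT (it is -1) but not for BLTU.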

Caches and SRAM-macro placement. The most interesting physical-design wrinkle was the memory system. The direct-mapped caches map onto banked sky130 SRAM macros, which have to be explicitly floorplanned (placement constraints in the PAR config) rather than left to automatic placement. This is the same physical-design discipline a real block requires.
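To make the direct-mapped organization concrete, here is a minimal Python sketch of the address split and hit check. The geometry is an assumption: the page does not state the cache size, so `LINE_BYTES = 16` (one 128-bit memory beat per line) and `NUM_LINES = 64` are hypothetical values chosen only for illustration, as are the class and function names.

```python
# Behavioral model of a direct-mapped cache lookup (illustrative only).
from typing import NamedTuple

LINE_BYTES = 16  # ASSUMED: one 128-bit memory beat per cache line
NUM_LINES = 64   # ASSUMED: hypothetical number of lines

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(LINE_BYTES) for powers of two
INDEX_BITS = NUM_LINES.bit_length() - 1    # log2(NUM_LINES) for powers of two


class CacheAddr(NamedTuple):
    tag: int
    index: int
    offset: int


def split_address(addr):
    """Split a 32-bit byte address into (tag, index, offset)."""
    addr &= 0xFFFFFFFF
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return CacheAddr(tag, index, offset)


class DirectMappedCache:
    """Tracks only valid bits and tags; data storage is omitted."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        a = split_address(addr)
        hit = self.valid[a.index] and self.tags[a.index] == a.tag
        if not hit:
            self.valid[a.index] = True
            self.tags[a.index] = a.tag
        return hit
```

Because each index holds exactly one line, two addresses that share an index but differ in tag evict each other. That conflict behavior is what a two-way variant is meant to reduce.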

Figures

Microarchitecture

Full datapath across the three pipeline stages, drawn with Instruction Fetch, Instruction Decode, ALU Execute, Memory Access, and Write Back labels. It shows the PC, IMEM, register file, DMEM, the CSR path, the pipeline registers (blue), and the data-forwarding paths (green).

Control path: one decoder per signal group (ImmDec, BranchDec, ABDdec, ALUDec, CSRdec, PCDec, MemDec, WBDec, RegDec), split across Stage 2 and Stage 3, plus the forwarding-logic comparator that compares the current and previous instructions and drives hazard control and data forwarding.

Memory-access stage (MEMX): the interface between the datapath, control, and the external memory model, feeding the IMEM and DMEM blocks.

Physical implementation

Post-place-and-route layout of the core in SkyWater 130nm, showing dense multi-layer routing across the die (not DRC clean).