RISC-V RV32I Processor

Overview

A pipelined RISC-V processor implementing the RV32I base ISA, written in Verilog and pushed through synthesis and place-and-route in SkyWater 130nm. The core uses a three-stage pipeline: fetch (S1), decode/execute (S2), and memory/writeback (S3), fronted by direct-mapped instruction and data caches built on sky130 SRAM macros and connected to a 128-bit main-memory interface through an arbiter.

The three-stage organization is the deliberate middle ground for this target: deeper than a single-cycle core (so it can close timing at the course’s clock constraint) but shallow enough that the only hazard machinery needed is a single forwarding path and memory stalls, rather than a full multi-stage hazard unit.

What I did

Pipeline datapath: partitioned the datapath into three stages with explicit pipeline registers (PC, instruction, ALU result, store data) carrying state from S1 → S2 → S3.
Control path: built out the decode logic as separate, testable decoders: ALU-op, immediate, branch, memory, CSR, register-write, PC-select, and a dedicated forwarding decoder.
Hazard resolution: implemented S3 → S2 forwarding for both ALU operands (forward_a / forward_b) and store data (forward_data), with x0 and write-enable guards so only genuine register-write instructions forward; multi-cycle memory accesses stall the pipeline via the cache/arbiter handshake.
Branch & jump: branch comparison (BrEq / BrLT, signed or unsigned via BrUn) is resolved in S2 and drives the PC-select mux, covering conditional branches, JAL, and JALR while keeping the redirect penalty to a single stage.
ISA coverage: full RV32I (R/I-type arithmetic, LOAD / STORE, BRANCH, LUI, AUIPC, JAL, JALR) plus the machine CSRs (tohost, hartid, status) needed by the test harness.
Memory system: direct-mapped I- and D-caches (a two-way set-associative variant is also implemented) over the banked sky130 SRAM macros.
Physical implementation: drove synthesis (Genus) and place & route (Innovus) through Berkeley Hammer under a 9.0 ns clock constraint with explicit SRAM-macro placement, then ran Calibre DRC/LVS and Voltus power analysis. The block routed, but DRC violations remained, so the layout did not reach a clean signoff.
Verification: ran the riscv-tests assembly ISA suite and benchmarks (dhrystone, median, mm, multiply, qsort, rsort, spmv, towers, vvadd), plus directed ALU test-vector and PC testbenches.
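As an illustration of what the immediate decoder in the control path has to compute, here is a minimal behavioral sketch in Python. It is not the project's Verilog: the function names `sign_extend` and `decode_imm` are hypothetical, and only the instruction-format bit layouts (taken from the RV32I base ISA) are assumed.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_imm(inst):
    """Return the sign-extended immediate of a 32-bit RV32I instruction word.

    Hypothetical helper for illustration; opcodes without an immediate
    (for example R-type) return 0.
    """
    inst &= 0xFFFFFFFF
    opcode = inst & 0x7F
    if opcode in (0x13, 0x03, 0x67):      # I-type: OP-IMM, LOAD, JALR
        return sign_extend(inst >> 20, 12)
    if opcode == 0x23:                    # S-type: STORE
        imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode == 0x63:                    # B-type: BRANCH (bit 0 is always 0)
        imm = ((((inst >> 31) & 0x1) << 12)
               | (((inst >> 7) & 0x1) << 11)
               | (((inst >> 25) & 0x3F) << 5)
               | (((inst >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if opcode in (0x37, 0x17):            # U-type: LUI, AUIPC
        return sign_extend(inst & 0xFFFFF000, 32)
    if opcode == 0x6F:                    # J-type: JAL (bit 0 is always 0)
        imm = ((((inst >> 31) & 0x1) << 20)
               | (((inst >> 12) & 0xFF) << 12)
               | (((inst >> 20) & 0x1) << 11)
               | (((inst >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    return 0
```

For example, `decode_imm(0xFFF00093)` (the encoding of `addi x1, x0, -1`) yields -1, and `decode_imm(0xFE000EE3)` (`beq x0, x0, -4`) yields -4.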

Specs

Item	Value
ISA	RV32I base + machine CSRs
Pipeline	3 stages, IF / ID-EX / MEM-WB
Hazard handling	S3 → S2 forwarding (operands + store data); memory stalls
Branch resolution	In S2 (single-stage redirect penalty)
Caches	Direct-mapped I/D (two-way variant implemented), sky130 SRAM
Clock constraint	9.0 ns (≈111 MHz target)
Process	sky130, 1.8 V
Flow	RTL → Genus (synth) → Innovus (P&R) → Calibre (DRC/LVS), Voltus (power)
Verification	riscv-tests ISA suite + benchmark programs

Approach and key decisions

Forwarding over stalling. With only three stages, the single read-after-write hazard worth optimizing is the producer-in-S3 / consumer-in-S2 case. A dedicated forwarding decoder handles it combinationally for ALU operands and store data, so back-to-back dependent instructions don’t bubble. The easy correctness trap in forwarding logic is forwarding from x0 or from an instruction that never writes a register, so the decoder explicitly excludes both.
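The forwarding decision can be sketched as a small behavioral model in Python. This is a sketch under stated assumptions, not the project's Verilog: the signal names forward_a, forward_b, and forward_data come from the description above, but their exact meaning here (rs1 match, rs2 match, and rs2 match on a store) is an assumption, and the `S3Writer` record and `forward_selects` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class S3Writer:
    """What the forwarding logic needs to know about the instruction in S3."""
    rd: int          # destination register index, 0..31
    reg_write: bool  # True only if the instruction actually writes rd


def forward_selects(rs1, rs2, is_store, s3):
    """Return (forward_a, forward_b, forward_data) for the instruction in S2.

    Assumed semantics: forward_a / forward_b select the S3 result for the
    rs1 / rs2 operand, and forward_data selects it for store data.
    """
    # Guards: x0 is hard-wired to zero, and non-writing instructions
    # (stores, branches) must never act as producers.
    producer_valid = s3.reg_write and s3.rd != 0
    forward_a = producer_valid and s3.rd == rs1
    forward_b = producer_valid and s3.rd == rs2
    forward_data = is_store and forward_b
    return forward_a, forward_b, forward_data
```

With these guards, a producer whose `rd` field is 0, or one with `reg_write` deasserted, yields `(False, False, False)` even when the register indices match.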

Branch resolution in S2. Resolving the branch condition and computing the target in the execute stage keeps the misprediction/redirect penalty to a single stage, which matters for the branch-heavy benchmarks in the test suite.
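A minimal behavioral sketch in Python of the comparison and PC selection described here. The names BrEq, BrLT, and BrUn come from the text; the functions `branch_compare`, `branch_taken`, and `next_pc` are hypothetical, the funct3 and opcode encodings are the standard RV32I ones, and the model returns the architectural next PC rather than modeling the pipeline registers.

```python
MASK32 = 0xFFFFFFFF

# RV32I funct3 encodings for conditional branches.
BEQ, BNE, BLT, BGE, BLTU, BGEU = 0b000, 0b001, 0b100, 0b101, 0b110, 0b111
# RV32I major opcodes used for PC selection.
OP_BRANCH, OP_JALR, OP_JAL = 0x63, 0x67, 0x6F


def to_signed(x):
    """Reinterpret a 32-bit value as two's-complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_compare(rs1_val, rs2_val, br_un):
    """Return (BrEq, BrLT); BrUn selects unsigned comparison."""
    a, b = rs1_val & MASK32, rs2_val & MASK32
    br_eq = a == b
    br_lt = a < b if br_un else to_signed(a) < to_signed(b)
    return br_eq, br_lt


def branch_taken(funct3, rs1_val, rs2_val):
    br_un = funct3 in (BLTU, BGEU)
    br_eq, br_lt = branch_compare(rs1_val, rs2_val, br_un)
    return {BEQ: br_eq, BNE: not br_eq,
            BLT: br_lt, BGE: not br_lt,
            BLTU: br_lt, BGEU: not br_lt}[funct3]


def next_pc(pc, opcode, funct3, rs1_val, rs2_val, imm):
    """Select the next PC for the instruction in S2 (imm already sign-extended)."""
    if opcode == OP_JAL:
        return (pc + imm) & MASK32
    if opcode == OP_JALR:
        return (rs1_val + imm) & ~1 & MASK32   # JALR clears bit 0 of the target
    if opcode == OP_BRANCH and branch_taken(funct3, rs1_val, rs2_val):
        return (pc + imm) & MASK32
    return (pc + 4) & MASK32
```

The signed/unsigned split matters for operands such as 0xFFFFFFFF versus 1: BLT treats the first as -1 and takes the branch, while BLTU treats it as the largest unsigned value and falls through.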

Caches and SRAM-macro placement. The most interesting physical-design wrinkle was the memory system: the direct-mapped caches map onto banked sky130 SRAM macros, which have to be explicitly floorplanned (placement constraints in the PAR config) rather than left to automatic placement. That is the same physical-design discipline a real block requires.
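To make the direct-mapped organization concrete, here is a minimal behavioral sketch in Python of how an address splits into tag, index, and offset, and how a lookup hits or misses. The geometry is an assumption chosen for illustration (16-byte lines, matching one beat of a 128-bit memory interface, and 64 lines); the project's actual cache parameters are not stated here, and `split_address` and `DirectMappedCache` are hypothetical names.

```python
LINE_BYTES = 16   # assumption: one cache line per 128-bit memory beat
NUM_LINES = 64    # assumption: illustrative size only

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2(16) = 4
INDEX_BITS = NUM_LINES.bit_length() - 1     # log2(64) = 6


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tag/valid bookkeeping only; the data array is omitted."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, refill the line and return False."""
        tag, index, _ = split_address(addr & 0xFFFFFFFF)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # In hardware this is where the pipeline would stall while
            # the line is fetched from main memory.
            self.valid[index] = True
            self.tags[index] = tag
        return hit
```

Because each index holds exactly one line, two addresses that share an index but differ in tag evict each other, which is the conflict behavior a two-way set-associative variant reduces.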

Figures

Microarchitecture

Figure: Full datapath across the three pipeline stages (diagram labels: Instruction Fetch, Instruction Decode, ALU Execute, Memory Access, Write Back), showing PC, IMEM, register file, DMEM, the CSR path, pipeline registers (blue), and the data-forwarding paths (green).

Figure: Control path with one decoder per signal group (ImmDec, BranchDec, ABDdec, ALUDec, CSRdec, PCDec, MemDec, WBDec, RegDec), split across Stage 2 and Stage 3, plus the forwarding-logic comparator that compares the current and previous instructions to drive hazard control and data forwarding.

Figure: Memory-access stage (MEMX) block diagram: the interfaces from the datapath, from control, to the datapath, and to/from the external memory model, alongside the IMEM and DMEM blocks it feeds.

Physical implementation

Figure: Post-place-and-route layout of the core in SkyWater 130nm, showing dense multi-layer routing across the die (not DRC clean).