System architecture¶
Ventium is organised in two layers. The core (ventium_top wrapping
core.sv) is the P5/P54C microprocessor itself — the in-order, dual-issue
integer pipeline, the x87 floating-point unit, the L1 caches and TLB, and the
64-bit bus interface. The SoC (ventium_soc.sv) wraps that core with the
PC-platform peripherals — interrupt controller, timers, keyboard, VGA, IDE and
so on — so that bare-metal images and the external CPU tester boot against a
period-correct machine.
Both layers are verified the same way: a program runs on the RTL and on a QEMU reference, and the architectural (and, in cycle mode, per-instruction retire) state is diffed for bit-identity (see References).
The core microarchitecture¶
The core implements the classic Pentium five-stage, in-order, dual-issue
pipeline. The diagram below follows the layout of the Pentium Processor manual’s
block diagram (Fig. 1) — instructions flow top to bottom from the instruction
cache down through decode, the control unit and the execution units, with the
Bus Unit and Page Unit on the left and the memory subsystem at the bottom. Each
block is annotated with the RTL file that implements it (per
rtl/README.md).
PF — front end. The L1 instruction cache (icache.sv, 8 KB / 2-way /
32-byte lines) and the branch target buffer with its two-bit predictors
(bpred_btb.sv) feed a prefetch / instruction buffer. A correctly predicted
branch costs no bubble; a misprediction is resolved in the execute stage.
D1 — decode & pairing. Two fast-path decoders (decode.sv) crack the
variable-length x86 stream into the internal fpd_t micro-op form. The U/V
pairing checker (issue_uv.sv) applies the AP-500 rules — both ops
“simple”, no register dependency, no displacement+immediate conflict, prefix and
branch-position restrictions — to decide whether the pair may dual-issue (up to
2 instructions per clock) or must issue singly in the U pipe (see the
Instruction Catalog for the per-instruction U/V class).
D2 — operands & AGU. The GPR register file is read (with partial-register handling and the bypass network), and the address-generation unit forms effective addresses for memory operands. A result feeding an address one clock later triggers the AGI interlock.
EX — execute. The U pipe runs the full ALU / shifter / multiply-divide /
branch-resolve datapath; the V pipe runs the simple ALU / shift / branch
subset. The x87 FPU (fpu_x87_pkg / fpu_top) operates on 80-bit
floatx80 values and hosts the optional genuine radix-4 SRT divider that
reproduces the FDIV bug (see The r4 SRT divider and the FDIV bug). The dual-ported L1
D-cache (dcache_timing.sv, banked, MESI) services loads and stores from
either pipe; the TLB (tlb.sv) supplies the physical address.
WB — writeback. Results commit to the register file and EFLAGS; the full EX→EX and WB→EX bypass network lets a dependent chain of simple ops sustain one result per clock.
Memory & bus. The bus interface unit (biu / the SVA-verified
biu_p5.sv) drives the 64-bit P5 bus, filling the I- and D-caches and draining
writebacks.
Note
The file label on each block is the RTL unit that implements it. The R2
refactor extracted behaviour-preserving leaf modules from the core.sv
spine: bpred_btb.sv (BTB arrays + predictor), icache.sv (I-cache
arrays + fill + LRU), tlb.sv (split I/D TLB arrays + lookup), and
dcache_timing.sv (D-cache timing model — no data array; load data still
comes from the bus), alongside decode.sv / issue_uv.sv (decode +
pairing), the x87 state file fpu_top.sv, and the pure-function packages
ventium_alu_pkg.sv / fpu_x87_pkg.sv. The spine core.sv still runs
the pipeline FSM, the prefetch path, the slow-path microsequencer, the
page-table walk, the execution datapath and the FPU scoreboard — which is why
Prefetch Buffers, Microcode / µseq, the Control Unit, the Integer
Datapath and the Page Unit are labelled core.sv. biu_p5.sv is the
standalone pin-level 64-bit P5 bus FSM (biu.sv is its default-OFF
integration wrapper). See rtl/README.md for the authoritative file list.
The SoC integration¶
ventium_soc.sv instantiates the core with soc_en=1 and wires the
PC-platform peripheral models onto the programmed-I/O (PMIO) bus. Memory traffic
goes over the 64-bit bus through the BIU; IN/OUT to the legacy port map
is decoded to the device models; and device interrupts are funneled through the
8259A PIC to the core’s INTR/INTA handshake.
Peripherals (rtl/soc/ven_*.sv): the 8259A PIC (master + slave,
0x20/0xA0), the 8254 PIT (0x40–0x43), the MC146818 RTC
(0x70/0x71), the 8042 keyboard/mouse controller (0x60/0x64),
the port-92 fast-A20 gate (0x92), the VGA register file
(0x3B0–0x3DF), the ACPI PM timer (0x608), and two IDE/ATA
channels — a primary master disk (0x1F0/0x3F6) and a secondary ATAPI
CD-ROM (0x170/0x376).
Interrupts. Device IRQ lines (PIT → IRQ0, keyboard/mouse → IRQ1/IRQ12, RTC →
IRQ8, IDE → IRQ14/IRQ15) feed the cascaded 8259A pair, which presents a single
INTR to the core and returns the vector over the INTA cycle. Several
lines are wired but held quiescent (e.g. the polled IDE channels run with
nIEN) so they cannot perturb the differential gate.
A20. The port-92 register and the 8042 (port 0x64) commands combine into
the physical A20 address mask applied at the core’s bus boundary.
The SoC track has its own regression aggregate (make verify-soc): each device
model is diffed against qemu-system-i386 over a directed bare-metal test, and
the test386 external CPU tester boots on ventium_soc and is diffed
byte-for-byte against QEMU.