System architecture

Ventium is organised in two layers. The core (ventium_top wrapping core.sv) is the P5/P54C microprocessor itself — the in-order, dual-issue integer pipeline, the x87 floating-point unit, the L1 caches and TLB, and the 64-bit bus interface. The SoC (ventium_soc.sv) wraps that core with the PC-platform peripherals — interrupt controller, timers, keyboard, VGA, IDE and so on — so that bare-metal images and the external CPU tester boot against a period-correct machine.

Both layers are verified the same way: a program runs on the RTL and on a QEMU reference, and the architectural (and, in cycle mode, per-instruction retire) state is diffed for bit-identity (see References).
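The comparison step can be pictured with a small sketch. It is not the project's harness: the trace format (one dictionary of architectural state per retired instruction) and the name `first_divergence` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical trace format: one dict of architectural state per retired
# instruction, e.g. {"eip": 0x1000, "eax": 1, "eflags": 0x2}.
State = Dict[str, int]
Mismatch = Tuple[int, str, Optional[int], Optional[int]]


def first_divergence(rtl: List[State], ref: List[State]) -> Optional[Mismatch]:
    """Return (index, field, rtl_value, ref_value) for the first mismatch, or None."""
    for i, (a, b) in enumerate(zip(rtl, ref)):
        # Compare the union of fields so a field missing on one side is a mismatch.
        for name in sorted(set(a) | set(b)):
            if a.get(name) != b.get(name):
                return (i, name, a.get(name), b.get(name))
    if len(rtl) != len(ref):
        # One side retired more instructions than the other.
        return (min(len(rtl), len(ref)), "<length>", len(rtl), len(ref))
    return None
```

Reporting only the first divergence keeps the output focused on the instruction where the RTL and the reference part ways; everything after that point is usually a consequence of it.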

The core microarchitecture

The core implements the classic Pentium five-stage, in-order, dual-issue pipeline. The diagram below follows the layout of the Pentium Processor manual’s block diagram (Fig. 1) — instructions flow top to bottom from the instruction cache down through decode, the control unit and the execution units, with the Bus Unit and Page Unit on the left and the memory subsystem at the bottom. Each block is annotated with the RTL file that implements it (per rtl/README.md).

// Ventium core — drawn in the style of the Pentium Processor manual block
// diagram (Fig. 1): top-down, square boxes, nested colored units, orthogonal
// connectors, bus-width labels. Each block is annotated with the RTL file that
// implements it (rtl/README.md block map). Rendered by sphinx.ext.graphviz.
digraph ventium_core {
  rankdir=TB;
  splines=ortho;
  graph [fontname="Helvetica", fontsize=13, labelloc=t,
         label="Ventium core — Pentium (P5/P54C) block diagram (block — RTL file)",
         bgcolor="white", nodesep=0.35, ranksep=0.40];
  node [shape=box, style=filled, fontname="Helvetica", fontsize=10,
        color="black", fillcolor="white", penwidth=1.1, margin="0.10,0.06"];
  edge [fontname="Helvetica", fontsize=9, color="black", arrowsize=0.7, penwidth=1.1];

  // ---- external interface (left rail) ----
  busunit [label=<Bus<BR/>Unit<BR/><FONT POINT-SIZE="8">biu.sv / biu_p5.sv</FONT>>, fillcolor="#bfe3f0", width=1.15, height=3.0];
  pageunit [label=<Page<BR/>Unit<BR/><FONT POINT-SIZE="8">core.sv / tlb.sv</FONT>>, fillcolor="#f6d6d6", width=1.2];
  // plain text labels (white fill so SVG never renders a black box)
  databus [shape=plaintext, style=filled, fillcolor=white, label="64-bit\nData Bus"];
  addrbus [shape=plaintext, style=filled, fillcolor=white, label="32-bit\nAddress Bus"];
  ctrlbus [shape=plaintext, style=filled, fillcolor=white, label="Control"];

  // ---- front end ----
  btb [label=<Branch<BR/>Target Buffer<BR/><FONT POINT-SIZE="8">bpred_btb.sv</FONT>>, fillcolor="#cfcfcf", width=1.3];
  itlb [label=<TLB<BR/><FONT POINT-SIZE="8">tlb.sv</FONT>>, fillcolor="#f0c674", width=0.7];
  icache [label=<Instruction Cache<BR/>8 KB, 2-way<BR/><FONT POINT-SIZE="8">icache.sv</FONT>>, fillcolor="#f6a6a6", width=2.1];
  prefetch [label=<Prefetch Buffers<BR/><FONT POINT-SIZE="8">core.sv (spine)</FONT>>, fillcolor="#fbe0bf", width=2.6];
  decode [label=<Instruction Decode<BR/><FONT POINT-SIZE="8">decode.sv / issue_uv.sv</FONT>>, fillcolor="#fbe0bf", width=2.6];
  ucode [label=<Microcode / &#181;seq<BR/><FONT POINT-SIZE="8">core.sv (slow&#8209;path FSM)</FONT>>, fillcolor="#ecd5b8", width=1.7];
  control [label=<Control Unit<BR/><FONT POINT-SIZE="8">core.sv (pipeline FSM)</FONT>>, fillcolor="#f3c07a", width=5.0, height=0.6];

  // ---- integer datapath (nested, manual-style gray container) ----
  intdp [shape=plaintext label=<
    <TABLE BORDER="1" CELLBORDER="1" CELLSPACING="3" CELLPADDING="6" COLOR="black" BGCOLOR="#9fa6a6">
      <TR><TD COLSPAN="2"><B>Integer Datapath</B> &#160;<FONT POINT-SIZE="8">core.sv / ventium_alu_pkg.sv</FONT></TD></TR>
      <TR>
        <TD BGCOLOR="#d9d9d9">Address&#160;Generate<BR/>U&#8209;pipe</TD>
        <TD BGCOLOR="#d9d9d9">Address&#160;Generate<BR/>V&#8209;pipe</TD>
      </TR>
      <TR><TD COLSPAN="2" BGCOLOR="#ef5350"><FONT COLOR="white"><B>Integer Register File</B></FONT></TD></TR>
      <TR>
        <TD BGCOLOR="#f4978e">ALU<BR/>U&#8209;pipe</TD>
        <TD BGCOLOR="#f4978e">ALU<BR/>V&#8209;pipe</TD>
      </TR>
      <TR><TD COLSPAN="2" BGCOLOR="#f8c9a0">Barrel Shifter</TD></TR>
    </TABLE>>];

  // ---- FPU (nested, manual-style red container) ----
  fpu [shape=plaintext label=<
    <TABLE BORDER="1" CELLBORDER="1" CELLSPACING="3" CELLPADDING="6" COLOR="black" BGCOLOR="#ef5350">
      <TR><TD><FONT COLOR="white"><B>Floating&#160;Point&#160;Unit</B></FONT><BR/><FONT COLOR="white" POINT-SIZE="8">fpu_top.sv / fpu_x87_pkg.sv</FONT></TD></TR>
      <TR><TD BGCOLOR="#fde1e1">Control</TD></TR>
      <TR><TD BGCOLOR="#fcdada">FP Register File</TD></TR>
      <TR><TD BGCOLOR="#f5a623">Add</TD></TR>
      <TR><TD BGCOLOR="#f5a623">Divide&#160;(radix&#8209;4 SRT)</TD></TR>
      <TR><TD BGCOLOR="#f5a623">Multiply</TD></TR>
    </TABLE>>];

  // ---- data cache ----
  dcache [shape=plaintext label=<
    <TABLE BORDER="1" CELLBORDER="1" CELLSPACING="0" CELLPADDING="6" COLOR="black" BGCOLOR="#f6a6a6">
      <TR><TD>Dual&#8209;Access Data Cache &#160; 8 KB, 2&#8209;way<BR/><FONT POINT-SIZE="8">dcache_timing.sv</FONT></TD>
          <TD BGCOLOR="#f0c674">TLB<BR/><FONT POINT-SIZE="8">tlb.sv</FONT></TD></TR>
    </TABLE>>];

  // ================= vertical spine (defines the ranks) =================
  icache -> prefetch [label="256", dir=both];
  prefetch -> decode;
  decode -> control;
  control -> intdp;
  control -> fpu;
  intdp -> dcache [label="32", dir=both];

  // ================= same-rank rows (mirror the manual floorplan) =======
  { rank=same; databus; btb; itlb; icache; }
  { rank=same; addrbus; busunit; decode; ucode; }
  { rank=same; ctrlbus; pageunit; intdp; fpu; }
  // left-right ordering within those rows (invisible)
  databus -> btb [style=invis];
  btb -> itlb [style=invis];
  itlb -> icache [dir=both];  // TLB -- Instruction Cache
  addrbus -> busunit [style=invis];
  busunit -> decode [style=invis];
  decode -> ucode [dir=both];  // Instruction Decode -- Microcode ROM
  ctrlbus -> pageunit [style=invis];
  pageunit -> intdp [style=invis];
  intdp -> fpu [style=invis];

  // ================= external bus arrows (off-chip, into Bus Unit) =======
  databus -> busunit [dir=both, constraint=false];
  addrbus -> busunit [dir=both, constraint=false];
  ctrlbus -> busunit [dir=both, constraint=false];

  // ================= non-spine connections (constraint=false) ===========
  btb -> prefetch [label="target\naddress", constraint=false];
  ucode -> control [constraint=false];
  busunit -> icache [xlabel="64", dir=both, constraint=false];
  busunit -> pageunit [dir=both, constraint=false];
  pageunit -> intdp [xlabel="address", dir=both, constraint=false];
  fpu -> dcache [label="80", dir=both, constraint=false];
  dcache -> busunit [xlabel="64", dir=both, constraint=false];
  intdp -> btb [label="branch verif.\n& target addr", style=dashed, constraint=false];
}

PF — front end. The L1 instruction cache (icache.sv, 8 KB / 2-way / 32-byte lines) and the branch target buffer with its two-bit predictors (bpred_btb.sv) feed a prefetch / instruction buffer. A correctly predicted branch costs no bubble; a misprediction is resolved in the execute stage.
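To make "two-bit predictor" concrete, here is a minimal sketch of the textbook two-bit saturating counter. It is an illustration of the general technique only: the state encoding, the initial state and the helper `count_mispredictions` are assumptions, and the state machine actually implemented in bpred_btb.sv may differ.

```python
from typing import Iterable


class TwoBitCounter:
    """Textbook 2-bit saturating counter: states 0-1 predict not-taken, 2-3 taken."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(counter: TwoBitCounter, outcomes: Iterable[bool]) -> int:
    """Run the counter over a sequence of branch outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, a cold counter in this sketch misses three times on the first pass (two while warming up, one at the exit) and once on every later pass, which is the hysteresis the second bit buys.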

D1 — decode & pairing. Two fast-path decoders (decode.sv) crack the variable-length x86 stream into the internal fpd_t micro-op form. The U/V pairing checker (issue_uv.sv) applies the AP-500 rules — both ops “simple”, no register dependency, no displacement+immediate conflict, prefix and branch-position restrictions — to decide whether the pair may dual-issue (up to 2 instructions per clock) or must issue singly in the U pipe (see the Instruction Catalog for the per-instruction U/V class).
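The pairing decision listed above can be sketched as a pure predicate. This is a simplified reading of the rules, not the logic in issue_uv.sv: the `Op` record and its fields are hypothetical, register sets track general-purpose registers only (flags are deliberately left out), and the real checker has exceptions this sketch ignores.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Op:
    """Hypothetical decoded-instruction summary, for illustration only."""
    simple: bool
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()
    has_disp: bool = False
    has_imm: bool = False
    has_prefix: bool = False
    is_branch: bool = False


def can_pair(u: Op, v: Op) -> bool:
    """Return True if u (U pipe) and v (V pipe) may dual-issue in this sketch."""
    if not (u.simple and v.simple):
        return False
    # Register dependency: v reads or rewrites something u writes.
    if u.writes & (v.reads | v.writes):
        return False
    # Assumption: an op carrying both a displacement and an immediate does not pair.
    if (u.has_disp and u.has_imm) or (v.has_disp and v.has_imm):
        return False
    # Assumption: a prefixed op is only allowed in the U pipe.
    if v.has_prefix:
        return False
    # Assumption: a branch may only be the second instruction of a pair.
    if u.is_branch:
        return False
    return True
```

When `can_pair` returns False, the first instruction issues alone in the U pipe and the second becomes the U-pipe candidate on the next clock.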

D2 — operands & AGU. The general-purpose register file is read (with partial-register handling and the bypass network), and the address-generation unit forms effective addresses for memory operands. If an instruction needs a register for address generation that was written in the immediately preceding clock, the address-generation interlock (AGI) stalls it for one clock.
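As an illustration of the AGI condition, the sketch below counts stalls over a straight-line sequence issued one instruction per clock. The `Insn` record is hypothetical, pairing is ignored, and each hit is assumed to cost exactly one clock; the RTL's interlock also has to consider both pipes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Insn:
    """Hypothetical summary: registers written, and registers used to form an address."""
    writes: FrozenSet[str] = frozenset()
    addr_regs: FrozenSet[str] = frozenset()


def count_agi_stalls(seq: Sequence[Insn]) -> int:
    """Count adjacent pairs where the earlier op writes a register the later op
    uses as a base or index (single-issue model, one stall clock per hit)."""
    stalls = 0
    for prev, cur in zip(seq, seq[1:]):
        if prev.writes & cur.addr_regs:
            stalls += 1
    return stalls
```

For example, `mov ebx, 4` followed directly by `mov eax, [ebx]` counts as one stall, while separating the two with an unrelated instruction counts as none.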

EX — execute. The U pipe runs the full ALU / shifter / multiply-divide / branch-resolve datapath; the V pipe runs the simple ALU / shift / branch subset. The x87 FPU (fpu_x87_pkg / fpu_top) operates on 80-bit floatx80 values and hosts the optional genuine radix-4 SRT divider that reproduces the FDIV bug (see The r4 SRT divider and the FDIV bug). The dual-ported L1 D-cache (dcache_timing.sv, banked, MESI) services loads and stores from either pipe; the TLB (tlb.sv) supplies the physical address.

WB — writeback. Results commit to the register file and EFLAGS; the full EX→EX and WB→EX bypass network lets a dependent chain of simple ops sustain one result per clock.

Memory & bus. The bus interface unit (biu / the SVA-verified biu_p5.sv) drives the 64-bit P5 bus, filling the I- and D-caches and draining writebacks.
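The cache figures quoted earlier (8 KB, 2-way, 32-byte lines) and the 64-bit bus fix the basic arithmetic of a line fill. The sketch below works it through; the function names are illustrative, sizes are assumed to be powers of two, and the tag is simply whatever address bits remain above the index.

```python
from typing import NamedTuple, Tuple


class Geometry(NamedTuple):
    sets: int
    offset_bits: int
    index_bits: int


def geometry(size_bytes: int, ways: int, line_bytes: int) -> Geometry:
    """Derive set count and address field widths (power-of-two sizes assumed)."""
    sets = size_bytes // (ways * line_bytes)
    return Geometry(sets, line_bytes.bit_length() - 1, sets.bit_length() - 1)


def split_address(addr: int, g: Geometry) -> Tuple[int, int, int]:
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & ((1 << g.offset_bits) - 1)
    index = (addr >> g.offset_bits) & (g.sets - 1)
    tag = addr >> (g.offset_bits + g.index_bits)
    return tag, index, offset


def fill_beats(line_bytes: int, bus_bytes: int = 8) -> int:
    """Number of bus transfers needed to move one cache line."""
    return line_bytes // bus_bytes


icache = geometry(8 * 1024, ways=2, line_bytes=32)
```

With these numbers the cache has 128 sets, a 5-bit offset and a 7-bit index, and a 32-byte line moves over the 64-bit bus in four transfers.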

Note

The file label on each block is the RTL unit that implements it. The R2 refactor extracted behaviour-preserving leaf modules from the core.sv spine: bpred_btb.sv (BTB arrays + predictor), icache.sv (I-cache arrays + fill + LRU), tlb.sv (split I/D TLB arrays + lookup), and dcache_timing.sv (D-cache timing model — no data array; load data still comes from the bus), alongside decode.sv / issue_uv.sv (decode + pairing), the x87 state file fpu_top.sv, and the pure-function packages ventium_alu_pkg.sv / fpu_x87_pkg.sv. The spine core.sv still runs the pipeline FSM, the prefetch path, the slow-path microsequencer, the page-table walk, the execution datapath and the FPU scoreboard — which is why Prefetch Buffers, Microcode / µseq, the Control Unit, the Integer Datapath and the Page Unit are labelled core.sv. biu_p5.sv is the standalone pin-level 64-bit P5 bus FSM (biu.sv is its default-OFF integration wrapper). See rtl/README.md for the authoritative file list.

The SoC integration

ventium_soc.sv instantiates the core with soc_en=1 and wires the PC-platform peripheral models onto the programmed-I/O (PMIO) bus. Memory traffic goes over the 64-bit bus through the BIU; IN/OUT to the legacy port map is decoded to the device models; and device interrupts are funneled through the 8259A PIC to the core’s INTR/INTA handshake.

// Ventium SoC — drawn in the Pentium-manual block-diagram style (top-down,
// square boxes, orthogonal connectors) to match the core diagram. The core sits
// on the memory bus and the PMIO I/O bus; the ven_* peripheral models tap the
// PMIO bus, and device IRQs funnel through the 8259A PIC to the core.
digraph ventium_soc {
  rankdir=TB;
  splines=ortho;
  graph [fontname="Helvetica", fontsize=13, labelloc=t,
         label="Ventium SoC — core + PC-platform peripherals (ventium_soc.sv)",
         bgcolor="white", nodesep=0.30, ranksep=0.55];
  node [shape=box, style=filled, fontname="Helvetica", fontsize=10,
        color="black", fillcolor="white", penwidth=1.1, margin="0.10,0.07"];
  edge [fontname="Helvetica", fontsize=9, color="black", arrowsize=0.7, penwidth=1.1];

  // ---- core + memory ----
  core [label="Ventium core\n(ventium_top / core.sv, soc_en=1)", fillcolor="#bfe3f0", width=3.2, height=0.8];
  mem [label="System memory\n(64-bit P5 bus via BIU)", fillcolor="#f6a6a6", width=2.2];

  // ---- the PMIO I/O bus (wide bar, like the Control Unit) ----
  pmio [label=<PMIO I/O bus &#160;&#160;(IN / OUT)<BR/><FONT POINT-SIZE="8">port decode in ventium_soc.sv</FONT>>, fillcolor="#f3c07a", width=9.0, height=0.55];

  // ---- peripheral models (rtl/soc/ven_*.sv) ----
  pic [label=<<B>8259A PIC</B> master + slave<BR/>0x20/0x21, 0xA0/0xA1<BR/><FONT POINT-SIZE="8">ven_pic.sv</FONT>>, fillcolor="#ef5350", fontcolor="white"];
  pit [label=<8254 PIT<BR/>0x40-0x43<BR/><FONT POINT-SIZE="8">ven_pit.sv</FONT>>, fillcolor="#fbe0bf"];
  rtc [label=<MC146818 RTC<BR/>0x70/0x71<BR/><FONT POINT-SIZE="8">ven_rtc.sv</FONT>>, fillcolor="#fbe0bf"];
  kbd [label=<8042 kbd/mouse<BR/>0x60/0x64<BR/><FONT POINT-SIZE="8">ven_i8042.sv</FONT>>, fillcolor="#fbe0bf"];
  p92 [label=<Port 92 fast A20<BR/>0x92<BR/><FONT POINT-SIZE="8">ven_port92.sv</FONT>>, fillcolor="#f6d6d6"];
  vga [label=<VGA registers<BR/>0x3B0-0x3DF<BR/><FONT POINT-SIZE="8">ven_vgaregs.sv</FONT>>, fillcolor="#fbe0bf"];
  acpi [label=<ACPI PM timer<BR/>0x608<BR/><FONT POINT-SIZE="8">ven_acpipm.sv</FONT>>, fillcolor="#ecd5b8"];
  ide1 [label=<IDE primary master disk<BR/>0x1F0/0x3F6<BR/><FONT POINT-SIZE="8">ven_ide.sv</FONT>>, fillcolor="#f8c9a0"];
  ide2 [label=<IDE secondary ATAPI CD-ROM<BR/>0x170/0x376<BR/><FONT POINT-SIZE="8">ven_ide.sv</FONT>>, fillcolor="#f8c9a0"];

  // ---- ranks ----
  { rank=same; core; mem; }
  { rank=same; pic; pit; rtc; kbd; p92; vga; acpi; ide1; ide2; }

  // ---- buses ----
  core -> pmio [dir=both, xlabel="IN/OUT"];
  core -> mem [dir=both, xlabel="64-bit memory bus", constraint=false];
  pmio -> pic;
  pmio -> pit;
  pmio -> rtc;
  pmio -> kbd;
  pmio -> p92;
  pmio -> vga;
  pmio -> acpi;
  pmio -> ide1;
  pmio -> ide2;
  // keep the peripheral row in port order (invisible)
  pic -> pit -> rtc -> kbd -> p92 -> vga -> acpi -> ide1 -> ide2 [style=invis];

  // ---- interrupt delivery (8259A) ----
  edge [color="#aa3333", fontcolor="#aa3333", style=dashed, constraint=false, penwidth=1.2];
  pit -> pic [xlabel="IRQ0"];
  kbd -> pic [xlabel="IRQ1/12"];
  rtc -> pic [xlabel="IRQ8"];
  ide1 -> pic [xlabel="IRQ14"];
  ide2 -> pic [xlabel="IRQ15"];
  pic -> core [xlabel="INTR / INTA / vector"];

  // ---- A20 mask ----
  edge [color="#2f7d2f", fontcolor="#2f7d2f", style=dotted, constraint=false, penwidth=1.2];
  p92 -> core [xlabel="A20 mask"];
  kbd -> core [xlabel="A20 (port 64)"];
}

Peripherals (rtl/soc/ven_*.sv): the 8259A PIC (master + slave, 0x20/0x21 and 0xA0/0xA1), the 8254 PIT (0x40-0x43), the MC146818 RTC (0x70/0x71), the 8042 keyboard/mouse controller (0x60/0x64), the port-92 fast-A20 gate (0x92), the VGA register file (0x3B0-0x3DF), the ACPI PM timer (0x608), and two IDE/ATA channels: a primary master disk (0x1F0/0x3F6) and a secondary ATAPI CD-ROM (0x170/0x376).
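The port map above amounts to a range lookup, sketched here in Python. The table mirrors the ports listed in this section; the eight-register span of each IDE command block (for example 0x1F0-0x1F7) follows the standard ATA layout and is an assumption about the decode, as are the single-port entries and the name `decode_port`. The actual decode lives in ventium_soc.sv.

```python
from typing import List, Optional, Tuple

# (first port, last port, device model) -- ranges are inclusive.
PORT_MAP: List[Tuple[int, int, str]] = [
    (0x020, 0x021, "ven_pic"),      # master 8259A
    (0x0A0, 0x0A1, "ven_pic"),      # slave 8259A
    (0x040, 0x043, "ven_pit"),
    (0x060, 0x060, "ven_i8042"),    # data port
    (0x064, 0x064, "ven_i8042"),    # command/status port
    (0x070, 0x071, "ven_rtc"),
    (0x092, 0x092, "ven_port92"),
    (0x170, 0x177, "ven_ide"),      # secondary command block (assumed 8 ports)
    (0x1F0, 0x1F7, "ven_ide"),      # primary command block (assumed 8 ports)
    (0x376, 0x376, "ven_ide"),      # secondary control
    (0x3F6, 0x3F6, "ven_ide"),      # primary control
    (0x3B0, 0x3DF, "ven_vgaregs"),
    (0x608, 0x608, "ven_acpipm"),
]


def decode_port(port: int) -> Optional[str]:
    """Return the device model that claims an I/O port, or None if unmapped."""
    for first, last, device in PORT_MAP:
        if first <= port <= last:
            return device
    return None
```

A port outside every range, such as 0x80, returns None in this sketch; what the SoC does with unclaimed accesses is not covered here.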

Interrupts. Device IRQ lines (PIT → IRQ0, keyboard/mouse → IRQ1/IRQ12, RTC → IRQ8, IDE → IRQ14/IRQ15) feed the cascaded 8259A pair, which presents a single INTR to the core and returns the vector over the INTA cycle. Several lines are wired but held quiescent (e.g. the polled IDE channels run with nIEN set) so they cannot perturb the differential gate.
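The vector returned during INTA follows the usual 8259A rule: the programmed base (ICW2) supplies the upper five bits and the request line supplies the lower three. The sketch below assumes the conventional real-mode bases of 0x08 for the master and 0x70 for the slave; software can reprogram both, and the function name is illustrative.

```python
def irq_vector(irq: int, master_base: int = 0x08, slave_base: int = 0x70) -> int:
    """Map a PC IRQ number (0-15) to the vector a cascaded 8259A pair returns.

    The default bases are the conventional BIOS values, used here as an assumption.
    """
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in 0..15")
    if irq < 8:
        # Master: upper five bits from the base, lower three from the line.
        return (master_base & 0xF8) | irq
    # Slave: lines 8-15 arrive as slave inputs 0-7.
    return (slave_base & 0xF8) | (irq - 8)
```

Under these defaults the PIT on IRQ0 arrives as vector 0x08, the RTC on IRQ8 as 0x70, and the primary IDE channel on IRQ14 as 0x76.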

A20. The port-92 register and the 8042 (port 0x64) commands combine into the physical A20 address mask applied at the core’s bus boundary.
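A minimal sketch of the A20 mask follows. How the two sources combine is an assumption here: the sketch treats A20 as enabled when either the port-92 bit or the 8042-controlled bit is set, which is the common PC arrangement, and the function names are illustrative.

```python
A20_BIT = 1 << 20
ADDR_MASK = 0xFFFFFFFF  # 32-bit physical address


def a20_enabled(port92_a20: bool, kbc_a20: bool) -> bool:
    """Assumption: either source may enable A20 (logical OR)."""
    return port92_a20 or kbc_a20


def apply_a20(addr: int, port92_a20: bool, kbc_a20: bool) -> int:
    """Return the address seen on the bus after the A20 mask is applied."""
    addr &= ADDR_MASK
    if a20_enabled(port92_a20, kbc_a20):
        return addr
    # A20 masked: bit 20 is forced to zero, so 1 MB wraps to address 0.
    return addr & ~A20_BIT
```

With both sources clear, address 0x0010FFEF reaches the bus as 0x0000FFEF, which reproduces the 8086-style wraparound at 1 MB.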

The SoC track has its own regression aggregate (make verify-soc): each device model is diffed against qemu-system-i386 over a directed bare-metal test, and the test386 external CPU tester boots on ventium_soc and is diffed byte-for-byte against QEMU.