# Introduction

## What is tg2hdl?
tg2hdl is a compiler from tinygrad’s IR to synthesizable FPGA hardware. You describe a neural network in tinygrad; tg2hdl compiles it to an Amaranth HDL module that simulates cycle-accurately and can be synthesized to an FPGA.
The compiler operates on tinygrad’s linearized UOps — the same IR tinygrad uses to emit GPU kernels — and maps each op to hardware: memories, combinational arithmetic, and an FSM sequencer.
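The mapping described above can be pictured with a toy lowering pass. This is an illustrative sketch only: the `UOp` dataclass, the op groupings, and the one-op-per-state schedule are simplified stand-ins, not tinygrad's real IR or tg2hdl's actual backend.

```python
# Hypothetical sketch: lower a linearized op list to an FSM schedule,
# assigning each op to a hardware resource class. Not tg2hdl's real code.
from dataclasses import dataclass

@dataclass
class UOp:
    op: str          # e.g. "LOAD", "MUL", "STORE"
    args: tuple = ()

MEMORY_OPS = {"LOAD", "STORE"}    # become memory ports
COMB_OPS = {"ADD", "MUL", "MAX"}  # become combinational arithmetic

def lower(uops):
    """Assign every op an FSM state and a hardware resource (toy model)."""
    schedule = []
    for state, u in enumerate(uops):
        if u.op in MEMORY_OPS:
            resource = "memory_port"
        elif u.op in COMB_OPS:
            resource = "alu"
        else:
            resource = "control"
        schedule.append((state, u.op, resource))
    return schedule

sched = lower([UOp("LOAD"), UOp("MUL"), UOp("STORE")])
# → [(0, 'LOAD', 'memory_port'), (1, 'MUL', 'alu'), (2, 'STORE', 'memory_port')]
```

The real compiler emits Amaranth constructs rather than strings, but the shape of the pass (walk the linearized ops, bind each to a state and a resource) is the same idea.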
## Components

| Path | Role |
|---|---|
|  | Typed IR |
|  | Correctness suite: Tier 1–3 (elementwise, GEMV, multi-kernel MLP) |
|  | Performance suite: 10 workloads, scalar → MNIST-scale |
|  | Compiler unit and simulation tests |
|  | TopModule hardware simulation tests |
|  | IEEE 754 FP32 unit and integration tests |
|  | End-to-end MNIST: CPU float32 vs compiler INT8 |
## Workflow

```
tinygrad model
      │ .schedule()
      ▼
list[ExecItem]
      │ compile_top_module()  ← auto-detects inter-kernel connections
      ▼
TopModule + list[KernelSpec]  (Amaranth Elaboratables)
      │ simulate_kernel() per kernel — or — simulate_top()
      ▼
numpy outputs + cycle counts
```
Or via the benchmark harness (handles single- and multi-kernel automatically):

```python
from benchmarks.harness import run_bench

result = run_bench("my_kernel", build_fn, input_arrays)
assert result.correct
```
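The "auto-detects inter-kernel connections" step in the workflow can be illustrated with a toy model: match each kernel's input buffers against buffers produced by earlier kernels. The `KernelSpec` fields and the matching-by-name scheme here are illustrative assumptions, not tg2hdl's real data model.

```python
# Hypothetical sketch: detect inter-kernel connections by matching buffer
# names between a kernel's output and later kernels' inputs.
from dataclasses import dataclass

@dataclass
class KernelSpec:
    name: str
    inputs: list   # buffer names this kernel reads
    output: str    # buffer name this kernel writes

def detect_connections(kernels):
    """Return (producer, consumer, buffer) edges between kernels."""
    edges = []
    produced = {}  # buffer name -> kernel that wrote it
    for k in kernels:
        for buf in k.inputs:
            if buf in produced:
                edges.append((produced[buf], k.name, buf))
        produced[k.output] = k.name
    return edges

# A two-layer MLP: layer1 consumes the hidden buffer "h" written by layer0.
mlp = [
    KernelSpec("layer0", ["x", "w0"], "h"),
    KernelSpec("layer1", ["h", "w1"], "y"),
]
# detect_connections(mlp) → [('layer0', 'layer1', 'h')]
```

In hardware, each detected edge becomes a shared buffer plus sequencing between the producer and consumer kernels inside `TopModule`.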
## Status

| Capability | Status |
|---|---|
| Generic kernel compilation | ✅ |
| Scalar / elementwise / GEMV patterns | ✅ |
| Fused multi-op kernels (matmul + bias + relu) | ✅ |
| Multi-kernel hardware sequencing | ✅ |
| Float32 (IEEE 754 hardware simulation) | ✅ |
| Float16 / BFloat16 arithmetic | ❌ no dedicated units; compile error in practice |
| Multi-MAC parallelism (UNROLL) | Planned |
| FPGA synthesis | Planned |
## Supported ops

The compiler handles: `ADD`, `MUL`, `CAST`, `CMPLT`, `WHERE`, `MAX`, `LOAD`, `STORE`, `RANGE`, `INDEX`, `DEFINE_GLOBAL`, `DEFINE_REG`, `CONST`, `AFTER`.

All other UOps raise `NotImplementedError` at compile time (fail-loud policy).
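The fail-loud policy can be sketched as a guarded dispatch. The handler body and the `compile_op` name are placeholders for illustration; only the supported-op list and the `NotImplementedError` behavior come from the text above.

```python
# Sketch of the fail-loud policy: any UOp outside the supported set raises
# NotImplementedError at compile time rather than emitting wrong hardware.
SUPPORTED = {
    "ADD", "MUL", "CAST", "CMPLT", "WHERE", "MAX", "LOAD", "STORE",
    "RANGE", "INDEX", "DEFINE_GLOBAL", "DEFINE_REG", "CONST", "AFTER",
}

def compile_op(op_name):
    """Placeholder handler: reject unsupported ops loudly."""
    if op_name not in SUPPORTED:
        raise NotImplementedError(f"UOp {op_name} is not supported")
    return f"<hardware for {op_name}>"

compile_op("ADD")        # fine
try:
    compile_op("RECIP")  # not in the list above → loud failure
except NotImplementedError as e:
    print(e)
```

Failing at compile time keeps unsupported models from silently producing hardware that simulates incorrectly.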