# Introduction

## Project goal

`tg2hdl` explores a practical question: **can we take neural-network math from tinygrad and map it to a hardware-friendly datapath?**

The current focus is intentionally narrow and concrete:

- a tiny 2-layer MNIST MLP,
- the key matrix-vector kernels in that model,
- and a sequential INT8 GEMV implementation in Amaranth HDL.

This repo is a prototype to validate correctness, interfaces, and timing behavior before bigger steps like kernel auto-generation and parallel MAC architectures.

## End-to-end flow today

At a high level, the workflow is:

1. Build / inspect a tinygrad model and identify kernel shapes (`inspect_kernels.py`).
2. Train and export model parameters (`train_mnist.py` → `mnist_weights.safetensors`).
3. Implement hardware building blocks in Amaranth (`hdl/gemv.py`, `hdl/relu.py`).
4. Validate behavior cycle-by-cycle in simulation (`tests/test_gemv.py`).

The current hardware primitive is GEMV because the model is run with batch size 1, so matrix multiplications collapse to matrix-vector products.

## Is this auto-generated from Python today?

**Partially, but not end-to-end auto-generated yet.**

- tinygrad is used in Python to define/inspect model compute and kernel structure.
- Amaranth is also Python, but the HDL module (`GEMVUnit`) is currently **hand-authored**.
- Tests compare the hardware simulation output against NumPy reference math for correctness.

So the project already has Python at every stage, but there is **not yet** an automatic compiler pass that converts tinygrad kernel IR directly into Amaranth modules. That is a planned direction.

## What exists now vs. what is planned

### Implemented now

- Sequential single-MAC GEMV with INT8 × INT8 multiply and INT32 accumulation.
- FSM-driven control (`IDLE → COMPUTE → EMIT → DONE`).
- Deterministic and randomized simulation tests, including timing checks.

### Planned next

- Multi-MAC parallelization.
- Bias and activation chaining for fuller layer execution.
- Better handling of larger kernel dimensions.
- **Auto-generation from tinygrad kernel IR to hardware templates**.

## Why this shape is valuable

This incremental structure keeps risk low:

- validate numeric behavior early,
- make cycle costs explicit,
- and keep hardware/software assumptions transparent.

In short, this repo is the bridge between model-kernel understanding and hardware realization, with correctness-first tooling to support iteration.

## Are these docs auto-generated from Python docstrings?

Partially. Guides are hand-written, and API reference pages are auto-generated with **Sphinx autodoc**.

- Narrative pages in `docs/guide/*.md` are written manually.
- `docs/guide/api-reference.md` renders API docs from Python modules/docstrings automatically via `automodule`.
- This gives both high-level explainers and synchronized code-level reference.
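
## Illustrative software model of the sequential GEMV

To make the "Implemented now" items more concrete, here is a minimal pure-Python sketch of a sequential single-MAC GEMV with INT8 operands and INT32 accumulation. It is not the repo's `GEMVUnit` and does not use Amaranth. The names `State`, `to_int32`, and `gemv_sequential` are hypothetical, and the assumption that `EMIT` returns to `COMPUTE` once per output row is ours; the source only states the overall order `IDLE → COMPUTE → EMIT → DONE`. The sketch counts MAC operations, not hardware clock cycles.

```python
from enum import Enum


class State(Enum):
    """Hypothetical software mirror of the documented FSM state names."""
    IDLE = 0
    COMPUTE = 1
    EMIT = 2
    DONE = 3


def to_int32(value):
    """Wrap a Python int to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def gemv_sequential(weights, vector):
    """Model y = W @ x with one multiply-accumulate per step.

    Returns (outputs, mac_count, trace). The per-row EMIT -> COMPUTE
    loop is an assumption made for illustration only.
    """
    if not weights or not vector:
        raise ValueError("weights and vector must be non-empty")
    for w_row in weights:
        if len(w_row) != len(vector):
            raise ValueError("each weight row must match the vector length")
    for v in [x for w_row in weights for x in w_row] + list(vector):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in INT8")

    state = State.IDLE
    trace = [state]
    outputs = []
    mac_count = 0
    row = col = 0
    acc = 0

    while state is not State.DONE:
        if state is State.IDLE:
            state = State.COMPUTE
        elif state is State.COMPUTE:
            # Single MAC: INT8 x INT8 product added to an INT32 accumulator.
            acc = to_int32(acc + weights[row][col] * vector[col])
            mac_count += 1
            col += 1
            if col == len(vector):
                state = State.EMIT
        elif state is State.EMIT:
            # Publish the finished row and reset for the next one.
            outputs.append(acc)
            acc, col = 0, 0
            row += 1
            state = State.COMPUTE if row < len(weights) else State.DONE
        trace.append(state)
    return outputs, mac_count, trace


if __name__ == "__main__":
    W = [[1, -2, 3], [127, 127, -128]]
    x = [4, 5, -6]
    y, macs, trace = gemv_sequential(W, x)
    print(y, macs, [s.name for s in trace])
```

A model like this can serve as a golden reference next to the NumPy comparison the tests already use: for a weight matrix with `R` rows and `C` columns it performs exactly `R * C` MAC steps, which makes the cost of the sequential design explicit before any parallel MAC work.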