Maintenance
===========
**ml-switcheroo** is a data-driven transpiler. Its intelligence relies on a distributed **Knowledge Base** separating *Abstract Specifications* (The Hub) from *Framework Implementations* (The Spokes).
Maintenance primarily involves synchronizing this knowledge base with the ecosystem of Machine Learning libraries and upstream standards.
This guide covers the full lifecycle: **Ingestion**, **Discovery**, **Mapping**, **Verification**, and **Release**.
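To make the Hub/Spoke split concrete, here is the *kind* of data each side holds. The field names below are hypothetical illustrations, not the actual schema.
```python
# Illustrative only -- hypothetical field names, not ml-switcheroo's real schema.

# Hub entry (abstract spec): WHAT the operation is, framework-agnostic.
hub_entry = {
    "op": "add",
    "params": ["x1", "x2"],   # canonical argument names
    "returns": "array",
}

# Spoke entry (framework mapping): HOW one framework implements that spec.
spoke_entry = {
    "op": "add",
    "api": "torch.add",                             # concrete API path
    "arg_renames": {"x1": "input", "x2": "other"},  # spec name -> framework name
}
```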
---
## 🔄 The Maintenance Lifecycle
Data flows from external authoritative sources (Standards Bodies, Library APIs) into our semantic storage tiers, and finally into verification reports.
```mermaid
graph TD
    %% Style definitions
    classDef external fill:#ea4335,stroke:#20344b,color:#ffffff,rx:5px;
    classDef hub fill:#f9ab00,stroke:#20344b,color:#20344b,rx:5px;
    classDef spoke fill:#fff4c7,stroke:#f9ab00,stroke-dasharray: 5 5,color:#20344b,rx:5px;
    classDef action fill:#4285f4,stroke:#20344b,color:#ffffff,rx:5px;

    subgraph Sources [1. Upstream Sources]
        direction TB
        STD_A("Array API Standard<br/>(Python Consortium)"):::external
        STD_B("ONNX Operators<br/>(Linux Foundation)"):::external
        LIBS("Installed Libraries<br/>(Torch, JAX, TF)"):::external
    end

    subgraph Ingestion [2. Ingestion & Discovery]
        IMPORT("import-spec"):::action
        SCAFFOLD("scaffold / sync"):::action
        CONSENSUS("sync-standards<br/>(Consensus Engine)"):::action
        HARVEST("harvest<br/>(Learn from Tests)"):::action
    end

    subgraph Storage [3. Knowledge Base]
        direction TB
        HUB[("The Hub (Specs)<br/>semantics/*.json<br/>Definitions & Types")]:::hub
        SPOKE[("The Spokes (Maps)<br/>snapshots/*_mappings.json<br/>API Links & Plugins")]:::spoke
    end

    subgraph Verify [4. Verification]
        CI("CI Fuzzer &<br/>Gen-Tests"):::action
        LOCK("Verified Lockfile<br/>README Matrix"):::spoke
    end

    STD_A --> IMPORT
    STD_B --> IMPORT
    LIBS --> SCAFFOLD
    LIBS --> CONSENSUS
    IMPORT --> HUB
    CONSENSUS --> HUB
    HARVEST --> SPOKE
    SCAFFOLD --> SPOKE
    HUB --> CI
    SPOKE --> CI
    CI --> LOCK
```
---
## ⚡ Quick Start: The Bootstrap Script
The entire Knowledge Base can be hydrated from scratch using the bootstrap utility. This script sequentially runs ingestion, consensus discovery, scaffolding, ghost snapshotting, and synchronization for all supported frameworks.
**Run this when:**
* You have added a new framework adapter.
* You want to update mappings for newer versions of PyTorch/JAX/TF.
* You want to reset the semantic definitions to their upstream defaults.
```bash
# Full hydration cycle (Warning: Overwrites existing JSONs)
./scripts/bootstrap.sh
```
---
## 🛠️ Phase 1: Ingestion (The Hub)
We maintain three tiers of "Abstract Standards" in `src/ml_switcheroo/semantics/`; these files define **WHAT** an operation is.
### Tier A: Math (Array API)
Derived from the Python Data API Consortium.
```bash
# 1. Clone the standard stubs
git clone -b 2024.12 --depth=1 https://github.com/data-apis/array-api _tmp/array-api
# 2. Import definitions to k_array_api.json
ml_switcheroo import-spec ./_tmp/array-api/src/array_api_stubs/_2024_12
```
### Tier B: Neural (ONNX)
Derived from the Open Neural Network Exchange (ONNX) operator set.
```bash
# 1. Fetch Operators docs
git clone --depth=1 -b v1.20.0 https://github.com/onnx/onnx _tmp/onnx
# 2. Parse Markdown to k_neural_net.json
ml_switcheroo import-spec ./_tmp/onnx/docs/Operators.md
```
### Discovery (Consensus Engine)
For operations not covered by official bodies (e.g., Optimizers, proprietary Layers), we use the **Consensus Engine**. It scans all installed frameworks, clusters compatible API signatures (e.g., `Torch.Adam` vs `Flax.Adam`), and proposes a unified standard.
```bash
# Scan installed libs and generate k_discovered.json
ml_switcheroo sync-standards --categories layer activation loss optimizer
```
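The sketch below shows the clustering idea in miniature. The candidate signatures are hand-written stand-ins, and the real engine compares full signatures and types rather than just trailing names.
```python
# Miniature sketch of consensus clustering. The signature tuples are hand-written
# stand-ins; the real Consensus Engine introspects the installed libraries.
from collections import defaultdict

candidates = {
    "torch.optim.Adam":       ("params", "lr", "betas", "eps", "weight_decay"),
    "optax.adam":             ("learning_rate", "b1", "b2", "eps"),
    "keras.optimizers.Adam":  ("learning_rate", "beta_1", "beta_2", "epsilon"),
}

# Crude compatibility key: cluster by the trailing name of the API path.
clusters = defaultdict(list)
for path, params in candidates.items():
    clusters[path.rsplit(".", 1)[-1].lower()].append(path)

for name, members in clusters.items():
    if len(members) > 1:
        print(f"Proposed standard '{name}' covering {members}")
```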
---
## 🔗 Phase 2: Mapping (The Spokes)
Once the Hub (Specs) is populated, we link specific frameworks to it, defining **HOW** operations are implemented. These mappings live in `src/ml_switcheroo/snapshots/`.
### Mapping a Framework (`sync`)
The `sync` command introspects a library (e.g., `torch`) and matches its API surface against the known Spec.
```bash
# Link PyTorch implementation to the Standards
ml_switcheroo sync torch
# Link JAX implementation
ml_switcheroo sync jax
```
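Conceptually, `sync` walks the library namespace and matches it against spec operation names. A toy version of that matching step, using NumPy as the introspected library and a hand-written spec list:
```python
# Toy version of what `sync` does conceptually: match spec op names against a
# library's attributes. The spec list here is hand-written for illustration.
import numpy as np

spec_ops = ["abs", "sum", "matmul", "nonexistent_op"]

mapping = {}
for op in spec_ops:
    if hasattr(np, op) and callable(getattr(np, op)):
        mapping[op] = f"numpy.{op}"

print(mapping)  # {'abs': 'numpy.abs', 'sum': 'numpy.sum', 'matmul': 'numpy.matmul'}
```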
### Heuristic Scaffolding (`scaffold`)
For frameworks with non-standard naming conventions (e.g., `tensorflow`), use the `scaffold` command. It uses regex patterns defined in the Framework Adapter's `discovery_heuristics` property to fuzzy-match APIs.
```bash
# Scan and populate mappings via regex heuristics
ml_switcheroo scaffold --frameworks tensorflow mlx
```
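A minimal illustration of that fuzzy matching; the patterns and API names here are made up, and the real heuristics live on the framework adapter:
```python
# Minimal illustration of regex-based scaffolding. Patterns and API names are
# made up; real heuristics come from the adapter's `discovery_heuristics`.
import re

heuristics = {
    "relu":   re.compile(r"(?:^|\.)relu$", re.IGNORECASE),
    "conv2d": re.compile(r"conv2d$", re.IGNORECASE),
}

discovered_apis = ["tf.nn.relu", "tf.keras.layers.Conv2D", "tf.identity"]

proposals = {}
for spec_op, pattern in heuristics.items():
    for api in discovered_apis:
        if pattern.search(api):
            proposals[spec_op] = api

print(proposals)  # {'relu': 'tf.nn.relu', 'conv2d': 'tf.keras.layers.Conv2D'}
```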
### Semantic Harvesting (`harvest`)
The most robust way to maintain mappings is to "Learn from Humans." If you write a manual test case that fixes a translation error, the Harvester can extract the rule back into the JSONs.
1. **Write/Fix a test** in `tests/examples/`:
```python
import jax.numpy

def test_custom_add():
    x, y = 1.0, 2.0  # placeholder inputs
    # You manually fixed arguments: alpha -> scale
    jax.numpy.add(x, y, scale=0.5)
```
2. **Run the extractor**:
```bash
ml_switcheroo harvest tests/examples/test_custom_add.py --target jax
```
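Under the hood this amounts to AST mining: locate the fixed call and record which argument names the human used. A simplified, purely illustrative sketch (the canonical name `alpha` comes from the example above):
```python
# Simplified sketch of harvesting: parse the fixed call and record the keyword
# names the human used. Purely illustrative, not the real Harvester.
import ast

source = "jax.numpy.add(x, y, scale=0.5)"
call = ast.parse(source, mode="eval").body

used_kwargs = [kw.arg for kw in call.keywords]
print(used_kwargs)  # ['scale']

# Compared against the spec's canonical name 'alpha', this yields a rename rule:
rule = {"op": "add", "target": "jax", "arg_renames": {"alpha": "scale"}}
```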
---
## 👻 Phase 3: Ghost Mode Support
**ml-switcheroo** can run in browser environments (WebAssembly), where heavy libraries like PyTorch cannot be installed. To support this, we must capture raw API signatures ahead of time.
### Capturing Snapshots
This command dumps the raw introspection data (signatures, docstrings, class hierarchies) of installed libraries into JSON files. This data allows the `GhostInspector` to simulate the presence of the library during transpilation.
```bash
# Generates files like snapshots/torch_v2.1.0.json
ml_switcheroo snapshot --out-dir src/ml_switcheroo/snapshots
```
*Note: `bootstrap.sh` runs this automatically.*
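Conceptually, a Ghost snapshot is just serialized introspection data. A minimal sketch using only the standard library (the real command captures far more, such as docstrings and class hierarchies):
```python
# Minimal sketch of what a Ghost snapshot contains: serialized signatures.
# The real `snapshot` command records far more (docstrings, class hierarchies).
import inspect
import json
import math

entries = {}
for name, obj in vars(math).items():
    if callable(obj):
        try:
            entries[name] = str(inspect.signature(obj))
        except ValueError:  # some C builtins expose no introspectable signature
            entries[name] = "(?)"

print(json.dumps({"library": "math", "signatures": entries}, indent=2)[:200])
```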
---
## ✅ Phase 4: Verification (CI Loop)
We validate the mathematical correctness of mappings with an automated fuzzer. It generates random inputs based on the type hints in the Spec, executes the operation in both the source and target frameworks, and asserts numerical equivalence.
### Running the Fuzzer
```bash
# 1. Install all backends
pip install ".[test]"
pip install torch jax flax tensorflow mlx numpy
# 2. Run Verification Suite
ml_switcheroo ci
```
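The equivalence check itself boils down to "same random inputs, same outputs within tolerance." A hand-rolled example for a single op, assuming `numpy` and `torch` are installed; the real fuzzer derives shapes and dtypes from the Spec's type hints instead of hardcoding them:
```python
# Hand-rolled equivalence check for one op (numpy vs. torch). The real fuzzer
# derives shapes/dtypes from the Spec's type hints rather than hardcoding them.
import numpy as np
import torch

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4)).astype(np.float32)
y = rng.standard_normal((4, 4)).astype(np.float32)

ref = np.add(x, y)                                                  # source side
out = torch.add(torch.from_numpy(x), torch.from_numpy(y)).numpy()   # target side

assert np.allclose(ref, out, rtol=1e-5, atol=1e-6)
print("add: numpy and torch agree")
```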
### Physical Test Generation
To keep regression coverage without running the full fuzzer every time, generate physical Python test files:
```bash
ml_switcheroo gen-tests --out tests/generated/test_tier_a_math.py
```
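For illustration, a generated file contains plain, self-contained test functions along these lines (this is a sketch of the shape, not actual `gen-tests` output):
```python
# Sketch of the shape of a generated regression test; not actual gen-tests output.
import numpy as np
import torch

def test_abs_tier_a():
    x = np.array([-1.0, 0.0, 2.5], dtype=np.float32)
    expected = np.abs(x)
    result = torch.abs(torch.from_numpy(x)).numpy()
    np.testing.assert_allclose(result, expected, rtol=1e-5)
```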
### Updating Compatibility Matrix
If the CI pass changes the support status of any operation, update the `README.md` table:
```bash
ml_switcheroo ci --update-readme
```
---
## 📚 Documentation & Web Demo
The project documentation (Sphinx) includes a client-side WebAssembly (WASM) demo powered by Pyodide.
### Building Docs & Wheel
The documentation build script automatically packages the current source into a `.whl` and injects it into the static site assets.
```bash
python scripts/build_docs.py
```
---
## 🗃️ Glossary of Artifacts
The Knowledge Base is composed of specific JSON files with distinct roles.
| Artifact Path | Classification | Role & Purpose | Maintenance Strategy |
| :--- | :--- | :--- | :--- |
| `semantics/k_array_api.json` | **Hub (Spec)** | **Tier A (Math):** Basic array operations (abs, sum) derived from the Python Data API Consortium. | **Import** via `import-spec`. |
| `semantics/k_neural_net.json` | **Hub (Spec)** | **Tier B (Neural):** Stateful layers (Conv2d, LSTM) derived from ONNX Operators. | **Import** via `import-spec`. |
| `semantics/k_framework_extras.json` | **Hub (Spec)** | **Tier C (Extras):** Utilities, IO, Devices. Often manually curated or scaffolded. | **Harvest** or **Wizard**. |
| `semantics/k_discovered.json` | **Hub (Spec)** | **Consensus:** Ops discovered by overlapping API surfaces (Optimizers/Activations). | **Generate** via `sync-standards`. |
| `snapshots/{fw}_v*_map.json` | **Spoke (Overlay)** | **Mapping Overlay:** Defines how a specific framework implements the specs. Contains API paths and Plugin hooks. | **Sync**, **Scaffold**, or **Harvest**. |
| `snapshots/{fw}_v*.json` | **Ghost Snapshot** | **Raw API Dump:** Serialized signatures of the library. Used by `GhostInspector` in WASM. | **Capture** via `snapshot`. |