ml_switcheroo.compiler.backends.sass.macros
SASS Macro Expansion Logic.
This module defines procedural generators for complex SASS instruction kernels.
Unlike 1:1 mappings (e.g. Add -> FADD), these macros generate entire
control flow blocks (loops, address calculations, memory loads) required to
implement high-level Neural Network layers like Convolution and Linear layers
directly in assembly.
Classes
RegisterAllocatorProtocol | Protocol for the Register Allocator used during expansion.
Functions
expand_conv2d(allocator, node_id, metadata) | Generates the SASS assembly kernel for a 2D Convolution loop.
expand_linear(allocator, node_id, metadata) | Generates the SASS assembly kernel for a Linear Layer (Matrix Multiply).
Module Contents
- class ml_switcheroo.compiler.backends.sass.macros.RegisterAllocatorProtocol
Bases: Protocol
Protocol for the Register Allocator used during expansion.
- get_register(var_name: str) -> ml_switcheroo.compiler.frontends.sass.nodes.Register
Gets or allocates a register for a symbolic variable.
- Parameters:
var_name (str) – The logical identifier.
- Returns:
The physical register.
- Return type:
ml_switcheroo.compiler.frontends.sass.nodes.Register
- allocate_temp() -> ml_switcheroo.compiler.frontends.sass.nodes.Register
Allocates an anonymous temporary register.
- Returns:
The physical register.
- Return type:
ml_switcheroo.compiler.frontends.sass.nodes.Register
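To illustrate the contract above, here is a minimal sketch of an allocator that would satisfy the protocol. It does not import ml_switcheroo: the `Register` dataclass, the local copy of the protocol, the `LinearAllocator` class, and the `R0`, `R1`, ... naming scheme are all hypothetical stand-ins, and the library's real allocator may behave differently.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Register:
    """Hypothetical stand-in for ml_switcheroo's Register node."""
    name: str


class RegisterAllocatorProtocol(Protocol):
    """Local mirror of the documented protocol, for illustration only."""

    def get_register(self, var_name: str) -> Register: ...

    def allocate_temp(self) -> Register: ...


class LinearAllocator:
    """Toy allocator (assumption): hands out R0, R1, ... and never frees."""

    def __init__(self) -> None:
        self._next = 0
        self._named: Dict[str, Register] = {}

    def _fresh(self) -> Register:
        reg = Register(f"R{self._next}")
        self._next += 1
        return reg

    def get_register(self, var_name: str) -> Register:
        # Reuse the register already bound to this symbolic name, if any.
        if var_name not in self._named:
            self._named[var_name] = self._fresh()
        return self._named[var_name]

    def allocate_temp(self) -> Register:
        # Temporaries are anonymous, so every call yields a new register.
        return self._fresh()


alloc: RegisterAllocatorProtocol = LinearAllocator()
acc = alloc.get_register("conv1_out")    # first request: allocates
tmp = alloc.allocate_temp()              # anonymous scratch register
again = alloc.get_register("conv1_out")  # second request: same register
```

Because the macros depend only on the protocol, any object exposing these two methods can be passed as `allocator`, which makes it straightforward to substitute a test double.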
- ml_switcheroo.compiler.backends.sass.macros.expand_conv2d(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) -> List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]
Generates the SASS assembly kernel for a 2D Convolution loop.
Logic flow:
1. Initialize Accumulator (R_ACC).
2. Setup Loop Counters (Ky, Kx).
3. Enter Y Loop -> Enter X Loop.
4. Calculate addresses (IMAD) for image and weights.
5. Load values (LDG).
6. Multiply-Add (FFMA).
7. Increment and Branch.
8. Store result.
- Parameters:
allocator (RegisterAllocatorProtocol) – The register manager.
node_id (str) – The unique ID of the operation node (used for the output register).
metadata (Dict[str, Any]) – Layer configuration (k, stride, etc.).
- Returns:
Sequence of labels and instructions.
- Return type:
List[SassNode]
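The logic flow listed for expand_conv2d can be read as the following plain-Python reference for a single output pixel. This is only a sketch of the arithmetic the generated kernel is described as performing, not the emitted SASS: the function name `conv2d_pixel`, the single-channel row-major layout, the absence of padding, and the way `stride` enters the address calculation are all assumptions made for illustration.

```python
from typing import List


def conv2d_pixel(image: List[float], weights: List[float], width: int,
                 k: int, out_y: int, out_x: int, stride: int = 1) -> float:
    """Compute one output pixel, following the documented loop nest."""
    acc = 0.0                       # 1. initialize accumulator
    ky = 0                          # 2. set up loop counters
    while ky < k:                   # 3. Y loop
        kx = 0
        while kx < k:               #    X loop
            # 4. integer multiply-add address arithmetic (IMAD-like)
            img_addr = (out_y * stride + ky) * width + (out_x * stride + kx)
            w_addr = ky * k + kx
            # 5. loads (LDG-like)
            pixel = image[img_addr]
            weight = weights[w_addr]
            # 6. fused multiply-add (FFMA-like)
            acc = pixel * weight + acc
            kx += 1                 # 7. increment and branch
        ky += 1
    return acc                      # 8. the caller stores the result


# 3x3 image, 2x2 kernel that sums the main diagonal of each window.
image = [1.0, 2.0, 3.0,
         4.0, 5.0, 6.0,
         7.0, 8.0, 9.0]
weights = [1.0, 0.0,
           0.0, 1.0]
top_left = conv2d_pixel(image, weights, width=3, k=2, out_y=0, out_x=0)
bottom_right = conv2d_pixel(image, weights, width=3, k=2, out_y=1, out_x=1)
```

Here `top_left` is 6.0 (1 + 5) and `bottom_right` is 14.0 (5 + 9). The explicit `while` loops mirror the counter-and-branch structure that an assembly loop would use.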
- ml_switcheroo.compiler.backends.sass.macros.expand_linear(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) -> List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]
Generates the SASS assembly kernel for a Linear Layer (Matrix Multiply).
Structure:
1. Initialize Accumulator.
2. Loop over input features (Dot Product).
3. Load Input element and Weight element.
4. Fused Multiply-Add.
5. Increment pointers.
6. Add Bias (if present).
- Parameters:
allocator (RegisterAllocatorProtocol) – The register manager.
node_id (str) – The unique ID of the operation node.
metadata (Dict[str, Any]) – Attributes (in_features, out_features).
- Returns:
Sequence of instructions.
- Return type:
List[SassNode]
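The structure described for expand_linear corresponds to a dot product with an optional bias. The sketch below restates it in plain Python for one output feature; it is a reference for the arithmetic only, not the generated assembly. The function name `linear_output` and the explicit pointer variables are hypothetical, chosen to mirror the listed steps.

```python
from typing import List, Optional


def linear_output(inputs: List[float], weight_row: List[float],
                  bias: Optional[float] = None) -> float:
    """Compute one output feature, following the documented structure."""
    acc = 0.0                        # 1. initialize accumulator
    in_ptr = 0
    w_ptr = 0
    for _ in range(len(inputs)):     # 2. loop over input features
        x = inputs[in_ptr]           # 3. load input element
        w = weight_row[w_ptr]        #    and weight element
        acc = x * w + acc            # 4. fused multiply-add
        in_ptr += 1                  # 5. increment pointers
        w_ptr += 1
    if bias is not None:             # 6. add bias (if present)
        acc += bias
    return acc


no_bias = linear_output([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
with_bias = linear_output([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], bias=0.5)
```

Here `no_bias` is 32.0 (4 + 10 + 18) and `with_bias` is 32.5. A full layer would repeat this once per output feature, with `in_features` fixing the loop length.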