ml_switcheroo.compiler.backends.sass.macros

SASS Macro Expansion Logic.

This module defines procedural generators for complex SASS instruction kernels. Unlike 1:1 mappings (e.g. Add -> FADD), these macros generate the entire control-flow blocks (loops, address calculations, memory loads) needed to implement high-level neural network layers, such as Convolution and Linear, directly in assembly.

Classes

RegisterAllocatorProtocol

Protocol for the Register Allocator used during expansion.

Functions

expand_conv2d(...)

Generates the SASS assembly kernel for a 2D Convolution loop.

expand_linear(...)

Generates the SASS assembly kernel for a Linear Layer (Matrix Multiply).

Module Contents

class ml_switcheroo.compiler.backends.sass.macros.RegisterAllocatorProtocol

Bases: Protocol

Protocol for the Register Allocator used during expansion.

get_register(var_name: str) → ml_switcheroo.compiler.frontends.sass.nodes.Register

Gets or allocates a register for a symbolic variable.

Parameters:

var_name (str) – The logical identifier.

Returns:

The physical register.

Return type:

Register

allocate_temp() → ml_switcheroo.compiler.frontends.sass.nodes.Register

Allocates an anonymous temporary register.

Returns:

The physical register.

Return type:

Register
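
The two methods above can be satisfied by a very small class. The following is a minimal sketch, not the library's allocator: the `Register` dataclass is a hypothetical stand-in for `ml_switcheroo.compiler.frontends.sass.nodes.Register`, and `LinearAllocator` with its `R0`, `R1`, ... naming scheme is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Register:
    """Hypothetical stand-in for the library's Register node."""

    name: str


class RegisterAllocatorProtocol(Protocol):
    """Mirrors the two methods documented above."""

    def get_register(self, var_name: str) -> Register: ...

    def allocate_temp(self) -> Register: ...


class LinearAllocator:
    """Toy allocator (assumption): hands out R0, R1, ... and never frees."""

    def __init__(self) -> None:
        self._next = 0
        self._named: Dict[str, Register] = {}

    def _fresh(self) -> Register:
        reg = Register(f"R{self._next}")
        self._next += 1
        return reg

    def get_register(self, var_name: str) -> Register:
        # "Gets or allocates": repeated lookups return the same register.
        if var_name not in self._named:
            self._named[var_name] = self._fresh()
        return self._named[var_name]

    def allocate_temp(self) -> Register:
        # Anonymous temporaries are never cached by name.
        return self._fresh()


# Structural typing: LinearAllocator satisfies the protocol without inheriting.
alloc: RegisterAllocatorProtocol = LinearAllocator()
acc = alloc.get_register("conv1_out")
tmp = alloc.allocate_temp()
```

Because the type is a `Protocol`, any object with these two methods should be accepted by a type checker wherever `RegisterAllocatorProtocol` is expected, which makes it easy to substitute a test double.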

ml_switcheroo.compiler.backends.sass.macros.expand_conv2d(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) → List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]

Generates the SASS assembly kernel for a 2D Convolution loop.

Logic flow:

1. Initialize Accumulator (R_ACC).
2. Setup Loop Counters (Ky, Kx).
3. Enter Y Loop -> Enter X Loop.
4. Calculate addresses (IMAD) for image and weights.
5. Load values (LDG).
6. Multiply-Add (FFMA).
7. Increment and Branch.
8. Store result.

Parameters:
  • allocator (RegisterAllocatorProtocol) – The register manager.

  • node_id (str) – The unique ID of the operation node (used for the output register).

  • metadata (Dict[str, Any]) – Layer configuration (k, stride, etc.).

Returns:

Sequence of labels and instructions.

Return type:

List[SassNode]
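
To make the logic flow above concrete, here is a plain-Python reference of the arithmetic such a kernel would perform for one output element. It is a sketch under stated assumptions, not the emitted SASS or the library's code: the function name `conv2d_pixel`, the flat row-major layout, the square `k` x `k` kernel, the single channel, and the absence of padding are all illustrative choices.

```python
from typing import List


def conv2d_pixel(
    image: List[float],
    width: int,
    weights: List[float],
    k: int,
    stride: int,
    out_y: int,
    out_x: int,
) -> float:
    """Compute one output element of a single-channel, unpadded convolution."""
    acc = 0.0                      # 1. initialize accumulator (R_ACC)
    ky = 0                         # 2. set up loop counters
    while ky < k:                  # 3. Y loop
        kx = 0
        while kx < k:              # 3. X loop
            # 4. integer multiply-add address calculation (IMAD-style)
            img_addr = (out_y * stride + ky) * width + (out_x * stride + kx)
            w_addr = ky * k + kx
            # 5. loads (LDG-style)
            pixel = image[img_addr]
            weight = weights[w_addr]
            # 6. fused multiply-add (FFMA-style)
            acc = pixel * weight + acc
            kx += 1                # 7. increment and branch
        ky += 1
    return acc                     # 8. the caller stores the result


# 3x3 image holding 1..9, 2x2 kernel of ones, stride 1.
image = [float(v) for v in range(1, 10)]
ones = [1.0] * 4
top_left = conv2d_pixel(image, 3, ones, 2, 1, 0, 0)      # 1 + 2 + 4 + 5
bottom_right = conv2d_pixel(image, 3, ones, 2, 1, 1, 1)  # 5 + 6 + 8 + 9
```

The explicit `while` loops with manual counter increments are deliberate: they correspond more directly to label-and-branch assembly than a `for` loop would.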

ml_switcheroo.compiler.backends.sass.macros.expand_linear(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) → List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]

Generates the SASS assembly kernel for a Linear Layer (Matrix Multiply).

Structure:

1. Initialize Accumulator.
2. Loop over input features (Dot Product).
3. Load Input element and Weight element.
4. Fused Multiply-Add.
5. Increment pointers.
6. Add Bias (if present).

Parameters:
  • allocator (RegisterAllocatorProtocol) – The register manager.

  • node_id (str) – The unique ID of the operation node.

  • metadata (Dict[str, Any]) – Attributes (in_features, out_features).

Returns:

Sequence of instructions.

Return type:

List[SassNode]
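
The structure above amounts to a dot product followed by an optional bias add. The sketch below shows that arithmetic for a single output feature in plain Python. The name `linear_output` and its parameters are hypothetical and chosen for illustration; the real macro emits SASS nodes rather than computing values.

```python
from typing import List, Optional


def linear_output(
    x: List[float],
    weight_row: List[float],
    bias: Optional[float] = None,
) -> float:
    """Compute one output feature: dot(x, weight_row) plus an optional bias."""
    acc = 0.0                          # 1. initialize accumulator
    i = 0
    while i < len(x):                  # 2. loop over input features
        inp = x[i]                     # 3. load input element
        w = weight_row[i]              # 3. load weight element
        acc = inp * w + acc            # 4. fused multiply-add
        i += 1                         # 5. advance to the next element
    if bias is not None:               # 6. add bias if present
        acc += bias
    return acc


with_bias = linear_output([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], bias=1.0)
without_bias = linear_output([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A full layer would repeat this once per output feature, with `weight_row` selecting the corresponding row of the weight matrix.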