ml_switcheroo.compiler.backends.sass.macros
===========================================

.. py:module:: ml_switcheroo.compiler.backends.sass.macros

.. autoapi-nested-parse::

   SASS Macro Expansion Logic.

   This module defines procedural generators for complex SASS instruction kernels.
   Unlike 1:1 mappings (e.g. ``Add`` -> ``FADD``), these macros generate entire
   control flow blocks (loops, address calculations, memory loads) required to
   implement high-level Neural Network layers like Convolution and Linear layers
   directly in assembly.


Classes
-------

.. autoapisummary::

   ml_switcheroo.compiler.backends.sass.macros.RegisterAllocatorProtocol


Functions
---------

.. autoapisummary::

   ml_switcheroo.compiler.backends.sass.macros.expand_conv2d
   ml_switcheroo.compiler.backends.sass.macros.expand_linear


Module Contents
---------------

.. py:class:: RegisterAllocatorProtocol

   Bases: :py:obj:`Protocol`

   Protocol for the Register Allocator used during expansion.

   .. py:method:: get_register(var_name: str) -> ml_switcheroo.compiler.frontends.sass.nodes.Register

      Gets or allocates a register for a symbolic variable.

      :param var_name: The logical identifier.
      :type var_name: str

      :returns: The physical register.
      :rtype: Register

   .. py:method:: allocate_temp() -> ml_switcheroo.compiler.frontends.sass.nodes.Register

      Allocates an anonymous temporary register.

      :returns: The physical register.
      :rtype: Register


.. py:function:: expand_conv2d(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) -> List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]

   Generates the SASS assembly kernel for a 2D Convolution loop.

   Logic flow:

   1. Initialize Accumulator (R_ACC).
   2. Setup Loop Counters (Ky, Kx).
   3. Enter Y Loop -> Enter X Loop.
   4. Calculate addresses (IMAD) for image and weights.
   5. Load values (LDG).
   6. Multiply-Add (FFMA).
   7. Increment and Branch.
   8. Store result.

   :param allocator: The register manager.
   :type allocator: RegisterAllocatorProtocol
   :param node_id: The unique ID of the operation node (used for output reg).
   :type node_id: str
   :param metadata: Layer configuration (k, stride, etc).
   :type metadata: Dict[str, Any]

   :returns: Sequence of labels and instructions.
   :rtype: List[SassNode]


.. py:function:: expand_linear(allocator: RegisterAllocatorProtocol, node_id: str, metadata: Dict[str, Any]) -> List[ml_switcheroo.compiler.frontends.sass.nodes.SassNode]

   Generates the SASS assembly kernel for a Linear Layer (Matrix Multiply).

   Structure:

   1. Initialize Accumulator.
   2. Loop over input features (Dot Product).
   3. Load Input element and Weight element.
   4. Fused Multiply-Add.
   5. Increment pointers.
   6. Add Bias (if present).

   :param allocator: The register manager.
   :type allocator: RegisterAllocatorProtocol
   :param node_id: The unique ID of the operation node.
   :type node_id: str
   :param metadata: Attributes (in_features, out_features).
   :type metadata: Dict[str, Any]

   :returns: Sequence of instructions.
   :rtype: List[SassNode]
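
The structure listed for ``expand_linear`` can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: ``Register``, ``SimpleAllocator``, ``sketch_linear``, the ``bias`` metadata key, and the ``*_in_ptr`` / ``*_w_ptr`` / ``*_bias`` variable names are hypothetical stand-ins. The sketch returns plain strings rather than ``SassNode`` objects, assumes float32 elements and a bias value already held in a register, and uses mnemonics and operand forms that are illustrative text, not checked against any particular GPU architecture.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


@dataclass(frozen=True)
class Register:
    """Hypothetical stand-in for the library's Register node."""
    index: int

    def __str__(self) -> str:
        return f"R{self.index}"


class RegisterAllocatorProtocol(Protocol):
    """Mirrors the two methods documented for the protocol."""
    def get_register(self, var_name: str) -> Register: ...
    def allocate_temp(self) -> Register: ...


class SimpleAllocator:
    """Toy allocator: hands out R0, R1, ... and never frees or spills."""

    def __init__(self) -> None:
        self._named: Dict[str, Register] = {}
        self._next = 0

    def _fresh(self) -> Register:
        reg = Register(self._next)
        self._next += 1
        return reg

    def get_register(self, var_name: str) -> Register:
        # Get-or-allocate: the same name always maps to the same register.
        if var_name not in self._named:
            self._named[var_name] = self._fresh()
        return self._named[var_name]

    def allocate_temp(self) -> Register:
        return self._fresh()


def sketch_linear(allocator: RegisterAllocatorProtocol, node_id: str,
                  metadata: Dict[str, Any]) -> List[str]:
    """Emit illustrative instruction text following the documented structure."""
    acc = allocator.get_register(node_id)            # output register / accumulator
    p_in = allocator.get_register(f"{node_id}_in_ptr")
    p_w = allocator.get_register(f"{node_id}_w_ptr")
    i = allocator.allocate_temp()                    # loop counter
    x = allocator.allocate_temp()                    # loaded input element
    w = allocator.allocate_temp()                    # loaded weight element
    loop = f"L_LINEAR_{node_id}"
    n = int(metadata["in_features"])

    lines = [
        f"MOV {acc}, RZ;",                           # 1. initialize accumulator
        f"MOV {i}, RZ;",                             #    counter starts at zero
        f"{loop}:",                                  # 2. loop over input features
        f"LDG.E {x}, [{p_in}];",                     # 3. load input element
        f"LDG.E {w}, [{p_w}];",                      #    load weight element
        f"FFMA {acc}, {x}, {w}, {acc};",             # 4. fused multiply-add
        f"IADD3 {p_in}, {p_in}, 0x4, RZ;",           # 5. advance pointers by 4 bytes
        f"IADD3 {p_w}, {p_w}, 0x4, RZ;",
        f"IADD3 {i}, {i}, 0x1, RZ;",
        f"ISETP.LT.AND P0, PT, {i}, {hex(n)}, PT;",  #    continue while i < in_features
        f"@P0 BRA {loop};",
    ]
    if metadata.get("bias", False):                  # 6. add bias if present
        b = allocator.get_register(f"{node_id}_bias")
        lines.append(f"FADD {acc}, {acc}, {b};")
    return lines


if __name__ == "__main__":
    config = {"in_features": 4, "out_features": 2, "bias": True}
    for line in sketch_linear(SimpleAllocator(), "fc1", config):
        print(line)
```

Binding the accumulator to ``node_id`` through ``get_register`` follows the note on ``expand_conv2d`` that the node ID is used for the output register; whether ``expand_linear`` does the same is an assumption here. Temporaries that never need to be looked up by name come from ``allocate_temp``.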