ml_switcheroo.core.mlir.parser

MLIR Recursive Descent Parser.

This module parses text-based MLIR code into the CST object model defined in nodes.py. It is designed to preserve trivia (comments/whitespace) to support high-fidelity round-trip transformations.

Classes

Token

Represents a lexical token extracted from the source string.

Tokenizer

Lexical analyzer for MLIR syntax.

MlirParser

Parses a stream of MLIR tokens into a Concrete Syntax Tree.

Module Contents

class ml_switcheroo.core.mlir.parser.Token

Represents a lexical token extracted from the source string.

kind: str
text: str
line: int
col: int

class ml_switcheroo.core.mlir.parser.Tokenizer(text: str)

Lexical analyzer for MLIR syntax.

Splits the input string into a stream of typed Tokens based on regex patterns.

PATTERN_DEFS
text
tokenize() → Generator[Token, None, None]

Yields tokens from the source text one by one.

Yields:

Token – The next lexical token.

Raises:

ValueError – If an unrecognized character sequence is encountered.
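
The regex-driven approach described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the pattern table, the token kind names, and the standalone `tokenize` function are assumptions made for the example, and the real `PATTERN_DEFS` may differ.

```python
import re
from dataclasses import dataclass
from typing import Generator, List, Tuple


@dataclass
class Token:
    kind: str
    text: str
    line: int
    col: int


# Hypothetical pattern table; order matters because the first match wins.
PATTERN_DEFS: List[Tuple[str, str]] = [
    ("COMMENT", r"//[^\n]*"),
    ("NEWLINE", r"\n"),
    ("WHITESPACE", r"[ \t\r]+"),
    ("STRING", r'"(?:[^"\\\n]|\\.)*"'),
    ("VAL_ID", r"%[A-Za-z0-9_$.\-]+"),
    ("SYM_ID", r"@[A-Za-z0-9_$.\-]+"),
    ("BLOCK_ID", r"\^[A-Za-z0-9_$.\-]+"),
    ("ARROW", r"->"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_$.]*"),
    ("SYMBOL", r"[{}()\[\]<>:=,]"),
]

# One combined regex with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in PATTERN_DEFS))


def tokenize(text: str) -> Generator[Token, None, None]:
    """Yield tokens (trivia included) and track 1-based line/column."""
    pos, line, col = 0, 1, 1
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at {line}:{col}")
        kind, value = m.lastgroup, m.group()
        yield Token(kind, value, line, col)
        if kind == "NEWLINE":
            line, col = line + 1, 1
        else:
            col += len(value)
        pos = m.end()


source = '%0 = "test.op"(%a) : i32 // note\n'
tokens = list(tokenize(source))
```

Because whitespace and comments are emitted as tokens rather than discarded, concatenating `token.text` over the stream reproduces the input exactly, which is the property a trivia-preserving parser relies on.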

class ml_switcheroo.core.mlir.parser.MlirParser(text: str)

Parses a stream of MLIR tokens into a Concrete Syntax Tree.

Implements recursive descent logic to handle Modules, Blocks, Operations, and Regions while preserving whitespace and comments (trivia) for accurate reproduction.

tokenizer
tokens
pos = 0
trivia_buffer: List[ml_switcheroo.core.mlir.nodes.TriviaNode] = []
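
One common way to use a trivia buffer is to collect whitespace and comments as they are seen and attach them to the next significant node. The sketch below illustrates that idea under simplifying assumptions: the `Leaf` class, `attach_trivia`, and `emit` are hypothetical names, the splitting regex is far cruder than a real tokenizer, and the actual `TriviaNode` handling in this module may work differently.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Leaf:
    """Hypothetical significant node carrying the trivia that preceded it."""
    text: str
    leading_trivia: List[str] = field(default_factory=list)


# Comment, whitespace run, run of other characters, or a lone slash.
_PIECE = re.compile(r"//[^\n]*|\s+|[^\s/]+|/")


def attach_trivia(source: str) -> Tuple[List[Leaf], List[str]]:
    leaves: List[Leaf] = []
    trivia_buffer: List[str] = []
    for piece in _PIECE.findall(source):
        if piece.isspace() or piece.startswith("//"):
            trivia_buffer.append(piece)  # hold until the next real token
        else:
            leaves.append(Leaf(piece, trivia_buffer))
            trivia_buffer = []  # start a fresh buffer
    return leaves, trivia_buffer  # leftover buffer is trailing trivia


def emit(leaves: List[Leaf], trailing: List[str]) -> str:
    body = "".join("".join(leaf.leading_trivia) + leaf.text for leaf in leaves)
    return body + "".join(trailing)


src = '// header\n%0 = "test.op"() : () -> i32  // trailing\n'
leaves, trailing = attach_trivia(src)
round_tripped = emit(leaves, trailing)
```

Since every character of the input ends up either in a leaf or in some trivia list, emitting the leaves in order restores the original text.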

peek(offset: int = 0) → Token

Look ahead at a token without consuming it.

Parameters:

offset (int) – Number of tokens to look ahead. Defaults to 0 (current).

Returns:

The token at the lookahead position.

Return type:

Token

consume() → Token

Consumes and returns the current token, advancing the pointer.

Returns:

The consumed token.

Return type:

Token

match(kind: str) → bool

Checks if the current token matches the specified kind or text.

Parameters:

kind (str) – The token kind (e.g. TokenKind.VAL_ID) or specific symbol text (e.g. '{').

Returns:

True if the current token matches.

Return type:

bool

expect(kind: str) → Token

Consume the current token if it matches kind, otherwise raise SyntaxError.

Parameters:

kind (str) – The expected token kind or text.

Returns:

The consumed token.

Return type:

Token

Raises:

SyntaxError – If the current token does not match the expectation.
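
The four navigation helpers (peek, consume, match, expect) follow a familiar recursive descent pattern. A minimal sketch is shown below, assuming a pre-built token list and an EOF sentinel for reads past the end. The `Cursor` class and the sentinel are illustrative assumptions, not the actual `MlirParser` internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    kind: str
    text: str
    line: int = 0
    col: int = 0


class Cursor:
    """Hypothetical stand-in for the token-navigation part of a parser."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset: int = 0) -> Token:
        idx = self.pos + offset
        if idx < len(self.tokens):
            return self.tokens[idx]
        return Token("EOF", "")  # assumed sentinel for out-of-range lookahead

    def consume(self) -> Token:
        tok = self.peek()
        if tok.kind != "EOF":
            self.pos += 1
        return tok

    def match(self, kind: str) -> bool:
        # Accept either a token kind or the literal token text.
        tok = self.peek()
        return tok.kind == kind or tok.text == kind

    def expect(self, kind: str) -> Token:
        if not self.match(kind):
            tok = self.peek()
            raise SyntaxError(
                f"Expected {kind!r}, got {tok.text!r} ({tok.kind}) "
                f"at {tok.line}:{tok.col}"
            )
        return self.consume()


cur = Cursor([
    Token("VAL_ID", "%0"),
    Token("SYMBOL", "="),
    Token("STRING", '"test.op"'),
])
first = cur.expect("VAL_ID")  # matched by kind
cur.expect("=")               # matched by text
```

Letting `match` accept either a kind or a literal keeps call sites short: punctuation can be checked by its text, while identifiers and values are checked by kind.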

parse() → ml_switcheroo.core.mlir.nodes.ModuleNode

Top-level parsing entry point.

Returns:

The root of the MLIR CST.

Return type:

ModuleNode

parse_block(is_top_level: bool = False) → ml_switcheroo.core.mlir.nodes.BlockNode

Parses a Basic Block.

A block consists of an optional label (with arguments) and a list of operations.

Parameters:

is_top_level (bool) – If True, treats the input as an implicit top-level module block which may not have a label or braces.

Returns:

The parsed block structure.

Return type:

BlockNode

Raises:

SyntaxError – If invalid tokens are encountered where an operation was expected.

parse_operation() → ml_switcheroo.core.mlir.nodes.OperationNode | None

Parses a single MLIR Operation.

Structure: %results = "op.name"(%operands) {attributes} ({regions}) : type

Returns:

The parsed operation, or None if no valid op start found.

Return type:

Optional[OperationNode]

Raises:

SyntaxError – If structural expectations (e.g. closing parens) are unmet.
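
To make the operation structure concrete, the sketch below walks a single operation written in that shape and pulls out the results, name, and operands. It is a deliberately reduced illustration: `parse_generic_op` is a hypothetical helper, it ignores attributes and regions, returns a plain dict rather than an OperationNode, and its regex silently skips characters it does not recognise.

```python
import re
from typing import Dict, List, Optional

# Value ids, quoted strings, arrow, punctuation, then bare words and types.
_TOKEN = re.compile(r'%[\w.$-]+|"[^"\n]*"|->|[=(),:]|[\w.<>!]+')


def parse_generic_op(line: str) -> Optional[Dict[str, object]]:
    toks: List[str] = _TOKEN.findall(line)
    pos = 0

    def peek() -> str:
        return toks[pos] if pos < len(toks) else ""

    def expect(text: str) -> None:
        nonlocal pos
        if peek() != text:
            raise SyntaxError(f"Expected {text!r}, got {peek()!r}")
        pos += 1

    def value_list() -> List[str]:
        # Zero or more %ids, optionally separated by commas.
        nonlocal pos
        values: List[str] = []
        while peek().startswith("%"):
            values.append(toks[pos])
            pos += 1
            if peek() == ",":
                pos += 1
        return values

    results = value_list()
    if results:
        expect("=")
    if not peek().startswith('"'):
        if results:
            raise SyntaxError("Result list without an operation name")
        return None  # nothing here looks like the start of an operation
    name = toks[pos].strip('"')
    pos += 1
    expect("(")
    operands = value_list()
    expect(")")
    type_tokens: List[str] = []
    if peek() == ":":
        pos += 1
        type_tokens = toks[pos:]
    return {"results": results, "name": name,
            "operands": operands, "type_tokens": type_tokens}


op = parse_generic_op('%0 = "arith.addi"(%a, %b) : (i32, i32) -> i32')
not_an_op = parse_generic_op("// just a comment")
```

The two exits mirror the documented contract: input that does not begin an operation yields None, while input that starts an operation and then breaks the expected structure raises SyntaxError.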

parse_region() → ml_switcheroo.core.mlir.nodes.RegionNode

Parses a Region containing nested Blocks.

Enclosed in curly braces { … }.

Returns:

The parsed region.

Return type:

RegionNode
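
Regions can nest, because an operation inside a region may carry regions of its own, so brace handling is naturally recursive. The sketch below shows only that recursion on a pre-split token list. The function and its nested-list output are assumptions for illustration and do not reflect the real RegionNode or BlockNode structure.

```python
from typing import List, Tuple, Union

Tree = List[Union[str, "Tree"]]


def parse_region(tokens: List[str], pos: int = 0) -> Tuple[Tree, int]:
    """Parse '{ ... }' starting at pos; return (body, index after '}')."""
    if pos >= len(tokens) or tokens[pos] != "{":
        raise SyntaxError("Expected '{' to open a region")
    pos += 1
    body: Tree = []
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] == "{":
            # A nested region: recurse, then resume after its closing brace.
            nested, pos = parse_region(tokens, pos)
            body.append(nested)
        else:
            body.append(tokens[pos])
            pos += 1
    if pos >= len(tokens):
        raise SyntaxError("Unterminated region: missing '}'")
    return body, pos + 1


tree, end = parse_region("{ a { b c } d }".split())
```

Returning the index just past the closing brace lets the caller continue parsing from where the region ended without any shared mutable state.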