ml_switcheroo.discovery¶

Discovery Package.

This package implements the Knowledge Acquisition layer of ml-switcheroo. It is responsible for identifying operations in machine learning libraries, aligning them with abstract standards, and populating the Semantic Knowledge Base.

Modules:

inspector: Low-level introspection of Python modules/objects (Live & Ghost).
scaffolder: Heuristic-based scanning to generate initial mappings.
consensus: Algorithms to align divergent API names across frameworks.
harvester: Extraction of semantic rules from manual test files.
syncer: Linking abstract operation definitions to concrete framework APIs.

Classes¶

`ConsensusEngine`	Algorithms for aligning divergent API naming conventions.
`SemanticHarvester`	Analyzes Python source code to extract valid API signatures from usage.
`ApiInspector`	A robust inspector for discovering API surfaces of installed libraries.
`Scaffolder`	Automated discovery tool that aligns framework APIs.
`FrameworkSyncer`	Links abstract operations to concrete framework implementations.

Package Contents¶

class ml_switcheroo.discovery.ConsensusEngine¶

Algorithms for aligning divergent API naming conventions.

Capabilities:

Clustering: Groups APIs like HuberLoss, huber_loss, and Huber together.
Normalization: Strips common noise (prefixes/suffixes) to find the semantic root.
Signature Alignment: Builds a translation map for arguments (e.g., keepdims <-> keep_dims).
Type Consensus: Aggregates type hints found in source candidates to enrich the standard signature.

IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d', 'v1', 'v2', 'object', 'op', 'func']¶

ARG_ALIASES¶

classmethod normalize_name(name: str) → str¶

Reduces an API Name to its semantic core for comparison.

It lowercases the name, removes underscores, and strips common prefixes and suffixes.

Examples:

‘HuberLoss’ -> ‘huber’
‘reduce_mean’ -> ‘mean’
‘conv2d’ -> ‘conv’

Parameters: name (str) – The raw API name (e.g. ‘CrossEntropyLoss’).
Returns: The normalized key (e.g. ‘crossentropy’).
Return type: str
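
The normalization described above can be sketched as follows. This is an illustration only, not the library's implementation: the suffix list is copied from the documented IGNORED_SUFFIXES, while `ASSUMED_PREFIXES`, the function name `normalize_name_sketch`, and the choice to strip at most one prefix and one suffix are assumptions made for the example.

```python
# Suffix list copied from the documented IGNORED_SUFFIXES attribute.
IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d',
                    'v1', 'v2', 'object', 'op', 'func']

# Hypothetical prefix list; the real prefixes are not shown in this reference.
ASSUMED_PREFIXES = ['reduce']


def normalize_name_sketch(name: str) -> str:
    """Reduce an API name to a comparison key (illustrative only)."""
    # Drop casing and underscores first.
    key = name.lower().replace('_', '')

    # Strip at most one known prefix, never emptying the key.
    for prefix in ASSUMED_PREFIXES:
        if key.startswith(prefix) and len(key) > len(prefix):
            key = key[len(prefix):]
            break

    # Strip at most one known suffix, never emptying the key.
    for suffix in IGNORED_SUFFIXES:
        if key.endswith(suffix) and len(key) > len(suffix):
            key = key[:-len(suffix)]
            break

    return key


for raw in ['HuberLoss', 'reduce_mean', 'conv2d', 'CrossEntropyLoss']:
    print(raw, '->', normalize_name_sketch(raw))
```

With these assumptions the sketch reproduces the documented examples (‘huber’, ‘mean’, ‘conv’, ‘crossentropy’). Short suffixes such as ‘op’ can also match unrelated names, so a real implementation may need additional guards.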

classmethod normalize_arg(arg_name: str) → str¶

Canonicalizes an argument name using the alias map.

Example:

‘learning_rate’ -> ‘lr’

Parameters: arg_name (str) – The raw argument name.
Returns: The canonical standard name.
Return type: str
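
A minimal sketch of alias-based canonicalization, assuming the alias map is a plain dictionary from raw names to canonical names. The contents of ARG_ALIASES are not listed in this reference, so `ASSUMED_ARG_ALIASES` below is hypothetical and only mirrors the pairs mentioned elsewhere on this page.

```python
# Hypothetical alias table; the real ARG_ALIASES contents are not shown here.
ASSUMED_ARG_ALIASES = {
    'learning_rate': 'lr',
    'keep_dims': 'keepdims',
    'axis': 'dim',
}


def normalize_arg_sketch(arg_name: str) -> str:
    """Return the canonical name, or the input unchanged if no alias exists."""
    return ASSUMED_ARG_ALIASES.get(arg_name, arg_name)


print(normalize_arg_sketch('learning_rate'))  # lr
print(normalize_arg_sketch('momentum'))       # momentum (no alias, unchanged)
```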

cluster(framework_inputs: Dict[str, List[ml_switcheroo.core.ghost.GhostRef]]) → List[CandidateStandard]¶

Groups API definitions from multiple frameworks into Candidates based on name similarity.

Parameters: framework_inputs – Dictionary mapping ‘framework_name’ -> List of discovered GhostRefs.
Returns: A list of potential standards, sorted by descending score.
Return type: List[CandidateStandard]
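
The grouping step might look roughly like the sketch below. It uses plain strings in place of GhostRef objects and dictionaries in place of CandidateStandard, and it assumes the score is the number of distinct frameworks in a group; the actual scoring rule is not specified in this reference. `simple_key` and `cluster_sketch` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def simple_key(name: str) -> str:
    """Stand-in for name normalization: lowercase and drop underscores."""
    return name.lower().replace('_', '')


def cluster_sketch(framework_inputs: Dict[str, List[str]]) -> List[dict]:
    """Group API names from several frameworks by their normalized key."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for framework, names in framework_inputs.items():
        for name in names:
            # Keep the first name seen per framework for each key.
            groups[simple_key(name)].setdefault(framework, name)

    candidates = [
        {'name': key, 'variants': variants, 'score': len(variants)}
        for key, variants in groups.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c['score'], c['name']))
    return candidates


inputs = {
    'torch': ['abs', 'HuberLoss'],
    'jax': ['abs', 'huber_loss'],
    'tensorflow': ['Abs'],
}
for cand in cluster_sketch(inputs):
    print(cand['name'], cand['score'], cand['variants'])
```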

filter_common(candidates: List[CandidateStandard], min_support: int = 2) → List[CandidateStandard]¶

Filters candidates to keep only those present in a minimum number of frameworks.

This ensures we only create standards for concepts that are truly shared across ecosystems, avoiding framework-specific noise.

Parameters:

candidates (List[CandidateStandard]) – List of candidates from clustering.
min_support (int) – Minimum number of different frameworks that must implement the op.

Returns:

Filtered list of robust candidates.

Return type:

List[CandidateStandard]
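
As an illustration of the support filter, the sketch below represents each candidate as a dictionary with a hypothetical `variants` mapping (framework name to API name) and keeps those implemented by at least `min_support` frameworks. The dictionary layout is an assumption; the real CandidateStandard fields are not documented here.

```python
from typing import List


def filter_common_sketch(candidates: List[dict], min_support: int = 2) -> List[dict]:
    """Keep candidates backed by at least `min_support` distinct frameworks."""
    return [c for c in candidates if len(c['variants']) >= min_support]


candidates = [
    {'name': 'abs', 'variants': {'torch': 'abs', 'jax': 'abs'}},
    {'name': 'vmap', 'variants': {'jax': 'vmap'}},  # single-framework op
]
print([c['name'] for c in filter_common_sketch(candidates)])  # ['abs']
```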

align_signatures(candidates: List[CandidateStandard], consensus_threshold: float = 0.5) → None¶

Analyses the arguments of all variants in a candidate to determine Standard Arguments and Types.

It populates std_args on the candidate by voting:

If a normalized argument name appears in more than the consensus_threshold fraction of the implementations (more than 50% with the default of 0.5), it becomes part of the standard.
If type hints are available across the variants, it determines the consensus type and populates a rich argument definition (e.g. {‘name’: ‘x’, ‘type’: ‘int’}) instead of a simple string.

It also populates arg_mappings to translate between the Standard name and the specific framework name (e.g. Standard ‘dim’ -> Torch ‘dim’, Jax ‘axis’).

Parameters:

candidates (List[CandidateStandard]) – List of CandidateStandards to process (in-place modification).
consensus_threshold (float) – Fraction of variants that must share an arg (0.0 - 1.0).
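
The voting rule can be sketched as below, under several assumptions: variants are given as a mapping from framework name to a list of raw argument names, `ASSUMED_ALIASES` stands in for the real alias map, and the comparison against the threshold is strict (so an argument found in exactly half of the variants is dropped at the default setting). Type consensus is omitted for brevity.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical alias table used to canonicalize argument names.
ASSUMED_ALIASES = {'axis': 'dim', 'keep_dims': 'keepdims'}


def align_sketch(
    variants: Dict[str, List[str]], consensus_threshold: float = 0.5
) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Vote on standard arguments and record per-framework names."""
    votes: Counter = Counter()
    raw_names: Dict[str, Dict[str, str]] = {}

    for framework, args in variants.items():
        seen = set()
        for raw in args:
            std = ASSUMED_ALIASES.get(raw, raw)
            if std in seen:
                continue  # count each argument once per framework
            seen.add(std)
            votes[std] += 1
            raw_names.setdefault(std, {})[framework] = raw

    total = len(variants)
    # Strict comparison is an assumption about the voting rule.
    std_args = [a for a, n in votes.items() if n / total > consensus_threshold]
    arg_mappings = {a: raw_names[a] for a in std_args}
    return std_args, arg_mappings


variants = {
    'torch': ['input', 'dim', 'keepdim'],
    'jax': ['a', 'axis', 'keepdims'],
    'tensorflow': ['input_tensor', 'axis', 'keepdims'],
}
std_args, arg_mappings = align_sketch(variants)
print(std_args)             # ['dim', 'keepdims']
print(arg_mappings['dim'])  # {'torch': 'dim', 'jax': 'axis', 'tensorflow': 'axis'}
```

Here ‘dim’ wins with three of three votes and ‘keepdims’ with two of three, while the differently named first positional argument gets no consensus.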

class ml_switcheroo.discovery.SemanticHarvester(semantics: ml_switcheroo.semantics.manager.SemanticsManager, target_fw: str = 'jax')¶

Analyzes Python source code to extract valid API signatures from usage.

semantics¶

target_fw = 'jax'¶

harvest_file(file_path: pathlib.Path, dry_run: bool = False) → int¶

Scans a file, extracts mappings, and updates the semantics JSON files.

Parameters:

file_path – Path to the python test file.
dry_run – If True, does not write changes to disk.

Returns:

Number of definitions updated.

Return type:

int
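
To give a feel for what extracting signatures from usage can involve, here is a sketch built on the standard-library `ast` module. It is not the harvester's algorithm: it only records, for each call whose dotted name starts with a given root, how many positional arguments and which keywords were used. The function name `harvest_calls_sketch` and the result layout are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import Dict


def harvest_calls_sketch(source: str, root: str = 'jax') -> Dict[str, dict]:
    """Collect observed call shapes for APIs under `root` (illustrative)."""
    found: Dict[str, dict] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        dotted = ast.unparse(node.func)  # e.g. 'jax.numpy.sum'
        if dotted == root or dotted.startswith(root + '.'):
            keywords = [kw.arg for kw in node.keywords if kw.arg is not None]
            # Record only the first usage seen for each API.
            found.setdefault(
                dotted, {'n_positional': len(node.args), 'keywords': keywords}
            )
    return found


test_source = '''
import jax
y = jax.numpy.sum(x, axis=0, keepdims=True)
z = other.fn(x)
'''
print(harvest_calls_sketch(test_source))
```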

class ml_switcheroo.discovery.ApiInspector¶

A robust inspector for discovering API surfaces of installed libraries.

_package_cache¶: Cache of statically parsed Griffe trees to avoid re-parsing large packages.

inspect(package_name: str, unsafe_modules: Set[str] | None = None) → Dict[str, Any]¶

Scans a package and returns a flat catalog of its public API.

Attempts static analysis first, then falls back to runtime inspection.

Parameters:

package_name – The importable name of the package (e.g. ‘torch’, ‘jax’).
unsafe_modules – A set of submodule names to exclude from recursion (e.g., {‘_C’, ‘distributed’}).

Returns:

Dict mapping ‘fully.qualified.name’ -> {metadata_dict}. Metadata dict contains ‘name’, ‘type’, ‘params’, etc.
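
The runtime-inspection fallback can be approximated with the standard-library `inspect` module, as in the sketch below. It covers only the top level of a module, does not recurse or honor `unsafe_modules`, and does not attempt the static (Griffe) pass; the metadata keys follow the ones named above, while the `'class'`/`'function'` type labels are assumptions.

```python
import importlib
import inspect
from typing import Any, Dict


def runtime_catalog_sketch(package_name: str) -> Dict[str, Any]:
    """Build a flat catalog of public top-level callables (illustrative)."""
    module = importlib.import_module(package_name)
    catalog: Dict[str, Any] = {}
    for attr, obj in vars(module).items():
        if attr.startswith('_'):
            continue  # skip private names
        if inspect.isclass(obj):
            kind = 'class'
        elif inspect.isroutine(obj):
            kind = 'function'
        else:
            continue  # ignore submodules, constants, etc.
        try:
            params = list(inspect.signature(obj).parameters)
        except (TypeError, ValueError):
            params = []  # some builtins expose no signature
        catalog[f'{package_name}.{attr}'] = {
            'name': attr, 'type': kind, 'params': params,
        }
    return catalog


# The standard-library 'json' module stands in for an ML framework here.
catalog = runtime_catalog_sketch('json')
print(catalog['json.dumps']['type'], catalog['json.dumps']['params'][:1])
```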

class ml_switcheroo.discovery.Scaffolder(semantics: ml_switcheroo.semantics.manager.SemanticsManager | None = None, similarity_threshold: float = 0.8, arity_penalty: float = 0.3)¶

Automated discovery tool that aligns framework APIs.

This class scans multiple frameworks, identifies common operations based on name similarity (e.g., ‘torch.abs’ == ‘jax.numpy.abs’), and generates the initial JSON mappings required for the transpiler.

inspector¶

console¶

semantics¶

similarity_threshold = 0.8¶

arity_penalty = 0.3¶

staged_specs: Dict[str, Dict[str, Any]]¶

staged_mappings: Dict[str, Dict[str, Any]]¶

scaffold(frameworks: List[str], root_dir: pathlib.Path | None = None)¶

Main entry point. Scans frameworks and builds/updates JSON mappings.

Scans all requested frameworks using ApiInspector.
Aligns APIs against known standards (Specs).
Uses fuzzy matching to align APIs between frameworks.
Writes results to disk (semantics/ and snapshots/).

Parameters:

frameworks – List of framework keys to scan (e.g. [‘torch’, ‘jax’]).
root_dir – Optional root directory path. Defaults to package paths.
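
One plausible way the `similarity_threshold` and `arity_penalty` settings could interact is sketched below with `difflib.SequenceMatcher`. How the Scaffolder actually computes similarity and applies the penalty is not described in this reference, so the scoring formula and the name `match_score_sketch` are assumptions.

```python
from difflib import SequenceMatcher


def match_score_sketch(
    name_a: str, n_params_a: int,
    name_b: str, n_params_b: int,
    arity_penalty: float = 0.3,
) -> float:
    """Name similarity in [0, 1], reduced when parameter counts differ."""
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    if n_params_a != n_params_b:
        score -= arity_penalty  # assumed: a flat deduction on arity mismatch
    return score


SIMILARITY_THRESHOLD = 0.8  # documented default

same_arity = match_score_sketch('abs', 1, 'abs', 1)
diff_arity = match_score_sketch('abs', 1, 'abs', 2)
print(same_arity >= SIMILARITY_THRESHOLD)  # True
print(diff_arity >= SIMILARITY_THRESHOLD)  # False
```

Under this reading, identical names with mismatched arity fall below the default threshold, which would keep unrelated overloads from being paired.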

class ml_switcheroo.discovery.FrameworkSyncer¶

Links abstract operations to concrete framework implementations.

console¶

sync(tier_data: Dict[str, Any], framework: str) → None¶: Updates the ‘variants’ dict in tier_data by hunting for ops in the target framework.
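
The sketch below shows one way such a sync pass could work, assuming `tier_data` maps operation names to dictionaries and that a match is recorded under a hypothetical `'api'` key. The real structure of `tier_data` and the lookup strategy are not documented here; the standard-library `math` module stands in for a framework so the example runs anywhere.

```python
import importlib
from typing import Any, Dict


def sync_sketch(tier_data: Dict[str, Any], framework: str) -> None:
    """Fill in 'variants' for ops found as top-level callables (illustrative)."""
    module = importlib.import_module(framework)
    for op_name, spec in tier_data.items():
        variants = spec.setdefault('variants', {})
        if framework in variants:
            continue  # keep existing mappings untouched
        if callable(getattr(module, op_name, None)):
            # The 'api' key is an assumption about the mapping layout.
            variants[framework] = {'api': f'{framework}.{op_name}'}


tier_data = {'sqrt': {}, 'not_a_real_op': {}}
sync_sketch(tier_data, 'math')
print(tier_data)
```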