ml_switcheroo.discovery

Discovery Package.

This package implements the Knowledge Acquisition layer of ml-switcheroo. It is responsible for identifying operations in machine learning libraries, aligning them with abstract standards, and populating the Semantic Knowledge Base.

Modules:

  • inspector: Low-level introspection of Python modules/objects (Live & Ghost).

  • scaffolder: Heuristic-based scanning to generate initial mappings.

  • consensus: Algorithms to align divergent API names across frameworks.

  • harvester: Extraction of semantic rules from manual test files.

  • syncer: Linking abstract operation definitions to concrete framework APIs.

Classes

  • ConsensusEngine: Algorithms for aligning divergent API naming conventions.

  • SemanticHarvester: Analyzes Python source code to extract valid API signatures from usage.

  • ApiInspector: A robust inspector for discovering API surfaces of installed libraries.

  • Scaffolder: Automated discovery tool that aligns framework APIs.

  • FrameworkSyncer: Links abstract operations to concrete framework implementations.

Package Contents

class ml_switcheroo.discovery.ConsensusEngine

Algorithms for aligning divergent API naming conventions.

Capabilities:

  1. Clustering: Groups APIs like HuberLoss, huber_loss, and Huber together.

  2. Normalization: Strips common noise (prefixes/suffixes) to find the semantic root.

  3. Signature Alignment: Builds a translation map for arguments (e.g., keepdims <-> keep_dims).

  4. Type Consensus: Aggregates type hints found in source candidates to enrich the standard signature.

IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d', 'v1', 'v2', 'object', 'op', 'func']
ARG_ALIASES
classmethod normalize_name(name: str) → str

Reduces an API Name to its semantic core for comparison.

This lowercases the name, removes underscores, and strips common prefixes and suffixes.

Examples:

  • 'HuberLoss' -> 'huber'

  • 'reduce_mean' -> 'mean'

  • 'conv2d' -> 'conv'

Parameters:

name (str) – The raw API name (e.g. 'CrossEntropyLoss').

Returns:

The normalized key (e.g. 'crossentropy').

Return type:

str
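The normalization described above can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: the suffix list is copied from IGNORED_SUFFIXES, while ASSUMED_PREFIXES and the choice to strip at most one prefix and one suffix are assumptions made so that the documented examples come out as stated.

```python
# Illustrative sketch only; not the actual ml_switcheroo code.
IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d',
                    'v1', 'v2', 'object', 'op', 'func']  # copied from the class attribute
ASSUMED_PREFIXES = ['reduce']  # hypothetical: the real prefix list is not documented here


def normalize_name(name: str) -> str:
    """Reduce an API name to a lowercase semantic core."""
    # Drop casing and underscores first so 'HuberLoss' and 'huber_loss' agree.
    key = name.lower().replace('_', '')

    # Strip at most one known prefix, never leaving an empty string.
    for prefix in ASSUMED_PREFIXES:
        if key.startswith(prefix) and len(key) > len(prefix):
            key = key[len(prefix):]
            break

    # Strip at most one known suffix, never leaving an empty string.
    for suffix in IGNORED_SUFFIXES:
        if key.endswith(suffix) and len(key) > len(suffix):
            key = key[:-len(suffix)]
            break

    return key
```

Under these assumptions, 'HuberLoss', 'huber_loss', and 'Huber' all reduce to the same key, which is what lets the clustering step group them.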

classmethod normalize_arg(arg_name: str) → str

Canonicalizes an argument name using the alias map.

Example:

  • 'learning_rate' -> 'lr'

Parameters:

arg_name (str) – The raw argument name.

Returns:

The canonical standard name.

Return type:

str
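A lookup of this kind might look like the sketch below. The contents of ARG_ALIASES are not listed on this page, so the table here is hypothetical; only the 'learning_rate' -> 'lr' entry comes from the documented example, and the other entries are inferred from argument names mentioned elsewhere in this reference.

```python
# Hypothetical alias table; the real ARG_ALIASES contents are not shown in this page.
ARG_ALIASES = {
    'learning_rate': 'lr',    # from the documented example
    'keep_dims': 'keepdims',  # assumed
    'axis': 'dim',            # assumed
}


def normalize_arg(arg_name: str) -> str:
    """Return the canonical name for an argument, or the name itself if no alias exists."""
    key = arg_name.lower()
    return ARG_ALIASES.get(key, key)
```

Names without an alias pass through unchanged, so the function is safe to apply to every argument of every variant.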

cluster(framework_inputs: Dict[str, List[ml_switcheroo.core.ghost.GhostRef]]) → List[CandidateStandard]

Groups API definitions from multiple frameworks into Candidates based on name similarity.

Parameters:

framework_inputs (Dict[str, List[GhostRef]]) – Dictionary mapping each framework name to the list of GhostRefs discovered in it.

Returns:

A list of potential standards, sorted by descending score.

Return type:

List[CandidateStandard]
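To make the grouping concrete, here is a minimal sketch of clustering by normalized name. Ref and Candidate are simplified stand-ins for GhostRef and CandidateStandard, simple_key is a reduced normalizer, and the score (number of frameworks covered) is an assumption; the real scoring rule is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ref:
    """Hypothetical stand-in for GhostRef."""
    name: str
    api_path: str


@dataclass
class Candidate:
    """Hypothetical stand-in for CandidateStandard."""
    name: str
    variants: Dict[str, Ref] = field(default_factory=dict)  # framework -> Ref

    @property
    def score(self) -> int:
        # Assumed scoring: how many frameworks provide this operation.
        return len(self.variants)


def simple_key(name: str) -> str:
    """Reduced normalizer: lowercase, drop underscores, strip a trailing 'loss'."""
    key = name.lower().replace('_', '')
    return key[:-4] if key.endswith('loss') and len(key) > 4 else key


def cluster(framework_inputs: Dict[str, List[Ref]]) -> List[Candidate]:
    """Group refs whose normalized names match, highest score first."""
    groups: Dict[str, Candidate] = {}
    for framework, refs in framework_inputs.items():
        for ref in refs:
            key = simple_key(ref.name)
            candidate = groups.setdefault(key, Candidate(name=key))
            # Keep the first ref seen per framework for a given key.
            candidate.variants.setdefault(framework, ref)
    return sorted(groups.values(), key=lambda c: c.score, reverse=True)


inputs = {
    'torch': [Ref('HuberLoss', 'torch.nn.HuberLoss'), Ref('Softplus', 'torch.nn.Softplus')],
    'optax': [Ref('huber_loss', 'optax.huber_loss')],
    'keras': [Ref('Huber', 'keras.losses.Huber')],
}
ranked = cluster(inputs)
```

In this example the three Huber variants land in one candidate that ranks ahead of the single-framework 'softplus' entry.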

filter_common(candidates: List[CandidateStandard], min_support: int = 2) → List[CandidateStandard]

Filters candidates to keep only those present in a minimum number of frameworks.

This ensures we only create standards for concepts that are truly shared across ecosystems, avoiding framework-specific noise.

Parameters:
  • candidates (List[CandidateStandard]) – List of candidates from clustering.

  • min_support (int) – Minimum number of different frameworks that must implement the op.

Returns:

Filtered list of robust candidates.

Return type:

List[CandidateStandard]
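The support filter amounts to counting distinct frameworks per candidate. The sketch below assumes each candidate carries a variants mapping keyed by framework name; SupportCandidate is a hypothetical stand-in for CandidateStandard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SupportCandidate:
    """Hypothetical stand-in for CandidateStandard."""
    name: str
    variants: Dict[str, str] = field(default_factory=dict)  # framework -> API path


def filter_common(candidates: List[SupportCandidate],
                  min_support: int = 2) -> List[SupportCandidate]:
    """Keep candidates implemented by at least `min_support` distinct frameworks."""
    return [c for c in candidates if len(c.variants) >= min_support]


candidates = [
    SupportCandidate('abs', {'torch': 'torch.abs', 'jax': 'jax.numpy.abs'}),
    SupportCandidate('softplus', {'torch': 'torch.nn.Softplus'}),
]
shared = filter_common(candidates)
```

Raising min_support trades coverage for confidence: a value of 3 would keep only operations found in three or more frameworks.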

align_signatures(candidates: List[CandidateStandard], consensus_threshold: float = 0.5) → None

Analyzes the arguments of all variants in a candidate to determine Standard Arguments and Types.

It populates std_args on the candidate by voting:

  1. If a normalized argument name appears in more than consensus_threshold of the implementations (more than 50% by default), it becomes part of the standard.

  2. If type hints are available across the variants, the consensus type is determined and a rich argument definition (e.g. {'name': 'x', 'type': 'int'}) is stored instead of a plain string.

It also populates arg_mappings, which translates between the standard name and each framework's name (e.g. standard 'dim' -> Torch 'dim', JAX 'axis').

Parameters:
  • candidates (List[CandidateStandard]) – List of CandidateStandards to process (in-place modification).

  • consensus_threshold (float) – Fraction of variants that must share an arg (0.0 - 1.0).
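The voting step can be illustrated with a small standalone sketch. It covers only argument names (type consensus is omitted), takes plain argument lists instead of CandidateStandard objects, and uses a hypothetical alias table; none of these details are confirmed by the library.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical aliases used to normalize argument names before voting.
ALIASES = {'axis': 'dim', 'keep_dims': 'keepdims'}


def vote_on_args(
    variant_args: Dict[str, List[str]],
    consensus_threshold: float = 0.5,
) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Return (std_args, arg_mappings) for one candidate.

    variant_args maps framework name -> raw argument names of its variant.
    arg_mappings maps framework name -> {standard name: framework name}.
    """
    votes: Counter = Counter()
    arg_mappings: Dict[str, Dict[str, str]] = {}

    for framework, args in variant_args.items():
        arg_mappings[framework] = {}
        for raw in args:
            std = ALIASES.get(raw, raw)
            # Each framework votes at most once per standard name.
            if std not in arg_mappings[framework]:
                votes[std] += 1
                arg_mappings[framework][std] = raw

    total = len(variant_args)
    std_args = [arg for arg, n in votes.items() if n / total > consensus_threshold]

    # Drop mappings for arguments that did not reach consensus.
    for framework, mapping in arg_mappings.items():
        arg_mappings[framework] = {s: r for s, r in mapping.items() if s in std_args}

    return std_args, arg_mappings


std_args, arg_mappings = vote_on_args({
    'torch': ['input', 'dim', 'keepdim'],
    'jax': ['a', 'axis', 'keepdims'],
    'tensorflow': ['input_tensor', 'axis', 'keepdims'],
})
```

Here 'dim' wins with three of three votes and 'keepdims' with two of three, while the differently named first positional arguments each receive a single vote and are dropped.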

class ml_switcheroo.discovery.SemanticHarvester(semantics: ml_switcheroo.semantics.manager.SemanticsManager, target_fw: str = 'jax')

Analyzes Python source code to extract valid API signatures from usage.

semantics
target_fw = 'jax'
harvest_file(file_path: pathlib.Path, dry_run: bool = False) → int

Scans a file, extracts mappings, and updates the semantics JSON files.

Parameters:
  • file_path – Path to the Python test file.

  • dry_run – If True, does not write changes to disk.

Returns:

Number of definitions updated.

Return type:

int
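How the harvester reads usage is not specified here. One plausible approach, sketched below under that assumption, is to parse the test file with the standard ast module and record which keyword arguments each framework call is made with. The function name harvest_calls and its return shape are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast
from typing import Dict, List


def harvest_calls(source: str, module_alias: str) -> Dict[str, List[str]]:
    """Map each call rooted at `module_alias` to the keyword names used with it."""
    found: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        dotted = ast.unparse(node.func)  # e.g. 'jnp.sum'
        if dotted.split('.')[0] != module_alias:
            continue
        seen = found.setdefault(dotted, [])
        for keyword in node.keywords:
            # keyword.arg is None for **kwargs expansion; skip those.
            if keyword.arg is not None and keyword.arg not in seen:
                seen.append(keyword.arg)
    return found


TEST_SOURCE = '''
import jax.numpy as jnp

def test_sum():
    x = jnp.ones((2, 3))
    assert jnp.sum(x, axis=1, keepdims=True).shape == (2, 1)
'''

usage = harvest_calls(TEST_SOURCE, 'jnp')
```

Because the source is only parsed and never executed, the target framework does not need to be installed for this kind of analysis.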

class ml_switcheroo.discovery.ApiInspector

A robust inspector for discovering API surfaces of installed libraries.

_package_cache

Cache of statically parsed Griffe trees to avoid re-parsing large packages.

inspect(package_name: str, unsafe_modules: Set[str] | None = None) → Dict[str, Any]

Scans a package and returns a flat catalog of its public API.

Attempts static analysis first, then falls back to runtime inspection.

Parameters:
  • package_name – The importable name of the package (e.g. 'torch', 'jax').

  • unsafe_modules – A set of submodule names to exclude from recursion (e.g., {'_C', 'distributed'}).

Returns:

Dict mapping ‘fully.qualified.name’ -> {metadata_dict}. Metadata dict contains ‘name’, ‘type’, ‘params’, etc.
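The runtime fallback can be approximated with the standard importlib and inspect modules. The sketch below is an assumption about the general shape of such a scan, not ApiInspector itself: it skips the static Griffe pass entirely, the function name inspect_runtime is hypothetical, and the 'type' values are illustrative. It is demonstrated on the standard-library json package so it runs without any ML framework installed.

```python
import importlib
import inspect
from typing import Any, Dict, Optional, Set


def inspect_runtime(package_name: str,
                    unsafe_modules: Optional[Set[str]] = None) -> Dict[str, Dict[str, Any]]:
    """Build a flat catalog of public classes and callables by importing the package."""
    unsafe = unsafe_modules or set()
    catalog: Dict[str, Dict[str, Any]] = {}
    seen: Set[int] = set()

    def visit(module: Any, prefix: str) -> None:
        if id(module) in seen:  # guard against import cycles
            return
        seen.add(id(module))
        for attr, obj in list(vars(module).items()):
            if attr.startswith('_'):
                continue  # private names are not part of the public API
            if inspect.ismodule(obj):
                # Recurse only into submodules of this package that are not excluded.
                if obj.__name__.startswith(package_name + '.') and attr not in unsafe:
                    visit(obj, obj.__name__)
            elif inspect.isclass(obj) or inspect.isroutine(obj):
                try:
                    params = list(inspect.signature(obj).parameters)
                except (TypeError, ValueError):
                    params = []  # some builtins expose no signature
                catalog[f'{prefix}.{attr}'] = {
                    'name': attr,
                    'type': 'class' if inspect.isclass(obj) else 'function',
                    'params': params,
                }

    visit(importlib.import_module(package_name), package_name)
    return catalog


catalog = inspect_runtime('json')
pruned = inspect_runtime('json', unsafe_modules={'decoder'})
```

Excluding a submodule by name prunes that whole subtree, which is the reason to pass entries such as '_C' for packages whose native extensions are slow or unsafe to walk.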

class ml_switcheroo.discovery.Scaffolder(semantics: ml_switcheroo.semantics.manager.SemanticsManager | None = None, similarity_threshold: float = 0.8, arity_penalty: float = 0.3)

Automated discovery tool that aligns framework APIs.

This class scans multiple frameworks, identifies common operations based on name similarity (e.g., 'torch.abs' == 'jax.numpy.abs'), and generates the initial JSON mappings required for the transpiler.

inspector
console
semantics
similarity_threshold = 0.8
arity_penalty = 0.3
staged_specs: Dict[str, Dict[str, Any]]
staged_mappings: Dict[str, Dict[str, Any]]
scaffold(frameworks: List[str], root_dir: pathlib.Path | None = None)

Main entry point. Scans frameworks and builds/updates JSON mappings.

  1. Scans all requested frameworks using ApiInspector.

  2. Aligns APIs against known standards (Specs).

  3. Uses fuzzy matching to align APIs between frameworks.

  4. Writes results to disk (semantics/ and snapshots/).

Parameters:
  • frameworks – List of framework keys to scan (e.g. ['torch', 'jax']).

  • root_dir – Optional root directory path. Defaults to package paths.
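The fuzzy-matching step (step 3) can be sketched with difflib from the standard library. How Scaffolder actually combines similarity_threshold and arity_penalty is not documented here; the sketch assumes the penalty is subtracted from the name-similarity ratio when two functions take a different number of arguments, and that a pair is accepted only if the resulting score reaches the threshold. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def match_score(name_a: str, arity_a: int, name_b: str, arity_b: int,
                arity_penalty: float = 0.3) -> float:
    """Name similarity in [0, 1], reduced when argument counts differ (assumed rule)."""
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    if arity_a != arity_b:
        score -= arity_penalty
    return score


def align_apis(source_api: Dict[str, int], target_api: Dict[str, int],
               similarity_threshold: float = 0.8,
               arity_penalty: float = 0.3) -> Dict[str, str]:
    """Pair each source name with its best-scoring target name above the threshold.

    Both inputs map a function name to its argument count.
    """
    pairs: Dict[str, str] = {}
    for s_name, s_arity in source_api.items():
        best: Optional[str] = None
        best_score = similarity_threshold
        for t_name, t_arity in target_api.items():
            score = match_score(s_name, s_arity, t_name, t_arity, arity_penalty)
            if score >= best_score:
                best, best_score = t_name, score
        if best is not None:
            pairs[s_name] = best
    return pairs


source = {'abs': 1, 'logsumexp': 2, 'relu': 1}
target = {'abs': 1, 'log_sum_exp': 2, 'softmax': 2}
pairs = align_apis(source, target)
```

With the defaults, 'logsumexp' pairs with 'log_sum_exp' because the names are close and the arities agree; if the arities differed, the penalty would push the score below the threshold and the pair would be rejected.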

class ml_switcheroo.discovery.FrameworkSyncer

Links abstract operations to concrete framework implementations.

console
sync(tier_data: Dict[str, Any], framework: str) → None

Updates the 'variants' dict in tier_data in place by searching the target framework for implementations of each operation.