ml_switcheroo.discovery
Discovery Package.
This package implements the Knowledge Acquisition layer of ml-switcheroo. It is responsible for identifying operations in machine learning libraries, aligning them with abstract standards, and populating the Semantic Knowledge Base.
Modules:
- inspector: Low-level introspection of Python modules/objects (Live & Ghost).
- scaffolder: Heuristic-based scanning to generate initial mappings.
- consensus: Algorithms to align divergent API names across frameworks.
- harvester: Extraction of semantic rules from manual test files.
- syncer: Linking abstract operation definitions to concrete framework APIs.
Submodules
Classes
| Class | Summary |
|---|---|
| ConsensusEngine | Algorithms for aligning divergent API naming conventions. |
| SemanticHarvester | Analyzes Python source code to extract valid API signatures from usage. |
| ApiInspector | A robust inspector for discovering API surfaces of installed libraries. |
| Scaffolder | Automated discovery tool that aligns framework APIs. |
| | Links abstract operations to concrete framework implementations. |
Package Contents
- class ml_switcheroo.discovery.ConsensusEngine
Algorithms for aligning divergent API naming conventions.
Capabilities:
Clustering: Groups APIs like HuberLoss, huber_loss, and Huber together.
Normalization: Strips common noise (prefixes/suffixes) to find the semantic root.
Signature Alignment: Builds a translation map for arguments (e.g., keepdims <-> keep_dims).
Type Consensus: Aggregates type hints found in source candidates to enrich the standard signature.
- IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d', 'v1', 'v2', 'object', 'op', 'func']
- ARG_ALIASES
- classmethod normalize_name(name: str) -> str
Reduces an API name to its semantic core for comparison.
This lowercases the name and strips underscores and common prefixes/suffixes.
Examples:
‘HuberLoss’ -> ‘huber’
‘reduce_mean’ -> ‘mean’
‘conv2d’ -> ‘conv’
- Parameters:
name (str) – The raw API name (e.g. ‘CrossEntropyLoss’).
- Returns:
The normalized key (e.g. ‘crossentropy’).
- Return type:
str
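The normalization rules above can be sketched as follows. This is a hypothetical reimplementation, not the library's code: it reuses the IGNORED_SUFFIXES list shown above, while IGNORED_PREFIXES (containing only 'reduce', so that 'reduce_mean' maps to 'mean') is an assumption, since the actual prefix list is not documented here.

```python
# Hypothetical sketch of ConsensusEngine.normalize_name; not the library's implementation.
IGNORED_SUFFIXES = ['loss', 'error', 'layer', 'block', '2d', '1d', '3d',
                    'v1', 'v2', 'object', 'op', 'func']
# Assumed prefix list: the real one is not shown in this reference.
IGNORED_PREFIXES = ['reduce']


def normalize_name(name: str) -> str:
    # Remove casing and underscores first so 'HuberLoss' and 'huber_loss' agree.
    key = name.lower().replace('_', '')
    # Strip a known prefix, but never reduce the key to an empty string.
    for prefix in IGNORED_PREFIXES:
        if key.startswith(prefix) and len(key) > len(prefix):
            key = key[len(prefix):]
    # Strip known suffixes in list order (e.g. 'conv2dlayer' -> 'conv2d' -> 'conv').
    for suffix in IGNORED_SUFFIXES:
        if key.endswith(suffix) and len(key) > len(suffix):
            key = key[:-len(suffix)]
    return key


for raw in ['HuberLoss', 'huber_loss', 'reduce_mean', 'conv2d', 'CrossEntropyLoss']:
    print(raw, '->', normalize_name(raw))
```

Blind suffix stripping can over-merge short names (under this sketch, 'top' would lose 'op' and become 't'), so a real implementation may need a length guard or an exception list.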
- classmethod normalize_arg(arg_name: str) -> str
Canonicalizes an argument name using the alias map.
Example
‘learning_rate’ -> ‘lr’
- Parameters:
arg_name (str) – The raw argument name.
- Returns:
The canonical standard name.
- Return type:
str
- cluster(framework_inputs: Dict[str, List[ml_switcheroo.core.ghost.GhostRef]]) -> List[CandidateStandard]
Groups API definitions from multiple frameworks into Candidates based on name similarity.
- Parameters:
framework_inputs – Dictionary mapping ‘framework_name’ -> List of discovered GhostRefs.
- Returns:
A list of potential standards, sorted by descending score.
- Return type:
List[CandidateStandard]
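A simplified sketch of the clustering step, assuming plain API-name strings instead of GhostRef objects and a dictionary in place of CandidateStandard. The simple_key normalizer (lowercase, drop underscores, strip a trailing "loss") and the score used for sorting (the number of frameworks in a cluster) are illustrative assumptions, not the engine's documented behavior.

```python
from collections import defaultdict
from typing import Dict, List


def simple_key(name: str) -> str:
    # Simplified stand-in for normalize_name: lowercase, drop underscores, strip "loss".
    key = name.lower().replace("_", "")
    return key[:-4] if key.endswith("loss") and len(key) > 4 else key


def cluster(framework_inputs: Dict[str, List[str]]) -> List[Dict]:
    # groups[normalized_key][framework] -> raw API names from that framework.
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for fw, names in framework_inputs.items():
        for name in names:
            groups[simple_key(name)].setdefault(fw, []).append(name)
    # Assumed score: how many frameworks contributed to the cluster.
    candidates = [{"key": k, "variants": v, "score": len(v)} for k, v in groups.items()]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)


result = cluster({"torch": ["HuberLoss", "relu"], "jax": ["huber_loss"], "keras": ["Huber"]})
print([(c["key"], c["score"]) for c in result])
```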
- filter_common(candidates: List[CandidateStandard], min_support: int = 2) -> List[CandidateStandard]
Filters candidates to keep only those present in a minimum number of frameworks.
This ensures we only create standards for concepts that are truly shared across ecosystems, avoiding framework-specific noise.
- Parameters:
candidates (List[CandidateStandard]) – List of candidates from clustering.
min_support (int) – Minimum number of different frameworks that must implement the op.
- Returns:
Filtered list of robust candidates.
- Return type:
List[CandidateStandard]
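The support check can be sketched against a dictionary-based candidate shape (an assumption; the real method operates on CandidateStandard objects). Support is counted here as the number of distinct frameworks that contribute a variant; the "custom_op" entry is a hypothetical framework-specific operation.

```python
from typing import Dict, List


def filter_common(candidates: List[Dict], min_support: int = 2) -> List[Dict]:
    # Keep a candidate only if at least `min_support` frameworks implement it.
    return [c for c in candidates if len(c["variants"]) >= min_support]


candidates = [
    {"key": "huber", "variants": {"torch": ["HuberLoss"], "jax": ["huber_loss"]}},
    {"key": "customop", "variants": {"torch": ["custom_op"]}},  # hypothetical
]
print([c["key"] for c in filter_common(candidates)])
```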
- align_signatures(candidates: List[CandidateStandard], consensus_threshold: float = 0.5) -> None
Analyzes the arguments of all variants in a candidate to determine the standard arguments and their types.
It populates std_args on the candidate by voting:
If an argument (after normalization) appears in more than consensus_threshold of the implementations (50% by default), it becomes part of the standard.
If type hints are available across the variants, it determines the consensus type and populates a rich argument definition (e.g. {‘name’: ‘x’, ‘type’: ‘int’}) instead of a simple string.
It also populates arg_mappings, which translates between the standard name and each framework's specific name (e.g. Standard ‘dim’ -> Torch ‘dim’, JAX ‘axis’).
- Parameters:
candidates (List[CandidateStandard]) – List of CandidateStandards to process (in-place modification).
consensus_threshold (float) – Fraction of variants that must share an arg (0.0 - 1.0).
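The voting step can be sketched as below. This is a hypothetical reimplementation: the alias map stands in for ARG_ALIASES (whose contents are not shown here), variants are plain lists of argument names rather than CandidateStandard objects, and type-hint consensus is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical alias map standing in for ConsensusEngine.ARG_ALIASES.
ARG_ALIASES = {"axis": "dim", "keep_dims": "keepdims", "learning_rate": "lr"}


def normalize_arg(arg_name: str) -> str:
    # Map a framework-specific argument name to its canonical spelling.
    return ARG_ALIASES.get(arg_name, arg_name)


def align_signatures(
    variants: Dict[str, List[str]], consensus_threshold: float = 0.5
) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    # Each framework casts at most one vote per canonical argument.
    votes: Counter = Counter()
    for args in variants.values():
        votes.update({normalize_arg(a) for a in args})
    total = len(variants)
    std_args = sorted(a for a, n in votes.items() if n / total > consensus_threshold)
    # arg_mappings[framework][standard_name] -> framework-specific name.
    arg_mappings = {
        fw: {normalize_arg(a): a for a in args if normalize_arg(a) in std_args}
        for fw, args in variants.items()
    }
    return std_args, arg_mappings


std_args, mappings = align_signatures(
    {"torch": ["x", "dim"], "jax": ["x", "axis"], "tensorflow": ["x", "axis", "name"]}
)
print(std_args)
print(mappings["jax"])
```

In this example 'name' appears in only one of three variants, so it falls below the default threshold and is left out of the standard.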
- class ml_switcheroo.discovery.SemanticHarvester(semantics: ml_switcheroo.semantics.manager.SemanticsManager, target_fw: str = 'jax')
Analyzes Python source code to extract valid API signatures from usage.
- semantics
- target_fw = 'jax'
- harvest_file(file_path: pathlib.Path, dry_run: bool = False) -> int
Scans a file, extracts mappings, and updates the semantics JSON files.
- Parameters:
file_path – Path to the python test file.
dry_run – If True, does not write changes to disk.
- Returns:
Number of definitions updated.
- Return type:
int
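A hypothetical sketch of usage-based extraction, not SemanticHarvester's actual logic: it walks a module's syntax tree with Python's standard ast module and records, for each fully dotted call whose root name matches a framework prefix, the keyword arguments observed at call sites. Resolving import aliases (e.g. jnp for jax.numpy) and writing to the semantics JSON files are left out.

```python
import ast
from typing import Dict, Optional, Set


def dotted_name(node: ast.AST) -> Optional[str]:
    # Rebuild 'jax.numpy.sum' from nested Attribute/Name nodes.
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def harvest_calls(source: str, prefix: str = "jax") -> Dict[str, Set[str]]:
    found: Dict[str, Set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name and name.split(".")[0] == prefix:
                # Record keyword names; **kwargs splats have kw.arg == None.
                keywords = {kw.arg for kw in node.keywords if kw.arg is not None}
                found.setdefault(name, set()).update(keywords)
    return found


sample = """
y = jax.numpy.sum(x, axis=0, keepdims=True)
z = torch.sum(x, dim=0)
"""
print(harvest_calls(sample))
```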
- class ml_switcheroo.discovery.ApiInspector
A robust inspector for discovering API surfaces of installed libraries.
- _package_cache
Cache of statically parsed Griffe trees to avoid re-parsing large packages.
- inspect(package_name: str, unsafe_modules: Set[str] | None = None) -> Dict[str, Any]
Scans a package and returns a flat catalog of its public API.
Attempts static analysis first, then falls back to runtime inspection.
- Parameters:
package_name – The importable name of the package (e.g. ‘torch’, ‘jax’).
unsafe_modules – A set of submodule names to exclude from recursion (e.g., {‘_C’, ‘distributed’}).
- Returns:
Dict mapping ‘fully.qualified.name’ -> {metadata_dict}. Metadata dict contains ‘name’, ‘type’, ‘params’, etc.
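The runtime fallback can be approximated with the standard inspect module. This sketch is an assumption about the approach, not ApiInspector's code: it scans only the top level of one module (no recursion, no Griffe static pass, no unsafe_modules handling) and produces the same kind of flat ‘fully.qualified.name’ -> metadata mapping. The function name runtime_catalog is hypothetical.

```python
import importlib
import inspect
from typing import Any, Dict


def runtime_catalog(package_name: str) -> Dict[str, Dict[str, Any]]:
    module = importlib.import_module(package_name)
    catalog: Dict[str, Dict[str, Any]] = {}
    for name, obj in inspect.getmembers(module):
        # Public functions and classes only; submodules and constants are skipped.
        if name.startswith("_") or not (
            inspect.isfunction(obj) or inspect.isbuiltin(obj) or inspect.isclass(obj)
        ):
            continue
        # Ignore objects re-exported from unrelated packages.
        if not (getattr(obj, "__module__", None) or "").startswith(package_name):
            continue
        try:
            params = list(inspect.signature(obj).parameters)
        except (TypeError, ValueError):
            params = []  # Some builtins expose no introspectable signature.
        catalog[f"{package_name}.{name}"] = {
            "name": name,
            "type": "class" if inspect.isclass(obj) else "function",
            "params": params,
        }
    return catalog


catalog = runtime_catalog("json")
print(catalog["json.dumps"]["type"], catalog["json.dumps"]["params"][:1])
```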
- class ml_switcheroo.discovery.Scaffolder(semantics: ml_switcheroo.semantics.manager.SemanticsManager | None = None, similarity_threshold: float = 0.8, arity_penalty: float = 0.3)
Automated discovery tool that aligns framework APIs.
This class scans multiple frameworks, identifies common operations based on name similarity (e.g., ‘torch.abs’ == ‘jax.numpy.abs’), and generates the initial JSON mappings required for the transpiler.
- inspector
- console
- semantics
- similarity_threshold = 0.8
- arity_penalty = 0.3
- staged_specs: Dict[str, Dict[str, Any]]
- staged_mappings: Dict[str, Dict[str, Any]]
- scaffold(frameworks: List[str], root_dir: pathlib.Path | None = None)
Main entry point. Scans frameworks and builds/updates JSON mappings:
1. Scans all requested frameworks using ApiInspector.
2. Aligns APIs against known standards (Specs).
3. Uses fuzzy matching to align APIs between frameworks.
4. Writes results to disk (semantics/ and snapshots/).
- Parameters:
frameworks – List of framework keys to scan (e.g. [‘torch’, ‘jax’]).
root_dir – Optional root directory path. Defaults to package paths.
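The fuzzy alignment step can be illustrated with difflib from the standard library. How similarity_threshold and arity_penalty actually combine inside Scaffolder is not documented here; this sketch assumes the score is the SequenceMatcher ratio of the leaf names minus arity_penalty when parameter counts differ, with a match declared at or above similarity_threshold. The helper names match_score and is_match are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(name_a: str, arity_a: int, name_b: str, arity_b: int,
                arity_penalty: float = 0.3) -> float:
    # Compare only the last path component: 'torch.abs' vs 'jax.numpy.abs' -> 'abs' vs 'abs'.
    leaf_a = name_a.rsplit(".", 1)[-1].lower()
    leaf_b = name_b.rsplit(".", 1)[-1].lower()
    score = SequenceMatcher(None, leaf_a, leaf_b).ratio()
    # Penalize candidates whose parameter counts disagree (assumed rule).
    if arity_a != arity_b:
        score -= arity_penalty
    return score


def is_match(name_a: str, arity_a: int, name_b: str, arity_b: int,
             similarity_threshold: float = 0.8, arity_penalty: float = 0.3) -> bool:
    return match_score(name_a, arity_a, name_b, arity_b, arity_penalty) >= similarity_threshold


print(is_match("torch.abs", 1, "jax.numpy.abs", 1))
print(is_match("torch.abs", 1, "jax.numpy.abs", 2))
```

Under this assumed rule and the default values, an exact leaf-name match with mismatched arity scores about 0.7 and falls below the 0.8 threshold, so the penalty alone can veto a match.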