Core Modules
CLI Runner (run.py)
CLI runner for BenchAudit dataset audits.
- run.load_yaml(path)
Load a YAML file into a dict.
- Parameters:
path (Path)
- Return type:
Dict[str, Any]
- run.echo_config(cfg)
Return a lightweight echo of the config for inclusion in summary.json.
- Parameters:
cfg (Dict[str, Any])
- Return type:
Dict[str, Any]
- run.discover_yaml_files(configs_dir, single_config)
Collect unique YAML files from a folder or a single path.
- Parameters:
configs_dir (Path | None)
single_config (Path | None)
- Return type:
List[Path]
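As an illustration of the behaviour described for discover_yaml_files, here is a minimal sketch using only pathlib. It is not the BenchAudit implementation: the function name discover_yaml_files_sketch, the recursive search, and the accepted extensions (.yaml and .yml) are all assumptions.

```python
from pathlib import Path
from typing import List, Optional


def discover_yaml_files_sketch(
    configs_dir: Optional[Path], single_config: Optional[Path]
) -> List[Path]:
    """Collect unique YAML files from a folder and/or a single path."""
    candidates: List[Path] = []
    if configs_dir is not None and configs_dir.is_dir():
        # Assumption: search recursively and accept both common extensions.
        for pattern in ("*.yaml", "*.yml"):
            candidates.extend(sorted(configs_dir.rglob(pattern)))
    if single_config is not None and single_config.is_file():
        candidates.append(single_config)

    # De-duplicate on the resolved path while keeping first-seen order.
    seen = set()
    unique: List[Path] = []
    for path in candidates:
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```

De-duplicating on the resolved path means a file passed as single_config is not processed twice when it also lives under configs_dir.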
- run.run_one_config(cfg, config_path, out_root, log, do_benchmark=False, configs_root=None, force=False)
Run the loader, analyzer, and optional baselines for a single config.
- Parameters:
cfg (Dict[str, Any])
config_path (Path)
out_root (Path)
log (Logger)
do_benchmark (bool)
configs_root (Path | None)
force (bool)
- Return type:
None
Top-level Utilities (utils)
Public builders, logging helpers, and artifact writers for BenchAudit.
- utils.build_loader(cfg)
Factory that instantiates the appropriate loader for the config.
- Parameters:
cfg (Dict[str, Any])
- Return type:
- utils.build_analyzer(cfg, logger=None)
Factory that picks the analyzer (SMILES vs DTI) and configures it.
- Parameters:
cfg (Dict[str, Any])
logger (Logger | None)
- utils.resolve_output_dir(cfg, cli_out_root, config_path=None, configs_root=None)
Derive the output folder: <cfg['out'] or cli_root/type>/<relative-config-path>/<config-name>.
- Parameters:
cfg (Dict[str, Any])
cli_out_root (Path)
config_path (Path | None)
configs_root (Path | None)
- Return type:
Path
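The path template above can be read as three joined parts. The sketch below shows one plausible interpretation and is not the library's code: the name resolve_output_dir_sketch, the fallback values "unknown" and "config", and the choice to ignore a config_path that lies outside configs_root are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_output_dir_sketch(
    cfg: Dict[str, Any],
    cli_out_root: Path,
    config_path: Optional[Path] = None,
    configs_root: Optional[Path] = None,
) -> Path:
    # Part 1: an explicit cfg["out"] wins; otherwise use <cli_out_root>/<cfg["type"]>.
    if cfg.get("out"):
        base = Path(cfg["out"])
    else:
        base = Path(cli_out_root) / str(cfg.get("type", "unknown"))  # fallback is an assumption

    if config_path is None:
        # Assumption: without a config path, fall back to a name stored in the config.
        return base / str(cfg.get("name", "config"))

    # Part 2: the config's folder relative to the configs root, if it is inside it.
    rel_parent = Path()
    if configs_root is not None:
        try:
            rel_parent = config_path.parent.relative_to(configs_root)
        except ValueError:
            rel_parent = Path()  # config lives outside configs_root

    # Part 3: the config file name without its extension.
    return base / rel_parent / config_path.stem
```

Under these assumptions, a config at /cfgs/adme/caco2.yaml with type "tdc", a CLI root of runs, and a configs root of /cfgs resolves to runs/tdc/adme/caco2.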
- utils.make_logger(name=LOGGER_NAME, level='INFO')
Return a logger with a consistent, informative format.
- Parameters:
name (str)
level (str | int)
- Return type:
Logger
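A logger factory with this signature is commonly built on the standard logging module. The following is a minimal sketch, not the BenchAudit source: the value of LOGGER_NAME, the format string, and the stderr handler are assumptions, since the reference shows only the signature.

```python
import logging
import sys
from typing import Union

LOGGER_NAME = "benchaudit"  # assumption: the real default is not shown in the reference


def make_logger_sketch(
    name: str = LOGGER_NAME, level: Union[str, int] = "INFO"
) -> logging.Logger:
    logger = logging.getLogger(name)
    # setLevel accepts either a level name such as "INFO" or a numeric level.
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    # Avoid double printing through the root logger.
    logger.propagate = False
    return logger
```

Guarding on logger.handlers keeps the factory idempotent, which matters when it is called once per config in a batch run.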
- class utils.ResultWriter(out_dir, logger=None)
Bases: object
Persist analyzer artifacts (summary, tables, drill-down files).
- Parameters:
out_dir (Path)
logger (Optional[logging.Logger])
- write_analysis(result, write_summary=True)
- Parameters:
result (AnalysisResult)
write_summary (bool)
- Return type:
Dict[str, Path | None]
- utils.json_default(value)
Safe JSON encoder that understands numpy/pandas scalars.
- Parameters:
value (Any)
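A function like this is typically passed as the default argument of json.dumps. The sketch below is a hypothetical stand-in rather than the library's encoder: it relies on duck typing (NumPy scalars expose .item(), arrays and pandas Series expose .tolist()) so that it runs without either package installed, and the final string fallback is an assumption.

```python
import json
from datetime import date, datetime
from pathlib import Path
from typing import Any


def json_default_sketch(value: Any) -> Any:
    # NumPy scalars expose .item(), which returns the plain Python value.
    item = getattr(value, "item", None)
    if callable(item):
        try:
            return item()
        except (TypeError, ValueError):
            pass  # e.g. a multi-element array; try tolist() next
    # Arrays and Series expose .tolist().
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    # Dates and timestamps (pandas.Timestamp subclasses datetime).
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, (set, frozenset)):
        return sorted(value, key=str)
    # Assumption: anything else (for example Path) is stored as its string form.
    return str(value)


payload = {"out_dir": Path("runs"), "created": date(2024, 1, 2)}
encoded = json.dumps(payload, default=json_default_sketch)
```

json.dumps calls the default hook only for objects it cannot serialize itself, so built-in types are unaffected.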
- utils.run_baselines(cfg, splits=None)
Public entry point. Uses the Polaris path when cfg['type'] == 'polaris'; otherwise uses the generic path.
- Parameters:
cfg (Dict[str, Any])
splits (Dict[str, pandas.DataFrame] | None)
- Return type:
Dict[str, Any]
- utils.clean_benchmark_splits(splits, task_type, *, reference_splits=DEFAULT_REFERENCE_SPLITS, remove_invalid=True, remove_conflicts=True, remove_contaminants=True)
Return cleaned benchmark splits and a JSON-serializable cleaning report.
Cleaning is intentionally opt-in and operates only on in-memory split frames. Removal precedence is invalid rows, label-conflicting molecules, then exact contaminants in non-reference splits.
- Parameters:
splits (Mapping[str, pandas.DataFrame])
task_type (str)
reference_splits (Sequence[str] | str)
remove_invalid (bool)
remove_conflicts (bool)
remove_contaminants (bool)
- Return type:
Tuple[Dict[str, pandas.DataFrame], Dict[str, Any]]
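To make the removal precedence concrete, the sketch below applies the three steps in order to plain lists of dicts instead of pandas DataFrames, so it stays dependency-free. It is an illustration only. The function name clean_splits_sketch, the "smiles" and "label" keys, the definition of an invalid row, detecting conflicts across all splits, and ("train",) as the reference split are assumptions; the actual value of DEFAULT_REFERENCE_SPLITS is not shown in this reference.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

Row = Dict[str, Any]


def clean_splits_sketch(
    splits: Dict[str, List[Row]],
    reference_splits: Union[Sequence[str], str] = ("train",),  # assumed default
) -> Tuple[Dict[str, List[Row]], Dict[str, int]]:
    if isinstance(reference_splits, str):
        reference_splits = [reference_splits]
    report = {"invalid": 0, "conflicts": 0, "contaminants": 0}

    # Step 1: drop invalid rows (assumed here: empty molecule string or missing label).
    valid: Dict[str, List[Row]] = {}
    for name, rows in splits.items():
        kept = [r for r in rows if r.get("smiles") and r.get("label") is not None]
        report["invalid"] += len(rows) - len(kept)
        valid[name] = kept

    # Step 2: drop molecules that carry more than one distinct label.
    labels: Dict[str, set] = {}
    for rows in valid.values():
        for r in rows:
            labels.setdefault(r["smiles"], set()).add(r["label"])
    conflicting = {s for s, seen in labels.items() if len(seen) > 1}
    consistent: Dict[str, List[Row]] = {}
    for name, rows in valid.items():
        kept = [r for r in rows if r["smiles"] not in conflicting]
        report["conflicts"] += len(rows) - len(kept)
        consistent[name] = kept

    # Step 3: in non-reference splits, drop exact matches to reference molecules.
    reference = {r["smiles"] for n in reference_splits for r in consistent.get(n, [])}
    cleaned: Dict[str, List[Row]] = {}
    for name, rows in consistent.items():
        if name in reference_splits:
            cleaned[name] = rows
            continue
        kept = [r for r in rows if r["smiles"] not in reference]
        report["contaminants"] += len(rows) - len(kept)
        cleaned[name] = kept
    return cleaned, report
```

Because each step runs on the output of the previous one, a row is counted under the first rule that removes it, and the inputs are never modified in place.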