Support Utilities
Configuration models (utils.config_models)
Configuration validation and normalization helpers.
- utils.config_models.normalize_split_column(series)
Normalize split labels to train/valid/test while preserving pandas semantics.
- Return type:
Any
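The entry above does not list which spellings are accepted. Here is a minimal sketch of this kind of normalization. It assumes a hypothetical alias table (for example "val" and "validation" mapping to "valid") and assumes that unrecognized labels should raise. The name normalize_split_column_sketch is illustrative and is not the module's API. The sketch keeps the Series index and leaves missing values missing, which is one reasonable reading of "preserving pandas semantics".

```python
import pandas as pd

# Hypothetical alias table: the accepted spellings are an assumption, not the real list.
_SPLIT_ALIASES = {
    "train": "train", "training": "train",
    "valid": "valid", "val": "valid", "validation": "valid", "dev": "valid",
    "test": "test", "testing": "test",
}


def normalize_split_column_sketch(series: pd.Series) -> pd.Series:
    """Map split labels onto train/valid/test, keeping the index and missing values."""
    # Cast to pandas' string dtype so missing values become <NA> and non-strings compare as text.
    cleaned = series.astype("string").str.strip().str.lower()
    present = cleaned.dropna()
    unknown = present[~present.isin(list(_SPLIT_ALIASES))]
    if not unknown.empty:
        # Assumption: unrecognized labels are an error rather than passed through.
        raise ValueError(f"Unrecognized split labels: {sorted(unknown.unique())}")
    # Series.map preserves the index; labels that were missing stay missing.
    return cleaned.map(_SPLIT_ALIASES)
```

Because the result is aligned on the original index, it can be assigned back to the frame's split column without reordering rows.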
- utils.config_models.validate_yaml_mapping(data, *, source=None)
Validate that parsed YAML data is a non-empty mapping.
- Parameters:
data (Any)
source (Path | None)
- Return type:
dict[str, Any]
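Here is a hedged sketch of the same check. It assumes that three cases are all rejected: an empty document (which yaml.safe_load returns as None), a non-mapping value such as a list or scalar, and an empty mapping. It also assumes that source is used only to make error messages point at the offending file. The exception types and the name validate_yaml_mapping_sketch are assumptions, not the documented behavior.

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import Path
from typing import Any


def validate_yaml_mapping_sketch(data: Any, *, source: Path | None = None) -> dict[str, Any]:
    """Return ``data`` as a dict if it is a non-empty mapping; otherwise raise."""
    where = f" in {source}" if source is not None else ""
    # yaml.safe_load returns None for an empty document, so report that case explicitly.
    if data is None:
        raise ValueError(f"YAML document is empty{where}")
    if not isinstance(data, Mapping):
        raise TypeError(f"Expected a YAML mapping{where}, got {type(data).__name__}")
    if not data:
        raise ValueError(f"YAML mapping is empty{where}")
    return dict(data)
```

A typical call site would pass the result of yaml.safe_load together with the path that was read, so a malformed config file is reported by name.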
- utils.config_models.normalize_loader_config(cfg)
Validate and normalize a loader config without mutating the caller’s dict.
- Parameters:
cfg (Any)
- Return type:
dict[str, Any]
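A function that promises not to mutate the caller's dict usually works on a deep copy. The sketch below illustrates that pattern under a hypothetical schema: a required "path" key, a default "split_column", a default "options" dict, and an optional "format" hint. None of these keys are taken from the real loader config, and normalize_loader_config_sketch is an illustrative name.

```python
from __future__ import annotations

import copy
from collections.abc import Mapping
from typing import Any

# Hypothetical defaults; the real loader config schema is not documented here.
_DEFAULTS: dict[str, Any] = {"split_column": "split", "options": {}}


def normalize_loader_config_sketch(cfg: Any) -> dict[str, Any]:
    """Return a validated, normalized copy of ``cfg``; the caller's object is never modified."""
    if not isinstance(cfg, Mapping):
        raise TypeError(f"Loader config must be a mapping, got {type(cfg).__name__}")
    # Deep copy so nested dicts and lists in the result are independent of the input.
    out = copy.deepcopy(dict(cfg))
    for key, default in _DEFAULTS.items():
        # Copy defaults too, so mutable defaults are never shared between results.
        out.setdefault(key, copy.deepcopy(default))
    if "path" not in out:  # hypothetical required key
        raise ValueError("Loader config is missing required key 'path'")
    if isinstance(out.get("format"), str):  # hypothetical format hint, lowercased
        out["format"] = out["format"].strip().lower()
    return out
```

A shallow dict(cfg) would not be enough here. Nested values such as "options" would still be shared, so editing the result could silently change the caller's config.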
Benchmark cleaning (utils.benchmark_cleaning)
Opt-in cleaning utilities for curated benchmark splits.
- utils.benchmark_cleaning.clean_benchmark_splits(splits, task_type, *, reference_splits=DEFAULT_REFERENCE_SPLITS, remove_invalid=True, remove_conflicts=True, remove_contaminants=True)
Return cleaned benchmark splits and a JSON-serializable cleaning report.
Cleaning is intentionally opt-in and operates only on in-memory split frames. Removals are applied in a fixed order: invalid rows first, then molecules with conflicting labels, then exact contaminants in non-reference splits.
- Parameters:
splits (Mapping[str, pandas.DataFrame])
task_type (str)
reference_splits (Sequence[str] | str)
remove_invalid (bool)
remove_conflicts (bool)
remove_contaminants (bool)
- Return type:
Tuple[Dict[str, pandas.DataFrame], Dict[str, Any]]
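The removal order can be illustrated with a simplified, hypothetical re-implementation. It makes the following assumptions:

- Each split has "smiles" and "label" columns.
- A row is invalid when either value is missing. The real function may parse structures instead.
- A molecule is conflicting when it carries more than one distinct label across all splits.
- A row in a non-reference split is a contaminant when its molecule string exactly matches one in a reference split.

The default ("test",) stands in for DEFAULT_REFERENCE_SPLITS, whose real value is not shown here, and task_type is omitted. Each stage runs on whatever the previous stage left, so a row is counted only under the first rule that removes it.

```python
from __future__ import annotations

import json
from collections.abc import Mapping, Sequence
from typing import Any

import pandas as pd

# Hypothetical column names; the real function's schema is not documented here.
MOL_COL, LABEL_COL = "smiles", "label"


def clean_benchmark_splits_sketch(
    splits: Mapping[str, pd.DataFrame],
    *,
    reference_splits: Sequence[str] | str = ("test",),
    remove_invalid: bool = True,
    remove_conflicts: bool = True,
    remove_contaminants: bool = True,
) -> tuple[dict[str, pd.DataFrame], dict[str, Any]]:
    """Apply invalid -> conflict -> contaminant removal and report what was dropped."""
    if isinstance(reference_splits, str):
        reference_splits = [reference_splits]
    refs = set(reference_splits)
    # Work on copies so the caller's frames are left untouched.
    cleaned = {name: df.copy() for name, df in splits.items()}
    report: dict[str, Any] = {"reference_splits": sorted(refs), "removed": {}}

    def drop(stage: str, masks: dict[str, pd.Series]) -> None:
        # Record how many rows each split loses at this stage, then remove them.
        for name, mask in masks.items():
            report["removed"].setdefault(name, {})[stage] = int(mask.sum())
            cleaned[name] = cleaned[name].loc[~mask]

    if remove_invalid:
        # Stand-in validity rule: molecule and label must both be present.
        drop("invalid", {
            n: df[MOL_COL].isna() | df[LABEL_COL].isna() for n, df in cleaned.items()
        })

    if remove_conflicts:
        # A molecule with more than one distinct label anywhere is dropped from every split.
        pooled = pd.concat(list(cleaned.values()), ignore_index=True)
        n_labels = pooled.groupby(MOL_COL)[LABEL_COL].nunique()
        conflicting = set(n_labels[n_labels > 1].index)
        drop("conflict", {n: df[MOL_COL].isin(conflicting) for n, df in cleaned.items()})

    if remove_contaminants:
        # Exact string match against molecules kept in the reference splits.
        ref_mols: set[Any] = set()
        for name in refs.intersection(cleaned):
            ref_mols.update(cleaned[name][MOL_COL])
        drop("contaminant", {
            n: df[MOL_COL].isin(ref_mols) for n, df in cleaned.items() if n not in refs
        })

    json.dumps(report)  # fails loudly if the report is not JSON-serializable
    return cleaned, report
```

Reference splits are never checked for contamination in this sketch, so they only lose rows to the invalid and conflict stages.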
Pydantic compatibility layer (utils.pydantic_compat)
Small compatibility layer for pydantic v1 and v2 APIs.
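One common way to bridge the two APIs is duck typing. Pydantic v2 renamed BaseModel.dict() to model_dump() and BaseModel.parse_obj() to model_validate(), and kept the old names only as deprecated aliases. The helpers below sketch that pattern. Their names are hypothetical and may not match what utils.pydantic_compat actually exports.

```python
from typing import Any, Dict


def model_dump_compat(model: Any, **kwargs: Any) -> Dict[str, Any]:
    """Serialize a model to a dict on pydantic v2 (model_dump) or v1 (dict)."""
    if hasattr(model, "model_dump"):  # pydantic v2 name
        return model.model_dump(**kwargs)
    return model.dict(**kwargs)  # pydantic v1 name


def model_validate_compat(cls: Any, data: Any) -> Any:
    """Build a model from raw data on pydantic v2 (model_validate) or v1 (parse_obj)."""
    if hasattr(cls, "model_validate"):  # pydantic v2 name
        return cls.model_validate(data)
    return cls.parse_obj(data)  # pydantic v1 name
```

The helpers check for the v2 names first. On v2 the non-deprecated methods are therefore used, even though the v1 names still exist there.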