Configuration and CLI
CLI entry points
BenchAudit exposes run.py as the main CLI and installs console scripts benchaudit and bench.
Examples:
python run.py --config path/to/config.yaml --out-root runs
python run.py --configs configs --out-root runs --benchmark
benchaudit --configs configs --out-root runs --force
Important CLI flags
--config: run a single YAML config
--configs: run all YAML configs under a directory
--out-root: output root directory (default runs)
--benchmark: run baseline models and write performance.json
--force: rerun even if outputs already exist
--log-level: logging level (DEBUG, INFO, …)
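For scripted batch runs, the flags listed above can be assembled into an argument list from Python. The sketch below is only an illustration: `build_benchaudit_command` and `run_benchaudit` are hypothetical helpers, not part of BenchAudit, and the sketch assumes the `benchaudit` console script is installed and on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_benchaudit_command(
    configs_dir: str,
    out_root: str = "runs",
    benchmark: bool = False,
    force: bool = False,
    log_level: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list using only the flags documented above.

    Hypothetical helper for illustration; it is not part of BenchAudit.
    """
    cmd = ["benchaudit", "--configs", configs_dir, "--out-root", out_root]
    if benchmark:
        cmd.append("--benchmark")
    if force:
        cmd.append("--force")
    if log_level is not None:
        cmd += ["--log-level", log_level]
    return cmd


def run_benchaudit(cmd: List[str]) -> int:
    """Run the command and return its exit code (assumes benchaudit is installed)."""
    return subprocess.run(cmd, check=False).returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when directory names contain spaces.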
Runtime reporting
summary.json contains a runtime block for newly generated runs. To summarize
recorded runtimes and estimate the runtimes of older runs from artifact timestamps, run:
python experiments/report_runtimes.py --runs-root runs --out-dir experiments/plots
The generated CSV and Markdown report include an estimate_type column so
exact timings and approximate/lower-bound values are distinguishable.
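To illustrate why a timestamp-based value is only a lower bound, here is a minimal sketch of one way such an estimate could be computed. It is not taken from `experiments/report_runtimes.py`; the function name and the "newest minus oldest modification time" rule are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional


def estimate_runtime_seconds(run_dir: Path) -> Optional[float]:
    """Lower-bound runtime estimate: newest minus oldest file modification time.

    Work done before the first artifact was written is invisible to this
    method, which is why the result is a lower bound rather than an exact
    timing. Returns None when the directory holds fewer than two files.
    """
    mtimes = [p.stat().st_mtime for p in run_dir.rglob("*") if p.is_file()]
    if len(mtimes) < 2:
        return None
    return max(mtimes) - min(mtimes)
```

A run whose artifacts were later copied or touched would give a misleading value, so estimates of this kind are best kept separate from recorded timings.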
Config shape (high level)
BenchAudit config files are YAML mappings. Common top-level keys:
type (or modality): loader/analyzer routing (e.g. tabular, tdc, polaris, dti)
task: classification or regression
name: dataset identifier
path or paths: input file locations for tabular/DTI data
info: loader/analyzer options (column names, split settings, similarity params)
out: optional output directory override
seed: optional random seed used by some components
Tabular single-file example
type: tabular
name: Tiny Tabular
task: classification
path: tests/data/tabular_single.csv
info:
  split_col: split
  smiles_col: smiles
  label_col: label
  id_col: compound_id
  cleaner: none
Tabular three-path example
type: tabular
name: Split Tabular
task: classification
paths:
  train: train.csv
  valid: valid.csv
  test: test.csv
info:
  smiles_col: Drug
  label_col: Y
  id_col: ID
  cleaner: none
DTI example
type: dti
modality: dti
name: Example DTI
task: classification
paths:
  train: train.csv
  valid: valid.csv
  test: test.csv
info:
  smiles_col: Ligand
  label_col: classification_label
  sequence_col: Protein
  target_id_col: Target_ID
  sequence_alignment_workers: 4
  cleaner: none
  keep_invalid: true
info.sequence_alignment_workers controls how many independent EMBOSS
stretcher jobs are run at once for DTI nearest-neighbor sequence alignment.
The default is 1, which preserves the previous serial behavior.
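The effect of a worker count on independent jobs can be sketched with a bounded thread pool. This is not BenchAudit's implementation: `align_pairs` and `toy_identity` are hypothetical names, and `toy_identity` is a trivial stand-in scorer used here in place of an EMBOSS stretcher call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def align_pairs(
    pairs: Sequence[Pair],
    align_one: Callable[[Pair], float],
    workers: int = 1,
) -> List[float]:
    """Score each pair, running at most `workers` jobs at once.

    With workers=1 the jobs run one after another, matching serial behavior.
    Results come back in the same order as `pairs`.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(align_one, pairs))


def toy_identity(pair: Pair) -> float:
    """Stand-in scorer (NOT EMBOSS stretcher): fraction of matching positions."""
    a, b = pair
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


scores = align_pairs([("ABCD", "ABCF"), ("AAAA", "AAAA")], toy_identity, workers=4)
```

Threads are a reasonable fit when each job mostly waits on an external process; the pool size then caps how many of those processes run concurrently.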
Opt-in benchmark cleaning
By default, BenchAudit reports invalid molecules, exact contamination, and
label conflicts without changing the loaded benchmark. To curate the in-memory
benchmark used by analysis and optional baselines, enable info.clean_benchmark:
type: tabular
name: Cleaned Example
task: classification
path: data/example.csv
info:
  split_col: split
  smiles_col: smiles
  label_col: label
  clean_benchmark: true
With clean_benchmark: true, BenchAudit applies the default policy after loading
and before analysis:
invalid molecules are removed
all rows for a molecule with conflicting labels are removed from every split
exact contaminants are kept only in reference splits
The default reference splits are train and valid. If valid is not
present, train is used. Exact contamination is defined by exact overlap of
the cleaned SMILES string; near-neighbor and scaffold similarity are still audit
signals, not automatic removal criteria. REOS alerts remain annotations unless a
row is otherwise invalid.
The policy can be customized:
info:
  clean_benchmark:
    reference_splits: [train, valid]
    remove_invalid: true
    remove_conflicts: true
    remove_contaminants: true
The source files are never overwritten. Cleaned rows flow into records.csv and
summary.json, and into performance.json when --benchmark is used. The
summary contains a benchmark_cleaning block with original counts, cleaned
counts, per-split removal counts, and the effective options.
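The three cleaning rules and their options can be sketched over plain in-memory rows. This is a minimal illustration, not BenchAudit's code: `clean_rows` is a hypothetical function, the row keys (`smiles`, `label`, `split`) are assumed, an invalid molecule is represented as `smiles=None`, and a contaminant is read here as a non-reference row whose cleaned SMILES also appears in a reference split.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, object]


def clean_rows(
    rows: List[Row],
    reference_splits: Sequence[str] = ("train", "valid"),
    remove_invalid: bool = True,
    remove_conflicts: bool = True,
    remove_contaminants: bool = True,
) -> List[Row]:
    """Return a filtered copy of `rows`; the input list is left untouched."""
    kept = list(rows)

    if remove_invalid:
        # Assumption: invalid molecules carry smiles=None after cleaning.
        kept = [r for r in kept if r["smiles"] is not None]

    if remove_conflicts:
        labels = defaultdict(set)
        for r in kept:
            if r["smiles"] is not None:
                labels[r["smiles"]].add(r["label"])
        conflicted = {s for s, seen in labels.items() if len(seen) > 1}
        # Drop every row of a conflicted molecule, in every split.
        kept = [r for r in kept if r["smiles"] not in conflicted]

    if remove_contaminants:
        present = {r["split"] for r in kept}
        # Only reference splits that actually exist are used.
        refs = [s for s in reference_splits if s in present]
        ref_smiles = {
            r["smiles"] for r in kept
            if r["split"] in refs and r["smiles"] is not None
        }
        # Keep reference rows; drop exact SMILES overlaps elsewhere.
        kept = [
            r for r in kept
            if r["split"] in refs or r["smiles"] not in ref_smiles
        ]
    return kept


rows = [
    {"smiles": "CCO", "label": 1, "split": "train"},
    {"smiles": "CCO", "label": 1, "split": "test"},   # exact contaminant
    {"smiles": None, "label": 0, "split": "train"},   # invalid molecule
    {"smiles": "CCN", "label": 0, "split": "train"},  # conflicting labels
    {"smiles": "CCN", "label": 1, "split": "test"},
    {"smiles": "CCC", "label": 0, "split": "test"},
]
cleaned = clean_rows(rows)
```

In this toy input only the train copy of `CCO` and the test-only `CCC` survive. Overlap between two non-reference splits is not handled by the sketch.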
Validation behavior
BenchAudit now validates and normalizes config payloads before loaders and analyzers run.
Examples of early validation failures:
non-mapping YAML root documents
path and paths both present
malformed info or paths sections
unsupported split labels (must normalize to train, valid/val, test)
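The kinds of checks listed above can be sketched as a small validator over an already-parsed config. This is an illustrative sketch rather than BenchAudit's validator: the function names, the use of `ValueError`, the error messages, and the choice to map `val` to `valid` and to lowercase labels are all assumptions.

```python
from typing import Any, Dict

# Assumed alias table: "val" is treated as a spelling of "valid".
SPLIT_ALIASES = {"train": "train", "valid": "valid", "val": "valid", "test": "test"}


def normalize_split(label: Any) -> str:
    """Map a split label to its canonical form or raise ValueError."""
    key = str(label).strip().lower()
    if key not in SPLIT_ALIASES:
        raise ValueError(f"unsupported split label: {label!r}")
    return SPLIT_ALIASES[key]


def validate_config(cfg: Any) -> Dict[str, Any]:
    """Check the parsed YAML root and return a normalized copy."""
    if not isinstance(cfg, dict):
        raise ValueError("config root must be a mapping")
    if "path" in cfg and "paths" in cfg:
        raise ValueError("use either 'path' or 'paths', not both")

    info = cfg.get("info", {})
    if not isinstance(info, dict):
        raise ValueError("'info' must be a mapping")

    out = dict(cfg)
    out["info"] = dict(info)

    if "paths" in cfg:
        paths = cfg["paths"]
        if not isinstance(paths, dict):
            raise ValueError("'paths' must map split names to files")
        out["paths"] = {normalize_split(k): v for k, v in paths.items()}
    return out


normalized = validate_config(
    {"type": "tabular", "paths": {"train": "a.csv", "val": "b.csv", "test": "c.csv"}}
)
```

Failing before any loader runs keeps error messages close to the offending config key instead of surfacing later as a file or column error.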