Configuration and CLI
=====================

CLI entry points
----------------

BenchAudit exposes ``run.py`` as the main CLI and installs console scripts ``benchaudit`` and ``bench``.

Examples:

.. code-block:: bash

   python run.py --config path/to/config.yaml --out-root runs
   python run.py --configs configs --out-root runs --benchmark
   benchaudit --configs configs --out-root runs --force

Important CLI flags
-------------------

* ``--config``: run a single YAML config
* ``--configs``: run all YAML configs under a directory
* ``--out-root``: output root directory (default ``runs``)
* ``--benchmark``: run baseline models and write ``performance.json``
* ``--force``: rerun even if outputs already exist
* ``--log-level``: logging level (``DEBUG``, ``INFO``, ...)

Config shape (high level)
-------------------------

BenchAudit config files are YAML mappings. Common top-level keys:

* ``type`` (or ``modality``): loader/analyzer routing (e.g. ``tabular``, ``tdc``, ``polaris``, ``dti``)
* ``task``: ``classification`` or ``regression``
* ``name``: dataset identifier
* ``path`` or ``paths``: input file locations for tabular/DTI data
* ``info``: loader/analyzer options (column names, split settings, similarity params)
* ``out``: optional output directory override
* ``seed``: optional random seed used by some components

Tabular single-file example
---------------------------

.. code-block:: yaml

   type: tabular
   name: Tiny Tabular
   task: classification
   path: tests/data/tabular_single.csv
   info:
     split_col: split
     smiles_col: smiles
     label_col: label
     id_col: compound_id
     cleaner: none

Tabular three-path example
--------------------------

.. code-block:: yaml

   type: tabular
   name: Split Tabular
   task: classification
   paths:
     train: train.csv
     valid: valid.csv
     test: test.csv
   info:
     smiles_col: Drug
     label_col: Y
     id_col: ID
     cleaner: none

DTI example
-----------

.. code-block:: yaml

   type: dti
   modality: dti
   name: Example DTI
   task: classification
   paths:
     train: train.csv
     valid: valid.csv
     test: test.csv
   info:
     smiles_col: Ligand
     label_col: classification_label
     sequence_col: Protein
     target_id_col: Target_ID
     cleaner: none
     keep_invalid: true

Validation behavior
-------------------

BenchAudit now validates and normalizes config payloads before loaders and analyzers run.

Examples of early validation failures:

* non-mapping YAML root documents
* ``path`` and ``paths`` both present
* malformed ``info`` or ``paths`` sections
* unsupported split labels (must normalize to ``train``, ``valid``/``val``, ``test``)