Rank-Fragility Analysis
The utils.rank_fragility package evaluates whether molecular leaderboard
rankings remain stable when the audited composition of a test panel changes.
Configuration (utils.rank_fragility.config)
Configuration dataclasses for rank-fragility analysis.
- class utils.rank_fragility.config.AuditConfig(id_col='molecule_id', smiles_col='smiles', label_col='y', split_col='split', task='classification', near_leak_thresholds=(0.85, 0.9), primary_near_leak_threshold=0.85, regression_conflict_threshold=1.0, regression_conflict_threshold_sensitivity=None, random_seed=13)
Bases: object
Column names and thresholds used to annotate audited molecules.
- Parameters:
id_col (str)
smiles_col (str)
label_col (str)
split_col (str)
task (Literal['classification', 'regression'])
near_leak_thresholds (tuple[float, ...])
primary_near_leak_threshold (float)
regression_conflict_threshold (float)
regression_conflict_threshold_sensitivity (float | None)
random_seed (int)
- id_col: str = 'molecule_id'
- smiles_col: str = 'smiles'
- label_col: str = 'y'
- split_col: str = 'split'
- task: Literal['classification', 'regression'] = 'classification'
- near_leak_thresholds: tuple[float, ...] = (0.85, 0.9)
- primary_near_leak_threshold: float = 0.85
- regression_conflict_threshold: float = 1.0
- regression_conflict_threshold_sensitivity: float | None = None
- random_seed: int = 13
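The near-leak fields are similarity cut-offs. The sketch below shows one way to derive a flag for each entry of near_leak_thresholds. It assumes, without support from this page, that molecules are represented as sets of fingerprint on-bits and that each test molecule is compared with its most similar training molecule by Tanimoto similarity. tanimoto and near_leak_flags are hypothetical helpers and are not part of utils.rank_fragility.

```python
from collections.abc import Iterable


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_leak_flags(
    test_fp: frozenset[int],
    train_fps: Iterable[frozenset[int]],
    thresholds: tuple[float, ...] = (0.85, 0.9),
) -> dict[float, bool]:
    """Flag a test molecule at every threshold its nearest training neighbour reaches."""
    # Hypothetical: nearest-neighbour similarity against the whole training split.
    nn_sim = max((tanimoto(test_fp, fp) for fp in train_fps), default=0.0)
    return {t: nn_sim >= t for t in thresholds}


train = [frozenset(range(8)), frozenset({20, 21})]
test = frozenset(range(7))  # shares 7 of 8 bits with the first training molecule
print(near_leak_flags(test, train))  # {0.85: True, 0.9: False}
```

Under this reading, the flag at primary_near_leak_threshold would drive the downstream analysis and the other thresholds would act as a sensitivity check. That division of roles is an assumption based only on the field names.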
- class utils.rank_fragility.config.PanelConfig(id_col='molecule_id', label_col='y', task='classification', panel_size='auto', n_panels=1000, target_rates=(0.0, 0.05, 0.1, 0.25, 'observed', 0.5, 0.75), random_seed=13, output_dir=PosixPath('runs/rank_fragility'))
Bases: object
Sampling controls for generated counterfactual evaluation panels.
- Parameters:
id_col (str)
label_col (str)
task (Literal['classification', 'regression'])
panel_size (int | str)
n_panels (int)
target_rates (tuple[float | str, ...])
random_seed (int)
output_dir (Path | str)
- id_col: str = 'molecule_id'
- label_col: str = 'y'
- task: Literal['classification', 'regression'] = 'classification'
- panel_size: int | str = 'auto'
- n_panels: int = 1000
- target_rates: tuple[float | str, ...] = (0.0, 0.05, 0.1, 0.25, 'observed', 0.5, 0.75)
- random_seed: int = 13
- output_dir: Path | str = PosixPath('runs/rank_fragility')
- class utils.rank_fragility.config.MetricConfig(task='classification', metric='auroc', baseline_model='ecfp_rf', sota_model='auto')
Bases: object
Metric and model-selection settings for leaderboard comparisons.
- Parameters:
task (Literal['classification', 'regression'])
metric (str)
baseline_model (str)
sota_model (str)
- task: Literal['classification', 'regression'] = 'classification'
- metric: str = 'auroc'
- baseline_model: str = 'ecfp_rf'
- sota_model: str = 'auto'
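With sota_model='auto' the package chooses the SOTA model itself, but the selection rule is not documented here. One plausible rule is sketched below purely as an assumption: take the best full-test-set score among all models other than baseline_model, respecting the metric's direction. resolve_sota and the LOWER_IS_BETTER set are hypothetical.

```python
# Hypothetical set of metric names where smaller values are better.
LOWER_IS_BETTER = {"rmse", "mae", "mse"}


def resolve_sota(
    scores: dict[str, float],
    metric: str,
    baseline_model: str = "ecfp_rf",
    sota_model: str = "auto",
) -> str:
    """Return the explicit SOTA model, or the best-scoring non-baseline model."""
    if sota_model != "auto":
        return sota_model
    candidates = {m: s for m, s in scores.items() if m != baseline_model}
    if not candidates:
        raise ValueError("no non-baseline model available to act as SOTA")
    pick = min if metric.lower() in LOWER_IS_BETTER else max
    return pick(candidates, key=candidates.__getitem__)


print(resolve_sota({"ecfp_rf": 0.90, "gnn": 0.85, "transformer": 0.88}, "auroc"))
```

In this sketch the baseline is never eligible, even when it has the best overall score. That choice keeps the SOTA-versus-baseline comparison meaningful.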
- class utils.rank_fragility.config.RunConfig(data, pred_dir, id_col='molecule_id', smiles_col='smiles', label_col='y', split_col='split', task='classification', metric='auroc', near_leak_thresholds=(0.85, 0.9), primary_near_leak_threshold=0.85, regression_conflict_threshold=1.0, regression_conflict_threshold_sensitivity=None, random_seed=13, panel_size='auto', n_panels=1000, target_rates=<factory>, baseline_model='ecfp_rf', sota_model='auto', output_dir=PosixPath('runs/rank_fragility'))
Bases: object
Complete input, audit, panel, and output settings for one analysis run.
- Parameters:
data (Path)
pred_dir (Path)
id_col (str)
smiles_col (str)
label_col (str)
split_col (str)
task (Literal['classification', 'regression'])
metric (str)
near_leak_thresholds (tuple[float, ...])
primary_near_leak_threshold (float)
regression_conflict_threshold (float)
regression_conflict_threshold_sensitivity (float | None)
random_seed (int)
panel_size (int | str)
n_panels (int)
target_rates (tuple[float | str, ...])
baseline_model (str)
sota_model (str)
output_dir (Path)
- data: Path
- pred_dir: Path
- id_col: str = 'molecule_id'
- smiles_col: str = 'smiles'
- label_col: str = 'y'
- split_col: str = 'split'
- task: Literal['classification', 'regression'] = 'classification'
- metric: str = 'auroc'
- near_leak_thresholds: tuple[float, ...] = (0.85, 0.9)
- primary_near_leak_threshold: float = 0.85
- regression_conflict_threshold: float = 1.0
- regression_conflict_threshold_sensitivity: float | None = None
- random_seed: int = 13
- panel_size: int | str = 'auto'
- n_panels: int = 1000
- target_rates: tuple[float | str, ...]
- baseline_model: str = 'ecfp_rf'
- sota_model: str = 'auto'
- output_dir: Path = PosixPath('runs/rank_fragility')
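RunConfig repeats the fields of AuditConfig, PanelConfig and MetricConfig in one flat object and adds data and pred_dir. This page does not say whether run_analysis derives the sub-configs by field name. The sketch below shows one way to do it with the standard dataclasses.fields; project, MiniRunConfig and MiniMetricConfig are hypothetical stand-ins.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import TypeVar

T = TypeVar("T")


def project(source: object, target_cls: type[T]) -> T:
    """Build target_cls from the fields of the dataclass `source` that it shares by name."""
    wanted = {f.name for f in fields(target_cls)}
    values = {f.name: getattr(source, f.name) for f in fields(source) if f.name in wanted}
    return target_cls(**values)


# Hypothetical stand-ins carrying a few of the documented fields.
@dataclass
class MiniRunConfig:
    data: Path
    task: str = "classification"
    metric: str = "auroc"
    random_seed: int = 13


@dataclass
class MiniMetricConfig:
    task: str = "classification"
    metric: str = "auroc"


run = MiniRunConfig(data=Path("dataset.csv"), task="regression", metric="rmse")
print(project(run, MiniMetricConfig))  # MiniMetricConfig(task='regression', metric='rmse')
```

Fields that exist only on the target keep their own defaults. This is also why a flat config and its parts can share default values without duplicating the construction logic.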
Audit and predictions
Molecular audit annotations for rank-fragility analysis.
- utils.rank_fragility.audit.audit_dataset(df, config)
Annotate dataset rows with chemistry and train-test audit flags.
- Parameters:
df (pandas.DataFrame)
config (AuditConfig)
- Return type:
pandas.DataFrame
- utils.rank_fragility.audit.summarize_audit(audited_df)
Return a long-form audit summary table.
- Parameters:
audited_df (pandas.DataFrame)
- Return type:
pandas.DataFrame
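summarize_audit returns a long-form table. In this layout there is one row per (split, annotation) pair rather than one column per annotation, so the table keeps the same shape however many flags the audit adds. The exact columns are not documented here. Below is a stdlib sketch of that layout; the flag names near_leak and label_conflict are hypothetical.

```python
from collections import defaultdict


def summarize_flags(
    rows: list[dict[str, object]], flag_cols: list[str], split_col: str = "split"
) -> list[tuple[str, str, int, int, float]]:
    """One (split, flag, n_flagged, n_total, rate) row per split and flag column."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        split = str(row[split_col])
        totals[split] += 1
        for col in flag_cols:
            hits[(split, col)] += int(bool(row[col]))
    return [
        (split, col, hits[(split, col)], totals[split], hits[(split, col)] / totals[split])
        for split in sorted(totals)
        for col in flag_cols
    ]


rows = [
    {"split": "test", "near_leak": True, "label_conflict": False},
    {"split": "test", "near_leak": False, "label_conflict": False},
    {"split": "train", "near_leak": False, "label_conflict": True},
]
for record in summarize_flags(rows, ["near_leak", "label_conflict"]):
    print(record)
```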
Prediction loading and audit-merge helpers.
Panels, metrics, and leaderboards
Counterfactual evaluation-panel sampling utilities.
- utils.rank_fragility.panels.generate_counterfactual_panels(audited_test_df, config)
Generate a long-form counterfactual panel manifest.
- Parameters:
audited_test_df (pandas.DataFrame)
config (PanelConfig)
- Return type:
pandas.DataFrame
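Below is a minimal sketch of stratified panel sampling that yields a long-form manifest with one (panel_id, target_rate, molecule_id) row per panel member. It assumes the molecules have already been split into flagged and unflagged pools and that each target rate fixes the flagged share of every panel. The column layout, the pools and sample_panels itself are hypothetical. A seeded random.Random keeps the draw reproducible, in the spirit of the random_seed field.

```python
import random


def sample_panels(
    flagged_ids: list[str],
    clean_ids: list[str],
    panel_size: int,
    n_panels: int,
    target_rates: tuple[float, ...],
    seed: int = 13,
) -> list[tuple[int, float, str]]:
    """Return (panel_id, target_rate, molecule_id) rows, one per panel member."""
    rng = random.Random(seed)  # seeded, so the manifest is reproducible
    manifest: list[tuple[int, float, str]] = []
    panel_id = 0
    for rate in target_rates:
        n_flag = round(rate * panel_size)
        n_clean = panel_size - n_flag
        if n_flag > len(flagged_ids) or n_clean > len(clean_ids):
            raise ValueError(f"rate {rate} is infeasible for panel_size={panel_size}")
        for _ in range(n_panels):
            # Draw without replacement inside each pool, then record every member.
            members = rng.sample(flagged_ids, n_flag) + rng.sample(clean_ids, n_clean)
            manifest.extend((panel_id, rate, mol) for mol in members)
            panel_id += 1
    return manifest
```

Sampling is without replacement within a panel but independent across panels, so the same molecule can appear in many panels.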
Metric helpers for rank-fragility leaderboard evaluation.
- utils.rank_fragility.metrics.higher_is_better(metric)
Return whether larger values indicate better performance.
- Parameters:
metric (str)
- Return type:
bool
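The direction flag matters wherever models are sorted: "best" means the largest AUROC but the smallest RMSE. Below is a minimal sketch of direction-aware ordering. The metric table and best_first are hypothetical, because this page does not list which metric names the package supports.

```python
# Hypothetical direction table: True means larger values are better.
HIGHER_IS_BETTER = {"auroc": True, "auprc": True, "r2": True, "rmse": False, "mae": False}


def best_first(scores: dict[str, float], metric: str) -> list[str]:
    """Order model names from best to worst for the given metric."""
    try:
        higher = HIGHER_IS_BETTER[metric.lower()]
    except KeyError:
        raise ValueError(f"unknown metric: {metric!r}") from None
    return sorted(scores, key=scores.__getitem__, reverse=higher)


print(best_first({"model_a": 0.8, "model_b": 0.9}, "auroc"))  # ['model_b', 'model_a']
```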
- utils.rank_fragility.metrics.compute_metric(y_true, y_pred, task, metric)
Compute one supported classification or regression metric.
- Parameters:
task (str)
metric (str)
- Return type:
float
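This page does not list the supported metrics. The sketch below illustrates dispatch on (task, metric) with two common choices. AUROC is computed as the probability that a random positive outscores a random negative, with ties counting one half, and RMSE is the root mean squared error. compute_metric_sketch is hypothetical and wires up only these two pairs. For real use, sklearn.metrics.roc_auc_score is a faster AUROC implementation.

```python
import math
from collections.abc import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # O(n_pos * n_neg) pairwise count: fine for illustration, slow for large sets.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def compute_metric_sketch(y_true, y_pred, task: str, metric: str) -> float:
    """Dispatch on (task, metric); only two pairs are wired up in this sketch."""
    dispatch = {("classification", "auroc"): auroc, ("regression", "rmse"): rmse}
    try:
        fn = dispatch[(task, metric.lower())]
    except KeyError:
        raise ValueError(f"unsupported metric {metric!r} for task {task!r}") from None
    return fn(y_true, y_pred)


print(compute_metric_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], "classification", "auroc"))  # 0.75
```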
- utils.rank_fragility.metrics.per_sample_loss(y_true, y_pred, task, loss)
Return per-example loss values for attribution summaries.
- Parameters:
task (str)
loss (str)
- Return type:
numpy.ndarray
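Per-example losses are what make an attribution summary possible: once each molecule has a loss, losses can be grouped by audit stratum. Below is a sketch with binary log loss and squared error. The loss names 'log_loss' and 'squared_error' are assumptions, and the sketch returns a Python list where the package returns a numpy.ndarray.

```python
import math
from collections.abc import Sequence


def per_sample_loss_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], task: str, loss: str
) -> list[float]:
    """Per-example losses; the loss names used here are hypothetical."""
    if task == "classification" and loss == "log_loss":
        eps = 1e-15  # clip probabilities so log(0) never occurs
        out = []
        for y, p in zip(y_true, y_pred):
            p = min(max(p, eps), 1.0 - eps)
            out.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
        return out
    if task == "regression" and loss == "squared_error":
        return [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    raise ValueError(f"unsupported loss {loss!r} for task {task!r}")
```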
Leaderboard scoring and ranking helpers.
- utils.rank_fragility.leaderboard.evaluate_models(pred_audit_df, subset_ids, task, metric)
Evaluate every model on a molecule-id subset.
- Parameters:
pred_audit_df (pandas.DataFrame)
task (str)
metric (str)
- Return type:
pandas.DataFrame
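Below is a stdlib sketch of scoring every model on one molecule-id subset and ranking the results. It assumes predictions are held as model → molecule_id → (y_true, y_pred) and that every model has a prediction for every subset id. That layout, evaluate_subset and mae are hypothetical stand-ins for the DataFrame-based original.

```python
from collections.abc import Callable

# Hypothetical layout: model -> molecule_id -> (y_true, y_pred).
Predictions = dict[str, dict[str, tuple[float, float]]]


def evaluate_subset(
    preds: Predictions,
    subset_ids: set[str],
    metric_fn: Callable[[list[float], list[float]], float],
    higher_better: bool,
) -> list[tuple[str, float, int]]:
    """Score every model on subset_ids; return (model, score, rank) rows, best first."""
    ids = sorted(subset_ids)  # fixed order so every model sees the same rows
    scores: dict[str, float] = {}
    for model, by_id in preds.items():
        y_true = [by_id[mol][0] for mol in ids]
        y_pred = [by_id[mol][1] for mol in ids]
        scores[model] = metric_fn(y_true, y_pred)
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_better)
    rows: list[tuple[str, float, int]] = []
    for i, (model, score) in enumerate(ordered):
        # Competition ranking: tied scores share the better rank (1, 2, 2, 4).
        rank = rows[-1][2] if i and score == ordered[i - 1][1] else i + 1
        rows.append((model, score, rank))
    return rows


def mae(y_true: list[float], y_pred: list[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


preds: Predictions = {
    "model_a": {"m1": (1.0, 1.0), "m2": (2.0, 2.5), "m3": (3.0, 9.0)},
    "model_b": {"m1": (1.0, 2.0), "m2": (2.0, 2.0), "m3": (3.0, 3.0)},
}
print(evaluate_subset(preds, {"m1", "m2"}, mae, higher_better=False))
print(evaluate_subset(preds, {"m1", "m2", "m3"}, mae, higher_better=False))
```

In this toy data, adding m3 to the subset flips the leader from model_a to model_b. That is the kind of composition sensitivity the package is built to measure.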
Counterfactual outputs
Counterfactual panel evaluation and aggregate stability summaries.
- utils.rank_fragility.counterfactual.run_counterfactual_evaluation(pred_audit_df, panel_manifest, task, metric, baseline_model, sota_model)
Evaluate all models on each counterfactual panel and aggregate stability summaries.
- Parameters:
pred_audit_df (pandas.DataFrame)
panel_manifest (pandas.DataFrame)
task (str)
metric (str)
baseline_model (str)
sota_model (str)
- Return type:
dict[str, pandas.DataFrame]
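Per-panel scores can be condensed into two summaries per target rate: how often each model ranks first (the simplest rank probability) and the SOTA model's mean margin over the baseline. Below is a stdlib sketch, assuming per-panel scores arrive as (panel_id, target_rate, model, score) tuples. The tuple layout and aggregate_panels are hypothetical, and the keys of the dictionary the package returns are not documented here.

```python
from collections import defaultdict
from statistics import mean


def aggregate_panels(
    rows: list[tuple[int, float, str, float]],
    sota_model: str,
    baseline_model: str,
    higher_better: bool = True,
) -> tuple[dict[float, dict[str, float]], dict[float, float]]:
    """Per target rate: share of panels each model wins, and mean SOTA margin."""
    panels: dict[tuple[float, int], dict[str, float]] = defaultdict(dict)
    for panel_id, rate, model, score in rows:
        panels[(rate, panel_id)][model] = score
    sign = 1.0 if higher_better else -1.0
    wins: dict[float, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    margins: dict[float, list[float]] = defaultdict(list)
    for (rate, _), scores in panels.items():
        # Best model on this panel; ties go to the model listed first.
        best = max(scores, key=lambda m: sign * scores[m])
        wins[rate][best] += 1
        # Positive margin: SOTA beats the baseline on this panel.
        margins[rate].append(sign * (scores[sota_model] - scores[baseline_model]))
    top1 = {
        rate: {m: n / len(margins[rate]) for m, n in counts.items()}
        for rate, counts in wins.items()
    }
    mean_margin = {rate: mean(values) for rate, values in margins.items()}
    return top1, mean_margin
```

Multiplying by sign lets one code path serve both metric directions, so a positive margin always means "SOTA is better".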
Summary helpers for composition-driven leaderboard fragility.
- utils.rank_fragility.fragility.compute_fragility_summary(rank_probabilities, sota_margin_by_composition, sota_model)
Summarize composition rates where the SOTA conclusion becomes fragile.
- Parameters:
rank_probabilities (pandas.DataFrame)
sota_margin_by_composition (pandas.DataFrame)
sota_model (str)
- Return type:
pandas.DataFrame
Advantage decomposition helpers for audited prediction tables.
- utils.rank_fragility.attribution.compute_advantage_decomposition(pred_audit_df, sota_model, baseline_model, task, loss)
Compute per-example SOTA advantage and aggregate by chemistry audit strata.
- Parameters:
pred_audit_df (pandas.DataFrame)
sota_model (str)
baseline_model (str)
task (str)
loss (str)
- Return type:
tuple[pandas.DataFrame, pandas.DataFrame]
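The decomposition asks where the SOTA model's advantage comes from. Below is a stdlib sketch that defines the per-example advantage as baseline loss minus SOTA loss and aggregates it by audit stratum. The advantage definition, the stratum labels and advantage_decomposition are assumptions. The two return values are meant to mirror a per-example view and a per-stratum view, but that pairing with the package's two DataFrames is also an assumption.

```python
from collections import defaultdict


def advantage_decomposition(
    sota_loss: list[float],
    baseline_loss: list[float],
    strata: list[str],
) -> tuple[list[float], dict[str, dict[str, float]]]:
    """Return per-example advantages and a per-stratum summary."""
    # Positive advantage: the SOTA model has lower loss than the baseline on that example.
    advantage = [b - s for s, b in zip(sota_loss, baseline_loss)]
    grouped: dict[str, list[float]] = defaultdict(list)
    for stratum, adv in zip(strata, advantage):
        grouped[stratum].append(adv)
    total = sum(advantage)
    summary = {
        stratum: {
            "n": len(values),
            "mean_advantage": sum(values) / len(values),
            # Share of the overall advantage contributed by this stratum.
            "share_of_total": sum(values) / total if total else float("nan"),
        }
        for stratum, values in grouped.items()
    }
    return advantage, summary
```

share_of_total is only informative when the overall advantage is clearly non-zero. If the total is close to zero, individual shares can become very large or change sign.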
Command-line driver
Command-line driver for rank-fragility analysis.
- utils.rank_fragility.run.build_arg_parser()
Build the command-line parser for single-run and batch analysis.
- Return type:
ArgumentParser
- utils.rank_fragility.run.run_analysis(config)
Run one rank-fragility analysis and write its CSV outputs.
- Parameters:
config (RunConfig)
- Return type:
dict[str, pandas.DataFrame]