Experiment Studies
This guide describes how to run multi-condition studies with experiments/run_study.py.
Use this when you need:
- hypothesis trees
- seed replication
- condition-specific Hydra overrides
- optional exact run commands
- multiple evaluators per run
Quick Start
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 plan
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 generate-bash
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 run
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 run --only-hypothesis h2_followup_from_h1
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 run --only-sub-experiment bill_bias
uv run python -m experiments.run_study --study experiments/studies/study_template_v1 summary-append --author analyst --hypothesis h1_timeline_mechanism --note "Observed higher interaction counts in recsys arms" --evidence experiments/studies/recsys_behavior_sweep/generated/repro_lock.json
Compatibility note:
- Use the canonical module entrypoint: uv run python -m experiments.run_study ...
Minimal Study File
schema_version: 1
study:
  name: recsys_behavior_sweep
  study_id: recsys_behavior_sweep
  question: "How do timeline settings shift engagement?"
  study_summary_path: experiments/studies/recsys_behavior_sweep/SUMMARY.md
  summary_log_path: experiments/studies/recsys_behavior_sweep/generated/summary_log.jsonl
  scenarios: [election_recsys_engagement]
  run_defaults:
    config_path: scenarios/election_recsys_engagement/conf
    run_name_template: "{study_id}_{hypothesis_id}_{condition_id}_{scenario}_seed{seed}"
    output_root_override: "experiments/studies/{study_id}/runs/{hypothesis_id}/{condition_id}/{scenario}/seed_{seed}/run"
    seed_start: 11
    seed_repeats: 3
    overrides:
      num_agents: 50
      num_steps: 10
evaluations:
  - id: action_metrics
    preset: builtin.action_metrics_detailed
  - id: probe_metrics
    preset: builtin.probe_metrics_detailed
hypotheses:
  h1:
    statement: "Recommendation-heavy timelines increase interactive actions."
    conditions:
      chronological:
        sub_experiment: bill_bias
        overrides:
          env.gm.components.observe.params.timeline_mode: follower_chronological
      recsys:
        sub_experiment: bill_bias
        overrides:
          env.gm.components.observe.params.timeline_mode: pure_recsys
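To picture how the minimal file above turns into concrete runs, here is a hypothetical Python sketch (not the study runner's code) that crosses hypotheses, conditions, scenarios, and seeds and fills run_name_template. It assumes that seed_start: 11 with seed_repeats: 3 means the consecutive seeds 11, 12, and 13; the expand_runs helper is illustrative only.

```python
from itertools import product

# Hypothetical in-memory mirror of the minimal study file above.
study = {
    "study_id": "recsys_behavior_sweep",
    "scenarios": ["election_recsys_engagement"],
    "run_defaults": {
        "run_name_template": "{study_id}_{hypothesis_id}_{condition_id}_{scenario}_seed{seed}",
        "seed_start": 11,
        "seed_repeats": 3,
    },
}
hypotheses = {"h1": ["chronological", "recsys"]}


def expand_runs(study, hypotheses):
    """Sketch: one run row per hypothesis x condition x scenario x seed.

    Assumes seeds are consecutive integers starting at seed_start.
    """
    defaults = study["run_defaults"]
    start = defaults["seed_start"]
    seeds = range(start, start + defaults["seed_repeats"])
    rows = []
    for hypothesis_id, conditions in hypotheses.items():
        for condition_id, scenario, seed in product(conditions, study["scenarios"], seeds):
            ctx = {
                "study_id": study["study_id"],
                "hypothesis_id": hypothesis_id,
                "condition_id": condition_id,
                "scenario": scenario,
                "seed": seed,
            }
            # Fill the {placeholder} fields of the run name template.
            rows.append({**ctx, "run_name": defaults["run_name_template"].format(**ctx)})
    return rows


rows = expand_runs(study, hypotheses)
print(len(rows))            # 6
print(rows[0]["run_name"])  # recsys_behavior_sweep_h1_chronological_election_recsys_engagement_seed11
```

Two conditions times one scenario times three seeds gives six runs under these assumptions; the plan subcommand remains the authoritative way to see the real expansion.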
Exact Command Mode
For a single condition, you can fully override the execution command:
execution:
  mode: run
  command:
    - uv
    - run
    - python
    - -m
    - silisocs.runtime.runner
    - --config-path
    - scenarios/election_recsys_engagement/conf
    - seed={seed}
Supported placeholders:
- {run_id}
- {study_name}
- {study_id}
- {hypothesis_id}
- {condition_id}
- {scenario}
- {seed}
The same placeholders also work in:
- study.run_defaults.run_name_template
- study.run_defaults.output_root_override
- conditions.<id>.run_name_template
- conditions.<id>.output_root_override
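The placeholders are plain {name} fields, so Python's str.format is one way to picture how a command list gets filled in. This is a sketch with a hypothetical render_command helper; the runner's actual substitution rules (for example, how it treats an unknown placeholder) may differ.

```python
import shlex

# The command list from the execution block above.
command = [
    "uv", "run", "python", "-m", "silisocs.runtime.runner",
    "--config-path", "scenarios/election_recsys_engagement/conf",
    "seed={seed}",
]


def render_command(command, **placeholders):
    """Fill {placeholder} fields in each argument (sketch, not the study runner's code)."""
    return [arg.format(**placeholders) for arg in command]


argv = render_command(
    command,
    run_id="h1_recsys_seed11",  # hypothetical run id; unused by this command
    study_id="recsys_behavior_sweep",
    seed=11,
)
print(shlex.join(argv))
# uv run python -m silisocs.runtime.runner --config-path scenarios/election_recsys_engagement/conf seed=11
```

Keeping the command as a YAML list means each element becomes exactly one argv entry, so no shell quoting is involved. In this sketch, a placeholder that is not supplied raises KeyError.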
Fine-grained run control fields:
- conditions.<id>.sub_experiment: logical run group label (for example bill_bias, bradley_bias).
- conditions.<id>.config_path: optional per-condition scenario config root override.
CLI selection knobs:
- --only-hypothesis
- --only-condition
- --only-sub-experiment
- --only-seed
- --only-run-id
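The --only-* flags narrow the expanded runs before anything executes. The sketch below models them as exact-match filters that are AND-ed together; that combination rule, the select_runs helper, and the row field names are assumptions for illustration.

```python
def select_runs(rows, hypothesis=None, condition=None, sub_experiment=None, seed=None, run_id=None):
    """Keep rows matching every filter that is set (sketch; filters are AND-ed)."""
    filters = {
        "hypothesis_id": hypothesis,
        "condition_id": condition,
        "sub_experiment": sub_experiment,
        "seed": seed,
        "run_id": run_id,
    }
    active = {key: value for key, value in filters.items() if value is not None}
    return [row for row in rows if all(row.get(key) == value for key, value in active.items())]


# Hypothetical expanded rows; the real runner's row fields may differ.
rows = [
    {"run_id": "h1_chronological_s11", "hypothesis_id": "h1", "condition_id": "chronological",
     "sub_experiment": "bill_bias", "seed": 11},
    {"run_id": "h1_recsys_s11", "hypothesis_id": "h1", "condition_id": "recsys",
     "sub_experiment": "bill_bias", "seed": 11},
    {"run_id": "h2_recsys_s11", "hypothesis_id": "h2", "condition_id": "recsys",
     "sub_experiment": "bradley_bias", "seed": 11},
]
print([r["run_id"] for r in select_runs(rows, sub_experiment="bill_bias", condition="recsys")])
# ['h1_recsys_s11']
```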
Where To Plug In Custom Commands
Most studies should use the default runner path and vary behavior with Hydra
overrides in study.run_defaults.overrides or
hypotheses.<id>.conditions.<condition>.overrides.
Use explicit commands only where the default runtime is not the thing you want to execute:
- Simulation command replacement: set hypotheses.<id>.conditions.<condition>.execution.command.
- Existing run reuse: set execution.mode: reuse_existing and list prior runs under reuse.runs.
- Evaluation and post-processing: add entries under evaluations with a preset or explicit command, plus static_args when needed.
- Local/HPC setup: use submitit or slurm-array with --setup-command, --server-command, and --server-ready-url, or export the matching SILISOCS_HPC_* environment variables for the generic Slurm templates.
Default Evaluators
Light summaries:
- builtin.activity_summary
- builtin.probe_summary
Detailed summaries:
- builtin.action_metrics_detailed
- builtin.probe_metrics_detailed
- builtin.probe_binary_detailed
- builtin.probe_numeric_detailed
- builtin.probe_choice_detailed
- builtin.probe_freetext_detailed
Detailed probe evaluators also write probe-type-specific PNG plots to
*_plots/ directories next to each evaluator's JSON output.
Extension hook mechanism (for custom plotting/post-processing):
- Add one or more --postprocessor args via evaluator static_args.
- Format: module:function.
- Function signature: (records_by_type, out_dir, context) -> dict | list | None.
Example:
evaluations:
  - id: probe_metrics
    preset: builtin.probe_metrics_detailed
    static_args:
      - --postprocessor
      - silisocs.evaluations.postprocessors:episode_probe_volume
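A custom postprocessor is an ordinary Python function with the (records_by_type, out_dir, context) signature described above. The sketch assumes records_by_type maps a probe type name to a list of records and that out_dir is a directory path; the module and function names (my_study_plots:probe_record_counts) are hypothetical. Saved as my_study_plots.py on the Python path, it would be referenced with --postprocessor followed by my_study_plots:probe_record_counts in static_args.

```python
import json
import tempfile
from pathlib import Path


def probe_record_counts(records_by_type, out_dir, context):
    """Hypothetical postprocessor: count records per probe type and save them as JSON.

    Assumes records_by_type maps a probe type name to a list of records and that
    out_dir is a directory path; context is accepted but unused here.
    """
    counts = {probe_type: len(records) for probe_type, records in records_by_type.items()}
    out_path = Path(out_dir) / "probe_record_counts.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(counts, indent=2, sort_keys=True))
    # The hook may return dict | list | None; this sketch returns a small dict.
    return {"probe_record_counts": counts, "path": str(out_path)}


# Local smoke test with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    result = probe_record_counts({"binary": [{}, {}], "numeric": [{}]}, tmp, context={})
    saved = json.loads(Path(result["path"]).read_text())
print(result["probe_record_counts"])  # {'binary': 2, 'numeric': 1}
```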
Detailed probe evaluators use effective_config.yaml to map probe labels to configured probe types when available.
Outputs
Study artifacts are written under the study directory:
experiments/studies/{study_id}/generated/
  plan.json
  run_study.sh
  repro_lock.jsonl
  repro_lock.json
  study_index.json
  study_enriched.yaml
  logs/
  eval/
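Simulation outputs are grouped by hypothesis/condition/scenario/seed. With the output_root_override from the minimal study file above, each run is written to:
experiments/studies/{study_id}/runs/{hypothesis_id}/{condition_id}/{scenario}/seed_{seed}/run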
Evaluator outputs mirror that hierarchy:
experiments/studies/{study_id}/generated/eval/{hypothesis_id}/{condition_id}/{scenario}/seed_{seed}/{eval_id}/...
When you run a study, the workflow is:
- validate and expand the study YAML into concrete runs
- execute fresh runs or reuse existing ones
- run configured evaluators for each record
- write reproducibility artifacts (repro_lock.jsonl, repro_lock.json, study_index.json, study_enriched.yaml)
- rebuild a notebook-friendly organized tree under generated/organized/
The organized tree looks like this:
experiments/studies/{study_id}/generated/organized/
  study_summary.yaml
  summary.json
  {hypothesis_id}/
    hypothesis.yaml
    runs.json
    {condition_id}/{scenario}/seed_{seed}/
      config.yaml
      run -> <symlink to the run directory when available>
      eval.json -> <symlink to the first evaluator output>
      eval/{eval_id}/...
The run subcommand builds both the raw and organized outputs. The organize subcommand can be called later
to rebuild only the organized view from repro_lock.json.
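Because the organized tree has a fixed depth, a notebook can collect every seed's eval.json with a single glob. load_organized_evals is a hypothetical helper that follows the layout above and keeps each eval.json payload as-is, since its contents depend on the evaluator.

```python
import json
import tempfile
from pathlib import Path


def load_organized_evals(organized_root):
    """Collect eval.json files from generated/organized/ into flat records.

    Follows the layout {hypothesis_id}/{condition_id}/{scenario}/seed_{seed}/eval.json.
    """
    records = []
    for eval_path in sorted(Path(organized_root).glob("*/*/*/seed_*/eval.json")):
        if not eval_path.exists():  # skip dangling symlinks (no evaluator output yet)
            continue
        seed_dir = eval_path.parent
        scenario_dir = seed_dir.parent
        condition_dir = scenario_dir.parent
        records.append({
            "hypothesis_id": condition_dir.parent.name,
            "condition_id": condition_dir.name,
            "scenario": scenario_dir.name,
            "seed": int(seed_dir.name.removeprefix("seed_")),
            "eval": json.loads(eval_path.read_text()),
        })
    return records


# Demo on a throwaway tree with one made-up evaluator payload.
with tempfile.TemporaryDirectory() as tmp:
    seed_dir = Path(tmp, "h1", "recsys", "election_recsys_engagement", "seed_11")
    seed_dir.mkdir(parents=True)
    (seed_dir / "eval.json").write_text(json.dumps({"example": 1}))
    records = load_organized_evals(tmp)
print(records[0]["condition_id"], records[0]["seed"])  # recsys 11
```

The resulting list of dicts can be passed to pandas.DataFrame for grouping by condition or seed.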
Iterative Workflow (h1 -> analyze -> h2)
- Execute initial hypotheses:
uv run python -m experiments.run_study --study experiments/studies/election_opinion_program_v1 run --only-hypothesis h1_initial_news_bias_shift
- Review evidence and append summary:
uv run python -m experiments.run_study --study experiments/studies/election_opinion_program_v1 summary-append --author researcher --hypothesis h1_initial_news_bias_shift --note "Bias direction changed vote and favorability trajectories" --evidence experiments/studies/election_opinion_program_v1/generated/repro_lock.json
- Run follow-up hypothesis only:
uv run python -m experiments.run_study --study experiments/studies/election_opinion_program_v1 run --only-hypothesis h2_initial_persona_prior_carryover
Sample study file:
- experiments/studies/election_opinion_program_v1/study.yaml
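Before moving from h1 to h2, it can help to reread the notes appended so far. Assuming summary-append writes one JSON object per line to the study's summary_log_path (generated/summary_log.jsonl in the minimal study file), a reader might look like the sketch below. The "hypothesis" and "note" keys are guesses that mirror the CLI flags, so check a real log line and adjust them.

```python
import json
import tempfile
from pathlib import Path


def read_summary_log(path, hypothesis=None):
    """Read a JSON Lines summary log, optionally keeping entries for one hypothesis.

    Assumes the hypothesis id is stored under a "hypothesis" key (a guess).
    """
    entries = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if hypothesis is None or entry.get("hypothesis") == hypothesis:
                entries.append(entry)
    return entries


# Demo on a throwaway log with made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "summary_log.jsonl"
    log_path.write_text(
        json.dumps({"hypothesis": "h1_initial_news_bias_shift", "note": "first look"}) + "\n"
        + json.dumps({"hypothesis": "h2_initial_persona_prior_carryover", "note": "pending"}) + "\n"
    )
    h1_notes = read_summary_log(log_path, hypothesis="h1_initial_news_bias_shift")
print([entry["note"] for entry in h1_notes])  # ['first look']
```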
Public HPC Usage
Local orchestration does not require Slurm/HPC:
uv run python -m experiments.run_study --study experiments/studies/election_opinion_program_v1 run --only-hypothesis h1_initial_news_bias_shift
For clusters, keep site-specific account, partition, module, cache, and model
startup choices outside the repository. The public templates only wire Silisocs
study/runner commands into Slurm; they do not launch any specific model server.
If you use a local OpenAI-compatible server, configure sim.llm.provider and
sim.llm.api_base in your study overrides or scenario config.
Submitit study submission
Install the optional HPC dependencies:
Then submit study run groups through the study runner:
uv run python -m experiments.run_study \
--study experiments/studies/election_opinion_program_v1 \
submitit \
--array-mode case \
--partition <partition> \
--account <account> \
--gpus-per-node 0 \
--only-hypothesis h1_initial_news_bias_shift
By default, submitted jobs assume any LLM endpoint already exists. If your cluster requires job-local setup, pass explicit hooks:
uv run python -m experiments.run_study \
--study experiments/studies/election_opinion_program_v1 \
submitit \
--array-mode seed \
--setup-command 'module load cuda && source .venv/bin/activate' \
--server-command './scripts/start-my-llm-server.sh' \
--server-ready-url 'http://127.0.0.1:8000/v1/models'
Silisocs treats those hooks as user-owned shell commands; it does not ship model-specific vLLM or cluster defaults.
Generic Slurm templates
Use slurm-array when you want to keep using direct sbatch scripts. It
computes the array size from the filtered study runs and prints the sbatch command, or submits it when --submit is passed:
uv run python -m experiments.run_study \
--study experiments/studies/election_opinion_program_v1 \
slurm-array \
--base-script slurm_scripts/study-array-template.sh \
--array-mode case \
--only-hypothesis h1_initial_news_bias_shift \
--submit
For one direct runner job, copy or submit slurm_scripts/runner-template.sh.
Both templates support the same optional hook environment variables:
- SILISOCS_HPC_SETUP_COMMAND
- SILISOCS_HPC_SERVER_COMMAND
- SILISOCS_HPC_SERVER_READY_URL
- SILISOCS_HPC_SERVER_TIMEOUT_SECONDS
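The templates' own readiness logic is not shown in this guide. As a hypothetical illustration of what SILISOCS_HPC_SERVER_READY_URL and SILISOCS_HPC_SERVER_TIMEOUT_SECONDS control, the sketch below polls the URL until it returns a 2xx status or the timeout runs out. The wait_for_server helper, the 600-second fallback, and the 2-second poll interval are assumptions.

```python
import http.client
import os
import time
import urllib.request


def wait_for_server(url, timeout_seconds, poll_seconds=2.0):
    """Return True once url answers with HTTP 2xx, False if timeout_seconds elapse first."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_seconds) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError, and socket timeouts are OSError subclasses:
            # the server is not ready yet, so fall through and retry.
            pass
        remaining = deadline - time.monotonic()
        time.sleep(max(0.0, min(poll_seconds, remaining)))
    return False


if __name__ == "__main__":
    # Hypothetical job-script usage: read the same variables the templates accept.
    ready_url = os.environ.get("SILISOCS_HPC_SERVER_READY_URL")
    timeout = float(os.environ.get("SILISOCS_HPC_SERVER_TIMEOUT_SECONDS", "600"))  # fallback is an assumption
    if ready_url and not wait_for_server(ready_url, timeout):
        raise SystemExit(f"server at {ready_url} not ready after {timeout:.0f}s")
```

Pointing the ready URL at a cheap endpoint, such as the /v1/models route in the submitit example above, keeps each poll light.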
The slurm-array command preserves the same study filters as local execution:
--only-hypothesis, --only-condition, --only-sub-experiment,
--only-seed, and --only-run-id. It also accepts --runner-python and the
same hook options (--setup-command, --server-command,
--server-ready-url, --server-timeout-seconds) and exports them to the
generic template.
Array modes:
- case (default): one task per case, all case seeds executed inside that task.
- seed: one task per seed of a case.
- hypothesis: one task per hypothesis.
- run: one task per expanded run row.
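To make the array modes concrete, the sketch below groups expanded run rows into array tasks and prints the matching sbatch --array range. It assumes a "case" is one hypothesis/condition/scenario combination; array_tasks and the row fields are hypothetical, and slurm-array itself remains the source of truth for the real array size.

```python
def array_tasks(rows, mode="case"):
    """Group expanded run rows into Slurm array tasks for a given --array-mode (sketch)."""
    keys = {
        "case": lambda r: (r["hypothesis_id"], r["condition_id"], r["scenario"]),
        "seed": lambda r: (r["hypothesis_id"], r["condition_id"], r["scenario"], r["seed"]),
        "hypothesis": lambda r: (r["hypothesis_id"],),
        "run": lambda r: (r["run_id"],),
    }
    tasks = {}
    for row in rows:
        # Rows sharing a key run inside the same array task.
        tasks.setdefault(keys[mode](row), []).append(row)
    return list(tasks.values())


rows = [
    {"run_id": f"h1_{cond}_s{seed}", "hypothesis_id": "h1", "condition_id": cond,
     "scenario": "election_recsys_engagement", "seed": seed}
    for cond in ("chronological", "recsys")
    for seed in (11, 12, 13)
]
for mode in ("case", "seed", "hypothesis", "run"):
    tasks = array_tasks(rows, mode)
    print(mode, f"--array=0-{len(tasks) - 1}")
# case --array=0-1
# seed --array=0-5
# hypothesis --array=0-0
# run --array=0-5
```

In this toy example, seed and run give the same count because each row is exactly one seed of one case.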
For full schema details, see Study Schema Reference.