Evaluation Probes
Probes are structured surveys deployed to agents during the simulation. They measure agent attitudes, preferences, and knowledge over time — providing quantitative data for analysis.
Overview
Probes are configured under eval.probes in your scenario YAML. The system
supports multiple probe types, configurable deployment schedules, and concurrent
execution across large agent populations.
Probe Types
Built-in Types
| Type | Response | Example |
|---|---|---|
| NumericRatingProbe | Integer on a configurable scale (default 1-10) | "Rate your satisfaction 1-10" |
| BinaryProbe | Yes/No | "Would you vote for candidate X?" |
| ChoiceProbe | One of N options | "Which candidate do you prefer: A, B, or C?" |
| FreeTextProbe | Open-ended string | "What are your concerns about the election?" |
Probe Configuration
probes:
  probes:
    satisfaction:
      probe_name: satisfaction
      probe_type: NumericRatingProbe
      probe_data:
        name: Satisfaction
        question: "Return one number from {lo} to {hi}: how satisfied are you with community discussions?"
        lo: 1
        hi: 10
    turnout_intent:
      probe_name: turnout_intent
      probe_type: BinaryProbe
      probe_data:
        name: VoteIntent
        question: "Will you participate in the upcoming vote? Reply yes or no."
    topic_preference:
      probe_name: topic_preference
      probe_type: ChoiceProbe
      probe_data:
        name: TopicPreference
        question: "Which topic interests you most?"
        choices: [Technology, Politics, Entertainment]
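The `{lo}` and `{hi}` placeholders in the satisfaction question suggest that the scale bounds are substituted into the text before the prompt is sent. The sketch below illustrates that substitution under the assumption of plain `str.format` semantics. The helper `render_question` is hypothetical and is not part of the library.

```python
def render_question(probe_data: dict) -> str:
    """Fill {lo}/{hi} placeholders in a probe question (hypothetical helper)."""
    # Assumption: missing bounds fall back to the documented 1-10 default scale.
    lo = probe_data.get("lo", 1)
    hi = probe_data.get("hi", 10)
    if lo >= hi:
        raise ValueError(f"lo ({lo}) must be smaller than hi ({hi})")
    # Note: str.format would choke on any other literal braces in the question.
    return probe_data["question"].format(lo=lo, hi=hi)


satisfaction = {
    "name": "Satisfaction",
    "question": "Return one number from {lo} to {hi}: how satisfied are you with community discussions?",
    "lo": 1,
    "hi": 10,
}
print(render_question(satisfaction))
```

Keeping the bounds in `probe_data` rather than hard-coding them in the question text means the scale can be changed in one place.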
Deployment Schedule
Control when and to whom probes are deployed:
probes:
  deployment:
    enabled: true
    start_step: 1        # First step to deploy probes
    every_n_steps: 5     # Deploy every N steps
    include_agents: []   # Empty = all agents
    exclude_agents:      # Skip specific agents
      - "Storhampton Gazette"
Targeting
- All agents: Leave include_agents empty
- Specific agents: List agent names in include_agents
- Exclude agents: List names in exclude_agents (e.g., news bots)
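The schedule and targeting rules above can be sketched as two small functions. This is an illustration, not the library's implementation. It assumes that probes fire on `start_step` and on every `every_n_steps`-th step after it, and that exclusions are applied after inclusions. The names `should_deploy` and `select_agents` are hypothetical.

```python
def should_deploy(step: int, start_step: int = 1, every_n_steps: int = 5) -> bool:
    """Return True if probes would fire on this step (assumed semantics)."""
    if every_n_steps <= 0:
        raise ValueError("every_n_steps must be positive")
    if step < start_step:
        return False
    # Assumption: the cadence is counted from start_step, not from step 0.
    return (step - start_step) % every_n_steps == 0


def select_agents(all_agents, include_agents=(), exclude_agents=()):
    """Apply include/exclude targeting to a list of agent names."""
    # An empty include list means "all agents".
    pool = [a for a in all_agents if not include_agents or a in include_agents]
    # Assumption: an agent named in both lists ends up excluded.
    return [a for a in pool if a not in exclude_agents]


steps = [s for s in range(1, 13) if should_deploy(s, start_step=1, every_n_steps=5)]
print(steps)  # [1, 6, 11]
print(select_agents(["Alice Smith", "Storhampton Gazette"],
                    exclude_agents=["Storhampton Gazette"]))  # ['Alice Smith']
```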
Custom Probe Types
For scenario-specific probes, create a module and reference it:
# my_world/probes.py
from silisocs.evaluations.probes.types import ProbeBase


class Favorability(ProbeBase):
    def __init__(self, probe_data=None):
        self.name = "Favorability"
        # Fall back to a default candidate when none is configured.
        self._candidate = (probe_data or {}).get("candidate", "Candidate A")

    def form_question_for_agent(self, agent):
        # The question text sent to the agent.
        return f"On a scale of 1-10, how favorable is your view of {self._candidate}?"

    def parse_answer(self, raw_response):
        # Extract numeric value from LLM response
        ...
Custom types are resolved via importlib at runtime. A probe must return a
question string from form_question_for_agent(agent) and a parsed string or
None from parse_answer(raw_response). Probe prompts should contain the
measurement question and any answer-format constraint only. Agent identity,
persona, and recent observations should come from the agent runtime itself.
Custom probe types are optional; most worlds can use the built-in generalist probe
types directly.
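The body of `parse_answer` is left open in the example. One possible way to satisfy the "parsed string or None" contract for a 1-10 rating is sketched below as a standalone function, so it runs without the library installed. The name `parse_rating` and its first-in-range-integer rule are assumptions for illustration, not the library's parser.

```python
import re
from typing import Optional


def parse_rating(raw_response: Optional[str], lo: int = 1, hi: int = 10) -> Optional[str]:
    """Return the first integer within [lo, hi] as a string, or None."""
    # Treat a missing response like an empty one.
    for token in re.findall(r"\d+", raw_response or ""):
        value = int(token)
        if lo <= value <= hi:
            return str(value)
    # No usable number: signal a parse failure to the caller.
    return None


print(parse_rating("I'd say about a 7"))  # 7
print(parse_rating("Hard to say."))       # None
```

A rule this simple will pick up the scale itself if the agent echoes it ("on a scale of 1-10, I pick 8"), which is one reason to keep the answer-format constraint in the question strict.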
Questionnaire Batching
For efficiency, the probe system batches multiple probe questions into a single LLM call per agent. An agent that receives N probes therefore costs one call per deployment instead of N, which reduces API costs when many probes are deployed.
The batched questionnaire prompt presents all questions as a numbered list, and the response parser extracts the individual answers. If parsing fails for any question, the system falls back to individual LLM calls for those specific questions.
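The batch-then-fall-back flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual prompt or parser: the numbered `<number>. <answer>` reply format, the function names, and the `ask` callable (a stand-in for one LLM call) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional


def build_questionnaire(questions: Dict[str, str]) -> str:
    """Number the questions so answers can be matched back by position."""
    lines = ["Answer each question on its own line as '<number>. <answer>'."]
    for i, text in enumerate(questions.values(), start=1):
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


def parse_questionnaire(response: str, questions: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each probe label to its answer, or None if no numbered line was found."""
    pattern = r"^[ \t]*(\d+)[.)][ \t]*(.+)$"
    found = {int(n): a.strip() for n, a in re.findall(pattern, response, re.MULTILINE)}
    return {label: found.get(i) for i, label in enumerate(questions, start=1)}


def run_probes(questions: Dict[str, str], ask: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Make one batched call, then retry only the unanswered questions one by one."""
    answers = parse_questionnaire(ask(build_questionnaire(questions)), questions)
    for label, answer in answers.items():
        if answer is None:
            # Fallback: a single-question call for this probe only.
            answers[label] = ask(questions[label]).strip() or None
    return answers
```

With three probes and a reply that omits one numbered line, this sketch makes two calls in total (one batched, one fallback) rather than three.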
Output
Probe results are saved to probe_events.jsonl in the simulation output directory:
{
  "episode": 5,
  "event_type": "probe",
  "source_user": "Alice Smith",
  "label": "turnout_intent",
  "data": {
    "probe_type": "BinaryProbe",
    "raw_response": "Yes, I will participate.",
    "probe_return": "yes"
  }
}
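For ad hoc inspection, records of this shape can be filtered with a few lines of standard-library Python. The sketch assumes one JSON object per line, as the `.jsonl` extension implies. The helper name `iter_probe_events` and the sample records are made up for illustration and are not real results.

```python
import json
from typing import Iterable, Iterator


def iter_probe_events(lines: Iterable[str], label: str) -> Iterator[dict]:
    """Yield probe events with the given label from JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("event_type") == "probe" and event.get("label") == label:
            yield event


# Illustrative records only; with a real run, pass an open file handle instead.
sample_lines = [
    '{"episode": 5, "event_type": "probe", "source_user": "Alice Smith", "label": "turnout_intent", "data": {"probe_type": "BinaryProbe", "probe_return": "yes"}}',
    '{"episode": 5, "event_type": "probe", "source_user": "Agent B", "label": "turnout_intent", "data": {"probe_type": "BinaryProbe", "probe_return": "no"}}',
    '{"episode": 5, "event_type": "probe", "source_user": "Alice Smith", "label": "satisfaction", "data": {"probe_type": "NumericRatingProbe", "probe_return": "7"}}',
]
events = list(iter_probe_events(sample_lines, "turnout_intent"))
yes_share = sum(e["data"]["probe_return"] == "yes" for e in events) / len(events)
print(yes_share)  # 0.5
```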
The election world demonstrates probes in action with named built-in probes (vote preference, favorability, intent) — see the Election Walkthrough.
Default Detailed Probe Evaluators
When running studies with run_study.py, you can use built-in probe evaluator presets:
- builtin.probe_metrics_detailed (all probe events)
- builtin.probe_binary_detailed
- builtin.probe_numeric_detailed
- builtin.probe_choice_detailed
- builtin.probe_freetext_detailed
Plot outputs are generated by these detailed probe evaluators directly (in sibling
*_plots/ directories next to each evaluator JSON), rather than through a separate
plot-only evaluator hook.
These evaluators:
- read action_events.jsonl
- aggregate metrics per agent and per episode
- aggregate per probe label and per inferred/configured probe type
- use effective_config.yaml to map labels to configured probe types when available
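The kind of aggregation listed above can be illustrated with a small grouping function. This is a sketch of the general idea only and makes no claim about how the built-in evaluators are implemented. It assumes events are dicts with `label`, `episode`, and `data.probe_return` fields, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_label_and_episode(events):
    """Average numeric probe answers per (label, episode) pair."""
    buckets = defaultdict(list)
    for event in events:
        try:
            value = float(event["data"]["probe_return"])
        except (KeyError, TypeError, ValueError):
            continue  # skip unparsed or non-numeric answers such as "yes"
        buckets[(event["label"], event["episode"])].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Illustrative events, not real results.
demo_events = [
    {"episode": 5, "label": "satisfaction", "data": {"probe_return": "7"}},
    {"episode": 5, "label": "satisfaction", "data": {"probe_return": "9"}},
    {"episode": 5, "label": "turnout_intent", "data": {"probe_return": "yes"}},
    {"episode": 10, "label": "satisfaction", "data": {"probe_return": None}},
]
print(mean_by_label_and_episode(demo_events))  # {('satisfaction', 5): 8.0}
```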
For orchestration details and preset usage, see Experiment Studies.
Related
- Building Agents — Agent construction and persona pipeline
- Usage Overview — Probes in the end-to-end workflow
- Configuration Reference — Full probes config options