Evaluation Probes

Probes are structured surveys deployed to agents during the simulation. They measure agent attitudes, preferences, and knowledge over time, providing quantitative data for analysis.

What a probe captures

A probe asks every agent the same question on a schedule (e.g. every episode) and records typed responses to probe_events.jsonl, turning a running society into a longitudinal panel survey.

Overview

Probes are configured under eval.probes in your scenario YAML. The system supports multiple probe types, configurable deployment schedules, and concurrent execution across large agent populations.

Probe Types

Built-in Types

Type	Response	Example
`NumericRatingProbe`	Integer on a configurable scale (default 1-10)	"Rate your satisfaction 1-10"
`BinaryProbe`	Yes/No	"Would you vote for candidate X?"
`ChoiceProbe`	One of N options	"Which candidate do you prefer: A, B, or C?"
`FreeTextProbe`	Open-ended string	"What are your concerns about the election?"

Probe Configuration

probes:
  probes:
    satisfaction:
      probe_name: satisfaction
      probe_type: NumericRatingProbe
      probe_data:
        name: Satisfaction
        question: "Return one number from {lo} to {hi}: how satisfied are you with community discussions?"
        lo: 1
        hi: 10

    turnout_intent:
      probe_name: turnout_intent
      probe_type: BinaryProbe
      probe_data:
        name: VoteIntent
        question: "Will you participate in the upcoming vote? Reply yes or no."

    topic_preference:
      probe_name: topic_preference
      probe_type: ChoiceProbe
      probe_data:
        name: TopicPreference
        question: "Which topic interests you most?"
        choices: [Technology, Politics, Entertainment]
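The `{lo}` and `{hi}` tokens in the NumericRatingProbe question read like format placeholders filled from probe_data. A minimal sketch of that substitution, assuming `str.format`-style templating (the helper `render_question` is hypothetical and not part of the library):

```python
def render_question(probe_data: dict) -> str:
    """Fill {lo}/{hi} placeholders in a probe question (hypothetical helper)."""
    lo = probe_data.get("lo", 1)    # assumed default, matching the 1-10 default scale
    hi = probe_data.get("hi", 10)
    # str.format ignores unused keyword arguments, so questions
    # without placeholders pass through unchanged.
    return probe_data["question"].format(lo=lo, hi=hi)


satisfaction = {
    "name": "Satisfaction",
    "question": "Return one number from {lo} to {hi}: how satisfied are you with community discussions?",
    "lo": 1,
    "hi": 10,
}
rendered = render_question(satisfaction)
print(rendered)
```

Under this assumption, a question that contains literal braces would need them doubled (`{{` and `}}`) to survive formatting.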

Deployment Schedule

Control when and to whom probes are deployed:

probes:
  deployment:
    enabled: true
    start_step: 1           # First step to deploy probes
    every_n_steps: 5        # Deploy every N steps
    include_agents: []      # Empty = all agents
    exclude_agents:         # Skip specific agents
      - "Storhampton Gazette"

Targeting

All agents: Leave include_agents empty
Specific agents: List agent names in include_agents
Exclude agents: List names in exclude_agents (e.g., news bots)
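The schedule and targeting rules can be sketched as two small functions. This is an illustration under stated assumptions, not the library's scheduler: it assumes probes fire at start_step and every every_n_steps steps after it, and that exclude_agents is applied after include_agents. The function names and the agent "Bob Jones" are hypothetical.

```python
def is_probe_step(step: int, start_step: int = 1, every_n_steps: int = 5) -> bool:
    """Assumed rule: fire at start_step, then every N steps after it."""
    if step < start_step:
        return False
    return (step - start_step) % every_n_steps == 0


def select_agents(all_agents, include_agents=(), exclude_agents=()):
    """Empty include list means all agents; exclusions are applied afterwards."""
    if include_agents:
        pool = [a for a in all_agents if a in include_agents]
    else:
        pool = list(all_agents)
    return [a for a in pool if a not in exclude_agents]


agents = ["Alice Smith", "Bob Jones", "Storhampton Gazette"]
targets = select_agents(agents, exclude_agents=["Storhampton Gazette"])
steps = [s for s in range(1, 13) if is_probe_step(s)]
print(targets, steps)
```

The real scheduler may align steps differently (for example, on multiples of every_n_steps), so check the Configuration Reference before relying on exact step numbers.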

Custom Probe Types

For scenario-specific probes, create a module and reference it:

probes:
  probe_lib_module: my_world.probes

# my_world/probes.py
from silisocs.evaluations.probes.types import ProbeBase

class Favorability(ProbeBase):
    def __init__(self, probe_data=None):
        self.name = "Favorability"
        self._candidate = (probe_data or {}).get("candidate", "Candidate A")

    def form_question_for_agent(self, agent):
        return f"On a scale of 1-10, how favorable is your view of {self._candidate}?"

    def parse_answer(self, raw_response):
        # Extract numeric value from LLM response
        ...

Custom types are resolved via importlib at runtime. A probe must return a question string from form_question_for_agent(agent) and a parsed string or None from parse_answer(raw_response). Probe prompts should contain only the measurement question and any answer-format constraint; agent identity, persona, and recent observations should come from the agent runtime itself. Custom probe types are optional: most worlds can use the built-in generalist probe types directly.
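The numeric extraction that a parse_answer method needs could look like the sketch below. It is written as a standalone function (the name `parse_rating` is hypothetical) so it runs without the library installed, and it follows the stated contract of returning a string or None. Taking the first in-range integer is an assumption about how ambiguous replies should be handled.

```python
import re
from typing import Optional


def parse_rating(raw_response: str, lo: int = 1, hi: int = 10) -> Optional[str]:
    """Return the first integer within [lo, hi] found in the reply, as a string."""
    for token in re.findall(r"-?\d+", raw_response or ""):
        value = int(token)
        if lo <= value <= hi:
            return str(value)
    return None  # nothing usable in the reply


print(parse_rating("I'd say about a 7"))
print(parse_rating("No strong opinion"))
```

Inside a custom probe, parse_answer could delegate to a helper like this one. Replies such as "7 or 8" resolve to the first value, which may or may not suit a given study.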

Questionnaire Batching

For efficiency, the probe system batches multiple probe questions into a single LLM call per agent. When many probes are deployed, an agent then costs one call per deployment instead of one call per probe, unless a fallback is triggered.

The batched questionnaire prompt presents all questions numbered, and the response parser extracts individual answers. If parsing fails for any question, the system falls back to individual LLM calls for those specific questions.
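The batching and fallback flow can be sketched as follows. The prompt wording, the answer-line format, and both function names are assumptions made for illustration; the actual questionnaire prompt and parser may differ.

```python
import re
from typing import Dict, List, Tuple


def build_questionnaire(questions: Dict[str, str]) -> str:
    """Number each probe question so answers can be matched back by index."""
    lines = ["Answer each question on its own line, prefixed with its number."]
    for i, text in enumerate(questions.values(), start=1):
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


def split_answers(questions: Dict[str, str], response: str) -> Tuple[Dict[str, str], List[str]]:
    """Map numbered answer lines to probe labels; list labels with no answer."""
    by_index = {}
    for line in response.splitlines():
        match = re.match(r"\s*(\d+)[.):]\s*(.+)", line)
        if match:
            by_index[int(match.group(1))] = match.group(2).strip()
    answers, missing = {}, []
    for i, label in enumerate(questions, start=1):
        if i in by_index:
            answers[label] = by_index[i]
        else:
            missing.append(label)  # candidates for an individual follow-up call
    return answers, missing


questions = {
    "satisfaction": "Return one number from 1 to 10: how satisfied are you?",
    "turnout_intent": "Will you participate in the upcoming vote? Reply yes or no.",
}
prompt = build_questionnaire(questions)
answers, missing = split_answers(questions, "1. 7\nI think I will vote.")
print(answers, missing)
```

Here the second line of the reply carries no number, so turnout_intent ends up in `missing` and would be re-asked on its own, which mirrors the per-question fallback described above.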

Output

Probe results are saved to probe_events.jsonl in the simulation output directory:

{
  "episode": 5,
  "event_type": "probe",
  "source_user": "Alice Smith",
  "label": "turnout_intent",
  "data": {
    "probe_type": "BinaryProbe",
    "raw_response": "Yes, I plan to take part.",
    "probe_return": "yes"
  }
}
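Records of this shape can be tallied with the standard library alone. The sketch below computes the share of "yes" answers per episode for one binary probe label. It assumes one JSON object per line (as the .jsonl extension suggests) and lowercase "yes"/"no" values in probe_return; the function name and the second sample agent are hypothetical.

```python
import json
from collections import defaultdict


def yes_share_by_episode(lines, label):
    """Share of 'yes' answers per episode for one binary probe label."""
    counts = defaultdict(lambda: [0, 0])  # episode -> [yes count, total count]
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("event_type") != "probe" or event.get("label") != label:
            continue
        answer = event.get("data", {}).get("probe_return")
        if answer is None:
            continue  # skip responses that could not be parsed
        counts[event["episode"]][1] += 1
        if answer == "yes":
            counts[event["episode"]][0] += 1
    return {ep: yes / total for ep, (yes, total) in sorted(counts.items())}


def make_event(user, answer):
    """Build one sample record in the documented shape."""
    return json.dumps({
        "episode": 5, "event_type": "probe", "source_user": user,
        "label": "turnout_intent",
        "data": {"probe_type": "BinaryProbe", "raw_response": answer, "probe_return": answer},
    })


sample = [make_event("Alice Smith", "yes"), make_event("Bob Jones", "no")]
shares = yes_share_by_episode(sample, "turnout_intent")
print(shares)
```

For a real run, pass an open file handle instead of the sample list, for example `with open("probe_events.jsonl") as f: yes_share_by_episode(f, "turnout_intent")`.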

The election world demonstrates probes in action with named built-in probes (vote preference, favorability, intent): see the Election Walkthrough.

Default Detailed Probe Evaluators

When running studies with run_study.py, you can use built-in probe evaluator presets:

builtin.probe_metrics_detailed (all probe events)
builtin.probe_binary_detailed
builtin.probe_numeric_detailed
builtin.probe_choice_detailed
builtin.probe_freetext_detailed

Plot outputs are generated by these detailed probe evaluators directly (in sibling *_plots/ directories next to each evaluator JSON), rather than through a separate plot-only evaluator hook.

These evaluators:

- read action_events.jsonl
- aggregate metrics per agent and per episode
- aggregate per probe label and per inferred/configured probe type
- use effective_config.yaml to map labels to configured probe types when available

For orchestration details and preset usage, see Experiment Studies.

Building Agents: Agent construction and persona pipeline
Usage Overview: Probes in the end-to-end workflow
Configuration Reference: Full probes config options