Evaluation Probes

Probes are structured surveys deployed to agents during the simulation. They measure agent attitudes, preferences, and knowledge over time — providing quantitative data for analysis.

Overview

Probes are configured under eval.probes in your scenario YAML. The system supports multiple probe types, configurable deployment schedules, and concurrent execution across large agent populations.


Probe Types

Built-in Types

| Type | Response | Example |
| --- | --- | --- |
| NumericRatingProbe | Integer on a configurable scale (default 1-10) | "Rate your satisfaction 1-10" |
| BinaryProbe | Yes/No | "Would you vote for candidate X?" |
| ChoiceProbe | One of N options | "Which candidate do you prefer: A, B, or C?" |
| FreeTextProbe | Open-ended string | "What are your concerns about the election?" |

Probe Configuration

probes:
  probes:
    satisfaction:
      probe_name: satisfaction
      probe_type: NumericRatingProbe
      probe_data:
        name: Satisfaction
        question: "Return one number from {lo} to {hi}: how satisfied are you with community discussions?"
        lo: 1
        hi: 10

    turnout_intent:
      probe_name: turnout_intent
      probe_type: BinaryProbe
      probe_data:
        name: VoteIntent
        question: "Will you participate in the upcoming vote? Reply yes or no."

    topic_preference:
      probe_name: topic_preference
      probe_type: ChoiceProbe
      probe_data:
        name: TopicPreference
        question: "Which topic interests you most?"
        choices: [Technology, Politics, Entertainment]
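
The {lo} and {hi} placeholders in the NumericRatingProbe question suggest that the scale bounds are substituted into the question text before it reaches the agent. A minimal sketch of that substitution, assuming plain str.format semantics; the actual templating mechanism is not described here, and render_question is a hypothetical helper rather than part of the library:

```python
def render_question(probe_data: dict) -> str:
    """Fill {lo}/{hi} placeholders in a probe question (illustrative only)."""
    lo = probe_data.get("lo", 1)    # assumed fallback: the default 1-10 scale
    hi = probe_data.get("hi", 10)
    return probe_data["question"].format(lo=lo, hi=hi)


probe_data = {
    "name": "Satisfaction",
    "question": "Return one number from {lo} to {hi}: how satisfied are you?",
    "lo": 1,
    "hi": 10,
}
print(render_question(probe_data))
# Return one number from 1 to 10: how satisfied are you?
```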

Deployment Schedule

Control when and to whom probes are deployed:

probes:
  deployment:
    enabled: true
    start_step: 1        # First step to deploy probes
    every_n_steps: 5     # Deploy every N steps
    include_agents: []   # Empty = all agents
    exclude_agents:      # Skip specific agents
      - "Storhampton Gazette"

Targeting

  • All agents: Leave include_agents empty
  • Specific agents: List agent names in include_agents
  • Exclude agents: List names in exclude_agents (e.g., news bots)
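
The schedule and targeting rules above can be sketched in a few lines of Python. This is an illustration, not the library's implementation: should_deploy and select_targets are hypothetical helpers, the sketch assumes probes fire at start_step and then every every_n_steps steps after it, and it assumes exclusions are applied after inclusions. "Bob Jones" is a made-up agent name.

```python
def should_deploy(step: int, start_step: int = 1, every_n_steps: int = 5) -> bool:
    """Assumed rule: deploy at start_step, then every every_n_steps steps."""
    if step < start_step:
        return False
    return (step - start_step) % every_n_steps == 0


def select_targets(agents, include_agents=(), exclude_agents=()):
    """An empty include list means all agents; excludes are applied afterwards."""
    included = [a for a in agents if not include_agents or a in include_agents]
    excluded = set(exclude_agents)
    return [a for a in included if a not in excluded]


agents = ["Alice Smith", "Bob Jones", "Storhampton Gazette"]
deploy_steps = [s for s in range(1, 13) if should_deploy(s)]
targets = select_targets(agents, exclude_agents=["Storhampton Gazette"])
print(deploy_steps)  # [1, 6, 11]
print(targets)       # ['Alice Smith', 'Bob Jones']
```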

Custom Probe Types

For scenario-specific probes, create a module and reference it:

probes:
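  probe_lib_module: my_world.probes

# my_world/probes.py
import re

from silisocs.evaluations.probes.types import ProbeBase

class Favorability(ProbeBase):
    def __init__(self, probe_data=None):
        self.name = "Favorability"
        # Candidate name comes from probe_data in the scenario YAML, with a fallback.
        self._candidate = (probe_data or {}).get("candidate", "Candidate A")

    def form_question_for_agent(self, agent):
        return f"On a scale of 1-10, how favorable is your view of {self._candidate}?"

    def parse_answer(self, raw_response):
        # Illustrative parsing: take the first standalone integer from 1 to 10
        # in the LLM response and return it as a string, or None if there is none.
        match = re.search(r"\b(10|[1-9])\b", raw_response or "")
        return match.group(1) if match else None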

Custom types are resolved via importlib at runtime. A probe must return a question string from form_question_for_agent(agent) and a parsed string or None from parse_answer(raw_response). Probe prompts should contain only the measurement question and any answer-format constraint; agent identity, persona, and recent observations should come from the agent runtime itself. Custom probe types are optional: most worlds can use the built-in general-purpose probe types directly.
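
To make the importlib-based lookup concrete, here is a minimal sketch of how a probe_type name might be resolved against probe_lib_module. The helper resolve_probe_class and the stand-in module demo_probes are hypothetical and exist only so the example runs without the real package; the library's actual resolution logic may differ.

```python
import importlib
import sys
import types


def resolve_probe_class(module_name: str, class_name: str):
    """Import module_name and return the attribute named class_name."""
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ValueError(f"{class_name!r} not found in {module_name!r}") from exc


# Stand-in for my_world/probes.py, registered so import_module can find it.
demo = types.ModuleType("demo_probes")


class Favorability:
    def __init__(self, probe_data=None):
        self.candidate = (probe_data or {}).get("candidate", "Candidate A")


demo.Favorability = Favorability
sys.modules["demo_probes"] = demo

probe_cls = resolve_probe_class("demo_probes", "Favorability")
probe = probe_cls({"candidate": "Candidate B"})
print(probe.candidate)  # Candidate B
```

Raising a ValueError that names both the module and the class is one way to make a misspelled probe_type easier to diagnose.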


Questionnaire Batching

For efficiency, the probe system batches multiple probe questions into a single LLM call per agent. This reduces API costs significantly when deploying many probes.

The batched questionnaire prompt presents all questions numbered, and the response parser extracts individual answers. If parsing fails for any question, the system falls back to individual LLM calls for those specific questions.
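
The batching and fallback behavior can be illustrated with a small sketch. The prompt wording, the "<number>. <answer>" reply format, and the helpers build_questionnaire and parse_questionnaire are assumptions made for this example; the real prompt and parser are not shown in this document.

```python
import re


def build_questionnaire(questions: dict) -> str:
    """Number every question so answers can be matched back by index."""
    lines = ["Answer each question on its own line as '<number>. <answer>'."]
    for i, text in enumerate(questions.values(), start=1):
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


def parse_questionnaire(questions: dict, response: str):
    """Return (answers, failed_labels); failed labels need an individual call."""
    pattern = r"^[ \t]*(\d+)[.)][ \t]*(.+)$"
    found = {int(n): a.strip() for n, a in re.findall(pattern, response, re.MULTILINE)}
    answers, failed = {}, []
    for i, label in enumerate(questions, start=1):
        if i in found:
            answers[label] = found[i]
        else:
            failed.append(label)
    return answers, failed


questions = {
    "satisfaction": "Return one number from 1 to 10: how satisfied are you?",
    "turnout_intent": "Will you participate in the upcoming vote? Reply yes or no.",
    "topic_preference": "Which topic interests you most?",
}
response = "1. 7\n2. yes\nI like technology."
answers, failed = parse_questionnaire(questions, response)
print(answers)  # {'satisfaction': '7', 'turnout_intent': 'yes'}
print(failed)   # ['topic_preference']
```

In this sketch the third answer lacks a number, so only topic_preference would be re-asked in its own LLM call.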


Output

Probe results are saved to probe_events.jsonl in the simulation output directory:

{
  "episode": 5,
  "event_type": "probe",
  "source_user": "Alice Smith",
  "label": "turnout_intent",
  "data": {
    "probe_type": "BinaryProbe",
    "raw_response": "Yes, I plan to vote.",
    "probe_return": "yes"
  }
}
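
For ad hoc analysis, events in this shape can be aggregated with the standard library alone. The sketch below assumes the file holds one JSON object per line (as the .jsonl extension implies), that BinaryProbe answers appear as "yes" or "no" in probe_return, and that unparsed answers are stored as null. The helper binary_yes_rate_by_episode and the agent "Bob Jones" are hypothetical.

```python
import json
from collections import defaultdict


def binary_yes_rate_by_episode(lines, label):
    """Share of 'yes' answers per episode for one BinaryProbe label."""
    counts = defaultdict(lambda: [0, 0])  # episode -> [yes_count, total]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event_type") != "probe" or event.get("label") != label:
            continue
        answer = event["data"].get("probe_return")
        if answer is None:  # skip answers that could not be parsed
            continue
        counts[event["episode"]][1] += 1
        counts[event["episode"]][0] += answer == "yes"
    return {ep: yes / total for ep, (yes, total) in sorted(counts.items())}


def make_event(user, answer):
    """Build one sample event line in the shape shown above."""
    return json.dumps({
        "episode": 5, "event_type": "probe", "source_user": user,
        "label": "turnout_intent",
        "data": {"probe_type": "BinaryProbe", "raw_response": answer,
                 "probe_return": answer},
    })


sample = [make_event("Alice Smith", "yes"), make_event("Bob Jones", "no")]
print(binary_yes_rate_by_episode(sample, "turnout_intent"))  # {5: 0.5}
```

Because the function accepts any iterable of lines, an open file handle for probe_events.jsonl can be passed in place of the sample list.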

The election world demonstrates probes in action with named built-in probes (vote preference, favorability, intent) — see the Election Walkthrough.

Default Detailed Probe Evaluators

When running studies with run_study.py, you can use built-in probe evaluator presets:

  • builtin.probe_metrics_detailed (all probe events)
  • builtin.probe_binary_detailed
  • builtin.probe_numeric_detailed
  • builtin.probe_choice_detailed
  • builtin.probe_freetext_detailed

Plot outputs are generated by these detailed probe evaluators directly (in sibling *_plots/ directories next to each evaluator JSON), rather than through a separate plot-only evaluator hook.

These evaluators:

  • read action_events.jsonl
  • aggregate metrics per agent and per episode
  • aggregate per probe label and per inferred/configured probe type
  • use effective_config.yaml to map labels to configured probe types when available

For orchestration details and preset usage, see Experiment Studies.