DEV Community

Wanda · Originally published at apidog.com

How to generate 100+ agent configurations using LLMs with batch processing

Introduction

Configuring hundreds of AI agents for a social media simulation is a large-scale engineering problem. Each agent requires specific settings: activity schedules, posting frequency, response delays, influence weight, and stance. Manual setup is not scalable.

MiroFish automates this with LLM-powered config generation. It parses your documents, knowledge graph, and simulation requirements, then generates detailed configs for every agent.

But LLMs can fail: outputs may truncate, JSON may be invalid, or token limits may cause context loss.

This guide provides a practical implementation covering:

  • Step-by-step generation (time β†’ events β†’ agents β†’ platforms)
  • Batch processing to avoid context/token limits
  • JSON repair techniques for truncated outputs
  • Rule-based fallback configs if LLM fails
  • Agent activity patterns by type (Student, Official, Media, etc.)
  • Validation and correction logic

πŸ’‘ Pipeline Tip: The config generation pipeline processes 100+ agents via API calls. Apidog validates request/response schemas at each stage, catches JSON errors before production, and generates test cases for LLM edge cases.

All code here is from real production use in MiroFish.

Architecture Overview

The config generator uses a pipelined design:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Context       β”‚ ──► β”‚   Time Config   β”‚ ──► β”‚   Event Config  β”‚
β”‚   Builder       β”‚     β”‚   Generator     β”‚     β”‚   Generator     β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚ - Simulation    β”‚     β”‚ - Total hours   β”‚     β”‚ - Initial posts β”‚
β”‚   requirement   β”‚     β”‚ - Minutes/round β”‚     β”‚ - Hot topics    β”‚
β”‚ - Entity summaryβ”‚     β”‚ - Peak hours    β”‚     β”‚ - Narrative     β”‚
β”‚ - Document text β”‚     β”‚ - Activity mult β”‚     β”‚   direction     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                        β”‚
                                                        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Final Config  β”‚ ◄── β”‚   Platform      β”‚ ◄── β”‚   Agent Config  β”‚
β”‚   Assembly      β”‚     β”‚   Config        β”‚     β”‚   Batches       β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚ - Merge all     β”‚     β”‚ - Twitter paramsβ”‚     β”‚ - 15 agents     β”‚
β”‚ - Validate      β”‚     β”‚ - Reddit params β”‚     β”‚   per batch     β”‚
β”‚ - Save JSON     β”‚     β”‚ - Viral thresholdβ”‚    β”‚ - N batches     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

File Structure

backend/app/services/
β”œβ”€β”€ simulation_config_generator.py  # Main config generation logic
β”œβ”€β”€ ontology_generator.py           # Ontology generation (shared)
└── zep_entity_reader.py            # Entity filtering

backend/app/models/
β”œβ”€β”€ task.py                         # Task tracking
└── project.py                      # Project state

Step-by-Step Generation Strategy

To avoid LLM token/context limits, generation is broken into stages:

class SimulationConfigGenerator:
    AGENTS_PER_BATCH = 15
    # ... context and length constants ...

    def generate_config(
        self,
        simulation_id: str,
        project_id: str,
        graph_id: str,
        simulation_requirement: str,
        document_text: str,
        entities: List[EntityNode],
        enable_twitter: bool = True,
        enable_reddit: bool = True,
        progress_callback: Optional[Callable[[int, int, str], None]] = None,
    ) -> SimulationParameters:

        num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
        total_steps = 3 + num_batches  # Time + Events + N Agent Batches + Platform

        def report_progress(step: int, message: str):
            if progress_callback:
                progress_callback(step, total_steps, message)

        context = self._build_context(
            simulation_requirement=simulation_requirement,
            document_text=document_text,
            entities=entities
        )

        # 1. Generate time config
        report_progress(1, "Generating time configuration...")
        time_config_result = self._generate_time_config(context, len(entities))
        time_config = self._parse_time_config(time_config_result, len(entities))

        # 2. Generate event config
        report_progress(2, "Generating event config and hot topics...")
        event_config_result = self._generate_event_config(context, simulation_requirement, entities)
        event_config = self._parse_event_config(event_config_result)

        # 3-N. Agent configs in batches
        all_agent_configs = []
        for batch_idx in range(num_batches):
            start_idx = batch_idx * self.AGENTS_PER_BATCH
            end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
            batch_entities = entities[start_idx:end_idx]

            report_progress(
                3 + batch_idx,
                f"Generating agent config ({start_idx + 1}-{end_idx}/{len(entities)})..."
            )

            batch_configs = self._generate_agent_configs_batch(
                context=context,
                entities=batch_entities,
                start_idx=start_idx,
                simulation_requirement=simulation_requirement
            )
            all_agent_configs.extend(batch_configs)

        # Assign initial post publishers
        event_config = self._assign_initial_post_agents(event_config, all_agent_configs)

        # Final: Platform config
        report_progress(total_steps, "Generating platform configuration...")
        twitter_config = PlatformConfig(platform="twitter", ...) if enable_twitter else None
        reddit_config = PlatformConfig(platform="reddit", ...) if enable_reddit else None

        # Assemble final config (reasoning_parts: per-stage reasoning strings collected above, elided here)
        params = SimulationParameters(
            simulation_id=simulation_id,
            project_id=project_id,
            graph_id=graph_id,
            simulation_requirement=simulation_requirement,
            time_config=time_config,
            agent_configs=all_agent_configs,
            event_config=event_config,
            twitter_config=twitter_config,
            reddit_config=reddit_config,
            generation_reasoning=" | ".join(reasoning_parts)
        )

        return params

Benefits of this approach:

  1. Each LLM call is focused and within context limits.
  2. Progress reporting is granular.
  3. If a stage fails, partial recovery is possible.
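
The batch arithmetic is worth sanity-checking. This standalone sketch (not MiroFish code, just the same slicing logic as `generate_config`) shows how 100 entities split into batches of 15 and how the progress step count is derived:

```python
import math

AGENTS_PER_BATCH = 15

def batch_ranges(num_entities: int) -> list:
    """Return (start, end) index pairs for each batch, mirroring the slicing in generate_config."""
    num_batches = math.ceil(num_entities / AGENTS_PER_BATCH)
    return [
        (b * AGENTS_PER_BATCH, min((b + 1) * AGENTS_PER_BATCH, num_entities))
        for b in range(num_batches)
    ]

ranges = batch_ranges(100)
total_steps = 3 + len(ranges)  # time + events + platform, plus one step per batch
```

With 100 entities this yields 7 batches (the last holding only 10 agents) and 10 total progress steps.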

Building Context

The context builder assembles only relevant details, within token limits:

def _build_context(
    self,
    simulation_requirement: str,
    document_text: str,
    entities: List[EntityNode]
) -> str:

    entity_summary = self._summarize_entities(entities)

    context_parts = [
        f"## Simulation Requirement\n{simulation_requirement}",
        f"\n## Entity Information ({len(entities)} entities)\n{entity_summary}",
    ]

    # Add document text if space allows
    current_length = sum(len(p) for p in context_parts)
    remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500

    if remaining_length > 0 and document_text:
        doc_text = document_text[:remaining_length]
        if len(document_text) > remaining_length:
            doc_text += "\n...(document truncated)"
        context_parts.append(f"\n## Original Document\n{doc_text}")

    return "\n".join(context_parts)

Entity Summarization Example

Entities are grouped and summarized by type:

def _summarize_entities(self, entities: List[EntityNode]) -> str:
    lines = []
    by_type: Dict[str, List[EntityNode]] = {}
    for e in entities:
        t = e.get_entity_type() or "Unknown"
        if t not in by_type:
            by_type[t] = []
        by_type[t].append(e)

    for entity_type, type_entities in by_type.items():
        lines.append(f"\n### {entity_type} ({len(type_entities)} entities)")
        display_count = self.ENTITIES_PER_TYPE_DISPLAY
        summary_len = self.ENTITY_SUMMARY_LENGTH

        for e in type_entities[:display_count]:
            summary_preview = (e.summary[:summary_len] + "...") if len(e.summary) > summary_len else e.summary
            lines.append(f"- {e.name}: {summary_preview}")

        if len(type_entities) > display_count:
            lines.append(f"  ... and {len(type_entities) - display_count} more")

    return "\n".join(lines)

Sample Output:

### Student (45 entities)
- Zhang Wei: Active in student union, frequently posts about campus events and academic pressure...
- Li Ming: Graduate student researching AI ethics, often shares technology news...
... and 43 more

### University (3 entities)
- Wuhan University: Official account, posts announcements and news...

Time Configuration Generation

The time config determines simulation length and agent activity patterns:

def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
    context_truncated = context[:self.TIME_CONFIG_CONTEXT_LENGTH]
    max_agents_allowed = max(1, int(num_entities * 0.9))

    prompt = f"""Based on the following simulation requirements, generate time configuration.

{context_truncated}

## Task
Generate time configuration JSON.

### Basic Principles (adjust based on event type and participant groups):
- User base is Chinese, so follow Beijing-timezone habits
- 00:00-05:00: Almost no activity (coefficient 0.05)
- 06:00-08:00: Gradually waking up (coefficient 0.4)
- 09:00-18:00: Work hours, moderate activity (coefficient 0.7)
- 19:00-22:00: Evening peak, most active (coefficient 1.5)
- 23:00: Activity declining (coefficient 0.5)

### Return JSON format (no markdown):

Example:
{{
    "total_simulation_hours": 72,
    "minutes_per_round": 60,
    "agents_per_hour_min": 5,
    "agents_per_hour_max": 50,
    "peak_hours": [19, 20, 21, 22],
    "off_peak_hours": [0, 1, 2, 3, 4, 5],
    "morning_hours": [6, 7, 8],
    "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
    "reasoning": "Time configuration explanation"
}}
...
"""
    system_prompt = "You are a social media simulation expert. Return pure JSON format."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"Time config LLM generation failed: {e}, using default")
        return self._get_default_time_config(num_entities)

Parsing and Validation

def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
    agents_per_hour_min = result.get("agents_per_hour_min", max(1, num_entities // 15))
    agents_per_hour_max = result.get("agents_per_hour_max", max(5, num_entities // 5))
    if agents_per_hour_min > num_entities:
        agents_per_hour_min = max(1, num_entities // 10)
    if agents_per_hour_max > num_entities:
        agents_per_hour_max = max(agents_per_hour_min + 1, num_entities // 2)
    if agents_per_hour_min >= agents_per_hour_max:
        agents_per_hour_min = max(1, agents_per_hour_max // 2)
    return TimeSimulationConfig(
        total_simulation_hours=result.get("total_simulation_hours", 72),
        minutes_per_round=result.get("minutes_per_round", 60),
        agents_per_hour_min=agents_per_hour_min,
        agents_per_hour_max=agents_per_hour_max,
        peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
        off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
        off_peak_activity_multiplier=0.05,
        morning_activity_multiplier=0.4,
        work_activity_multiplier=0.7,
        peak_activity_multiplier=1.5
    )
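
The clamping rules above can be isolated for testing. Here is a standalone sketch (a hypothetical helper, mirroring the checks inside `_parse_time_config`) that shows why per-hour agent counts can never exceed the entity total:

```python
def clamp_agents_per_hour(min_val: int, max_val: int, num_entities: int) -> tuple:
    """Apply the same corrections as _parse_time_config: per-hour counts
    must not exceed the total entity count, and min must stay below max."""
    if min_val > num_entities:
        min_val = max(1, num_entities // 10)
    if max_val > num_entities:
        max_val = max(min_val + 1, num_entities // 2)
    if min_val >= max_val:
        min_val = max(1, max_val // 2)
    return min_val, max_val

# An LLM asking for 50-100 active agents per hour in a 10-entity
# simulation gets corrected to a feasible range.
corrected = clamp_agents_per_hour(50, 100, 10)
```

Note the order matters: the max is corrected relative to the already-corrected min, and the final check repairs any remaining inversion.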

Default Time Config (Chinese Timezone)

def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
    return {
        "total_simulation_hours": 72,
        "minutes_per_round": 60,
        "agents_per_hour_min": max(1, num_entities // 15),
        "agents_per_hour_max": max(5, num_entities // 5),
        "peak_hours": [19, 20, 21, 22],
        "off_peak_hours": [0, 1, 2, 3, 4, 5],
        "morning_hours": [6, 7, 8],
        "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
        "reasoning": "Using default Chinese timezone configuration"
    }

Event Configuration Generation

Define initial posts, hot topics, and narrative direction:

def _generate_event_config(
    self,
    context: str,
    simulation_requirement: str,
    entities: List[EntityNode]
) -> Dict[str, Any]:

    entity_types_available = list(set(
        e.get_entity_type() or "Unknown" for e in entities
    ))

    type_examples = {}
    for e in entities:
        etype = e.get_entity_type() or "Unknown"
        if etype not in type_examples:
            type_examples[etype] = []
        if len(type_examples[etype]) < 3:
            type_examples[etype].append(e.name)

    type_info = "\n".join([
        f"- {t}: {', '.join(examples)}"
        for t, examples in type_examples.items()
    ])

    context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]

    prompt = f"""Based on the following simulation requirements, generate event configuration.

Simulation Requirement: {simulation_requirement}

{context_truncated}

## Available Entity Types and Examples
{type_info}

## Task
Generate event configuration JSON:
- Extract hot topic keywords
- Describe narrative direction
- Design initial posts, **each post must specify poster_type**

**Important**: poster_type must be selected from "Available Entity Types" above.

Return JSON format (no markdown):
{{
    "hot_topics": ["keyword1", "keyword2", ...],
    "narrative_direction": "<narrative direction description>",
    "initial_posts": [
        {{"content": "Post content", "poster_type": "Entity Type (must match available types)"}},
        ...
    ],
    "reasoning": "<brief explanation>"
}}"""

    system_prompt = "You are an opinion analysis expert. Return pure JSON format."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"Event config LLM generation failed: {e}, using default")
        return {
            "hot_topics": [],
            "narrative_direction": "",
            "initial_posts": [],
            "reasoning": "Using default configuration"
        }

Assigning Initial Post Publishers

After generating initial posts, match them to agents:

def _assign_initial_post_agents(
    self,
    event_config: EventConfig,
    agent_configs: List[AgentActivityConfig]
) -> EventConfig:

    if not event_config.initial_posts:
        return event_config

    # Index agents by type
    agents_by_type: Dict[str, List[AgentActivityConfig]] = {}
    for agent in agent_configs:
        etype = agent.entity_type.lower()
        if etype not in agents_by_type:
            agents_by_type[etype] = []
        agents_by_type[etype].append(agent)

    # Type alias mapping (handles LLM variations)
    type_aliases = {
        "official": ["official", "university", "governmentagency", "government"],
        "university": ["university", "official"],
        "mediaoutlet": ["mediaoutlet", "media"],
        "student": ["student", "person"],
        "professor": ["professor", "expert", "teacher"],
        "alumni": ["alumni", "person"],
        "organization": ["organization", "ngo", "company", "group"],
        "person": ["person", "student", "alumni"],
    }

    used_indices: Dict[str, int] = {}

    updated_posts = []
    for post in event_config.initial_posts:
        poster_type = post.get("poster_type", "").lower()
        content = post.get("content", "")

        matched_agent_id = None

        # 1. Direct match
        if poster_type in agents_by_type:
            agents = agents_by_type[poster_type]
            idx = used_indices.get(poster_type, 0) % len(agents)
            matched_agent_id = agents[idx].agent_id
            used_indices[poster_type] = idx + 1
        else:
            # 2. Alias match
            for alias_key, aliases in type_aliases.items():
                if poster_type in aliases or alias_key == poster_type:
                    for alias in aliases:
                        if alias in agents_by_type:
                            agents = agents_by_type[alias]
                            idx = used_indices.get(alias, 0) % len(agents)
                            matched_agent_id = agents[idx].agent_id
                            used_indices[alias] = idx + 1
                            break
                    if matched_agent_id is not None:
                        break

        # 3. Fallback: highest influence agent
        if matched_agent_id is None:
            if agent_configs:
                sorted_agents = sorted(agent_configs, key=lambda a: a.influence_weight, reverse=True)
                matched_agent_id = sorted_agents[0].agent_id
            else:
                matched_agent_id = 0

        updated_posts.append({
            "content": content,
            "poster_type": post.get("poster_type", "Unknown"),
            "poster_agent_id": matched_agent_id
        })

    event_config.initial_posts = updated_posts
    return event_config
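
The `used_indices` bookkeeping is what distributes several posts of the same `poster_type` across different agents instead of piling them onto one. A simplified standalone sketch of that round-robin pattern (plain dicts and ints in place of `AgentActivityConfig`):

```python
from typing import Dict, List

def assign_round_robin(poster_types: List[str],
                       agents_by_type: Dict[str, List[int]]) -> List[int]:
    """Cycle through the agents of each type so repeated poster_types
    don't all map to the same agent."""
    used: Dict[str, int] = {}
    assigned = []
    for ptype in poster_types:
        agents = agents_by_type[ptype]
        idx = used.get(ptype, 0) % len(agents)  # wrap around when exhausted
        assigned.append(agents[idx])
        used[ptype] = idx + 1
    return assigned

ids = assign_round_robin(
    ["student", "student", "student", "official"],
    {"student": [3, 7], "official": [12]},
)
```

Three "student" posts land on agents 3, 7, and then 3 again: the modulo wraps the index once every student agent has been used.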

Batch Agent Configuration Generation

Generating configs for hundreds of agents in a single call would exceed LLM context limits, so agents are processed in batches of 15:

def _generate_agent_configs_batch(
    self,
    context: str,
    entities: List[EntityNode],
    start_idx: int,
    simulation_requirement: str
) -> List[AgentActivityConfig]:

    entity_list = []
    summary_len = self.AGENT_SUMMARY_LENGTH
    for i, e in enumerate(entities):
        entity_list.append({
            "agent_id": start_idx + i,
            "entity_name": e.name,
            "entity_type": e.get_entity_type() or "Unknown",
            "summary": e.summary[:summary_len] if e.summary else ""
        })

    prompt = f"""Based on the following information, generate social media activity configuration for each entity.

Simulation Requirement: {simulation_requirement}

## Entity List
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
"""

    # Generation task instructions...

    system_prompt = "You are a social media behavior analysis expert. Return pure JSON format."

    try:
        result = self._call_llm_with_retry(prompt, system_prompt)
        llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
    except Exception as e:
        logger.warning(f"Agent config batch LLM generation failed: {e}, using rule-based generation")
        llm_configs = {}

    # Build AgentActivityConfig objects
    configs = []
    for i, entity in enumerate(entities):
        agent_id = start_idx + i
        cfg = llm_configs.get(agent_id, {})

        if not cfg:
            cfg = self._generate_agent_config_by_rule(entity)

        config = AgentActivityConfig(
            agent_id=agent_id,
            entity_uuid=entity.uuid,
            entity_name=entity.name,
            entity_type=entity.get_entity_type() or "Unknown",
            activity_level=cfg.get("activity_level", 0.5),
            posts_per_hour=cfg.get("posts_per_hour", 0.5),
            comments_per_hour=cfg.get("comments_per_hour", 1.0),
            active_hours=cfg.get("active_hours", list(range(9, 23))),
            response_delay_min=cfg.get("response_delay_min", 5),
            response_delay_max=cfg.get("response_delay_max", 60),
            sentiment_bias=cfg.get("sentiment_bias", 0.0),
            stance=cfg.get("stance", "neutral"),
            influence_weight=cfg.get("influence_weight", 1.0)
        )
        configs.append(config)

    return configs

Rule-Based Fallback Configs

When LLM fails, use predefined activity patterns:

def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
    entity_type = (entity.get_entity_type() or "Unknown").lower()

    if entity_type in ["university", "governmentagency", "ngo"]:
        return {
            "activity_level": 0.2,
            "posts_per_hour": 0.1,
            "comments_per_hour": 0.05,
            "active_hours": list(range(9, 18)),
            "response_delay_min": 60,
            "response_delay_max": 240,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 3.0
        }

    elif entity_type in ["mediaoutlet"]:
        return {
            "activity_level": 0.5,
            "posts_per_hour": 0.8,
            "comments_per_hour": 0.3,
            "active_hours": list(range(7, 24)),
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "observer",
            "influence_weight": 2.5
        }

    elif entity_type in ["professor", "expert", "official"]:
        return {
            "activity_level": 0.4,
            "posts_per_hour": 0.3,
            "comments_per_hour": 0.5,
            "active_hours": list(range(8, 22)),
            "response_delay_min": 15,
            "response_delay_max": 90,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 2.0
        }

    elif entity_type in ["student"]:
        return {
            "activity_level": 0.8,
            "posts_per_hour": 0.6,
            "comments_per_hour": 1.5,
            "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 1,
            "response_delay_max": 15,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 0.8
        }

    elif entity_type in ["alumni"]:
        return {
            "activity_level": 0.6,
            "posts_per_hour": 0.4,
            "comments_per_hour": 0.8,
            "active_hours": [12, 13, 19, 20, 21, 22, 23],
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }

    else:
        return {
            "activity_level": 0.7,
            "posts_per_hour": 0.5,
            "comments_per_hour": 1.2,
            "active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 2,
            "response_delay_max": 20,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }
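
The if/elif chain above maps cleanly onto a dict lookup, which turns adding a new agent type into a data change rather than a code change. A sketch of that alternative (this is a restructuring suggestion, not MiroFish's code, with the pattern values abbreviated to two fields):

```python
# Abbreviated patterns keyed by lowercased entity type;
# the real configs carry the full set of fields shown above.
RULE_PATTERNS = {
    "university":  {"activity_level": 0.2, "influence_weight": 3.0},
    "mediaoutlet": {"activity_level": 0.5, "influence_weight": 2.5},
    "student":     {"activity_level": 0.8, "influence_weight": 0.8},
}
DEFAULT_PATTERN = {"activity_level": 0.7, "influence_weight": 1.0}

def pattern_for(entity_type: str) -> dict:
    """Look up the activity pattern for an entity type, falling back to the default."""
    return RULE_PATTERNS.get(entity_type.lower(), DEFAULT_PATTERN)
```

Types sharing a pattern (e.g. University and GovernmentAgency) can point at the same dict, avoiding the duplicated literals of the if/elif form.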

LLM Call with Retry and JSON Repair

LLM outputs can be truncated or invalid. Always retry and attempt to repair JSON:

def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
    import time

    max_attempts = 3
    last_error = None

    for attempt in range(max_attempts):
        try:
            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                response_format={"type": "json_object"},
                temperature=0.7 - (attempt * 0.1)  # lower temperature on each retry
            )

            content = response.choices[0].message.content
            finish_reason = response.choices[0].finish_reason

            # A 'length' finish reason means the output hit the token limit mid-JSON
            if finish_reason == 'length':
                content = self._fix_truncated_json(content)

            try:
                return json.loads(content)
            except json.JSONDecodeError as e:
                fixed = self._try_fix_config_json(content)
                if fixed:
                    return fixed
                last_error = e

        except Exception as e:
            last_error = e
            time.sleep(2 * (attempt + 1))  # linear backoff between retries

    raise last_error or Exception("LLM call failed")

Fixing Truncated JSON

def _fix_truncated_json(self, content: str) -> str:
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    # If the output stops mid-string, close the string first
    if content and content[-1] not in '",}]':
        content += '"'
    # Then close any unbalanced brackets and braces, innermost first
    content += ']' * open_brackets
    content += '}' * open_braces
    return content
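
To see the repair in action, here is the same logic as a standalone function applied to a response cut off mid-array (a sketch for illustration; the sample strings are invented):

```python
import json

def fix_truncated_json(content: str) -> str:
    """Close an unterminated string, then balance brackets and braces."""
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'          # assume we were cut off inside a string
    content += ']' * open_brackets
    content += '}' * open_braces
    return content

truncated = '{"hot_topics": ["exam reform", "campus life'
repaired = json.loads(fix_truncated_json(truncated))
```

This is a heuristic: it fails if truncation lands inside a key or an escape sequence, which is why the second repair pass and the rule-based fallback exist downstream.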

Advanced JSON Repair

def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
    import re
    content = self._fix_truncated_json(content)
    # Extract the outermost {...} block, dropping any surrounding prose
    json_match = re.search(r'\{[\s\S]*\}', content)
    if json_match:
        json_str = json_match.group()

        def fix_string(match):
            # Collapse raw newlines and whitespace runs inside quoted strings
            s = match.group(0)
            s = s.replace('\n', ' ').replace('\r', ' ')
            return re.sub(r'\s+', ' ', s)

        json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            # Last resort: strip all control characters, then retry
            json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
            json_str = re.sub(r'\s+', ' ', json_str)
            try:
                return json.loads(json_str)
            except json.JSONDecodeError:
                pass
    return None
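
A common failure this second pass handles is a literal newline inside a JSON string, which `json.loads` rejects as an invalid control character. A standalone sketch of just the string-cleaning step (the sample payload is invented):

```python
import json
import re

def clean_strings(json_str: str) -> str:
    """Collapse raw newlines and whitespace runs inside quoted JSON strings."""
    def fix_string(match):
        s = match.group(0)
        s = s.replace('\n', ' ').replace('\r', ' ')
        return re.sub(r'\s+', ' ', s)
    # Match quoted strings, honouring escaped characters like \" and \\
    return re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)

raw = '{"reasoning": "Peak hours chosen\nfor evening activity"}'
parsed = json.loads(clean_strings(raw))
```

The regex operates only inside quoted spans, so structural whitespace between keys and values is left untouched.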

Configuration Data Structures

Agent Activity Config

@dataclass
class AgentActivityConfig:
    """Single agent activity configuration"""
    agent_id: int
    entity_uuid: str
    entity_name: str
    entity_type: str

    activity_level: float = 0.5
    posts_per_hour: float = 1.0
    comments_per_hour: float = 2.0
    active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))
    response_delay_min: int = 5
    response_delay_max: int = 60
    sentiment_bias: float = 0.0
    stance: str = "neutral"
    influence_weight: float = 1.0

Time Simulation Config

@dataclass
class TimeSimulationConfig:
    """Time simulation configuration (Chinese timezone)"""
    total_simulation_hours: int = 72
    minutes_per_round: int = 60
    agents_per_hour_min: int = 5
    agents_per_hour_max: int = 20
    peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
    peak_activity_multiplier: float = 1.5
    off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
    off_peak_activity_multiplier: float = 0.05
    morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
    morning_activity_multiplier: float = 0.4
    work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
    work_activity_multiplier: float = 0.7

Complete Simulation Parameters

@dataclass
class SimulationParameters:
    """Complete simulation parameter configuration"""
    simulation_id: str
    project_id: str
    graph_id: str
    simulation_requirement: str
    time_config: TimeSimulationConfig = field(default_factory=TimeSimulationConfig)
    agent_configs: List[AgentActivityConfig] = field(default_factory=list)
    event_config: EventConfig = field(default_factory=EventConfig)
    twitter_config: Optional[PlatformConfig] = None
    reddit_config: Optional[PlatformConfig] = None
    llm_model: str = ""
    llm_base_url: str = ""
    generated_at: str = field(default_factory=lambda: datetime.now().isoformat())
    generation_reasoning: str = ""
    def to_dict(self) -> Dict[str, Any]:
        time_dict = asdict(self.time_config)
        return {
            "simulation_id": self.simulation_id,
            "project_id": self.project_id,
            "graph_id": self.graph_id,
            "simulation_requirement": self.simulation_requirement,
            "time_config": time_dict,
            "agent_configs": [asdict(a) for a in self.agent_configs],
            "event_config": asdict(self.event_config),
            "twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
            "reddit_config": asdict(self.reddit_config) if self.reddit_config else None,
            "llm_model": self.llm_model,
            "llm_base_url": self.llm_base_url,
            "generated_at": self.generated_at,
            "generation_reasoning": self.generation_reasoning,
        }

Summary Table: Agent Type Patterns

| Agent Type | Activity | Active Hours | Posts/Hour | Comments/Hour | Response (min) | Influence |
|---|---|---|---|---|---|---|
| University | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| GovernmentAgency | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| MediaOutlet | 0.5 | 7-23 | 0.8 | 0.3 | 5-30 | 2.5 |
| Professor | 0.4 | 8-21 | 0.3 | 0.5 | 15-90 | 2.0 |
| Student | 0.8 | 8-13, 18-23 | 0.6 | 1.5 | 1-15 | 0.8 |
| Alumni | 0.6 | 12-13, 19-23 | 0.4 | 0.8 | 5-30 | 1.0 |
| Person (default) | 0.7 | 9-13, 18-23 | 0.5 | 1.2 | 2-20 | 1.0 |

Conclusion

LLM-driven configuration generation for agent-based simulations is robust if you:

  1. Break the workflow into stages (time β†’ events β†’ agents β†’ platforms)
  2. Batch process agent configs (e.g., 15 agents per batch)
  3. Repair or retry malformed/truncated JSON from LLMs
  4. Use rule-based defaults when LLMs fail
  5. Encode type-specific agent activity patterns
  6. Validate and correct all generated values (e.g., ensure per-hour agent counts never exceed total)

By applying these engineering patterns, you can automate the configuration of large agent-based simulations reliably and at scale.
