Sebastian Petrus

Posted on Mar 19 • Originally published at apidog.com

Generating 100+ Agent Configurations with Batched LLM Calls

#ai #agents #llm #automation

Introduction

Configuring hundreds of AI agents for a social media simulation is complex work, especially when every agent needs its own activity schedule, posting frequency, response delay, influence weight, and stance. Doing it by hand is slow and error-prone.


MiroFish automates the whole process by using an LLM to generate a detailed configuration for each agent from document analysis, a knowledge graph, and the simulation requirements.

Practical challenges:

The LLM may return incomplete results, malformed JSON, or output cut off at the token limit.
The pipeline has to be automated and also able to detect errors, repair them, and fall back.

This article walks step by step through building the configuration generator: the batched pipeline, the JSON repair logic, the fallback strategy, and automated validation with Apidog.

💡 The configuration pipeline handles 100+ agents through a series of API calls. Apidog is used to validate the request/response schemas at each stage, catch malformed JSON before it reaches production, and generate test cases for hard scenarios such as truncated LLM output.

All of the code below comes from real-world use in MiroFish.

Architecture Overview

The system is split into pipeline stages to keep it manageable:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Context       │ ──► │   Time          │ ──► │   Event         │
│   builder       │     │   config        │     │   config        │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Final config  │ ◄── │   Platform      │ ◄── │   Agent config  │
│   assembly      │     │   config        │     │   batches       │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Typical file structure:

backend/app/services/
├── simulation_config_generator.py  # Main configuration generation logic
├── ontology_generator.py           # Ontology generation (shared)
└── zep_entity_reader.py            # Entity filtering

backend/app/models/
├── task.py                         # Task tracking
└── project.py                      # Project state

Step-by-Step Generation Strategy

To stay within token limits, each stage is processed independently and produces one part of the result:

class SimulationConfigGenerator:
    AGENTS_PER_BATCH = 15
    MAX_CONTEXT_LENGTH = 50000
    TIME_CONFIG_CONTEXT_LENGTH = 10000
    EVENT_CONFIG_CONTEXT_LENGTH = 8000
    ENTITY_SUMMARY_LENGTH = 300
    AGENT_SUMMARY_LENGTH = 300
    ENTITIES_PER_TYPE_DISPLAY = 20

    def generate_config(
        self,
        simulation_id: str,
        project_id: str,
        graph_id: str,
        simulation_requirement: str,
        document_text: str,
        entities: List[EntityNode],
        enable_twitter: bool = True,
        enable_reddit: bool = True,
        progress_callback: Optional[Callable[[int, int, str], None]] = None,
    ) -> SimulationParameters:

        num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
        total_steps = 3 + num_batches  # time + event + N agent batches + platform
        current_step = 0

        def report_progress(step: int, message: str):
            nonlocal current_step
            current_step = step
            if progress_callback:
                progress_callback(step, total_steps, message)
            logger.info(f"[{step}/{total_steps}] {message}")

        # Build the context
        context = self._build_context(
            simulation_requirement=simulation_requirement,
            document_text=document_text,
            entities=entities
        )

        reasoning_parts = []

        # Step 1: time configuration
        report_progress(1, "Generating time configuration...")
        time_config_result = self._generate_time_config(context, len(entities))
        time_config = self._parse_time_config(time_config_result, len(entities))
        reasoning_parts.append(f"Time config: {time_config_result.get('reasoning', 'Success')}")

        # Step 2: event configuration
        report_progress(2, "Generating event configuration and hot topics...")
        event_config_result = self._generate_event_config(context, simulation_requirement, entities)
        event_config = self._parse_event_config(event_config_result)
        reasoning_parts.append(f"Event config: {event_config_result.get('reasoning', 'Success')}")

        # Steps 3 to N+2: agent configurations, one batch per step
        all_agent_configs = []
        for batch_idx in range(num_batches):
            start_idx = batch_idx * self.AGENTS_PER_BATCH
            end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
            batch_entities = entities[start_idx:end_idx]

            report_progress(
                3 + batch_idx,
                f"Generating agent configurations ({start_idx + 1}-{end_idx}/{len(entities)})..."
            )

            batch_configs = self._generate_agent_configs_batch(
                context=context,
                entities=batch_entities,
                start_idx=start_idx,
                simulation_requirement=simulation_requirement
            )
            all_agent_configs.extend(batch_configs)

        reasoning_parts.append(f"Agent config: generated {len(all_agent_configs)} agents")

        # Assign agents to the initial posts
        event_config = self._assign_initial_post_agents(event_config, all_agent_configs)

        # Final step: platform configuration
        report_progress(total_steps, "Generating platform configuration...")
        twitter_config = PlatformConfig(platform="twitter", ...) if enable_twitter else None
        reddit_config = PlatformConfig(platform="reddit", ...) if enable_reddit else None

        # Assemble the final configuration
        params = SimulationParameters(
            simulation_id=simulation_id,
            project_id=project_id,
            graph_id=graph_id,
            simulation_requirement=simulation_requirement,
            time_config=time_config,
            agent_configs=all_agent_configs,
            event_config=event_config,
            twitter_config=twitter_config,
            reddit_config=reddit_config,
            generation_reasoning=" | ".join(reasoning_parts)
        )

        return params

Benefits:

Each step is easy to inspect and debug.
A failed stage can be recovered or rerun on its own.
The user gets clear progress reporting.
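To make the step accounting concrete, here is a small standalone sketch of the batching arithmetic. The helper name `plan_batches` is hypothetical; only the batch size of 15 and the `3 + num_batches` step count come from the class above.

```python
import math
from typing import List, Tuple

AGENTS_PER_BATCH = 15  # same batch size as SimulationConfigGenerator


def plan_batches(
    num_entities: int, batch_size: int = AGENTS_PER_BATCH
) -> Tuple[List[Tuple[int, int]], int]:
    """Return the (start, end) slice bounds of each batch and the total step count."""
    num_batches = math.ceil(num_entities / batch_size)
    bounds = [
        (i * batch_size, min((i + 1) * batch_size, num_entities))
        for i in range(num_batches)
    ]
    # time config + event config + one step per agent batch + platform config
    total_steps = 3 + num_batches
    return bounds, total_steps


bounds, total_steps = plan_batches(100)
print(len(bounds), total_steps)  # 7 10
print(bounds[-1])                # (90, 100)
```

With 100 entities the last batch holds only 10 agents, and the progress callback reports 10 steps in total.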

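Building the Context

The context sent to the LLM must carry enough information without exceeding the token budget. The limits in the code are character counts, which serve as a rough proxy for tokens: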

def _build_context(
    self,
    simulation_requirement: str,
    document_text: str,
    entities: List[EntityNode]
) -> str:

    entity_summary = self._summarize_entities(entities)

    context_parts = [
        f"## Simulation requirement\n{simulation_requirement}",
        f"\n## Entity information ({len(entities)} entities)\n{entity_summary}",
    ]

    current_length = sum(len(p) for p in context_parts)
    remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500

    if remaining_length > 0 and document_text:
        doc_text = document_text[:remaining_length]
        if len(document_text) > remaining_length:
            doc_text += "\n...(document truncated)"
        context_parts.append(f"\n## Source document\n{doc_text}")

    return "\n".join(context_parts)

Summarizing Entities

def _summarize_entities(self, entities: List[EntityNode]) -> str:
    lines = []
    by_type: Dict[str, List[EntityNode]] = {}
    for e in entities:
        t = e.get_entity_type() or "Unknown"
        if t not in by_type:
            by_type[t] = []
        by_type[t].append(e)

    for entity_type, type_entities in by_type.items():
        lines.append(f"\n### {entity_type} ({len(type_entities)} entities)")
        display_count = self.ENTITIES_PER_TYPE_DISPLAY
        summary_len = self.ENTITY_SUMMARY_LENGTH

        for e in type_entities[:display_count]:
            summary_preview = (e.summary[:summary_len] + "...") if len(e.summary) > summary_len else e.summary
            lines.append(f"- {e.name}: {summary_preview}")

        if len(type_entities) > display_count:
            lines.append(f"  ... and {len(type_entities) - display_count} more entities")

    return "\n".join(lines)

Example output:

### Student (45 entities)
- Zhang Wei: Active in the student union, often posts about campus events and academic pressure...
- Li Ming: Graduate student researching AI ethics, often shares technology news...
... and 43 more entities

### University (3 entities)
- Wuhan University: Official account, posts announcements and news...

Generating the Time Configuration

def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
    context_truncated = context[:self.TIME_CONFIG_CONTEXT_LENGTH]
    max_agents_allowed = max(1, int(num_entities * 0.9))

    prompt = f"""Based on the following simulation requirements, generate a time configuration.

{context_truncated}

## Task
Generate the time configuration as JSON.

### Baseline guidelines (adjust for the event type and participant group):
- The user base is Chinese and follows Beijing-time daily routines
- Hours 0-5: almost no activity (multiplier 0.05)
- Hours 6-8: gradually waking up (multiplier 0.4)
- Hours 9-18: working hours, moderate activity (multiplier 0.7)
- Hours 19-22: evening peak, most active (multiplier 1.5)
- Hour 23: activity tapers off (multiplier 0.5)

### Return JSON (no markdown):

Example:
{{
    "total_simulation_hours": 72,
    "minutes_per_round": 60,
    "agents_per_hour_min": 5,
    "agents_per_hour_max": 50,
    "peak_hours": [19, 20, 21, 22],
    "off_peak_hours": [0, 1, 2, 3, 4, 5],
    "morning_hours": [6, 7, 8],
    "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
    "reasoning": "Explanation of the time configuration"
}}

Field descriptions:
- total_simulation_hours (int): 24-168 hours; shorter for breaking news, longer for ongoing topics
- minutes_per_round (int): 30-120 minutes, 60 recommended
- agents_per_hour_min (int): range 1-{max_agents_allowed}
- agents_per_hour_max (int): range 1-{max_agents_allowed}
- peak_hours (int array): adjust for the participant group
- off_peak_hours (int array): usually late night and early morning
- morning_hours (int array): morning hours
- work_hours (int array): working hours
- reasoning (string): brief explanation"""

    system_prompt = "You are a social media simulation expert. Return pure JSON."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"LLM time config generation failed: {e}; using defaults")
        return self._get_default_time_config(num_entities)

Parsing and Validating the Time Configuration

def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
    agents_per_hour_min = result.get("agents_per_hour_min", max(1, num_entities // 15))
    agents_per_hour_max = result.get("agents_per_hour_max", max(5, num_entities // 5))

    if agents_per_hour_min > num_entities:
        logger.warning(f"agents_per_hour_min ({agents_per_hour_min}) exceeds the total number of agents ({num_entities}); corrected")
        agents_per_hour_min = max(1, num_entities // 10)

    if agents_per_hour_max > num_entities:
        logger.warning(f"agents_per_hour_max ({agents_per_hour_max}) exceeds the total number of agents ({num_entities}); corrected")
        agents_per_hour_max = max(agents_per_hour_min + 1, num_entities // 2)

    if agents_per_hour_min >= agents_per_hour_max:
        agents_per_hour_min = max(1, agents_per_hour_max // 2)
        logger.warning(f"agents_per_hour_min >= max; corrected to {agents_per_hour_min}")

    return TimeSimulationConfig(
        total_simulation_hours=result.get("total_simulation_hours", 72),
        minutes_per_round=result.get("minutes_per_round", 60),
        agents_per_hour_min=agents_per_hour_min,
        agents_per_hour_max=agents_per_hour_max,
        peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
        off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
        off_peak_activity_multiplier=0.05,
        morning_activity_multiplier=0.4,
        work_activity_multiplier=0.7,
        peak_activity_multiplier=1.5
    )
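The three corrections are easier to follow with numbers. The sketch below copies only the clamping logic into a standalone function (the name `clamp_agents_per_hour` is hypothetical) and runs it on a few inputs an LLM might plausibly return for a 20-agent simulation.

```python
from typing import Tuple


def clamp_agents_per_hour(min_val: int, max_val: int, num_entities: int) -> Tuple[int, int]:
    """Apply the same three corrections as _parse_time_config, in the same order."""
    if min_val > num_entities:
        min_val = max(1, num_entities // 10)
    if max_val > num_entities:
        max_val = max(min_val + 1, num_entities // 2)
    if min_val >= max_val:
        min_val = max(1, max_val // 2)
    return min_val, max_val


print(clamp_agents_per_hour(30, 50, 20))  # (2, 10): both values exceeded 20 agents
print(clamp_agents_per_hour(8, 8, 20))    # (4, 8): min was not below max
print(clamp_agents_per_hour(5, 15, 20))   # (5, 15): already valid, unchanged
```

One edge case is worth knowing: with a single agent and an LLM answer of 1 for both values, the result stays `(1, 1)`, so the minimum is not strictly below the maximum.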

Default Time Configuration (China Time Zone)

def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
    return {
        "total_simulation_hours": 72,
        "minutes_per_round": 60,
        "agents_per_hour_min": max(1, num_entities // 15),
        "agents_per_hour_max": max(5, num_entities // 5),
        "peak_hours": [19, 20, 21, 22],
        "off_peak_hours": [0, 1, 2, 3, 4, 5],
        "morning_hours": [6, 7, 8],
        "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
        "reasoning": "Using the default China time zone configuration"
    }

Generating the Event Configuration

def _generate_event_config(
    self,
    context: str,
    simulation_requirement: str,
    entities: List[EntityNode]
) -> Dict[str, Any]:

    entity_types_available = list(set(
        e.get_entity_type() or "Unknown" for e in entities
    ))

    type_examples = {}
    for e in entities:
        etype = e.get_entity_type() or "Unknown"
        if etype not in type_examples:
            type_examples[etype] = []
        if len(type_examples[etype]) < 3:
            type_examples[etype].append(e.name)

    type_info = "\n".join([
        f"- {t}: {', '.join(examples)}"
        for t, examples in type_examples.items()
    ])

    context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]

    prompt = f"""Based on the following simulation requirements, generate an event configuration.

Simulation requirement: {simulation_requirement}

{context_truncated}

## Available entity types and examples
{type_info}

## Task
Generate the event configuration as JSON:
- Extract hot-topic keywords
- Describe the narrative direction
- Design the initial posts; **every post must specify a poster_type**

**Important**: poster_type must be chosen from "Available entity types" above so that each initial post can be assigned to a suitable agent.

For example: official statements should be posted by Official/University types, news by MediaOutlet, and student opinions by Student.

Return JSON (no markdown):
{{
    "hot_topics": ["keyword1", "keyword2", ...],
    "narrative_direction": "<description of the narrative direction>",
    "initial_posts": [
        {{"content": "Post content", "poster_type": "Entity type (must match an available type)"}},
        ...
    ],
    "reasoning": "<brief explanation>"
}}"""

    system_prompt = "You are a public opinion analysis expert. Return pure JSON."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"LLM event config generation failed: {e}; using defaults")
        return {
            "hot_topics": [],
            "narrative_direction": "",
            "initial_posts": [],
            "reasoning": "Using the default configuration"
        }

Assigning Agents to the Initial Posts

def _assign_initial_post_agents(
    self,
    event_config: EventConfig,
    agent_configs: List[AgentActivityConfig]
) -> EventConfig:

    if not event_config.initial_posts:
        return event_config

    agents_by_type: Dict[str, List[AgentActivityConfig]] = {}
    for agent in agent_configs:
        etype = agent.entity_type.lower()
        if etype not in agents_by_type:
            agents_by_type[etype] = []
        agents_by_type[etype].append(agent)

    type_aliases = {
        "official": ["official", "university", "governmentagency", "government"],
        "university": ["university", "official"],
        "mediaoutlet": ["mediaoutlet", "media"],
        "student": ["student", "person"],
        "professor": ["professor", "expert", "teacher"],
        "alumni": ["alumni", "person"],
        "organization": ["organization", "ngo", "company", "group"],
        "person": ["person", "student", "alumni"],
    }

    used_indices: Dict[str, int] = {}

    updated_posts = []
    for post in event_config.initial_posts:
        poster_type = post.get("poster_type", "").lower()
        content = post.get("content", "")
        matched_agent_id = None

        if poster_type in agents_by_type:
            agents = agents_by_type[poster_type]
            idx = used_indices.get(poster_type, 0) % len(agents)
            matched_agent_id = agents[idx].agent_id
            used_indices[poster_type] = idx + 1
        else:
            for alias_key, aliases in type_aliases.items():
                if poster_type in aliases or alias_key == poster_type:
                    for alias in aliases:
                        if alias in agents_by_type:
                            agents = agents_by_type[alias]
                            idx = used_indices.get(alias, 0) % len(agents)
                            matched_agent_id = agents[idx].agent_id
                            used_indices[alias] = idx + 1
                            break
                    if matched_agent_id is not None:
                        break

        if matched_agent_id is None:
            logger.warning(f"No matching agent found for type '{poster_type}'; using the agent with the highest influence")
            if agent_configs:
                sorted_agents = sorted(agent_configs, key=lambda a: a.influence_weight, reverse=True)
                matched_agent_id = sorted_agents[0].agent_id
            else:
                matched_agent_id = 0

        updated_posts.append({
            "content": content,
            "poster_type": post.get("poster_type", "Unknown"),
            "poster_agent_id": matched_agent_id
        })

        logger.info(f"Assigned initial post: poster_type='{poster_type}' -> agent_id={matched_agent_id}")

    event_config.initial_posts = updated_posts
    return event_config
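The `used_indices` counter in the function above spreads posts of the same type across agents in round-robin order. The reduced sketch below isolates that behaviour; it leaves out alias matching and the influence-based fallback, and the function name and agent ids are made up for illustration.

```python
from typing import Dict, List


def assign_round_robin(post_types: List[str], agents_by_type: Dict[str, List[int]]) -> List[int]:
    """Cycle through the agent ids of each type, one post at a time."""
    used: Dict[str, int] = {}
    assigned: List[int] = []
    for poster_type in post_types:
        key = poster_type.lower()
        agents = agents_by_type[key]
        idx = used.get(key, 0) % len(agents)
        assigned.append(agents[idx])
        used[key] = idx + 1
    return assigned


agents = {"student": [3, 7, 9], "mediaoutlet": [12]}
posts = ["Student", "MediaOutlet", "Student", "Student", "Student"]
print(assign_round_robin(posts, agents))  # [3, 12, 7, 9, 3]
```

The fourth student post wraps back to agent 3, so no single agent receives every post of its type unless it is the only one.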

Generating Agent Configurations in Batches

Each batch handles 15 entities:

def _generate_agent_configs_batch(
    self,
    context: str,
    entities: List[EntityNode],
    start_idx: int,
    simulation_requirement: str
) -> List[AgentActivityConfig]:

    entity_list = []
    summary_len = self.AGENT_SUMMARY_LENGTH
    for i, e in enumerate(entities):
        entity_list.append({
            "agent_id": start_idx + i,
            "entity_name": e.name,
            "entity_type": e.get_entity_type() or "Unknown",
            "summary": e.summary[:summary_len] if e.summary else ""
        })

    prompt = f"""Based on the following information, generate a social media activity configuration for each entity.

Simulation requirement: {simulation_requirement}

## Entity list
{json.dumps(entity_list, ensure_ascii=False, indent=2)}

...

The task section of the prompt asks for activity hours that follow Chinese users' habits and sets a policy for each group:
- Official: low activity, office hours, slow responses, high influence.
- Media: moderate activity, all day, fast responses, high influence.
- Individual: high activity, mainly evenings, fast responses, low influence.
- Expert: moderate activity, medium-to-high influence.

System prompt:

system_prompt = "You are a social media behavior analysis expert. Return pure JSON."

If the LLM fails, a fallback configuration is generated by rule:

def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
    entity_type = (entity.get_entity_type() or "Unknown").lower()
    if entity_type in ["university", "governmentagency", "ngo"]:
        return {
            "activity_level": 0.2,
            "posts_per_hour": 0.1,
            "comments_per_hour": 0.05,
            "active_hours": list(range(9, 18)),
            "response_delay_min": 60,
            "response_delay_max": 240,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 3.0
        }
    elif entity_type in ["mediaoutlet"]:
        return {
            "activity_level": 0.5,
            "posts_per_hour": 0.8,
            "comments_per_hour": 0.3,
            "active_hours": list(range(7, 24)),
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "observer",
            "influence_weight": 2.5
        }
    elif entity_type in ["professor", "expert", "official"]:
        return {
            "activity_level": 0.4,
            "posts_per_hour": 0.3,
            "comments_per_hour": 0.5,
            "active_hours": list(range(8, 22)),
            "response_delay_min": 15,
            "response_delay_max": 90,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 2.0
        }
    elif entity_type in ["student"]:
        return {
            "activity_level": 0.8,
            "posts_per_hour": 0.6,
            "comments_per_hour": 1.5,
            "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 1,
            "response_delay_max": 15,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 0.8
        }
    elif entity_type in ["alumni"]:
        return {
            "activity_level": 0.6,
            "posts_per_hour": 0.4,
            "comments_per_hour": 0.8,
            "active_hours": [12, 13, 19, 20, 21, 22, 23],
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }
    else:
        return {
            "activity_level": 0.7,
            "posts_per_hour": 0.5,
            "comments_per_hour": 1.2,
            "active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 2,
            "response_delay_max": 20,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }
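The article does not show how a batch response is combined with the rule-based fallback, so the following is only a sketch under stated assumptions: the LLM is assumed to return a list of dicts that each carry an `agent_id`, and `merge_with_fallback` and `demo_rule` are hypothetical names. The idea is that any entity the LLM skipped or garbled still ends up with a configuration.

```python
from typing import Any, Callable, Dict, List


def merge_with_fallback(
    entity_list: List[Dict[str, Any]],
    llm_items: List[Dict[str, Any]],
    rule_config: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Use the LLM config when one exists for an agent_id, otherwise the rule config."""
    by_id = {item["agent_id"]: item for item in llm_items if "agent_id" in item}
    merged = []
    for entity in entity_list:
        cfg = by_id.get(entity["agent_id"]) or rule_config(entity["entity_type"])
        merged.append({**entity, **cfg})
    return merged


def demo_rule(entity_type: str) -> Dict[str, Any]:
    # Stand-in for _generate_agent_config_by_rule, reduced to two fields.
    if entity_type.lower() == "student":
        return {"activity_level": 0.8, "influence_weight": 0.8}
    return {"activity_level": 0.7, "influence_weight": 1.0}


entities = [
    {"agent_id": 0, "entity_name": "Zhang Wei", "entity_type": "Student"},
    {"agent_id": 1, "entity_name": "Li Ming", "entity_type": "Student"},
]
llm_items = [{"agent_id": 0, "activity_level": 0.9, "influence_weight": 1.2}]

merged = merge_with_fallback(entities, llm_items, demo_rule)
print(merged[0]["activity_level"], merged[1]["activity_level"])  # 0.9 0.8
```

Agent 0 keeps the LLM's values, while agent 1, missing from the response, falls back to the student rule.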

Calling the LLM with Retries and JSON Repair

LLM calls often produce invalid JSON, truncated output, or formatting errors. To make them more robust:

def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
    import re
    max_attempts = 3
    last_error = None

    for attempt in range(max_attempts):
        try:
            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                response_format={"type": "json_object"},
                temperature=0.7 - (attempt * 0.1)
            )

            content = response.choices[0].message.content
            finish_reason = response.choices[0].finish_reason

            if finish_reason == 'length':
                logger.warning(f"LLM output truncated (attempt {attempt+1})")
                content = self._fix_truncated_json(content)

            try:
                return json.loads(content)
            except json.JSONDecodeError as e:
                logger.warning(f"JSON parsing failed (attempt {attempt+1}): {str(e)[:80]}")
                fixed = self._try_fix_config_json(content)
                if fixed:
                    return fixed
                last_error = e

        except Exception as e:
            logger.warning(f"LLM call failed (attempt {attempt+1}): {str(e)[:80]}")
            last_error = e
            import time
            time.sleep(2 * (attempt + 1))

    raise last_error or Exception("LLM call failed")

Repairing Truncated JSON

def _fix_truncated_json(self, content: str) -> str:
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'
    content += ']' * open_brackets
    content += '}' * open_braces
    return content
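The heuristic above counts braces and brackets anywhere in the text, including inside string values, and appends every `]` before every `}`, so nested closers can come out in the wrong order. As an alternative sketch (not MiroFish's code, and `close_truncated_json` is a hypothetical name), a single pass with a stack can track strings and close containers innermost first.

```python
import json
from typing import List


def close_truncated_json(text: str) -> str:
    """Terminate an open string, then close open containers innermost first."""
    stack: List[str] = []  # closers for the containers that are still open
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()

    if in_string:
        # Drop a dangling backslash, then terminate the string.
        repaired = (text[:-1] if escaped else text) + '"'
    else:
        # A trailing comma before a closer is invalid JSON.
        repaired = text.rstrip().rstrip(",")
    return repaired + "".join(reversed(stack))


cut = '{"agents": [{"id": 1, "stance": "neu'
print(close_truncated_json(cut))  # {"agents": [{"id": 1, "stance": "neu"}]}
print(json.loads(close_truncated_json('{"a": [1, 2,')))  # {'a': [1, 2]}
```

This still yields invalid JSON when the cut falls right after a key or a colon, so the retry loop and the rule-based fallback remain necessary.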

Advanced JSON Repair

def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
    import re
    content = self._fix_truncated_json(content)
    json_match = re.search(r'\{[\s\S]*\}', content)
    if json_match:
        json_str = json_match.group()

        def fix_string(match):
            s = match.group(0)
            s = s.replace('\n', ' ').replace('\r', ' ')
            s = re.sub(r'\s+', ' ', s)
            return s

        json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)

        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            # Second pass: blank out control characters, then retry.
            json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
            json_str = re.sub(r'\s+', ' ', json_str)
            try:
                return json.loads(json_str)
            except json.JSONDecodeError:
                pass
    return None
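A common reason the first `json.loads` fails is a raw newline inside a string value, which is what `fix_string` flattens into a space. For comparison, Python's standard `json` module can tolerate such control characters on its own when called with `strict=False`; the snippet below shows the difference on a made-up payload.

```python
import json

raw = '{"reasoning": "line one\nline two"}'  # raw newline inside a string value

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

lenient = json.loads(raw, strict=False)

print(strict_ok)                   # False
print(repr(lenient["reasoning"]))  # 'line one\nline two'
```

The trade-off is that `strict=False` preserves the newline in the parsed value, whereas the regex approach above replaces it with a space.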

Configuration Data Structures

Agent Activity Configuration

@dataclass
class AgentActivityConfig:
    agent_id: int
    entity_uuid: str
    entity_name: str
    entity_type: str
    activity_level: float = 0.5
    posts_per_hour: float = 1.0
    comments_per_hour: float = 2.0
    active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))
    response_delay_min: int = 5
    response_delay_max: int = 60
    sentiment_bias: float = 0.0
    stance: str = "neutral"
    influence_weight: float = 1.0
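Before a parsed dict is turned into an `AgentActivityConfig`, a few sanity checks can catch values the simulator cannot use. The sketch below is an illustration in plain Python; the function name and the specific rules (hours in 0-23, ordered delays, non-negative rates) are assumptions rather than checks taken from MiroFish.

```python
from typing import Any, Dict, List


def validate_agent_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    hours = cfg.get("active_hours", [])
    if not hours or any(not isinstance(h, int) or not 0 <= h <= 23 for h in hours):
        problems.append("active_hours must be a non-empty list of ints in 0-23")
    if cfg.get("response_delay_min", 0) > cfg.get("response_delay_max", 0):
        problems.append("response_delay_min exceeds response_delay_max")
    for key in ("posts_per_hour", "comments_per_hour", "influence_weight"):
        if cfg.get(key, 0) < 0:
            problems.append(f"{key} must be non-negative")
    return problems


good = {
    "active_hours": list(range(9, 18)),
    "response_delay_min": 60,
    "response_delay_max": 240,
    "posts_per_hour": 0.1,
    "comments_per_hour": 0.05,
    "influence_weight": 3.0,
}
bad = {
    "active_hours": [8, 24],
    "response_delay_min": 30,
    "response_delay_max": 5,
    "posts_per_hour": -1,
}

print(validate_agent_config(good))      # []
print(len(validate_agent_config(bad)))  # 3
```

A config that fails these checks could be replaced by the rule-based fallback for its entity type instead of aborting the whole batch.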

Time Simulation Configuration

@dataclass
class TimeSimulationConfig:
    total_simulation_hours: int = 72
    minutes_per_round: int = 60
    agents_per_hour_min: int = 5
    agents_per_hour_max: int = 20
    peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
    peak_activity_multiplier: float = 1.5
    off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
    off_peak_activity_multiplier: float = 0.05
    morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
    morning_activity_multiplier: float = 0.4
    work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
    work_activity_multiplier: float = 0.7
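The article does not show how the simulator consumes these hour lists, so the following lookup is a guess at one plausible use, with loud assumptions: `HourBands` is a trimmed copy of the fields above, `multiplier_for_hour` is a hypothetical helper, and hours that appear in no list (hour 23 with the defaults) are assumed to get a neutral multiplier of 1.0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HourBands:
    """Trimmed copy of the hour lists and multipliers in TimeSimulationConfig."""
    peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
    peak_activity_multiplier: float = 1.5
    off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
    off_peak_activity_multiplier: float = 0.05
    morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
    morning_activity_multiplier: float = 0.4
    work_hours: List[int] = field(default_factory=lambda: list(range(9, 19)))
    work_activity_multiplier: float = 0.7


def multiplier_for_hour(cfg: HourBands, hour: int, default: float = 1.0) -> float:
    """Hypothetical lookup: multiplier of the first band that contains `hour`."""
    hour %= 24
    bands = [
        (cfg.peak_hours, cfg.peak_activity_multiplier),
        (cfg.off_peak_hours, cfg.off_peak_activity_multiplier),
        (cfg.morning_hours, cfg.morning_activity_multiplier),
        (cfg.work_hours, cfg.work_activity_multiplier),
    ]
    for hours, multiplier in bands:
        if hour in hours:
            return multiplier
    return default  # assumption: an hour in no band (23 by default) gets `default`


cfg = HourBands()
print([multiplier_for_hour(cfg, h) for h in (3, 7, 10, 20, 23)])
# [0.05, 0.4, 0.7, 1.5, 1.0]
```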

Complete Simulation Parameters

@dataclass
class SimulationParameters:
    simulation_id: str
    project_id: str
    graph_id: str
    simulation_requirement: str
    time_config: TimeSimulationConfig = field(default_factory=TimeSimulationConfig)
    agent_configs: List[AgentActivityConfig] = field(default_factory=list)
    event_config: EventConfig = field(default_factory=EventConfig)
    twitter_config: Optional[PlatformConfig] = None
    reddit_config: Optional[PlatformConfig] = None
    llm_model: str = ""
    llm_base_url: str = ""
    generated_at: str = field(default_factory=lambda: datetime.now().isoformat())
    generation_reasoning: str = ""

    def to_dict(self) -> Dict[str, Any]:
        time_dict = asdict(self.time_config)
        return {
            "simulation_id": self.simulation_id,
            "project_id": self.project_id,
            "graph_id": self.graph_id,
            "simulation_requirement": self.simulation_requirement,
            "time_config": time_dict,
            "agent_configs": [asdict(a) for a in self.agent_configs],
            "event_config": asdict(self.event_config),
            "twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
            "reddit_config": asdict(self.reddit_config) if self.reddit_config else None,
            "llm_model": self.llm_model,
            "llm_base_url": self.llm_base_url,
            "generated_at": self.generated_at,
            "generation_reasoning": self.generation_reasoning,
        }

Summary Table: Agent Templates by Type

Agent type	Activity	Active hours	Posts/hour	Comments/hour	Response (min)	Influence
University	0.2	9-17	0.1	0.05	60-240	3.0
Government agency	0.2	9-17	0.1	0.05	60-240	3.0
Media outlet	0.5	7-23	0.8	0.3	5-30	2.5
Professor	0.4	8-21	0.3	0.5	15-90	2.0
Student	0.8	8-13, 18-23	0.6	1.5	1-15	0.8
Alumni	0.6	12-13, 19-23	0.4	0.8	5-30	1.0
Individual (default)	0.7	9-13, 18-23	0.5	1.2	2-20	1.0

Conclusion

Automating agent configuration for large-scale AI simulations comes down to a few practices:

Step-by-step generation: split the pipeline into small stages, validate each stage's result, and keep debugging simple.
Batch processing: size each batch to fit the LLM's context and token limits.
JSON repair: always check the output and automatically repair malformed or truncated JSON.
Rule-based fallback: when the LLM returns nothing valid, generate the configuration from hard-coded rules.
Detailed templates: give each agent type its own activity rules.
Continuous validation: integrate Apidog to check schemas and catch errors before production.

This system lets you run large social simulations in a way that is automated, resilient, and easy to extend.

