DEV Community

pueding

Computer Science and Engineering

Agent-Harness Scaling Law: Feedback Quality Predicts Success, Not Raw Compute: Effective Feedback Compute (EFC)

7 min read
AutoLab Benchmarks Frontier Agents on Long-Horizon R&D Tasks: Iterative Experiment-Loop Evaluation

6 min read
MCP SEP-2106: Full JSON Schema 2020-12 in Tool I/O

7 min read
MarginGate: Margin-Gated Verification for Batch-Invariant Decoding

5 min read
MCP 2026-07-28 RC: Stateless Transport

9 min read
Token Budgets Paper: Affine-Typed Budget Ownership

6 min read
Microsoft MAI-Code-1-Flash: Adaptive Solution-Length Control

6 min read
Harness-1: State-Externalizing Search Harness

7 min read
GrepSeek Trains a Search Agent to Use Shell Commands: GRPO-Trained Shell-Command Search

6 min read
AgentDoG 1.5: Small Inline Guard Models for Agent Actions

7 min read
Claude Opus 4.8: Parallel-Subagent Dynamic Workflows

6 min read
OmniRetrieval: Source-Native Query Dispatch

6 min read
Gemini 3.5 Flash: Agent-First Model Design

8 min read
CDD Paper: Context-Driven Decomposition for RAG Knowledge Conflict

9 min read
Cursor Composer 2.5: Targeted Textual Feedback RL

8 min read
Boiling the Frog Paper: Multi-Turn Norm Erosion vs Single-Prompt Agent Safety

8 min read
OpenSCAD Pantheon Benchmark: Human-In-The-Loop vs Autonomous Coding Agents

8 min read
Camouflage Injection Paper: Camouflage Detection Gap

7 min read
MCP SEP-2468: RFC 9207 Iss Parameter for OAuth Mix-Up Defense

8 min read
Is Grep All You Need? Grep vs Vector Retrieval for Agentic Search

7 min read