https://www.reddit.com/r/LocalLLaMA/comments/1o3604u/autobe_achieved_100_compilation_success_of/
This article was originally posted to Reddit's r/LocalLLaMA community four months ago. A new follow-up article may come soon.
AutoBE is an open-source project that serves as an agent capable of automatically generating backend applications through conversations with AI chatbots.
AutoBE aims to generate 100% functional backend applications, and we recently achieved 100% compilation success for generated backends even with local AI models like qwen3-next-80b-a3b (as well as smaller GPT models such as gpt-4.1-mini). This represents a significant improvement over our previous attempts with qwen3-next-80b-a3b, where we managed to generate backend applications, but most projects failed to build due to compilation errors.
- Dark background screenshots: after AutoBE improvements
  - 100% compilation success doesn't necessarily mean 100% runtime success
  - Shopping Mall failed due to excessive input token size
- Light background screenshots: before AutoBE improvements
  - Many failures occurred with gpt-4.1-mini and qwen3-next-80b-a3b
| Project | qwen3-next-80b-a3b-instruct | openai/gpt-4.1-mini | openai/gpt-4.1 |
|---|---|---|---|
| To Do List | Qwen3 To Do | GPT 4.1-mini To Do | GPT 4.1 To Do |
| Reddit Community | Qwen3 Reddit | GPT 4.1-mini Reddit | GPT 4.1 Reddit |
| Economic Discussion | Qwen3 BBS | GPT 4.1-mini BBS | GPT 4.1 BBS |
| E-Commerce | Qwen3 Shopping | GPT 4.1-mini Shopping | GPT 4.1 Shopping |
Of course, achieving 100% compilation success for backend applications generated by AutoBE does not mean that these applications are 100% safe or will run without any problems at runtime.
AutoBE-generated backend applications still don't pass 100% of their own test programs. Sometimes AutoBE writes incorrect SQL queries, and occasionally it misinterprets complex business logic and implements something entirely different.
- The current test function pass rate is approximately 80%
- We expect to achieve a 100% runtime success rate by the end of this year
Through this month-long experimentation and optimization with local LLMs like qwen3-next-80b-a3b, I've been amazed by their remarkable function calling performance and rapid development pace.
The core principle of AutoBE is not to have AI write programming code as text for backend application generation. Instead, we developed our own AutoBE-specific compiler and have AI construct its AST (Abstract Syntax Tree) structure through function calling. The AST inevitably takes on a highly complex form with countless types intertwined in unions and tree structures.
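To illustrate the idea, here is a minimal, self-contained sketch. The toy `Expr` union and `render` function below are my own illustrations, not AutoBE's actual compiler API: the model never emits source text, but instead fills a discriminated-union AST through a function-calling tool whose JSON schema mirrors the union, and a deterministic renderer turns validated nodes into code.

```typescript
// Hypothetical sketch: a tiny discriminated-union AST in the spirit of
// AutoBE's compilers (not the real AutoBE types).
type Expr =
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "binary"; operator: "+" | "*"; left: Expr; right: Expr };

// The compiler side: render a validated AST node into code text.
// Because the union is closed, the switch is exhaustive and type-checked.
function render(e: Expr): string {
  switch (e.kind) {
    case "number":
      return String(e.value);
    case "string":
      return JSON.stringify(e.value);
    case "binary":
      return `(${render(e.left)} ${e.operator} ${render(e.right)})`;
  }
}

// An AST a model might construct through tool calls, never as raw text.
const ast: Expr = {
  kind: "binary",
  operator: "+",
  left: { kind: "number", value: 1 },
  right: {
    kind: "binary",
    operator: "*",
    left: { kind: "number", value: 2 },
    right: { kind: "number", value: 3 },
  },
};

console.log(render(ast)); // "(1 + (2 * 3))"
```

Because the model's output must type-check against the union before rendering, a whole class of syntax errors is ruled out by construction, which is what makes 100% compilation success achievable.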
When I experimented with local LLMs earlier this year, not a single model could handle AutoBE's AST structure. Even Qwen's previous model, qwen3-235b-a22b, couldn't get through it reliably. The AST structures of AutoBE's specialized compilers, such as AutoBeDatabase, AutoBeOpenApi, and AutoBeTest, acted as gatekeepers, preventing us from integrating local LLMs with AutoBE. But in just a few months, newly released local LLMs suddenly succeeded in generating these structures, completely changing the landscape.
```typescript
// Example of AutoBE's AST structure
export namespace AutoBeOpenApi {
  export type IJsonSchema =
    | IJsonSchema.IConstant
    | IJsonSchema.IBoolean
    | IJsonSchema.IInteger
    | IJsonSchema.INumber
    | IJsonSchema.IString
    | IJsonSchema.IArray
    | IJsonSchema.IObject
    | IJsonSchema.IReference
    | IJsonSchema.IOneOf
    | IJsonSchema.INull;
}
export namespace AutoBeTest {
  export type IExpression =
    | IBooleanLiteral
    | INumericLiteral
    | IStringLiteral
    | IArrayLiteralExpression
    | IObjectLiteralExpression
    | INullLiteral
    | IUndefinedKeyword
    | IIdentifier
    | IPropertyAccessExpression
    | IElementAccessExpression
    | ITypeOfExpression
    | IPrefixUnaryExpression
    | IPostfixUnaryExpression
    | IBinaryExpression
    | IArrowFunction
    | ICallExpression
    | INewExpression
    | IArrayFilterExpression
    | IArrayForEachExpression
    | IArrayMapExpression
    | IArrayRepeatExpression
    | IPickRandom
    | ISampleRandom
    | IBooleanRandom
    | IIntegerRandom
    | INumberRandom
    | IStringRandom
    | IPatternRandom
    | IFormatRandom
    | IKeywordRandom
    | IEqualPredicate
    | INotEqualPredicate
    | IConditionalPredicate
    | IErrorPredicate;
}
```
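As a sketch of how a compiler consumes such a union, here is a recursive walker over a simplified schema type. The `IJsonSchema` subset below is my own simplification, only loosely modeled on the union above, not AutoBE's real definitions.

```typescript
// Simplified, illustrative subset of a JSON-schema-like union
// (assumption: not AutoBE's actual IJsonSchema type).
type IJsonSchema =
  | { type: "boolean" }
  | { type: "integer" }
  | { type: "string" }
  | { type: "array"; items: IJsonSchema }
  | { type: "object"; properties: Record<string, IJsonSchema> }
  | { $ref: string };

// Collect every $ref reachable from a schema node, the way a compiler
// pass might resolve cross-references between generated types.
function collectRefs(schema: IJsonSchema, out: string[] = []): string[] {
  if ("$ref" in schema) {
    out.push(schema.$ref);
    return out;
  }
  switch (schema.type) {
    case "array":
      collectRefs(schema.items, out);
      break;
    case "object":
      for (const value of Object.values(schema.properties))
        collectRefs(value, out);
      break;
  }
  return out;
}

// Example: a schema a model might construct through function calling.
const member: IJsonSchema = {
  type: "object",
  properties: {
    id: { type: "string" },
    friends: {
      type: "array",
      items: { $ref: "#/components/schemas/IMember" },
    },
  },
};

console.log(collectRefs(member)); // ["#/components/schemas/IMember"]
```

The point of the union-of-interfaces shape is exactly this: every pass over the tree can narrow on a discriminant and the type checker guarantees no node kind is silently skipped.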
As an open-source developer, I send infinite praise and respect to those creating these open-source AI models. Our AutoBE team is a small project with 2 developers, and our capabilities and recognition are incomparably lower than those of LLM developers. Nevertheless, we want to contribute to the advancement of local LLMs and grow together.
To this end, we plan to develop benchmarks targeting each compiler component of AutoBE, conduct in-depth analysis of local LLMs' function calling capabilities for complex types, and publish the results periodically. We aim to release our first benchmark in about two months, covering most commercial and open-source AI models available.
We appreciate your interest and support, and we will return with the new benchmark.
Links
- Homepage: https://autobe.dev
- Github: https://github.com/wrtnlabs/autobe