Sebastian Häni

Running a WebDev Competition with AI Allowed

A few weeks ago, we held the yearly Swiss ICT Regional Championship in Web Technologies with 91 participants across Switzerland. As Chief Expert, I was responsible for task design, task distribution, submission collection, and, importantly, the marking, supported by an amazing team of experts.

Competitors were apprentices mostly between 16 and 18 years old. The competition takes place at several venues across Switzerland, where participants are supervised and cannot communicate with others. The format has remained stable over the years: all participants solve the same centrally designed tasks within a three-hour window, and submissions are evaluated automatically using a mix of unit, integration, and end-to-end tests.

At this scale, automated evaluation is essential. It ensures consistency and allows comparison across regions without manual intervention. With 91 × 3 hours of highly focused engineering work, the volume of output would be impractical to grade manually.
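As a rough illustration of how such automated marking can work, here is a minimal sketch that treats each module as a list of zero-argument test callables plus a point value. All names are illustrative; this is not the actual competition tooling.

```python
# Minimal sketch of an automated marking harness (hypothetical, not the
# real competition infrastructure): a module's score is the fraction of
# passing tests, scaled by the module's point value.

def run_tests(tests):
    """Return how many of the given test callables pass."""
    passed = 0
    for test in tests:
        try:
            test()
            passed += 1
        except AssertionError:
            pass
    return passed

def score(modules):
    """Map {module: (tests, max_points)} to {module: awarded points}."""
    return {
        name: run_tests(tests) / len(tests) * max_points
        for name, (tests, max_points) in modules.items()
    }

# Toy usage: one passing and one failing test, worth 40 points in total.
def passing():
    assert 1 + 1 == 2

def failing():
    assert 1 + 1 == 3

print(score({"backend": ([passing, failing], 40)}))  # {'backend': 20.0}
```

In practice the test callables would be unit, integration, and end-to-end tests run against each submission, but the aggregation step is essentially this simple.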

The goal of the competition is twofold. First, it serves as a qualification step for the national competition SwissSkills (next in September 2027), where the best young web engineer in Switzerland is selected, who then officially represents Switzerland at the world competition WorldSkills (next in September 2026 in Shanghai, then 2028 in Japan). Second, the regional competition identifies the best web engineer within each Swiss region. The format should therefore resemble both a typical workday as a web engineer in Switzerland and the international competition.

One key difference: AI is not allowed at the international level, while in practice it is widely used in everyday work.

Until last year, AI tools were not permitted. This year, we decided to allow them to better reflect current industry practice, and we explicitly approached it as an experiment: if we never try allowing AI at a competition, we will never learn what the outcomes are.

Adjustments to the Format

Allowing AI required several changes to the competition setup.

In previous years, participants had full access to the test suite code. With AI tools, this creates a feedback loop where solutions can be iteratively refined until all tests pass. To avoid this, we removed direct access to the tests.

Instead, we introduced a central submission platform. Participants could submit as often as they wanted and received feedback as total points per module. There were four modules: backend, frontend, styling, and a CSS selector puzzle. A total of 165 tests ensured sufficiently fine-grained scoring. Detailed test reports were still available, but could only be unlocked three times per module, to encourage deliberate debugging rather than simply feeding test output back to an AI agent.
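The feedback rules above can be modeled in a few lines. This is a hypothetical sketch (class and method names are invented, not the platform's actual API): submissions are unlimited and return only the coarse per-module score, while detailed reports are rationed to three unlocks per module.

```python
# Illustrative model of the submission platform's feedback rules.
# Names are hypothetical; only the rules come from the competition.

MAX_UNLOCKS = 3  # detailed test reports per module

class ModuleFeedback:
    def __init__(self, max_points):
        self.max_points = max_points
        self.unlocks_used = 0

    def submit(self, passed, total):
        """Every submission returns only the total points for the module."""
        return round(passed / total * self.max_points, 1)

    def unlock_report(self, failure_details):
        """Hand out per-test details, at most MAX_UNLOCKS times."""
        if self.unlocks_used >= MAX_UNLOCKS:
            raise PermissionError("detailed report unlocks exhausted")
        self.unlocks_used += 1
        return failure_details

# Usage: coarse feedback is always available, details are scarce.
backend = ModuleFeedback(max_points=40)
print(backend.submit(passed=33, total=55))  # 24.0
backend.unlock_report(["GET /orders returns 500 instead of 200"])
```

The scarcity of detailed reports is what pushes competitors to reason about failures themselves instead of looping raw test output through an AI agent.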

We also increased the overall task volume by roughly 20 to 30 percent.

Additionally, we adjusted task design to require more context-building by the competitor instead of being fully AI-ready. Some requirements were expressed visually, for example via screenshots. In other cases, participants had to find resources online based on a description.

Productivity Effects

The median score increased from roughly 30–40 percent in previous years without AI to around 50 percent with AI.

This indicates a measurable productivity gain with AI. At the same time, the increase remained moderate, not a 10x jump.

We solve all tasks beforehand to validate their difficulty. With paid frontier AI models, we were able to reach a solid level of completeness with relatively few prompts. Among participants, however, results varied much more.

Using AI effectively requires structuring the problem, preparing context, writing prompts, and validating outputs. These skills were far more unevenly distributed among the young engineers than we expected.

Score Distribution

The biggest surprise was the distribution of scores.

Historically, results followed a logarithmic pattern: a few top performers with clear gaps at the top, then a gradual flattening. This year with AI, results were almost perfectly linear across participants. Before the competition, we expected AI to create clustering near the maximum score followed by a sharper drop. This did not happen.

Interpretation

Breaking down the distribution:

  • The middle range shifted upward, consistent with the higher median
  • The lower end remained broadly spread
  • The top end became more compressed, with smaller gaps

This suggests AI impacts different parts of the distribution differently. Gains are most visible in the middle, while differences among top performers shrink.

The idea that AI amplifies existing skill still holds, but in our data the amplification is not uniform: it appears strongest in the middle of the distribution.

Participant Feedback

Participants were invited to give feedback after the competition.

Many reported that working with AI felt less satisfying than working with deterministic tests.

Some also said AI reduces the enjoyment of the competition, as they value coding as a craft and prefer solving problems without assistance.

We also asked whether they would have participated if AI had not been allowed. Almost everyone said yes. This was somewhat surprising, as we had expected some to opt out because they rely on AI.

When asked whether AI should be allowed in future editions, responses were split exactly 50/50, showing that there is no clear consensus on using AI in coding competitions.

Note: My understanding is that most, if not all, apprentices can sign up for student licenses with known AI providers to get access to frontier models, meaning it's not pay-to-win. If that turns out not to be true, it will influence the next editions.

Closing Remarks

This competition provides a relatively controlled environment: identical tasks, a fixed time frame, and automated evaluation across a large group.

Within this setup, allowing AI was associated with:

  • a moderate increase in productivity
  • a shift toward a more even score distribution
  • varying effectiveness depending on the participant

We are currently evaluating how these findings should influence future editions of the competition.
