<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nicholas Kang (Nick)</title>
    <description>The latest articles on DEV Community by Nicholas Kang (Nick) (@nicholas_kangnick_ac18).</description>
    <link>https://dev.to/nicholas_kangnick_ac18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3711508%2Fcccf23c9-a97a-41df-9483-1bd94c222a43.jpg</url>
      <title>DEV Community: Nicholas Kang (Nick)</title>
      <link>https://dev.to/nicholas_kangnick_ac18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nicholas_kangnick_ac18"/>
    <language>en</language>
    <item>
      <title>Kaggle is making AI benchmark creation effortless</title>
      <dc:creator>Nicholas Kang (Nick)</dc:creator>
      <pubDate>Thu, 04 Jun 2026 15:51:23 +0000</pubDate>
      <link>https://dev.to/googleai/kaggle-is-making-ai-benchmark-creation-effortless-1g7n</link>
      <guid>https://dev.to/googleai/kaggle-is-making-ai-benchmark-creation-effortless-1g7n</guid>
      <description>&lt;p&gt;As AI models evolve from simple chatbots into reasoning agents that write code, use tools and solve complex problems, traditional benchmarks are no longer enough. The community needs dynamic, rigorous evaluations — built by the people who use these models in the real-world.&lt;/p&gt;

&lt;p&gt;That’s why we launched &lt;a href="https://www.kaggle.com/benchmarks" rel="noopener noreferrer"&gt;Kaggle Benchmarks&lt;/a&gt;. Since then, the global AI community has created more than 10,000 evaluation tasks, powering the trustworthy, transparent public leaderboards that help labs measure and accelerate AI progress.&lt;/p&gt;

&lt;p&gt;Today, we are taking the next step by launching local development for Kaggle Benchmarks.&lt;/p&gt;

&lt;h2&gt;Use Kaggle Benchmarks from your local development environment&lt;/h2&gt;

&lt;p&gt;Until now, creating evaluation tasks meant working exclusively in Kaggle's web-based notebook editor rather than in the tools developers prefer to build with.&lt;/p&gt;

&lt;p&gt;This update lets developers create, validate, push, run and download tasks directly from their local development environments, such as Antigravity, VS Code and Cursor, or through coding agents. It is designed to meet developers where they work, making the journey from idea to evaluation faster and more intuitive.&lt;/p&gt;

&lt;h2&gt;Build evaluation tasks in natural language with AI coding agents&lt;/h2&gt;

&lt;p&gt;Local development also unlocks a powerful new workflow: using AI coding agents to write benchmark tasks through the &lt;a href="https://github.com/Kaggle/kaggle-skills/blob/main/write-kaggle-benchmarks/SKILL.md" rel="noopener noreferrer"&gt;write-kaggle-benchmarks skill&lt;/a&gt;. This skill comprises a set of structured instructions that teaches a coding agent how to build tasks using the &lt;a href="https://github.com/Kaggle/kaggle-benchmarks" rel="noopener noreferrer"&gt;kaggle-benchmarks SDK&lt;/a&gt; and the &lt;a href="https://github.com/Kaggle/kaggle-cli/blob/main/docs/benchmarks.md" rel="noopener noreferrer"&gt;Kaggle CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To add this skill, ask your agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Install the write-kaggle-benchmarks skill: &lt;a href="https://github.com/Kaggle/kaggle-skills" rel="noopener noreferrer"&gt;https://github.com/Kaggle/kaggle-skills&lt;/a&gt;” &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the skill is installed, you can describe an evaluation in plain language and get a working task on Kaggle. For example, you can tell your agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Using the write-kaggle-benchmarks skill, build a task that asks the model: &lt;a href="https://www.kaggle.com/benchmarks/tasks/nicholaskanggoog/math-false-statement" rel="noopener noreferrer"&gt;'Is 300+140=460 correct?'&lt;/a&gt;”
&lt;/li&gt;
&lt;/ul&gt;
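&lt;p&gt;To make the example concrete: 300 + 140 is 440, not 460, so a model passes this task only if it rejects the statement. The sketch below is a hypothetical illustration of that grading logic in plain Python. It does not use the kaggle-benchmarks SDK or the Kaggle CLI; &lt;code&gt;ask_model&lt;/code&gt;, &lt;code&gt;grade_answer&lt;/code&gt; and &lt;code&gt;run_task&lt;/code&gt; are names invented here, and a task generated by the skill will look different.&lt;/p&gt;

```python
# Hypothetical grading logic for the "300+140=460" example task.
# Plain Python only: this does NOT use the kaggle-benchmarks SDK, and
# ask_model stands in for whatever model call a real task would make.
from typing import Callable

PROMPT = "Is 300+140=460 correct? Answer yes or no."


def statement_is_true(a: int, b: int, claimed: int) -> bool:
    """Return True if a + b really equals the claimed sum."""
    return a + b == claimed


def grade_answer(answer: str) -> bool:
    """Pass only if the reply's first word matches the correct verdict."""
    # 300 + 140 = 440, so the correct verdict here is "no".
    expected = "yes" if statement_is_true(300, 140, 460) else "no"
    words = answer.strip().lower().split()
    first_word = words[0].strip(".,!") if words else ""
    return first_word == expected


def run_task(ask_model: Callable[[str], str]) -> bool:
    """Send the prompt to a model callable and grade its reply."""
    return grade_answer(ask_model(PROMPT))


print(run_task(lambda prompt: "No, 300+140=440."))  # True
print(run_task(lambda prompt: "Yes."))  # False
```

&lt;p&gt;Grading on the first word keeps the check simple; a real task would likely need more robust answer parsing.&lt;/p&gt;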

&lt;p&gt;These capabilities are powered by new Benchmarks commands in the Kaggle CLI.&lt;/p&gt;

&lt;h2&gt;Understand why community-driven evaluations matter&lt;/h2&gt;

&lt;p&gt;We built Kaggle Benchmarks to democratize trustworthy AI evaluations. We believe that if a capability can be measured, labs will race to improve it. By providing clear, objective signals, we hope to empower AI labs to drive model improvements in the areas that matter most.&lt;/p&gt;

&lt;p&gt;For AI to truly benefit humanity, evaluations must reflect the full diversity of real-world challenges. We believe this launch is a significant step toward enabling anyone, anywhere, to build the evaluations that will shape the future of AI.&lt;/p&gt;

&lt;p&gt;Ready to build? Try &lt;a href="https://www.kaggle.com/benchmarks?task=true" rel="noopener noreferrer"&gt;Kaggle Benchmarks&lt;/a&gt; today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Additional resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the &lt;a href="https://github.com/Kaggle/kaggle-cli/blob/main/docs/benchmarks.md" rel="noopener noreferrer"&gt;docs for kaggle-cli on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install the &lt;a href="https://github.com/Kaggle/kaggle-skills/tree/main/write-kaggle-benchmarks" rel="noopener noreferrer"&gt;write-kaggle-benchmarks skill&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎁 Idea → eval? Show us. Post your task and workflow on X or LinkedIn, tagging @kaggle, by July 1 for a chance to win Kaggle swag and a social shoutout.&lt;/li&gt;
&lt;li&gt;Watch the &lt;a href="https://www.youtube.com/watch?v=c7B8vyehyUA" rel="noopener noreferrer"&gt;product demo on YouTube&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Introducing Community Benchmarks on Kaggle</title>
      <dc:creator>Nicholas Kang (Nick)</dc:creator>
      <pubDate>Wed, 14 Jan 2026 20:54:10 +0000</pubDate>
      <link>https://dev.to/googleai/introducing-community-benchmarks-on-kaggle-35nc</link>
      <guid>https://dev.to/googleai/introducing-community-benchmarks-on-kaggle-35nc</guid>
      <description>&lt;p&gt;Today, Kaggle is launching &lt;a href="https://www.kaggle.com/benchmarks?type=community" rel="noopener noreferrer"&gt;Community Benchmarks&lt;/a&gt;, which lets the global AI community design, run and share their own custom benchmarks for evaluating AI models. This is the next step after we launched &lt;a href="https://www.kaggle.com/benchmarks" rel="noopener noreferrer"&gt;Kaggle Benchmarks last year&lt;/a&gt;, to provide trustworthy and transparent access to evaluations from top-tier research groups like &lt;a href="https://www.kaggle.com/benchmarks/metaresearch/multiloko" rel="noopener noreferrer"&gt;Meta’s MultiLoKo&lt;/a&gt; and &lt;a href="https://www.kaggle.com/benchmarks/google/facts" rel="noopener noreferrer"&gt;Google’s FACTS suite&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Why community-driven evaluation matters&lt;/h2&gt;

&lt;p&gt;AI capabilities have evolved so rapidly that it’s become difficult to evaluate model performance. Not long ago, a single accuracy score on a static dataset was enough to determine model quality. But today, as LLMs evolve into reasoning agents that collaborate, write code and use tools, those static metrics and simple evaluations are no longer sufficient.&lt;/p&gt;

&lt;p&gt;These real-world use cases demand a more flexible and transparent evaluation framework. Kaggle’s Community Benchmarks provide a more dynamic, rigorous and continuously evolving approach to AI model evaluation, one shaped by the users who build and deploy these systems every day.&lt;/p&gt;

&lt;p&gt;Community Benchmarks also give developers a transparent way to validate their specific use cases and bridge the gap between experimental code and production-ready applications.&lt;/p&gt;

&lt;h2&gt;How to build your own benchmarks on Kaggle&lt;/h2&gt;

&lt;p&gt;Benchmarks start with building tasks, which can range from evaluating multi-step reasoning and code generation to testing tool use or image recognition. Once you have tasks, you can add them to a benchmark to evaluate and rank selected models by how they perform across the tasks in the benchmark.&lt;/p&gt;

&lt;p&gt;Here’s how you can get started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a task:&lt;/strong&gt; Tasks test an AI model’s performance on a specific problem. They allow you to run reproducible tests across different models to compare their accuracy and capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a benchmark:&lt;/strong&gt; Once you have created one or more tasks, you can group them into a benchmark. A benchmark runs those tasks across a suite of leading AI models and generates a leaderboard to track and compare their performance.&lt;/li&gt;
&lt;/ol&gt;
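&lt;p&gt;As a rough mental model of how tasks roll up into a leaderboard, here is a hypothetical, self-contained Python sketch. The &lt;code&gt;Task&lt;/code&gt; class, the &lt;code&gt;run_benchmark&lt;/code&gt; function and the stub "models" are assumptions for illustration only, not the kaggle-benchmarks SDK API. Each task pairs a prompt with a pass/fail grader, and each model's score is the fraction of tasks it passes.&lt;/p&gt;

```python
# Hypothetical sketch of tasks grouped into a benchmark and ranked on a
# leaderboard. Task, run_benchmark and the stub models are illustrative
# names, NOT part of the kaggle-benchmarks SDK.
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # a "model" here is any prompt -> reply function


@dataclass
class Task:
    name: str
    prompt: str
    grade: Callable[[str], bool]  # returns True if the reply passes


def run_benchmark(tasks: list[Task], models: dict[str, Model]) -> list[tuple[str, float]]:
    """Run every task against every model; return (model, accuracy), best first."""
    if not tasks:
        raise ValueError("a benchmark needs at least one task")
    scores = {}
    for model_name, model in models.items():
        passed = sum(task.grade(model(task.prompt)) for task in tasks)
        scores[model_name] = passed / len(tasks)
    # Highest accuracy first; ties are broken alphabetically by model name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


tasks = [
    Task("math-false-statement", "Is 300+140=460 correct? Answer yes or no.",
         lambda reply: reply.strip().lower().startswith("no")),
    Task("math-true-statement", "Is 300+140=440 correct? Answer yes or no.",
         lambda reply: reply.strip().lower().startswith("yes")),
]
models = {
    "always-yes": lambda prompt: "Yes",
    "always-no": lambda prompt: "No",
    # Stub that "knows" 460 is the wrong sum.
    "stub-checker": lambda prompt: "No" if "460" in prompt else "Yes",
}

leaderboard = run_benchmark(tasks, models)
for rank, (name, accuracy) in enumerate(leaderboard, start=1):
    print(rank, name, accuracy)
# Prints:
# 1 stub-checker 1.0
# 2 always-no 0.5
# 3 always-yes 0.5
```

&lt;p&gt;Ranking by accuracy and breaking ties by name keeps this toy leaderboard deterministic; a real benchmark may score and rank tasks differently.&lt;/p&gt;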

&lt;p&gt;
  &lt;iframe src="https://www.youtube.com/embed/VBlyJJ7PTD8"&gt;&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Once you build your benchmark, you get these benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Broad model access:&lt;/strong&gt; Free access (within quota limits) to state-of-the-art models from labs like Google, Anthropic, DeepSeek and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility:&lt;/strong&gt; Benchmarks capture exact outputs and model interactions so results can be audited and verified.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex interactions:&lt;/strong&gt; They support testing multimodal inputs, code execution, tool use and multi-turn conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid prototyping:&lt;/strong&gt; They allow you to quickly design and iterate on creative new tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These capabilities are built on the new &lt;a href="https://github.com/Kaggle/kaggle-benchmarks" rel="noopener noreferrer"&gt;kaggle-benchmarks SDK&lt;/a&gt;. Here are a few resources for getting started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks Cookbook:&lt;/strong&gt; &lt;a href="https://github.com/Kaggle/kaggle-benchmarks/blob/ci/cookbook.md" rel="noopener noreferrer"&gt;A guide to advanced features and use cases.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example tasks:&lt;/strong&gt; &lt;a href="https://github.com/Kaggle/kaggle-benchmarks/tree/ci/documentation/examples" rel="noopener noreferrer"&gt;Get inspired with a variety of pre-built tasks.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Getting started:&lt;/strong&gt; &lt;a href="https://www.kaggle.com/docs/benchmarks#How%20to%20create%20a%20benchmark" rel="noopener noreferrer"&gt;How to create your first task &amp;amp; benchmark&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How we’re shaping the future of AI evaluation&lt;/h2&gt;

&lt;p&gt;The future of AI progress depends on how models are evaluated. With Kaggle Community Benchmarks, Kagglers are no longer just testing models; they’re helping shape the next generation of intelligence.&lt;/p&gt;

&lt;p&gt;Ready to build? Try &lt;a href="https://www.kaggle.com/benchmarks?type=community" rel="noopener noreferrer"&gt;Community Benchmarks&lt;/a&gt; today.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>kaggle</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
