<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ricardo Costa</title>
    <description>The latest articles on DEV Community by Ricardo Costa (@ricardocosta0405).</description>
    <link>https://dev.to/ricardocosta0405</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940957%2Fdafe89c4-6a5e-4656-9cf2-d6d5b08cf30c.jpeg</url>
      <title>DEV Community: Ricardo Costa</title>
      <link>https://dev.to/ricardocosta0405</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ricardocosta0405"/>
    <language>en</language>
    <item>
      <title>Building a CI helper for Playwright Java</title>
      <dc:creator>Ricardo Costa</dc:creator>
      <pubDate>Tue, 02 Jun 2026 18:29:35 +0000</pubDate>
      <link>https://dev.to/ricardocosta0405/building-a-ci-helper-for-playwright-java-43i6</link>
      <guid>https://dev.to/ricardocosta0405/building-a-ci-helper-for-playwright-java-43i6</guid>
      <description>&lt;p&gt;Playwright has excellent tooling around browser automation, but most of the ecosystem still feels heavily Node.js-centric.&lt;/p&gt;

&lt;p&gt;For Java teams, there's a surprising amount of infrastructure work that sits between:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push
   ↓
ci execution
   ↓
useful failure diagnostics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To explore that gap, I built a small Java CLI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub repo:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/ricardo-costa0405/playwright-java-ci-helper" rel="noopener noreferrer"&gt;https://github.com/ricardo-costa0405/playwright-java-ci-helper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The current implementation focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build system detection&lt;/li&gt;
&lt;li&gt;test execution&lt;/li&gt;
&lt;li&gt;artifact collection&lt;/li&gt;
&lt;li&gt;machine-readable failure summaries&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  build system detection
&lt;/h2&gt;

&lt;p&gt;The first requirement was zero project configuration.&lt;/p&gt;

&lt;p&gt;The helper attempts to detect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./mvnw
pom.xml
./gradlew
build.gradle
build.gradle.kts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and automatically generates the appropriate execution strategy.&lt;/p&gt;

&lt;p&gt;The goal is straightforward:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the same binary should be able to run inside arbitrary Playwright Java repositories without requiring repository-specific configuration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This allows the tool to work consistently across Maven and Gradle projects while keeping onboarding friction close to zero.&lt;/p&gt;
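&lt;p&gt;As a rough illustration of zero-configuration detection, the sketch below checks for the marker files listed above and maps the first match to a test command. It is not the helper's actual implementation. The class name &lt;code&gt;BuildDetector&lt;/code&gt;, the precedence order (the order of the list above), and the commands it returns are all assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: choose a test command from marker files in a project root.
// Checks run in the order the article lists the files; that precedence and the
// commands below are assumptions, not the helper's documented behavior.
public class BuildDetector {

    enum BuildTool {
        MAVEN_WRAPPER("./mvnw test"),
        MAVEN("mvn test"),
        GRADLE_WRAPPER("./gradlew test"),
        GRADLE("gradle test"),
        UNKNOWN("");

        final String testCommand;

        BuildTool(String testCommand) {
            this.testCommand = testCommand;
        }
    }

    static BuildTool detect(Path dir) {
        // Within each build tool, the wrapper wins over the plain build file,
        // so the tool version pinned by the project is used.
        if (Files.isRegularFile(dir.resolve("mvnw"))) {
            return BuildTool.MAVEN_WRAPPER;
        }
        if (Files.isRegularFile(dir.resolve("pom.xml"))) {
            return BuildTool.MAVEN;
        }
        if (Files.isRegularFile(dir.resolve("gradlew"))) {
            return BuildTool.GRADLE_WRAPPER;
        }
        if (Files.isRegularFile(dir.resolve("build.gradle"))
                || Files.isRegularFile(dir.resolve("build.gradle.kts"))) {
            return BuildTool.GRADLE;
        }
        return BuildTool.UNKNOWN;
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length == 0 ? "." : args[0]);
        BuildTool tool = detect(dir);
        System.out.println(tool + ": " + tool.testCommand);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Preferring &lt;code&gt;./mvnw&lt;/code&gt; and &lt;code&gt;./gradlew&lt;/code&gt; when they exist is usually the right default on a CI agent, because the wrappers pin the build-tool version the project expects.&lt;/p&gt;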




&lt;h2&gt;
  
  
  test execution
&lt;/h2&gt;

&lt;p&gt;The helper can execute either an automatically detected build command or a user-supplied command.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;java &lt;span class="nt"&gt;-jar&lt;/span&gt; playwright-java-ci-helper.jar &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project-dir&lt;/span&gt; my-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;java &lt;span class="nt"&gt;-jar&lt;/span&gt; playwright-java-ci-helper.jar &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--test-command&lt;/span&gt; &lt;span class="s2"&gt;"mvn test -Dtest=LoginTest"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An optional setup phase can also run before the tests.&lt;/p&gt;

&lt;p&gt;This allows repositories to perform environment preparation, Playwright installation, or custom bootstrap steps before execution begins.&lt;/p&gt;
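&lt;p&gt;A minimal sketch of that two-phase flow (optional setup, then tests) might look like the code below. The class and method names are hypothetical. Commands are run through &lt;code&gt;sh -c&lt;/code&gt;, so a POSIX shell is assumed. Stopping when setup fails is a design assumption, not documented helper behavior. The setup command in &lt;code&gt;main&lt;/code&gt; is the browser-install invocation from the Playwright Java documentation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the two-phase run: an optional setup command, then the tests.
// Commands go through "sh -c", which assumes a POSIX shell on the CI agent.
public class PhaseRunner {

    static int run(String command, File workDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("sh", "-c", command)
                .directory(workDir)
                .inheritIO() // stream build output straight into the CI log
                .start();
        return process.waitFor();
    }

    static int runPhases(String setupCommand, String testCommand, File workDir)
            throws IOException, InterruptedException {
        if (setupCommand != null) {
            int setupExit = run(setupCommand, workDir);
            if (setupExit != 0) {
                // Assumption: a failed setup makes test results meaningless, so stop early.
                return setupExit;
            }
        }
        return run(testCommand, workDir);
    }

    public static void main(String[] args) throws Exception {
        // Setup: Playwright Java's documented browser-install command; then one test class.
        int exit = runPhases(
                "mvn exec:java -e -D exec.mainClass=com.microsoft.playwright.CLI -D exec.args=\"install --with-deps\"",
                "mvn test -Dtest=LoginTest",
                new File("."));
        System.exit(exit);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inheriting stdio leaves the console output untouched for humans reading the CI log. The structured results still come from the report files, not from this stream.&lt;/p&gt;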




&lt;h2&gt;
  
  
  why not parse console logs?
&lt;/h2&gt;

&lt;p&gt;Many CI systems still derive test status from stdout.&lt;/p&gt;

&lt;p&gt;That approach tends to be fragile because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;log formats change&lt;/li&gt;
&lt;li&gt;plugins inject additional output&lt;/li&gt;
&lt;li&gt;parallel execution interleaves messages&lt;/li&gt;
&lt;li&gt;different frameworks produce different structures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, the helper parses JUnit XML directly and extracts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests
failures
errors
skipped
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;from the actual source of truth.&lt;/p&gt;

&lt;p&gt;This produces deterministic results regardless of how verbose or customized the console output becomes.&lt;/p&gt;
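&lt;p&gt;Here is a minimal sketch of that idea that uses only the JDK's built-in DOM parser. It sums the &lt;code&gt;tests&lt;/code&gt;, &lt;code&gt;failures&lt;/code&gt;, &lt;code&gt;errors&lt;/code&gt;, and &lt;code&gt;skipped&lt;/code&gt; attributes of every testsuite element in the &lt;code&gt;TEST-*.xml&lt;/code&gt; files that Maven Surefire and Gradle write. The class name is hypothetical. It also assumes their one-suite-per-file layout, so reports with nested testsuite elements would be double-counted.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.IntStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: sum the counters of every testsuite element in the TEST-*.xml files
// that Maven Surefire and Gradle write. Class and method names are hypothetical.
public class JUnitSummary {

    // Running totals shared across all report files.
    static final class Totals {
        int tests;
        int failures;
        int errors;
        int skipped;

        void add(Element suite) {
            tests += count(suite, "tests");
            failures += count(suite, "failures");
            errors += count(suite, "errors");
            skipped += count(suite, "skipped");
        }

        // getAttribute returns "" when the attribute is missing; treat that as zero.
        private static int count(Element suite, String attribute) {
            String value = suite.getAttribute(attribute).trim();
            return value.isEmpty() ? 0 : Integer.parseInt(value);
        }

        @Override
        public String toString() {
            return "tests=" + tests + " failures=" + failures
                    + " errors=" + errors + " skipped=" + skipped;
        }
    }

    static Totals summarize(Path reportDir) throws Exception {
        var factory = DocumentBuilderFactory.newInstance();
        var totals = new Totals();
        try (var reports = Files.newDirectoryStream(reportDir, "TEST-*.xml")) {
            for (Path report : reports) {
                NodeList suites = factory.newDocumentBuilder()
                        .parse(report.toFile())
                        .getElementsByTagName("testsuite");
                IntStream.range(0, suites.getLength())
                        .mapToObj(i -&gt; (Element) suites.item(i))
                        .forEach(totals::add);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Path.of(args.length == 0 ? "target/surefire-reports" : args[0]);
        System.out.println(summarize(dir));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;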




&lt;h2&gt;
  
  
  artifact collection
&lt;/h2&gt;

&lt;p&gt;The less obvious challenge is artifact discovery.&lt;/p&gt;

&lt;p&gt;A failing Playwright run can generate output across multiple locations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;target/surefire-reports
target/failsafe-reports
build/test-results
build/reports/tests
playwright-report
test-results
screenshots
videos
traces
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build tool&lt;/li&gt;
&lt;li&gt;project structure&lt;/li&gt;
&lt;li&gt;reporting configuration&lt;/li&gt;
&lt;li&gt;team conventions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The helper currently collects only artifacts generated during the active execution window.&lt;/p&gt;

&lt;p&gt;This avoids a common CI problem where stale artifacts from previous executions are accidentally included in failure analysis.&lt;/p&gt;
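&lt;p&gt;One way to enforce an execution window is to record a timestamp before the tests start and keep only files modified at or after it. The sketch below applies that filter to the candidate directories listed earlier. The timestamp filter is an assumption about how such a window could work, not a description of the helper's internals, and the class name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Arrays;

// Hypothetical sketch: collect only artifacts written during the current run.
// Directory names come from the article's list; the modification-time filter is an
// assumption about how an "execution window" could be enforced.
public class ArtifactCollector {

    static final String[] CANDIDATE_DIRS = {
        "target/surefire-reports", "target/failsafe-reports",
        "build/test-results", "build/reports/tests",
        "playwright-report", "test-results",
        "screenshots", "videos", "traces"
    };

    static Path[] collect(Path projectDir, Instant runStart) {
        return Arrays.stream(CANDIDATE_DIRS)
                .map(projectDir::resolve)
                .filter(Files::isDirectory)
                .flatMap(dir -&gt; {
                    try {
                        return Files.walk(dir); // flatMap closes each walked stream
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .filter(Files::isRegularFile)
                // Anything last modified before the run started is treated as stale.
                .filter(file -&gt; !modifiedAt(file).isBefore(runStart))
                .toArray(Path[]::new);
    }

    private static Instant modifiedAt(Path file) {
        try {
            return Files.getLastModifiedTime(file).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Instant runStart = Instant.now(); // capture this before launching the tests
        // ... run the test command here ...
        for (Path artifact : collect(Path.of("."), runStart)) {
            System.out.println(artifact);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Some filesystems record modification times at whole-second resolution. On those, a file written right at the start of a run can look older than the cutoff, so a small safety margin on &lt;code&gt;runStart&lt;/code&gt; may be worth adding.&lt;/p&gt;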




&lt;h2&gt;
  
  
  CI sharding
&lt;/h2&gt;

&lt;p&gt;One area I wanted to support from the beginning was CI parallelization.&lt;br&gt;
The helper exports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PW_JAVA_CI_SHARD_INDEX
PW_JAVA_CI_SHARD_TOTAL
PW_JAVA_CI_WORKERS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and automatically injects equivalent parameters into Maven and Gradle executions.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;java &lt;span class="nt"&gt;-jar&lt;/span&gt; playwright-java-ci-helper.jar &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--shard-index&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--shard-total&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workers&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is to keep orchestration concerns outside the test implementation itself.&lt;/p&gt;
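&lt;p&gt;To make the split between orchestration and tests concrete, the sketch below shows both sides. On the helper side, shard settings reach the child build through the environment variables named above. On the test side, a hypothetical filter, &lt;code&gt;belongsToShard&lt;/code&gt;, decides whether a test class runs on the current shard. The 1-based index and the hash-modulo assignment are assumptions, not the helper's documented scheme.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Objects;

// Hypothetical sketch of both sides of sharding.
// Helper side: pass shard settings to the child build via the environment variables
// the helper exports. Test side: decide whether a test class belongs to this shard.
// The 1-based index and hash-modulo assignment are assumptions.
public class Sharding {

    static ProcessBuilder withShardEnv(ProcessBuilder pb, int index, int total, int workers) {
        var env = pb.environment();
        env.put("PW_JAVA_CI_SHARD_INDEX", Integer.toString(index));
        env.put("PW_JAVA_CI_SHARD_TOTAL", Integer.toString(total));
        env.put("PW_JAVA_CI_WORKERS", Integer.toString(workers));
        return pb;
    }

    // Test side: read a setting back, falling back to a single-shard default when unset.
    static int setting(String name, int fallback) {
        String value = System.getenv(name);
        return value == null ? fallback : Integer.parseInt(value);
    }

    static boolean belongsToShard(String testClass, int index, int total) {
        Objects.checkIndex(index - 1, total); // rejects an index outside 1..total
        // String.hashCode has a fixed, documented formula, so every CI job computes the same split.
        return Math.floorMod(testClass.hashCode(), total) == index - 1;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withShardEnv(new ProcessBuilder("mvn", "test"), 2, 4, 3);
        System.out.println(pb.environment().get("PW_JAVA_CI_SHARD_TOTAL"));

        int index = setting("PW_JAVA_CI_SHARD_INDEX", 1);
        int total = setting("PW_JAVA_CI_SHARD_TOTAL", 1);
        System.out.println(belongsToShard("com.example.LoginTest", index, total));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Hash-based assignment needs no coordination between CI jobs, but it does not balance shards by duration. A timing-aware split would need historical run data.&lt;/p&gt;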




&lt;h2&gt;
  
  
  machine-readable failure context
&lt;/h2&gt;

&lt;p&gt;The part I find most interesting isn't the reporting itself.&lt;br&gt;
It's creating a deterministic interface between CI systems and automated tooling.&lt;br&gt;
Today, many teams experimenting with agents and AI-assisted debugging still hand those tools large amounts of raw information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thousands of log lines
screenshots
reports
traces
console output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The approach works, but it scales poorly.&lt;br&gt;
As more platforms move toward usage-based API billing, context size becomes an engineering concern rather than just an implementation detail.&lt;/p&gt;

&lt;p&gt;Instead of sending:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4000+ lines of CI logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;a tool can provide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;182&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failures"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screenshots"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"traces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failedTests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal isn't only to improve signal quality.&lt;br&gt;
The goal is to reduce the amount of context required for an agent to reason about a failure.&lt;/p&gt;

&lt;p&gt;This becomes increasingly important when traces, screenshots, reports, and execution logs start accumulating across hundreds or thousands of CI runs.&lt;/p&gt;

&lt;p&gt;I suspect we'll see more tooling move in this direction as agents become part of the standard engineering workflow.&lt;/p&gt;
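&lt;p&gt;As a sketch of how small such an interface can be, the code below renders a summary shaped like the JSON example above using only the JDK. The field names mirror that example. The class name, the method signature, and the hand-rolled string escaping (used here to avoid a JSON library dependency) are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: render a compact failure summary shaped like the JSON example,
// without a JSON library. Field names mirror the example; everything else is assumed.
public class FailureSummary {

    static String toJson(int tests, int failures, int screenshots, int traces, String... failedTests) {
        String failed = Arrays.stream(failedTests)
                .map(FailureSummary::quote)
                .collect(Collectors.joining(",", "[", "]"));
        // Locale.ROOT keeps digits ASCII regardless of the CI agent's default locale.
        return String.format(Locale.ROOT,
                "{\"tests\":%d,\"failures\":%d,\"screenshots\":%d,\"traces\":%d,\"failedTests\":%s}",
                tests, failures, screenshots, traces, failed);
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (Character.isISOControl(c)) {
                out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // Hypothetical failing test names, for illustration only.
        System.out.println(toJson(182, 3, 3, 3,
                "LoginTest.invalidPassword", "CartTest.checkout", "SearchTest.emptyQuery"));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A JSON library such as Jackson would handle escaping more robustly. Hand-rolling it mainly makes sense for a tool that wants zero runtime dependencies.&lt;/p&gt;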


&lt;h2&gt;
  
  
  generating playwright java skeletons
&lt;/h2&gt;

&lt;p&gt;I've also been experimenting with generating Playwright Java test skeletons from browser interaction flows and agent command scripts.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;playwright-cli open https://demo.playwright.dev/todomvc
playwright-cli type "Buy groceries"
playwright-cli press Enter
playwright-cli screenshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;can be transformed into a Java test template.&lt;br&gt;
One interesting limitation is locator generation.&lt;/p&gt;

&lt;p&gt;Element references produced by the agent, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;e21
e37
e42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;cannot safely be translated into stable Playwright locators.&lt;br&gt;
The generated code compiles, but locator selection remains a human responsibility.&lt;/p&gt;

&lt;p&gt;At least for now, a human-in-the-loop approach feels significantly more realistic than fully autonomous test generation.&lt;/p&gt;
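&lt;p&gt;The translation step can be sketched as a line-by-line mapping. The version below handles only the four commands from the example above. Anything else, including commands that target snapshot refs such as &lt;code&gt;e21&lt;/code&gt;, becomes a TODO comment for a human to resolve. The generated method assumes a JUnit 5 test class with a &lt;code&gt;page&lt;/code&gt; field and imports for &lt;code&gt;Page&lt;/code&gt; and &lt;code&gt;Paths&lt;/code&gt;. The generator and its names are hypothetical, and mapping &lt;code&gt;type&lt;/code&gt; to &lt;code&gt;page.keyboard().type(...)&lt;/code&gt; assumes the CLI types into the focused element.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: turn a recorded playwright-cli script into a JUnit 5 test method.
// Only the four commands from the example are mapped; anything else (for example a
// command that targets a snapshot ref such as e21) becomes a TODO for a human.
public class SkeletonGenerator {

    static String translate(String line, int step) {
        // Split into the tool name, the command, and the rest of the line.
        String[] parts = line.strip().split("\\s+", 3);
        String command = parts.length == 1 ? "" : parts[1];
        // Drop one pair of surrounding double quotes, e.g. "Buy groceries" becomes Buy groceries.
        String arg = parts.length == 3 ? parts[2].replaceAll("^\"(.*)\"$", "$1") : "";
        return switch (command) {
            case "open" -&gt; "page.navigate(" + javaString(arg) + ");";
            // Assumption: the CLI's type command types into the currently focused element.
            case "type" -&gt; "page.keyboard().type(" + javaString(arg) + ");";
            case "press" -&gt; "page.keyboard().press(" + javaString(arg) + ");";
            case "screenshot" -&gt; "page.screenshot(new Page.ScreenshotOptions().setPath(Paths.get("
                    + javaString("step-" + step + ".png") + ")));";
            default -&gt; "// TODO: translate manually and choose a stable locator: " + line.strip();
        };
    }

    // Quote a value as a Java string literal in the generated source.
    static String javaString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String generate(String script) {
        String[] lines = script.split("\\R");
        return IntStream.range(0, lines.length)
                .filter(i -&gt; !lines[i].isBlank())
                .mapToObj(i -&gt; "    " + translate(lines[i], i + 1))
                .collect(Collectors.joining("\n", "@Test\nvoid recordedFlow() {\n", "\n}\n"));
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "playwright-cli open https://demo.playwright.dev/todomvc",
                "playwright-cli type \"Buy groceries\"",
                "playwright-cli press Enter",
                "playwright-cli screenshot");
        System.out.print(generate(script));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Emitting a comment instead of a guessed selector keeps the generated file compilable while making every unresolved locator visible in code review.&lt;/p&gt;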




&lt;h2&gt;
  
  
  open questions
&lt;/h2&gt;

&lt;p&gt;Some areas I'm currently exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;should JUnit XML parsing remain framework-agnostic?&lt;/li&gt;
&lt;li&gt;or should framework-specific adapters be introduced for richer diagnostics (e.g. TestNG retries, groups and dependencies)?&lt;/li&gt;
&lt;li&gt;is artifact collection better handled through plugins than filesystem discovery?&lt;/li&gt;
&lt;li&gt;what is the smallest useful schema for agent-driven failure analysis?&lt;/li&gt;
&lt;li&gt;can locator repair be performed safely without introducing additional flakiness?&lt;/li&gt;
&lt;li&gt;how much CI context should be exposed to agents before signal becomes noise?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  next steps
&lt;/h2&gt;

&lt;p&gt;Current roadmap items include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TestNG support&lt;/li&gt;
&lt;li&gt;richer failure diagnostics&lt;/li&gt;
&lt;li&gt;AI-friendly summaries&lt;/li&gt;
&lt;li&gt;SARIF output&lt;/li&gt;
&lt;li&gt;environment validation ("doctor" command)&lt;/li&gt;
&lt;li&gt;locator repair suggestions&lt;/li&gt;
&lt;li&gt;deeper agent integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is still in its early stages, but the objective is simple:&lt;br&gt;
build better tooling to close the gap between test execution and actionable failure diagnostics for Playwright Java teams.&lt;/p&gt;

&lt;p&gt;I'm curious how other teams running Playwright Java at scale are approaching these problems.&lt;/p&gt;

</description>
      <category>ci</category>
      <category>playwright</category>
      <category>java</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
