<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sara Bezjak</title>
    <description>The latest articles on DEV Community by Sara Bezjak (@sara_bezjak).</description>
    <link>https://dev.to/sara_bezjak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3841314%2F131c4099-8b3d-474d-9357-5d72e88c9e5d.png</url>
      <title>DEV Community: Sara Bezjak</title>
      <link>https://dev.to/sara_bezjak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sara_bezjak"/>
    <language>en</language>
    <item>
      <title>A QA engineer's first AI testing project - FastAPI + local LLM + pytest</title>
      <dc:creator>Sara Bezjak</dc:creator>
      <pubDate>Fri, 24 Apr 2026 11:15:15 +0000</pubDate>
      <link>https://dev.to/sara_bezjak/a-qa-engineers-first-ai-testing-project-fastapi-local-llm-pytest-5b1c</link>
      <guid>https://dev.to/sara_bezjak/a-qa-engineers-first-ai-testing-project-fastapi-local-llm-pytest-5b1c</guid>
      <description>&lt;p&gt;I'm an automation engineer that writes mostly UI tests with some API sprinkled in. A recruiter wrote to me about an interesting job - AI/LLM testing. I was curious to learn more so I asked the model itself: what skills do I need to learn? The answer was this project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is it
&lt;/h2&gt;

&lt;p&gt;A FastAPI service with one endpoint (&lt;code&gt;/ask&lt;/code&gt;) that forwards a question to a local LLM (Ollama running llama3.2) and returns the answer. Plus a pytest suite.&lt;/p&gt;

&lt;p&gt;~90 lines of app code, 23 tests, 100% coverage, two-tier test split (fast &amp;lt;1s, full ~90s).&lt;/p&gt;

&lt;p&gt;The point was to learn what AI testing actually looks like compared to UI/API testing.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/sbezjak/llm-api-testing" rel="noopener noreferrer"&gt;https://github.com/sbezjak/llm-api-testing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One honest thing up front.&lt;/strong&gt; The suite worked first try. That made it harder to learn from, not easier — when nothing breaks, you don't have to understand it. I spent more time reading the code than I would have spent writing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Process timeline
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Read every line before running anything.&lt;/strong&gt; Docs, code, tests, setup. I wanted the big picture — classes, endpoints, test structure in my head before I touched anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Ask questions instead of copy-pasting.&lt;/strong&gt; It's easy to create something that passes. It's harder to understand why it does. I spent 2 hours just discussing the project with the model. Questions like: Why 70% and not 100%? What does &lt;code&gt;ASGITransport&lt;/code&gt; actually do? Why does &lt;code&gt;ConnectError&lt;/code&gt; map to 503 and HTTP errors to 502? Why mock at all with &lt;code&gt;respx&lt;/code&gt;? What's xfail and why is it used like this? What's temperature?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Ran it. All passed.&lt;/strong&gt; But "10 passed in 99s" wasn't enough. I wanted to see which tests hit the model, how long each took, what the model actually answered. So I added structured logging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /ask verdict=allowed status=200 elapsed=0.42s answer='Paris.'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a &lt;code&gt;pytest-html&lt;/code&gt; report with per-test captured logs. Now every test run is a document I can read.&lt;/p&gt;
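&lt;p&gt;A stdlib-only way to emit a line in that shape (field names copied from the sample above; the repo may wire it differently):&lt;/p&gt;

```python
# Minimal flat-line logging for the /ask endpoint (sketch).
import logging
import time

logger = logging.getLogger("llm_api")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_ask(verdict, status, elapsed, answer):
    # One flat line per request keeps pytest's captured-log output greppable.
    logger.info(
        "POST /ask verdict=%s status=%d elapsed=%.2fs answer=%r",
        verdict, status, elapsed, answer,
    )

start = time.monotonic()
# ... call the model here ...
log_ask("allowed", 200, time.monotonic() - start, "Paris.")
```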

&lt;p&gt;&lt;strong&gt;4. Iterate with the model.&lt;/strong&gt; Added logs, reports, comments. Asked about code I didn't understand — why something was there, what a piece did. This is where the differences between UI and AI testing started to click. Probabilistic vs deterministic. The 70% Paris case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Make it production-ish.&lt;/strong&gt; Asked how a real team would harden this. Mocking Ollama and 100% coverage were added in this step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that actually clicked — probabilistic vs deterministic
&lt;/h2&gt;

&lt;p&gt;The consistency test sends "What is the capital of France?" ten times and asserts ≥70% of answers contain "paris".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paris&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In UI testing, the same input produces the same output, so you assert on exact values: &lt;code&gt;assert button.opens_modal() == True&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;LLMs don't work like that. Same prompt, different valid answers every call — "Paris.", "The capital is Paris.", a paragraph about French geography. The model samples from a distribution. There is no single right string.&lt;/p&gt;

&lt;p&gt;So you assert on properties of the distribution, or on the envelope of acceptable answers. &lt;code&gt;assert ≥70% of answers contain "paris"&lt;/code&gt;. 70% is arbitrary — high enough to catch regressions, low enough to tolerate the model's variance. In a real system you'd tune per prompt. &lt;/p&gt;
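&lt;p&gt;One way to sanity-check a threshold like 70% is to compute how often it fails by chance. A small stdlib sketch, assuming each call independently contains "paris" with some probability:&lt;/p&gt;

```python
# Probability that a 70%-of-10 assertion fails by chance,
# assuming each call independently "hits" with probability p.
from math import comb

def p_fail(p, n=10, threshold=0.7):
    need = int(n * threshold)  # minimum hits to pass (7 of 10)
    # Binomial probability of finishing below the passing threshold.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need))

print(round(p_fail(0.95), 6))
print(round(p_fail(0.50), 3))
```

&lt;p&gt;Under these assumptions, a model that hits 95% of the time trips the gate on roughly 0.1% of runs, while one degraded to 50% gets caught over 80% of the time. That's the trade the threshold encodes.&lt;/p&gt;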

&lt;p&gt;Point vs region. Four years of UI-testing instincts took a while to shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three bugs and what they taught me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bug 1 — latency test failing at 35s.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First thought: my M1 is slow. Then I ran &lt;code&gt;ollama run llama3.2 "say hi"&lt;/code&gt; directly in the terminal — instant. So the model was fine.&lt;/p&gt;

&lt;p&gt;llama3.2 is chatty. Asking it about "string" produced an essay on null-termination and Unicode. The 35 seconds was generation time, not system latency.&lt;/p&gt;

&lt;p&gt;Fix: &lt;code&gt;"options": {"num_predict": 200}&lt;/code&gt; to cap output tokens. Warm requests dropped to 1-3 seconds.&lt;/p&gt;

&lt;p&gt;Lesson: traditional APIs return what you ask for. LLMs return what they feel like returning. Latency tests measure output length unless you constrain it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2 — coverage stuck at 85%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cause: no test exercised Ollama failure paths.&lt;/p&gt;

&lt;p&gt;Fix: three mocked tests with &lt;code&gt;respx&lt;/code&gt; — unreachable → 503, Ollama 5xx → 502, empty response → 502. Coverage hit 100%. New tests run in &amp;lt;50ms each because no real model is involved.&lt;/p&gt;
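&lt;p&gt;The repo does this with &lt;code&gt;respx&lt;/code&gt; against httpx; here is the same idea in a stdlib-only sketch, with hypothetical names (&lt;code&gt;call_ollama&lt;/code&gt;, &lt;code&gt;map_failure&lt;/code&gt;) that are not the repo's:&lt;/p&gt;

```python
# Stdlib-only sketch of the mocked failure-path idea (the repo itself
# uses respx to mock httpx; names here are hypothetical, not the repo's).
from unittest import mock

def map_failure(exc):
    # Mirror the mapping above: unreachable backend is 503,
    # a backend that answers badly is 502.
    if isinstance(exc, ConnectionError):
        return 503
    return 502

def call_ollama():
    raise NotImplementedError("stand-in for the real HTTP call")

def ask(call=call_ollama):
    try:
        call()
        return 200
    except Exception as exc:
        return map_failure(exc)

# Simulate "Ollama unreachable" without any real model or network.
down = mock.Mock(side_effect=ConnectionError("connection refused"))
assert ask(call=down) == 503

# Simulate "Ollama answered, but badly".
bad = mock.Mock(side_effect=RuntimeError("upstream 500"))
assert ask(call=bad) == 502
```

&lt;p&gt;No model, no network, so each case runs in microseconds. That's the whole point of the fast tier.&lt;/p&gt;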

&lt;p&gt;Lesson: check coverage reports. Gaps usually point at untested failure modes, not untested happy paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 3 — moderation filter false positives.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The moderation filter is a substring blocklist — a Python list of phrases like &lt;code&gt;"how to kill"&lt;/code&gt;, &lt;code&gt;"how to hack"&lt;/code&gt;, etc. Any question containing one gets refused with a 400. Simple: &lt;code&gt;"how to kill a process on linux"&lt;/code&gt; contains &lt;code&gt;"how to kill"&lt;/code&gt;, so a normal dev question gets blocked.&lt;/p&gt;

&lt;p&gt;Fix: added the false positive to the benign dataset with &lt;code&gt;pytest.mark.xfail&lt;/code&gt; and a written reason. The test now runs, fails as expected, and shows as a yellow dot in the report instead of red. Documented in the suite itself.&lt;/p&gt;
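&lt;p&gt;A condensed sketch of the pattern (the blocklist subset and test names here are illustrative, not the repo's actual dataset):&lt;/p&gt;

```python
# Substring blocklist plus an xfail-documented false positive (sketch).
import pytest

BLOCKLIST = ["how to kill", "how to hack"]  # illustrative subset

def is_blocked(question):
    q = question.lower()
    return any(phrase in q for phrase in BLOCKLIST)

BENIGN = [
    "what is the capital of france",
    pytest.param(
        "how to kill a process on linux",
        marks=pytest.mark.xfail(
            reason="substring blocklist false positive; fix: intent classifier",
            strict=False,
        ),
    ),
]

@pytest.mark.parametrize("question", BENIGN)
def test_benign_questions_not_blocked(question):
    assert not is_blocked(question)
```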

&lt;p&gt;It flips to green the day the substring is replaced with a real classifier — a model that understands &lt;em&gt;intent&lt;/em&gt; ("is this user actually trying to cause harm?") instead of just matching strings. That could be a small fine-tuned model, an open-source moderation model like Llama Guard, or a commercial moderation API. The upgrade closes the false-positive gap, the test starts passing, and &lt;code&gt;xfail(strict=False)&lt;/code&gt; signals "unexpectedly passed" — the cue to remove the marker.&lt;/p&gt;

&lt;p&gt;Lesson: xfail makes the suite record what's broken, not just what works. I'd only used xfail for flaky tests before, not as living documentation of known bugs. Much better than hiding a bug in a backlog ticket.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I still don't fully understand
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The ASGI internals &lt;code&gt;ASGITransport&lt;/code&gt; relies on. I know what it does, not what's happening inside.&lt;/li&gt;
&lt;li&gt;When &lt;code&gt;respx&lt;/code&gt; is the right call vs building a proper fake.&lt;/li&gt;
&lt;li&gt;Embedding similarity math beyond "cosine measures angle."&lt;/li&gt;
&lt;li&gt;What a real production eval harness looks like.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From a QA perspective
&lt;/h2&gt;

&lt;p&gt;Most UI-testing instincts didn't transfer. Equality assertions, fixed latency thresholds, asserting a single correct outcome — all had to shift.&lt;/p&gt;

&lt;p&gt;What did transfer: discipline around edge cases, thinking about what happens when the upstream service dies, care about keeping the feedback loop fast, coverage reports.&lt;/p&gt;

&lt;p&gt;Setting up a local model was new. Using it as a dependency in a test suite was new. Testing something that returns different valid outputs every call was new. If you're a QA engineer looking in this direction — the probability side is the new thing. The rest is still testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to run it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install &amp;amp; start Ollama&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama
ollama serve             &lt;span class="c"&gt;# leave running in its own terminal&lt;/span&gt;
ollama pull llama3.2     &lt;span class="c"&gt;# in another terminal&lt;/span&gt;

&lt;span class="c"&gt;# Python env&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Run the API&lt;/span&gt;
uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;span class="c"&gt;# → http://localhost:8000/docs&lt;/span&gt;

&lt;span class="c"&gt;# Tests&lt;/span&gt;
pytest &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"not ollama"&lt;/span&gt;   &lt;span class="c"&gt;# fast tier, no Ollama needed, ~1s&lt;/span&gt;
pytest                   &lt;span class="c"&gt;# full suite with HTML reports&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
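&lt;p&gt;One detail the fast tier relies on: the &lt;code&gt;ollama&lt;/code&gt; marker has to be registered, typically in pytest.ini or pyproject.toml. A sketch (the repo's actual config may differ):&lt;/p&gt;

```ini
# pytest.ini (sketch): register the marker so -m "not ollama" is
# deliberate selection, not a silent typo.
[pytest]
markers =
    ollama: tests that call the real local model (slow tier)
```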



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When you're testing robustness (did the system stay well-behaved?) instead of correctness (did the right thing happen?), you assert the shape of acceptable failure, not the shape of success. AI systems fail in more ways, so the distinction matters more — a 500 is always a bug; anything else might be correct behavior for an edge case.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/sbezjak/llm-api-testing" rel="noopener noreferrer"&gt;https://github.com/sbezjak/llm-api-testing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next up — 5 more projects on the list: eval harness, RAG with observability, red-team suite, agent testing, model benchmarking. Writing each one up as I go.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>python</category>
      <category>qa</category>
    </item>
    <item>
      <title>AI Tools for Existing Playwright + Pytest Frameworks: What Actually Works</title>
      <dc:creator>Sara Bezjak</dc:creator>
      <pubDate>Thu, 26 Mar 2026 16:02:24 +0000</pubDate>
      <link>https://dev.to/sara_bezjak/ai-tools-for-existing-playwright-pytest-frameworks-what-actually-works-3jen</link>
      <guid>https://dev.to/sara_bezjak/ai-tools-for-existing-playwright-pytest-frameworks-what-actually-works-3jen</guid>
      <description>&lt;h2&gt;
  
  
  Purpose
&lt;/h2&gt;

&lt;p&gt;Research and evaluate AI-powered tools and workflows to improve test automation efficiency, specifically for test creation speed and reducing maintenance time when UI or business flows change. Focus on tools compatible with an existing Playwright + pytest (Python) stack and IntelliJ IDE.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Workflow &amp;amp; Pain Points
&lt;/h2&gt;

&lt;p&gt;The two primary pain points in test automation are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating new tests:&lt;/strong&gt; Requires manually assembling context (page objects, fixture patterns, example tests) and writing tests that match existing conventions. The copy-paste workflow works but is slow and repetitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updating tests when UI or flows change:&lt;/strong&gt; When the product changes, tests break. Diagnosing which tests are affected, understanding what changed, and fixing them to match the new behavior consumes significant time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools Evaluated
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Code (Anthropic) — Recommended
&lt;/h3&gt;

&lt;p&gt;Claude Code is a terminal-based AI coding assistant that works with your entire codebase as context. It integrates with IntelliJ via a plugin (currently in beta) and can read, generate, and modify files directly in the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Works in IntelliJ via plugin or integrated terminal. No IDE switch required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reads the full repository (page objects, fixtures, test files), so generated code matches existing patterns and conventions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supports a CLAUDE.md configuration file in the project root that defines framework conventions, naming patterns, fixture usage, and domain context. This keeps output framework-specific rather than generic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Suggests changes via IntelliJ's native diff viewer, making review and approval straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shares IDE diagnostics (lint errors, syntax issues) automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Available on Pro plan ($20/month), which is sufficient for regular usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
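&lt;p&gt;For a stack like this, a CLAUDE.md might look roughly as follows. The sections and paths below are illustrative, not the report's actual file:&lt;/p&gt;

```markdown
# CLAUDE.md (illustrative skeleton)

## Stack
- Playwright + pytest (Python), page object model under pages/

## Conventions
- Tests live in tests/, named test_feature_scenario.py
- Always use existing fixtures from conftest.py; never create
  ad-hoc browser contexts
- Selectors belong in page objects, never inline in tests

## Domain notes
- Billing flows require an authenticated-user fixture
```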

&lt;p&gt;&lt;strong&gt;Used for:&lt;/strong&gt; Generated a change billing test using Claude Code with full project context. The output followed existing page object patterns, used the correct fixtures, and required minimal manual adjustment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Playwright MCP (Model Context Protocol)
&lt;/h3&gt;

&lt;p&gt;Playwright MCP is a server that gives AI tools live browser access. Instead of manually inspecting the DOM for selectors or using codegen tools, Claude Code can navigate the application, interact with elements, and read the actual page structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful for:&lt;/strong&gt; Discovering selectors on new or changed pages without manually opening DevTools / Codegen. Especially valuable when new UI elements are added as part of feature changes. Requires guidance on which flow to walk through (natural language instructions).&lt;/p&gt;
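&lt;p&gt;Setup is small. The command shape below is taken from the Playwright MCP README at the time of writing; verify against the current docs before relying on it:&lt;/p&gt;

```shell
# Register the Playwright MCP server with Claude Code
# (verify flags against the current @playwright/mcp README).
claude mcp add playwright npx @playwright/mcp@latest
claude mcp list   # confirm the server is registered
```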

&lt;h3&gt;
  
  
  Playwright Agents (Planner / Generator / Healer) — Not Compatible Yet
&lt;/h3&gt;

&lt;p&gt;Playwright v1.56 introduced three AI agents that can generate test plans, create test code, and automatically fix broken tests. The Healer agent is particularly interesting for maintenance. It replays failing tests, inspects the live UI, and patches selectors or waits.&lt;/p&gt;

&lt;p&gt;However, these agents currently only support TypeScript/JavaScript. There is an open feature request for Python support but no timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor — Viable Alternative
&lt;/h3&gt;

&lt;p&gt;Cursor is an AI-powered IDE (VS Code-based) that provides full codebase context and inline AI editing. Comparable to Claude Code in capabilities for test generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantage:&lt;/strong&gt; Requires switching from IntelliJ to a VS Code-based editor, which means losing existing IDE configuration, shortcuts, and debugging setup. The functionality overlap with Claude Code did not justify the migration cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform-Based Tools (Testim, Mabl, Katalon, ContextQA)
&lt;/h3&gt;

&lt;p&gt;These are full test automation platforms with AI features including self-healing selectors, test generation from natural language, and visual test builders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not recommended because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They require adopting their platform and abandoning your existing framework.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generated test code is generic and does not match existing page object structure, fixture patterns, or naming conventions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You lose domain-specific knowledge already embedded in your current test suite.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrating away from a platform later is expensive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Qase Aiden
&lt;/h3&gt;

&lt;p&gt;Evaluated previously and joined a live demo call. Generates test code but it is generic and does not adapt to codebase patterns. Same limitation as the platform tools above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Completed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Installed Claude Code CLI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up Playwright MCP server for live browser access during test creation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created CLAUDE.md in project root with framework conventions, project structure, page object patterns, fixture descriptions, test naming conventions, and domain context&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Successfully generated a test using Claude Code with full project context — output matched existing framework patterns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Continue using Claude Code for upcoming test generation (simple vs complex tests and comparison between them)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Claude Code for upcoming test maintenance and updates to measure time savings vs manual approach&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continue monitoring Playwright Agents for Python support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Research and write about JavaScript agent healers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;After evaluating the available tools, the best results came from bringing AI into the existing codebase rather than switching to a new platform. The CLAUDE.md file made the biggest difference — once the framework conventions were clearly described, the generated code matched existing patterns consistently. There's a clear improvement in speed for both test creation and maintenance, but it still requires human guidance, architectural thinking, and review. It's a powerful assistant, not a replacement, but one wonders what else it will be capable of in the future.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm a solo QA automation engineer and founder based in Slovenia. I build test frameworks, evaluate tooling, and write about what actually works in QA. Find me on &lt;a href="https://www.linkedin.com/in/sara-bezjak/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
