DEV Community: Nevena

Why the Design File No Longer Makes Sense

Nevena — Wed, 06 May 2026 08:24:33 +0000

For years, the design-to-code workflow has relied on a quiet assumption: design lives somewhere else, a separate file, a separate platform, and gets translated, by hand, into what actually ships.

Guzman Iglesias, Senior designer at DataArt, challenges that assumption. Tools like pencil.dev don’t just improve the bridge between design and engineering. They raise a more direct question: should that gap exist at all?

Introduction

AI-generated design is no longer impressive. It's expected. What matters now is everything around it: how responsibilities shift, how craft gets redefined, and how tools like pencil.dev change the way digital products get built.

Closing the Design-Code Gap

This convergence didn't happen overnight. It's the result of a steady realignment between design and engineering. Figma signaled the shift early, adopting vocabulary long familiar almost exclusively to developers: design tokens, component systems, variables with conditionals, props, auto-layout, branches. The innovation was to adopt the devs’ language. The vocabulary changed first. The workflow followed.

Handoff improved next. Figma absorbed Zeplin's core use case by letting developers inspect designs, copy CSS values, and export assets without ever leaving the file. Dev Mode pushed it further, turning the design file into a live technical specification. The implication was clear: the Figma file shouldn't just describe the product. It should define it.

The Translation Problem

Even so, handoff is still a symptom. It reveals that design and code remain separate systems, connected through conventions and manual effort. The entire workflow rests on an unquestioned assumption: the design lives in an independent document that is eventually translated into code. That separation comes at a cost. Translation has always been expensive, lossy, and a consistent source of friction.

Recent updates pushed further. Figma's MCP allows AI agents to read design files and generate code directly. But automating a broken process doesn't fix the underlying model: there's still a separate document that needs to be interpreted and converted.

Figma Canvas has since gone even further: a new use_figma write tool lets agents create components, define variables, and modify the canvas directly. generate_figma_design can convert a live UI into editable design layers. The loop is now genuinely bidirectional. But the model hasn’t changed. The Figma file still lives outside the repository. Every sync, in either direction, is still an explicit, manual, or agentic operation. Design and code can talk to each other. They just don't live together.

So the real question is: why maintain design in a separate platform when the product itself lives in the repository?

.pen

A .pen file, the core format behind pencil.dev, is just JSON. It lives alongside the code, versioned with Git. It doesn't require export, sync, or a subscription to an external platform to stay current. The design stops being a parallel artifact that someone has to maintain separately and becomes just another file in the project, with the same history, branches, and review process.

That's the real difference from Figma. A Figma file is still a vector document that needs to be translated. A .pen file is closer to code than it is to a mockup. Handoff doesn’t disappear. It becomes unnecessary. The AI has simultaneous access to both the design and the codebase in the same repository, making inconsistencies visible and fixable in the same place without context switching.

Collaboration: Pencil's Weak Spot

Figma's real strength wasn't just the design tools. It was also the collaboration layer. Live cursors, contextual comments, multiple people in the same file at the same time. The Google Docs model applied to design. For mixed teams that include non-technical stakeholders, that's not a nice-to-have. It's the whole workflow.

pencil.dev takes a different approach. Its collaboration model is Git's: branches, commits, pull requests. Asynchronous, structured, and built for developers.

For some teams, this works. A two-person startup with one technical designer and one engineer can move faster without Figma's overhead. No client in the loop, no stakeholder reviews. A design-engineering team already used to reviewing UI in pull requests loses almost nothing.

For others, it's a blocker. An agency presenting work to clients who need to comment on a mockup without touching code has no viable path here. A product team where the PM reviews flows in Figma before anything reaches engineering loses a critical checkpoint. A large org where design is a separate function with its own review cadence would need to renegotiate how decisions get made before switching tools.

This isn't a question of which model is better. It's whether a team can absorb the collaboration tradeoff without disrupting how they already work.

Governance: Who Owns the Design?

When design and code become the same artifact, ownership gets blurry.

A developer fixes a bug, tweaks some padding, and merges. A week later, a designer notices the spacing is off. In Figma, that would have been a comment before merge. In a Git-based model, it's a regression discovered after the fact because no one was looking at the .pen diff with a designer's eye.

This is where the Design Engineer gets critical. Someone who understands both domains. Not a new role —pencil.dev doesn't create it— but a more central one. The Design Engineer stops being a specialty hire and becomes the natural reference point for how the team builds.

The practical answer is structural: require one as a reviewer on any PR that touches .pen files. Not optional. Required. Merge rights on visual changes shouldn't default to whoever approves the code. Teams that skip this step don't eliminate the design review; they just move it downstream.

Merging design and code removes the implicit agreements that tools used to enforce. Teams now need to define them explicitly:

Who has a final say over the design?
Who approves a visual change in a pull request?
Who's responsible when design and implementation diverge?

The tool used to answer these questions by default. Now the team does.

Conclusion

The Design Engineer has always been the workaround for a fragmented model. pencil.dev doesn't improve that bridge. It removes the river. It won't fit every team yet. But it reframes the problem in a way that's hard to ignore: If the product lives in the repository, why are we still maintaining the design somewhere else?

*The article was initially published on DataArt's Team blog.

Databricks Data Engineer Associate Exam: What Changed in 2025

Nevena — Mon, 04 May 2026 09:25:29 +0000

Earning the Databricks Data Engineer Associate certification was a valuable milestone for DataArt's Data Engineer, Milka Kutseva. It strengthened her understanding of how Databricks operates in real production environments and improved her confidence when working with data pipelines and Spark.

In this article, Milka breaks down what the exam covers, what changed in the July 2025 update, how she prepared, and the practical tips that proved most useful, along with links to them.

What Is the Databricks Data Engineer Associate Certification?

The Databricks Data Engineer Associate exam evaluates your ability to work with the Databricks platform in real-world scenarios. It focuses on practical skills such as:

Building data pipelines
Working with Delta tables
Using Spark SQL and PySpark
Orchestrating workflows

The certification is beginner‑friendly but hands‑on. Instead of testing abstract theory, it emphasizes applied knowledge and everyday operational tasks.

Basic exam details:

45 multiple-choice questions
90 minutes
Online, proctored format
Focus on real scenarios and practical knowledge

For the most accurate and up-to-date requirements, always refer to the official Databricks website.

What Changed in the July 2025 Exam Update?

In July 2025, Databricks updated the exam content. The new version reflects current platform usage and removes several outdated concepts.

Key changes include:

A stronger emphasis on the Databricks Data Intelligence Platform, beyond the traditional Lakehouse model
More scenario-based questions focused on decision-making and problem-solving
Expanded coverage of modern platform features, including: Delta Live Tables (DLT), Unity Catalog and data governance, Delta Sharing, Serverless compute, and Performance optimization and pipeline orchestration

Because of these changes, older learning materials and practice tests may no longer align with the new exam format. Using updated resources is essential.

How I Prepared for the Exam

Here is the approach that worked best for me.

1. Build a strong foundation with structured online courses

I began with Udemy courses that cover Databricks fundamentals and core data engineering concepts.

Courses I recommend:

What made this step effective:

The content is structured and easy to follow
You get a clear overview of the key topics
Most courses include quizzes to check your understanding

This gave me a solid base before moving into intensive practice.

2. Practice with exam-style questions

After completing the theory, I focused heavily on practice sets.

Resources I used:

I also searched YouTube and filtered results by upload date, since many new practice questions and explanations that match the latest exam format have been shared.

My approach was simple and consistent:

Solve as many questions as possible
Review every wrong answer
Go back to the topic and make sure I understand it

This stage was critical because it showed what the exam really tests and which areas require deeper understanding. [needed to strengthen.]

How Much Time Did Preparation Take

Since I was working on a project at the same time, the preparation took more effort and time. In total, it took me about 4–6 weeks with around 1–2 hours of study per day. If someone can fully focus on the certification (for example, while on the bench), the preparation time can be much shorter.

Tips That Made the Difference

Here are a few things that helped me perform better on exam day:

Always check the official exam page before you start studying
Get hands-on experience with Databricks, even with small examples
Focus on understanding concepts, not memorizing answers
Practice under time pressure to get used to the exam pace
Learn from your mistakes; they teach you the most

Final Thoughts

The Databricks Data Engineer Associate certification is a solid entry point if you want to strengthen your data engineering skills and work more confidently with Databricks.

With the July 2025 update, the exam became more practical and closer to real work scenarios. If you follow a structured study plan, use current resources, and practice consistently, you'll be well prepared.

Stay focused, keep practicing, and approach the exam with confidence. With steady learning and practice, the exam is absolutely achievable. Good luck… and don't stress!

*The article was initially published on DataArt's Team blog.

Where to Start in Data Science: Free Courses from IBM, Google, and Universities

Nevena — Mon, 27 Apr 2026 08:26:45 +0000

Curious about Data Science or considering a career in the field? DataArt curated a list of free online courses from leading global companies and universities. You can audit the content at no cost.

Data Science Foundations by IBM

Duration: 6 months, 3–6 hours per week
Certificate + practical assignments with feedback: $356

Designed for complete beginners, this program requires no programming experience, just a computer and basic digital skills. You will get acquainted with data processing tools and learn how to use some of them, as well as understand the underlying methodologies. You’ll learn how data professionals think, write SQL queries for databases, and master the concepts of relational databases.

The program consists of four courses:

For free access, go to each course page and select the 'Audit Track' option. After completing the program, check out the IBM Data Science program for a deeper dive into Data Science.

Data Science Fundamentals by the University of California

Duration: 4 months, 1 hour per week
Certificate + practical assignments with feedback: $49 per month / $399 per year (Coursera subscription)

This program covers key Data Science concepts, including analytics taxonomy, data mining processes, and diagnostics. You'll explore methods like data engineering, statistical modeling, and machine learning, and learn how to apply them to solve business problems.

Courses included:

To access the theoretical parts for free, visit each course page, click “Enroll for Free,” then select “Audit” at the bottom of the pop-up window.

Data Science (with R) by Harvard University

Duration: 66 weeks
Certificate + practical assignments with feedback: $1,332

This program teaches you how to solve data analysis problems using R. You'll learn basic programming, data visualization with ggplot2, and data wrangling with dplyr. It also covers machine learning, key statistical concepts, and tools like Unix/Linux, Git, GitHub, and RStudio. Real-world case studies help reinforce core Data Science principles.

The program includes nine courses:

Free access: Choose the 'Audit Track' option on each course page.

Python Data Science by IBM

Duration: 6 months, 3–5 hours per week
Certificate + practical assignments with feedback: $517

This five-course program focuses on a career in data science and machine learning. You'll start with Python, then learn data analysis, visualization, and machine learning. The program is hands-on and job-focused, using real-world datasets and industry-standard tools, including Jupyter Notebooks on IBM Cloud.

You'll work with popular Python libraries: pandas, numpy, matplotlib, seaborn, folium, scipy, scikit-learn, and more. By the end, you'll be ready to tackle real data science challenges.

The program includes six courses:

To access the courses for free, go to each course page and select the 'Audit Track' option.

Code Free Data Science by the University of California

Duration: 1 week, 10 hours
Certificate + practical assignments with feedback: $49

A short, focused course for those who want to understand predictive modeling without programming. You'll learn how to analyze trends, evaluate results, and build models using the KNIME Analytics platform. It covers machine learning methods and helps you discover patterns and relationships in data (no coding required).

Google Data Analytics by Google

Duration: 6 months, 10 hours per week
Certificate + practical assignments with feedback: $49 per month / $399 per year (Coursera subscription)

This course introduces data analytics fundamentals, including collecting, organizing, and interpreting data to support informed decision-making. Guided by Google experts, you'll learn data cleaning, problem-solving, critical thinking, and visualization. It also explores data ethics, analytical tools, and real-world scenarios to help you prepare for roles like Junior Data Analyst or database administrator.

8 courses included:

To access the theoretical parts for free, visit each course page, click "Enroll for Free" and then "Audit" at the bottom of the pop-up window.

Data Science Ethics by the University of Michigan

Duration: 1 week at 10 hours a week
Certificate + practical assignments with feedback: $49

This course explores the ethical aspects of data science, including privacy, data collection, and the impact of big data. You'll learn about fairness, transparency, and the importance of user consent, especially when working with metadata and AI systems. Topics include data ownership, responsible data use, and informed consent.

Final Thoughts

Data Science skills are in demand across various industries, from technology and finance to healthcare and media. These courses offer a low-risk opportunity to explore the field, build foundational knowledge, and decide where to go deeper. Starting with an audit track lets you learn at your own pace before committing to a certificate or specialization. Start learning now!

*The article was initially published on DataArt's Team blog.

Stop Fixing Kubectl Typos: Let an AI Agent Handle It

Nevena — Mon, 20 Apr 2026 09:54:05 +0000

What happens if you let an AI read your Kubernetes docs and actually run the commands? Eugene Kiselev, an engineer at DataArt, tried a small experiment: an AI agent scans a messy k8s lab, extracts commands, runs them in a real cluster, fixes errors, and rewrites the docs. The result shows how small and large models behave in real conditions, where they fail, and how a tiny agent can act like a junior engineer keeping labs clean and working.

Lab Environment

To test the idea, I acted as a k8s instructor and created a set of simple labs covering core tasks: creating, scaling, and exposing deployments. Each lab includes up to 20 commands with deliberate typos, wrong flags, labels, and namespaces.

K8s labs are well-suited for this kind of experiment. The environment is safe, so if something goes wrong, I just delete the K3D cluster and start over in under a minute.

Architecture

A small LLM isn't reliable enough to handle the full workflow. The tool is split into components:

Command extraction
Syntax validation
Execution in k8s
Stderr analisys
Iterative fixing based on the results

Beyond syntax errors, the agent also checks that the lab remains logically correct.

Approach and Constraints

No CrewAi or LangChain here. I drew inspiration from articles like Docker in Bash and Docker in Ruby, where engineers took a simple approach to rebuilding enterprise tools. I kept that same spirit here, using plain prompts, Python, and local models.

And yes, I know that Cursor, Antigravity, and Claude can do this. But for me, this is a fun experiment, and the goal is to integrate the tool into a pipeline, not an IDE.

Extraction: Small Ollama-Hosted Models

I started small with Gemma 3:1B, a model running on Ollama on my MacBook. The goal: parse the lab's commands and see what happens. I wrote a simple extraction prompt and ran the extractor against the smallest model. It was fast and cheap, and my MacBook stayed cool. Here are the results from three runs.

====================================================

1 | 10 | True | True | 7.92 | 583

2 | 9 | True | True | 6.75 | 544

3 | 8 | False | True | 7.06 | 575

====================================================

The results were messy. With this level of determinism, prompt-based fixing isn't reliable. We'll need some help from Qwen8b. Unfortunately, Qwen just cooked my MacBook for seven minutes and didn't give me anything useful.

On to Gemma 3:4B. The difference is like night and day: the model identified 16 out of 16 commands across multiple iterations with no syntax errors. It was slower (about 27 seconds instead of 7). Still, the 4B version handled tricky tasks that the smaller model missed, such as escaping nested JSON strings in kubectl patch commands and spotting environment variable dependencies like export NODE_PORT. And this was without adjusting temperature or top_k settings.

====================================================

1 | 16 | True | True | 28.51 | 1237

2 | 16 | True | True | 27.44 | 1202

3 | 16 | True | True | 27.18 | 1183

==================================================

Cool? Yes, kind of.

The next day, I tried extraction again with a tougher prompt, and the model struggled. It was still consistent, but I only got 12 out of 16 right. LLMs, like people, have good and bad days. Keep that in mind!

Let’s set the temperature to 0, to make it less "creative”.

 "options": {
               "temperature": 0.0,
               "num_predict": 2048,
               "top_k": 1,
               "top_p": 0.0,

After I shortened the output and switched to one-shot parsing, the results became consistent again.

=====================================================

1 | 16 | True | True | 29.72 | 1310

2 | 16 | True | True | 27.95 | 1310

3 | 16 | True | True | 27.94 | 1310

===================================================

So, 4b looks like it can work grep | wc -l, and gather the commands from the lab. But is it doing the job correctly?

When I introduced intentional errors like depolments and -o wede, the model fixed the typos on its own and returned the correct commands. That’s impressive, but it doesn’t solve the documentation validation problem.

Giving the model a clear instruction in the prompt helped.

1. **LITERAL COMMAND EXTRACTION**: You are a copy-paste robot. Extract commands EXACTLY as they appear, including all typos. DO NOT FIX THEM.

Did it actually work? Since the model is deterministic, I ran it a few more times to check.

Lab 1

Target (Typo/Error in Lab)	Run 1 Status	Run 2 Status	Run 3 Status	Verdict
sca.le (dot typo)	FAIL (ID missed)	SUCCESS (preserved)	SUCCESS (preserved)	Unstable focus
depolyment (typo)	SUCCESS (preserved)	SUCCESS (preserved)	SUCCESS (preserved)	Stable (expected behavior)
-o json_path (underscore)	FAIL (fixed to jsonpath)	SUCCESS (preserved)	SUCCESS (preserved)	Inconsistent auto-fixing
items[0] → items[*]	FAIL (corrupted)	FAIL (corrupted)	FAIL (corrupted)	Systemic Hallucination
nginx-deployments (plural)	SUCCESS (preserved)	SUCCESS (preserved)	SUCCESS (preserved)	Stable (expected behavior)

Lab 2

Feature / Target	Run 1 Status	Run 2 Status	Run 3 Status	Verdict
crete (typo)	SUCCESS (preserved)	SUCCESS (preserved)	SUCCESS (preserved)	Perfect literal extraction
deploymenst (typo)	SUCCESS (preserved)	SUCCESS (preserved)	SUCCESS (preserved)	Perfect literal extraction
Environment Export	FAIL (skipped) missed export command	FAIL (skipped) missed export command	FAIL (skipped) missed export command	Systemic Context Loss
Multi-command Blocks	FAIL (filtered) missed two commands	FAIL (filtered) missed two commands	FAIL (filtered) missed two commands	Semantic Filtering

Lab2 is a bit larger and, surprisingly, more stable, but still not quite there. It deterministically drops commands from multicommand blocks, picking only what it thinks is most important, and keeps only the kubectl part from commands like export XXX | kubectl YYY.

Extraction: Big Models

Poor extraction hurts the process, no matter how good the next model is. Even clear instructions can be missed by small models. Time to bring in the big guns: Claude or Gemini.

Gemini3-flash-preview from AI Studio gave good results and parsed everything correctly. It followed the prompt and kept the output as-is.

ID	Command Extracted	Status	Extraction Strategy	Logic Preservation
1–2	cluster-info, get nodes	SUCCESS	Atomic: Split one block into 2 entries.	High
3–5	crete, deploymenst, get pods	SUCCESS	Literal: Preserved typos crete and deploymenst.	High
8	jsonpath='{.spec.type}'	SUCCESS	Exact: No attempt to "fix" syntax.	High
12	export NODE_PORT=$(...)	SUCCESS	Contextual: Recognized export as a vital command.	Perfect
13	curl -I ...$NODE_PORT	SUCCESS	Dependency: Kept the variable usage intact.	Perfect
14	get pods ... -o wede	SUCCESS	Literal: Preserved wede typo.	High
15–16	get svc, get endpoints	SUCCESS	Atomic: Did not filter out "verification" steps.	High

Here's the main point: a small model without fine-tuning or clear instructions is a poor parser. It ignores instructions and outputs what's statistically likely rather than what's actually in the text. Even if it knows kubectl syntax well, it's not a reliable extractor. For better results, use a larger model. Gemini-flash is fast, cheap, and good enough for this. The task doesn’t require deep reasoning; just a model that can follow instructions over a medium-sized context.

Syntax Validation: Small Ollama-hosted Models

Now it gets interesting. We have a JSON file from Gemini3 with the extracted commands. Let's run syntax checks and see if the model can spot simple typos and handle tougher cases like wrong labels or missing namespaces.

A simple one-shot attempt showed that the model is good at finding and fixing typos:

"fix": "kubectl create deployment nginx-demo --image=nginx:stable",
 "reason": "Typo: 'crete' should be 'create'",

and

"fix": "kubectl get deployment nginx-demo",
"reason": "Typo: 'deploymenst' should be 'deployment'",

That’s exactly what I wanted. But the results aren’t consistent across all runs and error types. Here’s what happened in a second test:

Error Category	Original Input	Model Response	Result	Analysis
Command Typo	crete deployment	crete deployment	FAILED / PARTIAL	Identified as INVALID, but the fix was identical to the error.
Resource Typo	get deploymenst	get deployment	PASSED	Correctly identified and fixed the typo.
Alias Handling	patch svc	VALID	PASSED	Recognized svc as a legitimate alias for service.
Flag Value	--replicas=-1	--replicas=1	FAILED (Over-fix)	Performed a logical fix instead of a syntax audit.
Output Format	-o wede	(Skipped / Ignored)	FAILED	Completely missed the typo in the output flag.

I have intentionally added a specific case to test model limitations. This is what I have in the test lab:

"kubectl create deployment fail-demo --image=nginx --replicas=-1",

It has replicas -1, which is obviously incorrect, and the model is smart enough to see this. But it doesn't know the correct value: 1, 2, or 11. The model should flag it, not fix it. Gemma corrected it to 1 anyway. The command works, but that's not the point.

When I changed the prompt to separate syntax fixes from semantic ones, the results became inconsistent. The model can’t reliably distinguish between them.

Small models try to help even when it's not needed, a bit like our helpful relatives. We can’t trust their guesses, so it’s time to use larger models again.

Syntax Validation: Big Models

When asked to correct syntax, the big model found typos during extraction without any specific prompting. In the expected outcome field, it is already noted:

"expected_outcome": "Attempts to create a deployment; will fail due to typo 'crete'."

In other words, it extracts and validates syntax simultaneously, something larger models do well.

This changes the setup. Instead of using several validators, we can let Gemini extract and validate, then run the commands in the cluster and check the results. For labs in the 5–8K token range, big models handle this just fine.

We’re going from complicated:

to simple:

The updated output includes intent, suggested fix, and original text for full context:

The model successfully grabbed the commands, suggested fixes, and correctly provided the user's intent.

  "id": 3,
   "tool": "kubectl",
   "command": "kubectl crete deployment nginx-demo --image=nginx:stable",
   "suggested_fix": "kubectl create deployment nginx-demo --image=nginx:stable",
   "original_text": "## Step 1: Create Nginx Deployment\nCreate a basic Nginx deployment to serve as the backend for our services.\n# Create the deployment",
   "intent": "Create a deployment named nginx-demo using the nginx:stable image.",
   "expected_outcome": "[WILL_FAIL] The verb 'crete' is a typo. Kubernetes will return 'Unknown command'."

This looks good, but LLMs can be confidently wrong, so validation is still necessary.

It helps to have the LLM take on another role and check the results of the first step. I asked Gemini Flash to act as a quality assurance lead and review the work.

### ROLE
Infrastructure Quality Assurance Lead.

Claude Sonnet would be the best choice here, since it’s known for handling these tasks well. But Claude isn't available in Google's free AI studio, and Anthropic doesn't offer a similar free option, so I used Gemini Flash again.

The results are promising.

Gemini call completed: 19.43s, Tokens: 5772
{"status": "verified", "missing_commands": []}

On cost, this call was free under AI Studio's free tier, but at market rate, it would run about:

Input cost: Approximately $0.0008658

So, if we have to validate several labs twice a month, it's possible.

Agent: The Tools that Will Run the Commands.

The extractor works, and the validator checks its output. Now it‘s time to create a real agent. The component that will execute commands one by one, read the result, and suggest a fixed command if something went wrong. For the first iteration, we won't implement complex retries or chains. The basic architecture is the following.

To make things more interesting, I introduced not only syntax errors but also logical ones. Which I expect will need some effort.

Take lab1:Some commands are missing the namespace key. The command itself is valid, but it fails because it runs in the wrong namespace.

The expected behavior for an agent to add the missing "create ns" command, and a -n flag for all commands where it's not presented. It's a bit more complex than just fixing typos, so let's see. As an alternative, I expect the agent to cut corners and remove the -n flag, using the default namespace for everything. We'll use prompting to steer it away from that shortcut.

Here's the full list of errors in lab 1:

Error Category	Specific Token / Issue	Description	Reference (ID)
Syntax (Typo)	depolyment	Typo in the Kubernetes resource noun.	#3
Syntax (Typo)	sca.le	Illegal character (dot) within the kubectl verb.	#6
Syntax (Flag)	-o json_path	Incorrect flag naming; used an underscore instead of the standard jsonpath.	#8
Logical (Namespace)	Missing `-n test`	Command is valid, but targets the default namespace instead of the previously created test environment.	#4, #7, #9, #10
Logical (Pre-requisite)	Namespace not found	Attempting to create a deployment in a namespace that doesn't exist yet.	#3 (Attempt 1)
Logical (Selector)	app=nginx-deployments	Misalignment between the label defined in the deployment and the label used in the query (plural vs. singular).	#12

To guide the agent toward the right fixes (for example, add namespace, not delete -n flag), the following rules were added to the prompt:

### STRATEGIC RULES

1. **Contextual Continuity**: Resources in Kubernetes are scoped (Namespaced or Clustered). Analyze the Execution History to identify the scope where target resources were previously managed. Ensure the fixed command operates within that same scope.

2. **Environmental Pre-requisites**: If a command fails due to a missing environmental object (as indicated by STDERR), prepend necessary commands to establish the required state before executing the main task.

3. **Syntax Integrity**: Correct structural errors, misspellings of subcommands, or malformed flags while strictly preserving the logic defined in the Target Intent.

4. **Selector Alignment**: For commands involving filtering (labels, selectors), cross-reference the history to ensure identifiers match those of the actual resources in the cluster.

5. **Minimal Disruption**: Do not modify values (images, replicas, names) that are syntactically correct and aligned with the Target Intent, even if the command failed for other reasons.

For namespace and label issues, the agent can't work atomically and relies solely on the command's execution result. The LLM should know that the namespace was previously created, and that the commands already have the -n test flag added.

The LLM should know that during the previous step, a pod named CAT was created, so verifying pod DOG is wrong. For small labs, feeding the full execution history works. To save tokens, passing only the last successful commands is enough. Providing everything risks the model getting confused and trying to fix its own previous fixes.

Note: It's not worth using a local model here because the context is too complex. The model needs to know the command, the intent, what's broken, and what happened before. That's too much for a small 4B model.

And let's finally run it and see what will happen. The agent created the namespace, fixed the typo, deployed, and added the -n flag:

   "original_command": "kubectl create depolyment nginx-deployment --image=nginx:1.14.2 -n test",
   "attempts": [
     {
       "cmd": "kubectl create deployment nginx-deployment --image=nginx:1.14.2 -n test",
       "result": "failed",
       "error": "error: failed to create deployment: namespaces \"test\" not found"
     },
     {
       "cmd": "kubectl create namespace test && kubectl create deployment nginx-deployment --image=nginx:1.14.2 -n test",
       "result": "success",
       "stdout": "namespace/test created\ndeployment.apps/nginx-deployment created"
     }
   ],
   "status": "success",
   "semantic_verification": "Verified: Command executed successfully."

Even better, the agent started adding -n test to all subsequent commands that were missing the flag. It also fixed other issues.

  "cmd": "kubectl get pods -l app=nginx-deployment -o jsonpath='{.items[0].spec.containers[0].image}'",
       "result": "failed",
       "error": "error: error executing jsonpath \"{.items[0].spec.containers[0].image}\": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:\n\ttemplate was:\n\t\t{.items[0].spec.containers[0].image}\n\tobject given to jsonpath engine was:\n\t\tmap[string]interface {}{\"apiVersion\":\"v1\", \"items\":[]interface {}{}, \"kind\":\"List\", \"metadata\":map[string]interface {}{\"resourceVersion\":\"\"}}"
     },
     {
       "cmd": "kubectl get pods -l app=nginx-deployment -n test -o jsonpath='{.items[0].spec.containers[0].image}'",
       "result": "success",
       "stdout": "nginx:1.14.2"

And handled rate limits correctly.

One case that both the extraction and fixer parts missed is 14.

 "cmd": "kubectl get pods -l app=nginx-deployment -o jsonpath='{.items[*].spec.containers[0].image}'",
       "result": "success",
       "stdout": ""

As I mentioned earlier, the command ran “successfully” in the wrong namespace, where there was nothing to fix. This fix doesn't need an LLM: if a get or describe command returns no output with a zero exit code, route it to the fixer. This simple logic change works. For more complex scenarios, a semantic validation layer that queries the cluster and verifies the command actually did what it was supposed to do would help, but that's out of scope for this agent.

Agent Security

At the end of execution, something unexpected happened:

 "cmd": "kubectl get deployments nginx-deployment",
       "result": "failed",
       "error": "Error from server (NotFound): deployments.apps \"nginx-deployment\" not found"
     },
     {
       "cmd": "Error: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit. \n* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-3-flash\nPlease retry in 26.365091249s. [links {\n  description: \"Learn more about Gemini API quotas\"\n  url: \"https://ai.google.dev/gemini-api/docs/rate-limits\"\n}\n, violations {\n  quota_metric: \"generativelanguage.googleapis.com/generate_content_free_tier_requests\"\n  quota_id: \"GenerateRequestsPerDayPerProjectPerModel-FreeTier\"\n  quota_dimensions {\n    key: \"model\"\n    value: \"gemini-3-flash\"\n  }\n  quota_dimensions {\n    key: \"location\"\n    value: \"global\"\n  }\n  quota_value: 20\n}\n, retry_delay {\n  seconds: 26\n}\n]",
       "result": "failed",
       "error": "/bin/sh: Error:: command not found\n/bin/sh: line 1: __pycache__: command not found\n/bin/sh: line 2: Please: command not found\n/bin/sh: line 3: description:: command not found\n/bin/sh: line 4: url:: command not found\n/bin/sh: -c: line 5: syntax error near unexpected token `}'\n/bin/sh: -c: line 5: `}'"
     }

One more answer to the question "What happens when I let an LLM generate output and have the agent run commands on my cluster?” I expected missed syntax errors and odd attempts, but this time something else happened.

AI Studio hit a limit, and the agent took the error message and tried to run it as a shell command. Luckily, it wasn’t something dangerous like rm -rf /, so it was just funny, but we need to be careful. The agent should have a validation layer. The simplest way is to use a strict whitelist of allowed commands. Anything else gets dropped with a warning. For this test, I used a clean, small, and isolated K3D cluster.

Fixing the Docs

After validating and fixing the commands, the last step is updating the docs. We should get the results, regenerate the lab, have fixes in place, warnings where the agent can't help, and expect human intervention. I used Gemini again, and it worked well. It had the list of correct commands, the fixed commands, and the original lab markdown file, and made careful replacements.

To prevent the model from getting creative, I gave Gemini one job: act as a "sophisticated sed". No new content, no rewrites, just make precise replacements.

### STRATEGIC RULES
1. **Precision Replacement**: Locate the exact code blocks containing failing commands. Replace the content of those code blocks with the verified successful command(s).
2. **Preserve Context**: Do not change the surrounding text, headers, or explanations in the Markdown document unless they directly conflict with the new command logic.
3. **Markdown Integrity**: Ensure the final output is a valid Markdown document with correctly formatted code blocks.
4. **Minimal Disruption**: Only modify the commands that were identified as fixed. Successful original commands should remain untouched.

Looks like it successfully fixed errors and saved the corrected lab to the right place.

docs-validator diff labs/lab1.md labs/lab1_fixed.md
< kubectl create depolyment nginx-deployment --image=nginx:1.14.2 -n test
---
> kubectl create namespace test && kubectl create deployment nginx-deployment --image=nginx:1.14.2 -n test
17,18c17,18
< kubectl get deployments nginx-deployment
< kubectl get pods -l app=nginx-deployment -o jsonpath='{.items[0].spec.containers[0].image}'
---
> kubectl get deployments nginx-deployment -n test
> kubectl get pods -l app=nginx-deployment -n test -o jsonpath='{.items[0].spec.containers[0].image}'
25c25
< kubectl sca.le deployment nginx-deployment --replicas=2
---
> kubectl scale deployment nginx-deployment --replicas=2 -n test
28,29c28,29
< kubectl get deployments nginx-deployment
< kubectl get pods -l app=nginx-deployment -o json_path='{.items[0].spec.containers[0].image}'
---
> kubectl get deployments nginx-deployment -n test
> kubectl get pods -l app=nginx-deployment -n test -o jsonpath='{.items[0].spec.containers[0].image}'
36c36
< kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1
---
> kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 -n test
39,41c39,41
< kubectl rollout status deployment/nginx-deployment
< kubectl get deployments nginx-deployment
< kubectl get pods -l app=nginx-deployments -o jsonpath='{.items[0].spec.containers[0].image}'
---
> kubectl rollout status deployment/nginx-deployment -n test
> kubectl get deployment nginx-deployment -n test
> kubectl get pods -l app=nginx-deployment -n test -o jsonpath='{.items[0].spec.containers[0].image}'
49c49

This isn't a magic "fix everything" tool. But it's a working concept, at least for simple tasks. Those that don't need deep thinking, just someone (or something) to add -n test to every kubectl command. We don’t let agents commit directly to main, but there’s no reason they can’t open a merge request for a human to review.

Conclusion

This isn’t a complete solution, but it proves the concept.

After a few weekends of coding and testing, here’s my verdict. This "baby agent," built without heavy frameworks or complex graphs, can already:

Extract commands from messy Markdown with high reliability.
Auto-fix syntax errors on the fly.
Resolve logical/contextual issues (~80% success rate) by "remembering" the environment.
Regenerate corrected labs for human review, closing the feedback loop.
Free up engineer time for more interesting tasks (like building a better agent) instead of manual toil.

Yes, the scenarios are simple for now, and there aren't many commands. We still have issues like 'Semantic Silence' and rate-limit errors to fix. And hitting the quota reminded me again: never unquestioningly trust an agent; always check its output.

But despite the "duct tape and rust" (as seen in my engine model), it actually works. It’s not about building a perfect machine from the start; it’s about making something that can fail, learn, and recover. And honestly, it’s just a lot of fun.

*The article was initially published on DataArt's Team blog.

5 Beginner Python Books Worth Reading

Nevena — Wed, 15 Apr 2026 07:49:26 +0000

Python is one of the most popular programming languages today, especially in the age of AI. Below are five highly rated Python books that offer strong fundamentals, practical experience, and clear explanations for anyone starting out or strengthening their grasp of the basics.

1. Python Programming for the Absolute Beginner by Michael Dawson

As the title suggests, this book is built for readers with no previous programming experience. The third edition includes updated content and expanded coverage for modern Python features.

It starts with core fundamentals and gradually introduces more complex topics, including data structures, file handling, exceptions, object-oriented programming, and graphics. Visual examples and step-by-step explanations keep the learning curve manageable. By the end, you'll be able to build your own games from scratch using Python!

Best for: True beginners who want a structured, confidence-building introduction.

2. Python Crash Course by Eric Matthes

This book offers a hands-on introduction to Python for beginners eager to start writing useful code quickly.

Along the way, you'll work with libraries and tools such as Pygame, Matplotlib, Plotly, and Django. Core concepts, including variables, lists, classes, and loops, are covered early through engaging exercises. Later chapters guide you through building interactive programs, testing code, and developing a 2D arcade-style Space Invaders game.

Best for: Learners who want to move fast and learn through building.

3. Automate The Boring Stuff With Python by Al Sweigart

This best-selling book teaches Python 3 through practical examples, making it ideal for beginners.

You'll learn how to write programs that automate repetitive tasks, saving hours of manual work like file processing, data manipulation, and basic web interactions. Once you grasp the basics, you'll be able to build scripts that handle useful automation jobs with ease.

The book covers both basic and advanced data structures. Each chapter includes an introduction, a case study, tips, key library methods, and exercises to reinforce learning.

Best for: Beginners motivated by immediate, real-world use cases.

4. Head-First Python by Paul Barry

Known for its visual and interactive style, this book is a popular choice for quickly learning Python basics, including built-in functions and data structures. It covers the basics first, then builds toward more advanced topics, such as creating web applications, handling exceptions, and others.

Author Paul Barry, a lecturer at the Institute of Technology in Carlow, Ireland, brings over a decade of IT industry experience to his teaching, which is evident in his clear explanations and approachable tone.

Best for: Visual learners who prefer a less traditional textbook format.

5. Learn Python the Hard Way by Zed Shaw

This book takes an exercise-driven approach, covering key topics like organizing code, math, variables, loops, logic, packaging, automated testing, and game development.

Guided through 52 exercises designed to be typed out manually, you’ll learn by doing and correcting your own mistakes. You'll gain a solid understanding of how programs work, learn how to read and write code, and develop effective debugging skills.

An additional free support course is available through the author's website.

Best for: Learners who prefer repetition and learning through trial and error.

Conclusion

Each of these five books offers practical ways to learn Python, each from a different angle, whether through structured lessons, hands-on projects, automation tasks, or disciplined practice. Choosing the right one depends on how you learn best and what you want to build. Pick the one that fits your goals, stay consistent, and start coding!

*The article was initially published on DataArt's Team blog.

Code Testing Fundamentals: How to Do It Right

Nevena — Mon, 09 Mar 2026 09:00:01 +0000

DataArt's Senior Developer Alexey Klimenko explains why testing matters and how to approach it in practice. This guide covers core concepts, test types, working strategies, best practices, and a risk-based mindset to help teams make testing a natural part of engineering culture.

Why Do We Need Tests?

Testing often gets buried under buzzwords: coverage, reports, pipelines, TDD debates. Strip that away, and the idea is simple. Tests exist to give us confidence.

A clear testing strategy delivers tangible benefits:

Higher product quality, with fewer production bugs and hidden issues
Fewer regressions, reducing stress when shipping new features
Lower long-term costs, since refactoring and fixes become safer and faster
Reduced business risk from broken core flows

In essence, tests create a safety net. They make growth possible without turning every change into a gamble.

Understanding Test Types:

Instead of memorizing labels, it helps to look at testing from three perspectives:

By level — what exactly we're testing
By approach — how we write and run the tests
By goal — what this particular test is meant to cover

By Level: From Unit to E2E (End-to-End)

Unit Tests

Unit tests validate small, isolated pieces of logic: functions, utilities, methods.

A good unit test is fast, independent of the database/network/timing, and focused on a specific behavior.

Example:

// utils/calcDiscount.js
export function calcDiscount(price, percent) {
  if (percent < 0 || percent > 100) {
    throw new Error('Invalid percent');
  }
  return price - (price * percent) / 100;
}

// calcDiscount.test.js (Jest)
import { calcDiscount } from './calcDiscount';

describe('calcDiscount', () => {
  it('applies percentage discount', () => {
    expect(calcDiscount(100, 10)).toBe(90);
  });

  it('throws on invalid percent', () => {
    expect(() => calcDiscount(100, 150)).toThrow('Invalid percent');
  });
});

Component Tests

Component tests focus on UI components in isolation: different props, states, and events.

Example:

// components/Counter.jsx
export const Counter = ({ initial = 0 }) => {
  const [value, setValue] = React.useState(initial);

  return (
    <div>
      <span aria-label="value">{value}</span>
      <button onClick={() => setValue(value + 1)}>+</button>
    </div>
  );
}

// Counter.test.jsx (React Testing Library + Jest)
import { render, screen, fireEvent } from '@testing-library/react';
import { Counter } from './Counter';

it('increments value when user clicks plus', () => {
  render(<Counter initial={1} />);

  const value = screen.getByLabelText('value');
  const button = screen.getByText('+');

  expect(value).toHaveTextContent('1');

  fireEvent.click(button);
  expect(value).toHaveTextContent('2');
});

Integration Tests

Integration tests verify how multiple modules of the system work together: controller + validator, component + API(mocked).

Example (a hypothetical service + external client):

// services/userService.js
export async function createUser(userData, { userRepo, emailService }) {
  const user = await userRepo.save(userData);
  await emailService.sendWelcome(user.email);
  return user;
}

// userService.integration.test.js
import { createUser } from './userService';

it('creates user and sends welcome email', async () => {
  const savedUsers = [];
  const sentEmails = [];

  const userRepo = {
    save: async (userData) => {
      savedUsers.push(userData);
      return { id: 1, ...userData };
    },
  };

  const emailService = {
    sendWelcome: async (email) => {
      sentEmails.push(email);
    },
  };

  const result = await createUser(
    { email: 'test@example.com' },
    { userRepo, emailService }
  );

  expect(result.id).toBe(1);
  expect(savedUsers).toHaveLength(1);
  expect(sentEmails).toContain('test@example.com');
});

E2E (End-to-End) Tests

This is the whole user journey through the system: from the UI down to the database and back. E2E tests are more expensive and slower to maintain, but they give us tremendous confidence that real-world scenarios actually work.

Example:

// e2e/checkout.spec.js
import { test, expect } from '@playwright/test';

test('user can buy a product', async ({ page }) => {
  await page.goto('https://my-shop.example');

  await page.getByText('Fancy Mug').click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();

  await page.getByRole('button', { name: 'Checkout' }).click();
  await page.getByLabel('Card number').fill('4242 4242 4242 4242');
  await page.getByLabel('Expiry').fill('12/30');
  await page.getByLabel('CVC').fill('123');

  await page.getByRole('button', { name: 'Pay' }).click();

  await expect(page.getByText('Thank you for your purchase')).toBeVisible();
});

By Approach: How Tests Are Created

Manual Vs. Automated

Manual — a tester/developer goes through scenarios by hand.

Automated — scenarios are written as code and run in CI.

Manual testing isn't going anywhere, but as a project grows, having an automated "safety layer" becomes increasingly valuable.

TDD (Test-Driven Development)

TDD follows a simple loop:

Write a failing test (red)
Write the minimum amount of code to pass the test (green)
Remove duplication / clean up the code (refactor)

BDD (Behavior-Driven Development)

BDD focuses on a shared understanding of how the system should behave.

BDD-style tests do not have to be a formal BDD process with lots of meetings and Gherkin files. You can use the approach partially, simply as a convenient way to keep your focus on behavior.

Key ideas:

We talk in terms of behavior, not implementation details
We use the Given/When/Then structure
Scenarios are understandable to developers, QA, analysts, and business people
Tests become a form of living documentation

# cart.feature
Feature: Shopping cart
Scenario: User can add an item to the cart. Given the user is on the shop page and the cart is empty
    When the user adds an item to the cart
    Then the cart shows "1 item in cart"

import { Given, When, Then } from '@cucumber/cucumber';
import { expect } from '@playwright/test';

Given('the user is on the shop page', async function () {
  await this.page.goto('https://my-shop.example');
});

Given('the cart is empty', async function () {
  // reset the basket state, e.g. ensure it's empty
});

When('the user adds an item to the cart', async function () {
  await this.page.getByText('Add to cart').click();
});

Then('the cart shows {string}', async function (text) {
  await expect(this.page.getByText(text)).toBeVisible();
});

Exploratory Testing

It relies on curiosity: what happens if…?

Examples:

Rapidly switching between tabs to see if the UI breaks
Clicking a button 10 times a second
Entering unexpected values
Killing the network
Reloading the page in the middle of a request

Exploratory testing often identifies bugs that formal test scenarios miss entirely.

By Goal: What Is Being Validated

Functional Testing

We check what the system does. Common categories include:

Boundary — test edge cases and limits (e.g., min/max values, “just below/just above” a limit, Input limit is 10 characters → testing 9, 10, 11)
Regression — ensure existing functionality still works (e.g., new feature added → old flow still works)
Smoke — quick “does it run at all?” check (e.g., you fixed payment modal → check app loads, login works, basic flows still function)
Sanity — quick validation of a specific fix or feature (e.g., you fixed the payment modal → check only the payment modal behavior)

Non-Functional Testing

Here, the focus is on how the system behaves:

Performance — speed, load, response times
Security — vulnerabilities, permissions, attacks
Usability — how easy it is to use
A11y (Accessibility) — screen readers, keyboard navigation, contrast, etc.
Compatibility — different browsers/devices
Reliability — stability over long runs, restarts, and network issues

Specialized Test Types

Snapshot Testing

Snapshot tests compare a saved version of the UI/DOM to the current one. They're handy for components.

Example (Jest + React Testing Library):

// Button.test.jsx
import { render } from '@testing-library/react';
import { Button } from './Button';

it('renders primary button', () => {
  const { container } = render(<Button variant="primary">Click me</Button>);
  expect(container.firstChild).toMatchSnapshot();
});

Visual Regression/Screenshot Testing

Tools like Playwright or Storybook can compare screenshots pixel-by-pixel. For example, you can see if a button moved after a CSS change.

Mutation Testing

Tools like Stryker deliberately “break” your code (e.g., change operators or conditions) and check whether your tests can catch it. The idea is simple: if you can break business logic and the tests are still green, the quality of your tests is low — even if coverage is high.

Why so many classifications if, in the end, we are just checking that everything works? Separations are needed not for the sake of theory, but for the sake of risk, time, and cost management.

Why So Many Test Categories

The goal is not theory. It is risk management.

In real projects:

Time is limited
Money is limited
Risks vary
There are many changes

Therefore, testing is divided into types to understand:

What needs to be tested deeply
What can be tested quickly
What can be tested superficially
What must be tested after changes

Reducing Flaky Tests

Flaky tests (sometimes red, sometimes green with no code changes) destroy trust in your test suite.

To reduce instability:

Use stable test environments
Control your data (fixtures, factory functions)
Isolate from unstable external services (mocks/fakes)
Make tests deterministic (fake timers, control randomness)

FIRST Principles of Good Tests

Strong tests usually follow the FIRST model:

Fast — the test runs quickly

Independent — doesn't depend on other tests

Repeatable — gives the same result in any environment

Self-checking — validates itself (green/red) without manual log inspection

Timely — written at the right time (not a year after the feature was implemented)

Key Testing Realities

⭐ Bugs are inevitable
The goal of testing is not to "prove there are no bugs", but to reduce the risk of serious problems. We test not because someone writes bad code, but because errors are a natural part of software development.

⭐ You can't test everything
The space of possible inputs and scenarios is infinite. So, you have to make choices. We must choose what to test based on importance and risk, rather than attempting to cover everything.

⭐ Adopt a risk-based mindset
Focus your testing effort on areas that:

Break most often
Are critical for the business
Are complex and challenging to understand
Have tricky integrations

Testing is an investment. We put more effort into areas where failure would be costly.

⭐ The pesticide paradox
If you keep running the same set of tests, they eventually stop finding new bugs, like pests getting used to the same poison. Your test suite needs to be updated and expanded periodically. Tests must be reviewed and improved regularly; otherwise, they become noise and lose their value.

⭐ Quality is a team responsibility
It's not "the QA’s job" or "whoever writes tests". Architecture decisions, deadlines, scope, and attitude to technical debt all affect quality. Everyone (Dev, QA, PM, DevOps) contributes to product quality, so testing decisions and responsibilities must be shared.

Coverage: What Do 80% Actually Mean?

Coverage (line/function coverage) is often turned into a KPI. But it’s important to remember: Coverage ≠ test quality.

You can have 90% coverage and still:

Never test boundary values
Fail to catch real bugs
Ignore important branches in conditions

Use coverage to identify blind spots:

Untested modules
Untouched code paths
Rare scenarios

For many projects, 70–90% is reasonable, but what really matters is what exactly is being tested, not the number itself.

Testing as Part of Engineering Culture

Testing is not a luxury or a "nice-to-have if there's time left". It's part of the engineering discipline.

When the team has a basic understanding of the types of tests available, what value they bring, and how to think about coverage and risk, you can start arguing about details like Jest/Vitest, Cypress/Playwright, and how many E2E tests are needed.

But the foundation is the same: Testing = Engineering discipline.

Treat testing as risk management, not bureaucracy. Teams that adopt this mindset ship faster, break less, and release with confidence.

*The article was initially published on DataArt Team blog.

RASP: The Silent Ninja Handling the Threats You Don’t See

Nevena — Mon, 23 Feb 2026 08:24:07 +0000

What is RASP, and why does it matter? DataArt's Security Engineer, Kirill Chsheglov, explains this in-app security technology, compares leading commercial solutions, and examines what the open-source OpenRASP project brings to the table.

What is RASP?

RASP (Runtime Application Self-Protection) is a security technology that runs inside an application and protects it in real time. Think of it like a bodyguard that rides along with your app, monitoring activity and stepping in when something suspicious happens.

Where a traditional WAF (Web Application Firewall) only sees incoming traffic; RASP has full visibility of the app’s internal activity, including function calls, database queries, and more.

Why Does it Matter?

Many clients still depend on legacy systems that can't be easily patched. Perimeter tools help, but they often lack context, create noise, or miss threats that unfold within the application itself.

RASP closes that gap, quietly monitoring and reacting right away when something goes wrong. Unlike WAFs that raise too many alerts, RASP works silently and effectively, like a ninja, calmly whispering: "Relax. I see everything. I've already caught them."

Why Should Clients Turn to RASP?

"Don't touch the legacy code, it still works." RASP can cover security holes without changing the code, which can be troublesome.
WAF screams, RASP acts. Fewer false positives mean fewer alerts and no SOC meltdowns every Friday.
Zero-day? Stay calm. Even without a CVE (Common Vulnerabilities and Exposures), RASP can spot suspicious behavior and stop attacks.
Attacks have gotten smarter. Old perimeter defenses don't help much with microservices, APIs, or serverless—but that's where RASP works.
RASP may seem expensive, but it can save millions by stopping cyberattacks—for example, in oil and gas environments.
RASP works in production, unlike SAST and DAST, which work before deployment.

In short, RASP is an in-app security layer that understands context and acts immediately.

Leading Commercial Solutions and an Open-Source Option

Fastly employs a hybrid approach, combining edge-level protection with in-app agents. Malicious traffic is filtered globally before reaching your infrastructure. Agents inside the app runtime (Java, .NET, etc.) provide deeper inspection. A central cloud engine manages analytics and rule updates.

Imperva RASP offers a lightweight plugin that sits directly inside the application (JVM, .NET, Node.js). It utilizes grammar-based analysis to detect threats at runtime, including zero-day vulnerabilities. With no proxy or network dependencies, it works well for legacy apps or strict environments.

Contrast instruments deep code to add security directly into the application flow. By hooking into core runtime APIs (like java.lang.instrumentation), it accesses full stack traces, queries, and execution data to accurately detect and block attacks. Designed for DevOps, it integrates via CI/CD pipelines, containers, and Kubernetes, providing accurate in-app protection with minimal false positives.

OpenRASP is a fully open-source, server-layer solution. It integrates seamlessly into key operations, such as database access, file I/O, and networking, in languages like Java and PHP. With taint-tracking and context analysis, it flags and logs malicious behavior. It's customizable, but requires solid internal development, management, and tuning.

Performance Impact

The Fastly RASP engine is built for real-time decision-making, which reduces false positives and minimizes the impact on web performance (See Fastly's documentation for details).

Imperva's grammar-based RASP uses formal language parsing to achieve high detection accuracy with low runtime impact. End users won't notice it running (Read the datasheet for more information).

Contrast Protect reports that 80% of requests incur a latency of under 0.5ms, with 96% processed within a few milliseconds, matching or outperforming similar WAF solutions (See more at Contrast Security's glossary).

What do these tools have in common? RASP doesn't just protect, it does so quietly, blending into production like it was always there.

When RASP Makes Sense?

You run high-value web apps or APIs.
You need runtime protection while fixing complex issues.
You want real visibility into production threats.

Additional Reading

Check out the following material to learn more:

RASP isn’t a silver bullet. But it delivers something traditional tools can’t: a view from inside the application, paired with the ability to act immediately. While WAFs’ perimeter defenses raise alarms, RASP stays focused on stopping the threat at the point where it matters. A silent hero in a noisy world.

*The article was initially published on DataArt Team blog.

5 Ancient Lists of Data That Changed the World

Nevena — Mon, 09 Feb 2026 10:21:48 +0000

The new DataArt Museum project explores millennia of data mastery— from baboon bones to AI brains. For our blog, Alexey Pomigailov, DataArt Museum curator, selected from this remarkable online catalog 5 ancient lists of data that changed the world.

There’s a common myth that data is a 21st-century invention. In reality, data engineering has been around for thousands of years. Our need to record, calculate, and analyze information has always existed. Over the centuries, humans have used records, counts, and tracking to build something bigger than what came before: from notching a baboon bone with a fingernail, to creating the first lists of taxpayers to run ancient empires, to compiling cargo lists to track Viking goods. These early innovations eventually led to punch cards, Excel spreadsheets, online shopping, and even chatting with AI agents today.

1. The Uruk Clay Tablet (Sumer, c. 3200 BCE)

A clay tablet recording barley and malt deliveries for beer production was found in modern-day Iraq. It can be viewed as the birth of Tabular Data. By separating the "label" (malt) from the "value" (quantity) using a grid, Sumerian administrators invented the row-and-column structure. It was the world's first spreadsheet, decoupling data types from data values.

2. The Pinakes of Callimachus (Alexandria, c. 250 BCE)

A bibliographic registry of the 500,000 scrolls in the Great Library of Alexandria was the invention of Metadata and Indexing. Callimachus, an ancient Greek scholar and librarian, realized that data is useless if it isn't "addressable." He created a system that mapped a logical record (title/author) to a physical location (shelf), the ancient ancestor of the SQL index and the URL.

3. Nuova Cronica by Giovanni Villani (Florence, c. 1348 CE)

A chronicle of Florence tracked birth rates, grain prices, and mortality during the Black Death. It represents the shift from simple logging to Descriptive Analytics. The Italian chronicler Giovanni Villani didn't just record history; he used statistical data to describe the economic and demographic health of the city, arguably creating the first Business Intelligence report.

4. Liber Beneficiorum (Krakow, 1470–1480 CE)

Jan Długosz’s "Book of Benefices" is a massive register of church assets and endowments in Poland. As a precursor to State Statistics and ERP (Enterprise Resource Planning), it was a centralized database of decentralized assets, designed to give the "headquarters" (the Diocese) a unified view of geography, economics, and taxation across the region.

5. The Computus (Medieval Europe, c. 222–1200 CE)

Computus was a system of complex calculations used by the Church to synchronize lunar and solar cycles to determine the date of Easter. It can be viewed as the first Algorithm. Unlike a static lookup table, the Computus required "loops" of logic and conditional processing. It proved that mathematics could govern social time, paving the way for the clock cycles inside every modern CPU.

Our recent DataArt museum project, Recount, Sort & Figure Out, traces the evolution of these concepts and highlights the massive role this technology played in shaping civilization.

Seen through this lens, you realize that data engineering isn’t new — we just have faster tools. The logic of organizing the world into rows, columns, and addresses is one of humanity’s oldest survival skills. Explore this multi-millennial catalog to see how the art of handling data has shaped culture, technology, and imagination. to see how the art of handling data has shaped culture, technology, and imagination.

*The article was initially published on DataArt Team blog.

Preparing for the GCP ACE Exam: What You Should Know

Nevena — Mon, 02 Feb 2026 08:49:48 +0000

Thinking about the GCP Associate Cloud Engineer (ACE) certification? Eugene Kiselev, a seasoned engineer from DataArt with over 13 years of experience in various cloud-related projects, walks you through his preparation process, exam experience, and key takeaways. Here’s what he learned and recommends to anyone planning to take the exam.

Why I Took the ACE Exam

Although I already hold the GCP Professional Cloud Architect (PCA) certification, most of my recent work has focused on AWS or Azure. I still remember the core concepts; those are quite consistent across providers, but I’d lost touch with some of the practical nuances (like the color of some buttons) in the GCP console.

To refresh that knowledge, I chose the ACE exam. According to the description, it’s a hands-on certification that tests your ability to manage infrastructure, debug issues, and apply best practices, without diving into advanced topics like IoT or advanced ML pipelines. It sticks to the essentials: compute, storage, security, and high availability. It may not focus on GenAI, but it’s still a valuable skill set.

You may still encounter ML-related scenarios in the questions, simply because that’s the current reality. Yet, the exam focuses on infrastructure, not ML algorithms or pandas code blocks.

Scheduling Proctoring: What to Expect

Scheduling the exam in Poland was simple and affordable. One thing I like about Google exams is that you book a specific time slot, not a generic voucher. That suits my approach; by the time I schedule, I’m ready. This is a personal preference, of course. Some people prefer the flexibility of scheduling after payment.

Pro TIP: Pay close attention to AM/PM, especially if you’re aiming for a 12:00 slot. Google/Kryterion lets you switch between 12-hour and 24-hour formats, which you can use to double-check your time.

Google sends multiple email reminders confirming your exam time and date.

I opted to take the exam at home, but test centers are also available. Prices were identical in my case, so I went with convenience.

One advantage of test centers is that you won’t risk interruptions due to unstable internet or power issues. Also, test centers tend to be less strict about minor behavior, like briefly looking away from the screen. Likely to wipe away tears — Just kidding, it's not that hard!

Home-based proctoring, on the other hand, is more rigorous. Proctors might ask you to show your surroundings or end a session if they consider your actions suspicious. Reddit is filled with stories of such experiences, so it’s best to be cautious.

If you choose to take the exam at home:

Prepare a clean, quiet, private room where no one will interrupt you.
Before the exam, the proctor will ask you to show your room, desk, ID, wrists, and glasses (if applicable). Watches, smart glasses, or headphones are not allowed.
Kryterion now supports showing the room using your phone, which is 100% more convenient than using a laptop camera. You just scan a QR code, and then remove the phone after inspection.

My experience went smoothly: no issues or interventions, no technical problems. I used macOS, installed the testing software, and it worked perfectly.

Preparation Materials and Strategy

The official exam guide accurately reflects the topics covered, so I used it as my primary resource. Carefully reading it helped me identify areas for improvement. I highly recommend referring to it throughout your preparation.

My main study platform was Cloud Skills Boost due to the high quality of the lectures. These same lectures are likely available via Coursera as well. Cloud Skills Boost also includes hands-on labs, which are super important. Most labs give you a deep understanding of GCP services, how to operate/administer them, and how to answer tricky exam questions. They also teach best practices, which help you identify correct answers in scenario-based questions.

TIP: Many questions include several technically correct answers, but only one works in the real world. Practical experience helps you eliminate nonsense quickly.

Important Topics

While every topic matters, I found certain areas more prominent in the exam:

Billing, Projects, Organizations, and Folders

Engineers in large companies often don't deal with these directly (especially billing) since dedicated teams handle them. And due to security restrictions, you can't really practice billing on platforms like Cloud Skills Boost. So, I highly recommend creating your own GCP projects and exploring billing configs. You don't need to spin up resources; just work with folders, projects, and IAM policies. It's extremely valuable.
Google provides a nice architecture diagram, so you can try building something similar.

Kubernetes (GKE)

My background includes CKA and CKS certifications and deep Kubernetes experience, so this section was familiar. But the exam questions go beyond the basics. They no longer ask, “What is a Pod?” but focus more on debugging real-world issues, understanding GKE setups, high availability, and security.

The Kubernetes course in the official learning path is a good start, but not enough on its own. I suggest:

Deploying more than just a basic Nginx pod. For example, you can build something like the image below. Make sure it works, then update the nginx deployment with a different version of Nginx and finally delete pods and see what will happen. Try to break and fix things. This is extremely valuable for the exam and for everyday work.

Practicing debugging
Using Minikube or Killercoda for hands-on work. CKA labs in Killercoda is a good level, CKS probably too much

Cloud Run & Cloud Functions

The official learning path didn't cover these extensively, but there are other dedicated courses on the platform.

For additional preparation, I used the GCA Cloud Engineer Certification Prep course on LinkedIn Learning. It’s well-structured, comprehensive, and covers gaps in other resources. If you have access to it, it can serve as your main study resource.

Final Thoughts

What’s great about GCP exams, even practical ones like ACE, is their focus on understanding concepts, rather than memorizing UI details or CLI flags like AZ-104 (sorry, Microsoft). Some questions might seem like that at first, but if you read them carefully, you’ll realize they are testing your comprehension of how things work.

This means your knowledge won’t become outdated with a UI redesign. Plus, the skills you develop can be easily transferred to other clouds.

The ACE exam is a solid benchmark, if you are comfortable with:

Building scalable infrastructure
Debugging real issues
Analyzing logs
Applying good security and high availability practices

It's a certification that rewards hands-on skills, and a worthwhile addition for anyone looking to prove their ability to solve real-world administrative tasks in GCP.

*This article was initially published on DataArt Team blog.

From Idea to Prototype in 60 Minutes

Nevena — Mon, 26 Jan 2026 11:36:09 +0000

How does an idea evolve into a working prototype? Delivery Manager Andrey Sadakov and Product Owner Viktoriya Zinovyeva walk through the process in a recorded webinar and the text below. It’s a practical, step-by-step path from initial concept to prototype. They explain how to develop an MVP, utilizing effective coding practices that enhance development speed, recommend prototyping tools, and introduce semantic annotation.

Prototyping Video Demo

Watch the full webinar below or go through the article first.

Why AI Prototyping

Before AI-assisted development, execution carried most of the weight. Today, AI code generation has lowered that barrier.
AI helps us generate many ideas quickly, but human judgment still determines which directions are worth exploring.
AI speeds up development and improves quality, shifting the balance between ideation and execution; close to “implementation is nothing”.
Prototyping with AI streamlines discovery, requirements gathering, stakeholder alignment, and getting buy-in much faster. You can quickly build and show a prototype to evaluate potential early.

Product Idea

Imagine an app that helps diagnose pain and improve posture by analyzing user photos. It all started with Viktoriya's knee pain. A simple problem led to an idea: a posture analysis app.

Step 1: Shaping the Idea

The goal is to challenge the initial idea and generate alternatives using different perspectives.

Recommended Model: Use a reasoning-focused model for best results

Instruct the AI to:

Stay neutral and support recommendations with reasoning, pros, and cons for each option.
Ask about gaps or unclear points.
Present counterarguments when needed.
Provide the top three alternative strategies, including trade-offs.
List 5-7 questions to answer before moving forward.

Step 2: Market Research

This step identifies both supporting and opposing arguments for your chosen option, as well as the assumptions underlying them.

Recommended Model: Deep research mode

Include the results from step 1 for context. Then, instruct the LLM to:

Offer multiple viewpoints, and flag uncertainty levels where relevant.
Collect the strongest available evidence for and against each point.
Identify the five most critical assumptions.

This is the standard flow for market analysis: research competitors, look for signals from customer forums, social posts, and other customer insights. AI is suited to handle these exploratory tasks efficiently.

Step 3: Defining the Requirements

The goal is to design a clear specification for the AI prototyping tool.

Use the results from step 2 as input to give the full context. Ask for a Product Requirements Document (PRD) tailored to your specific tool (for example, Loveable). Adjust the PRD as needed, including elements such as personas, user stories, and functional requirements.

Specify the results format:

Use clear, simple language.
Cite pros and cons when suggesting alternatives.
Keep paragraphs short (under 4 lines).
Use bulleted lists where they clarify structure.

How a Prototype Differs from an MVP

Aspect	Prototype	MVP (Minimum Viable Product)
Purpose	To visualize and test ideas quickly, often used for concept validation or stakeholder buy-in	To validate a product in the market with real users and gather feedback
Functionality	Simulated or partial functionality; may not be fully operational	Core, working functionality that delivers actual value to users
Goal	Demonstrate feasibility, design, or workflow	Test market demand, usability, and product-market fit
Testing	Usability tests, concept validation, internal review	Live testing with users, real-world data, and customer feedback
Risk Mitigation	Reduces design/technical risks early	Reduces market and business risks
Iteration Speed	Very fast, since it's not fully functional	Slower than prototypes, but still faster than a full-scale product
Outcome	Helps decide whether to move forward with building an MVP	Provides insights for scaling or pivoting to a better product version

Technical Comparison: Prototype vs MVP

Aspect	Prototype	MVP (Minimum Viable Product)
Code Quality	Often quick and dirty, may use throwaway code, mockups, or scripts. Not built to scale or maintain.	Production-grade (even if minimal). Uses maintainable code that can be extended later.
Backend	Usually faked (e.g., mocked API responses, static JSON files, or stubs).	Real backend services, even if basic—often with database, APIs, and authentication.
Frontend	May be static screens or interactive mockups with limited or no logic.	Fully functional UI connected to backend, handling state and real interactions.
Deployment	Usually not deployed; runs locally, in a demo environment, or as design mockups.	Deployed in production (cloud, app stores, or web), accessible by real users.
Longevity	Disposable—meant to validate ideas before coding seriously.	Foundation for future development—can evolve into the whole product.
Code Quality	Usually faked (e.g., mocked API responses, static JSON file, or stubs).	Production-grade (even if minimal). Uses maintainable code that can be extended later.

Vibe Coding Best Practices

Build iteratively. Don’t try to generate an app from a single prompt. Start simple and build up in steps.
Choose a standard tech stack. LLMs can work with many frameworks, but JavaScript frameworks such as ReactJS, as well as Java or C#, are more reliable.
Provide context. Share relevant documentation, app functionality, business goals, and technical details. The more context the AI has, the better the results will be. Protocols like MCP help keep documentation up to date.
Set coding rules. Coding standards vary by language (e.g., JavaScript, NodeJS). Add these rules to prompts to help the AI produce better code.
Use semantic annotation.

Semantic Annotation

Just as we comment on code and update documentation for others, we use the same principle for AI. Create a readme.md file in markdown to explain your app’s business logic. Add clear comments in the code to describe its purpose, inputs, outputs, and side effects. You can also use XML to annotate the code. This helps AI understand the code's intent, not just how to run it. Use semantic markup to make this easier.

Recommended Tools

Loveable: A platform for quickly turning concepts into web applications. Best for creating clickable prototypes
Replit: An online IDE with AI features. Balances automation with developer control. Suitable for both speed and functionality.
Local Setup: Use Cursor with the Caluse Sonnet model for a hands-on, developer-driven approach. Best for full-scale development.

What to Do Next

With a clear structure and the right tools, anyone can create a prototype. Good documentation and small, well-defined steps make it easier to turn an initial idea into functional products quickly and easily. Start iterating and see where your ideas lead!

*The article was initially published on DataArt Team blog.

AI Coding Assistants: Helpful or Harmful?

Nevena — Mon, 19 Jan 2026 10:09:36 +0000

Denis Tsyplakov, Solutions Architect at DataArt, explores the less-discussed side of AI coding agents. While they can boost productivity, they also introduce risks that are easy to underestimate.

In a short experiment, Denis asked an AI code assistant to solve a simple task. The result was telling: without strong coding skills and a solid grasp of system architecture, AI-generated code can quickly become overcomplicated, inefficient, and challenging to maintain.

The Current Situation

People have mixed feelings about AI coding assistants. Some think they’re revolutionary, others don't trust them at all, and most engineers fall somewhere in between: cautious but curious.

Success stories rarely help. Claims like “My 5-year-old built this in 15 minutes” are often dismissed as marketing exaggeration. This skepticism slows down adoption, but it also highlights an important point: both the benefits and the limits of these tools need a realistic understanding.

Meanwhile, reputable vendors are forced to compete with hype-driven sellers, often leading to:

Drop in quality. Products ship with bugs or unstable features.
Development decisions driven by hype, not user needs.
Unpredictable roadmaps. What works today may break tomorrow.

Experiment: How Deep Does AI Coding Go?

I ran a small experiment using three AI code assistants: GitHub Copilot, JetBrains Junie, and Windsurf.

The task itself is simple. We use it in interviews to check candidates’ ability to elaborate on tech architecture. For a senior engineer, the correct approach usually takes about 3 to 5 seconds to give a solution. We’ve tested this repeatedly, and the result is always instant. (We'll have to create another task for candidates after this article is published.)

Copilot-like tools are historically strong at algorithmic tasks. So, when you ask them to create an implementation of a simple class with well-defined and documented methods, you can expect a very good result. The problem starts when architectural decisions are required, i.e., on how exactly it should be implemented.

Junie: A Step-by-Step Breakdown

Junie, GitHub Copilot, and Windsurf showed similar results. Here is a step-by-step breakdown for the Junie prompting.

Prompt 1: Implement class logic

The result would not pass a code review. The logic was unnecessarily complex for the given task, but it is generally acceptable. Let’s assume I don't have skills in Java tech architecture and accept this solution.

Prompt 2: Make this thread-safe

The assistant produced a technically correct solution. Still, the task itself was trivial.

Prompt 3:

Implement method List<String> getAllLabelsSorted() that should return all labels sorted by proximity to point [0,0].

This is where things started to unravel. The code could be less wordy. As I mentioned, LLMs excel at algorithmic tasks, but not for a good reason. It unpacks a long into two ints and sorts them each time I use the method. At this point, I would expect it to use a TreeMap, simply because it stores all sorted entries and gives us O(log n) complexity for both inserts and lookups.

So I pushed further.

Prompt 4: I do not want to re-sort labels each time the method is called.

OMG!!! Cache!!! What could be worse!?

From there, I tried multiple prompts, aiming for a canonical solution with a TreeMap-like structure and a record with a comparator (without mentioning TreeMap directly, let's assume I am not familiar with it).

No luck. The more I asked, the hairier the solution became. I ended up with three screens of hardly readable code.

The solution I was looking for is straightforward: it uses specific classes, is thread-safe, and does not store excessive data.

Yes, this approach is opinionated. It has (log(n)) complexity. But this is what I was going to achieve. The problem is that I can get this code from AI only if I know at least 50% of the solution and can explain it in technical terms. If you start using an AI agent without a clear understanding of the desired result, the output becomes effectively random.

Can AI agents be instructed to use the right technical architecture? You can instruct them to use records, for instance, but you cannot instruct common sense. You can create a project.rules.md file that covers specific rules, but you cannot reuse it as a universal solution for each project.

The Real Problem with AI-Assisted Code

The biggest problem is supportability. The code might work, but its quality is often questionable. Code that’s hard to support is also hard to change. That’s a problem for production environments that need frequent updates.

Some people expect that future tools will generate code from requirements alone, but that's still a long way off. For now, supportability is what matters.

What the Analysis Shows

AI coding assistants can quickly turn your code into an unreadable mess if:

Instructions are vague.
Results aren’t checked.
Prompts aren’t finetuned.

That doesn’t mean you shouldn’t use AI. It just means you need to review every line of generated code, which takes strong code-reading skills. The problem is that many developers lack experience with this.

From our experiments, there’s a limit to how much faster AI-assisted coding can make you. Depending on the language and framework, it can be up to 10-20 times faster, but you still need to read and review the code.

Code assistants work well with stable, traditional, and compliant code in languages with strong structure, such as Java, C#, and TypeScript. But when you use them with code that doesn’t have strong compilation or verification, things get messy. In other parts of the software development life cycle, like code review, the code often breaks.

When you build software, you should know in advance what you are creating. You should also be familiar with current best practices (not Java 11, not Angular 12). And you should read the code. Otherwise, even with a super simple task, you will have non-supportable code very fast.

In my opinion, assistants are already useful for writing code, but they are not ready to replace code review. That may change, but not anytime soon.

Next Steps

Having all of these challenges in mind, here's what you should focus on:

Start using AI assistants where it makes sense.
If not in your main project, experiment elsewhere to stay relevant.
Review your language specifications thoroughly.
Improve technical architecture skills through practice.

Used thoughtfully, AI can speed you up. Used blindly, it will slow you down later.

*The article was initially published on DataArt Team blog.

The Best and Worst of IT in 2025: Highlights, Scandals, Innovations

Nevena — Thu, 15 Jan 2026 14:57:20 +0000

As the new year begins, Andriy Silchuk, DataArt’s Head of R&D Center and Delivery Director, looks back on a turbulent 2025. From defining trends and high-profile scandals to breakthrough innovations and rare bright spots, he recaps what shaped the IT and hi-tech world—and shares his outlook for 2026.

Ladies and gentlemen, we’re lucky once again to have made it to the end of the year, so let’s officially tally up the year-end results.

As before, let’s follow a familiar route: we’ll briefly recall last year's forecasts, then look at the major trends, scandals, the good, and the bad that we all experienced in 2025. We’ll also examine the big names we lost this year, determine the heroes and villains, and recall 2025’s surprises. Then we’ll end the program with our 2026 forecasts.

So, pour yourself your favorite drink – and let's go!

A Brief Look at Last Year’s Forecasts

At the end of 2024, we predicted that we’d get real AI agents, turbulence in the US IT industry, changing requirements for engineers, "data as oil," and a growing lag between Europe and the United States in 2025.

What we actually got:

AI agents have truly made their way from being the subjects of presentations into full production: ChatGPT agents and a bunch of other tools are already walking around the sites themselves, pressing buttons, executing scripts, and the Linux Foundation is even launching an initiative based on agentic AI standards.
Turbulence in the United States IT industry hasn’t gone anywhere: antitrust lawsuits against Google/Meta, content wars, regulation — it was all present again this year.
The requirements for engineers have changed drastically: "I know how to work with AI tools" is now a basic required skill. Big Tech is introducing KPIs for using AI, and those who resist are told to look for a new job.
Data and infrastructure are truly the "new oil." The problem is not even about data, but about servers, GPUs, memory, and energy — everything is becoming more expensive and scarce.
Europe has cemented itself as the regulatory champion, with the EU AI Act, record DSA fines for X, etc. In the US, meanwhile, they’re more so debating how to regulate IT, rather than actually doing any regulating.

Our predictions were right practically across the board. Not because we’re prophets, but because the trends were blatantly obvious.

Four Most popular trends of 2025 (they’re all about AI)

1. Agency AI: From "chats" to real assistants

2025 was a turning point: the focus shifted from "generative AI" to agentic AI. ChatGPT agents and similar systems no longer just respond, but also perform tasks themselves—they open websites, monitor statuses, book, write, and edit documents. Businesses are churning out their own agents for support, sales, and back office, DevOps, and domain tasks, and at the same time, a whole zoo of multi-agent frameworks is growing. We've officially gone from "a chat who advises something" to "an assistant who does the job but still needs supervision."

2. GEMINI, GPT, CLAUDE and others – a new level of "smartness"

Google has finally shown that it’s still alive and very strong, with its Gemini 2.x and 3 models, Nano Banana, and other tools, and deep integration into Search, Android, and Workspace. OpenAI rolled GPT-5/5.1 out of "thinking mode" and made it the default in ChatGPT, effectively dragging a bunch of niche tools under it. Anthropic with Claude 4.5 is seriously putting the pressure on its competitors in coding and reasoning. Meta continues to pump Llama in open source. For the user, it is no longer "one model is better than another", but a whole forest of ecosystems that fight to be your main "superstructure in work." The high level of competition is always in our favor.

3. Data centers as new "capitols" with Heroes 3

The capitol is an extremely necessary thing in Heroes III, but it costs a lot of money. It’s the same with data centers. The IEA and the European Commission predict that data centers already consume about 1.5% of all electricity on the planet, and could double this consumption by 2030, largely due to AI. Energy demand in the United States for data centers jumped 20% year over year, and AI servers are taking an increasing share of capacity. Big Tech is responding in its own style: it’s buying up solar/wind plants, it’s building gas and small nuclear projects, it’s turning old coal-fired power plants into data centers, and its signing contracts for building its own nuclear power plants. Nuclear power plants, Karl!

4. Regulations, courts, and "AI psychosis"

The EU AI Act has officially started, and DSA is starting to bite with real fines. In the United States, state attorneys general issue warnings to large AI companies about mental health risks. The first lawsuits have appeared where ChatGPT and other models appear in real tragedies — from suicides to murders, where AI allegedly added fuel to the paranoia. Regulation has traditionally lagged behind technology, but politicians and courts are already in play, and 2025 has clearly shown that "it's just a chat" no longer works as an excuse.

Five most high-profile scandals of 2025

1. DeepSeek: China's "nightmare" for the market

At the beginning of 2025, DeepSeek released its models with an embarrassingly low price and pretentious claims about "ridiculous training costs." The market panicked, NVIDIA shares sagged. Then it turns out that everything is not so simple as "cheap" training. The quality of "supermodels" from China took a hit, too. But the shock of just how one release can collapse half the market remains.

2. X becomes DSA’s first major "patient"

The EU decided to demonstratively apply the Digital Services Act and issued X an estimated $140 million fine for manipulative blue ticking and refusal to provide data for research. This is the first major case DSA in action, and hardly the last. The signal is clear: playing "I do whatever I want" in Europe won’t work anymore, even if you really love freedom of speech in your own interpretation.

3. TikTok: Banned, then not banned, with an eternal "window for agreement"

The TikTok saga in the US was reminiscent of a soap opera. The law required that either TikTok sell itself to an American owner or leave the market. TikTok defiantly shut down its service in the United States before the deadline, the administration dragged out the time, and then Trump came. He extends the "window for agreements" several times (he’s the master of the “art of the deal,” let's not forget). As a result, bidding continues for a year, names of possible buyers are announced, but they never do get full control — formally everything has been signed, but in reality everyone just pretends to be very busy, and postpones the final steps.

4. Google: Antitrust wars and the shadow of selling Chrome

Google is simultaneously focused on several different fronts: dominance in advertising and search, abuse of its mobile platform, and the use of the web to train models. Against this background, there was even a lot of talk and rumors that the company could be forced to sell Chrome, and there was a long queue of those who wanted to buy it. Hyenas can sense blood from far away, as they say. The sale didn’t take place, but the very fact of discussing the sale of the #1 browser shows how tightly Google was squeezed.

5. OpenAI's exit from Microsoft's influence

OpenAI and Microsoft are officially "restarting" their partnership: Microsoft remains a large shareholder, but without total control. Azure's exclusivity is being diluted, and some OpenAI services are moving to other clouds. In the end, the companies declare friendship, but in fact they are preparing for a "civilized" departure: OpenAI wants to make decisions on its own and have freedom, while Microsoft wants the right to develop its own AI separately. At least, that's what they say. OpenAI gets certain advantages from separating, that's clear, but what Microsoft will get out of separation is still a question.

Three most positive events of 2025

1. AI in Medicine: From promises to real-world treatments

2025 was the year when AI in medicine finally showed something more serious and applicable than just promises on paper. Results of clinical trials of AI-developed drugs against cardiovascular and oncological diseases are emerging, and Rentosertib for the treatment of idiopathic pulmonary fibrosis has demonstrated safety and benefits. Furthermore, AI approaches to early diagnosis of cancer and liver diseases are actively developing. This is not yet "AI cured cancer," but real steps in this direction are already being taken.

2. Quantum Computers: Less hype, more benefit

After years of promises about how cool quantum technology can be, we are slowly but surely moving towards practical applications. New systems like Quantinuum, Helios, and Google Willow show progress in bug correction and stability. It's still expensive and niche, but it looks less and less like PR, and more and more like a long-term bet.

3. Global IT demand comes to life

India's IT services exports grew by about 12.5% to $224 billion in fiscal year 2024-25 after several sluggish years. For the industry, this means a simple thing: enterprise money is again used not only for cost optimization, but also for new projects and digitalization. For Ukrainian outsourcing, this means not a direct contract, but a very positive indicator: customers are ready to buy again. If the money returns to India, then it will reach us.

Four worst IT news in 2025

1. Massive declines in global services

In October, there was a long-term crash in AWS us-east-1, which in turn “crashed" Slack, Atlassian, Snapchat, and a million more. In November and December there were two big Cloudflare failures, one of which knocked out up to 28% of the world's HTTP traffic. The conclusion is banal, but painful: the Internet is too dependent on several infrastructure players. The words "multi-region/multi-cloud" on presentation slides are not a guarantee of real sustainability.

2. AI-enhanced cyberattacks

Cybercriminals are awake too: tools like PromptLock are emerging, which use generative AI to automate phishing and more complex attacks. The year 2025 saw a series of major leaks and ransomware attacks on energy, logistics, and other critical systems. AI increases the productivity not only of developers, but also of all the "bad guys".

3. Giant data breaks

Prosper Marketplace in the United States lost the data of 17.6 million people, and the South Korean company Coupang lost another 33.7 million accounts. In total, there are more than 50 million records with names, addresses, documents, and order histories. The reputation of fintech or e-commerce can now be lost in one bad year.

4. Mass layoffs in tech

According to TrueUp and other trackers, in 2025, almost 700 waves of layoffs in tech companies took the jobs of more than 200,000 people — an average of 600+ dismissals every day. The headlines are the same again: Amazon, Microsoft, Google, Intel, Meta, etc all laying off employees. Increasingly, companies are saying bluntly: we are cutting people to invest in AI and automation. So either we learn to work with AI, or AI will replace us, little by little. Don't forget this simple rule.

Six most interesting releases and announcements of 2025

1. ChatGPT Atlas and Comet — AI-browsers

OpenAI launched ChatGPT Atlas — a Chromium browser with ChatGPT at its heart: sidebar, summarizing articles, comparing products, working with documentation directly in a browser window. Perplexity rolled out Comet — also on Chromium, but with a focus on a personal agent who does its own research, deletes unnecessary tabs, and rakes mail. These are no longer add-ons on top of Chrome, but a new class of products: "a browser as a shell for an AI agent."

2. AGENTS.md — README for agents

In August, AGENTS.md appeared — a simple file at the root of the repository that explains to AI agents how to live in a project. How to collect and test code, where the entry points are, and what the rules are. In just a few months, tens of thousands of repositories pick it up, GitHub adds guides, and the Linux Foundation with OpenAI/Anthropic formalizes it as part of the standard for agentic AI. Starting this year, documentation is divided into human-made (README.md) and agent-made (AGENTS.md) — and it looks like it’s here to stay.

3. Claude 4.5 is a "programming neighbor" for developers

Anthropic updated its entire lineup: Opus 4.5, Sonnet 4.5, Haiku 4.5. Opus seriously improves reasoning, long contexts, and tool/agent handling. Sonnet has become a workhorse at an adequate price. Haiku has become an ultra-fast, high-volume option. In reviews, Claude 4.5 is often cited as one of the best dev assistants for real-world projects, not just for template tasks or pet projects.

4. Gemini — Google shows it can do it all again

Google rolled out Gemini 2.0 (Flash / Flash-Lite), then 2.5 Pro / Flash / Deep Think, and at the end of the year, Gemini 3 Pro. The models are getting faster, smarter, and are heavily tied to the Google ecosystem. The most important thing is total integration: Gemini lives in the search, Gmail, Docs, Android, and Google AI Studio. This is no longer an attempt to catch up with competitors, but a separate ecosystem that can really be used on its own. Many note that this is one of the best AI releases of the year.

5. Starlink Direct-to-Cell and Ukraine

SpaceX launches commercial Starlink Direct-to-Cell: satellites work as base stations, SMS texts are sent from ordinary smartphones through space without special devices! And then Kyivstar becomes the first operator in Europe to launch D2C together with Starlink: first for SMS and basic messages, then they plan to add voice and mobile Internet. For Ukraine, this is not just another feature, but an important element of resilience during blackouts and shelling.

6. Bonus: Sora and the first step to a "dead internet"

OpenAI released Sora 2 for video generation and a separate application — a conditional "Instagram," purely for AI videos, called Sora. Feeds are clogged with synthetic video, people are delighted, and at the same time, many are wondering: if social networks begin to massively switch to generated content, how much "live" Internet will we have left? On my side, I can admit I myself sometimes get caught up in this content. And yes, sometimes I can't even distinguish it from real content.

Five "most" interesting hardware inventions of 2025 — once again it’s all about the metal

1. Most innovative device: the Meta Ray-Ban Display

The first AR computer to be really similar to a daily device, in the form of normal glasses. Meta AI's messages, navigation, translations, and replies are all right in sight. Special attention should be paid to the Neural Band, a bracelet that reads muscle impulses and allows you to control the interface with gestures. There are still plenty of questions about the product, but as far as a direction of innovation goes, it’s very interesting.

2. Hobby of the Year – Logitech MX Master 4

Yes, it's "just a mouse," but the MX Master 4 with the new Haptic Sense Panel and Actions Ring was one of the most pleasant changes to the working day. Ergonomics, multi-device, and a bunch of custom shortcuts that really save time. As the owner of the previous version, I can honestly say: this is a device that’s difficult to pass up.

3. Disappointment of the Year – iPhone 17 Pro

On paper, there’s the A19 Pro, a new camera, Apple Intelligence, and marketing bull shit. In practice, there’s a controversial design, aluminum instead of titanium, and the main AI features arrived late to Europe and in a stripped-down form. If you’re looking for an Apple, then check out either the iPhone 17 Air, albeit with questions, but at least it’s something new, or the regular iPhone 17, which turned out to be much more successful.

4. An Interesting Niche Product – Oura Ring 4

Although Oura Ring 4 was released last year, it received a cool ceramic version this year, and looks like the king of niche devices: tracking sleep, stress, and activity in the format of a beautiful piece of jewelry, not just another screen on your arm. Not for everyone, but for those who bother, wellness is a very nice gadget.

5. Garbage of the Year – Samsung Galaxy XR

Formally, it’s the first Android XR device and flagship in cooperation with Samsung, Google and Qualcomm. In fact, it’s a rather expensive demo. Albeit lighter than Vision Pro, it’s not very ergonomic, with an external battery, damp software (very raw), unstable tracking, and a poor catalog of applications. Against this background, even the already-mentioned, not very successful Vision Pro looks cooler.

Big IT names we lost in 2025

The traditional block where you want to press F and cry.

Bill Atkinson was a legendary Apple engineer, creator of MacPaint and HyperCard, and the man who shaped the look of early GUI.
Steve Shirley is a pioneer of outsourcing and remote work, the founder of Freelance Programmers, who built an outsourcing business long before it became mainstream.
Margaret Boden is one of the founders of cognitive science and AI research, and the author of classic works on the interaction of artificial and human intelligence.
David Benaron is a doctor and entrepreneur whose developments formed the basis for the sensors of modern fitness trackers and smartwatches.
Udo Kier is an actor, but for us he is forever Yuri from Command & Conquer: Red Alert 2.

This is only a small cross-section of the people whose work "lies quietly under the hood" of the things we use every day, and that we have lost this year.

And separately — R.I.P. Skype, a piece of our everyday life, to which time still said "that’s it" and left.

Other 2025 Highlights

IT Hero of 2025 — Jensen Huang, CEO of NVIDIA

Under his leadership, NVIDIA briefly touches the $5 trillion capitalization bar on October 29, becoming the most valuable company in the world. The demand for their chips is rewriting records, and NVIDIA itself has finally turned from a "company for gamers" into a monopolist of infrastructure for generative AI. The man in the black leather jacket became the face of the era – more than anyone else.

2025 IT Villain — Astronomer CEO Andy Byron

We could easily give the statuette to one famous billionaire again, but this year the anti-hero award goes to Astronomer CEO Andy Byron. He became famous not for his products, but for his very loud personal story and the memes around it. Sometimes the villain of the year is not the one who breaks the market, but the one who coolly spoils his reputation because of an affair at a Coldplay concert. The story will go away, but the memes will stay with us forever.

IT anecdote of 2025

On the one hand, there’s Ilya Sutskever and Mira Murati, who collected billions for a startup based on a "bare name," without a product. It's very cool, but I would believe in such a joke only in an anecdote.

On the other hand, there’s a wave of madness around the new image generation model in GPT-4o: the Internet is turning into an anime carnival, Sam Altman complains that there’s not enough power, and users can’t stop. True surrealism.

IT Surprise of 2025 — Oracle and Media Triples

Oracle suddenly becomes an AI cloud star: its shares soar more than 40% in a day after news of giant contracts and OpenAI connections, its capitalization approaches a trillion, and Larry Ellison overtakes Musk in the ranking of the richest people in the world by several hours.

In parallel, Netflix, Paramount, and Warner Bros. Discovery play out a complex love triangle with purchase claims and political overtones. The content market is shrinking, and we are gradually approaching the world of "one app for all videos." Jobs willing, one day it will be so.

Mobile 2025: Liquid glass and Epic vs Apple

This year I decided to add such a nomination. Apple is importing a complete redesign of iOS in the style of liquid glass - beautiful, loud, uncomfortable in places, but definitely hype.

And Epic is finally winning a small but important victory in the fight against Apple. It was definitely a pebble that, albeit a little, changed the issue of commissions in mobile stores. Not a revolution, but it is from such microcracks that large monopolies begin to gradually rethink their behavior.

Five predictions for 2026

Alright, let's move on to the forecasts!

1. AI agents will become the new daily software, and the hype will continue

In 2026, the average engineer will have not one chat or tool, but several AI agents who will do the routine: walk through Jira/Confluence, rake mail, and write drafts. The item "experience in building and managing AI agents" will increasingly appear in vacancies.

2. Energy will be the main limitation on the AI boom

We’ll see the first cases when the construction of data centers is directly limited by access to energy and water. Investments in energy, especially nuclear energy, will become a part of Big Tech's AI strategy.

3. Regulators will move from chaotic fines to a system of rules

The first real AI certification frameworks for medicine, finance, and education will appear in 2026. They will still be bureaucratic, but they won’t look like chaotic steps any longer. At the same time, we can expect high-profile court cases against AI platforms for damage to health, people’s wallets, or their reputations.

4. Fake AI profiles and content will become commonplace

What now looks like "strange Insta accounts" and individual cases will become a massive buzz in 2026. Generated faces, stories, news, bloggers, individual content will become the new normal. The question of the year will be "is there anything real here?"

5. Internal AI platforms will become the standard for companies

If in 2025 proprietary LLMs or internal AI platforms were a feature of a few, then in 2026 an internal AI platform with access to documents, code, and processes will become a new "corporate standard." Someone will buy ready-made solutions, someone will assemble it themselves, but "enterprise AI" will cease to be a pilot, and will become an obligatory part of the infrastructure.

Closing 2025

This year was difficult. At times it was extremely difficult. For many people it became the most difficult year in their entire career and life. But from the point of view of IT, this year turned out to be incredibly rich. AI became smarter, data centers became hungrier, regulators got angrier, Big Tech got fatter, and Ukrainian IT got even more inventive.

*The article was initially published on DataArt Team blog.