DEV Community: David Ayres

Vibe Architecture

David Ayres — Fri, 15 May 2026 19:00:58 +0000

It's undeniable that AI has changed the technology landscape, but everything I read is deeply focused on the impact on Engineering, while Architects rarely get the same attention. In this article, I wanted to highlight a Solution Design I recently worked through, the Agentic AI tooling used, and what the final product looked like. This is going to be raw and honest because it was my first time using this approach.

Context;

I work as a Solutions Architect for a FinTech startup company. Although the primary focus is Payments, a significant part of the business is centred around Loyalty solutions. Working for a startup means new projects can arrive quickly, and workloads have to pivot rapidly to satisfy client demands — because clients pay the bills.

The Requirement;

Anybody familiar with the Italian Prize Issuance Regulation D.P.R 430/2001?

No? Me neither.

The ask from the business was to create an Instant Win game where, when a customer completes X transactions within a 24-hour period, they receive a game token. That token is then used in a random game of chance where a prize may be awarded.

Random prize issuance in Italy is heavily regulated, and the regulation has to be followed precisely.

I am also not an Italian speaker, but the Statement of Work and all third-party integration documents were written in Italian. There were a lot of integrations too — this solution sat in the middle of a large ecosystem involving external API integrations, file exchange, and a bespoke vendor-implemented SSO solution.

Then came the third and final caveat: the timelines for completing the design were measured in days, not weeks, due to a committed client delivery date.

It wasn't exactly a winning position, but we don't shy away from a challenge.

Tooling;

The Engineering team I work with all have access to Agentic AI coding tools, with Warp being the frontrunner in terms of adoption. At times, I've even pulled stories directly from the board and implemented changes myself using the tooling, so I was already familiar with the approach.

My immediate thought was:

Why can Engineers vibe with AI, but Architects can't?

Agentic Solution Design

It would have taken weeks to fully understand the regulation and translate all the supporting documents, which simply wasn't an option.

Instead, the only viable approach was to "Vibe Architect" the solution and leverage Warp to do the heavy lifting while I guided it through the process.

The setup looked something like this:

The initial workflow;

Pull all business documents into a local repository so Warp could consume the full context — still in their native Italian.
Pull down incumbent codebases to use as reference models for coding standards and implementation approaches.
Pull down the microservices specification catalogue to use as reference models for best practices.

Then came a pause.

I spent a couple of hours in Miro scoping out a high-level diagram of the landscape: what already existed, what needed to be modified, and what needed to be created. The classic Architect "boxes and arrows" exercise.

That step was critical because it gave the AI bounded contexts and a defined scope to work within.

The final steps were:

Point Warp at the Miro MCP and the board itself for context.
Point Warp at the Jira MCP and key Architectural constraint artefacts.

Then we wrote the instructional prompt and off we went.

The Output

I kept tight control over the AI throughout the process. After every major step, it would pause and wait for feedback, allowing me to continuously steer it in the right direction.

Together, we produced a large number of markdown documents — all in English — covering:

Algorithm Design: how prizes are awarded fairly and randomly
High-Level Architecture
Service Designs for each new service
Service Modifications for each updated service
Requirement generation, including NFRs
SSO implementation
Observability, error handling, and alerting strategy
Testing strategy

Normally, I like to construct my Confluence Solution Design documents manually, taking generated markdown and curating it carefully for the Engineering teams.

Given the time constraints, however, I asked Warp to write everything into Confluence for me.

Any diagrams were generated as Mermaid code, which meant I could quickly convert them into images and embed them into Confluence myself.

I also found myself treating Warp almost like a Technical Architect throughout the process. Due to Regulation D.P.R 430/2001, there are strict software controls that must be implemented. The Solution Design therefore had to go much deeper technically than our Engineering teams would normally expect from Architecture documentation.

The term Chi-Squared Distribution will now haunt me for the rest of my career.

The final step was asking the AI how confident it was that the proposed solution was compliant and would pass auditing.

It was happy, so I was happy.

The completed Solution Design was then handed over to Delivery Managers to convert into Epics and Stories — again using Agentic AI, this time directly through ChatGPT.

Implementation

As previously mentioned, the Engineering teams already use Agentic AI coding tools extensively. They were pulling stories and epics directly from Jira and, because the Technical Design had already broken work down into granular detail, Engineers were able to start quickly and work largely in parallel — effectively one Engineer per service.

Most of the collaboration concerns had already been solved during the design phase, allowing teams to work independently and integrate everything later.

Up front, we knew that investing a couple of days embedding the design and business domain into the team would allow them to better support their AI agents. That meant each Engineer became deeply knowledgeable within their service domain.

This paid dividends because it freed me up to move directly onto the next Solution Design.

The Spanner in the Works

Regulation D.P.R 430/2001 requires a signed compliance document from the Technology team evidencing adherence to the rules.

Although I had designed the solution, implementations naturally evolve during delivery.

By pulling all the codebases down locally and asking Warp to perform a full end-to-end audit of the implementation against the Confluence documentation, I was able to validate the final state of the solution.

Warp then had enough context to generate the compliance documentation for me as well, including relevant reference code examples where required.

We were more than compliant and proved it.

In Conclusion

Was I comfortable "Vibe Architecting" this?

Nope.

I think most Architects are control freaks at heart and want to be involved in every detail.

Was I confident?

Also nope.

I didn't hold enough of the implementation detail in my head, and I'm used to understanding everything end to end.

Did it work?

Surprisingly, yes.

By some miracle, all the AI involved across the delivery managed to hit the brief, fulfil the requirements, and remain compliant with the regulation.

What does this mean for me moving forward?

I honestly don't know.

What I do know is that I'm now working constantly alongside Warp, taking requirement documents and vibing solutions at a pace I couldn't previously achieve. I'm finding its insights especially valuable when producing change requests against incumbent codebases, where I can now generate detailed technical specifications and estimates far more quickly.

Does it sometimes mean I make the code changes myself?

Absolutely.

It's fun — and sometimes the documentation takes longer than the actual implementation.

What it also means is that I can significantly increase my output. I have 14 Engineers that I need to continuously feed work into, and the tooling helps me maintain consistency and quality across everything I produce.

I'm still trying to find the sweet spot between Solution Architecture and Technical Architecture when it comes to documentation depth. The AI can absolutely generate line-by-line code change specifications, but at some point that starts to diminish the value Engineers bring — because Engineers consistently provide insight and nuance during implementation that would otherwise be lost.

I'm finding myself experimenting more and more with AI tooling, and I'm convinced that over time the vast majority of what I do will become increasingly automated. I'll code with Warp, I'll research with Gemini, I'll use ChatGPT for quick code formatting or text analysis, then I'll use Gamma to make my presentations look pretty and professional. As for Miro, that's a post for another time.....

Much like modern Engineering workflows, I suspect my role will evolve into reviewing outputs, refining prompts, steering agents, and validating outcomes — perhaps even letting AI generate the diagrams for me too.

Hopefully one day I won't still be fixing Mermaid diagrams manually.

Until then, I still provide value.

(Yes, this post was proofread by AI but my voice is still in it!)

What is Architecture to me?

David Ayres — Fri, 13 Sep 2024 09:03:50 +0000

So for my next article in this series, there was a comment I left hanging in my last one:

"Yes there are lots and lots of flavours of this; Business, Enterprise, Security, Infrastructure, Application, Principal, Solution and a whole host of others that differ from company to company."

Which definitely needs to be explored. Officially speaking I'm a "Solution" Architect. Which can pretty much be any/all of the flavours of Architect above, like a smorgasbord of Architecture responsibility.

So what is Architecture to me and what does my day to day look like?

Solutions Architecture

So, a Solution Architect..... tends to have 1 or more Systems and is responsible for the technical ownership of them. They'll draw some system designs - "boxes and arrows" at what's often called a High Level. This box talks to this box that then talks to this box. All technical detail is abstracted away and it's the simplest view of the system possible, including internal systems and external third party ones (think Salesforce, Workday, etc). It'll show what the basic flow of data looks like and how that crosses other Systems and domains.

They should (I say should as it's different everywhere and for everyone) also have supporting documentation to describe the box (or boxes) they own. I'll do another article on the story of a solution design because there's plenty to talk about "how" solutions should be described.

Those 2 documents together are key. We are responsible for making sure there's adequate documentation describing what a solution does, how it does it and why it's doing it. That way, anybody who comes along and wants to learn about it can read the document first, without needing to rely on an individual sharing their knowledge. It's critical to abstract a solution away from a person. People move on and leave companies, and they forget things, but a documented historical artefact of that solution lives forever. Systems are rarely short term, and this has to be considered when writing documentation.

So a Solution Architect needs to be competent in the written word.

A Solution is also a fairly public component of a company. You might be lucky enough to design something isolated, hidden, that people don't really need to know about. If not, then you'll almost certainly need a third supporting document for your solution. The dreaded presentation. That "PowerPoint" you'll have to run through time and time again, to various members of the business, from Director level downwards. Maybe you are justifying the expense of a project, or sharing its success. Either way, an Architect has to be able to not only create an engaging and interesting presentation but also deliver that message in a language the audience will understand. Presentation skills should not be underestimated. For me, it's something I do weekly. I'm lucky that I like the sound of my own voice but really, it's because I'm always invested in what I'm working on and always willing to discuss it.

Then another key part of my role is the talking. There's always lots of talking. To help formulate a design;

There are conversations with the business to understand requirements. These will take place with a Business Analyst to understand "what" is needed. It's not a technical conversation but understanding what problems we need to solve forms the foundation of the Solution. If you can't solve the business problem, you'll never "win".
Any other Systems I might need to interact with, I'll talk with colleagues in those business/tech domains to understand how we can integrate and share the data. It could be an Architect, a Platform Lead, a Lead Engineer - whoever has that Technical ownership.
There's going to be some sort of a Technical governance group, where the Solution needs to be presented and "approved". They'll give insight from their own experience, ask probing questions to see if I have gaps.
If a design moves towards a Buy decision (more on that in another article) then there will be a super exciting RFP process talking to potential Suppliers and evaluating their offering. Is this fit for purpose, will it meet any NFRs, how locked in might we be with this choice etc.
Finally, there's the company governance. The security team audits and checks. Is GDPR data protected? Have I designed a hole that could be exploited by outside malicious parties? In the past these have been Threat Modelling sessions reviewed by a Security Architect but each company has their own requirements for this.

At the point a design is "complete" the knowledge share is between the Engineers, Architect and Analyst. It's your classic pizza sized team. To get to that point, there will have been so many touch points for the Architect and Analyst that get abstracted away to make the Engineer's lives easier. You need to be able to talk to people and build up rapport with a whole host of different job roles and personalities. There's a huge social touchpoint graph of conversations to complete a Solution.

The Technical "Stuff"

We've got a High Level diagram but the next level down starts to become more technical. That's where you start seeing an API and a data store being added to a diagram. We still aren't at a super technical level here, we aren't talking code, contracts or specific technologies, it's still very abstract, still boxes and arrows but something looking more like a technical delivery and something relevant to Engineers.

This enables conversations around integration patterns and how we might share data upstream or downstream. If I know a third party solution is going to be used (more on that in another article) they'll get added to the picture along with their integration components so we can map out the journey of data to more detail than the HLD. The same goes for any internal systems that might need to be integrated too.

My rule of thumb, rightly or wrongly, is that anything that starts becoming specific to a technology - like an Azure Function App, a SQL database or a Service Bus technology (RabbitMQ) - is too much detail for this level. As a reminder, I'm working as a Solution Architect. If you are lucky enough to be a Software/Application Architect you might find yourself being required to do that Technical component diagram and prescribe to Engineers exactly what needs to be built.

Once that's finished, it's time to engage with an Engineering team. Together with a Lead/Principal/Staff/Senior Engineer(s) the deep down technical design can take place. I'll input my Architecture diagrams, along with non functional requirements around load, volume, data size, performance etc. and together we assess the correct technical components to fulfil the requirements.

Key decisions tend to be;

Relational Database vs Document Database
API vs Service Bus
What parts of the system need to scale and what limits do we need to put in place

I'll lead the design and make sure the technical diagram is completed, although it's a collaborative effort to put it together I have ownership of the artefact to add to my collection.

Now, something that's pretty important to understand here is that, yes, in isolation I "could" do the technical design myself but if I'm not building it and supporting it, I might make a technology decision that doesn't mesh with the team. It's only "fair" that they get a considerable say in what this looks like under the hood. They'll be the ones supporting it 24/7 while I swan off to the next project.

Equally, I shouldn't be an end to end detailed technical expert in all the technologies my company has adopted. I need to know enough to hold a conversation about the pros and cons of a technology, be able to assess it against NFRs but when it comes to the inner workings, it's left to the experts.

An Architect needs wide knowledge across all technologies a company works with, along with many other emerging ones to assess/recommend adoption, but not the deep down detailed implementation. It means keeping an eye on the market, reading, following industry experts and trying to stay on top of things. Ultimately though, it's boxes and arrows that are key for me: where this Solution sits within the business and how it's integrated.

The Project Team

My next comments are very specific to the way I like to work and definitely don't gel with how everybody likes to work. When it comes to Architecture you either dictate a Solution and move on, or you stay with it and follow it through to deployment.

For me, I love being part of project teams and I'm a strong advocate of Agile Architecture. I'll try my best (project capacity dependent) to attend all the agile ceremonies for a team and I've even been known to have my own Architecture stories as part of sprint deliveries. I want to be there, in the thick of it, supporting the team.

At the end of the day, alongside the Business Analyst, we are the subject matter experts for the Solution, the business value, the "why" something is being built so if we are working embedded within the team any questions or issues that arise can be resolved immediately. This also allows for any tweaks or changes to the design to be addressed as quickly as possible. I'm there to make sure the solution matches the design and that the team deliver to my vision.

Conclusion

It's getting a bit long now and the aim for all of this was to try and keep it short/snappy. Hopefully it gives you some insight into the value that an Architect brings but to summarise;

We need to be experts in our specific business domain but knowledgeable in other areas (that classic T-shaped individual)
We draw lots of boxes and arrows on pages, a diagram speaks a thousand words
We need to be able to write clear, concise and engaging documentation
We talk to A LOT of people across the business and third parties, communication is key
We need to be able to present our Solution to all levels of a business
There's a whole Technical knowledge and understanding that comes with it, we need to be continuous learners

To round out my list of articles - I'll cover off;

The narrative and artefacts of a Solution
The Build vs Buy decision

Which will hopefully give people a rounded view of the value Architecture brings to a business and what Architecture means to me.

Thanks for your time!

Are You Secretly an Architect?

David Ayres — Tue, 06 Aug 2024 18:20:03 +0000

Intro

Although my first article was all about the magic of CSV Schema Validation (check it out if you haven't already!) the majority of my day to day fits firmly into the Solution Architecture remit. I'm sure some code heavy posts will get written (I still have my hobby code, don't tell my boss), but consider this Part 1 in a series of very Architecture focused articles where I'm hoping to;

Give some insight into how I've fallen into the role I'm in.
What it means to me being an Architect.
The average (or not so average) day to day life of a Solution Architect.
Insight, advice and a Sales Pitch for those Engineers who are considering making the move across.

Then if people find those interesting I'll also drop in some more detailed articles on more focused and role specific topics, especially the dreaded Build vs Buy debate that fills most of my days (I enjoy it really) and did somebody mention non-functional requirements!?

The Architect

Of all the roles within the Tech Community, none come with as much inconsistency and widely differing job roles as that of the Technical Architect.

Yes there are lots and lots of flavours of this; Business, Enterprise, Security, Infrastructure, Application, Principal, Solution and a whole host of others that differ from company to company.

You'll almost certainly have experienced working with at least one of these mysterious Architects, who often drift into and out of projects, seemingly preaching down from an Ivory Tower on how things should be done.

Then when they aren't doing that they'll be sighted having hushed conversations with all the "C-Suite" members about top secret initiatives that might be shared months/years later.

That's certainly the overall consensus of Architecture, one I can be guilty of myself sometimes unfortunately. Some of us don't mean to do it, I promise.....

My Journey

So I've pretty much stumbled through my career. I did an A-Level in computing because I was sort of good at it and didn't mind it. That then meant going to University and doing Computer Science because again, I wasn't sure on what I wanted to do and it was sort of enjoyable and I was sort of good at it. I was then ejected from University with no real direction. I wasn't prepared for the world of work, either mentally or through my education. I ended up applying for anything computer related I could but found plenty of rejection because of my lack of experience which, even to this day, is still the case for many of you.

IT Support

Eventually I managed to land a role at a large IT company close to me doing first line IT support in a call centre. A stepping stone to hopefully work my way up through the organisation. As tedious as that role was, I've always spoken positively of the experience and it's helped shape a lot of the core skills I still use to this day.

I would be talking to hundreds of people a day, so it forced me to be personable and taught me how to build rapport with people. Sometimes people would be upset/angry they were having an IT issue and as the "face of the company" I had to try and win them over so we could try and resolve their issue. Typically not the sort of exposure an Engineer has.....

Coupled with that, I had to try and drill down to the underlying problem a customer was having as quickly as possible. I had to learn techniques and the right questions to ask. We were motivated to fix as many issues as possible with the customer on the phone so it became a bit of an art form.

Web Development

I've not been entirely truthful with my experience. While at University I did find myself drawn to Web Development. So while doing my day job, I was honing my skills and dabbling in what I could achieve website wise. I was fortunate enough to get a few private jobs through friends/family and managed to build a bit of a portfolio. Then my stubbornness paid off and my company advertised a role for a trainee web developer which I applied for and got. So I got to write code, learn from peers/senior developers and work on some pretty big sites. What was unique about this role was that I interacted directly with clients. There wasn't a Project Manager or Business Analyst (which to this day still confuses me); instead it was myself and the Team Lead going out to clients, discussing designs, requirements and doing the up sell of what we could deliver for them. I still use the phrase "walking the floor" from that role - which was coined for when the Team Lead and I would visit clients and tour their offices, making contacts and trying to drum up new business. A pretty unique environment but another one that helped shape my core skills. We would solutionise on the fly, so I got very comfortable selling projects to customers.

I moved through a few other companies, tried my hand at agency work but there was something consistent throughout those roles;

I was always customer facing, selling projects/solutions and learning how to describe something technical in a language my audience would understand.
I was designing the projects I worked on and naturally leading teams in how we should deliver them.
Everything would be fast paced, time was money and so decisions needed to be quick and correct first time.
I found myself writing less code and more taking ownership of what was being delivered/how.
Project Managers and Business Analysts came into the picture as the industry matured (now I sound old) but I would still be out there with sales directors, pitching for work then helping distil that for the team.

Architecture

Then one day, I was at the SDD Conference in London and I took myself off to a talk by Juval Lowy about being an Architect and that was it, a light bulb went off in my head: although I loved writing code, I loved designing systems more. That was my motivation; how could I do something better, quicker, cheaper. How could I meet that client's requirements and design something that'll exceed their expectations. I went back to my company and pretty much talked them into making me an Architect, officially taking me away from the code (although that never happened and there were always scenarios where I had to roll my sleeves up) and changing my job description to what I had been doing already.

I was raw and had learnt my trade as I went, shaping myself for the companies I worked for so I did some training although to this day, I still don't have any formal qualifications and have a habit of doing things "my way" no matter where I work.

Which is where the title of this article FINALLY comes back in..... does any of what I enjoyed sound like you? Are you an Engineer that derives more enjoyment from the design and the client interaction than writing the code? Can you talk to a room of people about highly technical topics in a language that anybody listening can understand?

Then perhaps secretly you are an Architect and maybe there's a different and better role out there for you!

CSV Schema Validation

David Ayres — Mon, 22 Jul 2024 14:30:20 +0000

Intro

The humble CSV file; which I'm not going to cover in detail here. If you don't know what a CSV file is and were hoping this document would help - I'm more than happy to signpost you to the Wikipedia page.

So what's this document actually about then? Well, no matter how much we fight it, the CSV is a heavily used file format to share large amounts of data across integration platforms, which is especially useful when it comes to the always difficult task of integration across distributed third party systems.

I didn't set out on my career with a strong desire to be an Integration Architect but as most other Solution Architects deal with, we have to wear many hats, so I find myself in a scenario where I'm having to deal with a lot of files, moving between a lot of systems and CSV is a format I have to handle, no matter how much I fight it.

So the good?

CSV is a simple enough format to define; I've got X columns and a delimiter to indicate how to split those columns. Then there's the rows, the many many rows that make up the file. Quick and easy for a no code Integration platform to define. Consuming within a similar platform isn't too difficult and just becomes mostly config. As long as the columns are consistent on all rows you are good to go and can consume the file easily enough.

You can also zip these files and majorly reduce the file size, which makes it much easier to send/receive. Compression is majorly efficient when it comes to a CSV.
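As a rough illustration of that compression point, here is a minimal Python sketch (not part of my .NET tooling) that zips a made-up, repetitive CSV in memory and compares the sizes. The file name `books.csv`, the row contents and the row count are all invented for the example, and real-world ratios will vary with the data.

```python
import io
import zipfile

# Build a hypothetical, repetitive CSV in memory (10,000 data rows).
rows = ["ID,Title,Cost"] + [f"{i},Book {i},4.99" for i in range(1, 10_001)]
raw = "\n".join(rows).encode("utf-8")

# Write it into an in-memory zip archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("books.csv", raw)
zipped = buffer.getvalue()

# Compare the size before and after compression.
print(f"raw: {len(raw)} bytes, zipped: {len(zipped)} bytes")
```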

The bad?

Trust. The provider of the file has to be very strict on how they generate their file. They always have to have all columns on each row. If they miss a column, the file can't be imported and that row will have to be rejected. Also when generating the file, that pesky delimiter has to be suitably escaped. If it isn't then it'll generate additional columns which will break the file consumer.

For example:

ID,Title,Cost
1,Book 1,4.99
2,"Book, Book 2",5.99
3,Book, Book 3,6.99

Row 3 will have 4 columns if you split the columns on the delimiter: ",". Row 2 should pass because the delimiter is within speech marks which should tell consumers to treat the content between them as 1 column. One hopes that's the case anyway, plus we also hope that any speech marks are suitably escaped.....

So it's a pretty fragile file format and easy to break.
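To make the example above concrete, here is a small Python sketch (a stand-in for illustration, not the .NET code discussed later) that counts the columns on each line twice: once with a naive split on the delimiter, and once with a quote-aware parser from the standard library.

```python
import csv
import io

sample = (
    "ID,Title,Cost\n"
    "1,Book 1,4.99\n"
    '2,"Book, Book 2",5.99\n'
    "3,Book, Book 3,6.99\n"
)

# Naive approach: split every line on the delimiter, ignoring speech marks.
naive = [line.split(",") for line in sample.splitlines()]

# Quote-aware approach: let the csv module interpret the speech marks.
parsed = list(csv.reader(io.StringIO(sample)))

for label, table in (("naive", naive), ("csv.reader", parsed)):
    print(label, [len(row) for row in table])
# naive [3, 3, 4, 4]
# csv.reader [3, 3, 3, 4]
```

The naive split breaks both rows 2 and 3, while the quote-aware reader recovers row 2 and still reports row 3 as having an extra column, which is exactly the kind of row a consumer would need to reject.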

Also a CSV file isn't able to hold any sort of meta data. This means any CSV Interface has to have a document shared by the owner so that the data held within can be understood. It's another level of trust between the provider and consumer. More often than not fields get treated as a string by the consumer as it's less prone to breaking. That's not ideal.

Then there's the complete lack of any sort of schema to validate the data against. The consumer has to understand the interface document (there's no standard for this at all) and potentially implement custom validation and logic. If you don't have a no code Integration Platform and are writing the code by hand, it can be time-consuming repeating the same effort for each CSV file you consume.

CSV Schemas?

You might have looked into a CSV schema to make consuming these files easier, which is how you've probably stumbled across this article. With XML there's XSD and with JSON we have JSON Schema, so why doesn't CSV have anything!? Below is what I've come across, along with my opinion of each.

CSV Schema

An attempt was made to try and standardise a CSV Schema approach. An unofficial draft was made in 2016 but was never formally adopted.

CSV On The Web

CSV on the Web is a fairly new attempt to standardise how a CSV schema should be documented. It's been recommended by the UK Government Digital Service but hasn't gotten the traction it might need. The biggest issue is that nobody has pioneered any .NET libraries to implement validation for it. I tried but it became a bigger job than I was hoping for to write my own library for this. It definitely shows promise and if it matures and there's more industry adoption, this would be a winner.

Ultimately, CSV is the forgotten file format of the digital age and doesn't get as much love or attention as it deserves.

My Use Case

As I've mentioned, I deal with a lot of CSV integrations. I don't have the privilege of a no code Integration Platform so I wanted to have a schema for my CSV files; to validate against when I'm generating files, or to use when consuming them. A quick schema check ahead of publishing and consuming data helps the end to end Integration process and improves data quality across the journey. There have been too many issues where bad files have been shared, which trigger support tickets and ultimately cost somebody time to look into and debug issues.

I've managed to cobble together an approach for CSV validation that's proven to be fast, scalable and has managed to handle even the weirdest of file content I have to deal with on a daily basis, hopefully it's useful to somebody!

CSV Schema Validation Tool

It's rare these days to have the pleasure of writing your own unique piece of software from scratch but validating a CSV file in .NET, with no packages I could just download and use, felt like I was pioneering something so I rolled up my sleeves and got stuck in!

Step 1; JSON Schema notation.

If you haven't been lucky enough to work with JSON Schema, then check it out here. It's mature, well supported, feature rich and a staple when defining JSON file formats and API responses. Why not make this work for a CSV?

This was the easy part. A CSV header doesn't take a huge amount of work to write as a JSON schema. Without much effort I was able to define a schema with all the wonderfulness: variable types, lengths, REGEX patterns (who doesn't love a REGEX!?), enums and, even better, some of the built-in JSON Schema formats such as email. Excellent.

To support a CSV file, I've introduced a couple of custom values that need to be added to the JSON Schema. All CSV files have a delimiter, so that's a mandatory field. Also (frustratingly) a CSV file doesn't have to have a header row, so that's a second mandatory field that needs to be added to the schema.

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "csvDelimiter": ",",
  "csvHasHeader": "true",
  "properties": { }
}

Step 2; CSV to JSON.

Now, as I've mentioned above, I work with the Microsoft stack so we can now start delving into some code.

The first thing to do is read in the JSON schema file. To understand how to read the CSV file, we need that header and delimiter metadata. After some research, I settled on JsonSchema.net to read and parse my schema files. Once the file is read, there are some validation checks to make sure the delimiter and header fields are present. If not, we reject the file, as that's mandatory metadata.
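
To make that check concrete, here is a minimal sketch. It is written in Python using only the standard library, purely as an illustration; it is not the tool's actual .NET code, and the function names are hypothetical. It assumes the two custom keywords are spelt exactly as in the schema above.

```python
import json

# Custom keywords taken from the example schema in the article.
REQUIRED_CSV_KEYS = ("csvDelimiter", "csvHasHeader")


def load_csv_schema(schema_text: str) -> dict:
    """Parse the schema text and reject it if the CSV metadata is missing."""
    schema = json.loads(schema_text)
    missing = [key for key in REQUIRED_CSV_KEYS if key not in schema]
    if missing:
        raise ValueError("schema is missing mandatory CSV metadata: " + ", ".join(missing))
    return schema


def csv_settings(schema: dict) -> tuple:
    """Return (delimiter, has_header); the example schema stores the flag as the string "true"."""
    has_header = str(schema["csvHasHeader"]).lower() == "true"
    return schema["csvDelimiter"], has_header
```

Failing fast here means a badly written schema is reported once, up front, rather than as a confusing error on every row.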

The second thing to do is consume the CSV into the application. For years I've advocated for the NuGet package CsvHelper for this. It's been around for a very long time, and for good reason. It reads a CSV file very quickly and supports reading records as "dynamic" objects, so it's simple enough to read any CSV file generically into a collection. During the read we pass in the delimiter value and whether the file has a header, and it does all the hard work for us!

One thing I love about CsvHelper is that it handles all the special characters for you, even a carriage return within double quotes:

firstName,lastName,age
John,Fish,5
David,"Cr
ab",22

So the above will still produce 2 rows of data, with 3 columns but the lastName for David would be:

Cr
ab

Which looks strange but is exactly what we are expecting.
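
The same behaviour can be reproduced outside .NET. The sketch below uses Python's built-in csv module as a stand-in for the CSV reader (an assumption made for illustration, not the library the tool uses) and a hypothetical read_rows helper that takes the delimiter and header flag from the schema.

```python
import csv
import io


def read_rows(csv_text: str, delimiter: str, has_header: bool) -> list:
    """Read CSV text into a list of rows (lists of strings), skipping the header if present."""
    # newline="" leaves line breaks untouched so the csv module can handle
    # the ones that sit inside quoted fields.
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    rows = list(reader)
    return rows[1:] if has_header else rows


sample = 'firstName,lastName,age\nJohn,Fish,5\nDavid,"Cr\nab",22\n'
rows = read_rows(sample, ",", has_header=True)
print(len(rows))  # 2
print(rows[1])    # ['David', 'Cr\nab', '22']
```

The quoted line break stays inside the lastName value instead of starting a third row, which is why splitting a file on line endings by hand is not a safe way to read CSV.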

It's slowly coming together nicely...

The next piece of the puzzle is to convert the CSV file into JSON. This is the slightly less elegant part of the solution.

We loop through the CSV collection building up a dictionary of <string,object>. The string part is the column name that we extract from the validated schema file. We simply take each column in the CSV row and pull out the positional field name from the schema.

Of everything this feels a little "hacky" but there's no other way to associate the column name to the CSV. As column positioning is critical in a CSV, this approach simply takes advantage of that. If the schema column order doesn't match the CSV column order, everything will fall over and throw validation errors but I deemed this acceptable due to the behaviour of CSV.
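
A rough sketch of the positional mapping, again in Python and not the tool's own code. It assumes the column order is the order in which properties are written in the schema, and it raises an error when the column count differs, which is my assumption about sensible behaviour rather than something the tool is confirmed to do.

```python
def map_row_to_columns(row: list, schema: dict) -> dict:
    """Pair each CSV value with the schema property at the same position."""
    # Python dicts keep insertion order, so this is the order written in the schema.
    column_names = list(schema["properties"])
    if len(row) != len(column_names):
        raise ValueError(f"expected {len(column_names)} columns, got {len(row)}")
    return dict(zip(column_names, row))


schema = {
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "integer"},
    }
}
print(map_row_to_columns(["John", "Fish", "5"], schema))
# {'firstName': 'John', 'lastName': 'Fish', 'age': '5'}
```

Note that every value is still a string at this point, including the age.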

The second thing we do here is make sure that the data from each CSV column is parsed and stored in the dictionary as the correct type. JsonSchema.net has built-in enums for Schema Value Types, so we take the field type from the schema and parse the value from the CSV accordingly. Now we've got a nicely formatted dictionary! This matters because everything read from a CSV arrives as a string: without parsing, a field declared as an integer would hold the text "5" rather than the number 5, and the schema validation would fail.
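
As a sketch of that conversion (Python, illustrative only, with a hypothetical coerce_value helper): the JSON Schema "type" keyword decides how the raw string is parsed. Leaving an unparseable value as a string, so that the validator reports the mismatch later, is an assumption on my part and may differ from how the tool behaves.

```python
def coerce_value(raw: str, json_type: str):
    """Convert a CSV string into the Python type matching a JSON Schema "type"."""
    try:
        if json_type == "integer":
            return int(raw)
        if json_type == "number":
            return float(raw)
        if json_type == "boolean" and raw.lower() in ("true", "false"):
            return raw.lower() == "true"
    except ValueError:
        pass  # keep the raw string so validation flags the wrong type
    return raw


print(coerce_value("5", "integer"))    # 5
print(coerce_value("abc", "integer"))  # abc
```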

The final step in transforming the CSV to JSON is to take the Dictionary and pass it through the System.Text.Json JsonSerializer so it becomes a JSON-friendly string, then parse it into a JsonDocument using JsonDocument.ParseAsync so the code can treat it as a valid JSON document. Remember, we are doing this row by row, so each row is treated like its own individual JSON document. The reason for this is that it gives us a line-by-line schema validation result, so consumers can be directed to specific faults. It also means that the JSON Schema file is written as if it describes one simple row, which makes the notation simpler and easier to understand. This could be something to revisit for a V2.
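
The row-by-row idea can be sketched in a few lines. This is Python with the standard json module standing in for System.Text.Json, and rows_to_documents is a hypothetical name; the point is only that each row carries its own row number alongside its own JSON document.

```python
import json


def rows_to_documents(typed_rows):
    """Yield (row_number, json_text) so every row is its own JSON document."""
    for row_number, row in enumerate(typed_rows, start=1):
        yield row_number, json.dumps(row)


typed_rows = [
    {"firstName": "John", "lastName": "Fish", "age": 5},
    {"firstName": "David", "lastName": "Cr\nab", "age": 22},
]
for row_number, json_text in rows_to_documents(typed_rows):
    print(row_number, json_text)
```

Keeping the row number next to each document is what later allows an error to point at a specific line of the file.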

The last step overall is to pass that JSON document into the schema validator library, and out come some results! That's the easy part. We check through the results and add any errors to a general error object that can be used by a system calling this library and hey presto - we can use JSON Schema to validate a CSV file!
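
To show the shape of that error collection, here is a deliberately tiny stand-in validator in Python. It only understands type, enum, maxLength, pattern and required, so it is nowhere near a real JSON Schema implementation and is not the library the tool uses; the error message wording is made up for the example.

```python
import re

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_row(row_number: int, document: dict, schema: dict) -> list:
    """Return a list of error strings for one row; an empty list means the row passed."""
    errors = []
    for name, rules in schema.get("properties", {}).items():
        if name not in document:
            if name in schema.get("required", []):
                errors.append(f"row {row_number}: '{name}' is required")
            continue
        value = document[name]
        expected = rules.get("type")
        if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
            errors.append(f"row {row_number}: '{name}' should be of type {expected}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"row {row_number}: '{name}' must be one of {rules['enum']}")
        if isinstance(value, str):
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                errors.append(f"row {row_number}: '{name}' is longer than {rules['maxLength']}")
            if "pattern" in rules and not re.search(rules["pattern"], value):
                errors.append(f"row {row_number}: '{name}' does not match {rules['pattern']}")
    return errors


schema = {
    "properties": {
        "firstName": {"type": "string", "maxLength": 5},
        "age": {"type": "integer"},
    }
}
documents = [
    {"firstName": "John", "age": 5},
    {"firstName": "Bartholomew", "age": "old"},
]
all_errors = []  # the "general error object" handed back to the caller
for number, document in enumerate(documents, start=1):
    all_errors.extend(validate_row(number, document, schema))
print(all_errors)
```

Because every message carries its row number, whoever receives the error list can go straight to the faulty line in the file.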

Performance

Without spending too much time on optimisation, it can handle a good 500,000 rows in a couple of seconds. Given these are slow-moving files being validated as part of a larger integration journey, that's acceptable performance. I'm sure it can be optimised further.

The Quirks

All companies have Technical Debt and ghosts from past decisions that haunt them. Recently for me that's been multiple different CSV formats within the same file. Eurgh. However, with this tool we can handle these scenarios.

There would need to be a JSON Schema file for each row format, plus a way to identify which type of row is being processed. Then, instead of validating a whole file against one schema, we validate one row at a time against the schema that matches it. Doing it this way means we have to handle error messaging slightly differently, but with a little pre-processing of the data it becomes a trivial hurdle.
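
One way this could look, sketched in Python. How a row's type is identified depends entirely on the file; here I assume, purely for illustration, that the first column holds a record-type marker such as "H" or "D". The marker values, schema contents and function names are all hypothetical.

```python
def pick_schema(row: list, schemas_by_type: dict) -> dict:
    """Choose a schema using the first column as a record-type marker (an assumption)."""
    record_type = row[0] if row else ""
    if record_type not in schemas_by_type:
        raise KeyError(f"no schema registered for record type '{record_type}'")
    return schemas_by_type[record_type]


def pair_rows_with_schemas(rows: list, schemas_by_type: dict) -> list:
    """Return (row_number, row, schema) tuples ready to be validated one at a time."""
    return [
        (number, row, pick_schema(row, schemas_by_type))
        for number, row in enumerate(rows, start=1)
    ]


schemas_by_type = {
    "H": {"title": "Header", "properties": {}},
    "D": {"title": "Detail", "properties": {}},
}
rows = [["H", "2024-01-01"], ["D", "John", "5"], ["D", "David", "22"]]
for number, row, schema in pair_rows_with_schemas(rows, schemas_by_type):
    print(number, schema["title"])
```

The row number travels with each pairing, so error messages can still point at the original line even though different rows are checked against different schemas.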

Conclusion

Hopefully this has been useful and you've been able to solve a problem you've got! My approach is still in its infancy but is already being tested in a Production environment.

For those of you interested in the Source Code, it's available here. Feel free to raise a PR if you can see improvements. A reminder: I'm a hobby engineer these days, so there's definitely room for improvement!