Discussion on: DO NOT trust your frontend validators


Don't do your validation in the API/middleware either! To be truly robust, all constraints should be built into the database, and APIs should call stored procedures for writes and stored procedures or views for reads. Whatever you do, don't use the Repository pattern or ORMs for connecting to the database. At the end of the day the database is your one source of truth.
You should, however, still carry out validation and give feedback at the front end and API levels, as sending data you know to be bad across the network is expensive. Ultimately, though, when using a REST protocol you cannot know whether your data interaction is still valid until you try to hit the data store.
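To make the "constraints in the database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns and the insert_customer helper are assumptions made up for illustration, and the email pattern is deliberately crude. The point is only that the database itself refuses bad rows, whoever sends them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the constraints live in the schema, not in the caller.
conn.execute(
    """
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (email LIKE '%_@_%'),
        age   INTEGER CHECK (age >= 0)
    )
    """
)


def insert_customer(email, age):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (email, age) VALUES (?, ?)", (email, age)
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL and CHECK violations alike.
        return False
```

Any client, API or ad hoc script that writes to this table hits the same rules, which is the argument for keeping them this close to the data.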

donny roufs • Aug 20 '22

At the end of the day the database is your one source of truth.

Uh no. That's far from true. Enterprise-scale applications are usually built around the domain, meaning that the database is an implementation detail. Small apps that are meant to be a POC or MVP might have their database as the one source of truth, but beyond that, hell no.

András Tóth • Aug 31 '22 • Edited

Sorry to chime in so late, but there is a viewpoint not expressed here, and its absence is leading to the wrong conversations.
Whose responsibility is it to keep data consistent and free of garbage (i.e. invalid states)?
Whether you say frontend, backend or database, I have bad news for all of them.

If you think of the layers as an onion then...

The core would be the database (or distributed database, or databases if many services roll their own).
Then you can have n number of services that operate upon that. This means if you choose not to validate data written into your database now you have to do it n times!
Then you can have k number of frontends, plus other services and hackers with scripts that send in data. Obviously, as the article is also saying, at this point validation on the frontend is just UX ensuring a smooth user experience - i.e. when you input invalid data you do not need to wait for submitting the form to see the phone number format is wrong.

And then if you choose...

The database layer to ensure data validity/consistency, you are going to face horrible source control options, an inability to unit test solutions, hiring issues (sadly, ORMs make hiring easier at the expense of good DB code), etc. Writing code for the DB is not a rich coding experience.
The service layer, then you need to figure out how to distribute your validator among many codebases, sometimes in many languages! Good luck maintaining a validation library in Java, JS, Go, Rust etc. at the same time.

Everything sucks, just retire early and open a bar. 😐

Thomas Hansen AINIRO.IO • Aug 31 '22

Everything sucks, just retire early and open a bar

Respectfully, if you look at your web API as a microservice, you can ensure all clients use the same microservice to interact with the database. Creating such "bottlenecks" is often very valuable, since it gives you arguably the equivalent of a "single source of truth" for the code that is able to modify data, and leads to the same nice place that a "single source of truth" leads to for data normalisation and similar constructs ...

András Tóth • Aug 31 '22

The point is, you are better off if you think about the bottlenecks than if you are not.

As a side note, it horrifies me whenever I read "80-90% of backend applications are simple CRUD applications, therefore they can be autogenerated from a document". If your application is a simple CRUD you either don't have data consistency or an actual useful, sellable product.

Thomas Hansen AINIRO.IO • Aug 31 '22 • Edited

If your application is a simple CRUD you either don't have data consistency or an actual useful, sellable product

Define CRUD. Our "CRUD" generator allows you to apply:

reCAPTCHA values for individual verbs towards individual tables
Authorisation requirements for individual verbs towards individual tables
Row level security implying for instance users cannot see individual records that aren't their own "property"
Decide which rows are included in which CRUD verb endpoint
Automagically takes care of foreign keys, adding auto complete widgets in the frontend when you've got a foreign key, doing lookups into the referenced table
Publishing socket messages upon write invocations towards data
Implement caching
Log invocations
Add validators server side for individual fields
Etc, etc, etc ...

I'd say that covers about 80% to 90% of the stuff me and you typically do, assuming your background is enterprise software development ... ;)

... unless of course you're one of these guys always looking for an opportunity to make stuff more complex ... :/

an actual useful, sellable product

Psst, Microsoft Office Access ...?

Last time I checked it was selling pretty decent ...? ;)

Ricardo • Aug 20 '22

What? I don’t mean to be rude. But I disagree.
Owner of business logic is the application (C#, Java, Jose, whatever). Database is only the repository.
Validation belongs to business logic.

Aaron Reese • Sep 1 '22

As Thomas pointed out, validation belongs in the bottleneck. So if ALL your data manipulation goes through ONE set of APIs you are safe to put the logic there. I proposed the database because it is (almost) always the ultimate bottleneck.
If your app offers different data storage solutions then the case for DB logic is diminished.
A number of references have been made to source code control of the database. For MSSQL there is a superb range of products from RedGate for SCC, migration, data compare, lineage, unit testing and a few other critical tools.

Thomas Hansen AINIRO.IO • Aug 21 '22

If you use your database only as the repository, you're missing out on a lot of features and safeguards. Of course, for a NoSQL guy what you're saying makes sense, simply because you've got no features allowing you to even validate your objects in your database. However, for the rest of us, the database and its schema can help us with a lot of things, creating guarantees that prevent garbage data from entering our "repository".

Thomas Hansen AINIRO.IO • Aug 19 '22

Don't do your validation in the API/middleware either! To be truly robust all constraints should be built into the database and APIs will call stored procedures

I love the idea of moving validation logic as close to my data storage as possible. However, I also don't like putting business logic directly into my database. Yes, it's an oxymoron, I know :D

But this is a matter of taste I guess. I see your point here, especially when you've got multiple APIs accessing the same database - However, I suspect it's difficult to prevent users from using raw insert and update statements anyways, which of course would bypass the stored procedure inserts and updates ...

However, I think this is a matter of taste tbh with you, and you're definitely "closer" to my personal opinion than the guys simply adding frontend RegEx validators to the mix ... ;)

Jack • Aug 19 '22

This is kinda my take too. I know the most robust way is to build validators and constraints directly into the database. But in reality, you should only need to validate data at its contact point.

Once I've validated the request payload, I (as the developer) should know that my data is "safe" and the only person who can screw it up is me 😅

danjelo • Aug 19 '22

I usually use constraints for at least PK/FK keys. I have gotten into a serious mess a few times when there were none and data migrations and faulty logic put wrong ids in as keys :)
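For anyone who has not seen a foreign key save them from exactly that mess, here is a small sketch with Python's sqlite3 module. The customer and customer_order tables and the add_order helper are hypothetical names chosen for the example. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    INSERT INTO customer (id, name) VALUES (1, 'Alice');
    """
)


def add_order(customer_id):
    """Return True if the order was stored, False if the key points at nothing."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customer_order (customer_id) VALUES (?)",
                (customer_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # A migration script or buggy service passing a wrong id ends up here.
        return False
```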

As a side note, some ORMs such as EF Core have some nice code-first functionality where validation in models is reflected in the DB as constraints.

Thomas Hansen AINIRO.IO • Aug 19 '22

As a side note, some ORMs such as EF Core have some nice code-first functionality where validation in models is reflected in the DB as constraints

The problem I've got with EF is the disparity between the RDBMS and its "OOP circus". For instance, it's very tempting to just do myObject.Save(). This model of using a database increases bandwidth consumption (passing in whole object during updates for instance), it increases chatter towards DB, and it makes it harder to synchronise access, resulting in the need for "locking records" either logically, or physically somehow ...
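One common alternative to saving the whole object and locking records is a targeted update guarded by a version column (optimistic concurrency). The sketch below is not EF code. It uses Python's sqlite3 module, and the product table, the version column and the update_price helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        price   INTEGER NOT NULL,
        version INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO product (id, name, price) VALUES (1, 'Widget', 100);
    """
)


def update_price(product_id, new_price, expected_version):
    """Send only the changed column; succeed only if nobody else updated the row first."""
    with conn:
        cur = conn.execute(
            "UPDATE product SET price = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_price, product_id, expected_version),
        )
    # Zero rows touched means the caller's copy of the row was stale.
    return cur.rowcount == 1
```

A caller that read version 1 and loses the race gets False back and can reload the row, with no record held locked in the meantime.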

danjelo • Aug 19 '22

Yes, agree. Have to say I am not really a fan of ORMs in general, for the O/R impedance mismatch for one thing, and for their tendency to generate hellish SQL :) Recently troubleshot a slow EF Core query. Could not find the issue, likely some sort of "parameter sniffing" issue where the query plan was not used.

Aaron Reese • Aug 19 '22

@jack:

But in reality, you should only need to validate data at its contact point.

Getting a bit OT here, but I absolutely disagree. You are about to 'POST' a customer order. How do you know that, between the time the customer started the order on the app/website and the time they submitted it, the finance team has not put the customer account on hold for non-payment? This can only be checked on the back end. On a really busy system (e.g. Amazon on Black Friday) this order request may even go into a message queue and may not get processed for several minutes. By the time it gets loaded into the system, the stock may be gone or the account may be suspended.
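As a rough sketch of what "check it at the moment of writing" can look like, the snippet below folds the account check into the INSERT itself, so there is no gap between checking and writing. It uses Python's sqlite3 module. The account and customer_order tables, the on_hold flag and the place_order helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        on_hold INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE customer_order (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        item       TEXT NOT NULL
    );
    INSERT INTO account (id, on_hold) VALUES (1, 0);
    """
)


def place_order(account_id, item):
    """Insert the order only if the account is not on hold at write time."""
    with conn:
        cur = conn.execute(
            "INSERT INTO customer_order (account_id, item) "
            "SELECT a.id, ? FROM account AS a "
            "WHERE a.id = ? AND a.on_hold = 0",
            (item, account_id),
        )
    # No row inserted means the account is missing or was put on hold.
    return cur.rowcount == 1
```

However long the request sat in the client or in a queue, the decision is made against the state of the account when the row is actually written.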

Thomas Hansen AINIRO.IO • Aug 19 '22

These are problems 90 percent of developers never face …

Jack • Aug 19 '22

You've quoted me but without the italics which totally changes the tone of my statement 😆

I don't work at Amazon or anything close to that kind of scale, and the chance of something going wrong between the contact point and the database is virtually (virtually) 0.

Thomas Hansen AINIRO.IO • Aug 19 '22

Hehe 😅

You wish I was sorry. Sorry, but I’m not 🤪😉

Aaron Reese • Aug 19 '22

I also don't like putting business logic directly into my database

Why not? I can think of a few reasons but I would love to hear yours. To a certain extent I was being controversial with my original reply. Perhaps there should be a distinction between 'business logic' and 'data integrity'. Entering a telephone number and postal address in different countries doesn't break data integrity but it could be against business rules.
Ultimately someone/something has to be responsible for the validity of the data. If the data store is the one constant, it makes sense for that to be the data store. You mentioned that [backend] users could do direct INSERT statements: well, put the logic in a trigger.
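Here is a small sketch of the trigger idea, using the phone/address example above as the business rule. It runs on Python's sqlite3 module. The contact table, the contact_same_country trigger and the raw_insert helper are hypothetical, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (
        id              INTEGER PRIMARY KEY,
        phone_country   TEXT NOT NULL,
        address_country TEXT NOT NULL
    );

    -- The rule fires for every INSERT, including hand-written ones.
    CREATE TRIGGER contact_same_country
    BEFORE INSERT ON contact
    FOR EACH ROW
    WHEN NEW.phone_country <> NEW.address_country
    BEGIN
        SELECT RAISE(ABORT, 'phone and address must be in the same country');
    END;
    """
)


def raw_insert(phone_country, address_country):
    """Simulate a direct INSERT; return None on success or the database's error text."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO contact (phone_country, address_country) VALUES (?, ?)",
                (phone_country, address_country),
            )
        return None
    except sqlite3.DatabaseError as exc:
        return str(exc)
```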
In case you can't guess, I am a database guy. When the FE or API developers screw up the logic, guess who has to sort out the mess :)

Thomas Hansen AINIRO.IO • Aug 19 '22

Why not? I can think of a few reasons but I would love to hear yours

First of all, I find it incredibly hard to write validation logic in SQL. For instance, how do you validate an email address in a stored procedure? I'm sure it can be done, I'm just not entirely sure if I want to see the code ... ;)

Perhaps there should be a distinction between 'business logic' and 'data integrity'

100% agree! Everything you can make the database take care of, you should make the database take care of, such as referential integrity, NOT NULL versus nullable columns, field length, etc. However, in my video I illustrate a case where the validator semantically communicates back to the client that a field is not long enough. Validating things such as these in your stored procedure would be hard, and would probably result in an exception that can't be returned to the user because of security issues. Not to mention that the database is typically deployed on a different machine, possibly a different network, than the backend API, so validating there costs one additional network request; it's faster to validate in the API backend.
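A minimal sketch of that kind of API-side validator, written in Python. The field names, the minimum length of 5 and the validate_signup function are assumptions for illustration, not the code from the video, and the email check is intentionally simplistic.

```python
MIN_USERNAME_LENGTH = 5  # hypothetical rule


def validate_signup(payload):
    """Return a dict mapping field names to human-readable error messages."""
    errors = {}

    username = payload.get("username") or ""
    if len(username) < MIN_USERNAME_LENGTH:
        errors["username"] = (
            f"must be at least {MIN_USERNAME_LENGTH} characters long"
        )

    # Crude shape check only: exactly one "@" with text on both sides.
    email = payload.get("email") or ""
    local, sep, domain = email.partition("@")
    if not (sep and local and domain and "@" not in domain):
        errors["email"] = "must look like name@example.com"

    return errors
```

An empty dict means the payload may continue to the database, where the schema constraints still apply. A non-empty one can be returned to the client as it is, without leaking any database error text.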

In case you can't guess, I am a database guy

Ahh, makes sense :)

By all means, apply as much data validation as you can in the database, I guess I just have a somewhat similar opinion to database validation as I do with frontend validation; "It's cool, nice to have, but don't exclusively do it" ... ;)

(For different reasons though)

guess who has to sort out the mess

I feel your pain ... :/

András Tóth • Aug 31 '22

However, I also don't like putting business logic directly into my database.
And why is that? Because the database is a really clunky coding experience.

I came to the conclusion that it is time for reimagining SQL. The language and connection must be modernized:

gain the ability to easily integrate with source control tools like git
modern programming language features: move away from thinking "it's a language to query the database" to have packages, code modules, unit testing/mocking capabilities

If this sounds ridiculous, how does it sound to do n non-transactional round trips to the DB just because the team can only use ORMs and doesn't know how to write the one action as one database transaction?
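To show the difference, here is a sketch of "one action as one database transaction" using Python's sqlite3 module. The account table, the non-negative balance rule and the transfer helper are hypothetical. Either both updates are stored or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account (id, balance) VALUES (1, 100), (2, 0);
    """
)


def transfer(src, dst, amount):
    """Move money between accounts atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit is undone too.
        return False


def balances():
    """Return the current balances as {account_id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Run as two separate, non-transactional calls, a failed debit would leave the credit in place, which is exactly the kind of garbage state this thread is about.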

Thomas Hansen AINIRO.IO • Aug 31 '22

how does it sound to do n non-transactional rounds to the DB just because the team can only use ORMs

I've already covered ORMs ... ;)