Focus On the Structure
CSV, Excel, and JSONL need more validation love
Tabular data can be handled with as much rigor, quality, and efficiency as document-form containment models (e.g. JSON Schema, XSD) or table-based data (e.g. SQL). And it should be!
As one example, CsvPath Validation Language can accommodate multiple entities living in a single tabular document using a sophisticated schema syntax as capable as DDL. In fact, it can do this in at least four different ways, each of them common enough in the wild:
- Mixed parent-child relationships by line adjacency and order
- Multiple entities' data lines grouped one entity after another
- Entities side-by-side, line-by-line
- Entities in sub-table-like clusters, organized visually and floating in a tabular landscape (e.g. Excel files with ancillary tables or boxes for sub-calculations, dimensions, etc.)
The image above outlines each of these. Can you think of more ways to position multiple entities in a tabular CSV, Excel, or JSONL data file?
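To make the first layout in the list concrete, here is a minimal sketch in plain Python. It is not CsvPath syntax. It assumes a hypothetical file in which the first column tags each line as a parent (`H`, an order header) or a child (`L`, a line item), and each child belongs to the nearest parent above it. The tags, field counts, and the `group_by_adjacency` name are all illustrative assumptions.

```python
import csv
import io

# Hypothetical sample: "H" rows are order headers, "L" rows are line items
# that belong to the nearest preceding "H" row.
SAMPLE = """H,1001,2024-01-05
L,widget,2
L,gadget,1
H,1002,2024-01-06
L,sprocket,5
"""


def group_by_adjacency(text):
    """Attach each child line to the parent line that most recently preceded it.

    Returns (orders, errors): the grouped entities and a list of
    human-readable validation messages.
    """
    orders, errors = [], []
    current = None  # the parent that following child lines attach to
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        tag, fields = row[0], row[1:]
        if tag == "H":
            if len(fields) != 2:
                errors.append(f"line {line_no}: header needs 2 fields, got {len(fields)}")
            current = {"header": fields, "items": []}
            orders.append(current)
        elif tag == "L":
            if len(fields) != 2:
                errors.append(f"line {line_no}: line item needs 2 fields, got {len(fields)}")
            if current is None:
                errors.append(f"line {line_no}: line item has no preceding header")
            else:
                current["items"].append(fields)
        else:
            errors.append(f"line {line_no}: unknown record type {tag!r}")
    return orders, errors


orders, errors = group_by_adjacency(SAMPLE)
```

The point of the sketch is that each entity gets its own rules (here, just a field count) while line order carries the relationship between them. A real schema language would express the same constraints declaratively rather than in a loop.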
Have a look at this article on CSV validation schemas to see the syntax and some approaches used in the validation part of CsvPath Framework's data preboarding architecture.
