Bulk data can always contain errors, whether you're parsing CSVs or Excel workbooks row by row or COPYing directly from a file or program. There isn't much you can do about it other than perform what validation you can up front and load inside a transaction so you can roll back if a problem slips through. If you're receiving the source data from another party, supplying them with a standalone validator could at least cut down on the number of aborted ETLs.
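To make that concrete, here's a minimal sketch of the "validate up front, then load in a transaction" pattern. The feed, the column names (`id`, `email`), and the checks are all hypothetical; in practice the same `validate` function could be shipped to the other party as the standalone validator.

```python
import csv
import io
import sqlite3

def validate(rows):
    """Collect every row-level problem up front so the file can be rejected whole."""
    errors = []
    for lineno, row in enumerate(rows, start=2):  # line 1 is the header
        if not row["id"].isdigit():
            errors.append(f"line {lineno}: id {row['id']!r} is not an integer")
        if "@" not in row["email"]:
            errors.append(f"line {lineno}: email {row['email']!r} looks malformed")
    return errors

# Hypothetical feed with one bad row.
feed = "id,email\n1,a@example.com\nx,not-an-email\n"
rows = list(csv.DictReader(io.StringIO(feed)))

errors = validate(rows)
if errors:
    print("rejected:", *errors, sep="\n  ")
else:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    try:
        with conn:  # transaction: commits on success, rolls back on any exception
            conn.executemany("INSERT INTO users VALUES (:id, :email)", rows)
    except sqlite3.Error:
        pass  # nothing was committed; fix the source file and retry
```

The point of the two-phase shape is that a malformed row never leaves a half-loaded table: either validation refuses the whole file with a list of line numbers the supplier can act on, or the transaction guarantees all-or-nothing.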
You're completely right: bulk data will always be troublesome, especially when it comes from another party.
Users don't always know that. I believe it's our job to come up with alternative solutions that solve the problem and reduce headaches for both parties.
A long time ago I worked for a software company maintaining a web app that had an "update by spreadsheet" feature. I can't tell you how many times users came asking for help because the software rejected their file, and usually when we opened it the data was completely wrong. That problem could've been solved with a standalone validator like you said, but we weren't allowed to develop new features for that project, so we had to just keep fixing Excel files over and over 😥