DEV Community

Chris

How to handle a complex JSON schema

I am working in Databricks. I have a mounted external directory, an S3 bucket, with multiple subdirectories containing call log files in JSON format. The files are irregular and complex. When I try to use spark.read.json or spark.sql (SELECT *), I get the UNABLE_TO_INFER_SCHEMA error. The files are too complex to build a schema for manually, and there are thousands of them. What is the best approach for creating a DataFrame from this data?
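
Before choosing a read strategy, it may help to survey what the files actually look like. Below is a minimal sketch in plain Python (no Spark) that walks a directory tree and sorts each .json file into one of four buckets: empty, one JSON document per line, a single document spread over several lines, or unparseable. The function names `classify_json_file` and `survey` and the mount path in the comment are hypothetical, and it assumes the mount is reachable as a local path and the files are small enough to read whole.

```python
import json
import os
from collections import Counter


def classify_json_file(path):
    """Return 'empty', 'json_lines', 'multiline', or 'invalid' for one file."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    if not text.strip():
        return "empty"

    # Case 1: every non-blank line parses on its own (JSON Lines layout).
    lines = [ln for ln in text.splitlines() if ln.strip()]
    try:
        for ln in lines:
            json.loads(ln)
        return "json_lines"
    except json.JSONDecodeError:
        pass

    # Case 2: the file as a whole is one (pretty-printed) JSON document.
    try:
        json.loads(text)
        return "multiline"
    except json.JSONDecodeError:
        return "invalid"


def survey(root):
    """Count file layouts for every .json file under root, recursively."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                counts[classify_json_file(os.path.join(dirpath, name))] += 1
    return counts


# Hypothetical usage; the mount path is an assumption, not from the post:
# print(survey("/dbfs/mnt/call-logs"))
```

If most files come back as multiline, that would point toward Spark's multiLine read option, since spark.read.json expects one JSON object per line by default. If the files sit in nested subdirectories, the recursiveFileLookup option or a glob path may be needed for Spark to find them at all. Empty or invalid files are worth filtering out before schema inference. This only narrows down the cause and is not a confirmed fix for this dataset.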
