DEV Community

bronifty

Spark Is a DIY Athena - Or SQL Over Python Dataframes

I am new to Spark, but so far I would say: it is the set of code libraries for a DIY Athena, which lets you write SQL against a file system (e.g. S3) as if it were a database.

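Some rather dense documentation describes a DataFrame API, usable from Python among other languages. A DataFrame is a distributed table of rows with named columns (closer to a database table than to a matrix), and it is the structure through which SQL reaches these files. So SQL over DataFrames, essentially.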
