DEV Community

Mike Houngbadji

Apache Spark SQL / Hive: Create External Table based on File in HDFS

To create a table based on a file located in HDFS, we'll proceed as follows:

  • Upload the file/folder to HDFS:
hadoop fs -put /local/source/location /hdfs/destination/location
  • Create the table using the below SQL:
CREATE TABLE sample_table(
        key STRING,
        data STRING)
USING CSV  -- This is based on the format of your source files
OPTIONS ('delimiter'=',',  -- Only needed for delimited files
        'path'='hdfs:///hdfs/destination/location')
  • We can now query our table:
SELECT *
FROM sample_table

References:
SparkSQL Documentation - Create Table

PS:
I wrote this to also help myself retrieve the solution faster.
