Load a JSON file to Google BigQuery using Python

Jordi Escudé Gòdia 🎗️ ・1 min read

I'm starting to learn Python to update a data pipeline and had to upload some JSON files to Google BigQuery. Hope this helps people in need!

See the GCP documentation (it shows a CSV example).

Steps before running the script:

  • Create a Google service account with BigQuery permissions.
  • Download the JSON key. Do not commit it into git! Add it to .gitignore if needed.
  • Reference the key file in your .env file. It will be loaded via the python-dotenv library. Again, do not commit .env into git!

Example .env file:

GOOGLE_APPLICATION_CREDENTIALS=your-gcp-project-name-aaa333111aaa.json
  • Create the dataset via GCP Console, in the BigQuery section.
  • Run the script!
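Before running the script, it can help to sanity-check that the key path from your .env actually resolves to a file on disk. A minimal stdlib sketch (the `find_credentials` helper is illustrative, not part of the script below; it assumes `load_dotenv()` has already put the variable into the environment):

```python
import os

def find_credentials():
    """Return the service-account key path the BigQuery client will use,
    or None if GOOGLE_APPLICATION_CREDENTIALS is unset or points to a
    non-existent file."""
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and os.path.isfile(key_path):
        return key_path
    return None
```

If this returns None, `bigquery.Client()` will fail to authenticate, so it is worth checking early.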

upload_json_to_bq.py script:

from google.cloud import bigquery
from dotenv import load_dotenv

load_dotenv()  # makes GOOGLE_APPLICATION_CREDENTIALS from .env visible to the client

client = bigquery.Client()
filename = '/path/to/file/in/nd-format.json'
dataset_id = 'DatasetName'
table_id = 'TableName'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True  # infer the table schema from the data

with open(filename, "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="europe-west1",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for the table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

Note: the input file must be newline-delimited JSON (NDJSON): one JSON object per line, not a JSON array.
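If your data is a regular JSON array rather than NDJSON, it can be converted first. A minimal stdlib sketch (the function name and file paths are illustrative):

```python
import json

def to_ndjson(records, path):
    """Write a list of dicts as newline-delimited JSON, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

# Example usage: convert a JSON-array file into the format BigQuery expects.
# with open("data.json") as f:
#     to_ndjson(json.load(f), "nd-format.json")
```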
