DEV Community

Seenevasaraj

BIGQUERY DATA INSERTION USING PYTHON

Hey Reader,

My name is Seenevasaraj, and I work as a Software Developer at Luxoft India. Across the various projects at Luxoft, I implement automation in every viable way. In this article I want to explain how to insert records into a BigQuery table using google-cloud-bigquery (a Python module).

What is BigQuery
BigQuery is a fully managed enterprise data warehouse developed by Google. It offers powerful features for managing and analyzing large datasets. Here are some key points about BigQuery:

Serverless Architecture: BigQuery operates in a serverless environment, allowing you to execute standard SQL queries and analyze massive amounts of data without worrying about infrastructure management.
Scalability: It can handle petabytes of data, making it suitable for organizations dealing with large-scale datasets.
SQL Queries: BigQuery supports querying using a dialect of SQL, which makes it accessible to users familiar with SQL syntax.
Built-in Machine Learning: BigQuery includes built-in machine learning capabilities, enabling advanced analytics and predictive modeling.
Cost-Effective: It’s a cost-effective solution, as you only pay for the resources you use.
Business Insights: By turning big data into valuable business insights, BigQuery helps organizations make informed decisions.

What is google-cloud-bigquery
The google-cloud-bigquery module in Python provides a client library for interacting with Google BigQuery, a powerful data warehouse and analytics platform. Let me break down the key points:

Purpose:

  • Google BigQuery is designed for querying massive datasets quickly using SQL.
  • It allows you to analyze data efficiently without worrying about infrastructure management.
  • You can focus on finding meaningful insights from your data.

Features:

  • NoOps: No need for database administrators; Google manages the infrastructure.
  • SQL Queries: You can use familiar SQL syntax to query data.
  • Pay-as-you-go Model: You pay only for the resources you use.

Installation:

To use this library, follow these steps:
  • Select or create a Cloud Platform project.
  • Enable billing for your project.
  • Enable the Google Cloud BigQuery API.
  • Set up authentication.

Supported Python Versions:

  • Python >= 3.7 is supported.
  • Older versions (Python 2.7, 3.5, 3.6) are not supported.

Example Usage

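The basic usage pattern can be sketched as follows. This is an illustrative example, not the article's original code: the query runs against a Google public dataset, and the `client` passed in is assumed to be an already-authenticated `bigquery.Client`.

```python
# Illustrative sketch of basic google-cloud-bigquery usage.
# The public dataset and column names are examples, not from the article.

QUERY = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

def run_example(client):
    # client.query() submits the SQL job; .result() blocks until the job
    # finishes and returns an iterable of Row objects.
    return [(row.name, row.total) for row in client.query(QUERY).result()]
```

With an authenticated `bigquery.Client` instance, `run_example(client)` returns the five most common names in that dataset as `(name, total)` tuples.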

BigQuery-Python authentication setup

  • Create a Google service account to authenticate the Python client with BigQuery

Steps to create a service account
When you create a service account, you must provide an alphanumeric ID (SA_NAME in the samples below), such as my-service-account. The ID must be between 6 and 30 characters and can contain lowercase alphanumeric characters and dashes. After you create a service account, you cannot change its name.

The service account's name appears in the email address that is provisioned during creation, in the format SA_NAME@PROJECT_ID.iam.gserviceaccount.com.

Each service account also has a permanent, unique numeric ID, which is generated automatically.

You also provide the following information when you create a service account:

SA_DESCRIPTION is an optional description for the service account.
SA_DISPLAY_NAME is a friendly display name for the service account.
PROJECT_ID is the ID of your Google Cloud project.

  • Pass the service account JSON file as an argument to the BigQuery Client class
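That authentication step can be sketched as below; the key-file path is a placeholder for wherever you saved the downloaded service-account key.

```python
def make_client(key_path="service-account.json"):
    # Imported inside the function so the sketch can be inspected even
    # without google-cloud-bigquery installed; normally this import sits
    # at the top of the module.
    from google.cloud import bigquery

    # from_service_account_json() reads the downloaded key file and
    # returns a Client authenticated as that service account.
    return bigquery.Client.from_service_account_json(key_path)
```

Alternatively, you can set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the key-file path and call `bigquery.Client()` with no arguments.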

Inserting new records in list format
The first way you can upload data is row by row. Here, a list of tuples appends two new rows to the table ‘test_table_creation’ using the method .insert_rows()

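A sketch of this row-wise insert, assuming test_table_creation has two columns such as name and age (the project and dataset IDs, column names, and values are placeholders):

```python
# Two new rows as a list of tuples; tuple order must match the table's
# column order (here assumed to be name, age).
NEW_ROWS = [
    ("Alice", 30),
    ("Bob", 25),
]

def insert_list_rows(client, table_id="my-project.my_dataset.test_table_creation"):
    # get_table() fetches the table along with its schema, which
    # insert_rows() needs in order to map the tuples onto columns.
    table = client.get_table(table_id)
    # insert_rows() streams the rows and returns a list of per-row
    # errors; an empty list means every row was inserted successfully.
    return client.insert_rows(table, NEW_ROWS)
```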

Insert into a Nested Table
As above, just as nested data needs special handling when reading from a nested table, it also needs special handling when inserting data. To write records into a nested table, you can define a dictionary within the list that contains the nested data. The key of each entry then corresponds to the column name into which the data is to be inserted.

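A sketch of such a nested insert, assuming a table with a plain name column and a repeated RECORD column addresses (all table, column, and value names here are placeholders):

```python
# Each row is a dict keyed by column name; the nested column holds a
# list of dicts, one dict per repeated record.
NESTED_ROWS = [
    {
        "name": "Alice",
        "addresses": [
            {"city": "Chennai", "zip_code": "600001"},
            {"city": "Bengaluru", "zip_code": "560001"},
        ],
    },
]

def insert_nested_rows(client, table_id="my-project.my_dataset.nested_table"):
    # The fetched schema describes the nested fields, so insert_rows()
    # can serialize the dicts correctly; it returns [] on success.
    table = client.get_table(table_id)
    return client.insert_rows(table, NESTED_ROWS)
```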

Upload csv into bigquery table
Now that you know how to insert individual records into BigQuery with Python, let's try the whole thing with a CSV. With the pandas and csv libraries we can first load the data from a CSV file.

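One way to sketch this, assuming the CSV has a header row. The file path and table ID are placeholders, and `load_table_from_dataframe()` additionally requires the pyarrow package:

```python
import csv
import io

def rows_from_csv(csv_text):
    # Parse CSV text into a list of dicts keyed by the header row --
    # the same shape insert_rows() accepts for dict-style rows.
    return list(csv.DictReader(io.StringIO(csv_text)))

def upload_csv(client, csv_path, table_id="my-project.my_dataset.csv_table"):
    # pandas is imported lazily so the pure-CSV helper above stays
    # usable without it installed.
    import pandas as pd

    df = pd.read_csv(csv_path)
    # load_table_from_dataframe() starts a load job on the server;
    # .result() waits for the job to finish.
    return client.load_table_from_dataframe(df, table_id).result()
```

For large files, a load job like this is generally preferable to streaming each row with insert_rows(), since it is a single bulk operation.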

Summary
Google BigQuery and Python together make a very powerful toolset. You should now know how to insert records into BigQuery using Python. This can be useful for Data Analysts and Data Scientists who want to save their work in BigQuery and eventually visualize it, but also for Data Engineers who can build data pipelines with Python. Nested data is a special case, but since BigQuery handles it very well and often performs better with it, this topic should be familiar to you.
