<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhinav Singh</title>
    <description>The latest articles on DEV Community by Abhinav Singh (@abhinav_singh_04da9e27ca7).</description>
    <link>https://dev.to/abhinav_singh_04da9e27ca7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2914988%2F0768a280-759f-4fe5-ba0e-49d3cd27be8f.jpg</url>
      <title>DEV Community: Abhinav Singh</title>
      <link>https://dev.to/abhinav_singh_04da9e27ca7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhinav_singh_04da9e27ca7"/>
    <language>en</language>
    <item>
      <title>Data Synchronization from Google BigQuery to ClickHouse in an AWS Air-Gapped Environment</title>
      <dc:creator>Abhinav Singh</dc:creator>
      <pubDate>Thu, 06 Mar 2025 04:03:32 +0000</pubDate>
      <link>https://dev.to/abhinav_singh_04da9e27ca7/data-synchronization-from-google-bigquery-to-clickhouse-in-an-aws-air-gapped-environment-4iki</link>
      <guid>https://dev.to/abhinav_singh_04da9e27ca7/data-synchronization-from-google-bigquery-to-clickhouse-in-an-aws-air-gapped-environment-4iki</guid>
      <description>&lt;p&gt;&lt;strong&gt;Understanding the Key Components&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Airgap Environment&lt;/strong&gt;&lt;br&gt;
An airgapped environment enforces strict outbound policies, preventing external network communication. This setup enhances security but presents challenges for cross-cloud data synchronization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proxy Server&lt;/strong&gt;&lt;br&gt;
A proxy server is a lightweight, high-performance intermediary facilitating outbound requests from workloads in restricted environments. It acts as a bridge, enabling controlled external communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;&lt;br&gt;
ClickHouse is an open-source, column-oriented OLAP (Online Analytical Processing) database known for its high-performance analytics capabilities.&lt;/p&gt;

&lt;p&gt;This article explores how to seamlessly sync data from BigQuery, Google Cloud’s managed analytics database, to ClickHouse running in an AWS-hosted airgapped Kubernetes cluster using proxy-based networking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;br&gt;
Deploying ClickHouse in airgapped environments presents challenges in syncing data across isolated cloud infrastructures such as GCP, Azure, or AWS.&lt;/p&gt;

&lt;p&gt;In our setup, ClickHouse is deployed via Helm charts in an AWS Kubernetes cluster, with strict outbound restrictions. The goal is to sync data from a BigQuery table (GCP) to ClickHouse (AWS K8S), adhering to airgap constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Restricted Outbound Network:&lt;/strong&gt; The ClickHouse cluster cannot directly access Google Cloud services due to airgap policies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Transfer Between Isolated Clouds:&lt;/strong&gt; There is no straightforward mechanism for syncing data from GCP to ClickHouse in AWS without external connectivity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;br&gt;
The solution leverages a corporate proxy server to facilitate communication. By injecting a custom proxy configuration into ClickHouse, we enable HTTP/HTTPS traffic routing through the proxy, allowing controlled outbound access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F121jpyu3bek920qdtkud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F121jpyu3bek920qdtkud.png" alt="Image description" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;br&gt;
BigQuery to GCS Export: Data is first exported from BigQuery to a GCS bucket.&lt;br&gt;
ClickHouse GCS Integration: ClickHouse fetches the exported files from GCS using its &lt;code&gt;gcs&lt;/code&gt; table function.&lt;br&gt;
Proxy Routing: ClickHouse’s outbound requests are routed through a corporate proxy server.&lt;br&gt;
Data Ingestion in ClickHouse: The retrieved data is processed and stored within ClickHouse for analytics.&lt;/p&gt;
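
&lt;p&gt;The export and ingestion steps in this flow can be sketched as two SQL statements, built here in Python. This is a minimal sketch under stated assumptions, not the exact queries used in this setup: the project, dataset, table, bucket, and path names are hypothetical, Parquet is an assumed export format, and the &lt;code&gt;gcs&lt;/code&gt; call assumes HMAC credentials are passed as arguments. Check the argument order against the documentation for your ClickHouse version.&lt;/p&gt;

```python
# All table, bucket, and path names below are hypothetical placeholders.


def bigquery_export_sql(table, gcs_prefix):
    """Build a BigQuery EXPORT DATA statement that writes Parquet files to GCS."""
    return (
        "EXPORT DATA OPTIONS (\n"
        f"  uri = '{gcs_prefix}/*.parquet',\n"
        "  format = 'PARQUET',\n"
        "  overwrite = true\n"
        ") AS\n"
        f"SELECT * FROM `{table}`;"
    )


def clickhouse_load_sql(target_table, bucket, path, hmac_key, hmac_secret):
    """Build a ClickHouse INSERT that reads the exported files via the gcs table function."""
    # The gcs table function reads objects over HTTPS, so this request is the one
    # that has to leave the cluster through the proxy.
    url = f"https://storage.googleapis.com/{bucket}/{path}/*.parquet"
    return (
        f"INSERT INTO {target_table}\n"
        f"SELECT * FROM gcs('{url}', '{hmac_key}', '{hmac_secret}', 'Parquet');"
    )


if __name__ == "__main__":
    print(bigquery_export_sql("my-project.analytics.events", "gs://example-export-bucket/events"))
    print()
    print(clickhouse_load_sql("analytics.events", "example-export-bucket", "events",
                              "HMAC_KEY_PLACEHOLDER", "HMAC_SECRET_PLACEHOLDER"))
```

&lt;p&gt;The first statement runs in BigQuery and the second runs in ClickHouse. In practice the HMAC key and secret would come from a secret store rather than being written into the query text.&lt;/p&gt;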

&lt;p&gt;&lt;strong&gt;Implementation Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Proxy Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Created a proxy.xml file defining proxy details for outbound HTTP/HTTPS requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Used a Kubernetes ConfigMap (&lt;code&gt;clickhouse-proxy-config&lt;/code&gt;) to store this configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mounted the ConfigMap dynamically into the ClickHouse pod.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
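
&lt;p&gt;The steps above can be sketched in Python as follows. This is a minimal sketch, not the exact files used in this setup: the proxy address &lt;code&gt;proxy.internal.example:3128&lt;/code&gt; is a placeholder, and the layout of the &lt;code&gt;proxy&lt;/code&gt; section (one &lt;code&gt;uri&lt;/code&gt; per scheme) is assumed from the proxy settings documented for recent ClickHouse releases, so verify it against your server version. The script builds the &lt;code&gt;proxy.xml&lt;/code&gt; fragment and wraps it in a ConfigMap manifest printed as JSON, which kubectl accepts alongside YAML.&lt;/p&gt;

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical proxy endpoint; replace with the real corporate proxy address.
PROXY_URI = "http://proxy.internal.example:3128"


def build_proxy_xml(proxy_uri):
    """Return a config.d fragment that sends HTTP and HTTPS requests through one proxy."""
    root = ET.Element("clickhouse")
    proxy = ET.SubElement(root, "proxy")
    for scheme in ("http", "https"):
        # One section per scheme, each holding the proxy URI used for that scheme.
        section = ET.SubElement(proxy, scheme)
        ET.SubElement(section, "uri").text = proxy_uri
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


def build_configmap(name, proxy_xml):
    """Wrap the XML fragment in a Kubernetes ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # The key becomes the file name when the ConfigMap is mounted.
        "data": {"proxy.xml": proxy_xml},
    }


if __name__ == "__main__":
    manifest = build_configmap("clickhouse-proxy-config", build_proxy_xml(PROXY_URI))
    print(json.dumps(manifest, indent=2))
```

&lt;p&gt;Saved under a hypothetical name such as &lt;code&gt;build_configmap.py&lt;/code&gt;, the output can be piped to &lt;code&gt;kubectl apply -f -&lt;/code&gt;. Keeping the proxy address in a ConfigMap means it can change without rebuilding the ClickHouse image.&lt;/p&gt;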

&lt;p&gt;&lt;strong&gt;2. Kubernetes Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mounted proxy.xml in the ClickHouse pod at /etc/clickhouse-server/config.d/proxy.xml.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relaxed the security context for testing only: allowed privilege escalation and ran the pod as root to simplify file permissions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
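
&lt;p&gt;The mount described above can be sketched as a patch to a pod spec, written here in Python. This is an illustration under stated assumptions: the volume name &lt;code&gt;proxy-config&lt;/code&gt; and the container name &lt;code&gt;clickhouse&lt;/code&gt; are hypothetical, and how these fields map onto Helm values depends on the chart in use. The ConfigMap name and mount path are the ones given in this article.&lt;/p&gt;

```python
import copy

CONFIGMAP_NAME = "clickhouse-proxy-config"  # ConfigMap name from the article
MOUNT_PATH = "/etc/clickhouse-server/config.d/proxy.xml"  # mount path from the article
VOLUME_NAME = "proxy-config"  # hypothetical volume name


def proxy_volume():
    """Volume entry that exposes the ConfigMap to the pod."""
    return {"name": VOLUME_NAME, "configMap": {"name": CONFIGMAP_NAME}}


def proxy_volume_mount():
    """Mount entry that places only proxy.xml into config.d."""
    return {
        "name": VOLUME_NAME,
        "mountPath": MOUNT_PATH,
        # subPath mounts the single proxy.xml key, so other files in config.d stay visible.
        "subPath": "proxy.xml",
        "readOnly": True,
    }


def patch_pod_spec(pod_spec, container_name):
    """Return a copy of pod_spec with the proxy volume and mount added."""
    patched = copy.deepcopy(pod_spec)
    patched.setdefault("volumes", []).append(proxy_volume())
    for container in patched["containers"]:
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(proxy_volume_mount())
    return patched


if __name__ == "__main__":
    spec = {"containers": [{"name": "clickhouse", "image": "clickhouse/clickhouse-server"}]}
    print(patch_pod_spec(spec, "clickhouse"))
```

&lt;p&gt;One trade-off of a &lt;code&gt;subPath&lt;/code&gt; mount is that the file inside the pod is not refreshed when the ConfigMap changes, so a pod restart is needed to pick up a new proxy address.&lt;/p&gt;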

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3gvfdkxirark7qpctt1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3gvfdkxirark7qpctt1.png" alt="Image description" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Testing and Validation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Deployed a stateless ClickHouse instance to iterate quickly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verified that ClickHouse requests were routed through the proxy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observed proxy logs confirming that outbound requests were successfully relayed to GCP.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuixh1jtgryv5avhntogk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuixh1jtgryv5avhntogk.png" alt="Image description" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The left window shows the query to BigQuery, and the right window shows the proxy logs, where the request is forwarded through the proxy server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outcome&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach successfully enabled secure communication between ClickHouse (AWS) and BigQuery (GCP) in an airgapped environment. The use of a ConfigMap-based proxy configuration made the setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalable:&lt;/strong&gt; Easily adaptable to different cloud vendors (GCP, Azure, AWS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible:&lt;/strong&gt; Decouples networking configurations from application logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure:&lt;/strong&gt; Ensures outbound traffic is strictly controlled via the proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By leveraging ClickHouse’s extensible configuration system and Kubernetes, we overcame strict network isolation to enable cross-cloud data workflows in constrained environments. This architecture can be extended to other cloud-native workloads requiring external data synchronization in airgapped environments.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>airgapped</category>
      <category>bigquery</category>
      <category>kubernetes</category>
    </item>
  </channel>
</rss>
