DEV Community: Pratham Chauhan

Self-Hosting ClickHouse on a Shared EC2 Box: Private Access via Cloudflare Tunnel + Automated S3 Backups

Pratham Chauhan — Fri, 17 Jul 2026 15:12:34 +0000

I recently needed an analytics database for a side project. ClickHouse was the obvious pick: it’s stupidly fast for the append-heavy, aggregate-later workload I had. What I didn’t want was a managed cloud bill, and I already had an EC2 instance running a few other services. So the plan became: run ClickHouse on that same box, keep it completely off the public internet, let a separate web app read from it, and never lose the data.

This is the full walkthrough: install, tuning it to coexist with other services on a small box, locking down users, exposing it to a remote app through a Cloudflare Tunnel (zero open ports), and a nightly S3 backup. I’ve folded in every wall I hit along the way as callouts, because those were the parts that actually cost me time.

💡 The shape of it. ClickHouse listens on localhost only. A cloudflared daemon on the same box makes an outbound connection to Cloudflare; a remote app reaches the DB through that tunnel, gated by a Cloudflare Access service token. Backups go to S3 via an EC2 IAM role. No inbound ports for the database, ever.

My box for reference: 2 vCPU / 8 GB RAM, Ubuntu, shared with a few other services.

1. Install ClickHouse

Straight from the official deb repo:

sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
ARCH=$(dpkg --print-architecture)
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client

The installer prompts you to set a password for the default (admin) user. Set one and save it — it lands in /etc/clickhouse-server/users.d/default-password.xml.

⚠️ Gotcha: one command per line. Don’t paste that GPG line and the ARCH= line together as one line. If they get concatenated, gpg tries to open ARCH=amd64 as a file and fails with a cryptic “No such file or directory”. The same goes for the curl ... | sudo pipe: if a paste wraps it across lines, sudo runs with no command. Run each as its own line.

By default ClickHouse binds only to 127.0.0.1/::1; leave it that way. Everything else in this guide relies on it being loopback-only.
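If you want to confirm that from a script rather than by eye, here is a minimal Python sketch that parses the output of ss -ltn (listening TCP sockets, numeric) and flags any ClickHouse port bound to something other than loopback. It assumes the default ports 8123 (HTTP) and 9000 (native); the function name is mine and the sample output is illustrative, not captured from a real box.

```python
import ipaddress

CLICKHOUSE_PORTS = {8123, 9000}  # assumption: default HTTP and native ports


def exposed_listeners(ss_output: str, ports=CLICKHOUSE_PORTS) -> list:
    """Return Local:Port entries from `ss -ltn` output that serve `ports` on a non-loopback address."""
    exposed = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        local = fields[3]  # columns: State Recv-Q Send-Q Local:Port Peer:Port
        host, _, port = local.rpartition(":")
        if not port.isdigit() or int(port) not in ports:
            continue  # other services, or the header line
        host = host.strip("[]").split("%")[0]  # "[::1]" -> "::1", "127.0.0.53%lo" -> "127.0.0.53"
        try:
            loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            loopback = False  # "*" (all interfaces) is not an IP literal; treat it as exposed
        if not loopback:
            exposed.append(local)
    return exposed


sample = """State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      4096   127.0.0.1:8123     0.0.0.0:*
LISTEN 0      4096   [::1]:9000         [::]:*
LISTEN 0      4096   0.0.0.0:9000       0.0.0.0:*"""
print(exposed_listeners(sample))  # ['0.0.0.0:9000']
```

On the box you would feed it subprocess.run(["ss", "-ltn"], capture_output=True, text=True).stdout; an empty list is the result you want.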

2. Add swap (cheap OOM insurance)

Small instances usually have no swap, and ClickHouse merges can spike memory. A swapfile keeps the OOM killer from murdering the process (or worse, a neighbour service):

sudo fallocate -l 4G /swapfile && sudo chmod 600 /swapfile
sudo mkswap /swapfile && sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

3. Tune memory for a shared box

ClickHouse’s defaults assume it owns a big server: a 5 GiB mark cache, generous background pools. On a shared 8 GB box that will starve everything else. I capped it with an absolute memory limit (not a RAM ratio: a ratio would grab 80% of total RAM and choke my other services), and trimmed the caches and the noisy system logs.

Best practice is to leave config.xml untouched and drop overrides into config.d/:

sudo tee /etc/clickhouse-server/config.d/z-low-memory.xml >/dev/null <<'XML'
<clickhouse>
    <max_server_memory_usage>2684354560</max_server_memory_usage>   <!-- 2.5 GiB hard cap -->
    <mark_cache_size>268435456</mark_cache_size>                    <!-- 256 MiB, was 5 GiB -->
    <max_concurrent_queries>10</max_concurrent_queries>
    <mlock_executable>false</mlock_executable>
    <metric_log remove="1"/>
    <asynchronous_metric_log remove="1"/>
    <trace_log remove="1"/>
    <text_log remove="1"/>
</clickhouse>
XML
sudo systemctl restart clickhouse-server
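The byte values in that file are plain binary units multiplied out (2.5 × 1024³ and 256 × 1024²). If you would rather derive them than type them, here is a small Python sketch that renders the same override from human-sized numbers. The helper name is mine, and it only returns the XML text; it does not write anything into config.d/.

```python
import xml.etree.ElementTree as ET

GiB = 1024 ** 3
MiB = 1024 ** 2


def low_memory_override(server_cap_gib: float, mark_cache_mib: int, max_queries: int = 10) -> str:
    """Render a config.d override like z-low-memory.xml from human-sized numbers."""
    root = ET.Element("clickhouse")
    settings = {
        "max_server_memory_usage": int(server_cap_gib * GiB),  # hard cap in bytes
        "mark_cache_size": int(mark_cache_mib * MiB),
        "max_concurrent_queries": max_queries,
        "mlock_executable": "false",
    }
    for name, value in settings.items():
        ET.SubElement(root, name).text = str(value)
    # remove="1" drops these system-log sections from the merged server config
    for log in ("metric_log", "asynchronous_metric_log", "trace_log", "text_log"):
        ET.SubElement(root, log, remove="1")
    return ET.tostring(root, encoding="unicode")


xml = low_memory_override(2.5, 256)
print(xml)
```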

With the hard cap, ClickHouse fails the offending query with a clean “memory limit exceeded” error at 2.5 GiB instead of getting OOM-killed and taking neighbours down with it.

⚠️ Gotcha: don’t lower background_pool_size. My first instinct was to also set <background_pool_size>2</background_pool_size> to save resources. The server then refused to boot:
DB::Exception: The value of 'number_of_free_entries_in_pool_to_execute_mutation' setting (20)
is greater than 'background_pool_size' * 'background_merges_mutations_concurrency_ratio' (4).
... mutations cannot work with these settings. (BAD_ARGUMENTS)
ClickHouse has a startup sanity check: background_pool_size × background_merges_mutations_concurrency_ratio (default 2) must be at least the mutation free-entry threshold (default 20). With a pool of 2, that’s 2 × 2 = 4 < 20, so it hard-fails. Those pool threads are mostly idle anyway and cost almost no memory; the real lever is max_server_memory_usage. I just removed the override and let it default. If you genuinely must shrink it, you also have to lower three <merge_tree> free-entry settings, which isn’t worth it. Whenever a config change won’t boot, check /var/log/clickhouse-server/clickhouse-server.err.log: the exception names the exact setting.
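To see where the line falls, here is that arithmetic as a tiny Python model. It mirrors only the one check quoted in the error message (the server refuses to start when the threshold is greater than pool × ratio); the function name is hypothetical and this is not ClickHouse’s source. The values 2 and 20 come from the error text, and 16 is the stock background_pool_size.

```python
def mutation_check_passes(background_pool_size: int,
                          concurrency_ratio: int = 2,
                          free_entries_for_mutation: int = 20) -> bool:
    """Model of the startup check from the error above.

    Boot fails when the mutation free-entries threshold is greater than
    background_pool_size * concurrency_ratio.
    """
    return free_entries_for_mutation <= background_pool_size * concurrency_ratio


for pool in (2, 8, 10, 16):
    print(pool, pool * 2, mutation_check_passes(pool))
# 2 4 False
# 8 16 False
# 10 20 True
# 16 32 True
```

So with the stock ratio and threshold, 10 is the smallest pool that even boots, which is another reason to leave the default alone.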

Verify the cap is live:

clickhouse-client --password -q "SELECT value FROM system.server_settings WHERE name='max_server_memory_usage'"
# -> 2684354560

4. Create users the file-based way (one read-write, one read-only)

I wanted two dedicated users instead of throwing default around: app_rw for the service that writes, and app_ro for the web app that only reads. ClickHouse supports declarative XML users in users.d/: passwords as SHA-256 hashes, per-user settings profiles, quotas, and scoped grants.

First, hash each password:

echo -n 'your-write-password'    | sha256sum | awk '{print $1}'   # -> WRITE_HASH
echo -n 'your-readonly-password' | sha256sum | awk '{print $1}'   # -> RO_HASH
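One side effect of echo -n is that the plaintext password can end up in your shell history. If that bothers you, here is a minimal Python alternative that produces the same lowercase hex digest password_sha256_hex expects, reading the password without echoing it. It is a sketch; both function names are mine.

```python
import getpass
import hashlib


def clickhouse_sha256_hex(password: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 password; same digest as `echo -n ... | sha256sum`."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def prompt_and_hash() -> str:
    """Read a password without echoing it (so it never lands in shell history) and hash it."""
    return clickhouse_sha256_hex(getpass.getpass("Password to hash: "))


print(clickhouse_sha256_hex("abc"))  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Call prompt_and_hash() from an interactive python3 session and paste the result in place of WRITE_HASH or RO_HASH.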

Then define both users. readonly=1 blocks all writes/DDL at the query level; the <grants> block scopes each user to just the metrics database:

sudo tee /etc/clickhouse-server/users.d/z-app-users.xml >/dev/null <<'XML'
<clickhouse>
    <profiles>
        <rw_profile>
            <max_memory_usage>1500000000</max_memory_usage>
            <max_threads>2</max_threads>
            <max_bytes_before_external_group_by>1000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>1000000000</max_bytes_before_external_sort>
        </rw_profile>
        <ro_profile>
            <readonly>1</readonly>
            <max_memory_usage>1500000000</max_memory_usage>
            <max_threads>2</max_threads>
        </ro_profile>
    </profiles>

    <users>
        <!-- writer: the service running on this same box -->
        <app_rw>
            <password_sha256_hex>WRITE_HASH</password_sha256_hex>
            <networks><ip>::1</ip><ip>127.0.0.1</ip></networks>
            <profile>rw_profile</profile>
            <quota>default</quota>
            <grants>
                <query>GRANT CREATE DATABASE ON metrics.*</query>
                <query>GRANT ALL ON metrics.*</query>
            </grants>
        </app_rw>

        <!-- reader: the remote web app, via the tunnel -->
        <app_ro>
            <password_sha256_hex>RO_HASH</password_sha256_hex>
            <networks><ip>::1</ip><ip>127.0.0.1</ip></networks>
            <profile>ro_profile</profile>
            <quota>ro_quota</quota>
            <grants>
                <query>GRANT SELECT ON metrics.*</query>
            </grants>
        </app_ro>
    </users>

    <quotas>
        <ro_quota>
            <interval>
                <duration>3600</duration>
                <queries>2000</queries>
                <result_rows>100000000</result_rows>
            </interval>
        </ro_quota>
    </quotas>
</clickhouse>
XML
sudo systemctl restart clickhouse-server

💡 Why <networks> is localhost for both users, even the remote reader. The tunnel daemon (next section) opens a fresh connection to localhost:8123, so ClickHouse sees every request as coming from 127.0.0.1, remote or not. Locking the users to loopback is both correct and the tightest setting.
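Because the security story leans on those details, it can be worth linting the users file before restarting. Here is a minimal Python sketch using only the standard library: it checks that every user’s <password_sha256_hex> looks like a 64-character hex digest (which also catches a forgotten WRITE_HASH placeholder) and that every <ip> under <networks> is loopback. The function name and the checks are my own choices, not something ClickHouse ships.

```python
import ipaddress
import re
import xml.etree.ElementTree as ET

SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def lint_users(xml_text: str) -> list:
    """Return problems found in a users.d file; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)  # XML comments are dropped by the default parser
    for user in root.findall("./users/*"):
        digest = (user.findtext("password_sha256_hex") or "").strip()
        if not SHA256_HEX.fullmatch(digest):
            problems.append(f"{user.tag}: password_sha256_hex is not a 64-char hex digest")
        ips = [(ip.text or "").strip() for ip in user.findall("./networks/ip")]
        if not ips:
            problems.append(f"{user.tag}: no <networks><ip> restriction")
        for ip in ips:
            try:
                loopback = ipaddress.ip_network(ip, strict=False).is_loopback
            except ValueError:
                loopback = False  # empty, or not an address/CIDR at all
            if not loopback:
                problems.append(f"{user.tag}: {ip!r} is not a loopback address")
    return problems
```

Run it as lint_users(open("/etc/clickhouse-server/users.d/z-app-users.xml").read()) before the restart; anything it prints is worth fixing first.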

⚠️ Gotcha: GRANT ALL ON db.* does not include CREATE DATABASE. When my writer first tried to create the database, it got:
app_rw: Not enough privileges. To execute this query, it's necessary to have the grant CREATE DATABASE ON metrics.*
Database-level ALL covers everything inside the database (tables, inserts, selects, alters) but not the privilege to create the database itself; that’s a separate grant. Hence the explicit GRANT CREATE DATABASE ON metrics.* line above. It’s scoped to the name metrics, so the user can only ever create that one database.

Sanity-check that the read-only user really is read-only:

clickhouse-client -u app_ro --password -d metrics -q "CREATE TABLE t (x Int8) ENGINE=Memory"
# -> "Cannot execute query in readonly mode"  ✅

5. Create the database and tables

Generic schema: an events table with a monthly partition and a 12-month TTL, a sensible default for time-series analytics:

clickhouse-client -u app_rw --password -q "CREATE DATABASE IF NOT EXISTS metrics"

clickhouse-client -u app_rw --password -d metrics --multiquery <<'SQL'
CREATE TABLE IF NOT EXISTS events
(
    event_time  DateTime64(3, 'UTC'),
    event_type  LowCardinality(String),
    user_id     String  DEFAULT '',
    payload     String  DEFAULT '{}',
    duration_ms UInt32  CODEC(T64, LZ4)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time)
TTL toDateTime(event_time) + INTERVAL 12 MONTH;
SQL

LowCardinality(String) for low-cardinality columns and the T64 codec on the duration give you strong compression for free.
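For the writer side, the service on the box can push rows over the same HTTP interface on port 8123. Here is a minimal Python sketch with only the standard library, assuming the app_rw user and the metrics.events table above; the function name is hypothetical. It sends rows as JSONEachRow (one JSON object per line), puts the INSERT statement in the query URL parameter, and passes credentials in the X-ClickHouse-User / X-ClickHouse-Key headers.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(rows, user, password, base_url="http://localhost:8123"):
    """Build (but do not send) a POST that inserts `rows` into metrics.events as JSONEachRow."""
    query = "INSERT INTO metrics.events FORMAT JSONEachRow"
    url = f"{base_url}/?{urllib.parse.urlencode({'query': query})}"
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")  # one JSON object per line
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )


rows = [
    {"event_time": "2026-07-17 15:12:34.123", "event_type": "page_view", "duration_ms": 42},
    {"event_time": "2026-07-17 15:12:35.000", "event_type": "click", "user_id": "u1", "duration_ms": 7},
]
req = build_insert_request(rows, "app_rw", "your-write-password")
# urllib.request.urlopen(req, timeout=10) sends it; a non-2xx answer raises urllib.error.HTTPError.
```

Batching many rows into one request is the design choice that matters here: ClickHouse prefers fewer, larger inserts over a stream of single-row ones.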

6. Expose it to a remote app with a Cloudflare Tunnel

This is the part that keeps the database private. Instead of opening a port and firewalling it, cloudflared makes an outbound connection to Cloudflare; requests to a hostname get pushed back down that connection to the daemon, which forwards them to localhost:8123 (ClickHouse’s HTTP interface). Zero inbound ports.

I’d already created a named tunnel and installed the service. The routing is defined in a config file with ingress rules:

sudo mkdir -p /etc/cloudflared
sudo tee /etc/cloudflared/config.yml >/dev/null <<'YAML'
tunnel: <your-tunnel-uuid>
credentials-file: /home/ubuntu/.cloudflared/<your-tunnel-uuid>.json

ingress:
  - hostname: db.example.com
    service: http://localhost:8123
  - service: http_status:404          # catch-all, must be last
YAML

cloudflared tunnel ingress validate    # should print OK
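Ingress rules are checked top to bottom and the first match wins, which is why the catch-all has to come last. Here is a toy Python model of that matching, just to make the ordering rule concrete. It handles hostnames only (real cloudflared also supports path matching and wildcards, which this ignores); the rule list mirrors config.yml above and the function names are hypothetical.

```python
def route(ingress: list, hostname: str) -> str:
    """Return the service for `hostname`, first match wins (hostname-only model)."""
    for rule in ingress:
        # A rule without a hostname matches everything: that is the catch-all.
        if rule.get("hostname") in (None, hostname):
            return rule["service"]
    raise ValueError("no rule matched: ingress must end with a catch-all")


def catch_all_is_last(ingress: list) -> bool:
    """The final rule must not have a hostname, or unmatched requests have nowhere to go."""
    return bool(ingress) and "hostname" not in ingress[-1]


ingress = [
    {"hostname": "db.example.com", "service": "http://localhost:8123"},
    {"service": "http_status:404"},
]
print(route(ingress, "db.example.com"))     # http://localhost:8123
print(route(ingress, "other.example.com"))  # http_status:404
```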

Point the systemd service at that config. Edit the unit:

sudo systemctl edit --full cloudflared

Set ExecStart to:

ExecStart=/usr/local/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart cloudflared
journalctl -u cloudflared -n 20 --no-pager   # look for "Registered tunnel connection"

⚠️ Gotcha: --url silently ignores your ingress rules. My service was originally started as cloudflared tunnel run --url http://localhost:8000 <name>. That --url flag is single-service “quick” mode: it routes every hostname on the tunnel to that one URL and completely ignores config.yml. So my database hostname was quietly being sent to port 8000 and returning 502 Bad Gateway. The fix is exactly what’s above: drop --url entirely and use --config ... run so the ingress rules take over.

⚠️ Gotcha: the run subcommand is mandatory. cloudflared tunnel --config <file> without a trailing run just prints help and exits, and systemd keeps restarting it in a loop. The flag goes before run: cloudflared tunnel --config <file> run.

Confirm the origin is healthy from the box itself:

curl -s localhost:8123   # -> "Ok."

7. Lock the tunnel with Cloudflare Access

Routing to 8123 alone would leave the database open to anyone who knows the URL. The gate is Cloudflare Access: create an Access application on db.example.com with a Service Auth policy and a service token. Every request must then carry CF-Access-Client-Id and CF-Access-Client-Secret headers; Cloudflare checks them at its edge and returns 403 to anything without them, before the request ever reaches your box.

Your remote app connects like this (env vars):

DB_HOST=db.example.com
DB_PORT=443
DB_SECURE=true
CF_ACCESS_CLIENT_ID=<service-token-id>
CF_ACCESS_CLIENT_SECRET=<service-token-secret>
DB_USER=app_ro
DB_PASSWORD=<readonly-password>

End-to-end test from your laptop, with the CF token and the read-only DB creds:

curl https://db.example.com/ \
  -H "CF-Access-Client-Id: <id>" -H "CF-Access-Client-Secret: <secret>" \
  -H "X-ClickHouse-User: app_ro" -H "X-ClickHouse-Key: <readonly-password>" \
  --data-binary "SELECT count() FROM metrics.events"

Three outcomes confirm everything’s wired: missing CF headers → Cloudflare 403; valid headers + creds → a number; a write attempt with app_ro → ClickHouse “readonly mode” error.

💡 Use X-ClickHouse-User / X-ClickHouse-Key headers, not -u basic auth. Basic auth sets the Authorization header; keeping ClickHouse creds in their own headers avoids any collision with Cloudflare’s CF-Access-* auth.
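On the remote side, the app only has to attach those four headers to an HTTPS POST, exactly like the curl test. Here is a minimal Python sketch using the standard library, assuming the env var names listed above; the function names are hypothetical, and a real app would add retries and better error handling.

```python
import json
import os
import urllib.request


def build_query_request(sql: str, env=os.environ) -> urllib.request.Request:
    """Build a POST to ClickHouse through the tunnel, carrying both layers of auth."""
    scheme = "https" if env.get("DB_SECURE", "true").lower() == "true" else "http"
    url = f"{scheme}://{env['DB_HOST']}:{env.get('DB_PORT', '443')}/"
    headers = {
        # Checked by Cloudflare Access at its edge; missing or wrong values get a 403 there.
        "CF-Access-Client-Id": env["CF_ACCESS_CLIENT_ID"],
        "CF-Access-Client-Secret": env["CF_ACCESS_CLIENT_SECRET"],
        # Checked by ClickHouse after cloudflared forwards the request to localhost:8123.
        "X-ClickHouse-User": env["DB_USER"],
        "X-ClickHouse-Key": env["DB_PASSWORD"],
    }
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST", headers=headers)


def run_query(sql: str, env=os.environ) -> list:
    """Run a read-only query and return the rows from ClickHouse's JSON output format."""
    query = sql.strip().rstrip(";") + " FORMAT JSON"  # request JSON in the SQL itself
    with urllib.request.urlopen(build_query_request(query, env), timeout=15) as resp:
        return json.load(resp)["data"]
```

Putting FORMAT JSON in the SQL, rather than passing a setting in the URL, avoids changing any settings, which a readonly=1 profile generally does not allow. A rejected request surfaces as urllib.error.HTTPError: a 403 from Cloudflare, or ClickHouse’s own error status.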

⚠️ Gotcha: the IP allowlist can’t gate the tunnel. Because cloudflared connects to ClickHouse from localhost, ClickHouse sees 127.0.0.1 for all tunnel traffic. Its <networks> allowlist therefore can’t distinguish a remote request from a local one: Cloudflare Access is your access control, not the IP list. So keep the Service Auth policy with no “allow everyone” bypass, give every ClickHouse user a strong password, and hand the remote app only the read-only creds.

8. Automated nightly backups to S3

The last piece: I never want to lose this data. I used clickhouse-backup with the EC2 instance’s IAM role for S3 auth, so there are no AWS keys sitting on disk.

8a. Create an IAM role and attach it

In the AWS console: IAM → Policies → Create policy → JSON:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:PutObject","s3:GetObject","s3:DeleteObject","s3:ListBucket","s3:AbortMultipartUpload"],
    "Resource": [
      "arn:aws:s3:::my-clickhouse-backups",
      "arn:aws:s3:::my-clickhouse-backups/*"
    ]
  }]
}
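If you manage the policy as code, here is a small Python sketch that builds an equivalent document. It splits the bucket-level action (s3:ListBucket, which applies to the bucket ARN) from the object-level actions (which apply to bucket/*). That split is a common least-privilege tightening and my addition; the single-statement version above works either way. The bucket name is the placeholder from above.

```python
import json

OBJECT_ACTIONS = ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:AbortMultipartUpload"]


def backup_policy(bucket: str) -> dict:
    """IAM policy letting clickhouse-backup list the bucket and manage objects inside it."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing is a bucket-level action, so it targets the bucket ARN itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            # Object reads, writes, deletes and multipart aborts target keys under the bucket.
            {"Effect": "Allow", "Action": OBJECT_ACTIONS, "Resource": f"{bucket_arn}/*"},
        ],
    }


print(json.dumps(backup_policy("my-clickhouse-backups"), indent=2))
```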

Name it, then IAM → Roles → Create role → AWS service → EC2, attach that policy, name the role. Finally EC2 → your instance → Actions → Security → Modify IAM role and attach it. No reboot needed.

8b. Install and configure clickhouse-backup

cd /tmp
curl -fL https://github.com/Altinity/clickhouse-backup/releases/latest/download/clickhouse-backup-linux-amd64.tar.gz -o chb.tar.gz
mkdir -p chb && tar -xzf chb.tar.gz -C chb
sudo mv "$(find chb -name clickhouse-backup -type f | head -1)" /usr/local/bin/
sudo chmod +x /usr/local/bin/clickhouse-backup
clickhouse-backup --version

Config — leaving access_key/secret_key out makes it fall back to the IAM role automatically:

sudo mkdir -p /etc/clickhouse-backup
printf 'general:\n  remote_storage: s3\n  backups_to_keep_local: 1\n  backups_to_keep_remote: 14\nclickhouse:\n  host: localhost\n  port: 9000\n  username: default\n  password: "your-default-password"\ns3:\n  bucket: my-clickhouse-backups\n  region: us-east-1\n  path: clickhouse/prod\n' | sudo tee /etc/clickhouse-backup/config.yml >/dev/null
sudo chmod 600 /etc/clickhouse-backup/config.yml
sudo clickhouse-backup print-config    # confirms it parses

⚠️ Gotcha: heredocs and YAML hate a flaky paste. My SSH client kept prepending a space to every pasted line. That broke the config two ways: heredocs never terminated (the closing delimiter got indented, so the shell waited forever at the > prompt), and even when the file was written, the YAML failed with did not find expected key because root keys weren’t at column 0. The fix that always works: a single-line printf with \n baked in (above), since one logical line can’t be mis-indented by a paste. Check with cat -A file: root keys must be flush left, each line should end in a bare $, and there should be no ^I tabs.
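The cat -A check can also be scripted. Here is a minimal Python sketch (standard library only, so it runs before PyYAML is installed) that reports indented root keys, tab characters and CRLF line endings. The expected root keys general, clickhouse and s3 come from the config above; the function name and the rest of the checks are my own assumptions.

```python
def lint_config(text: str, roots=("general", "clickhouse", "s3")) -> list:
    """Flag paste damage in a clickhouse-backup config.yml."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            problems.append(f"line {lineno}: tab character (cat -A shows ^I)")
        if line.endswith("\r"):
            problems.append(f"line {lineno}: CRLF ending (cat -A shows ^M$)")
        stripped = line.strip()
        # A root key such as "general:" must start at column 0.
        if stripped.endswith(":") and stripped[:-1] in roots and line[0] in " \t":
            problems.append(f"line {lineno}: root key {stripped!r} is indented")
    return problems


good = "general:\n  remote_storage: s3\nclickhouse:\n  host: localhost\ns3:\n  bucket: b\n"
print(lint_config(good))  # []
```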

Test it manually before trusting the schedule:

sudo clickhouse-backup create_remote     # freeze + upload in one step, auto-timestamped
sudo clickhouse-backup list remote        # should list your backup in S3

⚠️ Gotcha: no EC2 IMDS role found. My first upload created the local backup fine, then failed on the S3 push:
failed to refresh cached credentials, no EC2 IMDS role found, ... EC2 IMDS ... StatusCode: 404
That 404 means no IAM role is attached to the instance: the SDK has nowhere to get credentials. I’d forgotten step 8a. Confirm the role is live:
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token:$TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/
# -> your-role-name  (empty = no role attached)
Attach the role, wait ~30s, and re-run. If you genuinely can’t use a role, you can put access_key/secret_key in the s3: block instead, but the role is cleaner and rotation-free.
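Those two curl calls are the IMDSv2 handshake: a PUT for a short-lived session token, then a GET that presents it. Here is the same thing as a Python sketch with the standard library, building the requests separately so you can see exactly what goes on the wire. Only the function names are mine; the URLs and header names are the ones from the curl commands. It only works from inside the instance, since 169.254.169.254 is link-local.

```python
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """Step 1: PUT /latest/api/token asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def role_request(token: str) -> urllib.request.Request:
    """Step 2: GET the attached role name, presenting the token from step 1."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
    )


def attached_role() -> str:
    """Return the attached IAM role name, or "" when IMDS answers 404 (no role attached)."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode()
    try:
        with urllib.request.urlopen(role_request(token), timeout=2) as resp:
            return resp.read().decode().strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return ""
        raise
```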

8c. The cron: a fixed local time

I wanted 07:30 in my own timezone regardless of the box’s UTC clock. Cron supports CRON_TZ, which spares you any offset math:

CRON_TZ=Asia/Kolkata
30 7 * * * root /usr/local/bin/clickhouse-backup create_remote >> /var/log/clickhouse-backup.log 2>&1

Put that in /etc/cron.d/clickhouse-backup. create_remote with no name auto-timestamps each backup, and backups_to_keep_remote: 14 prunes to the last 14 automatically; no cleanup job needed.
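For the record, here is the offset math CRON_TZ saves you, as a Python sketch using the standard zoneinfo module (Python 3.9+, and it needs the system time zone database). It is handy for checking which UTC time the job really fires at, for instance when reading timestamps in /var/log/clickhouse-backup.log. The helper name is mine; the date argument matters only for zones with daylight saving, which Asia/Kolkata does not have.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_to_utc(hour: int, minute: int, tz: str, on: date) -> time:
    """Translate a wall-clock time in `tz` on a given date to UTC."""
    local = datetime.combine(on, time(hour, minute), tzinfo=ZoneInfo(tz))
    return local.astimezone(timezone.utc).time()


print(local_to_utc(7, 30, "Asia/Kolkata", date(2026, 7, 17)))  # 02:00:00
```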

⚠️ Gotcha: no leading whitespace, and no dot in the filename. After my flaky paste, cat -A showed a leading space on both lines. Cron tolerates a space before 30 7 ..., but a space before CRON_TZ means cron may ignore the timezone line and run at 07:30 UTC instead. Strip it: sudo sed -i 's/^[[:space:]]*//' /etc/cron.d/clickhouse-backup. Also, files in /etc/cron.d/ must not have a dot in the name: cron silently ignores clickhouse-backup.conf. Name it clickhouse-backup.
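Both rules are easy to check mechanically. Here is a Python sketch: the filename pattern reflects the run-parts naming convention that Debian and Ubuntu cron apply to /etc/cron.d (letters, digits, underscores and hyphens only), and the whitespace check looks for exactly the leading-space damage described above. The function name is hypothetical.

```python
import re
from pathlib import PurePath

VALID_CRON_D_NAME = re.compile(r"[A-Za-z0-9_-]+")


def lint_cron_d(path: str, text: str) -> list:
    """Report cron.d problems: filenames cron will skip, and lines with leading whitespace."""
    problems = []
    name = PurePath(path).name
    if not VALID_CRON_D_NAME.fullmatch(name):
        problems.append(f"{name}: only letters, digits, _ and - are allowed, so cron skips it")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line[:1].isspace():
            problems.append(f"line {lineno}: leading whitespace: {line!r}")
    return problems
```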

Verify:

cat -A /etc/cron.d/clickhouse-backup      # two flush-left lines
systemctl status cron --no-pager | head -3

Where it stands

That’s the whole thing: ClickHouse running on a shared box, bound to localhost, tuned so it plays nice with its neighbours; two scoped users; a Cloudflare Tunnel + Access combo that lets a remote app read the data with zero open ports; and a nightly IAM-authenticated backup to S3 with 14-day retention.

A few things I deliberately left for later (YAGNI until proven otherwise):

Incremental backups (--diff-from-remote): only worth it once full backup time actually hurts.
A “backup didn’t run” alert: a second cron that pings me if no success line hits the log in ~26h. Worth adding the day the data becomes recovery-critical.
TLS between cloudflared and ClickHouse: unnecessary here, since that hop is over loopback.
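For that “backup didn’t run” alert, one option is to look at the newest backup name instead of grepping the log. Here is a minimal Python sketch with the ~26h threshold from the list above. It assumes create_remote’s auto-generated names are UTC timestamps shaped like 2026-07-17T02-00-00; check yours with clickhouse-backup list remote and adjust BACKUP_NAME_FORMAT if they differ. How you collect the names and how it notifies you are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKUP_NAME_FORMAT = "%Y-%m-%dT%H-%M-%S"  # assumption: default auto-generated name layout, in UTC
MAX_AGE = timedelta(hours=26)


def newest_backup_age(names: list, now: datetime) -> Optional[timedelta]:
    """Age of the newest timestamp-named backup, or None if no name parses."""
    stamps = []
    for name in names:
        try:
            stamps.append(datetime.strptime(name, BACKUP_NAME_FORMAT).replace(tzinfo=timezone.utc))
        except ValueError:
            continue  # skip manually named backups
    return now - max(stamps) if stamps else None


def backup_is_stale(names: list, now: datetime) -> bool:
    """True when there is no parseable backup, or the newest one is older than MAX_AGE."""
    age = newest_backup_age(names, now)
    return age is None or age > MAX_AGE
```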

The recurring lesson, if there is one: read the error log. Every single failure I hit (the boot refusal, the 502, the IMDS 404, the YAML parse error) told me exactly what was wrong in its first line. The fixes were small; finding them was just a matter of looking.

Everything That Goes Wrong With That Setup (And How I Debugged It)

Pratham Chauhan — Thu, 09 Jul 2026 08:08:05 +0000

In Part 1, I walked through setting up Gemini via Vertex AI, from project setup through a keyless EC2 deployment with Workload Identity Federation (WIF). If a few of those terms went by fast, quick recap: a service account is a non-human identity your app logs in as, an IAM role is a labeled bundle of permissions, and WIF is the system that lets AWS vouch for your server's identity to Google without any password changing hands.

This post is the part nobody puts in the official docs: every single thing that went wrong when I actually did this, in the order I hit it, and what each error really meant. Six distinct bugs, stacked on top of each other. Almost every one of them initially looked like a variation of the same "401 Unauthorized" error, and figuring out that they were six different problems, not one stubborn one, was most of the battle.

Bug #1: A 502 that had nothing to do with Google Cloud

Before I'd even gotten to testing auth, the site returned a 502 Bad Gateway. Easy mistake to make here: jumping straight to "something's wrong with my Vertex AI setup." A 502 just means the web server in front of your app (commonly nginx) forwarded a request to your app and got no valid answer back, most often because nothing was listening at all. It tells you nothing about why, only that your app isn't answering.

pm2 list
pm2 logs <app-name> --lines 200
curl -v http://localhost:3000

If curl localhost:3000 doesn't get a response, the problem is the Node process itself, full stop, not anything cloud-related. Always rule this out first.
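The point of that curl is to separate "nothing is listening" from "something answered badly". If you want the same distinction in a script (say, a health check), here is a minimal Python sketch with the standard socket module; port 3000 is the example port from above and the function name is mine.

```python
import socket


def probe(host: str = "127.0.0.1", port: int = 3000, timeout: float = 2.0) -> str:
    """Classify a TCP port as 'listening', 'refused' (no process bound), or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        return "refused"  # nothing bound to the port: the typical cause of an nginx 502
    except socket.timeout:
        return "timeout"  # something (often a firewall) is silently dropping packets
```

"refused" points at the process manager, not at any cloud service.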

Bug #2: My process manager was silently ignoring my config

The actual cause of that 502: I'd saved my PM2 ecosystem file, which is just a config file that tells PM2 how to start and manage your app, as ecosystem.config.cjs (correct, since module.exports is older CommonJS-style syntax), but ran:

pm2 start ecosystem.config.js   # filename typo, PM2 never loaded it

PM2 didn't throw an error. It just fell back to running the app a different way, which meant none of my environment variables (things like GOOGLE_APPLICATION_CREDENTIALS, the project ID, the location) ever reached the process. The app looked alive. It just had none of the config I assumed it had.

pm2 delete <app-name>
pm2 start ecosystem.config.cjs
pm2 env 0   # confirm the vars you expect are actually there

That last command became my go-to sanity check for the rest of this debugging session, and honestly it should be step one any time an app "isn't picking up" an environment variable.

Bug #3: Chasing the wrong service entirely

With the app finally up, I got a real, recognizable Google error:

{"error": "Could not load the default credentials..."}

Straightforward enough. But a bit later, a completely different-looking error showed up:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...>
<title>401 - Unauthorized</title>

Here's the detail that mattered: this isn't a shape Google ever returns. Google's API errors are JSON. Their HTML error pages are branded with Google's actual logo and styling. This bare, unbranded XHTML 401 page is a generic default error document, the kind produced by a totally different piece of infrastructure. I burned real time suspecting a corporate proxy sitting between my server and the internet, then suspecting AWS S3, since the app also uploads images there. Both wrong. Direct curl tests against Google and S3 came back clean, with real certificates and their own genuine error formats.

The actual source: http://169.254.169.254/latest/meta-data/iam/security-credentials/. That address is the AWS EC2 Instance Metadata Service, usually shortened to IMDS. Every AWS EC2 server can quietly ask this internal address, invisible to the outside world, "who am I, and what permissions do I currently have?" without needing a password to ask.

curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# → the exact same XHTML 401 page

Worth explaining why a Google library was even talking to an AWS-only address. That same 169.254.169.254 address is the standard link-local metadata address on both AWS and Google Cloud. It's not a coincidence; it's a convention both cloud providers adopted. Google's auth library has a built-in fallback that checks this exact address when it can't find credentials another way, because on a real Google Cloud server, that's genuinely where instance metadata lives. Run that same library on an EC2 instance instead, and it knocks on the same door, except now AWS is answering, with AWS's rules, not Google's. That collision is the single most confusing part of this entire setup, and the reason a "Google auth error" turned out to actually be an AWS one.

Takeaway: when an error's shape doesn't match what the service "should" return, trust that mismatch. It's telling you to look elsewhere.
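That habit can be turned into a crude first-pass triage. Here is a Python sketch of the heuristic described above: a JSON body suggests a structured API error, while an HTML page (like that bare XHTML 401) means some server rendered an error document and you should find out which one. The labels and the function name are my own, not anything Google or AWS define.

```python
import json


def error_shape(body: str) -> str:
    """Rough guess at what kind of component produced an error body, from its shape alone."""
    text = body.lstrip()
    try:
        json.loads(text)
        return "json"  # structured API error, e.g. Google's API errors
    except ValueError:
        pass
    head = text[:300].lower()
    if head.startswith(("<!doctype html", "<html", "<?xml")) or "<title>" in head:
        return "html"  # an error page: check which server actually rendered it
    return "text"
```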

Bug #4: IMDSv2 vs. an incomplete WIF config

Once I knew it was IMDS, the next question was why it was rejecting the request. My EC2 instance correctly enforced HttpTokens: required, which means it was running the newer, safer version of the metadata service called IMDSv2. In plain terms, IMDSv1 let absolutely anything on the machine ask the metadata address a question and get an answer back immediately. IMDSv2 adds one extra step: you first have to ask for a short-lived "session token" with a separate request, and only then can you use that token to actually ask your question. It exists specifically to block a class of attack where a vulnerability in your app could otherwise be tricked into reading your server's cloud credentials. Disabling it is not the fix.

google-auth-library does support IMDSv2, but only if your WIF credential config explicitly tells it to, via one field inside credential_source, which is the part of the config file describing exactly how to fetch AWS's identity proof:

"credential_source": {
  "environment_id": "aws1",
  "region_url": "http://169.254.169.254/latest/meta-data/placement/availability-zone",
  "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials",
  "regional_cred_verification_url": "https://sts.{region}.amazonaws.com?Action=GetCallerIdentity&Version=2011-06-15",
  "imdsv2_session_token_url": "http://169.254.169.254/latest/api/token"
}

My config, generated before I'd thought hard about IMDSv2, was missing that one line. Add it, and the library fetches that short-lived session token before making any metadata requests, satisfying IMDSv2 without weakening it.
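A quick way to check any WIF credential file for this, as a Python sketch: load the JSON and, when environment_id is aws1, look for imdsv2_session_token_url inside credential_source. The field names come from the snippet above; the function names are mine and the path is a placeholder.

```python
import json


def needs_imdsv2_fix(config: dict) -> bool:
    """True if this is an AWS credential config that will not fetch an IMDSv2 session token."""
    source = config.get("credential_source", {})
    return source.get("environment_id") == "aws1" and "imdsv2_session_token_url" not in source


def check_file(path: str) -> bool:
    """Load a credential config (e.g. /path/to/credentials.json) and run the check."""
    with open(path) as fh:
        return needs_imdsv2_fix(json.load(fh))
```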

(One thing to watch for down the road: there's a known upstream bug where this session token gets cached with only a 300-second lifespan, which can cause intermittent failures on long-running processes well after everything looks fixed. If the same error reappears hours or days later, that's the first thing to check, not a regression in your config.)

Bug #5: A service account that didn't exist

Auth chain working, AWS identity exchanged for a Google token, and a brand-new error appeared:

Permission 'aiplatform.endpoints.predict' denied... or it may not exist

I tried granting IAM roles to gemini-prod-runner@my-project.iam.gserviceaccount.com, per Part 1's instructions, and got:

INVALID_ARGUMENT: Service account gemini-prod-runner@... does not exist.

The account in my actual project had drifted to a slightly different name during earlier setup. Worse, my WIF config had no service_account_impersonation_url field at all. Impersonation, in this context, just means "let identity A borrow identity B's permissions temporarily." Its absence meant my setup wasn't using that pattern in the first place. It was mapping the AWS identity straight to a Google identity of its own, with no service account in between.

gcloud iam service-accounts list --project="$PROJECT_ID"
cat /path/to/credentials.json   # check for service_account_impersonation_url

Lesson: don't assume your WIF setup follows the impersonation pattern just because that's the common example in tutorials, including Part 1's. Read your actual credential config file before granting anything. It tells you exactly which pattern you're on.
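Reading the file for that one field is easy to script too. Here is a Python sketch that reports which pattern a credential config follows, based only on the presence of service_account_impersonation_url as described above. It also pulls the service account email out of the URL, assuming the usual .../serviceAccounts/EMAIL:generateAccessToken shape; the function name and the return labels are mine.

```python
def auth_pattern(config: dict) -> str:
    """Describe which WIF pattern a credential config uses, and where roles should go."""
    url = config.get("service_account_impersonation_url")
    if not url:
        return "direct: grant roles to the federated principal (principal:// or principalSet://)"
    # ".../serviceAccounts/EMAIL:generateAccessToken" -> "EMAIL"
    email = url.rsplit("/serviceAccounts/", 1)[-1].split(":", 1)[0]
    return f"impersonation: grant roles to serviceAccount:{email}"
```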

Bug #6: Binding the role to the wrong shape of identity

The final piece: figuring out what string to actually grant roles/aiplatform.user to, since no service account was involved. This came down to the provider's attribute mapping, which is just the rule set that tells Google how to translate the raw identity info AWS sends over into something Google understands:

attributeMapping:
  attribute.aws_role: "assertion.arn.contains('assumed-role') ? ... : assertion.arn"
  google.subject: assertion.arn

A quick translation of the jargon here. An ARN is AWS's way of uniquely naming any resource, including an identity, written as a long string like arn:aws:sts::123456789:assumed-role/my-role/instance-id. The "assertion" is the signed proof AWS hands over saying "this is genuinely who's asking." google.subject is Google's term for the exact identity string it sees.

I could have bound the role to google.subject, the full AWS identity string including the specific EC2 instance ID baked into it. That would have worked, but only for that one instance, breaking the moment it got replaced or the app scaled to a second server. Binding to attribute.aws_role instead, a cleaned-up version of that string with the instance-specific part stripped out, targets the IAM role generally. That means any current or future instance carrying that role authenticates correctly with no further changes:

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/aws-ec2-pool/attribute.aws_role/arn:aws:sts::AWS_ACCOUNT_ID:assumed-role/ec2-gemini-runner" \
  --role="roles/aiplatform.user"

One more bit of jargon worth untangling here: principal versus principalSet. A principal is one single, specific identity, like one exact instance's session. A principalSet is a whole group of identities that share some attribute, in this case "anyone using this AWS role," no matter which instance they're running on. That's why principalSet was the right choice for something durable.
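To make the two identity strings concrete, here is a Python sketch that derives the role-level value from a full assumed-role ARN (dropping the trailing session name, which on EC2 is the instance ID) and assembles the principalSet member used in the gcloud command. PROJECT_NUMBER, AWS_ACCOUNT_ID, aws-ec2-pool and ec2-gemini-runner are the placeholders from that command. The functions are illustrative only; they are not Google's implementation of the attribute mapping.

```python
def aws_role_from_arn(arn: str) -> str:
    """arn:aws:sts::ACCT:assumed-role/ROLE/SESSION -> arn:aws:sts::ACCT:assumed-role/ROLE."""
    if ":assumed-role/" not in arn:
        return arn  # like the mapping's `: assertion.arn` branch, other ARNs pass through unchanged
    prefix, _, rest = arn.partition(":assumed-role/")
    role_name = rest.split("/", 1)[0]  # drop the session name (the instance ID on EC2)
    return f"{prefix}:assumed-role/{role_name}"


def principal_set_member(project_number: str, pool: str, aws_role: str) -> str:
    """IAM member that matches every identity whose attribute.aws_role equals `aws_role`."""
    return (
        f"principalSet://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool}/attribute.aws_role/{aws_role}"
    )


print(aws_role_from_arn("arn:aws:sts::123456789:assumed-role/my-role/instance-id"))
# arn:aws:sts::123456789:assumed-role/my-role
```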

That was the last piece. Gemini finally responded.

What I'd actually do differently

If I were setting this up again, knowing what I know now:

I'd get the app fully working with a plain service account JSON key first, and only swap to WIF as a deliberate, isolated second step. Debugging two unfamiliar systems at the same time (your app's Vertex AI integration and a federated trust chain between two clouds) is much harder than debugging them one at a time.
I'd open the credential config file before assuming anything about impersonation, since it tells you definitively which auth pattern you're on.
I'd compare error response shapes, not just status codes: a 401 from Google, a 401 from AWS IMDS, and a 401 from anything in between all look different once you actually read them side by side.
I'd always fully restart my process manager after any environment variable change (delete and start fresh rather than just restart), and verify with pm2 env.
And I'd follow up every add-iam-policy-binding command with a get-iam-policy check, since a typo'd identity name fails loudly but a propagation delay or a wrong project fails silently.
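For that last habit, here is a Python sketch that takes the JSON printed by gcloud projects get-iam-policy "$PROJECT_ID" --format=json and confirms a given member holds a given role. It relies only on the standard bindings list of role/members objects in an IAM policy and does not evaluate binding conditions; the function names are mine.

```python
import json


def has_binding(policy: dict, role: str, member: str) -> bool:
    """True if `member` appears in any binding for `role` in an IAM policy document."""
    return any(
        binding.get("role") == role and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


def check_policy_json(policy_json: str, role: str, member: str) -> bool:
    """Convenience wrapper for the raw text printed by get-iam-policy --format=json."""
    return has_binding(json.loads(policy_json), role, member)
```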

None of these six bugs were individually hard. Stacked on top of each other, each one disguised as a slightly different flavor of the same 401, they ate a full day. Hopefully Part 1 plus this post saves you most of that day.

How to Let a Vercel App Read a Private ClickHouse on EC2 (Using Cloudflare Tunnel)

Pratham Chauhan — Thu, 02 Jul 2026 10:00:15 +0000

A start to finish walkthrough, with every piece of jargon explained in plain language.

Why you might need this

Sooner or later, most teams end up with two things that need to talk to each other but live in different worlds.

On one side is a private backend: a database, an internal API, an admin panel, or some other service running on a server you control (an EC2 box, a machine in your office, a virtual private cloud). You deliberately keep it locked away from the public internet, because it holds data that would be dangerous to expose. Locking it down is the right call for security.

On the other side is a program hosted somewhere else: an app on Vercel, a serverless function, a scheduled job, a third-party tool, or an AI agent. It needs to read from that private backend to do its work, but it runs outside your private network, so it has no way in.

This is a genuinely awkward gap, and it shows up constantly:

A dashboard or reporting app on Vercel needs to query a database that only lives inside your private network.
A scheduled job or an AI agent needs to pull fresh data every hour from a warehouse that has no public address.
A webhook or a partner integration needs to reach an internal service without you exposing that service to the whole internet.
You are running something locally or on a home server and want a stable, secure web address for it without buying a static IP or fighting with your router.

The tempting shortcuts all make things worse. Opening a port to the internet turns a
carefully hidden service into a target. Restricting by IP address fails when the
outside program has no fixed address (Vercel and most serverless platforms do not).
Building a custom API server in front of it just moves the exposed door and adds a
second thing to secure and maintain.

What you actually want is a way for the outside program to reach the private service
that:

never requires opening an incoming port,
does not depend on the outside program having a fixed address,
encrypts the traffic and puts real authentication in front of it, and
costs little or nothing to run.

That is exactly what a Cloudflare Tunnel gives you, and this guide walks through
setting one up end to end. The specific example is an AI agent on Vercel reading a
ClickHouse database on EC2, but the same pattern works for any private service and any
outside consumer. Once you understand the shape of it, you will reach for it again and
again.

The problem, in one sentence

We run a ClickHouse database on an EC2 server that is not reachable from the
public internet, and we need a program hosted on Vercel to read from it safely.

That sentence has four pieces of jargon. Here is what each one means:

ClickHouse: a database built for analytics. It stores huge tables of events (in our case, every tool call and session from our apps) and answers questions like "how many errors happened in the last hour" very fast.
EC2: a virtual server you rent from Amazon Web Services (AWS). Think of it as a computer in Amazon's data center that we control.
Not publicly accessible: the server's firewall blocks incoming connections from the internet. Nothing outside can knock on its door. This is good for security, but it means our outside program cannot reach the database directly.
Vercel: a hosting platform where our program (an AI agent) runs. Vercel runs code in the cloud, and its servers do not have a fixed address, which matters later.

So the puzzle is: the database is deliberately locked away inside a private
network, and the program that needs it lives somewhere else entirely.

Why the obvious ideas are the wrong ideas

"Just open the database port to the internet." You could change the firewall
to allow the world to connect to ClickHouse. But now your database is exposed to
every bot on the internet, protected only by a password. One leaked password or
one unpatched bug and your data is gone. We do not want the database reachable
from the open internet at all.

"Only allow Vercel's IP address through the firewall." A firewall can be told
"only accept connections from this specific address." The problem is Vercel does
not give your program a fixed address. Its servers pick from a large, changing
pool of addresses, so there is no single address to allow. This approach does not
work here.

"Build a small API server in front of the database." A common instinct is to
write a little web service (for example in Express, a popular Node.js web
framework) that sits in front of ClickHouse, checks a password, and forwards
queries. The catch: that server still has to accept incoming connections from the
internet, so you have just moved the exposed door from the database to the API,
and now you have a second program to maintain and secure. Unless that API tightly
restricts what queries are allowed, it adds work without adding safety.

We want something better: a way for the outside program to reach the database
without opening any incoming door at all.

The tool that solves it: Cloudflare Tunnel

Cloudflare is a company that sits in front of websites to make them faster and
safer. One of its free products is Cloudflare Tunnel.

Here is the key idea, and it is the part most people get backwards.

A normal setup (a "reverse proxy") works like this: the outside service dials
in to your server. That requires your server to have an open door (an open
port), which is exactly what we are trying to avoid.

A Tunnel works the opposite way. You run a small program on your server called
cloudflared. That program dials out to Cloudflare and holds the
connection open. When a request arrives for your service, Cloudflare pushes it
down that outgoing connection to cloudflared, which then hands it to
ClickHouse over the server's own internal network.

Why this is safe and clever:

No incoming door is ever opened. The connection is outbound only, started by your own server. Your firewall stays fully closed to the outside.
Your server's address is never published. You do not register an IP with anyone, and the DNS record for your service points at the tunnel rather than at your server, so there is nothing public to point at, and nothing to attack.
You never hand Cloudflare an SSH key or any access to your box. The little program you run reaches out to them, not the other way around.

A few more terms you will meet:

Port: a numbered channel on a server for a specific service. ClickHouse's HTTP interface listens on port 8123. "Opening a port" means allowing outside connections to that number.
localhost: a shortcut name a computer uses to mean "myself." If ClickHouse and cloudflared run on the same server, cloudflared reaches the database at localhost:8123, which is purely internal and never touches the internet.
DNS (Domain Name System): the internet's phone book. It turns a name like ch.test.com into an address a computer can connect to.
Nameservers: the specific servers that hold the phone book entries for your domain. Whoever runs your nameservers controls your domain's DNS.

What you need before you start

An EC2 server where ClickHouse is running, and the ability to log into it over SSH (the standard way to get a remote terminal on a server).
A domain name. Cloudflare Tunnel needs to attach your service to a web address, and to do that the domain must be managed by Cloudflare. Moving your main company domain is a big change, so the clean move is to buy a cheap, throwaway domain just for internal infrastructure (this guide uses test.com). It costs a few dollars a year and keeps your real domain untouched.
A free Cloudflare account.
Your program on Vercel that will do the reading.

A note on the domain, because it trips people up: Cloudflare's free plan requires
the whole domain to use Cloudflare's nameservers. You cannot hand over just one
subdomain on the free plan (that feature, called "partial setup," is limited to paid
Business and Enterprise plans). This is exactly why a separate throwaway domain is the easy
answer: you move the entire throwaway domain to Cloudflare and never touch your
production domain.

Phase 1: Put the domain on Cloudflare

What we are doing and why: we are making Cloudflare the manager of the throwaway
domain's phone book (its DNS), because the Tunnel can only attach a web address to a
domain that Cloudflare controls.

In Cloudflare, click Add a site, type your domain (test.com), and choose the Free plan.
Cloudflare scans the domain's existing records and shows them to you. For a fresh throwaway domain these are just parking entries from the registrar, and you do not need to add anything: the one address you will actually use (ch.test.com) is created automatically in Phase 4.
Click Continue to activation. Cloudflare gives you two nameserver addresses (something like dana.ns.cloudflare.com).
Go to your domain registrar (where you bought the domain, for example Namecheap), find the nameserver setting, switch it from the registrar's default to Custom, and paste in Cloudflare's two nameservers.
Wait for the domain to show Active in Cloudflare. This can take a few minutes and occasionally up to a day. Cloudflare emails you when it is ready.

Phase 2: Create the Tunnel and get its token

What we are doing and why: we are creating the Tunnel on Cloudflare's side and
getting a token. A token is a long secret string that acts as the Tunnel's
password. The little program on our server will use it to prove it is allowed to
connect. We are using the token method (rather than the browser login method)
because it is simpler on a server that has no web browser.

In the Cloudflare dashboard, open Zero Trust (this is Cloudflare's security product area; it also appears under the name "Cloudflare One").
Go to Networks, then Tunnels, then Create a tunnel.
Choose the connector type Cloudflared and give the tunnel a name (this guide uses ch, short for ClickHouse).
On the install screen, choose your server's operating system. Ubuntu, a very common Linux for EC2, is built on Debian, so choose Debian. Then choose the architecture (the chip type). Run uname -m on your server to check: x86_64 means choose 64-bit, and aarch64 means choose arm64 (used by Amazon's Graviton servers).
Cloudflare now shows a set of commands with your token baked in. Keep this tab open.

Phase 3: Install the connector on your server

What we are doing and why: we are installing cloudflared (the little program that
holds the outbound connection) and setting it up as a service, which means the
operating system keeps it running in the background and restarts it automatically if
the server reboots.

Cloudflare shows three command boxes on the install screen. You run the first two and
skip the third.

Log into your EC2 server over SSH.
Copy the first box ("Install cloudflared") using its copy icon, paste it into the terminal, and run it. This adds Cloudflare's software source and installs the program.
Copy the second box ("Install as service"), paste, and run it. It looks like sudo cloudflared service install eyJ... where the long eyJ... string is your token. This registers the connection and starts it running in the background.
Skip the third box ("Or, run tunnel"). The word "Or" is the giveaway: it is an alternative that runs in the foreground and stops the moment you close the terminal. We want the background service, not this.
Check it is running:
```
sudo systemctl status cloudflared
```
You want to see active (running).

Back in the Cloudflare dashboard, the tunnel's status should turn to Healthy
within a few seconds. Healthy means the outbound connection from your server to
Cloudflare is up. It will still say "No routes" because we have not told it what to
serve yet, which is the next step.

Phase 4: Point a web address at ClickHouse

What we are doing and why: the tunnel is connected, but it does not yet know what to
do with requests. We now create a route (older versions of the dashboard call this a
"public hostname") that says "when a request comes in for ch.test.com, forward it to
ClickHouse at localhost:8123."

Open your tunnel and go to the Routes tab, then click Add route.
You are shown four route types. Choose Published application. It means "publish a local service to the internet at a web address." The other three (Private hostname, Private CIDR, Workers VPC) require special client software and are not what we need.
Fill in the form:
- Subdomain: ch
- Domain: test.com
- Path: leave empty (empty means "match every request")
- Service Type: HTTP
- Service URL: http://localhost:8123 (this is ClickHouse on the same server; if ClickHouse runs on a different machine inside your private network, use that machine's private address instead of localhost)
Click Add route. Cloudflare automatically creates the DNS record for ch.test.com, so you do not have to add it by hand.

Test it from your own laptop:

```
curl "https://ch.test.com/?query=SELECT%201"
```

A couple of terms here:

curl: a command line tool for making web requests. It is the quickest way to check that an address responds.
The quotes around the address matter. The zsh shell (the default terminal shell on modern Macs) treats a bare ? as a wildcard that matches any single character in a file name, and without quotes the command fails with "no matches found". Quoting the address turns off that behavior.

If you get back 1, the full path works: your laptop reached Cloudflare, Cloudflare
pushed the request down the tunnel to your server, and cloudflared handed it to
ClickHouse, which answered SELECT 1 with 1.
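If you want this check as an automated health probe (say, from a CI job or an uptime monitor), the logic is small. Here is a minimal TypeScript sketch, assuming the guide's placeholder hostname ch.test.com; the function names are hypothetical, and sending the request is left to whatever HTTP client you use.

```typescript
// Build the same URL as the curl smoke test above.
// Assumption: "ch.test.com" is this guide's placeholder hostname.
function smokeTestUrl(host: string, sql: string = "SELECT 1"): string {
  // encodeURIComponent turns the space into %20 (matching the curl command)
  // and escapes characters such as ? and & that would break the query string.
  return `https://${host}/?query=${encodeURIComponent(sql)}`;
}

// ClickHouse's HTTP interface answers in TabSeparated format by default,
// so a healthy path returns status 200 with the body "1" plus a newline.
function isHealthySmokeResponse(status: number, body: string): boolean {
  return status === 200 && body.trim() === "1";
}

console.log(smokeTestUrl("ch.test.com"));
```

Once the Access lock from Phase 6 is in place, the same probe also has to send the service token headers, or it will be blocked before it reaches ClickHouse.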

Phase 5: Give the app a limited, read only database account

What we are doing and why: right now anything hitting the tunnel could run any query
using the database's main account. We create a separate account with the least power
possible: it can only read the two tables our program needs, and it cannot write,
change, or delete anything. This is the principle of "least privilege," and it means a
mistake or a leaked credential can do far less damage.

Key terms:

readonly user: a database account restricted to read only queries. It cannot insert, update, delete, or change settings.
GRANT: the SQL command that gives an account permission to do a specific thing.
Least privilege: give an account exactly the access it needs and nothing more.

On the server, open the ClickHouse client. Creating accounts requires an
administrator account (usually called default), not an ordinary app account. If
you try as a limited user you will see an "ACCESS_DENIED" error about needing the
"CREATE USER" grant. Connect as the admin:
```
clickhouse-client -u default
```
(Add --password 'yourpassword' if the admin account has one.)

First generate a strong random password and save it somewhere safe:
```
openssl rand -base64 24
```
Create the read only account and grant it read access to just the two tables. Note
that our tables live in a database named new_database, so the grants name
new_database.your_table_name and new_database.your_other_table_name. If your tables live
somewhere else, change the database name to match.
```
CREATE USER agent_ro IDENTIFIED BY 'PASTE_THE_GENERATED_PASSWORD' SETTINGS readonly = 1;
GRANT SELECT ON new_database.your_table_name TO agent_ro;
GRANT SELECT ON new_database.your_other_table_name TO agent_ro;
CREATE QUOTA agent_ro_q FOR INTERVAL 1 hour MAX queries = 2000, result_rows = 50000000 TO agent_ro;
```
The last line is a quota: a safety limit so a runaway program cannot hammer the
database with unlimited queries.
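One way to keep a mismatched database name out of the GRANT lines is to generate the statements from a single value. This TypeScript sketch is hypothetical tooling, not part of ClickHouse: it assumes plain identifiers (letters, digits, underscores) and a password produced with the openssl command above, and it only prints SQL for you to paste into clickhouse-client; it does not run anything.

```typescript
// Hypothetical helper: generate the Phase 5 statements from ONE database name,
// so every GRANT line is guaranteed to use the same spelling.
interface ReadOnlyUserSpec {
  user: string; // e.g. "agent_ro"
  password: string; // e.g. the output of `openssl rand -base64 24`
  database: string; // e.g. "new_database"
  tables: string[]; // the only tables the account may read
  maxQueriesPerHour: number;
  maxResultRowsPerHour: number;
}

// Quote a value as a single-quoted ClickHouse string literal.
function sqlString(value: string): string {
  return `'${value.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// Accept only plain identifiers so nothing unexpected is spliced into the SQL.
function assertIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unexpected identifier: ${name}`);
  }
  return name;
}

function buildReadOnlyUserSql(spec: ReadOnlyUserSpec): string[] {
  const user = assertIdentifier(spec.user);
  const db = assertIdentifier(spec.database);
  return [
    `CREATE USER ${user} IDENTIFIED BY ${sqlString(spec.password)} SETTINGS readonly = 1;`,
    ...spec.tables.map((t) => `GRANT SELECT ON ${db}.${assertIdentifier(t)} TO ${user};`),
    `CREATE QUOTA ${user}_q FOR INTERVAL 1 hour MAX queries = ${spec.maxQueriesPerHour}, ` +
      `result_rows = ${spec.maxResultRowsPerHour} TO ${user};`,
  ];
}

const statements = buildReadOnlyUserSql({
  user: "agent_ro",
  password: "PASTE_THE_GENERATED_PASSWORD",
  database: "new_database",
  tables: ["your_table_name", "your_other_table_name"],
  maxQueriesPerHour: 2000,
  maxResultRowsPerHour: 50_000_000,
});
console.log(statements.join("\n"));
```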

Test the new account through the tunnel. The address includes ?database=new_database
so ClickHouse knows which database to look in (our program sends this automatically):

```
curl -H "X-ClickHouse-User: agent_ro" -H "X-ClickHouse-Key: PASTE_THE_GENERATED_PASSWORD" \
  "https://ch.test.com/?database=new_database" \
  --data-binary "SELECT count() FROM your_table_name FINAL FORMAT JSON"
```

You should get back a small block of JSON with a count.
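For reference, here is the same request expressed in TypeScript, roughly as a program would build it. It is a sketch only: the interface and function names are hypothetical, and the values are this guide's placeholders. It builds the request without sending it.

```typescript
// Sketch of the request the curl command above sends. Values such as
// "ch.test.com", "agent_ro", and "new_database" are this guide's placeholders.
interface ClickHouseLogin {
  host: string;
  database: string;
  user: string;
  password: string;
}

interface QueryRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildQueryRequest(login: ClickHouseLogin, sql: string): QueryRequest {
  return {
    // ?database= tells ClickHouse which database unqualified table names live in.
    url: `https://${login.host}/?database=${encodeURIComponent(login.database)}`,
    // curl --data-binary sends a POST with the SQL text as the request body.
    method: "POST",
    headers: {
      "X-ClickHouse-User": login.user,
      "X-ClickHouse-Key": login.password,
    },
    body: sql,
  };
}

const req = buildQueryRequest(
  { host: "ch.test.com", database: "new_database", user: "agent_ro", password: "PASTE_THE_GENERATED_PASSWORD" },
  "SELECT count() FROM your_table_name FINAL FORMAT JSON",
);
console.log(req.method, req.url);
```

On Node 18 or later, which ships a global fetch, sending it would look like fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).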

Confirm the read only lock actually works. This next command tries to create a table
and should be rejected, which is exactly what we want:

```
curl -H "X-ClickHouse-User: agent_ro" -H "X-ClickHouse-Key: PASTE_THE_GENERATED_PASSWORD" \
  "https://ch.test.com/?database=new_database" \
  --data-binary "CREATE TABLE x (a Int) ENGINE=Memory"
```

Two small things that commonly go wrong here:

Getting the database name wrong, or forgetting ?database=new_database, produces an "UNKNOWN_TABLE" error because ClickHouse looks in the wrong place.
A typo like ?databse=new_database produces an "UNKNOWN_SETTING" error because ClickHouse reads the misspelled word as a setting name.
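When a program rather than a person hits these errors, it helps to pull the symbolic error name out of the response body and map it to the likely causes listed above. This is a hypothetical TypeScript helper; it assumes the name appears in parentheses near the end of ClickHouse's error text, and the sample message below is illustrative, not captured from a real server.

```typescript
// Hypothetical helper: extract the symbolic error name ClickHouse puts in
// parentheses in its error message, and map it to a likely cause.
const LIKELY_CAUSES: Record<string, string> = {
  UNKNOWN_TABLE: "Check the ?database= parameter and the table name.",
  UNKNOWN_SETTING: "A URL parameter is misspelled (e.g. ?databse=) and was read as a setting.",
  ACCESS_DENIED: "The account lacks a grant (creating users needs the admin account).",
  READONLY: "Expected for agent_ro: the account is not allowed to write.",
};

function clickHouseErrorName(body: string): string | null {
  // The name is an ALL_CAPS word in parentheses; take the last one, since the
  // message text itself may contain other parentheses.
  const pattern = /\(([A-Z][A-Z_]+)\)/g;
  let last: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    last = match[1];
  }
  return last;
}

function explainClickHouseError(body: string): string {
  const name = clickHouseErrorName(body);
  if (name === null) return "No ClickHouse error name found.";
  return `${name}: ${LIKELY_CAUSES[name] ?? "Look it up in the ClickHouse error code list."}`;
}

// Illustrative message shape only, not captured from a real server.
const sample =
  "Code: 60. DB::Exception: Unknown table expression identifier 'events'. (UNKNOWN_TABLE) (version 24.3.1.1 (official build))";
console.log(explainClickHouseError(sample));
```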

Phase 6: Add a second lock with a Cloudflare Access service token

What we are doing and why: the database password is one lock. We add a second,
independent lock in front of the web address itself, so that even reaching ClickHouse's
front door requires a separate secret. This is defense in depth: two locks, so one
failing does not expose everything.

Key terms:

Cloudflare Access: a Cloudflare feature that stands in front of a web address and refuses anyone who cannot prove they are allowed.
Service token: a machine to machine credential, made of a Client ID and a Client Secret (a public name and a private password, roughly). Automated programs send these as two request headers to get through Access. A "header" is a small labeled piece of extra information attached to a web request.

In Zero Trust, go to Access controls, then Service credentials, then Service Tokens, and click Create Service Token. Name it (for example your-agent) and generate it. Copy the Client ID and Client Secret now, because the secret is shown only once.
Go to Access controls, then Applications, then Add an application, and choose Self-hosted. Set the application's hostname to ch.test.com.
Add a policy with the action Service Auth (this specifically means "let approved machines in without a human login screen"). Under Include, choose Service Token and select the token you just created. Save the policy and the application.
Verify the lock. Without the token, the request should now be blocked, so this command (which prints only the HTTP status code) should show something other than 200:
```
curl -so /dev/null -w "%{http_code}\n" "https://ch.test.com/?query=SELECT%201"
```
With both the token headers and the database credentials, it should succeed:
```
curl -H "CF-Access-Client-Id: YOUR_CLIENT_ID" \
     -H "CF-Access-Client-Secret: YOUR_CLIENT_SECRET" \
     -H "X-ClickHouse-User: agent_ro" -H "X-ClickHouse-Key: YOUR_DB_PASSWORD" \
     "https://ch.test.com/?database=new_database" \
     --data-binary "SELECT 1 FORMAT JSON"
```

If the first is blocked and the second returns 1, you now have two independent locks
in place, and zero open incoming ports on your server.
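In the program itself, the two locks boil down to four request headers. This TypeScript sketch uses the header names from the curl command above; the function and interface names are hypothetical, and refusing empty values is a design choice, so a misconfiguration fails loudly before any request is sent.

```typescript
// Sketch: the two independent locks expressed as request headers.
// Header names come from the curl command above; the values are placeholders.
interface AccessServiceToken {
  clientId: string;
  clientSecret: string;
}

interface DatabaseLogin {
  user: string;
  password: string;
}

function buildLockedHeaders(token: AccessServiceToken, login: DatabaseLogin): Record<string, string> {
  const values = [token.clientId, token.clientSecret, login.user, login.password];
  // An empty value would only surface later as a block page or an auth error,
  // so refuse to build the headers at all.
  if (values.some((v) => v.trim() === "")) {
    throw new Error("Missing Access token or database credential");
  }
  return {
    // Lock 1: checked by Cloudflare Access at the edge, before the tunnel.
    "CF-Access-Client-Id": token.clientId,
    "CF-Access-Client-Secret": token.clientSecret,
    // Lock 2: checked by ClickHouse itself, behind the tunnel.
    "X-ClickHouse-User": login.user,
    "X-ClickHouse-Key": login.password,
  };
}

const headers = buildLockedHeaders(
  { clientId: "YOUR_CLIENT_ID", clientSecret: "YOUR_CLIENT_SECRET" },
  { user: "agent_ro", password: "YOUR_DB_PASSWORD" },
);
console.log(Object.keys(headers).join(", "));
```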

Phase 7: Point the Vercel program at the database

What we are doing and why: the program on Vercel needs to know the address, the
database account, and the Access token. These are supplied as environment variables,
which are named settings you give a program without writing them into its code (so
secrets never live in the source).

Set these on the Vercel project (in Settings, under Environment Variables):

CLICKHOUSE_HOST=ch.test.com
CLICKHOUSE_PORT=443
CLICKHOUSE_SECURE=true
CLICKHOUSE_DATABASE=new_database
CLICKHOUSE_USERNAME=agent_ro
CLICKHOUSE_PASSWORD=your_database_password
CF_ACCESS_CLIENT_ID=your_client_id
CF_ACCESS_CLIENT_SECRET=your_client_secret

A note on the values:

Port 443 with CLICKHOUSE_SECURE=true: 443 is the standard port for secure web traffic (HTTPS). Cloudflare serves your tunnel address over HTTPS, so the program connects on 443 with encryption on, even though ClickHouse itself is plain HTTP on 8123 behind the tunnel.
The two CF_ACCESS_* values are what get the program past the Access lock from Phase 6.

Redeploy the Vercel app so the new settings take effect, then run whatever triggers the
program to read ClickHouse. If it returns data instead of a connection error, the whole
chain is working.
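A connection error is much easier to diagnose if the program checks its own settings first. Here is a minimal TypeScript sketch that reads the eight variables listed above once and fails fast if any is missing; the function name is hypothetical, and it takes the environment as a plain object so it can be tested without a real deployment.

```typescript
// Sketch: read the Phase 7 variables once and fail fast if any is missing.
// Variable names match the list above; the function name is hypothetical.
const REQUIRED_VARS = [
  "CLICKHOUSE_HOST",
  "CLICKHOUSE_PORT",
  "CLICKHOUSE_SECURE",
  "CLICKHOUSE_DATABASE",
  "CLICKHOUSE_USERNAME",
  "CLICKHOUSE_PASSWORD",
  "CF_ACCESS_CLIENT_ID",
  "CF_ACCESS_CLIENT_SECRET",
];

interface TunnelConfig {
  url: string; // e.g. https://ch.test.com:443
  database: string;
  username: string;
  password: string;
  accessClientId: string;
  accessClientSecret: string;
}

function readTunnelConfig(env: Record<string, string | undefined>): TunnelConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const get = (name: string): string => env[name] as string;
  // Cloudflare serves the tunnel over HTTPS, so CLICKHOUSE_SECURE=true selects https.
  const scheme = get("CLICKHOUSE_SECURE") === "true" ? "https" : "http";
  return {
    url: `${scheme}://${get("CLICKHOUSE_HOST")}:${get("CLICKHOUSE_PORT")}`,
    database: get("CLICKHOUSE_DATABASE"),
    username: get("CLICKHOUSE_USERNAME"),
    password: get("CLICKHOUSE_PASSWORD"),
    accessClientId: get("CF_ACCESS_CLIENT_ID"),
    accessClientSecret: get("CF_ACCESS_CLIENT_SECRET"),
  };
}

const config = readTunnelConfig({
  CLICKHOUSE_HOST: "ch.test.com",
  CLICKHOUSE_PORT: "443",
  CLICKHOUSE_SECURE: "true",
  CLICKHOUSE_DATABASE: "new_database",
  CLICKHOUSE_USERNAME: "agent_ro",
  CLICKHOUSE_PASSWORD: "your_database_password",
  CF_ACCESS_CLIENT_ID: "your_client_id",
  CF_ACCESS_CLIENT_SECRET: "your_client_secret",
});
console.log(config.url);
```

On Vercel you would call readTunnelConfig(process.env) once when the function starts, so a forgotten variable shows up as a clear message instead of a vague connection failure.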

A quick security checklist before you call it done

Rotate any secret that has touched a chat window, a screenshot, or a shared note.
If you pasted the database password anywhere while setting up, change it:
```
ALTER USER agent_ro IDENTIFIED BY 'a_new_password';
```
Then update the Vercel environment variable to match.
Keep the account read only. Do not grant it more than the tables it needs.
Leave the server's incoming firewall closed. The whole point of the tunnel is that
you never need to open a port. If you did open one earlier to test, close it again.

The one gotcha to know about

If the tunnel refuses to turn "Healthy," the usual cause is that your server's
outbound internet access is blocked. cloudflared prefers a fast protocol called
QUIC, which runs over UDP port 7844. If your network blocks outbound UDP, force it to
use HTTP/2 over TCP instead by setting the connector protocol to http2 (for example,
with cloudflared's --protocol http2 option). Outbound TCP port 7844 must still be
allowed. You still never open any incoming port; this only changes how the outgoing
connection is made.

The whole thing in one picture

```
Your laptop / the Vercel app
        |
        v  (request to https://ch.test.com, carrying the Access token + DB login)
   Cloudflare edge
        |
        |  Cloudflare Access checks the service token (lock #1)
        |
        v  pushes the request DOWN the connection your server opened
   cloudflared  (running on your EC2 server, connection was outbound only)
        |
        v  http://localhost:8123, staying inside the private network
   ClickHouse  (checks the read only account + password, lock #2)
        |
        v
     answer travels back the same way
```

No incoming port was ever opened. The database was never exposed to the internet. Two
independent locks guard the path, and the program on Vercel reads exactly the two tables
it is allowed to, and nothing else.

Setting Up Gemini on Vertex AI for Production: A No-Nonsense Walkthrough

Pratham Chauhan — Tue, 30 Jun 2026 09:51:39 +0000

If you want to call Gemini through Google Cloud's Vertex AI from a real production environment, not just a local script, there's a specific order of operations that saves you from a lot of pain later. This is that walkthrough: project setup, a properly scoped service account, local testing, and finally getting it running securely on an EC2 instance, including the keyless Workload Identity Federation (WIF) path.

You might be wondering why bother with all this instead of just grabbing an API key from AI Studio and dropping it into an environment variable. Honestly, for a weekend project, do that. But a raw API key is a single static string that grants full access to whatever it's scoped to and stays valid until you remember to revoke it. If it ends up in a public repo, a client side bundle, or a log file, anyone holding it can run up your bill or worse. The setup in this post trades a bit of upfront complexity for something much safer: identity based access that can be scoped down to exactly one permission, rotated without touching your app code, and, in the WIF case, never exists as a file you could accidentally leak. It's the difference between handing someone a house key and giving them a temporary badge that only opens one door and expires on its own.

Before diving in, a couple of terms that will come up a lot. An IAM role is just a labeled bundle of permissions, like a job title that comes with a fixed list of things you're allowed to do. A service account is not a human user. Think of it as a robot identity that your app logs in as, instead of a person typing a password. We'll use both throughout.

Part 2 covers everything that goes wrong if you skip a step or fat finger a detail here, and trust me, there's a lot that can go wrong. But first, let's do it right.

Step 1: Pick your project and enable the right API

Open the Google Cloud Console and confirm you're in the correct project (not just "a" project, since billing and permissions are scoped per project, and it's easy to set things up in the wrong one if you have several).

PROJECT_ID="your-gcp-project-id"
LOCATION="us-central1"
gcloud config set project "$PROJECT_ID"

The API you want is aiplatform.googleapis.com, labeled "Vertex AI API" in the console. An API here just means a specific Google Cloud service you have to switch on before you can use it. There's a deceptively similar sounding one called Vertex AI Search for commerce (retail.googleapis.com) that has nothing to do with calling Gemini models. Enable the right one:

gcloud services enable aiplatform.googleapis.com
gcloud services list --enabled --filter="aiplatform.googleapis.com"

Vertex AI locations aren't inherited from your project. You choose a region explicitly, meaning the physical data center area where your requests get processed. us-central1 is a safe, well supported default. Whatever you pick, use the same value everywhere: local testing, EC2 env vars, your app code. Mismatched regions are a surprisingly common source of "it works locally but not in prod."

Step 2: Set a budget before you test anything

Before your first API call, set up a budget alert. Go to Billing, then Budgets & alerts, then Create budget. Scope it to this specific project, and set a monthly amount you're comfortable with. A hundred dollars is a reasonable starting point. Add alert thresholds at every 10% so you get early warning, not just a surprise at the end of the month.

One important caveat: budgets are alerts, not hard stops. Google Cloud won't automatically cut you off at your limit. If you need an actual ceiling, build it into your own backend. Check estimated monthly spend before each Gemini call and reject the request if you're over:

if (monthlySpendUsd >= 100) {
  throw new Error("Monthly Gemini budget reached");
}

This is more reliable than a billing disable webhook, because it stops the request, not the project, which matters if this project hosts anything else.
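The one-line check above needs something to produce monthlySpendUsd. Here is a minimal TypeScript sketch of an in-memory guard, under loudly stated assumptions: you estimate each call's cost from its token counts, the per-million-token prices are figures you supply (not official Gemini pricing), and the state lives in one process, so it resets on restart and is not shared across instances. The class name is hypothetical.

```typescript
// Sketch of an in-app monthly spend guard. Assumptions, all hypothetical:
// - each call's cost is estimated from its token counts;
// - the per-million-token prices you pass in are your own figures;
// - state is in memory only (persist it somewhere shared in production).
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

class MonthlySpendGuard {
  private spentUsd = 0;
  private month: string;
  private readonly limitUsd: number;
  private readonly inputUsdPerMillion: number;
  private readonly outputUsdPerMillion: number;

  constructor(limitUsd: number, inputUsdPerMillion: number, outputUsdPerMillion: number, now: Date = new Date()) {
    this.limitUsd = limitUsd;
    this.inputUsdPerMillion = inputUsdPerMillion;
    this.outputUsdPerMillion = outputUsdPerMillion;
    this.month = MonthlySpendGuard.monthKey(now);
  }

  // Calendar month in UTC, e.g. "2026-6" for June 2026.
  private static monthKey(d: Date): string {
    return `${d.getUTCFullYear()}-${d.getUTCMonth() + 1}`;
  }

  // Start counting from zero when a new month begins.
  private rollOver(now: Date): void {
    const key = MonthlySpendGuard.monthKey(now);
    if (key !== this.month) {
      this.month = key;
      this.spentUsd = 0;
    }
  }

  // Call before each Gemini request.
  assertWithinBudget(now: Date = new Date()): void {
    this.rollOver(now);
    if (this.spentUsd >= this.limitUsd) {
      throw new Error("Monthly Gemini budget reached");
    }
  }

  // Call after each request with the token counts the response reports.
  record(usage: TokenUsage, now: Date = new Date()): void {
    this.rollOver(now);
    this.spentUsd +=
      (usage.inputTokens / 1_000_000) * this.inputUsdPerMillion +
      (usage.outputTokens / 1_000_000) * this.outputUsdPerMillion;
  }

  get totalUsd(): number {
    return this.spentUsd;
  }
}
```

With the @google/genai SDK, the token counts would typically come from response.usageMetadata (promptTokenCount and candidatesTokenCount); check those field names against the SDK version you use.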

Step 3: Create a least privilege service account

"Least privilege" just means giving something only the exact permissions it needs to do its job, nothing extra "just in case." Resist the urge to reuse an existing service account or grant Editor or Owner, which are broad, all access roles. Create a fresh service account dedicated to this one purpose:

IAM & Admin → Service Accounts → Create service account
Name: gemini-prod-runner

Then grant it exactly one role, roles/aiplatform.user (shown as "Vertex AI User" in the console). This single role covers everything needed to call Gemini models. It doesn't need Owner, Editor, or any billing or admin permissions, so even if this identity were somehow compromised, the blast radius is small: someone could call Gemini on your dime, but they couldn't touch your other cloud resources or billing settings.

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:gemini-prod-runner@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Step 4: Test locally before touching any servers

For local development, Application Default Credentials, usually shortened to ADC, are the path of least resistance. ADC is just Google's term for "let the command line tool log you in once, then every script on this machine can quietly reuse that login" instead of you managing key files by hand.

gcloud auth application-default login
gcloud auth application-default set-quota-project "$PROJECT_ID"

That second command matters more than it looks. It tells Google which project to bill your test requests against. Skip it and you'll get vague "quota exceeded" or "API not enabled" errors that have nothing to do with quotas or the API.

Set your environment:

export GOOGLE_CLOUD_PROJECT="$PROJECT_ID"
export GOOGLE_CLOUD_LOCATION="us-central1"
export GOOGLE_GENAI_USE_VERTEXAI="true"

And a minimal test script:

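// Run as an ES module (for example test.mjs, or "type": "module" in package.json):
// the top-level await below is not allowed in CommonJS.
import { GoogleGenAI } from "@google/genai";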

const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  location: process.env.GOOGLE_CLOUD_LOCATION || "us-central1",
});

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Say hello in one sentence.",
});

console.log(response.text);

If you'd rather test with the exact kind of credentials you'll use in production, generate a service account key, which is just a downloadable JSON file containing a long lived password for that robot identity. This is fine for local testing only (more on why not to ship this to prod below):

gcloud iam service-accounts keys create ~/secrets/gemini-prod-runner.json \
  --iam-account=gemini-prod-runner@${PROJECT_ID}.iam.gserviceaccount.com
chmod 600 ~/secrets/gemini-prod-runner.json

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/secrets/gemini-prod-runner.json"

Add secrets/, .env, and anything matching *credentials* or *service-account* to .gitignore immediately. Don't wait until after the first commit.

Step 5: Deploying to EC2, two paths

You have two reasonable options here, and which one you pick is a real tradeoff, not just a "more secure equals always better" decision.

Option A is putting that service account JSON key file on the server's disk. Copy the key file to the instance, point GOOGLE_APPLICATION_CREDENTIALS at it, and you're done. Fast to set up, easy to debug, and perfectly fine for getting a first deployment working. The tradeoff is that it's a long lived static secret sitting on a real machine. If it leaks, it's valid until you manually revoke it.

scp ~/secrets/gemini-prod-runner.json ubuntu@<EC2_IP>:/home/ubuntu/gemini-prod-runner.json
ssh ubuntu@<EC2_IP> chmod 600 /home/ubuntu/gemini-prod-runner.json

Option B is Workload Identity Federation, or WIF. In plain terms, WIF lets two clouds vouch for each other without ever sharing a password. AWS already knows, with certainty, which EC2 instance is making a request. WIF lets Google accept AWS's signed proof of that identity instead of asking for a Google specific secret. No Google key ever touches the instance. Instead, AWS proves the instance's identity, and Google exchanges that proof, on demand, for a short lived token that expires on its own. More setup work, but nothing sitting on disk to rotate or leak.

If this is your first time deploying this app, I'd genuinely recommend starting with Option A, confirming everything else works end to end, and then swapping to WIF as an isolated second step. Debugging two unfamiliar systems at once, your app's Vertex AI integration and a federated trust chain between two clouds, is much harder than debugging them one at a time.

Step 6: Setting up WIF properly

This is the part with the most moving pieces, so go slowly. Here are the building blocks in plain language before the steps.

An IAM role on the AWS side (yes, AWS has its own separate concept also called a role, easy to confuse with Google's IAM role) is what gives your EC2 instance a verifiable identity card. It isn't about granting AWS permissions here; it's purely a way to say "this machine is who it says it is."

A Workload Identity Pool on the Google side is basically a waiting room where Google agrees to listen to identities from somewhere outside Google, like AWS. A provider inside that pool is the specific configuration that says "and here's exactly how to verify an AWS identity, and here's which AWS account I trust." A principal is Google's general word for "the identity asking for access," whether that's a person, a service account, or in this case, an AWS role being recognized through the pool.

On the AWS side, create that IAM role for EC2. It doesn't need any special AWS permissions, since it exists purely to give the instance a verifiable identity, not to grant AWS side access to anything:

IAM → Roles → Create role → Trusted entity: AWS service → Use case: EC2
Name: ec2-gemini-runner

Attach it to your running instance:

EC2 → Instances → select instance → Actions → Security → Modify IAM role

On the Google side, create the Workload Identity Pool and an AWS provider inside it:

IAM & Admin → Workload Identity Federation → Create pool
Pool ID: aws-ec2-pool
Provider type: AWS
Provider ID: aws-ec2-provider
AWS account ID: <your AWS account ID>

Then grant that pool's identity permission to act as your service account, or grant the role directly to the AWS identity as its own principal. Google supports both patterns, and which one your setup uses depends on how you configure the binding (a binding is just the rule that says "this identity gets this permission"). The console's "Connected service accounts" flow under your pool will show you which path you're on. Either way, the end result needs to be that this specific AWS role can obtain a Google access token scoped to call Vertex AI.

Download the generated credential configuration file, place it on the instance, and point GOOGLE_APPLICATION_CREDENTIALS at it:

export GOOGLE_APPLICATION_CREDENTIALS="/home/ubuntu/gcp-wif-credential.json"

This file is not a private key. It's a small JSON document of instructions telling Google's auth library how to go ask AWS who it's talking to. Worth opening it once and reading it, since knowing its shape will save you real time if something goes wrong later (Part 2 has a lot more on this).
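Reading the file is quicker with a small summary. This TypeScript sketch is a hypothetical debugging aid, not an official validator: it relies on the documented "external_account" fields (type, audience, service_account_impersonation_url, credential_source.environment_id) to tell you whether the file uses AWS as its source and whether it impersonates a service account or uses direct access, the two binding patterns described above. The example contents are placeholders.

```typescript
// Sketch: summarize a credential configuration file by the fields it contains.
// Field names follow Google's "external_account" format; the summary logic
// itself is a hypothetical debugging aid.
interface CredentialFile {
  type?: string;
  audience?: string;
  service_account_impersonation_url?: string;
  credential_source?: { environment_id?: string };
}

function describeCredentialFile(json: string): string[] {
  const cfg = JSON.parse(json) as CredentialFile;
  if (cfg.type !== "external_account") {
    // A downloaded service account key has type "service_account" instead.
    return [`type is "${cfg.type}": not a Workload Identity Federation config`];
  }
  const notes: string[] = [];
  notes.push(
    cfg.credential_source?.environment_id === "aws1"
      ? "Source: AWS (identity comes from the instance's IAM role)"
      : "Source: not AWS",
  );
  notes.push(
    cfg.service_account_impersonation_url
      ? "Pattern: service account impersonation"
      : "Pattern: direct access (roles must be granted to the federated principal)",
  );
  if (cfg.audience) notes.push(`Audience: ${cfg.audience}`);
  return notes;
}

// Illustrative file contents with placeholder values.
const example = JSON.stringify({
  type: "external_account",
  audience:
    "//iam.googleapis.com/projects/123456/locations/global/workloadIdentityPools/aws-ec2-pool/providers/aws-ec2-provider",
  service_account_impersonation_url:
    "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/gemini-prod-runner@your-gcp-project-id.iam.gserviceaccount.com:generateAccessToken",
  credential_source: { environment_id: "aws1" },
});
console.log(describeCredentialFile(example).join("\n"));
```

On the instance you would pass it the real file, for example with readFileSync("/home/ubuntu/gcp-wif-credential.json", "utf8") from node:fs.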

Step 7: Wire it into your process manager

Whatever you use to run the app in production (PM2, systemd, Docker), make sure these environment variables actually reach the running process, not just your shell:

GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_APPLICATION_CREDENTIALS=/home/ubuntu/gcp-wif-credential.json

A systemd unit, for reference:

[Unit]
Description=My Gemini Vertex AI App
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/my-app
Environment="GOOGLE_CLOUD_PROJECT=your-project-id"
Environment="GOOGLE_CLOUD_LOCATION=us-central1"
Environment="GOOGLE_GENAI_USE_VERTEXAI=true"
Environment="GOOGLE_APPLICATION_CREDENTIALS=/home/ubuntu/gcp-wif-credential.json"
ExecStart=/usr/bin/npm start
Restart=always

[Install]
WantedBy=multi-user.target
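A cheap way to confirm the variables really arrived is to check them inside the process at startup, before creating the GoogleGenAI client. This TypeScript sketch is hypothetical: it uses the variable names listed above, treats "true" as the expected GOOGLE_GENAI_USE_VERTEXAI value (as this guide sets it), and compares the location against the single region you chose in Step 1.

```typescript
// Sketch: run at startup to confirm the process manager actually passed the
// Vertex AI variables through. The function name is hypothetical.
const VERTEX_ENV_VARS = [
  "GOOGLE_CLOUD_PROJECT",
  "GOOGLE_CLOUD_LOCATION",
  "GOOGLE_GENAI_USE_VERTEXAI",
  "GOOGLE_APPLICATION_CREDENTIALS",
];

function checkVertexEnv(env: Record<string, string | undefined>, expectedLocation: string): string[] {
  const problems: string[] = [];
  for (const name of VERTEX_ENV_VARS) {
    if (!env[name]) problems.push(`${name} is not set in this process`);
  }
  const useVertex = env["GOOGLE_GENAI_USE_VERTEXAI"];
  if (useVertex && useVertex !== "true") {
    problems.push(`GOOGLE_GENAI_USE_VERTEXAI is "${useVertex}", expected "true"`);
  }
  // Same region everywhere: a mismatch is a classic "works locally, not in prod".
  const location = env["GOOGLE_CLOUD_LOCATION"];
  if (location && location !== expectedLocation) {
    problems.push(`GOOGLE_CLOUD_LOCATION is "${location}", expected "${expectedLocation}"`);
  }
  return problems;
}
```

In the app you might call checkVertexEnv(process.env, "us-central1") and exit with the problems printed if the list is non-empty, so a misconfigured unit file fails at boot instead of on the first Gemini call.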

Pre-production checklist

A quick list worth running through before calling this done. The correct API, aiplatform.googleapis.com and not retail.googleapis.com, is enabled. A budget with staged alerts exists. The service account has roles/aiplatform.user only, nothing broader. Any key files are outside the repo, chmod 600, and .gitignore'd. On EC2, you've picked one auth path deliberately rather than half configuring both. And your application has its own spend guard rather than relying solely on Google's billing alerts.

That's the setup. If everything above goes exactly to plan, you're done. If it doesn't, and there are a surprising number of ways it doesn't, Part 2 walks through every failure mode I personally hit, in the order I hit them, and how to actually diagnose each one instead of guessing.