Sapnesh Naik for Nango

Posted on May 7 • Originally published at nango.dev

How to sync large amounts of contacts from the HubSpot API

#hubspot #api #claude #agents

If you are building API integrations with HubSpot in your product, you will eventually need to sync more than 10,000 contacts from a customer's portal. That is not easy when you want a durable, resumable sync that fetches only new and changed records, instead of refetching everything on every run. You also need to manage auth tokens for hundreds of users, record storage, webhook updates, and more.

This guide walks through how to build a production-ready HubSpot contacts sync using Claude Code, Cursor, Codex, or any other AI coding agent with the Nango AI Function Builder skill.

The full working demo, including the Nango sync, an Express backend, and an HTML dashboard for browsing synced contacts, is on GitHub at NangoHQ/blog-demos/how-to-sync-large-amounts-of-contacts-from-hubspot-api.

Note: The same steps can also be used for syncing HubSpot companies, deals, tickets, and any other CRM Nango supports.

Why syncing large amounts of contacts from HubSpot is hard

HubSpot exposes contacts through two endpoints, and neither one is enough on its own:

| Endpoint | Filter by `lastmodifieddate`? | 10k total cap? |
| --- | --- | --- |
| `GET /crm/v3/objects/contacts` (basic list) | No | No |
| `POST /crm/v3/objects/contacts/search` | Yes | Yes (400 error past 10k) |

HubSpot's CRM search guide confirms it: "The search endpoints are limited to 10,000 total results for any given query. Attempting to page beyond 10,000 will result in a 400 error." Search is also rate-limited to five requests per second per account.

So the search endpoint is the only one that filters by lastmodifieddate (what you need for incremental syncs), but it stops at 10k results per query. The basic list endpoint has no cap but no filter. The HubSpot community forum is full of 400 reports from developers who only used search and assumed their code was broken.

Solution to sync large amounts of contacts from HubSpot

Split the sync into two phases that share state via Nango checkpoints:

Initial phase. Page through every contact using the basic list endpoint. No 10k cap. After each page, save a checkpoint with the after cursor and the highest lastmodifieddate seen so far. Note: while this endpoint does have lastmodifieddate in its response, it does not support filtering by lastmodifieddate.
Incremental phase. Switch to the search endpoint. Filter by lastmodifieddate GT <watermark>, sort ascending by lastmodifieddate, and page through. Each page advances the watermark. If a request 400s past the 10k cap, catch the error, save the latest watermark, and start a fresh search from it inside the same run. Repeat until a search returns fewer than 10k results, so no modified records are missed.
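
The sliding-window idea in the incremental phase can be sketched without any HubSpot or Nango calls. What follows is a simplified in-memory simulation, not the demo's code: fakeSearch, SEARCH_CAP, PAGE_SIZE, and the Contact shape are hypothetical stand-ins, and the cap is shrunk to 4 so the behavior is easy to follow.

```typescript
// Hypothetical in-memory model of a search endpoint with a total-results cap.
interface Contact {
    id: string;
    lastModifiedMs: number;
}

const SEARCH_CAP = 4; // stands in for HubSpot's 10,000-result cap
const PAGE_SIZE = 2;

function capError(): Error {
    const err = new Error('paged past the search cap');
    err.name = 'CapError';
    return err;
}

// Returns one page of contacts modified after `sinceMs`, sorted ascending.
// Throws once the requested offset reaches the cap, like a 400 from search.
function fakeSearch(all: Contact[], sinceMs: number, offset: number): Contact[] {
    if (offset >= SEARCH_CAP) throw capError();
    return all
        .filter((c) => c.lastModifiedMs > sinceMs)
        .sort((a, b) => a.lastModifiedMs - b.lastModifiedMs)
        .slice(offset, offset + PAGE_SIZE);
}

function syncModifiedSince(all: Contact[], startMs: number) {
    const synced: string[] = [];
    let saved = startMs; // advances after every page (the checkpoint)
    let restarts = 0;

    while (true) {
        const filter = saved; // fixed for the duration of one search
        let offset = 0;
        let hitCap = false;

        while (true) {
            let page: Contact[];
            try {
                page = fakeSearch(all, filter, offset);
            } catch (err) {
                if (err instanceof Error && err.name === 'CapError') {
                    hitCap = true;
                    break;
                }
                throw err;
            }
            if (page.length === 0) break;
            for (const c of page) {
                synced.push(c.id);
                saved = Math.max(saved, c.lastModifiedMs);
            }
            offset += page.length;
        }

        if (!hitCap) return { synced, watermark: saved, restarts };
        restarts += 1; // slide the window forward and search again
    }
}
```

The sketch assumes every contact has a distinct timestamp. With a strict greater-than filter, contacts that share the exact timestamp at a slice boundary would need separate handling.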

Why building this from scratch is harder than it looks

The two-phase pattern is the easy part. The infrastructure underneath is the hard part:

Durable checkpoints that survive process restarts and store per-connection cursors and watermarks.
Resumable execution, so a 200k-contact backfill can stop and pick up across multiple runs without skipping or duplicating records.
HubSpot OAuth lifecycle with concurrent token refreshes and graceful handling of BAD_REFRESH_TOKEN.
Rate-limit handling with backoff and per-connection queuing for HubSpot's 5 req/s search cap.
Sync-completion webhooks so your backend knows when new records landed.
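
To give a sense of just one of these pieces, here is a minimal sketch of retrying with exponential backoff on HTTP 429 responses. It is illustrative only and covers neither per-connection queuing nor the other items in the list above. The names backoffDelayMs and withRateLimitRetry, the default delays, and the assumption that errors expose a numeric status field are all hypothetical.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
    return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Runs `call`, retrying only when the thrown error carries status 429.
// `sleep` is injected so the helper can be tested without real timers.
async function withRateLimitRetry<T>(
    call: () => Promise<T>,
    sleep: Sleep,
    maxRetries = 5
): Promise<T> {
    for (let attempt = 0; ; attempt++) {
        try {
            return await call();
        } catch (err) {
            // Assumption: rate-limit errors expose a numeric `status` field.
            const status = (err as { status?: number } | null | undefined)?.status;
            if (status !== 429 || attempt >= maxRetries) throw err;
            await sleep(backoffDelayMs(attempt));
        }
    }
}
```

In production code, sleep would typically wrap setTimeout, and a small random jitter is often added so that many connections do not retry in lockstep.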

What Nango provides for building API integrations

Nango is an agentic API integrations platform. It gives you five building blocks that turn the two-phase pattern from a multi-week project into minutes of work with a coding agent:

Wide API support with managed auth. Pre-built auth for 700+ APIs, including HubSpot, with OAuth 2.0, API keys, basic auth, JWT, and other auth types. Token refresh, encryption, and revocation handling are managed for you.
AI Function Builder skill for coding agents. Install once with npx skills add NangoHQ/skills and your AI coding agent (Claude Code, Cursor, Codex, Gemini CLI, and others) gets the context to research the API, write a Nango sync or action in TypeScript, run dryruns against a real connection, and iterate until the integration works.
Actions and tool calls. Actions are functions written in TypeScript that run on Nango's infrastructure and are triggered from your app or by an AI agent. For example, a create-contact action calls POST /crm/v3/objects/contacts with the right token for the right connection, returning the created record.
Data syncs with record storage. Syncs run on Nango's serverless, tenant-isolated runtime. Synced records are stored in Nango's cache so your app can read them in a paginated manner with nango.listRecords(). The sync handles retries, rate limits, and resumable execution automatically.
Webhooks from external APIs. Nango ingests webhooks from external providers like HubSpot (for example, contact.deletion events), attributes each event to the right connection, and forwards a normalized payload to your backend. You write the handler once and Nango deals with provider-specific signing, retries, and routing.

Implementing a HubSpot contacts sync with Nango

Follow the steps below to implement a durable HubSpot contacts sync using the Nango AI Function Builder skill.

By the end of this section, you will have:

A customizable auth UI in your product so customers can connect their HubSpot account.
An initial backfill sync that fires the moment a new HubSpot connection is created and pulls all contacts.
An incremental phase that picks up only modified contacts on every run (every hour, customizable) and slides past the 10k search cap automatically.
Checkpoints that survive runtime restarts and rate limits.
A backend that receives sync-completion webhooks and reads only the records that changed.

Prerequisites

You will need a Nango account (the free tier is enough for development). Then register your own HubSpot OAuth app with the crm.objects.contacts.read scope, set the OAuth callback URL to https://api.nango.dev/oauth/callback, and configure HubSpot as an integration in the Nango dashboard.

Note: If you do not have your own HubSpot OAuth app yet, you can use Nango's pre-configured developer credentials to get going, then swap to your own credentials before going to production.

Next, install the Nango CLI and initialize an integrations project:

npm install -g nango
nango init
cd nango-integrations

Then install the AI Function Builder skill, so your coding agent has the context it needs to write Nango functions:

npx skills add NangoHQ/skills -s building-nango-functions-locally

This adds Nango's interface patterns, Zod schema conventions, dryrun testing flow, and CLI usage to whichever coding agent you use.

Tip: LLM training data on Nango is often stale. Add the Nango docs MCP server alongside the skill, so your agent can pull current API references during code generation.

Finally, set a Nango dev API key in a .env file at the root of nango-integrations so the CLI and dryrun commands can authenticate:

# nango-integrations/.env
NANGO_SECRET_KEY_DEV=<your-dev-api-key>

You can create or copy a dev API key from the Nango dashboard.

Step 1: Add a HubSpot test connection

Before generating the sync, create a real HubSpot connection in the Nango dashboard. The AI coding agent uses this connection to run nango dryrun against the live HubSpot API while it iterates on the function, so the generated code is tested against real responses.

In the Nango dashboard, click Add Test Connection, pick HubSpot, and complete the OAuth flow against a HubSpot developer account. Copy the connection ID from the connection details page; you will pass it into the prompt in the next step.

Step 2: Generate the contacts sync

Open your project in your AI coding agent and invoke the AI Function Builder skill with a prompt like the one below. Replace the connection ID with the one you just created.

Build a Nango sync for HubSpot contacts that runs every hour, handles
portals with more than 10,000 contacts reliably, and resumes cleanly
across runs without skipping or duplicating records.

Integration ID: hubspot
Connection ID: <your-connection-id>

Properties to sync: firstname, lastname, email, phone, jobtitle,
company, createdate, lastmodifieddate.

The agent figures out the rest from HubSpot's docs and the AI Function Builder skill: which endpoints to use, how to paginate, how to handle the 10k search cap, and how to checkpoint for resumability.

The agent researches both endpoints, writes the function, generates a test suite, runs nango dryrun against your test connection, and iterates on real responses until everything passes.

Step 3: Review the generated sync

When the agent finishes, you will see three new files inside nango-integrations/hubspot/:

hubspot/syncs/fetch-contacts.ts: the sync function itself, with the two-phase logic, checkpoint schema, and Zod model.
hubspot/tests/fetch-contacts.test.json: a recorded HubSpot API response captured during the agent's dryrun, used as a deterministic fixture for unit tests.
hubspot/tests/hubspot-fetch-contacts.test.ts: the unit test that replays the fixture and asserts the sync produces the expected records.

Let's look at the important bits of the sync file. The full file with imports, schemas, helpers, and types is in the demo repo.

1. The createSync declaration and phase dispatcher. A string-only CheckpointSchema carries the phase, the basic-list cursor, and the watermark across runs.

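// String-only checkpoint; an empty string means "not set yet".
const CheckpointSchema = z.object({
    phase: z.string(),             // 'initial' or 'incremental'
    after: z.string(),             // basic-list paging cursor (initial phase only)
    lastmodifieddate: z.string()   // watermark: highest lastmodifieddate seen so far
});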

const sync = createSync({
    description: 'Two-phase HubSpot contacts sync that slides past the 10k search cap.',
    version: '1.0.0',
    frequency: 'every hour',
    autoStart: true,
    checkpoint: CheckpointSchema,
    models: { HubspotContact: HubspotContactSchema },
    exec: async (nango) => {
        const checkpoint = await nango.getCheckpoint();
        const phase = checkpoint?.phase === 'incremental' ? 'incremental' : 'initial';
        let watermark = checkpoint?.lastmodifieddate || undefined;

        if (phase === 'initial') {
            watermark = await runInitialPhase(nango, checkpoint?.after, watermark);
        }
        await runIncrementalPhase(nango, watermark);
    }
});
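
To make the dispatch rules easier to check, here they are pulled into a standalone function with no Nango or Zod dependency. This is a sketch for illustration only: resolveResumeState and the ResumeState type are hypothetical names that do not appear in the generated sync.

```typescript
interface Checkpoint {
    phase: string;
    after: string;
    lastmodifieddate: string;
}

interface ResumeState {
    phase: 'initial' | 'incremental';
    after: string | undefined;     // basic-list cursor to resume from
    watermark: string | undefined; // highest lastmodifieddate seen so far
}

// Anything other than an explicit 'incremental' phase means the backfill
// still has work to do. Empty strings are treated as "not set".
function resolveResumeState(checkpoint: Checkpoint | null | undefined): ResumeState {
    return {
        phase: checkpoint?.phase === 'incremental' ? 'incremental' : 'initial',
        after: checkpoint?.after || undefined,
        watermark: checkpoint?.lastmodifieddate || undefined
    };
}
```

A first run (no checkpoint) and an interrupted backfill both resolve to the initial phase. The difference is that the interrupted run carries an after cursor to resume from.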

2. Initial phase: paginate through the basic list endpoint, save a checkpoint per page. No 10k cap on this endpoint, so a 200k-contact portal flows through across however many runs are needed.

const proxyConfig = {
    endpoint: '/crm/v3/objects/contacts',
    params: { limit: PAGE_LIMIT, properties: PROPERTIES.join(',') },
    paginate: {
        type: 'cursor' as const,
        cursor_name_in_request: 'after',
        cursor_path_in_response: 'paging.next.after',
        response_path: 'results',
        limit: PAGE_LIMIT,
        on_page: async ({ nextPageParam }) => {
            if (typeof nextPageParam === 'string' && nextPageParam) {
                await nango.saveCheckpoint({ phase: 'initial', after: nextPageParam, lastmodifieddate: watermark ?? '' });
            } else {
                // Backfill done. Switch to incremental on the next run.
                await nango.saveCheckpoint({ phase: 'incremental', after: '', lastmodifieddate: watermark ?? '' });
            }
        }
    },
    retries: 3
};

for await (const page of nango.paginate<HubspotContactRaw>(proxyConfig)) {
    const contacts = page.map(toContact);
    await nango.batchSave(contacts, 'HubspotContact');
    watermark = highestWatermark(contacts, watermark);
}
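
The snippet above calls toContact, which lives in the demo repo. As a rough idea of what such a mapper might look like, here is a minimal sketch. It assumes HubSpot's usual v3 object shape (an id plus a properties map) and flattens it into the synced model, with missing properties stored as null. The type definitions here are illustrative and may differ from the generated file.

```typescript
const PROPERTIES = [
    'firstname', 'lastname', 'email', 'phone',
    'jobtitle', 'company', 'createdate', 'lastmodifieddate'
] as const;

type PropertyName = typeof PROPERTIES[number];

// Assumed raw shape of one item in the `results` array.
interface HubspotContactRaw {
    id: string;
    properties: Record<string, string | null | undefined>;
}

// Flat record: the HubSpot ID plus one nullable field per synced property.
type HubspotContact = { id: string } & Record<PropertyName, string | null>;

function toContact(raw: HubspotContactRaw): HubspotContact {
    const contact = { id: raw.id } as HubspotContact;
    for (const name of PROPERTIES) {
        // Properties that are absent or empty in HubSpot become null.
        contact[name] = raw.properties[name] ?? null;
    }
    return contact;
}
```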

3. Incremental phase: search filtered by watermark, ascending, checkpointed per page. This is the part that solves the 10k cap. filterWatermark defines the search; savedWatermark advances per page. When a paged request 400s past the 10k cap, catch the error, restart the search from the latest watermark, and continue inside the same run until a slice returns fewer than 10k results.

let filterWatermark = savedWatermark;

while (true) {
    const filterGroups = [{
        filters: [{
            propertyName: 'lastmodifieddate',
            operator: 'GT',
            value: toMillisecondsString(filterWatermark)  // HubSpot wants Unix milliseconds
        }]
    }];

    let after: string | undefined;
    let hitTenKCap = false;

    while (true) {
        let response;
        try {
            response = await nango.post<SearchResponse>({
                endpoint: '/crm/v3/objects/contacts/search',
                data: {
                    filterGroups,
                    sorts: [{ propertyName: 'lastmodifieddate', direction: 'ASCENDING' }],
                    properties: [...PROPERTIES],
                    limit: PAGE_LIMIT,
                    ...(after ? { after } : {})
                },
                retries: 3
            });
        } catch (err) {
            // Past the 10k cap, HubSpot returns a 400. Restart the search
            // from the latest watermark inside the same run.
            if (isTenKCapError(err)) {
                hitTenKCap = true;
                break;
            }
            throw err;
        }

        const contacts = response.data.results.map(toContact);
        if (contacts.length === 0) return;

        await nango.batchSave(contacts, 'HubspotContact');
        savedWatermark = highestWatermark(contacts, savedWatermark);

        await nango.saveCheckpoint({ phase: 'incremental', after: '', lastmodifieddate: savedWatermark ?? '' });

        after = response.data.paging?.next?.after;
        if (!after) return;
    }

    if (!hitTenKCap) return;
    filterWatermark = savedWatermark;  // Slide the search window forward and loop.
}

A few design decisions worth calling out:

Checkpoints saved per page. The initial phase uses nango.paginate with an on_page callback; the incremental phase saves inline after each batchSave. Either way, if the function is interrupted, the next run resumes from the saved state rather than starting over.
Ascending sort on lastmodifieddate. Each search returns the lowest-watermark records first. When a request 400s past the 10k cap, the latest saved watermark is already past everything we just processed, so restarting the search from it picks up the next slice without gaps. The outer loop keeps sliding the window forward until a slice fits under 10k.
Milliseconds, not ISO, for the search filter. HubSpot's search endpoint expects lastmodifieddate as Unix milliseconds. toMillisecondsString handles the conversion.
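
The helpers referenced in the incremental phase (toMillisecondsString, highestWatermark, isTenKCapError) are in the demo repo. The versions below are a hedged sketch of how they could be written, not the repo's implementation. In particular, the error shape (an axios-style response.status and response.data.message) and the message check in isTenKCapError are assumptions, so inspect a real 400 response from your portal before relying on them.

```typescript
interface WithModifiedDate {
    lastmodifieddate?: string | null;
}

// ISO 8601 timestamp -> Unix milliseconds as a string.
// Assumption: an unset watermark maps to '0', i.e. "everything".
function toMillisecondsString(iso: string | undefined): string {
    if (!iso) return '0';
    const ms = Date.parse(iso);
    if (isNaN(ms)) throw new Error(`Invalid lastmodifieddate: ${iso}`);
    return String(ms);
}

// Returns the later of the current watermark and any timestamp in the page.
function highestWatermark(page: WithModifiedDate[], current: string | undefined): string | undefined {
    let best = current;
    for (const item of page) {
        const candidate = item.lastmodifieddate;
        if (!candidate) continue;
        if (!best || Date.parse(candidate) > Date.parse(best)) best = candidate;
    }
    return best;
}

// Assumption: the cap surfaces as an HTTP 400 whose message mentions the
// 10,000 limit. Other 400s (for example a malformed filter) must not match.
function isTenKCapError(err: unknown): boolean {
    const e = err as { response?: { status?: number; data?: { message?: string } } } | null | undefined;
    const status = e?.response?.status;
    const message = e?.response?.data?.message ?? '';
    return status === 400 && /10,?000/.test(message);
}
```

Matching on the message as well as the status keeps an unrelated 400 from being mistaken for the cap, which would otherwise restart the search in a loop.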

Step 4: Deploy

Deploy just this sync to your Nango dev environment:

nango deploy --sync fetch-contacts dev

The sync now runs every hour for every connected HubSpot portal, starting with the initial backfill and automatically transitioning to the incremental phase once the backfill catches up.

Step 5: Handle sync-completion webhooks in your backend

When a sync run finishes with new records, Nango sends a sync-completion webhook to your backend with counts of added, updated, and deleted records, plus a modifiedAfter timestamp. Set the webhook URL in the Nango dashboard under Environment Settings > Webhooks. The demo app shows a working Express backend and an HTML dashboard that displays the synced contacts.

Then fetch only the records that changed:

const records = await nango.listRecords({
    providerConfigKey: 'hubspot',
    connectionId,
    model: 'HubspotContact',
    modifiedAfter: webhookPayload.modifiedAfter,
});

Tip: Always verify the webhook signature before processing using nango.verifyIncomingWebhookRequest(). Nango signs payloads with HMAC-SHA256, and verification requires the raw request body (before JSON parsing).
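
To show where verification fits, here is a minimal, framework-free sketch of a webhook handler that checks the signature against the raw body before parsing. The handleSyncWebhook function and its types are hypothetical. The verifier is passed in as a parameter so that no SDK signature is assumed here, and the payload fields (type, connectionId, model, modifiedAfter) should be checked against Nango's current webhook reference.

```typescript
interface SyncWebhookPayload {
    type: string;
    connectionId: string;
    model: string;
    modifiedAfter: string;
}

// Injected verifier: returns true when the signature matches the raw body.
type VerifySignature = (rawBody: string, headers: Record<string, unknown>) => boolean;

interface WebhookResult {
    status: number;
    payload?: SyncWebhookPayload;
}

function handleSyncWebhook(
    rawBody: string,
    headers: Record<string, unknown>,
    verify: VerifySignature
): WebhookResult {
    // Verify first, on the exact string that was received. Parsing and
    // re-serializing the JSON could change it and break the signature check.
    if (!verify(rawBody, headers)) return { status: 401 };

    let parsed: Partial<SyncWebhookPayload> | null;
    try {
        parsed = JSON.parse(rawBody);
    } catch {
        return { status: 400 };
    }

    // Acknowledge, but ignore, anything that is not a complete sync event.
    if (!parsed || parsed.type !== 'sync' || !parsed.connectionId || !parsed.model || !parsed.modifiedAfter) {
        return { status: 200 };
    }

    return {
        status: 200,
        payload: {
            type: parsed.type,
            connectionId: parsed.connectionId,
            model: parsed.model,
            modifiedAfter: parsed.modifiedAfter
        }
    };
}
```

In an Express app, the raw body can be captured with express.raw({ type: 'application/json' }) on the webhook route, and the verify argument would wrap the SDK's verification method.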

nango.listRecords is paginated. Loop through next_cursor until it returns null, so you do not silently drop records on portals with very high modification rates.
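
The cursor loop might look like the sketch below. It is written against a small RecordsClient interface rather than the SDK itself so it can be tested with a fake. The interface, the listAllRecords name, and the assumption that the next page is requested through a cursor parameter are illustrative, so check the parameter names against the current listRecords reference.

```typescript
interface ListRecordsParams {
    providerConfigKey: string;
    connectionId: string;
    model: string;
    modifiedAfter?: string;
    cursor?: string;
}

interface ListRecordsPage<T> {
    records: T[];
    next_cursor: string | null;
}

// Minimal slice of a records client; a Nango SDK instance is assumed to fit.
interface RecordsClient {
    listRecords<T>(params: ListRecordsParams): Promise<ListRecordsPage<T>>;
}

// Follows next_cursor until it is null and returns every record.
async function listAllRecords<T>(client: RecordsClient, params: ListRecordsParams): Promise<T[]> {
    const all: T[] = [];
    let cursor: string | undefined;
    do {
        const page = await client.listRecords<T>({ ...params, ...(cursor ? { cursor } : {}) });
        all.push(...page.records);
        cursor = page.next_cursor ?? undefined;
    } while (cursor);
    return all;
}
```

For very large change sets, processing each page as it arrives instead of collecting everything into one array keeps memory use flat.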

Local testing tip: To inspect webhooks during development, run ngrok against your local backend and set the ngrok URL as the webhook target in Nango's environment settings.

Step 6: Connect a new HubSpot account and test

With the sync deployed, connect a new HubSpot portal through Nango's auth UI. The sync starts immediately because of autoStart: true.

To verify the sync is doing the right thing:

Check the dashboard. The Nango dashboard shows per-run logs, record counts, current checkpoint, and durations. You should see one or more initial-phase runs that together process all the contacts, then incremental runs that process only modified contacts.
Watch the checkpoint. During the initial phase, the checkpoint shows phase: "initial" with an advancing after cursor. Once the backfill completes, the phase flips to "incremental" and the lastmodifieddate watermark starts moving forward.
Modify a contact in HubSpot. Within the next hour, the incremental sync should pick it up, and your backend should receive a sync-completion webhook reporting one updated record.

Edge cases worth knowing about

Custom properties. Update the prompt with the custom property names and rerun the AI Function Builder skill. For per-customer property lists, store them in connection metadata and read them during the sync.
Deletes. Search does not return deleted contacts. Subscribe to HubSpot's contact.deletion webhook. Nango can forward those webhooks to your backend after attributing each event to the right connection.
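
For the per-customer property case, one approach is to merge a default property list with names read from connection metadata. The sketch below is hypothetical: the extraProperties metadata key and the resolveProperties helper are illustrative names, and how the metadata is loaded inside the sync is left out.

```typescript
const DEFAULT_PROPERTIES = [
    'firstname', 'lastname', 'email', 'phone',
    'jobtitle', 'company', 'createdate', 'lastmodifieddate'
];

// Hypothetical metadata shape; the `extraProperties` key is an assumption.
interface ConnectionMetadata {
    extraProperties?: unknown;
}

// Defaults first, then any valid custom names, without duplicates.
function resolveProperties(metadata: ConnectionMetadata | null | undefined): string[] {
    const raw = metadata?.extraProperties;
    const extras: string[] = Array.isArray(raw)
        ? raw.filter((p): p is string => typeof p === 'string')
        : [];

    const result = DEFAULT_PROPERTIES.slice();
    for (const name of extras) {
        const trimmed = name.trim();
        // Skip blanks and anything already in the list.
        if (trimmed !== '' && result.indexOf(trimmed) === -1) result.push(trimmed);
    }
    return result;
}
```

Keeping lastmodifieddate in the defaults matters because the watermark logic depends on it being present in every response.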

Conclusion

Syncing large amounts of contacts from HubSpot requires both of HubSpot's contact endpoints. Search filters by lastmodifieddate but caps at 10,000 results. The basic list endpoint has no cap but cannot filter. A reliable sync uses both, with checkpoints that resume cleanly across runs and durable infrastructure underneath: OAuth refresh, retries, rate limit handling, and webhook routing.

With Nango, the sync logic is a code function your AI coding agent generates from a prompt, while the infrastructure is managed for you. The same approach works for HubSpot companies, deals, tickets, and any of Nango's 700+ supported APIs.
