This document outlines a detailed, real-world process for syncing data from a third-party API into a Supabase database. It is designed to prepare the data for a variety of uses, including custom dashboards, bespoke applications, and as a knowledge base for Large Language Models (LLMs). While this case study uses Ontraport as the primary example, the principles, challenges, and strategic decisions discussed form a reusable framework applicable to other APIs.
The documentation is structured for both technical and non-technical readers. Conceptual overviews are provided for general understanding, with specific technical sections and code examples available for developers and implementers.
The primary goal of this process is to achieve data sovereignty: liberating data from a vendor’s ecosystem to gain full control and visibility. For the client in this case, syncing their Ontraport data into a self-hosted Supabase instance enabled several key outcomes, including custom dashboards, bespoke applications, and an LLM-ready knowledge base.
Before any data can be synced, the API must be understood. This traditionally manual and time-consuming phase can be significantly accelerated with the help of an LLM.
An LLM doesn’t “learn” an API in a human sense. Instead, it processes vast amounts of context provided to it. The key is to supply high-quality, structured, and unstructured information about the API.
A swagger.json (OpenAPI) file is the gold standard, providing a machine-readable blueprint of the API. An LLM can parse this to understand every endpoint, parameter, and data structure.
For Ontraport, the Swagger file is available at https://api.ontraport.com/1/api-docs/swagger.json. For context and nuance not present in the Swagger file, the full API documentation should be scraped. Using a tool like Firecrawl on the Ontraport docs (https://api.ontraport.com/doc/) provided a comprehensive text file. This was then ingested via a Retrieval Augmented Generation (RAG) system into Supabase, making the entire documentation searchable by the LLM.
Equipped with documentation and a call_API tool, an LLM can directly test endpoints. This is invaluable for confirming access, discovering available objects (e.g., Ontraport’s GET /objects/meta endpoint), and understanding the data you can actually retrieve, which may differ from the documentation.
With over 100 potential data objects (endpoints) in Ontraport, creating a Supabase table for each one manually is inefficient and prone to error. The solution is to automate table creation within an n8n workflow.
**The ontraport_tables Control Table**
The first step is to create a central control table in Supabase. This table, ontraport_tables, lists every object to be synced, its Ontraport object_id, and the desired supabase_table_name.
While this was done manually for this project, a more efficient future approach would be to use the LLM to call the GET /objects/meta endpoint, retrieve the list of all available objects, and then generate the SQL INSERT statements to populate this control table automatically.
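For reference, a minimal version of this control table might look like the following; the column names are the ones referenced throughout this document, while the exact types and defaults are assumptions:

```sql
CREATE TABLE IF NOT EXISTS public.ontraport_tables (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  object_id integer NOT NULL,          -- Ontraport object ID
  supabase_table_name text NOT NULL,   -- destination table in Supabase
  bulk_sync_done boolean,              -- set once the initial bulk sync completes
  row_count bigint,                    -- updated after each sync run
  disable_reason text                  -- populated on error; excludes the table from future runs
);
```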
This is a critical point: while an LLM is excellent for exploration, creating database schemas is a precise, deterministic task best handled by code, not a probabilistic model. Using an LLM to generate CREATE TABLE statements risks hallucinations, syntax errors, and inconsistencies. A programmatic approach guarantees accuracy and provides traceable error logs.
The n8n workflow performs these steps:
1. Check whether the table already exists: the workflow queries information_schema.tables to see if the table for the selected object has already been created:

SELECT EXISTS ( SELECT 1 FROM information_schema.tables WHERE table_schema = 'public' AND table_name = '{{ $json.table }}' );

2. Fetch a sample record: a single record is requested from the API (range=1) to use as a template.

3. Generate the CREATE TABLE SQL: a Code node then inspects this sample record and dynamically builds the CREATE TABLE statement. This code handles several important “gotchas”:

- It uses the id from the API as a bigint PRIMARY KEY. The default auto-incrementing Supabase id is not used, to maintain a direct link to the source data.
- It adds a created_at timestamp for internal Supabase tracking.
- Column names are sanitised: converted to snake_case, keys that start with numbers are handled, and illegal characters are removed.
- Obvious data types are inferred (boolean, jsonb), but it defaults to text for safety.

Here is the core logic from the n8n Code node that generates the query to create the tables:
// The sample record (fetched with range=1) is the node's input,
// and the target table name comes from the earlier 'Set variables1' node.
const the_item = $input.first().json;
const table = $('Set variables1').first().json.table;
const columns = [];
// Add 'id' column as BIGINT PRIMARY KEY (from API)
columns.push(`"id" bigint PRIMARY KEY`);
// Add 'created_at' column as TIMESTAMP WITH TIME ZONE (Supabase internal)
columns.push(`"created_at" timestamp with time zone not null default now()`);
for (let key in the_item) {
if (key === 'id') continue;
let cleanedKey = key;
// Handle keys starting with numbers by prepending 'f'
if (!isNaN(parseInt(cleanedKey.charAt(0)))) {
cleanedKey = 'f' + cleanedKey;
}
// Convert to snake_case and remove potential double underscores
cleanedKey = cleanedKey.replace(/([A-Z])/g, '_$1').toLowerCase();
cleanedKey = cleanedKey.replace(/__/g, '_');
// Replace any remaining special characters with underscores
cleanedKey = cleanedKey.replace(/[^a-z0-9_]/g, '_');
// Determine data type (default: text)
let dataType = 'text';
const value = the_item[key];
if (value !== null && value !== undefined) {
if (typeof value === 'boolean') {
dataType = 'boolean';
} else if (typeof value === 'object' && !Array.isArray(value)) {
dataType = 'jsonb';
}
}
// Wrap column name in quotes to avoid reserved word conflicts
columns.push(`"${cleanedKey}" ${dataType}`);
}
const createTableQuery = `
CREATE TABLE IF NOT EXISTS public."${table}" (
${columns.join(',\n ')}
) TABLESPACE pg_default;
`;
// Return the generated query so the next (Postgres) node can execute it
return [{ json: { query: createTableQuery } }];

A core principle of this initial bulk sync is to **bring everything in as text**. Attempting to infer data types like integer or date from a small sample is risky; a column that appears numeric for 10 rows might contain text in the 11th. Importing as text is the safest option to prevent data loss, truncation, or formatting issues (e.g., timezone conversions, float rounding). Data type conversion and cleaning is a distinct, post-sync analysis step.
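When that post-sync cleaning step eventually happens, conversions can be applied explicitly, column by column. A minimal sketch, assuming a text column that has already been verified to contain only numeric values (the table and column names are illustrative):

```sql
-- Convert a verified text column to a proper numeric type after the sync.
-- 'ontraport_contacts' and 'spent' are illustrative names.
ALTER TABLE public.ontraport_contacts
  ALTER COLUMN spent TYPE numeric
  USING NULLIF(spent, '')::numeric;
```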
Long-running, monolithic n8n workflows that loop for hours are brittle. They consume vast amounts of memory, make debugging difficult, and a single error can halt the entire process. A more robust strategy is to use frequent, short-lived executions.
The n8n workflow is triggered every minute. Instead of syncing tables sequentially, it picks a random table that hasn’t been completed yet.
SELECT * FROM ontraport_tables WHERE bulk_sync_done IS NULL AND disable_reason IS NULL
ORDER BY RANDOM() LIMIT 1;

This method offers a significant advantage: it quickly reveals whether the sync logic works across all the different object types. Any API quirks or inconsistencies specific to certain objects are discovered within the first couple of hours, rather than days into a sequential sync. Once the logic is proven robust for all object types, the workflow can be left to run, gradually completing the entire dataset.
Every API handles pagination differently. Ontraport uses offset-based pagination with start and range parameters. The workflow handles this by looping and incrementing the start offset by 50 on each iteration until no more records are returned.
A more critical aspect is how to handle incremental syncing—fetching only new or updated records after the initial bulk sync. The ideal strategy depends entirely on the capabilities of the source API:
- **ID-based filtering:** if the API exposes an auto-incrementing record ID, the workflow reads max(id) from the Supabase table and requests only records where id > max(id). This is highly efficient for capturing new records but will miss any modifications to older records.
- **Modification-date filtering:** if the API exposes a last-modified field, the workflow reads max(modification_date) from Supabase and requests all records modified after that date. This captures both new and updated records.
- **No usable filters:** if neither is available, the fallback is to re-fetch everything and rely on an INSERT ... ON CONFLICT DO UPDATE statement in Supabase to handle the duplicates. Another approach is to find other filterable parameters in the API (e.g., search for contacts by the letter ‘a’, then ‘b’, etc.) to break the sync into manageable chunks. These methods are complex and less reliable.

For the initial bulk sync with Ontraport, the max(id) approach was used for its simplicity. Once records are fetched, a Code node generates an INSERT ... ON CONFLICT DO UPDATE statement for each one. This “upsert” logic is crucial, ensuring that new records are added and existing records are updated if they happen to be fetched again.
const results = [];
for (const item of items) {
const the_item = item.json; // Each item.json is a single contact object
const columns = [];
const values = [];
const updateSet = [];
const table = $('Set variables1').first().json.table
// Add 'id' column. Ontraport ID is text.
columns.push('"id"');
values.push(`'${the_item.id}'`);
// Iterate over other contact properties from the API response
for (let key in the_item) {
// Skip 'id' as it's already handled
if (key === 'id') continue;
let cleanedKey = key;
// Handle keys starting with numbers by prepending 'f'
if (!isNaN(parseInt(cleanedKey.charAt(0)))) {
cleanedKey = 'f' + cleanedKey;
}
// Convert to snake_case and remove potential double underscores
cleanedKey = cleanedKey.replace(/([A-Z])/g, '_$1').toLowerCase();
cleanedKey = cleanedKey.replace(/__/g, '_');
// Replace any special characters (including forward slashes) with underscores
cleanedKey = cleanedKey.replace(/[^a-z0-9_]/g, '_');
columns.push(`"${cleanedKey}"`);
let value = the_item[key];
if (value === null || value === undefined) {
values.push('NULL');
} else if (typeof value === 'string') {
values.push(`'${value.replace(/'/g, "''")}'`);
} else {
values.push(`'${String(value).replace(/'/g, "''")}'`);
}
// Add to updateSet, excluding 'created_at'
if (cleanedKey !== 'created_at') {
updateSet.push(`"${cleanedKey}" = EXCLUDED."${cleanedKey}"`);
}
}
const insertQuery = `
INSERT INTO public."${table}" (${columns.join(', ')})
VALUES (${values.join(', ')})
ON CONFLICT (id) DO UPDATE SET
${updateSet.join(', ')};
`;
results.push({ json: { query: insertQuery } });
}
return results;

Ontraport imposes a limit of 180 requests per minute. Whilst this is ample for this example, using short-lived scheduled executions rather than one monolithic attempt to update everything means rate limits are largely avoided. As a further safeguard, the delay between requests is calculated from the rate-limit headers returned by the API:
{{
// Code to determine wait time between requests to avoid rate limiting by the API
// If remaining requests are low (e.g., less than 5), wait for the reset period
$('Get Ontraport items').item.json.headers['x-rate-limit-remaining'] < 5 ?
// Add 1 second buffer
waitTime = parseInt($('Get Ontraport items').item.json.headers['x-rate-limit-reset']) + 1 :
// Otherwise, wait a small amount to distribute requests
// Wait 500 milliseconds between requests
waitTime = 0.5
}}

To maintain visibility into the sync process, the ontraport_tables table is used for logging. After each run, the workflow updates the row_count for the synced table. If an error occurs, the error message is written to the disable_reason column, which automatically excludes that table from future sync runs until the issue is manually reviewed.
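The logging itself amounts to simple UPDATE statements against the control table; a sketch with illustrative table name and error text:

```sql
-- After a successful run: record how many rows the synced table now holds.
UPDATE ontraport_tables
SET row_count = (SELECT count(*) FROM public.ontraport_contacts)
WHERE supabase_table_name = 'ontraport_contacts';

-- After a failed run: store the error so the table is skipped until reviewed.
UPDATE ontraport_tables
SET disable_reason = 'example: API returned an unexpected response'
WHERE supabase_table_name = 'ontraport_contacts';
```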
This process revealed a major quirk in the Ontraport API: custom fields in API responses are named with cryptic IDs (e.g., f1430, f1639), not their human-readable names.
The actual names (“aliases”) for these fields are available from a separate /meta endpoint, which, inconsistently, requires the object’s name (e.g., “Contacts”) rather than its objectID.
An n8n workflow was created to iterate through each table, call its corresponding /meta endpoint, parse the response, and store the mapping between the field ID (f1234) and its alias (Investment Property Address) in a new table: ontraport_custom_fields.
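A minimal sketch of that mapping table, using the column names referenced throughout this document (the data types are assumptions; most values arrive as text):

```sql
CREATE TABLE IF NOT EXISTS public.ontraport_custom_fields (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  table_name text NOT NULL,       -- the Supabase table the field belongs to
  custom_field_id text NOT NULL,  -- e.g. 'f1430', matching the actual column name
  alias text,                     -- human-readable name, e.g. 'Investment Property Address'
  type text,
  required text,
  "unique" text,
  editable text,
  deletable text,
  "exists" boolean                -- set to false if the column is missing from the base table
);
```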
Again, this was done by dynamically creating the Postgres INSERT statements as required:
// Get the fields object from the incoming data
const data = $input.all()[0].json.body.data;
// Get the first (and only) key dynamically
const dataKey = Object.keys(data)[0];
const fields = data[dataKey].fields;
// Transform into array of objects, one per field
const rows = Object.entries(fields).map(([fieldId, fieldData]) => {
let cleanedKey = fieldId;
// Handle keys starting with numbers by prepending 'f'
if (!isNaN(parseInt(cleanedKey.charAt(0)))) {
cleanedKey = 'f' + cleanedKey;
}
// Convert to snake_case and remove potential double underscores
cleanedKey = cleanedKey.replace(/([A-Z])/g, '_$1').toLowerCase();
cleanedKey = cleanedKey.replace(/__/g, '_');
// Replace any special characters with underscores
cleanedKey = cleanedKey.replace(/[^a-z0-9_]/g, '_');
return {
custom_field_id: cleanedKey, // Now matches the actual column name
alias: fieldData.alias,
type: fieldData.type,
required: fieldData.required,
unique: fieldData.unique,
editable: fieldData.editable,
deletable: fieldData.deletable
};
});
return rows;

With hundreds of custom fields, querying the raw tables is impractical for humans and LLMs. The solution was to create a PostgreSQL VIEW for each table that has custom fields. This view presents the data with human-readable column names.
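For a single table, the generated view ends up looking something like this (the field IDs and aliases are illustrative, in the style of the examples above):

```sql
CREATE OR REPLACE VIEW ontraport_contacts_view AS
SELECT
  id,
  created_at,
  owner,
  firstname,
  lastname,
  "f1430" AS "Investment Property Address",  -- alias taken from ontraport_custom_fields
  "f1639" AS "Settlement Date"               -- illustrative alias
FROM ontraport_contacts;
```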
However, creating these views manually is not scalable. A PostgreSQL function, refresh_ontraport_view(table_name_param text), was developed to automate this. It dynamically constructs a CREATE OR REPLACE VIEW statement by joining the base table with the ontraport_custom_fields table to alias the columns.
CREATE OR REPLACE FUNCTION refresh_ontraport_view(table_name_param text)
RETURNS void AS $$
DECLARE
view_sql text;
custom_fields_sql text;
view_name text;
error_detail text;
BEGIN
-- Construct the view name
view_name := table_name_param || '_view';
-- Build the custom fields part of the SELECT (only where exists = true)
SELECT string_agg(
'"' || custom_field_id || '" AS "' || alias || '"',
', '
ORDER BY custom_field_id
)
INTO custom_fields_sql
FROM ontraport_custom_fields
WHERE table_name = table_name_param
AND ("exists" IS NULL OR "exists" = true); -- Include NULL for backwards compatibility
-- Build the complete view SQL
view_sql := 'CREATE OR REPLACE VIEW ' || view_name || ' AS SELECT ' ||
'id, created_at, owner, firstname, lastname';
-- Add custom fields if any exist
IF custom_fields_sql IS NOT NULL THEN
view_sql := view_sql || ', ' || custom_fields_sql;
END IF;
view_sql := view_sql || ' FROM ' || table_name_param;
-- Try to execute the dynamic SQL
BEGIN
EXECUTE view_sql;
EXCEPTION
WHEN undefined_column THEN
-- Extract the column name from the error message
GET STACKED DIAGNOSTICS error_detail = MESSAGE_TEXT;
-- Mark any fields that don't exist as exists = false
UPDATE ontraport_custom_fields
SET "exists" = false
WHERE table_name = table_name_param
AND custom_field_id = substring(error_detail from 'column "([^"]+)" does not exist');
-- Retry the view creation without the missing column
PERFORM refresh_ontraport_view(table_name_param);
END;
END;
$$ LANGUAGE plpgsql;

Crucially, the function includes error handling. If view creation fails because a custom field in the mapping doesn’t actually exist in the base table (an undefined_column error), it catches the exception, marks that field as exists = false in the ontraport_custom_fields table, and then recursively calls itself to try creating the view again.
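Once the function exists, it can be run for a single table, or driven from the control table for every synced table (assuming each has the base columns the function selects); a brief usage sketch:

```sql
-- Refresh the view for one table.
SELECT refresh_ontraport_view('ontraport_contacts');

-- Refresh views for every table listed in the control table.
SELECT refresh_ontraport_view(supabase_table_name) FROM ontraport_tables;
```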
It is tempting to view this documented n8n workflow as a copy-paste template for any API sync. **This is not the case.** The workflow is heavily customised to Ontraport’s specific behaviours. Attempting to apply it directly to another API like Salesforce would fail.
The value of this documentation lies in providing a **framework and a methodology**. It gives you a process to follow and a checklist of critical questions to answer for any new API:

- How is the API documented, and is a machine-readable (Swagger/OpenAPI) spec available?
- How do you confirm access and discover the available objects?
- How does pagination work, and what are the rate limits?
- Does the API support filtering by ID or modification date for incremental syncs?
- Are there quirks, such as cryptically named custom fields, that need a separate mapping?
Starting from this framework will be faster than starting from a blank slate, but it does not eliminate the need for a dedicated discovery and development process for each new API.
Once the initial bulk sync is complete for all tables, the strategy must shift from a one-time data dump to an ongoing, incremental process that captures both new and, crucially, **modified** records.
The most effective way to achieve this is by switching the filtering logic from max(id) to max(modification_date). The workflow for each table would then be:
1. Query Supabase for max(modification_date) in the target table.
2. Call the API, requesting only records modified after that date.
3. Use the same INSERT ... ON CONFLICT DO UPDATE logic to add or update the records in Supabase.

It’s entirely possible to start with this “ongoing sync” logic from the very beginning. For an empty table, the max(modification_date) query will return NULL. The workflow can handle this by using a very old default date (e.g., ‘1901-01-01’) as the filter for the first API call, effectively fetching all records.
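A minimal sketch of that filter query, assuming the synced table stores its modification date in a column called modification_date (the table and column names are assumptions, and since everything is imported as text a cast may be needed in practice):

```sql
-- Highest modification date already synced, falling back to a very old
-- default so the first run on an empty table fetches everything.
SELECT COALESCE(MAX(modification_date), '1901-01-01') AS since
FROM public.ontraport_contacts;
```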
Note on Deletions: This strategy does not account for records that are deleted in the source system. Handling deletions is a significantly more complex problem, often requiring a full comparison of all IDs between the source and destination, and is considered a separate, advanced topic.
The sync process is complete, but it creates a new, complex challenge for the next phase: AI-driven data analysis and modification.
The architecture is now:
- The client’s server holds the raw ontraport_contacts table (with f1234 columns) and the ontraport_contacts_view (with human-readable names).
- From the LLM’s side, the ontraport_contacts_view is accessed as a Foreign Table, allowing the LLM to interact with it.

**The Conundrum:** The LLM sees and queries the clean, human-readable column names from the foreign table. However, it cannot perform modifications like changing a column’s data type or adding comments, as these operations are not permitted on views or foreign tables. Those changes must be applied to the original base table on the client’s server, which has the cryptic f1234 names.
This creates a fundamental disconnect. How can an LLM, working with the alias Investment Property Address, know that to add a comment, it must target the column f1430 on a different table on a remote server? Solving this is a significant challenge and represents the next chapter in this integration journey.
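For reference, the foreign-table layer described above can be wired up with PostgreSQL’s postgres_fdw extension. This is one common way to do it rather than necessarily the exact setup used here, and the host, database, and credential values are illustrative:

```sql
-- On the LLM-side database: connect to the client's server and expose the view.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER client_supabase
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'client-db.example.com', port '5432', dbname 'postgres');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER client_supabase
  OPTIONS (user 'readonly_user', password 'change-me');

-- Pull in just the human-readable view as a foreign table.
IMPORT FOREIGN SCHEMA public
  LIMIT TO (ontraport_contacts_view)
  FROM SERVER client_supabase
  INTO public;
```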
Regardless of the above difficulty, once the data is synced and imported, it immediately appears in the “CRM” environment I’ve set up, which allows visibility and data interaction with any Supabase table.
And the AI Chat app also has immediate access to all that data as it has full access to Supabase.