Real LLM Streaming with n8n – Here’s How (with a Little Help from Supabase)

Hey fellow n8n enthusiasts!

If you’ve been building AI chat applications with n8n as your backend, you’ve undoubtedly hit the same wall I did: getting those slick, character-by-character streaming responses from your LLM into your custom UI. We see it with ChatGPT, Claude, and others – that immediate feedback is crucial for a good user experience. Waiting for the full n8n workflow to complete before anything appears on screen can feel like an eternity, especially when your workflows involve RAG, tool use, or complex context injection.

The bad news? n8n, for all its power in workflow automation, is NOT natively built for streaming HTTP responses. Its sequential, node-by-node execution model is fantastic for many tasks, but it’s a fundamental blocker for true LLM streaming.

The good news? I’ve been wrestling with this and have landed on a robust architectural pattern that brings that smooth streaming experience to n8n-powered UIs, primarily by leveraging the power of Supabase Edge Functions and Realtime subscriptions.

The Core n8n Streaming Challenge

Many have pondered: “Can’t I just use a Code node to import a WebSocket library and handle the stream there?” While the Code node is incredibly versatile, it still operates within n8n’s sequential flow: it will likely wait for the entire streaming interaction to finish before passing a complete result to the next node, or it simply won’t output the stream in a way your UI can consume live. The bottom line: direct streaming out of a standard n8n workflow is problematic.

Understanding Common Web Streaming Technologies

Before we dive into the Supabase solution, it’s helpful to understand the common ways web applications typically handle real-time data streaming. This context highlights why n8n’s architecture presents a challenge and why we need to look at alternative patterns.

WebSockets

WebSockets provide a persistent, full-duplex (two-way) communication channel between a client (like your web browser) and a server. Once established, both the client and server can send data to each other at any time, making it highly efficient for truly interactive applications like live chat, online gaming, or collaborative editing. For LLM streaming, a WebSocket could theoretically allow the server to push text chunks to the UI as they’re generated. However, managing WebSocket connections, especially at scale, and integrating them into a non-streaming-native backend like n8n, requires careful server-side setup.
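For a rough feel of what this looks like in practice, here’s a minimal browser-side sketch; the endpoint URL and message shape are invented for illustration and aren’t part of the setup described later:

```ts
// Minimal browser-side sketch of a full-duplex WebSocket connection.
// The endpoint URL and message shape are invented for illustration.
const socket = new WebSocket("wss://example.com/llm-stream");
const chatEl = document.querySelector("#chat") as HTMLElement;

socket.addEventListener("open", () => {
  // The client can send at any time once the connection is open...
  socket.send(JSON.stringify({ prompt: "Explain n8n in one paragraph." }));
});

socket.addEventListener("message", (event) => {
  // ...and the server can push text chunks back whenever it likes.
  chatEl.append(String(event.data));
});
```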

Server-Sent Events (SSE)

SSE is a simpler, unidirectional technology where the server can push data to the client over a standard HTTP connection, but the client cannot send data back to the server over that same SSE connection (it would use a separate HTTP request for that). SSE is often easier to implement than WebSockets and is an excellent fit for scenarios where the server needs to send a continuous stream of updates to the client, such as news feeds, live score updates, and – importantly for us – LLM response streaming. Many LLM APIs offer SSE as their streaming protocol. The challenge remains: how does n8n, as an intermediary, handle and forward an SSE stream from an LLM to the UI?
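As a quick illustration, this is roughly how a browser consumes a generic SSE endpoint with the built-in EventSource API; the endpoint here is hypothetical. (LLM APIs usually deliver their SSE stream over a POST response, which you read with fetch instead; the Edge Function sketch later in this post shows that variant.)

```ts
// Minimal sketch of consuming Server-Sent Events in the browser.
// "/api/updates" is a hypothetical endpoint that responds with
// "Content-Type: text/event-stream" and frames like "data: hello\n\n".
const source = new EventSource("/api/updates");

source.onmessage = (event) => {
  // Each "data:" frame is delivered here as soon as the server sends it.
  console.log("new chunk:", event.data);
};

source.onerror = () => {
  // EventSource reconnects automatically; close it when you're done.
  source.close();
};
```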

Long Polling (and its variants)

This is an older technique used to simulate a server push. The client makes an HTTP request to the server, and the server holds that request open until it has new data to send. Once data is sent (or a timeout occurs), the client immediately makes another request. While it can achieve a semblance of real-time updates, it’s less efficient than WebSockets or SSE due to the overhead of repeated HTTP requests and potential latency. It’s generally not the preferred method for high-performance streaming like LLM responses.

Given these options, the ideal scenario for LLM streaming involves the backend being able to efficiently manage and forward data chunks received from the LLM (often via SSE from the LLM API itself) directly to the client. This is where n8n’s standard request-response model for its nodes hits a limitation.

The Workaround: Offload Streaming to Supabase

Instead of trying to force n8n to do something it’s not designed for, this approach offloads the actual LLM communication and stream handling to a Supabase Edge Function. Here’s the gist:

Supabase Edge Functions: Think of these as nimble, serverless functions (running on Deno) that you can deploy with ease. They can make API calls (like to your LLM), run your TypeScript/JavaScript logic, and integrate seamlessly with the Supabase ecosystem. For our purposes, an Edge Function becomes our dedicated “streaming agent.”

Supabase Realtime: This feature lets your client-side application subscribe to changes in your Supabase database. When data is updated or inserted, your UI gets notified instantly.

The core idea is to have the Edge Function call the LLM, receive the stream, and write it chunk by chunk into a Supabase database table. Your UI, subscribed to this table, then picks up these chunks in real-time and displays them.
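To make that concrete, here’s a rough sketch of what the Edge Function side could look like. The table name (llm_stream_chunks), its columns (session_id, seq, content, done), and the OpenAI-compatible endpoint are all assumptions for the sketch, not a prescribed schema:

```ts
// supabase/functions/stream-llm/index.ts
// Sketch only: receives a prepared prompt (e.g. from n8n), streams the LLM
// response, and writes each chunk to a table the UI watches via Realtime.
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

Deno.serve(async (req) => {
  const { sessionId, prompt } = await req.json();

  // Call the LLM with streaming enabled (OpenAI-compatible endpoint assumed).
  const llmRes = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const reader = llmRes.body!.getReader();
  const decoder = new TextDecoder();
  let seq = 0;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // SSE frames look like "data: {json}\n\n", ending with "data: [DONE]".
    for (const line of decoder.decode(value, { stream: true }).split("\n")) {
      if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
      let delta: string | undefined;
      try {
        delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      } catch {
        continue; // frame split across reads; a production version would buffer
      }
      if (!delta) continue;
      // One row per chunk: the UI's Realtime subscription picks up each INSERT.
      await supabase
        .from("llm_stream_chunks")
        .insert({ session_id: sessionId, seq: seq++, content: delta });
    }
  }

  // Mark the stream as finished so the UI (and/or n8n) knows the turn is done.
  await supabase
    .from("llm_stream_chunks")
    .insert({ session_id: sessionId, seq: seq, content: "", done: true });

  return new Response(JSON.stringify({ status: "complete", chunks: seq }), {
    headers: { "Content-Type": "application/json" },
  });
});
```

A production version would batch inserts (or update a single row per message) to keep write volume sensible, but the shape of the idea stays the same: the database table is the transport between the stream producer and your UI.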

Two Key Architectural Flavours

I’ve found two main ways to structure this, depending on your needs:

Approach 1: n8n as the Primary Orchestrator (UI → n8n → Edge Function → LLM → DB Stream → UI)

1. Your frontend app sends the user’s message to your main n8n webhook.
2. n8n does its usual magic: pre-processing, context retrieval (RAG), determining which tools might be needed, etc.
3. n8n then makes an HTTP request to a dedicated Supabase Edge Function, passing along the prepared prompt and any necessary context.
4. The Edge Function calls the LLM API. As the LLM streams back its response, the Edge Function writes each chunk (or a series of chunks) to a specific row/table in your Supabase database.
5. Your UI, which established a Realtime subscription to that database location when the message was first sent, receives these chunks as they arrive and updates the chat display (see the subscription sketch below).
6. Once the stream is complete, the Edge Function can inform n8n, and n8n can perform post-processing (logging, token counts, etc.) and send a final HTTP 200 response back to the UI (which can carry final metadata like costs or transcription).

This keeps n8n firmly in control of the overall workflow logic, which is often desirable.
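For reference, the Realtime subscription from step 5 might look something like this on the client, using supabase-js and the same hypothetical llm_stream_chunks table as the Edge Function sketch above (Realtime has to be enabled for that table):

```ts
import { createClient } from "@supabase/supabase-js";

// Browser-safe keys; the values here are placeholders.
const supabase = createClient("https://your-project.supabase.co", "your-anon-key");

function subscribeToStream(sessionId: string, onChunk: (text: string) => void) {
  const channel = supabase.channel(`llm-stream-${sessionId}`);

  channel
    .on(
      "postgres_changes",
      {
        event: "INSERT",
        schema: "public",
        table: "llm_stream_chunks",
        filter: `session_id=eq.${sessionId}`,
      },
      (payload) => {
        const row = payload.new as { content: string; done?: boolean };
        if (row.content) onChunk(row.content);        // append to the chat display
        if (row.done) supabase.removeChannel(channel); // stream finished, clean up
      },
    )
    .subscribe();

  return channel;
}
```

The ordering matters: open the subscription first, then POST the user’s message to the n8n webhook, so no chunks slip past before the channel is live.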

Approach 2: Edge Function as the Primary Orchestrator (UI → Edge Function → n8n [for tasks] → LLM → DB Stream → UI)

1. Your UI sends the user’s message directly to the Supabase Edge Function.
2. The Edge Function can make calls to n8n webhooks for discrete pre-processing tasks (e.g., “fetch context for this user query”), as sketched below.
3. Once pre-processing is done, the Edge Function calls the LLM and streams the response to the Supabase database, just like in Approach 1.
4. The UI, again, picks this up via its Realtime subscription.
5. The Edge Function can call n8n again for any post-processing tasks.

This approach can be beneficial if you’re aiming for the lowest possible latency before the stream begins, or if you’re planning for more complex real-time bidirectional communication. For instance, this is the path I’d lean towards for voice chat, where audio chunks might be streamed to the Edge Function, which orchestrates n8n for context and then streams audio back from the LLM. Your UI could even intelligently switch between these two approaches based on the interaction type.
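To illustrate the hand-off in step 2, the Edge Function’s call out to n8n for pre-processing is just a webhook request. The URL and the response shape (a workflow that answers with { context: ... }) are hypothetical:

```ts
// Sketch of the Approach 2 hand-off: the Edge Function asks an n8n webhook
// for context before it starts streaming from the LLM.
async function fetchContextFromN8n(userQuery: string): Promise<string> {
  const res = await fetch("https://your-n8n-instance.com/webhook/fetch-context", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: userQuery }),
  });

  if (!res.ok) {
    throw new Error(`n8n pre-processing webhook failed: ${res.status}`);
  }

  // Assumes the workflow responds with { context: "..." }.
  const { context } = await res.json();
  return context;
}
```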

The Journey: Why I Pivoted My Approach

For this test I used the chat interface I’ve been building in Svelte:

[Screenshot: the custom Svelte chat interface]

You can read more about why I ended up coding my own chat UI rather than using OpenWeb UI here: https://demodomain.dev/2025/05/22/the-shifting-sands-of-the-ai-landscape/

I initially went down the path of UI → Edge Function (Approach 2) for standard text chat. However, the complexity quickly ramped up, especially with:

  • Iterative Function Calls: LLMs that use tools/function calls require a loop. The LLM responds with a function call request, your system executes it, sends the result back, and the LLM continues. Managing this loop, where the Edge Function had to repeatedly call n8n, collect results, and maintain state, became quite involved.
  • Data Volume: Passing potentially large amounts of data (like Base64 encoded images or extensive context) between the Edge Function and n8n multiple times per user turn was cumbersome.
  • n8n Debugging: Trying to inspect n8n executions with 10-20MB of JSON data looping through workflows was a browser-crashing nightmare!

This led me to favour Approach 1 (UI → n8n → Edge Function) for most text-based chat scenarios, as it centralises more of the complex state management within n8n, which is better equipped for it, while still achieving the streaming UX.

Is This a “Hack”? Or a Viable Solution?

You might wonder if this is just a complicated hack. I’d argue it’s a pragmatic solution that leverages the strengths of each platform: n8n for workflow automation and Supabase for its excellent serverless functions and real-time capabilities. It’s not misusing Supabase; it’s using its features as intended to bridge a gap in n8n’s current feature set. The core Supabase services are solid and designed for this kind of real-time data propagation.

What If I’m Not Using Supabase?

This specific solution is, admittedly, tailored for those using or willing to adopt Supabase. Could you build something similar without it? Yes, but you’d be looking at:

  • Setting up and managing your own dedicated streaming server (e.g., a Node.js/Python app with WebSockets or SSE in its own container).
  • Implementing your own database and real-time update mechanism if you want to decouple the stream generation from the client connection.
  • More complex infrastructure and orchestration.

For those already in the n8n + Supabase world, this pattern offers a significantly smoother path.

Wrapping Up

Achieving a truly interactive, streaming chat experience with n8n as your backend is possible. It requires a bit of architectural creativity, but by offloading the direct LLM stream handling to Supabase Edge Functions and using Realtime subscriptions, you can deliver the UX users expect.

The beauty of this is the flexibility. You can keep n8n at the heart of your complex logic while still getting that responsive UI.

This has been a journey of trial, error, and refinement for me. The complexities, especially around managing iterative function calls and large data payloads within these streaming loops, are non-trivial.
