Hey fellow n8n enthusiasts!
If you’ve been building AI chat applications with n8n as your backend, you’ve undoubtedly hit the same wall I did: getting those slick, character-by-character streaming responses from your LLM into your custom UI. We see it with ChatGPT, Claude, and others – that immediate feedback is crucial for a good user experience. Waiting for the full n8n workflow to complete before anything appears on screen can feel like an eternity, especially when your workflows involve RAG, tool use, or complex context injection.
The bad news? n8n, for all its power in workflow automation, is NOT natively built for streaming HTTP responses. Its sequential, node-by-node execution model is fantastic for many tasks, but it’s a fundamental blocker for true LLM streaming.
The good news? I’ve been wrestling with this and have landed on a robust architectural pattern that brings that smooth streaming experience to n8n-powered UIs, primarily by leveraging the power of Supabase Edge Functions and Realtime subscriptions.
Many have pondered: “Can’t I just use a Code node to import a WebSocket library and handle the stream there?” While the Code node is incredibly versatile, it still operates within n8n’s sequential flow. It will likely wait for the entire streaming interaction to finish before passing a complete result to the next node, or it simply won’t output the stream in a way your UI can consume live. The bottom line is, direct streaming out of a standard n8n workflow is problematic.
Before we dive into the Supabase solution, it’s helpful to understand the common ways web applications typically handle real-time data streaming. This context highlights why n8n’s architecture presents a challenge and why we need to look at alternative patterns.
WebSockets provide a persistent, full-duplex (two-way) communication channel between a client (like your web browser) and a server. Once established, both the client and server can send data to each other at any time, making it highly efficient for truly interactive applications like live chat, online gaming, or collaborative editing. For LLM streaming, a WebSocket could theoretically allow the server to push text chunks to the UI as they’re generated. However, managing WebSocket connections, especially at scale, and integrating them into a non-streaming-native backend like n8n, requires careful server-side setup.
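To make that concrete, here's a minimal sketch of a browser-side WebSocket client; the endpoint and message shape are placeholders for illustration, not part of the solution described later:

```typescript
// Minimal browser-side WebSocket client (hypothetical endpoint).
const socket = new WebSocket("wss://example.com/chat");

socket.addEventListener("open", () => {
  // Full-duplex: the client can send whenever it likes once connected.
  socket.send(JSON.stringify({ type: "user_message", text: "Hello" }));
});

socket.addEventListener("message", (event) => {
  // The server can push LLM text chunks as they are generated.
  console.log("chunk:", event.data);
});
```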
Server-Sent Events (SSE) is a simpler, unidirectional technology where the server can push data to the client over a standard HTTP connection, but the client cannot send data back to the server over that same SSE connection (it would use a separate HTTP request for that). SSE is often easier to implement than WebSockets and is an excellent fit for scenarios where the server needs to send a continuous stream of updates to the client, such as news feeds, live score updates, and – importantly for us – LLM response streaming. Many LLM APIs offer SSE as their streaming protocol. The challenge remains: how does n8n, as an intermediary, handle and forward an SSE stream from an LLM to the UI?
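On the client side, consuming an SSE stream is as simple as the browser's built-in EventSource API; again, the endpoint here is purely illustrative:

```typescript
// Minimal browser-side SSE consumer (hypothetical endpoint).
// Unidirectional: the server pushes events, the client only listens.
const source = new EventSource("https://example.com/llm-stream");

source.onmessage = (event) => {
  // Each event carries the next chunk of the LLM response.
  console.log("chunk:", event.data);
};

source.onerror = () => source.close();
```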
Long polling is an older technique used to simulate a server push. The client makes an HTTP request to the server, and the server holds that request open until it has new data to send. Once data is sent (or a timeout occurs), the client immediately makes another request. While it can achieve a semblance of real-time updates, it’s less efficient than WebSockets or SSE due to the overhead of repeated HTTP requests and potential latency. It’s generally not the preferred method for high-performance streaming like LLM responses.
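For completeness, a long-polling loop looks roughly like this (hypothetical endpoint), which makes the repeated-request overhead obvious:

```typescript
// Rough long-polling loop: the server holds each request open until it has
// new data (or times out), then the client immediately asks again.
async function poll(): Promise<void> {
  while (true) {
    const res = await fetch("https://example.com/poll");
    if (res.ok) {
      console.log("update:", await res.text());
    }
  }
}
poll();
```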
Given these options, the ideal scenario for LLM streaming involves the backend being able to efficiently manage and forward data chunks received from the LLM (often via SSE from the LLM API itself) directly to the client. This is where n8n’s standard request-response model for its nodes hits a limitation.
Instead of trying to force n8n to do something it’s not designed for, this approach offloads the actual LLM communication and stream handling to a Supabase Edge Function. Here’s the gist:
Supabase Edge Functions: Think of these as nimble, serverless functions (running Deno) that you can deploy with ease. They can make API calls (like to your LLM), run your TypeScript/JavaScript logic, and integrate seamlessly with the Supabase ecosystem. For our purposes, an Edge Function becomes our dedicated “streaming agent.”
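If you haven't written one before, the skeleton of an Edge Function is just a Deno HTTP handler. The function name and fields below are illustrative, not prescribed:

```typescript
// supabase/functions/llm-stream/index.ts (skeleton; names are illustrative)
Deno.serve(async (req) => {
  const { messageId, prompt } = await req.json();

  // Call the LLM here and write chunks to the database (sketched further down).

  return new Response(JSON.stringify({ status: "streaming", messageId }), {
    headers: { "Content-Type": "application/json" },
  });
});
```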
Supabase Realtime: This feature lets your client-side application subscribe to changes in your Supabase database. When data is updated or inserted, your UI gets notified instantly.
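On the client, a Realtime subscription with supabase-js looks something like this; the table name "chat_chunks" and its columns are an assumption for illustration:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and anon key.
const supabase = createClient("https://your-project.supabase.co", "public-anon-key");

// Listen for new rows on a hypothetical "chat_chunks" table and append
// each chunk to the message currently being rendered in the chat UI.
supabase
  .channel("chat-stream")
  .on(
    "postgres_changes",
    { event: "INSERT", schema: "public", table: "chat_chunks" },
    (payload) => {
      const row = payload.new as { message_id: string; content: string };
      console.log("chunk for", row.message_id, ":", row.content);
    },
  )
  .subscribe();
```

If you keep a sequence column on the chunk rows, ordering by it when rendering protects you against the odd out-of-order delivery.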
The core idea is to have the Edge Function call the LLM, receive the stream, and write it chunk by chunk into a Supabase database table. Your UI, subscribed to this table, then picks up these chunks in real-time and displays them.
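Inside the Edge Function, the core loop is simply: read the LLM's stream, insert each chunk as a row. Here's a sketch assuming a hypothetical chat_chunks table (message_id, seq, content) and an LLM endpoint that streams plain text; real LLM APIs usually stream SSE-framed JSON that you'd parse before inserting:

```typescript
import { createClient } from "npm:@supabase/supabase-js@2";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

async function streamToTable(messageId: string, prompt: string): Promise<void> {
  // Hypothetical LLM endpoint that streams raw text chunks.
  const res = await fetch("https://llm.example.com/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let seq = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk becomes its own row; the UI's Realtime subscription
    // sees the INSERT and appends the text immediately.
    await supabase.from("chat_chunks").insert({
      message_id: messageId,
      seq: seq++,
      content: decoder.decode(value, { stream: true }),
    });
  }
}
```

Writing a row per chunk is the simplest version; batching a few chunks per insert cuts down on database writes at the cost of slightly chunkier updates.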
I’ve found two main ways to structure this, depending on your needs.

Approach 1: UI → n8n → Edge Function
1. Your frontend app sends the user’s message to your main n8n webhook.
2. n8n does its usual magic: pre-processing, context retrieval (RAG), determining which tools might be needed, etc.
3. n8n then makes an HTTP request to a dedicated Supabase Edge Function, passing along the prepared prompt and any necessary context (a sketch of this payload appears below).
4. The Edge Function calls the LLM API. As the LLM streams back its response, the Edge Function writes each chunk (or a series of chunks) to a specific row/table in your Supabase database.
5. Your UI, which established a Realtime subscription to that database location when the message was first sent, receives these chunks as they arrive and updates the chat display.
6. Once the stream is complete, the Edge Function can inform n8n, and n8n can perform post-processing (logging, token counts, etc.) and send a final HTTP 200 response back to the UI (which can carry final metadata like costs or transcription).
This keeps n8n firmly in control of the overall workflow logic, which is often desirable.
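In practice, step 3 is just an HTTP Request node in n8n posting a small JSON body to the Edge Function. Here's a sketch of what that payload might look like; the field names are illustrative, not prescribed:

```typescript
// Hypothetical shape of the body the n8n HTTP Request node sends to the
// Edge Function in step 3 of Approach 1.
interface StreamRequest {
  messageId: string;    // the row/channel the UI is already subscribed to
  prompt: string;       // prompt assembled by n8n (RAG context, tool hints, etc.)
  callbackUrl?: string; // optional n8n webhook to call when the stream finishes
}
```

The messageId is what ties everything together: the UI subscribes on it before the request is made, and the Edge Function uses it for every chunk it writes.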
Approach 2: UI → Edge Function → n8n

1. Your UI sends the user’s message directly to the Supabase Edge Function.
2. The Edge Function can make calls to n8n webhooks for discrete pre-processing tasks (e.g., “fetch context for this user query”).
3. Once pre-processing is done, the Edge Function calls the LLM and streams the response to the Supabase database, just like in Approach 1.
4. The UI, again, picks this up via its Realtime subscription.
5. The Edge Function can call n8n again for any post-processing tasks.
This approach can be beneficial if you’re aiming for the lowest possible latency before the stream begins, or if you’re planning for more complex real-time, bidirectional communication. For instance, this is the path I’d lean towards for voice chat, where audio chunks might be streamed to the Edge Function, which then orchestrates n8n for context and then streams audio back from an LLM. Your UI could even intelligently switch between these two approaches based on the interaction type.
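From the UI side, Approach 2 just means invoking the Edge Function directly, with the Realtime subscription from earlier already in place. A sketch, assuming the supabase client from the earlier snippet and an illustratively named function:

```typescript
// Assumes the `supabase` client from the earlier Realtime sketch.
async function sendMessage(messageId: string, userInput: string) {
  // Invoke the (illustratively named) Edge Function directly from the UI.
  const { data, error } = await supabase.functions.invoke("llm-stream", {
    body: { messageId, text: userInput },
  });
  if (error) console.error("Edge Function call failed:", error);
  return data;
}
```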
For this test I used the chat interface I’ve been building in Svelte.
You can read more about why I ended up coding my own chat UI rather than using OpenWeb UI here: https://demodomain.dev/2025/05/22/the-shifting-sands-of-the-ai-landscape/
I initially went down the path of UI → Edge Function (Approach 2) for standard text chat. However, the complexity quickly ramped up, especially around managing iterative function calls and large data payloads within the streaming loop.
This led me to favour Approach 1 (UI → n8n → Edge Function) for most text-based chat scenarios, as it centralises more of the complex state management within n8n, which is better equipped for it, while still achieving the streaming UX.
You might wonder if this is just a complicated hack. I’d argue it’s a pragmatic solution that leverages the strengths of each platform: n8n for workflow automation and Supabase for its excellent serverless functions and real-time capabilities. It’s not misusing Supabase; it’s using its features as intended to bridge a gap in n8n’s current feature set. The core Supabase services are solid and designed for this kind of real-time data propagation.
This specific solution is, admittedly, tailored for those using or willing to adopt Supabase. Could you build something similar without it? Yes, but you’d essentially be rebuilding the same pieces yourself: a serverless function or small service to handle the LLM stream, plus your own real-time channel (WebSockets or SSE) and data store to push the chunks to the UI.
For those already in the n8n + Supabase world, this pattern offers a significantly smoother path.
Achieving a truly interactive, streaming chat experience with n8n as your backend is possible. It requires a bit of architectural creativity, but by offloading the direct LLM stream handling to Supabase Edge Functions and using Realtime subscriptions, you can deliver the UX users expect.
The beauty of this is the flexibility. You can keep n8n at the heart of your complex logic while still getting that responsive UI.
This has been a journey of trial, error, and refinement for me. The complexities, especially around managing iterative function calls and large data payloads within these streaming loops, are non-trivial.