I come from a background of 30+ years in coding, before the days of the internet, when machine language and C++ were the thing. I’ve seen the evolution to web apps and smartphones, then to low-code and no-code options, and now “vibe” coding with AI. I embrace it all. I’m pleased to have a solid understanding of coding logic and methodologies, and of how things operate “under the hood”. These core principles haven’t been replaced; they’ve merely been abstracted away from the user. This is great for people who want to get involved in building something, but it results in many not truly understanding the processes involved, and often not being able to differentiate between hype and real solutions.
The AI revolution is an exciting and rapidly evolving landscape, full of hype and solutions. Commercial platforms offered by OpenAI, Anthropic, and Google allow anyone to “chat with AI”, but for the business user, the frequent changes to features, pricing models, API rate limits, model behaviours, and even service reliability can cause headaches. It makes it impractical to commit long-term to a single provider. Features you rely on might disappear or change detrimentally, while new, desirable features (like improved coding intelligence, memory retention, lower costs, or faster speeds) might emerge on competing platforms. Consequently, users often find themselves needing accounts across multiple services, juggling different interfaces, and migrating histories and documents, which is cumbersome. This constant flux creates a fundamental need for flexibility and adaptability in how users interact with AI models.
On the flip side of “all in one” platforms there’s customisation, and thank god we no longer need a team of engineers and a room full of computers to build custom technology solutions. Low-code solutions are revolutionary, and YouTube is a wonderland of quick-fix AI customisations. Titles like “Build an AI agent in 7 minutes” or “Automate your sales calls with AI” promise instant customised workflows guaranteed to deliver results with effortless implementation.
However, these videos often showcase proof-of-concept demos, far removed from the realities of deploying robust, production-ready systems. The marketing often oversells simplicity while glossing over the true complexity and hidden costs involved in making these customised systems truly effective. It’s a sobering reality for those diving in expecting quick wins, only to find themselves managing a spiderweb of interconnected services, getting lost in rabbit-holes of weekend tinkering, always feeling like the final result is only a few more hours work away. These approaches often position tools like n8n as the entire “solution,” when in reality, n8n is a powerful orchestration tool that needs to be part of a larger, well-architected system.
The wording used in these YouTube videos is key to the mindset: “I built the ultimate RAG agent”. I think most people don’t really know what RAG is or how it works, which reflects the problem of abstracting technical complexity away from the user. RAG is a methodology and a system, so I’m not sure what a “RAG agent” is. But the “ultimate” version of that “RAG agent” differs for everyone, and it takes more than a 34-minute video using only seven n8n nodes to get a production-ready system in place. So while effortless customised solutions are now more available to everyone, the fundamental concept of using the right tool for the job, and the underlying technical complexities, do still need to be considered.
However, I do think all these videos are absolutely fantastic and they’re helping so many people – including non-tech people – get involved in an exciting new era.
I was excited about Open WebUI as it is an attractive solution to some of these problems. It provides a consistent, well-regarded user interface that can connect to various backend LLMs – including local, open-source models – via API keys, offering multi-model support and user/group management features. It aims to abstract away the differences between underlying platforms.
And for many simpler use-cases Open WebUI is fantastic. However, Open WebUI itself becomes a platform with its own form of lock-in. While it connects to multiple models, it doesn’t solve the underlying issues of those models’ changing features, pricing, or limitations, and the burden of selecting the appropriate model still rests with the user. And while it offers customisation, only certain aspects are customisable, and many users hit these limitations early on.
Users become dependent on the Open WebUI development roadmap. If a backend model introduces a new feature (like ChatGPT’s Memories), users cannot leverage it through Open WebUI until the Open WebUI team decides to implement support, and even then, the implementation might not match the user’s specific needs. Open WebUI is popular because users can code their own (Python) “pipes” and “filters”, customising the backend logic. But customising the Open WebUI interface is simply not possible unless you have coding abilities and are prepared to fork the codebase, which presents maintenance challenges. In almost all of the real-world situations where my customers indicate they’re interested in Open WebUI, they ultimately want some kind of extra button, or some custom visual feedback, or they have some other critical requirement that hinges on being able to customise the UI. As AI moves more toward the idea of “agents”, a “chat bot” or “chat with document” is merely one level of interaction the user wants to have with a model. Everything else requires custom interface controls.
Below I’ve included a screenshot of the most recent bug reports and feature requests on Open WebUI’s GitHub page. I’m not cherry-picking: if you check out GitHub you’ll see a range of requests for Open WebUI to add user interface support for new features available in the latest models. I love Open WebUI, it’s free, and it has a great community. But it’s just not possible for them (actually, it’s really only one person) to keep up with the pace of development in large language models.
Being able to connect to various models via API is fantastic, but Open WebUI still operates within a paradigm where an “assistant” is tied to a specific model. This requires the user to be aware of each model’s capabilities and restrictions and pre-decide which model to use for a particular assistant. Given the rapid evolution of models, with changes occurring weekly, the choices are complex and often under-documented.
While the industry may eventually consolidate and simplify model selection, we are currently at a point where users need to understand the intricate matrix of features, benefits, and particularly the costs of various models. This means the user still needs to constantly modify the configured assistants, switching between models as more appropriate models for the task at hand come to light.
This is really a critical issue: while Open WebUI introduced a single UI for many models, and promoted in its name and marketing that it’s a “UI”, it baked in RAG functionality as well. This no longer makes Open WebUI a “user interface”, but more like a platform where the interface and logic are combined. Unfortunately, it provides little ability for the user to customise the RAG logic, and in my experience there is no such thing as a “one size fits all” RAG workflow. I wonder how many people were drawn in by the problem Open WebUI set out to solve (having a single UI for multiple models) but got caught up in the included RAG functionality, spending hours, days, or weeks trying to customise the RAG process because they weren’t getting good results for their particular context. In my experience, RAG is the single most important feature of an integrated AI system but also the single most complex feature, requiring careful consideration and, almost always, customisation.
The Rigidities of Open WebUI’s RAG Implementation
It’s worth detailing the issues with Open WebUI’s RAG implementation because they pertain to any offered solution. While Open WebUI offering RAG is a plus, its implementation lacks the flexibility required for diverse, real-world use cases. Key issues include:
Side note: There is significant focus and discussion surrounding RAG and like the hype around AI in general, it’s important to note that AI, and RAG, are not “solutions”. They are tools, and are short-hand terms to describe complex processes. AI doesn’t do anything unless you build a tech stack around it. The same goes for RAG. I think many believe they need to integrate RAG to solve their context issues. Firstly, RAG isn’t always a requirement. There’s a tendency to assume RAG is necessary, but for smaller document sets, injecting full document text into the prompt can yield better results and avoid the significant overhead associated with implementing, tuning, and maintaining complex RAG pipelines. For structured database stores, traditional SQL searches are better-suited. Secondly, if RAG is required, there is no “one size fits all” implementation. Careful analysis of your data sources is required, workflows must be considered and business goals identified.
And to be clear, I know there are answers to the above issues. People will say, “Just connect n8n as your backend”, or “connect an external RAG system”. Absolutely, and that is what makes Open WebUI so powerful. But to set up all that linking to other systems you need to write Python pipes, and the result is that you use Open WebUI purely as a UI – which sounds great – but after all that hard work in customisation you are still left with a UI that can’t be customised to support your custom back-end.
Users ultimately want to focus on obtaining suitable outputs from models – not spending time researching the latest models, being limited by the UI, tinkering with the myriad RAG configurations for a particular use-case, or trying to build apparently simple n8n workflows. The bottom line is that it’s too early to offer a “one size fits all” platform, or RAG workflow, or even a front-end UI. The industry is still too fragmented to build useful, AI-centric business applications without significant customisation.
I still use Open WebUI and definitely recommend it for certain use-cases. But like the “ultimate RAG agent”, there is no “ultimate AI interface”. I think many YouTubers don’t actually realise the complexities because they’re not from a coding background, only know n8n, or have only just discovered Open WebUI and are excited about it. It’s possible many of them don’t have much real-world experience in designing and implementing production-level business solutions. But again, I applaud everyone for getting in and adding knowledge to the pool, and for many use-cases the quick and easy linking of Open WebUI to n8n (in 7 minutes) does solve people’s problems.
While Open WebUI has done a great job of separating the UI from the models, the AI industry is evolving so rapidly that further decoupling is required, and there needs to be greater control over customisation – particularly customising the user interface. I propose a solution that separates concerns into distinct, customisable layers: the user interface (Svelte), the logic and orchestration (n8n), and the data and storage (Supabase).
This decoupled approach aims to provide maximum flexibility. I’ve spent the last year building out and refining this tech stack. The stack uses free, open-source systems, so the only cost is the server to host them.
The emphasis is on providing a methodology and a flexible tech stack template that requires consultation and customisation, rather than an off-the-shelf product. The stack has been built with customisation in mind, leverages no-code platforms, and is designed to support whatever outcomes a user may want.
Critically, each layer can be modified or replaced independently, providing maximum future-proofing, allowing the system to adapt to the rapidly changing AI landscape. The connections between these layers are kept fairly standardised and generic, meaning you can swap out any layer – use a different UI, replace n8n with another backend tool, or use a different database – without complex rework. I’ve tested this flexibility by successfully using Retool, Open WebUI, and even Telegram as frontend chat interfaces interacting with the n8n backend, demonstrating that the n8n workflows can indeed serve as a backend for any UI or platform such as Slack, MS Teams, and others.
But picking one interface is actually not required. In my experience, users prefer different interfaces for different use-cases. Quickly sending voice recordings to a model for transcription and saving in a “notes” file on Google Drive is best done using Telegram on their phone. Image generation is best done using the full Svelte interface or Open WebUI. Getting summaries of all project chats is best done directly in Slack. Almost anything can be used as the user interface, and all your chats are available across all chosen interfaces because all the logic is centralised within n8n and all the storage within Supabase. There’s no “platform” here; only an easily re-configurable tech stack.
Below are three different platforms, each connecting to the same n8n back-end: a chat app designed in Retool, Telegram, and Open WebUI:
In terms of model choice, I’ve built the front-end to allow users to select a model for an assistant, but crucially, the backend includes the framework for dynamically switching models based on the context of the interaction. I’ve implemented the first and most significant decision that can be abstracted away from the user: selecting a model based on the documents they wish to send. My experience shows this is a key pain point, as models have varying support for different file types, sizes, and total files per prompt, as well as varying context window limitations. I believe the user shouldn’t have to manage these technical constraints. Therefore, the front-end accepts any file type, and the backend currently converts files into a format compatible with the initially selected model.
However, the underlying structure is in place to allow the backend to select an entirely different model if necessary. My intention is to progressively implement more sophisticated prompt and file pre-processing, along with on-the-fly model switching, over time.
Document management is handled dynamically using a familiar source like Google Drive, polled by n8n for updates, ensuring the knowledge base remains current without requiring a dedicated, static upload interface within the chat application itself.
For administration interfaces, Retool is used because it’s free and very quick to ramp up a custom admin interface specific to your business needs. Below is an example of the Retool page for linking documents from Google Drive to an Assistant. These sorts of administration tasks are best handled with dedicated low-code or no-code tools, outside of the chat interface entirely.
For a direct comparison with Open WebUI: the no-code platform n8n replaces Open WebUI’s Python “pipes” and “filters”, and the RAG baked into Open WebUI is handled by n8n and Supabase. The only coding required is if you want to customise the UI, but because it’s purely a UI (no logic), it’s very lightweight and easy to customise, particularly using AI as a coding assistant. I find this combination no more complex than Open WebUI; the key complexity in Open WebUI lies in the Python pipes, while in this proposed solution it lies in the UI code.
The user interface, built with Svelte, intentionally mirrors the look and feel of Open WebUI. This leverages user familiarity and acceptance while serving as a solid foundation. It includes standard features like user authentication (via Supabase Auth, so the UI doesn’t need to include any of that logic), user roles (Admin/Standard), the ability to define models, and the concept of “Assistants” which pre-link specific models with selected knowledge base documents. The core chat interface remains conventional.
The primary improvement lies in the enhanced sidebar and knowledge base input methods. Alongside the standard chat list, the sidebar prominently displays a list of all available documents sourced from the knowledge base (initially configured to pull from Supabase / Google Drive, but designed to be adaptable to other sources).
Enhancements include search and filtering. Crucially, this sidebar allows users to dynamically select context for their next prompt. When an Assistant is chosen, its pre-associated documents are highlighted. Users can then de-select these, add others from the main list, and critically, toggle the usage mode for each selected document between ‘Full Document’ (sending the entire file, potentially binary) and ‘RAG’ (using vector search). Visual icons clearly indicate the selected mode. Smart defaults are applied (e.g., PDFs, Excel files, images default to ‘Full’), with constraints (images cannot use RAG) and user overrides possible.
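To illustrate the kind of logic behind those smart defaults, here is a minimal sketch; the type names and function are hypothetical, not the actual Svelte code:

```typescript
// Hypothetical sketch of per-document usage-mode defaults.
type UsageMode = "full" | "rag";

interface KnowledgeDoc {
  id: string;
  filename: string;
  mimeType: string;
}

// Images can never use RAG; PDFs and spreadsheets default to 'full' but can be overridden.
function canUseRag(doc: KnowledgeDoc): boolean {
  return !doc.mimeType.startsWith("image/");
}

function defaultUsageMode(doc: KnowledgeDoc): UsageMode {
  const fullByDefault = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", // .xlsx
  ];
  if (!canUseRag(doc)) return "full";
  return fullByDefault.includes(doc.mimeType) ? "full" : "rag";
}
```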
Beyond documents, the sidebar incorporates other context sources, demonstrating a modular vision. Users can input a URL, choosing whether n8n should process its ‘Text’ content or generate a ‘Screenshot’ (via Cloud Convert). A proof-of-concept allows inputting a Zoho Projects Task ID, triggering n8n to fetch task details as context so the user can “chat with task”. The design anticipates future plugins for diverse sources like Slack.
Input handling is significantly enhanced to overcome common frustrations. Users can combine multiple input types within a single message submission: typed text, recorded voice audio, dragged-and-dropped files, and inline pasted images. Pasted images appear visually within the text editor, and the UI sends the images inline with the text, which provides clearer context to the LLM than simply appending images to the end of the prompt. Audio recording is captured, but transcription is deferred to n8n; the user retains manual control over sending the message after recording, allowing further edits or additions.
The Svelte layer remains purely a UI, performing minimal logic beyond managing selections, interacting with Supabase for data, and packaging the comprehensive user input (text with placeholders, raw audio, file data/references with modes, URLs with modes, Task IDs, model selection) for the n8n backend. It deliberately avoids any preprocessing like text extraction, file conversion, or transcription.
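As a rough illustration of that packaging step, a payload along these lines could be sent to the n8n webhook; the field names here are assumptions for the sake of the example, not the exact contract:

```typescript
// Illustrative shape of the payload the Svelte UI might POST to the n8n webhook.
interface ChatPayload {
  sessionId?: string;                           // omitted on the first turn; n8n creates one
  assistantId?: string;
  modelId: string;                              // model selected for this Assistant
  text: string;                                 // typed text, with placeholders marking inline image positions
  audio?: { mimeType: string; base64: string }; // raw recording; transcription happens in n8n
  files: Array<{
    id?: string;                                // knowledge-base document ID, if already stored
    filename: string;
    mimeType: string;
    base64?: string;                            // present for dragged/dropped or pasted content
    mode: "full" | "rag";
  }>;
  urls: Array<{ url: string; mode: "text" | "screenshot" }>;
  zohoTaskId?: string;
}
```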
The Svelte UI is designed to provide clear feedback and control to the user. It provides token estimation for selected documents, drag-dropped documents, and pasted images. This uses Google’s token counting API endpoint, and for OpenAI, custom code in n8n replicates their exact token calculation method for images. For non-image files, the token count is estimated as bytes divided by 4.
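A minimal sketch of that estimation logic, assuming Google’s Gemini countTokens endpoint for text and an illustrative placeholder for images (the real stack replicates OpenAI’s image-token formula in n8n):

```typescript
// Sketch only: bytes/4 heuristic for non-image files, plus an example call to
// Google's countTokens endpoint for text. The image value is a placeholder for
// the OpenAI image-token formula replicated in n8n.
function estimateFileTokens(file: { type: string; size: number }): number {
  if (file.type.startsWith("image/")) {
    return 800; // placeholder; the real calculation depends on image dimensions and detail level
  }
  return Math.ceil(file.size / 4); // non-image files: roughly one token per four bytes
}

async function countTokensWithGoogle(text: string, apiKey: string): Promise<number> {
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:countTokens?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ contents: [{ parts: [{ text }] }] }),
    }
  );
  const body = await res.json();
  return body.totalTokens;
}
```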
A visual outline gauge shows the context window and how many tokens will be used on the next turn, with a tooltip providing a detailed breakdown.
Furthermore, a calculator icon appears under each message in the chat history, allowing users to see the exact token usage and exact cost for that specific turn on hover.
And user monthly budgets can be set and users can see their current monthly spend in their profile.
To provide transparency into backend processes, a spanner icon appears under LLM chat bubbles when tool calls or function calls were made during that turn. Hovering over this icon displays a tooltip showing the exact function calls made.
The n8n workflow acts as the central brain, receiving the rich payload from Svelte via a webhook. It begins with session management, checking for an existing chat session ID or creating a new one in Supabase. The workflow then meticulously deconstructs the incoming payload, identifying and separating typed text, audio data, various file types (dragged/dropped, pasted inline), document IDs flagged for ‘Full’ or ‘RAG’ use, URLs with their modes, and specific tool inputs like Zoho Task IDs.
A key feature is the context caching mechanism implemented using Supabase. As inputs are processed or generated (e.g., file conversions, URL screenshots, Zoho task details), the resulting data is cached with a unique ID. This serves both performance and, more importantly, context persistence. Recognising that non-textual context (images, full documents) sent in system prompts is often lost after a single LLM turn, this cache allows context to be explicitly recalled by the model via a function call.
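A minimal sketch of the caching idea, assuming a hypothetical context_cache table in Supabase:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Each processed context item (converted file, URL screenshot, Zoho task detail, ...)
// is stored once under a unique ID so later turns can recall it.
async function cacheContextItem(item: {
  id: string;                  // unique cache ID, later listed in the system prompt
  sessionId: string;
  kind: "file" | "url_text" | "url_screenshot" | "zoho_task";
  filename?: string;
  mimeType?: string;
  content: string;             // text, or base64 for binary content
}) {
  const { error } = await supabase.from("context_cache").upsert({
    id: item.id,
    session_id: item.sessionId,
    kind: item.kind,
    filename: item.filename ?? null,
    mime_type: item.mimeType ?? null,
    content: item.content,
    created_at: new Date().toISOString(),
  });
  if (error) throw error;
}
```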
A getPreviousContext tool is made available to the LLM. The system prompt includes a list of previously sent context items (with IDs, filenames, sizes where applicable) that are no longer actively selected by the user. If the LLM determines it needs access to one of these past items based on the user’s query, it can call the getPreviousContext tool with the relevant ID. The n8n workflow retrieves the item from the Supabase cache and returns it to the LLM, effectively re-injecting the past context into the current turn.
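Here’s a sketch of how such a tool could be declared and resolved, using an OpenAI-style function schema and the hypothetical cache table from the previous sketch:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Tool declaration passed to the LLM alongside the system prompt's list of
// previously sent (but no longer selected) context items.
const getPreviousContextTool = {
  type: "function",
  function: {
    name: "getPreviousContext",
    description:
      "Retrieve a previously sent context item (document, image, URL capture, task) by its cache ID.",
    parameters: {
      type: "object",
      properties: {
        contextId: { type: "string", description: "ID of the cached context item to recall" },
      },
      required: ["contextId"],
    },
  },
};

// When the LLM calls the tool, the workflow fetches the item from the cache
// and returns it as the tool result, re-injecting the past context.
async function handleGetPreviousContext(contextId: string) {
  const { data, error } = await supabase
    .from("context_cache")
    .select("kind, filename, mime_type, content")
    .eq("id", contextId)
    .single();
  if (error || !data) return { error: `No cached context found for ID ${contextId}` };
  return data;
}
```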
Before processing files selected for ‘Full Document’ mode, the workflow checks the target LLM’s capabilities (stored in Supabase). If necessary, it performs file conversions using n8n’s built-in nodes or the Cloud Convert API, caching the converted result to avoid redundant processing. Google Drive documents linked via Assistants are fetched using the Google Drive node, leveraging its on-the-fly conversion capabilities (e.g., Sheet to CSV).
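A sketch of that capability check, assuming a hypothetical model_capabilities table:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Returns true when the file must be converted (via n8n nodes or Cloud Convert)
// before it can be sent to the target model as a full document.
async function conversionNeeded(modelId: string, mimeType: string, sizeBytes: number): Promise<boolean> {
  const { data, error } = await supabase
    .from("model_capabilities")
    .select("supported_mime_types, max_file_bytes")
    .eq("model_id", modelId)
    .single();
  if (error || !data) throw new Error(`No capability record for model ${modelId}`);

  const supported = (data.supported_mime_types as string[]).includes(mimeType);
  const withinLimit = sizeBytes <= data.max_file_bytes;
  return !supported || !withinLimit;
}
```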
Specific inputs like Zoho Task IDs trigger API calls to fetch relevant data. URLs are processed based on the selected mode (fetching text or generating a screenshot via Cloud Convert). Raw audio data is transcribed using a chosen service, currently OpenAI’s Whisper.
The RAG process is handled with particular attention to customisation. Acknowledging that LLMs generating search queries need full context, the workflow sends the entire conversational context to a dedicated LLM call tasked with generating multiple expanded search queries suitable for vector search. These queries are executed in parallel against the Supabase vector store. Simultaneously, another LLM call extracts relevant keywords from the full context and again performs query expansion. These keywords drive a parallel full-text search using Supabase’s tsvector/tsquery. This is where this RAG implementation deviates from the commonly touted norm. My experience has shown that a one-size-fits-all ranking system for RAG results is ineffective. Consequently, the n8n workflow applies custom ranking logic depending on the use-case. For example, you can configure n8n to prioritise keyword matches based on data source specifics, such as giving higher weight to keywords found in Zoho Task titles or descriptions, or document filenames, rather than relying solely on standard relevance scores. While every user’s case will differ, the framework is designed for easy customisation. It incorporates both vector and keyword search logic (hybrid search), with the ranking logic implemented as clear, customisable steps within the workflow. Furthermore, by using Supabase for data storage, hybrid searches and re-ranking logic can potentially be offloaded to Supabase functions for improved performance.
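The following sketch shows the hybrid search and custom ranking idea using supabase-js; the match_documents RPC, column names, and weighting rules are assumptions illustrating the approach rather than the exact workflow configuration:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

async function hybridSearch(queryEmbedding: number[], keywords: string[]) {
  const [vector, keyword] = await Promise.all([
    // Vector search against the Supabase vector store (pgvector, via an RPC).
    supabase.rpc("match_documents", { query_embedding: queryEmbedding, match_count: 20 }),
    // Parallel full-text search using tsvector/tsquery ('|' = OR in to_tsquery syntax).
    supabase
      .from("document_chunks")
      .select("id, document_id, filename, content")
      .textSearch("fts", keywords.join(" | "))
      .limit(20),
  ]);

  // Custom, use-case-specific ranking rather than relying solely on relevance scores:
  // here, keyword hits in filenames (or task titles) are weighted above body-text hits.
  const scores = new Map<string, number>();
  for (const row of vector.data ?? []) {
    scores.set(row.id, (scores.get(row.id) ?? 0) + row.similarity);
  }
  for (const row of keyword.data ?? []) {
    const inFilename = keywords.some((k) => row.filename?.toLowerCase().includes(k.toLowerCase()));
    scores.set(row.id, (scores.get(row.id) ?? 0) + (inFilename ? 1.0 : 0.5));
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10);
}
```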
A significant RAG enhancement is the support for different prompt intentions. Similar to the requirement to pre-select a model for an assistant, pre-determining whether to use RAG or send the full document before a conversation begins is a limiting factor in systems like Open WebUI. The choice between RAG and full documents depends heavily on the user’s intent – are they “chatting with docs” to find specific answers and citations, or using the documents as broader context for brainstorming or asking aggregate/layout questions that require full document understanding? This requirement to switch between RAG and full context can change mid-conversation. For example, “What was last year’s annual turnover?” might require the LLM to limit its answer to the specific context of the documents provided – that’s an easy, standard use-case. But if the user goes on to discuss a specific document (“What is this document about?”), or asks a question such as “Does the summary properly encapsulate the detail of the document?”, then the model needs the entire document, and a user-specified document, not just RAG results from all documents. Asking, “Does the image have axis labels with the same names as the paragraph on the previous page?” requires not only the full document, but also the full binary of the document so the model can use OCR to “view” the document’s images and text. RAG alone cannot address the dynamic nature of user interactions with LLMs.
As a user, constantly having to consider whether the model can accept a document, or whether the question requires RAG, or a specific document in a specific format, becomes frustrating. To address this, the system allows the user to set a preset, but the model can also decide. For example, RAG results can initially be injected into the system prompt (unless the user has explicitly opted for full documents), but the IDs of the source documents are also included with the RAG results, enabling the model to call a function to retrieve the binary or full document if its analysis determines that the full context is necessary, or if the user, using natural language, requests the model to do so.
The workflow also incorporates a memory system. A separate, scheduled n8n process analyses user data and chat history to synthesise and store ‘memories’ in Supabase. The main workflow currently retrieves all these memories and adds them to the LLM context. Future plans involve implementing a RAG-on-Memories system for more relevant retrieval.
Context finalisation involves collating chat history, adding the current timestamp, the time elapsed since the last response, and user locale information (with instructions for the LLM to adapt).
Critically, the final call to the main LLM is made using n8n’s standard HTTP Request node, deliberately avoiding the built-in AI Agent nodes. This decision stems from significant limitations identified in the agent nodes regarding error handling (sub-node errors halting workflows without capture), token usage tracking (data inaccessible to subsequent nodes), dynamic provider switching, granular control over tool call descriptions and responses (especially handling binary data), and the inability to use LLM features not explicitly supported by the n8n nodes (like context caching or image URLs). While requiring meticulous manual construction of API payloads, error handling logic, and tool call loops for each provider, the HTTP Request node provides the necessary control, flexibility, and access to all underlying LLM features, forming the core of this adaptable template. The complexity of this setup is part of the value – clients receive a pre-built, tested framework that offers this level of control.
Tool call requests from the LLM are detected in the HTTP response. The workflow handles the execution, potentially calling other n8n workflows or using internal logic, managing sequential calls or loops, and formatting the response correctly for the specific LLM provider before making the subsequent call.
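To make the manual approach concrete, here is a sketch of the request and tool-call loop against an OpenAI-style chat completions API; in the real workflow this lives in HTTP Request nodes with provider-specific payloads, so treat it as illustrative only:

```typescript
type ChatMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: any[];
  tool_call_id?: string;
};

async function runTurn(
  messages: ChatMessage[],
  tools: any[],
  executeTool: (name: string, args: any) => Promise<unknown>
): Promise<string | null> {
  for (let round = 0; round < 5; round++) {            // cap the number of tool-call rounds
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({ model: "gpt-4o", messages, tools }),
    });
    if (!res.ok) throw new Error(`LLM call failed: ${res.status} ${await res.text()}`);

    const body = await res.json();
    const msg = body.choices[0].message;
    console.log("token usage", body.usage);            // accessible here, unlike inside the agent nodes

    if (!msg.tool_calls?.length) return msg.content;   // no tool calls: this is the final answer

    // Execute each requested tool and feed the results back for the next call.
    messages.push(msg);
    for (const call of msg.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error("Too many tool-call rounds");
}
```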
The final LLM response text is extracted, token usage is recorded in Supabase (with costing), and the result is sent back to the Svelte UI via a ‘Respond to Webhook’ node. While this architecture provides immense flexibility, a known limitation is n8n’s difficulty with native real-time streaming responses back to the UI, which might impact future live-chat features.
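A small sketch of the per-turn costing, with illustrative (not real) per-token prices:

```typescript
// USD per million tokens; the values for any given model would come from a
// pricing table in Supabase. The numbers in the example below are hypothetical.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

function turnCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  pricing: ModelPricing
): number {
  return (
    (usage.prompt_tokens / 1_000_000) * pricing.inputPerMTok +
    (usage.completion_tokens / 1_000_000) * pricing.outputPerMTok
  );
}

// Example: 12,000 prompt tokens + 800 completion tokens at $2.50 / $10 per
// million tokens (hypothetical prices) comes to roughly $0.038 for the turn.
```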
On the second turn of a conversation, a separate LLM call generates a concise chat title, which is saved and sent back to Svelte for display in the sidebar.
A particularly exciting aspect of this architecture is the approach to image generation. You don’t need to select a specific “image generation” model (however you can if you want); instead, all models have an image generation “tool” or function available to them via the chat input area.
This means you can simply discuss the image you want within the natural flow of conversation, and the model can formulate the prompt to generate it using the tool. With the tool activated, the UI displays controls for the number of images to generate and the aspect ratio, and the image generation model itself can be specified. Custom-trained image generation models via Replicate are supported.
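A sketch of how image generation might be exposed as a tool to any chat model, with parameters mirroring the UI controls described above (the schema itself is an assumption):

```typescript
// OpenAI-style function schema exposing image generation to any chat model.
const generateImageTool = {
  type: "function",
  function: {
    name: "generateImage",
    description:
      "Generate one or more images from a prompt, optionally using a previously generated image as a reference.",
    parameters: {
      type: "object",
      properties: {
        prompt: { type: "string", description: "Image prompt built from the conversation context" },
        numImages: { type: "integer", minimum: 1, maximum: 4 },
        aspectRatio: { type: "string", enum: ["1:1", "16:9", "9:16", "4:3"] },
        model: { type: "string", description: "Image model to use, e.g. a custom-trained Replicate model" },
        referenceImageId: { type: "string", description: "Supabase ID of a previous image to use as a reference" },
      },
      required: ["prompt"],
    },
  },
};
```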
All generated images are stored permanently in the Supabase database. While permanence is useful, the primary reason for storing images in the database is to manage context window limitations during image generation discussions. Instead of storing the full, potentially large, base64 image data in the chat history, only the image ID from Supabase is stored. These image IDs are dynamically rendered into visible images by the Svelte UI, allowing the user to see all images in the chat history. The image IDs are available to the model if it needs to reference a previous image for a new generation request, but the history remains manageable as it doesn’t contain the full image data.
This allows the model to retrieve any previously generated images stored in the Supabase database. If you make reference to an image in the chat (e.g., “what type of microphone is in that image?”), the model can retrieve that specific generated image and discuss it with you, and can then pass that image ID onto the next image generation request, as a reference image.
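A minimal sketch of resolving a stored image ID back into something the UI can render; the table and column names are assumptions:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(import.meta.env.VITE_SUPABASE_URL, import.meta.env.VITE_SUPABASE_ANON_KEY);

// The chat history stores only image IDs; the Svelte UI resolves each ID to a
// data URL for display, keeping the history itself small.
async function imageSrcFromId(imageId: string): Promise<string> {
  const { data, error } = await supabase
    .from("generated_images")
    .select("mime_type, base64_data")
    .eq("id", imageId)
    .single();
  if (error || !data) throw new Error(`Image ${imageId} not found`);
  return `data:${data.mime_type};base64,${data.base64_data}`;
}
```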
Image generation as a tool means the system prompt, any documents attached, and the chat history are all included for the model to use as context when you want to generate images. It can use all that context to help you refine your desired outcome and it can go on to build the image generation prompt for you.
This combination of image generation as a tool, the model’s ability to control the image generation process, its ability to “see” and discuss images, and the capacity to use previously generated images as references for new requests is quite powerful: you are effectively having a conversation with an image generation model, something OpenAI and others don’t even offer yet.
Users can click on any image for a full screen view which has icons to copy and download the image.
In navigating the current AI landscape, it’s become clear that relying on off-the-shelf platforms, even those offering some level of customisation like Open WebUI, presents significant limitations. These platforms, while useful starting points, often bake in assumptions and rigidities, particularly around critical features like RAG, that don’t cater to the diverse and dynamic needs of real-world business applications. Similarly, the promise of quick-fix AI solutions often seen in online tutorials, while great for proof-of-concept demos, glosses over the true complexity and customisation required for robust, production-ready systems.
The tech stack I’ve put together is the culmination of 30 years of coding and business experience, and of the past year spent working closely with clients and building AI-centric workflows. This experience has reinforced my view that we need a fundamentally different approach. By advocating for a decoupled architecture with clear separation of concerns across the UI (Svelte), Logic (n8n), and Data (Supabase) layers, we gain the flexibility and adaptability necessary to thrive in this rapidly evolving space. This approach acknowledges that the industry is still maturing and consolidating, and until it does, a truly customised solution, built with future-proofing in mind, offers the best path forward. It’s an investment in a framework that can evolve alongside the technology, rather than being constrained by the limitations of a single platform.