Building scalable AI agents for multi-tenant SaaS applications requires a layered architecture with an agent gateway for security and authentication, an agent orchestrator to manage isolated agent instances per tenant, and Redis for conversation state persistence to handle unstable network connections and long-running tasks.
Deep Dive
I quit my software engineering job for this
So, I recently quit my software engineering job to pursue my AI startup full-time.
And after talking with some peers who are making several hundred thousand a year off of AI solutions, well, we've been at this startup for almost a year now and we don't really have any repeatable business yet, but I think we've finally found the offer that will help people. The purpose of this video is to share some of the technical challenges I'm running into as a software engineer.
I've noticed that a lot of videos on YouTube right now, and this isn't against anybody, are just so surface level.
So today I want to give you a deep dive into what it takes to actually build AI for a SaaS that can potentially scale, along with some of the problems I ran into along the way and how I solved them.
To start off, here's what I mean by AI solutions: we're noticing that the solution people are currently paying for in the market is AI agents. And by AI agent, we mean something that can act almost like an employee or an employee replacement.
The agents we've already sold are actually being used almost as an extension of our customers' current employees, so those employees can do more work faster without more staffing overhead. We really want to take this and apply it to our software as a service.
So how are we doing that? We need a way for people to use these agents inside of our application. So, we'll go ahead and draw our client here. This is our web portal, which is our software as a service. To get more specific, we're running Opulence AI, where we help general contractors eliminate a lot of their administrative work and even help them earn more referrals and do some marketing.
We were previously doing it this way: each request from a chat bubble would go to a server, and that server would proxy the LLM request to a downstream AI provider such as Anthropic.
We were using something like Sonnet 4.6 to do the heavy lifting, and we were using a technology called AI SDK to actually send the messages across. AI SDK runs on both the client and the server. It handles sending the message to the large language model through this server endpoint, which authenticates against Anthropic for the Sonnet model we're using. On top of that, it handles things such as making tool calls on our server endpoint and setting up the prompts. So what's really happening is that this client just sends user messages over to an API server, which takes each user message, combines it with the system prompt, and sends it out to Anthropic. It can even sit there in a tool loop until it actually achieves a result. An example of this would be in our Opulence application here, where you can go create new site reports.
Hello, today is a new site report.
Originally, when you hit create site report, it was sending the report to an AI just to clean up the document. We're now moving to a new architecture where it all goes to one agent that handles a lot of this work.
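The older request flow described above (the client sends user messages, the server adds the system prompt, and the model is called in a tool loop until it produces a final answer) can be sketched roughly as follows. This is only an illustration: `runChat`, `Model`, `Tool`, and `ModelReply` are hypothetical names, not AI SDK or Anthropic APIs, and the model is a synchronous stand-in function where a real server would await a network call.

```typescript
// Hypothetical types for illustration; none of these come from AI SDK or Anthropic.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "toolCall"; tool: string; input: string };

// Stand-in for the provider call. A real server would await an HTTP request here.
type Model = (messages: Message[]) => ModelReply;
type Tool = (input: string) => string;

function runChat(
  model: Model,
  tools: Map<string, Tool>,
  systemPrompt: string,
  userMessages: Message[],
  maxSteps = 5,
): string {
  // The server, not the client, owns the system prompt.
  const messages: Message[] = [{ role: "system", content: systemPrompt }, ...userMessages];

  for (let step = 0; step < maxSteps; step++) {
    const reply = model(messages);
    if (reply.kind === "text") return reply.text; // final answer, stop looping

    // The model asked for a tool: run it and feed the result back in.
    const tool = tools.get(reply.tool);
    const result = tool ? tool(reply.input) : `unknown tool: ${reply.tool}`;
    messages.push({ role: "assistant", content: `call ${reply.tool}(${reply.input})` });
    messages.push({ role: "tool", content: result });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

The `maxSteps` cap is a design choice added for the sketch, so a model that keeps requesting tools cannot hold the request open forever.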
So, now that I've shown you how we're doing it today, I want to show you the new way of doing it moving forward.
It's actually pretty similar to this, but instead of that dedicated API server I was talking about, I've spun up a brand new pair of services that I'm calling the agent gateway and the agent orchestrator.
And then finally we have our Hermes instances.
For those who don't know, Hermes is very similar to OpenClaw, meaning it has a whole messaging gateway, it has cron tasks, and so much more. It comes with skill files included. It's a total agentic solution, just like OpenClaw, but it also exposes an OpenAI-compatible endpoint, which means our existing code that uses AI SDK, because it was already hitting Anthropic, is going to be totally compatible with our Hermes instance.
So it looks something like this.
Throughout this process, I've also been playing with more open source models, because the pricing is starting to change for these frontier models and they're getting really expensive. So we're using DeepSeek V4 Flash right now. Amazing model. It almost feels like Sonnet but with a million-token context window, and it's only, what, 28 cents per million output tokens or something like that, like 8 cents for... I don't know the exact numbers, but it's incredibly cheap for Sonnet-like performance, and the inference is incredibly fast.
So that was DeepSeek V4 Flash. Let me try to give you a more coherent explanation of the problems we're solving here. We have Hermes agents that need to be able to take actions on behalf of our audience, general contractors, to help them out.
Here's what each of these layers does. We have the Hermes agent, which is our actual agentic solution.
So our agents take actions on behalf of the user.
We have our LLM provider, which is DeepInfra in this case.
And you can think of Hermes as the harness around the actual LLM provider.
Other examples of harnesses include OpenCode, GitHub Copilot, Claude Code, and Codex. They all ship harnesses on top of the LLM provider. What Hermes does here is decouple that for us, so we can pick any provider we want. We could use those other harnesses underneath Hermes too if we wanted to, but we've noticed it's significantly cheaper and much faster just to use an open model with this Hermes harness.
So, every time Hermes gets a request, it's going to send messages back and forth speaking that OpenAI-compatible language.
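For readers unfamiliar with what "OpenAI-compatible" means on the wire, here is a minimal sketch of building such a request. The `model`, `messages`, and `stream` fields and the `/v1/chat/completions` path follow the OpenAI chat completions convention. `buildChatRequest` is a hypothetical helper, and the base URL and model name in the usage note are placeholders, since the video does not show the exact path or model identifier Hermes expects.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: builds the pieces you would pass to fetch().
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  stream = true,
) {
  // Trim trailing slashes so we never produce "//v1/...".
  const url = `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`;
  return {
    url,
    method: "POST" as const,
    headers: { "Content-Type": "application/json" },
    // Same body shape regardless of which compatible backend sits behind the URL.
    body: JSON.stringify({ model, messages, stream }),
  };
}
```

A caller might write `buildChatRequest("http://localhost:8080", "placeholder-model", [{ role: "user", content: "Hey Hermes" }])` and hand the result to `fetch`. Because only the base URL and model name change, swapping providers does not require touching the calling code.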
Then there are these other bits here, because we have a multi-tenant software as a service application.
This gateway has to run its checks every time a user sends a message from our portal, which I can show you right here. It's just this chat window.
Hello.
Every time this request goes through, it goes over to our agent gateway. What you don't see, because it's behind the scenes, is that the client is sending a user message: "Hey Hermes", or whatever I said to it.
Along with that, it's sending its tenant ID, so tenant ID: some UUID. The agent gateway is responsible for a few things. It checks whether this user is part of the tenant it's requesting. There are also some security risks we face by exposing agents directly to our front end. For example, it's really easy for someone to abuse this and try to get API keys out of the agent, or to jailbreak it so it starts making unethical requests.
So the gateway also asks: does this message pass content moderation?
Oh, and another thing it checks is whether this user is a valid, authenticated user, because the client also sends the Supabase access token along with every request.
So we do some magic on this side to check that the access token is actually valid and belongs to the tenant that's requesting.
So authenticated and authorized.
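The gateway checks just described (valid token, tenant membership, content moderation) could be arranged along these lines. Everything here is a sketch under assumptions: `checkRequest`, `GatewayDeps`, and the HTTP status codes are hypothetical choices, the three checks are injected as plain functions so the example runs on its own, and a real gateway would make them async calls to Supabase and a moderation service.

```typescript
type GatewayRequest = { accessToken: string; tenantId: string; message: string };

// Hypothetical dependencies, injected so the sketch stays self-contained.
type GatewayDeps = {
  // Returns the user id for a valid token, or null if the token is invalid.
  verifyToken: (token: string) => string | null;
  isMember: (userId: string, tenantId: string) => boolean;
  passesModeration: (message: string) => boolean;
};

type GatewayDecision =
  | { ok: true; userId: string; tenantId: string }
  | { ok: false; status: 401 | 403 | 422; reason: string };

function checkRequest(req: GatewayRequest, deps: GatewayDeps): GatewayDecision {
  // 1. Authentication: is the access token valid at all?
  const userId = deps.verifyToken(req.accessToken);
  if (userId === null) return { ok: false, status: 401, reason: "invalid access token" };

  // 2. Authorization: does this user belong to the tenant named in the request?
  if (!deps.isMember(userId, req.tenantId)) {
    return { ok: false, status: 403, reason: "user is not part of this tenant" };
  }

  // 3. Moderation: only now is the message content inspected.
  if (!deps.passesModeration(req.message)) {
    return { ok: false, status: 422, reason: "message failed content moderation" };
  }

  // Safe to forward to the orchestrator.
  return { ok: true, userId, tenantId: req.tenantId };
}
```

Running the cheap identity checks before moderation means unauthenticated traffic never reaches the more expensive content check.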
Once it checks all this goodness, it then forwards that request to the agent orchestrator. And this is where things start getting really interesting.
So the agent orchestrator is responsible for managing the entire life cycle of the Hermes instances. Why this gets so complicated is that, to be sufficiently secure and to have a proper agent setup, every tenant we serve has to have its own Hermes instance. And where does Hermes actually run? Well, the easiest way to do it is to run Hermes inside of a Docker instance. So the agent orchestrator gets to do the really fun stuff of spinning up a Docker instance for every single tenant in our system and making sure it's healthy. This agent gateway and agent orchestrator live on one giant virtual machine that manages the other Docker instances. I know, it's a virtual machine within a virtual machine within a server rack. Lots of layers here.
But practically, when it gets a new request, the first thing it asks is: is there already a warm Hermes instance for this tenant?
If there's not one, then it does this really cool flow. Hermes is super modular, so you can actually distribute different Hermes profiles, and Hermes makes it really simple to load a profile. For example, we have a default Cadwell Hermes profile, and Cadwell is just Opulence AI's assistant. So the orchestrator goes: well, I see that there is no valid Hermes instance for this tenant. I'll go ahead and pull the default Cadwell profile, create the workspace folder on disk, and clone and load the profile into Hermes. From there it waits for a health check, and once the health check passes, so assuming this Hermes instance is actually online and alive, it begins proxying requests to it. And why this is so cool is that when Hermes is running in gateway mode, it already exposes that OpenAI-compatible API endpoint.
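The warm-instance flow above (reuse a healthy instance if one exists, otherwise create a workspace from the default profile, start a container, and wait for a health check) might look something like this. All names are hypothetical: `createOrchestrator`, `OrchestratorDeps`, and the three injected functions stand in for the real Docker and Hermes operations, which the video does not show. The sketch is synchronous for brevity; a real orchestrator would be async and sleep between health checks.

```typescript
type Instance = { tenantId: string; baseUrl: string };

// Hypothetical hooks standing in for the Docker and Hermes operations.
type OrchestratorDeps = {
  createWorkspace: (tenantId: string, profile: string) => string; // returns workspace path
  startContainer: (tenantId: string, workspacePath: string) => string; // returns base URL
  isHealthy: (baseUrl: string) => boolean;
};

function createOrchestrator(deps: OrchestratorDeps, defaultProfile: string, maxHealthChecks = 10) {
  // One warm instance per tenant, keyed by tenant ID.
  const warm = new Map<string, Instance>();

  function ensureInstance(tenantId: string): Instance {
    // Fast path: a warm, healthy instance already exists for this tenant.
    const existing = warm.get(tenantId);
    if (existing && deps.isHealthy(existing.baseUrl)) return existing;

    // Cold path: build the workspace from the default profile and start a container.
    const workspace = deps.createWorkspace(tenantId, defaultProfile);
    const baseUrl = deps.startContainer(tenantId, workspace);

    // Only register the instance (and start proxying) once it reports healthy.
    for (let attempt = 0; attempt < maxHealthChecks; attempt++) {
      if (deps.isHealthy(baseUrl)) {
        const instance: Instance = { tenantId, baseUrl };
        warm.set(tenantId, instance);
        return instance;
      }
    }
    throw new Error(`instance for tenant ${tenantId} never became healthy`);
  }

  return { ensureInstance };
}
```

Keying the registry by tenant ID is what keeps tenants isolated in this sketch: a request can only ever be routed to the instance registered for its own tenant.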
Why that's really special is that one of the really popular frameworks for dealing with AI right now in TypeScript land is AI SDK by Vercel, and AI SDK is already fantastic at communicating via the OpenAI-compatible standard. That means our front-end code doesn't have to change very much, and because these are both TypeScript servers, we can also reuse that AI SDK library to proxy the requests. So the agent gateway gets to do its moderation and its authentication and authorization layer. What you're not seeing here is that there's also a separate Redis instance that helps make requests more durable. One of the problems we're facing is that a lot of our users are using this service in unstable environments, where it's not uncommon for them to lose connection for, say, 3 minutes. I wouldn't even bat an eye at that; I'm assuming they just lost their 4G or 5G connection. So we have a Redis instance.
The entire purpose of this is to remember the state and also handle the streaming response from our actual agent orchestrator. So it stores conversation state.
That way, if a user ever loses connection to our SaaS, which is also bundled into an application via Capacitor, our agent gateway gets to handle this complexity, so the orchestrator doesn't have to think about it. It just magically happens: if the user comes back, their conversation picks up exactly where they left off. This also enables us to do long-running agentic tasks.
Basically, that means the user could send a request off, maybe something like: hey, can you go ahead and do a competitive analysis of my local market?
So, say I'm a general contractor in western Washington: who else is competing in my market today? That can take some time. This Hermes instance over here has to go do a lot of research. It has to pull up public databases. It has to do web searches. I wouldn't be surprised if that took 20 minutes sometimes, depending on how big the market is. And the user doesn't have time for that, especially if they're on their phone on the job site. What's really cool is that the user just gets to click off and forget about it. The next time they come back, Redis will have already stored the conversational state returned by Hermes, and it'll replay it for the user. So they pick up exactly where they left off.
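The replay behavior described above can be illustrated with a small buffer that numbers each streamed chunk, so a reconnecting client can ask for everything after the last chunk it saw. This is an in-memory stand-in, not the actual Redis setup: the video only says that Redis stores conversation state and the streaming response, so the `ConversationBuffer` class, the sequence numbers, and the method names are all assumptions. In Redis, a stream per conversation (`XADD` to append, `XRANGE` to read from an offset) would be one possible way to get the same behavior.

```typescript
type StreamChunk = { seq: number; text: string };

// In-memory stand-in for the Redis-backed store; hypothetical design.
class ConversationBuffer {
  private readonly chunks = new Map<string, StreamChunk[]>();
  private readonly finished = new Set<string>();

  // Called as the agent streams output, whether or not the client is connected.
  append(conversationId: string, text: string): StreamChunk {
    const list = this.chunks.get(conversationId) ?? [];
    const chunk: StreamChunk = { seq: list.length + 1, text };
    list.push(chunk);
    this.chunks.set(conversationId, list);
    return chunk;
  }

  // Marks the agent's response as complete.
  finish(conversationId: string): void {
    this.finished.add(conversationId);
  }

  // Called on reconnect: returns everything the client missed, in order.
  replayAfter(conversationId: string, lastSeenSeq: number): { chunks: StreamChunk[]; done: boolean } {
    const list = this.chunks.get(conversationId) ?? [];
    return {
      chunks: list.filter((c) => c.seq > lastSeenSeq),
      done: this.finished.has(conversationId),
    };
  }
}
```

A client that saw chunk 1 before losing signal would call `replayAfter(conversationId, 1)` when it comes back. The `done` flag tells it whether to keep listening for more output or treat the answer as complete.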
And actually, that's all I wanted to show you today. This is what I've been implementing in our software as a service for Opulence AI, and I'm hoping it shows people that you can create OpenClaw-like experiences even for multi-tenant software as a service applications.
Thanks for watching. If you have any questions, I'd love to hear them down below. This is really hard stuff, and I'd love to collaborate with other people who are working on it.