Install our extension to search inside any video instantly.

LFM2.5-8B-A1B: Local Agentic AI with Multilingual Support Tested
Added: 2026-05-31

2,520 views768:53fahdmirzaOriginal Release: 2026-05-28

LFM2.5-8B-A1B is an 8.3 billion parameter mixture-of-experts model that activates only 1.5 billion parameters per token, trained on 38 trillion tokens with a 128K context window and reinforcement learning. The model features a hybrid architecture with 18 layers using short convolutional blocks with multiplicative gating for efficient long context handling, and 6 layers using grouped query attention. While it demonstrates strong multilingual support across 80 languages including major languages like Spanish, French, German, and Arabic, it shows limitations in complex agentic tasks, struggling to complete multi-step workflows without human clarification despite its tool calling capabilities.

[00:00:01]Another model from Liquid AI. We have covered Liquid AI models before on this channel.

[00:00:07]Honestly, they never quite lived up to the hype. Gwen and Gemma small models kept beating them where it mattered, but this one seems a bit different, I would say. LFM-2.5 8-billion A1B is their most serious release yet, and the numbers back it up.

[00:00:27]But, you know what? Let's not believe on the numbers. Let's install it on our local system and then test it out. I'm going to use this Ubuntu. I have one GPU card, Nvidia RTX 306000, with 48 GB of VRAM, and I'm going to use vLLM in order to get it installed and serve on this system. You can see that I'm using quite a recent version of vLLM. If you don't know how to install vLLM and what exactly that is, just go to my channel and search for vLLM, and you should be able to find heaps of videos around installation.

[00:01:02]This is Fahad Mirza, by the way, and I welcome you to the channel. Please also follow me on X if you're looking for AI updates.

[00:01:10]Before I start the installation, a very, very quick overview of what exactly is this model.

[00:01:17]To their credit, this is an open weight model on Hugging Face. It's an 8.3 billion parameter mixture of expert model that only activates 1.5 billion parameters per token, trained on 38 trillion tokens, three times more than its predecessor, with a 128K context window, and reinforcement learning on top. It's built for agentic workflows that actually work reliably.

[00:01:46]I will be also talking about its architecture, but allow me to just start the download.

[00:01:53]So, I'm just going to serve it on my local system, and I'm using vLLM serve command with this context window.

[00:02:00]It is going to download the model.

[00:02:02]It shouldn't take too long, by the way.

[00:02:07]And while it downloads, let's quickly talk about the architecture and what exactly they have cooked without boring you.

[00:02:14]Architecture is actually quite simple.

[00:02:16]24 layers split two ways. 18 use short convolutional blocks with multiplicative gating, which is primarily a design from their own research that handles long context very cheaply. The other six are standard grouped query attention blocks.

[00:02:34]Vocabulary was also doubled from the previous generation for stronger multilingual support across Arabic, Chinese, Japanese, and Korean. And we will also test it out very shortly.

[00:02:48]And meanwhile, the model is downloaded.

[00:02:50]You can see that it's not a huge model.

[00:02:54]The model is now being served. Let me show you the VRAM consumption.

[00:02:59]It is consuming around 44 gig of VRAM with that KV cache due to the context window which I increased. If you don't want to do that, simply either decrease that context length or you can even use this quantized version, which they also have shared in different cards.

[00:03:18]For the agentic stuff, I'm going to use this Hermes agent, which we have covered a lot on the channel.

[00:03:23]I already have it installed on my local system. So, I'm just going to say Hermes model to set new model. And the endpoint, you can also use it directly in the config. If you don't know what Hermes is, how to get it installed, just please search my channel. Heaps of videos around it.

[00:03:41]So, as soon as I have run this, you can see that it is asking me what exactly I want to use as a provider.

[00:03:48]So, I'm just going to use that custom endpoint, and I will use this URL.

[00:03:54]And you can just simply press enter here.

[00:03:57]And I will now select two because I'm using open a compatible endpoint.

[00:04:03]And the good thing is that it has automatically detected our model, so I'll just use this model here. And I will auto detect it because we already have specified it. Just go with this and it is now saved.

[00:04:18]And now let me launch my Hermes agent.

[00:04:23]And the agent is launched. As you can see that it is all running.

[00:04:27]So, what we are going to do here, we are going to test how well this model handles real agentic tasks, which means scanning a Python project, a huge one, understanding the code base, and writing a technical report. So, I already have a Python project which I created sometime back start watching another video.

[00:04:49]I'm asking it to scan all Python files, understand what each one does, how they connect to each other, and write me a detailed markdown report saved as project overview.md.

[00:05:00]So, it is going to use its internal tooling and all that. Okay, so okay, I need to enable that because you see I didn't enable this auto tool choice.

[00:05:10]And for this tool call, we would need to restart this vLLM serving here. So, I just canceled it with control C and you can see that I am now serving it again.

[00:05:21]The tool call parser is LFM two and it uses tool call start and tool call end tokens.

[00:05:29]And this parser is built exactly for this. So, let's wait for it to get served again and I will restart my vLLM or my Hermes agent.

[00:05:39]vLLM is now serving again. Let's kick off Hermes agent.

[00:05:50]And now you can see that it is using all those tools. And the agent is now going through my directory, scanning all files, and preparing that report.

[00:06:02]So, model read the files, scanned them, but didn't generate the report. Maybe I will nudge it a bit and see if it generates the report.

[00:06:13]It should have done it by itself, but let's see.

[00:06:16]And now it is writing that report here.

[00:06:20]Okay, let me just go up and try to read through it.

[00:06:25]So, very honestly, if you read through this output, this report is shallow. It has listed the files, but never actually read them.

[00:06:33]So, it has done some of the tool calling, but then, if you just scroll down, it is asking me for clarification instead of just doing the job. I believe this is a weakness of this model on complex, multi-step agentic task, which was not really that complex.

[00:06:50]So, I think uh it should have done better here. We can keep asking it, but I don't want to uh really put more time on this one. Let's test some of the multilinguality.

[00:07:03]For the multilingual, I'm going to give it this simple sentence, and that you are the love of my life, and then it needs to translate it into all of these languages. I'm not expecting it to do all of it, but let's see how much it can stretch.

[00:07:21]The model has come back with the response. If I'll go through it, the major languages are solid, like Spanish, French, German, Italian, Russian, Arabic. I think they are pretty good. Even Portuguese looks good. The model handles its nine nine natively support lang- you know, supported languages well. So, nine languages are supported, but the cracks started showing in the long tail, as you can see. Uh the regional languages from India or some of the southeastern languages, some of the European languages, they are not there.

[00:07:53]Um and overall, I think for an 8 billion model running locally, covering 80 languages with reason- reasonable accuracy on the major ones is um is required these days, but I think it is not a bad response for what it advertises that it supports, but I think now we are at a stage in LLMs where we need more. Especially when you compare it to Gwen and Gemma and few other models in the market. So, I think it is um a good effort. I would say 8 billion parameters, 1.5 billion active knocking out 80 languages, but when it comes to agentic tasks, we saw that it struggles.

[00:08:34]It I think it can do some basic stuff here and there, but when you really give it some agentic task and tooling, I think it is going to struggle. But let me know your thoughts.

[00:08:44]What do you think about it? Again, please follow me on X, and please become a member if you want to support the channel. Thank you for all the support.

Related Videos

Artificial Intelligence

OpenHuman VS Hermes AI: Who Wins?

JulianGoldieSEO

285 views•2026-05-29

Artificial Intelligence

BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2

aimmediahouse

122 views•2026-06-03

Artificial Intelligence

Long-Running Agents — Build an Agent That Never Forgets with Google ADK

suryakunju

142 views•2026-05-30

Artificial Intelligence

This computer is made from real human brain cells. And you can buy it.

Talktmsmedia

3K views•2026-05-28

Artificial Intelligence

I Made the Same Anime Fight Scene in Every AI Video Generator

NobleGooseAnime

295 views•2026-05-30

Artificial Intelligence

Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S

cnnnews18

3K views•2026-06-01

Artificial Intelligence

I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)

AICodingDaily

298 views•2026-05-29

Artificial Intelligence

3D Platformer Update - NO CAPES

SolarLune

294 views•2026-05-30

Trending

Computer Science

The Meta AI Hack Is a DISASTER

LowLevelTV

141K views•2026-06-03

Paris is in SHAMBLES right now 😭

H1T1

4053K views•2026-05-31

The Casino Had Us Guessing All Day

VegasMatt

157K views•2026-06-03

The Dancing Plague...

HoodieGuyStories

1730K views•2026-05-30