Cacheon is a decentralized inference competition subnet on Bittensor. Miners submit Docker containers containing optimized inference servers for open-source AI models and compete on speed metrics such as time to first token and tokens per second, after passing a correctness gate that verifies their outputs match the baseline model's distributions. The goal is fair, transparent benchmarking of inference optimization techniques across the entire inference stack.
Deep Dive
Subnet Summer AMA x SN14 Cacheon
>> Subnet Summer. The launch for SN14 has been quite exciting, and it seems like there's been a lot of discussion across different groups and channels. You guys made quite a lot of noise on X and everyone's getting quite excited about it. So why don't we start at a very high level: tell us about yourself (I know you've got a history in this ecosystem), and then tell us about the subnet as well.
>> Yeah, for sure. Hi, my name is Xavier. I've been at Latent for a little more than a year. Right before this, I was a lead contributor for TAO.app, the website, so that was a lot of fun. And before that I was at the OpenTensor Foundation, at the time we were doing the dTAO transition, before dTAO went live.
So I was helping the team with simulations, running the tokenomics simulation, and also helping Jacob with some of the subnet reviews. If you remember, before dTAO it was up to the validators to vote on subnets, and the consensus across all the validators' votes determined how much emission you got. So it was a lot less market-driven and a lot more validator-driven, and eventually we realized it's not really a sustainable way to reward different subnets, because it's quite controlled; it's kind of like an oligarchy in a way. So during the time we wanted to transition to dTAO, I was at the OpenTensor Foundation doing research, and before that I was at Pantera Capital for three...
>> Okay, that's quite an extensive background. With that background, why do you think you were prepared to run your own subnet now? What were the skills you took away from your previous experience?
>> Yeah, for a good chunk of my career I've been mostly on the tech side. I was a quant developer, building data pipelines for our trading strategies. Then I also hopped over to the fundamental investment side, which is looking at a business: how do they make money, how big is the market, how are they capturing the market, and how are they returning that market capture to shareholders. So I feel I have a good understanding of both sides of the coin. Also, when we were building TAO.app Savant, we came across a lot of issues when it comes to inference, because it's a customer-facing app. We needed to be fast. We needed to be reliable. I think we had this idea two-ish months ago, and we've been building this for a little under two months.
Two months ago was when TurboQuant came out, the paper from Google, and we were like, "Oh, this is very interesting." But if you read more about the paper in the community discussions, there's actually quite a lot of drama around it. TurboQuant was built on top of an existing technique called RaBitQ, and TurboQuant was claiming something like 8x compression rates, or 8x improvements, on top of RaBitQ. But the author of RaBitQ was saying, "You're not actually comparing your optimization apples to apples. You're using a much faster setup for your technique and a much slower setup for ours. So you're not crediting us fairly; you're not benchmarking apples to apples." That gave us the inspiration for Cacheon, for subnet 14. We didn't see a benchmark in the space that's really comparing all the optimization techniques apples to apples, comparing them fairly. They're not running on the same hardware, or getting the same prompts, or the same model, or the same language. So that's where the idea for Cacheon started.
So we're really excited, really pumped about the upcoming launch, and we have a great team behind us to help us drive this.
>> Amazing. I want to ask: how do you successfully benchmark across the entire spectrum? How do you go about that?
>> Yeah. So basically the challengers, or the miners, submit Docker containers, and we grab every single Docker container. We also grab the latest vLLM as the baseline.
Obviously, if you're not comparing against the latest vLLM, you shouldn't get any score or any rewards. So we grab all of them. We run the same prompts, on the same hardware, on the same model. And because it's a Docker container, it's actually quite easy to organize these evaluations. We run the same evaluations on all of them. We reset the GPU between every run so that warm memory isn't passed on, so that everyone gets the same playing field, pretty much. So that's how we're doing it. And then the challengers, or the miners, are scored on latency.
They're scored on tokens per second and time to first token, because we care about speed. On top of that we also have what we call the correctness gate, which is something you have to get through before getting scored. The purpose is that you can't just spam random letters or random characters and then say you're the fastest. Your output has to be correct, aligned with what the default model is outputting; otherwise what's the point of the speed? You're just returning random garbage.
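The gate-then-score idea described above can be sketched in Python. The transcript does not say how the two speed metrics are combined, so the geometric mean of the two speedups, the `RunMetrics` type, and the `score` function are all illustrative assumptions, not Cacheon's actual scoring rule.

```python
import math
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Speed metrics from one evaluation run (hypothetical container type)."""
    ttft_s: float        # time to first token, in seconds (lower is better)
    tokens_per_s: float  # throughput (higher is better)


def score(miner: RunMetrics, baseline: RunMetrics, passed_gate: bool) -> float:
    """Return 0.0 unless the miner passes the gate AND beats the baseline.

    Assumption: the two speedups are combined with a geometric mean and the
    reward is the improvement over the baseline. The source only says that
    correctness is checked first and that speed is what gets scored.
    """
    if not passed_gate:
        return 0.0  # fast but incorrect output earns nothing
    ttft_speedup = baseline.ttft_s / miner.ttft_s
    tps_speedup = miner.tokens_per_s / baseline.tokens_per_s
    combined = math.sqrt(ttft_speedup * tps_speedup)
    # Merely matching the baseline (combined == 1.0) scores zero.
    return max(0.0, combined - 1.0)


if __name__ == "__main__":
    baseline = RunMetrics(ttft_s=0.4, tokens_per_s=100.0)
    faster = RunMetrics(ttft_s=0.2, tokens_per_s=200.0)
    print(score(faster, baseline, passed_gate=True))
    print(score(faster, baseline, passed_gate=False))
```

Scoring relative to the baseline means a submission that simply repackages the baseline earns nothing, which matches the point made later in the conversation about not rewarding copies of a free engine.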
>> Yeah, completely. So someone's already asked: for someone that's new to Bittensor, what's the easiest way to start as a miner on 14?
>> What's the easiest way? We're actually pushing out updates to the docs on an example miner. This example miner is basically the baseline that I just talked about, the latest vLLM Docker container. With that, you can actually submit as a miner. You're probably not going to score very well, because it's the baseline. But we're going to push out more updates to our documentation so that you can take this baseline miner, run it on a GPU, and just get started on optimizing it. We want to make sure the bar for entry is lower, so that you can take the existing code and see where it takes you. Another thing I think miners should do is check out the current literature in the space: what people are claiming is the fastest. Like, if you implement this in Rust it's going to be the fastest, or this person claims this version of SGLang is going to be the fastest. So you take the research paper, you take that literature, preferably with code on GitHub, you take our code, our platform code, our repo code, which is all public on GitHub, and just dump it in code and see what happens.
>> Okay. Yeah, that's really interesting.
Earlier on you spoke about having the fastest tokens. What did you mean by that?
>> Yeah. So there are two metrics that we're using for evaluations. The first one is called time to first token. When you type a prompt into an LLM, you wait like a second and then the first token pops up. So that's the time to first token; we're measuring this latency. And then the second part is throughput.
It's how many tokens per second you're outputting over all the prompts sent to you. An LLM has two phases when you send it a prompt. The first one is called prefill. What it basically does is take all of your input prompt and fill up the memory with cache, with all those matrices, so that it can reuse this cache to generate the response later [in the second phase, decode, which produces the output tokens one at a time]. So that's called prefill. Time to first token measures how long it takes for the LLM to process what you're saying and then say the first word. And then the second part is throughput, tokens per second.
That one's quite self-explanatory: it's just how fast you're spitting out tokens.
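The two metrics above can be sketched in Python as a small timer wrapped around any token stream. The `measure_stream` function and the injectable `clock` parameter are hypothetical names for illustration, and measuring throughput over total wall time (prefill plus decode) is an assumption; the transcript does not say which time window Cacheon uses.

```python
import time
from typing import Callable, Dict, Iterable


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Consume a token stream and report time to first token and throughput."""
    start = clock()               # the prompt is considered sent here
    first_token_at = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # prefill has finished, first token arrived
        count += 1
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    total = end - start
    return {
        "ttft_s": first_token_at - start,
        "tokens": count,
        # Assumption: throughput is averaged over the whole request.
        "tokens_per_s": count / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    def fake_server():
        time.sleep(0.05)          # stand-in for prefill
        for word in ["Hello", ",", " world"]:
            time.sleep(0.01)      # stand-in for one decode step
            yield word

    print(measure_stream(fake_server()))
```

Passing the clock in as a parameter keeps the timing logic testable without real sleeps.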
>> Okay. Amazing. I've got a question from Angry Dave. He says, "What's the current performance gap between top miner submissions and the vLLM baseline on Qwen 2.5 72B?"
>> Yeah. So we haven't launched yet. The launch is on Tuesday. So far we've got around 13 test submissions. They didn't pass the correctness gate, or they had some set of issues, but we've already seen some interesting things. So I'm really excited for this competition to go live and see what happens, and how much miners can optimize against the latest vLLM.
>> Yeah, I think we wait and see how it goes. But what would you say the roadmap is for the months ahead for Cacheon?
>> Did you say roadmap? It cut off for a little bit.
>> Yeah.
>> Okay.
>> Yeah.
>> Yeah. For the roadmap: v1, which is launching next Tuesday, is the arena I was talking about.
Everyone has one fixed hardware setup. The miners submit Docker containers and try to optimize against that one canonical model. Version two is where we open up the optimization surface. Right now we're not allowing the miners to do speculative decoding, because there's some issue with the correctness gate that we're currently setting up, and we want to get this proof of concept out and show that it actually works before opening up even more optimization surface: advanced techniques like speculative decoding, quantization, custom kernels. Then we can say, hey, we have this GPU, we have this model, and you can optimize more than what we allow at the moment. That's going to be version two. For version three, we want to have competitions for different models. So it's not just Qwen 2.5; it's going to be the frontier model from Kimi, the frontier model from Qwen, the frontier model from DeepSeek. All those open-source models are going to have their own competitions, because different models have different architectures. Some might be mixture-of-experts, some might be something else. So maybe the optimization techniques are a little different across different models. We're very excited to see how miners stack up against that. And after that is where it gets interesting. It stops being just a leaderboard, a competition, and starts being a product.
So the hypothesis for the PMF is pretty simple. A lot of people care about latency and speed, and building your own inference stack is pretty expensive. So Cacheon outsources this problem to an open competition and surfaces the winner for each model, and then customers choose Cacheon because we're the fastest back end for AI, not because of anything Bittensor-specific. And the product is quite simple: it's a plug-and-play inference server. You can come to us and say, "I want to run this model with this GPU, and I just want the fastest; that's my requirement," and we're like, "Okay, yeah, we can provide this."
>> What are some reasons that a company would come to you and want the fastest? What's the difference, and how does it affect them positively?
>> How does it affect them? Yeah, so any agents or developers working on API products or RAG pipelines can use it. One of the products we can provide is just straight-up APIs. We can say that if you use our API, it's going to be faster than, for example, using OpenRouter. And we can guarantee that it's running the same model on the same hardware, because we can showcase it with our open-source competition: this is what we're doing, this is the code, this is the on-chain verification. So it's like a benchmark that's being productized. The benchmark is the proof, and then we as the subnet owner package the leader of each leaderboard and say, "Hey, this Docker container has been proven to be the fastest. Here's the X, Y, Z, here's the why," and then we provide this Docker container as a product. Well, not exactly this as a product, but we package it as a product and sell it. So I think the potential buyer is quite straightforward. And I'm personally very excited for that part of the journey, when we stop being just a competition, just a leaderboard, and start being a product that we can sell.
>> Yeah, that's huge. There was another question that came through, which was: how do you see Cacheon competing with centralized providers like Groq, Fireworks, or Together AI in the long run? But I feel like you've pretty much answered that.
>> Yeah, I can touch on it a little more. Down the line, we would love to partner with companies like Chutes. I was also thinking of OpenRouter; we could partner with them, but Chutes already has a partnership with OpenRouter. So going B2B means we don't have to worry about maintaining our own customers; we just work with Chutes or OpenRouter, who already have a lot of traffic directed to them.
So I think the most direct comp is going to be Fireworks. Together AI owns a lot more of the vertical stack, versus Fireworks and us, who are focused just on inference, on speed, on the software side. Groq will also be a pretty good comp. Their researchers focus more on the hardware to speed things up, but at the end of the day the products we're selling are quite similar. I think the edge we have against them is that, because we're open source, we can prove that we're the fastest without going out there and saying, "Just trust me, we're the fastest." Another advantage we have is that whenever a big lab releases a drastically faster open-source inference engine, or a research lab or university pushes the frontier further, say a couple of months from now, that's a pretty good outcome for us, because if that happens, Cacheon miners will immediately adopt it and compete on top of it. Our mechanism doesn't bet on any particular technique winning. It bets on this competition surfacing the best combination of techniques in this arena.
So the baseline shifts if the frontier shifts. It's just going to gradually get better and better over time, and it doesn't get disrupted by progress in inference. So we're very excited to see where this goes. And I'm glad that we already have, I think, 15 testnet miners at the moment helping us refine the mechanism.
>> So the way I understand it with the miners is, let's give an example just to break it down. Miner A finds the fastest solution. Then miner B comes in and finds an even faster solution that's completely opposite. Then miner C appears and goes, "Well, I'm going to take a mix of what A and B do, the parts that work, and then I'm going to be the fastest miner." And then it goes all the way around again until there's a completely new solution.
>> Yeah, that could work.
Because all the miner solutions are public. They're Docker images, so anyone can go to the Docker image and see what they're doing.
So this is the great thing about open source: we don't have to reinvent the wheel. We can build on top of other people's success, and there's a very strong incentive to do so because there's so much money on top. And when we expand to multiple models to compete on, if you're not an expert in one LLM architecture, maybe you know how to optimize against another setup. So eventually we're going to be like a set of arenas, where every arena is this hardware with this model, and you're competing on providing the best inference for that combination. So yeah, what you said is definitely correct.
>> Yeah. Okay, cool. I'm glad I got that right. I'm going to ask a question from one of the community members.
Gio has asked, "Do you have any concrete plans to use a portion of the TAO emissions received by the subnet for direct on-chain buybacks or other ecosystem support mechanisms, for example liquidity, staking rewards, etc.?"
>> Using TAO emissions received for on-chain buybacks. I think the better way to do that is just to bring in revenue from the product we're selling.
I think that's what a lot of subnets are doing. For example, if we integrate with Chutes or OpenRouter, let's just say OpenRouter. Right now, if you go on OpenRouter and look at, say, Kimi 2.6, there are a couple of different providers running the same Kimi model, but all of them have a different inference speed. This one's a little faster and charges maybe a little more; this one's a little slower and charges a little less. Then we can go in there and say, "Hey, Cacheon offers the fastest inference. We're the fastest." From there we can offer very competitive pricing, generate revenue from the OpenRouter integration, and use that to buy back alpha. Because right now the whole thing, the subnet, the validators, the miners, is all funded by emissions, and eventually that's going to run out. This is kind of similar to VC funding operations in the early days: eventually you have to give that back. How are we going to get that back? Selling the product. As of now I don't really see any better solution than just selling the product. Obviously there are a lot of ways we can sell the product; that's just one example. Another example is partnering with enterprises who know they use a lot of tokens because they're a big company.
They want to sign a long-term B2B contract so that they're locked in with the vendor and they know they're guaranteed the best inference servers for their internal use case. So that's another thing: we manage, well, maybe not their entire stack, but part of the stack. I think doing B2B might be better than just selling API access, because if we just sell API endpoints, not only do we have to compete directly with OpenRouter or Chutes or Groq, but we also have to manage a lot more retail customers, and doing B2B cleans up a lot of that work. But more importantly, selling API endpoints means we have to get more GPUs, and right now, especially with the GPU supply constraints, they're very difficult to get your hands on. This is why I think partnering with Chutes and getting the OpenRouter integration is going to be huge.
>> So that partnership with Chutes, is that already in discussion, or has it already been set up?
>> No, because we haven't gone live yet. We briefly talked about it. Jon Durbin messaged in our Discord and said that what we're doing is something they had considered doing themselves internally, and he said, "If you guys are doing that, obviously it's much better to partner with you, so that we don't have to worry about it." But it's not something that's been hashed out in detail. It is on the roadmap, though. It is something the team wants to do, and the partnership just naturally makes sense.
>> It does naturally make sense. It's a win-win for both of you. And I think partnering with Chutes, and what you're doing on the latency side, could definitely excite a lot of the community, the same way that Chutes has been one of the top subnets forever.
Maybe people will see you in that same limelight for what you're doing. So just to clarify: your competitive moat against all the competitors is that you're open source?
>> Well, open source saves us a lot of things, right? It's a very easy way to prove that we're the fastest. Well, I don't think it's an easy way, but it's a really good way to surface the best solutions. I think a couple of years ago, when Pantera invested in TAO, what was the thesis? The thesis was that open source is going to eat the world. Open source is going to eat traditional software, just because you can build on top of open source much faster, and then you need to find a good way to incentivize this acceleration, this adoption, and Bittensor just sounds like a very straightforward way to enable this. So I think our edge is that because of blockchain, because of Bittensor, because of open source, we save a lot of time internally when it comes to all the ops, all the hiring, for example, or managing. Essentially we're outsourcing research. We provide this platform to prove that you're the fastest, and then we as the subnet owners package the best solutions and go on to sell them. So in a way we're doing half engineering and half sales.
>> That's very good. I'm going to go back to the community. I'm bouncing between questions that I've got and community questions, just to make sure they get answered as well. Are there plans to allow speculative decoding, quantization, or other aggressive optimizations in future versions?
>> Yeah, that's planned for v2. The reason we're not allowing it right now for v1 is that comparing correctness across the entire token distribution is a lot of work. Not a lot of work for us to implement; it's a lot of work on the GPU. And especially now, when we just don't have enough GPUs available to go around, it's easier not to do that in v1, to make sure we get things right one step at a time before opening up the surface area. If we did this right now, there might be a lot of things that we just did not think of. Too many variables, that's all I'm trying to say. So it is planned, just not for this Tuesday.
>> Okay.
And what if a major lab releases a dramatically faster open-source inference engine?
>> Yeah, that was the good thing I was talking about earlier. It keeps pushing the frontier, not only for whoever releases it, but also for us. That's a good outcome for us, because if someone releases a much better engine, the miners can adapt to it. Our mechanism doesn't bet on any particular solution; it's a combination of solutions in your Docker image. So when they push the frontier forward, we can also push the frontier forward, because we're using that as a baseline. Which also makes sense: if the world's fastest engine is out there for free, why should we reward you for just copying that solution? We should be rewarding you for pushing that frontier even further.
>> That makes sense. Essentially, when another open-source competitor creates that new baseline, they're a miner, but from somewhere else. They just don't get rewarded because they're not part of Cacheon, but it gives miners an incentive to compete with that and become the fastest all over again. I have to ask...
>> Oh, well, no, I'll ask and then we'll go back to your answer.
Where'd the name come from?
>> Oh, Cacheon.
I like it, with the French flair, because it sounds cool, but it's also fewer syllables: two instead of three.
>> It came from... so the initial version of Cacheon was that we were just optimizing the KV cache out of the entire inference stack. The KV cache is one part of this whole inference stack. So with the KV cache, we wanted a name that's close to "cache", and we landed on Cacheon. It also makes sense: we turn on the cache, cache on, cache off. So that's where the name came from. This was around one or two months ago. Then we had a call with Jake, presenting the idea, and he was like, "Why don't you aim bigger? Why are you optimizing just this one part of the inference stack? Why not optimize the entire thing, where miners can use any language, any framework, any technique?" And the PMF also makes a lot more sense, because now you have a Docker image, and this isn't by accident. When you have a Docker image, it gives you a clean surface that we can plug into other things; it's a lot more portable, a lot more modular. And on the security side, the Docker container runs on an isolated network so that they can't hack us, for example. So we pivoted from optimizing this one part of the inference stack to the entire inference stack. But at the time we had already purchased the domain name and the Twitter handle, and we figured Cacheon also works for this project. We're still using the KV cache a lot. So the name just stuck.
>> Well, I guess now I get a bit of the history of how you built this, how you got to this point, and how you expanded. Now that you've made those changes, I guess you do have a better product-market fit, as you said, and you had a chat with Jake, which is good. So he obviously likes what you're doing.
>> Yeah. He advised on some of the technical stuff. After we came back from the meeting, like I said, we realized the PMF is just a lot clearer, because if you're optimizing just one part of the LLM inference stack, you can't really sell it. There are a lot of moving parts to saying, "Why don't you take out this one part and plug in our thing?", versus what we're doing right now, which is a lot more portable and a lot more modular. So after that meeting everything made a lot more sense, on the technical side and the business side. The only question was, do we want to change our name? And we decided that we didn't want to change our name.
>> Naming a project is sometimes the hardest part. You have so many good ideas and you're trying to land on this one name that can represent your entire company and product.
>> But I guess at the same time your name tells a bit about your history. One day you can look back and tell the story of how that name came about. I think the Airbnb story started with two guys sleeping on an air mattress, and that's why they called it Airbnb. So it's always nice to know the history of a name. So how do you actually verify that a miner's server is producing correct outputs and not just returning garbage at a really quick rate?
>> Yeah, so the TL;DR is that we compare the token distributions. When an LLM outputs tokens, it assigns a probability to each token, so we compare distributions. The academic way is called KL divergence. We're not using the full KL divergence, just because it's a lot of work on the GPU to go through every token; it's too slow, and there are some other issues. So what we do is compare the output for a prompt from the baseline, which is the latest vLLM, with the output for the same prompt from the miner, and for each token we compare the top five, because after the top five it honestly doesn't matter that much. We compare the distribution over the top five, and then we look at the first token where it diverges and ask: is it diverging enough? Is it diverging because the actual distribution doesn't match, or because of some tiny noise difference? We only check the first divergence because, if this actually diverges, there's really no point comparing all the tokens after that; they're going to diverge so much it's going to mess up the score. So that's our quick way to measure correctness. That's actually the reason why we're not allowing speculative decoding and some other advanced quantization techniques: they introduce too much noise for this elegant solution. But for the next step, I think we have some ways to benchmark it fairly on different GPUs and different techniques. For now we just want to make sure we're getting one thing right at a time and taking it slow before we introduce more variables.
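A minimal Python sketch of the first-divergence check described above. It assumes each run is available as a list of per-position top-5 maps (token to probability). The acceptance rule here, that the miner's chosen token must appear in the baseline's top five and sit within `tolerance` of the baseline's top probability, is an illustrative assumption; the transcript does not give Cacheon's actual threshold or formula, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

TopK = Dict[str, float]  # token -> probability for the top-5 candidates


def top_token(dist: TopK) -> str:
    """Token with the highest probability at one position."""
    return max(dist, key=dist.get)


def first_divergence(baseline: List[TopK], miner: List[TopK]) -> Optional[int]:
    """Index of the first position where the chosen tokens differ, else None."""
    for i, (b, m) in enumerate(zip(baseline, miner)):
        if top_token(b) != top_token(m):
            return i
    return None


def passes_gate(baseline: List[TopK], miner: List[TopK],
                tolerance: float = 0.05) -> bool:
    """Accept identical outputs, or a first divergence that looks like noise.

    Only the first divergence is inspected: once one token differs, every
    later token is conditioned on a different prefix, so comparing further
    positions is not meaningful.
    """
    i = first_divergence(baseline, miner)
    if i is None:
        return True
    chosen = top_token(miner[i])
    if chosen not in baseline[i]:
        return False  # baseline never considered this token: real mismatch
    # Near-tie in the baseline distribution: treat as numerical noise.
    gap = baseline[i][top_token(baseline[i])] - baseline[i][chosen]
    return gap <= tolerance


if __name__ == "__main__":
    base = [{"The": 0.90, "A": 0.10}, {"cat": 0.51, "dog": 0.49}]
    noisy = [{"The": 0.88, "A": 0.12}, {"dog": 0.52, "cat": 0.48}]
    garbage = [{"zzz": 0.99, "The": 0.01}, {"cat": 0.51, "dog": 0.49}]
    print(passes_gate(base, noisy), passes_gate(base, garbage))
```

Under this rule a near-tie flip between "cat" and "dog" is tolerated, while an output the baseline would essentially never produce fails immediately.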
>> Yeah, cool. That makes sense.
>> I guess once you start benchmarking, you'll have a clearer picture as well.
>> So why Docker containers specifically?
>> Yeah, so on the security side, it's isolated. We don't want to give internet access during evaluation, because with an internet connection you can game the system and cheat in a lot of ways that we probably won't ever think of. Also, the LLM runs without internet; we're optimizing the technique, we're optimizing the code, so why should you have internet access? It also protects us, so that people don't inject a Trojan horse or something. And from the product side, when you have a Docker container, it's pretty much a complete product that you can sell to businesses: more portable, a lot more modular. So both the tech side and the business side make sense. More on the tech side, it's also easier to do GPU cleanup between runs, so that we make sure our benchmark is fair across all challengers. So yeah, both from the business side and the technical architecture side, I think it just makes a lot more sense to do it this way.
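One way to get the "no internet during evaluation" property with stock Docker is an internal network, sketched below in Python as command lists. `docker network create --internal` and the `--network` and `--gpus` options of `docker run` are standard Docker CLI flags, but the network name, the image name, and the overall setup are assumptions for illustration; the transcript does not describe Cacheon's actual evaluation infrastructure.

```python
import shlex
from typing import List, Tuple


def isolated_eval_commands(
    image: str, network: str = "eval-net"
) -> Tuple[List[str], List[str]]:
    """Build the two docker commands for an evaluation run without internet.

    An --internal network lets containers attached to it talk to each other
    (for example a benchmark client and the miner's server) while blocking
    traffic to the outside world.
    """
    create_network = ["docker", "network", "create", "--internal", network]
    run_container = [
        "docker", "run",
        "--rm",                # discard the container when the run ends
        "--network", network,  # attach only to the internal network
        "--gpus", "all",       # expose the host GPUs to the container
        image,
    ]
    return create_network, run_container


if __name__ == "__main__":
    # "example/miner-server:latest" is a placeholder image name.
    for cmd in isolated_eval_commands("example/miner-server:latest"):
        print(shlex.join(cmd))
```

The commands are only built and printed here, not executed; running each submission in a fresh container also makes it straightforward to clean up GPU state between runs.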
>> Cool. Thank you for answering that. So tell me, five years from now, with the changes in AI, how quick can you get? What's the roof, and where do you see yourselves in the future?
>> Where do I see myself in five years? Oh man. I mean, the space evolves so fast. I don't even know where we're going to be in three months.
>> Is there a roof for speed?
>> Is there a roof for speed? Yeah, I think that's the holy grail. Is there really a roof for speed? I guess instantaneous.
But yeah, as the models get bigger and the GPUs get beefier, can we truly get to where I type something in and it answers everything immediately? Obviously that's the place we want to be. But in the short term, I think where we want to see ourselves is in those business integrations and business partnerships I was talking about, so that people route traffic through us not because we're some crypto project or open-source project, but because we're genuinely the fastest, and it's backed by science, backed by open source.
So that's where we want to see ourselves in the short term: routing traffic.
In the long term, oh man. I mean, the space evolves so fast. Who knows? Hopefully most of the token research routes through us. That's the goal.
>> Yeah, for sure. With what you're building, do you think there are any other directions you could fork out into, like giving your miners different tasks?
>> Yeah, I think so. Right now it's all software-based optimization. You're given the hardware, you're given a model, and then you optimize against that. So one surface area we can expand on is the hardware, because right now we're giving you a fixed GPU. I'd love to open up more surface area so that you can optimize not just against this but also against that. Hopefully that will be an addition about a year down the line.
>> Cool. And this is probably something you guys haven't thought about so much, but in terms of your presence in the Bittensor ecosystem, how do you want to show up? Do you want to be like Max from Score, who's appearing every day and talking to the community, or like another team that's quite informative and professional in how they present their work, maybe like Chutes? How do you want to present yourselves in this space, essentially?
>> How do we want to present ourselves? In the Bittensor space, I think being very active on Discord and on Telegram is quite important, and more importantly on Twitter. For example, today we saw that there was a miner using SGLang, while all the other miners were using vLLM. So we want to highlight that. This is exactly what we're designing this for: you can use any language, any framework, any technique, as long as you optimize what we're measuring. So I think being active in the sense of giving updates on the product, not just meme-posting every single day. We want to be active on social so that our investors, our partners, and our miners can see what we're doing. Down the line, since we want to do more B2B stuff, I think it's important to look professional and not overly tweet about memes, because hopefully we'll be dealing with some of the industry leaders down the line. The initial design for Cacheon had a little more of a hacker-house vibe, but we want to be a serious company. So in terms of that, I think the communications may also be seen that way. Maybe we have to do a redesign of the website. Maybe we have to do a redesign of the docs.
Although the docs, I think, look quite professional at the moment. But yeah, maybe not full-on suits and ties, but we want to look put together.
>> Yeah, it makes sense in terms of the market you're going for long term and how you want to communicate or brand yourself. As long as the tech works, I think everything else can come with time. Is there anything else you wanted to share with us about Cacheon? Anything that we should be looking out for? Anything for the investors to know or miners to understand about Cacheon?
>> Yeah. So we're putting updates on Discord right now. That's our communication channel. I think we got enough interest to have our own Discord channel at the moment, so that might be coming soon. Next up, obviously, is the mainnet launch on Tuesday. We're going to have the dashboard up so that you can see who is competing and how far they're pushing the frontier. You can download the logs from each miner to see what they're doing and what they're doing wrong.
Obviously, as a miner yourself, you should be looking at that to fix your model or your inference engine.
I will also be at Proof of Talk, so if you want to meet in person, I'll be there. Later on, I think it would be great to do regular calls with the community, kind of like quarterly earnings calls, but we're pre-product and pre-revenue, so it's just going to be updates about what we're doing and how the miners are doing. Then hopefully later we'll be saying, this is the partnership we're aiming for, or this is how much ARR we're generating. Because, not just Cacheon, I feel like every single subnet is pretty much an IPO'd company. You have a live token that's trading, except a lot of the subnets are pre-product or maybe pre-revenue. So I think it's important to look at it that way. And what are the public companies doing? They're communicating about the roadmap and about the numbers, and I think we should do the same.
>> I think you guys have already thought a lot of this through before you've even started the race. So I'm really excited to see where you are even in a few weeks, a few months from now. And I'll be rooting for you for the sake of Bittensor. But yeah, it seems like we've got a winner here.
So yeah, I'm really excited to follow the journey. I might even have to jump onto Novelty Search, which you're doing today, apparently. Is that correct?
>> Yeah. It'll be a quick one, because there's not a lot of stuff to talk about at the moment. We don't have a product.
So it will be quite similar to this AMA, a short 15 minutes. I think Jake has to talk about some other stuff before Cacheon. But yeah, excited to hop on Novelty Search. Hopefully we can hop on again with both SAR and, you know, other places, when we have a lot more stuff to showcase.
>> Yeah, we'll definitely have you on again once you're further into the journey, because I'm sure the community will be excited to get that update. But thank you so much for joining us today, and we'll be following your journey from a distance.
>> Awesome.
>> And I'll see you at Proof of Talk. See you guys in person, for those coming.
>> See you there. Thank you so much.
>> Thank you. Bye, guys.