Transcript: Fable's Back, AI Engineer Recap, & SambaNova

Mosiah.org · article artifact

Full machine transcript of Fable's Back, AI Engineer Recap, & SambaNova, a Cognitive Revolution livestream replay.

Source: https://www.youtube.com/live/mo2Qj_sozBs

Transcript note: Generated locally with faster-whisper tiny.en because YouTube captions were not available when captured. Timestamps were removed and 2,409 ASR segments were concatenated into this discussion copy. Expect some recognition errors, especially names, technical terms, and the music intro.

in the field, works from dark to dark. I lay down in the shade and nobody's so many part. Got the disempowerment blues. Nothing left for me to do. Got the disempowerment blues. Got replaced and the blues did too. A dollar say, stay off the street, tipped my hat real polite came, taste it when I eat. World got fast, I got slow. Let go of who we are still go. Got the disempowerment blues. Nothing left for me to do. The disempowerment blues. I replaced and the blues did too. Got the rules. I went to sell my soul. Devil just shrugs it. Somebody took the road. Got the disempowerment blues. St. Louis wrote this song, but the songs used to. Work from Dr. Dark. I lay down in the shade, nobody saw me park. Got the disempowerment blues. Nothing left for me to do. Got the disempowerment blues. Got replaced and the blues did too. A dollar say, stay off the street. Tipped my hat real polite came, taste it when I eat. World got fast, got slow. Let go the way I still go. Got the disempowerment blues. Nothing left for me to do. The disempowerment blues. I replaced and the blues did too. Ooh. Ooh. Crossroads. I went to sell my soul. Devil just shrugs it. Somebody took the road. Got the disempowerment. Say no, you got the disempowerment. Broke the song. But the songs used too.

Get on screen. Twitter may or may not be down. It's not loading for me on a couple of different URLs. It seems like, I don't know, seems like it is working, I guess. Yeah, we are. That's the three and all right, all right. So good morning. It is Thursday July 7th 2026, and welcome to Fable day. We're back, baby, live here on AI in the AM, and with our favorite, America's top model, to clean things up, clean up all the messes that we've made with lesser intelligences over the last couple of weeks. So very glad for that.

I remember we had some bets it was before July 1st. I thought it would be before the Fourth of July, definitely. And so it was on July 1st, basically. We got the model on July 1st, we got the announcement on the 32.
Yeah, I originally set the over-under for the Friday before, so it's definitely the over relative to my guess. Not a crazy amount of time. I think the sort of Sam Altman public statement of, like, it's not crazy for the government to want to take a look at these things for a couple of weeks before they go live, is true. And if we can get into a sane regime and a reasonable cadence, and this can be kind of part of the pre-release process, it does feel like there's still some chance for it to be all to the good. And I do appreciate that folks are at least waking up and starting to take this stuff more seriously. A lot of open questions still about this, including generally what the hell happened, right? I mean, we're still kind of piecing it all together through reporting and hearsay. What exactly was it that caused this in the first place? Was it really Amazon that called the government in a panic? If so, why? And were they confused about what was going on? That would be weird. Is there some sort of weird intrigue between Amazon and Anthropic? Amazon has a big share of Anthropic. And as big as Amazon is, that share of Anthropic is starting to become pretty material to their overall market cap too. So it would be strange for this to be some sort of sabotage.

But I have a hard time telling a story that doesn't leave me with weird, open-ended questions around what was seen, why was it reported in this way, how did we end up in this sort of panic situation? And then how was it fixed? You know, the best information I've seen suggests that the way that they mostly reassured the government was to painstakingly point out that the things that were observed had already been capabilities that were in the wild with other models for some time. And again, that's a sort of strange way to get back into the, you know, the approved column, to say, like, everybody else. It's kind of like, you know, something my kid would try, right? Like, he was doing it too. So sort of a strange way to try to get out of the frontier model penalty box. So yeah, I mean, if you fill in all the gaps in a charitable way, you can tell a judge, a resident black type story where this is kind of a clumsy, perhaps, but ultimately enlightened step to take for the government. Or if you fill in those gaps in an uncharitable way, you're just left shaking your head and thinking, who's making these weird moves and causing these weird hiccups? And I would love to see better information at a minimum. At this point, it really seems like the conversation can and should be more public than it has been. So that's what I'll be watching next in between prompting Fable. Do you have any thoughts on where we are at this point? There's also the question of how nuked it is coming back, or nerfed. And there I don't have enough information yet to really render a good judgment. For my personal use, it does not seem to have been affected so much that I can't do what I want to do.
But then I have seen at least some reporting online suggesting that the fallbacks to Opus are more frequent, and that on some coding benchmarks, terminal style benchmarks, the performance overall is worse because of those higher rates of fallback. What do you make of it at this point?

So in my testing, the fallback rates were lower than before.

Interesting.

And primarily because I was dealing with the production database, et cetera. And the Fable before, the moment you mentioned the word production, that's it, it's out. And now, the Fable now is happy to address production databases, et cetera. So I think actually what ended up happening is that they had a sit-down and they refined, they actually got their classifiers looser for a lot of things. And I've also seen online people say that when they get dropped out for cyber security research, they can appeal. And the company does a quick, like, why do you need this access? And then they have provided access. So I think the defense in depth concept of, here we're going to have classifiers, and then after that, we're going to have, like, permissions for certain people, has basically been accepted. I think that's where we are for Fable. People are still saying that capability is very significant. It's very hard to judge these things at this point. So we will see. GPT 5.6 is coming out today, I hear. 215 today. So 215 is a bit kind of in today. And I think the OpenAI guys are saying, GPT 5.6 Soul Ultra, that version, I think they are standing behind as better than people. But it just shows you, like, how is anyone not part of this kind of AI bubble supposed to know which model to choose? GPT 5.6 Ultra.

It's just, the versioning is just so obtuse.

I think the fact that they have dialed in, and again, we're inferring this from limited data points, but it matches my expectations that even a two-week break and the five days of, you know, it wasn't even five days, the three days of data that they were able to get on initial usage, would be enough for them to meaningfully dial in the precision of various defense-in-depth safeguards. And we've talked about this a couple of times, right? It's just how just-in-time all this stuff is. I mean, obviously they've been working on these safeguards and iterating on them for some time. So it's not like the whole program is new, but the iteration cycle is pretty fast. So this two weeks, if you, you know, kind of map back two weeks before they actually launched Fable, and where they were then, and how much progress they made, you know, in the immediate run-up to the first launch, I think it does just go to show that so much of this stuff is really just-in-time and fundamentally super iterative. And yeah, I guess I continue to be of kind of two minds on everything. I do think iterative deployment has served us pretty well so far. If I critique the OpenAI strategy at a high level, I would say that's one of the things it feels to me like they have gotten quite right. And more so than Anthropic, right? I mean, this is one way Anthropic has kind of converged on the OpenAI paradigm. Initially their whole thing was much more conservative around deployment. There was at one point they were broadly understood to have made some sort of commitment to never advance the frontier. Obviously, they'll say now that they never actually said that. I'm pretty sure that it was understood. It was meant to be understood. But regardless of exactly how firm their commitment was to that, clearly, they're doing it now.
And clearly, they've bought into the idea that the OpenAI iterative deployment is kind of the best path to things going well broadly. So I think that still makes sense. But I mean, all these models kind of start to come under some strain as you get to sufficiently powerful capabilities, or regimes where a small enough gap in the defenses is enough to create a huge problem. And yeah, it feels like in multiple ways we're kind of riding these paradigms that have worked well so far. And we just don't know if and when they might break. And if they do break, they might be breaking at kind of critical times, which is a strange juxtaposition on multiple different levels. People talk about that all the time, of course, with alignment, but I think it extends to these defense-in-depth safeguards, and definitely, you know, at the highest level, it kind of extends to the whole iterative deployment model. So as always, confusing, but selfishly, I'm glad to have it back, that's for sure. And I am excited to check out 5.6 as well. I feel like it's probably going to be for multiple reasons. One, obviously they're gonna start charging. We'll see. I mean, it's gonna be interesting to watch Anthropic dance around limits and usage, because OpenAI is gonna offer a ton more tokens, it seems, with 5.6 at their $200 price point than Anthropic is gonna be offering for Fable at their $200 price point. As of now, from what we have seen with the relaunch, it goes until July 7 on this preview access basis, and then you have to pay the API rates.

And that's gonna be an order of magnitude, maybe more than an order of magnitude, more expensive than 5.6 tokens, if you have the Pro subscription. So it's gonna be interesting to watch how they manage that. I would bet that Fable comes back to the Claude Max subscription. It seems just very hard for them to have no Fable there. That would make ChatGPT Pro, I think, just a better product for a lot of people. And I don't think they want that. So I think they're gonna wanna have some access to their top model in that bundle. But they've obviously got their limits, and they've got to destroy demand to clear their own market. And so it might depend on just how strong demand is. I do think, as we discussed in the interim, people will be willing to pay the API rate, because things like the frontier code success rate and what that implies around just general taste and quality of work, judgment, quality of writing, which I've seen too, to pay a little less than 2x for that much improvement, I think is gonna be worth it to a lot of people. But in the meantime, then we'll also all be kind of going the direction of, like, how do we create our own little neutral platform for ourselves that does the model routing in just the right way? And is this something that people will be outsourcing to providers? Will we be going to, like, Sakana and having them handle the routing? Will we be just, like, plugging into OpenRouter? Will we just kind of keep it to a couple of core subscriptions and have Fable decide when to send it over to Codex?
That's, I think, going to be a definite area of exploration for a ton of people, and I'm already doing a little bit of that just yesterday and today. Because, you know, my kind of default assumption is I'm going to spend a thousand dollars a month on AI tools, and yet if I don't do any optimization, I could blow past that real quick if I just used Fable for everything. So even as somebody who has a plan to spend a thousand dollars a month without even thinking twice about it, even there, it's going to start to force me to do some thinking about how do I want to allocate these tokens and when can I use Codex? It'll be really interesting to see if 5.6 is better at the general purpose knowledge work too. That's the other huge thing. It's clearly going to be good at coding. It's probably going to be best at math. It's probably going to be the most steerable for coding. The vibe check I get is that if you really know what you want coded and you spell it out in great detail, Codex is probably a bit better, better at the instruction following, just a little more reliable to do exactly what you told it to do. Whereas if you don't know what you're doing and you're just kind of vibing it, then Fable's probably better. Will this 5.6, though, start to catch up on some of these softer skills, kind of theory of mind, you know, for when you don't explicitly say everything that you want? Or will it continue to be sort of this more, you know, very literal-minded assistant that struggles more to fill in the gaps? I'll definitely be watching that and trying some things. This is the first one in a while where I felt like, you know, they've sent some signals that maybe it could have a different kind of character, and I'll be really interested to see, you know, can it help me with writing?

Can it, you know, can it do a good job on intro essays for the podcast? Yeah, Claude has had the top spot on that for all but, like, a week out of the last few years. There was one moment where Gemini 3 took the lead in the write-as-me test that I always run. GPT has never had the lead. Claude has almost always had the lead, but I'll definitely be trying it again.

I wonder if there's a big difference between giving it as reference your writing before any AI and your writing post AI help. I wonder if there's a significant difference between the two.

Well, probably not too much for me. I mean, that is something I think for people to watch in general. I've continued pretty much to rewrite everything Claude has given me out of some sort of masochistic commitment to full authorship. And I really haven't compromised on that much. As we talked about in the first days of the Fable launch, I do think now is the time to start to rethink some of those commitments. But I think now will be the dividing line for me, at least, where the co-authorship might mean that there is a meaningfully different input, you know, as part of the prompt in the future. Thus far I'd say all my kind of historical stuff is mostly still, I can kind of claim enough authorship that I wouldn't worry about that too much. I should try Pangram Labs on myself. That would be interesting. And pretty easy to run tests. Like, if I just took all my intro essays through the history of the podcast and plotted their Pangram Labs score over time, would I see a drop-off in my human number as I potentially leaned more and more on Claude than I even realized I was? It's certainly a possibility, but I'd bet against it. I would bet that I still get, like, 100% human on pretty much everything up through the present, but that could be something we can, you know, report back on in our next live session.
I do like sitting here sometimes and just riffing out ideas that we can then vibe code and actually bring receipts on rather than purely speculate.

I do see a lot of people trying to go down the model routing route in the sense of, let's start to use open source models. And there was a little interview with Alex Karp of Palantir yesterday where he just goes ballistic on frontier model companies, saying that they're going to come in and they're going to steal your data and they're going to steal your business and they're going to steal your IP and you're going to have nothing left. And this is unacceptable and we should all be using open source models. The really funny thing is, right after that, Clément Delangue, who is the CEO and founder of Hugging Face, then responded with, well, you know, Palantir is a free member on Hugging Face. It's not, you know, subscribed as a business. And so if you're so keen on open source, why don't you put some money into it? And I think that basically describes most US tech firms. They are in that position where they don't like the frontier labs, but they don't really want to use open source either. And they're not that supportive of open source anyway. They kind of want to use open source if it's free and they don't have to do anything. But if they have to do anything at all by themselves, they prefer to use a closed source and pitch a speaker or use a SaaS product.

So I also believe the frontier model companies can model route at any time internally. So again, they're playing this kind of revenue maximization game where they decide to route internally depending on whether they think that there's enough revenue there or not. And so if they say, okay, there's a hundred million dollars at a 30% discount, can we serve that hundred million dollars worth of revenue not using, let's say, Fable, but let's say using Sonnet 5? And how would we structure that so that we could hit the capability that is required and also hit the cost? And so the frontier lab can always do this kind of advisor strategy where you have, you know, Sonnet 5 as the core model, which then pulls in Fable as an advisor for various tasks. And in fact, on the API, that is actually what the product team at Anthropic has been promoting. They've been promoting to people: use Sonnet and Haiku in your classification systems for your customer service, but when things get too difficult, pull in Fable. And so this is basically model routing, but model routing driven by the enterprise customers themselves, teaching them how to model route effectively using a smaller model. But you can also model route using a better model. And that is part of this kind of split tasks into to-do lists and then send out independent agents. Some people have been on Fable right now in Claude Code; if you set it to ultra code right now, you actually get this experience. But you don't know whether they're using subagents which are lower models. They won't tell you. But you can explicitly ask Fable to use Sonnet subagents, and it will use those Sonnet subagents and check their work. And so you can start, like, streaming a lot of this work actually to other agents, even using the existing model, and then cut down. I hit my Fable limit yesterday, the five hour limit.
I hit my Fable limit and I went online and I saw one of the other software engineers, influencers online, and he said he had done 16 pull requests since Fable's been out and hadn't hit his limit at all. But he had a structured strategy. He had already told Claude, these are the models that you can use. You have this, and you have GPT 5.5, which is available to you by command line as an MCP tool. And he also told it, this is the cost, and this is the effectiveness, and this is the taste. And then he told it, depending on the cost, the taste, and the complexity of the task, this is how you should allocate the to-do list. And basically he hadn't run out at all. So effectively these firms are capable of doing model routing on their own, internally. That kind of structured process could be moved into the model very easily. They just haven't because, hey, you know, it's just an optimization which is not necessary right now. But the moment they think it is necessary, they will do it. So I don't really know that, you know, model routing is going to be a long-term thing, but it is a negotiating tactic against the firms, negotiating on price. So it's very helpful right now that we have OpenAI and Anthropic kind of close to each other. It would be horrible if we just had Anthropic. We would all be suffering terribly.
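
As a rough illustration of the allocation strategy described above (tell the lead model each option's cost, capability, and taste, then have it assign each to-do item to the cheapest option that is good enough), here is a minimal Python sketch. The model labels echo names used in the discussion, but every number, the `ModelProfile` type, and the `pick_model` and `allocate` helpers are hypothetical, made up for illustration, and are not any provider's actual API or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_task: float  # hypothetical relative cost, not real pricing
    capability: int       # hypothetical 1-10 score for task complexity handled
    taste: int            # hypothetical 1-10 score for judgment and polish


# Illustrative values only; a real setup would use measured numbers.
MODELS = [
    ModelProfile("haiku", 0.01, 3, 2),
    ModelProfile("sonnet", 0.10, 6, 5),
    ModelProfile("fable", 1.00, 9, 9),
]


def pick_model(complexity, taste_needed, models=MODELS):
    """Return the cheapest model that meets both thresholds.

    If no model qualifies, fall back to the most capable one.
    """
    eligible = [
        m for m in models
        if m.capability >= complexity and m.taste >= taste_needed
    ]
    if not eligible:
        return max(models, key=lambda m: (m.capability, m.taste))
    return min(eligible, key=lambda m: m.cost_per_task)


def allocate(todo):
    """Map each task name to a model name.

    `todo` maps task name -> (complexity, taste_needed).
    """
    return {
        task: pick_model(complexity, taste).name
        for task, (complexity, taste) in todo.items()
    }


if __name__ == "__main__":
    plan = allocate({
        "classify contact profiles": (2, 1),
        "write unit tests": (5, 4),
        "design overall strategy": (8, 8),
    })
    for task, model in plan.items():
        print(f"{task}: {model}")
```

In this sketch the expensive model is reserved for items that need its capability or taste, which is the same reason the engineer in the anecdote reportedly stayed under his limit.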

So tactically, for what it's worth, I like to start with the smarter model and have it delegate to the lesser model. I've been a little confused by the idea that you'll start with a Sonnet or even a Haiku and have it route up. I could see that working. I think in so many cases there's always this kind of question of, how much are you gonna control and really map out the structure of the work that you're trying to do? How much are you gonna control the inputs? And how much are you gonna map out that structure? Going back a couple of years, I think it was already possible with even, like, a GPT-4o with fine tuning to get virtually all routine tasks done on an AI workflow basis. If you're a company that employs people to do, roughly speaking, the same thing over and over again, I would bet that with 4o and fine tuning, you could get human level performance on a large, large majority of those tasks. So already, I think we've had for a long time the ability to, if you control the environment and if you map it out and if you're willing to do the evals, the prompts, and the fine tuning, you could get there. Obviously, the hurdles there are high. The hurdles have been brought down with smarter models. What do people really want to do? I want to have a smart kind of general purpose assistant that I can throw anything at, and then have it intelligently decide when a subtask is something that it doesn't have to put its full force behind. And that does come up a lot for me. So I'm planning a trip to China, breaking news. And one of the things I'm trying to do is just go through all my contacts on all the different social platforms and be like, who do I know in China? Who follows me that I don't even maybe know in China, but I can reach out and see if I can meet up with them, or they might connect me, whatever, of course.
Now that's something that, if it's going through thousands of individual profiles and classifying them into, is this someone that we should consider reaching out to or not for this purpose, then sure, probably any number of models can handle that. Certainly I would expect Haiku to be good enough to do a good job and not miss obvious stuff. But I want to be able to construct the overall strategy and, like, sanity check the final results and do the prioritization. And I think that's what most people want intuitively from their AI experience. What is unclear to me right now is what enterprises really want. Do they want that experience that they can just give to all their people and be like, hey, here's an incredible source of help? And you can kind of use it on the fly in a flexible way and it'll help you, it'll give you leverage, it'll buy your time back, it'll help you focus on the more important things. Certainly a lot of lip service is paid to that.

But the Alex Karp point of view, which I think is kind of implied by this, like, lesser to greater model routing also, is I think more a callback to the workflow paradigm, where you're not really leaning into the flexibility or general purpose nature of the model so much, and instead really controlling the plumbing, putting, you know, Haiku in a situation where it's like, okay, you are a customer service triage agent, and you're gonna get these things, and you're gonna be able to handle 90% of them, and there's gonna be maybe 10% that you can't, and here's the taxonomy of those, and we've worked this out, and under these circumstances, then you're gonna call up to a higher intelligence model when you need to. I mean, I guess, you know, it's going to be everything everywhere all at once is a good general prediction. It's not like it's going to be one or the other. But I guess maybe the question is, like, what is the winning strategy for an enterprise? Now, I don't have a great intuition for which strategy enterprises should be pursuing, or with what balance they should be approaching those two paradigms.

So I think, number one, enterprises only want increase in revenue and decrease in cost. That is the basic fundamental. If you can't deliver value that way, and you can't somehow promise or somehow get some kind of metric that they can use internally to show that kind of return on investment, it's not going to fly. Alex Karp is bringing this other thing of risk. Risk is, you can get increase in revenue, you can get decrease in cost, but these guys are going to wipe you out in the future. This is the risk perspective, which he's kind of trying to inject in there. And that speaks to basically that enterprises actually are getting increases in revenue or decreases in cost that kind of plug into the existing models, and they can kind of see where it's going. And that's why he's trying to inject the risk perspective in there.
What I'm also very confused about is, oh, you know, they're going to take all your IP. Like, look at the Fortune 500. Nike, what does Nike have to fear from Anthropic? I mean, Anthropic is not going to go and make shoes. They're not going to, like, build a shoe brand. They don't have the expertise to go approach athletes to do sponsorships. Like, what exactly is Anthropic's threat to Nike, or threat to Nike's IP? And then once you start thinking about that, you start, okay, what about the largest copper and gold mine in the world? Where is the threat? What about, let's say, Linde, which does all of the materials gases that you use for chemical production? What is Anthropic's threat there, to their IP? And once you go down that kind of route, you kind of, like, discard all of the physical businesses, right? Walmart is not going to be disrupted by Anthropic. You get all the physical businesses out of the way. And what you're left with is the pure IP businesses. Software, software production. Maybe pharma. Maybe pharma, I'm not so sure. The paperwork businesses: banking, paperwork compliance businesses, accounting, tax, compliance, regulatory, all of these things which are paperwork businesses, right? Those are all of the businesses where you have IP or relationships built up over years, where if you have Anthropic go in and they read through your entire workflow and processing, they can basically absorb all of that into the model. So that is where I think the risk is.

So when Alex Karp is talking about this, you know, he's really creating this aspect of fear and risk, but it's only going to affect a portion of the economy. And unfortunately, that portion is the portion that has been growing dramatically in the last 30 or 40 years. The paperwork, white collar type of professions have grown dramatically. The physical stuff has not grown a lot. Physical build-out in the US has not grown that much in the last 30, 40 years. So I think that's really the perspective that he's coming from. And yes, the paperwork businesses are under threat, I think, because this thing is going to be great at complying with rules and reading the regulations. Way, way better than any human is ever going to be. Right? And I think that that is a bridge that they have to cross. Besides, so enterprises, you know, they're concerned about the cost of these things. They're a little bit worried that, you know, the return on investment is not there. But on the other hand, okay, I was at the AI Engineer World's Fair. This is a good segment. I was at the AI Engineer World's Fair the first two days of this week. And if you don't know swyx, shout out to swyx. So swyx, he organizes the AI Engineer World's Fair. It's been about three years now, and he brings in everyone who was involved in basically implementing models in their companies and is concerned about these things. And it brings in a host of speakers. All of the top companies are sponsors and have their booths, et cetera. And I was speaking to people who were not in tech, right? So I was speaking to a guy who was a logistics CTO from the Midwest. And another guy who was, you know, running an accounting, like, you know, back office somewhere else, right? So a lot of them were not actually in Silicon Valley, like, full time, but they run teams who are using AI to solve customer problems on a day-to-day basis.
And so the logistics guy was telling me his CEO is completely AI-pilled. Management is completely AI-pilled, which is why I think the team is there at the AI Engineer World's Fair. If not, you're not going to send your logistics IT team to SF, right? And they're completely AI-pilled. They have tried working with external vendors before, and they have found it difficult because external vendors have not delivered as fast as they want them to deliver, which you can imagine, sitting out in the Midwest, if you have a local external vendor and you outsource, like, a project to them. And they have no idea what's going on, they're, like, a year behind the frontier. So his team internalized everything. So they're doing all this stuff internally now. And they're building evals. They're building all these systems. They're using guys like Arize and other guys like Braintrust, which have started to build out these frameworks to evaluate models, you know, widely when you implement them in enterprises. And they are actually in the day-to-day process of implementing. And as they implement, they automatically see the results because they are very close to that edge. They see that customer service calls or exceptions that used to happen are now getting handled immediately. And then all of the more difficult stuff that they used to have to jump on, they can now start to address. And so they are seeing, I think, the return on investment on a day-to-day basis.

And this is very different from, I think, the story that you get from the big enterprise CTOs. Because they're so far away from the frontline that they don't actually know what's going on very closely. They're just looking at the numbers. And by the numbers, the token spend is going up, but are you really seeing the return on investment? But if you go down to that working level and you see day to day, like, the customer calls coming in and whether they're getting handled, the handling rate and the exception rate, the exception rate is falling, the handling rate is going up. And they're seeing that on a day-to-day basis, right? So this is, I think, where we are. The guys who are actually implementing and close to the implementations are actually seeing the results. It hasn't really filtered up into the top layer of the enterprises yet. But the CEOs who are AI-pilled kind of know what's going to happen, and they've made the commitment and they're making the investment. The CEOs who are not are kind of sitting by and, like, oh, you know, we're going to wait around, we're going to see what happens, et cetera. That's really where we are. And I think it's not really visible. Like, it wasn't visible to me until I went to this AI Engineer World's Fair. Like, I think of AI people as just the people on Twitter, right? For some reason my world has gotten constrained into this, like, small, tiny world and, like, you know, the bubble. And I don't really have a good sense of what people outside the bubble are actually doing or thinking. But you go there and you meet these implementers. And then you realize, like, all of the stuff that we talk about and we're producing in the bubble is getting used outside. People are learning how to use these tools, and people are deploying, and people are using them. People are seeing the return on investment. But it's at the very micro, granular level right now. It's gonna take some time for the numbers to filter upwards.
So coming back to the two paradigms, a lot of what you're describing there sounds to me like people are successfully implementing the workflow paradigm. If they can measure things like the handled rate and the exception rate and so on, that by kind of definition means they're doing this more controlled, structured, workflow-style build-out. I still kinda wonder, you know, for me, that work is very quickly becoming... We're in this weird spot again where with this latest tick of the model, certainly, and even before I think with Opus, the frontier models have a pretty good intuition, certainly better than the vast majority of people and probably better than all but a pretty small minority of people who have really honed the skill of how to build a more deterministic, structured, AI-powered workflow. And as such, it still feels to me like enterprises mostly should be embracing, let's just pay up for a bunch of Fable and kind of let our people use it. The organizations that are sort of like, oh, we're going to go make GLM 5.2 work for whatever. We don't need that. You don't need superintelligence to write your email or what have you. I think that's true, but I think we're still in this kind of weird middle moment where if you want to economize and you say, oh, we're gonna have our people set up all these workflows and then we'll be able to use an open source model and we'll save money.

I just think you're signing up to go kind of slow and just, you know, have a lot of committees and a lot of people in meetings talking about how do we do this and how do we eval it and whatever, and it just feels to me like there's still a lot of advantage in just throwing some high-value tokens out to your whole employee base and basically saying, develop your own workflow solutions on a kind of as-needed basis, and by the way the frontier model which you have is really good at that. And maybe you can have a strategy that's like, and specifically we wanna delegate to our own internal GLM 5.2 inference capacity or whatever inference partnership or provider we have. But how much does that save, versus just having Fable kind of natively delegate to Sonnet and even Haiku, where it's clearly gonna be more closely trained to do that well and know when they can handle the tasks accurately and prompt them well? And that I really don't have a great sense for. I was just talking to a former investor of mine, still an investor in the sense that we haven't liquidated the investment yet. And he's working on a sort of Red Hat for AI kind of thesis at the moment. And so we were talking about this and I kind of came to the same conclusion with him. I think, as so often, the Tasklet, the Lindy (not the Lindy you mentioned, but the Lindy AI) and the Elicit product folks are really at the frontier of this, where they're really wrestling with, how do we carve out any relevance for ourselves in a world of Claude can do it all, in a world of Claude for Science? And their answer is kind of like, we have to be the best at routing, so we can deliver results without compromises, but do that as efficiently as possible. And we also, of course, have to be cross-provider. That's the one thing that the frontier labs are probably not going to do. Although we even see that a little bit, right?
Like there has been some direct integration or direct sort of plug-in type relationship between Claude and Codex. But they sort of have to do such a great job of that, that they can then amortize the cost of that work across all their customers and then deliver, ultimately, savings to customers relative to what you could get on your own. Really hard given the price discrimination between the API and the subscription level. Although again, we have to remember for enterprise, the 150-seat cutoff that Anthropic has is, you know, another really weird discontinuity in all these curves. So depending on which side of that you fall on, you're in a very different world, price discrimination-wise. But I think Tasklet, you know, and Lindy, they've both put things out recently where Tasklet's strategy was basically like, Claude is the only thing that's good enough. We're just going to ride Claude and make the best product that we can and grow as fast as we can. And Lindy had a similar strategy: they had a lot of different models that you could use, but Claude was the default for pretty much everything. And they were both like, we're going to prioritize making this thing work even if it's low margin, or at times Tasklet was even negative margin. Lindy has also been, I think, pretty transparent in disclosing numbers around inference being their biggest cost. So, you know, a little swing on inference.

Tasklet even said that when 4.8 was using a new tokenizer that was 30% more tokens, they opted not to move from 4.7 to 4.8 right off the bat just because of that 30% input token bump, which was enough to kind of mess with their economics. Yeah. So, but nevertheless, they couldn't compromise on customer experience, so they sort of waited, waited, waited, build, build with Claude until such time as they could finally get to the point where they could start to diversify across providers. And notably, this has just happened for both of them, with Tasklet bringing on other inference providers, I think leaning into OpenAI most of all, and Lindy actually starting to do open source inference, but with a really intensive process, with many failures under their belt of previous models that they thought might maybe be good enough, and they weren't, until finally they hit on one that they were able to get confidence on and actually launch to not all use cases, but at least enough use cases that it has kind of meaningfully changed their economics. So it seems like that play, you know, if I'm kind of advising an enterprise or advising like a Red Hat for AI type company that wants to go serve enterprise, I think that same strategy is kind of what I would say. Like the frontier models still have the juice where if you're using them on an ad hoc basis, you're gonna get value. You're gonna know where that money went. And even if you're structuring things, which is a good goal to have, you probably don't wanna rush into trying to structure them with a much cheaper model because it's only recently been possible, with a lot of work, to get comparable results. And even if you are, you know, finding you can do that, who's gonna do it best? Probably still the frontier models, even more, you know, than your own people in a vast majority of cases. So, and then while you wait too, of course they're doing all this integration of the native routing.
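To make the tokenizer point concrete, here is a small Python sketch of the arithmetic: if the same text becomes 30% more input tokens at an unchanged per-token price, input spend rises by the same 30%. The monthly token volume and the price per million tokens below are hypothetical placeholders, not figures from Tasklet or any provider.

```python
def input_cost(tokens, price_per_million):
    """Cost of a number of input tokens at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def cost_after_tokenizer_change(tokens, price_per_million, token_inflation=0.30):
    """Cost of the same text when a new tokenizer emits more tokens for it.

    token_inflation=0.30 mirrors the 30% bump mentioned in the conversation.
    """
    return input_cost(tokens * (1 + token_inflation), price_per_million)


# Hypothetical workload: 1 billion input tokens a month at 5.00 per million.
monthly_tokens = 1_000_000_000
price = 5.00

old_cost = input_cost(monthly_tokens, price)
new_cost = cost_after_tokenizer_change(monthly_tokens, price)
increase = new_cost - old_cost
```

For a product already running at low or negative margin, an increase of that size on its largest cost line is enough to delay a model upgrade, which is the trade-off described above.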
So yeah, I guess I kind of still feel like all roads lead back to Fable, sort of, if you're really going for performance. And if you're not going for performance, or you're really prioritizing risk, or you're really prioritizing IP, then I think you do have another question on your hands of just like, how long can you do that before...? Because in addition to Claude for Science, they also just announced a whole biomedical initiative, they're gonna go try to cure rare diseases, which I think everybody should be excited about, except for maybe the pharma companies that are trying to do highly controlled, structured build-out type things while meanwhile Anthropic is just Revan tokens at the frontier and potentially coming for you real quick. So the way I saw Flo Crivello's ultimores post, as they started to switch into open source, he was explicit that they'd spent so much money on Claude and open source was finally good enough. I think the open source being finally good enough is just an artifact of the slowdown that the frontier labs have had. If the frontier labs had been allowed to release methane-fable level models in February, like, we wouldn't be having this conversation, I think, and the pricing would have dropped. I do have an endpoint for what I think enterprises will eventually look like. A lot of what businesses do is they try to standardize in order to reduce cost.

So if you were like a, let's say, sole entrepreneur and you have someone who comes into your restaurant and you don't like the look or they're making too much noise, you always have this management reserve, the right not to serve them and to ask them to leave. So you would be like, okay, I don't like you, please leave. But if you're McDonald's, you need to have rules and structure, because if you throw people out, and like if your manager, you know, tends to throw people of a certain kind out and it's discriminatory, then it's a problem and then you're going to have to like face up to it and you're going to get sued and you're going to have to like fix it. So in order for you to follow the rules and not get into that kind of position, you create this level of compliance and structures, etc., etc., and you create this entire, like, you know, structured workforce and training, etc., to get your frontline staff kind of, you know, into that consistency zone. All of that kind of melts away with a good AI. So if you have a good multiplayer AI, which can hold the context of the entire enterprise at once, then all of the intermediary paperwork and red tape melts away. And so you could have like your manager at McDonald's say like, hey, this is the video of the guy, like you don't feel comfortable. Am I allowed to kick him out or not? You don't need the training, you don't need all this stuff, you can make a quick call and then the model can be like, okay, let's look at all the rules and like, is this going to be discriminatory? Is there a cause? What do we need to document? And the model could say, okay, just take a photo of this and this and this and you can proceed to do so, and this is the way to do it. You call the cops, you don't do it yourself, you call the cops and you show them that this is what happened and they will be the ones that will do this, or we have private security engaged, like, you can call them.
So all of this stuff that the enterprise does on a standardized basis can become customized now. And that has to do with clothing, like, you know, clothing measurements, whether they fit well on you, like the way your customer service personnel talk, they set the vibe in your shop. All of this stuff becomes customizable now, because you can have a central entity that can hold the entire context of the business. And once you have that, then your people are not doing paperwork anymore. Your people are there to make decisions. And that is the primary goal, like your people are there to like make vibe calls and then make decisions based on those vibe calls, within the set of decisions that they're allowed to make, to be able to go forward. So I think eventually, as you pointed out, all of this stuff about like having different models address, all of it kind of melts away because the model itself will allocate its own compute. The model will be able to allocate the appropriate amount of compute for the task that it needs to do. And when we talk about all these different sizes, different models, what we're saying is compute, what we're saying is how much compute should I use to solve this task, is that compute going to be deployed using a small model which thinks for longer, how much thinking time should I give it?

And this is where we have all of this low, high, you know, Sonnet and Haiku, all of these names, they all boil down to how much compute do I need to use to solve this task. And this problem of, like, for the larger model, not being able to allocate compute appropriately is what all of this fragmentation is. The moment the larger model solves that problem, the moment you solve this problem of being able to judge how much compute to allocate to a certain task, all of this stuff melts away. Then you're just like, okay, who does the task best? If you have a context window which is large enough for the entire enterprise, you don't have individual instances anymore. You don't have transfer of context and summarization. The model handles all of that. It plugs in right in your CRM and databases. So we are on this kind of path towards there. We will eventually get there, where you have one model that can decide how much compute to allocate and also, like, have the context of the entire enterprise and be able to talk to individuals within your company and kind of advise them and be able to drive, like, strategy from the CEO level down and, like, have everyone be on the same page at the same time. And that's going to happen. We're on that journey towards there. This is why I thought Claude Tag, this is why I've been like, you know, I've been really like, Claude Tag is excellent, because Claude Tag is the beginnings of that, right? Like I've had this vision for a while. Claude Tag is the, like, they've solved multiplayer. They've solved, like, being able to handle independent threads, being able to manage, like, large amounts of context. Now it's a question of just getting it better and better. They've solved the zero to one problem on multiplayer. Now the problem is the 1 to n. And the 1 to n, as speed at the other point, is actually easier than the zero to one. The zero to one problem is the hard problem.
And I think this hasn't been very well recognized, I think, generally. Andrej Karpathy, you know, made a post and people are like, oh, look, now he's promoting Slack. Like, you know, you're one of the greatest AI people in history and you join Anthropic and all of a sudden you're an influencer for SaaS. And they really dislike that Anthropic is coming into SaaS because SaaS is where the easy money for tech people has been. Like consumer, you go in, you get hit on the head a couple of times when you're young, you come out, you do SaaS. And SaaS has been like stable and you've been able to make money. Anthropic is coming for all of SaaS right now, because all of SaaS melts away as the model becomes the connective tissue of the enterprise. And I think people, like Alex Karp, are still having trouble wrapping their minds around what's gonna happen. And the pace at which it's gonna happen. I didn't expect to see Claude Tag so soon. I thought it would be two or three years from now. And so things are actually happening way, way faster. Anthropic is like downplaying it, like downplaying it so much, like they put out a tweet, they're like promoting it like a SaaS product. Like, how could you? I mean, if you solve the one to n problem, you could have something like a Claude Tag running the US government. Like it's the same thing.

It's a singleton which runs for multiple people and has a shared context and can keep the threads separate and can follow the rules set. It's the same thing. They're on that path. They've crossed the zero to one. So yeah, it's good that people are optimizing those smaller models. It's great. But that problem has basically been solved, kind of. Like I think the allocation of compute is almost there. And I think we are only seeing it because they slowed down a bit because of the regulatory issues. And I think by the end of the year, it'll be more evident how powerful this paradigm is going to be. So is there a steelman of the Alex Karp thing, and how much does it really center around trust and sort of IP concerns, or the idea that, you know, we're all kind of training our replacement, from the sort of highest level of humanity training the AIs, you know, generally, down to the individual customer service rep that's now babysitting bots? Is there more to the steelman case than that enterprises are worried that they are doing that too and kind of training their own future competition? Or is that really the heart of what you think he's channeling? I think he's actually using the exact same selling point that has always been used by enterprises, which is you have to own your own stuff. And that is why people had racks on premise 30, 40 years ago. That's why Amazon created GovCloud. That's why you have all of these VPCs, et cetera, like it's the same thing, which is you have to have your own setup. You can't depend. And then within that own setup, there are vendors that give you that setup. And it is why Microsoft, for example, has a business, because Microsoft was already inside enterprises, banks, et cetera, and banks are not going to move to Gmail. Gmail came out after most banks were founded. Banks are not going to move to Gmail. Banks are Microsoft Outlook. You're not going to change that. They are Microsoft Teams.
It's basically impossible. It's the same selling point. The problem is, I think, for some industries, if they're sheltered by government regulation, they get stuck. Like as banks are, banks cannot really easily migrate, or have not traditionally been able to easily migrate, to better software vendors. So they get stuck in COBOL or they get stuck in these things. So Alex Karp is like IBM now. He's basically trying to sell lock-in to enterprise into these older inferior products. And it works. I mean IBM has had a business for years, right? Like for 30, 40 years, they've been servicing mainframes while everyone else has moved to cloud servers, right? So it can work. And the key difference is that the AI can also do migration much more easily. So the cost of migrating is lower. So he's got to push the trust angle. The cost angle is not gonna be enough. He's got to push the trust angle in order to maintain. But they'll come up with new things. They'll come up with like, oh, you can manage your own singleton inside. And the cloud singleton outside will just come in to do maintenance on your singleton once in a while. Like people will find sales strategies around this stuff. So it's going to be a battle. It's going to be a battle. And the firms, these frontier labs, have raised so much money. And there's only two ways. You either eat someone else's lunch, or you create new abundance. And as long as, like, the new abundance seems slow, they're gonna eat someone else's lunch.

They're just gonna drink the milkshake, right? So do you think he's right about the attitude? I mean, when he says, companies don't like the frontier labs and they love us. Obviously he's talking his own book about how much they love Palantir. But do you think that they really don't like the frontier labs? I mean, in some ways, the general purpose intelligence on demand, pay-by-use paradigm would be like the ultimate dream for the enterprise. And clearly, the adoption has been super strong. Do you think that's true? And if so, why do you think it's true? So I think a lot of this is driven by where you are in the food chain. And if you're in the food chain and you have to pay for a product, that upsets you. And if, like the Uber CTO, you've run out of budget and you have to go back to your CEO for more budget, that upsets you, right? Enterprises, this is a cost. They're not happy with the cost. That's very clear, right? But they're being forced into it. They're being forced into it by competition, by CEOs who are, you know, AI forward, and they dislike it for that reason. That change is coming at all. They dislike that fact, right? Palantir at this point has a very clear kind of value proposition when they go in. They can tell you this is what we're going to do. This is how much it's going to cost and this is how much money you're going to make. This is how much you're going to save. They can tell you they're going to help you prevent fraud and this is how much your fraud rates are going to go down. So they have a clear value proposition. And so up front, you know how much you're going to pay before you engage Palantir. You don't get, like, sticker shock with Palantir, right?
The problem for the labs is people are getting sticker shock, because they go in and they implement and then you're paying on a per-token basis and all of a sudden the token costs like explode, and so you get sticker shock and then you have to change your budget for the year. That's not a very happy thing for any enterprise CTO or CEO to face. So that's annoying to them. The frontier labs are also horrible at sales. Right? You look at IBM. IBM has like 70% of the staff are basically sales engineers. And the sales engineers are there to basically help you implement, maintain, you know, do all the grunt work, etc. The labs are not doing that. The labs have decided, especially Anthropic has decided, to do this very lean structure of having almost no people at all and just putting out the models and then just saying to these enterprise teams, here, you know, you can go ahead and use it or not use it. And, you know, the CTO is like, I'm signing nine-figure deals with Uber over here, right? Like that guy isn't going to come and jump on your customer service calls and be like, oh, you know, sure, sure, we'll help you do this. And, you know, our team will address this like next week. Nah, that's not happening, right? So the level of customer service that is expected for enterprise SaaS is not being provided by the frontier labs and they're not in a position to provide it. That's why they started this whole FDE program. But the FDE program, people thought it was a sales engineers program. It's actually a program to extract data and workflows and implement them inside the models themselves. And that is what, you know, Palantir, Alex Karp is alluding to.

He's like, the FDEs are coming in and they're not there to help you, they're there to absorb your data and workflows. And once they've absorbed your workflows, you won't have a business because it'll be taken. And it's absolutely true. We spoke to two OpenAI FDEs. And they went into a company that Thrive owned rather than an external company. And as they went in, they took apart the workflow and they were basically absorbing it. And they said that the intention is to absorb it in the next row. So yeah, that's where the enterprises are upset about costs. They don't like paying costs and they don't like sticker shock. And it's a function of the sticker shock that I think they're annoyed. And this whole open source model thing is about the sticker shock. Again, these are all negotiating positions. And having the open source models helps these companies, helps enterprises negotiate against OpenAI and helps, you know, CEOs negotiate against the CTOs. The CTO wants budget. Use GLM 5.2. So the CTO has to decide. Do I use GLM 5.2? Or do I put token spend limits of some kind on our engineers? Then they have to rethink. So they're trying to get the costing into the proper bound again. And they're hoping that it doesn't go up again. So if the Uber CTO can say, like, OK, I got 3 billion allocated to AI. And we're going to use Anthropic. But if it's too expensive, switch over to GLM 5.2, that's what you have to do. And so that's what they're saying, that it's good enough to switch over, maintain the cost within that bracket. Don't push the cost out. And the model companies will basically negotiate to take that spend, right? The model companies have a 90% margin. So they will try and negotiate their pricing in order to come in and get 80 to 90% of that budget that's been allocated. And that is happening right now. So overall, I feel like it's gonna come down to political economy, in the sense that it's just really hard to imagine American business, probably.
It's just probably the most dynamic business culture in the world. Maybe China would like to have a word about that. But it's just really hard to imagine how they do all this stuff and navigate all these things, and maybe just the budget cycle and, you know, all the process that they've had. Something as simple and fundamental as annual budgeting just isn't really going to work that well anymore, I'm afraid. And I do kind of bet, if we're in a laissez-faire market, I do think I bet on the big tech singularity to ultimately win over a lot of legacy companies that are stuck in those processes, even if they're really trying. And the question then just might be, are these companies allowed to enter new markets? And this connects obviously to the equity sharing proposals that we've talked about a little bit and which are potentially getting more real. In a way, you can imagine, from a purely selfish standpoint, some equity sharing where everybody gets an account, where they can look at how's my, you know, slice of the AI pie doing today. I could see that being in some way the most genius move that the frontier companies ever make from a purely selfish standpoint, because if they even give away, right now apparently we're talking about 5% of OpenAI being given to somebody, something. Is it gonna go to the federal government? Is it gonna go to households? Is it something else? I don't think we have any clarity on that. Will they be liquid is another interesting question.

I think the precedent for just handing out shares, from what I understand of the post-Soviet world, was like, it didn't really work super well, they kinda got cornered, markets got cornered and people didn't really realize what they had. Anyway, there's a lot of practical implementation questions there. But 5% I think could look really small relative to the, and even 50%, in some sense, could look small relative to the binary question of, are these companies going to be allowed to enter into new markets and compete directly across a vast, you know, super wide array of different markets against all their current customers? If they are allowed to do that, then it seems like it probably more than doubles their valuation versus if they are not allowed to do that. And my gut kind of says, if you're them, like, give away half the company to the public. So everybody has a little ticker that they can watch. So they're all kind of cheering for you. And then you'll probably be allowed to come do these things. And like Amazon, you'll probably also be really well loved at the consumer level. You know, I think Amazon, as much as there's sort of hate in certain corners, you look at their favorable brand rating and it's like super high, right? I mean, I think the last numbers that I saw were in the 70s, maybe even the 80% range, which is super high, and why is it so high? Well, they do what people want, right? They give great selection, fast delivery, all the classic things that Bezos says will never change, great prices. Yes, they're like squeezing people left and right. Yes, they're kind of doing exactly this, they look into their small business marketplace, find hot sellers, clone products, sell them for half or less in a lot of cases. And there's a whole kind of Lina Khan school of thought that's like, hey, this maybe shouldn't be allowed.
You know, like this is an abuse of power and we're sort of hollowing out our small business culture by allowing this big tech company to come in and undercut you so dramatically based on the data advantage, the visibility advantage, that they have. And they don't even need to make margin on it. So should we allow this or should we not allow this? The public broadly seems to come down time and time again on, I want the best stuff I can get for the cheapest, the fastest, and the small business owners, like, yeah, I pay lip service to that, but I'm not willing to pay twice as much on an ongoing basis for all these random products that Amazon would have sold me directly for half, just so I can sort of generically, philosophically support small business. Now you take that to the enterprise level and you're like, oh hey, Anthropic is coming in and revolutionizing drug development, potentially. We're still somewhat speculative on that, obviously. But it's not good for Eli Lilly. You know, I don't think anybody cares. Especially if they have a little slice of Anthropic that they can watch tick up into the future. So speaking of that, Anthropic took a stake in, Micron has taken a stake in Anthropic and Anthropic is cooperating with Micron. So that is actually happening, right? So as you said, this kind of taking a stake. So Anthropic needs memory. They want to co-design the memory because they don't have the capability, but they also can't compete head to head with Micron. Micron has all the manufacturing as well.

So they go in and they're a buyer, they're strategically, you know, partnering with Micron to design the memory in the chips. And Micron takes a stake in Anthropic, right? So there's a cross-share ownership. So that is actually happening, as you kind of alluded to. I don't know to what extent Anthropic will ever, I think the pharma thing is a little bit more, I think because Dario's view in Machines of Loving Grace was an AI that can do the entire pharmaceutical, the pharma research process from end to end, from idea generation all the way through clinical trials and to the end. And I think it is very difficult for them just to offer API access to, let's say, Eli Lilly, and for Eli Lilly to be able to implement, because it's very difficult, if you don't know what the AI is capable of, to be able to design workflows that the AI can use. And so I have thought for some time that this will happen. My expectation was that Anthropic would IPO and a bunch of people from Anthropic would start leaving in order to found companies that basically attack each of these markets separately. I don't see the non-AI-native companies being able to be this kind of quick to do stuff. I expect the AI-native companies to be the ones. So there are a few ways this can play out. One is acquisitions. So Anthropic, I think, would maybe buy Slack, or maybe buy Salesforce. You can kind of see it. Like they really like Slack. Claude Tag launches in Slack. They kind of want to make it their central kind of platform. And I could definitely see them, you know, at some point, if they're at like a couple of trillion valuation and Salesforce is at like a couple hundred billion, there could be a deal to be done, like, okay, I'll pay you 100 billion and take Slack, right? And Salesforce gets 100 billion cash. They get to stay as a CRM system of record and Slack goes to Anthropic. And so you can start to see this kind of deal happen.
Like this kind of taking over of key strategic assets, and not really like that you buy shares in Anthropic, because they're going to pay you for the asset. You can use those funds to buy shares in there. They're happy to have you as a shareholder. But it's not like a special stake or whatever. It's just like the shares are available. You can buy them. They'll give you the money to buy it, by buying out your assets. And I think, for example, if Anthropic wanted to enter like logistics or whatever and you need physical facilities, yeah, they would have to buy them, right? That buying them is an exit, and those guys who exit can buy shares in Anthropic again. Like, that system of acquisition can support itself. And those buyers have the choice whether to buy shares in Anthropic or do something else with the cash. It's not limited. You see, when OpenAI gives these shares to the people, the people can't sell them and buy something else, right? They're limited. So this is actually less valuable than just having the stock outright and being able to sell the stock in the market and buy something else, because that option of choice gets taken away, and it's more valuable if you have that option of choice, if the person has that and, like, they're able to sell. But the thing is, a lot of people who get these shares won't understand their value. This is the fact, they won't understand the value.

They won't understand that superintelligence might be more valuable than anything else on the face of the earth in like 10 years. They won't understand that. And they would sell early, right? Same thing that happened with Bitcoin: people sold early. They did not understand what it was gonna become. So yeah, there are ways out of this where it's just, you know, with existing commercial structures, just pure acquisitions, without any, you know, crazy stuff like offering shares to the government, you can resolve all of these things. I think offering shares to the government is really a political move. It's to kind of defuse the, you know, what Sam is doing is like, look, we can either have the Dems come into power and take it away from us forcibly, or we can give it voluntarily and decide the terms on which, you know, the ownership is structured. And that is really what he's aiming to do. Like the 5% is obviously a starting stake. If Bernie Sanders wants to go to 20, I'm sure Sam would go to 20. But after 20, like what can you do? What can Bernie Sanders do after that? The state already has a stake in the firm. What else do you want? And that is the answer that I think like the Bernie Sanders guys can't come to, because what they really want is to stop progress. They want to freeze, like, wages and progress, rents in New York, for the people who already have homes. They want to freeze them so that things don't get more expensive. The people who already have wages, they don't want them to get taken away. That is what they're asking for. The way they go about it is like, I'm going to take your money. But here, I'm going to give you the money. What happens next? They don't know. Because again, you can't stop progress then. And that's what the people want. The people want the stop of progress. So he's created a political conundrum for the left wing now. And he knows.
Everyone knows that the left wing is probably going to have more power come November. And so people are making their bets now. I saw Jared Kushner's brother's wife, Karlie Kloss. She was on an interview with Amy Chang, I think yesterday or a day before, on a podcast. And she comes out and she says she's a Democrat. And she comes out and she says, like, I've never met President Trump. We live in St. Louis and we are like unaffiliated with all of that stuff, right? And the reasoning is because as stuff comes out about like the Trump crypto stuff and like, you know, billions of dollars, et cetera, there are many, many Democrat lawyers who are sitting around collecting evidence in DC right now. And the winds will change. And these people, yes, President Trump plans to pardon a bunch of people. But the thing is, the proceeds of this are already under scrutiny. And as those proceeds get used for other things, they will find ways to put people in prison. There's a guy called Tanner Greer. I don't know if you know him. He goes by Scholar's Stage on Twitter. He writes, he's a lawyer. He's a right-wing lawyer, but he's a right-wing kind of like non-Trumpy kind of Republican lawyer. He's like, look, I told you guys not to do this. And when the Dems come into power, hundreds of people are gonna go to prison, hundreds. So this is where things are right now. I think, you know, Sam is kind of getting ready and everyone is getting ready. Like you have to be ready because the winds are gonna change.

And you have to be careful because, you know, Trump can pardon a bunch of people, but he's not gonna pardon like analyst number two on investment bank number five's desk, right? And what the Dems have done before in the Clinton, or what these lawyers have done before in the Clinton years, is they've gone after like the smaller people, and sure, the big guys get away, but then it inflicts this level of fear on the staffers and it makes them like really aware. Like Trump isn't going to pardon, like, 10 million people or whatever. There are like thousands of these like Republican influencers and like right-wing people and it's all being like tracked, right? Is he gonna pardon Tim Pool? Tim Pool took money from the Russians, right? Like is he gonna pardon Tim Pool? Tim Pool is an influencer. I don't know. So I think there's a bunch of this stuff happening in the background that people are getting ready for. And Sam has played his cards well because the 5% looks good for the Republicans and it looks good for the Democrats. And it defuses the situation. And, you know, if you discover an ASI, this is gonna be an infinite money pool anyway. Like, why do you care about 5%? Like give up 20, give up 25, you know, give up 25 and become a tax-free company, right? Like maybe you don't pay taxes anymore. Give up 49% ownership, whatever, right? So I think it's a very odd place because of this, the infinity number on the future proceeds makes all the numbers like wonky and it makes everything a little bit more, and Sam is a good player. I mean, he's excellent. What does Elon do now? Does Elon offer a 5% stake? What does Dario do? Does Dario offer a 5% stake? How about a 10%? What about Mark Zuckerberg? Mark Zuckerberg is already public. He's not gonna give you a 5% stake, right? Like it's just such genius, you know, genius political, absolutely genius political. So I don't know, how do you feel, like, Sam has made this, this is obviously a political move.
Like how do you feel about the politics and the appearance of this right now? Well, I was just reading commentary this morning from Dean Ball, friend of the show, who officially starts at OpenAI on Monday. He did say he's not talked with Sam or anyone at OpenAI about this yet. So this is just, and this is also continued indication that he's going to hopefully be free to speak his mind, even once he takes the role. But he basically said he thinks it's a viable, maybe even a good idea if the ownership goes to households, and a terrible idea if it goes to the government. And that I think does resonate with me. I don't like the idea of having the federal government biased from one company to another, trying to pick winners. I think we still risk a lot of those dynamics, even if it's just broadly distributed to households. But it does seem like it could be a lot worse if it's something where the government is a direct stakeholder, and especially if the government then becomes dependent on this asset as sort of some sort of backstop or collateral for its borrowing and we just, you know, run it up even further, and then we're kind of now in this weird thing where the stock print, I mean, we already are kind of in a world where the stocks can't go down or we have problems, but you can imagine, you know, if that now starts even threatened.

We're still in a world where the government I think can be the bailout, you know, be the sort of equity investor of last resort or whatever exactly is needed. I don't think we want to play with the kind of third rail of like intermingling the government's balance sheet with these companies to the point where they can no longer do that because they sort of already you know are in that same path of contagion. So you know I'm probably a less sophisticated thinker on this by a significant margin than somebody like Dean but I do think the household distribution sounds a lot better. I do have big questions around that when it comes to how is this consistent with OpenAI's mission? How does it benefit all humanity putting it all in the hands of five percent of people? I think is a very rough look for a lot of the commitments that they made over time. Would they have to bundle this or would they want to bundle this with some sort of universal basic compute commitment as well. They have done that with medical. They did make their latest and greatest medical experience free and unlimited globally. So there are making some moves in that direction. But I think it's going to be a tough sell to the other 95% of humanity to say, only Americans get any of the upside, especially because all the same concerns that we've been talking about for American companies, probably, you know, as we talked about with Europe 2031, not too long ago, like all those worries and risk vectors probably apply even more to other companies around the world. I think I've always thought that the whole, you know, lab, labs being like, okay, we're gonna be, we're gonna have like benefit all of humanity thing has always like not worked for me because I'm like, it's not possible. And it's like, you would not be able to like kind of execute that in the US, I think. So I've always been like, okay, sure. Like that's what you say, you know, take your word for it, but you know, I don't really believe you, right? 
So I think that's the fact of the matter. And yeah, I don't really see that as like, you know, a feasible thing for them to do, right? It's not really that feasible. Let me, let me segue here. And we have with us our first guest of this morning. We have Kunle Olukotun. He's the Cadence Design Systems Professor of electrical engineering and computer science at Stanford University. But to the hardware engineering world, he's widely recognized as the father of the multi-core processor. In the late 1990s, his Stanford Hydra project proved that the future of computing was not about making a single processor run faster, but about putting multiple processors on a single chip to handle many tasks at once, a fundamental design that powers almost every server, laptop, and mobile device we use today. Today, he serves as co-founder and chief technologist at SambaNova Systems, a company that has completely rethought how artificial intelligence processors are built. Instead of relying on traditional graphics processing units, GPUs, which constantly shuffle data back and forth to memory, his team developed the reconfigurable dataflow unit. This is a chip designed to stream data continuously like an assembly line. This approach drastically cuts down the time and energy required to generate AI responses, specifically for complex multi-step autonomous AI agents. He joins us at a critical moment for the AI industry.

Over the past few months, SambaNova has launched its massive SN50 processor, announced a groundbreaking architecture partnership with Intel to split AI workloads across specialized chips, and secured a massive enterprise rollout with Vista Equity Partners. With the report circulating this week that SambaNova is targeting a staggering $10 billion valuation, Kunle is here to explain why the era of using one giant GPU for everything is coming to an end and how dataflow architecture is making highly advanced AI practical and profitable for the enterprise. Hi Kunle. Welcome to the show. Hi Trikash. Thank you for having me. Hi Nathan. Hi, great to meet you. Quick check. Did we say your name right? I think you cut out for some reason. Oh, okay. Did we get your name correct? Yeah, yeah, my name's just spelled another So you just pronounce every letter. And so you got it right. Amazing. For cash you take it, I'll refresh and come right back. OK. So, Kunle, this is, I think, a critical moment in the chip industry. Because everyone is complaining about GPU prices, especially our friend Jensen Huang and Nvidia's power and influence on the market, and the profit margins of Nvidia. And can you tell us a little bit about how SambaNova has this different paradigm that addresses kind of the issues with GPUs and brings down the total cost of ownership for enterprises? Yeah, so SambaNova was founded in 2017 and it was kind of out of ideas from Chris Ré, my co-founder, who was also a professor at Stanford, and a certified genius. And the idea was just, you know, what if you could bring software algorithm ideas together with hardware architecture ideas, and as you said in your introduction, I've been working in the hardware architecture space for a long time. And, you know, starting from a clean slate, how would you design an architecture that's optimized specifically for inference, right? So everybody thinks about GPUs as a kind of general purpose computing substrate, right?
But originally they were designed for graphics, and then they kind of made a foray into high performance computing, and for high performance computing, of course, you need a lot of matrix calculation capability. And at some point, people realized that, hey, you could use these things for executing machine learning models. And the core of machine learning, and of course AI, is matrix multiplication. When you want to train a model, clearly the core of the problem is how quickly you can do very, very large matrix multiplications. And so what happened is over time, GPUs put more and more of their silicon area into making these matrix multiplication capabilities better, using the tensor cores. Once you've trained a model, right, and you train a model once, you now need to use that model, of course. And that's the inference problem, and the inference problem is not really a compute problem. Because as the models get bigger, you now need to move the weights, and of course what we call the KV cache, you know, into the compute units, and that is essentially a data movement problem, right? And it's a data movement problem from the memory to the compute units. And it's a data movement problem from chip to chip, compute unit to compute unit, and of course you have to scale to multiple chips in order to handle the computational requirements, especially for very low latency, high speed inference.

And so our focus was how do you design an architecture that minimizes the overhead of communication, and makes sure that you can most efficiently use the core resource in the system, which is the memory. And the memory isn't just one thing, as you all know, it's a hierarchy of memories, right? And so the key thing is how do you orchestrate that hierarchy, how do you orchestrate the communication, such that you keep everything used as efficiently as possible. And if you do it right, you can get a 5 to 10x improvement over where GPUs are today. It strikes me that Nvidia's kind of solution around this has just been to increase the bandwidth, rather, through NVLink and advanced HBM integration, and software optimizations like TensorRT, vLLM. Yeah. So are they actually trying to brute force their way into this? Yeah. I mean, you really, of course, want to continue to get improvements, peak improvements on HBM bandwidth and chip-to-chip communication by using the latest technology. But then the key is how effectively do you use that bandwidth, how effectively do you use that communication? And do make sure that you don't waste it. And so whereas GPUs are often running at maybe 10 to 20% of the capabilities of the resources, the memory bandwidth and the communication resources, our goal in a SambaNova system is to push that to be 70 to 80% of the peak. And so the idea is, yeah, everybody wants more capabilities from the underlying resources, but the key is keeping those resources as effectively used as possible. And of course that gives you more benefit for the cost that you spend on providing higher memory bandwidth with the latest HBM and higher signaling frequencies and communication bandwidths between the chips using the latest variety of NVLink and stuff. For a lot of the chip users, they often focus on this number, the model flops utilization, MFU. What is your North Star as you guys are designing the chip?
Is that the strongest consideration, or are there some other metrics or numbers, things that you focus on? Now, the speed of light of doing inference is really, especially if you want high speed inference, right? So everybody knows that we're in the agentic AI era. And so what one wants is what we call premium tokens, right? So premium tokens are tokens that you can charge the most money for because they are premium, and why are they premium? Because they come from very large models, so they're accurate, but they also are provided to you at high speed so that they are useful from the point of view of an agentic environment that needs many turns through the models and potentially has multiple models that interact. And so the question is, how do you provide those premium tokens? And you provide those premium tokens by having very fast inference. And that is not going to be limited by floating point. It's going to be limited by moving the KV cache and the parameters from the HBM memory to the compute chip, right? And so we like to think of this as memory bandwidth utilization, right? So the speed of light is one, right? It is that you use your memory bandwidth completely for one thing. Or maybe two things: moving the KV cache and moving the parameters from the HBM to the compute unit, when you generate each token. Right?
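As a rough illustration of the memory bandwidth utilization idea described above, the following Python sketch estimates an upper bound on decode speed when every generated token requires streaming the model parameters and the KV cache out of HBM. The function name, the parameter names, and all of the numbers are illustrative assumptions, not figures from the discussion or from any vendor.

```python
def decode_tokens_per_second(param_bytes, kv_cache_bytes,
                             hbm_bandwidth_bytes_per_s, mbu=1.0):
    """Bandwidth-bound estimate of single-stream decode speed.

    Assumes each generated token requires reading all parameters and the
    whole KV cache from HBM exactly once. `mbu` is the fraction of peak
    memory bandwidth actually achieved (1.0 is the "speed of light").
    """
    bytes_per_token = param_bytes + kv_cache_bytes
    return mbu * hbm_bandwidth_bytes_per_s / bytes_per_token


# Illustrative assumptions only: a 70B-parameter model at 2 bytes per
# parameter, a 10 GB KV cache, and 3 TB/s of aggregate HBM bandwidth.
params = 70e9 * 2
kv = 10e9
bw = 3e12

low = decode_tokens_per_second(params, kv, bw, mbu=0.2)   # poorly utilized
high = decode_tokens_per_second(params, kv, bw, mbu=0.8)  # well utilized

print(f"tokens/s at 20% utilization: {low:.1f}")
print(f"tokens/s at 80% utilization: {high:.1f}")
```

Under this simplified model, decode speed scales linearly with achieved bandwidth utilization, which is why the metric is framed as a fraction of peak rather than as floating-point throughput.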

And so in fact, in many instances, you're underutilizing the flops on the GPU or the compute units, because in fact you are running a memory-bound problem. And in fact, most of the problems that one runs on GPUs, and especially inference, are in fact not compute bound, they are memory bound. Or memory and communication bound. Could we zoom out and just ask you to kind of taxonomize the whole chip space if you would? I mean, it's a big question, but I think people are familiar with things like Cerebras, which obviously has this like giant chip and has like a ton of memory on chip. We've seen a number of instances where people are, you know, burning the transformer architecture directly into the silicon with varying degrees, I think, of flexibility still remaining as they pursue that strategy. And I guess I'm curious as to how you see the kind of menu of big different strategies, the big different bets that people are making. And then also how you see the strengths and weaknesses of each. When you talk about KV cache, for example, like maybe we're in a world where we just keep pushing attention forever and, you know, it always kind of has to be dense, at least at some layers. And so we're kind of in O(n²), you know, till the singularity. And so that is like, you know, a persistent bottleneck. But maybe we're in a world where somebody figures out a state space model thing, and now we've moved to a more like linear paradigm and, you know, the KV cache isn't the constraint that it is today.
So there are many different directions you could go with that, but I'm just really interested in your kind of high level map of the different strategies people are pursuing and what you might believe that would tip you to think, oh, this really pays off in this world, this maybe pays off in this other world, if there are sort of cliffs, and I don't know what those would be, where if models hit a certain size, then a certain paradigm won't work anymore. Interested to hear your kind of thoughts on that as well. Yeah, that's a really interesting question. I think maybe you can think about it along three different axes, right? So one axis is sort of your flexibility versus specialization axis, right? So extreme flexibility might be something like a CPU or, to some extent, maybe a GPU, which is this instruction-driven execution engine. And so it can be pretty flexible, but of course you always pay overhead for executing instructions. And you pay overhead in terms of energy and in terms of time. And then on the other extreme of that axis would be something that would be very specialized for a very specific algorithm. And so if that algorithm changed in any way, then that piece of silicon would no longer be useful. And fixing your architecture to transformers and burning your weights into the design might be an extreme case of that. But I've learned never to bet against the innovation capabilities of software people and algorithm people. And so I've seen, even in the time I've been looking at ML and AI, that there's been this tremendous change in algorithms. And of course, now we're kind of fixated on Transformers, but Transformers even aren't just one thing, right? You've got various types of different Transformers. You mentioned state space techniques, and the fact that people are coming up with different ways of doing attention.

And so I would be very wary of sort of fixing any particular algorithm into the architecture, because then you can't innovate. So that's one axis. Another axis would be sort of what kind of memory you use, right? So you could think about it along the memory axis, right? So you've got, you know, very fast SRAM on chip, probably half a gigabyte. You've got hundreds of gigabytes, or maybe tens of gigabytes, of HBM off chip, right? And then you've got, you know, terabytes of either flash or DDR, right? So that's the memory axis, right? And then maybe one other axis you might think about would be how exactly do you manage the memory transfers and the communication? Do you do it again with instructions and make it very flexible? Or do you do it completely in hardware? And so the way I like to think about dataflow is that dataflow gets you into the sweet spot, especially reconfigurable dataflow, in that it does allow you to be flexible, but it allows you to be flexible on the time scale that makes sense for AI models, right? Which means that you're not changing things every cycle, right? You basically fix the model for the time during which you are doing the inference. And, you know, when another prompt comes along you may go to another model, but typically you're going to fix that model on the machine for a while. So you don't have to make it completely flexible, but you should make it so that you can change the model and optimize for the model. And so reconfigurability gives you that capability. Okay, so dataflow, so let's talk about the memory axis, right? And so you think about the size of the models. You need trillions of parameters, right? And so if you just limit yourself to the memory you can fit on the chip or on the wafer, then you're kind of limited. If you're just a single chip, maybe half a gigabyte; out of a wafer you can get maybe 40 or 50 gigabytes.
But if you need enough memory for a trillion-parameter model, that's a lot of on-chip memory, right? Now, so what you want really is to put the model in HBM, but make sure that you use the HBM as effectively as possible. And the way that you do that is by using hardware mechanisms for moving the data from the memory to the chip and between the compute chips, right? And so the idea is how can you be completely flexible but with very, very low overhead, almost no overhead, right? And so the problem with GPUs is they do use HBM so they can run large models, but they synchronize the data movement and the communication of data between chips all in software. And that adds overhead. And it means in particular that they have a lot of trouble overlapping computation and communication. And that is in fact the key. So what you want to do is you want to communicate, but you don't want to communicate by waiting until you need to communicate, and then you have to run instructions to move the data. What you want is to construct a pipeline in which the communication is just one component of the pipeline. And so the way to think about dataflow execution is that communication is happening all the time and it's just one of the pipeline stages, and communication is happening for the last piece of the computation for the model, you know, while the computation for this piece of the model is happening in some other stage in the pipeline, right?
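To make the overlap argument above concrete, here is a minimal Python sketch comparing a schedule that waits for each communication step to finish before computing the next piece against one where communication is its own pipeline stage. The two-stage model, the function names, and the timings are simplifying assumptions for illustration, not a description of any real chip's scheduler.

```python
def serial_time(stages):
    """Total time when each piece computes, then communicates, with no overlap."""
    return sum(compute + comm for compute, comm in stages)


def overlapped_time(stages):
    """Total time when communication runs as a separate pipeline stage.

    The compute unit processes pieces back to back. The link sends the
    result of piece i while the compute unit works on piece i+1. A piece's
    communication can start only when its compute is done and the link is free.
    """
    compute_done = 0.0
    comm_done = 0.0
    for compute, comm in stages:
        compute_done += compute
        comm_start = max(compute_done, comm_done)
        comm_done = comm_start + comm
    return comm_done


# Illustrative assumption: 8 pieces, each needing 4 time units of compute
# and 1 time unit of communication.
pieces = [(4.0, 1.0)] * 8
print("serial:    ", serial_time(pieces))
print("overlapped:", overlapped_time(pieces))
```

In this toy model, when compute dominates, nearly all of the communication time is hidden behind computation; when communication dominates, the link becomes the bottleneck stage instead.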

So it's a classic idea from computer architecture, pipelining and the use of a memory hierarchy to move the data when you need it to where you need it at the right time. So the nice thing about these AI models is that you do have a graph of computation. And the whole idea of dataflow is to take that graph of computation and map it onto the machine in a spatial way, such that you keep all the pieces of the model operating at the same time on different components of the computation that needs to be done. Can you contextualize a little bit? Let's say you have a Llama 3 model and you have a normal, like, Nvidia H100 or B200 chip versus, you know, an RDU chip. What is the difference in the amount of HBM required? Like is there a sense of, like, you would say, okay, this... So it's not really a capacity question, it's really a bandwidth question, right? So there are two ways that the GPU uses bandwidth in ways that are not optimized. One way is that they divide the decode algorithm, you know, the decode for a single token, you've got multiple steps of the decoder, right? So take one step of the decoder, right, and think about all the kernels that have to execute in order to execute that decode step. Well, the way that the GPU typically does it is they execute the decode algorithm one kernel at a time. Right? So there are some big kernels like flash attention, right, that have been optimized. But in general, there are multiple kernels that have to execute. And there are two overheads that happen. One is you have to move data from the GPU, from the registers, from one kernel to the HBM, and then the next kernel has to go fetch that data back into the GPU. That's wasted HBM bandwidth. The other aspect is you spend time launching that kernel and synchronizing between the two. That is time that the HBM is not actively being used. So you have both wasted bandwidth when you shouldn't waste it, and you have time that you're not fully utilizing the HBM.
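The two overheads just described, intermediate data round-tripping through HBM and time lost to kernel launches, can be put into a simple cost model. The Python sketch below is a hedged illustration under stated assumptions: the function names are hypothetical, the byte counts and launch overhead are invented, and real systems have many effects this model ignores.

```python
def hbm_traffic_unfused(weight_bytes, intermediate_bytes):
    """HBM bytes moved when a decode step runs one kernel at a time.

    Each kernel reads its own weights. Each intermediate result between
    two consecutive kernels is written to HBM and then read back, so it
    crosses the HBM boundary twice.
    """
    return sum(weight_bytes) + 2 * sum(intermediate_bytes)


def hbm_traffic_fused(weight_bytes):
    """HBM bytes moved when the whole step is one kernel.

    Intermediates are assumed to stay on chip, so only weights are read.
    """
    return sum(weight_bytes)


def step_time(traffic_bytes, bandwidth_bytes_per_s, n_launches, launch_overhead_s):
    """Bandwidth-bound time for the step plus per-kernel launch overhead."""
    return traffic_bytes / bandwidth_bytes_per_s + n_launches * launch_overhead_s


# Illustrative assumptions: three kernels, two intermediates between them.
weights = [100_000_000, 200_000_000, 100_000_000]   # bytes read per kernel
intermediates = [10_000_000, 10_000_000]            # bytes between kernels

unfused = hbm_traffic_unfused(weights, intermediates)
fused = hbm_traffic_fused(weights)

t_unfused = step_time(unfused, 1e12, n_launches=len(weights), launch_overhead_s=5e-6)
t_fused = step_time(fused, 1e12, n_launches=1, launch_overhead_s=5e-6)

print("unfused HBM bytes:", unfused, "time (s):", t_unfused)
print("fused HBM bytes:  ", fused, "time (s):", t_fused)
```

The model shows both effects separately: fusion removes the doubled intermediate traffic, and it also reduces the number of launches during which HBM sits idle.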
So the way that things work on an RDU in a dataflow map is essentially you take the decoder and you make that a single kernel. And then you go even further and you use a technique that we've developed called kernel looping, where, because you've got a single kernel, and now, for instance, if you think about Llama 3 70B, you have to run that decoder 32 times. Well, you keep that single kernel decoder on the array of chips at the same time. And then you just keep looping, right? And so the net result is you keep the HBM completely occupied and you don't ever send any intermediate data between the kernels across the GPU or the RDU HBM boundary, right? So you have both a more efficient use of the HBM bandwidth and you have a more complete use of the bandwidth. But we're not done yet, because the key innovation that I kind of alluded to earlier is that, you know, because you're running across multiple chips and you're using what we call tensor-level parallelism, right, at some point you now need to gather all those results together, right, in an all-reduce. That's communication, right? You don't want to have that communication be a thing that stalls the pipeline. So what we're able to do is we're able to communicate from one RDU chip's SRAM to another RDU chip's SRAM without going through HBM. This is called, you know, we terminate the communication inside the SRAM, right?

So we don't use HBM bandwidth, and more importantly, it means that we could just treat the communication as another pipeline stage that we overlap with all the other kernel components of the decode algorithm, right? And so we get this more effective use of the HBM bandwidth, we keep the HBM running, we keep the HBM utilized all the time, and we go back to our metric that we talked about, memory bandwidth utilization, right? This is how we push it as close as possible to one, right? Because we make sure that we only move the data that we absolutely have to move from HBM, the KV cache and the parameters of the model. And we make sure that that interface is used as close to 100% of the time as possible. So those are kind of the key ideas. And back to this question, why can we do this extreme fusion into a single kernel? It's because we have more SRAM on the chip. So you can say, you put more SRAM, and then you can say, well, I'll put everything in the SRAM, both the intermediate data between the kernels, and also I will put the KV cache and the parameters. But then if you only use SRAM, then you get into a very expensive system. Right? And so the key idea then is, let's build a system that is scalable. So especially with our latest version, the SN50, you could scale it all the way to 32,000 chips if necessary. And so that's in scale-out; in scale-up, we can go to hundreds of chips. And so you can get the ability to run these large models very cost effectively, but you also make sure that you can get this very high speed decode capability by using the dataflow ideas, so that you can effectively use the tensor parallelism. So one of the limits of GPUs is because they don't effectively overlap communication and computation, they have a hard time using tensor parallelism beyond four or eight. And we can go to much wider levels, which means that we can get higher speed token generation. This has been fantastic.
So thank you. You have a question, Nathan. So please, you have to think about where the different architectures sit. I would say that, you know, GPUs are sort of, you know, in terms of their flexibility, highly flexible, they're instruction driven, they're limited by the fact that they do have this instruction overhead, but they do have the advantage of the HBM capacity to be able to run very large models, right? If you're thinking about Groq and Cerebras, they are SRAM based and so they can't very cost-effectively handle very large models. And in fact, you know, if you get to trillion-parameter size, then it's unclear how you can gang enough systems together to handle models of that size. And then, of course, they might claim that they're dataflow, but I would say that they're not quite dataflow because they do involve instruction overhead in synchronization and orchestration, right? And the key to dataflow is to hide all of the overheads of the communication by putting that synchronization and orchestration in the hardware. And so that's what we do on the RDU design from SambaNova. We're at the time that we had booked. Do you have time for a couple more questions? Yeah, sure. Could you talk maybe a little bit then about what bullets you have to bite? You kind of alluded to at least one, not targeting the training market side.

Curious if there's any other big bullets that you're biting with this architecture. And then I'm also curious about how this kind of translates to Pareto frontiers for your customers. Like we see now the emergence of fast mode, right? And there's clearly this trade-off between speed of return and batch processing, batch size, which obviously translates to cost. So could you maybe characterize how the Pareto curve that you offer to customers compares to other Pareto frontiers in terms of volume versus speed type trade-offs? And then translate for me, what kind of customer is like the sweet spot customer for you? Is it somebody who needs more of one, more of the other, versus the kind of hardware that we're more familiar with? Can you say that last bit just again? Yeah, so with the idea in mind that there are these trade-offs, and probably the trade-offs with a SambaNova system are different from other available systems one could buy, what does that translate to in terms of the most natural customers for you? Yeah, yeah, that's a really good question. So everybody's seen the Pareto curve that Jensen put up, the trade-off between throughput on the Y-axis and speed per user on the X-axis. And what we see is that, in the previous generation, like SN40, we could get the same throughput as a GPU at 3 to 5x higher speed, right? On the SN50, we don't have quite as much compute, so as you go to very, very large batch sizes, the GPUs do very well. But once you get over 250, into the 500 tokens per second per user, this is where we can, again, achieve those speeds at throughputs which are 3 to 5x better than what the GPU can provide. And so if you're thinking about a frontier lab or a neocloud that wants to provide premium tokens, especially in the neocloud on open source models, this is a sweet spot, right?
So if you think about a MiniMax coding model, for example, right, you can get speeds five to 10x faster on MiniMax 2.7 compared to what you can get from other providers. And that's because it's being powered by our SN40 capability, which is our previous generation. And the SN50 provides even more capabilities because it has the ability to scale to a much larger number of chips using the techniques that I just described. So, to get back to this premium token idea: that is the ideal customer, somebody who wants to provide premium tokens to their users. Another good use of our system is something that we announced a few weeks ago. And that was the idea that you can gang up an RDU system with a GPU system in a disaggregated manner. So the idea is that because GPUs have more compute capability, they're better at prefill, and RDUs are much, much better at decode, which is the ultimate limit. And so you could do the prefill on a GPU, or you could use the existing systems that you have in the data center. And then you could bring in RDUs at some ratio, one to one, or two GPUs to one RDU, to enable you to get very fast decode capability. So one of the questions I had for you is, we had a guest, Bing Zoo, on the show before, and they were doing PTX instruction kind of optimization models. I might be muted. They were doing PTX instruction models. Can you hear me? Are you hearing me? Yes, we can hear you fine. If you switch from, like, earpods to the computer mic and then back, it will refresh. We can hear you.
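The disaggregated setup described above (prefill on one kind of chip, decode on another) can be sketched with a toy latency model. This is a minimal illustration, not SambaNova's or Nvidia's actual numbers or software: the `Device` class, both device profiles, and every rate below are hypothetical, and the model ignores the cost of moving the KV cache between the two systems.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical accelerator profile; the rates are made up for illustration."""
    name: str
    prefill_tokens_per_s: float  # prompt-processing rate (compute-bound phase)
    decode_tokens_per_s: float   # per-user generation rate (bandwidth-bound phase)


def request_latency(prompt_tokens: int, output_tokens: int,
                    prefill_dev: Device, decode_dev: Device) -> float:
    """Seconds to serve one request when prefill and decode may run on different devices."""
    prefill_s = prompt_tokens / prefill_dev.prefill_tokens_per_s
    decode_s = output_tokens / decode_dev.decode_tokens_per_s
    return prefill_s + decode_s


# Assumed profiles: the "GPU" is faster at prefill, the "RDU" is faster at decode.
gpu = Device("gpu (hypothetical)", prefill_tokens_per_s=20_000.0, decode_tokens_per_s=100.0)
rdu = Device("rdu (hypothetical)", prefill_tokens_per_s=5_000.0, decode_tokens_per_s=500.0)

gpu_only = request_latency(8_000, 1_000, gpu, gpu)
rdu_only = request_latency(8_000, 1_000, rdu, rdu)
disaggregated = request_latency(8_000, 1_000, gpu, rdu)

print(f"gpu only:      {gpu_only:.1f} s")
print(f"rdu only:      {rdu_only:.1f} s")
print(f"disaggregated: {disaggregated:.1f} s")
```

Under these assumed rates the split configuration wins because each phase runs on the device that is better at it. A real deployment would also have to account for the transfer between the two systems and for the GPU-to-RDU ratio mentioned in the conversation.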

I'm not hearing. So maybe just refresh, just refresh the page. Oh, he can't, he can't hear us. I see. Ah, dropped off. Never a dull moment in the live streaming game. Yeah, doing it live always has more risk. All right, we'll see if we can, or we'll just message, I think. So, sudden drop-off. Let's see. I don't know if we do have Brian here as well in the background. Brian, if you can hear us, maybe send a ping for just a quick page refresh. I think that would be all we need. I sent that via email too. No, no, no. So, room for improvement here. Shouldn't the banners, whatever API is filling in the headlines on the screen, send a signal to refresh the page? Let's see if that works, prompt injecting our headline right here. I could also come back. He might hear me if I come back. That's another possibility. All right, let me refresh as well, and then maybe he'll also get the hint if I refresh. Yeah. All right, I'm back. Hear me now? He saw the refresh, I think. So maybe once he comes back, we'll do a quick round-up and we'll let him go. We stretched his time a little bit. So the whole architecture question is fascinating because it indicates that there are ways around the Nvidia monopoly, and people are working hard on them. So one does hope that we can... there we go. I'm not sure what happened there. I'm sorry. Was it our fault? Yeah. Sometimes it's our fault. Sometimes it's LiveKit's fault. Like, I get emails from the dev team like, oh, we had an outage. I'm like, you should have told us. Was that before? Yeah, OK. We're back. We're back. We'll just kind of round up: what have you seen in terms of using AI within the design process, as in within your firm, in order to build software, build better tools to get around this whole CUDA moat idea? And where do you see things?
And the second question: what does the roadmap look like on a three-to-five-year mark for the firm? Yeah, well, I mean, I think this is something that I'm interested in both within SambaNova and, you know, it's a big component of my research at Stanford: how do you use AI tools to speed up the design process, right? And so of course, like every other firm, we are actively using AI, both open source AI models and frontier lab models in conjunction, because of course you want to control token cost, right? And so we've got models that run very efficiently on our machines, and so we can use them for tasks that the open source models can handle. Then for the very challenging tasks, of course, we'll go to the frontier models in order to take advantage of those capabilities. So yeah, that's getting used throughout the firm for all sorts of software development. We never, of course, use CUDA in any of our software stack, and we never thought CUDA was the right way to think about designing these compilers. And so we started with the PyTorch representation of the dataflow in the AI model. And our whole goal was to take that representation and map it onto our machine in a dataflow way. And so that completely sidesteps CUDA. And then from the point of view of developers, what we wanted them to do is to be able to describe the way that they wanted to optimize their model at the PyTorch level.

And so the way to think about parallelism in these models is that it happens in multiple dimensions. And you should be able to describe those dimensions, whether it be tensor parallel, data parallel, or pipeline parallel. And you can describe these kinds of optimizations at the PyTorch level, and then our compiler can take those directives and do the appropriate mapping. And so that removes the need for the user to write low-level code like CUDA, but still achieves the performance levels and the mapping control that they really need to get high performance. So your other question was, where do I see the company going in three to five years? Yeah. Yeah. So we will continue to push on. So, in seven years, we've taped out five or six chips, right? And so we're on this cadence of roughly 18 months; we tape out a new chip every year to 18 months. And so we continue to push on our ability to provide premium token inference, right, which means focusing on more bandwidth. Traditionally we've been one generation behind where Nvidia has been in terms of technology, definitely HBM technology, right? So our current designs are using HBM2E, and we're able to compete with HBM3 from Nvidia. But we want to continue to make sure that we use higher-performance, more recent HBM. We want to take advantage of the new ideas around packaging, 3D packaging and chiplets. We will think very deeply about how we design the next generation of architecture to take advantage both of the memory bandwidth capabilities of new generations of memory and of the ability to scale both out and up, right? And so we want to make sure that we can get very, very fast inference on very, very large models. And the key to that will be exploiting parallelism, especially the kind of parallelism that requires that we have this communication between multiple RDUs.
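The point about parallelism happening in multiple dimensions (tensor, pipeline, data) can be made concrete with a small sketch. This is not SambaNova's compiler interface or any PyTorch API; `ParallelPlan` and `enumerate_plans` are hypothetical names, and the divisibility rules (tensor degree divides the attention heads, pipeline degree divides the layers) are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelPlan:
    """Hypothetical description of how a model is split across chips."""
    tensor: int    # chips sharing each layer's matrix multiplies
    pipeline: int  # consecutive groups of layers placed on different chips
    data: int      # independent replicas serving different requests

    @property
    def chips(self) -> int:
        # The three degrees multiply to the total number of chips used.
        return self.tensor * self.pipeline * self.data


def enumerate_plans(num_chips: int, num_layers: int, num_heads: int) -> List[ParallelPlan]:
    """List every (tensor, pipeline, data) split that uses exactly num_chips chips."""
    plans = []
    for t in range(1, num_chips + 1):
        # Assumption: tensor degree must divide both the chip count and the head count.
        if num_chips % t or num_heads % t:
            continue
        rest = num_chips // t
        for p in range(1, rest + 1):
            # Assumption: pipeline degree must divide the remaining chips and the layer count.
            if rest % p or num_layers % p:
                continue
            plans.append(ParallelPlan(tensor=t, pipeline=p, data=rest // p))
    return plans


plans = enumerate_plans(num_chips=8, num_layers=4, num_heads=8)
for plan in plans:
    print(plan)
```

A compiler that accepts directives like these would still have to pick among the valid plans based on communication cost, which is the challenge the speaker goes on to describe.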
And so figuring out how to make sure that we can do that without the communication between chips getting in the way is going to be the key challenge. So we've got all kinds of interesting ideas about how you do that. So we want to be the highest-performance and the lowest-cost inference provider in terms of the capabilities that we provide from the RDU technology, with the SN50, which we have announced, and the SN60, which will be coming down the pike soon. And of course, future generations of our product will also be focusing on the same thing. We want to make sure that we maintain that flexibility, because I see the landscape of AI models continuing to change. There will be new varieties of transformers. There will be state-space models. There will be ways that we fundamentally have to transform the algorithms in order to make them more linear, because we can't continue to go in this quadratic direction if we want to really enable everybody to use AI in a cost-effective manner. This has been fantastic. And I really appreciate you staying a little long with us. If I can dare to ask just one more question. And I'm sorry, I can't hear a word. Can't hear me again. Oh my God. Note to our agents as they go through the transcript: we need a refresh button in our producer view to just refresh somebody's page. Thank you for joining us. Sorry for the technical difficulty. But this has been great. Well, that was a little rough.

The last question I wanted to ask was just around, let's say they achieve all their goals in terms of design: what will be the barriers to scale-up? And maybe we can send an email and get a little extra thought on this and share it back. But in previous rounds of going down the what-is-Nvidia's-moat rabbit hole, one candidate answer has been, at times, conviction, balance sheet, and lock-up of the supply chain: that they are just willing and able to buy out capacity from key suppliers for years to come. And I've heard that for a while, but we've seen that there's at least enough slack in the market for $2 billion companies to pop up. And it seems like, as you mentioned in the intro, SambaNova is trying to be the next deca-unicorn in the space. So what's happening there? Have the suppliers sort of said, we're gonna cap what Nvidia can buy, so we don't lock ourselves into a monopsony future? Are there clauses that allow for flexibility? How is it happening that somebody can come into this supposedly, and I think not just supposedly, super input-constrained, super complicated supply chain, and buy their way into being able to ship finished product on a time scale that matters? I don't have a great sense for that. So I kind of have a sense for what's happening right now. So number one, SambaNova's chairman is Lip-Bu Tan at Intel. And so they're definitely gonna use the Intel Foundry. And the Intel Foundry doesn't have enough customers, right? Like, Lip-Bu has been signing with Apple and signing with other people and trying to get them to use Intel packaging and Intel chip manufacturing fabs. So I think, clearly, Intel has also put an investment in them, I think. So clearly they're gonna tape out and they're gonna produce at Intel. I think that's clear.
On the other question, for Nvidia: number one, I think Jensen is always under antitrust eyes now, and he's got all of these very powerful hyperscalers around him being very annoyed with his pricing. So he's got to be aware of this, and he is making deals. He signed a deal with Corning, the glass manufacturer, for optics this week. He is co-building a factory with them. So I think he's putting in like 40 or 50%, and then they're putting in the expertise, and basically he'll buy the product. So he's building out factories. And it's a little bit hard to say that's an antitrust issue, because the factory wouldn't exist if he hadn't put the money in. It's not him buying out capacity that already exists. It's creating capacity that didn't exist in the market. So I think that's hard for an antitrust case. And I think they're aware. And so they're targeting these things where they can start to own pieces of the supply chain directly. That's one. For the smaller chip startups, what ends up happening is that Broadcom or some of the larger guys have allocated capacity for wafers at TSMC. So what ends up happening is the smaller guys end up going to Broadcom. And Broadcom takes a big cut on that, and then they basically manufacture through Broadcom's relationship at TSMC. Qualcomm also has some capacity allocated, and so Qualcomm also can swing a little bit.

So these guys have some allocated swing capacity in the industry that the smaller guys can use if they pay for it, right? So this is what has happened roughly in the past 12 months or so. And it's not really that you can't get capacity; it's that you're going to have to go through people who have the capacity already, and you're going to have to do deals with them in order to make it work. So, nice. Great answer. I finally kind of understood what the new paradigm is. So, correct me if I'm wrong, but basically what he said is that a normal GPU acts a little bit like a hub-and-spoke system. For every step of the decode process, it goes back to the hub and then goes back to the spoke for processing and comes back. Whereas these new chips, like the SambaNova, are acting like a conveyor belt, where it goes to the first one and the second one and the third one and the fourth one, which is what it's supposed to do anyway, because those are the steps that it's supposed to follow. And that's already defined. And so basically that's what they're doing. And this is the difference between the GPU paradigm and this paradigm. Is that roughly what was going on? Yeah, I think that aligns with my understanding. I mean, in general in AI, when people make comparisons between their technique and some baseline, I'm always a little wary, depending on how well I understand that baseline versus what's actually running in optimized form in production. I'm always like, okay, yeah, that sounds right, but are they actually doing it that way? Or do they have some other workaround or optimization that narrows that gap somewhat relative to the default or naive path? I've seen this with transformers a million times, where people will be like, oh, my new architecture outperforms the transformer, but it's like, well, what transformer, right?
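The hub-and-spoke versus conveyor-belt picture above can be turned into a toy counting exercise. This is only an illustration of the host's analogy, not a model of any real GPU or RDU: the function names are hypothetical, each "stage" is an arbitrary function, and a "trip" stands in for a read from or write to shared memory.

```python
from typing import Callable, List, Tuple

Stage = Callable[[float], float]


def run_hub_and_spoke(x: float, stages: List[Stage]) -> Tuple[float, int]:
    """Each stage reads its input from the hub and writes its result back."""
    trips = 0
    memory = x
    for stage in stages:
        value = memory          # spoke fetches its input from the hub
        trips += 1
        memory = stage(value)   # spoke writes its result back to the hub
        trips += 1
    return memory, trips


def run_conveyor(x: float, stages: List[Stage]) -> Tuple[float, int]:
    """Each stage hands its result directly to the next stage."""
    trips = 1                   # one read of the initial input
    value = x
    for stage in stages:
        value = stage(value)    # result flows straight to the next stage
    trips += 1                  # one write of the final output
    return value, trips


stages: List[Stage] = [lambda v: v * 2, lambda v: v + 3, lambda v: v * v]

hub_result, hub_trips = run_hub_and_spoke(1.0, stages)
belt_result, belt_trips = run_conveyor(1.0, stages)

print(hub_result, hub_trips)
print(belt_result, belt_trips)
```

Both versions compute the same answer; only the number of trips differs, and it grows with the number of stages in the hub-and-spoke case. As the reply in the conversation cautions, production GPU stacks use optimizations such as kernel fusion that narrow this gap, so the naive count overstates the real difference.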
And if you're using the 2017 initial transformer as reference, you're obviously way underselling what it is today in fully optimized form. So I'm always a little wary of that kind of thing at all levels of the stack. But I think that what you're saying there adds up. And then there was also this other big concept of blocking versus non-blocking. And I don't understand that as deeply as I'd like to either, especially when it comes to how hard it is for Nvidia to get out of that problem. Is there a reason that that is sacred in their architecture, or is there an opportunity for them to relax an assumption and get to where SambaNova is on that dimension with the next generation? I would be speculating, definitely out of my wheelhouse and out of my depth, to say much more than that. But yeah, what bullets would Nvidia have to bite to not have these blocking issues? Unclear. So let me give you an industry-structure kind of view, right? Nvidia chips are good for the frontier because the GPUs are eminently reprogrammable, and you can run multiple architectures on them and you can develop these new architectures on them. So what ends up happening is that as the frontier advances quickly, the GPUs get utilized. If the frontier advances slowly, people start to optimize on the inference quickly. So you have, again, these two waves: on the one side, the frontier advancing.

When the frontier is advancing fast, people need to be at the frontier and there's not enough time to optimize the inference. But as the frontier advances, six months or nine months or 12 months behind that frontier you have the inference optimization wave, which is following up. So the frontier chips first get used for the frontier and then they kind of fall into inference. So for inference, the GPUs are imperfect, but they're also depreciated, because they were used for the frontier and now they can be used for inference. So that's why the H100s are still being run for inference, for example, because you no longer develop the leading-edge frontier models on those chips, but they're good enough for inference. And so that's what ends up happening. You have these two waves. And it's kind of similar to the two waves that we see in the model labs versus open source. Again, you have the leading edge, which is creating the new stuff. And again, you have this optimization and distillation and consolidation wave, which is following up and which is reducing cost. So you have these multiple waves proceeding through the industry at multiple layers simultaneously. Does that kind of make sense? Yeah, I mean, he did specifically note too that they've historically been one generation behind Nvidia, and so that definitely reinforces that notion. We also heard a similar thing from SpaceX AI, right?
They were willing to rent Colossus 1 to Anthropic, as they didn't have a lot of inference load themselves, but also the idea was that it was sort of a mixed-chip environment, and it was becoming problematic for training. And so with Colossus 2, which is a more homogeneous environment and just a better place to train, they didn't really need the first one as badly anymore. They didn't feel like they were sacrificing their frontier work to rent out what had now become more of an inference capacity facility. So yeah, I'm trying to think of what the complications of that would be or if there's a counter to it, but all the data points that are coming to mind so far support the general paradigm that you're laying out there. I think the chip business in general is just going to get so much bigger. I think there's enough food for everyone. The entire business is going to get so much bigger because you're going to get all of these different niches where all of these different players can thrive. So it's really a swing. It's like a swing back to the mid-90s, the chip era again. Yeah, all computers is food for these AIs. That was the title and a song lyric from an episode of the podcast I did with Jeffrey Ladish from Palisade. And I've got one coming up with Ramin, the CEO of Liquid AI, too. And I'm just thinking a little more about how even small bits of compute are starting to become relevant. The cell phone market is $500 billion a year. It's been at the scale of the data center build-out for quite a while now. The data center build-out has now surpassed it in terms of annual scale, I think, but that hasn't been for long. So I do think we're gonna see, yeah, it always comes back to everything, everywhere, all at once. I think this is also, as I look back and think about the predictions that I got wrong, one big one that I sort of got wrong: that inference would be really cheap.
Although, in a way, I'm still right.

It depends on how you want to look at it. It's definitely not as cheap as I thought. It doesn't feel like it's super abundant; it's certainly not too cheap to meter, right? We have not hit that level. And I don't know if I was expecting too cheap to meter, but I was definitely expecting closer than where we are right now. The way in which I was right is that Fable is still cheaper than the original GPT-3. GPT-3 was $60 per million input tokens, $60 per million output tokens, no caching, twice as much if you did fine-tuning, and obviously dramatically less capable in all regards. I think the thing that I underestimated was probably just how many more tokens frontier models would use. We were putting in a couple thousand tokens at a time to GPT-3 and we were like, oh my God, this is 15 cents to make this one API call, and that felt like a crazy thing when in general API calls, in any other kind of API, were like a tenth of a cent or less. But then we saw dramatic cost reductions with 3.5 Turbo, and all those kinds of models dropped those costs by almost two orders of magnitude. It was like a 98% cost reduction with a performance improvement. And at that point there wasn't so much token hunger. Now we see this incredible token hunger. I still think that there's an interesting question, or, I mean, another thing that's really changed is we've moved to a world where we're now in the market-clearing-price era, as opposed to the labs trying to create the market out of nowhere and get the whole thing going. So now we're at a place where you could run things for a lot less, but it's just not available, because the market for the chips continues to push prices up, as we've discussed many times, even on old chips. So I think I underestimated token hunger, and I underestimated demand just bidding up compute regardless of all other progress.
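The pricing arithmetic in this passage can be checked with a few lines. The $60 per million tokens price and the 98% reduction are the figures quoted in the conversation; the 2,500-token call size is an assumption chosen to match "a couple thousand" tokens and the quoted 15 cents, and the function name is hypothetical.

```python
def call_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one API call at a flat per-token price."""
    return tokens * usd_per_million_tokens / 1_000_000


GPT3_USD_PER_MILLION = 60.0   # price quoted in the conversation
ASSUMED_CALL_TOKENS = 2_500   # assumption: "a couple thousand" tokens per call

original_call_cost = call_cost_usd(ASSUMED_CALL_TOKENS, GPT3_USD_PER_MILLION)

# A 98% reduction, as quoted, leaves 2% of the original price.
reduced_usd_per_million = GPT3_USD_PER_MILLION * (1 - 0.98)
reduced_call_cost = call_cost_usd(ASSUMED_CALL_TOKENS, reduced_usd_per_million)

print(f"original call: ${original_call_cost:.2f}")
print(f"price after 98% cut: ${reduced_usd_per_million:.2f} per million tokens")
print(f"same call after cut: ${reduced_call_cost:.4f}")
```

At the reduced price the same call costs about a third of a cent, which is in the range the speaker describes as normal for other kinds of API calls. The "token hunger" point is that per-call token counts have since grown far faster than per-token prices have fallen.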
Even still, cost per million is down, which is amazing, but we're in a metered world, that's for sure. I suspect the metered world is temporary, because I think what ends up happening is people have budgets and the firms have to capture wallet share, basically. And so the whole sticker shock thing, I think, doesn't work. They either have to be paid for performance, in the sense of, okay, you get a new cancer drug, you get paid for that, right? That is something that is a measurable return on investment. Or you're a cost: you're not a revenue item, you're a cost item. So then it becomes, okay, this is how much I can afford. I will buy all of the most intelligent stuff that I can get within this budget, right? And I think that that is what is happening right now. It's this reconfiguring of the market. We had this first expansion, we had the blow-up earlier this year, and now this is reconfiguring to, okay, this is my wallet share. What can I get with this wallet share? What can I do with it, right? The one thing that I think is the X-factor here, which I don't know how it's gonna work out, is whether or not these models become so much more intelligent this year that you again get this expansion of budgets, like, we gotta have it. Everyone has to spend tokens. This is the most important thing. Which is basically what we had earlier this year.

Do we get another capability step up, where that kind of thing happens, you think? Yeah, I'd bet on one more this year. The first hinge point was probably, whatever people date it to, Opus 4.5 to 4.7, somewhere in there. Yeah. The second one is Fable. Fable was done training early this year. It took us a while to get it. We obviously have had drama around it, but it's hard to imagine that there's not a significantly better step coming in the second half of the year. And maybe two. You know, if you listen to the people closest to it, and it has always served me well to take their statements, both publicly, kind of reading between the lines, and any private statements where I'm not told secrets but I'm given general direction as to what they are expecting, it's always served me well to take that at pretty much face value. And what I think the signal is from the folks at the frontier companies is that things are speeding up, they're not slowing down. And we're entering the era of recursive self-improvement. There's probably still a lot of low-hanging fruit at all kinds of different levels of the stack. And so, yeah, we probably should expect at least one more significant turn within the calendar year. I'd be very surprised otherwise. I mean, that would be the first time in many orders of magnitude of inputs that we wouldn't get another significant step on that time scale, if we were to not see one the rest of this year. So I would have to imagine that it really is coming. And certainly they're building out with conviction, and they're doing everything they can to secure capacity with the conviction that something like that is coming and that the exponential of demand for their products will continue, or super-exponential potentially, even. Yeah, they do have to start providing significant value, that's the thing, right? Like, the first step was coding. Okay, yes, now everyone has software coders on standby, like infinite software coders.
Okay, great. But take a Nike, for example. Okay, so now your team is using tokens. What else can you do for me, right? How else can you increase my revenue? Cutting cost is not that significant. American firms rarely focus on cutting costs that dramatically, except if it's a downturn, and in a downturn they have a stash of employees that they're prepared to let go, who have kind of already been pre-identified, and they're like, all right, let's go. So I think the key thing that makes people buy something is revenue increase. How do these models increase Nike's revenue, right? They have to do something for the marketing, or make people buy more shoes, right? How does that work? That I don't know. I can see it for coding, but for Nike, how do you get Nike to allocate its IT budget almost exclusively to you right now? Yeah, I mean, how elastic is demand for shoes? I'm not super sure. I personally am making progress on my 2026 goal of spending more time outside and getting more exercise. I think there probably is an extra pair of shoes in 2026 for me as a result of my increased pounding of the pavement as I'm liberated from my desk via AI. So, you know, that's one happy story. My guess is that doesn't move aggregate shoe demand in the immediate term.

So I would say it probably is tough for a business like that immediately. But then I think about an episode that just came out yesterday with Thomas von Tschammer, who runs the US division of this company called Neural Concept. And they may provide some inspiration for how you could imagine a company like Nike taking their business to the next level. Basically, what they do traditionally is provide models that validate designs, engineering designs, like CAD designs, on a variety of dimensions, one big one being aerodynamics. And they actually work with Formula One teams. I didn't realize this, as I've never been a Formula One head, but Formula One actually has rules around how much compute teams can use from week to week to optimize their designs for aerodynamics and whatever else. You're literally limited as an F1 team. And if I understood correctly, there's also a handicapping where the best teams actually have lower compute limits than teams that are not doing as well. So that's fascinating unto itself. They have Formula One customers, and they also have major automotive manufacturers, and their break-into-the-market point was these models that allow you to run validation of things like aerodynamics or heat dissipation or whatever much faster than the physics-based simulations would allow, so you could more deeply explore the design space and get your designs more optimized before you ever have to make a prototype, etc. Now, they are also bringing an agent to market at the same time.
So now you can have a Neural Concept agent that actually works in your CAD platform in a similar way to your human engineers and can call out to these models and say, hey, I just made a tweak, validate me with the aerodynamic model or whatever, give me that feedback. And they can run this tick-tock loop of agentic exploration, validation, optimization, which is obviously the perfect formula for reinforcement learning. To the degree that they want to take a hand in training models, it's definitely going to allow their agents to get quite good at climbing these hills. I actually think Grok is a good candidate to get really good at that kind of thing. I always thought, Elon, you should just focus on that. They've gotten so distracted by various kinds of porn and whatever. Yeah, I mean, they have so much internal data which no one else has, on airflow and material science and all of this stuff. And he ends up focusing on LLMs and Grok Imagine and whatever. Like, what are you doing? You know, Google does it. Demis would love to have all the aerodynamic data and all the data on that stuff, right? He'd love to have it. And they have a team that can do experiments too. So you can set up what Periodic Labs has done: you can set up a robotic experimentation lab and you can get a hit on material science and all of these things. And he ends up back in LLMs. And so that's a little bit disappointing, I think. So yeah, I wouldn't count him out just yet, right? I mean, never bet against Elon has always served pretty well too. I mean, to land the plane on the Nike analogy, I guess, if you can get that loop closed, right?

and you can get this sort of agentic model making these design tweaks, getting validated, getting really good at climbing that hill, that's a pattern that basically works for everything. I would bet that Nike clearly has some way to forecast demand for products. Does that extend to the point where they have a sort of black-box ML model that takes in just a design of a shoe and predicts how popular it will be? Maybe, maybe not. It would be interesting to see if we can get an answer to that question. If they do have that, then you can imagine them having an agent that actually does the shoe design and closes that loop again. And so how can they drive more revenue? I mean, one big answer, which they're already quite good at, but you can imagine another order of magnitude, would just be infinite designs, right? Even more extreme long tail and, yeah, personalization, even. And if that becomes a token-budgeted process as opposed to a human-shoe-designer process, then, if we're all rich and enjoying abundance in general, maybe we buy two times as many shoes along with a bigger basket of everything else. But maybe they can only realize that if they can really scale that design and validation loop to the point where I can get something that nobody's ever seen before, and you can too. Which is a reinforcement learning loop at the end of the day. It's a reinforcement learning loop with human input for taste. Yeah. So I mean, I think you're right to say Nike seems like one that is relatively not as likely to be able to see massive growth as a result of AI as some other things might. But still, we can tell a story. You know, basketball shoes were not a big thing in the 1970s, right? It was really post-Michael Jordan, in the 1980s, when Michael Jordan signed with Nike, that it kind of took off.
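The propose, validate, optimize loop discussed in this stretch of the conversation can be sketched as follows. This is a deliberately simple greedy hill climb standing in for the agent, not reinforcement learning proper and not Neural Concept's or Nike's system. The `surrogate_score` function is a hypothetical stand-in for a fast learned validation model (aerodynamics, demand forecast, and so on), and the three-number "design" and its target are made up.

```python
import random
from typing import List, Tuple


def surrogate_score(design: List[float]) -> float:
    """Hypothetical fast validator: higher is better, best possible score is 0."""
    target = [0.3, 0.7, 0.5]  # made-up optimum, unknown to the proposer
    return -sum((d - t) ** 2 for d, t in zip(design, target))


def optimize(design: List[float], steps: int = 2000,
             step_size: float = 0.05, seed: int = 0) -> Tuple[List[float], float]:
    """Propose a small tweak, validate it, keep it only if the score improves."""
    rng = random.Random(seed)
    best = list(design)
    best_score = surrogate_score(best)
    for _ in range(steps):
        candidate = [v + rng.uniform(-step_size, step_size) for v in best]
        score = surrogate_score(candidate)  # the "validate me" call
        if score > best_score:              # climb the hill
            best, best_score = candidate, score
    return best, best_score


start = [0.0, 0.0, 0.0]
start_score = surrogate_score(start)
final_design, final_score = optimize(start)

print(f"start score: {start_score:.3f}")
print(f"final score: {final_score:.5f}")
```

The loop is only as good as the validator it climbs against. With human taste as part of the score, as suggested above, each evaluation becomes slow and expensive, which is where a learned proposer would be expected to beat random tweaks.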
So you could kind of see maybe Nike gets a marketing proposal of that kind for some other segment, like, you know, maybe old people's shoes, orthotics, whatever. Some other segment of the market, and all of a sudden that segment also becomes this kind of place where you can take five bucks of plastic and sell it for 100 bucks. So maybe that's the thing. Maybe that's the thing that happens. All the senior living communities are going to be on fleek with the shoes, perhaps. Do we still say on fleek? I don't know where that came from. That's a deep cut. But yeah, well, if my memaw is in multi-colored custom designs in the next 12 months, we'll know how it happened. Indeed. And on that note, Nathan, good morning. So it is two days before America's 250th anniversary. Amazing. How about that? How about that? And so happy birthday to the republic, and we will see viewers and listeners next week. Here's to virtuous leadership long into the future. Indeed. Bye bye. Thanks, Prakash. Bye for now.