Lex Fridman Podcast XX
[0] The following is a conversation with Jim Keller, his second time on the podcast.
[1] Jim is a legendary microprocessor architect and is widely seen as one of the greatest engineering minds of the computing age.
[2] In a peculiar twist of spacetime in our simulation, Jim is also a brother-in-law of Jordan Peterson.
[3] We talk about this and about computing, artificial intelligence, consciousness, and life.
[4] Quick mention of our sponsors.
[5] Athletic Greens, all-in-one nutrition drink; Brooklinen sheets; ExpressVPN; and Belcampo grass-fed meat.
[6] Click the sponsor links to get a discount and to support this podcast.
[7] As a side note, let me say that Jim is someone who, on a personal level, inspired me to be myself.
[8] There was something in his words, on and off the mic, or perhaps that he even paid attention to me at all, that almost told me, you're all right, kid.
[9] a kind of pat on the back that can make the difference between a mind that flourishes and a mind that is broken down by the cynicism of the world.
[10] So I guess that's just my brief few words of thank you to Jim and in general gratitude for the people who have given me a chance on this podcast, in my work, and in life.
[11] If you enjoy this thing, subscribe by YouTube, review it on Apple Podcast, follow on Spotify, support it on Patreon, or connect with me on Twitter, Alex Friedman.
[12] As usual, I'll do a few minutes of ads now and no ads in the middle.
[13] I try to make these interesting, but I give you timestamps, so if you skip, please still check out the sponsors by clicking the links in the description.
[14] It is the best way to support this podcast.
[15] This show is sponsored by Athletic Greens, the all-in-one daily drink to support better health and peak performance.
[16] It replaced a multivitamin for me and went far beyond that with 75 vitamins and minerals.
[17] I do intermittent fasting of 16 to 24 hours every day and always break my fast with Athletic Greens.
[18] I'm actually drinking it twice a day now, training for the Goggins Challenge.
[19] I can't say enough good things about these guys.
[20] It helps me not worry whether I'm getting all the nutrients I need, especially since they keep iterating on their formula, constantly improving it.
[21] The other thing I've taken for a long time outside of Athletic Greens is fish oil.
[22] So I'm especially excited now that they're selling fish oil, and are offering listeners of this podcast a free one-month supply of wild-caught omega-3 fish oil.
[23] Sounds good when it's wild -caught for some reason.
[24] Go to athleticgreens.com slash Lex to claim the special offer. That's athleticgreens.com slash Lex for the drink and the fish oil.
[25] Trust me, it's worth it.
[26] This episode is sponsored by Brooklinen sheets.
[27] Sleep has increasingly become a source of joy for me with an Eight Sleep self-cooling bed and these incredibly smooth, buttery smooth, as they call them, and cozy Brooklinen sheets.
[28] I've often slept on the carpet without anything but a jacket and jeans, so I'm not exactly the world's greatest expert in comfort, but these sheets have been an amazing upgrade over anything I've ever used, even the responsible adult sheets I've purchased in the past.
[29] There's a variety of colors, patterns, and material variants to choose from. They have over 50,000 five-star reviews. People love them. I think figuring out a sleep schedule that works for you is one of the essential challenges of a productive life. Don't let your choice of sheets get in the way of this optimization process. Go to brooklinen.com and use code Lex to get 25 bucks off when you spend $100 or more.
[30] Plus, you get free shipping.
[31] That's brooklinen.com, and enter promo code Lex.
[32] This show is sponsored by ExpressVPN, a company that adds a layer of protection between you and a small number of technology companies that control much of your online life.
[33] ExpressVPN is a powerful tool for fighting back in the space of privacy.
[34] As I mentioned in many places, I've been honestly troubled by Amazon's decision to remove Parler from AWS.
[35] To me, it was an overreach of power that threatens the American spirit of the entrepreneur.
[36] Anyway, ExpressVPN hides your IP address, something that can be used to personally identify you.
[37] So the VPN makes your activity harder to trace and sell to advertisers.
[38] And it does all of this without slowing your connection.
[39] I've used it for many years on Windows, Linux, and Android, actually on iPhone now.
[40] But it's available everywhere else too.
[41] I don't know.
[42] I don't know where else it's available.
[43] Maybe Windows phone?
[44] I don't know.
[45] For me, it's been fast and easy to use: one big power-on button that's fun to press.
[46] Probably my favorite intuitive design of an app that doesn't try to do more than it needs to.
[47] Go to expressvpn.com slash LexPod to get an extra three months free on a one-year package.
[48] That's expressvpn.com slash LexPod.
[49] This show is also sponsored by Belcampo Farms, whose mission is to deliver meat you can feel good about.
[50] That's meat that is good for you, good for the animals, and good for the planet.
[51] Belcampo animals graze on open pastures and seasonal grasses, resulting in meat that is higher in nutrients and healthy fats.
[52] The farms are certified humane, which is the gold standard for the kind and responsible treatment of farm animals.
[53] As I've mentioned in the past, a clean diet of meat and veggies, for me, has been an important part of my productive life.
[54] It maximizes my mental and physical performance.
[55] Belcampo has been the best meat I've ever eaten at home, so I can't recommend it highly enough.
[56] Also, the CEO of the company, Anya, I forget her last name, it starts with an F, I think it's Fernald, follow her on Instagram or wherever else she's active, because she happens to be a brilliant chef and just has a scientific view of agriculture and food in general, which I find fascinating and inspiring.
[57] Anyway, you can order Belcampo's sustainably raised meats to be delivered straight to your door using code Lex at belcampo.com slash Lex for 20% off for first-time customers.
[58] That's code Lex at belcampo.com slash Lex.
[59] Trust me, the extra bit of cost is worth it.
[60] And now here's my conversation with Jim Keller.
[61] What's the value and effectiveness of theory versus engineering, this dichotomy, in building good software or hardware systems?
[62] Well, good design is both.
[63] I guess that's pretty obvious.
[64] But by engineering, do you mean, you know, reduction to practice of known methods?
[65] And then science is the pursuit of discovering things that people don't understand, or solving unknown problems.
[66] Definitions are interesting here, but I was thinking more of theory as constructing models that kind of generalize about how things work.
[67] Engineering is actually building stuff, the pragmatic, like, okay, we have these nice models, but how do we actually get things to work?
[68] Maybe economics is a nice example.
[69] Like economists have all these models of how the economy works and how different policies will have an effect.
[70] But then there's the actual, okay, let's call it engineering of, like, actually deploying the policies.
[71] So computer design is almost all engineering and reduction to practice of known methods.
[72] Now, because of the complexity of the computers we built, you know, you could think you're, well, we'll just go write some code, and then we'll verify it, and then we'll put it together, and then you find out that the combination of all that stuff is complicated, and then you have to be inventive to figure out how to do it.
[73] Right?
[74] So that definitely happens a lot.
[75] And then every so often some big idea happens, but it might be one person.
[76] And that idea is in what, the space of engineering, or is it in the space of...
[77] Well, I'll give you an example.
[78] So one of the limits of computer performance is branch prediction.
[79] And there's a whole bunch of ideas about how well you could predict a branch.
[80] And people said there's a limit to it.
[81] It's an asymptotic curve.
[82] And somebody came up with a better way to do branch prediction.
[83] It was a lot better.
[84] And he published a paper on it, and every computer in the world now uses it.
[85] And it was one idea.
[86] So the engineers who build branch prediction hardware were happy to drop the one kind of training array and put it in another one.
[87] So it was a real idea.
[88] And branch prediction is one of the key problems underlying all of sort of the lowest level of software.
[89] It boils down to branch prediction.
[90] It boils down to the uncertainty.
[91] Computers are limited by, you know, single-thread computers are limited by two things.
[92] The predictability of the path of the branches and predictability of the locality of data.
[93] So we have predictors that now predict both of those pretty well.
[94] Yeah.
[95] So memory is, you know, a couple hundred cycles away; local cache is a couple cycles away.
[96] When you're executing fast, virtually all the data has to be in the local cache.
[97] So a simple program says, you know, add one to every element in an array.
[98] It's really easy to see what the stream of data will be.
[99] But you might have a more complicated program that says get an element of this array, look at something, make a decision, go get another element, it's kind of random.
[100] And you can think that's really unpredictable.
[101] And then you make this big predictor that looks at this kind of pattern and you realize, well, if you get this data and this data, then you probably want that one.
[102] And if you get this one and this one and this one, you probably want that one.
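As an aside for readers, the flavor of predictor Jim describes can be sketched in a few lines. This is a toy history-based branch predictor in the spirit of gshare, not any specific published scheme; the table size, the XOR indexing, and the example branch address are illustrative assumptions, and real hardware predictors are far more elaborate:

```python
# Toy history-based branch predictor (gshare-flavored sketch).
# A global history register is XOR-ed with the branch address to
# index a table of 2-bit saturating counters.

TABLE_BITS = 10
TABLE_SIZE = 1 << TABLE_BITS

class Predictor:
    def __init__(self):
        self.counters = [1] * TABLE_SIZE  # 0-1 predict not-taken, 2-3 predict taken
        self.history = 0                  # recent branch outcomes, one bit each

    def _index(self, pc):
        return (pc ^ self.history) & (TABLE_SIZE - 1)

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | taken) & (TABLE_SIZE - 1)

# A branch that is taken on every third iteration: a static predictor
# caps out around 67%, but once history distinguishes the three phases
# of the pattern, each phase gets its own counter and accuracy soars.
p = Predictor()
hits = total = 0
for i in range(10_000):
    taken = (i % 3 == 0)
    hits += (p.predict(0x400) == taken)  # 0x400 is an arbitrary branch address
    p.update(0x400, int(taken))
    total += 1
print(f"accuracy: {hits / total:.2%}")
```

The point of the sketch is the one Jim makes: the pattern looks random until you key the prediction on recent history, at which point it becomes almost perfectly predictable.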
[103] And is that theory or is that engineering?
[104] Like the paper that was written, was it an asymptotic kind of discussion, or is it more like, here's a hack that works well?
[105] It's a little bit of both.
[106] Like, there's information theory in it, I think, somewhere.
[107] Okay.
[108] So it's actually trying to prove something.
[109] Yeah, but once you know the method, implementing it is an engineering problem.
[110] Now, there's a flip side of this, which is, in a big design team, what percentage of people think their plan or their life's work is engineering, right, versus inventing things?
[112] So lots of companies will reward you for filing patents.
[113] Many big companies get stuck, because to get promoted you have to come up with something new.
[114] And then what happens is everybody's trying to do some random new thing, 99 % of which doesn't matter.
[115] And the basics get neglected.
[116] And so there's a dichotomy.
[117] They think, like, the cell library and the basic CAD tools, or basic software validation methods, that's simple stuff. They want to work on the exciting stuff. And then they spend lots of time trying to figure out how to patent something, and that's mostly useless.
But the breakthroughs are on the simple stuff?
No, no, you have to do the simple stuff really well. If you're building a building out of bricks, you want great bricks. So you go to two places that sell bricks. One guy says, yeah, they're over there in an ugly pile. And the other guy lovingly tells you about the 50 kinds of bricks and how hard they are and how beautiful they are and how square they are. Which one are you going to buy bricks from? Which is going to make a better house?
So you're talking about the craftsman, the person who understands bricks, who loves bricks, who loves the variety.
That's a good word. You know, good engineering is great craftsmanship. And when you start thinking engineering is about invention, and set up a system that rewards invention, the craftsmanship gets neglected.
[118] Okay, so maybe one perspective is that theory, the science, over-emphasizes invention, and engineering emphasizes craftsmanship. And therefore, it doesn't matter what you do, theory or engineering...
[119] Like, read the tech rags.
[120] They're always talking about some breakthrough or innovation, and everybody thinks that's the most important thing.
[121] But the number of innovative ideas is actually relatively low.
[122] We need them, right?
[123] And innovation creates a whole new opportunity.
[124] Like when some guy invented the internet, right?
[125] Like that was a big thing.
[126] The million people that wrote software against that were mostly doing engineering software writing.
[127] So the elaboration of that idea was huge.
[128] I don't know if you know Brendan Eich; he wrote JavaScript in 10 days.
[129] That's an interesting story.
[130] It makes me wonder, and it was famously for many years, considered to be a pretty crappy programming language.
[131] It still is, perhaps.
[132] It's been improving sort of consistently.
[133] But the interesting thing about that guy is, you know, he doesn't get any awards.
[134] You don't get a Nobel Prize or Fields Medal or...
[135] For inventing a crappy piece of, you know, software code.
[136] That is currently the number one programming language in the world and is increasingly running the backend of the internet.
[137] Does he know why everybody uses it?
[138] Like, that would be an interesting thing.
[139] Was it the right thing at the right time?
[140] Because when stuff like JavaScript came out, like there was a move from, you know, writing C programs and C++ to, let's call what they call managed code frameworks, where you write simple code, it might be interpreted, it has lots of libraries, productivity is high, and you don't have to be an expert.
[141] So, you know, Java was supposed to solve all the world's problems.
[142] It was complicated.
[143] JavaScript came out, you know, after a bunch of other scripting languages.
[144] I'm not an expert on it, but was it the right thing at the right time?
[145] Or was there something, you know, clever because he wasn't the only one.
[146] There's a few elements.
[147] And maybe if he figured out what it was, then he'd get a prize.
[148] Like that's...
[149] Constructive theory.
[150] Yeah, you know, maybe. He probably hasn't defined it.
[151] Or he just needs a good promoter.
[152] Well, I think there was a bunch of blog posts written about it, which is like wrong is right, which is like doing the crappy thing fast, just like hacking together the thing that answers some of the needs and then iterating over time, listening to developers, like listening to people who actually use the thing.
[153] This is something you can do more in software.
[154] But the right time, like you have to sense, you have to have a good instinct of when is the right time for the right tool and make it super simple and just get it out there.
[155] The problem, and this is true with hardware, less true with software, is this backward compatibility that just drags behind you as, you know, you try to fix all the mistakes of the past.
[156] But the timing was good.
[157] There's something about that.
[158] And it wasn't accidental.
[159] You have to like give yourself over to the, you have to have this like broad sense of what's needed now, both scientifically and like the community.
[160] And just like this, it was obvious. The interesting thing about JavaScript is that everything that ran in the browser at the time, like Java and, I think, other languages like Scheme, they all ran in a separate external container.
[161] And then JavaScript was literally just injected into the web page.
[162] It was the dumbest possible thing running in the same thread as everything else.
[163] And, like, it was inserted as a comment.
[164] So JavaScript code is inserted as a comment in the HTML code.
[165] And it was, I mean, it's either genius or super dumb, but it's like, it's so...
[166] It had no apparatus for like a virtual machine and container.
[167] It just executed in the framework, the program that's already running.
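For readers, the comment trick mentioned above was a real convention, not a figure of speech: early pages wrapped the script body in an HTML comment so that browsers predating `<script>` would ignore the code instead of rendering it as text, while JavaScript-aware browsers executed it inline, in the same thread as the rest of the page. A minimal mid-90s-style page looked roughly like this (the `language` attribute is period style; modern pages omit both it and the comment wrapper):

```html
<html>
  <body>
    <p>Regular page content.</p>
    <script language="JavaScript">
    <!-- hide the code from pre-JavaScript browsers
      document.write("Hello from the page itself");
    // end the comment so script-aware browsers resume normally -->
    </script>
  </body>
</html>
```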
[168] And it was, that's cool.
[169] And then because something about that accessibility, the ease of its use resulted in then developers innovating of how to actually use it.
[170] I mean, I don't even know what to make of that, but it does seem to echo across different software, like stories of different software.
[171] PHP has the same story, a really crappy language that just took over the world.
[172] Well, we just have a joke that variable-length instruction sets always win, even though they're obviously worse.
[173] Nobody knows why.
[174] X86 is arguably the worst architecture on the planet.
[175] It's one of the most popular ones.
[176] Well, I mean, isn't that also the story of RISC versus CISC?
[177] I mean, is that simplicity?
[178] There's something about simplicity that, in this evolutionary process, is valued.
[179] If it's simple, it spreads faster, it seems like.
[180] Or is that not always true?
[181] That's not always true.
[182] Yeah, it could be that simple is good, but too simple is bad.
[183] So why did RISC win, you think, so far?
[184] Did RISC win?
[185] In the long arc of history.
[186] We don't know.
[187] So who's going to win?
[188] What's RISC?
[189] What's CISC, and who's going to win in that space of instruction sets?
[190] AI software's going to win, but there'll be little computers that run little programs like normal all over the place.
[191] But we're going through another transformation, so.
[192] But you think instruction sets underneath it all will change?
[193] Yeah, they evolve slowly.
[194] They don't matter very much.
[195] They don't matter very much.
[196] Okay.
[197] I mean, the limits of performance are, you know, predictability of instructions and data.
[199] I mean, that's the big thing.
[200] And then the usability of it is, you know, some quality of design, quality of tools, availability.
[201] Like right now, x86 is proprietary to Intel and AMD, but they can each change it any way they want, independently.
[202] Right.
[203] Arm is proprietary to Arm, and they won't let anybody else change it.
[204] So it's like a sole source.
[205] And RISC-V is open source, so anybody can change it, which is super cool.
[206] but that also might mean it gets changed in too many random ways that there's no common subset of it that people can use.
[207] Do you like open or do you like closed?
[208] Like if you were to bet all your money on one or the other, RISC-V versus it?
[209] No idea.
[210] It's case dependent.
[211] Well, x86, oddly enough, when Intel first started developing it, they licensed it to, like, seven people.
[212] So it was the open architecture.
[213] And then they moved faster than the others, and also bought one or two of them.
[214] But there was seven different people making X86.
[215] Because at the time, there was 6502 and Z80s and 8086.
[216] And you could argue everybody thought Z80 was the better instruction set.
[217] But that was proprietary to one place.
[218] Oh, and the 6800.
[219] So there was like four or five different microprocessors.
[220] Intel went open, got the market share because people felt like they had multiple sources from it.
[221] And then over time it narrowed down to two players.
[222] So why, you as a historian, why did Intel win for so long with their processors?
I mean, they were great. Their process development was great.
So, just looking back to JavaScript and Brendan Eich, in Microsoft and Netscape and all these internet browsers, Microsoft won the browser game because they aggressively stole other people's ideas, like, right after they did it.
You know, I don't know if Intel was stealing other people's ideas.
Stealing in a good way, just to clarify.
They started making RAMs, random access memories. And then at the time when the Japanese manufacturers came up, you know, they were getting out-competed on that, and they pivoted to microprocessors. And they made the first, you know, integrated microprocessor that ran programs, it was the 4004 or something.
Who was behind that pivot? That's a hell of a pivot.
[223] Andy Grove.
[224] And he was great.
[225] That's a hell of a pivot.
[226] And then they led the semiconductor industry.
[227] Like, they were just a little company; IBM and all kinds of big companies had boatloads of money.
[228] And they out -innovated everybody.
[229] Out-innovated.
[230] Okay.
[231] Yeah, yeah.
[232] So it's not like marketing.
[233] It's not any other stuff.
[234] And their processor designs were pretty good.
[235] I think the, you know, Core 2 was probably the first one I thought was great.
[236] It was a really fast processor, and then Haswell was great.
[237] What makes a great processor in that?
[238] Oh, if you just look at it, it's performance versus everybody else.
[239] It's, you know, the size of it, the usability of it.
[240] So it's not some specific kind of element that makes it beautiful.
[241] It's just like literally just raw performance.
[242] Is that how you think of microprocessors?
[243] It's just like raw performance?
[244] Of course.
[245] It's like a horse race.
[246] The fastest one wins.
[247] Now, you don't care how it's done, as long as it...
Well, there's the fastest in the environment. Like, for years you made the fastest one you could, and then people started to have power limits, so then you made the fastest at the right power point. And then when we started doing multi-processors, like, if you could scale your processors more than the other guy, you could be 10% faster on, like, a single thread, but you have more threads. So there's lots of variability. And then Arm really explored, like, they have the A series and the R series and the M series, like a family of processors for all these different design points, from, like, unbelievably small and simple on up. And so then when you're doing the design, it's sort of like this big palette of CPUs. Like, they're the only ones with a credible top-to-bottom palette.
And what do you mean, a credible top-to-bottom palette?
[248] Well, there's people who make microcontrollers that are small, but they don't have a fast one. There's people who make fast processors but don't have a medium one or a small one.
[249] Is that hard to do that full palette?
[250] That seems like a...
[251] Yeah, it's a lot of different.
[252] So what's the difference in the arm folks and Intel in terms of the way they're approaching this problem?
[253] Well, Intel, almost all their processor designs were, you know, very custom, high-end, you know, for the last 15, 20 years.
[254] So the fastest horse possible.
[255] Yeah.
[256] In one horse race.
[257] Yeah.
[258] And the architectures are really good.
[259] But the company itself was fairly insular to what's going on in the industry with CAD tools and stuff.
[260] And there's this debate about custom design versus synthesis and how do you approach that?
[261] I'd say Intel was slow on getting to synthesized processors.
[262] Arm came in from the bottom and they generated IP, which went to all kinds of customers.
[263] So they had very little say on how the customer implemented their IP.
[264] So Arm is super friendly to the synthesis IP environment.
[265] Whereas Intel said, we're going to make this great client chip, server chip, with our own CAD tools, with our own process, with our own, you know, other supporting IP, and everything only works with our stuff.
[266] So is Arm winning the mobile platform space in terms of processors?
[267] And is what you're describing why they're winning?
[268] Well, they had lots of people doing lots of different experiments.
[269] So they controlled the processor architecture and IP.
[270] but they let people put it in lots of different chips, and there was a lot of variability in what happened there.
[271] Whereas Intel, when they made their mobile, their foray into mobile, they had one team doing one part, right?
[272] So it wasn't 10 experiments.
[273] And then their mindset was PC mindset, Microsoft, software mindset, and that brought a whole bunch of things along that the mobile world, the embedded world, don't do.
[274] Do you think it was possible for Intel to pivot hard and win the mobile market?
[275] That's a hell of a difficult thing to do, right, for a huge company to just pivot.
[276] I mean, it's so interesting, because we'll talk about your current work. It's like, it's clear that PCs were dominating for several decades, like desktop computers, and then mobile, it's unclear.
[277] It's the leadership question.
[278] Like Apple under Steve Jobs, when he came back, they pivoted multiple times.
[279] You know, they built iPads and iTunes and phones and tablets and great Macs. Like, who knew computers should be made out of aluminum? Nobody knew that.
And they're great. It's super fun. That was Steve?
Yeah, Steve Jobs. Like, they pivoted multiple times. And, you know, the old Intel, they did that multiple times. They made DRAMs and processors and processes.
I've got to ask this: what was it like working with Steve Jobs?
I didn't work with him.
[281] Did you interact with him?
[282] Twice.
[283] I said hi to him twice in the cafeteria.
[284] What did you say?
[285] Hi.
[286] He said, hey, fellas.
[287] He was friendly.
[288] He was wandering around with somebody, and he couldn't find a table because the cafeteria was packed, and I gave him my table.
[289] But I worked for Mike Culbert, who talked to... like, Mike was the unofficial CTO of Apple and a brilliant guy, and he worked for Steve for 25 years, maybe more.
[290] And he talked to Steve multiple times a day.
[291] And he was one of the people who could put up with Steve's, let's say, brilliance and intensity.
[292] And Steve really liked him.
[293] Steve trusted Mike to translate the shit he thought up into engineering products that worked.
[294] And then Mike ran a group called Platform Architecture, and I was in that group.
[295] So many times I'd be sitting with Mike, and the phone would ring.
[296] It'd be Steve, and Mike would hold the phone like this, because Steve would be yelling about something or other.
[297] Yeah, and then he would translate.
[298] And then he would say, Steve wants us to do this.
[299] So.
[300] Was Steve a good engineer or no?
[301] I don't know.
[302] He was a great idea guy.
[303] Idea person.
[304] And he's a really good selector for talent.
[305] Yeah, that seems to be one of the key elements of leadership, right?
[306] And then he was a really good first principles guy.
[307] Like somebody would say something couldn't be done, and he would just think that's obviously wrong, right?
[308] But, you know, maybe it's hard to do.
[309] Maybe it's expensive to do.
[310] Maybe we need different people.
[311] You know, there's like a whole bunch of, you know, if you want to do something hard, you know, maybe it takes time.
[312] Maybe you have to iterate.
[313] There's a whole bunch of things you could think about, but saying it can't be done is stupid.
[314] How would you compare?
[315] So it seems like Elon Musk is more engineering-centric, but it's also, I think he considers himself a designer too.
[316] He has a design mind.
[317] Steve Jobs feels like he was much more in idea space, design space, versus engineering.
[318] Just make it happen.
[319] The world should be this way.
[320] Just figure it out.
[321] But he used computers.
[322] You know, he had computer people he talked to all the time.
[323] Like Mike was a really good computer guy.
[324] He knew what computers could do.
[325] Computer meaning computer hardware, like hardware, software, all the pieces?
[326] And then he would, you know, have an idea about what could we do with this next.
[327] That was grounded in reality.
[328] It wasn't like he was, you know, just finger painting on the wall and wishing somebody would interpret it.
[329] So he had this interesting connection because, you know, he wasn't a computer architect or designer, but he had an intuition from the computers we had to what could happen.
[330] It's interesting to say intuition because it seems like he was pissing off a lot of engineers in his intuition about what can and can't be done.
[331] Like, what are all these stories about, like, floppy disks and all that kind of stuff?
[332] Yeah, so Steve, the first round, like, he'd go into a lab and look at what's going on and hate it and fire people, or ask somebody in the elevator what they're doing for Apple and not be happy.
[333] When he came back, my impression was he surrounded himself with a relatively small group of people and didn't really interact outside of that as much.
[334] And then the joke was, you'd see, like, somebody moving a prototype through the quad with a black blanket over it, and that was because it was secret, you know, partly from Steve, because they didn't want Steve to see it until it was ready.
Yeah, the dynamic with Jony Ive and Steve is interesting. It's like, you don't want to... he ruins as many ideas as he generates.
Yeah, yeah. It's a dangerous kind of line to walk.
If you have a lot of ideas, like, Gordon Bell was famous for ideas, right? And it wasn't that the percentage of good ideas was way higher than anybody else's; it was that he had so many ideas, and he was also good at talking to people about them, and getting the filters right, and, you know, seeing through stuff. Whereas Elon was like, hey, I want to build rockets. So Steve would hire a bunch of rocket guys, and Elon would go read rocket manuals.
So Elon is a better engineer, in a sense? Like, or, like, more like a love and passion for the manuals?
[335] And the details.
[336] The details.
[337] The craftsmanship too, right?
[338] Well, I guess he had a craftsmanship too, but of a different kind.
[339] What do you make of, just to stay on it for a little longer, what do you make of, like, the anger and the passion and all that? The firing and the mood swings and the madness, you know, being emotional and all that.
[342] That's Steve.
[343] And I guess Elon too.
[344] So what is that? Is that a bug or a feature?
[345] It's a feature.
[346] So there's a graph, which is y -axis productivity.
[347] Yeah.
[348] X -axis at zero is chaos.
[349] Yeah.
[350] And infinity, it's complete order.
[351] Yeah.
[352] Right.
[353] So as you go from the origin, as you improve order, you improve productivity.
[354] Yeah.
[355] And at some point, productivity peaks, and then it goes back down again.
[356] Too much order, nothing can happen.
[357] Yes.
[358] But the question is, how close to the chaos is that?
[359] Now, here's the thing: once you start moving in the direction of order, the force vector that drives you towards order is unstoppable.
[360] Oh, it's a slippery slope.
[361] And every organization will move to the place where their productivity is stymied by order.
[362] So you need a...
[363] So the question is, who's the counterforce?
[364] Because it also feels really good.
[365] As you get more organized, then productivity goes up.
[366] The organization feels it.
[367] They orient towards it, right?
[368] They hire more people.
[369] They get more guys who can run process.
[370] You get bigger, right?
[371] And then inevitably, the organization gets captured by the bureaucracy that manages all the processes.
[372] Yeah.
[373] And then humans really like that.
[374] And so if you just walk into a room and say, guys, love what you're doing, but I need you to have less order.
[375] If you don't have some force behind that, nothing will happen.
[376] I can't tell you on how many levels that's profound.
[377] So that's why I'd say it's a feature.
[378] Now, could you be nicer about it?
[379] I don't know.
[380] I don't know any good examples of being nicer about it.
Well, the funny thing is, to get stuff done you need people who can manage stuff and manage people, because humans are complicated.
[383] They need lots of care and feeding that you need to tell them they look nice and they're doing good stuff and pat them on the back, right?
[384] I don't know.
[385] You tell me, is that needed?
[386] Do you need that?
[387] I had a friend, he started to manage a group, and he said, I figured it out.
[388] You have to praise them before they do anything.
[389] I was waiting until they were done.
[390] And they were always mad at me. Now I tell them what a great job they're doing while they're doing it.
[391] But then you get stuck in that trap, because then when they're not doing something, how do you confront these people?
I think a lot of people that have had trauma in their childhood, successful people, would disagree with you, that you need to first do the rough stuff and then be nice later.
[393] I don't know.
Okay, but, you know, engineering companies are full of adults who had all kinds of ranges of childhoods. You know, most people had okay childhoods.
[396] Well, I don't know if, uh...
[397] And lots of people only work for praise, which is weird.
You mean like everybody?
[399] I'm not that interested in it, but...
[400] Well, you're, you're probably looking for somebody's approval.
[401] Even still.
[402] Yeah, maybe.
[403] I should think about that.
Maybe somebody who's no longer with us, that kind of thing.
[405] I don't know.
I used to call my dad and tell him what I was doing.
He was, he was very excited about engineering and stuff.
So you got his approval?
Yeah, a lot. I was lucky. Like, he decided I was smart and unusual as a kid, and that was okay when I was really young. So when I, like, did poorly in school, I was dyslexic, I didn't read until I was in third or fourth grade, and they didn't care. My parents were like, oh, he'll be fine. So I was lucky.
That's cool. Is he still with us? Do you miss him?
Yeah, he had Parkinson's and then cancer.
[408] His last 10 years were tough.
And it killed him.
Losing a man like that's hard.
[411] The mind?
Well, it was pretty good.
[413] Parkinson's caused a slow dementia.
[414] And the chemotherapy, I think, accelerated it.
[415] But it was like hallucinogenic dementia.
[416] So he was clever and funny and interesting and was pretty unusual.
Do you remember conversations from that time? Like, do you have fond memories of the guy?
Yeah, oh yeah.
Anything come to mind?
A friend told me one time I could draw a computer on the whiteboard faster than anybody he'd ever met, and I said, you should meet my dad. Like, when I was a kid, he'd come home and say, I was driving by this bridge and I was thinking about it, and he'd pull out a piece of paper and he'd draw the whole bridge. He was a mechanical engineer.
Yeah.
And he would just draw the whole thing, and then he would tell me about it and tell me how he would have changed it.
[418] And he had this, you know, idea that he could understand and conceive anything.
[419] And I just grew up with that, so that was natural.
So, you know, like when I interview people, I ask them to draw a picture of something they did on a whiteboard.
[421] And it's really interesting.
[422] Like some people draw a little box, you know, and then they'll say, and this talks to this.
And I'll be like, oh, this is frustrating.
And I had this other guy come in one time, and he says, well, I designed a floating point in this chip, but I'd really like to tell you how the whole thing works, and then tell you how the floating point works inside of it. Do you mind if I do that?
[426] He covered two whiteboards in like 30 minutes.
[427] And I hired him.
[428] Like, he was great.
There's a craftsmanship.
I mean, there's a craftsmanship to that.
[431] Yeah, but also the mental agility to understand the whole thing.
[432] Right.
Put the pieces in context, you know, a real view of the balance of how the design worked.
Because if you don't understand it properly when you start to draw it, you'll fill up half the whiteboard with, like, a little piece of it. And, you know, your ability to lay it out in an understandable way takes a lot of understanding. And to be able to zoom into the detail and then zoom out to the big picture.
What about the impossible thing? You said your dad believed that you can do anything. That's a weird feature for a craftsman. It seems that that echoes in your own behavior.
Well, it's not that anybody can do anything right now, right? It's that if you work at it, you can get better at it, and there might not be a limit. And he did funny things. Like, he always wanted to play piano, so at the end of his life he started playing a piano, when he had Parkinson's, and he was terrible. But he thought if he really worked at it in this life, maybe the next life he'd be better at it.
He might be on to something.
Yeah, he enjoyed doing it.
That's pretty funny.
[435] Do you think the perfect is the enemy of the good in hardware and software engineering?
It's like we were talking about JavaScript a little bit and the messiness of the 10-day building process.
[437] Yeah, you know, creative tension, right?
[438] So creative tension is you have two different ideas that you can't do both, right?
[439] But the fact that you want to do both causes you to go try to solve that problem.
[440] That's the creative part.
[441] So if you're building computers, like some people say we have the schedule and anything that doesn't fit in the schedule we can't do, right?
[442] And so they throw out the perfect because they have a schedule.
[443] I hate that.
Then there's other people who say, we need to get this perfectly right, and no matter what, you know, more people, more money, right?
[445] And there's a really clear idea about what you want.
Some people are really good at articulating it.
[447] Right.
[448] So let's call that the perfect, yeah.
[449] Yeah.
[450] All right.
[451] But that's also terrible because they never ship anything.
[452] You never hit any goals.
[453] So now you have your framework.
[454] Yes.
You can't throw out stuff because you can't get it done today, because maybe you'll get it done tomorrow or on the next project.
[456] Right.
[457] You can't.
[458] So you have to, I work with a guy that I really like working with, but he overfilters his ideas.
Overfilters?
[460] He'd start thinking about something.
And as soon as he figured out what was wrong with it, he'd throw it out.
And then I start thinking about it. You know, you come up with an idea, and then you find out what's wrong with it, and then you give it a little time to set. Because sometimes, you know, you figure out how to tweak it, or maybe that idea helps some other idea.
[463] So idea generation is really funny.
[464] So you have to give your idea space.
[465] Like spaciousness of mind is key.
[466] But you also have to execute programs and get shit done.
And then it turns out computer engineering is fun because it takes, you know, 100 people to build a computer, 200 to 300, whatever the number is. And people are so variable about, you know, temperament and, you know, skill sets and stuff that in a big organization, you find the people who love the perfect ideas and the people that want to get stuff done yesterday, and people who like to come up with ideas, and people who like to, let's say, shoot down ideas. And it takes the whole, it takes a large group of people.
[468] Some are good at generating ideas, some are good at filtering ideas, and then in that giant mess, you somehow, I guess the goal is for that giant mess of people to find the perfect path through the tension, the creative tension.
[469] But like, how do you know when you said there's some people good at articulating what perfect looks like, what a good design is?
Like, if you're sitting in a room and you have a set of ideas about, like, how to design a better processor, how do you know there's something special here?
[471] This is a good idea.
[472] Let's try this.
Have you ever brainstormed an idea with a couple of people that were really smart?
[474] And you kind of go into it and you don't quite understand it and you're working on it.
[475] And then you start, you know, talking about it, putting it on the whiteboard, maybe it takes days or weeks.
And then your brains start to kind of synchronize.
[477] It's really weird.
[478] Like you start to see what each other is thinking.
[479] Yeah.
[480] And it starts to work.
Like, you can see it work. Like, my talent in computer design is I can see how computers work in my head, like, really well. And I know other people can do that too. And when you're working with people that can do that, like, it is kind of an amazing experience. And then every once in a while you get to that place, and then you find the flaw, which is kind of funny, because you can fool yourself.
But the two of you kind of drifted along?
Yeah, yeah.
You went in a direction that was useless.
Yeah, that happens too.
[482] Like, you have to, because, you know, the nice thing about computer design is always reduction to practice.
[483] Like, you come up with your good ideas.
[484] And I know some architects who really love ideas.
[485] And then they work on them and they put it on the shelf.
[486] Then go work on the next idea and put it on the shelf.
[487] They never reduce it to practice.
So they never find out what's good and bad.
Because almost every time I've done something really new, by the time it's done, like, the good parts are good, but I know all the flaws.
Yeah. Would you say your career, just your own experience, is your career defined mostly by flaws or by successes? Like, again, there's great tension between those.
If you haven't tried hard, yeah, right, and done something new, right, then you're not going to be facing the challenges. When you build it, then you find out all the problems with it.
But when you look back, do you see problems?
[490] Okay.
When I look back, I think earlier in my career, like, EV5 was the second Alpha chip.
[492] I was so embarrassed about the mistakes, I could barely talk about it.
[493] And it was in the Guinness Book of World Records, and it was the fastest processor on the planet.
So it was... And at some point I realized that was really a bad mental framework to deal with, like, doing something new.
[495] We did a bunch of new things, and some worked out great.
[496] Some were bad, and we learned a lot from it, and then the next one we learned a lot.
[497] That also, you know, EV6 also had some really cool things in it.
[498] I think the proportion of good stuff went up, but it had a couple of fatal flaws in it that were painful.
[499] And then, yeah.
[500] You learned to channel the pain into, like, pride?
[501] Not pride, really.
Just a realization about how the world works, or how that kind of idea works.
[503] Life is suffering, that's the reality.
[504] No, it's not.
[505] I know the Buddha said that, and a couple other people are stuck on it.
No, there's this kind of weird combination of good and bad, you know, light and darkness, that you have to tolerate and, you know, deal with.
[507] Yeah, there's definitely lots of suffering in the world.
[508] Depends on the perspective.
It seems like there's way more darkness, but that makes the light parts really nice.
What computing hardware, or just any kind of software design, do you find beautiful, from your own work, from other people's work? We were just talking about the battleground of flaws and mistakes and errors, but what about things that were just beautifully done?
[512] Is there something that pops to mind?
Well, when things are beautifully done, usually there's a well-thought-out set of abstraction layers.
[514] So the whole thing works in unison nicely?
[515] Yes.
And when I say abstraction layer, that means when two different components work together, they work independently.
[517] They don't have to know what the other one is doing.
[518] So that decoupling?
[519] Yeah.
[520] So the famous one was the network stack.
There's a seven-layer network stack, you know, data transport and protocol and all the layers.
And the innovation was when they really got that right.
Because networks before that didn't define those very well. The layers could innovate independently, and occasionally the layer boundary, you know, the interface, would be upgraded.
[524] And that let, you know, the design space breathe.
[525] You could do something new in layer seven without having to worry about how layer four worked.
[526] And so good design does that.
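The decoupling idea can be sketched in a few lines of code. The layer names follow the network stack mentioned here; the classes and the `TCP|` framing are invented for illustration:

```python
# A minimal sketch of layering: each layer only uses the interface of
# the layer below it, so either side can change without the other
# knowing. The framing string stands in for real transport behavior.

class Transport:
    """Layer 4: delivers bytes, knows nothing about applications."""
    def send(self, payload: bytes) -> bytes:
        return b"TCP|" + payload  # illustrative framing, not real TCP

class Application:
    """Layer 7: built only against Transport's interface."""
    def __init__(self, transport: Transport):
        self.transport = transport

    def request(self, text: str) -> bytes:
        return self.transport.send(text.encode())

# Layer 7 can be rewritten freely; as long as send() keeps its
# contract, Transport never needs to know.
packet = Application(Transport()).request("GET /")
```

Swapping in a different `Transport` subclass leaves `Application` untouched, which is the "design space breathing" point made here.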
[527] And you see it in processor designs.
[528] When we did the Zen design at AMD, we made several components very modular.
And, you know, my insistence at the top was I wanted all the interfaces defined before we wrote the RTL for the pieces.
One of the verification leads said, if we do this right, I can test the pieces so well independently that when we put it together, we won't find all these interaction bugs, where the floating point knows how the cache works.
[531] And I was a little skeptical, but he was mostly right, that the modularity of design greatly improved the quality.
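The verification approach described, testing each piece against the agreed interface rather than against its real neighbor, looks roughly like this in software terms. All the names here are invented for illustration, not anything from the Zen project:

```python
# Sketch of interface-first verification: agree on the interface, then
# test each module against a stub of its neighbor instead of the real
# thing. Names are invented for illustration.

class CacheInterface:
    """The agreed contract between modules."""
    def read(self, addr: int) -> int:
        raise NotImplementedError

class StubCache(CacheInterface):
    """Stands in for the real cache during unit tests."""
    def __init__(self, memory: dict):
        self.memory = memory

    def read(self, addr: int) -> int:
        return self.memory[addr]

def fpu_add(cache: CacheInterface, a_addr: int, b_addr: int) -> int:
    # The "floating point" unit sees only the interface, never the
    # cache internals, so it can be verified in isolation.
    return cache.read(a_addr) + cache.read(b_addr)

result = fpu_add(StubCache({0: 2, 1: 3}), 0, 1)
```

Because the unit never depends on the neighbor's internals, integrating the real modules later should surface few interaction bugs, which is the claim the verification lead makes above.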
Is that universally true in general, would you say, about good designs?
Good design is usually modular.
[535] Well, we talked about this before.
[536] Humans are only so smart.
[537] And we're not getting any smarter, right?
[538] But the complexity of things is going up.
[539] So, you know, a beautiful design can't be bigger than the person doing it.
[540] It's just, you know, their piece of it.
[541] Like, the odds of you doing a really beautiful design of something that's way too hard for you is low.
[542] Right.
If it's way too simple for you, it's not that interesting. It's like, well, anybody could do that. But when you get the right match of your expertise and, you know, mental power to the right design size, that's cool. But that's not big enough to make a meaningful impact in the world. So now you have to have some framework to design the pieces so that the whole thing is big and harmonious, but, you know, when you put it together, it's sufficiently interesting to be used.
So that's what a beautiful design is.
[545] Matching the limits of that human cognitive capacity to the module you can create and creating a nice interface between those modules and thereby, do you think there's a limit to the kind of beautiful complex systems we can build with this kind of modular design?
It's like, you know, if we build increasingly more complicated... You can think of, like, the internet. Okay, let's scale it down. You can think of, like, a social network like Twitter as one computing system.
But those are little modules, yeah.
Right, but it's built on so many components nobody at Twitter even understands. Right? So if an alien showed up and looked at Twitter, he wouldn't just see Twitter as a beautiful, simple thing that everybody uses, which is really big. You would see the network it runs on, the fiber optics the data is transported over, the computers.
[547] The whole thing is so bloody complicated.
Nobody at Twitter understands it.
[549] I think that's what the alien would see.
[550] So, yeah, if an alien showed up and looked at Twitter, or looked at the various different networked systems that you can see on Earth.
So imagine they were really smart and they could comprehend the whole thing.
And then they sort of evaluated the humans and thought, this is really interesting.
[553] No human on this planet comprehends the system.
They built. No individual... Or, well, would they even see individual humans? It's like, we humans are very human-centric, entity-centric, and so we think of us as the central organism and the networks as just the connection of organisms. But from the perspective of an alien, from an outside perspective, it seems like, yeah, I get it, we're the ants.
And they'd see the ant colony.
The ant colony, yeah. Or the result of the production of the ant colony, which is like cities. And in that sense, humans are pretty impressive, the modularity that we're able to achieve, and how robust we are to noise and mutation, all that kind of stuff.
[555] Well, that's because it's stress tested all the time.
[556] Yeah.
You know, you build all these cities with buildings, and you get earthquakes occasionally, and, you know, wars.
[558] Viruses every once in a while.
[559] You know, changes in business plans for, you know, like shipping or something.
[560] And like, as long as there's all stress tests, then it keeps adapting to the situation.
[561] So that's a curious phenomenon.
[562] Well, let's go.
[563] Let's talk about Moore's Law a little bit.
At the broad view of Moore's Law, where it's just exponential improvement of computing capability: OpenAI, for example, recently published this kind of paper, looking at the exponential improvement in the training efficiency of neural networks, for, like, ImageNet and all that kind of stuff. We just got better on the purely software side, just figuring out better tricks and algorithms for training neural networks, and that seems to be improving significantly faster than the Moore's Law prediction. So that's in the software space. What do you think, if Moore's Law continues, or if the general version of Moore's Law continues, do you think that comes mostly from the hardware, from the software, some mix of the two, some totally interesting things? So not the reduction of the size of the transistor kind of thing, but more, like, totally interesting kinds of innovations in the hardware space, all that kind of stuff?
[566] Well, there's like a half a dozen things going on in that graph.
[567] So one is there's initial innovations that had a lot of room to be exploited.
[568] So, you know, the efficiency of the networks is improved dramatically.
[569] And then the decomposability of those, you know, they started running on one computer, then multiple computers, and then multiple GPUs, and then arrays of GPUs, and they're up to thousands.
[570] And at some point, so it's sort of like they were going from like a single computer application to a thousand computer application.
[571] So that's not really a Moore's Law thing.
[572] That's an independent vector.
How many computers can I put on this problem? Because the computers themselves are getting better at, like, a Moore's Law rate, but their ability to go from one to 10 to 100 to a thousand, you know, was something. And then multiply by, you know, the amount of computers it took to solve, like, AlexNet to ResNet to Transformers. It's been quite, you know, steady improvement.
But those are like S-curves, aren't they? That's the exact kind of S-curve that's underlying Moore's Law from the very beginning. So what's the biggest, what's the most productive, rich source of S-curves in the future, do you think?
[575] Is it hardware or is it software?
[576] So hardware is going to move along relatively slowly, like, you know, double performance every two years.
[577] There's still...
[578] I like how you call that slow.
[579] Yeah, it's the slow version.
[580] The snail's pace of Moore's Law.
[581] Maybe we should, we should trademark that one.
[582] Whereas the scaling by number of computers, you know, can go much faster, you know.
I'm sure at some point Google, you know, their initial search engine was running on a laptop, you know. And at some point they really worked on scaling that, and then they factored the indexer from, you know, this piece and this piece and this piece, and they spread the data on more and more things.
[584] And, you know, they did a dozen innovations.
But as they scaled up the number of computers on that, it kept breaking, finding new bottlenecks in their software and their schedulers, and made them rethink. Like, it seems insane to do a scheduler across a thousand computers to schedule parts of it and then send the results to one computer, but if you want to schedule a million searches, that makes perfect sense. So scaling by just quantity is probably the richest thing. But then as you scale quantity, like, a network that was great on a hundred computers may be completely the wrong one. You may pick a network that's 10 times slower on 10,000 computers, like, per computer.
But if you go from 100 to 10,000, it's 100 times.
[587] So that's one of the things that happened when we did internet scaling is the efficiency went down, not up.
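The trade-off arithmetic here is worth spelling out. Using the numbers from the example, and assuming perfect scaling, which real systems only approximate:

```python
# The scaling trade-off described here: a network that's 10x slower
# per computer can still win if you run it on 100x more computers.
# Perfect linear scaling is an idealizing assumption.

def total_throughput(num_computers: int, per_computer_speed: float) -> float:
    return num_computers * per_computer_speed

small = total_throughput(100, 1.0)       # fast network, 100 machines
large = total_throughput(10_000, 0.1)    # 10x slower per machine, 10,000 machines
speedup = large / small                  # net win despite per-node inefficiency
```

So efficiency per computer went down by 10x, but the system as a whole got 10x faster, which is the "inefficiency that scales" point made next.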
[588] The future of computing is inefficiency, not efficiency.
[589] But scales.
[590] Inefficient scale.
[591] It's scaling faster than inefficiency bites you.
[592] And as long as there's dollar value there, like scaling costs lots of money.
[593] Yeah.
But Google showed, Facebook showed, everybody showed that scale was where the money was at, and so it was worth it financially.
Do you think it's possible that, like, basically the entirety of Earth will be like a computing surface? Like, this table will be doing computing, this hedgehog will be doing computing, like everything, really inefficient, dumb computing.
In a lot of fiction books, they call it computronium.
Computronium?
We turn everything into computing.
Well, most of the elements aren't very good for anything. Like, you're not going to make a computer out of iron. Like, you know, silicon and carbon have nice structures. You know, we'll see what you can do with the rest of it. People talk about, well, maybe we can turn the sun into a computer, but it's hydrogen and a little bit of helium.
So what I mean is more like actually just adding computers to everything.
Oh, okay. So you're just converting all the mass of the universe into a computer?
No, no, no.
It'd be ironic, from the simulation point of view: it's like the simulator built mass that simulates.
[595] Yeah, I mean, yeah, so, I mean, ultimately this is all heading towards a simulation.
[596] Yeah, well, I think I might have told you this story.
[597] At Tesla, they were deciding, so they want to measure the current coming out of the battery, and they decided between putting a resistor in there and putting a computer with a sensor in there.
[598] And the computer was faster than the computer I worked on in 1982.
[599] And we chose the computer because it was cheaper than the resistor.
[600] So, sure, this hedgehog, you know, it costs $13 and we can put an AI that's as smart as you in there for $5.
[601] It'll have one.
[602] You know, so computers will be, you know, be everywhere.
[603] I was hoping it wouldn't be smarter than me because...
[604] Well, everything's going to be smarter than you.
[605] But you were saying it's inefficient.
[606] I thought it was better to have a lot of dumb things.
[607] Well, Moore's Law will slowly compact that stuff.
[608] So even the dumb things will be smarter than us.
[609] The dumb things are going to be smart.
[610] Or they're going to be smart enough to talk to something.
[611] that's really smart, you know, it's like, well, just remember, like a big computer chip.
[612] Yeah.
[613] You know, it's like an inch by an inch, and, you know, 40 microns thick.
[614] It doesn't take very much, very many atoms to make a high power computer.
[615] Yeah.
And 10,000 of them can fit in a shoebox.
[617] But, you know, you have the cooling and power problems, but, you know, people are working on that.
But they still can't write compelling poetry or music or, you know, understand what love is or have a fear of mortality, so we're still winning.
Neither can most of humanity.
Well, they can write books about it.
But speaking about this walk along the path of innovation, towards the dumb things being smarter than humans: you are now the CTO of Tenstorrent, as of two months ago. They build hardware for deep learning.
[619] How do you build scalable and efficient deep learning?
[620] This is such a fascinating space.
[621] Yeah, yeah.
[622] So it's interesting.
So up until recently, I thought there were two kinds of computers.
There are serial computers that run like C programs, and then there are parallel computers.
So the way I think about it is, you know, parallel computers have given parallelism.
[627] Like, GPUs are great because you have a million pixels.
[628] And modern GPUs run a program on every pixel.
[629] They call it a shader program, right?
[630] So, or like finite element analysis.
[631] You build something, you know, you make this into little tiny chunks.
[632] You give each chunk to a computer.
[633] So you're given all these chunks, you have parallelism like that.
But most C programs, you write this linear narrative, and you have to make it go fast. To make it go fast, you predict all the branches, all the data fetches, and you run that more in parallel. But that's found parallelism.
AI is, I'm still trying to decide how fundamental this is.
[639] It's a given parallelism problem.
[640] But the way people describe the neural networks and then how they write them in PyTorch, it makes graphs.
[641] Yeah, that might be fundamentally different than the GPU kind of.
[642] Parallelism, yeah, it might be.
Because when you run the GPU program on all the pixels, you're running, you know, it depends: you know, this group of pixels is background blue, and it runs a really simple program.
[644] This pixel is, you know, some patch of your face.
[645] So you have some really interesting shader program to give you an impression of translucency.
[646] But the pixels themselves don't talk to each other.
[647] There's no graph, right?
[648] So you do the image, and then you do the next image, and you do the next image.
And you run 8 million pixels, 8 million programs every time, and modern GPUs have like 6,000 thread engines in them.
So, you know, to get eight million pixels, each one runs a program on, you know, 10 or 20 pixels.
[651] And that's how they work, but there's no graph.
[652] But you think graph might be a totally new way to think about hardware?
So Raja Koduri and I have been having this good conversation about given versus found parallelism.
[654] And then the kind of walk is we got more transistors, like, you know, computers way back when did stuff on scalar data.
[655] Now we did it on vector data, famous vector machines.
[656] Now we're making computers that operate on matrices, right?
[657] And then the category we said that was next was spatial.
[658] Like, imagine you have so much data that, you know, you want to do the compute on this data.
And then when it's done, it says, send the result to this pile of data, and run some software on that.
[660] And it's better to think about it spatially than to move all the data to a central processor and do all the work.
So spatial, I mean, moving in the space of data, as opposed to moving the data.
[662] You know, you have a petabyte data space spread across some huge array of computers, and when you do a computation somewhere, you send the result of that computation or maybe a pointer to the next program to some other piece of data and do it.
[663] But I think a better word might be graph, and all the AI neural networks are graphs.
[664] Do some computations, send a result here, do another computation, do a data transformation, do a merging, do a pooling, do another computation.
[665] Is it possible to compress and say how we make this thing efficient, this whole process efficient, this different?
[666] So first, the fundamental elements in the graphs are things like matrix multiplies, convolutions, data manipulations, and data movements.
[667] Yeah.
So GPUs emulate those things with their little engines, you know, basically running a single-threaded program.
And then there's, you know, what Nvidia calls a warp, where they group a bunch of programs that are similar together.
[670] So for efficiency and instruction use.
And then at a higher level, you take this graph and you say, this part of the graph is a matrix multiply, which runs on these 32 threads.
[672] But the model at the bottom was built for running programs on pixels, not executing graphs.
[673] So it's emulation ultimately.
[674] So is it possible to build something that natively runs graphs?
[675] Yes.
So that's what Tenstorrent did.
[677] So.
[678] Where are we on that?
[679] How, like in the history of that effort, are we in the early days?
[680] Yeah, I think so.
Tenstorrent was started by a friend of mine, Ljubisa Bajic, and I was his first investor.
[682] So I've been, you know, kind of following him and talking to him about it for years.
And in the fall, I was considering things to do. I decided... You know, we held a conference last year, a friend organized it, and we wanted to bring in thinkers, and two of the people were Andrej Karpathy and Chris Lattner. And Andrej gave this talk.
It's on YouTube, called Software 2.0, which I think is great, which is: we went from programmed computers, where you write programs, to data-programmed computers.
You know, like the future of software is data programs, the networks.
[687] And I think that's true.
And then Chris has been working... He worked on LLVM, the low-level virtual machine, which became the interoperable intermediate representation for all compilers.
And now he's working on another project called MLIR, which is multi-level intermediate representation, which is essentially under the graph: how do you represent that kind of computation, and then coordinate large numbers of potentially heterogeneous computers?
And I would say, technically, Tenstorrent's, you know, two pillars are those two ideas: Software 2.0 and multi-level representation.
[692] But it's in service of executing graph programs.
[693] The hardware is designed to do that.
[694] So it's including the hardware piece.
[695] Yeah.
[696] And then the other cool thing is for a relatively small amount of money, they did a test chip and two production chips.
[697] So it's like a super effective team.
And unlike some AI startups, where if you don't build the hardware to run the software they really want to run, then you have to fix it by writing lots more software.
[699] So the hardware naturally does matrix multiply, convolution, the data manipulations, and the data movement between processing elements that you can see in the graph, which I think is all pretty clever, and that's what I'm working on now.
So I think it's called the Grayskull processor, introduced last year.
[701] It's, you know, there's a bunch of measures of performance.
[702] We're talking about horses.
It can do 368 trillion operations per second.
[704] Seems to outperform NVIDIA's Tesla T4 system.
[705] So these are just numbers.
What do they actually mean in real-world performance?
[707] Like what are the metrics for you that you're chasing in your horse race?
[708] Like, what do you care about?
Well, first, so the native language of, you know, people who write AI network programs is PyTorch now.
PyTorch, TensorFlow.
[711] There's a couple others.
PyTorch has won over TensorFlow, which is just...
[713] I'm not an expert on that.
I know many people who have switched from TensorFlow to PyTorch.
[715] Yeah.
[716] And there's technical reasons for it.
[717] I use both.
[718] Both are still awesome.
[719] Both are still awesome.
[720] But the deepest love is for PyTorch currently.
[721] Yeah, there's more love for that.
[722] And that may change.
[723] So the first thing is, when they write their programs, can the hardware execute it pretty much as it was written?
[724] Right.
[725] So PyTorch turns into a graph.
[726] We have a graph compiler that makes that graph.
[727] Then it fractures the graph down.
[728] So if you have a big matrix multiply, we break it into right-sized chunks to run on the processing elements.
[729] It hooks all the graph up.
[730] It lays out all the data.
[731] There's a couple of mid -level representations of it that are also simulatable so that if you're writing the code, you can see how it's going to go through the machine, which is pretty cool.
[732] And then at the bottom, it schedules kernels, like math, data manipulation, data movement kernels, which do this stuff.
[733] So we don't have to write a little program to do matrix multiply because we have a big matrix multiplier.
[734] Like there's no SIMD program for that.
[735] But there is scheduling for that, right?
[736] So one of the goals is if you write a piece of PyTorch code that looks pretty reasonable, you should be able to compile it and run it on the hardware without having to tweak it and do all kinds of crazy things to get performance.
[737] There's not a lot of intermediate steps.
[738] It's running directly as written.
[739] Like on a GPU, if you write a large matrix multiply naively, you'll get 5 to 10% of the peak performance of the GPU.
[740] Right.
[741] And then there's a bunch of people who've published papers on this, and I read them, about what steps you have to do.
[742] And it goes from pretty reasonable, well, transpose one of the matrices, so you do row ordered, not column ordered, you know, block it so that you can put a block of the matrix on different SMs, you know, groups of threads.
[743] But some of it gets into little details.
[744] Like, you have to schedule it just so, so you don't have register conflicts.
[746] So they call them CUDA ninjas.
[747] CUDA ninjas!
[748] I love it.
[749] To get to the optimal point, you either use a pre-written library, which is a good strategy for some things, or you have to be an expert in microarchitecture to program it.
[750] Right, so the optimization step is way more complicated with the GPU.
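The blocking step described above can be sketched in a few lines. This is a toy NumPy illustration of tiling a matrix multiply into fixed-size chunks, the way a graph compiler or CUDA kernel assigns blocks to processing elements; the function name and tile size are invented for illustration, not any vendor's actual kernel.

```python
import numpy as np

def blocked_matmul(a, b, tile=64):
    # Triple loop over tiles: each (tile x tile) block of the output
    # accumulates products of matching blocks of A and B, mimicking how
    # a compiler maps a big matmul onto SMs / processing elements.
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=np.result_type(a, b))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile]
                    @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c
```

NumPy slicing clamps at array edges, so matrices whose sizes don't divide the tile still work.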
[751] So our goal is, if you write PyTorch that's good PyTorch, you can do it.
[752] Now, as the networks are evolving, you know, they've changed from convolutional to matrix multiply.
[753] People are talking about conditional graphs.
[754] You're talking about very large matrices.
[755] They're talking about sparsity.
[756] They're talking about problems that scale across many, many chips.
[757] So the native, you know, data item is a packet.
[758] So you send a packet to a processor.
[759] It gets processed.
[760] It does a bunch of work.
[761] And then it may send packets to other processors.
[762] And they execute in like a data flow graph kind of methodology.
[763] Got it.
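The packet-driven execution Jim describes, where a processor consumes a packet, does some work, and emits packets to downstream processors, can be sketched as a toy queue-driven dataflow interpreter. This is purely illustrative: the names and structure are assumptions, not Tenstorrent's actual runtime.

```python
from collections import deque

def run_dataflow(graph, ops, inputs):
    # graph: node -> list of successor nodes
    # ops:   node -> function applied to an incoming packet
    # inputs: list of (node, packet) pairs to seed the computation
    q = deque(inputs)
    results = {}
    while q:
        node, pkt = q.popleft()
        out = ops[node](pkt)      # "process the packet"
        results[node] = out
        for succ in graph.get(node, []):
            q.append((succ, out))  # forward result as new packets
    return results
```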
[764] We have a big network on chip, and then the second chip has 16 Ethernet ports to hook lots of them together.
[765] And it's the same graph compiler across multiple chips.
[766] So that's where the scale comes in.
[767] So it's built to scale naturally.
[768] Now, my experience with scaling is as you scale, you run into lots of interesting problems.
[769] So scaling is a mountain to climb.
[770] Yeah.
[771] So the hardware is built to do this, and then we're in the process of...
[772] Is there a software part to this with Ethernet and all that?
[773] Well, the protocol at the bottom, you know, we send, you know, it's an Ethernet PHY, but the protocol basically says, send the packet from here to there.
[774] It's all point to point.
[775] The header bit says which processor to send it to, and we basically take a packet off our on-chip network, put an Ethernet header on it, send it to the other end, strip the header off, and send it to the local thing.
[776] It's pretty straightforward, too.
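That wrap-and-strip idea can be sketched in a couple of functions. This is a toy illustration only, not Tenstorrent's actual wire format: the 2-byte big-endian destination header is invented here for clarity.

```python
import struct

def wrap(dest_proc: int, payload: bytes) -> bytes:
    # Prepend a minimal header naming the destination processor,
    # like putting an Ethernet header on an on-chip packet.
    return struct.pack(">H", dest_proc) + payload

def unwrap(frame: bytes):
    # At the other end: strip the header, recover destination + payload.
    (dest,) = struct.unpack(">H", frame[:2])
    return dest, frame[2:]
```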
[777] Human-to-human interaction is pretty straightforward, too, but when you get a million of us, we can do some crazy stuff together.
[779] Yeah, it could be fun.
[780] So is that the goal is scale?
[781] So, like, for example, I've been recently doing a bunch of robotics at home for my own personal pleasure.
[782] Am I ever going to use Tenstorrent, or is this more for...?
[783] There's all kinds of problems.
[784] Like, there's small inference problems or small training problems or big training problems.
[785] What's the big goal?
[786] Is it the big training problems or the small training problems?
[787] Well, one of the goals is to scale from 100 milliwatts to a megawatt.
[788] You know, so, like, really have some range on the problems, and the same kind of AI programs work at all different levels.
[789] So that's cool.
[790] The natural, since the natural data item is a packet that we can move around, it's built to scale, but so many people have, you know, small problems.
[791] Right, right.
[792] But, you know, like inside that phone is a small problem to solve.
[793] So do you see Tenstorrent potentially being inside a phone?
[794] Well, the power efficiency of local memory, local computation and the way we built it is pretty good.
[795] And then there's a lot of efficiency on being able to do conditional graphs and sparsity.
[796] I think, for complicated networks that want to go in a small form factor, it's quite good.
[797] But we have to prove that.
[798] That's a fun problem.
[799] And that's the early days of the company, right?
[800] It's a couple of years, you said.
[801] But you think, you invested, you think they're legit, hence you joined.
[802] Well, that's...
[803] Well, it's also, it's a really interesting place to be.
[804] Like, the AI world is exploding, you know, and I looked at some other opportunities like build a faster processor, which people want, but that's more on an incremental path than what's going to happen in AI in the next 10 years.
[805] So this is kind of, you know, an exciting place to be part of.
[806] The revolutions will be happening in the very space that...
[807] And then lots of people working on it, but there's lots of technical reasons why some of them, you know, aren't going to work out that well.
[808] And, and, you know, that's interesting.
[809] And there's also the same problem about getting the basics right.
[810] Like, we've talked to customers about exciting features.
[811] And at some point, we realized they want to hear first about memory bandwidth, local bandwidth, compute intensity, programmability.
[812] They want to know the basics, power management, how the network ports work.
[813] The basics?
[814] Do all the basics work?
[815] Because it's easy to say, we've got this great idea that, you know, cracks GPT-3.
[816] But the people we talk to want to say, if I buy the, so we have a PCI Express card with our chip on it, if you buy the card, you plug it in your machine, you download the driver, how long does it take me to get my network to run?
[817] Right.
[818] You know, that's a real question.
[819] It's a very basic question.
[820] So, yeah, is there an answer to that yet, or are you still trying to get it done?
[821] Our goal is like an hour.
[822] Okay.
[823] When can I buy a Tenstorrent?
[824] Pretty soon.
[825] For my, for the small-case training?
Yeah, pretty soon.
Months?
Good.
I love the idea of you inside the room with Andrej Karpathy and Chris Lattner.
Very, very interesting, very brilliant people, very out-of-the-box thinkers, but also, like, first-principles thinkers.
Well, they both get stuff done.
They not only get stuff done, they get their own projects done, they talk about it clearly, they educate large numbers of people, and they've created platforms for other people to go do their stuff on.
[826] Yeah, the clear thinking that's able to be communicated is kind of impressive.
[827] It's kind of remarkable, too, yeah, I'm a fan.
[828] Well, let me ask, because I talk to Chris actually a lot these days.
[829] He's been, one of the, just to give him a shout out, and he's been so supportive as a human being.
[830] So everybody's quite different.
[831] Like great engineers are different.
[832] But he's been like, sensitive to the human element in a way that's been fascinating.
[833] He was one of the early people on this stupid podcast that I do to say, like, don't quit this thing and also talk to whoever the hell you want to talk to.
[834] That kind of from a legit engineer to get like props and be like, you can do this.
[835] That was, I mean, that's what a good leader does, right?
[836] To just kind of let a little kid do his thing.
[837] Like, go, go do it.
[838] Let's see what turns out.
[839] That's a pretty powerful thing.
[840] But what do you, what's your sense about?
[841] He used to be, he, now I think, stepped away from Google, right?
[842] He's at SiFive, I think.
[843] What's really impressive to you about the things that Chris has worked on?
[844] Because we mentioned the optimization, the compiler design stuff, the LLVM.
[845] Then he also worked at Google on the TPU stuff.
[846] He's obviously worked on Swift, so the programming language side.
Talking about people that work on the entire stack.
Yeah.
What, from your time interacting with Chris and knowing the guy, is really impressive to you, that just inspires you?
Well, like, LLVM became, you know, the de facto platform for, you know, compilers.
Like, it's amazing.
And, you know, it was good code quality, all the good design choices.
He hit the right level of abstraction.
[847] There's a little bit of the right time, the right place.
[848] And then he built a new programming language called Swift, which, you know, after, you know, let's say some adoption resistance became very successful.
[849] I don't know that much about his work at Google, although I know that, you know, that was a typical, they started TensorFlow stuff and they, you know, it was new.
[850] You know, they wrote a lot of code, and then at some point it needed to be refactored, because its development slowed down, while PyTorch started a little later and then passed it.
[851] So he did a lot of work on that.
[852] And then his idea about MLIR, which is, what people started to realize is, the complexity of the software stack above the low-level IR was getting so high that forcing the features of that into the low level was putting too much of a burden on it.
[853] So he's splitting that into multiple pieces.
[854] And that was one of the inspirations for our software stack, where we have several intermediate representations that are all executable, and you can look at them and do transformations on them before you lower the level.
[856] So that was, I think we started before MLIR really got far enough along to use.
[857] But we're interested in that.
[858] He's really excited about MLIR.
[859] That's his like little baby.
[860] So there seem to be some profound ideas in that that are really useful.
[861] So each one of those things has been, as the world of software gets more and more complicated, how do we create the right abstraction levels to simplify it in a way that people can now work independently on different levels of it?
[862] So I would say all three of those projects, LLVM, Swift, and MLIR, did that successfully.
[863] So I'm interested in what he's going to do next in the same kind of way.
[864] Yes.
[865] On either the TPU or maybe the NVIDIA GPU side, how does Tenstorrent, you think, or the ideas underlying it, it doesn't have to be Tenstorrent, just this kind of graph-focused, graph-centric, deep-learning-centric hardware, beat NVIDIA's?
Do you think it's possible for it to basically overtake NVIDIA?
[866] Sure.
[867] What's that process look like?
[869] What's that journey look like, do you think?
[870] Well, GPUs were built to run shader programs on millions of pixels, not to run graphs.
[871] Yes.
[872] So, there's a hypothesis that says the way the graphs, you know, are built, it's going to be really inefficient to compute them on that.
[873] And then the primitive is not a SIMD program.
[874] It's matrix multiply and convolution.
[875] And then the data manipulations are fairly extensive, about, like, how do you do a fast transpose with a program?
[877] I don't know if you ever written that transpose program.
[878] They're ugly and slow, but in hardware you can do really well.
[879] Like, I'll give you an example.
[880] So when GPU accelerators started doing triangles, like, so you have a triangle which maps on the set of pixels.
[881] So you build, it's very easy, straightforward to build a hardware engine that will find all those pixels.
[882] And it's kind of weird, because you walk along the triangle till you get to the edge, and then you have to go back down to the next row and walk along, and then you have to decide, on the edge, if the line of the triangle is, like, half on the pixel.
[883] What's the pixel color?
[884] Because it's half of this pixel and half the next one.
[885] That's called rasterization.
[886] You're saying that could be done in hardware.
[887] No, that's an example where doing that operation as a software program is really bad.
[888] I've written a program that did rasterization.
[889] The hardware that does it is actually less code than the software program that does it, and it's way faster.
[890] Right.
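For a feel of what such a software rasterizer looks like, here is a minimal sketch of the operation being described: finding the pixels covered by a triangle, using edge functions over a bounding box. It's a toy editorial illustration; real rasterizers add subpixel precision, fill rules, and attribute interpolation, and the fixed-function hardware version is, as Jim says, far faster.

```python
def rasterize_triangle(v0, v1, v2):
    # Bounding-box rasterizer: a pixel center is inside the triangle
    # if it lies on the same side of all three edges.
    def edge(a, b, p):
        # Signed area test: >0 on one side of edge a->b, <0 on the other
        return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])

    xs = [v0[0], v1[0], v2[0]]
    ys = [v0[1], v1[1], v2[1]]
    pixels = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            # Accept either winding order
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
               (w0 <= 0 and w1 <= 0 and w2 <= 0):
                pixels.append((x, y))
    return pixels
```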
[891] So there are certain times when the abstraction you have, rasterize a triangle, you know, execute a graph, you know, components of a graph.
[892] The right thing to do in the hardware software boundary is for the hardware to naturally do it.
[893] And so the GPU is really optimized for the rasterization of triangles.
[894] Well, no, that's just, well, like in a modern, you know, that's a small piece of modern GPUs.
[895] What they did is they still rasterize triangles when you're running a game, but for the most part, most of the computation in the area of the GPU is running shader programs, but they're single-threaded programs on pixels, not graphs.
[897] To be honest, I don't actually know the math behind shading and lighting and all that kind of stuff.
[898] I don't know what...
[899] Do they look like simple floating-point programs or complicated ones?
[900] You can have 8 ,000 instructions in a shader program.
[901] But I don't have a good intuition why it could be parallelized so easily.
[902] No, it's because you have 8 million pixels in every frame.
[903] So when you have a light, right?
[904] Yeah.
[905] that comes down, the angle, you know, the amount of light, like, say this is a line of pixels across this table, right?
[906] The amount of light on each pixel is subtly different.
[907] And each pixel is responsible for figuring out what?
[908] Figure it out.
[909] So that pixel says, I'm this pixel.
[910] I know the angle of the light.
[911] I know the occlusion, I know the color I am.
[912] Like every single pixel here is a different color.
[913] Every single pixel gets a different amount of light.
[914] Every single pixel has a subtly different translucency.
[915] So to make it look realistic, the solution was you run a separate program on every pixel.
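That per-pixel independence is exactly what makes shading trivially parallel: the same tiny program runs on every pixel with different inputs. A toy sketch of a diffuse "shader" applied to a row of pixels (a simple Lambertian dot product; the function and parameter names are illustrative, not a real shading language):

```python
def shade_row(normals, light_dir, base_color):
    # The same little program runs independently per pixel: brightness
    # is the clamped dot product of surface normal and light direction.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    out = []
    for n in normals:                      # one iteration == one pixel
        lit = max(0.0, dot(n, light_dir))  # clamp back-facing to dark
        out.append(tuple(c * lit for c in base_color))
    return out
```

On a GPU, the loop body would run as thousands of parallel threads, one per pixel, rather than sequentially.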
[916] See, but I thought there's like reflection from all over the place.
[917] Every pixel is supposed to.
[918] Yeah, but there is.
[919] So you build a reflection map, which is also a pixelated thing.
[920] And then when the pixel is looking at the reflection map, it has to calculate what the normal of the surface is, and it does it per pixel.
[921] By the way, there's boatloads of hacks on that.
[922] You know, like you may have a lower resolution, light map, reflection map.
[923] There's all these, you know, hacks they do.
[924] But at the end of the day, it's per pixel computation.
[925] And it just so happens you can map a graph-like computation onto this pixel-centric computation.
[926] You can do floating-point programs on convolutions and matrices.
[927] And NVIDIA invested for years in CUDA, first for HPC, and then they got lucky with the AI trend.
[928] But do you think they're going to essentially not be able to hardcore pivot out of their...
[929] We'll see.
[930] That's always interesting.
[931] How often do big companies hardcore pivot? Occasionally.
[932] How much do you know about the NVIDIA folks?
[933] Some.
[934] Some.
[935] I'm curious as well.
[936] Who's ultimately, as a...
[937] Well, they've innovated several times, but they've also worked really hard on mobile.
[938] They worked really hard on radios.
[939] You know, they're fundamentally a GPU company.
[940] Well, they tried to pivot.
[941] There's an interesting little game and play in autonomous vehicles, right?
[942] With, or semi-autonomous, like playing with Tesla and so on, and seeing, that's dipping a toe into that kind of pivot.
They came out with this platform, which is interesting technically.
Yeah, but it was like a 3,000-watt, you know, thousand-watt, three-thousand-dollar, you know, GPU platform.
I don't know if it's interesting technically. It's interesting philosophically.
Technically, I don't know if the execution, the craftsmanship, is there.
I'm not sure. I did get a sense they were repurposing GPUs for an automotive solution.
[943] Right, it's not a real pivot.
[944] They didn't build a ground -up solution.
[945] Right.
[946] Like the chips inside Tesla are pretty cheap.
[947] Like Mobileye has been doing this.
[948] They're doing the classic, work up from the simplest thing.
[949] You know, they were building 40-square-millimeter chips.
[950] And NVIDIA, their solution had two 200-millimeter chips.
[951] And, you know, like, boatloads of really expensive DRAMs.
[952] And, you know, it's a really different approach.
[953] The Mobileye fit the, let's say, automotive cost and form factor, and then they added features as it was economically viable.
And NVIDIA said, take the biggest thing and we're going to go make it work, you know.
And that's also influenced, like, Waymo.
There's a whole bunch of autonomous startups where they have a 5,000-watt server in their trunk, right?
But that's because they think, well, 5,000 watts and, you know, $10,000 is okay, because it's replacing a driver.
Elon's approach was that part has to be cheap enough to put in every single Tesla, whether they turn on autonomous driving or not.
And Mobileye was like, we need to fit in the BOM and, you know, cost structure that car companies do.
[954] So they may sell you a GPS for $1,500.
[955] But the BOM for that is, like, $25.
[956] Well, and for Mobileye, it seems like neural networks were not first-class citizens, like, in the computation.
[957] They didn't start out as a...
[958] Yeah, it was a CV problem.
[959] You know, they did classic CV and found stoplights and lines.
[960] And they were really good at it.
[961] Yeah, and they never, I mean, I don't know what's happening now, but they never fully pivoted.
[962] I mean, it's like the NVIDIA thing.
[963] And then as opposed to, so if you look at the new Tesla work, it's like neural networks from the ground up, right?
[964] Yeah, and even Tesla started with a lot of CV stuff in it, and Andrej has basically been eliminating it.
[965] You know, move everything into the network.
[966] So without, this isn't like confidential stuff, but you, sitting on a porch, looking over the world, looking at the work that Andrej is doing, that Elon's doing with Tesla Autopilot, do you like the trajectory of where things are going on the hardware side?
[967] Well, they're making serious progress.
[968] I like the videos of people driving the beta stuff.
[969] Like, it's taking some pretty complicated intersections and all that, but it's still an intervention per drive.
[970] I mean, I have the current Autopilot, my Tesla.
[971] I use it every day.
[972] Do you have full self -driving beta or no?
[973] So you like where this is going?
[974] They're making progress.
[975] It's taken longer than anybody thought.
[976] You know, my wonder was, you know, Hardware 3, is it enough computing? Off by 2, off by 5, off by 10, off by 100?
[977] Yeah.
[978] And I thought it probably wasn't enough, but they're doing pretty well with it now.
[979] Yeah.
[980] And one thing is, as the dataset gets bigger, the training gets better.
And then there's this interesting thing: you sort of train and build an arbitrary-size network that solves the problem, and then you refactor the network down to the thing that you can afford to ship, right?
So the goal isn't to build the network that fits in the phone.
It's to build something that actually works, and then figure out how you make that most effective on the hardware you have.
And they seem to be doing that much better than a couple of years ago.
Well, the one really important thing, also, that they're doing well is how to iterate that quickly, which means it's not just about one-time deployment, one-time building; it's constantly iterating the network and trying to automate as many steps as possible, right?
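The "train big, then refactor down to what you can ship" idea maps onto knowledge distillation: soften the big network's outputs and use them as training targets for a smaller network. A minimal sketch of computing such soft targets; the temperature value and names are illustrative, and the conversation doesn't specify that this is Tesla's actual method.

```python
import numpy as np

def distill_targets(teacher_logits, temperature=2.0):
    # Temperature-scaled softmax over the big "teacher" network's logits.
    # A smaller "student" that fits the deployment hardware budget is
    # then trained to match these softened probabilities.
    z = teacher_logits / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)
```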
[981] And that's actually the principles of Software 2.0, like you mentioned with Andrej. It's not just, I mean, I don't know what his actual description of Software 2.0 is, if it's just high-level philosophical or there are specifics.
[983] But the interesting thing about what that actually looks like in the real world is what I think Andrej calls the data engine.
[984] It's like it's the iterative improvement of the thing.
[985] You have a neural network that does stuff, fails on a bunch of things, and learns from it over and over and over.
[986] So you're constantly discovering edge cases.
[987] So it's very much about data engineering, like, figuring out, it's kind of what you were talking about with Tenstorrent: you have the data landscape, and you have to walk along that data landscape in a way that's constantly improving the neural network.
And that feels like that's the central piece.
And there's two pieces of it: you find edge cases that don't work, and then you define something that goes and gets you data for that.
But then the other constraint is whether you have to label it or not.
Like, the amazing thing about, like, the GPT-3 stuff is it's unsupervised.
[988] So there's essentially infinite amount of data.
[989] Now, there's obviously infinite amount of data available from cars of people successfully driving.
[990] But, you know, the current pipelines are mostly running on labeled data, which is human limited.
[991] So when that becomes unsupervised, right, it'll create an unlimited amount of data, which, you know, can scale.
[992] Now, the networks that may use that data might be way too big for cars, but then there'll be the transformation from, now we have unlimited data, I know exactly what I want.
[993] Now can I turn that into something that fits in the car?
[994] And that process is going to happen all over the place.
[995] Every time you get to the place where you have unlimited data, and that's what Software 2 .0 is about, unlimited data training networks to do stuff without humans writing code to do it.
[996] And ultimately also trying to discover, like you're saying, the self -supervised formulation of the problem, so the unsupervised formulation of the problem.
[997] Like in driving, there's this really interesting thing, which is you look at a scene that's before you and you have data about what a successful human driver did in that scene, you know, one second later.
[998] It's a little piece of data that you can use just like with GPT-3 as training.
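That "scene now, human action one second later" framing can be sketched as simple pair construction over driving logs, analogous to next-token prediction in GPT-3. This is a toy editorial illustration; the field names and horizon are made up, not any real pipeline.

```python
def make_pairs(frames, controls, horizon=1):
    # Self-supervised pairs: the scene at time t is the input, and what
    # the human driver actually did `horizon` steps later is the target.
    # No human labeling required -- the log itself supplies supervision.
    return [(frames[t], controls[t + horizon])
            for t in range(len(frames) - horizon)]
```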
[999] Currently, even though Tesla says they're using that, it's an open question to me: can you solve all of the driving with just that self-supervised piece of data?
[1001] And, like, I think...
[1002] Well, that's what Comma AI is doing.
[1003] That's what Comma AI is doing, but the question is how much data, so what Comma