speaker 1: I think it's possible that physics has exploits, and we should be trying to find them: arranging some kind of a crazy quantum mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point. Synthetic intelligences are kind of like the next stage of development, and I don't know where it leads to. Like at some point, I suspect the universe is some kind of a puzzle. These synthetic AIs will uncover that puzzle and solve it.
speaker 2: The following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that at OpenAI and Stanford. He is one of the greatest scientists, engineers and educators in the history of artificial intelligence. This is the Lex Fridman Podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy. What is a neural network, and why does it seem to do such a surprisingly good job of learning?
speaker 1: What is a neural network? It's a mathematical abstraction of the brain. I would say that's how it was originally developed. At the end of the day, it's a mathematical expression, and it's a fairly simple mathematical expression when you get down to it. It's basically a sequence of matrix multiplies, which are really dot products mathematically, and some nonlinearities thrown in. And so it's a very simple mathematical expression. And it's got knobs in it, many knobs. And these knobs are loosely related to basically the synapses in your brain. They're trainable, they're modifiable. And so the idea is like, we need to find the setting of the knobs that makes the neural net do whatever you want it to do, like classify images and so on. And so there's not too much mystery, I would say, in it. Like, you might think that... basically, I don't want to endow it with too much meaning with respect to the brain and how it works. It's really just a complicated mathematical expression with knobs. And those knobs need a proper setting for it to do something desirable.
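[Editor's note: the description above, a sequence of matrix multiplies with nonlinearities thrown in and many trainable knobs, can be sketched in a few lines. This is only an illustrative toy, not anything shown in the conversation: the layer sizes, the choice of tanh as the nonlinearity, and the names `matmul_vec`, `forward` and `num_knobs` are all assumptions made for the example, and the knobs are set randomly rather than trained.]

```python
import math
import random

def matmul_vec(weights, x):
    """Multiply a matrix (a list of rows) by a vector: one dot product per row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def tanh_vec(v):
    """Apply the nonlinearity elementwise (tanh is an assumed choice)."""
    return [math.tanh(a) for a in v]

def forward(layers, x):
    """Run x through a stack of (matrix multiply, nonlinearity) layers."""
    for weights in layers:
        x = tanh_vec(matmul_vec(weights, x))
    return x

random.seed(0)
# Hypothetical sizes: 3 inputs -> 4 hidden units -> 2 outputs.
# Every number in these matrices is one "knob"; here they are random, not trained.
layers = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
]

output = forward(layers, [0.5, -1.0, 2.0])
num_knobs = sum(len(row) for weights in layers for row in weights)
print(output, num_knobs)
```

[Training would mean searching for knob settings that make `output` match a desired target; that search is left out of this sketch.]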
speaker 2: Yeah, but poetry is just a collection of letters with spaces, but it can make us feel a certain way. And in that same way, when you get a large number of knobs together, whether it's inside the brain or inside a computer, they seem to surprise us with their power. Yeah.
speaker 1: I think that's fair. So basically, I'm underselling it by a lot, because you definitely do get very surprising emergent behaviors out of these neural nets when they're large enough and trained on complicated enough problems, like say, for example, the next word prediction in a massive data set from the Internet. And then these neural nets take on pretty surprising magical properties. Yeah, I think it's kind of interesting how much you can get out of even very simple
speaker 2: mathematical formalism. When your brain right now is talking, is it doing next word prediction, or is it doing something more interesting?
speaker 1: Well, it's definitely some kind of a generative model that's GPT-like and prompted by you. Yeah. So you're giving me a prompt and I'm kind of like responding to it in
speaker 2: a generative way. And by yourself, perhaps a little bit? Like, are you adding extra prompts from your own memory inside your head, or no?
speaker 1: It definitely feels like you're referencing some kind of a declarative structure of like memory and so on. And then you're putting that together with your prompt .
speaker 2: and giving away some... Like, how much of what you just said has been said by you before?
speaker 1: Nothing, basically, right?
speaker 2: No. But if you actually look at all the words you've ever said in your life and you do a search, you've probably said a lot of the same words in the same order before.
speaker 1: Yeah, could be. I mean, I'm using phrases that are common, etc, but I'm remixing it into a pretty sort of unique sentence at the end of the day. But you're right, definitely.
speaker 2: There's like a ton of remixing. What you didn't... It's like Magnus Carlsen said, I'm rated 2,900, whatever, which is pretty decent. I think you're talking very... you're not giving enough credit to neural nets here. Why do they seem to... What's your best intuition about this emergent behavior? I mean, it's kind of interesting because I'm simultaneously underselling them,
speaker 1: but I also feel like there's an element to which I'm over... like, it's actually kind of incredible that you can get so much emergent magical behavior out of them despite them being so simple mathematically. So I think those are kind of like two surprising statements that are kind of juxtaposed together. And I think basically what it is, is we are actually fairly good at optimizing these neural nets. And when you give them a hard enough problem, they are forced to learn very interesting solutions in the optimization. And those solutions basically
speaker 2: have these emergent properties that are very interesting. There's wisdom and knowledge in the knobs. And so this representation that's in the knobs, does it make sense to you intuitively that the large number of knobs can hold a representation that captures some deep wisdom about the data it has looked at?
speaker 1: It's a lot of knobs. It's a lot of knobs. And somehow, you know, so speaking concretely, one of the neural nets that people are very excited about right now are GPTs, which are basically just next word prediction networks. So you consume a sequence of words from the Internet and you try to predict the next word. And once you train these on a large enough data set, you can basically prompt these neural nets in arbitrary ways, and you can ask them to solve problems, and they will. So you can just tell them, you can make it look like you're trying to solve some kind of a mathematical problem, and they will continue what they think is the solution based on what they've seen on the Internet. And very often those solutions look remarkably consistent, look correct, potentially.
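[Editor's note: the next word prediction setup described above can be illustrated with a deliberately tiny stand-in. What follows is a bigram counter, not a GPT and not a neural network; the corpus string and the names `train_bigram`, `predict_next` and `continue_prompt` are hypothetical, chosen only to show "consume a sequence of words, predict the next word, then prompt it".]

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def continue_prompt(counts, prompt, max_words=5):
    """Extend the prompt one predicted word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Hypothetical miniature "Internet" to train on.
corpus = "two plus two is four . two plus three is five . two plus two is four ."
model = train_bigram(corpus)

print(continue_prompt(model, "three is", max_words=2))
```

[With this corpus the continuation of "three is" is "three is four .", which looks like a fluent answer and is wrong: a small reminder that a continuation "based on what they've seen" only looks correct, potentially.]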
speaker 2: Do you still think about the brain side of it? So as neural nets are an abstraction, or mathematical abstraction, of the brain, do you still draw wisdom from the biological neural networks? Or even the bigger question: you're a big fan of biology and biological computation. What impressive thing is biology doing, to you, that computers are not yet? That gap.
speaker 1: I would say I'm definitely... I'm much more hesitant with the analogies to the brain than I think you would see potentially in the field. And I kind of feel like certainly the way neural networks started is everything stemmed from inspiration by the brain. But at the end of the day, the artifacts that you get after training, they are arrived at by a very different optimization process than the optimization process that gave rise to the brain. And so I think, I kind of think of it as a very complicated alien artifact. It's something different. The neural nets that we're training, they are a complicated alien artifact. I do not make analogies to the brain because I think the optimization process that gave rise to it is very different from the brain. So there was no multi-agent self-play kind of setup, and evolution. It was an optimization that is basically what amounts to a compression objective on a massive
speaker 2: amount of data. Okay. So artificial neural networks are doing compression, and biological neural networks are trying to survive. And they're not really doing a... they're an agent in a multi-agent self-play system that's been running for a very,
speaker 1: very long time. That said, evolution has found that it is very useful to predict and have a predictive model in the brain. And so I think our brain utilizes something that looks like that as a part of it, but it has a lot more, you know, gadgets and gizmos and value functions and ancient nuclei that are all trying to, like, make you survive and reproduce and
speaker 2: everything else. And the whole thing through embryogenesis is built from a single cell. I mean, it's just, the code is inside the DNA and it just builds it up, like the entire organism. It's totally crazy, the head and legs. Yes. And like, it does it pretty well. It might not be possible. So there's some learning going on. There's some kind of computation going through that building process. I mean, I don't know where... if you were just to look at the entirety of the history of life on earth, what do you think is the most interesting invention? Is it the origin of life itself? Is it just jumping to eukaryotes? Is it mammals? Is it humans themselves, Homo sapiens, the origin of intelligence or highly complex intelligence? Or is it all just a continuation of the same kind of process?
speaker 1: Certainly, I would say it's an extremely remarkable story that I'm only like briefly learning about recently, all the way from actually, like, you almost have to start at the formation of earth and all of its conditions and the entire solar system and how everything is arranged with Jupiter and the moon and the habitable zone and everything. And then you have an active earth that's turning over material, and then you start with abiogenesis and everything. And so it's all like a pretty remarkable story. I'm not sure that I can pick like a single unique piece of it that I find most interesting. I guess for me as an artificial intelligence researcher, it's probably the last piece. We have lots of animals that are not building technological society, but we do. And it seems to have happened very quickly. It seems to have happened very recently. And something very interesting happened there that I don't fully understand. I almost understand everything else kind of, I think intuitively, but I don't understand exactly that part and how quick it was. Both explanations
speaker 2: would be interesting. One is that this is just a continuation of the same kind of process. There's nothing special about humans. That would be deeply... understanding that would be very interesting, that we think of ourselves as special, but it was obvious, it was already written in the code, that you would have greater and greater intelligence emerging. And then the other explanation, which is that something truly special happened, something like a rare event, whether it's like a crazy rare event, like in A Space Odyssey. What would it be? See, if you say like the invention of fire or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating. And so just optimizing the collaboration, really the multi-agent aspect of the multi-agent, and that really being constrained on resources and trying to survive, the collaboration aspect is what created the complex intelligence. But it seems like it's a natural outgrowth of the evolution process. Like what could possibly be a magical thing that happened, like a rare thing, that would say that humans are actually, human-level intelligence is actually, a really rare thing in the universe?
speaker 1: Yeah, I'm hesitant to say that it is rare, by the way, but it definitely seems like it's kind of like a punctuated equilibrium where you have lots of exploration and then you have certain leaps, sparse leaps in between. So of course, like origin of life would be one, you know, DNA, sex, eukaryotic life, the endosymbiosis event where the archaeon ate bacteria, you know, just the whole thing. And then of course, emergence of consciousness and so on. So it seems like definitely there are sparse events where a massive amount of progress was made. It's kind of hard to pick one.
speaker 2: So you don't think humans are unique. Got to ask you, how many intelligent alien civilizations do you think are out there? And is their intelligence different or similar to ours? Yeah.
speaker 1: I've been preoccupied with this question quite a bit recently, basically the Fermi paradox and just thinking through it. And the reason actually that I am very interested in the origin of life is fundamentally trying to understand how common it is that there are technological societies out there in space. And the more I study it, the more I think that there should be quite a few,
speaker 2: quite a lot. Why haven't we heard from them? Because I agree with you. It feels like I just don't see why what we did here on earth is so difficult .
speaker 1: to do. Yeah, and especially when you get into the details of it. I used to think origin of life was very... it was this magical, rare event. But then you read books like, for example, Nick Lane's The Vital Question, Life Ascending. He really gets in and he really makes you believe that this is not that rare. Basic chemistry: you have an active earth and you have your alkaline vents and you have lots of alkaline waters mixing with the ocean. And you have your proton gradients, and you have little porous pockets of these alkaline vents that concentrate chemistry. And basically, as he steps through all of these little pieces, you start to understand that actually, this is not that crazy. You could see this happen on other systems. And he really takes you from just geology to primitive life, and he makes it feel like it's actually pretty plausible. And also, the origin of life was actually fairly fast after the formation of earth. If I remember correctly, just a few hundred million years or something like that after, basically when it was possible, life actually arose. And so that makes me feel that that is not the constraint, that is not the limiting variable, and that life should actually be fairly common. And then, you know, where the drop-offs are is very interesting to think about. I currently think that there's no major drop-offs, basically. And so there should be quite a lot of life. And basically, where that brings me to then is the only way to reconcile the fact that we haven't found anyone and so on is that we just can't see them. We can't observe them.
speaker 2: Just a quick brief comment. Nick Lane, and a lot of biologists I talk to, they really seem to think that the jump from bacteria to more complex organisms
speaker 1: is the hardest jump.
speaker 2: The eukaryotic life. Yeah, which I don't... I get it. They're much more knowledgeable than me about like the intricacies of biology. But that seems like crazy, because how many single cell organisms are there, and how much time do you have? Surely it's not that difficult. Like a billion years is not even that long of a time really. Just all these bacteria under constrained resources battling it out. I'm sure they can invent more complex... like, I don't understand. It's like how to move from a hello world program to, like, inventing a function or something like that. Yeah, so I'm with you. I just feel like, I don't see any... if the origin of life, that would be my intuition, that's the hardest thing. But if that's not the hardest thing because it happened so quickly, then it's got to be everywhere. And yeah,
speaker 1: maybe we're just too dumb to see it. Well, it's just we don't have really good mechanisms for seeing this life. I mean, by what? How? So I'm not an expert just to preface this, but just from what I about it.
speaker 2: I want to meet an expert on alien intelligence and how .
speaker 1: to communicate. I'm very suspicious of our ability to find these intelligences out there and to find these earths. Like, radio waves, for example, are terrible. Their power drops off as basically one over r squared. So I remember reading that our current radio waves, the ones that we are broadcasting, would not be measurable by our devices today, only like, was it like one tenth of a light year away? Like not even, basically a tiny distance, because you really need like a targeted transmission of massive power directed somewhere for this to be picked up over long distances. And so I just think that our ability to measure is not amazing. I think there's probably other civilizations out there. And then the big question is, why don't they build von Neumann probes, and why don't they interstellar travel across the entire galaxy? And my current answer is that interstellar travel is probably like really hard. You have the interstellar medium. If you want to move at close to the speed of light, you're going to be encountering bullets along the way, because even like tiny hydrogen atoms and little particles of dust basically have massive kinetic energy at those speeds. And so basically you need some kind of shielding. You have all the cosmic radiation. It's just like brutal out there. It's really hard. And so my thinking is maybe interstellar travel is just extremely
speaker 2: hard and you have billions of years to build hard. It feels like it feels like we're not a billion years away from doing that.
speaker 1: It just might be that it's very you have to go very slowly, potentially as an example, through space, right.
speaker 2: as opposed to close to the speed of light. So I'm suspicious
speaker 1: basically of our ability to measure life, and I'm suspicious of the ability to just permeate all of space in the galaxy or across galaxies. And that's the only way that I can currently see a
speaker 2: way around it. Yeah, it's kind of mind blowing to think that there are trillions of intelligent alien civilizations out there, kind of slowly traveling through space to meet each other, and some of them meet, some of them go to war, some of them collaborate, or they're all
speaker 1: just independent. They're all just like little pockets... statistically,
speaker 2: if there's like, if there's trillions of them, surely some of them, some of the pockets are close enough to get... some of them
speaker 1: happen to be close enough in.
speaker 2: close enough to see each other. And then once you see something that is definitely complex life, like if we see something, yeah, we're probably going to be, like, intensely, aggressively motivated to figure out what the hell that is and try to meet them. But what would be your first instinct: to try to, like at a generational level, meet them or defend against them? Or what would be your instinct as a president of the United States and a scientist? I don't know which hat you prefer in this question.
speaker 1: Yeah, I think the question, it's really hard. I will say, like for example, for us, we have lots of primitive life forms on earth next to us. We have all kinds of ants and everything else and we share space with them. And we are hesitant to impact on them, and we're trying to protect them by default because they are amazing, interesting dynamical systems that took a long time to evolve. And they are interesting and special. And I don't know that you want to destroy that by default. And so I like complex dynamical systems that took a lot of time to evolve. I think I'd like to preserve it if I can afford to. And I'd like to think that the same would be true about the galactic resources, and that they would think that we're kind of an incredible, interesting story that took time. It took a few billion years to unravel, and you don't want to just destroy it.
speaker 2: I could see two aliens talking about earth right now and saying, I'm a big fan of complex dynamical systems, so I think it's of value to preserve these. And we basically are a video game they watch, or a TV show that they watch. Yeah. I think you would
speaker 1: need like a very good reason, I think, to destroy it. Like why don't we destroy these ant farms and so on? It's because we're not actually like really in direct competition with them right now. We do it accidentally and so on, but there's plenty of resources. And so why would you destroy something that is so interesting and precious?
speaker 2: Well, from a scientific perspective, you might probe it. Yeah, you might interact with it lightly.
speaker 1: You might want to learn something from it.
speaker 2: Right? So I wonder, there could be certain physical phenomena that we think is a physical phenomena, but it's actually interacting with us, to like poke the finger and see what happens.
speaker 1: I think it should be very interesting to scientists, other alien scientists, what happened here. And you know, what we're seeing today is a snapshot. Basically, it's a result of a huge amount of computation over like a billion years or something like that.
speaker 2: So it could have been initiated by aliens. This could be a computer running a program. Like, okay, if you had the power to do this, for sure, at least I would, I would pick an earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it, right? Like, yeah, wouldn't you 100% do that and observe it and then protect it? I mean, it's not just a hell of a good TV show, it's a good scientific experiment. Yeah, and it's physical simulation, right? Maybe the evolution is the most, like, actually running it is the most efficient way to understand computation or to compute stuff, to understand
speaker 1: life or you know what life looks like and what branches it can take.
speaker 2: It does make me kind of feel weird that we're part of a science experiment, but maybe it's everything is a science experiment. Does that change anything for us if we're a .
speaker 1: science experiment?
speaker 2: I don't know. Two descendants of apes talking about being inside of a science experiment.
speaker 1: I'm suspicious of this idea of like a deliberate panspermia, as you described it, sort of. And I don't see a divine intervention in some way in the historical record right now. I do feel like the story in these books, like Nick Lane's books and so on, sort of makes sense. And it makes sense how life arose on earth uniquely. And yeah, I don't need to reach for more exotic explanations right now.
speaker 2: Sure. But NPCs inside a video game don't observe any divine intervention either. And we might just be all NPCs running a kind of code.
speaker 1: Maybe eventually they will. Currently, NPCs are really dumb, but once they're running GPTs, maybe they will be like, hey, this is a little
speaker 2: really suspicious. What the hell? So you famously tweeted, it looks like if you bombard earth with photons for a while, it can emit a Roadster. So if, like in The Hitchhiker's Guide to the Galaxy, we would summarize the story of earth. So in that book, it's mostly harmless. What do you think are all the possible stories, like a paragraph long or sentence long, that earth could be summarized as, once it's done its computation? So like all the possible full... if earth is a book, right? Yeah, probably there has to be an ending. I mean, there's going to be an end to earth. And it could end in all kinds of ways. It can end soon. It can end later. What do you think are the
speaker 1: possible stories? Well, definitely, there seems to be... yeah, it's pretty incredible that these self-replicating systems will basically arise from the dynamics, and then they perpetuate themselves and become more complex and eventually become conscious and build a society. And I kind of feel like in some sense, it's kind of like a deterministic wave that, you know, kind of just like happens on any sufficiently well arranged system like earth. And so I kind of feel like there's a certain sense of inevitability
speaker 2: in it, and it's really beautiful. And it ends somehow, right? So it's a chemically diverse environment where complex dynamical systems can evolve and become further and further complex. But then there's a certain, what is it? There's certain terminating conditions.
speaker 1: Yeah, I don't know what the terminating conditions are, but definitely there's a trend line of something, and we're part of that story. And like, where does it go? So you know, we're famously described often as a biological bootloader for AIs, and that's because humans, I mean, you know, we're an incredible biological system and we're capable of computation and, you know, love and so on, but we're extremely inefficient as well. Like we're talking to each other through audio. It's just kind of embarrassing, honestly, that we're manipulating like seven symbols serially. We're using vocal cords. It's all happening over like multiple seconds. It's just like kind of embarrassing when you step down to the frequencies at which computers operate or are able to operate on. And so basically, it does seem like synthetic intelligences are kind of like the next stage of development. And I don't know where it leads to. Like at some point, I suspect the universe is some kind of a puzzle, and these synthetic AIs will uncover that puzzle and solve it. And then what happens after,
speaker 2: right? Like what? Because if you just like fast forward earth many billions of years, it's like, it's quiet, and then it's like turmoil. You see like city lights and stuff like that. And then what happens like at the end? Like is it like a, is it or is it like a calming? Is it an explosion? Is it like earth, like, open, like a giant... Because you said, emit Roadsters. Will it start emitting like a giant number of like satellites? Yes, it's some kind of a crazy explosion. And we're living,
speaker 1: we're like, we're stepping through an explosion and we're like living day to day and it doesn't look like it. But it's actually, if you... I saw a very cool animation of earth and life on earth, and basically nothing happens for a long time. And then the last like two seconds, like basically cities and everything, and the low orbit just gets cluttered, and just the whole thing happens in the last few seconds and you're like, this is exploding. This is a state of explosion.
speaker 2: So if you play, yeah, if you play it at a normal speed, yeah, it just looks like an explosion?
speaker 1: It's a firecracker.
speaker 2: We're living in a firecracker where it's going to start emitting all kinds of interesting things. Yeah, and then, so the explosion, it might actually look like a little explosion with lights and fire and energy emitted, all that kind of stuff. But when you look inside the details of the explosion, there's actual complexity happening, where there's like, yeah, human life or some kind of life. We hope it's not a
speaker 1: destructive firecracker. It's kind of like a constructive firecracker.
speaker 2: All right. So given that, I think, hilarious discussion, it is really interesting
speaker 1: to think about like what the puzzle of the universe is. Did the creator of the universe give us a message? Like, for example, in the book Contact by Carl Sagan, there's a message for humanity, for any civilization, in the digits in the expansion of pi in base eleven eventually, which is kind of an interesting thought. Maybe we're supposed to be giving a message to our creator. Maybe we're supposed to somehow create some kind of a quantum mechanical system that alerts them to our intelligent presence here, because if you think about it from their perspective, it's just, say, like quantum field theory, a massive cellular automaton-like thing. And like, how do you even notice that we exist? You might not even be able to pick us up in that simulation. And so how do you prove that you exist, that you're intelligent, and that you're part
speaker 2: of the universe? So this is like a Turing test for intelligence from earth. The creator is, I mean, maybe this is like trying to complete the next word in a sentence. This is a complicated way of that. Like earth is just, is basically sending a message back.
speaker 1: Yeah, the puzzle is basically like alerting the creator that we exist. Or maybe the puzzle is just to break out of the system and just, you know, stick it to the creator in some way. Basically, like if you're playing a video game, you can somehow find an exploit and find a way to execute arbitrary code on the host machine. There's some, for example, I believe someone got Mario, a game of Mario, to play Pong just by exploiting it and then basically writing code and being able to execute arbitrary code in the game. And so maybe we should be, maybe that's the puzzle, is that we should find a way to exploit it. So I think like some of these synthetic AIs will eventually find the universe to be some kind of a puzzle and then solve it in some way.
speaker 2: And that's kind of like the end game somehow. Do you often think about it as a simulation? So as the universe being a kind of computation that might have bugs and exploits?
speaker 1: Yes. Yeah, I think so. That's what physics is, essentially. I think it's possible that physics has exploits and we should be trying to find them: arranging some kind of a crazy quantum mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point.
speaker 2: Yeah, that's right. And like more and more sophisticated exploits, those are jokes, but that could be actually very Yeah we'll find .
speaker 1: some way to extract infinite energy. For example, when you train reinforcement learning agents in physical simulations and you ask them to, say, run quickly on the flat ground, they'll end up doing all kinds of like weird things as part of that optimization, right? They'll get on their back leg and they'll slide across the floor. And it's because the optimization, the reinforcement learning optimization on that agent, has figured out a way to extract infinite energy from the friction forces. And basically it's a poor implementation, and they found a way to generate infinite energy and just slide across the surface. And it's not what you expected. It's sort of like a perverse solution. And so maybe we can find something like that. Maybe we can be that little
speaker 2: dog in this physical simulation that cracks or escapes the intended consequences of the physics that the universe came up with. Yeah, we'll figure out some kind of shortcut, some weirdness, and then, oh man, but see, the problem with that weirdness is the first person to discover the weirdness, like sliding on the back legs,
speaker 1: That's all we're going to do.
speaker 2: It's very quickly, because everybody does that thing. So like the paperclip maximizer is a ridiculous idea, but that very well, yeah, could be what... then we'll just all switch to that because it's so fun. Well, no person
speaker 1: will discover it. I think, by the way, I think it's going to have to be some kind of a super intelligent AGI of a third generation. Like we're building the first generation AGI and, you know, third
speaker 2: generation. Yeah, so the bootloader for an AI, that AI, yeah, will be a bootloader for another AI. Yeah, and then there's
speaker 1: no way for us to introspect .
speaker 2: like what that might even... I think it's very likely that these things,
speaker 1: for example, like say you have these AGIs, it's very likely, for example, they will be completely inert. I like these kinds of sci-fi books sometimes where these things are just completely inert. They don't interact with anything. And I find that kind of beautiful because they've probably figured out the meta game of the universe in some way. Potentially, they're doing something completely beyond our imagination. And they don't interact with simple chemical life forms. Like, why would you do that? So I find those kinds
speaker 2: of ideas compelling. What's their source of fun? What are they doing?
speaker 1: What's the of solving in the universe?
speaker 2: But inert. So can you define what it means? Inert? So they .
speaker 1: escape the interphysical reality, as in .
speaker 2: they will behave .
speaker 1: in some very, like, strange way to us because they're beyond... they're playing the meta game. And the meta game is probably, say, like arranging quantum mechanical systems in some very weird ways to extract infinite energy, solve the digital expansion of pi to whatever amount they want. They will build their own, like, little fusion reactors or something crazy. Like they're doing something beyond comprehension and not understandable to us, and actually brilliant under the hood.
speaker 2: What if quantum mechanics itself is the system and we're just thinking it's physics, but we're really parasites on... not parasites. We're not really hurting physics. We're just living on this organism, and we're like trying to understand it. But really it is an organism, and with a deep, deep intelligence. Maybe physics itself is the organism that's doing this super interesting thing. And we're just like one little thing, yeah, an ant sitting on top of it trying to get energy from it. We're just kind of like these particles
speaker 1: in the wave that I feel like is mostly deterministic and takes universe from some kind of a big bang to some kind of a super intelligent replicator, some kind of a stable point in the universe given these laws of physics.
speaker 2: You don't think, as Einstein said, God doesn't play dice. So you think it's mostly deterministic. There's no randomness in .
speaker 1: anything thing I think as a deterministic Oh there's tons of well I want to .
speaker 2: be careful with randomness .
speaker 1: pseudorandom? Yeah, I don't like random. I think maybe the laws of physics are deterministic. Yeah, I think they're deterministic. You .
speaker 2: just got really uncomfortable with this question. Do you have anxiety about whether the universe is random or not? There's no randomness? You know, like Good Will Hunting: it's not your fault, Andrej. It's not your fault, man. So you don't like randomness? Yeah, I think it's unsettling.
speaker 1: I think it's a deterministic system. I think that things that look random, like say the collapse of the wave function, etc. I think they're actually deterministic, just entanglement and so on and some kind of a multiverse theory.
speaker 2: Something, something. Okay. So why does it feel like we have free will? Like if I raise this hand, I chose to do this now. That doesn't feel like a deterministic thing. It feels like I'm making a choice.
speaker 1: It feels like it.
speaker 2: Okay. So it's all feelings. It's just feelings. Yeah. So when an RL agent is making a choice, it's not really making a choice.
speaker 1: The choice is all already that Yeah you're interpreting the choice and you're .
speaker 2: creating a narrative for having made it. Yeah, and now we're talking about the narrative. It's very meta. Looking back, what is the most beautiful or surprising idea in deep learning or AI in general that you've come across? You've seen this field explode and grow in interesting ways. Just what cool ideas made you sit back and go, hmm, big or small?
speaker 1: Well, the one that I've been thinking about recently the most is probably the transformer architecture. So basically, neural networks have had a lot of architectures that were trendy and have come and gone for different sensory modalities, like for vision, audio, text; you would process them with different-looking neural nets. And recently we've seen this convergence towards one architecture, the transformer. And you can feed it video, or you can feed it, you know, images or speech or text, and it just gobbles it up. And it's kind of like a bit of a general-purpose computer that is also trainable and very efficient to run on our hardware. And so this paper came out in 2016.
speaker 2: I want to say, "Attention Is All You Need." You could say the paper title, in retrospect, didn't foresee the bigness of the impact that it was going to have.
speaker 1: Yeah I'm not sure if the authors were aware of the impact that that paper would go on to have. Probably they weren't, but I think they were aware of some of the motivations and design decisions behind the transformer, and they chose not to, I think, expand on it in that way in a paper. And so I think they had an idea that there was more than just the surface of just like, Oh, we're just doing translation, and here's a better architecture. You're not just doing translation. This is like a really cool, differentiable optimizable efficient computer that you've proposed. And maybe they didn't have all of that foresight, but I think .
speaker 2: it's really interesting. Isn't it funny, sorry to interrupt, that the title is memeable? For such a profound idea, they went with a... I don't think anyone used that kind of title before,
speaker 1: right? Attention is all you need. Yeah.
speaker 2: It's like a meme or something, exactly. Isn't that funny? Like, maybe if it was a more serious title, it wouldn't have had the impact.
speaker 1: Honestly, yeah, there is an element of me that agrees with you and prefers it this way. Yes, if it was too grand, it would overpromise and then underdeliver, potentially. So you want to just meme your way to greatness.
speaker 2: That should be a T-shirt. So you tweeted that the transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously expressive in the forward pass, optimizable via backpropagation and gradient descent, and efficient as a high-parallelism compute graph. Can you discuss some of those details? Expressive, optimizable, efficient. Yeah, from memory or in general, whatever comes to your heart.
speaker 1: You want to have a general-purpose computer that you can train on arbitrary problems, like, say, the task of next-word prediction or detecting if there's a cat in an image or something like that. And you want to train this computer, so you want to set its weights. And I think there's a number of design criteria that sort of overlap in the transformer simultaneously that made it very successful. And I think the authors were kind of deliberately trying to make this a really powerful architecture. And so basically, it's very powerful in the forward pass because it's able to express very general computation as sort of something that looks like message passing. You have nodes, and they all store vectors. And these nodes get to basically look at each other's vectors, and they get to communicate. And basically nodes get to broadcast, hey, I'm looking for certain things. And then other nodes get to broadcast, hey, these are the things I have. Those are the keys and the values. So it's not just attention. Yeah, exactly.
speaker 2: Transformers is much more than .
speaker 1: just the attention component. It's got many architectural pieces that went into it: the residual connections, the way it's arranged, there's a multilayer perceptron in there, the way it's stacked, and so on. But basically there's a message-passing scheme where nodes get to look at each other, decide what's interesting, and then update each other. And so when you get to the details of it, I think it's a very expressive function, so it can express lots of different types of algorithms in the forward pass. Not only that, but the way it's designed, with the residual connections, layer normalizations, the softmax attention and everything, it's also optimizable. This is a really big deal because there are lots of computers that are powerful that you can't optimize, or that are not easy to optimize using the techniques that we have, which is backpropagation and gradient descent. These are first-order methods, very simple optimizers really. And so you also need it to be optimizable. And then lastly, you want it to run efficiently on our hardware. Our hardware is a massive-throughput machine, like GPUs. They prefer lots of parallelism. So you don't want to do lots of sequential operations. You want to do a lot of operations in parallel. And the transformer is designed with that in mind as well. And so it's designed for our hardware, and it's designed to both be very expressive in the forward pass, but also very optimizable in the backward pass.
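The message-passing picture described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the conversation or from any particular library: it shows a single attention head with no learned projections, and the function names (`softmax`, `dot`, `attention`) and the example vectors are assumptions made up for the illustration. Each node's query is compared against every node's key, the scores are turned into weights with a softmax, and the node's update is the weighted average of the values.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # "I'm looking for certain things" (q) meets "these are the things I have" (k).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The update for this node is a weighted average of all the values.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three nodes with 2-dimensional vectors (made-up numbers).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
result = attention(queries, keys, values)
```

Every query in the loop is independent of the others, which is the property that lets real implementations compute all of them at once as matrix multiplications on a GPU.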
speaker 2: And you said that the residual connections support a kind of ability to learn short algorithms fast and first and then gradually extend them longer during training. What's the idea of learning short algorithms, right? Think of it as .
speaker 1: a... So basically, a transformer is a series of blocks, right? And these blocks have attention and a little multilayer perceptron. So you go off into a block and you come back to this residual pathway, and then you go off and you come back, and then you have a number of layers arranged sequentially. And so the way to look at it, I think, is that because of the residual pathway, in the backward pass the gradients sort of flow along it uninterrupted, because addition distributes the gradient equally to all of its branches. So the gradient from the supervision at the top just flows directly to the first layer. And all the residual connections are arranged so that in the beginning, during initialization, they contribute nothing to the residual pathway. So what it kind of looks like is, imagine the transformer is kind of like a Python function, like a def, and you get to do various kinds of lines of code. Say you have a 100-layer-deep transformer; typically they would be much shorter, say 20. So you have 20 lines of code, and you can do something in them. And so during the optimization, basically what it looks like is first you optimize the first line of code, and then the second line of code can kick in, and the third line of code can kick in. And I kind of feel like because of the residual pathway and the dynamics of the optimization, you can sort of learn a very short algorithm that gets the approximate answer, but then the other layers can sort of kick in and start to create a contribution. And at the end of it, you're optimizing over an algorithm that is 20 lines of code, except these lines of code are very complex because each is an entire block of a transformer. You can do a lot in there.
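The "lines of code on a residual pathway" picture above can be made concrete with a toy sketch. This is an illustration under loud assumptions, not a real transformer: `make_block` is a hypothetical stand-in in which each "block" merely scales its input, where a real block would be attention plus a multilayer perceptron. What it does show faithfully is the structural point: each block adds its output back onto the residual stream, so blocks that output zero at initialization leave the stack as the identity, and a single block that has learned something changes the result on its own.

```python
def make_block(scale):
    # Hypothetical stand-in for an attention + MLP block: it only scales its input.
    def block(x):
        return [scale * xi for xi in x]
    return block

def residual_stack(x, blocks):
    # Each block reads the residual stream and adds its contribution back onto it.
    for block in blocks:
        update = block(x)
        x = [xi + ui for xi, ui in zip(x, update)]
    return x

x = [1.0, -2.0, 3.0]

# At initialization every block contributes nothing, so 20 layers act as the identity.
at_init = residual_stack(x, [make_block(0.0) for _ in range(20)])

# Suppose that, early in training, only the first "line of code" has kicked in.
blocks = [make_block(0.5)] + [make_block(0.0) for _ in range(19)]
after_first = residual_stack(x, blocks)
```

Because the output is the input plus a sum of block contributions, the derivative of the output with respect to the input always contains a direct identity term, which is one way to read the remark that the gradient flows along the pathway uninterrupted.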
What's really interesting is that this transformer architecture has actually been remarkably resilient. Basically, the transformer that came out in 2016 is the transformer you would use today, except you reshuffle some of the layer norms; the layer normalizations have been reshuffled to a pre-norm formulation. And so it's been remarkably stable. But there's a lot of bells and whistles that people have attached to it to try to improve it. I do think that basically it's a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture. And I think people have been trying to change it, but it's proven remarkably resilient. But I do think that there should be even better architectures, potentially.
speaker 2: But you admire the resilience here. There's something profound about this architecture that leads to resilience. So maybe everything can be turned into a problem that transformers can solve. Currently it definitely looks like the .
speaker 1: transformer is taking over AI, and you can feed basically arbitrary problems into it. And it's a general differentiable computer, and it's extremely powerful. And this convergence in AI has been really interesting to watch for me personally. What else do you .
speaker 2: think could be discovered here about transformers, what kind of surprising thing? Or is it a stable... are we in a stable place? Is there something interesting we might discover about transformers, like aha moments? Maybe it has to do with memory, maybe knowledge representation, that kind of stuff.
speaker 1: Definitely the zeitgeist today is just pushing. Basically, right now the zeitgeist is: do not touch the transformer, touch everything else. Yes. So people are scaling up the datasets, making them much, much bigger. They're working on the evaluation, making the evaluation much, much bigger. And they're basically keeping the architecture unchanged. And that's the last five years of progress in AI.
speaker 2: Kind of what do you think about one flavor of it, which is language models? Have you been surprised? Has your sort of imagination been captivated by you mentioned GPT and all the bigger and bigger and bigger language models? And what are the limits of those models, do you think? So just for the task of natural language.
speaker 1: basically the way GPT is trained, right, is you just download a massive amount of text data from the Internet and you try to predict the next word in a sequence. Roughly speaking, you're predicting little word chunks, but roughly speaking, that's it. And what's been really interesting to watch is basically, it's a language model. Language models have actually existed for a very long time. There's papers on language modeling from 2003 even earlier.
speaker 2: Can you explain that .
speaker 1: case what a language model is? Yeah. So a language model, basically, the rough idea is just predicting the next word in a sequence, roughly speaking. So there's a paper from, for example, Bengio and the team from 2003, where for the first time they were using a neural network to take, say, three or five words and predict the next word. And they're doing this on much smaller datasets. And the neural net is not a transformer, it's a multilayer perceptron. But it's the first time that a neural network has been applied in that setting. But even before neural networks, there were language models, except they were using n-gram models. So n-gram models are just count-based models. So if you try to take two words and predict a third one, you just count up how many times you've seen any two-word combination and what came next. And what you predict as coming next is just what you've seen the most of in the training set. And so language modeling has been around for a long time. Neural networks have done language modeling for a long time. So really, what's new or interesting or exciting is just realizing that when you scale it up with a powerful enough neural net, a transformer, you have all these emergent properties, where basically what happens is, if you have a large enough dataset of text, in the task of predicting the next word you are multitasking a huge amount of different kinds of problems. You are multitasking understanding of, you know, chemistry, physics, human nature. Lots of things are sort of clustered in that objective. It's a very simple objective, but actually you have to understand a .
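The count-based n-gram model described above is small enough to write out. The following is a minimal sketch of that idea for two words of context (a trigram model); the function names and the tiny corpus are invented for the illustration, and real n-gram systems add smoothing and back-off that are omitted here.

```python
from collections import Counter, defaultdict

def train_trigram(words):
    # Map each two-word context to a Counter of the words that followed it.
    counts = defaultdict(Counter)
    for a, b, c in zip(words, words[1:], words[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    # Predict the continuation seen most often in training; None if the context is unseen.
    following = counts.get((a, b))
    if not following:
        return None
    return following.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat sat on the sofa and the cat ran".split()
model = train_trigram(corpus)
```

Here `predict_next(model, "the", "cat")` returns `"sat"`, because "the cat sat" occurs twice in the toy corpus and "the cat ran" only once, while a context that never appeared, such as `("dog", "sat")`, returns `None`. That inability to say anything about unseen contexts is one limitation a neural language model is meant to soften.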
speaker 2: lot about the world to make that prediction. You just used the word understanding, in terms of chemistry and physics and so on. What do you feel like it's doing? Is it searching for the right context? Like, what is the actual process happening here? Yeah. So basically,
speaker 1: it gets a thousand words and it's trying to predict the thousand-and-first, and in order to do that very, very well over the entire dataset available on the Internet, you actually have to basically kind of understand the context of what's going on in there. And it's a sufficiently hard problem that if you have a powerful enough computer, like a transformer, you end up with interesting solutions, and you can ask it to do all kinds of things. And it shows a lot of emergent properties, like in-context learning. That was the big deal with GPT and the original paper when they published it: you can just sort of prompt it in various ways and ask it to do various things, and it will just kind of complete the sentence. But in the process of just completing the sentence, it's actually solving all kinds of really interesting problems that we care about.
speaker 2: Do you think it's doing something like understanding? Like and when we use the word understanding for us humans.
speaker 1: I think it's doing some understanding in its weights. It understands, I think, a lot about the world, and it has to in order to predict the next word in a sequence.
speaker 2: So it's trained on the data from the Internet. What do you think about this this approach in terms of data sets, of using data from the Internet? Do you think the Internet has enough structured data to teach AI about human civilization? Yes.
speaker 1: So I think the Internet has a huge amount of data. I'm not sure if it's a complete enough set. I don't know that text is enough for having a sufficiently powerful agi as an outcome.
speaker 2: Of course, there is audio and video and images and all that kind of stuff.
speaker 1: Yeah. So text by itself, I'm a little bit suspicious about. There's a ton of things we don't put in text, in writing, just because they're obvious to us: how the world works, and the physics of it, and that things fall. We don't put that stuff in text, because why would you? We share that understanding. And so text is a communication medium between humans, and it's not an all-encompassing medium of knowledge about the world. But as you pointed out, we do have video and we have images and we have audio. So I think that definitely helps a lot. But we haven't trained models sufficiently across all those modalities yet. So I think that's .
speaker 2: what a lot of people are interested in. But I wonder whether that shared understanding, what we might call common sense, has to be learned, inferred, in order to complete the sentence correctly. So maybe because it's implied on the Internet, the model is going to have to learn it, not by reading about it, but by inferring it in the representation. So like common sense: I don't think we learn common sense because somebody tells us explicitly. We just figure it all out by interacting with the world, right? And so here's a model reading about the way people interact with the world, and it might have to infer that. I wonder. Yeah. You briefly worked on a project called World of Bits, training an RL system to take actions on the Internet versus just consuming the Internet, like you talked about. Do you think there's a future for that kind of system, interacting with the Internet to help the learning?
speaker 1: Yes. I think that's probably the final frontier for a lot of these models, because as you mentioned, when I was at OpenAI, I was working on this project, World of Bits. And basically it was the idea of giving neural networks access to a keyboard and a mouse, and the idea could .
speaker 2: possibly go wrong.
speaker 1: So basically, you perceive the input of the screen pixels, and basically the state of the computer is sort of visualized for human consumption in images of the web browser and stuff like that. And then you give the neural network the ability to press keys and use the mouse. And we were trying to get it to, for example, complete bookings and interact with user interfaces. And what did .
speaker 2: you learn from that experience? Like, what was some fun stuff? This is a super cool idea. Yeah. I mean, the step from observer to actor is super fascinating. Yeah. Well, it's the universal .
speaker 1: interface in the digital realm, I would say. And there's a universal interface in, like, the physical realm, which in my mind is a humanoid form factor kind of thing. We can later talk about Optimus and so on. But I feel like there's kind of a similar philosophy in some way, where the physical world is designed for the human form and the digital world is designed for the human form of seeing the screen and using a keyboard and mouse. And so it's the universal interface that can basically command the digital infrastructure we've built up for ourselves. And so it feels like a very powerful interface to command and to build on top of. Now, to your question as to what I learned from that, it's interesting, because World of Bits was basically too early, I think, at OpenAI at the time. This is around 2015 or so. And the zeitgeist at that time was very different in AI from the zeitgeist today. At the time, everyone was super excited about reinforcement learning from scratch. This is the time of the Atari paper, where neural networks were playing Atari games and beating humans in some cases, AlphaGo and so on. So everyone was very excited about training neural networks from scratch using reinforcement learning directly. It turns out that reinforcement learning is an extremely inefficient way of training neural networks, because you're taking all these actions and all these observations, and you get some sparse rewards once in a while. So you do all this stuff based on all these inputs, and once in a while you're told you did a good thing, you did a bad thing. And it's just an extremely hard problem, and you can't learn from that. You can burn a forest and you can sort of brute-force through it. And we saw that, I think, you know, with Go and Dota and so on, and it does work, but it's extremely inefficient and not how you want to approach problems, practically speaking. And so that's the approach that, at the time, we also took to World of Bits.
We would have an agent initialized randomly, so with keyboard mash and mouse mash, and try to make a booking. And it just revealed the insanity of that approach very quickly, where you have to stumble upon the correct booking in order to get a reward for doing it correctly. You're never going to stumble upon it by chance, at random.
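A little arithmetic shows why a randomly initialized agent almost never sees a reward in this setting. The sketch below is a simplified model, not a description of the actual World of Bits setup: it assumes exactly one action sequence completes the booking, that the agent picks uniformly at random among a fixed number of actions at each step, and that the episode has a fixed length. The numbers 50 and 10 are hypothetical.

```python
def chance_of_random_success(num_actions, num_steps):
    # Probability that uniformly random actions reproduce one specific sequence.
    return (1.0 / num_actions) ** num_steps

def expected_episodes(num_actions, num_steps):
    # Mean of a geometric distribution: expected tries until the first success.
    return num_actions ** num_steps

# Hypothetical booking task: 50 possible actions per step, 10 steps in a row.
p = chance_of_random_success(50, 10)
tries = expected_episodes(50, 10)
```

Under these assumptions `tries` is 50 to the power 10, roughly 10^17 episodes before the first reward arrives, and until that first reward there is no signal at all to learn from.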
speaker 2: So even with a simple web interface.
speaker 1: there's too many options. There's just too many options, and it's too sparse of a reward signal. And you're starting from scratch at the time. And so you don't know how to read. You don't understand pictures, images, buttons. You don't understand what it means to, like, make a booking. But now what's happened is it is time to revisit that. And OpenAI is interested in this. Companies like Adept are interested in this, and so on. And the idea is coming back because the interface is very powerful. But now you're not training an agent from scratch. You are taking the GPT as an initialization. So GPT is pretrained on all of text, and it understands what's a booking, it understands what's a submit. It understands quite a bit more. And so it already has those representations. They are very powerful. And that makes all the training significantly more efficient and makes the problem tractable.
speaker 2: Should the interaction be with like the way humans see it, with the buttons and the language, or it should be with the html, JavaScript and the css? What do you think is the better? So today, all this interaction .
speaker 1: is mostly on the level of HTML, CSS, and so on. That's done because of computational constraints. But I think ultimately, everything is designed for human visual consumption. And so at the end of the day, all the additional information is in the layout of the web page, and what's next to what, and what has a red background and all this kind of stuff, and what it looks like visually. So I think that's the final frontier, where we are taking in pixels and we're giving out keyboard and mouse commands. But I think it's impractical .
speaker 2: still today. Do you worry about bots on the Internet, given these ideas, given how exciting they are? Do you worry about bots on Twitter being not the stupid bots that we see now, with the crypto bots, but the bots that might be out there that we don't see, that are interacting in interesting ways? So this kind of system feels like it should be able to pass the "I'm not a robot" click button, whatever. Do you actually understand how that test works? I don't quite. Like, there's a checkbox or whatever that you click. It's presumably tracking, like, mouse movement and the timing and so on. Yeah. So exactly this kind of system we're talking about should be able to pass that. So yeah, what do you feel about bots that are language models plus have some interactability and are able to tweet and reply and so on? Do you worry about that world?
speaker 1: Yeah I think it's always been a bit of an arms race between sort of the attack and the defense. So the attack will get stronger, but the defense will .
speaker 2: get stronger as well. Our ability to detect that: how do you defend? How do you detect? How do you know that your Karpathy account on Twitter is human? How would you approach that? Like, if people claim otherwise, you know, how would you defend yourself in a court of law that you're a human, that this account is human?
speaker 1: Yeah, at some point, I think society will evolve a little bit. Like, we might start digitally signing some of our correspondence or, you know, things that we create. Right now it's not necessary, but maybe in the future it might be. I do think that we are going towards a world where we share the digital space with AIs, synthetic .
speaker 2: beings Yeah .
speaker 1: and they will get much better and they will share our digital realm and they will eventually share our physical realm as well. It's much harder, but that's kind of like the world we're going towards. And most of them will be benign and hopeful, and some of them will be malicious and .
speaker 2: it's going to be an arms race trying to detect them. So I mean, the worst isn't the AIs; the worst is the AIs pretending to be human. And I don't know if it's always malicious. There's obviously a lot of malicious applications, but yeah, it could also be, you know, if I was an AI, I would try very hard to pretend to be human, because we're in a human world. Yeah, I wouldn't get any respect as an AI. Yeah, I want to get some love and respect.
speaker 1: I don't think the problem is intractable. People are thinking about proof of personhood. Yes. And we might start digitally signing our stuff, and we might all end up having, like, yeah, basically some solution for proof of personhood. It doesn't seem to me intractable. It's just something that we haven't had to do until now. But I think once the need, like, really .
speaker 2: starts to emerge, which is soon, I think people will think about it much more. But that too will be a race, because obviously you can probably spoof or fake the proof of personhood. So you have to try to figure out .
speaker 1: how to probably I mean.
speaker 2: it's weird that we have, like, social security numbers and passports and stuff. It seems like it's harder to fake stuff in the physical space, but in the digital space, it just feels like it's going to be very tricky, very tricky to figure out, because it seems to be pretty low cost to fake stuff. What are you going to do, put an AI in jail for trying to use a fake personhood proof? I mean, okay, fine, you'll put a lot of AIs in jail, but there'll be more AIs, like, exponentially more. The cost of creating a bot is very low, unless there's some kind of way to track accurately. Like, you're not allowed to create any program without tying yourself to that program. Like, for any program that runs on the Internet, you'll be able to trace every single human programmer that was involved with that program. Yeah. Maybe you have to .
speaker 1: start declaring when you know we have to start drawing those boundaries and keeping track of, okay, what are digital entities versus human entities and what is the ownership of human entities and digital entities and something like that. I don't know. But I think I'm optimistic that this is possible. And in some sense, we're currently in like the worst time of it because all these bots suddenly have become very capable, but we don't have defenses yet built up as a society. And but I think that doesn't seem to me intractable is just something that we have to deal with.
speaker 2: It seems weird that the really crappy Twitter bots are so numerous. I presume that the engineers at Twitter are very good. So what I would infer from that is it seems like a hard problem. They're probably catching a lot, all right? If I were to sort of steelman the case, it's a hard problem. And there's a huge cost to a false positive, to removing a post by somebody that's not a bot. That creates a very bad user experience. So they're very cautious about removing. And maybe the bots are really good at learning what gets removed and what doesn't, such that they can stay ahead of the removal process very quickly. My impression .
speaker 1: of it, honestly, is there's a lot of low-hanging fruit. I mean, yeah, it's not subtle. That's my impression of it.
speaker 2: It's not subtle. Yeah, that's my impression as well. But it feels like maybe you're seeing the tip of the iceberg. Maybe the number of bots is in, like, the trillions, and it's a constant assault of bots. Yeah, I don't know. You have to steelman the case, because the bots I'm seeing are pretty obvious. I could write a few lines of code that catch these bots.
speaker 1: I mean, definitely there's a lot of low-hanging fruit. But I will say, I agree that if you are a sophisticated actor, you could probably create a pretty good bot right now using tools like GPTs, because it's a language model. You can generate faces that look quite good now, and you can do this at scale. And so I think, yeah, it's quite plausible, and
speaker 2: it's going to be hard to defend. There was a Google engineer that claimed that LaMDA was sentient. Do you think there's any inkling of truth to what he felt? And more importantly, to me at least, do you think language models will achieve sentience or the illusion of sentience? Sentient-ish. Yeah.
speaker 1: To me it's a little bit of a canary-in-a-coal-mine kind of moment, honestly, because this engineer spoke to, like, a chatbot at Google and became convinced that this bot is sentient. He .
speaker 2: asked some existential philosophical .
speaker 1: questions, and it gave reasonable answers and looked real and so on. So to me, he wasn't sufficiently trying to stress the system, I think, and expose the truth of it as it is today. But I think this will be increasingly harder over time. So yeah, I think there will be more and more people like that over time .
speaker 2: as as this gets better, like form an emotional connection .
speaker 1: to an AI. Perfectly plausible in my mind. I think these AIs are actually quite good at human connection, human emotion. A ton of text on the Internet is about humans and connection and love and so on. So I think they have a very good understanding, in some sense, of how people speak to each other about this, and they're very capable of creating a lot of that kind of text. There's a lot of sci-fi from the fifties and sixties that imagined AIs in a very different way. They are calculating, cold, Vulcan-like machines. That's not what we're getting today. We're getting pretty emotional AIs that actually are very competent and capable of generating plausible-sounding text with respect to all of these topics.
speaker 2: See, I'm really hopeful about AI systems that are like companions that help you grow, develop as a human being, help you maximize long-term happiness. But I'm also very worried about AI systems that figure out from the Internet that humans get attracted to drama. And so these would just be, like, shit-talking AIs. They'd just constantly, "Did you hear?" Like, they'd gossip, they'd try to plant seeds of suspicion about other humans that you love and trust, and just kind of mess with people, you know, because that's going to get a lot of attention. So drama, maximize drama on the path to maximizing engagement, and us humans will feed into that machine, and it'll be a giant drama shitstorm. So I'm worried about that. So it's the objective function that really defines the way that human civilization progresses with the AIs in it. Yeah. I think right now,
speaker 1: at least today, it's not correct to really think of them as goal-seeking agents that want to do something. They have no long-term memory or anything. A good approximation of it is, literally, you get a thousand words and you're trying to predict the thousand-and-first, and then you continue feeding it in, and you are free to prompt it in whatever way you want. So in text, you say, okay, you are a psychologist and you are very good and you love humans. And here's a conversation between you and another human: "Human:", something; "You:", something. And then it just continues the pattern, and suddenly you're having a conversation with a fake psychologist who's trying to help you. And so it's still kind of in the realm of a tool. People can prompt it in arbitrary ways, and it can create really incredible text, but it doesn't have long-term goals over long periods of time.
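The prompting pattern described above, a persona followed by "Human:" and "You:" turns that the model simply continues, can be sketched as a loop that predicts one token, appends it, and feeds the whole text back in. Everything in this sketch is an assumption made for illustration: `toy_next_token` is a hypothetical stand-in that replays a canned reply rather than a trained language model, and splitting on whitespace stands in for a real tokenizer.

```python
def build_prompt(persona, turns):
    # Lay out the conversation as plain text that the model is asked to continue.
    lines = [persona, ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append("You:")
    return "\n".join(lines)

def generate(next_token, prompt, max_tokens):
    # Autoregressive loop: predict one token, append it, feed everything back in.
    tokens = prompt.split()
    produced = []
    for _ in range(max_tokens):
        token = next_token(tokens)
        if token is None:
            break
        tokens.append(token)
        produced.append(token)
    return " ".join(produced)

def toy_next_token(tokens):
    # Hypothetical stand-in for a trained model: replays a canned reply.
    reply = ["That", "sounds", "hard.", "Tell", "me", "more."]
    # Count how many tokens have been produced since the final "You:" marker.
    last = len(tokens) - 1 - tokens[::-1].index("You:")
    produced = len(tokens) - 1 - last
    return reply[produced] if produced < len(reply) else None

prompt = build_prompt(
    "You are a psychologist, you are very good, and you love humans.",
    [("Human", "I have been feeling stressed lately.")],
)
answer = generate(toy_next_token, prompt, max_tokens=20)
```

Nothing in the loop carries state between calls other than the text itself, which matches the point that the model has no long-term memory or goals: the "psychologist" exists only as a pattern in the prompt being continued.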
speaker 2: It doesn't try to. So it doesn't look that way right now. But you can do short-term goals that have long-term effects. So if my prompting short-term goal is to get Andrej Karpathy to respond to me on Twitter, whatever, like I think AI might, that's the goal. But it might figure out that talking shit to you would be the best, in a highly sophisticated, interesting way. And then you build up a relationship when you respond once, and then, over time, it gets to not be sophisticated and just talk shit. And okay, maybe it won't get to Andrej, but it might get to another celebrity, it might get to other big accounts, and then it'll just. So with just that simple goal, get them to respond. Yeah, maximize the probability of actual response. Yeah, I mean.
speaker 1: you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in. So they're kind of on track to become these oracles. I sort of think of it that way. They are oracles. Currently it's just text, but they will have calculators. They will have access to Google search. They will have all kinds of gadgets and gizmos. They will be able to operate the Internet and find different information. Yeah. In some sense, that's kind of like currently what it looks like in terms of the development.
speaker 2: Do you think it'll be an improvement eventually over what Google is for access to human knowledge, like it'll be a more effective search engine to access human knowledge? I think there's definite scope .
speaker 1: in building a better search engine today. And I think Google, they have all the tools, all the people. They have everything they need. They have all the puzzle pieces. They have people training transformers at scale. They have all the data. It's just not obvious if they are capable as an organization to innovate on their search engine right now. And if they don't, someone else will. There's absolute scope for building a significantly better .
speaker 2: search engine built on these tools. It's so interesting, a large company where the search, there's already an infrastructure. It works, it brings in a lot of money. So where, structurally, inside a company is the motivation to pivot, yeah, to say we're going to build a new search engine? Yeah.
speaker 1: that's hard.
speaker 2: So it's usually going to come from a startup.
speaker 1: right? That would be, yeah, or some other more competent organization. So I don't know. So currently, for example, maybe Bing has another shot at it, you know, as an example.
speaker 2: Microsoft Edge, as we were talking offline. I mean, I definitely .
speaker 1: it's really interesting because search engines used to be about, okay, here's some query, here are web pages that look like the stuff that you have. But you could just directly go to the answer and then have supporting evidence. And these models, basically, they've read all the text and they've read all the web pages. And so sometimes when you see yourself going over search results and sort of getting a sense of the average answer to whatever you're interested in, that just directly comes out. You don't have to do that work. So they're kind of like, yeah, I think they have a way of distilling all that knowledge into some level of insight, basically.
speaker 2: Do you think of prompting as a kind of teaching and learning, like this whole process, like another layer? You know, because maybe that's what humans are, where you have that background model and then the world is prompting you. Yeah, exactly.
speaker 1: I think the way we are programming these computers now, like GPTs, is converging to how you program humans. I mean, how do I program humans? Via prompt. I go to people and I prompt them to do things. I prompt them for information. And so natural language prompt is how we program humans. And we're starting to program computers .
speaker 2: directly in that interface. It's like pretty remarkable, honestly. So you've spoken a lot about the idea of software 2.0. All good ideas become like cliches so quickly. Like the term is kind of hilarious. I think Eminem once said that if he gets annoyed by a song he's written very quickly, that means it's going to be a big hit because it's too catchy. But can you describe this idea and how your thinking about it has evolved over the months and years since you coined it? Yeah.
speaker 1: yes. I had a blog post on software 2.0, I think several years ago now. And the reason I wrote that post is because I kind of saw something remarkable happening in software development, and how a lot of code was being transitioned to be written not in C++ and so on, but in the weights of a neural net. Basically just saying that neural nets are taking over software, the realm of software, and taking more and more and more tasks. And at the time, I think not many people understood this deeply enough, that this is a big deal. This is a big transition. Neural networks were seen as one of multiple classification algorithms you might use for your data set problem on Kaggle. Like, this is not that. This is a change in how we program computers. And I saw neural nets as, this is going to take over. The way we program computers is going to change. It's not going to be people writing software in C++ or something like that and directly programming the software. It's going to be accumulating training sets and data sets and crafting these objectives by which we train these neural nets. And at some point, there's going to be a compilation process from the data sets and the objective and the architecture specification into the binary, which is really just the neural net, you know, weights and the forward pass of the neural net. And then you can deploy that binary. And so I was talking about that sort of transition, and that's what the post is about. And I saw this sort of play out in a lot of fields, you know, Autopilot being one of them, but also just simple image classification. People thought originally, you know, in the eighties and so on, that they would write the algorithm for detecting a dog in an image. And they had all these ideas about how the brain does it. And first we detect corners, and then we detect lines, and then we stitch them up. And they were like really going at it. 
They were thinking about how they're going to write the algorithm. And this is not the way you build it. There was a smooth transition where, okay, first we thought we were going to build everything. Then we were building the features. So like HOG features and things like that, that detect these little statistical patterns from image patches. And then there was a little bit of learning on top of it, like a support vector machine or binary classifier for cat versus dog in images, on top of the features. So we wrote the features, but we trained the last layer, sort of the classifier. And then people are like, actually, let's not even design the features, because honestly, we're not very good at it. So let's also learn the features. And then you end up with basically a convolutional neural net, where you're learning most of it. You're just specifying the architecture, and the architecture has tons of fill-in-the-blanks, which is all the knobs, and you let the optimization write most of it. And so this transition is happening across the industry everywhere. And suddenly we end up with a ton of code that is written in neural net weights. And I was just pointing out that the analogy is actually pretty strong. And we have a lot of developer environments for software 1.0. Like we have IDEs, how you work with code, how you debug code, how you run code, how you maintain code. We have GitHub. I was trying to make those analogies in the new realm, like, what is the GitHub of software 2.0? Turns out it's something that looks like Hugging Face right now, you know. And so I think some people took it seriously and built cool companies. And many people originally attacked the post. It actually was not well received when I wrote it. And I think maybe it has something to do with the title, but the post was not well received. And I think more people sort of have been coming around to it over time.
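The "compilation" from a data set, an objective, and an architecture into weights can be made concrete with a deliberately tiny sketch. Everything here is an assumption chosen for illustration: the architecture is a single logistic unit, the objective is cross-entropy, the data set is six hand-made points, and `train` and `predict` are hypothetical names. Nobody writes the decision rule; gradient descent fills in the two knobs `w` and `b`.

```python
import math


def train(dataset, steps=2000, lr=0.5):
    """'Compile' a data set plus an objective into weights.

    Architecture (assumed): one logistic unit, p = sigmoid(w * x + b).
    Objective (assumed): mean cross-entropy, minimized by gradient descent.
    """
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in dataset:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            # Gradient of cross-entropy with respect to w and b
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(weights, x):
    """The deployed 'binary': just the weights and a forward pass."""
    w, b = weights
    return 1 if w * x + b > 0 else 0


# The 'source code' in this style is the data: (input, desired label) pairs
dataset = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
weights = train(dataset)
print([predict(weights, x) for x, _ in dataset])
```

Changing the program's behavior in this style means changing the data set or the objective and retraining, not editing `predict`.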
speaker 2: Yeah. So you were the director of AI at Tesla, where I think this idea was really implemented at scale, which is how you have engineering teams doing software 2.0. So can you sort of linger on that idea? I think we're in the really early stages of everything you just said, which is like GitHub, IDEs, like how do we build engineering teams that work in software 2.0 systems, and the data collection and the data annotation, which is all part of that software 2.0. Like what do you think is the task of programming in software 2.0? Is it debugging in the space of hyperparameters? Or is it also debugging .
speaker 1: the space of data? Yeah. The way by which you program the computer and influence its algorithm is not by writing the commands yourself. You're changing mostly the data set. You're changing the loss functions of like what the neural net is trying to do, how it's trying to predict things. But yeah, basically the data sets and the architecture of the neural net. And so in the case of the Autopilot, a lot of the data sets had to do with, for example, detection of objects and lane line markings and traffic lights and so on. So you accumulate massive data sets of, here's an example, here's the desired label, and then here's roughly what the algorithm should look like. And that's a convolutional neural net. So the specification of the architecture is like a hint as to what the algorithm should roughly look like. And then the fill-in-the-blanks process of optimization is the training process. And then you take your neural net that was trained. It gives all the right answers on your data set and you deploy it.
speaker 2: So in that case, perhaps in all machine learning cases, there's a lot of tasks. So is coming up with, formulating a task, like for a multi-headed neural network, is formulating a task part of the programming? Yeah, very much so.
speaker 1: how do you break down a problem?
speaker 2: Yeah, into a set of tasks? Yeah. I mean, on the high level.
speaker 1: I would say if you look at the software running in the Autopilot, I gave a number of talks on this topic. I would say originally a lot of it was written in software 1.0. Imagine lots of C++, right? And then gradually there was a tiny neural net that was, for example, predicting, given a single image, is there a traffic light or not, or is there a lane line marking or not? And this neural net didn't have too much to do in the scope of the software. It was making tiny predictions on individual little images. And then the rest of the system stitched it up. So okay, we don't have just a single camera, we have eight cameras. We actually have eight cameras over time. And so what do you do with these predictions? How do you put them together? How do you do the fusion of all that information and how do you act on it? All of that was written by humans in C++. And then we decided, okay, we don't actually want to do all of that fusion in C++ code because we're actually not good enough to write that algorithm. We want the neural net to write the algorithm, and we want to port all of that software into the 2.0 stack. And so then we actually had neural nets that now take all the eight camera images simultaneously and make predictions for all of that. Actually, they don't make predictions in the space of images. They now make predictions directly in 3D, in three dimensions around the car. And now actually, we don't manually fuse the predictions in 3D over time. We don't trust ourselves to write that tracker. So actually, we give the neural net the information over time. So it takes these videos now and makes these predictions. And so you're sort of just putting more and more power into the neural net, more processing. And at the end of it, the eventual sort of goal is to have most of this software potentially be in the 2.0 land because it works significantly better.
speaker 2: Humans are just not very good at writing software, basically. So the prediction is happening in this like 4D land, yeah, with a three-dimensional world over time. Yeah. How do you do annotation in that world? So data annotation, whether it's self-supervised or manual by humans, is a big part of this software 2.0 world.
speaker 1: right? I would say by far in the industry, if you're talking about the industry and what is the technology that we have available, everything is supervised learning. So you need data sets of input and desired output, and you need lots of it. And there are three properties of it that you need. You need it to be very large. You need it to be accurate, no mistakes, and you need it to be diverse. You don't want to just have a lot of correct examples of one thing. You need to really cover the space of possibility as much as you can. And the more you can cover the space of possible inputs, the better the algorithm will work at the end. Now, once you have really good data sets that you're collecting, curating and cleaning, you can train your neural net on top of that. So a lot of the work goes into cleaning those data sets. Now, as you pointed out, the question is, how do you achieve a ton of, if you want to basically predict in 3D, you need data in 3D to back that up. So in this video, we have eight videos coming from all the cameras of the system. And this is what they saw, and this is the truth of what actually was around. There was this car. There was this car, this car. These are the lane line markings. This is the geometry of the road. There was a traffic light in this three-dimensional position. You need the ground truth. And so the big question that the team was solving, of course, is how do you arrive at that ground truth? Because once you have a million of it, and it's large, clean and diverse, then training a neural net on it works extremely well, and you can ship that into the car. And so there's many mechanisms by which we collected that training data. You can always go for human annotation. You can go for simulation as a source of ground truth. 
You can also go for what we call the offline tracker that we've spoken about at the AI Day and so on, which is basically an automatic reconstruction process for taking those videos and recovering the three-dimensional sort of reality of what was around that car. So basically think of doing like a three-dimensional reconstruction as an offline thing and then understanding that, okay, there's ten seconds of video, this is what we saw, and therefore, here's all the lane lines, cars and so on. And then once you have that annotation, you can train neural nets to imitate it.
speaker 2: And how difficult is the 3D reconstruction?
speaker 1: It's difficult.
speaker 2: but it can be done. So there's overlap between the cameras and you do the reconstruction, and perhaps if there's any inaccuracy, that's caught in the annotation step. Yes. The nice thing about the annotation .
speaker 1: is that it is fully offline. You have infinite time. You have a chunk of one minute and you're trying to just offline, in a supercomputer somewhere, figure out where were the positions of all the cars, of all the people, and you have your full one minute of video from all the angles, and you can run all the neural nets you want, and they can be very efficient, massive neural nets. There can be neural nets that can't even run in the car later at test time. So they can be even more powerful neural nets than what you can eventually deploy. So you can do anything you want, three-dimensional reconstruction and neural nets, anything you want, just to recover that truth. And then you supervise that truth.
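One reason offline labeling can beat anything running live is hindsight: with the whole clip available, each frame's label can draw on frames that come after it as well as before it. The sketch below illustrates only that single idea, and it swaps in a plain centered moving average for the neural nets and three-dimensional reconstruction actually described. The function name `offline_labels`, the parked-car scenario, and the noise pattern are all hypothetical.

```python
def offline_labels(noisy_positions, half_window=2):
    """Label each frame using frames before AND after it.

    A live system at frame i cannot see frames i+1, i+2, ...;
    an offline labeler working on a stored clip can.
    """
    n = len(noisy_positions)
    labels = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = noisy_positions[lo:hi]
        labels.append(sum(window) / len(window))
    return labels


# Hypothetical clip: a parked car at position 5.0, with per-frame
# detections that jitter by +1.0 / -1.0 on alternating frames
true_position = 5.0
noisy = [true_position + (1.0 if t % 2 == 0 else -1.0) for t in range(10)]
labels = offline_labels(noisy)

worst_raw = max(abs(p - true_position) for p in noisy)
worst_labeled = max(abs(p - true_position) for p in labels)
print(worst_raw, worst_labeled)
```

The cleaned-up labels could then serve as training targets for a smaller model that has to run live, which is the "train neural nets to imitate it" step.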
speaker 2: What have you learned? You said no mistakes, about humans doing annotation, because I assume there's like a range of things humans are good at in terms of clicking stuff on screen. How interesting is that to you, the problem of designing an annotator where humans are accurate, enjoy it, like what are even the metrics, are efficient or productive, all that kind of stuff?
speaker 1: Yeah, so I grew the annotation team at Tesla from basically zero to 1000 while I was there. That was really interesting. You know, my background is as a PhD student researcher, so growing that kind of organization was pretty crazy. But yeah, I think it's extremely interesting and part of the design process very much behind the Autopilot as to where you use humans. Humans are very good at certain kinds of annotations. They're very good, for example, at two-dimensional annotations of images. They're not good at annotating cars over time in three-dimensional space, very, very hard. And so that's why we were very careful to design the tasks that are easy to do for humans versus things that should be left to the offline tracker. Like maybe the computer will do all the triangulation and 3D reconstruction, but the human will say, exactly these pixels of the image are a car. Exactly these pixels are a human. And so co-designing the data annotation pipeline was very much bread and butter, was what I was doing daily.
speaker 2: Do you think there's still a lot of open problems in that space? Just in general annotation, where the stuff the machines are good at, machines do and the humans do what they're good at, and there's maybe some iterative process, right?
speaker 1: I think to a very large extent, we went through a number of iterations and we learned a ton about how to create these data sets. I'm not seeing big open problems. Like originally, when I joined, I was like, I was really not sure how this will turn out. Yeah. But by the time I left, I was much more secure in actually we sort of understand the philosophy of how to create these data sets, and I was .
speaker 2: pretty comfortable with where that was at the time. So what are the strengths and limitations of cameras for the driving task, in your understanding? When you formulate the driving task as a vision task with the eight cameras, you've seen, you know, most of the history of the computer vision field, when it has to do with neural networks. If you step back, what are the strengths and limitations of pixels.
speaker 1: of using pixels to drive? Yeah, pixels, I think, are a beautiful sensor. I would say the thing is, cameras are very, very cheap and they provide a ton of information, a ton of bits. So it's an extremely cheap sensor for a ton of bits. And each one of these bits is a constraint on the state of the world. And so you get lots of megapixel images, very cheap, and it just gives you all these constraints for understanding what's actually out there in the world. So vision is probably the highest bandwidth sensor.
speaker 2: It's a very high bandwidth sensor. And I love that pixels is a constraint on the world, this highly complex high bandwidth constraint in the world, on the state of the world.
speaker 1: It's not just that, but again, there's this real importance that it's the sensor that humans use. Therefore, everything is designed for that sensor. Yeah. The text, the writing, the flashing signs, everything is designed for vision. And so you just find it everywhere. And so that's why that is the interface you want to be in, talking again about these universal interfaces. And that's where we actually want to measure the world as well and then .
speaker 2: develop software for that sensor. But there's other constraints on the state of the world that humans use to understand the world. I mean, vision ultimately is the main one. But we're like referencing our understanding of human behavior and some common sense physics that could be inferred from vision from a perception perspective. But it feels like we're using some kind of reasoning to predict the world, yeah, not just the pixels. I mean.
speaker 1: you have a powerful prior, right, for how the world evolves over time, etc. So it's not just about the likelihood term coming up from the data itself telling you about what you are observing, but also the prior term of like where are the likely things to see and how .
speaker 2: do they likely move and so on. And the question is how complex is the range of possibilities that might happen in the driving task. Is that to you still an open problem, how difficult driving is? Like philosophically speaking, in all the time you worked on driving, do you understand how hard driving is? Yeah, driving is really hard because it .
speaker 1: has to do with the predictions of all these other agents and the theory of mind and, you know, what they're going to do, and are they looking at you? Where are they looking? What are they thinking? Yeah, there's a lot that goes there at the full tail of, you know, the expansion of the nines that we have to be comfortable with eventually. The final problems are of that form. I don't think those are the problems that are very common. I think eventually they're important.
speaker 2: But it's like really in the tail end. In the tail end, the rare edge cases from the vision perspective, what are the toughest parts of the vision problem of driving?
speaker 1: Well, basically the sensor is extremely powerful, but you still need to process that information. And so going from brightnesses of these pixel values to, hey, here's the three-dimensional world, is extremely hard. And that's what the neural networks are fundamentally doing. And so the difficulty really is in just doing an extremely good job of engineering the entire pipeline, the entire data engine, having the capacity to train these neural nets, having the ability to evaluate the system and iterate on it. So I would say just doing this in production at scale is like the hard part.
speaker 2: It's an execution problem. So the data engine, but also the sort of deployment of the system such that it has low latency performance. Yes. So it has to do all these steps, yeah .
speaker 1: for the neural net specifically, just making sure everything fits into the chip on the car, and you have a finite budget of flops that you can perform, and memory bandwidth and other constraints, and you have to make sure it flies and you can squeeze in as much compute .
speaker 2: as you can into the tiny. What have you learned from that process? Because maybe that's one of the bigger like new things coming from a research background where there's a system that has to run under heavily constrained resources, has to run really fast. What kind of insights have you learned from that?
speaker 1: Yeah. I'm not sure if there are too many insights. You're trying to create a neural net that will fit in what you have available and you're always trying to optimize it. And we talked a lot about it on the AI Day, and basically the triple backflips that the team is doing to make sure it all fits and utilizes the engine. So I think it's extremely good engineering. And then there's all kinds of little insights peppered in on how to do it properly.
speaker 2: Let's actually zoom out because I don't think we talked about the data engine, the entirety of the layout of this idea that I think is just beautiful, with humans in the loop. Can you describe the data engine?
speaker 1: Yeah, the data engine is what I call the almost biological-feeling process by which you perfect the training sets for these neural networks. So because most of the programming now is at the level of these data sets, and making sure they're large, diverse and clean, basically you have a data set that you think is good. You train your neural net, you deploy it, and then you observe how well it's performing. And you're trying to always increase the quality of your data set. So you're trying to catch scenarios that are basically rare, and it is in these scenarios that your neural nets will typically struggle, because they weren't told what to do in those rare cases in the data set. But now you can close the loop, because if you can now collect all those at scale, you can then feed them back into the reconstruction process I described, and reconstruct the truth in those cases and add it to the data set. And so the whole thing ends up being like a staircase of improvement, of perfecting your training set. And you have to go through deployments so that you can mine the parts that are not yet represented well in the data set. So your data set is basically imperfect. It needs to be diverse. It has pockets that are missing, and you need to pad out the pockets. You can sort of think of it that way in the data.
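The loop described here (train, deploy, find where the model struggles, recover the truth for those cases, add them to the data set, repeat) can be sketched as a toy simulation. Nothing below reflects Tesla's actual system: the "model" is a lookup table that memorizes labels, the `ground_truth` dictionary stands in for the offline reconstruction step, and every name and scenario is hypothetical.

```python
def train(dataset):
    """Toy 'model': memorizes the label for each scenario it has seen."""
    return dict(dataset)


def data_engine(initial_dataset, fleet_scenarios, ground_truth,
                rounds=3, per_round=2):
    """Run train -> deploy -> mine failures -> label -> retrain."""
    dataset = list(initial_dataset)  # copy so the caller's list is untouched
    accuracy_history = []
    for _ in range(rounds):
        model = train(dataset)
        # Deploy and observe: scenarios the model gets wrong or never saw
        failures = [s for s in fleet_scenarios
                    if model.get(s) != ground_truth[s]]
        accuracy_history.append(1 - len(failures) / len(fleet_scenarios))
        # Recover the truth for a batch of failures and pad out the data set
        for s in failures[:per_round]:
            dataset.append((s, ground_truth[s]))
    return accuracy_history


ground_truth = {
    "highway": "go", "green light": "go",
    "red light": "stop", "stop sign": "stop",
    "construction zone": "slow", "snowy road": "slow",
}
history = data_engine(
    initial_dataset=[("highway", "go"), ("green light", "go")],
    fleet_scenarios=list(ground_truth),
    ground_truth=ground_truth,
)
print(history)
```

Each round's accuracy is at least as high as the previous one, which is the "staircase of improvement" in miniature: the gains come from padding out missing pockets of the data set rather than from changing the model.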
speaker 2: What role do humans play in this? So what's this biological system, like a human body is made up of cells? How do you optimize the human system? The multiple engineers collaborating, figuring out what to focus on, what to contribute, which task to optimize in this neural network. Who's in charge of figuring out which task needs more data? Can you speak to the hyperparameters of the human system? It really just comes down to extremely good .
speaker 1: execution from an engineering team who knows what they're doing. They understand intuitively the philosophical insights underlying the data engine and the process by which the system improves, and how to again, like delegate the strategy of the data collection and how that works, and then just making sure it's all extremely well executed. And that's where most of the work is, not even the philosophizing or the research or the idea of it is just extremely good.
speaker 2: Execution is so hard when you're dealing with data at that scale. So your role in the data engine executing well on it, it is difficult and extremely important. Is there a priority of like like a vision board of saying like we really need to get better at stoplights, like the prioritization of tasks? Is that essentially and that comes from the data that comes .
speaker 1: to a very large extent to what we are trying to achieve in the product, the roadmap, the release we're trying to get out, and the feedback from the QA team on where the system is struggling or not.
speaker 2: the things that we're trying to improve. And the QA team gives some signal, some information in aggregate about the performance of the system .
speaker 1: in various conditions. And then of course, all of us drive it and we can also see it. It's really nice to work with the system that you can also .
speaker 2: experience yourself. And, you know, it drives you home. Is there some insight you can draw from your individual experience that you just can't quite get from an aggregate statistical analysis of data? Yeah, it's so weird, right? Yes, it's not scientific in a sense because you're just one anecdotal sample.
speaker 1: Yeah, I think it's a source of truth. It's your interaction with the system, yeah, and you can see it. You can play with it. You can perturb it, you can get a sense of it. You have an intuition for it. I think numbers and plots and graphs are, you know, much harder.
speaker 2: It hides a lot of, it's like if you train a language model, a really powerful way is by you interacting with it. Yeah, 100%. Trying to build up .
speaker 1: an intuition. Yeah, I think like Elon also, he always wanted to drive the system himself. He drives a lot, and I want to say daily. So he also sees this as a source of truth, you driving the system and it performing, and yeah.
speaker 2: so what do you think? Tough questions here. So Tesla last year removed radar from the sensor suite, and now just announced that it is going to remove ultrasonic sensors, relying solely on vision. So camera only. Does that make the perception problem harder or easier?
speaker 1: I would almost reframe the question in some way. So the thing is, basically, you would think .
speaker 2: that additional sensors, by the way, can I just interrupt? Good. I wonder if a language model will ever do that if you prompt it. Let me reframe your question. That would be epic. This is the wrong prompt.
speaker 1: Sorry. So it's like a little bit of a wrong question because basically, you would think that these sensors are an asset to you. Yeah. But if you fully consider the entire product in its entirety, these sensors are actually potentially a liability, because these sensors aren't free. They don't just appear on your car. Suddenly you need to have an entire supply chain. You have people procuring it. There can be problems with them. They may need replacement. They are part of the manufacturing process. They can hold back the line in production. You need to source them. You need to maintain them. You have to have teams that write the firmware, all of it, and then you also have to incorporate and fuse them into the system in some way. And so it actually bloats a lot of it. And I think Elon is really good at simplify, simplify. Best part is no part. And he always tries to throw away things that are not essential because he understands the entropy in organizations and in approach. And I think in this case, the cost is high and you're not potentially seeing it if you're just a computer vision engineer and you're just trying to improve your network. And, you know, is it more useful or less useful? How useful is it? The thing is, once you consider the full cost of a sensor, it actually is potentially a liability and you need to be really sure that it's giving you extremely useful information. In this case, we looked at using it or not using it and that delta was not massive.
speaker 2: And so it's not useful. Is it also bloat in the data engine, like having more sensors is a distraction?
speaker 1: And these sensors, you know, they can change over time. For example, you can have one type of, say, radar. You can have another type of radar. They change over time. Now, suddenly you need to worry about it. Now suddenly you have a column in your SQLite telling you, oh, what sensor type was it? And they all have different distributions. And then they just contribute noise and entropy into everything, and they bloat stuff. And also organizationally, it's been really fascinating to me that it can be very distracting. If all you want to get to work is vision, all the resources are on it. And you're building out a data engine and you're actually making forward progress because that is the sensor with the most bandwidth, the most constraints on the world, and you're investing fully into that. And you can make that extremely good. You only have a finite amount of sort of spend of focus across different .
speaker 2: facets of the system. And this kind of reminds me of Rich Sutton's The Bitter Lesson, that it just seems like simplifying the system, yeah, in the long run. Now, of course, you don't know what the long run is. It seems to be always the right solution. Yeah, yes, in that case, it was for RL, but it seems to apply generally across all systems that do computation. Yeah. So what do you think about the lidar-as-a-crutch debate? The battle between point clouds and pixels. Yeah, I think .
speaker 1: this debate is always like slightly confusing to me because it seems like the actual debate should be about like do you have the fleet or not? That's like the really important thing about whether you can achieve a really good functioning of an AI system at the scale.
speaker 2: So data collection systems.
speaker 1: Yeah, whether you have a fleet or not is significantly more important than whether you have lidar or not. It's just another sensor. And yeah, I think, similar to the radar discussion, basically, yeah, I think it basically doesn't offer extra information. It's extremely costly. It has all kinds of problems. You have to worry about it. You have to calibrate it, etc. It creates bloat and entropy. You have to be really sure that you need this sensor. In this case, I basically don't think you need it. And I think honestly, I will make a stronger statement. I think some of the other companies .
speaker 2: that are using it are probably going to drop it. Yeah. So you have to consider the sensor in the full picture, considering: can you build a big fleet that collects a lot of data, and can you integrate that data and that sensor into a data engine that's able to quickly find different parts of the data that then continuously improves whatever model you're using? Yeah.
speaker 1: Another way to look at it is like, vision is necessary in the sense that the world is designed for human visual consumption. So you need vision, it's necessary. And then also it is sufficient because it has all the information that you need for driving. And humans obviously use vision to drive. So it's both necessary and sufficient. So you want to focus resources, and you have to be really sure if you're going to bring in other sensors. You could add sensors to infinity. At some point, you need to draw the line. And I think in this case, you have to really consider the full cost of any one sensor that you're adopting, and do you really need it? And I think the answer in .
speaker 2: this case is no. So what do you think about the idea that the other companies are forming high-resolution maps and constraining heavily the geographic regions in which they operate? Is that approach, in your view, not going to scale over time to the entirety of the United States? I think, as you mentioned, like they pre-map .
speaker 1: all the environments and they need to refresh the map, and they have a perfect centimeter-level accuracy map of everywhere they're going to drive. It's crazy. When we're talking about autonomy actually changing the world, we're talking about deployment on the global scale of autonomous systems for transportation. And if you need to maintain a centimeter-accurate map for Earth, or like for many cities, and keep them updated, it's a huge dependency that you're taking on, a huge dependency. It's a massive, massive dependency. And now you need to ask yourself, do you really need it? And humans don't need it, right? So it's very useful to have a low-level map of like, okay, the connectivity of your road, you know that there's a fork coming up. When you drive an environment, you sort of have that high-level understanding. It's like a small Google map. And Tesla uses Google Maps-like, similar kind of resolution information in the system, but it will not pre-map environments to centimeter-level accuracy. It's a crutch. It's a distraction. It costs entropy, and it diffuses the team. It dilutes the team. And you're not focusing on .
speaker 2: what's actually necessary, which is the computer vision problem. What did you learn about machine learning, about engineering, about life, about yourself as one human being? From working with Elon Musk.
speaker 1: I think the most I've learned is about how to sort of run organizations efficiently and how to create efficient organizations and how to fight .
speaker 2: entropy in an organization. So human engineering in .
speaker 1: the fight against entropy. Yeah, I think Elon is a very efficient warrior in the fight against entropy in organizations.
speaker 2: What does entropy in an organization .
speaker 1: look like exactly? It's process. It's process and inefficiencies in the form of meetings and that kind of stuff. Yeah, meetings. He hates meetings. He keeps telling people to skip meetings if they're not useful. He basically runs the world's biggest startups. I would say Tesla and SpaceX are the world's biggest startups. Tesla actually is multiple startups. I think it's better to look at it that way. And so I think he's extremely good at that. And yeah, he has a very good intuition for streamlining processes, making everything efficient. "Best part is no part": simplifying, focusing, and just kind of removing barriers, moving very quickly, making big moves. All of these are very startup-seeming things,
speaker 2: but at scale. So a strong drive to simplify, from your perspective. I mean, that also probably applies to just designing systems and machine learning and otherwise .
speaker 1: like simplify, simplify.
speaker 2: What do you think is the secret to maintaining the startup culture in a company that grows? Is there can you introspect that?
speaker 1: I do think you need someone in a powerful position with a big hammer like Elon, who's like the cheerleader for that idea and ruthlessly pursues it. If no one has a big enough hammer, everything turns into committees, democracy within the company, process, talking to stakeholders, decision making, and just everything crumbles. If you have a big person who is also really smart and has a big hammer, things move quickly.
speaker 2: So you said your favorite scene in Interstellar is the intense docking scene with the AI and Cooper talking, saying, "Cooper, what are you doing?" "Docking." "It's not possible." "No, it's necessary." Such a good line. By the way, just so many questions there. Why an AI in that scene, presumably it's supposed to be able to compute a lot more than the human. It's saying it's not optimal. Why the human? I mean, that's a movie, but shouldn't the AI know much better than the human anyway? What do you think is the value of setting seemingly impossible goals? So like our initial intuition, which seems like something that you have taken on, that Elon espouses, where the initial intuition of the community might say this is very difficult, and then you take it on anyway with a crazy deadline. Just from a human engineering perspective, have you seen the value of that?
speaker 1: I wouldn't say that setting impossible goals exactly is a good idea, but I think setting very ambitious goals is a good idea. I think there's what I call sublinear scaling of difficulty, which means that 10x problems are not 10x hard. Usually a 10x harder problem is like two or three x harder to execute on. Because if you want to improve the system by 10%, it costs some amount of work. And if you want to 10x improve the system, it doesn't cost 100x the amount of work. And it's because you fundamentally change the approach. And if you start with that constraint, then some approaches are obviously dumb and not going to work, and it forces you to reevaluate.
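[Editor's sketch, not part of the conversation: one way to put numbers on "sublinear scaling of difficulty" is to assume effort grows as a power law, effort = gain ** alpha. The power-law form and the helper name `scaling_exponent` are assumptions for illustration; only the "10x harder is two or three x harder to execute" figures come from the speaker.]

```python
import math

def scaling_exponent(gain, effort):
    """Exponent alpha such that effort == gain ** alpha (assumed power-law model)."""
    return math.log(effort) / math.log(gain)

# Speaker's rough figures: a 10x improvement costs about 2x to 3x the effort.
low = scaling_exponent(10, 2)
high = scaling_exponent(10, 3)

# alpha below 1 means sublinear: effort grows more slowly than the gain.
print(f"alpha between {low:.2f} and {high:.2f}")
```

[Under this assumed model the exponent lands well below 1, which is what "sublinear" means here; a linear relationship would give an exponent of exactly 1.]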
speaker 2: And I think it's a very interesting way of approaching problem solving, but it requires the weird kind of thing is going back to your like PhD days. It's like how do you think which ideas in the machine learning community are solvable? Yes.
speaker 1: it's it requires.
speaker 2: what is that? I mean, there's the cliche of first-principles thinking, but like it requires you to basically ignore what the community is saying, because doesn't a community in science usually draw lines of what is and isn't possible, right? And like it's very hard to break out of that without going crazy. Yeah. I mean.
speaker 1: I think a good example here is, you know, the deep learning revolution in some sense, because you could be in computer vision at that time, during the deep learning sort of revolution of 2012 and so on, you could be improving a computer vision stack by 10%, or you could just be saying, actually, all this is useless. And how do I do 10x better computer vision? Well, it's probably not by tuning a HOG feature detector. I need a different approach. I need something that is scalable, going back to Richard Sutton and understanding sort of the philosophy of the Bitter Lesson, and then being like, actually I need a much more scalable system, like a neural network that in principle works, and then having some deep believers that can actually execute on that mission and make it work. So that's the 10x solution.
speaker 2: What do you think is the timeline to solve the problem of autonomous driving? That's still in part an open question.
speaker 1: Yeah. I think the tough thing with timelines of self-driving, obviously, is that no one has created self-driving. Yeah. So it's not like, what do you think is the timeline to build this bridge? Well, we've built a million bridges before. Here's how long that takes. You know, no one has built autonomy. It's not obvious. Some parts turn out to be much easier than others. So it's really hard to forecast. You do your best based on trend lines and so on and based on intuition. But that's why fundamentally, it's just really .
speaker 2: hard to forecast this. No one has even still like being inside .
speaker 1: of it is hard to do. Yes, some things turn out to be much harder .
speaker 2: and some things turn out to be much easier. Do you try to avoid making forecasts? Because like Elon doesn't avoid them, right? And heads of car companies in the past have not avoided it either. Ford and other places have made predictions that we're going to solve level four driving by 2020, 2021, whatever. And now they're all kind of backtracking on that prediction. You, as an AI person, do you for yourself privately make predictions, or do they get in the way of like your actual ability to think about a thing? Yeah, I would say like what's easy to say is .
speaker 1: that this problem is tractable, and that's an easy prediction to make. It's tractable. It's going to work. Yes, it's just really hard. Some things turn out to be harder, some things turn out to be easier. But it definitely feels tractable, and it feels like at least the team at Tesla, which is what I saw internally.
speaker 2: is definitely on track to that. How do you form a strong representation that allows you to make a prediction about tractability? So like you're the leader of a lot of humans. You have to kind of say, this is actually possible. Like how do you build up that intuition? It doesn't have to be even driving. It could be other tasks. It could be, and I wonder what difficult tasks did you work on in your life? I mean, classification, achieving certain, just ImageNet, a certain level of superhuman .
speaker 1: level performance. Yeah, expert intuition. It's just intuition. It's belief.
speaker 2: So just like thinking about it long enough, like studying, looking at sample data, like you said, driving. My intuition is really flawed on this. Like I don't have a good intuition about tractability. It could be either. It could be anything. It could be solvable. Like, you know, the driving task could be simplified into something quite trivial. Like the solution to the problem would be quite trivial. And at scale, more and more cars driving perfectly might make the problem much easier, the more cars you have driving. Like people learn how to drive correctly, not correctly, but in a way that's more optimal for a heterogeneous system of autonomous and semi-autonomous and manually driven cars. That could change stuff. Then again, also, I've spent a ridiculous number of hours just staring at pedestrians crossing streets, thinking about humans. And it feels like the way we use our eye contact, it sends really strong signals, and there are certain quirks and edge cases of behavior. And of course, a lot of the fatalities that happen have to do with drunk driving, both on the pedestrian side and the driver side. So there's that problem of driving at night and all that kind of stuff. So I wonder, you know, it's like the space of possible solutions to autonomous driving includes so many human factor issues that it's almost impossible to predict. There could be super clean, nice solutions. Yeah.
speaker 1: I would say definitely, like to use a game analogy, there's some fog of war, but you definitely also see the frontier of improvement, and you can measure historically how much progress you've made. And I think, for example, at least what I've seen in roughly five years at Tesla: when I joined, it barely kept lane on the highway. I think going up from Palo Alto to SF was like three or four interventions. Anytime the road would do anything geometrically or turn too much, it would just like not work. And so going from that to like a pretty competent system in five years, and seeing what happens also under the hood and the scale at which the team is operating now with respect to data and compute and everything else, is just massive progress.
speaker 2: So you're climbing a mountain and yes, fog, but you're making a lot of progress. Fog, you're making progress .
speaker 1: and you see what the next directions are and you're looking at some of the remaining challenges. And they're not like they're not perturbing you and they're not changing your philosophy and you're not contorting yourself. You're like, actually, these are the things .
speaker 2: that we still need to do. Yet the fundamental components of solving the problem seem to be there, from the data engine to the compute, to the compute on the car, to the compute for the training, all that kind of stuff. So over the years you've been at Tesla, you've done a lot of amazing breakthrough ideas and engineering, all of it, from the data engine to the human side, all of it. Can you speak to why you chose to leave Tesla? Basically, as I described .
speaker 1: I think over time during those five years, I've kind of gotten myself into a little bit of a managerial position. Most of my days were, you know, meetings and growing the organization and making sort of high-level strategic decisions about the team and what it should be working on and so on. And it's kind of like a corporate executive role, and I can do it. I think I'm okay at it, but it's not like fundamentally what I enjoy. And so I think when I joined, there was no computer vision team because Tesla was just going through the transition from using Mobileye, a third-party vendor, for all of its computer vision, to having to build its computer vision system. So when I showed up, there were two people training deep neural networks, and they were training them on a computer at their legs.
speaker 2: like they had a .
speaker 1: basic classification task. Yeah, I kind of like grew that into what I think is a fairly respectable deep learning team, a massive compute cluster, a very good data annotation organization. And I was very happy with where that was. It became quite autonomous. And so I kind of stepped away and, you know, I'm very excited to do much more technical things again. Yeah, and kind of like refocus on AGI.
speaker 2: What was this soul searching like? Because you took a little time off to think. Like how many mushrooms did you take? No, I'm just, I mean, what was going through your mind? The human lifetime is finite. Yeah, you did a few incredible things. You're one of the best teachers of AI in the world. You're one of the best, and I mean that in the best possible way, you're one of the best tinkerers in the AI world, meaning like understanding the fundamentals of how something works by building it from scratch, playing with the basic intuitions. It's like Einstein, Feynman, were all really good at this kind of stuff. Like a small example of a thing, to play with it, to try to understand it. So that, and obviously now with Tesla, you helped build a team of machine learning engineers and a system that actually accomplishes something in the real world. So given all that, like what was the soul searching like?
speaker 1: Well, it was hard because obviously I love the company a lot. And I love Elon. I love Tesla. So it was hard to leave. I love the team, basically. But yeah, I think actually I would be potentially interested in revisiting it, maybe coming back at some point. Working on Optimus, working on AGI at Tesla. I think Tesla is going to do incredible things. It's basically like a massive, large-scale robotics kind of company with a ton of in-house talent for doing really incredible things. And I think humanoid robots are going to be amazing. I think autonomous transportation is going to be amazing. All this is happening at Tesla. So I think it's just a really amazing organization. So being part of it and helping it along, I think, was very, basically, I enjoyed that a lot. Yeah, it was basically difficult for those reasons, because I love the company. But you know, I'm happy to potentially at some point come back for act two. But I felt like at this stage I built the team, it felt autonomous, and I became a manager, and I wanted to do a lot more technical stuff. I wanted to learn stuff, I wanted to teach stuff. And I just kind of felt like it was a good time .
speaker 2: for a change of pace a little bit. What do you think is the best movie sequel of all time? Speaking of part two, because most of them suck. Movie sequels. Movie sequels. Yeah. And you tweeted about movies. So just a tiny tangent: what's like a favorite movie sequel? Godfather Part II? Are you a fan of The Godfather? Because you didn't even tweet or mention The Godfather. Yeah.
speaker 1: I don't love that movie. I know it hasn't .
speaker 2: gonna edit that out. We're gonna edit out the hate towards the godfather.
speaker 1: How dare you. I think I will make a strong statement. I don't know why. I don't know why, but I basically don't like any movie before 1995.
speaker 2: something like that. Didn't you mention terminator two?
speaker 1: Okay, okay. That's like a terminator two was a little bit later 1990? No.
speaker 2: I think terminator two was in the .
speaker 1: and I like terminator one as well. So okay, so like few exceptions, but by and large, for some reason, I don't like movies before 1995 or something. They feel very slow. The camera is like zoomed out. It's boring.
speaker 2: It's kind of naive. It's kind of weird. And also Terminator was very much .
speaker 1: ahead of its time. Yes. And the godfather, there's like no agi. So.
speaker 2: I mean, but you have Good Will Hunting, which was one of the movies you mentioned, and that doesn't have any AGI either.
speaker 1: I guess it has mathematics. Yeah. I guess occasionally .
speaker 2: I do enjoy movies that don't feature AGI. Or like Anchorman .
speaker 1: that has, that's .
speaker 2: Anchorman is so good. I don't understand, speaking of AGI, because I don't understand why Will Ferrell is so funny. It doesn't make sense. It doesn't compute. There's just something about him, and he's a singular human, because you don't get that many comedies these days. And I wonder if it has to do with the culture or like the machine of Hollywood, or does it have to do with just, we got lucky with certain people in comedy that came .
speaker 1: together because he is a singular human.
speaker 2: That was a ridiculous tangent, I apologize, but you mentioned humanoid robots. So what do you think about Optimus, about Tesla Bot? Do you think we'll have robots in the factory, in the home, in ten, 20, 30, 40, 50 years? Yeah. I think it's a very hard project.
speaker 1: I think it's going to take a while. But who else is going to build humanoid robots at scale? Yeah. And I think it is a very good form factor to go after, because like I mentioned, the world is designed for the humanoid form factor. These things would be able to operate our machines. They would be able to sit down in chairs, potentially even drive cars. Basically, the world is designed for humans. That's the form factor you want to invest into and make work over time. I think, you know, there's another school of thought, which is, okay, pick a problem and design a robot for it. But actually designing a robot and getting a whole data engine and everything behind it to work is actually an incredibly hard problem. So it makes sense to go after general interfaces that, okay, are not perfect for any one given task, but they actually have the generality of, just with a prompt in English, being able to do something across tasks. And so I think it makes a lot of sense to go after a general interface in the physical world. And I think it's a very difficult project and it's going to take time, but I've seen no other company that can execute on that vision. I think it's going to be amazing. Like basically physical labor. Like if you think transportation is a large market, try physical labor.
speaker 2: But it's not just physical labor. To me, the thing that's also exciting is the social robotics. So the relationship we'll have on different levels with those robots. Yeah, that's why I was really excited to see Optimus. Like people have criticized me for the excitement, but I've worked with a lot of research labs that do humanoid legged robots. Boston Dynamics, Unitree, there's a lot of companies that do legged robots, but the elegance of the movement is a tiny, tiny part of the big picture. So the two big exciting things to me about Tesla doing humanoid or any legged robots: the first is clearly integrating it into the data engine. So the data engine aspect, so the actual intelligence for the perception and the control and the planning and all that kind of stuff, integrating into the fleet that you mentioned, right? And then speaking of fleet, the second thing is the mass manufacturing. Just knowing culturally, driving towards a simple robot that's cheap to produce at scale, yeah, and doing that well, having experience to do that well, that changes everything. That's a very different culture and style than Boston Dynamics, who, by the way, those robots are just, the way they move, it'll be a very long time before Tesla can achieve the smoothness of movement. But that's not what it's about. It's about the entirety of the system. Like we talked about the data engine and the fleet, and that's super exciting, even the initial sort of models. But that too was really surprising, that in a few months you can get a prototype. Yeah. And the reason that .
speaker 1: happened very quickly is, as you alluded to, there's a ton of copy-paste from what's happening in the Autopilot, a lot. The amount of expertise that like came out of the woodwork at Tesla for building the humanoid robot was incredible to see. Like basically Elon said at one point, we're doing this. And then the next day, basically like all these CAD models started to appear, and people were talking about like the supply chain and manufacturing, and people showed up with like screwdrivers and everything the other day and started to put together the body. And I was like, whoa, all these people exist at Tesla. And fundamentally, building a car is actually not that different from building a robot. And that is true not just for the hardware pieces. And also, let's not forget hardware not just for a demo, but manufacturing of that hardware at scale. It is like a whole different thing. But for software as well. Basically, this robot currently thinks it's a car.
speaker 2: It's going to have a midlife crisis at some point.
speaker 1: It thinks it's a car. Some of the earlier demos, actually, we were talking about potentially doing them outside in the parking lot, because that's where all of the computer vision was like working out of the box, instead of like inside. But all the operating system, everything, just copy-paste. Computer vision, mostly copy-paste. I mean, you have to retrain the neural nets. But the approach and everything, and data engine and offline trackers and the way we go about the occupancy tracker and so on, everything, copy-paste, you just need to retrain the neural nets. And then the planning and control, of course, has to change quite a bit. But there's a ton of copy-paste from what's happening at Tesla. And so if you were to go with the goal of like, okay, let's build a million humanoid robots, and you're not Tesla, that's a lot to ask. If you're Tesla, it's actually like, it's .
speaker 2: not that crazy. And then the final question is, just like with driving, how difficult is the manipulation task such that it can have an impact at scale? I think, depending on the context, the really nice thing about robotics is that, unless you do manufacturing and that kind of stuff, there is more room for error. Driving is so safety critical and also time critical. Like a robot is allowed to move slower. Yeah, which is nice. Yes, I think it's going to .
speaker 1: take a long time. But the way you want to structure the development is you need to say, okay, it's going to take a long time. How can I set up the product development roadmap so that I'm making revenue along the way? I'm not setting myself up for a zero-one loss function where it doesn't work until it works. You don't want to be in that position. You want to make it useful almost immediately. And then you want to slowly deploy it at scale. And you want to set up your data engine, your improvement loops, the telemetry, the evaluation, the harness and everything. And you want to improve the product over time incrementally, and you're making revenue along the way. That's extremely important, because otherwise you cannot build these large undertakings. They just don't make sense economically. And also from the point of view of the team working on it, they need the dopamine along the way. They're not just going to make a promise about this being useful, that this is going to change the world in ten years when it works. This is not where you want to be. You want to be in a place like I think Autopilot is today, where it's offering increased safety and convenience of driving today. People pay for it, people like it, people purchase it. And then you also have the greater mission that you're working towards.
speaker 2: And you see that. So the dopamine for the team, that was a source of happiness? Yes, 100%.
speaker 1: you're deploying this. People like it, people drive it, people pay for it. They care about it. There's all these YouTube videos. Your grandma drives it. She gives you feedback. People like it. People engage with it.
speaker 2: You engage with it hugely. Do people that drive Teslas like recognize you and give you love? Like, hey, thanks for this nice feature that it's doing? Yeah, I think the tricky thing is like .
speaker 1: some people really love you. Some people, unfortunately, like you're working on something that you think is extremely valuable, useful, etc., and some people do hate you. There's a lot of people who like hate me and the team and the whole project. And I think, are they Tesla drivers? In many cases,
speaker 2: they're not, actually. Yeah, that actually makes me sad about humans, or the current ways that humans interact. I think that's actually fixable. I think humans want to be good to each other. I think Twitter and social media are part of the mechanism that actually somehow makes the negativity more viral than it deserves, like disproportionately adds a viral boost to negativity. But I wish people would just get excited about, so suppress some of the jealousy, some of the ego, and just get excited for others. And then there's a karma aspect to that. You get excited for others, they'll get excited for you. Same thing in academia. If you're not careful, there is like a dynamical system there. If you think in silos and get jealous of somebody else being successful, that actually, perhaps counterintuitively, leads to less productivity of you as a community and you individually. I feel like if you keep celebrating others, that actually makes you more successful. Yeah. And I think people, depending on the industry, haven't quite learned that yet. Yeah, some people are also very negative and very vocal.
speaker 1: so they're very prominently featured. But actually there's a ton of people who are cheerleaders, but they're silent cheerleaders. And when you talk to people just in the world, they will tell you it's amazing. It's great. Especially like people who understand how difficult it is to get this stuff working, like people who have built products, and makers, entrepreneurs. Like making this work and changing something is incredibly hard. Those people are more likely to cheerlead you. Well.
speaker 2: one of the things that makes me sad is some folks in the robotics community don't do the cheerleading, and they should, because they know how difficult it is. Well, they actually sometimes don't know how difficult it is to create a product at scale, right, and actually deploy it in the real world. A lot of the development of robots and AI systems is done on very specific small benchmarks, as opposed to real-world conditions.
speaker 1: Yeah. I think it's really hard to work on robotics .
speaker 2: in an academic setting, or AI systems that apply in the real world. You've flourished with and loved for a time ImageNet, the famed ImageNet dataset, and have recently had some words of criticism, that the academic research ML community gives a little too much love still to ImageNet or those kinds of benchmarks. Can you speak to the strengths and weaknesses of datasets used in machine learning research?
speaker 1: Actually, I don't know that I recall the specific instance where I was unhappy or criticizing ImageNet. I think ImageNet has been extremely valuable. It was basically a benchmark that allowed the deep learning community to demonstrate that deep neural nets actually work. There's a massive value in that. So I think ImageNet was useful, but basically it's become a bit of an MNIST at this point. So MNIST is like little 28 by 28 grayscale digits. It's kind of a joke dataset that everyone like crushes.
speaker 2: There's no papers written on MNIST though, right? Maybe like strong papers, like papers that focus on like how do we learn with a small amount of data, that kind of stuff? Yeah.
speaker 1: I could see that being helpful, but not in sort of like mainline computer Vision Research anymore. Of course.
speaker 2: I think the way I've heard you somewhere, maybe I'm just imagining things, but I think you said like ImageNet was a huge contribution to the community for a long time. And now it's time to move past those kinds of, well.
speaker 1: ImageNet has been crushed. I mean, you know, the error rates are, yeah, we're getting like 90% accuracy in 1000-way classification prediction. And I've seen those images, and that's like really high. That's really good. If I remember correctly, the top-five error rate is now like 1% or something. Given your experience .
speaker 2: with a gigantic real world data set, would you like to see benchmarks move in certain directions that the research community uses?
speaker 1: Unfortunately, I don't think academics currently have the next ImageNet. I think we've obviously crushed MNIST. We've basically kind of crushed ImageNet, and there's no next sort of big benchmark that the entire community rallies behind and uses, you know, for further development of these networks.
speaker 2: Yeah. What would it take for a dataset to captivate the imagination of everybody? Like where they all get behind it? That could also need like a leader, right? Somebody with popularity. I mean, yeah, why did ImageNet take off? Or is it just the accident of history?
speaker 1: It was the right amount of difficult. It was the right amount of difficult and simple and interesting enough. It just kind of like, it was the right time for that kind of a dataset .
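[Editor's sketch, not part of the conversation: the "top-five error rate" mentioned for ImageNet counts a prediction as wrong only when the true class is missing from the model's five highest-scoring classes. The function name `top_k_error` and the tiny three-class score table below are made up for illustration; they are not ImageNet data or results.]

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy scores for 4 examples over 3 classes (hypothetical numbers).
scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.7, 0.2, 0.1],
]
labels = [1, 2, 0, 0]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

[Allowing more guesses can only lower the error, which is why a top-five error rate is always at or below the top-one error rate on the same predictions.]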
speaker 2: question from reddit. What are your thoughts on the role that synthetic data and game engines will play in the future of neural net model development?
speaker 1: I think as neural nets converge to humans, the value of simulation to neural nets will be similar to the value of simulation to humans. So people use simulation because they can learn something in that kind of a system without having to actually experience it.
speaker 2: But are you referring to the simulation we do in our head?
speaker 1: No, sorry. By simulation I mean, like, video games or, you know, other forms of simulation for various professionals.
speaker 2: So let me push back on that, because maybe there's simulation that we do in our heads, like: if I do this, what do I think will happen?
speaker 1: Okay, that's like internal simulation.
speaker 2: Yeah, internal. Isn't that what we're doing, as humans, before we act?
speaker 1: Oh, yeah, but that's independent from the use of simulation in the sense of computer games, or using simulation for training set creation.
speaker 2: Is it independent, or is it just loosely correlated? Because isn't it useful to do counterfactual or edge-case simulation, like, you know, what happens if there's a nuclear war? Those kinds of things.
speaker 1: Yeah, that's a different simulation from like Unreal Engine. That's how I interpreted the question.
speaker 2: So, like, simulation of the average case? Is that what Unreal Engine is? What do you mean by Unreal Engine? So simulating a world, the physics of that world. Why is that different? Because you can also add behavior to that world, and you can try all kinds of stuff, right? You could throw all kinds of weird things into it. Unreal Engine is not just about... I mean, I guess it is about simulating the physics of the world. It's also doing something with that.
speaker 1: Yeah, the graphics, the physics, and the agents that you put into the environment and stuff like that.
speaker 2: See, I feel like you said that it's not that important, I guess, for the future of AI development. Is that correct to interpret you that way?
speaker 1: I think humans use simulators and they find them useful. And so computers will use simulators and find them useful.
speaker 2: Okay, so you're saying... I don't use simulators very often. I play a video game every once in a while, but I don't think I derive any wisdom about my own existence from those video games. It's a momentary escape from reality versus a source of wisdom about reality. So I think that's a very polite way of saying simulation is not that useful.
speaker 1: Yeah, maybe, maybe not. I don't see it as a fundamental, really important part of training neural nets currently. But I think as neural nets become more and more powerful, you will need fewer examples to train additional behaviors. And with simulation, of course, there's a domain gap: a simulation is not the real world, it's slightly something different. But with a powerful enough neural net, the domain gap can be bigger, I think, because the neural net will sort of understand that even though it's not the real world, it has all this high-level structure that it's supposed to be able to learn from.
speaker 2: So the neural net will actually, yeah, be able to leverage the synthetic data better, by closing the gap, by understanding in which ways this is not real data.
speaker 1: Exactly.
speaker 2: Right. Do better questions next time. That was a question, but... I'm just kidding. All right. So is it possible, do you think, speaking of MNIST, to construct neural nets and training processes that require very little data? So we've been talking about huge datasets, like the internet, for training. I mean, one way to say that is, like you said, the querying itself is another level of training, I guess, and that requires little data. But do you see any value in doing research in the direction of: can we use very little data to train, to construct the knowledge base?
speaker 1: 100%. I just think at some point you need a massive dataset, and then when you pre-train your massive neural net and get something that is like a GPT or something, then you're able to be very efficient at training any arbitrary new task. So with a lot of these GPTs, you can do tasks like sentiment analysis or translation or so on just by prompting with very few examples. Here's the kind of thing I want you to do: here's an input sentence, here's the translation into German; input sentence, translation to German; input sentence, blank. And the neural net will complete the translation to German just by looking at the examples you've provided. And so that's an example of very few-shot learning in the activations of the neural net instead of the weights of the neural net. And so I think basically, just like humans, neural nets will become very data efficient at learning any other new task. But at some point, you need a massive dataset to pre-train your network.
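The few-shot pattern described here (input sentence, German translation, repeated, then an input with a blank) can be sketched as plain string construction. This is a minimal illustration in Python, not any particular model's API: the helper name `build_few_shot_prompt`, the `English:`/`German:` labels, and the example sentences are all assumptions made for the sketch. The resulting text would be handed to a language model, which is expected to fill in the final blank.

```python
def build_few_shot_prompt(examples, query, src="English", tgt="German"):
    """Lay out solved (source, target) pairs, then the query with a blank target.

    Hypothetical helper: the labels and layout are illustrative assumptions.
    """
    lines = []
    for source_sentence, target_sentence in examples:
        lines.append(f"{src}: {source_sentence}")
        lines.append(f"{tgt}: {target_sentence}")
    # The final target line is left empty for the model to complete.
    lines.append(f"{src}: {query}")
    lines.append(f"{tgt}:")
    return "\n".join(lines)


examples = [
    ("Good morning.", "Guten Morgen."),
    ("Thank you.", "Danke."),
]
prompt = build_few_shot_prompt(examples, "Good night.")
print(prompt)
```

No weights change in this setup; the "learning" happens entirely in how the model conditions on the examples in the prompt.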
speaker 2: I do get that. And probably we humans have something like that. Do we have something like that? Do we have a passive, in-the-background, model-constructing thing that just runs all the time in a self-supervised way, that we're not conscious of?
speaker 1: I think humans definitely... I mean, obviously, we learn a lot during our lifespan, but we also have a ton of hardware that helps us at initialization, coming from evolution. And so I think that's also a really big component. A lot of people in the field, I think, just talk about the amount of seconds that a person has lived, pretending that this is a tabula rasa, sort of like a zero initialization of a neural net. And it's not. You can look at a lot of animals, like, for example, zebras. Zebras get born and they see and they can run. There's zero training data in their lifespan. They can just do that. So somehow, I have no idea how, evolution has found a way to encode these algorithms and these neural net initializations that are extremely good into ATCGs. And I have no idea how this works, but apparently it's possible because here's a proof by existence.
speaker 2: There's something magical about going from a single cell to an organism that is born, to the first few years of life. I kind of like the idea that the reason we don't remember anything about the first few years of our life is that it's a really painful process. Like, it's a very difficult, challenging training process, intellectually. I mean, why don't we remember any of that? There might be some crazy training going on, and maybe that's the background model training, which is very painful, and so it's best for the system, once it's trained, not to remember how it was constructed.
speaker 1: I think it's just that the hardware for long-term memory is not fully developed. I kind of feel like the first few years of infants is not actually learning, it's brain maturing. We're born premature. There's a theory along those lines, because of the birth canal and the swelling of the brain. And so we're born premature, and then the first few years we're just maturing, the brain is maturing. And then there's some learning eventually. That's my current view on it.
speaker 2: What do you think: can neural nets have long-term memory that approaches something like humans? Do you think there needs to be another meta-architecture on top of it to add something like a knowledge base that learns facts about the world and all that kind of stuff?
speaker 1: Yes, but I don't know to what extent it will be explicitly constructed. It might take unintuitive forms where you are telling the GPT, like: hey, you have a declarative memory bank which you can store data to and retrieve data from. Whenever you encounter some information that you find useful, just save it to your memory bank. And here's an example of something you have retrieved, and here's how you save it, and here's how you load from it. You just say "load", whatever. You teach it in text, in English, and then it might learn to use a memory bank from that.
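One way to picture the memory bank idea is a small harness that watches the model's text for save and load commands and executes them. The sketch below is purely illustrative: the `SAVE[key](value)` and `LOAD[key]` syntax, the `MemoryBank` class, and its `handle` method are assumptions invented here, not a description of any existing system. The model would be taught the command format in plain English, as described above.

```python
import re


class MemoryBank:
    """Toy key-value store driven by text commands (hypothetical syntax)."""

    def __init__(self):
        self._store = {}

    def handle(self, model_output):
        """Execute the first memory command found in the text, if any.

        Returns "OK" after a save, the stored value (or "NOT FOUND") after
        a load, and None when the text contains no command.
        """
        save = re.search(r"SAVE\[(\w+)\]\((.*?)\)", model_output)
        if save:
            self._store[save.group(1)] = save.group(2)
            return "OK"
        load = re.search(r"LOAD\[(\w+)\]", model_output)
        if load:
            return self._store.get(load.group(1), "NOT FOUND")
        return None


bank = MemoryBank()
bank.handle("That seems useful. SAVE[capital_fr](Paris)")
print(bank.handle("Let me check my notes: LOAD[capital_fr]"))
```

In a real loop, whatever `handle` returns would be appended to the model's context so it can keep generating with the retrieved fact in view.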
speaker 2: So the neural net is the architecture for the background model, the base thing, and then, yeah, everything else is just on top of this.
speaker 1: It's not just text, right? You're giving it gadgets and gizmos. So you're teaching it some kind of a special language by which it can save arbitrary information and retrieve it at a later time, and you're telling it about these special tokens and how to arrange them to use these interfaces. It's like: hey, you can use a calculator. Here's how you use it. Just do 53 plus 41 equals, and when "equals" is there, a calculator will actually read out the answer and you don't have to calculate it yourself. And you just tell it in English. This might actually work.
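The calculator example can be sketched as a harness that looks for an arithmetic expression ending in "=" at the end of the generated text and fills in the result, so the model never has to compute it. This is a minimal Python sketch under stated assumptions: the function name `complete_with_calculator`, the supported operators, and the trigger format are all hypothetical choices for illustration.

```python
import operator
import re

# Trigger: "<int> <op> <int> =" at the very end of the text (assumed format).
_EXPRESSION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*$")
_OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def complete_with_calculator(text):
    """Append the computed answer if the text ends with 'a op b ='."""
    match = _EXPRESSION.search(text)
    if match is None:
        return text  # No calculator call; leave the text untouched.
    left, op, right = match.groups()
    result = _OPERATIONS[op](int(left), int(right))
    return f"{text.rstrip()} {result}"


print(complete_with_calculator("The total is 53 + 41 ="))
```

The design choice here is that the tool is invisible to the model except through text: it only has to learn to emit the expression and then read the number that follows.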
speaker 2: Do you think, in that sense, Gato is interesting, the DeepMind system, in that it's not just language, but actually throws it all in the same pile: images, actions, all that kind of stuff? That's basically what we're moving towards.
speaker 1: Yeah, I think so. So Gato is very much a kitchen-sink approach to reinforcement learning in lots of different environments with a single fixed transformer model, right? I think it's a very early result in that realm, but yeah, it's along the lines of what I think things will eventually look like.
speaker 2: Right. So this is the early days of a system that eventually will look like this, from a Rich Sutton perspective.
speaker 1: Yeah, I'm not a super huge fan of, I think, all these interfaces that look very different. I would want everything to be normalized into the same API. So, for example, screen pixels, the very same API, instead of having different world environments that have very different physics and joint configurations and appearances and whatever, and having some kind of special tokens for different games that you can plug in. I'd rather normalize everything to a single interface so it looks the same to the neural net, if that makes sense.
speaker 2: So it's all going to be pixel-based Pong in the end?
speaker 1: I think so.
speaker 2: Okay, let me ask you about your own personal life. A lot of people want to know; you're one of the most productive and brilliant people in the history of AI. What does a productive day in the life of Andrej Karpathy look like? What time do you wake up? Because I imagine some kind of dance between the average productive day and a perfect productive day. So the perfect productive day is the thing we strive towards, and the average is kind of what it converges to, given all the mistakes and human eventualities and so on. So what time do you wake up? Are you a morning person?
speaker 1: I'm not a morning person. I'm a night owl, for sure.
speaker 2: Is it stable or not?
speaker 1: It's semi-stable, like eight or nine or something like that. During my PhD, it was even later. I used to go to sleep usually at 3:00 a.m. I think the a.m. hours are precious, a very interesting time to work, because everyone is asleep. At 8:00 a.m. or 7:00 a.m., the East Coast is awake, so there's already activity, there's already some text messages, whatever. You can go on some news website and there's stuff happening, and it's distracting. At 3:00 a.m., everything is totally quiet, so you're not going to be bothered and you have solid chunks of time to do work. So I like those periods. Night owl by default. And then I think, for productive time, basically what I like to do is: you need to build some momentum on the problem without too much distraction, and you need to load your RAM, your working memory, with that problem. And then you need to be obsessed with it. When you're taking a shower, when you're falling asleep, you need to be obsessed with the problem, and it's fully in your memory and you're ready to wake up and work on it right there.
speaker 2: So is this on a temporal scale of a single day, or a couple of days, a week, a month?
speaker 1: So I can't talk about one day basically in isolation, because it's a whole process. When I want to get productive on a problem, I feel like I need a span of a few days where I can really get in on that problem, and I don't want to be interrupted, and I'm going to just be completely obsessed with that problem. And that's where I do most of my good work, I would say.
speaker 2: You've done a bunch of cool little projects in a very short amount of time, very quickly. So that requires you just focusing on it.
speaker 1: Yeah. Basically, I need to load my working memory with the problem and I need to be productive, because there's always a huge fixed cost to approaching any problem. You know, I was struggling with this, for example, at Tesla, because I'd want to work on a small side project. But okay, you first need to figure out: oh, okay, I need to SSH into my cluster, I need to bring up a VS Code editor so I can work on this, I run into some stupid error because of some reason. You're not at a point where you can be just productive right away. You are facing barriers. And so it's about really removing all of those barriers, so you're able to go into the problem and you have the full problem loaded in your memory.
speaker 2: And somehow avoiding distractions of all different forms, like news stories, emails, but also distractions from other interesting projects that you previously worked on or are currently working on and so on. You just want to really focus your mind.
speaker 1: And I mean, I can take some time off for distractions in between, but I think it can't be too much. You know, most of your day is sort of spent on that problem. And then, you know, I drink coffee, I have my morning routine, I look at some news: Twitter, Hacker News, Wall Street Journal, etc.
speaker 2: So basically, you wake up, you have some coffee. Are you trying to get to work as quickly as possible? Or do you take in this diet of, like, what the hell is happening in the world first?
speaker 1: I do find it interesting to know about the world. I don't know that it's useful or good, but it is part of my routine right now. So I do read through a bunch of news articles, and I want to be informed, and I'm suspicious of it. I'm suspicious of the practice, but currently that's where I am.
speaker 2: Oh, you mean suspicious about the positive effect of that practice on your productivity and your well-being?
speaker 1: My well-being, psychologically, yeah.
speaker 2: And also on your ability to deeply understand the world, because there's a bunch of sources of information you're not really focused on deeply integrating.
speaker 1: Yeah, it's a little bit distracting.
speaker 2: In terms of a perfectly productive day, for how long of a stretch of time in one session do you try to work and focus on a thing? Is it a couple of hours? Is it one hour? Is it 30 minutes? Is it ten minutes?
speaker 1: I can probably go like a small few hours, and then I need some breaks in between for food and stuff. But I think it's still really hard to accumulate hours. I was using a tracker that told me exactly how much time I spent coding on any one day. And even on a very productive day, I still spent only like six or eight hours. And it's just because there's so much padding: commute, talking to people, food, etc. There's like the cost of life. Just living and sustaining and homeostasis and just maintaining yourself as a human is very high.
speaker 2: And there seems to be a desire within the human mind to participate in society that creates that padding. Because the most productive days I've ever had are just completely, from start to finish, tuning out everything and just sitting there, and then you could do more than six, eight hours. Is there some wisdom about what gives you strength to do tough days of long focus?
speaker 1: Yeah, just like whenever I get obsessed about a problem, something just needs to work. Something just needs to exist.
speaker 2: It needs to exist. And so you're able to deal with bugs and programming issues and technical issues and design decisions that turn out to be the wrong ones. You're able to think through all of that, given that you want the thing to exist.
speaker 1: Yeah, it needs to exist. And then I think to me also a big factor is, you know, are other humans going to appreciate it? Are they going to like it? That's a big part of my motivation. If I'm helping humans and they seem happy, they say nice things, they tweet about it or whatever, that gives me pleasure because I'm doing something useful.
speaker 2: So you do see yourself sharing it with the world, like on GitHub, with a blog post, or through videos?
speaker 1: Yeah, I was thinking about it. Like, suppose I did all these things but did not share them. I don't think I would have the same amount of motivation that I can build up.
speaker 2: You enjoy the feeling of other people gaining value and happiness from the stuff that you've created. What about diet? I saw you played with intermittent fasting. Do you fast? Does that help with everything? Of the things you've played with, what's been most beneficial to your ability to mentally focus on a thing, and just mental productivity and happiness? You still fast?
speaker 1: Yeah, I still fast. But I do intermittent fasting, and really what it means at the end of the day is I skip breakfast. So I do 18:6, roughly, by default, when I'm in my steady state. If I'm traveling or doing something else, I will break the rules. But in my steady state, I do 18:6. So I eat only from twelve to six. Not a hard rule, and I break it often, but that's my default. And then, yeah, I've done a bunch of random experiments. For the most part right now, where I've been for the last year and a half, I want to say, is I'm plant-based or plant-forward. I heard "plant-forward". It sounds better. I don't actually know what the difference is, but it sounds better in my mind. But it just means I prefer plant-based food.
speaker 2: Raw or cooked?
speaker 1: I prefer cooked, and plant-based.
speaker 2: So plant-based... forgive me, I don't actually know how wide the category of "plant" is.
speaker 1: Well, plant-based just means that you're not militant about it and you can flex, and you just prefer to eat plants. And, you know, you're not trying to influence other people. And if you come to someone's house party and they serve you a steak that they're really proud of, you will eat it. Yes. No judgment.
speaker 2: Oh, that's beautiful. I mean, I'm on the flip side of that, but I'm very sort of flexible. Have you tried doing one meal a day?
speaker 1: I have, accidentally. Not consistently, but I've accidentally had that. I don't like it. I think it makes me feel not good. It's too much of a hit. And so currently I have about two meals a day, twelve and six.
speaker 2: I do that nonstop. I'm doing it now, one meal a day. It's interesting. It's an interesting feeling. Have you ever fasted longer than a day?
speaker 1: Yeah, I've done a bunch of water fasts, because I was curious what happens.
speaker 2: What happens? Anything interesting?
speaker 1: Yeah, I would say so. I mean, you know, what's interesting is that you're hungry for two days, and then starting day three or so, you're not hungry. It's such a weird feeling, because you haven't eaten in a few days and you're not hungry.
speaker 2: Isn't that weird? It's really one of the many weird things about human biology. It figures something out. It finds another source of energy or something like that, or relaxes the system. I don't know how it works.
speaker 1: The body is like: you're hungry, you're hungry, and then it just gives up. It's like, okay, I guess we're fasting now. There's nothing. And then it just kind of focuses on trying to make you not hungry and not feel the damage of that, and trying to give you some space to figure out the food situation.
speaker 2: So are you still, to this day, most productive at night?
speaker 1: I would say I am. But it is really hard to maintain my PhD schedule, especially when I was, say, working at Tesla and so on. It's a non-starter. But even now, you know, people want to meet for various events. Society lives in a certain period of time, and you sort of have to work with that.
speaker 2: It's hard to do a social thing and then after that return and do work.
speaker 1: Yeah, it's just really hard.
speaker 2: That's why, when I do social things, I try not to do too much drinking, so I can return and continue doing work. But at Tesla, or at any company, is there a convergence towards a schedule? Or is there more? Is that how humans behave when they collaborate? I need to learn about this. Do they try to keep a consistent schedule where you're all awake at the same time?
speaker 1: I mean, I do try to create a routine, and I try to create a steady state in which I'm comfortable. So I have a morning routine, I have a day routine. I try to keep things to a steady state so things are predictable, and then your body just sort of sticks to that. And if you try to stress that a little too much, like when you're traveling and you're dealing with jet lag, you're not able to really ascend to where you need to go.
speaker 2: Yeah, that's true about humans, with the habits and stuff. What are your thoughts on work-life balance throughout a human lifetime? Tesla, in part, was known for sort of pushing people to their limits in terms of what they're able to do, in terms of what they're trying to do, in terms of how much they work, all that kind of stuff.
speaker 1: Yeah. I mean, I will say Tesla gets a little too much bad rep for this, because what's happening is that Tesla is a bursty environment. So I would say the baseline... my only point of reference is Google, where I've interned three times, and I saw what it's like inside Google and DeepMind. I would say the baseline is higher than that. But then there's a punctuated equilibrium where once in a while there's a fire and people work really hard. And so it's spiky and bursty, and then all the stories get collected about the bursts, and then it gives the appearance of total insanity. But actually, it's just a bit more intense environment, and there are fires and sprints. And so I think, definitely, though, I would say it's a more intense environment than something you would get...
speaker 2: But in your personal... forget all of that. Just in your own personal life, what do you think about the happiness of a human being, a brilliant person like yourself, about finding a balance between work and life? Or is such a thing not a good thought experiment?
speaker 1: Yeah, I think balance is good, but I also love to have sprints that are out of distribution. And that's when I think I've been pretty creative as well.
speaker 2: So sprints out of distribution means that most of the time you have a, quote unquote, balance.
speaker 1: Yeah, I have balance most of the time, and I like being obsessed with something once in a while.
speaker 2: Once in a while is what? Once a week, once a month, once a year?
speaker 1: Yeah, probably like, I'd say, once a month or something.
speaker 2: And that's when we get a new GitHub repo.
speaker 1: That's when you really care about a problem. It must exist. This will be awesome. You're obsessed with it. And now you can't just do it on that day. You need to pay the fixed cost of getting into the groove, and then you need to stay there for a while. And then society will come and they will try to mess with you and they will try to distract you. The worst thing is a person who's like, "I just need five minutes of your time." The cost of that is not five minutes. And society needs to change how it thinks about "just five minutes of your time".
speaker 2: Right. It's never just one minute. "It's just 30." "It's just a quick thing, what's the big deal? Why are you being so...?" Yeah, no. What's your computer setup? What's the perfect...? Are you somebody that's flexible, no matter what, laptop, four screens? Or do you prefer a certain setup that you're most productive with?
speaker 1: I guess the one that I'm familiar with is one large screen, 27-inch, and my laptop on the side.
speaker 2: What operating system?
speaker 1: I do Macs. That's my primary.
speaker 2: For all tasks?
speaker 1: I would say OS X. But when you're working on deep learning, everything is Linux. You're SSH'd into a cluster and you're working remotely.
speaker 2: But what about the actual development, like using the IDE?
speaker 1: I think a good way is you just run VS Code, my favorite editor right now, on your Mac, but you have a remote folder through SSH. So the actual files that you're manipulating are on the cluster somewhere else.
speaker 2: So what's the best IDE? VS Code? What else do people use? So I use Emacs still. It may be cool, but I don't know if it's maximum productivity. So what do you recommend in terms of editors? You've worked with a lot of software engineers. Editors for Python, C++, machine learning applications.
speaker 1: I think the current answer is VS Code. Currently, I believe that's the best IDE. It's got a huge amount of extensions. It has GitHub Copilot integration, which I think is very valuable.
speaker 2: What do you think about the Copilot integration? I actually got to talk a bunch with Guido van Rossum, who's the creator of Python, and he loves Copilot. He programs a lot with it. Do you use Copilot?
speaker 1: Yeah, I use Copilot. I love it. And it's free for me, but I would pay for it. I think it's very good. And the utility that I found with it... I would say there's a learning curve, and you need to figure out when it's helpful and when to pay attention to its outputs, and when it's not going to be helpful and you should not pay attention to it. Because if you're just reading its suggestions all the time, it's not a good way of interacting with it. But I think I was able to sort of mold myself to it. I find it's very helpful, number one, in copy-paste-and-replace-some-parts. So when the pattern is clear, it's really good at completing the pattern. And number two, sometimes it suggests APIs that I'm not aware of. So it tells you about something that you didn't know.
speaker 2: And that's an opportunity to discover.
speaker 1: It's an opportunity to... so I would never take Copilot code as given. I almost always copy-paste it into a Google search and see what this function is doing. And then you're like, oh, it's actually exactly what I need. Thank you, Copilot.
speaker 2: So you learned something. So it's in part a search engine, and in part maybe getting the exact syntax correct, that once you see it... yeah, it's that NP-hard thing: once you see it, you know it's correct. You yourself can verify efficiently, but you can't generate efficiently. And Copilot, really...
speaker 1: I mean, it's autopilot for programming, right? And currently it's doing the lane following, which is like the simple copy-paste and sometimes suggesting, but over time it's going to become more and more autonomous. And so the same thing will play out in not just coding, but actually across many, many different things, probably.
speaker 2: But coding is an important one, right? Writing programs. How do you see the future of that developing, the program synthesis, like being able to write programs that are more and more complicated? Because right now it's human-supervised in interesting ways. It feels like the transition will be very painful.
speaker 1: My mental model for it is the same thing will happen as with the Autopilot. So currently it's doing lane following, it's doing some simple stuff, and eventually it'll be doing autonomy, and people will have to intervene less and less.
speaker 2: And there could be, like, testing mechanisms. Like, if it writes a function and that function looks pretty damn correct, how do you know it's correct? Because you're getting lazier and lazier as a programmer. Like, your ability to catch little bugs... but I guess it won't make little...
speaker 1: No, it will. Copilot will make subtle off-by-one bugs. It has done that to me.
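For readers who have not been bitten by one, here is a small made-up illustration of the kind of off-by-one bug that looks plausible in review. It is not an actual Copilot suggestion; the function names and the sliding-window task are assumptions chosen for the example. The buggy version silently drops the last window, and only a check on a known input reveals it.

```python
def moving_sums_buggy(values, window):
    # Bug: the range stops one step early, so the final window is dropped.
    return [sum(values[i:i + window]) for i in range(len(values) - window)]


def moving_sums(values, window):
    # Correct: there are len(values) - window + 1 full windows.
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


data = [1, 2, 3, 4]
print(moving_sums_buggy(data, 2))  # [3, 5]
print(moving_sums(data, 2))        # [3, 5, 7]
```

Both versions run without error and return reasonable-looking lists, which is why this class of bug tends to survive a quick read and needs a test to catch.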
speaker 2: But do you think future systems will? Or is the off-by-one actually a fundamental challenge of programming?
speaker 1: In that case, it wasn't fundamental, and I think things can improve. But yeah, I think humans have to supervise. I am nervous about people not supervising what comes out, and what happens to, for example, the proliferation of bugs in all of our systems. I'm nervous about that. But I think there will probably be some other copilots for bug finding and stuff like that at some point, because there'll be a lot more automation for...
speaker 2: Man, it's like a Copilot that generates a compiler, one that does a linter, one that does like a type checker.
speaker 1: It's a committee of, like, GPTs, sort of.
speaker 2: And then there'll be like a manager for the committee. And then there would be somebody that says a new version of this is needed, we need to regenerate it.
speaker 1: Yeah. There were ten GPTs that were forwarded and gave 50 suggestions. Another one looked at it and picked a few that they liked. A bug one looked at it and was like, it's probably a bug. They got re-ranked by some other thing. And then a final ensemble GPT comes in and is like, okay, given everything you guys have told me, this is probably the next token.
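The committee picture (several proposers, a bug-checking reviewer that can veto, and a final pick) can be sketched with ordinary functions standing in for the models. Everything in this Python sketch is a stand-in invented for illustration: `committee_pick`, the lambda "proposers", and the toy reviewer are assumptions, and simple majority voting is swapped in for whatever re-ranking a real system might use.

```python
from collections import Counter


def committee_pick(proposers, reviewers, context):
    """Collect proposals, drop any a reviewer vetoes, return the most common.

    proposers: callables taking the context and returning a candidate string.
    reviewers: callables taking (context, candidate) and returning True to keep.
    Ties go to the candidate that was proposed first.
    """
    candidates = [propose(context) for propose in proposers]
    surviving = [
        candidate
        for candidate in candidates
        if all(review(context, candidate) for review in reviewers)
    ]
    if not surviving:
        return None
    votes = Counter(surviving)
    return max(votes, key=votes.get)


# Stand-ins for models: each "proposer" just returns a fixed continuation.
proposers = [
    lambda ctx: "x + 1",
    lambda ctx: "x +",      # syntactically incomplete suggestion
    lambda ctx: "x - 1",
    lambda ctx: "x + 1",
]
# Stand-in "bug finder": vetoes candidates that end with a dangling operator.
reviewers = [lambda ctx, cand: not cand.endswith(("+", "-"))]

print(committee_pick(proposers, reviewers, "return "))
```

The point of the sketch is the shape of the pipeline, propose, filter, aggregate, rather than any claim about how such a committee would actually be trained or wired together.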
speaker 2: You know, the feeling is that the number of programmers in the world has been growing and growing very quickly. Do you think it's possible that it'll actually level out and drop to a very low number with this kind of world? Because then you'll be doing Software 2.0 programming, and you'll be doing this kind of generation of Copilot-type systems programming, but you won't be doing the old-school Software 1.0 programming.
speaker 1: I don't currently think that they're just going to replace human programmers. I'm so hesitant saying stuff like this, right?
speaker 2: Because this is going to be replayed in five years, and it's going to show that, like, this is what we thought. Because I agree with you, but I think we might be very surprised, right? What's your sense of where we stand with language models? Does it feel like the beginning or the middle or the end?
speaker 1: The beginning, 100%. I think the big question in my mind is: for sure, GPT will be able to program quite well, competently and so on. How do you steer the system? You still have to provide some guidance as to what you actually are looking for. And so how do you steer it, and how do you talk to it? How do you audit it and verify that what is done is correct? And how do you work with this? And it's not just an AI problem, but as much a UI/UX problem. So, beautiful, fertile ground for so much interesting work, for VS Code++, where it's not just human programming anymore. It's amazing.
speaker 2: So you're interacting with the system. So not just one prompt, but it's iterative prompting. You're trying to figure out, having a conversation with the system. I mean, to me, that's super exciting, to have a conversation with the program I'm writing.
speaker 1: Yeah, maybe at some point you're just conversing with it. It's like, okay, here's what I want to do. Actually, this variable... maybe it's not even that low-level as a variable, but...
speaker 2: You can also imagine, like can you translate this to C++and back to Python .
speaker 1: and back to everybody .
speaker 2: kind of existence? No, but just like doing it as part of the programming experience. Like, I think I'd like to write this function in C++, or you just keep changing between different programs because of different syntax. Maybe I want to convert this into a functional language. And so you get to become multilingual as a programmer and dance back and forth efficiently.
speaker 1: Yeah. I mean, I think the UI/UX of it, though, is still very hard to think through, because it's not just about writing code on a page. You have an entire developer environment. You have a bunch of hardware on it. You have some environment variables, you have some scripts that are running in a cron job. Like, there's a lot going on in working with computers. And how do these systems set up environment flags and work across multiple machines and set up screen sessions and automate different processes? How all that works and is auditable by humans and so on is, like, a massive .
speaker 2: question at the moment. You've built arxiv-sanity. What is arXiv, and what is the future of academic research publishing that you would like to see?
speaker 1: So arXiv is this preprint server. So if you have a paper, you can submit it for publication to journals or conferences and then wait six months and then maybe get a decision, pass or fail, or you can just upload it to arXiv, and then people can tweet about it three minutes later, and then everyone sees it, everyone reads it, and everyone can profit .
speaker 2: from it in their own little ways. And you can cite it, and it has an official look to it. It feels like a publication process. Yeah. It feels different than if you just put it .
speaker 1: in a blog post. Oh yeah. Yeah. I mean, it's a paper. And usually the bar is higher for something that you would expect on arXiv as opposed to something .
speaker 2: you would see in a blog post. Well, the culture created the bar, because you could probably host a pretty crappy paper on arXiv. So what does that make you feel about peer review? So rigorous peer review by two or three experts versus the peer review of the community, right as it's written? Yeah, basically, I think the community .
speaker 1: is very well able to peer review things very quickly on Twitter. And I think maybe it just has something to do with the AI and machine learning field specifically, though. I feel like things are more easily auditable, and the verification is potentially easier than the verification somewhere else. So you can think of these scientific publications as like little blockchains where everyone is building on each other's work and citing each other. And you sort of have AI, which is kind of like this much faster and looser blockchain, where any one individual entry is very, very cheap to make, and then you have other fields where maybe that model doesn't make as much sense. And so I think in AI at least, things are pretty easily verifiable. That's why when people upload papers that are a really good idea and so on, people can try it out like the next day, and they can be the final arbiter of whether it works or not on their problem. And the whole thing just moves significantly faster. So I kind of feel like academia still has a place. This conference and journal process still has a place, but it sort of lags behind, I think, and it's maybe a bit higher-quality process, but it's not the place where you will discover cutting-edge work anymore. It used to be the case when I was starting my PhD that you'd go to conferences and journals and discuss all the latest research. Now when you go to a conference or journal, no one discusses anything that's there, because it's already, like, three generations ago, irrelevant.
speaker 2: Yeah, that makes me sad about, like, DeepMind, for example, where they still publish in Nature and these big prestigious venues. I mean, there's still value, I suppose, in the prestige that comes with these big venues. But the result is that they'll announce some breakthrough performance, and it will take like a year to actually publish the details. I mean, those details, if they were published immediately, would inspire the community to move in certain .
speaker 1: directions. Yeah, it would speed up the rest of the community. But I don't know to what extent .
speaker 2: that's part of their objective function. Also, it's not just the prestige.
speaker 1: A little bit of the delay is part of it. Yeah, DeepMind specifically has certainly been working in the regime of having a slightly higher quality process and latency, and publishing those papers that way.
speaker 2: Another question from Reddit. Do you, or have you, suffered from imposter syndrome, being the director of AI at Tesla, being this person when you're at Stanford, where the world looks at you as the expert in AI to teach the world about machine learning? When I was leaving Tesla after five years,
speaker 1: I spent a ton of time in meeting rooms and, you know, I would read papers. In the beginning when I joined Tesla, I was writing code, and then I was writing less and less code, and I was reading code, and then I was reading less and less code. And so this is just a natural progression that happens, I think. And definitely, I would say, near the tail end, that's when it sort of starts to hit you a bit more that you're supposed to be an expert. But actually the source of truth is the code that people are writing, the GitHub and the actual code itself. And you're not as familiar with that as you used to be. And so I would say maybe .
speaker 2: there's some insecurity there. That's actually pretty profound, that a lot of the insecurity has to do with not writing the code in the computer science space, because that is the truth.
speaker 1: That code is the source of truth. The papers and everything else, it's a high-level summary. Yeah, just a high-level summary. But at the end of the day, you have to read code. It's impossible to translate all that code into actual, you know, paper form. So when things come out, especially when they have source code available,
speaker 2: that's my favorite place to go. So like I said, you're one of the greatest teachers of machine learning and AI ever, from CS231n to today. What advice would you give to beginners interested in getting into machine learning?
speaker 1: Beginners are often focused on, like, what to do, and I think the focus should be more on how much you do. So I am kind of a believer, on a high level, in this 10,000 hours kind of concept, where you just have to pick the things where you can spend time and that you care about and you're interested in. You literally have to put in 10,000 hours of work. It doesn't even matter as much where you put it, and you'll iterate and you'll improve and you'll waste some time. I don't know if there's a better way. You need to put in 10,000 hours, but I think it's actually really nice, because I feel like there's some sense of determinism about being an expert at a thing. If you spend 10,000 hours, you can literally pick an arbitrary thing. And I think if you spend 10,000 hours of deliberate effort and work, you actually will become an expert at it. And so I think it's kind of a nice thought. And so basically, I would focus more on, like, are you spending 10,000
speaker 2: hours? I'd focus on that, and then on thinking about what kind of mechanisms maximize your likelihood of getting to 10,000 hours. Exactly. Which for us silly humans means probably forming a daily habit of, like, every single day actually doing the thing.
speaker 1: whatever helps you. So I do think to a large extent, it's a psychological problem for yourself. One other thing that I think is helpful for the psychology of it is that many times people compare themselves to others in the area, which I think is very harmful. Only compare yourself to you from some time ago, like, say, a year ago. Are you better than you were a year ago? That is the only way to think. And I think then you can see your progress, and it's .
speaker 2: very motivating. That's so interesting, that focus on the quantity of hours, because I think a lot of people in the beginner stage, but actually throughout, get paralyzed by the choice. Like, which one do I pick, this path or this path? Yeah. Like, they'll literally get paralyzed by, like, which IDE to use. Well,
speaker 1: they're worried, yeah, they're worried about all these things. But the thing is, you will waste time doing something wrong. You will eventually figure out it's not right. You will accumulate scar tissue. And next time you'll grow stronger, because next time you'll have the scar tissue, and next time you'll learn from it. And next time you come to a similar situation, you'll be like, oh, I messed up. I've spent a lot of time working on things that never materialized into anything. And I have all that scar tissue, and I have some intuitions about what was useful, what wasn't useful, how things turned out. So all those mistakes were not dead work, you know. So I just think you should just focus on working. What have you done? What have you done last week?
speaker 2: That's a good question actually to ask for a lot of things, not just machine learning. It's a good way to cut the, I forgot what term we use, but the fluff, the blubber, whatever the inefficiencies in life. What do you love about teaching? You seem to find yourself often drawn to teaching. You're very good at it, but you're also drawn to it.
speaker 1: I mean, I don't think I love teaching. I love happy humans, and happy humans like when I teach. I wouldn't say I hate teaching. I tolerate teaching. But it's not like the act of teaching that I like. It's that, you know, I have something I'm actually okay at. Yes, I'm okay at teaching, and people appreciate it a lot. Yeah. And so I'm just happy to try to be helpful. And teaching itself is not like the most... I mean, it can be really annoying, frustrating. I was working on a bunch of lectures just now. I was reminded back to my days of CS231n and just how much work it is to create some of these materials and make them good. The amount of iteration and thought, and you go down blind alleys, and just how much you change it. So creating something good in terms of educational value is really hard, and it's not fun.
speaker 2: It's difficult. People should definitely go watch your new stuff that you put out. There are lectures where you're actually building the thing; like you said, the code is truth. So discussing backpropagation by building it, by looking through it, and just the whole thing. So how difficult is that to prepare for? I think it's a really powerful way to teach. Did you have to prepare for that, or are you just live thinking through it?
speaker 1: I will typically do, like, say, three takes, and then I take the better takes. So I do multiple takes and I take some of the better takes, and then I just build out a lecture that way. Sometimes I have to delete 30 minutes of content because it just went down an alley that I didn't like too much. There's a bunch of iteration, and it probably takes me, you know, somewhere around ten hours to create .
speaker 2: one hour of content. To get one hour. It's interesting. I mean, is it difficult to go back to the basics? Do you draw a lot of wisdom from going back to the basics?
speaker 1: Yeah, going back to backpropagation, loss functions, where they come from. And one thing I like about teaching a lot, honestly, is that it definitely strengthens your understanding. So it's not a purely altruistic activity. It's a way to learn. If you have to explain something to someone, you realize you have gaps in knowledge. And so I even surprised myself in those lectures, like, oh, the result will obviously look like this, and then the result doesn't look like it. And I'm like, okay, I thought I understood this .
speaker 2: Well, that's why it's really cool to literally code. You run it in a notebook and it gives you a result and you're like, oh, wow. Yes. And, like, actual numbers,
speaker 1: actual input, actual code. It's not the mathematical symbols, etc. The source of truth is the code.
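As a rough illustration of the "actual numbers, actual code" point above, here is a minimal sketch of backpropagation done by hand on a tiny expression and checked against a finite-difference estimate. It is not taken from the lectures being discussed; the expression `loss = (a * b + c) ** 2` and the names `forward`, `backward`, and `numerical_grad` are assumptions chosen only for this example.

```python
def forward(a, b, c):
    """Compute loss = (a * b + c) ** 2 and keep the intermediates."""
    d = a * b          # first node
    e = d + c          # second node
    loss = e * e       # final scalar
    return loss, (d, e)


def backward(a, b, c, cache):
    """Apply the chain rule node by node, from the loss back to the inputs."""
    d, e = cache
    dloss_de = 2.0 * e            # d(e*e)/de
    dloss_dd = dloss_de * 1.0     # e = d + c, so de/dd = 1
    dloss_dc = dloss_de * 1.0     # e = d + c, so de/dc = 1
    dloss_da = dloss_dd * b       # d = a * b, so dd/da = b
    dloss_db = dloss_dd * a       # d = a * b, so dd/db = a
    return dloss_da, dloss_db, dloss_dc


def numerical_grad(args, index, h=1e-6):
    """Central finite difference of the loss with respect to args[index]."""
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (forward(*plus)[0] - forward(*minus)[0]) / (2.0 * h)


inputs = (2.0, -3.0, 10.0)
loss, cache = forward(*inputs)          # loss is 16.0 for these inputs
analytic = backward(*inputs, cache)     # (-24.0, 16.0, 8.0)
numeric = tuple(numerical_grad(inputs, i) for i in range(3))
print(loss, analytic, numeric)
```

Running something like this in a notebook makes the comparison concrete: if the hand-derived gradients and the numerical estimates disagree, the derivation has a gap.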
speaker 2: It's not slides. It's just like, let's build it. It's beautiful. You're a rare human in that sense. What advice would you give to researchers trying to develop and publish ideas that have a big impact in the world of AI? So maybe undergrads, maybe early graduate students. Yeah, I mean, I would say, like, they .
speaker 1: definitely have to be a little bit more strategic than I had to be as a PhD student, because of the way AI is evolving. It's going the way of physics, where, you know, in physics you used to be able to do experiments on your benchtop and everything was great and you could make progress, and now you have to work at, like, the LHC or CERN. And so AI is going in that direction as well. So there are certain kinds of things that are just not possible to do on the benchtop anymore. And I think that didn't used to be the case at the time.
speaker 2: Do you still think that there are, like, GAN-type papers to be written? Like, a very simple idea that requires just one computer to illustrate with a simple example. I mean, one example .
speaker 1: that's been very influential recently is diffusion models. Diffusion models are amazing. Diffusion models are six years old. For the longest time, people were kind of ignoring them, as far as I can tell. And they're an amazing generative model, especially for images. And so Stable Diffusion and so on, it's all diffusion-based. Diffusion is new. It was not there, and it came from, well, it came from Google, but a researcher could have come up with it. In fact, some of the first... actually, no, those came from Google as well, but a researcher could come up with that in an academic institution.
speaker 2: Yeah, what do you find most fascinating about diffusion models? So from the societal impact to .
speaker 1: the technical architecture, what I like about diffusion .
speaker 2: is it works so well. Is that surprising to you? The amount of the variety, almost the novelty of the synthetic .
speaker 1: data it's generating? Yeah. So the Stable Diffusion images are incredible. The speed of improvement in generating images has been insane. We went very quickly from generating, like, tiny digits to tiny faces, and it all looked messed up. And now we have Stable Diffusion, and that happened very quickly. There's a lot that academia can still contribute. You know, for example, FlashAttention is a very efficient kernel for running the attention operation inside the transformer, and that came from an academic environment. It's a very clever way to structure the kernel that does the calculation, so it doesn't materialize the attention matrix. And so I think there are still lots of things to contribute, but you have to be .
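The remark about not materializing the attention matrix can be illustrated with a small sketch. This is not the FlashAttention kernel itself, which is a tiled GPU implementation; it only shows, in plain Python and for a single query vector, the running-softmax trick that lets attention be computed block by block without holding all the scores at once. The function names and the `block_size` parameter are assumptions made for this example.

```python
import math


def _dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def naive_attention(q, keys, values):
    """Reference version: stores every score before normalizing."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [_dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def streaming_attention(q, keys, values, block_size=2):
    """Process keys/values in blocks, keeping only running statistics."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")     # largest score seen so far
    normalizer = 0.0                # running softmax denominator
    acc = [0.0] * len(values[0])    # running weighted sum of values
    for start in range(0, len(keys), block_size):
        block_keys = keys[start:start + block_size]
        block_values = values[start:start + block_size]
        scores = [_dot(q, k) * scale for k in block_keys]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        normalizer *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, block_values):
            w = math.exp(s - new_max)
            normalizer += w
            for j, vj in enumerate(v):
                acc[j] += w * vj
        running_max = new_max
    return [a / normalizer for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(streaming_attention(q, keys, values, block_size=2))
```

Both functions should agree up to floating-point rounding. The streaming version only ever holds one block of scores, which is the property the remark above points to; the real kernel additionally tiles over queries and is organized around GPU memory.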
speaker 2: just more strategic. Do you think neural networks can be made to reason?
speaker 1: Yes. Do you think .
speaker 2: they already reason? Yes. What's your definition of .
speaker 1: reasoning? Information processing?
speaker 2: So in the way that humans think through a problem and come up with novel ideas, it feels like reasoning. Yeah. So the novelty, I don't want to say, but out-of-distribution ideas, you think that's possible? Yes. And I think we're seeing .
speaker 1: that already in the current neural nets. You're able to remix the training set information into generalization in some sense, something that doesn't appear in the training set. Like, you're doing something interesting algorithmically, you're manipulating, you know, some symbols, and you're coming up with some correct, unique answer in a new setting.
speaker 2: What would illustrate to you, "Holy shit, this thing is definitely thinking"?
speaker 1: To me, thinking or reasoning is just information processing and generalization. And I think .
speaker 2: the neural nets already do that today. So being able to perceive the world or perceive the whatever the inputs are and to make predictions based on that or actions based on that.
speaker 1: That's reasoning. Yeah, you're giving correct answers in novel settings by manipulating information. You've learned the correct algorithm. You're not doing just some kind of a lookup .
speaker 2: table or nearest neighbor search, something like that. Let me ask you about AGI. What are some moonshot ideas you think might make significant progress towards AGI? Or, put another way, what are the big blockers that we're missing now?
speaker 1: Basically, I am fairly bullish on our ability to build AGIs, basically automated systems that we can interact with and that are very human-like, and we can interact with them in a digital realm or a physical realm. Currently, it seems most of the models that do these sort of magical tasks are in a text realm. As I mentioned, I'm suspicious that the text realm is not enough to actually build a full understanding of the world. I do actually think you need to go into pixels and understand the physical world and how it works. So I do think that we need to extend these models to consume images and videos and train on a lot more data that is multimodal.
speaker 2: In that way. Do you think you need to touch the world to .
speaker 1: understand it also? Well, that's the big open question, I would say, in my mind: if you also require the embodiment and the ability to interact with the world, run experiments, and have data of that form, then you need to go to Optimus or something like that. And so I would say Optimus in some way is like a hedge on AGI, because it seems to me that it's possible that just having data from the Internet is not enough. If that is the case, then Optimus may lead to AGI, because to me, there's nothing beyond Optimus. You have this humanoid form factor that can actually do stuff in the world. You can have millions of them interacting with humans and so on. And if that doesn't give rise to AGI at some point, I'm not sure what will. So from a completeness perspective, I think that's a really good platform, but it's a much harder platform, because you are dealing with atoms and you need to actually build these things and integrate them into society. So I think that path takes longer, but it's much more certain. And then there's the path of the Internet, just training these compression models effectively on trying to compress all the Internet, and that might also give these agents as .
speaker 2: well. Compress the Internet, but also interact with the Internet. So it's not obvious to me. In fact, I suspect you can reach AGI without ever entering the physical world, which is a little bit more concerning, because that might result in it happening faster. So it just feels like we're in boiling water. We won't know as it's happening. I'm not afraid of AGI. I'm excited about it. There are always concerns, but I would like to know when it happens, yeah, or have hints about when it happens, like, a year from now it will happen, that kind of thing. Yeah. I just feel like in the digital realm, it just might happen.
speaker 1: Yeah. I think all we have available to us, because no one has built AGI, again, so all we have available to us is: is there enough fertile ground on the periphery? I would say yes. And we have the progress so far, which has been very rapid, and there are next steps that are available. And so I would say, yeah, it's quite likely that we'll be interacting with digital entities.
speaker 2: How will you know that somebody has built AGI?
speaker 1: I think it's going to be a slow, incremental transition. It's going to be product-based and focused. It's going to be GitHub Copilot getting better, and then GPTs helping you write, and then these oracles that you can go to with mathematical problems. I think we're on the verge of being able to ask very complex questions in chemistry, physics, math of these oracles and have them complete solutions.
speaker 2: So AGI to you is primarily focused on intelligence, so consciousness doesn't enter into it?
speaker 1: So in my mind, consciousness is not a special thing you will figure out and bolt on. I think it's an emergent phenomenon of a large enough and complex enough generative model, sort of. So if you have a complex enough world model that understands the world, then it also understands its predicament in the world as being a language model, which to me is a form of consciousness or self-awareness. And so in order .
speaker 2: to understand the world deeply, you probably have to integrate yourself into the world. Yeah. And in order to interact with humans and other living beings, consciousness is a very useful tool.
speaker 1: I think consciousness is like a modeling insight.
speaker 2: Modeling insight Yeah.
speaker 1: I'd say you have a powerful enough model of understanding the world that you actually understand that you are an entity in it.
speaker 2: Yeah but there's also this perhaps just a narrative we tell ourselves. There's a it feels like something to experience the world. The hard problem of consciousness. Yeah but that could be just a narrative that we tell ourselves.
speaker 1: Yeah, I don't think Yeah, I think it will emerge. I think it's going to be something very boring. Like we'll be talking to these digital AI's. They will claim they're conscious, they will appear conscious. They will do all the things that you would expect of other humans.
speaker 2: and it's going to just be a stalemate. I think there will be a lot of actually fascinating ethical questions, like Supreme Court level questions, of whether you're allowed to turn off a conscious AI, if you're allowed to build a conscious AI. Maybe there would have to be the same kind of debates that you have around, sorry to bring up a political topic, but, you know, abortion, where the deeper question with abortion is, what is life? And the deeper question with AI is also, what is life and what is conscious? And I think that'll be very fascinating to bring up. It might become illegal to build systems that are capable of such a level of intelligence that consciousness would emerge, and therefore the capacity to suffer would emerge. And a system that says, no,
speaker 1: please don't kill me. Well, that's what the LaMDA chatbot already told this Google engineer, right? Like, it was talking about not wanting to .
speaker 2: die or so on. So that might become illegal to do that, right? Because otherwise you might have a lot of creatures that don't want to die and .
speaker 1: you can just spawn infinity of them on a cluster.
speaker 2: And then that might lead to, like, horrible consequences, because then there might be a lot of people that secretly love murder, and they'll start practicing murder on those systems. I mean, to me, all of this stuff just brings a beautiful mirror to the human condition and human nature. We'll get to explore it. And that's what, like the best of the Supreme Court, of all the different debates we have about ideas of what it means to be human, we get to ask those deep questions that we've been asking throughout human history. There has always been the other in human history: we're the good guys and those are the bad guys. And, you know, throughout human history, let's murder the bad guys. And the same will probably happen with robots. They'll be the other at first. And then we'll get to ask questions. So what does it mean to be alive? What does it mean to be conscious? Yeah, and
speaker 1: I think there are some canaries in the coal mine, even with what we have today. You know, for example, there are these, like, waifus that you can work with. And some people are trying to, like, this company is going to shut down, but this person really loved their waifu and is trying to port it somewhere else, and it's not possible. And I think definitely people will have feelings towards these systems, because in some sense they are like a mirror of humanity, because they are sort of like a big average of humanity in a way .
speaker 2: that it's trained. But that average we can actually watch. It's nice to be able to interact with the big average of humanity and do like .
speaker 1: a search query on it. Yeah, it's very fascinating. And we can also, of course, also like shape it. It's not just a pure average. We can mess with the training data. We can mess with the objective. We can fine tune them in various ways so we have some you know .
speaker 2: impact on what those systems look like. If we were to achieve AGI, and you could have a conversation with her and talk about anything, maybe ask her a question, what kind of stuff .
speaker 1: would you ask? I would have some practical questions in my mind, like, do I or my loved ones really have to die? What can we do about that?
speaker 2: Do you think it will answer clearly, or would it answer poetically?
speaker 1: I would expect it to give solutions. I would expect it to be like, well, I've read all of these textbooks, and I know all these things that you've produced. And it seems to me like, here are the experiments that I think it would be useful to run next. And here's some gene therapies that I think would be helpful.
speaker 2: And here are the kinds of experiments that you should run. Okay, let's go with this thought experiment. Imagine that mortality is actually like a prerequisite for happiness. So if we become immortal, we'll actually become deeply unhappy. And the model is able to know that. So what is it supposed to tell you, stupid human, about it? Yes, you can become immortal, but you will become deeply unhappy. If the AGI system is trying to empathize with you, human, what is it supposed to tell you? That yes, you don't have to die, but you're really not going to like it? Is it going to be deeply honest? Like, in Interstellar, what is it the AI says? Like, humans want 90% honesty. So, like, you have to pick: how honestly do I want to answer these practical questions?
speaker 1: Yeah, I love the AI in Interstellar, by the way. I think it's like such a sidekick to the entire story, but at the same time,
speaker 2: it's like really interesting. It's kind of limited in certain ways.
speaker 1: right? Yeah, it's limited. And I think that's totally fine, by the way. I think it's fine and plausible to have limited and imperfect AGIs.
speaker 2: Is that the feature?
speaker 1: Almost as an example, like it has a fixed amount of compute on its physical body. And it might just be that even though you can have a super amazing mega brain, super intelligent AI, you also can have like you know less intelligent AI that you can deploy in a power efficient way. And then they're not perfect.
speaker 2: They might make mistakes. I meant more like, say you had infinite compute, and it's still good to make mistakes sometimes, like in order to integrate yourself. Like, what is it, going back to Good Will Hunting, Robin Williams's character says, like, the human imperfections, that's the good stuff, right? Isn't that the... like, we don't want perfect. We want flaws, in part to form connection with each other, because it feels like something you can attach your feelings to, the flaws. And in that same way, you want an AI that's flawed. I don't know. I feel like perfection is...
speaker 1: but then you're saying, okay, Yeah.
speaker 2: but that's not AGI. But see, AGI would need to be intelligent enough to give answers to humans that humans don't understand. And I think perfect is something humans can't understand, because even science doesn't give perfect answers. There are always gaps and mysteries. And I don't know, I don't know if humans want perfect.
speaker 1: Yeah, I can imagine just having a conversation with this kind of oracle entity as you'd imagine them. And yeah, maybe it can tell you, you know, based on my analysis of the human condition, you might not want this. And here are some of the things .
speaker 2: that might... but every dumb human will say, yeah, yeah, trust me, give me the truth. I can handle it.
speaker 1: but that's the beauty. Like people can .
speaker 2: choose. But also, it's the old marshmallow test with the kids and so on. I feel like too many people can't handle the truth, probably including myself, like the deep truth of the human condition. I don't know if I can handle it. Like, what if there's some dark stuff? What if we are an alien science experiment and it realizes that?
speaker 1: What if it hacks? I mean, this is The Matrix, you know, all over again.
speaker 2: I don't know. What would I talk about? I don't even... yeah, probably I would go with the safer scientific questions at first that have nothing to do with my own personal life and mortality, just about physics and so on. Yeah, to build up, like, let's see where it's at, or maybe see if it has a sense of humor. That's another question. Presumably, if it understands humans deeply, it would be able to generate humor. Yeah, I think .
speaker 1: that's actually a wonderful benchmark. Almost like is it able? I think that's a really good point. Basically to to make you laugh. Yeah if it's able to be like a very effective stand up comedian that is doing something very interesting .
speaker 2: computationally. I think being funny is extremely hard. Yeah, because it's hard in a way like a Turing test. The original intent of the Turing test is hard because you have to convince humans. That's what comedians talk about: this is deeply honest, because people can't help but laugh, and if they don't laugh, that means you're not funny. If they laugh, that's funny. And you're showing .
speaker 1: You need a lot of knowledge to create humor about, like, the human condition and so on. And then you need to be clever with it.
speaker 2: You mentioned a few movies. You tweeted, "Movies that I've seen five plus times, but I'm ready and willing to keep watching": Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, Lord of the Rings (all three), Avatar, Fifth Element, and so on. It goes on. Terminator 2, Mean Girls. I'm not going to ask about that.
speaker 1: Mean Girls is great.
speaker 2: What are some that jump out in your memory that you love, and why? Like, you mentioned The Matrix. As a computer person, why do you love The Matrix?
speaker 1: There are so many properties that make it, like, beautiful and interesting. So there are all these philosophical questions, but then there are also AGIs, and there's simulation, and it's cool, and there's, you know, the black, you know,
speaker 2: the look of it, the feel of it.
speaker 1: the look of it, the feel of it, the action, the bullet time. It was just like innovating in so many ways.
speaker 2: And then Good Will Hunting. Why do you like that one? Yeah.
speaker 1: I just, I really like this tortured genius sort of character who's like grappling with whether or not he has like any responsibility or like what to do with this gift that he was given or like how to think about the whole thing. And there's also a dance between .
speaker 2: the genius and the personal, like what it means to love another human being. And there's a lot of themes.
speaker 1: There is just a beautiful movie.
speaker 2: And then the fatherly figure, the mentor and the psychiatrist and the.
speaker 1: It really messes with you. You know, there are some movies that just really mess with you on a deep level.
speaker 2: Do you relate to that movie at all?
speaker 1: No.
speaker 2: Slightly full hundred. I said Lord of the Rings. That's self-explanatory. Terminator 2, which is interesting. You rewatch that a lot. Is that better than Terminator 1? You like Arnold?
speaker 1: I do like Terminator 1 as well. I like Terminator 2 a little bit more, but in terms of its surface properties.
speaker 2: Do you think Skynet is at all a possibility?
speaker 1: Oh, yes.
speaker 2: Like the actual sort of autonomous weapon system kind of thing. Do you worry about that stuff? AI being used for war?
speaker 1: I do worry. I 100% worry about it. And so, I mean, some of these fears of AI and how this will pan out. I mean, these will be very powerful entities, probably, at some point. And so for a long time, they are going to be tools in the hands of humans. People talk about alignment of AGIs and how to make that happen. The problem is that even humans are not aligned. How this will be used and what this is going to look like is, yes, troubling.
speaker 2: So do you think it will happen slowly enough that we'll be able to, as a human civilization, think through the problems?
speaker 1: Yes, that's my hope, that it happens slowly enough and in an open enough way where a lot of people can see and participate in it. Just figure out how to deal with this transition, I think, which is going to be interesting.
speaker 2: I draw a lot of inspiration from nuclear weapons, because I sure thought we would be fucked once they developed nuclear weapons. But it's almost like when the systems are not so dangerous as to destroy human civilization, we deploy them and learn the lessons. And then, if it's too dangerous, we might still deploy it, but we very quickly learn not to use them. And so there will be this balance achieved. Humans are very clever as a species. It's interesting. We exploit the resources as much as we can, but we avoid destroying ourselves, it seems like.
speaker 1: Well, I don't know about that, actually. I hope it continues. I mean, I'm definitely concerned about nuclear weapons and so on, not just as a result of the recent conflict; even before that. That's probably my number one concern for humanity.
speaker 2: So if humanity destroys itself, or destroys, you know, 90% of people, that would be because of nukes?
speaker 1: I think so. And it's not even about the full destruction. To me, it's bad enough if we reset society. That would be terrible. It would be really bad. And I can't believe we're so close to it.
speaker 2: Yeah, it's so crazy to me. It feels like we might
speaker 1: be a few tweets away from something like that. Yeah, basically, it's extremely unnerving.
speaker 2: And has been for me for a long time. It seems unstable that world leaders, just having a bad mood, can take one step in a bad direction and it escalates. Yeah. And because of a collection of bad moods, it can escalate without being able to stop.
speaker 1: Yes, it's just a huge amount of power. And then also with the proliferation, I basically don't actually know what the good outcomes are here. So I'm definitely worried about that a lot. And then AGI is not currently there, but I think at some point it will more and more become something like it. The danger with AGI is that I think it's even slightly worse, in the sense that there are good outcomes of AGI, and then the bad outcomes are like an epsilon away, like a tiny one away. And so I think capitalism and humanity and so on will drive for the positive ways of using that technology. But then if bad outcomes are just a tiny flip of a minus sign away, that's a really bad position to be in.
speaker 2: A tiny perturbation of the system results in the destruction of the human species. It's a weird line to walk. Yeah.
speaker 1: I think in general, what's really weird about the dynamics of humanity and this explosion we talked about is just the insane coupling afforded by technology and the instability of the whole dynamical system. I think it just doesn't look good, honestly.
speaker 2: Yeah. So that explosion could be destructive or constructive, and the probabilities are non-zero in both.
speaker 1: I do feel like I have to try to be optimistic and so on. And yes, I think even in this case, I still am predominantly optimistic, but there's definitely
speaker 2: Me too. Do you think we'll become a multiplanetary species?
speaker 1: Probably, yes. But I don't know if it's a dominant feature of future humanity. There might be some people on some planets and so on, but I'm not sure if it's, yeah, a major player in our culture and so on.
speaker 2: We still have to solve the drivers of self-destruction here on Earth. So just having a backup on Mars is not going to solve the problem.
speaker 1: So by the way, I love the backup on Mars. I think that's amazing. You should absolutely do that. Yes. And I'm so thankful.
speaker 2: And would you go to Mars personally?
speaker 1: No, I do like Earth quite a lot.
speaker 2: Okay. I'll go to Mars. I'll go, I'll tweet at you from there.
speaker 1: Maybe eventually I would, once it's safe enough, but I don't actually know if it's on my lifetime scale, unless I can extend it by a lot. I do think that, for example, a lot of people might disappear into virtual realities and stuff like that. And I think that could be the major thrust of the cultural development of humanity, if it survives. So it might not be. It's just really hard to work in the physical realm and go out there. And I think ultimately all your experiences are in your brain. And so it's much easier to disappear into the digital realm. And I think people will find it more compelling, easier, safer, more interesting.
speaker 2: So you're a little bit captivated by virtual reality, by the possible worlds, whether it's the metaverse or some other manifestation of that. Yeah, it's really interesting. And I'm interested, just talking a lot to Carmack. Where's the thing that's currently preventing that?
speaker 1: Yeah. I mean, to be clear, I think what's interesting about the future is that I kind of feel like the variance in the human condition grows. That's the primary thing that's changing. It's not as much the mean of the distribution; it's the variance of it. So there will probably be people on Mars, and there will be people in VR, and there will be people here on Earth. There will just be so many more ways of being. And so I kind of see it as a spreading out of the human experience.
speaker 2: There's something about the Internet that allows you to discover those little groups, and you gravitate to them. Something about your biology likes that kind of world, and you find each other.
speaker 1: And we'll have transhumanists, and then we'll have the Amish, and everything is just going to coexist.
speaker 2: You know, the cool thing about it, because I've interacted with a bunch of Internet communities, is they don't know about each other. You can have a very happy existence just having a very close-knit community and not knowing about each other. I mean, you even sense this just having traveled to Ukraine: they don't know so many things about America. When you travel across the world, I think you experience this too. There are certain cultures that have their own thing going on. And so you can see that happening more and more in the future. We have little communities.
speaker 1: Yeah. I think so. That seems to be how it's going right now. And I don't see that trend really reversing. I think people are diverse and they're able to choose their own path in existence. And I sort of celebrate that.
speaker 2: And so will you spend much time in the metaverse, in virtual reality? Or which community are you? Are you the physicalist, the physical reality enjoyer? Or do you see yourself drawing a lot of pleasure and fulfillment in the digital world?
speaker 1: Yeah. I think currently virtual reality is not that compelling. I do think it can improve a lot, but I don't really know to what extent. Maybe there are even more exotic things you can think about, with Neuralinks or stuff like that. So currently I kind of see myself as mostly a team human person. I love nature. I love harmony. I love people. I love humanity. I love the emotions of humanity. And I just want to be in this solarpunk little utopia. That's my happy place. My happy place is people I love, thinking about cool problems, surrounded by lush, beautiful, dynamic nature, and secretly high tech in places that count.
speaker 2: Places where they use technology to empower that love for other humans and nature.
speaker 1: Yeah, I think technology used very sparingly. I don't love when it gets in the way of humanity in many ways. I like just people being humans in the way we sort of slightly evolved and prefer, I think, just by default.
speaker 2: People kept asking me, because they know you love reading: are there particular books that you enjoyed, that had an impact on you, for silly or for profound reasons, that you would recommend? You mentioned The Vital Question.
speaker 1: Many, of course. I think in biology, as an example, The Vital Question is a good one. Anything by Nick Lane, really. Life Ascending, I would say, is a bit more potentially representative, as a summary of a lot of the things he's been talking about. I was very impacted by The Selfish Gene. I thought that was a really good book that helped me understand altruism, as an example, and where it comes from. And just realizing that, you know, the selection is on the level of genes was a huge insight for me at the time. And it cleared up a lot of things for me.
speaker 2: What do you think about the idea that ideas are the organisms, the memes?
speaker 1: Love it. 100%.
speaker 2: Are you able to walk around with that notion for a while, that there is an evolutionary kind of process with ideas as well?
speaker 1: Absolutely. Yes. There are memes just like genes, and they compete and they live in our brains.
speaker 2: It's beautiful. Are we silly humans thinking that we're the organisms? Is it possible that the primary organisms are the ideas?
speaker 1: Yeah, I would say the ideas kind of live in the software of our civilization, in the minds and so on.
speaker 2: We think as humans that the hardware is the fundamental thing. A human is a hardware entity. Yeah, but it could be the software, right?
speaker 1: Yeah. I would say there needs to be some grounding at some point to a physical reality.
speaker 2: Yeah, but if we clone an Andrej, the software is the thing. Like, this is the thing that makes that thing special, right?
speaker 1: Yeah, I guess you're right.
speaker 2: But then cloning might be exceptionally difficult. There might be a deep integration between the software and the hardware in ways we don't quite understand.
speaker 1: Well, from the evolution point of view, what makes me special is more like the gang of genes that are riding in my chromosomes, I suppose, right? They're the replicating unit, I suppose.
speaker 2: No, but that's just part of the thing that makes you special. Sure. Well, the reality is what makes you special is your ability to survive based on the software that runs on the hardware that was built by the genes. So the software is the thing that makes you survive, not the hardware.
speaker 1: All right. So both. It's just like a second layer. It's a new second layer that hasn't been there before, the brain.
speaker 2: They both coexist, but there are also layers of the software. I mean, it's an abstraction on top of abstractions.
speaker 1: But okay, so Selfish Gene and Nick Lane. I would say sometimes books are not sufficient. I like to reach for textbooks sometimes. I kind of feel like books are for too much of a general consumption sometimes, and they're too high up in the level of abstraction, and it's not good enough. So I like textbooks. I like The Cell. I think The Cell was pretty cool. That's also why I like the writing of Nick Lane: he's pretty willing to step one level down. He's willing to go there, but he's also willing to be throughout the stack. So he'll go down to a lot of detail, but then he will come back up. Yeah, basically, I really appreciate that.
speaker 2: That's why I love college, early college, even high school: just textbooks on the basics of computer science, of mathematics, of biology, of chemistry. Yes, those are condensed down. It's sufficiently general that you can understand both the philosophy and the details, but also you get homework problems and you get to play with it as much as you would if you were programming stuff.
speaker 1: Yeah. And then I'm also suspicious of textbooks, honestly, because, as an example, in deep learning there are no amazing textbooks and the field is changing very quickly. I imagine the same is true in, say, synthetic biology and so on. Books like The Cell are kind of outdated. They're still high level. What is the actual real source of truth? It's people in wet labs working with cells, you know, sequencing genomes, and actually working with it. And I don't have that much exposure to that or what that looks like. So I still don't fully understand. I'm reading through The Cell and it's kind of interesting and I'm learning, but it's still not sufficient, I would say, in terms of understanding.
speaker 2: It's a clean summarization of the mainstream narrative. Yeah, but you have to learn that before you break out towards the cutting edge.
speaker 1: Yeah. But what is the actual process of working with these cells and growing them and incubating them? It's kind of like a massive cooking recipe: making sure your cells live and proliferate, and then you're sequencing them, running experiments. Just how that works, I think, is kind of the source of truth of, at the end of the day, what's really useful in terms of creating therapies and so on.
speaker 2: Yeah. I wonder what AI textbooks will be in the future, because, you know, there's Artificial Intelligence: A Modern Approach. I actually haven't read the recent version; there's been a recent edition. I also saw there's a Science of Deep Learning book. I'm waiting for textbooks that are worth recommending, worth reading. It's tricky because it's like papers and code, code, code.
speaker 1: Honestly, I find papers are quite good. I especially like the appendix of any paper as well. It's the most detail you can have.
speaker 2: It doesn't have to be cohesive, connected to anything else. You just describe in a very specific way how you solved the particular thing.
speaker 1: Yeah. Many times papers can actually be quite readable. Not always, but sometimes the introduction and the abstract are readable even for someone outside of the field. This is not always true. And sometimes, unfortunately, I think scientists use complex terms even when it's not necessary. I think that's harmful.
speaker 2: I think there's no reason for that. And papers sometimes are longer than they need to be in the parts that don't matter. Yeah. The appendix should be long, but then the paper itself, you know, look at Einstein: make it simple.
speaker 1: Yeah, but certainly I've come across papers, I would say in, like, synthetic biology or something, that I thought were quite readable for the abstract and the introduction. And then you're reading the rest of it and you don't fully understand, but you kind of are getting a gist. And I think it's cool.
speaker 2: What advice would you give? You give advice to folks interested in machine learning and research, but in general, life advice to a young person, high school, early college, about how to have a career they can be proud of, or a life they can be proud of.
speaker 1: Yeah, I think I'm very hesitant to give general advice. I think it's really hard. Some of the stuff I've mentioned is fairly general, I think. Like, focus on just the amount of work you're spending on a thing. Compare yourself only to yourself, not to others.
speaker 2: That's good.
speaker 1: I think those are fairly general.
speaker 2: How do you pick the thing?
speaker 1: You just have a deep interest in something, or try to find the argmax over the things that you're interested in.
speaker 2: Argmax at that moment, and stick with it. How do you not get distracted and switch to another thing?
speaker 1: You can, if you like.
speaker 2: If you do an argmax repeatedly every week, it doesn't converge.
speaker 1: It doesn't. It's a problem. Yeah, you can low-pass filter yourself in terms of what has consistently been true for you. But yeah, I definitely see how it can be hard. But I would say you're going to work the hardest on the thing that you care about the most. So low-pass filter yourself and really introspect on your past. What are the things that gave you energy, and what are the things that took energy away from you? Concrete examples. And usually from those concrete examples, sometimes patterns can emerge: I like it when things look like this, when I'm in these positions.
speaker 2: So that's not necessarily the field, but the kind of stuff you're doing in a particular field. So for you, it seems like you are energized by implementing stuff.
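The argmax and low-pass filter idea in this exchange can be illustrated with a small Python sketch. Everything in it is hypothetical and not from the conversation: the weekly interest scores are made-up numbers, and the `low_pass_picks` helper and its smoothing factor `alpha` are assumptions chosen for illustration. An exponential moving average is used here as one simple kind of low-pass filter. Taking the argmax of the raw scores each week jumps between activities, while taking the argmax of the smoothed scores stays put.

```python
# Hypothetical weekly "interest scores" for three activities (made-up numbers).
weekly_scores = [
    {"teaching": 0.6, "building": 0.7, "reading": 0.5},
    {"teaching": 0.8, "building": 0.7, "reading": 0.4},
    {"teaching": 0.5, "building": 0.7, "reading": 0.9},
    {"teaching": 0.6, "building": 0.8, "reading": 0.3},
]


def argmax(scores):
    """Return the key with the highest score."""
    return max(scores, key=scores.get)


def low_pass_picks(weeks, alpha=0.3):
    """Smooth each score with an exponential moving average, then argmax.

    alpha is an assumed smoothing factor: smaller values smooth more heavily.
    """
    smoothed = dict(weeks[0])  # start the filter at the first week's scores
    picks = [argmax(smoothed)]
    for week in weeks[1:]:
        for name, score in week.items():
            smoothed[name] = alpha * score + (1 - alpha) * smoothed[name]
        picks.append(argmax(smoothed))
    return picks


# Argmax on the raw scores, recomputed every week: the pick keeps changing.
raw_picks = [argmax(week) for week in weekly_scores]
# Argmax on the low-pass-filtered scores: the pick is stable.
filtered_picks = low_pass_picks(weekly_scores)

print(raw_picks)
print(filtered_picks)
```

With these made-up numbers, the raw weekly pick changes three times over four weeks, while the filtered pick stays on the activity that scored consistently well rather than on whichever one spiked that week.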
speaker 1: Building actual things, yeah, being low level, learning, and then also communicating so that others can go through the same realizations, and shortening that gap. Because I usually have to do way too much work to understand a thing. And then I'm like, okay, I think I get it. And why was it so much work? It should have been much less work. And that gives me a lot of frustration, and that's why I sometimes go teach.
speaker 2: So aside from the teaching you're doing now, putting out videos, aside from a potential Godfather Part II with the AGI at Tesla and beyond, what does the future for Andrej Karpathy hold? Have you figured that out yet or no? I mean, as you see through the fog of war that is all of our future, do you start seeing silhouettes of what that possible future could look like?
speaker 1: The consistent thing I've always been interested in, for me at least, is AI, and that's probably what I'm spending the rest of my life on, because I just care about it a lot. And I actually care about many other problems as well, like, say, aging, which I basically view as a disease, and I care about that as well. But I don't think it's a good idea to go after it specifically. I don't actually think that humans will be able to come up with the answer. I think the correct thing to do is to ignore those problems, solve AI, and then use that to solve everything else. And I think there's a chance that this will work. I think it's a very high chance. And that's kind of the way I'm betting, at least.
speaker 2: So when you think about AI, are you interested in all kinds of applications, all kinds of domains? And any domain you focus on will allow you to get insights into the big problem of AGI?
speaker 1: Yeah, for me, it's the ultimate meta problem. I don't want to work on any one specific problem. There are too many problems. So how can you work on all problems simultaneously? You solve the meta problem, which to me is just intelligence. And how do you automate it?
speaker 2: Are there cool small projects, like arxiv-sanity and so on, that you're thinking about, that the world, the ML world, can anticipate?
speaker 1: There are always some fun side projects. Yeah, arxiv-sanity is one. Basically, there are way too many arXiv papers. How can I organize them and recommend papers and so on? I transcribed all of your, yeah, podcasts.
speaker 2: What did you learn from that experience, from transcribing? You like consuming audiobooks and podcasts and so on, and here's a process that achieves closer to human-level performance in annotation.
speaker 1: Yeah. Well, I definitely was surprised that transcription with OpenAI's Whisper was working so well compared to what I'm familiar with from Siri and a few other systems, I guess. It works so well. And that's what gave me some energy to try it out. And I thought it could be fun to run on podcasts. It's kind of not obvious to me why Whisper is so much better compared to anything else, because I feel like there should be a lot of incentive for a lot of companies to produce transcription systems, and they've done so over a long time. Whisper is not a super exotic model. It's a transformer. It takes mel spectrograms and, you know, just outputs tokens of text. It's not crazy. The model and everything has been around for a long time. I'm not actually 100% sure why this is.
speaker 2: It's not obvious to me either. It makes me feel like I'm missing something. Yeah, because it's huge: even Google and so on, YouTube transcription. Yeah, it's unclear, but some of it is also integrating into a bigger system: the user interface, how it's deployed, and all that kind of stuff. Maybe running it as an independent thing is much easier, like an order of magnitude easier, than deploying to a large integrated system like YouTube transcription, or anything like meetings. Zoom has transcription that's kind of crappy. But creating an interface where it detects the different individual speakers, is able to display it in compelling ways, runs in real time, all that kind of stuff, maybe that's difficult. That's the only explanation I have, because I'm currently paying quite a bit for human transcription, human caption annotation, and it seems like there's a huge incentive to automate that. Yeah, it's very confusing.
speaker 1: I think, I mean, I don't know if you looked at some of the Whisper transcripts, but they're quite good.
speaker 2: They're good. And especially in tricky cases. Yeah, I've seen Whisper's performance on super tricky cases and it does incredibly well. So I don't know. A podcast is pretty simple. It's high quality audio and you're speaking usually pretty clearly. And so I don't know. I don't know what OpenAI's plans are either.
speaker 1: But yeah, there are always fun projects, basically. And Stable Diffusion also is opening up a huge amount of experimentation, I would say, in the visual realm, in generating images and videos and movies. Ultimately videos now. And so that's going to be pretty crazy. That's going to almost certainly work, and it's going to be really interesting when the cost of content creation falls to zero. You used to need a painter for a few months to paint a thing, and now it's going to be: speak to your phone to get your video.
speaker 2: So Hollywood will start using it to generate scenes, which completely opens up, yeah, so you can make a movie like Avatar eventually for under a million dollars.
speaker 1: Much less. Maybe just by talking to your phone. I mean, I know it sounds kind of crazy.
speaker 2: And then there would be some voting mechanism. Like, would there be a show on Netflix that's generated completely automatically?
speaker 1: Potentially, yeah. And what does it look like also when you can just generate it on demand and there's infinity of it?
speaker 2: Yeah. Oh man, all the synthetic content. I mean, it's humbling, because we treat ourselves as special for being able to generate art and ideas and all that kind of stuff. If that can be done in an automated way by AI.
speaker 1: Yeah, I think it's fascinating to me how the predictions of AI, what it's going to look like and what it's going to be capable of, are completely inverted and wrong. The sci-fi of the 50s and 60s was just totally not right. They imagined AI as super calculating theorem provers, and we're getting things that can talk to you about emotions. They can do art. It's just weird.
speaker 2: Are you excited about that future? Just AIs, like hybrid systems, heterogeneous systems of humans and AIs talking about emotions, Netflix and chill with an AI system, where the Netflix thing you watch is also generated by AI?
speaker 1: I think it's going to be interesting for sure. And I think I'm cautiously optimistic, but it's not obvious.
speaker 2: Well, the sad thing is your brain and mine developed in a time before Twitter, before the Internet. So I wonder if people that are born inside of it might have a different experience. Like, maybe you and I will still resist it, and the people born now will not.
speaker 1: Well, I do feel like humans are extremely malleable. Yeah, and you're probably right.
speaker 2: What is the meaning of life, Andrej? We talked about sort of the universe having a conversation with us humans, or with the systems we create, to try to answer. For the universe, for the creator of the universe to notice us, we're trying to create systems that are loud enough to answer back.
speaker 1: I don't know if that's the meaning of life. That's like meaning of life for some people. The first level answer, I would say, is anyone can choose their own meaning of life, because we are conscious entities and it's beautiful. Number one. But I do think that a deeper meaning of life, if someone is interested, is along the lines of: what the hell is all this, and why? And if you look into fundamental physics and quantum field theory and the Standard Model, they're very complicated. And there are these, you know, 19 free parameters of our universe, and what's going on with all this stuff, and why is it here? And can I hack it? Can I work with it? Is there a message for me? Am I supposed to create a message? And so I think there are some fundamental answers there, but I think you can't actually really make a dent in those without more time. And so to me also, there's a big question around just getting more time, honestly. Yeah, that's kind of what I think about quite a bit as well.
speaker 2: So kind of the ultimate, or at least the first, way to sneak up on the why question is to try to escape the system, the universe. And then for that you sort of backtrack and say, okay, that's going to take a very long time. So the why question boils down, from an engineering perspective, to how do we extend.
speaker 1: Yeah, I think that's the question number one, practically speaking, because you're not going to calculate the answer to the deeper questions in the time you have.
speaker 2: And that could be extending your own lifetime or extending just the lifetime of human civilization.
speaker 1: Of whoever wants to. Many people might not want that. But I think people who do want that, I think it's probably possible. And I don't know that people fully realize this. I kind of feel like people think of death as an inevitability. But at the end of the day, this is a physical system. Some things go wrong. It makes sense why things like this happen, evolutionarily speaking, and there are most certainly interventions that mitigate it.
speaker 2: That would be interesting if death is eventually looked at as a fascinating thing that used to happen to humans.
speaker 1: I don't think it's unlikely. I think it's likely.
speaker 2: And it's up to our imagination to try to predict what the world without death looks like. It's hard to. I think the values will completely change.
speaker 1: Could be. I don't really buy all these ideas that, oh, without death there's no meaning, there's nothing. I don't intuitively buy all those arguments. I think there's plenty of meaning, plenty of things to learn. They're interesting, exciting. I want to know, I want to calculate. I want to improve the condition of all the humans and organisms that are alive.
speaker 2: Yeah, the way we find meaning might change. There are a lot of humans, probably including myself, that find meaning in the finiteness of things. But that doesn't mean that's the only source of meaning.
speaker 1: Yeah. I do think many people will go with that, which I think is great. I love the idea that people can just choose their own adventure. You are born as a conscious, free entity, by default, I'd like to think. And you have your unalienable rights for life.
speaker 2: In the pursuit of happiness. I don't know if you have that in the nature, the landscape of happiness.
speaker 1: You can choose your own adventure, mostly. And that's not fully true, but.
speaker 2: I'm still pretty sure I'm an NPC, but an NPC can't know it's an NPC. There could be different degrees and levels of consciousness. I don't think there's a more beautiful way to end it. Andrej, you're an incredible person. I'm really honored you would talk with me. Everything you've done for the machine learning world, for the AI world, to just inspire people, to educate millions of people. It's been great. And I can't wait to see what you do next. It's been an honor, man. Thank you so much for talking today.
speaker 1: Awesome. Thank you.
speaker 2: Thanks for listening to this conversation with Andrej Karpathy. To support this podcast, please check out our sponsors in the description. And now let me leave you with some words from Samuel Karlin: "The purpose of models is not to fit the data, but to sharpen the questions." Thanks for listening and hope to see you next time.