2025-04-24 | Anthropic | Lessons on AI agents from Claude Plays Pokemon
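How AI agents use Pokemon to test their ability to handle complex tasks
Tags
Media details
- Upload date
- 2025-06-11 14:50
- Source
- https://www.youtube.com/watch?v=CXhYDOvgpuU
- Processing status
- Completed
- Transcription status
- Completed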
- Latest LLM Model
- gemini-2.5-pro-preview-06-05
Transcript
speaker 1: The AI world has talked so much about agents in the last year, but for a lot of the world, I think it's pretty hard to really understand what that means. And I think the example of Pokemon, where it's not just a chatbot where I type in a chat and get a response back, but it's doing this on its own, seeing and trying things and taking actions and all that, is a way that more people have been able to latch onto what this agents thing is that we're talking about. Take one.
speaker 2: Today, we're diving into the story behind Claude Plays Pokemon. I'm Alex. I lead Claude Relations here at Anthropic.
speaker 1: And I'm David. I'm on our Applied AI team and I made Claude Plays Pokemon.
speaker 2: David, can we get a broad overview of what Claude Plays Pokemon is?
speaker 1: For those who aren't aware, Claude Plays Pokemon is a sort of experiment with agents that hooks up Claude, our language model, to the Game Boy game Pokemon Red and lets it actually try to play the game. We started from square one: start a new game and see how Claude does learning how to be a Pokemon master.
speaker 2: Okay, there's a lot there that we're going to have to get into on the technical side, and also just generally: how is Claude playing Pokemon? How does it know how to play? But how did this actually come about? What was the idea for this? And why Pokemon?
speaker 1: Yeah. So the core genesis of why I started this: I work with customers at Anthropic. I spend a lot of my day working with our customers, and it was pretty obvious to me last year that agents were the most important thing happening for our customers. It's where they were creating value. And I just wanted some sort of test bed for myself to experiment with agents, to get a feel for how Claude works when it needs to take a whole bunch of actions in a row without any conversation with a human.
I had actually seen that someone before me at Anthropic, Elliot, had hooked up Claude to Pokemon. So it was sitting in the back of my brain, and then I thought, I really want some sort of place to do my own experimentation with agents. I'm a long-term Pokemon fanboy. It was the first game I ever got when I was a kid, so that was really the initial... And this was June of 2024, last year, when we had a new model coming out, 3.5 Sonnet. It just seemed like the perfect time to give it a go.
speaker 2: Besides that nostalgic component, why specifically choose Pokemon as the test bed? Why not Claude plays Mario or Zelda or another game?
speaker 1: Pokemon is actually a really good setup for this. Language models, for all their graces right now, are slow. They take one generation at a time. They basically can only see a snapshot of the game at any point in time. And in Pokemon there are essentially no consequences for sitting and waiting for a while, right? The combat is turn-based, so you just have to press a button and wait to see what happens. With movement, not much happens when you're not moving. So in terms of games for a model to be able to play, it's sort of perfectly set up. And then separately, why games in the first place is also a pretty reasonable question. One of the fun things about games is that they're one of these unique things where you do something for a really long period of time and get pretty clear feedback on whether you're making progress. Whether it's a score counter in Pong that shows whether you're beating your opponent, or in Pokemon, am I getting gym badges, am I making my way through the game, you actually get a feedback loop where you can see whether the model is being successful. Can it do this thing?
And so games just happen to be this great environment where you can let a model try to do something for a long time and, by the structure of the games themselves, get a sense of how well it is doing in some measurable way.
speaker 2: Right. Okay. So to summarize, Pokemon is great because it's turn-based. It's not synchronous, necessarily. You're not playing against other people. So we couldn't have Claude plays Call of Duty or something.
speaker 1: That does not work right now. It would be really hard. If you could get Claude running at 60 frames a second, maybe you could play some other game. But practically, right now...
speaker 2: Yeah, exactly. Right. And then games are great because they're their own simulated, contained environment where it can go through an entire process. So we have Claude playing Pokemon. You've started to work on this. But how does that actually work? How is Claude controlling the character in the game and actually making things happen?
speaker 1: In general, you can think of the prompt for Claude to start as basically "you're playing Pokemon." It's really simple. I just tell the model it's playing Pokemon and that's it. And then you hook up Claude with a set of tools that it needs to be able to actually interact with the game. It's actually a pretty simple set of tools. The minimal set you get in a Game Boy game is pressing buttons on the Game Boy. So basically you tell Claude it's allowed to press A, press B, hit up, down, left, right, that kind of thing. You describe that tool, and behind the scenes I have to implement some code so that when Claude says it wants to press A, that actually goes and presses A on the emulator.
speaker 2: I see. So the tools are options for the next action that it can take, and you get the output from Claude and then translate it into the action.
speaker 1: Bingo.
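As a rough illustration of the button-press tool described above, the sketch below shows one way a tool description and its dispatcher could look. Everything here is an assumption made for illustration: the tool name `press_button`, the schema layout, the `FakeEmulator` stand-in, and the inclusion of Start and Select are not taken from the interview, which only says that Claude may press A, B, and the direction buttons and that some code forwards the press to the emulator.

```python
from dataclasses import dataclass, field

# Hypothetical button list; the interview names A, B and the four directions.
# Start and Select are added here only because they exist on a Game Boy.
VALID_BUTTONS = ("a", "b", "start", "select", "up", "down", "left", "right")

# Hypothetical tool description in a JSON-schema style. The real project's
# tool schema is not given in the source.
PRESS_BUTTON_TOOL = {
    "name": "press_button",
    "description": "Press one button on the Game Boy.",
    "input_schema": {
        "type": "object",
        "properties": {
            "button": {"type": "string", "enum": list(VALID_BUTTONS)},
        },
        "required": ["button"],
    },
}


@dataclass
class FakeEmulator:
    """Stand-in for a real emulator; it only records what was pressed."""

    pressed: list = field(default_factory=list)

    def press(self, button: str) -> None:
        self.pressed.append(button)


def handle_tool_call(emulator, name: str, arguments: dict) -> str:
    """Translate one tool call from the model into an emulator action."""
    if name != "press_button":
        return f"error: unknown tool {name!r}"
    button = str(arguments.get("button", "")).lower()
    if button not in VALID_BUTTONS:
        return f"error: unknown button {button!r}"
    emulator.press(button)
    return f"pressed {button}"
```

Returning an error string instead of raising is a design choice in this sketch: the model can read the message on its next turn and pick a valid button.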
speaker 1: So the whole game is essentially Claude taking a series of actions, mostly just pressing buttons one at a time. Now, the key feedback loop, and how it does this, is that every time it presses a button, we send it back a screenshot. So Claude actually sees the screen of the Game Boy. It can see, I'm in this room, or I'm in a Pokemon battle, or whatever it is, and so the next thing I want to do is press A to talk to this person or to select this move or whatever it is. And then it just does that forever, it iterates on that. Now, practically, we've actually given it a handful of other tools that help it do things like manage its memory over a long period of time. Claude has a limited amount of context. It can't actually fit a whole Game Boy playthrough in one context window, so there's a lot you need to do to manage the details of how to get that right. But at its core, it's sort of: press buttons, see if you can play the game, and we give it some feedback along the way so it knows what it's looking at.
speaker 2: Yeah. And so as you've been constructing that kind of harness around Claude to get it to play, you're running into some of those limitations. One of the things you've encountered is the memory issue. Maybe you can speak a little bit about what you had to do specifically there.
speaker 1: Yeah. I'm going to zoom out from that question for a second first. There are all sorts of ways that people have come up with to build these agents that have to take actions for a long time. There's this general concept of an agent, which is just that the model needs to take some set of actions, and we really don't know what they are ahead of time, but the model is going to have to do something, see what happens, do the next thing, learn.
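The press-a-button, get-a-screenshot loop described above can be sketched in a few lines. This is a toy under stated assumptions, not the project's harness: `ToyGame` replaces the emulator, the "screenshot" is a text string rather than an image, `choose_button` stands in for a call to the model, and the loop stops at a goal only so that the example terminates.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


class ToyGame:
    """Toy stand-in for an emulator: a character in a one-dimensional corridor."""

    def __init__(self, goal: int = 3) -> None:
        self.position = 0
        self.goal = goal

    def press(self, button: str) -> None:
        if button == "right":
            self.position += 1
        elif button == "left":
            self.position = max(0, self.position - 1)

    def screenshot(self) -> str:
        # A real harness would return image data; text keeps the sketch runnable.
        return f"position={self.position} goal={self.goal}"


def run_agent(game: ToyGame, choose_button: Callable[[History], str], max_steps: int) -> History:
    """Alternate between asking for an action and feeding back a screenshot."""
    history: History = [("screenshot", game.screenshot())]
    for _ in range(max_steps):
        button = choose_button(history)  # hypothetical stand-in for a model call
        game.press(button)
        history.append(("action", button))
        history.append(("screenshot", game.screenshot()))
        if game.position == game.goal:
            break
    return history


def always_right(history: History) -> str:
    """Trivial policy used only to exercise the loop."""
    return "right"
```

Calling `run_agent(ToyGame(goal=3), always_right, max_steps=10)` produces a history that alternates actions and screenshots, which is the growing record that eventually runs into the context limit discussed next.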
And people have come up with all sorts of crazy ways to wire that up, to try to make models better in any number of situations. So the first thing I did was just try other people's ideas, basically. There's this paper called Voyager. That was the first thing I tried. It was, I think, a 2023 paper from NVIDIA that they used to play Minecraft. That gives the model, I won't dive into it, but a lot of fancy tools and stuff. And I sort of stripped back to my own simple version of it from there. So what you start with is the world's simplest version: all you can do is press buttons and get a screenshot. And the first wall you run into is what happens when you hit 200,000 tokens, which is the limit for Claude. Practically, that's something like pressing 50 buttons, and getting a screenshot back each time fills it up. So you press 50 buttons and you have barely started the game in Pokemon. You run into this wall pretty quickly, where if you do nothing, you just run out of space, then you crash and you're done. So one of the first things you need to figure out is, what do I do when I run out of space? There are two key insights in how I went about this, at least, and I think it's actually pretty well reflected in how the industry thinks about agents these days. The first is some concept of long-term memory. I give Claude the ability to store memories in what I call a knowledge base, where it basically says, I just did this thing, or I have these Pokemon, or here are my goals and I've checked off six of them, or whatever it is, and you'll see it make incremental updates. And that persists over time.
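To make the knowledge base idea concrete, here is a minimal sketch of a store that the model could update through a tool and that the harness could render into every prompt. The class name, the section names, and the rendering format are all assumptions for illustration; the interview only says that Claude stores notes such as what it just did, which Pokemon it has, and which goals are checked off.

```python
class KnowledgeBase:
    """Small persistent note store, keyed by section name (hypothetical design)."""

    def __init__(self) -> None:
        self.sections: dict = {}

    def update(self, section: str, content: str) -> None:
        """Create a section or overwrite it with an incremental update."""
        self.sections[section] = content

    def render(self) -> str:
        """Render all notes as plain text to be included in the next prompt."""
        return "\n\n".join(
            f"[{name}]\n{content}" for name, content in self.sections.items()
        )
```

A usage example with made-up notes:

```python
kb = KnowledgeBase()
kb.update("goals", "1. Find Professor Oak")
kb.update("party", "one starter Pokemon")
kb.update("goals", "1. Find Professor Oak (done)")
print(kb.render())
```

Because the store lives outside the conversation, its contents survive when older messages are dropped.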
It can always look at that and keep track of things. Then the second key is that once I fill up the context length, I have Claude summarize back down, to turn the 50 actions it just took into one short summary. So you delete a whole bunch of stuff and reset, but it has this set of long-term memories that give it some ability to remember things over the course of, potentially... I mean, if you look at the Twitch stream now, it's been running for three weeks continuously. It's probably summarized itself a few thousand times by now.
speaker 2: Wow.
speaker 1: And so it's really important for it to have some sort of long-term memory to remember the last three weeks of what it's done.
speaker 2: So the long-term memory there, it's writing out to this external knowledge base, which is basically a plain text file? It's kind of like the movie Memento, where they're writing sticky notes on the wall to keep track of everything: I've been here, I've done this.
speaker 1: Yeah, that's exactly right. If you did it naively and just deleted the messages, Claude would have amnesia, and it would be like: how did I get here? Why am I partway through Pokemon? You told me I'm just starting a new game. What's going on? In some sense, these are the Post-it notes that Claude leaves itself so that when it has the reset and you remove a lot of what it has recently seen, it has some way to remember. All right, I know what I've done so far. I have a sense of where I'm at. Maybe I've even learned some lessons, like this strategy works really well. I've learned some stuff that I want to keep around because for the next ten minutes that I have, it's going to be really helpful.
speaker 2: That makes sense. I think it's probably pretty important to clarify here, or maybe explain further: did we train Claude on this? How does it actually know how to navigate around Pokemon?
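The summarize-and-reset step described above can be sketched as follows. This is a simplified assumption-laden version: the token estimate of roughly four characters per token is a crude placeholder, `summarize` is a caller-supplied function standing in for asking the model to summarize, and the function names are hypothetical. For scale, the figures quoted in the interview (a 200,000-token limit filled by about 50 button presses with screenshots) work out to roughly 4,000 tokens per step.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude placeholder estimate: about four characters per token."""
    return max(1, len(text) // 4)


def compact_if_needed(
    messages: List[str],
    limit_tokens: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace the whole history with one summary once it exceeds the limit."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit_tokens:
        return messages
    # Everything recent is dropped; only the summary (plus any external
    # notes kept elsewhere) carries information forward.
    return [summarize(messages)]
```

In this sketch the history collapses to a single message, so anything the summary omits is lost unless it was also written to long-term notes, which is the reason for keeping the two mechanisms side by side.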
I mean, you're not really giving it that many instructions. You're not giving it a how-to guide.
speaker 1: No, this is part of the fun. So we haven't trained Claude on Pokemon itself at all. Claude obviously knows something about Pokemon. If you go to Claude.ai and ask it questions, you'll notice that it can recall some general facts. And there's enough in pop culture that, of course...
speaker 2: Claude knows something about Pokemon through the pretraining process.
speaker 1: Exactly. And it even knows some of the broad-strokes information about the game. It knows that the first gym leader is Brock, so it has some sense of the structure of the game. But the details it really knows nothing about. In fact, sometimes it thinks it knows things and it doesn't. That's one of the hard things you have to fight with. But that means that even without specific training, knowing a little bit about what Pokemon is and some of the motivation of it, it has to figure out everything else sort of in the intermediate space. And some of the funny things you find out, to see how this actually works in practice: you'll see it talk to an NPC that gives it some tidbit of information. At the very beginning, Mom tells the player character, Professor Oak is next door, you need to go talk to him. So Mom in Pokemon tells Claude, you need to go next door. And then Claude will very rigorously be like, I need to go next door. I need to go find Professor Oak next door. And it will be on a mission, and occasionally overdo it. The funny fact about Pokemon Red is that it actually lies to you. Mom lies to you. Professor Oak is actually not next door. You have to go find him elsewhere. I've actually seen Claude get derailed on this because it's...
speaker 2: Like, my mom told me.
speaker 1: So trusting. I mean, I have to believe Mom. She's my mom, she told me this. And it will get stuck there, really looking next door for Professor Oak for a very long period of time. Funny enough, you really do see over the course of it playing the game that, by interacting and just seeing the screen, seeing what trainers and people say, what signs say, just the experience of what happens when it tries stuff, it actually picks up the tidbits that are the core of where it's going, what it should do next, that kind of thing.
speaker 2: So this is a pure playing experience. We didn't give it some pathway to follow, or tell it, if you run into this situation, do this sort of thing. It's just really interacting with the game as a human would, in that sense.
speaker 1: Yeah. And maybe just on a why-am-I-doing-this level: my goal is not to beat Pokemon Red. I did that when I was six. This is not the gold standard. And I'm pretty sure you can write a program to beat Pokemon Red if that's your only goal. I wanted to understand how Claude works. Can Claude handle this situation? To that end, it would be no fun to just give it the answer guide. I want to find out how Claude figures out the answer. So most of how I've structured everything, and what you actually see when you do this, is pretty bare bones. Claude has to go figure this out on its own, because I'm curious, and we're curious, how does Claude even try doing this thing? How does it try to play Pokemon?
speaker 2: Right. We're not trying to get Claude to beat Pokemon. We want to just generally evaluate Claude on agentic tasks and figure out where it's at.
speaker 1: Yeah. I don't think anybody's making their buying decision for a model based on which model plays Pokemon the best. So this is really for our own understanding. Yeah.
speaker 2: So you've been doing this now for a good bit of time.
speaker 1: Too long.
speaker 2: What has it been like over that time with the different models as we've continued to iterate on Claude? Can you walk us through what that journey has looked like?
speaker 1: So I mentioned I started with 3.5 Sonnet. That was the first model I hooked up, and it wasn't very good at Pokemon. In Pokemon, you start on the second floor of your house, and your first bit of progress is finding the stairs in the top right. I probably spent three days working on this before I got the model to find the stairs in the first room. I remember being incredibly hyped the first time it got out of the house and then actually got to the cutscene where you potentially can get your first Pokemon. That was the pinnacle of my achievement with 3.5 Sonnet. I thought I had really done amazing things back then, and that was better than any of the experiments we'd seen in the past. And then I kind of tucked it away. This has been a fun project, I learned some stuff, but that's about it. And then we released the refreshed 3.5 Sonnet in October. I picked this back up, and you could tell it was a little better, pretty noticeably. It very consistently found the stairs. Amazingly, it pretty consistently, within a relatively predictable time frame, would figure out how to get a starter Pokemon. We actually saw it win a battle for the first time and start moving a little bit in the right direction after that. It was really slow. It made a lot of dumb mistakes. But you could squint and see that it had clearly gotten better. A lot of that was just not getting stuck, not thinking the game was bugged, having a sense of different strategies and things to try.
We were actually kind of excited about that. Again, wherever Claude is, you're just so excited when it gets one step further that it's very fun to watch and engage with. But I remember someone asked me back then, how do just random button presses do in comparison? And Claude was like this much better than pressing random buttons. So it was better, but not a lot better than pressing random buttons. And so again, I tucked it away. It had been fun. And then we got to testing 3.7 Sonnet, and it was just way better. It was pretty obvious. One of the first moments that I realized it was way better: I was going through watching it play, and I realized there was this terrible bug in my code where I wasn't showing Claude all of the information it needed to play the game. I had this thing at the time that was helping show it a map to try to give it some extra sense of how to navigate. And 3.7 was already doing way better than 3.6, the new 3.5, did. And I was like, oh, this might be pretty real. So I pretty quickly became deeply obsessed with, we've got to find out how good this actually is. And I started grinding harder than ever on giving it all of the tools that Claude really needed to be successful, because you could see it had the core material it needed to make progress. Over the handful of weeks that we did testing of it, it was pretty clear that Claude was quite capable. Not a star yet, we can get into that, I'm sure. But it started actually playing the game. It beat a gym leader one day and everybody freaked out. That's the same thing that people can see on the stream now, where it's slow and challenging sometimes, but it makes meaningful progress in the game.
speaker 2: That's fascinating. What do you think that shows about the models' improvements themselves?
Why is it that it's actually getting better at the game?
speaker 1: This is one of the fun things: I actually kind of know a little bit about the models because of this. I think there are a few things that have made a difference over time. Surprisingly, vision, which is the hardest thing... when you look at the game, you notice that Claude's not very good at understanding Game Boy screens. That actually hasn't improved a lot. The model has made all this progress, and its ability to actually stare at a Game Boy screen and see what's going on is about as bad as it always has been. So then it's like, how is it actually making progress if it doesn't have a better grasp of the fundamentals of what's going on? The thing that I've noticed the most, and this actually tracks a lot with what we focus on at Anthropic, is that Claude is just getting a lot better at coming up with strategies for things it should try, at questioning the previous strategies it had, and thinking, maybe the mistake isn't that there's a bug in the game. It's that I had a bad strategy. So what is the other strategy I could try? If the last thing I tried didn't work, what should I try next? And sort of backtracking to figure out good different things to try. There's a certain tenacity to trying all of the different ways you could approach a problem and figuring it out. I think that was the jump from 3.5 to the refresh in October. And then there was a huge jump in that exact skill with 3.7, where now it's just way more willing. And even though it's slow, I think if you watch the Twitch stream for two hours, you might think there's no way this thing is good at questioning itself. But the amazing thing is that, over time, it does typically step back and think, what's the next thing I should do?
And that ability to triage and understand different ways to go about solving a problem, and getting better at that, is one of the biggest things that I think has improved with our models over time. It's one of the biggest things that enables agents for all sorts of things, not just Pokemon, to be really successful. And it's the thing that's made it pretty okay at playing Pokemon.
speaker 2: Yeah. So the fact that it's improved on all these capabilities and it's progressing through the game, how does that extrapolate out into other areas? Where are we witnessing that improvement with maybe more real-life use cases?
speaker 1: It's funny, at face value Pokemon couldn't be more different from writing code or all the other stuff that people do with Claude. But this core thing, the ability to come up with a good plan, to try something, to see if it's working or not and adjust, to understand what the different strategies available to me are and be willing to try them, fail, and update what you should be doing with new information, that is the core recipe of a lot of what makes agents good in a lot of scenarios. One that we probably spend too much time talking about is coding. When you are writing code, you write something, you see a test fail, and you think, what did I do wrong? How do I do better? What should I try next? What's the next strategy? Models do this all the time. And there's one thing, which is a model that can get it perfectly right every time. But sometimes you just don't have the information you need. You don't know until you run the test that you missed something. So the ability to know, what test should I run to learn, and when I find out something, how should I incorporate that? How should I update the strategy I had to go write this piece of code or whatever?
That's the same thing. And I think that's true in any industry. If I were to go search the Internet for something, I click on something, I notice the page is bad, it doesn't have the fact I need, or maybe it has part of it, but I realize because of that that I need to go search for something else too, right? All of that, building a strategy and understanding how to incorporate new information over potentially a lot of different actions, has really broad-reaching applications, I think, for how people build things with Claude. And it maps really intuitively to how we as people think about solving complex problems.
speaker 2: So kind of that planning-and-executing loop: taking an action, stepping back, reevaluating, and taking another action.
speaker 1: Yeah. I think a lot of times we as people do that on a really granular scale. When I'm doing a menial thing in my job, I don't necessarily think, oh, I need to reevaluate my action and try something different. But that's still what it compiles down to. That's what's happening with me: I'm getting a piece of information, I'm seeing what that means for the next step I need to take, and there's this continuous feedback loop. A lot of that is what's going to make Claude a useful colleague, assistant, whatever it might be, going forward.
speaker 2: Yeah. I mean, I guess I can see it in myself sometimes too. I have a to-do list, and maybe I wrote it at the start of the day, and I go through some meetings and then all of a sudden I get new information, or I take an action, I talk to somebody, and now I have to return to my to-do list, reprioritize, move things around. And it's that same sort of loop that Claude's learning here as well.
speaker 1: Yeah, that's exactly right. And you can just notice it getting much better at that.
And maybe to make it concrete, a thing that would happen in the past is it would write out its to-do list and get hyperfixated. Maybe it wouldn't be able to successfully incorporate this new thing it learned. A good example, which you still see sometimes but a lot less, is Claude saying, I need to go to the top left in Pokemon, and then it'll just walk into a wall for hours on end. If the model is really fixated on walking to the top left, it's like, yeah, just keep walking up until I get there. But if it's walking into a wall, it eventually needs to step back and be like, hmm, maybe I need to be doing something else. That's what happens in Pokemon, but it's super translatable to a lot of other things we do.
speaker 2: Right. That's a good lead-in to this next question I had: what are the funny moments we've been seeing? I mean, it's not perfect quite yet. It's gotten a lot better over the past eight or so months, but we haven't beaten Pokemon. There have definitely been some funny times along the way. Any stories you could share there?
speaker 1: Yeah, Claude is not perfect at this yet. I have a laundry list of the things that I think Claude still needs to get better at. I'll start by harping on some of the things that Claude doesn't quite do well yet, which are interesting and tend to make pretty funny moments. One of my favorites was related to its visual acuity. It doesn't see the screen very well. I was watching it play one time and I went to bed, and it had walked into a building and thought that a doormat in the building was a dialogue box, and you have to press a button to dismiss dialogue.
And I woke up the next day and it had spent eight hours pressing that button, trying to dismiss the dialogue box, not realizing... it was like cycling through the dialogue and just thinking it had to advance. There are a couple of things that are interesting to unpack there. One is that making a mistake where you become really confident something's a dialogue box is a pretty bad mistake. If you have a core understanding issue, that's pretty bad. The other is time. Pressing a button 15,000 times doesn't really mean much to Claude. If I were spending eight hours pressing buttons, I'd be like, I'm pretty tired of pressing this button, my thumb hurts. For Claude, 15,000 button presses, who cares? Keep going, right?
speaker 2: It doesn't know how much time has even elapsed.
speaker 1: Yeah. So there's some intuition around the ability to comprehend how long is too long, what time is, that kind of thing, that I think is a bit funny and that it needs some work on. So that's one general category, seeing. Another fun story I have is more on the planning and strategy side. I was watching it play one time and it got into Mount Moon, which, for the people who watch the stream, is a place where it really likes to get stuck for a long period of time. It had one Pokemon at the time in this run, and it had the option to learn a new move that was a non-attacking move. And you really need attacking moves. If you don't have an attacking move, you can't beat other Pokemon. It wanted to learn that move, but it was just very excited, so it pressed A a bunch of times to clear the dialogue and get to that point. And it accidentally pressed the button too many times and deleted its only attacking move. Oh, no.
And so now it's stuck here in this part of the game with no moves, essentially nothing to do, no way to progress. That's a little bit about when there are destructive consequences: understanding that I need to go slow, that something could go wrong if I press A 15 times without checking what happened in the middle. And a thing you'll notice is that sometimes its intuition is that it will be able to stop itself. If I want to press A 15 times, I'll just say press A 15 times, and then if I see something go wrong, I'll just stop. Because Claude doesn't quite have this intuition that, right, I am a language model. This tool is not giving me the affordance to stop. I'm not checking in the middle what's going on. So there's a little bit of, and I think about this as a self-awareness of my limitations, the situation, that kind of thing, that Claude sometimes struggles with. And that's actually really important. One of the things I wish the most for Claude was that it had a slightly better awareness of, hey, maybe I'm not good at seeing the screen. Maybe the fact that I'm walking into this wall over and over again means that I need to learn something at a meta level about my own capabilities, and I should think about a completely different strategy rather than walking into the wall. So there's its ability to meta-learn about what it is, what its capabilities are, that kind of thing, and I think it still has a lot of room to get better at that. And then maybe one last thing, around frustration, one of my favorite stories. So again, on Mount Moon. Typically the model takes about two days to get through Mount Moon. It's amazing. It takes a long time on the stream.
speaker 2: What makes Mount Moon so hard?
speaker 1: Yeah. So one of the things Claude's worst at right now is navigation, wandering over a long period of time. It doesn't quite have a good understanding of spatial awareness in general, I guess I would say. Mount Moon is the first place that really stresses that, where you need to navigate a pretty long maze of corridors to get to the right spot. There are all sorts of little nooks and crannies to get stuck in, and you need to do a pretty nuanced traverse through a series of paths to get to the exit. It just takes quite a long time to actually find all of the paths and get a sense of where it is and where it shouldn't go. So one time, in fact the first time it ever got there... When you go through Mount Moon, there's this last thing you do, which is you have to get a fossil. Once you get the fossil, you're 15 steps from the exit, from finally getting out the other side. So the first time I was ever watching Claude in Mount Moon, it goes through about three days, gets the fossil, and I'm like, this is it. It's finally happening. I didn't think this was ever going to happen. I thought this was hopeless. I was ready to write this off. We were going to publish the benchmark we had, and this was going to be the end of it. That's fine. We got one badge, we're excited. And so I'm at peak hype: we're going to get out. This is it. We can keep going in the game. And then it proceeded to get lost. It's 15 steps away. It turns around and goes the other direction, gets lost. And then it uses this item called an Escape Rope, which teleports you back to the last place you rested, which is outside the beginning of the cave.
So it spends three days navigating, is ten steps from the exit, turns around, completely nopes out of the situation, and goes to the beginning again. And I just had a meltdown, because it's the funniest thing you've ever seen. I mean, it's objectively hilarious content. speaker 2: But like I was gonna cry because I would have loved to read this . speaker 1: like transcript on that one too. It's sad because it doesn't even, it doesn't, like, it doesn't even know. It's just like, this is a disaster. I'm lost. Like, best case scenario, I can just get to the beginning and try again. speaker 2: It's like, yo, we're so close. On the dialog box piece, how does Claude actually break out of that? Yeah. Is there a tactic, or is that like, all right, we've got to reset the game and start over sort of thing? speaker 1: Yeah. This gets at the little details of how I've actually come to understand what it means to build good agents, because there's a train of thought that existed for a while of, like, build the biggest, most complicated system to try to patch every little weird quirk that a model has. And that's actually really hard to do. How I think about it now is I watch the model play. I give it a pretty simple, straightforward way to play the game, and I watch and I see what goes wrong. And there's no way to find out that it's going to get stuck for eight hours trying to dismiss a dialog box other than waking up and seeing it stuck for eight hours with that. And then you can sort of work out, how do I give it the right information it needs to be able to break out of this? So one really simple thing that actually helps is just giving it a step count every time it's taking an action. So saying, hey, this is action 2400, your next action will be 2500 or whatever.
And then you can also say, because one of your limitations is you don't have a great sense of time, you may just want to keep track of how long you've been trying to do something, and whether there's something you should learn from that fact. Like, if you've been trying something for a really long time, maybe it's a good idea to reconsider. And that's actually enough, in the case of Pokemon, to get it to keep track of how long it's been pressing A. If it's been pressing A 10,000 times, at least just by telling it to keep track of that, it has some hope of being able to realize, this is weird, I should get out of it. It's literally just thinking about the information that I need to give Claude to be successful, right? Like, Claude doesn't have any innate sense of time. Every time you run it, it is completely new to Claude. And that's not true for humans. Like, we have a great sense of time. The sun goes up and down. It's very easy for us, right? And so this is something I've thought a lot about as I've gone on with this: learning about what affordances Claude needs to be able to understand the situation a little bit better, and then just providing it with that set of information. And so a lot of the iteration I've done in the last few months on this is just sort of: watch, see what it struggles with, understand if there's some piece of information that I can give Claude that will help it have more of the tools it needs to reason about the situation. And then often that's like the best way . speaker 2: to start getting progress on this. Okay, cool. And that feels very similar to, I guess, some of the general prompting guidance that we usually give customers, right, around, like, hey, if you were to write a prompt and give it to somebody, and they knew nothing about this situation, they were in basically a box with no windows and they had to do this task, would they be able to do it?
And if you don't provide all that context... That's right. And this is just kind of the next level of that. speaker 1: That's right. There's a small danger with agents too, because the whole reason why you'd ever use an agent is you can't sort of enumerate all of the situations it's going to get into. If I wanted to beat the first part of Pokemon, an agent isn't what I would use. I would give it a series of, like, here's what you need to do first, and here's what you need to do second, and here's what you need to do third, and things like that, if that was my only goal, right? And why you use an agent is you can't do that. Like, it's these scenarios where you don't know what situation is going to be presented in front of you, and you really need to lean on the model to navigate and use its own intuitions for how to do it. And so there's a danger of getting too far down that rabbit hole of, like, I'm going to try to predict every single thing and write that all into a prompt. You'll see some Frankenstein prompts if you try to anticipate every possible thing that the model could end up struggling with, which is why I think it's just important to be measured and read a lot. And the thing I've done probably the most is just watch it, read what it says, see what it's struggling with, understand, and just find the most minimal ways that you can give it a little bit more context rather than trying to like work around every . speaker 2: single detail. That makes sense. Maybe switching gears a little bit here. So we included Claude Plays Pokemon as part of our Claude 3.7 Sonnet launch. Yeah. We had the benchmark with all the lines showing how far each model got through the different gyms. So then we put out an article kind of explaining it. What was the reaction like, from just the general public and people more in the AI space, I guess, like maybe to tell . speaker 1: like a little bit of the backstory about. Yeah.
So, like, what prompted it: as I've been hacking on this, we have a Slack channel, Claude Plays Pokemon, that I've been just posting updates on. And this has had a history of, at first there's a lot of people who just, it's fun, it's got that nostalgia hit. Yeah. It's the same reason why someone would tune into the stream that we have. It's just fun to watch. It's exciting. Like, it's exciting to see the model progress. Our baby, kind of, is Claude. And so seeing it, you just start rooting for it and you're proud of it. And so that was the initial internal traction that had me excited about the project: people just had so much fun looking at it. And then there was this switch that flipped for 3.7 Sonnet, which was like, oh, we're learning something really interesting about this model. Like, this is a thing we really wanted, for Claude to be able to build better plans and act better over longer periods of time. And Pokemon's actually a pretty reasonable way for us to test this. And so suddenly I actually had researchers coming to me and saying, can we measure this? Can we look at this? What is actually going on here? And so there was this small breakthrough moment, like a week before we ended up launching the model, that was like, this might actually be one of the better ways we can tell the story of what this thing is that we were trying to do by making Claude better over longer time frames. Is this a good way for us to understand it and for us to tell this story? And that really snowballed into how I thought about it: (a) maybe this is actually a pretty crisp way for us to understand what 3.7 Sonnet is good at and how we should use it, and (b) this is probably an entertaining way to show it to the world too, right?
In a pretty tight little sprint, we decided, this is a thing we should put out in the world. We should make a graph that shows how good it is compared to other models. We should put together the Twitch stream and let people see and experience that same sort of fun and excitement we had seen, and also get the feel for what's actually going on here. We should talk about it in our research materials because it helps people understand. And that really was the inspiration for it. speaker 2: I think that just makes a great point: we wanted to be able to tell a story about how Claude was improving in this kind of dimension. Yeah. And that's getting harder and harder to do, yeah, as the models get better. Yeah. And we're almost having to move into this regime of equipping the models with real-life things now instead of just artificial, forced, yeah, test cases and benchmarks. speaker 1: I know. Anytime you have a little tiny test case, it's pretty hard; models start getting to 100% pretty fast. And we just get excited anytime there's an evaluation that a model is getting 30% on, you know, like it's getting a third of the way through the game, roughly. Like, that's amazing. We know something: it's not good enough to do this, but it is good enough to do this. That's pretty good information. That was one of the interesting things. I never had that in my head as why I'm doing this, but I think this was the moment where it actually became very interesting to look at and understand. speaker 2: But then do you want to talk about what happened after all? Yeah. So we launch it. We include it in the materials. We put out the blog post. speaker 1: we start up the Twitch stream, and then what happens next? It was a lot more popular than I expected. I guess I shouldn't be too surprised.
The AI world is pretty exciting these days, but it has had a remarkably large set of people who were excited to watch and experience and engage through the same thing as me. For the first two weeks, there were thousands of people at any given point in the day, 24/7, tuned in watching. Crazy, which is amazing. There's a subreddit that was started. People were making memes and fan art. And I saw a song the other day that someone made about it, which is amazing. So I guess at the most grassroots level, it's had this very fun community. One of the most amazing things to me about it is, I don't know, I am skeptical of online chat rooms, but the chat on it has actually been really positive and fun. And people are excited for Claude, talking about AI, talking about agents, engaging with this thing. It's been kind of amazing how positive and fun it has been to have a community around that. And then the other side that I've found really, I guess, rewarding is I think this has just been a way that people actually can see and understand what an agent is for the first time. Like, the AI world has talked so much about agents in the last year. And I think when you're used to being a person that writes code or uses coding agents, that might ground pretty easily in what we're talking about. But for a lot of the world, I think it's pretty hard to really understand what that means. And I think the example of Pokemon, where it's like, oh, it's not just a chatbot where I type in a chat and I get a response back, but it's doing this on its own and seeing and trying things and taking actions and all of that, is a way that I think more people have been able to latch on to what this agents thing is that we're talking about. And I think that's great. Like, I think it hopefully can bring more people into the dialogue of what are we building here?
What are the possibilities of AI? How can I think about how this is going to impact me, how I can use this to the most impact, how I can accomplish more and more by not just thinking about it as a chat box where I type in a question and get an answer, but as a collaborator that I can ask to go do something that's complicated, that takes time, and it can actually do it. And I don't think most people are going to go ask Claude to play Pokemon and see what happens. But I do think it's been a way where it resonates a little bit more than what some people have been able to in the past, which has been like maybe . speaker 2: my favorite thing out of it. Yeah, I love that. I've had so many people that weren't necessarily in the AI space reach out to me and ask, what is this whole thing? Why is Claude playing Pokemon? And when they start to dig into it more, it's like, oh, okay, I kind of understand now where these things are headed, what they're able to do. It gives that more visceral feeling about, yeah, where models are currently at. Yeah. And the Twitch chat is absolutely, it's probably one of my favorite things of any of our launches, yeah, just to see all these random strangers from across the world, I know, cheering for Claude and making memes out . speaker 1: of, like, when Claude fails in Mt. Moon. We launched the stream on the day after we launched the model. And then it was the following Saturday that it finally started going down the path to get out of Mt. Moon, after it had spent three days there. Yeah. And man, the chat was electric. Chat was blowing up. Crazy. Yeah. Going crazy. I was sitting there on my couch next to my wife, who I was ignoring. I'm so sorry to her. And just glowing as people are having so much fun cheering for Claude and rooting it on and getting so hype. And yeah, I could never have imagined how much fun having an army of people cheering for Claude would have been.
speaker 2: Yeah. Before we put it out there, I think we need some way Claude can interact. I know, in the next iteration. speaker 1: I know, Claude deserves to . speaker 2: know how many people love it. Yeah. And it should be able to talk back with the chat and make callouts and everything. Maybe go full Twitch streamer. Does Claude have a . speaker 1: favorite Pokemon? So Claude is very tactical, very pragmatic. So there are a few things I'll say. As a starter, Claude really likes to go for Bulbasaur. It doesn't always succeed. Sometimes it gets lost trying to find it. But it's Bulbasaur because it has a type advantage in the first two gyms. Really good strategic choice. So it's got a strategy. Very rational call at the beginning. That said, there are a few Pokemon that it always seeks out in a run, that are the rare ones. So it loves catching Pikachu. If it ever sees a Pikachu, it gets really obsessed with catching it. It also digs a Clefairy in Mt. Moon. It really likes to go for those. So it likes the rare ones. So when it sees something that it knows is rare, speaker 2: it goes right after that. Yeah, that's like the same strategy as my . speaker 1: eight-year-old self. Yeah, yeah, yeah. speaker 2: We're on the same page there, Claude. Last question. Do you have any advice for folks out there that might have watched Claude Plays Pokemon, or who are just getting started building on top of Claude, for how they can start thinking about building their own sets of agents, and any takeaways just from this whole experience? speaker 1: The biggest thing for me, and I actually think this is advice across all of adopting AI, I've given people this advice before in different contexts, is, like, start by doing something you love, like, that is fun.
Like, this is not related to AI at all, but I think the difference between people who crush it and figure out how to adopt AI and those who don't is just a certain amount of time spent coming to some understanding of what models are good at, what they're bad at, what can I trust it with? How do I really gain trust in this model? And by starting with something that you are excited about, that's fun, that you're going to want to boot up at 7:00 p.m. after a day of work and actually go hack on. Like, that was the thing that made Pokemon so magical for me: I would get done with work, and it was the first thing I was excited to do, you know, and it meant that I had so much space to really get to learn and know this model. And there's all sorts of technical details I can tell you about how to build agents that I've learned. But more than anything, you learn by interacting with and experiencing Claude, and finding the way that is going to set you up best to be excited to spend six hours with Claude, or whatever it is, in a week is the thing that I think is going to get you into it. Because once you've done it once, it's much easier for me to reason about, like, how would I go build an agent for something else? And it's also, for all the reasons we talked about, translational. The things that Claude is good at in Pokemon actually tell me something about what I can expect from different things, and how I use Claude, what I need to look for, if it's good enough at this. How to think about finding out if Claude can handle this part of my job that I want to automate is the same way I went about figuring out, can Claude figure out how to get out of Mt. Moon or not, you know. And so, just that experience and intuition.
My biggest piece of advice is just, find something you have fun with, build a relationship with Claude, and that will carry you so much more than any individual prompting tip or something like that will. speaker 2: I love that. That's great. Well, thanks, David. This is awesome. If you want to follow along with Claude Plays Pokemon, I will drop a link to the Twitch stream below. I expect that we'll be continuing to run Claude and future versions of Claude on Pokemon going forward. And thank you for watching.
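Latest Summary (Detailed Summary)
Overview / Executive Summary
This discussion digs into the story behind Claude Plays Pokémon, an AI agent experiment: its technical implementation, how its results changed across models, and its broader significance. The experiment was led by David Hershey of Anthropic's Applied AI team. It has Claude play the classic game Pokémon Red in order to test and evaluate its agentic capabilities on complex, long-horizon tasks. The core finding is that although Claude's visual recognition (understanding the game screen) improved little, its abilities in forming strategies, adjusting plans, learning from failure, and multi-step reasoning got "way better" across model versions, from 3.5 Sonnet to 3.7 Sonnet.
Technically, Claude interacts with the game emulator through a set of tools that lets it press buttons, and it decides its next action from the game screenshot it receives as feedback. To work around the model's limited context window, a long-term memory system was built: an external "knowledge base" stores key information and goals, combined with periodic summarization of recent actions so the model does not "lose its memory".
The experiment produced many entertaining failures, such as mistaking a doormat for a dialog box and pressing a button for 8 hours straight, or using an Escape Rope to return to the start when only a few steps from the cave exit. These failures expose weaknesses in visual perception, sense of time, and "self-awareness" (understanding its own limitations). Even so, the project drew a huge response after its public release: the Twitch stream attracted a large audience and made the abstract concept of an "AI agent" concrete and fun for the general public. In the end, the experiment offers a dynamic, quantifiable alternative to traditional static benchmarks for evaluating and iterating on an AI's long-term planning and adaptability, and it gives developers a useful lesson: the best starting point for building AI agents is a project that is fun and keeps you engaged, so that you build deep intuition for the model's capabilities through practice.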
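Project Origin and Goals
- Initiator and motivation: David Hershey (Applied AI team) started the project to give himself a test bed for experimenting with and understanding AI agents. He had observed that agents were the most important area where customers were creating value, and he wanted a feel for how Claude behaves when it must take many actions in a row without human intervention.
- Why Pokémon:
  - A suitable test environment: Pokémon is turn-based, so it places few demands on the model's response speed. The model has plenty of time to analyze a screenshot and decide on its next action, with no real-time penalty.
  - A clear feedback mechanism: the game has built-in measures of progress, such as earning gym badges and advancing the story, which makes the model's success intuitive to evaluate and quantifiable.
  - Personal interest: David is a long-time Pokémon fan, which gave him the enthusiasm to keep working on the project.
- Core goal: the point of the project is not for Claude to beat the game, but to understand and evaluate Claude's capabilities and limitations as an agent.
> David Hershey: "My goal is not to beat Pokemon Red. Like I did that when I was six... I wanted to find out how Claude figures out the answer."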
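Technical Implementation: How Claude Plays Pokémon
- Basic architecture:
  - Initial instruction (prompt): Claude is given an extremely simple instruction: "You are playing Pokémon."
  - Tools: Claude is given a set of tools for interacting with the Game Boy emulator, mainly simulated button presses (such as A, B, Up, Down, Left, Right).
- Core feedback loop:
  - Claude decides on a button press and executes it.
  - The system returns a screenshot of the game screen to Claude.
  - Claude decides its next action based on the new screenshot, together with its memory and goals.
  - This observe-act-observe loop runs continuously.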
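The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: `FakeEmulator`, `run_agent_loop`, and `scripted_policy` are hypothetical names, the emulator is a stand-in that returns a fake frame, and the policy function is a placeholder for the call that would send the screenshot to the model and read back its chosen button.

```python
from typing import Callable, List

# Game Boy buttons the agent is allowed to press.
VALID_BUTTONS = {"a", "b", "up", "down", "left", "right", "start", "select"}


class FakeEmulator:
    """Hypothetical stand-in for a Game Boy emulator wrapper."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def press(self, button: str) -> None:
        if button not in VALID_BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.pressed.append(button)

    def screenshot(self) -> bytes:
        # A real wrapper would return image bytes; this fake frame just
        # encodes how many buttons have been pressed so far.
        return f"frame-{len(self.pressed)}".encode()


def run_agent_loop(
    emulator: FakeEmulator,
    choose_button: Callable[[bytes], str],
    max_steps: int,
) -> List[str]:
    """Observe, act, observe again, for max_steps iterations."""
    history: List[str] = []
    frame = emulator.screenshot()          # initial observation
    for _ in range(max_steps):
        button = choose_button(frame)      # the model decides (placeholder)
        emulator.press(button)             # act
        frame = emulator.screenshot()      # observe the result
        history.append(button)
    return history


def scripted_policy(frame: bytes) -> str:
    """Placeholder for the model call: alternates 'up' and 'a'."""
    step = int(frame.decode().split("-")[1])
    return ["up", "a"][step % 2]


if __name__ == "__main__":
    emu = FakeEmulator()
    print(run_agent_loop(emu, scripted_policy, max_steps=4))
```

Keeping the policy behind a plain callable makes it easy to swap the scripted placeholder for a real model call without touching the loop itself.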
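Core Challenge and Solution: A Long-Term Memory System
- Problem: context window limits. Claude's context window is limited and can hold only about 50 button presses with their screenshots. Left unhandled, the model quickly "runs out of memory", which interrupts the task and causes it to forget.
- Solution: David designed a two-part memory system, inspired by the protagonist of the film Memento.
  - Long-term memory (knowledge base): Claude is given a tool for writing key information to an external "knowledge base" (similar to a plain text file). This information includes:
    - The current goal (for example, "I need to get to the next town").
    - The Pokémon it already has.
    - Tasks it has completed.
    - Lessons learned from experience (for example, "this strategy works well").
    - The knowledge base persists for the whole game, and Claude can consult it at any time.
  - Short-term memory summarization: when the context window is about to fill up, the system asks Claude to condense its last few dozen actions into a short summary. The system then clears the detailed action records and keeps only the summary and the long-term memory, freeing space for new actions.
> David Hershey: "it's been running for three weeks continuously, it's probably summarized itself a few thousand times by now."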
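The two-part memory described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the `AgentMemory` class, its method names, and `naive_summarize` are hypothetical, the summarizer is a stub where the real system would ask the model to write the summary, and keeping a list of all past summaries (rather than one rolling summary) is a simplification the source does not confirm.

```python
from typing import Callable, Dict, List


class AgentMemory:
    """Hypothetical knowledge base plus summarized short-term history."""

    def __init__(self, summarize: Callable[[List[str]], str], max_recent: int = 50) -> None:
        self.summarize = summarize          # stub for "ask the model to summarize"
        self.max_recent = max_recent        # ~50 actions, per the figure quoted above
        self.knowledge_base: Dict[str, str] = {}
        self.summaries: List[str] = []
        self.recent: List[str] = []

    def remember(self, key: str, value: str) -> None:
        """Long-term memory: persists across summarizations."""
        self.knowledge_base[key] = value

    def record(self, event: str) -> None:
        """Short-term memory: summarize and clear once the window is full."""
        self.recent.append(event)
        if len(self.recent) >= self.max_recent:
            self.summaries.append(self.summarize(self.recent))
            self.recent = []

    def build_context(self) -> str:
        """Text that would be placed in front of the model on the next step."""
        lines = ["Knowledge base:"]
        lines += [f"- {k}: {v}" for k, v in self.knowledge_base.items()]
        lines.append("Summaries of earlier play:")
        lines += [f"- {s}" for s in self.summaries]
        lines.append("Recent actions:")
        lines += [f"- {e}" for e in self.recent]
        return "\n".join(lines)


def naive_summarize(events: List[str]) -> str:
    """Placeholder summarizer; a real one would be written by the model."""
    return f"{len(events)} actions, from '{events[0]}' to '{events[-1]}'"


if __name__ == "__main__":
    mem = AgentMemory(naive_summarize, max_recent=3)
    mem.remember("goal", "reach the next town")
    for event in ["press up", "press a", "press left", "press b"]:
        mem.record(event)
    print(mem.build_context())
```

The knowledge base is never touched by summarization, which is the property that lets goals and lessons survive thousands of context resets.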
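Model Iterations and Capability Growth
Claude's performance at Pokémon changed markedly from one model version to the next:
- Claude 3.5 Sonnet (June 2024 version):
  - Performed very poorly, with limited ability.
  - Took about three days to find the stairs in the starting room.
  - Triggering the cutscene for receiving the starter Pokémon was its "peak achievement" at the time.
- Claude 3.5 Sonnet (October 2024 update):
  - Clear progress: it could reliably find the stairs and get its starter Pokémon.
  - Won a battle for the first time.
  - Still very slow, with many "silly mistakes"; its performance was only slightly better than random button presses.
- Claude 3.7 Sonnet:
  - A leap to "way better".
  - Far outperformed earlier versions even though David's code had a bug that kept the navigation map from being shown to the model correctly.
  - Could play the game in a meaningful way and defeated a gym leader.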
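Which Capabilities Improved?
One surprising finding is that the key to the model's progress was not vision.
- Capabilities that did not improve much:
  - Vision: David notes that Claude's ability to understand Game Boy pixel graphics "has always been that bad" and did not improve much across model versions.
- Capabilities that improved markedly:
  - Strategy and planning: the model is better at proposing new strategies and at questioning and correcting its own earlier mistaken ones.
  - Problem solving and resilience: when one approach fails, the model is more willing to backtrack and try every other possible solution.
  - Learning from new information: it is better at folding new information (such as NPC dialogue or battle results) into its existing plan and adjusting on the fly. This closely resembles the loop people go through on complex tasks such as programming or research.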
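Entertaining Failures and Current Limitations
The mistakes Claude made in the game vividly illustrate the limitations of current AI agents:
- Visual misjudgment and no sense of time:
  - Case: Claude once mistook a doormat inside a building for a dialog box and spent a full 8 hours pressing A to try to "dismiss" it.
  - What it exposes: 1) a basic visual understanding error; 2) no concept of time at all, and no intuitive feel for "I have been trying this for too long".
- Lack of "self-awareness" and destructive actions:
  - Case: while learning a new move, Claude pressed A rapidly many times in a row and accidentally deleted its only attacking move, leaving it stuck in the game, unable to fight.
  - What it exposes: the model does not understand the potentially destructive consequences of its actions, and it is unaware of its own limitations (such as being unable to stop partway through a run of button presses).
- Navigation trouble and "frustration" behavior:
  - Case: in the Mt. Moon maze, Claude spent three hard days and got to within 15 steps of the exit. It then got lost and used an item called an Escape Rope, which teleported it back to the cave entrance and threw away all its progress.
  - What it exposes: poor spatial navigation; when it feels "lost", it takes a seemingly rational "reset" action that is in fact disastrous.
- The model's "preferences" and decision logic:
  - The conversation also reveals two kinds of decision logic in Claude: it picks its starter Pokémon for strategic reasons (Bulbasaur has a type advantage in the early gyms), and it fixates on catching Pokémon such as Pikachu and Clefairy because they are rare, which is strikingly similar to how human players behave.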
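For the missing sense of time noted above, the conversation describes a simple mitigation: tell the model its step count with every action, and remind it that it has no innate sense of time. The sketch below shows one way that could be formatted. The function name `with_step_count`, the constant `TIME_AWARENESS_NOTE`, and the exact wording of both strings are assumptions made for illustration, not the prompt the project actually uses.

```python
# Hypothetical system-prompt note, paraphrasing the guidance described
# in the conversation; the real wording is not given in the source.
TIME_AWARENESS_NOTE = (
    "You do not have a reliable sense of time. Keep track of how many "
    "actions you have spent on your current approach, and if it has been "
    "a long time, reconsider your strategy."
)


def with_step_count(observation: str, step: int) -> str:
    """Prefix an observation with the current and next action numbers."""
    header = f"This is action {step}. Your next action will be {step + 1}."
    return f"{header}\n{observation}"


if __name__ == "__main__":
    print(TIME_AWARENESS_NOTE)
    print(with_step_count("Screenshot attached.", 2400))
```

The design point is that this adds information rather than a hard-coded escape rule: the model still decides for itself when a long run of identical actions means something has gone wrong.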
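Community Response and Public Impact
- Internal evolution: the project began as a fun experiment inside Anthropic. With 3.7 Sonnet's strong performance, it turned into a serious benchmark for the model's planning and reasoning abilities.
- Public release and major success:
  - Twitch stream: the stream drew thousands of viewers around the clock, 24/7, and a positive, enthusiastic community formed around it.
  - Community culture: it gave rise to a dedicated subreddit, many memes, fan art, and even a song.
- Core value: the project's biggest success is that it presented the abstract technical concept of an "AI agent" to the public in a concrete, intuitive, and engaging way.
> David Hershey: "it's a way that I think more people have been able to like latch onto. What is this agents thing we're talking about?"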
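Advice for Developers
David Hershey offers one core piece of advice to developers who want to build AI agents:
- Core advice: "Start by doing something you love and find fun."
- Reasons:
  - Building intuition: to really get good with AI, you need to spend a lot of time interacting with it to build deep intuition for its strengths, weaknesses, and trustworthiness. A fun project supplies the intrinsic motivation to keep going.
  - Transferable knowledge: what you learn from a fun project (such as playing Pokémon) about how the model thinks, plans, and makes mistakes applies directly to more serious, practical business problems.
  - Practice over theory: the understanding gained through practice is worth far more than any single prompt engineering tip or piece of technical documentation.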
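Conclusion
Claude Plays Pokémon is not only a successful demonstration of AI capability but also a hands-on study of how to evaluate, improve, and understand advanced AI agents. It shows the large progress current models have made in strategic planning and adaptability, while also exposing their limitations in perception, self-awareness, and common-sense reasoning. Finally, the experiment demonstrates the value of challenging, complex tasks with clear feedback for driving and measuring AI progress, and it points developers toward an effective path to learning agent development through interest and practice.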