2025-05-13 | Stanford CS25: V5 I The Advent of AGI, Div Garg
Div Garg, founder and CEO of AGI Inc., explores AI agents and their path toward artificial general intelligence (AGI). He argues that AI agents — systems that can perceive, reason, and act in open-ended environments — represent the first step toward AGI and promise to fundamentally change human-computer interaction. Reaching AGI, however, faces many challenges, including brittle reasoning, goal drift, shallow memory, and poor calibration under uncertainty. Addressing these requires not only better models but a rethinking of how intelligent systems are designed, evaluated, and deployed.
Garg proposes a human-inspired approach to agent design, covering new agent evaluation standards, online reinforcement-learning training methods, and inter-agent communication (e.g., MCP, A2A, and Agent Protocol). He details the architecture of AI agents, including short-/long-term memory, tool use, advanced planning (reflection, self-critique, and task decomposition), and the ability to execute actions. Through the case of an AI agent successfully passing a California DMV driving test, he demonstrates the real-world potential of agents.
Garg emphasizes that we build AI agents because they can be more efficient than humans in the digital world, unlocking higher productivity and enabling more complex systems. Developing human-like agents is especially important because they can operate computer interfaces designed for humans (keyboard and mouse) the way humans do, enabling more direct and broad application.
Tags
Media details
- Upload date
- 2025-05-18 15:20
- Source
- https://www.youtube.com/watch?v=nEHNwdrbfGA
- Processing status
- Completed
- Transcription status
- Completed
- Latest LLM Model
- gemini-2.5-pro-exp-03-25
Transcript
speaker 1: Today we have our co-instructor, Div, talking about human-inspired approaches to agents and how the path to AGI requires a rethinking of how we design, evaluate, and deploy intelligence. Div Garg is the founder and CEO of AGI Inc, a new applied AI lab redefining AI-human interaction with the mission to bring AGI into everyday life. He previously founded MultiOn, the first AI agents startup, developing agents that can interact with computers and assist with everyday tasks, funded by top Silicon Valley VCs. Div has spent his career at the intersection of AI research and startups and was previously a PhD student here at Stanford focused on RL. His work spans various high-impact areas ranging from self-driving cars, robotics, and computer control to Minecraft AI agents. With that, I'll hand it to him. So take it away. speaker 2: Yes, excited to be here. Great. So yeah, excited to be here. The topic for this lecture is a lot of the new things that are happening in the AI world right now. There have been a lot of developments with agents and all the new models coming out, and it seems like we already have some sort of superintelligence when it comes to chat and reasoning, compared to average humans. The next few years are going to be very interesting as we figure out: what does intelligence look like? What is something like AGI, and what is its form factor? How can this be something that's useful, and how will it be applied in society? Cool. So the first thing we want to touch on is: what does AGI look like? AGI is such an abstract concept right now; no one has really visualized it or given it a meaning. Is it some sort of supercomputer? Is it just ChatGPT, but ten times better? Is it something that's more of a personal companion? Is it something that's embedded in your life? That's not clear yet.
And those are the kinds of questions I think we really need to go and figure out. This is one diagram of how AI agents work. This architecture is from OpenAI researcher Lilian Weng; she recently left and joined a new company. It shows how you can think about agents and how they can be broken down into different subparts, and the different things you require to make the system work. The first layer is memory. You want some sort of short-term memory and some sort of long-term memory. The short-term representation is maybe the chat window if you're using something like ChatGPT, and you might also have a personal history of the user: okay, this is what the user likes, this is what they don't like. The second thing you need is tools. You want these agents to be able to use tools the way humans use tools: calculators, calendars, web search, coding, and so on. The third part is advanced planning. That means you want the agent to be able to use reflection, so that if something goes wrong it has failover mechanisms and can error-correct and recover. You want self-criticism, and you want decomposition, where you have chains of thought so the agent can run its own reasoning loops and break a complex task down into subgoals. And the final, fourth ingredient is actions, where you want the agent to be able to act on your behalf and go do things. This is, at a high level, what agents fundamentally look like, and as these systems become more powerful over time, they will eventually lead to something like AGI. This is one thing we're also building toward. I recently started a new AI lab called AGI Inc.
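The four-component architecture described above (memory, tools, planning, actions) can be sketched in a few lines of Python. This is purely illustrative: all class and function names are made up for this sketch, not taken from any particular agent framework.

```python
# Minimal sketch of the four-component agent architecture: memory, tools,
# planning, and actions. Everything here is illustrative, not a real framework.

def calculator(expression: str) -> str:
    """A 'tool' the agent can call, like a human using a calculator."""
    return str(eval(expression))  # toy example; never eval untrusted input

class Agent:
    def __init__(self):
        self.short_term = []             # chat-window-style working memory
        self.long_term = {}              # persistent user facts / preferences
        self.tools = {"calculator": calculator}

    def plan(self, task: str) -> list:
        # Advanced planning: decompose a complex task into subgoals.
        # A real agent would use the model's chain of thought here.
        return [f"compute: {task}"]

    def act(self, step: str) -> str:
        # Action: route the planned step to the right tool.
        tool_input = step.split(": ", 1)[1]
        return self.tools["calculator"](tool_input)

    def run(self, task: str) -> str:
        result = ""
        for step in self.plan(task):
            result = self.act(step)
            self.short_term.append((step, result))  # save outcome to memory
            # Reflection / self-critique would inspect `result` here and
            # retry or re-plan on failure (omitted in this sketch).
        return result

agent = Agent()
print(agent.run("2 + 3 * 4"))  # -> 14
```

The loop over plan/act/remember is the core pattern; the reflection step noted in the comment is where the failover and self-criticism layers from the diagram would plug in.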
And we're looking a lot into what AGI looks like for everyday purposes and how it can be applied to daily life. This is a demo of some technology we built in the past, showing how an AI agent can be applied in the real world. It's a bit old; it shows an AI agent passing a real driving knowledge test in California. This is an actual DMV test that the agent took. Let me share the screen and talk about the setup. On this screen, someone is attempting the DMV online test, and there's a human who has their hands off the keyboard; they're not actually touching the screen. It's the agent that's going in and taking the whole exam. There are about 40 questions in this test, and the agent was able to go and pass the whole thing. And we did this live. The DMV was actually screen-recording what we were doing, and they were also watching the person on camera. But even then, the agent was successfully able to evade the whole setup and pass the exam. So this was really fun. We did this as a white-hat hacking attempt, and we informed the DMV afterwards that we did it. Funnily enough, they actually sent us a driving license afterwards. So that was really fun. At the end, the agent is able to pass and get a full score on this test. So yeah, this is a very fun experiment showing how agents can be applied in the real world, and there are so many things possible in this vein: how can we make agents more useful and apply them in real life? We have been working on a lot of different efforts along with the AI community. One of them is agent evaluations: how can we evaluate these kinds of agents in the real world and make sure we have standards and benchmarks that let us know how well these agents work on different websites and different use cases?
How can we trust them? How can we know where to deploy them and how to use them? Another thing we have been doing is agent training: can we train agents to do advanced planning, self-correct, and improve themselves? This uses a combination of reinforcement learning and a bunch of other advanced techniques. And finally, we have also been looking a lot into agent communication: how can you have agents communicate with other agents? There have been a lot of new breakthroughs in this area recently. If you have looked at the Model Context Protocol, MCP, that's a very new thing that has been coming out. Similarly, there's a lot of work around A2A, Google's agent-to-agent communication protocol that recently came out. We have also been working on an open-source project called Agent Protocol, where we have been allowing different kinds of agents to communicate with each other. So you can have a coding agent that talks to a web agent that talks to an API-based agent, and so on. That lets you do much more complex things than what's possible with a single agent. Cool. So before we dive deeper into how a lot of these things work, I want to bring up: why do we need agents? Why are they useful? Why do we actually want to go and build them? There's a lot to think about here, and I will touch on a lot of different topics in the introduction, going from the architectures, to building more human-like agents using computer interactions, to memory, communication, and future directions. So when you're building agents, there are a lot of questions you have to answer. The first one is: why is this useful? Then: how can you actually build them? What are the different building blocks? And finally, what can you do with them?
To first answer the why question, we have this key thesis that agents will be more efficient at interfacing with computers in the digital world than humans. And that's the reason we want to go and apply agents to do things for us. You can imagine you have an army of virtual assistants that are fully digital, that can go and do whatever you want on your behalf, and you can talk to them through a human interface. That's the vision we have been moving towards. I also have a blog post about this called Software 3.0 that you can check out, which touches on some of these ideas. So we want to go and build agents because large language models on their own are usually not good enough. We want action capabilities that unlock more productivity, and that also allows us to build more complex systems. There are a lot of techniques involved in actually building this, such as chaining different models together, reflection, and a bunch of other mechanisms. And as shown before in the architecture slide, there are a lot of different components: memory, actions, personalization, access to the Internet, and so on. And finally the question comes: what are the different applications we can apply them to? There's also the question of why we want to build human-like agents. Why can't we just have API agents, or a bunch of other kinds of agents you can imagine, which don't mimic human interactions? One reason we want to push toward more human-like agents is that these agents can operate interfaces the way we do. The Internet, the web, and computers are generally designed for humans: they're designed for keyboard-and-mouse interactions so that we can navigate interfaces.
If agents are able to use interfaces like we do, that allows them to directly communicate and do a lot of things without changing how current software works. And that becomes very, very effective, because it lets you work on 100% of the Internet without any bottlenecks. If you think about APIs, only around 5% of the APIs on the Internet are public and accessible, and it's very hard to build agents that are fully reliable over APIs. So there's a lot of contention between human-like agents versus API agents, and that's an ongoing battle right now. Second, you can imagine a lot of these human-like agents becoming a digital extension of you. They can learn about you, they can have context about you, they can do tasks the way you would do them. They also have less restricted boundaries: these human-like agents can handle logins, they can handle payments, and they're able to interact with any service without restrictions on app access. You don't need to pay for an API, and you don't need to go to a service provider and ask them for access to their API; you can just use the interface like you normally do. And the final thing is that there's a very simple action space: the agents only need to learn how to click and type. If they can do that very effectively, they can generalize to any sort of interface, and they can also improve over time. The more you teach them, the more data you give them, they can learn from user recordings and feedback and become better and better. So when it comes to API versus more direct computer-control agents, these are the pros and cons as we see them. API agents are usually easier to build; they're more controllable and safer. But APIs have higher variability.
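The "simple action space" point above can be made concrete: a human-like computer agent only needs two primitives, click and type. Here's a small illustrative sketch (the type names and coordinates are made up for this example):

```python
# Sketch of a click-and-type action space for a human-like computer agent.
# A policy that emits only these two primitives can, in principle, operate
# any interface designed for keyboard and mouse.
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:
    x: int            # screen coordinates of the element to click
    y: int

@dataclass
class Type:
    text: str         # keystrokes to send to the focused element

Action = Union[Click, Type]

def describe(action: Action) -> str:
    """Human-readable rendering of an action, e.g. for logging a trajectory."""
    if isinstance(action, Click):
        return f"click at ({action.x}, {action.y})"
    return f"type {action.text!r}"

# An agent trajectory is just a sequence of these primitives:
trajectory = [Click(x=512, y=300), Type(text="agent benchmarks")]
print([describe(a) for a in trajectory])
```

Because the action space never grows, the same policy generalizes across websites and apps; all the complexity moves into perceiving the screen and choosing where to click.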
You have to build different agents for each API, and APIs can keep changing; you never have a full guarantee that the agent will always work 100%. With the more direct computer-control agents, it's easier to take actions, and interactions are more free-form, because you're not restricted by API boundaries. But it's also hard to provide guarantees, because you don't know what the agent will do. If anyone here has played with agents like Operator, it's a work in progress; there are clearly a lot of issues it runs into, and that's kind of where agents are right now. There are also different levels of autonomy when you think about agents, usually going from level one to level five. Level one to level two is when a human is in control and the agent is acting like a copilot, helping the human. If you use a code editor like Cursor, that's an L2 agent: you have partial automation, where the human is in control and directing the code, but the agent is helping them. Something like L3 is where there's still a human fallback mechanism, but the agent is in control. If you use Cursor Composer or Windsurf or any of the newer, more agentic code editors, the agent is writing most of the code, but a human is monitoring and giving it feedback: okay, this went wrong, can you correct that for me? Can you fix this issue? That's more of an L3 system. Then you have more advanced systems, L4 and L5. In L4 systems you don't have a human in the loop, so the agent is going and doing everything, but you might still have some sort of automated fallback layers.
If you look at Waymo in SF, that's an L4 system, because the self-driving car is driving itself, but there are human operators remotely monitoring it, making sure nothing goes wrong. With an L5 system, there are no humans in the loop, there's no monitoring, and the AI agent is able to operate fully autonomously and independently. So when we are building these agents, one hard thing is trust. How do we trust that these agents are actually going to do what we want them to do? How can we deploy them in the real world? To solve these issues, one effort we have been building is a miniature version of the Internet, where we have cloned roughly the top 20 websites on the Internet and we benchmark how agents perform on all these interfaces. This is actually live, so you can go and check it out. What we have done is build digital clones of websites like Airbnb, Amazon, DoorDash, and LinkedIn, and the agents can navigate these interfaces on predefined tasks, and you get a final score. This slide shows the evaluation results for GPT-4o. We find that GPT-4o is actually not very good when it comes to being agentic; it only reaches a 14% success rate in this case. We tried this on eleven different environments, shown on the right. We have DashDish, which is our DoorDash clone, Omnizon, and so on, and you can actually go and check these environments out. We also compared a lot of the open-source frameworks out there. One of them is the OpenAI computer-use model that powers Operator; we actually find it's not very good at these tasks. It's only able to reach a maximum of around 20% accuracy on some of the environments, like our email and calendar environments, but on a lot of the other environments it's not able to do well at all.
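A benchmark score like the 14% above is, mechanically, just an average of pass/fail outcomes over predefined tasks across the cloned environments. A minimal sketch of that bookkeeping (the environment names and results below are illustrative, not real benchmark data):

```python
# Minimal sketch of benchmark scoring: each cloned environment has a list of
# predefined tasks, each run either passes or fails its success check, and
# the reported number is the overall success rate across all runs.

def success_rate(results: dict) -> float:
    """results maps environment name -> list of per-task pass/fail booleans."""
    flat = [ok for runs in results.values() for ok in runs]
    return sum(flat) / len(flat)

# Hypothetical results on two illustrative environments:
demo = {
    "omnizon":  [True, False, False, False],   # Amazon-like clone
    "dashdish": [False, True, False, False],   # DoorDash-like clone
}
print(f"overall: {success_rate(demo):.0%}")    # 2/8 = 25%
```

The hard part of such a benchmark is not the averaging but defining the per-task success checks against the final state of each cloned site.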
We also tried a bunch of the frameworks out there: Stagehand, if you have seen that open-source framework for automating web agents; Browser Use; and one of our own custom agents, which we call Agent Zero. We find that agents are still early when it comes to actually automating a lot of these interfaces. We are able to reach, I would say, up to a 50% success rate, but a lot of these agents are failing when you apply them to real-world websites. Similarly, we benchmarked all the different models available, including the closed-source APIs and the open-source models. Again, on these tasks most models do decently well, but nothing is really good right now. The maximum success we have seen is with Claude 3.7, which reaches around 40% accuracy; Gemini 2.5 and o3 follow very closely, and the other models tend to taper off. So the interesting learning for us has been that a lot of these models are not ready to be deployed in the real world. Because if you have, say, an agent powered by Claude and you apply it, you can only expect around a 41% success rate that it will actually do what you want, and that's not good enough. This raises the question: what is required to make these agents even better? How can they improve, and how can they be applied to your actual practical use cases? That brings us to the next topic of the lecture: how can we train agentic models? How can we have models that are more custom, fine-tuned, and better at decision-making tasks? This is one of our past works called Agent Q, which is a self-improving agent system. Agent Q can self-improve: it can learn from corrections and planning. The way the system works is that it's able to go and correct itself.
Whenever it makes a mistake, it saves that mistake in its memory, and it's able to use that to do a lot of trial-and-error learning, similar to humans. Suppose the first time you learn how to ride a bike: you make a lot of mistakes, you fall a lot of times, but over time you improve your policy and learn to do it really well. We apply similar mechanisms to make these agents work really well in the real world. What's happening in this system is that the agent can explore the space of interfaces and see which things it did went wrong and which went right, and it's able to use reinforcement learning to self-improve and become better and better. Agent Q combines a lot of different techniques. The first is Monte Carlo Tree Search (MCTS). This is borrowed from other RL systems like AlphaGo; it lets you plan over a search space of tasks and unlock advanced reasoning. The second thing we do is self-critique mechanisms: the agent can self-verify and get feedback whenever it makes a mistake, and it's able to learn from that feedback. And finally, we use RLAIF techniques like DPO, direct preference optimization, to improve the agent using RL. By combining all these techniques, we are able to build some very powerful systems. Agent Q is also available on arXiv as a research paper, so you can go and check it out. For the sake of time I will skip some of the details here, but the way Agent Q normally works is that we have this Monte Carlo Tree Search where the agent is exploring the different states. It estimates rewards: if we were to visit this state, what is the expected value of the future predicted reward? Based on that, it improves its prediction of whether to take this path or a different path in the tree.
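The value-estimation loop just described is standard MCTS bookkeeping. Here's a minimal sketch (not the actual Agent Q implementation) of the visit-count and reward statistics, a UCB-style selection rule that balances exploitation against exploration, and the backup step after a rollout:

```python
# Sketch of MCTS bookkeeping: each state-action node tracks visit counts and
# accumulated reward; selection uses a UCB-style score; observed rewards are
# backed up along the visited path. Illustrative only, not Agent Q's code.
import math

class Node:
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0

    def value(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0

def ucb_score(node: Node, parent_visits: int, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")            # always try unvisited actions once
    explore = c * math.sqrt(math.log(parent_visits) / node.visits)
    return node.value() + explore      # exploitation + exploration bonus

def best_action(children: dict, parent_visits: int):
    """Pick the child action with the highest UCB score."""
    return max(children, key=lambda a: ucb_score(children[a], parent_visits))

def backup(path: list, reward: float):
    """After a rollout, propagate the observed reward up the visited path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

Repeated select/rollout/backup cycles are what let the agent learn which paths through the interface tend to succeed and which tend to fail.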
Over time, the agent can become very good at exploring the right states and figuring out which paths in the state space are right and which are wrong. We also use a self-critique mechanism. Say you have a particular task: the user says, book me a reservation at a restaurant on OpenTable for two people on August 14, 2024 at 7:00 p.m., and this is the current state of the screen, which you can see in the screenshot. The agent can then propose a bunch of different actions. It can choose to select the date and time; it can choose to select the number of people and then open the date selector; it can instead search for the restaurant and type its name into the search bar; or it can decide to go to the OpenTable home page. The way the self-critique mechanism works is that all these proposed actions are passed to a critic network, and the critic LLM predicts which is the best action to take, giving a ranking order: this is the best action we should use, this is rank one, this is rank two, rank three. Based on that, we can optimize the system to take the correct actions and improve over time. And finally, we use reinforcement learning from feedback, with methods like GRPO and DPO, which are different RL algorithms, taking all the failed and successful trajectories collected so far and improving the agent over them. DPO is a technique where you can train an LLM using preference data of failures and successes and use that to improve the model overall. So this is how Agent Q works: we run Monte Carlo Tree Search to create trajectories of successes and failures.
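For reference, the standard DPO objective (from the original direct preference optimization paper, which this approach builds on) can be written as follows, where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, $(y_w, y_l)$ is a preferred/dispreferred trajectory pair for context $x$, $\sigma$ is the logistic function, and $\beta$ controls how far the policy may drift from the reference:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

In this setting the successful trajectories from MCTS play the role of $y_w$ and the failed ones the role of $y_l$.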
We then use self-critique mechanisms to identify which proposed actions actually succeeded and failed, and we pass them through DPO to optimize the network. Here's an example of how this works. The agent starts in the first state, and the task is to book a restaurant reservation on OpenTable. First it makes a mistake and goes to the home page, then recognizes the mistake and backtracks; the blue arrow here shows it backtracking. Then it navigates to the right restaurant. If the agent accidentally makes a mistake and chooses the incorrect date, it can again backtrack, recover, open the date selector, choose the right date, make the seat selection, and finally complete the reservation. So this is how the system learns over time: it makes a lot of mistakes, but it saves those mistakes and gradually improves on them. We tried Agent Q in a lot of real-world scenarios, including actual OpenTable reservations. We spun up thousands, or more like hundreds of thousands, of bot runs on OpenTable and used our method to create agents that can book restaurants, make reservations, and do a bunch of other things. We tried this with a lot of different methods and models. With GPT-4o, on these OpenTable reservation tasks, we were only able to reach around 62.6% accuracy. With something like DPO, the accuracy goes to around 71%. With Agent Q we can make this work much better: we reach 81% accuracy without any MCTS as part of the method. And when we apply the whole technique, with MCTS, DPO, and the self-critique mechanisms, we are able to reach close to 95.4% accuracy.
And this uses a lot of self-learning for the agent to improve itself. It usually takes less than one day of training for the agent to go from, I would say, around 20% accuracy (18.6%, roughly) all the way to 95.4%. That's roughly a 4x improvement in agent performance in less than a day. All right, cool. As the next topic, I'll touch on memory and personalization. One way to think about AI agents is that they are taking in information and processing it. Imagine what an AI model is doing: it takes some prompts, some language tokens, and outputs some new language tokens. This is acting similar to a processor. With a CPU, you have some instructions, binary encoded, that go into the CPU, and some instructions that come out, also binary encoded, and then you loop over them again and again; that's how normal computers work. You can do a very similar thing and have the abstraction of an AI model acting like a computer, where you have language tokens going in, encoded in the prompt, and language tokens coming out. This lets you think of an AI model as a processor operating over natural language. You can think of GPT-4, for example, as doing this, similar to older 32-bit processors that used 32-bit instructions. Right now, with GPT-4, we are able to reach very big contexts, which is very interesting. When GPT-4 initially came out, it was constrained to 8K tokens; now we have 32K tokens, 128K tokens, even 1 million tokens. The context length of models just keeps increasing over time. And as the context length increases, that also allows us to have...
speaker 1: A question from online: can you speak to the compute budget for the day-long run? Was it H100s, or a cluster? speaker 2: Yes, so that was all H100s. We trained the whole model on 50 H100s in less than one day. speaker 1: Gotcha. And then one question from before: as AI agents increasingly emulate human behavior, what protocols do you foresee being implemented to help users distinguish between AI and humans in conversation? speaker 2: Yeah, that's very interesting, and that becomes a question of security: how can we identify whether it's a human or an agent? It's actually a very hard question right now, because you already have voice agents that can effectively mimic humans and pass as humans, and that's happening in the real world right now. Over time, we will need human proof of identity. This could be biometrics; it could also be a combination of some sort of personal data, or some password or secret that only you know, that you can use to authenticate that you're talking to an actual human and not an agent. Cool, any more questions? speaker 3: There was a comprehensive study by professors and students on why agentic systems fail. Multi-agent systems have been around for more than 20 years, right? Distributed systems, transaction processing. We're just covering it with AI, under a second name. So far I really haven't seen anything new, except that instead of people coding all the logic into the program, you have an agent: you give it a problem and it gives you some results. My point is that communication between agents is exactly the same as it was 20 years ago.
Collaboration between agents is what would bring new intelligence, but I'm missing that part. Correct? speaker 2: That's actually something that's coming next. But just to answer the question: the biggest issue is reliability. When all these agents are communicating using natural language, that causes a lot of miscommunication, where maybe your agent got the wrong instruction or failed to understand what's happening. And the more agents you add, the more communication overhead there is. You can imagine that if you have an agentic system with n different agents, there are n-squared communication links, so the number of errors in the system grows quadratically, and that allows for a lot of different mistakes to happen. speaker 3: Messaging, transport, and all of that; pretty much all those problems were solved. speaker 2: Great, yeah, totally. That could be very interesting. But for the audience here, let's come back to this. One way to think about agents is that the transformer model is acting as a processor: it takes in input prompts and gives out output prompts. What you want is a memory system, something like a file system and RAM, where you save what's happening and can process it over time. You want repeated operations: you do the first pass over the model, you get some output tokens, you can save them in a RAM-like system, and then some new instructions come out: okay, here's step two of the plan, go execute that; here's step three of the plan; here's step four. And that looping behavior is, in a sense, what gives rise to agents, where the transformer is the processor, and the memory system, instructions, and planning act like the file system and the RAM.
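The processor/RAM loop just described can be sketched in a few lines. Here `llm` is a toy stand-in for a real model call: it reads the accumulated scratchpad and emits the next "instruction", and the outer loop is what turns a stateless token-to-token mapping into an agent.

```python
# Sketch of the "LLM as processor" analogy: the model maps input tokens to
# output tokens on each cycle, while an external memory (the 'RAM' / 'file
# system') carries state between cycles. The looping is what makes an agent.

def llm(prompt: str) -> str:
    """Toy stand-in for a model call: reads the scratchpad, emits next step."""
    n = prompt.count("step")
    return "DONE" if n >= 3 else f"step {n + 1}"

def agent_loop(task: str, max_cycles: int = 10) -> list:
    memory = [task]                      # the 'RAM': scratchpad of past tokens
    for _ in range(max_cycles):          # repeated passes over the 'processor'
        out = llm("\n".join(memory))     # tokens in -> tokens out
        if out == "DONE":
            break
        memory.append(out)               # write the result back to memory
    return memory

print(agent_loop("book a flight"))
# -> ['book a flight', 'step 1', 'step 2', 'step 3']
```

Swapping the toy `llm` for a real model and the scratchpad for a proper memory store gives exactly the processor/file-system/RAM architecture described above.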
So overall this gives rise to a computer architecture where the agent acts like a computer system, with memory, processors (the compute), and the ability to use browsers, actions, and multimodality, with inputs like audio and voice and so on. Okay. When we think about long-term memory, based on the analogy before, you can think of it as similar to a disk: you want user memory that's long-lived and persistent, so you can save context about the user and load it on the fly whenever you want. There are different mechanisms for long-term memory. The prevalent one is embeddings: you have retrieval models that can fetch the right user embeddings on the fly. So if I have a question like, is this person, Joe, allergic to peanuts? then the system can go and find out. If we have a lot of data about the user, we can use a retrieval model to do an embedding lookup, find out whether this is something we already know about the user or not, and based on that make the right judgment. This is very important, and you can see early cases of it in systems like ChatGPT right now. There are still a lot of open questions when it comes to long-term memory. The first is hierarchy: how do we decompose memory into more hierarchical structures, where you can have temporal persistence and more structure? You might also want to think of memory as something adaptable, because human memory is not static; it changes over time. So when you have agent memory, how can it change? How can it be dynamic? How can it self-adjust? Because these systems are also learning and improving, what do these dynamic memory systems look like? Cool.
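The embedding-based lookup just described can be sketched as follows. The `embed` function here is a deliberately toy bag-of-characters stand-in for a real embedding model, and the stored facts are illustrative; the point is the store/retrieve-by-similarity pattern.

```python
# Sketch of embedding-based long-term memory: user facts are stored as
# vectors, and a query retrieves the nearest fact by cosine similarity.
# embed() is a toy stand-in for a real embedding model.
import math

def embed(text: str) -> list:
    """Toy bag-of-characters embedding over a-z (a real system calls a model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class LongTermMemory:
    def __init__(self):
        self.facts = []   # list of (text, embedding) pairs -- the 'disk'

    def save(self, fact: str):
        self.facts.append((fact, embed(fact)))

    def lookup(self, query: str) -> str:
        """Return the stored fact most similar to the query."""
        q = embed(query)
        return max(self.facts, key=lambda f: cosine(q, f[1]))[0]

mem = LongTermMemory()
mem.save("Joe is allergic to peanuts")
mem.save("Joe's favorite dish is ramen")
print(mem.lookup("is Joe allergic to anything?"))  # -> Joe is allergic to peanuts
```

A production system would replace `embed` with a learned model and the list scan with a vector index, but the retrieval logic is the same shape.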
speaker 2: Memory leads to personalization, where the goal of having long-term memory is that you can personalize these agents to the user, so they understand what you like and what you don't like, and they're aligned with your preferences. Say someone is allergic to peanuts and you have an agent ordering food on DoorDash: you want it to be personalized so it doesn't accidentally order something you're allergic to. How can you build that? Everyone has different preferences, likes, and dislikes, so when you're designing agents it's very important to account for this. There's a lot of explicit personalization information you can collect: what does the user like? Are they allergic to something? What are their favorite dishes? What seat preferences do they have if they're flying, and so on. There are also a lot of implicit preferences: which brand do you like, Adidas versus Nike? If there were ten items on a list, say you're looking for housing, which one would you prefer and buy? Those things are implicit; they're not explicitly known. And then there are mechanisms where you can collect a lot of these implicit preferences and personalize over time. There are a lot of challenges in building these personalization systems. The first is user privacy and trust: how do you actively collect this information, and how do you get people to give it to you? There are different methods you can use to collect it. One is active learning, where you're explicitly asking the user for their preferences: are you allergic to something, do you have a seat preference, and so on.
There can also be passive learning: if you can record users and see what they're doing, you can passively learn their preferences. Maybe this person likes certain dishes because that's what we have seen them do on the computer, and the agent is learning from your behavior and becoming better and better. You can also learn to personalize through supervised fine-tuning, where you're collecting a lot of interactions. This can also be through human feedback, where you get thumbs up or thumbs down and use that to improve, so the agent goes and does the right thing. This is similar to ChatGPT: if you like the chat output, you give it a thumbs up; if you don't, you give it a thumbs down. And that can be used to personalize the system over time. Okay. So now we're going to agent-to-agent communication.
speaker 1: One question online: how do you do evaluations on the performance of agents that collaborate with humans? And is it a moving target? At what point is human performance redundant and agents can be fully autonomous?
speaker 2: I would say it's a hard question. You just have to go and build benchmarks, because it's very hard to know what's going to happen in the real world right now. I will say, based on the current state of evaluations and what I showed before, agents are not fully there. The most successful agents I've seen so far are coding agents. If you have an intelligent code editor, you can already see the traces: they are automating a lot of engineering for you already, so you don't have to write a lot of boilerplate code or spend a lot of your own time fixing bugs. So at some point we'll see this thing where humans become more like managers: we're giving them feedback, we're giving them direction, okay?
Suppose you have systems of different agents: you tell them, okay, I want agent one to go and do this, agent two to go and do that, and so on, see what the final output is, and use that to improve the overall process you're working toward. So this is likely what's going to happen: agentic systems will become better and better executors, while humans become the managers of these systems of agents.
speaker 3: So when it comes to .
speaker 2: agent-to-agent communication, we think about multi-agent architectures and multi-agent systems, where you have all these cute little digital robots that can talk to each other, communicate, and do your work in a very coordinated and streamlined manner. There are reasons you want to build multi-agent systems. The first is parallelization: by dividing a task into smaller items and having multiple agents, say N agents instead of one, you can improve overall speed and efficiency. The second is specialization: if you have different specialized agents, maybe a spreadsheet agent, a Slack agent, and a web browser agent, you can route different tasks to different agents, and each agent can become really good at its task. This is similar to having a degree in a specific major, or having an occupation and specializing in it. There are a lot of challenges when it comes to agent-to-agent communication. The biggest is that this kind of communication is lossy. When one agent communicates with another, it's possible it will make mistakes. It's similar to what happens in human organizations: maybe your manager asks you to go do something, but you misunderstand them and do something different, and they wonder, oh, why did this happen?
And similarly, agent-to-agent communication is also fundamentally lossy: whenever you communicate information from one agent to another, you lose some percentage of the information, which allows mistakes to propagate through the system and become increasingly prevalent. And there are different mechanisms for multi-agent systems. This is a very novel field right now; people are still trying to figure it out, and no one has actually cracked it yet. What you want to do is build the right system of hierarchies: you might have manager agents working with worker agents, you might have managers of manager agents, you might have flat agent organizations where one manager manages hundreds of agents, or it could be a big vertical tree where you have maybe ten levels of agents managing each other. A lot of these systems are possible, and it just depends on the task and what you're specializing in. The biggest challenge for these kinds of systems is: how do you exchange communication effectively without losing that information? How do you build syncing primitives? How can communication from one agent that's very far from another agent in the hierarchy be carried very, very effectively across the chain? There are a couple of frameworks out there looking to solve these problems: how do we make this communication protocol robust, and how can we have mechanisms to reduce miscommunication? A big one in this space is MCP, the Model Context Protocol. This is a protocol that came from Anthropic that a lot of people are using right now. It's a simple wrapper around APIs.
So what it does is give you a streamlined, standard format around each API. By creating an MCP wrapper around your service (this could be a file server that exposes an API, or your email client, or a Slack client, or something running on your computer), all these MCP-connected servers can communicate with each other and do things for you. This allows for very effective communication where you can control the routing and make things modular, so you can plug in new services as you want. Similarly, another framework in this space is the agent-to-agent protocol. This is a new protocol that came from Google very recently, designed for agents to communicate with other agents, adding a lot of reliability and fallback mechanisms. I'm not sure how many people here in the room have used MCPs. Yeah, not many. Okay, cool. So MCPs are actually very cool. What they're doing is abstracting your APIs and making them very, very modular, so you can plug your API into the MCP protocol, and once it's wrapped, you can interconnect it with any other service that supports MCP. It becomes, in a sense, a standard interface for communication across the different services or applications you have: you expose them and let them connect and talk to each other. So similar to how you have HTTP for communication on the normal internet, MCP becomes an interesting protocol for communication across different services. And yep, if you have a client like Claude or Replit or some other model, you can connect it to servers that support the MCP protocol. You can have a bunch of different services.
Each service could be some sort of data tool, like a database API or pretty much anything else, and they can all interconnect and do modular things for you. And because MCPs don't depend on the spec of your API, they can absorb a lot of changes and add this level of modularity and abstraction by standardizing the whole interface. You can also have dynamic tool discovery, because you can find different MCP servers exposed in directories of some sort, and then plug in the MCP servers you like and connect them, so you can plug in new tools, swap .
speaker 3: them out, and you can route .
speaker 2: information based on what you want to do. Okay. Finally, let me touch on some of the issues with agent systems. So far, we have seen a lot of different things: how these agents work, how we can evaluate them, how we can train them, how to think about communication between different agent systems. And even though a lot of these things are very interesting and a lot of them are taking off, there remain key problems in this space to solve before these agents are practical, can be applied in everyday life, and become useful for you. The biggest one is just reliability: these systems have to become very, very reliable. They need to be close to 99.9% reliable if you're giving them access to your payments and bank details, for example, or maybe they're connected to your email, calendar, and other services. And you want to really trust them: you don't want these systems to go rogue and maybe post something wrong for you on your socials, on your Twitter or LinkedIn, or create havoc and make a wrong transaction on your behalf. So the question becomes: how can you trust an agentic system that's operating autonomously? That's where reliability becomes a big thing.
A second issue with autonomous agents is looping. These agents can do something wrong, get stuck in a loop, and just repeat the same process again and again. If you give them a task, say the restaurant booking task I showed before, maybe the agent went to the wrong restaurant and just keeps trying the same thing over and over; it doesn't know what to do. That kind of issue can happen with a lot of agents, and you might end up wasting a lot of money and compute. It's very important to be able to detect and correct that. And that leads to a lot of work around how we can test agents, how we can properly benchmark them in the real world on many different use cases and make sure we're learning from that. And also, once we deploy these systems, how can we observe them? How can we know what's happening? Can we monitor them online? Can we have some sort of safety mechanism based on audit trails, so we can audit all the operations the agent has done so far? We can maybe also have human overrides: if something goes wrong, we have a human fallback where a remote operator can take control of the agent and correct it, fix it, or you can directly take control and fix it yourself. This is similar to Autopilot in a Tesla: when you're driving on Autopilot and it's about to do something wrong, you can take over control and override the system. That becomes very interesting when you're thinking about real-world deployment of agents. Cool. Okay. So that was the whole lecture on agents. Sorry, some things were a bit messy; we had to put together some final slides. Happy to take questions. And yeah,
speaker 3: go on. What do you see, like, in accuracy.
Let's say 40% or something on frontier tasks. Over time, do you think there's a path to, you know, 99 or 99.9? And is the rest just iteration, or are there actually clear things you'd have to try?
speaker 2: This is definitely possible, especially with reinforcement learning; I showed the AgentQ method before. Right now, a lot of these models, even Claude Sonnet or GPT-4 or Gemini, are not trained on these agentic interfaces and tasks. That's why they're working zero-shot: they were never trained, in their training distribution, on actually optimizing these problems. So when they encounter these new interfaces or these new kinds of tasks in the real world, they often fail. But if you can train the system directly on these tasks, using reinforcement learning, data collection, and self-improvement, then you can actually reach very, very high accuracy. On the OpenTable task with AgentQ, we reached about 95% accuracy, and if you keep going and training these systems, you can fully saturate them, reaching close to 99.9%. The hard part is the diversity of tasks. There are millions of websites, and if you want to train an agent that's 99.9% on each website, that's a hard challenge. And that's something very interesting to me: how can you build a generalized agent that works on the whole internet, that generalizes to everything? Maybe in the future you'll have agents that can automate all of voice calling, all of computer control, maybe use all the APIs and everything. Something like that is possible theoretically; it's just very hard to build.
speaker 3: Yeah, do you know whether agents are able to solve .
speaker 2: CAPTCHAs? They can.
speaker 3: What do you think the implications of that are for,
like, how the internet is going to work in the next ten years?
speaker 2: It's definitely very interesting. I would say it's a cat-and-mouse game. You've seen the new generation of CAPTCHAs becoming harder and harder to solve. And I think it's very hard to win this, because if a human can do it, theoretically an agent can also go and do the same thing. So over time, I think we'll just have to figure out better methods of identity. Biometrics can be a big part of that: if you're able to use fingerprints or some sort of two-factor mechanism, then we know this is an actual human,
speaker 3: not an agent. So there's this article called AI 2027 that you've probably heard of, which outlined where AI research is going to go and what might happen: in 2027, we automate programming, and then we automate AI research. After your lecture, I was wondering, do you think we could automate the process of creating AI agents? Because from what I understand, the main bottleneck is: how am I going to access UIs and APIs? How am I going to access data that is enclosed in those complex and somewhat dynamic systems? So what if, very simply, someone designed an agent that was optimized to vectorize APIs and UIs, and then you designed an agent that was optimized to train agents on those vectorized data sets, since there are specific architectures you can use to train agents? Do you think we will see, in the future, people automating with confidence our process of creating AI agents, making all these niche-specific AI agents we're seeing on the market obsolete?
speaker 2: Yeah, I absolutely think so. This is going to happen; I think it's already happening in the bigger labs.
If you look at labs like OpenAI, they have a lot of this, and there are also recent papers (the name is unclear in the recording, possibly from Sakana AI) on AI research agents that can write research papers, train models, and do a bunch of things. So it's totally possible for agents to self-improve and build other agents, and you can have a whole process for how that happens. And it's definitely possible to train on a lot of these data sources and APIs, find ways to represent them, collect the right kinds of details, and improve on that. I do think that seems to be the future of a lot of hard research, especially things like protein design and a lot of the hard sciences. So we'll definitely see a lot of that happen.
speaker 3: Hi Div, nice to meet you again. Just to give you context, we're building like an Uber for AI agents. So I've been working on agents for a long time. The biggest problem with agents has been, as you said, reliability and hallucination. The first thing we tried to work on is how to prevent the agent from hallucinating. The next thing is which models are best at executing actions. From our research, we realized Claude is great; we also have GPT in the mix. So we have a team of agents doing work, and the action agent tends to be a GPT agent, because we struggled with some agents: as you said, GPT-4o is great at taking action, while other models seem to work better at planning and other things. And I think the third big challenge with building agents is that end users can't take even one miss. My wife here, for example: if I hand her the product and it makes one mistake, there is no space for reinforcement learning, in the sense that if I say, book my flight, like I told Manus to do yesterday, and it makes one mistake, I've lost trust.
So the problem is, to work in the real world, our agents need to avoid making mistakes in the real world. That brings us to the sandbox, and I love what you're doing with the sandbox and the clones of these websites. The challenge with a sandbox is that you can't clone every website on the internet, and where humans excel is that, given a new task, they figure out their way around. So these are the challenges we have with agents, and I'm happy to talk more about it now or later.
speaker 2: Totally, totally. Just so I get the gist of it, what's the exact .
speaker 3: question there? So I think the question is: how do we make them ready for the real world? We have a voice agent that does a good job with calling but makes mistakes. We have an email agent; mine got stuck in a loop and sent an email five times to an investor. We have a coding agent that wiped out 3,000 lines of code for me yesterday that I had to redo. So we have these challenges in the real world, and people like my wife are not going to tolerate one bad shot; they will just stop using it. So I think the question is, how do we prevent agents from hallucinating, right?
speaker 2: Yeah. So it's definitely a hard problem, but you can keep improving these agents. If you look at a lot of the initial models that came out, like the first version of GPT-3 and so on, they hallucinated a lot. But as you get bigger models, with more parameters and trained on more data, they start hallucinating less. If you look at the newer-generation models, GPT-4 and Claude, then I think, over time, as you figure out how to make better foundation models, a lot of these errors in the systems go down, especially hallucinations and other things that can happen. You just need a lot of monitoring, evaluations, and testing, and this also becomes very domain-specific.
So if you're working on a domain-specific problem and you want an agent that works 99.9% in that domain, what you want to do is curate the right task cases. You can say, okay, here are a thousand scenarios we really care about: can we test these agents on these 1,000 scenarios all the time? That could be in production, when you're actually running with your users, or it could be some sort of offline simulation where you're testing daily: are there any regressions in the system? What happens if you change a prompt? What will that look like? If you can build very robust testing, then you can verify that your accuracies are going up, and it becomes a question of whether you can fine-tune this agent to become better and better for your use cases. So I would say the correct answer is a combination: models will become better and better over time, so you can simply trust them more as new models come out; and second, you want very domain-specific testing and evaluation, so for your own use case you have ways to rank which model is doing what and how good it is, make the right judgment, and then fine-tune, using reinforcement learning and other techniques, to make them better over time. Okay?
speaker 3: A related question: I think the problem, long term with models, is that maybe we need to move to small language models.
speaker 2: Yeah, so that's an interesting question. We are already seeing some hints of this. If you look at a lot of the newer models trained on reasoning traces, we've found you can actually train smaller models on reasoning traces and get better accuracies.
So a lot of the newer models, like GPT-4o and the new series like o3-mini and so on, are actually distilled small models; they're just fine-tuned using reinforcement learning and other techniques to be very good at reasoning. We're already seeing that with the new generation of thinking models coming out, the o1 and o3 series. That shows that smaller models with better reasoning and better processing can do this. It will be interesting to see how far you can push the limits: over this year, what are the best accuracies we can expect from these kinds of reasoning models? Can we actually get to PhD-level mathematics, and even reach superintelligence in a lot of these specific domains?
speaker 3: I think the ultimate test is the reward. And my proposed architecture, which I think may work, is that the manager agent could be a large language model and the worker agents could be small language models, because I think distillation happens when you're collaborating in a team. Yeah. My last question is about memory. In the analogy you gave with respect to a computer, we have random-access memory, we have ROM, and then we have the hard drive. With AI agents right now, I think they only have random-access memory, and we're just giving them more of it. I don't think they have the hard drive, the persistent part, right? I think that's a challenge. I would like to know, how do we implement that kind of system to make it work more like a computer?
speaker 2: Yeah, that's an interesting question. I'd be curious if you actually tried the experiment and saw how that works. I'd say the straight answer is that it just depends on what you're building and what your applications are.
There can be different models that might work better: if you're doing a coding task, you might want more of a coding model, versus something more chat-based or action-based, and so on. I think you just have to find the right ingredients, in a sense the right components, for your application and go build that. Yeah. So there's no right answer to it,
speaker 3: in a sense.
Latest Summary (Detailed Summary)
Overview / Executive Summary
In his Stanford CS25 lecture, Div Garg sketches the enticing prospect of artificial general intelligence (AGI) while candidly dissecting the serious challenges and future directions facing today's autonomous AI agents on the path to AGI. Although superintelligence appears close at hand, current agents remain brittle in reasoning, goal consistency, memory, and handling uncertainty. Garg stresses that reaching AGI is not merely a matter of model improvement; it requires rethinking how intelligent systems are designed, evaluated, and deployed, in particular by adopting human-inspired approaches and focusing on agents' core value in making digital-world interaction more efficient and acting as a digital extension of the user. He describes his team's work at AGI Inc. and previously at MultiOn, including building agents that interact with computers and assist with everyday tasks. The core of the talk covers: the general architecture of AI agents (memory, tools, planning, action), illustrated with real-world cases such as an agent passing the California DMV written driving test; the importance of rigorous evaluation standards, including his team's simulated-website environment (named "Valdraxfive c" [sic, likely a transcription error]) for benchmarking, which shows that even frontier models such as GPT-4o still fall short on specific agent tasks (GPT-4o at only 14%); training agents for self-improvement via reinforcement learning (e.g., his AgentQ system, combining Monte Carlo tree search, a self-critique mechanism, and DPO), reaching over 95% accuracy on specific tasks such as OpenTable booking (versus roughly 62.6% for the comparison baseline, a GPT-4 Turbo-class model transcribed as "GPT photo"); the importance of long-term memory and personalization for agents; and the progress and challenges of agent-to-agent communication (the MCP and A2A protocols). Finally, he identifies key open problems for agent systems, such as reliability (which needs to reach 99.9%) and looping behavior, and proposes addressing them through rigorous testing, monitoring, and human oversight.
Introduction and Speaker Background
- Speaker: Div Garg, founder and CEO of AGI Inc., an applied AI lab dedicated to redefining AI-human interaction and bringing AGI into everyday life.
- He previously founded MultiOn, one of the first AI agent startups, which develops agents that interact with computers and assist with everyday tasks, backed by top Silicon Valley venture capital firms.
- Garg's career sits at the intersection of AI research and startups; he was previously a Stanford PhD student focused on reinforcement learning (RL) before leaving the program.
- His work spans high-impact areas including self-driving cars, robotics, computer control, and Minecraft AI agents.
- Talk topic: human-inspired approaches to designing AI agents, their core value in making digital-world interaction more efficient and acting as a digital extension of the user, and how the path to AGI requires rethinking the design, evaluation, and deployment of intelligent systems.
- Current AI landscape:
- Superintelligence seems to be "round the corner".
- Frontier models continue to scale.
- A new generation of autonomous AI agents is emerging, able to perceive, reason, and act in open-ended environments, representing first steps toward AGI.
- Challenges on the path to AGI:
- Brittle reasoning
- Drifting goals
- Shallow memory
- Poor calibration under uncertainty
- Real-world deployment exposes how fragile current agents are.
- Direction for solutions: not just model improvement, but rethinking design, evaluation, and deployment, including rigorous evaluation metrics and tight user-feedback loops, to build systems that can reason, remember, and recover.
The Shape of AGI and the Architecture of AI Agents
- The unknown form of AGI: Div Garg notes that AGI remains an abstract concept whose concrete form is still unclear.
- It could be some kind of supercomputer.
- It could be a ChatGPT that is ten times more powerful.
- It could be a more personal companion.
- It could be something embedded in daily life.
- Core architecture of AI agents (based on a diagram by OpenAI researcher Lilian Weng):
- Memory:
- Short-term memory: e.g., the immediate contents of a chat window.
- Long-term memory: e.g., the user's personal history and preferences ("what the user likes and dislikes").
- Tools: agents should be able to use tools the way humans do.
- Examples: calculator, calendar, web search, coding, and so on.
- Advanced planning:
- Reflection: when something goes wrong, the agent can fail over, correct errors, and recover.
- Self-criticism.
- Decomposition: e.g., chain of thought, enabling the agent to run autonomous reasoning loops and break complex tasks into subgoals.
- Actions: the agent can carry out tasks on the user's behalf.
- Garg argues that as systems built on this architecture grow more capable, they may ultimately lead to some form of AGI.
AGI Inc.'s Exploration and a Real-World Case
- AGI Inc.'s mission: explore what AGI looks like in everyday life.
- Real-world demo (an older case): an AI agent passing the California DMV written driving test.
- Setup: an AI agent took a real DMV online exam while the human operator kept their hands off the keyboard.
- Process: the agent completed an exam of roughly 40 questions on its own. The DMV screen-recorded the session and monitored the person on camera.
- Outcome: the agent successfully "evaded the whole setup" and passed. Garg's team describes it as a "white-hat hacking attempt"; they notified the DMV afterward, and the DMV even mailed the license.
- Significance: a demonstration of AI agents' enormous potential in real-world applications.
Key Efforts Driving AI Agent Development
Div Garg, his team, and the AI community have worked on the following fronts:
- Agent evaluations:
- Goal: establish standards and benchmarks for evaluating agents in the real world.
- Focus: How well do agents perform on different websites or use cases? How can we trust them? When, where, and how should they be deployed and used?
- Agent training:
- Goal: train agents for advanced planning, self-correction, and self-improvement.
- Approach: reinforcement learning combined with other advanced techniques.
- Agent communication:
- Goal: effective agent-to-agent communication.
- Recent breakthroughs:
- Model Context Protocol (MCP): an emerging protocol.
- Google's A2A (Agent-to-Agent) communication protocol.
- Garg's team's open-source Agent Protocol: lets different kinds of agents (coding agents, web agents, API-based agents) talk to each other to accomplish more complex tasks.
Why AI Agents, and Their Advantages
- Core claim: "Agents will be more efficient in interfacing with computers in the digital world compared to humans."
- Vision: a fully digital army of virtual assistants that act on the user's behalf through human interfaces. Garg points to his blog post "Software 3.0" for some of these ideas.
- Beyond large language models (LLMs): LLMs alone are not enough; the ability to act is needed to unlock higher productivity and build more complex systems.
- Building blocks: chained models, reflection mechanisms, memory, actions, personalization, internet access, and more.
Why Human-like Agents
- Compatibility with existing interfaces: human-like agents can operate interfaces designed for humans (the internet, web pages, computer applications, typically keyboard-and-mouse driven) the way humans do.
- This lets agents interact directly with existing software and cover "100% of the Internet" without waiting for APIs.
- By contrast, API agents are limited to public APIs (only about 5% of APIs are public), and reliability on top of APIs is hard to guarantee.
- A digital extension of the user: agents can learn the user's habits and context and carry out tasks the way the user would.
- Fewer boundary constraints: human-like agents can handle logins, payments, and interaction with any service, unconstrained by API access (no API fees or permission requests to service providers).
- A simple action space: the agent only needs to learn "click" and "type" to generalize to any interface, improving continuously from user recordings, feedback, and other data.
- Trade-offs: API agents vs. direct computer-control agents:
- API agents:
- Pros: easy to build, more controllable, safer.
- Cons: high variability (each API needs its own agent), APIs may change frequently, and 100% coverage cannot be guaranteed.
- Direct computer-control agents:
- Pros: acting is easier and freer (not bounded by APIs).
- Cons: hard to give guarantees (behavior is unpredictable); still a "work in progress" with many open issues.
Levels of Autonomy for AI Agents
Borrowing the classification used for self-driving cars, Garg proposes five levels of autonomy for AI agents:
- L1-L2 (human in control, agent assists): the human leads and the agent acts as a copilot.
- Example: code editors like Cursor, which provide parallel automation; the human directs the coding and the agent assists.
- L3 (agent in control, human as fallback): the agent does most of the work, with human monitoring and feedback mechanisms.
- Example: more agentic code editors like Cursor Composer or Windsurf, where the agent writes most of the code and the human monitors and corrects.
- L4 (highly autonomous, human monitoring in specific settings): the agent operates autonomously in particular environments without real-time human intervention, possibly with remote human monitoring or automated fallback layers.
- Example: Waymo's self-driving service in San Francisco: the car drives itself while remote operators monitor.
- L5 (fully autonomous): no human intervention or monitoring at all; the AI agent runs completely independently.
Trust, Evaluation, and Benchmarking
- Core challenge: how do we trust that an agent will do what we expect?
- AGI Inc.'s evaluation effort: a "miniature version of the Internet", cloning the top 20 websites to benchmark agents on those interfaces.
- The platform is reportedly live, at an address transcribed as "Valdraxfive c" [sic, likely a transcription error].
- Examples of cloned sites: Airbnb, Amazon, DoorDash (cloned as DashDish), LinkedIn.
- GPT-4o result: a success rate of only 14% across agent tasks in 11 environments (DashDish, Omnizon, etc.).
- Results for other frameworks:
- OpenAI's Computer Use model (which powers Operator): at most 20% accuracy on environments like email or calendar, performing poorly elsewhere.
- Stagehand, BrowserUse, and AGI Inc.'s own AgenZero: top success rates around 50%.
- Conclusion: agents are still at an early stage in automating real-world website interfaces.
- Model benchmarks on the ERC task [the transcript does not explain what ERC stands for]:
- Claude 3.7: best performer, at roughly 40% accuracy.
- Gemini 2.5 and another model close behind (transcribed as "zero, three", possibly an advanced model such as GPT-4o or the Claude 3 series).
- Other models trail off from there.
- Key takeaway: "a lot of these models are not fully ready to be deployed in the real world." For example, a Claude-powered agent with an expected success rate of only 41% is not good enough.
Training Stronger Agent Models: The AgentQ System
- Goal: build stronger agent models fine-tuned specifically for decision-making tasks.
- AgentQ: a self-improving agent system that learns through error correction and planning.
- Core idea: much like a human learning to ride a bicycle, the agent improves its policy through trial and error, saving its mistakes and exploiting them in later learning.
- Technical components:
- Monte Carlo Tree Search (MCTS): borrowing RL techniques from systems like AlphaGo to plan over the task's search space and enable higher-level reasoning.
- Self-critique mechanisms: the agent can verify its own behavior, getting feedback when it errs and learning from it. A "critic LM" ranks proposed actions.
- Reinforcement learning from AI feedback (RLAIF): techniques such as DPO (Direct Preference Optimization) improve the agent from preference data (successful vs. failed trajectories).
- Workflow: MCTS generates successful and failed trajectories -> the self-critique mechanism identifies which proposed actions succeeded or failed -> DPO optimizes the network.
- Case study: OpenTable restaurant booking
- The agent may make mistakes during booking (e.g., picking the wrong date), then backtrack and recover to complete the reservation.
- The team ran thousands of bots (Garg misspoke, saying "on the order of a hundred thousand") against OpenTable for testing.
- Performance comparison (OpenTable booking task):
- GPT-4 Turbo (transcribed as "GPT photo", presumably a strong GPT-4-series model): roughly 62.6% accuracy.
- DPO alone: roughly 71% accuracy.
- AgentQ (without MCTS): 81% accuracy.
- AgentQ (full, with MCTS, DPO, and self-critique): close to 95.4% accuracy.
- Training efficiency: from a baseline accuracy of roughly 18.6% to 95.4% (about a 4x improvement), with under a day of training.
- Compute (from the Q&A): trained on 50 H100 GPUs in under a day.
- The AgentQ research paper is available on arXiv.
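The DPO objective mentioned above can be written out directly. The sketch below implements the published DPO loss (Rafailov et al.) for a single preference pair, here framed as a successful MCTS trajectory (chosen) versus a failed one (rejected); the log-probability values are placeholder floats for illustration, not real model outputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: penalize the policy unless it raises
    the chosen trajectory's log-prob (relative to a frozen reference model)
    more than the rejected trajectory's."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(sigmoid(beta * margin))

# Successful trajectory (chosen) vs failed one (rejected); placeholder log-probs.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-14.0, ref_logp_rejected=-14.0, beta=0.1)
print(round(loss, 4))  # → 0.5544
```

In a real training loop the log-probs come from the policy and reference language models, and the loss is averaged over a batch of preference pairs before backpropagation.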
Agent Memory and Personalization
- AI models as processors: an AI model like GPT-4 can be seen as a processor that takes language tokens (the prompt) in and emits new language tokens.
- As context lengths grow (from 8K to 1M tokens), the processor's effective "RAM" grows with them.
- Why memory systems matter: agents need mechanisms analogous to a computer's file system and RAM to hold state and process iteratively (steps 1, 2, 3, 4 of a plan, and so on).
- Transformer model: the processor.
- Memory systems, instructions, and plans: the file system and RAM.
- Together these form a computer-like architecture: agent = computer system (memory, processor/compute, and I/O such as browsers, actions, and multimodality).
- Long-term memory: analogous to a computer's hard disk.
- Goal: persistent user memory, with user context loaded on demand.
- Mechanism: chiefly embeddings plus retrieval models. For a query like "Is Joe allergic to peanuts?", the system does an embedding lookup over the user's data.
- Open questions:
- Memory hierarchy: how do we decompose memory into richer graph structures with temporal persistence and more structure?
- Memory adaptability: human memory changes dynamically; how can agent memory adjust and self-correct as the system learns?
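The embedding-lookup mechanism described above can be sketched in a few lines. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the stored facts are invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: a unit-normalized
    # bag-of-words vector over lowercase alphanumeric tokens.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}

def cosine(a, b):
    # Sparse dot product; both vectors are already unit length.
    return sum(v * b.get(k, 0.0) for k, v in a.items())

class LongTermMemory:
    """Disk-like store: persist facts about the user, retrieve them on demand."""

    def __init__(self):
        self.entries = []  # list of (fact, embedding) pairs

    def save(self, fact):
        self.entries.append((fact, embed(fact)))

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]

memory = LongTermMemory()
memory.save("Joe is allergic to peanuts")
memory.save("Joe prefers window seats on flights")
print(memory.retrieve("is Joe allergic to peanuts?"))  # → ['Joe is allergic to peanuts']
```

A production system would swap the toy `embed` for a neural embedding model and the linear scan for an approximate nearest-neighbor index, but the save/retrieve shape stays the same.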
- Personalization:
- Goal: use long-term memory so the agent understands and aligns with user preferences (likes and dislikes); for example, a food-ordering agent should avoid foods the user is allergic to.
- Kinds of information:
- Explicit personalization data: information the user states directly (allergies, seat preferences).
- Implicit personalization data: preferences inferred from behavior (brand preferences such as Adidas vs. Nike, housing preferences).
- Challenge: user privacy and trust.
- Collection methods:
- Active learning: explicitly asking the user about their preferences.
- Passive learning: recording user behavior and letting the agent learn from it.
- Learning methods for personalization: supervised fine-tuning and human feedback (e.g., thumbs up / thumbs down).
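A minimal sketch of how the thumbs-up/thumbs-down signals above could be turned into preference pairs for a later fine-tuning step. The record schema is an assumption for illustration, not any specific product's format:

```python
from collections import defaultdict

# Hypothetical feedback log: each entry is one rated agent response.
feedback_log = [
    {"prompt": "order dinner", "response": "ordered pad thai with peanuts", "rating": -1},
    {"prompt": "order dinner", "response": "ordered peanut-free green curry", "rating": +1},
    {"prompt": "book a seat", "response": "booked a window seat", "rating": +1},
]

def to_preference_pairs(log):
    """Group ratings by prompt and emit (chosen, rejected) pairs."""
    by_prompt = defaultdict(lambda: {"chosen": [], "rejected": []})
    for item in log:
        bucket = "chosen" if item["rating"] > 0 else "rejected"
        by_prompt[item["prompt"]][bucket].append(item["response"])
    # A pair needs both a liked and a disliked response for the same prompt.
    return [
        {"prompt": p, "chosen": c, "rejected": r}
        for p, d in by_prompt.items()
        for c in d["chosen"]
        for r in d["rejected"]
    ]

pairs = to_preference_pairs(feedback_log)
print(len(pairs))  # → 1  ("book a seat" has no rejected response, so no pair)
```

Pairs in this shape are exactly what preference-optimization methods such as DPO consume, tying the feedback loop back to the training approach described earlier.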
Agent-to-Agent Communication
- Multi-agent architectures and systems: picture many small digital robots talking to one another and coordinating work in a streamlined way.
- Why build multi-agent systems:
- Parallelization: split a task into smaller pieces across multiple agents to improve speed and efficiency.
- Specialization: different agents focus on specific tasks (a spreadsheet agent, a Slack agent, a browser agent), each excelling at its own job.
- Challenges:
- Lossy communication: information passed between agents can be garbled or misunderstood, much like communication failures in human organizations. Errors propagate and amplify (with N agents there are on the order of N² communication paths, so errors can grow quadratically).
- A nascent field: no one has fully solved these problems yet.
- System hierarchy: sensible hierarchies need to be designed (manager agents over worker agents; flat organizations vs. deep vertical trees).
- Core difficulty: how do we exchange information effectively without losing it? How do we build syncing primitives so communication stays effective across a deep hierarchy?
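The compounding effect of lossy hops can be made concrete with one line of arithmetic: if each agent-to-agent hop preserves a message intact with probability p, a k-hop chain delivers it intact with probability p^k, which is why deep hierarchies need syncing primitives. A minimal sketch:

```python
def chain_fidelity(p_per_hop, hops):
    """Probability a message survives `hops` independent lossy relays intact."""
    return p_per_hop ** hops

# Even 95%-reliable hops decay quickly as the relay chain deepens.
for hops in (1, 3, 10):
    print(hops, round(chain_fidelity(0.95, hops), 3))
# → 1 0.95
# → 3 0.857
# → 10 0.599
```

The independence assumption is a simplification (real agents can also correct each other), but it illustrates why error rates in deep agent hierarchies are a first-order design concern.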
- Frameworks addressing communication:
- Model Context Protocol (MCP): from Anthropic, already widely used.
- A simple wrapper around APIs that gives every API a standardized format.
- By writing an MCP wrapper for a service (a file server, an email client, a Slack client), those services can talk to each other.
- Benefits: controlled routing, modularity (plug-and-play services), absorbing API changes behind a standardized interface, and dynamic tool discovery.
- Analogy: what HTTP is to the ordinary internet, MCP is to communication between AI services and applications.
- A2A (Agent-to-Agent) protocol: recently released by Google, aimed at making agent-to-agent communication more reliable and adding fallback mechanisms.
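To make the "standard interface" idea concrete, here is a toy registry in the spirit of MCP. This is illustrative only: the real protocol defined by Anthropic uses JSON-RPC messages with a formal schema, and every name below (`ToolServer`, `list_tools`, `call`) is invented for the sketch.

```python
class ToolServer:
    """Toy wrapper: expose heterogeneous functions behind one uniform interface,
    so a client can discover and invoke tools without knowing the raw API shape."""

    def __init__(self, name):
        self.name = name
        self.tools = {}

    def tool(self, fn):
        # Decorator: register a function under a uniform (name, callable) pair.
        self.tools[fn.__name__] = fn
        return fn

    def list_tools(self):
        # Dynamic tool discovery: clients can ask what this server offers.
        return sorted(self.tools)

    def call(self, tool_name, **kwargs):
        return self.tools[tool_name](**kwargs)

files = ToolServer("file-server")

@files.tool
def read_file(path):
    # Stubbed file access for the sketch.
    return f"<contents of {path}>"

print(files.list_tools())                      # → ['read_file']
print(files.call("read_file", path="notes.txt"))  # → <contents of notes.txt>
```

The point of the standardization is that the client code above never changes when the underlying API does; only the wrapper is updated, which is the modularity benefit listed above.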
- On telling humans from agents (from the Q&A): asked how to distinguish AI from humans in conversation, Garg said it is very hard, especially since voice agents can already imitate humans convincingly; future identity proofs may rely on biometrics, personal data, or passwords.
Key Problems and Future Directions for AI Agent Systems
Despite rapid progress and broad promise, several key problems must be solved before AI agents become practical in everyday life:
- Reliability:
- "The systems have to become very, very reliable ... they need to be close to 99.9% reliable if you're giving them access to your payments and bank details."
- Agents must not go rogue, e.g., posting the wrong thing on social media or making a bad financial transaction. Trust is the foundation.
- Looping:
- Agents can get stuck repeating a failed process over and over, wasting compute and money.
- Proposed remedies:
- Rigorous testing and benchmarking across many real-world use cases, and learning from the results.
- Observability: monitoring agent behavior online and understanding internal state.
- Safety mechanisms:
- Audit trails: recording every operation the agent performs.
- Human overrides: letting a human (a remote operator or the user) take over and correct the agent when something goes wrong, similar to taking over from Tesla Autopilot.
Distilled Insights from the Q&A
The Q&A further revealed Garg's outlook: confidence that reinforcement learning can drive large gains in agent performance; agents' ability to handle hard tasks such as CAPTCHAs; and an endorsement of the trend toward automating the creation of AI agents themselves, hinting that agents will self-improve and build more advanced agent systems. He also stressed the practical path of fine-grained, domain-specific testing and continual model iteration to address reliability and hallucination, and discussed the potential for large and small models to collaborate within agent systems.
Core Conclusions
Div Garg's lecture paints AI agents as a key step toward AGI while candidly naming the hard challenges they currently face. He argues that human-inspired design, rigorous evaluation frameworks, advanced training methods (especially reinforcement learning and self-improvement mechanisms), reliable memory and personalization, and efficient agent-to-agent communication form the core path to overcoming those challenges and raising agent capability and reliability. Even though frontier models such as GPT-4o excel at general tasks, their zero-shot performance on concrete agent tasks that require interacting with complex real-world environments remains clearly lacking, which underscores the importance of targeted training and evaluation. The end goal is highly reliable, trustworthy AI agent systems that integrate seamlessly into and empower everyday life, which will require sustained effort and innovation across foundation models, algorithms, engineering practice, and safety and ethics from the entire AI community.