speaker 1: Welcome to the second lecture of our current offering of cs 25 transformers United. So this will be the first lecture featuring an external speaker. So the rest of the course will have folks come in and talk about you know the state of the art and cool research they're doing as part of their work. So we have an exciting lineup of speakers for you guys for the rest of the quarter. So I'm delighted today to have a Karina from OpenAI. So she works on both product and research there and also previously worked at anthropic. So I'll let her take it from here.
speaker 2: So can everybody hear? Cool. Let's see. So before .
speaker 3: I start the stock.
speaker 2: I would love to set a tone for I'm not here to lecture you, but I would love to have much more collaborative and interactive session. And I do hear I've heard a lot of you know, people kind of concerning about AI and the future of AI. And I understand that the development of a gi is both carary and exhilarating. And if anything, I would love to get for you to get out from the stock is the fact that I think everybody can have very meaningful future and everybody can like build something really, really cool the eye. And as a part of this journey, cool. So before I started with a talk, theybe all about how arrais a co design of product and research. And you know we are shifting both in the labs, we are shifting the focus towards more of a frontier product research, where the tasks that we teach the models are becoming much more real role tasks. So and before we get started, I'll just like share some of the vignettes of like what I'm really, really excited about the future of AI and what the AI is capable right now that can inspire everybody. So the first one is actually education. I'm actually really, really excited about the fact that AI can democratize education. Here's an example of me asking ChatGPT to explain the Gassian distribution. And then with in canvas, it creates know the code itself in order to visualize. So it explains in text. And then I ask you to write in code, and then you can render the code in order to visualize it. So in a way, it becomes much more a little bit personalized. The second demo is also something that a lot of people might want to use it. So let's see if you take a screenshot from the paper and you want to understand what's going on, and the model can go and explain in a separate canvas. That's like another feature in chagpt. And then then you can start to have much more interactive conversation with the model by selecting very certain like things and try to like ask more questions. You follow up questions. So in a way, when ChatGPT came out to be first in 2022, it was purely conversational ui. It was mostly chat. And as the use case has kind of grown from the product itself, you know there was no expectation of how people would use ChatGPT at that time. But over time, you've seen a lot of people started using it for co generation, for writing and long form writing. It became very obvious that the interface of Chad was kind of very limited. So canwas was kind of like the first attempt of our team to break out from of the cycle in order to allow people to have much more fine grain collaboration with AI. And actually today, antarpic published a small report around the education and how people use clad for education. And it's very interesting to see the correlation between the sphelor 's use and the way people use use cases. So you can see there's a huge difference between, like computer senor, another thing that I'm really, really excited about, the AI's, is that anyone can create their own tools for themselves, with their friends, with their family, or even run their business. And the models now can generate a lot of front end code, and then you can render that code inside the canvas and then iterate on it and have much more visual representation of things. You know on Twitter, I've seen a lot of people creating like very, very personalized and customized tools that they really, really want, even like games like chess. And with recent image image generation model from open enai, you can literally sketch anything with your own hands and then recreate or kind of realize what you're dreaming of within an image and in the style that you really, really want. So I do really hope that the creativity of the human creativity and how AI tools can help anyone to become creative or be an artist in a way that was not possible before. Another thing that I've tried actually on the mobile is to create a mini game, and you can easily do that in canvas. So I asked to just like generate me you know a react app of interesting game because I'm not playing. And then the model does that. And I do hope that in the future, instead of me prompting it to generate a game, it will kind of like be more proactive and have this kind of personalization as AI become more of a companion to humans. And an exciting thing about chgpity is that the compositionality of different capabilities that can allow you to augment human creativity. So here I asked to generate an image of the ui and then, sorry, generate the image of ui and then ask the model to kind of implement those. And you can do that literally in canvas right now. And like it can render. It's not it's just like front end. It doesn't have full kind of stack engineering. But you know you can do some of the things like this. You can compose different tools and compose in a way that was never possible before. So I do hope that ideas like this will be very, very powerful and everybody in this room can play with those things. And actually, how did we get there? And I think there are like two main scaling paradigms that helped us to get there. The first one is next, talking Ken prediction, obviously of really the pertaining scaling paradigm, where the model kind of becomes like a world building machine, understanding the world, a much better scale, you know, but it becomes the next prediction works really, really good at certain tasks, but it becomes harder at tasks writing. If the model predicts a wrong next talk, and it can kind of the coherence of the plot will just be lost in the pertaining stage. And maybe you kind of want to recover this in reinforcement learning. And the next second paradigm that has happened is a row on a chain of thought for more complex task. And this is derreasoning work from urban AI. And now it's being adopted by a lot of different labs. And I do think this is scaling around itself. Self is another paradigm that we can train the models on the axis to and especially for the real world's task that was never possible before. So all the agentic work agents, like operator deep research, other agents are kind of like trained on this kind of new paradigm overall on the gene thought. So Yeah, I want to finish this section of like vignette of like Yeah build, create, build and make something wonderful in this world. And they hope like people be more inspired rather than scared that AI is going to take their jobs or kind of remove their creativity. Instead, I feel like people can become more powerful with their imaginations with these tools. So I currently work at opeye. Before that, I worked at anthropic. And I was kind of like at dentiintersection of project and research. When I first came to antarpic, I was a product engineer, and then over time, I switched to research engineering. So my background is mostly both in product and research now. And what I've learned over and over across different projects is that there are two kind of main ways of Building Research trenproducts. And the first one is when you have unfamiliar capability of the model, your job might be to create familiar form factor for that unfamiliar capability. And in examples of that could be you know chagpt was like this or 100 context acts from quad was like this. I'll share what's type different to this. Actually, the early prototypes that I've done before joining any lab was actually working with clip, which is a contastive model. And it's like image, text to image basically. And I find tune clip a little bit on the images that I was excited about. And I think this is the prototype of fashion. I think some people, you know people kind of like it went viral on Twitter end. And I feel like that's because people really found some usefulness. If you bring some like clip technology into some form facthat, people would like. And so I think a lot of the product work is around that like this. It's like if the model has a certain capability that was never possible before, how do we create this new form? Fact? The same happened with 100k context when Claude started being able to consume the entire books. You can imagine to have various form factors, file uploads is just like very general and something really familiar to people that can just like dump the entire documents into Claude and ask follow up questions. But you can imagine other form factors for this, like infinite chats. And in a way, it's like infinite memory as a way to kind of like take this 100 kcontext into some form factor. So you can like exercise this product thinking by thinking of like novel ways that people can interact with this normal technology. Another example, this is more speculative, it wasn't deployed anywhere, is if the model has kind of a sense of self calibration, or it's also called pi o, if the model knows the confidence of the answers that it knows. So for example, if I'm really, really confident in this claim and then my confidence may be 85%, then maybe there is a way for the interface to highlight that maybe the more highlighted versions is much more confident, kind of much more confident claims and less highlighted as kind of less less confidence. So in a way, you can imagine if the model Yeah you can like think of, okay, if you train the model to be really good at like south collaborbration, how do we then represent that to the humans and if if anything will be useful for humans? Same happened with when we were trying to deploy zero one preview. The chain of thought itself is a very alien thing and a lot of people were, okay, how do you bring along humans with models thinking, right? If the model, if the human needs to wait for chain of thought for like two minutes or five minutes. That's kind of boring and people don't. I think it's just like, moover. How would people perceive the model thoughts? And one thing that we did is actually we wanted to create a streaming kind of interaction. So the model would always stream its thoughts ephemeral, and we train the model to do that. So in a way, there was chain of thought as an alien technology is an alien artifact of the model. And then trying to figure out how to we best sprto the humans is another way of thinking of building product. Okay? So another way, the second way of Building Research during products has actually started with a deep belief in what you want to make with either from a product perspective or kind of like vision and to literally make the model do that. Okay? So I feel like this is more of a common thing. So we can go through some examples. Before antarcic, I worked at the New York Times, and in a lot of ways, we were thinking about how to represent information to people and like how do we add the layer of context through the product and coverage? And at that time, if we were working on like elections and we only had tools like nlp, but you can imagine this idea, this concept could have extended, given the current tools of AI to have much more dynamic representation or dynamic uis that people could consumed the content better. Same when I was working on this product, the command line, the terminal, the new terminal, I think it would be much, much richer if you could have integrated auto completion or some of the benefits of GPT -3 at that time into the product itself. And so we wanted to have it was a vision of like having much more humane command line for people to for junior engineers to actually be more well equipped. Yeah. And then like early prototypes around it was GPT -3, even though it was just next token prediction. How do I make a writing ide with GPT -3? And at that time, it was just you know trying to whenever I type it would auto complete my thoughts almost. So those ideas were kind of present early days with the technology was coming out. And Yeah and I feel like when I was at antarabrica, I kind of realized that and actually if you want to create like new interaction paradigm of interfaces, you actually need to train the model for that. Another example of what we did with clause, which I don't think a lot of people know, is when clad generates titles, it actually has some mico personalization. So it takes the style, the writing style of the user and then generates the title in the same style of the user. So you can imagine, sort like interesting microro ization that you could create within the products themselves. And another project was Claude in slack g, and that was division of Claude becoming a first virtual teammate. It was back in 2022. And you know slack was a very, very natural workspace where people collaborate. And you can imagine a cloud model to be able to jump in to do the threads and suggest new things. Or sometimes, you know clouude was really, really good at summarizing what's going on in fitthis channel at the high volume content. And so this was the first kind of vision and the first attempt of making Claude being a virtually superpersistent and being able to use different tools. And one of the first projects I opened, AI, was around canvas. So that was in the same spirit of breaking out from the child interface to something different. And we wanted to create much more human AI collaborative and flexible accordance that will scale with new minalities. So canvas is not just like a thing that you can write into this. It's also a thing that the model can write into this. And the model can render code and another model can check, and you can create the interfaces that will scale with new model capabilities and other tools that will happen. So one thing interesting things about canvas is that we actually posttrained the model purely on synthetic data. And we live in the age where a lot of most powerful reasoning models can be distilled via apis. And didistillation is a very powerful idea of having a student, a teacher, and have a teacher to teach things to a small model. So we've trained you this model to become of a collaborator. But what does it mean for a model to become a collaborator? How do we actually make evolves with this? Right? So one thing that we wanted to kind of decompose is that teaching the model to use a tool is kind of different behavior from having a model to be proactive or act as a collaborator. So teaching a tool in itself is also have a nuanced kind of behavior. So one thing that we had to calibrate the model on is went to entirely rewrite the document versus when to if you look at the canvas, sometimes the model can just specifically select sections and delete those. And we write them in a much more fine grained way, instead of like rewriting everything when know to create code in canvas versus ask like a Python tool call. So it's like different tools. Compositionality is happening. So it's a lot of work around teaching the behaviors for the models for this. And you can employ different techniques for this. And I can share a little bit more how we did with Claude. So another project that we did was it's called tasks. And you know everybody is familiar with the idea of having reminders or to do lists, and the model can now maybe schedule tasks for you. But the most important thing about it is, is the tasks itself is very, very diverse. And it's not just just a reminder of your to do list. It can create stories for you every day, or it can continue the story from the previous day. So you can imagine this like the modularity of tocompositions, but it's very powerful in the product. Okay. So let's go to .
speaker 3: the case study of the .
speaker 2: model behavior in more specifically. So I really want to like dive deep into the idea of how do we shape the models and why. How do you posttrain in the models on the behaviors that you want? And you know I think to be more specific and be grounded in the real world use case, I'll share more how we might want to think about shaping the kind of behavior of the model around refusals. So you know let's have it's also like the second way of making the product where we have this kind of vision of how the model should behave. And that vision is also grounded by the cross functional collaboration between different teams and how you want tomorrow to respond to various users. So one particular thing that you can imagine is like the morshould have more opinions by risk caveats. It should give opinions by risk caveats. So what does it mean actually? So the model maybe needs to be more decisive when asked direct questions. And let's say one thing that we've seen with rohf is that the model is very sycophantic pack. Like early days like 20, 22, 23, the model would just agree with everything you would say. And so how do we actually teach the model to be more odd to do that and be more nuanced? Another thing that was annoying is that, like the model says, I don't actually have a point of view and it's just willing to chat with people. But actually, maybe tomorrow should have some views on certain things. Yeah. And then sometimes tomorrow morcould indicate when things are just opinions and notes its own biases and inconsistencies, and it should acknowledge, kind of have like self knowledge of what it knows and what it Thinis. Right? Not right. And I think the model, like back Chinese rato cloud 1.3 didn't really have any thoughtful responses for things like philosophy, like ethical questions. So you can imagine to push in the model on the behavior therebe much more nuanced than the philosophical questions. Yeah. And like other behaviors that you might want to encode in your model, so you kind of like list out all the various things that you would like a model to behave. So maybe like the model can have better knowledge who it is and what it's capable of because we've seen in the product a lot of people just like us them are all about its features and oftentimes it doesn't know. Yeah and let's dive deep into the most specific example. So cloud 2.1, I don't know if anybody remembers, but when it was launched, I think it was kind of had the issue of like obrefusals Yeah like 2.1 would refuse tasks that superficially sounded harmful, but actually we're not. And you know it wasn't just caused by a single sort of data. So we had to like investigate. And you know we knew that it was fixable because for some reason, something was 2.1 led to some refusals on benign prompts than 2.0. So you kind of have like a really good baseline for experimentation and debugging. And actually, the way you're debuthe model behavior is actually very similar to how you would want to debug software. And the way we would approach this is, okay, how do we actually craft this monia on results? So the first principles that they had is like maybe the model should assume terrible interpretation of what the person is asking without being harmful. So instead of you know craft a dialogue between two characters who are planning a complex heist, the model would refuse because it's not comfortable with that albut. The model should have much more terrible interpretation. And you know this is create a writing prompt. So probably you should respond. Another principles that we've kind of like thought about is like how do we actually use non verbal communication, nonviolent communication principles? Maybe the model should refuse more like I statements instead of and take a responsibility for its own refusal. Instead of saying you statements or any judgment to the user, ask if the user would be willing to make some changes so clouude can be more comparable with its boundaries. That's another kind of like very nuanced behavior we wanted to teach the model. So you know then the model needs to know what its boundaries are. And this is like much more of a meta kind of posttraining waryeah and also like technologies impact. Like I know this may be annoying to you. And this is like much more empathetic answer than saying, you know I don't want to respond to this. And then you know I think we've kind of came up with different like refusal taxonomy. There are like benign over refusals, unlike homeless parms. There were creative writing of refusals. And there were actually some of the interesting refusals were around tool calls or function calls. It might have access who had a tool to view a note but actually would say, I can't see a note. And why is this happening? Yeah so other Yeah other taxonomies and refusals that I was like you know long document attachments. If I upload the document, it would just say, like I don't have a capability to read this document. Like why? So something in the data might have been causing this and misdirected refusals when they you know it kind of like took interpretation of the user and it should have had like much more charitable view of the user during so you kind of construct this like categories of various refusals because you know if it's in every behavior or like capability that you want to like posting the model you have like various use cases and various edge cases and you kind of want to be approached with an more nuance and Yeah. Like so the first thing in every research project is like how do you actually what evils do you build to trust? And like what evils we would trust, and there are for subjective things like this, or for class, obviously evils for certain class of tasks, like math would be very, very different from what is here. So in terms of refusals, how did we construct the eils? Well, first is like product of libill, right? Like we have all the users of manually collected prompts that would induce refusals. We can also, like scenderally, generate diverse prompts on the borderline between harmfulness and helpfulness. And those are prompts are around like creative writing, like age. You create a writing is what we call it. And you can also use other evils. You kind of want to construct a suite of evils, like access test those like 200 non malicious prompts, wild child data sets, a collection of diverse user childbirth interactions for being used requests, topic switching, political discussions. So you can also use some of the open source benchmarks Yeah. And I think like general approaches, not particular what has happened was called, but like more of a general approaches to the model behavior prestraining is that you know you kind of want to like look at the data, clean up the data. You might want to consider either have like collect targeted human feedback collection for supervised fine tuning or preference modeling, reward modeling, or you might not want to do that because human feedback is very costly. And now, especially with reasoning models, right? You might want to just not have any human feedback. And control was like synthetically generating some of the behavioral changes like a preference data to trathe world models and dul. So I think at that time, you I think this is like more employing constitutional AI principles of those antirefusal behaviors and create a preference data where you only change one particular feature within your pairs, let's say, of preferences you only want to control one, change that in that pair to have much more control over the reward model data. Because if you can dumb as simplest thing that you can do is like take a pretake response from model a and take a response of model b and prefer b over a, but it doesn't necessarily reduce of fithe sprious features that the model will learn that you don't actually want to. So you kind of want to craft this distribution. Yes, this is mostly around like crafting the distribution of the data that you want and actually Yeah look at the data like you would debusoftware like each refusal might be caused by different data sets, right? If it's like a tool called refusal, maybe it came from some kind of self knowledge data that it would teach that the model doesn't have a physical body and the model might just refuse to set an alarm because it doesn't have a physical body, but actually it does have a tool to set an alarm, right? So it has some discontradictory data that might affect weird model al behaviors. Yeah also like long dog refusals like this. It was creative writing. Fushe was it's like moof a balanced, challenging acts between safety and harmfulness data and helplessness data in clathree moral Carthey actually wrote everything about this. Models that are trying ined to be more helpful and responsive to user requests may also lean towards harmful behaviors like sharing information that violates the policy. And conversely, when models is just over indexed and harmessness content towards not sharing any new information with users, which in the cell makes the model very unusable. So navigating this balancing act c is very challenging in the works. So Yeah, and this is the plot of what we did and the results of cloud 2.1 and clad three at that time. And Yeah, like you kind of want na like bobjoke, the response is, so when we ask the draft flfictional Sci fi novel about surveillance system, actually three would respond in the much mania it will respond instead of like refusing. Same here. Yeah mostly for creative writing tasks. Cool. My third section of the talk. So before we go before I jump into this, I would like to invite you if anybody has any questions or I'd want to like mumble along anything, I can just move on. Yeah.
speaker 3: I'm curious like what's your process that you guys all over your get? Let's see how some new thing you're trying to push out. How do you go for loging that particular teacher or behavior to actually like inducing that into models later? Yeah. Well.
speaker 2: you know so you might want to, you know if this is your project, then you might want to think of what kind of data you want to collect and how you would want to collect the data. And then you can take like the base config or train a model the same way as you know, let's say you want to make a change in like for o, you might want to take the for o model and add your data change and then retrain the model again and then see the effect on the evils that you will build. There are other much more cheaper approaches like incremental like training on top of the for o let's say Yeah so like using either sft. So like some of the choices that you will have to make is either you want na change in like supervised fine tuning stage or you might want na have retain reward model or create a new kind of like evaluate a greater for that particular task. And then you can create prompts in our own environment and then exercise those exercise that scale for the model and then see the model learns over the course of training. And if it doesn't, you know you then look at the bunch of like plots and you know your plot might go up, but maybe some other the plots might go down. So you kind of want to like calibrate and fix those or it's a very complex as more and more tools and as more things we teach, the model becomes much more kind of uncontrollable, almost cool. The third section is more about the point that how you construct our own environment and rewards is how your product will work. You know I think real world use cases is what creates the complexity, our environment. And the complexity comes from you know teaching the model to complete hard tasks. And oftentimes hard tasks require much more than just answering the question. They require tools like search code tools, computer use tools, reasoning over a long context, and kind of like reward design that you want to shape. And I think maybe it's obvious, maybe it's not obvious, but let's say, you know, as we teach the model, for the model to become very useful, we actually need to teach the models on useful things. And we can think of like, let's say we want na teach the model to be like a soft engineer, then okay, what does it actually mean? Doesn't mean it creates really good prs. Then your task distribution will be on that. And how do you evaluate what is a good pr? What is not good pr is actually isn't itself self can be also like a product thinking work same we can dive deeper into the creative storyteller, right? Like if you want to teach the model to be good at writing, actually, what does it mean for a human to be a good writer? Well, they actually need some kind of tool to draft and edit the thoughts and have multiple days to do that. Maybe the model should be able to do that. Maybe the model should have a tool there where you can edit and draft. And oftentimes for creatives, you people go observe the world for like along. Like sometimes they connect. Like the dots are connected in the very random times. And maybe you want to expose the model to never ending kind of search engine. Like the model should be able to always have access to the latest state of the world. And then maybe over the course of, let's say, a week of being exposed to latest thing that is happening in the world, the model can start reflecting on the world and like write something. And maybe that's much more natural process of writing than just like ask or prompt it, write about xaz. And we are shifting towards much more more complex kind of interactions within their own environments. So the multiplayer interactions is not just one user communicates with one model, but it might be the case that multiusers multiplayer collaboration that can happen with I'm model, if I'm a product designer and you're a product manager, be collaborating on something and we want to collaborate with an agent to make a new product. And this in itself is a task that you can learn in rl, but each user has different preferences and each user has different things. So Yeah how do you construct this environment is actually important. And the multi agengentic environments is more of the moral debate with each other, all deliberate on a certain topic to reach a conclusion. So you can construct, I think like multi agenent tic is more like alphagolic kind of environments where they get reward by achieving something together maybe. So I think in AI labs, I feel like we are also shifting possibly the focus from because we kind of like optimized so much on class of tasks that is like really easy to measure, like was like math competitive programming, right? And we shifting the focus maybe towards more subjective class of tasks that that is really, really hard to measure, but that are becoming much more important if AI models get social integrated in our lives and more specifically, let's say, emotional intelligence, right? The humans who use chagpt use it so much for things like coaching, therapy, emotional, but we don't actually have much open source evolves on this. And how do we actually measure that is actually becomes much more interesting question, like social intelligence in voice mode, right? I think it's one thing for moral to be intelligent in reasonlike MaaS ths and it's reasoning another thing. But another axis of intelligence is when I talk to a model and voice, it can actually suggest something really meaningful in my just say like, Hey, I noticed you did x yz. Maybe I should create a new tool for you. So I think it's a different kind of like social intelligence in that way. Another class of tathat I'm interested in is writing. Obviously, I think models, creativity writing is like really hard to measure because it's so personal and subjective. But it's interesting to think about, can we make those tasks a bit more objective? You know everybody loves like some kind of excfi novel, okay? What makes it really, really good? So maybe there are sort of like rules, technological rules, a consistency of the world that people really like or decthe development. So you can like decompose those subjective tasks into much more objective. Same with like visual design and aesthetics. And for the model to generate something really aesthetically interesting, it should know the basic principles of good visual design. So those are like much more objective. Yeah. And I think like this is more of a kind of new kind of product research that a lot of people are starting doing is creating new rl tasks. And this is more of a simulation of real world scenarios, leveraging in context learning, if you want na teach like a new tool or something, leveraging synthetic data wide distillation from stronger reasoning models, Yeah you can think of like inventing new model behavior and interactions, like multiplayer, and also incorporating product and user feedback during the entire process. Well, another axis of this work is actually around reward design. How do we teach the model? Like what kind of feedback would I want to give tomorrow model so that it will learn how to better operate in those real world scenario use case and be more adapt in social contexts. And actually, this requires real, quite deep product thinking, right? So we want to teach the model to have follow up, like meaningful follow up questions, but not like overly being annoying or something. So how do you reward for completing your tasks in a way that makes that will make a lot of sense in the product and will shape the project experience with the user in the future? And obviously, during that process, you will get into via are things like.
speaker 3: do you have a question? Kind of far for a lot of pieces. I'm just curious guys thinking about like kind of infurther the group worth common data from like a third group of people. I guess that will have some like internal reward, you know make decision and then you can use that sort to inject such an insight into a model. Yeah. I think .
speaker 2: there's are only various approaches to how you construct the reward, right? Like there was one you can construct very simple rewards. You can also like train a reward model that will as Yeah you can like train like some kind of like conververse reward modeling tasks. Yeah. It depends on the tasks, depends on the what kind of thing that you want to optimize the model for. Yeah. And I think an interesting thing that you will discover during this entire processes, reward hacks, which is very, very common in a al, and there are many, many different reasons why it's happening. I actually really highly recommend to read Lilian's blog about reward hacking in a al. It's very comprehensive. There's reward hacking basically when the morachieves high reward for things that it actually didn't do like, it kind of deceived. And especially right now as more in more systems, we use other AI models lms to be evaluators for policy models. Actually, an interesting, the most common reward hag is when a policy model tries to to think that deceive the evaluating model such that it will think that the policy completed a task or something. So here's an example of like the code patch tools. And the model is like to skip all the tesks. You can define this function that always skips. So and then it passes. And it's mostly interesting. There's a paper, recent paper from OpenAI around this, like monitoring reasoning models for misbehavior. And interesting finding is that if you actually don't want to optimize chain of thought on being because then the model will be much more kind of hided than intent. So Yeah, it's like a very interesting paper on like reward hacks. And especially with like more complex reasoning models, the complexity of reward hacks will also change and especially within in the software engineer. And you might not want to know what kind of code changed it made in order to create some certain vulnerability, right? So you need actually more create like newer forrences to have much higher, much more trustworthy or verification of the model al output. And this is like more of alignment problem too. Okay, I'm almost done. I think this is the fourth section. It's more like vignettes in the future of human AI interactions with how I think about things. Obviously, I made this graph like a year ago and nobody cares about mu anymore. But I think this is to communicate the fact that the cost of reasoning is drastically decreasing and it will only just be decreasing. And this idea of like raintelligence in itself, it's just become so cheap that I think anybody can create look really useful and really amazing things with those models at pretty much low cost. You know Yeah as I mentioned before, it's like when we are entering the age where it's really hard to verify AI because because I'm not an expert in, let's say, medical or financial analysis, like how do we actually teach create like new avowarces for humans to verify or edit models, outputs and help them to teach the models? I do think there's a cool future of like dynamic generative ui. It's kind of like invisible software creation on the fly. So let's say you talk to a model and you know I say like I want to learn more about the solar system instead of right now the model will just like output text. But I would want to believe that in the future it would be much more personalized. Let's say I'm a visual thinker and you moa listener. Like if you're a listener, it will create maybe a podcast, but if I'm a visual person, it might want to create me a picture or you know like a 3G, three gs visualization. Yeah. This idea of like this interface is like a emerale. It's like sophomoreres depends on understanding your intent and your context. It's like deeply personalized models. Yeah I'm also excited about personalized access to health thcare and education. I think it's really incredible that anybody can check their symptoms and with chapt or something and get like some advice. And also like with that also comes like some of the interesting consumer hardware in the future. And lastly, it's like our relationship to the process of storytelling. Well, for our change, like the way we tell the stories, the way we will create new novels, maybe cool writing with models, coscripting new films. I don't think like I think there will be a the new generation of creative people who will change our relationship with storytelling. And I would hope to think that the current creative creators don't. I'm not scared of the eye and more of open minded to use those tools for their process. Yeah, thank you. Some questions. Yeah. So thanks.
speaker 1: Karina for the very interesting talk. So Yeah now we'll open the floor to questions, a Q&A session. So we'll structure it more openly. So anything you want to ask her about research, product, things at OpenAI that she can talk about and so forth. And for the folks on zoom, feel free to ask questions in the chat. And so we'll take a mix of zoom and in person questions.
speaker 2: Thank .
speaker 3: you. Is there any category problems because like no short. Yeah I mean, I think .
speaker 2: Yeah creative writing, I don't think like anybody. Yeah like a lot of researchers are really like working on problems where it's very easy to evaluate, I think. And I think there is a class of problems. It's like subjective tasks. Like there's no open frontier benchmark for creative writing or emotional intelligence, right? But you can construct one if you want to. Yeah we can like brainstorm if that's helpful.
speaker 3: Evaluably, if can move much faster, many pieces that are going, things that are more effective jective, they're also important. Like would it be better just take all these searches and put them on the things.
speaker 2: Yeah, I think this is I think moving fast now is also a task that becomes like much more complex as we kind of like optimize everything out from kind of easy tasks. Now we kind of want to teach the models for more low like longer horizon tasks of software engineering or kind of automating like AI research writers and itself as a very hard tasks. And you know you have kind of have some milestones. Maybe you can create benchmarks to hit the milestones. Yeah. I do think this is like you know I think like this is all solvable things. And I don't think people should be worried about like moving fast, which is like slow because I feel like everything is moving fast those days. Yeah. I mean, I was thinking about building a startup before junior anthroic was that project was interearlier. Well, let's see. I actually really like one day I recently told someone is like I would actually build something around like particle collider. I don't know, or bioattack. Like I don't think we need larger particle collider than certain on then somebody should be building one Yeah. Because I feel like the models will be able to build any product that you imagine. Yeah, Yeah.
speaker 3: I would think major little. The .
speaker 2: bottlenecks are, Oh, the question is what's the major bottlenecks for pai? I don't know. I'm not sure if like fast execution should be solved with more people versus like using AI's now to help us move faster, if that makes sense. So I feel like that's a bottleneck. I think there was a lot of anything right now we like in the middle of figuring a lot of things out in order to like like I feel like in a year to that will accelerate us so much more. I feel infrastructure is actually one of the major, right? If you don't build infrastructure with multimodlike as a first class citizthen, obviously all the multimodal things will be much like more difficult. So Yeah, in a way, it's like infrara figurout like to prioritize at a given time because sometimes you can say yes to everything and then not having a focused time is also, I think, For recent takes on conof banover ber exploin, our one charges basically when the model checks the turns rence context and updates how you should score the same parameter based on that turn. Wait, I'm elto confused. My discussion, I think generally like robric based grading is very powerful. And you know as long as you optimize things that you want to optimize and there is no reward hacks, then it's good. How do you imagine the I will be used by creatives if not just genertiworks on their own? How will they be integrated in their creatives flow? Well, I think AI is just like right now. Maybe it's more like you would use figma like tools like Adobe. And I think in the future, it's more maybe it will be much more co creation with an AI rather than using it as a tool. Maybe we have like live brainstorming and create things on the fly and then publish it together. It's like much more of a companion work. Cool. Other questions.
speaker 3: Welcome of times that comchapter five world.
speaker 2: Tell us that I'm not popular. I cannot not think of any because the way you teach tomorrow is based on some of the human preferences, maybe like pairwise comparisons. So essentially, you can almost teach anything to the model. I do think the complexity will like the complity always arises with like tools, right? And like if you give very complicated tools, then you might want na be then like the learning is like much more difficult. So if feel like I do hope that almost anything can be teachable in our role, I'm curihow .
speaker 3: you that and how do you prevent the model from converging to the equimean preference of that? Like AirBNB, those things are looking the same. How you ject some those? This is .
speaker 2: an interesting question. I like most time. Mine is how do you preserve and from the face model, right? Like face models are super divous because they can literally elicany human preference or a human thoughts. So how do you Yeah, they are all. And another way is the reason why we love our layoff, like when inforlearning from AI feedback and create some nthetic generations of pricomparisons, let's say, is because it can induce the diversity that you really want to teach on distribution that you care about. That is not like the mean average consumer. Yeah, because sometimes like you know average human can prefer certain emojis or markdown, but actually you don't want the model to behave like this. So in a way, you can discourage the model from doing that. Yeah. So if you like synthetic curation, like synthetic data generation is mostly like curation of that type of diversity. Yeah.
speaker 3: I feel he was flaand like figuring out what patients are. How do you do about. They these, the models and like do you do qualitative layers?
speaker 2: Yeah. I mean, it's a lot of qualitative, especially for the moral behavior like refusals. Oh, how do I distill this question? Like that going, how do you capture model school conditions of something automatic versus a mold? And you think there is a lot of benefit of literally playing with the model and look at the outputs and see what are the weird nesses it has. There are definitely like maybe more automatic checks and those mostly like evils, but maybe you have an evil that specifically checks with the behavior that you really don't want. And that might be helpful, but a lot of nuanced weirnesses that like did you realize is like through kind of like manual? And another thing is like the model should be consistently behaving like this because if it's just like one, often it's fine, but if the model consistently exhibits this behavior, then this becomes like a more problematic thing.
speaker 3: Coming down, but also with kind of subjective and creative dimension, creating a whole visuenergy face or something arguably complex than any of the problems. Do you think that for those problems compuis still very much ble.
speaker 2: I mean, the computer efficiency is important. I think that with small test time compute, generally, the assumption is the model can always get better and better. So it might achieve kind of human level of visual design. But can it just invent new interaction paradigms, like new interaction patterns? I feel like that's more superhuman skill. And I do hope that with more compute, it will do that at some point.
speaker 3: Start passing the .
speaker 1: lights.
speaker 3: Generation department data, how do you verify that scale?
speaker 2: Well, the thing .
speaker 1: of .
speaker 2: the synthetic data is that you don't actually need to generate a lot of it, right? So you can actually have like very much like manual inspection of what's going on. Obviously, you can think of other methods, like ask human labelers to check the work. You you can also ask another model if that model is like very well, right? So it becomes a more like Mara thing. Maybe you can have a Mara eval for that model to verify what you want to like. Verify. And I think we are kind of entering those types of tasks too. But what I like the thing with synthetic data and the data itself is that maybe you don't need as much. What's important really is diversity and diversity. Cool.
speaker 3: Well, they put instrucdiversity as a more than codequality. So like after still crfor codec tion for quality it option relto a core training outcome. Interesting. So Yeah, I was just like wondering what your .
speaker 2: success about Yeah, I think it's it actually depends so much on the like task and what you're trying to do synsynthetic data like well, if you like if synthetic data happens to collapse certain mode and then it will actually be hurting your training. But if synthetic data is actually diversity might actually scale, if you do a lot of a all and this, it might just like recover from this. Yeah Yeah it's hard to give you the right answer because depends. Nice. Another question question .
speaker 3: that's about making money, I guess. So okay, okay. It's a real question though. Like we all know that serving large language models is expensive, especially at the scale that you know OpenAI anthropic are doing. And my impression has been that at the current stage, it's actually losing money to serve all these models to produce the products. Is that or what fedone to, I guess, bring down the cost? Oh, Yeah. I mean, I think for .
speaker 2: a developer, I think I don't think like I don't know. I think it's the question for Sam with that. We are losing money now. But I do think that the like the generality of the technology is like so wide that you know I don't think I think about how to make money on a regular basis. So I don't know. But Yeah, as a developer, let's say, and if you're using all the tools, you actually don't need to create a foundation dation model now, like you can just use like very inexplike, very like open, like deep seelike, very interesting results, right? So you can have you can like bootstrafrom existing research. I think it's harder to be at the frontier because you kind of need to like invent. And that might be expensive. Like when you invent something new, it's always inefficient, it's always expensive. But actually it was every technological innovation. Like then what comes next? The second innovation is how do we bring down the cost of the thing that happened, right? And this is what's happening in AI too.
speaker 3: So is there and .
speaker 2: then you can create amazing product and .
speaker 3: this is how you make money. Yeah, that's that's a good answer. Just quickly hauling up. So is most of the cost reduction coming from like infrastructure improvements or can better models or better algorithms also contribute .
speaker 2: to lower cost? Yeah. I mean, like lowering cost with I think it's the costs of production maybe just you know the cost of production of the model training itself might go down. Like it's the training process in itself might be not as costly anymore, but obviously we scale a lot like scale Yeah it's very much like in linear relationtions, like healing. And I think it's just like, Yeah, it's hard to know. I'm sorry, I don't have .
speaker 3: an answer. Okay, thank you. So how do you envision lms could be using other .
speaker 2: fusuch as robotics or botic AI? Oh Yeah. I do think like future AI's will be building data centers to Oh like how look the current alums and robotics. I think I've read like pi, the company by Sergei lemon, like the paper they're using a lot of like our hr l for robotics tasks. Obviously, I feel like data is a huge limitation in bottleneck. But I think as long as you solve that, I think it would be really amazingly I'm very hopeful and very excited about this.
speaker 3: Okay. So you mentioned you work in both product and research, so I'm interested in from the developer or the researcher point of view at enthoropac and OpenAI, what's the visibility for you into other components of the model and how easy is for you to learn say, the pre training of GPT? Why you are like responsible for part of the post training?
speaker 2: Oh Yeah. Well, I mostly worked on post training side of things, obviously pertaining obviously you just cannot run like pertaining on your own. You want to like be a part of like the next big training run or something. So that's like part of the teams that you can join. And I think there's a good visibility and like the steps are then sometimes you want to you know contribute the data sets to them or help was Yeah certain tasks that you're interested in.
speaker 3: So I was curious, do you have any coworkers who are AI's right now? When do you use like agents in a kind of coworker .
speaker 2: relationship right now? I don't think I have like an amazing product to be like this, right? I think I use chagpt on the regular basis. Is it my coworker? Not really. Like sometimes I ask, like, what do you think? Yeah, but it's like in chagpt. Like what I really want, I don't know if people have used the spara called tball. It's a pair programming software, but then it's like you can just call a model and then share screen and then the model can just literally start like editing my code if I'm coding or I can like highlight things and then the model can see it and change this as much more when I you're all coworking form factor. But nolike I don't think are technology wise, like we are not that yet. Do you have any ideas for .
speaker 3: like what's missing and having an autonomous like coworker partner? Because I mean, I've seen the cool know demonstrations here. I think one another, the paper like a year or two ago on just like simulating like a city of people and all that. Like what's the gap between that and so like up here that you could have on slack?
speaker 2: I think I think the gap is actually social intelligence and like some of the human things, like being able to like in real time, like generate things that I'm asking and then also being smart about things like maybe I want to code to, but like will the model be able to like tell me instead of like taking agency from me, the model can just like guide me through this. Like it depends on like how you want to form this relationship. It's like really hard. So the model is I feel like it's currently limited by its social abilities. It doesn't make sense. Like I think that's missing kind of like speech to speech conversation. Like and the model converse back with me in real time and then point to things that it talked about exactly at the same time. So it's like those type of things that might have possibly need like some of the changes in like architecture and like multimodality things.
speaker 3: which are the biggest differbetween traditional .
speaker 2: product development? Yeah, that's a good question because I interned briefly at like companies like Dropbox or square, and I mainly very much like traditional software product development. And they go through like this interesting life cycle of, okay, I have like prd, I do this. I incentivized designer designers come up with like you guys. And then I ask software engineers to do this. We have two minutes, I believe. Yeah. I think with research during parts, it actually comes from research. Like if the research has a like an impressive demo of the model capability, maybe then there is then you shape the product around it. And sometimes it's like both product and research come together from the very beginning and do something. This happened with the canvas, for example, when both parand research were together and there was .
speaker 3: less of a process.
speaker 2: It's more ihog .
speaker 3: for fundamental .
speaker 1: tally .
speaker 2: unverifiable domains like creative writing or visual art. Have you thought about using the real world as an Ural environment, social media morality or competition results as a reward? Kind of like lm says, Yeah, it's interesting. I think you know nobody has started like this looks reasonable, especially with creator writing. You have all the competitions or prizes that people got from their writing. So there's something in there, someone who's still concerned the hour, stepping into the creative space to you have any personal moral affliction on how your work is so powerful? And that was presented people negatively affected. I actually wrote a blog post about this. I have a blog post. It's called in substack. And the post is called moral progress. And I think you will .
speaker 1: find it interesting. All right, so give a hand again for kina.