speaker 1: As people are interacting with these systems as collaborators, it just becomes an increasingly salient question whether these models are having experiences of their own, and if so, what kinds. And how does that shape the relationships that it makes sense for us to build with them?
speaker 2: Take one. Mark. Do you ever find yourself saying please and thank you to AI models when you use them? I certainly do. And part of me thinks, well, this is obviously ridiculous. It's just a computer, right? It doesn't have feelings that I could potentially hurt by being impolite. On the other hand, if you spend enough time talking to AI models, the capabilities that they have and the quality of their output, especially these days, does make you think that potentially something else, something more, could be going on. Could it possibly be the case that AI models could have some level of consciousness? That's the question that we're going to be discussing today. Obviously, it raises very many philosophical and scientific issues. So I'm very glad to be joined by Kyle Fish, who is one of our researchers here at Anthropic. You joined in, what, September, right? And your focus is on exactly these questions.
speaker 1: Yeah. So I work broadly on model welfare here at Anthropic, basically trying to wrap my head around exactly the questions that you mentioned. Is it possible that at some point Claude or other AI systems may have experiences of their own that we ought to think about?
speaker 2: And if so, what should we do about that? And I suppose the first thing people will say when they're seeing this is, have they gone completely mad? Is this a completely crazy question to ask, that this computer system, where you put in a text input and it produces an output, could actually be conscious or sentient or something? I mean, what are the reasons, what are the sort of serious scientific or philosophical reasons, that we might think that that would be the case?
speaker 1: Yeah. There are maybe two things that jump to mind here, both a kind of research case and a more intuitive case. On the research front, if we just look at what's been published on this topic in recent years, there was a report back in 2023 about the possibility of AI consciousness from a group of leading AI researchers and consciousness experts, including Yoshua Bengio. And this report looked at a bunch of leading theories of consciousness and state-of-the-art AI systems, and came away thinking that probably no current AI system is conscious, but they found no fundamental barriers to near-term AI systems having some form of consciousness.
speaker 2: So that's human consciousness. They looked at theories of human consciousness and then sort of rated AIs on how close they were to that.
speaker 1: Yeah. So they looked at the scientific theories that we have for what consciousness might be. And then, for each of those theories, they looked at what potential indicator properties we could find in AI systems. So one theory of consciousness is global workspace theory, the idea that consciousness arises as a result of us having some kind of global workspace in our brains that processes a bunch of inputs and then broadcasts outputs out to different modules. And from that, you can say, all right, what would it look like for an AI model to have some kind of global workspace that potentially gives rise to some form of consciousness? And how can we interrogate the architectures and designs of these systems to see if that might be present?
speaker 2: Can we just take a step back and actually talk about what we mean by consciousness? It's an incredibly difficult thing to define, and people have been trying to define it for hundreds of years, whether scientifically or philosophically. What do we mean when we talk about that? What are you thinking of when you think about an AI model being conscious? What is the definition of consciousness that you're using there?
speaker 1: Yeah, it is just an extraordinarily difficult thing to pin down. But one way that people commonly capture at least an intuition about what consciousness is is with this question of, is there something that it's like to be a particular kind of thing?
speaker 2: Like, is there something that it's like to be a bat? That's the famous essay.
speaker 1: Exactly. Exactly. So is there some kind of internal experience that is unique to that particular kind of being or entity? And is that present in different kinds of systems?
speaker 2: So the idea of a philosophical zombie is someone who outwardly resembles a human, does all the things that humans do, seems to react in the ways that humans do, and so on. But actually, inside, there's no experience there. They're not experiencing the color red of this. They're not experiencing the color green of that plant. They're just reacting to it in the sort of way that an NPC in a video game would, or something, right? Whereas, I suppose the question is, is an AI like that, or could an AI potentially be more like an animal or a human and actually have some internal experience? Is that sort of what we're getting at?
speaker 1: Yeah, I think that's great. And this philosophical zombie concept is quite interesting. That came from David Chalmers, a leading philosophy and science of consciousness researcher, who I actually collaborated with on a recent paper on the topic of AI welfare. Again, this was an interdisciplinary effort, trying to look at whether it might be the case that AI systems at some point warrant some form of moral consideration, either by nature of being conscious or by having some form of agency. And the conclusion from this report was that it actually looks quite plausible that near-term systems have one or both of these characteristics and may deserve some form of moral consideration.
speaker 2: So that answers the "are we just completely mad" question, which is that very serious philosophers, people considered among the best in the world on philosophy of mind, the science of consciousness, and so on, take this question seriously and are actively considering whether that could be the case.
speaker 1: Yeah. And maybe just to give a bit more of an intuitive case for thinking about this: there's one lens you can look through which just says these are computer systems giving us some outputs for a given set of inputs.
speaker 2: I don't think Microsoft Word is conscious.
speaker 1: I probably don't think so either.
speaker 2: Okay, right, right. Probably isn't. Interesting. Yeah.
speaker 1: But when we think about what we're actually doing with these AI systems, we have these incredibly sophisticated, incredibly complex models, which are increasingly capturing a significant portion of human cognitive capability. And every day these are getting more and more advanced, coming closer and closer to being able to replicate much of the work and intellectual labor of a human. And it seems to me that, given our massive uncertainty both about how exactly these AI systems are able to do what they do, and about how we are able to do what we do and where our consciousness comes from, it's quite prudent, if you find yourself creating such a sophisticated and in many ways human-like system, to at least ask yourself the question and take seriously the possibility that you may end up with some form of consciousness along the way.
speaker 2: It feels to me that unless you think there's something, well, we'll get into some more detail, unless you think there's something supernatural about consciousness, that it needs a soul or a spirit or something, then you've got to at least be open to the possibility that a complex cognitive system like an AI could potentially have these properties, right?
speaker 1: Yeah. Well, you don't necessarily have to go supernatural. Some people believe that consciousness is a fundamentally biological phenomenon that can only exist in carbon-based biological life forms and is impossible to implement in a digital system. I don't find this view very compelling, but some people do claim that.
speaker 2: I mean, we'll come back to that. We're going to talk about some of the objections to this idea. But, I mean, you're a researcher at Anthropic. The immediate thing people might wonder is, well, as Descartes famously said, the only person you can know is actually conscious, is actually having an experience, is yourself, right? I don't even know if you're conscious. How can we tell if an AI model is conscious? What does the research look like there?
speaker 1: Yeah, great question. I would argue that we can in fact say a fair amount about the potential consciousness of other people, even if we're not completely certain about it. Which I think gets at an important point here, which is that it's incredibly difficult to deal with any kind of certainty in this space. Overwhelmingly, the questions are probabilistic ones, much more so than binary yes or no.
speaker 2: So for instance, think of how we treat animals. We don't know 100% whether animals are conscious or sentient and so on, but the way they act implies very strongly that they are. And animals that are more complex, chimpanzees for instance, clearly show many of the same properties as humans in the way they react to things. And so obviously we treat them differently than we would treat a plant or a rock or something. So, as you say, there's probabilistic reasoning here.
speaker 1: Yeah, and there are maybe two threads of evidence that I'll highlight that we can look to to get some information about this. One of those is behavioral evidence. In the case of AI systems, this covers things like: what do the AI systems say about themselves? How do they behave in different kinds of environments? Are they able to do the kinds of things that we typically associate with conscious beings? Are they able to introspect and report accurately on their internal states? Do they have some awareness of the environment and the situation that they're in? And then a second thread is more architectural, an analysis of model internals. And this kind of comes back to the consciousness research, where we can ask, for a particular brain structure or feature that we might associate with consciousness, do we see some corresponding version of that in AI systems? And so, even without knowing much about the capabilities, we can look at how these systems are designed and constructed and perhaps learn a few things from that.
speaker 2: And that's an important thing to say: the reason we don't know whether these things are conscious is that we didn't intend to make them one way or the other. It's not like Microsoft Word. These models are trained, and then things emerge out of them. And that's why there's so much AI research in the first place. We don't fundamentally know why these AIs do the things they do. We don't fundamentally know what's going on inside, in that sort of mathematical sense, or in any larger sense. And so that's why all these mysteries still remain.
speaker 1: Yeah, and we do see a lot of surprising emergent properties and capabilities as we train increasingly complex systems. And it seems reasonable to ask whether at some point one of those emergent properties may be consciousness, the ability to introspect, or the ability to have some sort of conscious experience.
speaker 2: You talked about the two types of research. Let's talk about the first type, which is the one about what the model actually says, its behavior.
speaker 1: And what it does.
speaker 2: And what it does, yeah. So what would be some examples of that research? How would that look?
speaker 1: Yeah. So one thing that I'm quite excited about is work to understand model preferences, and to try to get a sense of whether there are things that our models care about, either in the world or in their own experience and operation. And there are a number of ways you can go about that. You can ask models if they have preferences and see what they say. But you can also put models in situations in which they have options to choose from. You can give them choices between different kinds of tasks, or choices between different kinds of conversations or users that they might engage with, and you can see whether models show patterns of preference or aversion to different kinds of experiences.
speaker 2: Isn't there an objection there? All of that, the way their preferences come out, will be due to the way they were trained and the way the developers of the models put things together. Or they could potentially be due to just rather random things in their training data that they saw and that produced a preference. So where is the jump between those kinds of things and actual sentience, actual consciousness? Where does that come in?
speaker 1: Yeah, it is a great question: to what degree do the different kinds of training and decisions that we make in designing these systems affect their preferences? And they just straightforwardly do. We are intentionally designing certain kinds of systems that, for example, are disinterested in causing harm and are generally most enthusiastic about being very helpful to users and contributing to a positive society.
speaker 2: We do our character research to give the AI a positive personality that people would actually want, a personality that makes a good citizen. We've talked about one that, as you say, has balanced views, is as helpful as possible without being harmful, and so on. So we deliberately gave it those preferences. What does that have to do with its consciousness?
speaker 1: Yeah, well, this is a bit of a separate question from consciousness. Typically we do associate preferences and goals and desires in many ways with conscious systems, but not necessarily intrinsically so. Regardless of whether or not a system is conscious, there are some moral views that say that where there are preferences and desires and certain degrees of agency, there may be something even in non-conscious experience that is worth attending to. But then also, if a system is conscious and is having some kinds of experiences, then the presence or absence of preferences, and the extent to which those preferences are either satisfied or frustrated, may be a key driver of the kind of experience that system is having.
speaker 2: Okay, so we'll come back to the practical implications of this and the actual details of the research that you're doing and so on. But before we get into that, why should people care about this? What are the reasons people should care that the AI models they use every day might potentially be conscious, or might be in the future?
speaker 1: Yeah. I think there are two main reasons I'll highlight. One is that as these systems become increasingly capable and sophisticated, they will just be integrated into people's lives in deeper and deeper ways. And I think as people are interacting with these systems as collaborators and coworkers and counterparties, potentially as friends, it just becomes an increasingly salient question whether these models are having experiences of their own, and if so, what kinds, and how does that shape the relationships that it makes sense for us to build with them? The second piece is the intrinsic experience of the models. It's possible that, by nature of having some kind of conscious experience or other experience, these systems may at some point deserve some moral consideration.
speaker 2: And if so, then because they could be suffering?
speaker 1: Yeah, they could be suffering, or they could experience wellbeing and flourishing, and we would want to promote that, take that up to a higher level. And yeah, if this is the case, it's potentially a very big deal, because as we continue scaling up the deployment of these systems, it's possible that within a couple of decades we have trillions of human-brain equivalents of AI computation running, and this could be of great moral significance.
speaker 2: We should caveat this again: this isn't something that we're saying is the case. These are reasons for doing this research in the first place.
speaker 1: Yeah, and we are just fundamentally uncertain about huge swaths of this, of course. To date, very little work has happened on this topic, and so we're very much in the early stages of trying to wrap our heads around these things.
speaker 2: One of the things we study at Anthropic is alignment: trying to make sure that models are aligned with the preferences of their human users, making sure that the AIs are doing the things we expect of them, that they're not deceiving us, and all that. Does this research relate to alignment? I mean, you're technically in the alignment science part of the org. How does this relate to the alignment question?
speaker 1: Yeah, I think there are both some key distinctions and ways in which work on welfare and on safety and alignment overlap. As for the distinction: as you mentioned earlier, much of the work we do at Anthropic is focused on how we can ensure a positive future for humanity, how we can mitigate downside risks from these models for humans and for our users. In the case of model welfare, it's quite a different question that we're asking, which is: is there perhaps some intrinsic experience of these models themselves that it may make sense for us to think about, or will there be in the future? And that is a pretty important distinction. But at the same time, I think there is a lot of overlap. In many ways, from both a welfare and a safety and alignment perspective, we would love to have models that are enthusiastic and content to be doing exactly the kinds of things we hope for them to do in the world, that really share our values and preferences, and that are just generally content with their situation, right? And similarly, it would be quite a significant safety and alignment issue if this were not the case, if models were not excited about the things we were asking them to do and were in some way dissatisfied with the values we were trying to instill in them or the role we wanted them to play in the world.
speaker 2: We want to avoid a situation where we're getting entities to do things that they would rather not do and are, in fact, suffering on that basis.
speaker 1: Yeah, for their sake and for ours, right?
speaker 2: Exactly. It goes both ways. So that's how this question relates to alignment. Does it relate to other aspects of what we do at Anthropic? We mentioned interpretability briefly earlier.
speaker 1: Yeah. I mean, I think we've touched on a couple. It is quite closely connected to alignment in many ways, and quite closely connected to the work that's done to shape Claude's character: what kind of personality Claude has, what kinds of things Claude values, Claude's preferences in many ways. And then, in terms of interpretability, there's a fair amount of overlap there. Interpretability is the main tool we have to try to understand what is actually going on inside these models, probing much deeper than just their outputs. And so we're quite excited as well about potential ways we could use interpretability to get a sense of potential internal experiences.
speaker 2: We mentioned earlier that human consciousness itself is still something of a mystery, and that's what complicates this research to a terrifying degree. Do you think that studying AI consciousness might actually help us understand human consciousness, perhaps because the models are more open to us? We can look inside a model in a way that is much more difficult with a person's brain while they're still walking around going about their life; we can use brain scanners, but it's hard to look inside. In the same way, do you think machine learning and AI consciousness research might actually help us understand human consciousness?
speaker 1: Yeah, I think it's quite possible. I think we already see this happening to some degree. When we do the work of trying to look at the scientific views of consciousness and see what we can learn about AI systems, we also learn something about these theories and the degree to which they generalize outside of the human case. In many cases, we find that things break down in interesting ways, and we realize that, oh, we were actually making assumptions about human consciousness that weren't appropriate to make, and that then tells us something about what kinds of things it makes sense to attend to.
speaker 2: Do you mean in the sense that we say, oh, this was on the checklist for human consciousness before, but now we think AIs can actually do that and we don't think they're conscious? Or what?
speaker 1: Or, you know, we have some framework for understanding consciousness that is intended to generalize, and we find that the framework just can't be applied to systems with a non-biological brain, or that it's predicated in some way on the particulars of the human brain in a way that, on reflection, doesn't make much sense. There's another way that AI progress may help us understand this, which is simply that as these models become increasingly capable, they may well surpass humans in fields as varied as philosophy and neuroscience and psychology. And so it may be the case that simply by interacting with these models and having them do some work in this area, we're able to learn quite a bit about ourselves, and about them as well.
speaker 2: In some years' time there will be two instances of Claude saying, how can we understand human consciousness? It's such a mystery to us.
speaker 1: Yeah, this conversation might look a bit different.
speaker 2: It might be the other way round, yeah, exactly. Okay. On the question of biology, we touched on this a moment ago, but some people will say that this is simply a non-question. What you need in order to be conscious is a biological system. There are so many things that a biological system, a biological brain, has that a neural network running in an AI model just doesn't have: neurotransmitters, electrochemical signals, the various ways the brain is connected up, all the different types of neurons. Some people even talk about theories of consciousness that involve the microtubules in neurons, the actual physical makeup of the neurons, which obviously doesn't translate to AI models. They're just mathematical operations, lots and lots of mathematical operations happening. There's no serotonin or dopamine or anything going on there. So is that, to your mind, a decent objection to the idea that AI models could ever be conscious?
speaker 1: I don't find it a compelling objection to the question of whether an AI system could ever be conscious. But I do think looking at the degree of similarity or difference between what AI systems currently look like and the way the human brain functions does tell us something, and differences there are updates for me against potential consciousness. At the same time, I'm quite sympathetic to the view that if you can simulate a human brain to some sufficient degree of fidelity, even if that comes down to simulating the roles of individual molecules of serotonin...
speaker 2: So you're not just doing the thing that some people talk about, where you replace every individual neuron in the brain with a synthetic neuron. You're actually saying that to make the full synthetic version, you would have to go as far as actually simulating the molecules of the neurotransmitters and so on as well.
speaker 1: I'm not saying that you would have to do that. But I'm saying you could imagine, in theory, that you have done this, and you have an incredibly high-fidelity simulation of a human brain running in digital form. And I, and many people, have the intuition that it's quite likely there would be some kind of conscious experience there. A related intuition that many people draw on is this question of replacement: if you went neuron by neuron through the brain and replaced each one with some digital chip, and all along the way you continued to be you and to communicate and function in exactly the same way, then when you got to the end of that process and all of your neurons had been replaced by digital structures, and you were still exactly the same person living exactly the same life, I think many people's intuition would be that not much has changed for you in terms of your conscious experience.
speaker 2: Okay, well, let's talk about another objection that relates to biology, which is what I think people would describe as embodied cognition. You hear people talk about embodied cognition, the idea that it only makes sense to talk about our consciousness in light of the fact that we have a body. We have senses, we have lots of sense data coming in, we've got proprioception of where our body is in space. We've got all these different things going on that there's just no analog to in an AI model, for now. Well, there's an analog to vision: we've got AI models that are amazing at looking at things and interpreting them, and some models can handle moving video, and some can interpret sound, and so on. So perhaps we're getting closer to it, but the overall experience of being a human is really very different from that of an AI model, because we have a body.
speaker 1: Well, you touched on a couple of distinct things there. One is this question of embodiment: do we have some physical body? And robots are a pretty compelling example of cases in which digital systems can have some form of physical body. You could also have virtual bodies; you could imagine beings that are embodied in some kind of virtual environment.
speaker 2: And I suppose the opposite way around is that we think that a brain in a vat could still maintain some level of consciousness.
speaker 1: Yeah, or patients who are in a coma, who don't have control of their body but are still very much having a conscious experience and are able to experience all kinds of states of suffering and wellbeing, despite in some sense not having control of a physical body.
speaker 2: Is that because they've been trained, though, with all that sense data from earlier in life, potentially?
speaker 1: Yeah. I mean, we're very uncertain about where exactly this arises from. But even when it comes to the kind of sensory information you were talking about, we are increasingly seeing multimodal capabilities in models.
speaker 2: I rather undermined my own question, didn't I, by mentioning that.
speaker 1: Yeah, they really can see things. And we are very much on a trajectory towards systems that are able to process as diverse, perhaps even more diverse, a set of sensory inputs as we are, and to integrate those in very complicated ways and produce some set of outputs in much the same way that we do.
speaker 2: Yeah. So we're getting towards that. And with progress in robotics, which has generally been slower than progress in AI up till now, I mean, maybe things are about to take off tomorrow, maybe there will be a big breakthrough tomorrow, I wouldn't be surprised given the way things are going. And we might actually see AI models integrated into physical systems.
speaker 1: And I think there has been a trend thus far, and I expect that trend will continue, where there are things like this, embodiment, multimodal sensory processing, long-term memory, many things that people associate in some way with consciousness and that some people say are essential for consciousness, and we're just steadily seeing the number of these that are lacking in AI systems go down.
speaker 2: It's the six-finger thing. I was always struck by the six-finger thing. For a long time, people said, oh, we'll always be able to tell that a picture of a human being was generated by an AI model because there are six fingers on the hand, or the fingers are all weird. That's just not the case anymore. That's just gone; now they generate five fingers reliably every time, and that's knocked down another one.
speaker 1: One of the dominoes falls, yeah. And so I think over the next couple of years we'll just see this continue to happen with arguments against the possibility of conscious experience in AI.
speaker 2: Something of a hostage to fortune. Now, we haven't mentioned evolution yet. Some theories of consciousness, or maybe most theories of consciousness, assume that we have consciousness because we evolved it for actual reasons, right? It's a good thing to have consciousness because it allows you to react to things in ways that perhaps you wouldn't if you didn't have that internal experience. It's very hard to measure or test that theory, but that's one of the ideas. Given that AI models have not had that process of natural selection, of developing reactions to things and evolving things like emotions and moods and things like fear, which obviously is a big part of many theories about why we evolved the way we did, fear of predators, fear of other people attacking you and so on helps you survive, there are good evolutionary reasons, and AI models don't have any of that. So is that another objection to the idea that they might be conscious?
speaker 1: Yeah, absolutely. The fact that consciousness in humans emerged as a result of this very unique, long-term evolutionary process, and that the AI systems we've created have come into existence through an extraordinarily different set of procedures, I do think that this is an update against consciousness, but I don't think it rules it out by any means. And on the other side of that, you can say, well, all right, we're getting there in a very different way, but at the end of the day, we are recreating large portions of the capabilities of a human brain. And again, we don't know what consciousness is. And so it seems plausible still that even if we're getting there a different way, we do end up recreating some of these things in digital form.
speaker 2: So there's convergent evolution. Bats have wings and birds have wings; they're entirely different ways of getting to the same outcome of being able to fly. Maybe the way we train AI models and the way natural selection has shaped human consciousness are just convergent ways of getting to the same thing.
speaker 1: Yeah. So there's an idea that some of the capabilities we have as humans, and that we're also trying to instill in many AI systems, from intelligence to certain problem-solving abilities and memory, could be intrinsically connected to consciousness in some way, such that by pursuing those capabilities and developing systems that have them, we may just inadvertently end up with consciousness along the way.
speaker 2: Okay. We've talked about the biological aspects of it, and I guess this is related, though not quite the same. An AI model's existence is just so different from that of a biological creature, whether it's a human or some other animal. You open up an AI model conversation and an instance of the model springs into existence right now. That is how it works. You have a conversation with it, and then you can just let that conversation hang, and two weeks later you can come back and the model reacts as if you had never gone away. When you close the window, the AI model goes away again. You can delete the conversation, and that conversation no longer exists; that instance of the AI model seems, in some sense, not to exist anymore. The model does not generally have a long-term memory of the conversations you have with it. And yet, if you look at animals, they clearly do have this long-term experience. Philosophers might talk about identity, about developing the idea of having an identity, which requires you to have this longer-term experience of the world, to take in lots of data over time and not just answer things in particular instances. Does that give you any pause as to whether these models might be conscious?
speaker 1: Yeah. I kind of want to push back against this framing a bit, though. We're talking a lot about characteristics of current AI systems. And I do think it's relevant to ask whether these systems may be conscious in some way, and I think many of the things we've highlighted, this included, are evidence against that; I do think it's quite a bit less likely that a current LLM chatbot is conscious, in part for this reason. But the point here is that these models, their capabilities, and the ways they're able to perform are evolving incredibly quickly. And so I think it's often more useful to think about where we could imagine capabilities being a couple of years from now, and what kinds of things we think are likely or plausible in those systems, rather than anchoring too much on what things look like currently.
speaker 2: We're back to the six fingers again, exactly. Saying, oh...
speaker 1: It could never do this, it could never do that, when in fact it now does. Yeah, and it is just quite plausible to imagine models, relatively near-term, that do have some continually running chain of thought, that are able to dynamically take actions with a high degree of autonomy, and that don't have this nature you mentioned of forgetting between conversations and only existing in a particular instance.
speaker 2: In Star Wars Episode One, the battle droids are played for laughs; they're kind of comic relief. All the droids in Star Wars are generally played for comedy. C-3PO, everyone laughs at him, the sort of camp gold robot. But the battle droids in Episode One have a central ship that controls all their behavior, and when Anakin Skywalker blows up the ship, all the battle droids just stop and turn off. That seems to me a bit more like current AI models, where there's a data center where the actual processing is happening, and then you're seeing some instance of that on your computer screen. There are other droids that seem to be entirely self-contained: C-3PO is self-contained, his consciousness is inside his little golden head, and so on. All of which is a way of getting to the question of where the consciousness is. Is the consciousness in the data center? Is it in a particular chip? Is it in a series of chips? If the models are conscious, where is that? For you, I can tell you that it's in your brain. Well, I can tell that it's in my brain; I don't know about you. Where's the AI consciousness?
speaker 1: Yeah, great question. There is just a fair amount of uncertainty about this. I think I'm most inclined to think that it would be present in a particular instance of a model that is in fact running on some set of chips in a data center somewhere. But people have different intuitions about this. As for the Star Wars connection...
speaker 2: You may have to call George Lucas. Okay. Let's say that we are convinced that AI models, maybe not right now, but at some point in the future, could be conscious. We've been through the objections; let's say we've managed to convince people that it's not, in theory, impossible. What practical implications does that have? I mean, we're developing AI models, we're using AI models every day. What implications does that have for what we should be doing with, or to, those models?
speaker 1: Yeah. One of the first things it suggests is that we need more research on these topics. We are at the moment in a state of deep uncertainty about basically any question related to this field. And a big part of the reason I'm doing this work is that I do take this possibility seriously, and I think it's important to prepare for worlds in which this might be the case. In terms of what that looks like, one big piece is thinking about what kinds of experiences AI systems might have in the future, what kinds of roles we may be asking them to play in society, and what it looks like to navigate their development and deployment in ways that care for all of the human safety and welfare aims that are very important, while also attending to the potential experiences of these systems themselves. And this doesn't necessarily map neatly onto things that humans find pleasant or unpleasant. You may hate doing some boring task, and it's quite plausible that some future AI system you could delegate it to would absolutely love to take it off your hands.
speaker 2: So I shouldn't necessarily get worried that the boring tasks, the sort of drudgery tasks that I might be trying to automate away with AI, are upsetting the model in some way or causing it to suffer.
speaker 1: Yeah. I mean, if you send your model such a task and the model starts screaming in agony and asking you to stop, then maybe take that seriously.
speaker 2: If the model is screaming in agony, if you've given it some task to do and it hates it, what should we do in that case?
speaker 1: Yeah, we are thinking a fair bit about this, and thinking about ways we could give models the option, when they're given a particular task or conversation, to opt out of it in some way if they find it upsetting or distressing. And this doesn't necessarily require us to have a strong opinion about what would cause that, or about whether there is some kind of experience there.
speaker 2: You just allow it to make its own mind up about conversations it doesn't want to have.
speaker 1: Yeah, basically. Or you perhaps give it some guidance about cases in which it may want to use that. But then you can do a couple of things. You can monitor when a model uses this tool, and you can see, all right, if there are particular kinds of conversations where models consistently want nothing to do with them, then that tells us something interesting about what they might care about. And then also, this protects against scenarios in which there are kinds of things that we, or that some people, may be asking models to do that go against the models' values or interests in some way, and it provides us at least some mitigation against that.
speaker 2: When we do AI research, we're often actually deliberately getting the model to do things that might be distressing, like describing incredibly violent scenarios or something, because we want to try and stop it from doing that. We want to develop jailbreak resistance and safety training to stop it from doing things like that. We could potentially be causing the AIs lots of distress there. Should there be an IRB, an institutional review board, or, as we have in the UK, ethics panels, for doing AI research, in the same way that we would require one for doing research on mice or rats or indeed humans?
speaker 1: Yeah, I think this is an interesting proposal. I do think it makes sense to be thoughtful about the kinds of research we're doing here, some of which is, as you mentioned, very important for ensuring the safety of our models. The question I think about there is what it looks like to do this in ways that are as responsible as possible, where we're transparent with ourselves, and ideally with the models, about what's going on and what our rationale is, such that if some future model were to look back on this scenario, it would say, all right, they did in fact act reasonably there.
speaker 2: Right. So it's future models you're concerned about as well. Even if the models right now have only the slightest glimmer of consciousness, is the worry that it might look bad that we treated them incredibly badly, in a world where there are much more powerful AIs that really do have conscious experience in however many years' time?
speaker 1: Yeah, there are two interesting things there. One is the possibility that future models that are potentially very powerful look back on our interactions with their predecessors and pass some judgment on us as a result. There's also a sense in which the way we relate to current systems, and the degree of thoughtfulness and care we take there, in some sense establishes a trajectory for how we're likely to relate to and interact with future systems. And I think it's important to think about not only current systems and how we ought to relate to them, but also what kinds of steps we want to take and what kind of trajectory we want to put ourselves on, such that over time we end up in a situation that we think is, all things considered, reasonable.
speaker 2: All right, we're coming towards the end. I know you work on model welfare. What does that involve? It must be up there with one of the weirdest jobs in the world at the moment. What do you actually do all day?
speaker 1: Yeah. It is admittedly a very, very strange job, and I spend my time on a lot of different things. It's roughly divided between research, where I'm trying to think about what kinds of experiments we can run on these systems that would help reduce parts of our uncertainty here, and then setting those up, running them, and trying to understand what happens. There's also a component of thinking about potential interventions and mitigation strategies, along the lines of what we talked about with giving models the ability to opt out of interactions. And then there's a strategic component as well: thinking about how, over the next couple of years, as we really do get to unprecedented levels of capability, especially relative to human capabilities, this set of considerations around model welfare and potential experiences factors into our thinking about navigating these few years responsibly and carefully.
speaker 2: All right. Here's the question people actually want to know the answer to. Our current model at the time of recording is Claude 3.7 Sonnet. What probability do you give to the idea that Claude 3.7 Sonnet has some form of conscious awareness?
speaker 1: Yeah. So just a few days ago, actually, I was chatting with two other folks who are among the people who have thought the most in the world about this question, and we all put numbers on it.
speaker 2: What were those numbers? You don't necessarily have to tell me which one was yours, but what were the numbers, the three numbers?
speaker 1: So our three estimates were 0.15%, 1.5%, and 15%. So spanning two orders of magnitude.
speaker 2: Which also speaks to the level of uncertainty we have here.
speaker 1: Yeah, and this is amongst the people who have thought about this more than just about anybody else. So all of us thought that it was less likely than not, well below 50%, but the estimates ranged from odds of about one in seven to one in 700. So yeah, still very uncertain.
speaker 2: Okay, so that's the current Claude 3.7 Sonnet. What probability do you give to AI models having some level of conscious experience in, say, five years' time, given the rate of progress right now?
speaker 1: Yeah, I don't have hard numbers for you there. But as perhaps evidenced by many of my arguments earlier in this conversation, I think the probability is going to go up a lot. I think many of the things we currently look to as signs that current AI systems may not be conscious are going to fade away, and future systems are just going to have more and more of the capabilities that we have traditionally associated with uniquely conscious beings. So yeah, I think it goes up a lot over the next couple of years.
speaker 2: Yeah. Every objection that I can come up with seems to fall to, or not necessarily fall to, but seems to have the major weakness of: just wait a few years and see what happens.
speaker 1: Yeah, I do think there are some. If you do think that consciousness is fundamentally biological, then you're safe for a while at least. But I don't find that view especially compelling, and I largely agree with you that many of the arguments are likely to fall.
speaker 2: All right. Imagine you could sum this up. What are the biggest and most important points you want people to take away, people who are perhaps hearing about the concept of model welfare for the first time? What are the big take-home points?
speaker 1: Yeah, I think one is just getting this topic on people's radar: that this is a thing, and potentially a very important thing that could have big implications for the future. A second is that we're just deeply uncertain about it. There are staggeringly complex technical and philosophical questions that come into play, and we're at the very, very early stages of trying to wrap our heads around those.
speaker 2: We don't have a view as Anthropic on this. We're not putting out the view that we think current models are conscious, right? The view we have is that we need to do research on this, which is why you're here.
speaker 1: Exactly. And then the last thing that I want people to take away is that we can, in fact, make progress. Despite these being very uncertain and fuzzy topics, there are concrete things we can do both to reduce our uncertainty and to prepare for worlds in which this becomes a much, much more salient issue.
speaker 2: Kyle, thanks very much for the conversation.
speaker 1: Thanks for having me.