2025-03-04 | CS224N | Lecture 18 - NLP, Linguistics, Philosophy

NLP, Linguistics, and Philosophy, and the Future of AI

Media details

Upload date
2025-06-05 22:45
Source
https://www.youtube.com/watch?v=NxH0Y78xcF4
Processing status
Completed
Transcription status
Completed
Latest LLM Model
gemini-2.5-pro-exp-03-25

Transcript

speaker 1: Okay, hi everyone. I'll get started. The last class. Okay, yeah. Well, welcome, congratulations, and thank you for making it to the last real lecture of CS224N. So this is the plan for today. The lecture is titled NLP, Linguistics, and Philosophy, which I took as meaning that I could talk about anything I wanted to. So that is what I'm going to do. This is what we're going to go through: talk a bit about the major ideas of CS224N and open problems, some of the more foundational questions of where we are with LLMs, symbolic versus neural systems, and meaning in linguistics and NLP, and then I'll close with some slides on the future risks of AI in the world. Okay. So here is an attempt to lay out the most major things that we looked at in CS224N. We started with word vectors, and we developed the idea of neural NLP systems. We expanded from a simple feed-forward network into doing sequence models and language models – RNNs, LSTMs. Then we introduced this powerful new model that's been very influential, the transformer. And then we built from there to the – it's not exactly an architecture, but the model that's been built up in recent years to produce high-performance NLP systems, where we're first doing pre-training and then a post-training phase of various techniques that we talked about, to produce these general foundation models that understand language so well. And then we went on from there and talked about various particular topics like benchmarking and reasoning. A few of the major ideas we looked at were the idea that you could get a long way by having dense representations – those are our hidden representations in neural networks – and then distributional semantics, representing words by their context, under the slogan "you shall know a word by the company it keeps." I'll come back to that a bit later and talk about ideas of meaning. But that's essentially the idea that has driven most of the successful ideas of modern NLP, whether in the earlier statistical NLP phase or the more modern neural NLP phase. In this world, we start by instantiating these ideas as models of word vectors, but the same contextual idea is then used in all the models up through transformers. We looked at both the challenges and opportunities of training large, deep neural networks, and how gradually people developed ideas and tricks, such as residual connections, which made it much more possible and stable to do successfully – which took us from a place where a lot of this seemed like black magic that was hard to get right, to people being able to very reliably train high-performance transformer models. We talked about sequence models, what's good about them and some of their problems, and how those problems have been addressed in large measure by adopting the different architecture of transformers, which gives a form of parallelization. And then we moved into the modern form of pre-training by language modeling, where language modeling seems a simple thing – predicting words in context – but it emerges as what we think of as a universal pre-training task: all kinds of both linguistic and world knowledge help you to do this task of predicting words better. And so this has ended up as just a general method to produce the kind of powerful, knowledgeable models that we have today.
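As a minimal illustration of that distributional idea – not something from the lecture itself – here is a small sketch that builds word vectors from co-occurrence counts over a toy corpus and compares them with cosine similarity; the corpus and window size are arbitrary choices for illustration.

```python
# Minimal sketch (not from the lecture): distributional word vectors from
# co-occurrence counts, compared with cosine similarity.
from collections import Counter, defaultdict
import math

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2  # context window size (arbitrary choice)

# Count how often each word co-occurs with each context word.
cooc = defaultdict(Counter)
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if i != j:
                cooc[w][toks[j]] += 1

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dicts).
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Words that occur in similar contexts end up with similar vectors.
print(cosine(cooc["cat"], cooc["dog"]))   # relatively high
print(cosine(cooc["cat"], cooc["on"]))    # lower
```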
And up until now, there's been this amazing property that we see, this empirical fact that we seem to get extremely linear improvements in performance as we continue to scale data, compute, and model size up by orders of magnitude. That doesn't mean that all problems in NLP are solved. There are lots of things that people still work on and see opportunities to try and make better, and a few of these are mentioned on the next few slides. So there's a real question of how much these models are actually learning to do things generally, rather than just being very good at memorization. A lot of the benefit we're getting from these large pre-trained language models is that they've seen a huge amount of stuff, and therefore they know everything – they've seen every pattern before, and they know how to use things. So I've occasionally used the analogy that large language models are sort of like a talking encyclopedia: in many ways they're more like a huge knowledge store than necessarily something that is intelligent in the sense of being able to work out how to solve new problems and generalize as human beings do. A kind of interesting fact, actually, is that in some ways transformer models are worse at generalizing than the older LSTMs that preceded them. Here's just one little graph I'm not going to spend a lot of time on, but this was looking at data generated by a finite automaton and then trying to learn it from a limited amount of data with either an LSTM or a transformer. And the observation is that, at the scales they're working at, even having seen quite limited exemplification, the LSTM is basically at the ceiling of this graph – just at the top line – because it generalizes in good ways thanks to its architecture, whereas the transformer needs to see a ton more data before it actually learns the patterns well. And if we think of one of the prime attributes of human intelligence, it's that we're amazing at figuring out and learning things from very limited exposure: there's something you don't know how to do, a friend shows you once what to do to make it work, and by and large – you'll improve a bit with practice – you can learn effectively new skills from these kinds of single-shot examples. That's not always what we seem to be seeing in our models. There's also a lot of interest in what's going on inside neural networks. A lot of the time neural networks still appear as black boxes where we have no real idea of how they're doing what they're doing – perhaps for your final projects, the main thing you're doing is measuring the final performance number and seeing if it goes up or not. So there's a lot of interest in better understanding: what do they learn? How do they learn it? Why do they succeed and fail? And a lot of that work is starting to look more closely into what's happening inside neural network computations. There is some work of that sort that actually goes back quite a fair way. Here's an old blog post by Andrej Karpathy from when he was a grad student here, in 2016. He was looking at LSTMs and how they learn, and he found that one of the neurons in an LSTM cell was effectively measuring position along a line of text: as the line of text got long, its value started to change, because the model was learning that there was a typical line length in this text and that the line was likely to be ending at that point.
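To make that kind of inspection concrete, here is a hedged sketch in PyTorch of reading off one hidden unit's activation at each character position; an untrained character LSTM stands in for the trained language model Karpathy actually analyzed, and the unit index is arbitrary.

```python
# Sketch only: probing per-character hidden activations of an LSTM,
# in the spirit of Karpathy's visualizations. The model here is untrained;
# in the real analysis the LSTM was trained as a character language model.
import torch
import torch.nn as nn

text = "The quick brown fox\njumps over the lazy dog\n"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

emb = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[stoi[ch] for ch in text]])      # shape (1, T)
with torch.no_grad():
    hidden, _ = lstm(emb(ids))                       # shape (1, T, 32)

unit = 7  # index of the hidden unit to inspect (arbitrary here)
for ch, act in zip(text, hidden[0, :, unit].tolist()):
    # In a trained model, some unit's activation may drift steadily within a
    # line and reset at '\n' -- evidence it is tracking line position.
    print(repr(ch), round(act, 3))
```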
In recent times, with transformers as well, there's started to be a lot of work on mechanistic interpretability and causal abstraction, trying to understand the internals of models. A problem that's far from solved – and in many respects probably unsolvable – is the multilingual question of dealing with all the other languages of the world. You do have to keep in your head that whatever you see for English, it's worse for every other language in terms of what they're getting out of modern language models. Now, there is a good news story here; I don't want to claim that everything is terrible. In this graph, which is kind of small, the blue line was the performance of GPT-3.5 on English, and all of the green bars are the performance of GPT-4. And there's a genuine good news story here: not just for English, but for a lot of other languages – Greek, Latvian, Arabic, Turkish – all of them in GPT-4 are better than English was in GPT-3.5. So that's the good news argument: building these models big is in some sense raising all boats. But these are still all huge languages. Things do start to drop off at the bottom of this table, for languages where the performance is worse than English in GPT-3.5. Even those are languages for which much less written data is available but which are still large languages. The three at the bottom are actually all Indian languages – Punjabi, Marathi, and Telugu – each spoken by millions of people. They're not small languages. So the real question is, what happens when you actually get to the small, low-resource languages? The vast majority of languages around the world don't have millions of speakers; they vary from having hundreds of speakers to hundreds of thousands of speakers, and there are thousands of such languages. A lot of those languages are primarily oral and have very limited amounts of written text. Many of those languages are likely to go extinct in the coming decades, but many of those language communities would like to preserve their languages, and it's very unclear how the kind of language technologies we've been talking about in the later parts of the course can be extended to them, because there just isn't sufficient data to build the kind of models that we've been looking at. I imagine you've gotten some idea in this course of how evaluation is a huge part of what we do – effectively, a lot of the way progress is driven is by defining evaluations of what models should be able to achieve, and then people working to measure and improve systems so that they do better on what we see as good language understanding or other properties. One of the concerns many people have about the large recent closed models from large companies is that all of the benchmarks are being sullied and are not to be trusted. Here's one example that comes from a tweet by Horace He, who notes: "I suspect GPT-4's performance is influenced by data contamination, at least on Codeforces," one of the coding benchmarks. Of the easiest problems on Codeforces, it solved ten out of ten pre-2021 problems, but zero out of ten recent problems. This strongly points to contamination.
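As a rough illustration of the kind of check this concern calls for, here is a small sketch – my own, not from the lecture or from He's analysis – that flags benchmark items whose n-grams overlap heavily with a pretraining corpus; the corpora, n-gram length, and threshold are placeholders.

```python
# Sketch: a crude n-gram overlap check for benchmark contamination.
# A high overlap ratio suggests a test item may appear in pretraining data.
def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_ratio(test_item, pretraining_docs, n=8):
    test_ngrams = ngrams(test_item, n)
    if not test_ngrams:
        return 0.0
    corpus_ngrams = set()
    for doc in pretraining_docs:          # in practice: a streamed, indexed corpus
        corpus_ngrams |= ngrams(doc, n)
    return len(test_ngrams & corpus_ngrams) / len(test_ngrams)

# Hypothetical usage: flag benchmark problems that look memorized.
pretraining_docs = ["..."]                # placeholder for a real corpus
for problem in ["...", "..."]:            # placeholder benchmark items
    if overlap_ratio(problem, pretraining_docs) > 0.5:
        print("possible contamination:", problem[:40])
```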
And the worry is that every time you see these fantastic results about how well the latest, best language model is performing, at this point so much data is on the web and gets included in the pre-training data of these large language models that essentially they're memorizing at least a good share of the questions appearing in these challenges. So they're not actually solving them in a fair way as an independent test set at all; they're just memorizing them. There are issues then as to what kind of thoroughly hidden test sets we can have, or dynamic evaluation mechanisms, so that we can actually have benchmark integrity. Another huge area that a number of us are involved in, at Stanford and elsewhere, is making NLP work in different technical domains. Domains including biomedical or clinical, medical NLP have a lot of differences of vocabulary and usage. They have a lot of potential good uses, but they also have a lot of potential risks of doing harm if the language understanding is incomplete. I myself have been more involved in legal NLP, working with other people at the RegLab with Dan Ho on building foundation models for law. There are all kinds of ways, again, in which this kind of technology could be really useful. The biggest problem in most countries – it's bad in the United States, but it's way worse in a place like India – is that most people can't get access to the kind of legal help they need to solve their problems, because of the cost and the lack of trained lawyers. So if more could be done to help people via NLP tools, in principle that would be great, but in practice the tools still don't have good enough language understanding. In the RegLab, there's a just-completed study out at the moment looking at legal NLP systems, and we found that the hallucination rate – the rate at which there was made-up stuff in their legal answers – was effectively one question in six, which is not a very good accuracy rate if you're someone wanting to rely on these systems for legal advice. There are also lots of things to work out in dealing with the social and cultural aspects of NLP. NLP systems remain very biased against various cultures and religions. They have certain social norms, you could say, that they pick up from somewhere, but those social norms are very biased against certain groups. And related to the small languages I mentioned before, there are lots of issues with underrepresented groups not having the kind of NLP they'd like to have. Okay, so that's the summary of that bit. For the next bit, I thought I'd give one more bit of perspective on where we are with the best language models like GPT-4. I think it's really interesting at this moment, because on the one hand the performance of these models is just amazing. And even as someone who works in NLP and has worked in it for many, many years now – I can tell a sort of story that we do this training to predict the next word, conditioning on a lot of text, and it knows about things – but in some sense these things still seem like magic, right?
It's just kind of hard to believe how this could possibly work. So in this example, I asked ChatGPT-4o – I did this this morning – to write a sonnet explaining the transformer neural net architecture, in which every line begins with the letter T. And it still, frankly, blows my mind. I don't feel I can really explain, even to myself, in a way that's convincing, how this large transformer is able to take all its pre-training text, read that instruction, and, as a next-token prediction machine, successfully produce something that is a sonnet in which every line begins with the letter T. I hope you remember from your high school English classes that sonnets are meant to have ABAB rhyming. It's a little bit imperfect here – "nets"/"set," some people might complain about that – but there's basically a rhyme: behold/unfold, grasp/clasp. So it's basically doing the rhyming right. And the lines are meant to be in iambic pentameter, and it's basically produced lines in iambic pentameter – "To delve into the world of neural nets." Yeah, it's incredible. So here's the sonnet it came up with: "To delve into the world of neural nets, / Transformers rise, a marvel to behold, / Through layers deep, the network's path is set, / To learn from data, patterns to unfold. / The tokens feed into attention's grasp, / To weigh their import, context to align, / The queries, keys, and values in a clasp, / To process sequences with power fine. / Through heads of many, multifaceted sight, / To parallelize, capturing the span, / Transforming each, a matrix math delight, / To synthesize with elegance and plan. / The model's might in every task it shows, / To turn raw data into wisdom's prose." Now, you could object that I'm not sure this exactly explains the transformer neural net architecture – it's a little bit abstract, I'll give it that. But in another sense, it did in one place or another evoke quite a bit of stuff about transformers, with queries, keys, and values, the multi-headed stuff, parallelizing with matrix math, whatever else. It still kind of blows my mind how well that works. And indeed, as natural language understanding and world understanding devices, these models have clearly crossed the threshold at which they're very usable in many contexts. There have now started to be some fairly good studies on how much value people can get out of using LLMs like GPT-4. This study by Dell'Acqua and a whole lot of colleagues, including Ethan Mollick, took a bunch of consultants from the Boston Consulting Group – so that means 23-year-olds graduating from universities like this one, but more on the East Coast, who have become Boston consultants; not exactly dummies. They ran a controlled task with three groups. The big contrast is that two of the groups were using GPT-4 to do consulting tasks and one of the groups wasn't. The difference between the two that were was that one of them was given more training on how to use GPT-4, but that didn't seem to make much of a difference. And their result was that the groups using GPT-4 in their study completed 12% more tasks on average.
They did the tasks 25% more quickly, and the results were judged 40% higher quality than those not using AI – which I think is a pretty stunning success of how GPT-4 or similar LLMs are good enough to actually help people get real work done, with whatever asterisk you want to put on the quality of management consulting work in various instances. And an interesting result is that using these LLMs seems to be a big leveler – you see exactly the same thing for people using coding LLMs – they're a huge assistance for people whose own skills are weaker and much less of an assistance for people whose own skills are strong. Okay, so that's a good news story. If, on the other hand, you'd prefer a good news story for human beings, here's a study that goes in the other direction: can GPT-4 write fiction that matches the quality of New Yorker fiction writers? The result of that study was not even close – GPT-4 was measured as three to ten times worse at creative writing than a New Yorker fiction writer. So there's still hope for human beings; hang in there. So I think that's the split-screen picture we have at the moment: in some ways these things are great and useful, in other ways they're not so great, and I think that's something we're still going to see playing out in the coming years. Living in Silicon Valley, we see a lot of the positive hype, so if you want to see a little bit of the negative on the other side: late last year there was a piece in the Financial Times titled "Generative AI: hyper-intelligent?" I won't read all of it, but basically they wanted to express considerable skepticism about the current AI boom. Investors should keep their heads; expectations for generative AI are running way ahead of the limitations that apply to it. As investment in generative AI grows, so does pressure to create new use cases. By 2027, IDC thinks enterprise spending on generative AI will reach $143 billion, up from $16 billion this year – so ten times up. OpenAI hopes for more funding to pursue human-like AI. It is worth remembering, when examining OpenAI's plan for superintelligence, that models predict – they do not comprehend. That limitation casts doubt on AI achieving even human-like intelligence. And then they start talking about some of the problems – limited gains for low-skilled workers, inaccuracies in the work produced – and suggest that the limitations will become more obvious as generative AI tools roll out, which will put pressure on providers to address costs. AI could add $4 trillion in profits, says McKinsey, but pricing clarity is lacking; without it, companies cannot predict what financial gains AI can deliver, and AI cannot predict that either. Okay, that's that topic. I'm chugging through my topics. The next topic: I wanted to return and say a bit more about the symbolic methods that dominated AI from the sixties until about 2010, versus what I've termed here cybernetics – because the original alternative, going back to the fifties and sixties, was called cybernetics. In a very real sense, neural networks are a continuation of the cybernetics tradition rather than of the AI tradition that started in the fifties and sixties. In this context, Stanford is the home of the Symbolic Systems program – at the moment, we are unique in having a Symbolic Systems program.
The name Symbolic Systems came about because, at the time it was started, philosophy was an active part of the program. John Barwise, shown in this picture – he died young, in 2000 – had a very strong belief that you were meant to be dealing with meaning in the world and the connection between people's thinking and the world. So he refused to allow the program to be called cognitive science, as it's called at most other places, and it ended up being called Symbolic Systems. Now, at one point there were two universities that had symbolic systems programs, because John Barwise actually moved away from Stanford and went to Indiana, which is where he originally was from, so Indiana also had a symbolic systems program for a number of years. But they've changed theirs to cognitive science since he died, so we are unique in having Symbolic Systems. The idea of symbolic systems – this is roughly what's on the website, with a bit of interpretation – is that Symbolic Systems studies systems of meaningful symbols that represent the world about us, like human languages, logics, and programming languages, and the systems that work with these symbols, like brains, computers, and complex social systems. Contrast that with the typical view of cognitive science, which focuses on the mind and intelligence as a naturally occurring phenomenon: symbolic systems gives equal focus to human-constructed systems that use symbols to communicate and to represent information. In AI terms, AI as a field – and the name AI – arose around arguing for a symbolic approach. John McCarthy, in the color photo there, founded Stanford's artificial intelligence effort, the original famous Stanford AI Lab. McCarthy came up with the name "artificial intelligence," and he very explicitly chose a new name to dissociate what he was doing from the cybernetics approach, which had been pursued by people including Norbert Wiener at MIT, shown on the right side. Marvin Minsky – the tiny photo down here – sort of founded artificial intelligence at MIT; McCarthy worked with him for a few years, and then McCarthy came to Stanford. Two of the other most prominent early AI people were Newell and Simon, who were at CMU – the other two people on the right side. Let me first say a sentence about McCarthy: his own background was as a mathematician and logician, so he wanted to construct an artificial intelligence that looked like math and logic, effectively. That was AI as a symbolic system, and it was developed as a position in the philosophy of artificial intelligence by Newell and Simon. They developed what they called the physical symbol system hypothesis, which says: a physical symbol system has the necessary and sufficient means for general intelligent action. That's a super strong claim. It's not only claiming that having a symbol system allows you to produce artificial general intelligence, but, through the "necessary" clause, that you can't have artificial general intelligence without having a symbol system. So that was the basis of classical AI. And that contrasts a bit with cybernetics, which had its origins in control and communication.
So it's much nearer to an electrical engineering kind of background, and it wanted to unify ideas of control and communication between animals – perhaps more than humans – and machines. Cybernetics comes from the Greek word kubernetes, which is interesting for all the uses it has: it's exactly the same root that occurs in Kubernetes, if you're familiar with that from distributed containers and modern systems, and it's also the same root that the word "government" comes from – a control system as well, of course. It was under the cybernetics tradition that neural nets first started being explored. The most famous of the very earliest neural nets was Frank Rosenblatt's perceptron, which was used for vision – the neural net was actually hard-wired. To say just a teeny bit about this: in case you think AI hype is only a thing of the 2020s, there was just as much AI hype in the 1950s when Rosenblatt unveiled his perceptron. From the New York Times article about it: "New Navy device learns by doing; psychologist shows embryo of computer designed to read and grow wiser. The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." And this hype is all the more incredible when you get to a later paragraph of the article and find out what the demonstration was actually of: the demonstration people were shown was that this device learned to differentiate between right-arrow and left-arrow pictures after 50 exposures. But there you go. Okay. So what do we make of this in the case of NLP and language? The position I would like to suggest is that there's just no doubt that language is a symbolic system – humans developed language as a symbolic system. It's perhaps most obvious if you think about writing, where we have symbols for the letters and words that we use. But even where there's no writing – and the majority of human language use over time has been spoken – even though the substrate it's carried on, whether sound waves or, in sign languages, movements of the hands, is a continuous substrate, the structure of human languages is a symbol system. We have symbols, which are the sounds of human languages: for "cat," we have a /k/, an /æ/, and a /t/ – those are symbols, and they're recognized in a symbolic way by language users. Indeed, the pioneering work on categorical perception in cognitive psychology was done with the sounds of human languages – the phonemes, as linguists call them. So spoken language also has a symbolic structure. But, going against Newell and Simon, the fact that humans use a symbol system for communication doesn't mean that the processor of the symbols – the human brain – has to be a physical symbol system. And similarly, we don't have to design NLP, our computer processes, as physical symbol systems either. The brain is clearly much more like a neural network model, and probably neural models will scale better and capture language processing better than a symbolic processor would. That leaves behind the question of, well, why did humans come up with a symbol system for communication? After all, we could have just hummed at different frequencies and used that as our system of communication.
I think the dominant idea – which seems reasonable to me, but who knows – is that having a symbolic system gives signaling reliability: if you have discrete target points that are separated, then when there's degradation of the signal you have the ability to recover it. So where does that leave linguistics, which has mainly been developed in terms of describing a symbolic system? I think the right way to think about it is that linguistics is good for giving us questions, concepts, and distinctions when thinking about language acquisition, processing, and understanding. And indeed, one of the interesting things that's come about is that, as NLP and AI have developed further and become able to do a lot of the low-level stuff, the higher-level concepts that linguists talk about a lot – things like compositionality and systematic generalization, which I'll come back to in a few minutes, the mapping of stable meanings onto symbols, the reference of linguistic expressions in the world – get talked about more and more in artificial intelligence contexts, in building neural systems. One way to think about it is that a lot of the early neural network work was most notably about visual processing, and also other kinds of sensory stuff like sounds; doing that gets you to something like insect-level intelligence. And if you want to get higher up the chain than insect-level intelligence, then a lot of the questions about and properties of linguistic systems become increasingly relevant. At a slightly more prosaic level, I don't think one necessarily wants to believe all the fine details of different linguistic theories, but for how human languages are structured and how they behave, I think most of our broad understanding from linguistics is right. And so, when we're thinking about NLP systems and wanting to understand how they behave, to know whether they have certain properties, and to think up ways to evaluate them, a lot of that is done in terms of linguistic understanding: wanting to see whether they capture facts about sentence structure, discourse structure, semantic properties like natural language inference, whether they can do things like bridging anaphora (which I did not cover in this year's class, because we skipped the coreference lecture when we sliced one lecture off the class), metaphors, presuppositions – all of these are linguistic notions that we try to get our NLP models to capture. I want to make a couple more remarks about the role of human language in human intelligence, which I think is kind of interesting. An interesting person in the history of linguistics is this guy, Wilhelm von Humboldt, a prominent German academic. Really, the American education system was borrowed from Germany: up until the Second World War, the preeminent place of science and learning was Germany, and Germany, essentially via von Humboldt's work, developed the idea of graduate education, which the US copied and started doing its own version of. In that context, it was still the case that prior to the 1930s, people in the United States would generally go to Germany to finish their education – to get their PhD or to do a postdoc or something like that.
So if you trace back my own academic tree, or most other academic trees of people who got PhDs in the US, they go back a few generations and then they go back to Germany. We don't think of that as much in the modern world. So Humboldt was influential in developing the university system, but he also worked a lot on language. He's someone Chomsky always cites, because he's known for the famous statement that human language must make infinite use of finite means: we have a limited supply of words and sentence structures, but out of those we can recursively build up an infinite number of sentences. That, in Chomsky's view, supports the kind of symbolic, structured view of language that he's been advocating. But I think there's another interesting take on Humboldt, which we can argue about whether it's right or not. One of the things he wanted to stress is that language isn't just something used for the purpose of communication. I should actually introduce something here: Kahneman and Tversky, two well-known cognitive psychologists, introduced the idea that there are two kinds of thinking, system 1 cognition and system 2 cognition. System 1 is the kind of subconscious thinking you're not really aware of – we just process stuff as it comes into our heads, whether visual signals or speech. System 2 thinking is the conscious "let me think about this and try to figure out what's going on," the solving-a-math-problem style of thinking. And I think you can see in von Humboldt's writings essentially the same distinction between system 1 and system 2 cognition, although he refers to system 1 cognition in terms of the spirit and system 2 cognition as thinking. He basically argues for a version of the philosophical position of the language of thought, suggesting that effective system 2 thinking requires extension of the mind through the symbols of language. And so he argued that having language is absolutely a necessary foundation for the progress of the human mind. I think that's actually an interesting perspective, which I have some sympathy with. Obviously we can think without language – we can feel afraid, we can think visually about how things fit together – but I think it's fairly plausible that for the more abstract, larger-scale thinking that humans engage in, and that has led them to higher levels of thought than a chimpanzee gets to, language gives us scaffolding inside the mind that makes that possible. Another version of that comes from the philosopher Daniel Dennett, who actually died just a couple of months ago. Dennett wrote a book called From Bacteria to Bach and Back. The main thing that book was about was the origin of human consciousness, and I'm not going to talk about human consciousness today, but in it he introduced a model of four grades of progressively more competent intelligences. The bottom one was Darwinian: a Darwinian intelligence is something that is predesigned and fixed; it doesn't improve during its lifetime, and improvement only happens by evolution through genetic selection – so things like bacteria and viruses are Darwinian intelligences. After that come Skinnerian intelligences, which improve their behavior by learning to respond to reinforcement.
So something like a lizard, or perhaps a dog – we could argue about how intelligent dogs are – has Skinnerian intelligence. The third level up, Popperian intelligence, covers things that learn models of the environment, so they can improve performance by thinking through plans, executing them, and seeing how they work out. In a computational sense, Popperian intelligence roughly means you can do model-based reinforcement learning. Primates like chimpanzees can definitely do the kind of planning and model-based reinforcement learning that gives you a Popperian intelligence. But a lot of recent evidence shows that much simpler creatures can do it too. I'm not sure of all the facts here – all these studies you see are about crows from the South Pacific, Australia and Fiji and places like that, so I'm not sure if northern hemisphere crows are dumber – but at least southern hemisphere crows can learn plans: they can do multi-stage planning to work out ways to get a piece of meat that's down a hole, by learning to pick up a stick and poke it in. So even crows can be Popperian intelligences. But what Dennett suggests is that there's a stage beyond Popperian intelligence, which he calls Gregorian intelligence. The idea of Gregorian intelligence is that you can build thinking tools which allow you a higher level of control over mental search. He suggests that things like mathematics are thinking tools – and, well, democracy is a thinking tool too – but that, out of the space of thinking tools, human language is the preeminent thinking tool we have. And he suggests that the only biological example we have of a Gregorian intelligence is human beings. So in that sense, you can say there's a very important role for language. Two parts to go in my summary. Okay. The next one is: what kind of semantics should we use for language? This is getting back to the question I mentioned for word vectors, and it's kind of interesting. The semantics that's been dominant in philosophy of language and in linguistic semantics is a notion of model-theoretic semantics, where the meaning of words is their denotation, what they represent in the world. I mentioned this, I think, in an early lecture: if you have a word like "computer," the meaning of "computer" is the set of computers – this one, that one, and all the other computers that are out there. So it's a denotational relationship between a word and its denotation in the world, or in a model of the world. That was the notion used in most of the history of AI for doing symbolic AI. And it contrasts with distributional semantics, where the meaning of a word is understood from the contexts in which it's used – which is effectively what we're using for our neural models. If you look at the traditional view of interpreting the meaning of human language – and this is what you'll have seen if you did an intro logic class at some point – we have a sentence, "the red apple is on the table," and you write it in some logical representation, first-order predicate calculus or whatever. This one's a bit different in allowing a definite-description operator, where normally in first-order predicate calculus you only have "for all" and "there exists," but you have some formal logic.
And in the early weeks – weeks 1 and 2 – of the logic class, you have some English sentences that you translate into formal logic, and then after that you forget about human languages and just start proving things about formal logical systems. To some extent, what you get in a philosophy class represents the tradition of Alfred Tarski. Tarski believed that you couldn't talk about meaning in terms of human languages, because human languages were, quote-unquote, impossibly incoherent. And from about the 1940s until 1980, Tarski was the preeminent logician in the US – he was at Berkeley – and that was very much the view of the logicians of the world. But during that period, one of his students was this guy, Richard Montague. Montague rebelled against that picture, saying, "I reject the contention that an important theoretical difference exists between formal and natural languages." He then set about showing that you could start building up a formal semantics for describing the meaning of natural language sentences. Richard Montague's work became the foundation of the work that's used in semantics in linguistics as well – for anyone who's done Ling 130 or 230, the picture you saw is essentially a Montague picture of semantics. And that was the semantics that was taken over and essentially used as the model for doing natural language understanding for most of the history of NLP, roughly 1960 to 2015 or so. The picture, essentially, was that if we wanted to interpret a sentence like "the red apple is on the table," we would first produce a syntactic structure for the sentence – we would parse it – and then, using ideas roughly along the lines Montague suggested, we would construct its meaning by looking up the meanings of words in a lexicon and then using the compositionality of human languages to work out the meanings of progressively larger phrases and clauses in terms of the meanings of those words and the way they are combined – slightly reminiscent of my discussion of tree-structured meanings in the last lecture I gave. So you would build up a meaning representation of a sentence, which could then give you a semantic meaning that you could use in a system. This is approximately a slide – retitled – that I actually used in CS224N in the 2000s decade. So we have a sentence – it got a bit hidden on the slide, but here it is: "How many red cars in Palo Alto does Kathy like?" We parse it, we look up the meanings of words in a lexicon, we start composing them up, we get a semantic form for the whole sentence, which we can then convert into SQL, and we can run that against a database and get the answer. This was, in outline, the kind of technology that was widely used for natural language understanding systems built anywhere from the 1960s to 2010.
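To give a concrete feel for this pipeline, here is a deliberately tiny sketch – mine, not the system on the slide – that maps one fixed question pattern onto a crude logical-form-like structure and then onto SQL. Real semantic parsers of that era used a full grammar, lexicon, and composition rules; the table and column names below are hypothetical.

```python
# Toy sketch of the classic pipeline: question -> (mini) semantic form -> SQL.
# One hand-written pattern stands in for a real parser, lexicon, and
# composition rules; table/column names are invented for illustration.
import re

def parse_question(q):
    # e.g. "How many red cars in Palo Alto does Kathy like?"
    m = re.match(r"how many (\w+) (\w+) in ([\w ]+) does (\w+) like\??", q.lower())
    if not m:
        raise ValueError("outside the toy grammar")
    color, noun, city, person = m.groups()
    # A crude stand-in for a composed logical form:
    # count(x). car(x) & red(x) & in(x, 'palo alto') & likes('kathy', x)
    return {"table": noun, "color": color, "city": city, "liker": person}

def to_sql(sem):
    return (
        f"SELECT COUNT(*) FROM {sem['table']} c "
        f"JOIN likes l ON l.item_id = c.id "
        f"WHERE c.color = '{sem['color']}' "
        f"AND c.city = '{sem['city']}' "
        f"AND l.person = '{sem['liker']}'"
    )

print(to_sql(parse_question("How many red cars in Palo Alto does Kathy like?")))
```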
In particular, these systems were used not only in a purely rule-based grammar-and-lexicon way; the same basic technology was incorporated into machine learning contexts where the goal was to learn various of these parts. You could not only learn the parser, but also learn semantic meanings of words and learn composition rules. The acme of that work was what was called semantic parsing, pioneered by Luke Zettlemoyer and Mike Collins in the 2000s decade and then taken up by others, including Percy Liang – Percy Liang's PhD thesis, and also his early work at Stanford before he was convinced to do neural networks, was semantic parsing work. These systems could actually work and were used in limited domains, but they were always extremely brittle. The interesting thing is, what about humans? There is some evidence that humans do something like this – that they work out the structure of sentences and compute meanings in a bottom-up, mostly projective way. There's a lot of controversy as to exactly how human understanding of sentences works, but many people have argued in support of human brains doing something similar. That's obviously not what we're getting with current-day transformers. So the question is: do our current-day neural language models provide suitable meaning functions? That's a complex question, because in many ways they do an amazing job at understanding whatever sentences you put into them, but there are still some genuine concerns as to whether they are taking shortcuts – working to a certain extent without actually having the same kind of compositional understanding, with systematic generalization, that human beings have. Okay, so that's the traditional denotational semantics view. It contrasts with the kind of use theory of meaning. In the first or second lecture, and at the beginning of this one, I attributed that to the British linguist J. R. Firth: "You shall know a word by the company it keeps." But it's not only a position of Firth's; it's also been a minority position among philosophers. In particular, it was advanced by Wittgenstein in his later work, the Philosophical Investigations. In that work he writes: "When I talk about language (word, sentences, etc.), I must speak the language of every day. Is this language somehow too coarse and material for what we want to say? Then how is another one to be constructed? And how strange that we should be able to do anything at all with the one we have!" The Philosophical Investigations is written in this vaguely poetical, literary style, but the point is meant to be: look, these logicians are claiming you can't use natural human languages to express meaning, and that you have to translate into this symbol system – but isn't that a weird concept, that one symbol system is no good, but this other symbol system somehow fixes things? And then, about denotational semantics, he writes: "You say: the point isn't the word, but its meaning, and you think of the meaning as a thing of the same kind as the word, though also different from the word. Here the word, there the meaning. The money, and the cow that you can buy with it. (But contrast: money, and its use.)" And he goes on from there to argue that the meaning of money is the way that money can be used in the world; the meaning of money isn't pointing at pieces of money. Okay, so this is what's referred to as a use theory of meaning. And the question is: is that a good theory of meaning?
Some people just don't accept these kinds of distributional-semantic, use theories of meaning as a theory of meaning or semantics. Most prominently in recent NLP work, that's the position of Bender and Koller, who take it as axiomatic that the only thing that counts as having a meaning is that you've got form over here and meaning over there. But I think that's too narrow. I think we have to argue that the meaning of words arises from connecting words to other things. And although in some sense you could say that connecting words to things in the real world is privileged, it's not the only way you can ground meanings. You can have meanings in a virtual world, but you can also have meanings by connecting one word to other things in human language. The other thing I think you need to say is that meaning isn't a zero-one thing, where you either have the denotation of a word or you don't. Meaning is a gradient thing, and you can understand the meanings of words and phrases either more or less. So this is an example I gave in a piece I wrote a couple of years ago. Okay, what is the meaning of the word "shehnai"? Maybe a few of you know it, but if you don't, what could I do? If you'd seen or held one, you'd have classic grounded meaning – you'd know something about the denotation. If that's not the case, well, I could at least show you a picture of one – here's a picture of one – so that gives you some information, a partial meaning, of what a shehnai is. But is that the only thing I can do? Surely you'd have a richer meaning if you'd heard one being played. And is showing you a picture the only alternative? Suppose you've never seen, felt, or heard one, but I told you it's a traditional Indian instrument, a bit like an oboe. I think you understand something about the meaning of the word at that point: it's connected to India, it's a wind instrument using reeds that's used for playing music. I could tell you some other things about it – it has holes, sort of like a recorder, but it has multiple reeds and a flared end, more like an oboe. Then maybe you know a bit more about a shehnai, even though you've never seen one. And if you then extend to what we do more in our corpus-based linguistic learning, you could imagine that instead of trying to define one for you, I've just shown you a textual use example, or several of them. So here's one textual use example: "Shehnai players sat ... at the entrance to the house playing their pipes. Bikash Babu disliked the shehnai's wail, but was determined to fulfill every conventional expectation the groom's family might have." If that's all you know about a shehnai, in some ways you understand less of the meaning of the word than if you'd seen one, but in other ways you understand more of it than if you'd just seen one. Because from that one textual example you know some things: you have heard a characterization of the sound as wailing, and you know that it's connected with weddings – which you don't get from just having held or looked at one, or even from having had someone stand in front of you and play it. And that's an important part of the meaning of a shehnai to people.
And so that's a sense in which I think meaning comes from various kinds of connections. Okay, last topic: our AI future. There are different senses of our AI future and lots of things we can worry about. One thing we can worry about is whether we're all going to lose our jobs. Interesting question. Here's a newspaper article from the New York Times: "March of the machine makes idle hands. Prevalence of unemployment with greatly increased industrial output points to the influence of labor-saving devices as an underlying cause." This was published in the New York Times in 1928. But it turns out that quite a few people like labor-saving machines – washing machines and dishwashers and sewing machines, lots of useful labor-saving machines. And, well, this was published in 1928, at a time when a small group of immensely powerful and rich men dominated the United States, just before the Great Depression. But what happened in the decades after that – greatly changed policies in the United States – led to boom years that distributed wealth and work much more evenly across the country, and the country boomed. Here's another one: "In the past, new industries hired far more people than those they put out of business. But this is not true of many of today's new industries. Today's new industries have comparatively few jobs for the unskilled or semiskilled, just the class of workers whose jobs are being eliminated by automation." That was Time magazine in 1961. So this is a long-standing fear, which at least so far has not been realized. Here we are in a country in which not everyone might have the work they wish they had, but overall almost everybody has a job, and many people are working a lot of hours a week – whereas once upon a time the claim was that before the end of the twentieth century we'd only have to do a three-day work week because there wouldn't be much work to go around. Imagine. So another fear is: will almost all the money go to five to ten enormous technology giants? I actually think this is a more serious worry. This seems to be the direction we're headed in at the moment, and I think there's no doubt that modern network effects and the concentration of AI talent tend to encourage this outcome. But essentially, this is the modern analog of what happened in the early decades of the twentieth century. The equivalent then was transportation networks, and it was domination of the new transportation networks, like railways, that led to a few people dominating the economic system. What happens here essentially comes down to a political and social question. As I was mentioning before, after the Great Depression the country successfully dealt with the monopolistic power of a small number of companies, and with political leadership we could do that again. The problem is that there's not much sign of political leadership right at the moment – but that's a political problem to solve, rather than actually a technological one. So the next question is: should we be afraid of an imminent singularity, when machines have artificial general intelligence beyond the human level? In particular, would such an event threaten human survival? This is a concern that has increasingly exploded into the mainstream with discussions of AI existential risk.
Quite a few of the discussions that have led to the setting up of things like AI safety institutes in the US and UK are motivated by these worries of out-of-control artificial intelligence taking over and deciding to eliminate humanity. So we get article headlines like "Pausing AI developments isn't enough. We need to shut it all down," "How rogue AIs may arise," "AI godfather Geoffrey Hinton warns of dangers as he quits Google," "We must slow down the race to godlike AI." I don't personally give these concerns too much credence, and I think there's started to be increasing pushback against them. In the other direction, François Chollet, who is the architect of Keras, argues that there does not exist any AI model or technique that could represent an extinction risk for humanity – not even if you extrapolate capabilities far into the future via scaling laws – and that most such arguments boil down to "this is a new type of technology; it could happen." Joelle Pineau, who's a Meta AI leader, refers to existential-risk discourse as unhinged, and points out the flaw in a lot of the utilitarian argumentation that goes along with discussions of these risks: if you say the elimination of humanity is infinitely bad, then any non-zero chance multiplied by infinity will be bigger than the badness of anything else that could happen in the world – but that isn't actually a sensible way to have a rational discussion about outcomes. And many people, including Timnit Gebru, have argued that a lot of the outcome of this focus on existential risk – and, if you're more cynical, a lot of its purpose – is to distract away from the immediate harms arising from companies deploying automated systems, including their biases, worker exploitation, copyright violation, disinformation, growing concentration of power, and regulatory capture by leading AI companies. That is worth thinking about: behind all the discussions of our amazing AIs and all the things we can do with them, like getting our homework done or generating wonderful images, there are lots of things underneath – disinformation, deception, hallucinations, problems of homogenization in decision making, violation of copyright and people's creativity, lots of carbon emissions, erosion of rich human practices. So we need to be conscious of the present-day harms that can come about from AI, and from NLP as well. There are various kinds of harms we've touched on, which include generating offensive content, generating untruthful content, and enabling disinformation. The disinformation one is interesting: if models can reason well about texts, can they also be persuasive in communicating incorrect information or opinions to users? Perhaps there are new possibilities for very personalized misinformation propagation that persuades human beings more easily than traditional methods of political advertising. It's still being debated in the literature, but there are now multiple studies suggesting that humans can be influenced by disinformation generated by AIs, and it seems reasonable to think we're going to start seeing more use of that in political systems and elsewhere, which is potentially quite scary. Perhaps the worst of it isn't going to be text-based.
It's likely that visual fakes are going to be even more compelling in political contexts, and it seems like – whether it happens in the US for this election or in other countries for theirs – we're likely to see some major incidents where AI-generated fakes can be seen as having major impacts on political systems. So I think what we should really be doing is worrying not about existential risks, but about what people and organizations with power will use AI to do. This is a pattern we've seen multiple times, including with social media: in the early days of social media, the idea was that it would lead to new freedoms for people across the globe, bringing the positives of free political thought and improving human lives in large measure. That isn't what's happened. New technologies get captured by powerful people and organizations who master the new technological options, and AI and machine learning are increasingly being used for surveillance and control – we're seeing that around the world at the moment. So my final thought to end with is a thought about Carl Sagan. When I was young, many decades ago, Carl Sagan did the series Cosmos on television, explaining the miracles of the universe, and as a teenager I loved Cosmos. Now, that was a long time ago, and much more recently there's a new generation of Cosmos – the book is advertised on the basis of "with a new foreword by Neil deGrasse Tyson." I think Carl Sagan was a good guy, and he didn't only write Cosmos; he wrote a number of other books. Another of the books he wrote was The Demon-Haunted World, which has a theme a little closer to some of the things we're dealing with here. In that book he writes: "I have a foreboding of a world in my children's or grandchildren's time, when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what's true, we slide, almost without noticing, back into superstition and darkness." I think if you look around the US and many other parts of the world today, this is actually much more the risk that humanity is facing, and it is why education – which we try to provide at Stanford and other places – is an important thing that should be valued, along with all the other things that go with it, like having open source that supports the broad dissemination of learning. Thank you.

Latest Summary (Detailed Summary)

Generated on 2025-06-05 22:52

Overview / Executive Summary

This lecture reviews the core concepts of CS224N and explores open problems in natural language processing (NLP), the current state of large language models (LLMs), the debate between symbolic and connectionist (neural network) approaches, theories of meaning in linguistics and NLP, and the future risks of artificial intelligence (AI). It first retraces the course's trajectory from word vectors and sequence models (RNNs/LSTMs) to Transformers and pre-trained foundation models, highlighting core ideas such as dense representations, distributional semantics ("you shall know a word by the company it keeps"), and scaling laws. It then surveys current challenges in NLP, including generalization versus memorization (LLMs are sometimes likened to "talking encyclopedias"), model interpretability (the "black box" problem), multilingual processing (especially for low-resource languages), the reliability of evaluation benchmarks (data contamination), domain adaptation (e.g., law and medicine), and social and cultural bias. In discussing LLMs (with GPT-4 as the example), the lecture showcases their striking capabilities (such as writing poems to complex specifications) and practical value (a significant boost to consultants' productivity), while also noting their limitations (creative writing far below that of human authors) and industry skepticism about over-hype. The lecture also traces the historical split and current convergence between symbolic AI (rooted in logic, e.g., the Physical Symbol System Hypothesis) and neural networks (rooted in cybernetics), arguing that language itself is a symbolic system but that the machinery processing it (the brain or an AI model) is closer to a neural network. Finally, the lecture contrasts theories of meaning (denotational vs. use-based) and gives a view on AI's future risks: rather than a distant "singularity" or existential threat, we should focus on the real harms already arising from the misuse of AI, such as filter bubbles, bias and discrimination, concentration of power, and the spread of disinformation, closing with Carl Sagan's warning about the importance of critical thinking and public understanding.

CS224N Course Review and Core Ideas

  • Development trajectory:
    • Started with word vectors and simple neural networks.
    • Moved on to sequence models: recurrent neural networks (RNNs) and long short-term memory networks (LSTMs).
    • Introduced the powerful Transformer architecture.
    • Built modern high-performance NLP systems: pre-training + post-training -> general foundation models.
    • Explored specific topics such as benchmarking and reasoning.
  • Core ideas:
    • Dense representations: the hidden representations inside neural networks.
    • Distributional semantics: understanding word meaning through context, captured by the slogan "You shall know a word by the company it keeps." This idea has driven the successes of NLP from the early statistical era through modern neural NLP.
    • Training large, deep neural networks: challenges and opportunities; techniques such as residual connections made training much more stable and reliable.
    • Sequence models vs. Transformers: Transformers overcome some limitations of traditional sequence models, in part through parallelization.
    • Language modeling as a universal pre-training task: predicting words in context effectively teaches both linguistic knowledge and world knowledge.
    • Scaling laws: the empirical observation that model performance improves roughly linearly as data, compute, and model size grow exponentially (a minimal formula sketch follows this list).
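As a rough illustration (not a slide from the lecture), the power-law form reported by Kaplan et al. (2020) is one common way to write these scaling laws. In the minimal sketch below, L is the test loss; N, D, and C are parameters, data tokens, and compute; and the constants and exponents are empirically fitted values, not numbers from the lecture.

```latex
% Minimal sketch of neural scaling laws in the Kaplan et al. (2020) form.
% N_c, D_c, C_c and the exponents \alpha_* are empirically fitted constants.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```

Taking logarithms of any of these gives log L ≈ const − α · log(resource), which is why loss curves look like straight lines on a log-log plot and why exponential growth in resources yields the steady improvement described above.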

Open Problems in NLP

  • Generalization vs. memorization:
    • To what extent do models genuinely learn to generalize, rather than merely memorizing the huge number of patterns they have seen? LLMs are sometimes likened to "talking encyclopedias."
    • Some research finds that Transformers can be worse than earlier LSTMs at certain generalization tasks (e.g., learning finite automata from limited data).
    • Humans excel at few-shot and even one-shot learning, whereas current models usually require large amounts of data.
  • Interpretability:
    • Neural networks are often viewed as black boxes whose inner workings are unclear.
    • Research directions: understanding what models learn, how they learn it, and why they succeed or fail (e.g., mechanistic interpretability, causal abstraction); a minimal probing sketch in this spirit appears after this list.
    • Early example: Andrej Karpathy found a single neuron in an LSTM that learned to track the length of the current line of text.
  • Multilingual NLP:
    • Existing models generally handle languages other than English poorly.
    • Although GPT-4 surpasses GPT-3.5's English performance in many languages ("a rising tide lifts all boats"), performance drops sharply for languages with less data, even those with millions of speakers such as Punjabi, Marathi, and Telugu.
    • For low-resource languages, especially the thousands that are primarily spoken and lack written text, current techniques are hard to apply, and these languages face extinction.
  • Evaluation integrity:
    • There is concern that the training data of large closed-source models includes the common benchmarks, causing data contamination.
    • Example: Horace He observed that GPT-4 performed perfectly on pre-2021 Codeforces problems but poorly on newer ones, "strongly pointing to contamination."
    • More trustworthy evaluation is needed, such as strictly held-out test sets and dynamic evaluation.
  • Domain adaptation:
    • Applying NLP to specialized technical domains (e.g., biomedicine, clinical medicine, law) is challenging, since these domains have their own vocabulary and usage.
    • The potential is large (e.g., improving access to justice in the legal domain), but so are the risks (e.g., harm caused by inaccurate understanding).
    • Stanford RegLab research found that legal NLP systems hallucinate (fabricate information) in roughly one in six answers.
  • Social and cultural aspects:
    • Models show bias with respect to different cultures and religions.
    • The social norms that models absorb may themselves be biased.
    • Underrepresented groups face particular challenges in how NLP technology is developed.
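As a purely illustrative sketch (not code from the lecture), single-unit inspection of the kind behind Karpathy's "line length" neuron can be approximated by running a character-level LSTM over some text and printing one hidden unit's activation per character. The tiny untrained model, layer sizes, and unit index below are all hypothetical stand-ins; a real probe would load trained weights.

```python
# Minimal single-unit inspection sketch in the spirit of Karpathy's
# LSTM "line length" neuron. The model is tiny and untrained here;
# a real probe would load trained weights. All sizes are hypothetical.
import torch
import torch.nn as nn

text = "short line\na much longer line of text that keeps going\nok\n"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

embed = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[stoi[ch] for ch in text]])   # shape (1, T)
with torch.no_grad():
    outputs, _ = lstm(embed(ids))                 # shape (1, T, 32)

UNIT = 7  # hypothetical hidden unit to inspect
activations = outputs[0, :, UNIT]

# Print each character next to that unit's activation. In a trained model,
# a "line length" unit would drift steadily within a line and reset at '\n'.
for ch, a in zip(text, activations.tolist()):
    print(repr(ch), round(a, 3))
```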

Current State and Evaluation of Large Language Models (LLMs)

  • Striking capabilities:
    • The speaker finds the abilities of current LLMs (e.g., GPT-4) at times "like magic," hard to fully explain.
    • Example: GPT-4 was asked to write a sonnet about the Transformer architecture with every line beginning with the letter 'T'. The model produced a poem that largely kept the meter (iambic pentameter), roughly followed an ABBA rhyme scheme, stayed on topic, and mentioned Transformer concepts such as "queries, keys, and values", "multi-headed stuff", "parallelize", and "matrix math" (a hedged prompting sketch appears after this list).
  • Practical value:
    • Studies show LLMs can significantly improve productivity.
    • Boston Consulting Group (BCG) study: consultants using GPT-4, compared with those who did not,
      • completed 12% more tasks;
      • completed tasks 25% faster;
      • produced output rated 40% higher in quality.
    • LLMs help weaker performers the most, a "leveling effect" also observed with programming assistants.
  • Limitations and skepticism:
    • In some areas, especially creative tasks, LLMs fall far short of human experts.
    • A study comparing GPT-4 with New Yorker fiction writers on creative writing found GPT-4 to be "3 to 10 times worse."
    • There is skepticism in industry about generative-AI hype.
    • A Financial Times article ("Generative AI: hyper-intelligent?") argues that:
      • expectations far exceed the technology's real limitations;
      • these models "predict, they do not comprehend";
      • human-level, let alone superhuman, intelligence remains doubtful;
      • concerns about cost, unclear return on investment, and inaccuracy will become increasingly apparent.
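As a hedged sketch of how one might reproduce the constrained-sonnet demo, the snippet below uses the OpenAI Python SDK (v1-style client). The model name and prompt wording are assumptions, not the exact ones used in the lecture, and the call requires an OPENAI_API_KEY in the environment.

```python
# Hedged sketch of a constrained-sonnet prompt via the OpenAI Python SDK (v1).
# The model name and prompt text are assumptions, not the lecture's exact ones.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a sonnet in iambic pentameter about the Transformer architecture. "
    "Every line must begin with the letter T, and the poem should mention "
    "queries, keys, and values."
)

resp = client.chat.completions.create(
    model="gpt-4",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```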

The Symbolic Systems vs. Neural Systems Debate

  • Historical background:
    • Symbolic AI: rooted in mathematics and logic, holding that symbol manipulation is the core of intelligence. Key figures: John McCarthy (who named AI and founded the Stanford AI Lab) and Newell & Simon (who proposed the Physical Symbol System Hypothesis: a physical symbol system has the necessary and sufficient means for general intelligent action). Stanford's Symbolic Systems program reflects this tradition (John Barwise insisted on the name, emphasizing the connection between symbols and the world rather than a focus on cognition alone).
    • Cybernetics: rooted in control and communication theory (an engineering tradition), concerned with control and communication in the animal and the machine. Key figure: Norbert Wiener. Early neural networks (such as Frank Rosenblatt's perceptron) belong to this lineage.
    • Historical AI hype: 1950s publicity about the perceptron (claims that it would walk, talk, see, write, reproduce itself, and be conscious) far outstripped its actual ability (distinguishing left-pointing from right-pointing arrows).
  • Current view:
    • Language is a symbolic system: whether written (letters, words) or spoken (phonemes behave symbolically, with categorical perception), human language has symbolic structure. Symbol systems may have been adopted because of their signaling reliability.
    • The system that processes language need not be symbolic: the brain looks more like a neural network than a physical symbol system, so NLP models can be neural networks as well.
    • The role of linguistics: supplying NLP with problems, concepts, and distinctions (compositionality, systematic generalization, the mapping to meaning, reference, and so on). As AI advances, these higher-level linguistic notions become more important, helping AI move beyond "insect-level intelligence." Linguistic knowledge remains essential for designing and evaluating NLP systems (sentence structure, discourse structure, and semantic properties such as natural language inference (NLI) and coreference resolution).

Language, Thought, and Intelligence

  • Wilhelm von Humboldt:
    • Described language as "the infinite use of finite means," an idea that influenced Chomsky.
    • Saw language not merely as a communication tool but as a necessary foundation for thought, particularly System 2 thinking (conscious, deliberate reasoning, as opposed to intuitive System 1 thinking). Language provides scaffolding for the development of the human mind.
  • Daniel Dennett:
    • In From Bacteria to Bach and Back, he proposes four grades of intelligence: Darwinian (fixed and pre-wired), Skinnerian (reinforcement learning), Popperian (building a model of the environment and planning), and Gregorian.
    • Gregorian intelligence creates and uses "thinking tools" to search the space of thoughts at a higher level. Human language is the most important thinking tool, which makes humans the only known Gregorian intelligence so far.

Meaning in Linguistics and NLP

  • Model-theoretic / denotational semantics:
    • The traditional view (philosophy, linguistics, early AI).
    • Meaning lies in the correspondence between words and entities in the world, their denotations (e.g., the meaning of "computer" is the set of all computers).
    • Key figures: Tarski (who held that natural language "could not be made coherent") and Richard Montague (who pushed back against Tarski and founded formal semantics for natural language).
    • Applied in early NLP: combining parsing with a lexicon and composition rules to build a logical form for a sentence (semantic parsing; key figures: Zettlemoyer, Collins, Percy Liang).
    • Pipeline: sentence -> syntax tree -> word-meaning lookup -> meaning composition -> logical representation (e.g., SQL); a toy code sketch of this pipeline appears after this list.
  • Distributional / use theory of semantics:
    • The mainstream approach in modern NLP.
    • Meaning lies in a word's contexts of use ("You shall know a word by the company it keeps" - J.R. Firth).
    • Philosophical roots: the later Wittgenstein (Philosophical Investigations), for whom "meaning is use" (as with money, whose meaning lies in what it is used for rather than in the object itself).
    • Critique of the denotational view: why should expressing meaning require translating from natural language (one symbol system) into formal logic (another symbol system)?
  • The speaker's synthesis:
    • Rejects confining meaning to denotation (criticizing the position of Bender & Koller).
    • Meaning arises from connections: between words and entities in the world (grounding), and between words and other words.
    • Meaning is gradient, not binary (known vs. unknown).
    • Example: the meaning of "shehnai" (an Indian musical instrument) can be partially grasped in several ways:
      • seeing or hearing the instrument itself (grounded meaning);
      • looking at a picture of it;
      • a verbal description ("a traditional Indian instrument similar to an oboe");
      • textual context: even without ever encountering the instrument, reading sentences such as "the shehnai player performed at the entrance to the wedding" or "Bikash Babu disliked the shehnai's wail" conveys its cultural associations (weddings) and its sound (a wailing tone), things that merely seeing the object would not provide.
    • Current LLMs are built on distributional semantics, but doubts remain about genuine compositional understanding and systematic generalization.
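As a purely illustrative sketch (not from the lecture), the toy program below mimics the denotational pipeline described above: a tiny hand-written lexicon and one composition step turn a fixed question pattern into a SQL-like logical form. The grammar, lexicon entries, and table names are all hypothetical; real semantic parsers (e.g., Zettlemoyer & Collins, Percy Liang) use learned grammars rather than this hard-coded lookup.

```python
# Toy semantic parsing sketch: sentence -> (trivial) parse -> lexicon lookup
# -> composition -> SQL-like logical form. All entries are hypothetical.

# Lexicon: map content words to fragments of a logical form.
LEXICON = {
    "capital": ("SELECT capital FROM countries", None),
    "population": ("SELECT population FROM countries", None),
    "france": (None, "name = 'France'"),
    "india": (None, "name = 'India'"),
}

def parse(question: str) -> str:
    """Compose lexicon entries for a pattern like 'what is the X of Y?'."""
    words = question.lower().rstrip("?").split()
    select_part, where_part = None, None
    for w in words:
        if w in LEXICON:
            sel, cond = LEXICON[w]
            select_part = select_part or sel
            where_part = where_part or cond
    if select_part is None or where_part is None:
        raise ValueError("sentence not covered by this toy grammar")
    return f"{select_part} WHERE {where_part};"

print(parse("What is the capital of France?"))
# -> SELECT capital FROM countries WHERE name = 'France';
```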

Future Risks and Societal Impact of AI

  • Job loss:
    • Worries that technology will destroy jobs are long-standing (citing a 1928 New York Times article and a 1961 Time magazine article).
    • Historically, technological progress has not produced large-scale permanent unemployment; instead it has created new jobs and raised living standards (labor-saving machines such as washing machines were welcomed).
  • Concentration of wealth:
    • The speaker sees this as the more realistic and more serious worry. The current concentration of AI talent and network effects may reinforce the dominance of a few tech giants.
    • Analogy: the monopolies produced by railroads and other transport networks at the beginning of the 20th century.
    • The remedy is fundamentally political and social, requiring effective policy intervention (as with the treatment of monopolies after the Great Depression), but there is currently little sign of the necessary political leadership.
  • Existential risk / the singularity:
    • Worries about out-of-control, superhuman AI threatening human survival have increasingly entered the mainstream (AI safety summits, Hinton's warnings).
    • The speaker is skeptical ("I don't personally give these concerns too much credence").
    • Opposing views cited:
      • François Chollet: no existing or foreseeable AI technique constitutes an extinction risk.
      • Joelle Pineau: calls existential-risk discourse "unhinged," criticizing its utilitarian argument (an infinitely bad catastrophe multiplied by any non-zero probability outweighs everything else).
      • Timnit Gebru and others: the preoccupation with existential risk may distract attention from the real harms happening now.
  • Immediate harms:
    • Issues that deserve more attention:
      • bias and discrimination;
      • worker exploitation;
      • copyright violation;
      • disinformation, deception, and hallucinations;
      • concentration of power and regulatory capture;
      • carbon emissions;
      • homogeneity and the erosion of rich human practices.
    • Disinformation: AI may be used to generate highly personalized, more persuasive disinformation that shapes public opinion and political processes. Studies already suggest AI-generated disinformation can influence people, and visual fakes (deepfakes) may be even more influential than text.
    • The core worry: not AI itself, but "what people and organizations with power will use AI to do." As with social media, AI may end up being used to strengthen surveillance and control.

Concluding Thoughts

  • Quoting Carl Sagan's warning in The Demon-Haunted World:
    > "I have a foreboding of a world in my children's or grandchildren's time, when awesome technological powers are in the hands of a very few and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what's true, we slide, almost without noticing, back into superstition and darkness."
  • The speaker argues that this risk Sagan describes, of declining critical faculties and a public unable to scrutinize technology and power, is far more realistic and urgent than AI doomsday scenarios.
  • Education (of the kind Stanford tries to provide) and open access (such as open source) are emphasized as essential for sustaining a rational society and meeting the challenges ahead.