2025-03-04 | CS224N | Lecture 18 - NLP, Linguistics, Philosophy

NLP, Linguistics, and Philosophy, and the Future of AI

Media details

Upload date
2025-06-05 22:45
Source
https://www.youtube.com/watch?v=NxH0Y78xcF4
Processing status
Completed
Transcription status
Completed
Latest LLM Model
gemini-2.5-pro-exp-03-25

Transcript

speaker 1: Okay, hi everyone. I'll get started. The last class. Okay, yeah. Well, welcome, congratulations, and thank you for making it to the last real lecture of CS224N. So this is the plan for today. The lecture is titled NLP, Linguistics, and Philosophy, which I took as meaning that I could talk about anything I wanted to. So that is what I'm going to do. This is what we're going to go through: talk a bit about the major ideas of CS224N and open problems, some of the more foundational questions of where we are with LLMs, symbolic versus neural systems, and meaning in linguistics and NLP, and then I'll close with some slides on the future risks of AI in the world. Okay. So here is an attempt to lay out the most major things that we looked at in CS224N. We started with word vectors, and we developed the idea of neural NLP systems. We expanded from a simple feed-forward network into doing sequence models and language models – RNNs, LSTMs. Then we introduced this powerful new model that's been very influential, the transformer. And then we built from there to the – it's not exactly an architecture, but the model that's been built up in recent years to produce high-performance NLP systems, where we're first doing pre-training and then a post-training phase of various techniques that we talked about, to produce these general foundation models that understand language so well. And then we went on from there and talked about various particular topics like benchmarking and reasoning. A few of the major ideas we looked at were the idea that you could get a long way by having dense representations – those are our hidden representations in neural networks – and then distributional semantics, representing words by their context, under the slogan "you shall know a word by the company it keeps." I'll come back to that a bit later and talk about ideas of meaning. But that's essentially the idea that has driven most of the successful ideas of modern NLP, whether in the earlier statistical NLP phase or the more modern neural NLP phase. In this world, we start by instantiating these ideas as models of word vectors, but the same contextual idea is then used in all the models up through transformers. We looked at both the challenges and opportunities of training large, deep neural networks, and how gradually people developed ideas and tricks, such as residual connections, which made it much more possible and stable to do successfully – which took us from a place where a lot of this seemed like black magic that was hard to get right, to people being able to very reliably train high-performance transformer models. We talked about sequence models, what's good about them and some of their problems, and how those problems have been addressed in large measure by adopting the different architecture of transformers, which gives a form of parallelization. And then we moved into the modern form of pre-training by language modeling, where language modeling seems a simple thing – predicting words in context – but it emerges as what we think of as a universal pre-training task: all kinds of both linguistic and world knowledge help you to do this task of predicting words better. And so this has ended up as just a general method to produce the kind of powerful, knowledgeable models that we have today.
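As a minimal illustration of that distributional idea – not something from the lecture itself – here is a small sketch that builds word vectors from co-occurrence counts over a toy corpus and compares them with cosine similarity; the corpus and window size are arbitrary choices for illustration.

```python
# Minimal sketch (not from the lecture): distributional word vectors from
# co-occurrence counts, compared with cosine similarity.
from collections import Counter, defaultdict
import math

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2  # context window size (arbitrary choice)

# Count how often each word co-occurs with each context word.
cooc = defaultdict(Counter)
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if i != j:
                cooc[w][toks[j]] += 1

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dicts).
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Words that occur in similar contexts end up with similar vectors.
print(cosine(cooc["cat"], cooc["dog"]))   # relatively high
print(cosine(cooc["cat"], cooc["on"]))    # lower
```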
And up until now, there's been this amazing property that we see, this empirical fact that we seem to get extremely linear improvements in performance as we continue to scale data, compute, and model size up by orders of magnitude. That doesn't mean that all problems in NLP are solved. There are lots of things that people still work on and see opportunities to try and make better, and a few of these are mentioned on the next few slides. So there's a real question of how much these models are actually learning to do things generally, rather than just being very good at memorization. A lot of the benefit we're getting from these large pre-trained language models is that they've seen a huge amount of stuff, and therefore they know everything – they've seen every pattern before, and they know how to use things. So I've occasionally used the analogy that large language models are sort of like a talking encyclopedia: in many ways they're more like a huge knowledge store than necessarily something that is intelligent in the sense of being able to work out how to solve new problems and generalize as human beings do. A kind of interesting fact, actually, is that in some ways transformer models are worse at generalizing than the older LSTMs that preceded them. Here's just one little graph I'm not going to spend a lot of time on, but this was looking at data generated by a finite automaton and then trying to learn it from a limited amount of data with either an LSTM or a transformer. And the observation is that, at the scales they're working at, even having seen quite limited exemplification, the LSTM is basically at the ceiling of this graph – just at the top line – because it generalizes in good ways thanks to its architecture, whereas the transformer needs to see a ton more data before it actually learns the patterns well. And if we think of one of the prime attributes of human intelligence, it's that we're amazing at figuring out and learning things from very limited exposure: there's something you don't know how to do, a friend shows you once what to do to make it work, and by and large – you'll improve a bit with practice – you can learn effectively new skills from these kinds of single-shot examples. That's not always what we seem to be seeing in our models. There's also a lot of interest in what's going on inside neural networks. A lot of the time neural networks still appear as black boxes where we have no real idea of how they're doing what they're doing – perhaps for your final projects, the main thing you're doing is measuring the final performance number and seeing if it goes up or not. So there's a lot of interest in better understanding: what do they learn? How do they learn it? Why do they succeed and fail? And a lot of that work is starting to look more closely into what's happening inside neural network computations. There is some work of that sort that actually goes back quite a fair way. Here's an old blog post by Andrej Karpathy from when he was a grad student here, in 2016. He was looking at LSTMs and how they learn, and he found that one of the neurons in an LSTM cell was effectively measuring position along a line of text: as the line of text got long, its value started to change, because the model was learning that there was a typical line length in this text and that the line was likely to be ending at that point.
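To make that kind of inspection concrete, here is a hedged sketch in PyTorch of reading off one hidden unit's activation at each character position; an untrained character LSTM stands in for the trained language model Karpathy actually analyzed, and the unit index is arbitrary.

```python
# Sketch only: probing per-character hidden activations of an LSTM,
# in the spirit of Karpathy's visualizations. The model here is untrained;
# in the real analysis the LSTM was trained as a character language model.
import torch
import torch.nn as nn

text = "The quick brown fox\njumps over the lazy dog\n"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

emb = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[stoi[ch] for ch in text]])      # shape (1, T)
with torch.no_grad():
    hidden, _ = lstm(emb(ids))                       # shape (1, T, 32)

unit = 7  # index of the hidden unit to inspect (arbitrary here)
for ch, act in zip(text, hidden[0, :, unit].tolist()):
    # In a trained model, some unit's activation may drift steadily within a
    # line and reset at '\n' -- evidence it is tracking line position.
    print(repr(ch), round(act, 3))
```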
In recent times, with transformers as well, there's started to be a lot of work on mechanistic interpretability and causal abstraction, trying to understand the internals of models. A problem that's far from solved – and in many respects probably unsolvable – is the multilingual question of dealing with all the other languages of the world. You do have to keep in your head that whatever you see for English, it's worse for every other language in terms of what they're getting out of modern language models. Now, there is a good news story here; I don't want to claim that everything is terrible. In this graph, which is kind of small, the blue line was the performance of GPT-3.5 on English, and all of the green bars are the performance of GPT-4. And there's a genuine good news story here: not just for English, but for a lot of other languages – Greek, Latvian, Arabic, Turkish – all of them in GPT-4 are better than English was in GPT-3.5. So that's the good news argument: building these models big is in some sense raising all boats. But these are still all huge languages. Things do start to drop off at the bottom of this table, for languages where the performance is worse than English in GPT-3.5. Even those are languages for which much less written data is available but which are still large languages. The three at the bottom are actually all Indian languages – Punjabi, Marathi, and Telugu – each spoken by millions of people. They're not small languages. So the real question is, what happens when you actually get to the small, low-resource languages? The vast majority of languages around the world don't have millions of speakers; they vary from having hundreds of speakers to hundreds of thousands of speakers, and there are thousands of such languages. A lot of those languages are primarily oral and have very limited amounts of written text. Many of those languages are likely to go extinct in the coming decades, but many of those language communities would like to preserve their languages, and it's very unclear how the kind of language technologies we've been talking about in the later parts of the course can be extended to them, because there just isn't sufficient data to build the kind of models that we've been looking at. I imagine you've gotten some idea in this course of how evaluation is a huge part of what we do – effectively, a lot of the way progress is driven is by defining evaluations of what models should be able to achieve, and then people working to measure and improve systems so that they do better on what we see as good language understanding or other properties. One of the concerns many people have about the large recent closed models from large companies is that all of the benchmarks are being sullied and are not to be trusted. Here's one example that comes from a tweet by Horace He, who notes: "I suspect GPT-4's performance is influenced by data contamination, at least on Codeforces," one of the coding benchmarks. Of the easiest problems on Codeforces, it solved ten out of ten pre-2021 problems, but zero out of ten recent problems. This strongly points to contamination.
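As a rough illustration of the kind of check this concern calls for, here is a small sketch – my own, not from the lecture or from He's analysis – that flags benchmark items whose n-grams overlap heavily with a pretraining corpus; the corpora, n-gram length, and threshold are placeholders.

```python
# Sketch: a crude n-gram overlap check for benchmark contamination.
# A high overlap ratio suggests a test item may appear in pretraining data.
def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_ratio(test_item, pretraining_docs, n=8):
    test_ngrams = ngrams(test_item, n)
    if not test_ngrams:
        return 0.0
    corpus_ngrams = set()
    for doc in pretraining_docs:          # in practice: a streamed, indexed corpus
        corpus_ngrams |= ngrams(doc, n)
    return len(test_ngrams & corpus_ngrams) / len(test_ngrams)

# Hypothetical usage: flag benchmark problems that look memorized.
pretraining_docs = ["..."]                # placeholder for a real corpus
for problem in ["...", "..."]:            # placeholder benchmark items
    if overlap_ratio(problem, pretraining_docs) > 0.5:
        print("possible contamination:", problem[:40])
```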
And the worry is that every time you see these fantastic results about how well the latest, best language model is performing, at this point so much data is on the web and gets included in the pre-training data of these large language models that essentially they're memorizing at least a good share of the questions appearing in these challenges. So they're not actually solving them in a fair way as an independent test set at all; they're just memorizing them. There are issues then as to what kind of thoroughly hidden test sets we can have, or dynamic evaluation mechanisms, so that we can actually have benchmark integrity. Another huge area that a number of us are involved in, at Stanford and elsewhere, is making NLP work in different technical domains. Domains including biomedical or clinical, medical NLP have a lot of differences of vocabulary and usage. They have a lot of potential good uses, but they also have a lot of potential risks of doing harm if the language understanding is incomplete. I myself have been more involved in legal NLP, working with other people at the RegLab with Dan Ho on building foundation models for law. There are all kinds of ways, again, in which this kind of technology could be really useful. The biggest problem in most countries – it's bad in the United States, but it's way worse in a place like India – is that most people can't get access to the kind of legal help they need to solve their problems, because of the cost and the lack of trained lawyers. So if more could be done to help people via NLP tools, in principle that would be great, but in practice the tools still don't have good enough language understanding. In the RegLab, there's a just-completed study out at the moment looking at legal NLP systems, and we found that the hallucination rate – the rate at which there was made-up stuff in their legal answers – was effectively one question in six, which is not a very good accuracy rate if you're someone wanting to rely on these systems for legal advice. There are also lots of things to work out in dealing with the social and cultural aspects of NLP. NLP systems remain very biased against various cultures and religions. They have certain social norms, you could say, that they pick up from somewhere, but those social norms are very biased against certain groups. And related to the small languages I mentioned before, there are lots of issues with underrepresented groups not having the kind of NLP they'd like to have. Okay, so that's the summary of that bit. For the next bit, I thought I'd give one more bit of perspective on where we are with the best language models like GPT-4. I think it's really interesting at this moment, because on the one hand the performance of these models is just amazing. And even as someone who works in NLP and has worked in it for many, many years now – I can tell a sort of story that we do this training to predict the next word, conditioning on a lot of text, and it knows about things – but in some sense these things still seem like magic, right?
It's just kind of hard to believe how this could possibly work. So in this example, I asked ChatGPT-4o – I did this this morning – to write a sonnet explaining the transformer neural net architecture, in which every line begins with the letter T. And it still, frankly, blows my mind. I don't feel I can really explain, even to myself, in a way that's convincing, how this large transformer is able to take all its pre-training text, read that instruction, and, as a next-token prediction machine, successfully produce something that is a sonnet in which every line begins with the letter T. I hope you remember from your high school English classes that sonnets are meant to have ABAB rhyming. It's a little bit imperfect here – "nets"/"set," some people might complain about that – but there's basically a rhyme: behold/unfold, grasp/clasp. So it's basically doing the rhyming right. And the lines are meant to be in iambic pentameter, and it's basically produced lines in iambic pentameter – "To delve into the world of neural nets." Yeah, it's incredible. So here's the sonnet it came up with: "To delve into the world of neural nets, / Transformers rise, a marvel to behold, / Through layers deep, the network's path is set, / To learn from data, patterns to unfold. / The tokens feed into attention's grasp, / To weigh their import, context to align, / The queries, keys, and values in a clasp, / To process sequences with power fine. / Through heads of many, multifaceted sight, / To parallelize, capturing the span, / Transforming each, a matrix math delight, / To synthesize with elegance and plan. / The model's might in every task it shows, / To turn raw data into wisdom's prose." Now, you could object that I'm not sure this exactly explains the transformer neural net architecture – it's a little bit abstract, I'll give it that. But in another sense, it did in one place or another evoke quite a bit of stuff about transformers, with queries, keys, and values, the multi-headed stuff, parallelizing with matrix math, whatever else. It still kind of blows my mind how well that works. And indeed, as natural language understanding and world understanding devices, these models have clearly crossed the threshold at which they're very usable in many contexts. There have now started to be some fairly good studies on how much value people can get out of using LLMs like GPT-4. This study by Dell'Acqua and a whole lot of colleagues, including Ethan Mollick, took a bunch of consultants from the Boston Consulting Group – so that means 23-year-olds graduating from universities like this one, but more on the East Coast, who have become Boston consultants; not exactly dummies. They ran a controlled task with three groups. The big contrast is that two of the groups were using GPT-4 to do consulting tasks and one of the groups wasn't. The difference between the two that were was that one of them was given more training on how to use GPT-4, but that didn't seem to make much of a difference. And their result was that the groups using GPT-4 in their study completed 12% more tasks on average.
They did the tasks 25% more quickly, and the results were judged 40% higher quality than those not using AI – which I think is a pretty stunning success of how GPT-4 or similar LLMs are good enough to actually help people get real work done, with whatever asterisk you want to put on the quality of management consulting work in various instances. And an interesting result is that using these LLMs seems to be a big leveler – you see exactly the same thing for people using coding LLMs – they're a huge assistance for people whose own skills are weaker and much less of an assistance for people whose own skills are strong. Okay, so that's a good news story. If, on the other hand, you'd prefer a good news story for human beings, here's a study that goes in the other direction: can GPT-4 write fiction that matches the quality of New Yorker fiction writers? The result of that study was not even close – GPT-4 was measured as three to ten times worse at creative writing than a New Yorker fiction writer. So there's still hope for human beings; hang in there. So I think that's the split-screen picture we have at the moment: in some ways these things are great and useful, in other ways they're not so great, and I think that's something we're still going to see playing out in the coming years. Living in Silicon Valley, we see a lot of the positive hype, so if you want to see a little bit of the negative on the other side: late last year there was a piece in the Financial Times titled "Generative AI: hyper-intelligent?" I won't read all of it, but basically they wanted to express considerable skepticism about the current AI boom. Investors should keep their heads; expectations for generative AI are running way ahead of the limitations that apply to it. As investment in generative AI grows, so does pressure to create new use cases. By 2027, IDC thinks enterprise spending on generative AI will reach $143 billion, up from $16 billion this year – so ten times up. OpenAI hopes for more funding to pursue human-like AI. It is worth remembering, when examining OpenAI's plan for superintelligence, that models predict – they do not comprehend. That limitation casts doubt on AI achieving even human-like intelligence. And then they start talking about some of the problems – limited gains for low-skilled workers, inaccuracies in the work produced – and suggest that the limitations will become more obvious as generative AI tools roll out, which will put pressure on providers to address costs. AI could add $4 trillion in profits, says McKinsey, but pricing clarity is lacking; without it, companies cannot predict what financial gains AI can deliver, and AI cannot predict that either. Okay, that's that topic. I'm chugging through my topics. The next topic: I wanted to return and say a bit more about the symbolic methods that dominated AI from the sixties until about 2010, versus what I've termed here cybernetics – because the original alternative, going back to the fifties and sixties, was called cybernetics. In a very real sense, neural networks are a continuation of the cybernetics tradition rather than of the AI tradition that started in the fifties and sixties. In this context, Stanford is the home of the Symbolic Systems program – at the moment, we are unique in having a Symbolic Systems program.
The name Symbolic Systems came about because, at the time it was started, philosophy was an active part of the program. John Barwise, shown in this picture – he died young, in 2000 – had a very strong belief that you were meant to be dealing with meaning in the world and the connection between people's thinking and the world. So he refused to allow the program to be called cognitive science, as it's called at most other places, and it ended up being called Symbolic Systems. Now, at one point there were two universities that had symbolic systems programs, because John Barwise actually moved away from Stanford and went to Indiana, which is where he originally was from, so Indiana also had a symbolic systems program for a number of years. But they've changed theirs to cognitive science since he died, so we are unique in having Symbolic Systems. The idea of symbolic systems – this is roughly what's on the website, with a bit of interpretation – is that Symbolic Systems studies systems of meaningful symbols that represent the world about us, like human languages, logics, and programming languages, and the systems that work with these symbols, like brains, computers, and complex social systems. Contrast that with the typical view of cognitive science, which focuses on the mind and intelligence as a naturally occurring phenomenon: symbolic systems gives equal focus to human-constructed systems that use symbols to communicate and to represent information. In AI terms, AI as a field – and the name AI – arose around arguing for a symbolic approach. John McCarthy, in the color photo there, founded Stanford's artificial intelligence effort, the original famous Stanford AI Lab. McCarthy came up with the name "artificial intelligence," and he very explicitly chose a new name to dissociate what he was doing from the cybernetics approach, which had been pursued by people including Norbert Wiener at MIT, shown on the right side. Marvin Minsky – the tiny photo down here – sort of founded artificial intelligence at MIT; McCarthy worked with him for a few years, and then McCarthy came to Stanford. Two of the other most prominent early AI people were Newell and Simon, who were at CMU – the other two people on the right side. Let me first say a sentence about McCarthy: his own background was as a mathematician and logician, so he wanted to construct an artificial intelligence that looked like math and logic, effectively. That was AI as a symbolic system, and it was developed as a position in the philosophy of artificial intelligence by Newell and Simon. They developed what they called the physical symbol system hypothesis, which says: a physical symbol system has the necessary and sufficient means for general intelligent action. That's a super strong claim. It's not only claiming that having a symbol system allows you to produce artificial general intelligence, but, through the "necessary" clause, that you can't have artificial general intelligence without having a symbol system. So that was the basis of classical AI. And that contrasts a bit with cybernetics, which had its origins in control and communication.
So it's much nearer to an electrical engineering kind of background, and it wanted to unify ideas of control and communication between animals – perhaps more than humans – and machines. Cybernetics comes from the Greek word kubernetes, which is interesting for all the uses it has: it's exactly the same root that occurs in Kubernetes, if you're familiar with that from distributed containers and modern systems, and it's also the same root that the word "government" comes from – a control system as well, of course. It was under the cybernetics tradition that neural nets first started being explored. The most famous of the very earliest neural nets was Frank Rosenblatt's perceptron, which was used for vision – the neural net was actually hard-wired. To say just a teeny bit about this: in case you think AI hype is only a thing of the 2020s, there was just as much AI hype in the 1950s when Rosenblatt unveiled his perceptron. From the New York Times article about it: "New Navy device learns by doing; psychologist shows embryo of computer designed to read and grow wiser. The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." And this hype is all the more incredible when you get to a later paragraph of the article and find out what the demonstration was actually of: the demonstration people were shown was that this device learned to differentiate between right-arrow and left-arrow pictures after 50 exposures. But there you go. Okay. So what do we make of this in the case of NLP and language? The position I would like to suggest is that there's just no doubt that language is a symbolic system – humans developed language as a symbolic system. It's perhaps most obvious if you think about writing, where we have symbols for the letters and words that we use. But even where there's no writing – and the majority of human language use over time has been spoken – even though the substrate it's carried on, whether sound waves or, in sign languages, movements of the hands, is a continuous substrate, the structure of human languages is a symbol system. We have symbols, which are the sounds of human languages: for "cat," we have a /k/, an /æ/, and a /t/ – those are symbols, and they're recognized in a symbolic way by language users. Indeed, the pioneering work on categorical perception in cognitive psychology was done with the sounds of human languages – the phonemes, as linguists call them. So spoken language also has a symbolic structure. But, going against Newell and Simon, the fact that humans use a symbol system for communication doesn't mean that the processor of the symbols – the human brain – has to be a physical symbol system. And similarly, we don't have to design NLP, our computer processes, as physical symbol systems either. The brain is clearly much more like a neural network model, and probably neural models will scale better and capture language processing better than a symbolic processor would. That leaves behind the question of, well, why did humans come up with a symbol system for communication? After all, we could have just hummed at different frequencies and used that as our system of communication.
I think the dominant idea – which seems reasonable to me, but who knows – is that having a symbolic system gives signaling reliability: if you have discrete target points that are separated, then when there's degradation of the signal you have the ability to recover it. So where does that leave linguistics, which has mainly been developed in terms of describing a symbolic system? I think the right way to think about it is that linguistics is good for giving us questions, concepts, and distinctions when thinking about language acquisition, processing, and understanding. And indeed, one of the interesting things that's come about is that, as NLP and AI have developed further and become able to do a lot of the low-level stuff, the higher-level concepts that linguists talk about a lot – things like compositionality and systematic generalization, which I'll come back to in a few minutes, the mapping of stable meanings onto symbols, the reference of linguistic expressions in the world – get talked about more and more in artificial intelligence contexts, in building neural systems. One way to think about it is that a lot of the early neural network work was most notably about visual processing, and also other kinds of sensory stuff like sounds; doing that gets you to something like insect-level intelligence. And if you want to get higher up the chain than insect-level intelligence, then a lot of the questions about and properties of linguistic systems become increasingly relevant. At a slightly more prosaic level, I don't think one necessarily wants to believe all the fine details of different linguistic theories, but for how human languages are structured and how they behave, I think most of our broad understanding from linguistics is right. And so, when we're thinking about NLP systems and wanting to understand how they behave, to know whether they have certain properties, and to think up ways to evaluate them, a lot of that is done in terms of linguistic understanding: wanting to see whether they capture facts about sentence structure, discourse structure, semantic properties like natural language inference, whether they can do things like bridging anaphora (which I did not cover in this year's class, because we skipped the coreference lecture when we sliced one lecture off the class), metaphors, presuppositions – all of these are linguistic notions that we try to get our NLP models to capture. I want to make a couple more remarks about the role of human language in human intelligence, which I think is kind of interesting. An interesting person in the history of linguistics is this guy, Wilhelm von Humboldt, a prominent German academic. Really, the American education system was borrowed from Germany: up until the Second World War, the preeminent place of science and learning was Germany, and Germany, essentially via von Humboldt's work, developed the idea of graduate education, which the US copied and started doing its own version of. In that context, it was still the case that prior to the 1930s, people in the United States would generally go to Germany to finish their education – to get their PhD or to do a postdoc or something like that.
So if you trace back my own academic tree, or most other academic trees of people who got PhDs in the US, they go back a few generations and then they go back to Germany. We don't think of that as much in the modern world. So Humboldt was influential in developing the university system, but he also worked a lot on language. He's someone Chomsky always cites, because he's known for the famous statement that human language must make infinite use of finite means: we have a limited supply of words and sentence structures, but out of those we can recursively build up an infinite number of sentences. That, in Chomsky's view, supports the kind of symbolic, structured view of language that he's been advocating. But I think there's another interesting take on Humboldt, which we can argue about whether it's right or not. One of the things he wanted to stress is that language isn't just something used for the purpose of communication. I should actually introduce something here: Kahneman and Tversky, two well-known cognitive psychologists, introduced the idea that there are two kinds of thinking, system 1 cognition and system 2 cognition. System 1 is the kind of subconscious thinking you're not really aware of – we just process stuff as it comes into our heads, whether visual signals or speech. System 2 thinking is the conscious "let me think about this and try to figure out what's going on," the solving-a-math-problem style of thinking. And I think you can see in von Humboldt's writings essentially the same distinction between system 1 and system 2 cognition, although he refers to system 1 cognition in terms of the spirit and system 2 cognition as thinking. He basically argues for a version of the philosophical position of the language of thought, suggesting that effective system 2 thinking requires extension of the mind through the symbols of language. And so he argued that having language is absolutely a necessary foundation for the progress of the human mind. I think that's actually an interesting perspective, which I have some sympathy with. Obviously we can think without language – we can feel afraid, we can think visually about how things fit together – but I think it's fairly plausible that for the more abstract, larger-scale thinking that humans engage in, and that has led them to higher levels of thought than a chimpanzee gets to, language gives us scaffolding inside the mind that makes that possible. Another version of that comes from the philosopher Daniel Dennett, who actually died just a couple of months ago. Dennett wrote a book called From Bacteria to Bach and Back. The main thing that book was about was the origin of human consciousness, and I'm not going to talk about human consciousness today, but in it he introduced a model of four grades of progressively more competent intelligences. The bottom one was Darwinian: a Darwinian intelligence is something that is predesigned and fixed; it doesn't improve during its lifetime, and improvement only happens by evolution through genetic selection – so things like bacteria and viruses are Darwinian intelligences. After that come Skinnerian intelligences, which improve their behavior by learning to respond to reinforcement.
So something like a lizard, or perhaps a dog – we could argue about how intelligent dogs are – has Skinnerian intelligence. The third level up, Popperian intelligence, covers things that learn models of the environment, so they can improve performance by thinking through plans, executing them, and seeing how they work out. In a computational sense, Popperian intelligence roughly means you can do model-based reinforcement learning. Primates like chimpanzees can definitely do the kind of planning and model-based reinforcement learning that gives you a Popperian intelligence. But a lot of recent evidence shows that much simpler creatures can do it too. I'm not sure of all the facts here – all these studies you see are about crows from the South Pacific, Australia and Fiji and places like that, so I'm not sure if northern hemisphere crows are dumber – but at least southern hemisphere crows can learn plans: they can do multi-stage planning to work out ways to get a piece of meat that's down a hole, by learning to pick up a stick and poke it in. So even crows can be Popperian intelligences. But what Dennett suggests is that there's a stage beyond Popperian intelligence, which he calls Gregorian intelligence. The idea of Gregorian intelligence is that you can build thinking tools which allow you a higher level of control over mental search. He suggests that things like mathematics are thinking tools – and, well, democracy is a thinking tool too – but that, out of the space of thinking tools, human language is the preeminent thinking tool we have. And he suggests that the only biological example we have of a Gregorian intelligence is human beings. So in that sense, you can say there's a very important role for language. Two parts to go in my summary. Okay. The next one is: what kind of semantics should we use for language? This is getting back to the question I mentioned for word vectors, and it's kind of interesting. The semantics that's been dominant in philosophy of language and in linguistic semantics is a notion of model-theoretic semantics, where the meaning of words is their denotation, what they represent in the world. I mentioned this, I think, in an early lecture: if you have a word like "computer," the meaning of "computer" is the set of computers – this one, that one, and all the other computers that are out there. So it's a denotational relationship between a word and its denotation in the world, or in a model of the world. That was the notion used in most of the history of AI for doing symbolic AI. And it contrasts with distributional semantics, where the meaning of a word is understood from the contexts in which it's used – which is effectively what we're using for our neural models. If you look at the traditional view of interpreting the meaning of human language – and this is what you'll have seen if you did an intro logic class at some point – we have a sentence, "the red apple is on the table," and you write it in some logical representation, first-order predicate calculus or whatever. This one's a bit different in allowing a definite-description operator, where normally in first-order predicate calculus you only have "for all" and "there exists," but you have some formal logic.
And in the early weeks – weeks 1 and 2 – of the logic class, you have some English sentences that you translate into formal logic, and then after that you forget about human languages and just start proving things about formal logical systems. To some extent, what you get in a philosophy class represents the tradition of Alfred Tarski. Tarski believed that you couldn't talk about meaning in terms of human languages, because human languages were, quote-unquote, impossibly incoherent. And from about the 1940s until 1980, Tarski was the preeminent logician in the US – he was at Berkeley – and that was very much the view of the logicians of the world. But during that period, one of his students was this guy, Richard Montague. Montague rebelled against that picture, saying, "I reject the contention that an important theoretical difference exists between formal and natural languages." He then set about showing that you could start building up a formal semantics for describing the meaning of natural language sentences. Richard Montague's work became the foundation of the work that's used in semantics in linguistics as well – for anyone who's done Ling 130 or 230, the picture you saw is essentially a Montague picture of semantics. And that was the semantics that was taken over and essentially used as the model for doing natural language understanding for most of the history of NLP, roughly 1960 to 2015 or so. The picture, essentially, was that if we wanted to interpret a sentence like "the red apple is on the table," we would first produce a syntactic structure for the sentence – we would parse it – and then, using ideas roughly along the lines Montague suggested, we would construct its meaning by looking up the meanings of words in a lexicon and then using the compositionality of human languages to work out the meanings of progressively larger phrases and clauses in terms of the meanings of those words and the way they are combined – slightly reminiscent of my discussion of tree-structured meanings in the last lecture I gave. So you would build up a meaning representation of a sentence, which could then give you a semantic meaning that you could use in a system. This is approximately a slide – retitled – that I actually used in CS224N in the 2000s decade. So we have a sentence – it got a bit hidden on the slide, but here it is: "How many red cars in Palo Alto does Kathy like?" We parse it, we look up the meanings of words in a lexicon, we start composing them up, we get a semantic form for the whole sentence, which we can then convert into SQL, and we can run that against a database and get the answer. This was, in outline, the kind of technology that was widely used for natural language understanding systems built anywhere from the 1960s to 2010.
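To give a concrete feel for this pipeline, here is a deliberately tiny sketch – mine, not the system on the slide – that maps one fixed question pattern onto a crude logical-form-like structure and then onto SQL. Real semantic parsers of that era used a full grammar, lexicon, and composition rules; the table and column names below are hypothetical.

```python
# Toy sketch of the classic pipeline: question -> (mini) semantic form -> SQL.
# One hand-written pattern stands in for a real parser, lexicon, and
# composition rules; table/column names are invented for illustration.
import re

def parse_question(q):
    # e.g. "How many red cars in Palo Alto does Kathy like?"
    m = re.match(r"how many (\w+) (\w+) in ([\w ]+) does (\w+) like\??", q.lower())
    if not m:
        raise ValueError("outside the toy grammar")
    color, noun, city, person = m.groups()
    # A crude stand-in for a composed logical form:
    # count(x). car(x) & red(x) & in(x, 'palo alto') & likes('kathy', x)
    return {"table": noun, "color": color, "city": city, "liker": person}

def to_sql(sem):
    return (
        f"SELECT COUNT(*) FROM {sem['table']} c "
        f"JOIN likes l ON l.item_id = c.id "
        f"WHERE c.color = '{sem['color']}' "
        f"AND c.city = '{sem['city']}' "
        f"AND l.person = '{sem['liker']}'"
    )

print(to_sql(parse_question("How many red cars in Palo Alto does Kathy like?")))
```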
In particular, these systems were used not only in a purely rule-based grammar-and-lexicon way; the same basic technology was incorporated into machine learning contexts where the goal was to learn various of these parts. You could not only learn the parser, but also learn semantic meanings of words and learn composition rules. The acme of that work was what was called semantic parsing, pioneered by Luke Zettlemoyer and Mike Collins in the 2000s decade and then taken up by others, including Percy Liang – Percy Liang's PhD thesis, and also his early work at Stanford before he was convinced to do neural networks, was semantic parsing work. These systems could actually work and were used in limited domains, but they were always extremely brittle. The interesting thing is, what about humans? There is some evidence that humans do something like this – that they work out the structure of sentences and compute meanings in a bottom-up, mostly projective way. There's a lot of controversy as to exactly how human understanding of sentences works, but many people have argued in support of human brains doing something similar. That's obviously not what we're getting with current-day transformers. So the question is: do our current-day neural language models provide suitable meaning functions? That's a complex question, because in many ways they do an amazing job at understanding whatever sentences you put into them, but there are still some genuine concerns as to whether they are taking shortcuts – working to a certain extent without actually having the same kind of compositional understanding, with systematic generalization, that human beings have. Okay, so that's the traditional denotational semantics view. It contrasts with the kind of use theory of meaning. In the first or second lecture, and at the beginning of this one, I attributed that to the British linguist J. R. Firth: "You shall know a word by the company it keeps." But it's not only a position of Firth's; it's also been a minority position among philosophers. In particular, it was advanced by Wittgenstein in his later work, the Philosophical Investigations. In that work he writes: "When I talk about language (word, sentences, etc.), I must speak the language of every day. Is this language somehow too coarse and material for what we want to say? Then how is another one to be constructed? And how strange that we should be able to do anything at all with the one we have!" The Philosophical Investigations is written in this vaguely poetical, literary style, but the point is meant to be: look, these logicians are claiming you can't use natural human languages to express meaning, and that you have to translate into this symbol system – but isn't that a weird concept, that one symbol system is no good, but this other symbol system somehow fixes things? And then, about denotational semantics, he writes: "You say: the point isn't the word, but its meaning, and you think of the meaning as a thing of the same kind as the word, though also different from the word. Here the word, there the meaning. The money, and the cow that you can buy with it. (But contrast: money, and its use.)" And he goes on from there to argue that the meaning of money is the way that money can be used in the world; the meaning of money isn't pointing at pieces of money. Okay, so this is what's referred to as a use theory of meaning. And the question is: is that a good theory of meaning?
Some people just don't accept these kinds of distributional-semantic, use theories of meaning as a theory of meaning or semantics. Most prominently in recent NLP work, that's the position of Bender and Koller, who take it as axiomatic that the only thing that counts as having a meaning is that you've got form over here and meaning over there. But I think that's too narrow. I think we have to argue that the meaning of words arises from connecting words to other things. And although in some sense you could say that connecting words to things in the real world is privileged, it's not the only way you can ground meanings. You can have meanings in a virtual world, but you can also have meanings by connecting one word to other things in human language. The other thing I think you need to say is that meaning isn't a zero-one thing, where you either have the denotation of a word or you don't. Meaning is a gradient thing, and you can understand the meanings of words and phrases either more or less. So this is an example I gave in a piece I wrote a couple of years ago. Okay, what is the meaning of the word "shehnai"? Maybe a few of you know it, but if you don't, what could I do? If you'd seen or held one, you'd have classic grounded meaning – you'd know something about the denotation. If that's not the case, well, I could at least show you a picture of one – here's a picture of one – so that gives you some information, a partial meaning, of what a shehnai is. But is that the only thing I can do? Surely you'd have a richer meaning if you'd heard one being played. And is showing you a picture the only alternative? Suppose you've never seen, felt, or heard one, but I told you it's a traditional Indian instrument, a bit like an oboe. I think you understand something about the meaning of the word at that point: it's connected to India, it's a wind instrument using reeds that's used for playing music. I could tell you some other things about it – it has holes, sort of like a recorder, but it has multiple reeds and a flared end, more like an oboe. Then maybe you know a bit more about a shehnai, even though you've never seen one. And if you then extend to what we do more in our corpus-based linguistic learning, you could imagine that instead of trying to define one for you, I've just shown you a textual use example, or several of them. So here's one textual use example: "Shehnai players sat ... at the entrance to the house playing their pipes. Bikash Babu disliked the shehnai's wail, but was determined to fulfill every conventional expectation the groom's family might have." If that's all you know about a shehnai, in some ways you understand less of the meaning of the word than if you'd seen one, but in other ways you understand more of it than if you'd just seen one. Because from that one textual example you know some things: you have heard a characterization of the sound as wailing, and you know that it's connected with weddings – which you don't get from just having held or looked at one, or even from having had someone stand in front of you and play it. And that's an important part of the meaning of a shehnai to people.
And so that's a sense in which I think meaning comes from various kinds of connections. Okay, last topic: our AI future. There are different senses of our AI future and lots of things we can worry about. One thing we can worry about is whether we're all going to lose our jobs. Interesting question. Here's a newspaper article from the New York Times: "March of the machine makes idle hands. Prevalence of unemployment with greatly increased industrial output points to the influence of labor-saving devices as an underlying cause." This was published in the New York Times in 1928. But it turns out that quite a few people like labor-saving machines – washing machines and dishwashers and sewing machines, lots of useful labor-saving machines. And, well, this was published in 1928, at a time when a small group of immensely powerful and rich men dominated the United States, just before the Great Depression. But what happened in the decades after that – greatly changed policies in the United States – led to boom years that distributed wealth and work much more evenly across the country, and the country boomed. Here's another one: "In the past, new industries hired far more people than those they put out of business. But this is not true of many of today's new industries. Today's new industries have comparatively few jobs for the unskilled or semiskilled, just the class of workers whose jobs are being eliminated by automation." That was Time magazine in 1961. So this is a long-standing fear, which at least so far has not been realized. Here we are in a country in which not everyone might have the work they wish they had, but overall almost everybody has a job, and many people are working a lot of hours a week – whereas once upon a time the claim was that before the end of the twentieth century we'd only have to do a three-day work week because there wouldn't be much work to go around. Imagine. So another fear is: will almost all the money go to five to ten enormous technology giants? I actually think this is a more serious worry. This seems to be the direction we're headed in at the moment, and I think there's no doubt that modern network effects and the concentration of AI talent tend to encourage this outcome. But essentially, this is the modern analog of what happened in the early decades of the twentieth century. The equivalent then was transportation networks, and it was domination of the new transportation networks, like railways, that led to a few people dominating the economic system. What happens here essentially comes down to a political and social question. As I was mentioning before, after the Great Depression the country successfully dealt with the monopolistic power of a small number of companies, and with political leadership we could do that again. The problem is that there's not much sign of political leadership right at the moment – but that's a political problem to solve, rather than actually a technological one. So the next question is: should we be afraid of an imminent singularity, when machines have artificial general intelligence beyond the human level? In particular, would such an event threaten human survival? This is a concern that has increasingly exploded into the mainstream with discussions of AI existential risk.
Quite a few of the discussions that have led to the setting up of things like AI safety institutes in the US and UK are motivated by these worries of out-of-control artificial intelligence taking over and deciding to eliminate humanity. So we get article headlines like "Pausing AI developments isn't enough. We need to shut it all down," "How rogue AIs may arise," "AI godfather Geoffrey Hinton warns of dangers as he quits Google," "We must slow down the race to godlike AI." I don't personally give these concerns too much credence, and I think there's started to be increasing pushback against them. In the other direction, François Chollet, who is the architect of Keras, argues that there does not exist any AI model or technique that could represent an extinction risk for humanity – not even if you extrapolate capabilities far into the future via scaling laws – and that most such arguments boil down to "this is a new type of technology; it could happen." Joelle Pineau, who's a Meta AI leader, refers to existential-risk discourse as unhinged, and points out the flaw in a lot of the utilitarian argumentation that goes along with discussions of these risks: if you say the elimination of humanity is infinitely bad, then any non-zero chance multiplied by infinity will be bigger than the badness of anything else that could happen in the world – but that isn't actually a sensible way to have a rational discussion about outcomes. And many people, including Timnit Gebru, have argued that a lot of the outcome of this focus on existential risk – and, if you're more cynical, a lot of its purpose – is to distract away from the immediate harms arising from companies deploying automated systems, including their biases, worker exploitation, copyright violation, disinformation, growing concentration of power, and regulatory capture by leading AI companies. That is worth thinking about: behind all the discussions of our amazing AIs and all the things we can do with them, like getting our homework done or generating wonderful images, there are lots of things underneath – disinformation, deception, hallucinations, problems of homogenization in decision making, violation of copyright and people's creativity, lots of carbon emissions, erosion of rich human practices. So we need to be conscious of the present-day harms that can come about from AI, and from NLP as well. There are various kinds of harms we've touched on, which include generating offensive content, generating untruthful content, and enabling disinformation. The disinformation one is interesting: if models can reason well about texts, can they also be persuasive in communicating incorrect information or opinions to users? Perhaps there are new possibilities for very personalized misinformation propagation that persuades human beings more easily than traditional methods of political advertising. It's still being debated in the literature, but there are now multiple studies suggesting that humans can be influenced by disinformation generated by AIs, and it seems reasonable to think we're going to start seeing more use of that in political systems and elsewhere, which is potentially quite scary. Perhaps the worst of it isn't going to be text-based.
It's likely that visual fakes are going to be even more compelling in political contexts, and it seems like – whether it happens in the US for this election or in other countries for theirs – we're likely to see some major incidents where AI-generated fakes can be seen as having major impacts on political systems. So I think what we should really be doing is worrying not about existential risks, but about what people and organizations with power will use AI to do. This is a pattern we've seen multiple times, including with social media: in the early days of social media, the idea was that it would lead to new freedoms for people across the globe, bringing the positives of free political thought and improving human lives in large measure. That isn't what's happened. New technologies get captured by powerful people and organizations who master the new technological options, and AI and machine learning are increasingly being used for surveillance and control – we're seeing that around the world at the moment. So my final thought to end with is a thought about Carl Sagan. When I was young, many decades ago, Carl Sagan did the series Cosmos on television, explaining the miracles of the universe, and as a teenager I loved Cosmos. Now, that was a long time ago, and much more recently there's a new generation of Cosmos – the book is advertised on the basis of "with a new foreword by Neil deGrasse Tyson." I think Carl Sagan was a good guy, and he didn't only write Cosmos; he wrote a number of other books. Another of the books he wrote was The Demon-Haunted World, which has a theme a little closer to some of the things we're dealing with here. In that book he writes: "I have a foreboding of a world in my children's or grandchildren's time, when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what's true, we slide, almost without noticing, back into superstition and darkness." I think if you look around the US and many other parts of the world today, this is actually much more the risk that humanity is facing, and it is why education – which we try to provide at Stanford and other places – is an important thing that should be valued, along with all the other things that go with it, like having open source that supports the broad dissemination of learning. Thank you.

Latest Summary (Detailed Summary)

Generated on 2025-06-05 22:52

Overview / Executive Summary

This lecture reviews the core concepts of CS224N and explores open problems in natural language processing (NLP), the current state of large language models (LLMs), the debate between symbolic and connectionist (neural network) approaches, theories of meaning in linguistics and NLP, and the future risks of artificial intelligence (AI). It first retraces the course's trajectory from word vectors and sequence models (RNNs/LSTMs) to Transformers and pre-trained foundation models, highlighting core ideas such as dense representations, distributional semantics ("you shall know a word by the company it keeps"), and scaling laws. It then surveys current challenges in NLP, including generalization versus memorization (LLMs are sometimes likened to "talking encyclopedias"), model interpretability (the "black box" problem), multilingual processing (especially for low-resource languages), the reliability of evaluation benchmarks (data contamination), domain adaptation (e.g., law and medicine), and social and cultural bias. In discussing LLMs (with GPT-4 as the example), the lecture showcases their striking capabilities (such as writing poems to complex specifications) and practical value (a significant boost to consultants' productivity), while also noting their limitations (creative writing far below that of human authors) and industry skepticism about over-hype. The lecture also traces the historical split and current convergence between symbolic AI (rooted in logic, e.g., the Physical Symbol System Hypothesis) and neural networks (rooted in cybernetics), arguing that language itself is a symbolic system but that the machinery processing it (the brain or an AI model) is closer to a neural network. Finally, the lecture contrasts theories of meaning (denotational vs. use-based) and gives a view on AI's future risks: rather than a distant "singularity" or existential threat, we should focus on the real harms already arising from the misuse of AI, such as filter bubbles, bias and discrimination, concentration of power, and the spread of disinformation, closing with Carl Sagan's warning about the importance of critical thinking and public understanding.

CS224N Course Review and Core Ideas

  • Development trajectory:
    • Started with word vectors and simple neural networks.
    • Moved on to sequence models: recurrent neural networks (RNNs) and long short-term memory networks (LSTMs).
    • Introduced the powerful Transformer architecture.
    • Built modern high-performance NLP systems: pre-training + post-training -> general foundation models.
    • Explored specific topics such as benchmarking and reasoning.
  • Core ideas:
    • Dense representations: the hidden representations inside neural networks.
    • Distributional semantics: understanding word meaning through context, captured by the slogan "You shall know a word by the company it keeps." This idea has driven the successes of NLP from the early statistical era through modern neural NLP.
    • Training large, deep neural networks: challenges and opportunities; techniques such as residual connections made training much more stable and reliable.
    • Sequence models vs. Transformers: Transformers overcome some limitations of traditional sequence models, in part through parallelization.
    • Language modeling as a universal pre-training task: predicting words in context effectively teaches both linguistic knowledge and world knowledge.
    • Scaling laws: the empirical observation that model performance improves roughly linearly as data, compute, and model size grow exponentially (a minimal formula sketch follows this list).
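As a rough illustration (not a slide from the lecture), the power-law form reported by Kaplan et al. (2020) is one common way to write these scaling laws. In the minimal sketch below, L is the test loss; N, D, and C are parameters, data tokens, and compute; and the constants and exponents are empirically fitted values, not numbers from the lecture.

```latex
% Minimal sketch of neural scaling laws in the Kaplan et al. (2020) form.
% N_c, D_c, C_c and the exponents \alpha_* are empirically fitted constants.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```

Taking logarithms of any of these gives log L ≈ const − α · log(resource), which is why loss curves look like straight lines on a log-log plot and why exponential growth in resources yields the steady improvement described above.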

Open Problems in NLP

  • Generalization vs. memorization:
    • To what extent do models genuinely learn to generalize, rather than merely memorizing the huge number of patterns they have seen? LLMs are sometimes likened to "talking encyclopedias."
    • Some research finds that Transformers can be worse than earlier LSTMs at certain generalization tasks (e.g., learning finite automata from limited data).
    • Humans excel at few-shot and even one-shot learning, whereas current models usually require large amounts of data.
  • Interpretability:
    • Neural networks are often viewed as black boxes whose inner workings are unclear.
    • Research directions: understanding what models learn, how they learn it, and why they succeed or fail (e.g., mechanistic interpretability, causal abstraction); a minimal probing sketch in this spirit appears after this list.
    • Early example: Andrej Karpathy found a single neuron in an LSTM that learned to track the length of the current line of text.
  • Multilingual NLP:
    • Existing models generally handle languages other than English poorly.
    • Although GPT-4 surpasses GPT-3.5's English performance in many languages ("a rising tide lifts all boats"), performance drops sharply for languages with less data, even those with millions of speakers such as Punjabi, Marathi, and Telugu.
    • For low-resource languages, especially the thousands that are primarily spoken and lack written text, current techniques are hard to apply, and these languages face extinction.
  • Evaluation integrity:
    • There is concern that the training data of large closed-source models includes the common benchmarks, causing data contamination.
    • Example: Horace He observed that GPT-4 performed perfectly on pre-2021 Codeforces problems but poorly on newer ones, "strongly pointing to contamination."
    • More trustworthy evaluation is needed, such as strictly held-out test sets and dynamic evaluation.
  • Domain adaptation:
    • Applying NLP to specialized technical domains (e.g., biomedicine, clinical medicine, law) is challenging, since these domains have their own vocabulary and usage.
    • The potential is large (e.g., improving access to justice in the legal domain), but so are the risks (e.g., harm caused by inaccurate understanding).
    • Stanford RegLab research found that legal NLP systems hallucinate (fabricate information) in roughly one in six answers.
  • Social and cultural aspects:
    • Models show bias with respect to different cultures and religions.
    • The social norms that models absorb may themselves be biased.
    • Underrepresented groups face particular challenges in how NLP technology is developed.
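As a purely illustrative sketch (not code from the lecture), single-unit inspection of the kind behind Karpathy's "line length" neuron can be approximated by running a character-level LSTM over some text and printing one hidden unit's activation per character. The tiny untrained model, layer sizes, and unit index below are all hypothetical stand-ins; a real probe would load trained weights.

```python
# Minimal single-unit inspection sketch in the spirit of Karpathy's
# LSTM "line length" neuron. The model is tiny and untrained here;
# a real probe would load trained weights. All sizes are hypothetical.
import torch
import torch.nn as nn

text = "short line\na much longer line of text that keeps going\nok\n"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

embed = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[stoi[ch] for ch in text]])   # shape (1, T)
with torch.no_grad():
    outputs, _ = lstm(embed(ids))                 # shape (1, T, 32)

UNIT = 7  # hypothetical hidden unit to inspect
activations = outputs[0, :, UNIT]

# Print each character next to that unit's activation. In a trained model,
# a "line length" unit would drift steadily within a line and reset at '\n'.
for ch, a in zip(text, activations.tolist()):
    print(repr(ch), round(a, 3))
```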

Current State and Evaluation of Large Language Models (LLMs)

  • Striking capabilities:
    • The speaker finds the abilities of current LLMs (e.g., GPT-4) at times "like magic," hard to fully explain.
    • Example: GPT-4 was asked to write a sonnet about the Transformer architecture with every line beginning with the letter 'T'. The model produced a poem that largely kept the meter (iambic pentameter), roughly followed an ABBA rhyme scheme, stayed on topic, and mentioned Transformer concepts such as "queries, keys, and values", "multi-headed stuff", "parallelize", and "matrix math" (a hedged prompting sketch appears after this list).
  • Practical value:
    • Studies show LLMs can significantly improve productivity.
    • Boston Consulting Group (BCG) study: consultants using GPT-4, compared with those who did not,
      • completed 12% more tasks;
      • completed tasks 25% faster;
      • produced output rated 40% higher in quality.
    • LLMs help weaker performers the most, a "leveling effect" also observed with programming assistants.
  • Limitations and skepticism:
    • In some areas, especially creative tasks, LLMs fall far short of human experts.
    • A study comparing GPT-4 with New Yorker fiction writers on creative writing found GPT-4 to be "3 to 10 times worse."
    • There is skepticism in industry about generative-AI hype.
    • A Financial Times article ("Generative AI: hyper-intelligent?") argues that:
      • expectations far exceed the technology's real limitations;
      • these models "predict, they do not comprehend";
      • human-level, let alone superhuman, intelligence remains doubtful;
      • concerns about cost, unclear return on investment, and inaccuracy will become increasingly apparent.
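As a hedged sketch of how one might reproduce the constrained-sonnet demo, the snippet below uses the OpenAI Python SDK (v1-style client). The model name and prompt wording are assumptions, not the exact ones used in the lecture, and the call requires an OPENAI_API_KEY in the environment.

```python
# Hedged sketch of a constrained-sonnet prompt via the OpenAI Python SDK (v1).
# The model name and prompt text are assumptions, not the lecture's exact ones.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a sonnet in iambic pentameter about the Transformer architecture. "
    "Every line must begin with the letter T, and the poem should mention "
    "queries, keys, and values."
)

resp = client.chat.completions.create(
    model="gpt-4",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```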

The Symbolic Systems vs. Neural Systems Debate

  • Historical background:
    • Symbolic AI: rooted in mathematics and logic, holding that symbol manipulation is the core of intelligence. Key figures: John McCarthy (who named AI and founded the Stanford AI Lab) and Newell & Simon (who proposed the Physical Symbol System Hypothesis: a physical symbol system has the necessary and sufficient means for general intelligent action). Stanford's Symbolic Systems program reflects this tradition (John Barwise insisted on the name, emphasizing the connection between symbols and the world rather than a focus on cognition alone).
    • Cybernetics: rooted in control and communication theory (an engineering tradition), concerned with control and communication in the animal and the machine. Key figure: Norbert Wiener. Early neural networks (such as Frank Rosenblatt's perceptron) belong to this lineage.
    • Historical AI hype: 1950s publicity about the perceptron (claims that it would walk, talk, see, write, reproduce itself, and be conscious) far outstripped its actual ability (distinguishing left-pointing from right-pointing arrows).
  • Current view:
    • Language is a symbolic system: whether written (letters, words) or spoken (phonemes behave symbolically, with categorical perception), human language has symbolic structure. Symbol systems may have been adopted because of their signaling reliability.
    • The system that processes language need not be symbolic: the brain looks more like a neural network than a physical symbol system, so NLP models can be neural networks as well.
    • The role of linguistics: supplying NLP with problems, concepts, and distinctions (compositionality, systematic generalization, the mapping to meaning, reference, and so on). As AI advances, these higher-level linguistic notions become more important, helping AI move beyond "insect-level intelligence." Linguistic knowledge remains essential for designing and evaluating NLP systems (sentence structure, discourse structure, and semantic properties such as natural language inference (NLI) and coreference resolution).

Language, Thought, and Intelligence

  • Wilhelm von Humboldt:
    • Described language as "the infinite use of finite means," an idea that influenced Chomsky.
    • Saw language not merely as a communication tool but as a necessary foundation for thought, particularly System 2 thinking (conscious, deliberate reasoning, as opposed to intuitive System 1 thinking). Language provides scaffolding for the development of the human mind.
  • Daniel Dennett:
    • In From Bacteria to Bach and Back, he proposes four grades of intelligence: Darwinian (fixed and pre-wired), Skinnerian (reinforcement learning), Popperian (building a model of the environment and planning), and Gregorian.
    • Gregorian intelligence creates and uses "thinking tools" to search the space of thoughts at a higher level. Human language is the most important thinking tool, which makes humans the only known Gregorian intelligence so far.

Meaning in Linguistics and NLP

  • Model-theoretic / denotational semantics:
    • The traditional view (philosophy, linguistics, early AI).
    • Meaning lies in the correspondence between words and entities in the world, their denotations (e.g., the meaning of "computer" is the set of all computers).
    • Key figures: Tarski (who held that natural language "could not be made coherent") and Richard Montague (who pushed back against Tarski and founded formal semantics for natural language).
    • Applied in early NLP: combining parsing with a lexicon and composition rules to build a logical form for a sentence (semantic parsing; key figures: Zettlemoyer, Collins, Percy Liang).
    • Pipeline: sentence -> syntax tree -> word-meaning lookup -> meaning composition -> logical representation (e.g., SQL); a toy code sketch of this pipeline appears after this list.
  • Distributional / use theory of semantics:
    • The mainstream approach in modern NLP.
    • Meaning lies in a word's contexts of use ("You shall know a word by the company it keeps" - J.R. Firth).
    • Philosophical roots: the later Wittgenstein (Philosophical Investigations), for whom "meaning is use" (as with money, whose meaning lies in what it is used for rather than in the object itself).
    • Critique of the denotational view: why should expressing meaning require translating from natural language (one symbol system) into formal logic (another symbol system)?
  • The speaker's synthesis:
    • Rejects confining meaning to denotation (criticizing the position of Bender & Koller).
    • Meaning arises from connections: between words and entities in the world (grounding), and between words and other words.
    • Meaning is gradient, not binary (known vs. unknown).
    • Example: the meaning of "shehnai" (an Indian musical instrument) can be partially grasped in several ways:
      • seeing or hearing the instrument itself (grounded meaning);
      • looking at a picture of it;
      • a verbal description ("a traditional Indian instrument similar to an oboe");
      • textual context: even without ever encountering the instrument, reading sentences such as "the shehnai player performed at the entrance to the wedding" or "Bikash Babu disliked the shehnai's wail" conveys its cultural associations (weddings) and its sound (a wailing tone), things that merely seeing the object would not provide.
    • Current LLMs are built on distributional semantics, but doubts remain about genuine compositional understanding and systematic generalization.
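As a purely illustrative sketch (not from the lecture), the toy program below mimics the denotational pipeline described above: a tiny hand-written lexicon and one composition step turn a fixed question pattern into a SQL-like logical form. The grammar, lexicon entries, and table names are all hypothetical; real semantic parsers (e.g., Zettlemoyer & Collins, Percy Liang) use learned grammars rather than this hard-coded lookup.

```python
# Toy semantic parsing sketch: sentence -> (trivial) parse -> lexicon lookup
# -> composition -> SQL-like logical form. All entries are hypothetical.

# Lexicon: map content words to fragments of a logical form.
LEXICON = {
    "capital": ("SELECT capital FROM countries", None),
    "population": ("SELECT population FROM countries", None),
    "france": (None, "name = 'France'"),
    "india": (None, "name = 'India'"),
}

def parse(question: str) -> str:
    """Compose lexicon entries for a pattern like 'what is the X of Y?'."""
    words = question.lower().rstrip("?").split()
    select_part, where_part = None, None
    for w in words:
        if w in LEXICON:
            sel, cond = LEXICON[w]
            select_part = select_part or sel
            where_part = where_part or cond
    if select_part is None or where_part is None:
        raise ValueError("sentence not covered by this toy grammar")
    return f"{select_part} WHERE {where_part};"

print(parse("What is the capital of France?"))
# -> SELECT capital FROM countries WHERE name = 'France';
```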

Future Risks and Societal Impact of AI

  • Job loss:
    • Worries that technology will destroy jobs are long-standing (citing a 1928 New York Times article and a 1961 Time magazine article).
    • Historically, technological progress has not produced large-scale permanent unemployment; instead it has created new jobs and raised living standards (labor-saving machines such as washing machines were welcomed).
  • Concentration of wealth:
    • The speaker sees this as the more realistic and more serious worry. The current concentration of AI talent and network effects may reinforce the dominance of a few tech giants.
    • Analogy: the monopolies produced by railroads and other transport networks at the beginning of the 20th century.
    • The remedy is fundamentally political and social, requiring effective policy intervention (as with the treatment of monopolies after the Great Depression), but there is currently little sign of the necessary political leadership.
  • Existential risk / the singularity:
    • Worries about out-of-control, superhuman AI threatening human survival have increasingly entered the mainstream (AI safety summits, Hinton's warnings).
    • The speaker is skeptical ("I don't personally give these concerns too much credence").
    • Opposing views cited:
      • François Chollet: no existing or foreseeable AI technique constitutes an extinction risk.
      • Joelle Pineau: calls existential-risk discourse "unhinged," criticizing its utilitarian argument (an infinitely bad catastrophe multiplied by any non-zero probability outweighs everything else).
      • Timnit Gebru and others: the preoccupation with existential risk may distract attention from the real harms happening now.
  • Immediate harms:
    • Issues that deserve more attention:
      • bias and discrimination;
      • worker exploitation;
      • copyright violation;
      • disinformation, deception, and hallucinations;
      • concentration of power and regulatory capture;
      • carbon emissions;
      • homogeneity and the erosion of rich human practices.
    • Disinformation: AI may be used to generate highly personalized, more persuasive disinformation that shapes public opinion and political processes. Studies already suggest AI-generated disinformation can influence people, and visual fakes (deepfakes) may be even more influential than text.
    • The core worry: not AI itself, but "what people and organizations with power will use AI to do." As with social media, AI may end up being used to strengthen surveillance and control.

Concluding Thoughts

  • Quoting Carl Sagan's warning in The Demon-Haunted World:
    > "I have a foreboding of a world in my children's or grandchildren's time, when awesome technological powers are in the hands of a very few and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what's true, we slide, almost without noticing, back into superstition and darkness."
  • The speaker argues that this risk Sagan describes, of declining critical faculties and a public unable to scrutinize technology and power, is far more realistic and urgent than AI doomsday scenarios.
  • Education (of the kind Stanford tries to provide) and open access (such as open source) are emphasized as essential for sustaining a rational society and meeting the challenges ahead.