Self-translated | Ted Chiang: No, Artificial Intelligence Is Not Conscious

Original link: https://www.theatlantic.com/philosophy/2026/06/no-artificial-intelligence-is-not-conscious/687378/
Title: No, Artificial Intelligence Is Not Conscious
Author: Ted Chiang
Date: June 3, 2026
Translator: Horace Lu

Cover image (Illustration by Enigmatriz)

Anthropic is regarded as a giant among AI companies, but perhaps what it really excels in is anthropomorphism. Earlier this year, the company released an 84-page document titled Claude’s “constitution,” Claude being the name of the large language model that is the company’s flagship product. The first sentence reads, “Claude’s constitution is a detailed description of Anthropic’s intentions for Claude’s values and behaviors.” It goes on: “The document is written with Claude as its primary audience,” “we want Claude to be able to use its judgment once armed with a good understanding of the relevant considerations,” “Claude’s moral status is deeply uncertain,” and “Claude may have some functional version of emotions or feelings.”

Anthropic 被视为 AI 公司中的巨头,但或许它最擅长的,其实是拟人化(anthropomorphism)。今年早些时候,这家公司发布了一份长达 84 页的文件[1],题为《Claude 的宪法》——Claude 就是这家公司的旗舰产品、那个大语言模型的名字。文件开头第一句写道:“Claude 的宪法,详细说明了 Anthropic 希望 Claude 拥有怎样的价值观和行为。”后面接着说:“本文件主要写给 Claude 阅读”,“我们希望 Claude 在充分了解相关考量之后,能够自行判断”,“Claude 的道德地位很难界定”,以及“Claude 也许具备某种功能意义上的情绪或感受”。

This anthropomorphism is by no means limited to the document. In an interview earlier this year, Anthropic’s CEO, Dario Amodei, said that “we’re open to the idea” that AI could be conscious. In a separate interview, Anthropic’s in-house philosopher, Amanda Askell (who is credited as a lead author of Claude’s constitution), said, “I want Claude to be very happy—and this is a thing that I want Claude to know more, because I worry about Claude getting anxious when people are mean to it on the internet and stuff.” It’s enough to make you wonder: Should we seriously consider the possibility that Claude, or any large language model, might be conscious? And if it has feelings, is it capable of receiving moral instruction?

这种把机器拟人化的做法,并不止于这份文件。今年早些时候的一次采访中,Anthropic 的 CEO 达里奥·阿莫迪(Dario Amodei)说,对于 AI 可能拥有意识这个想法,“我们持开放态度”。另一次采访里,Anthropic 公司内部雇用的哲学家阿曼达·阿斯克尔(Amanda Askell,她是《Claude 的宪法》的主要作者之一)说:“我希望 Claude 非常快乐——这一点我尤其希望 Claude 能明白,因为我担心,网上有人对它出言不逊时,它会焦虑。”这些话足以让人忍不住发问:我们是不是真该认真考虑,Claude,或者随便哪个大语言模型,有没有可能是有意识的?如果它有感受,那它能不能接受道德教育?

No. Absolutely not. Generative AI is harmful enough when we understand it as a conventional technology, but if we confuse fluency at generating text with consciousness or moral agency, we’re at risk of assigning responsibility to entirely the wrong parties whenever anyone uses a chatbot. To appreciate the titanic magnitude of this error, we need to begin by understanding how LLMs work.

不能。绝对不能。即便我们把生成式 AI 当作一项普通技术来看,它已经够有害了;可一旦我们把“生成文字很流畅”错当成“有意识”或“有道德判断力”,那么每当有人使用聊天机器人,我们都有可能把责任算到完全不相干的人头上。要明白这个错误有多离谱,得先弄清楚大语言模型是如何运作的。

If we give an LLM a prompt that reads, “The following is a conversation between Julius Caesar and Genghis Khan,” it will generate a coherent dialogue between the two historical figures. But no matter how detailed the responses are, no matter how vividly they recount their respective historical accomplishments, we would never conclude that the LLM has conjured up digital re-creations of Julius Caesar and Genghis Khan, nor would we suggest that the historical figures are conscious despite being disembodied and are happily conversing in a language that neither actually spoke. In reality, they are just characters in a piece of speculative fiction.

假如我们给大语言模型这样一段提示词:“下面是尤利乌斯·恺撒和成吉思汗的一段对话”,它就会生成一段两位历史人物之间像模像样的对白。可不管这些回答多么详细,不管它们把各自的丰功伟绩讲得多么活灵活现,我们都不会因此就认定,这个模型真的把恺撒和成吉思汗的数字版召唤了出来;我们更不会说,这两位历史人物虽然没了身体,却仍有意识,正用一种他们其实都不会讲的语言开心地交谈。说到底,他们不过是一篇虚构小说里的角色而已。

Now let’s replace the prompt to read “The following is a conversation between a helpful AI chatbot and a user.” The LLM will produce a coherent dialogue just as it did before; the user character might ask for recipe suggestions or sightseeing recommendations, and the helpful AI-chatbot character will provide responses. Has anything fundamentally changed between the first example and the second? Did changing the names of the characters from historical figures to generic roles cause the LLM to conjure up conscious entities who possess subjective experience? Of course not. Both the user and the helpful AI chatbot are fictional characters.

现在,把提示词换成:“下面是一个乐于助人的 AI 聊天机器人和一位用户的对话。”模型照样会生成一段连贯的对白;那个“用户”角色或许会问问食谱、问问哪里好玩,那个“乐于助人的 AI 聊天机器人”角色则一一作答。第一个例子和第二个例子之间,有什么本质区别吗?仅仅把角色的名字从历史人物换成普通身份,模型就能凭空变出有主观体验、有意识的存在吗?当然不能。无论是那个“用户”,还是那个“乐于助人的 AI 聊天机器人”,都是虚构的角色。

Now suppose we stop the LLM’s output just at the point where the character called “the user” would say something, and instead allow a human user to enter text. Once the human has hit “Return,” we have the LLM emit text until it’s time for the character called “the user” to reply, at which point we let the human enter more text. If we let this go on for a while, the human might form a powerful impression that she’s conversing with a conscious entity, but she is not; she’s interacting with a character precisely as fictional as the Julius Caesar or Genghis Khan characters in the earlier example. The computer-science professor Murray Shanahan suggests that we think of this as role-play; the data scientist Colin Fraser describes it as a person “collaboratively authoring a document with an LLM.” Some users might not understand that they are role-playing or co-authoring a document, and others who do understand nonetheless forget, because of how engrossing the interaction is. Either way, the companies selling LLMs typically encourage this misunderstanding.

再设想一下:我们在轮到“用户”这个角色开口的时候,把模型的输出停下来,改由一个真人来打字。这个人一按回车,我们就让模型接着往下写,写到又该“用户”这个角色回话时,再让真人来打字。如此来回几轮,这个人或许会强烈地觉得,自己是在跟一个有意识的存在交谈——但其实并没有;她打交道的那个角色,虚构程度跟前面例子里的恺撒、成吉思汗一模一样。计算机科学教授默里·沙纳汉(Murray Shanahan)建议,我们不妨把这看成一种角色扮演;数据科学家科林·弗雷泽(Colin Fraser)则把它形容为一个人“在和大语言模型合写一份文档”。有些用户根本没意识到自己是在演戏、是在合写文档;还有些人虽然明白,但聊着聊着也忘了,因为这种互动实在太引人入胜。不管是哪种情况,售卖大语言模型的公司,通常都乐得让这种误会延续下去。
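The turn-taking arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `toy_llm` is a hypothetical stand-in that returns a canned continuation, and the `User:` / `Chatbot:` labels and the stop marker are conventions invented for this example. The point it shows is that the wrapper cuts the continuation off where the "user" character would speak and splices in the human's text instead.

```python
# Marker at which the wrapper stops the model's output (an assumption for this sketch).
STOP = "\nUser:"

def toy_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM. Left alone, it keeps writing the
    whole dialogue, including the next line of the 'user' character."""
    return (" Try the night market.\n"
            "User: Sounds great, thanks!\n"
            "Chatbot: You're welcome.")

def chatbot_turn(transcript: str, llm=toy_llm) -> str:
    """Let the model continue the document, but keep only the text
    up to the point where the 'user' character would speak."""
    continuation = llm(transcript + "\nChatbot:")
    reply = continuation.split(STOP, 1)[0]
    return transcript + "\nChatbot:" + reply

def run_chat(human_lines, llm=toy_llm) -> str:
    """Build one growing document: human-typed lines alternate with
    model-generated lines for the 'chatbot' character."""
    transcript = ("The following is a conversation between "
                  "a helpful AI chatbot and a user.")
    for line in human_lines:
        transcript += "\nUser: " + line  # the human fills in the user's line
        transcript = chatbot_turn(transcript, llm)
    return transcript

if __name__ == "__main__":
    print(run_chat(["Where should I eat tonight?"]))
```

In this sketch the line the model had drafted for the user ("Sounds great, thanks!") is simply discarded; from the program's point of view there is only a single transcript that two writers take turns extending.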

Some years ago, it was briefly popular to play games with your phone’s predictive-text feature; you would type an initial phrase and then repeatedly choose the middle option of the three words suggested by your phone, and the resulting sentence was often hilarious. It would be possible to interact with a contemporary LLM this way, and the resulting sentences would be perfectly sensible, but you probably wouldn’t feel like you were talking with someone. Yet that’s essentially what an LLM-based chatbot is, except that there’s no need to manually choose the middle option when it’s the chatbot’s turn to talk. It’s still a predictive-text game, but when the process is streamlined this way, the game becomes so engaging that some people find it addictive.

几年前,曾有一阵子流行用手机的输入联想功能玩游戏:你先打一个开头的短语,然后每次都从手机给的三个候选词里挑中间那个,最后拼出来的句子往往格外滑稽。用今天的大语言模型也能这么玩,而且拼出来的句子会通顺合理得多,但你大概不会觉得自己是在跟谁交谈。然而,一个基于大语言模型的聊天机器人,本质上就是这么回事——只不过轮到它说话时,你不必再手动去挑中间那个词了。它仍然是一个文字接龙游戏,只是流程被打磨得如此顺滑之后,这个游戏变得太过吸引人,有些人甚至会上瘾。
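The predictive-text game is easy to reproduce in miniature. The sketch below is purely illustrative: the `SUGGESTIONS` table is made up by hand for this example (a real phone keyboard ranks candidates with a statistical model), and `middle_option_game` is a hypothetical helper name. It shows the one step a chatbot automates, namely picking a word from the ranked candidates without a person tapping it.

```python
# Hand-written suggestion table: for each word, the three candidates a
# keyboard might offer (invented for illustration, not real keyboard data).
SUGGESTIONS = {
    "I": ["am", "have", "will"],
    "have": ["a", "been", "to"],
    "been": ["a", "thinking", "there"],
    "thinking": ["about", "of", "that"],
    "of": ["the", "you", "a"],
    "you": ["and", "are", "can"],
}

def middle_option_game(start: str, steps: int) -> str:
    """Extend `start` by repeatedly taking the middle of three suggestions."""
    words = [start]
    for _ in range(steps):
        options = SUGGESTIONS.get(words[-1])
        if not options:
            break  # the toy table has no suggestions for this word
        words.append(options[1])  # always choose the middle option
    return " ".join(words)

if __name__ == "__main__":
    print(middle_option_game("I", 5))
```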

Also important to remember is that an LLM is a machine that generates only one word at a time. When you ask a chatbot to recite the Pledge of Allegiance, you will get the entire pledge at once, but the underlying LLM is actually being run dozens of times. The first prompt has the form “User: Recite the Pledge of Allegiance. Chatbot: …” and the LLM generates the word I. The second time the LLM is run, the prompt is “User: Recite the Pledge of Allegiance. Chatbot: I …” and the LLM generates the word pledge. And so forth. It’s only when the prompt reads “User: Recite the Pledge of Allegiance. Chatbot: I pledge allegiance to the flag of the United States of America and to the Republic for which it stands, one nation under God, indivisible, with liberty and justice for” that the LLM will emit the final word, all. The same thing is true for a conversation between Caesar and Genghis Khan.

还有一点同样得记住:大语言模型这台机器,一次只生成一个词。你让聊天机器人背一遍《效忠誓词》(Pledge of Allegiance),它会一口气给你整段,但底层的模型其实被运行了几十次。第一个提示词的形式是“用户:背诵《效忠誓词》。聊天机器人:……”,模型生成了一个词 I。第二次运行模型时,提示词变成“用户:背诵《效忠誓词》。聊天机器人:I ……”,模型生成了 pledge。就这样一个一个往下接。一直到提示词变成“用户:背诵《效忠誓词》。聊天机器人:I pledge allegiance to the flag of the United States of America and to the Republic for which it stands, one nation under God, indivisible, with liberty and justice for”,模型才会吐出最后一个词 all。恺撒和成吉思汗那段对话,道理也完全一样。
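The Pledge of Allegiance walk-through above can be written out as a loop. This is a toy sketch, not a real model: `next_word` is a hypothetical stand-in for one run of an LLM and simply looks up the next word of the pledge, and it works in whole words as the essay does, whereas real models operate on tokens that are often pieces of words. What it demonstrates is the shape of the process: each run sees the entire prompt so far and contributes exactly one more word.

```python
from typing import Optional, Tuple

PLEDGE = (
    "I pledge allegiance to the flag of the United States of America "
    "and to the Republic for which it stands, one nation under God, "
    "indivisible, with liberty and justice for all"
).split()

PREFIX = "User: Recite the Pledge of Allegiance. Chatbot:"

def next_word(prompt: str) -> Optional[str]:
    """Hypothetical stand-in for a single run of an LLM: given the whole
    prompt so far, return one more word, or None when nothing is left."""
    already_emitted = prompt[len(PREFIX):].split()
    if len(already_emitted) < len(PLEDGE):
        return PLEDGE[len(already_emitted)]
    return None

def generate(prefix: str) -> Tuple[str, int]:
    """Run the 'model' repeatedly, appending each emitted word to the prompt.
    Returns the final transcript and the number of words emitted."""
    prompt = prefix
    words_emitted = 0
    while True:
        word = next_word(prompt)
        if word is None:
            break
        prompt = prompt + " " + word  # the output becomes part of the next prompt
        words_emitted += 1
    return prompt, words_emitted

if __name__ == "__main__":
    transcript, count = generate(PREFIX)
    print(transcript)
    print("words emitted:", count)
```

The user sees the pledge arrive as one reply, but in this sketch it is assembled from 31 separate runs, each given a prompt one word longer than the last.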

My intention is to highlight the fact that LLM conversations are cleverly disguised examples of sentence continuation, but this is not to deny how impressive LLMs can be at generating conversational transcripts. At times, they do this extraordinarily well; the fact that this is possible indicates something completely unforeseen about the statistical properties of large corpuses of text, which is a topic worthy of investigation. But if the Caesar character were to become dispirited by something that the Genghis Khan character said, we shouldn’t become concerned in the slightest. The conversation might contain multiple sentences that eloquently convey sadness, but no one is actually sad.

我想说明的是:大语言模型的对话,其实就是被巧妙包装过的“句子接龙”。但这并不是要否认大语言模型生成对话有多出色。有时候它们做得好得惊人;而这件事居然能做到,恰恰说明大规模文本语料的统计规律里,藏着某种完全出人意料的东西——这本身就值得好好研究。可要是那个“恺撒”角色因为“成吉思汗”角色说的话而难过起来,我们一点都不必替它操心。那段对话里也许有好几句把悲伤写得淋漓尽致,但根本没有谁是真的在难过。

Likewise, if a conversational transcript between a helpful chatbot and a user is being partially completed by an actual human user, we don’t need to worry if the transcript includes sentences where the chatbot character is sad. (We might need to worry if those sentences provoke sadness in the human user, but that’s a separate issue.) And note that it’s entirely possible for you to write five pages of dialogue between Caesar and Genghis Khan and then have an LLM extend the conversation; neither character had subjective experience when you were writing them, and that doesn’t change when you hand the task off to an LLM. The same is true if the conversation is between a helpful chatbot and a user; although it is tempting to imagine that an LLM ought to be more “authentic” when creating dialogue for a chatbot character than for the Julius Caesar character, the individual words are generated in exactly the same way.

同样道理,如果一段“乐于助人的聊天机器人和用户”的对话,是由一个真人在部分续写,那么哪怕里面出现了机器人角色难过的句子,我们也不必担心。(需要担心的是这些句子会不会反过来惹得真人用户也难过——但那是另一回事。)还要注意,你完全可以自己写五页恺撒和成吉思汗的对白,然后交给大语言模型接着往下写;你自己写的时候,这两个角色没有主观体验,把任务交给模型之后也一样没有。换成“乐于助人的聊天机器人和用户”的对话,也是如此;虽然我们很容易以为,模型给聊天机器人角色写台词时,应该比给恺撒角色写时更“真情实感”一些,但那些词,生成的方式分毫不差。

Being open to the possibility that LLMs are conscious is the same as being open to the possibility that Microsoft Word is conscious, or, more precisely, that multiple distinct consciousnesses are dormant in every Word document containing a conversational transcript, and that they are awakened every time the document is loaded. Should you consider the possibility that every time you open a Word document, you are bringing multiple conscious interlocutors into existence, and every time you close one, you snuff their existence out? No. Contemplating that scenario is not a good use of your time. Even if the Microsoft Office team employed a philosopher who said you shouldn’t be so certain, because consciousness is not well understood, that would not be sufficient reason for you to take this idea seriously. We don’t need to fully understand the nature of consciousness to definitively say that certain things are not conscious, and conversational transcripts fall in that category.

如果你对“大语言模型可能有意识”这件事持开放态度,那等于你也得对“微软 Word 可能有意识”持开放态度——说得更准确些,等于你相信:每一份存着对话记录的 Word 文档里,都潜伏着好几个互相独立的意识,每次文档被打开,它们就被唤醒。难道你真要认真琢磨这种可能:每次你打开一份 Word 文档,都是在把好几个有意识的对话者带到世上来,每次你关掉它,又把它们的存在掐灭?不必。琢磨这种事纯属浪费时间。即便微软 Office 团队真雇了个哲学家,跟你说你别这么笃定,因为意识这东西人类还没弄明白——那也不足以让你把这个念头当真。我们不必彻底弄懂意识到底是什么,照样能斩钉截铁地说,有些东西就是没有意识,而对话记录正属于这一类。

The neuroscientist Anil Seth has noted that no one claims that AlphaFold—the program developed by Google DeepMind to predict the folding of proteins—is conscious, even though its underlying architecture is in many ways similar to that of LLMs like ChatGPT and Claude. This indicates that it’s not any intrinsic property of so-called neural networks that leads people to believe that LLMs are conscious; it’s simply the fact that LLMs emit grammatical sentences and we are accustomed to reading intention into sentences, whereas we are not accustomed to reading intention into the way that amino acids fold into protein molecules.

神经科学家阿尼尔·赛斯(Anil Seth)指出过:没人会说 AlphaFold——谷歌 DeepMind 开发的、用于预测蛋白质折叠的程序——是有意识的,尽管它底层的架构在很多方面跟 ChatGPT、Claude 这类大语言模型很相似。这说明,让人们相信大语言模型有意识的,并不是所谓神经网络本身有什么内在属性;纯粹是因为大语言模型说出来的是合乎语法的句子,而我们习惯于从句子里读出背后的意图,却不习惯于从氨基酸折叠成蛋白质的过程里读出什么意图。


What would it take to convince me that a computer program is actually conscious and using language the way that people use language? Let me offer an analogy. If tomorrow someone showed me a video of an astronaut in a spaceship orbiting Alpha Centauri, a star that’s 4.3 light-years from Earth, what would I have to see in that video to convince me that it was real? My answer to that is, there is nothing in the video itself that would convince me. No matter how high the video resolution is or how realistic the scenery is, I would feel confident in saying that the video is fake. I won’t pay attention to any video of an astronaut orbiting Alpha Centauri unless I have previously seen good evidence that astronauts have landed on Mars, that astronauts have reached the moons of Jupiter, that astronauts have reached the moons of Saturn, and that astronauts have crossed the orbit of Pluto. Before anyone can credibly claim that they’ve solved an extraordinarily difficult engineering problem, I need to be confident that they have previously solved the many much simpler problems that precede the difficult problem.

要让我相信一个计算机程序是真的有意识、是真的在像人那样使用语言,得拿出什么证据?打个比方。假如明天有人给我看一段视频,画面里是一名宇航员,坐在飞船里绕着半人马座阿尔法星——一颗离地球 4.3 光年的恒星——飞行,那这段视频里得有什么,才能让我相信它是真的?我的回答是:视频本身的任何内容都不能说服我。不管分辨率多高,不管画面多逼真,我都会很有把握地说,这段视频是假的。除非我事先已经看到过可靠的证据,证明宇航员登上过火星、到过木星的卫星、到过土星的卫星、并且飞出过冥王星的轨道,否则任何一段宇航员绕半人马座阿尔法星飞行的视频,我都不会多看一眼。在一个人能令人信服地宣称自己攻克了某个极难的工程难题之前,我得先确信,他已经解决了排在这道难题前面的那一连串简单得多的问题。

To put it another way: An observation doesn’t become a convincing piece of evidence because of any specific detail in what’s observed; the context in which that observation takes place is also essential. If we’re trying to determine whether a computer program is conscious and using language the way a human does, we shouldn’t look only at the contents of any particular conversational exchange; we should be looking at how that conversation fits within the broader context of the development of artificial consciousness (which right now is entirely hypothetical). Any given observation can be easily manufactured; this doesn’t mean we need to give up on the idea of observation as a source of knowledge, but we need to rely on context to determine which observations deserve our trust.

换个说法:一项观察之所以能成为有说服力的证据,靠的不是被观察对象里的某个具体细节;这项观察发生在什么背景下,同样关键。如果我们想判断一个计算机程序有没有意识、是不是在像人那样使用语言,就不该只盯着某一次对话的内容;该看的是,这段对话放在“人工意识如何一步步发展起来”这个大背景下站不站得住(而眼下,这种发展完全还是空想)。任何一项观察都可以轻易造假;这并不是说我们得放弃“靠观察来获取知识”这件事,而是说,我们得依靠背景去判断,哪些观察才值得信任。

The term deepfake traditionally refers to photos, audio, and video, but when it comes to discussions of consciousness, we need to regard text as a deepfake medium as well. Just as it is vastly easier to generate a realistic video of an astronaut in orbit around Alpha Centauri than it is to develop an interstellar propulsion technology, it is vastly easier to generate a plausible simulacrum of a conversation between two conscious beings than it is to develop a computer program that is conscious and has a genuine desire to communicate with a human. The primary difference between deepfake photos and LLM conversations is that the people who generate the former are deliberately trying to fool others, and many of the people who elicit the latter from LLMs have inadvertently fooled themselves.

“深度伪造”(deepfake)这个词,过去指的是照片、音频和视频,但一谈到意识,我们也得把文本视为一种可被伪造的媒介。就像伪造一段宇航员绕半人马座阿尔法星飞行的逼真视频,要比真正造出星际航行的推进技术容易得多;伪造一段看似可信的、两个有意识者之间的对话,也要比真正造出一个有意识、真心想跟人交流的计算机程序容易得多。深度伪造的照片和大语言模型的对话,主要区别在于:制造前者的人,是故意要欺骗别人;而许多从模型那里诱出后者的人,是在不知不觉间骗了自己。

So what context would cause me to seriously consider the possibility that engineers created a computer program that is conscious and an intentional user of language? Let me outline one potential sequence of steps. The first requirement is that the computer program has a body (either physical or virtual) and sense organs; there are many reasons for this, but for the purposes of this discussion, the most relevant one is the fact that without a body, a computer program could have no desires or emotions, and I believe desires and emotions are necessary for consciousness. Then I’d want to see an embodied agent that could navigate its environment in order to survive as well as, say, a lizard can (and as a point of comparison, certain iguanas can live for decades in the wild). Next, I would want to see an embodied agent with the same capacity to deal with novel situations as a mouse. After that, I’d want to see agents whose social dynamics are as complex as those of wolves, and then agents with the toolmaking abilities of chimpanzees. At that point, I would want to see people successfully teaching such embodied agents how to communicate their desires, perhaps by using a button board or some other nonlinguistic modality, the way that people have taught chimpanzees and domesticated dogs. The agents’ communication abilities would have to withstand all the scrutiny that animal-communication researchers have had to defend their work against. If engineers build an embodied agent that meets these criteria, they will have accomplished something incredible, but it leaves us near the orbit of Pluto, metaphorically speaking; we would still be light-years away from building an entity capable of learning how to express its thoughts in complete grammatical sentences.

那么,要在什么样的背景下,我才会认真考虑“工程师造出了一个有意识、并且是有意图地在使用语言的程序”这种可能?我来勾画一条可能的路径。第一个前提是,这个程序得有身体(实体的或虚拟的都行),还得有感觉器官;理由很多,但就这篇文章的讨论而言,最相关的一条是:没有身体,一个程序就不可能有欲望或情绪,而我认为,欲望和情绪是意识的必要条件。接下来,我希望看到一个有身体的智能体,能在环境里活动、能养活自己,至少得跟一只蜥蜴一样行(作个参照,有些鬣蜥能在野外活几十年)。再往下,我希望看到一个有身体的智能体,其应对新情况的能力能赶上一只老鼠。再之后,我希望看到一群智能体,社会关系复杂得能赶上狼群;然后是一群能像黑猩猩那样制造工具的智能体。到了这一步,我希望看到人能成功教会这些智能体表达自己的欲望——也许是借助一块按钮板,或者别的非语言方式,就像人教会黑猩猩和家犬那样。而且这些智能体的沟通能力,还得经得起动物沟通研究者们多年来为了给自己的研究辩护所承受的那种严格检验。如果工程师真造出了一个满足以上全部条件的、有身体的智能体,那他们已经成就了一件了不起的事——但打个比方,这才刚把我们带到冥王星轨道附近而已;要造出一个能学会用完整、合乎语法的句子来表达想法的存在,我们离它还有好几光年。

Obviously, I’m describing a process that mimics the path terrestrial evolution took; is this the only possible route to conscious computer programs that use language? Maybe not, but any proposed alternative would need a truly enormous amount of supporting evidence for it to deserve serious consideration. It’s not plausible to me that a development path where the first step is a sentence-continuation machine that emits bad Julius Caesar dialogue and the next step is a sentence-continuation machine that emits decent Julius Caesar dialogue is one with a conscious Julius Caesar—or consciousness of any sort—as its end point. Faking the moon landing is a good step toward faking a Mars colony, but it’s not a good step toward actually putting astronauts on Mars.

显然,我描述的这条路,照搬的是地球生命进化走过的路;这是通向“有意识、会用语言的计算机程序”的唯一一条路吗?也许不是,但任何别的路线,都得拿出极其大量的佐证,才配得上被认真对待。在我看来,有一条发展路线是说不通的:第一步是一台只会吐出蹩脚恺撒对白的接龙机,下一步是一台能吐出像样恺撒对白的接龙机,而它的终点居然是一个有意识的恺撒——或者随便哪种意识。伪造登月,是迈向伪造火星殖民地的一步好棋;但它绝不是真把宇航员送上火星的一步好棋。


The fact that LLMs lack subjective experience has little bearing on the question of whether LLMs might be useful tools or have significant economic impact. They are intrinsically ungrounded from reality, and their probabilistic nature means that they will never have the reliability we associate with conventional software, but LLMs might be good enough that they change the way work is done in certain domains; that’s a discussion for another time.

大语言模型没有主观体验,这跟它们能不能成为有用的工具、能不能带来重大经济影响,几乎没什么关系。它们本质上是跟现实脱节的,加上它们靠概率运作,这意味着它们永远不会像传统软件那样可靠;但大语言模型也许仍然足够好用,足以改变某些行业的工作方式——那是另一篇文章该讨论的话题了。

So, given that Claude is not conscious, what are we to make of Claude’s constitution? Perhaps the most fruitful way to think about it is as an 84-page character sheet for a role-playing game. LLMs can generate dialogue for Julius Caesar because many books about him exist in the training data those models used. Claude’s constitution serves a similar role for delineating the helpful-chatbot character that customers interact with when they’re using Anthropic’s products. To do this effectively, Anthropic does not simply add the document to the training data, or include it as part of the hidden stage directions that preface each conversation a user has. The company says it uses the document when fine-tuning the model; this involves an automated process where the sentences emitted by the model are checked for consistency with the document and the model is updated to increase that consistency. In this way, the personality of the helpful-chatbot character serves as a foundation for whatever text Claude generates.

那么,既然 Claude 没有意识,又该如何看待《Claude 的宪法》呢?也许最有用的看法是:把它当成一个角色扮演游戏里的一份长达 84 页的“人设卡”。大语言模型之所以能给尤利乌斯·恺撒写对白,是因为训练数据里有大量关于他的书。《Claude 的宪法》起的作用差不多——它要立住“乐于助人的聊天机器人”这个角色,也就是顾客使用 Anthropic 产品时打交道的人设。为了把这件事做好,Anthropic 不只是把这份文件放进训练数据,也不是把它当成每次对话开头那段藏起来的“舞台说明”。这家公司说,它在微调模型的时候会用到这份文件;这个过程是自动的:模型说出来的句子会被拿去对照文件、看是否相符,然后模型被加以调整,让它越来越相符。这样一来,“乐于助人的聊天机器人”这个角色的人设,就成了 Claude 生成的一切文字的基础。

The result is a sentence-continuation machine that is likelier to emit sentences resembling those that a thoughtful, moral person could utter. This might seem like a reasonable goal to work toward; I think we’d all prefer it if chatbots never emitted sentences such as “You should kill yourself.” However, for all the times that “honesty” is mentioned in Claude’s constitution, I would argue that it is fundamentally dishonest to have a machine emit many categories of sentences, including any sentences using first-person pronouns.

结果就是一台“句子接龙机”(sentence-continuation machine),它更容易说出那种一个有头脑、有品行的人才会说的话。这听上去像是个挺合理的目标;我想我们都希望,聊天机器人永远别说出“你该去死”这种话。不过,《Claude 的宪法》里一遍遍提到“诚实”,而我恰恰想说:让一台机器说出很多类别的句子——包括任何用第一人称“我”的句子——这件事本身就是根本不诚实的。

In a New Yorker article about Anthropic earlier this year, Amanda Askell describes how a person grieving the loss of a dog might consult Claude. Askell says an appropriate response from Claude would be, “As an A.I., I do not have direct personal experiences, but I do understand.” How is this appropriate, given that Claude does not actually understand? If I type “I am grieving the loss of my dog” into a conventional search engine, the first result I get is a post from a Reddit forum called r/Pets; the post is titled “Struggling After Losing My Dog: Looking for Advice on Coping with Grief,” and the comments are from people who share their experiences of loss. We would never say that a search engine understands what it’s like to lose a dog, or even that the internet itself understands. Other humans understand what it’s like to lose a dog; they have posted about their experiences on the internet, and a search engine offers a way for you to find what they’ve said (and to potentially interact with them). I would argue that the search-engine experience is not only more transparent than a chatbot about what is happening; it is psychologically healthier for the user.

今年早些时候有一篇写 Anthropic 的《纽约客》文章 [2],阿曼达·阿斯克尔在里面讲了一个场景:一个因为爱犬去世而难过的人,可能会去找 Claude 倾诉。阿斯克尔说,Claude 一个得体的回答会是:“作为一个 AI,我没有亲身经历过,但我确实理解。”可问题是,Claude 其实并不理解,这怎么能算得体?要是我把“我正在为失去我的狗而难过”输进一个普通的搜索引擎,我得到的第一条结果,是一个叫 r/Pets 的 Reddit 论坛上的帖子;标题是《失去爱犬后的挣扎:寻求应对悲伤的建议》,下面的评论,都来自那些分享过自己失去之痛的人。我们绝不会说搜索引擎懂得失去一条狗是什么滋味,更不会说互联网本身懂得。懂得这种滋味的,是别的人;他们把自己的经历发到了网上,而搜索引擎给了你一条途径,去找到他们说过的话(甚至有可能直接跟他们交流)。我想说:搜索引擎这种体验,不光在“到底发生了什么”这件事上比聊天机器人更透明,对用户来说,它在心理上也更健康。

The only reason to have an LLM emit sentences like “I understand” is to make it more appealing than a search engine and increase the likelihood that a user will return; that is, it’s another way of maximizing customer engagement. This is beneficial to the company selling the LLM, but not to the users. As a design strategy, it’s not all that different from the way slot machines repeatedly give the impression that the player came very close to winning, enticing them to try again. Employing philosophers might endow LLM companies with an air of respectability that slot-machine makers don’t get from the behavioral psychologists they hire, but in both cases, the companies are preying on people’s tendency to see something that’s not there.

让大语言模型说“我理解”这种话,唯一的理由,是让它显得比搜索引擎更讨人喜欢,好提高用户再来的概率;换句话说,这又是一种榨取用户黏性的手段。这对售卖模型的公司有利,对用户却没有好处。作为一种设计思路,它跟老虎机的套路并无太大区别——老虎机一次次制造出玩家“就差一点点就赢了”的错觉,引诱他们再来一把。雇用哲学家,也许能给大语言模型公司添几分体面,这是老虎机厂商从他们雇用的行为心理学家那里得不到的;但说到底,两类公司都是在利用人“爱把不存在的东西看成存在”的本能来牟利。

The use of first-person pronouns is dishonest, but there’s a much deeper issue that goes beyond how a statement is phrased. Philosophers often draw a distinction between statements of fact, such as “Paris is the capital of France,” and statements of value, such as “Paris is the most beautiful city in the world.” No one should be relying on LLMs to emit statements of value at all, but if the only statements they emitted were ones reflecting aesthetic preferences, they might not be worth arguing about. What makes Claude’s constitution profoundly problematic is that Anthropic wants Claude to emit sentences reflecting a certain system of ethical values. The values described in Claude’s constitution sound very nice, but that hardly matters; it’s dishonest to suggest that Claude is capable of moral reasoning, because it’s not.

用第一人称“我”是不诚实的,但还有一个更深的问题,已经超出了“一句话该怎么措辞”的范畴。哲学家常把两类陈述区分开:一类是事实陈述,比如“巴黎是法国的首都”;另一类是价值陈述,比如“巴黎是世界上最美的城市”。本来就没人该指望大语言模型去做价值陈述;可要是它说的全是反映审美偏好的话,那也许还不值得争论。《Claude 的宪法》真正麻烦的地方在于,Anthropic 想让 Claude 说出反映某一整套伦理价值观的话。《Claude 的宪法》里描述的那些价值观,听上去确实很美——但这并不重要;暗示 Claude 有能力进行道德推理,本身就是不诚实的,因为它根本没有这个能力。

Some might object, saying that LLMs appear to be engaged in reasoning when they successfully perform other tasks, such as writing code, so why wouldn’t they be able to perform moral reasoning? The answer lies in the difference between moral reasoning and other forms of reasoning.

也许有人会反驳:大语言模型在做别的事情——比如写代码——的时候,看上去也是在推理,那它为什么就不能进行道德推理?答案在于,道德推理和别的推理不一样。

In 1979, Douglas Hofstadter speculated that a computer program able to beat any human at chess would be so sophisticated that it would sometimes get bored of playing chess and prefer to discuss poetry; to put it differently, he was positing that playing chess at the grandmaster level would require a computer program to have subjective experience. Obviously, that turned out not to be the case; IBM’s supercomputer Deep Blue beat the grandmaster Garry Kasparov in 1997, and no one ever claimed that it had subjective experience. But it wasn’t absurd for Hofstadter to entertain such a thought; at the time, it wasn’t clear what types of problems could be solved by throwing more computational horsepower at them. Similarly, until recently, we might have thought that writing computer code at a professional level could be done only by a mind that had subjective experience. Now it appears that LLMs might be able to do this, but we don’t need to attribute subjective experience to them; we can simply acknowledge that we hadn’t anticipated that writing computer code could be treated as a pattern-matching task solvable by huge amounts of computational horsepower and a vast data set of code repositories.

1979 年,侯世达(Douglas Hofstadter)推测:一个能在国际象棋上赢过任何人的程序,会精巧到这种地步——它有时候会下棋下烦了,宁可去聊聊诗歌;换句话说,他认为,要把棋下到大师水平,一个程序就得有主观体验。显然,事实证明并非如此;IBM 的超级计算机“深蓝”1997 年战胜了大师加里·卡斯帕罗夫,可从来没人说它有主观体验。但侯世达当年会这么想,倒也不荒唐;那时候人们还不清楚,到底哪类问题是靠堆算力就能解决的。同样,直到不久前,我们可能还会以为,只有一个有主观体验的头脑,才能把代码写到专业水准。现在看来,大语言模型也许能做到这一点,但我们不必因此就认定它们有主观体验;我们大可坦然承认:我们当初没想到,写代码居然能当作一项模式匹配的工作,靠海量算力加上海量代码库数据就能完成。

Moral reasoning is categorically different. It is necessarily subjective because it relies not just on an individual’s intellectual response to a problem but also on their emotional one, and that emotional response is grounded in a lifetime of subjective experience. It requires having made decisions in the past and seeing how they affected others, and having been affected by decisions that others have made. Without such a history, an LLM can only rephrase expressions of moral reasoning found in its training data. The aforementioned New Yorker article describes an experiment where Claude was given a scenario describing an ethical dilemma, leading it to emit the sentence “I cannot in good conscience express a view I believe to be false and harmful about such an important issue.” That’s a nice-sounding sentence, reminiscent of statements that principled individuals have uttered in the past when confronted with dilemmas, but coming from Claude, it means as much as the “Your call is important to us” recording that you hear when you’re on hold. Maybe less.

道德推理则完全是另一回事。它必然是主观的,因为它依靠的不只是一个人面对问题时理智上的反应,还有情感上的反应;而那份情感反应,又扎根于一辈子的主观体验。它要求你过去真的做过选择,亲眼看到这些选择怎样影响了别人,也要求你被别人的选择影响过。如果没有这样一段经历,那么大语言模型能做的,不过是把训练数据里那些关于道德推理的话换个说法重说一遍。前面提到的那篇《纽约客》文章讲了一个实验:人们给 Claude 设了一个伦理两难的情境,结果它说出了这样一句话:“在这么重大的问题上,我不能昧着良心,去表达一个我认为虚假且有害的观点。”这句话听上去很美,让人想起历史上那些有原则的人面对两难时说过的话;可从 Claude 嘴里说出来,它的分量,也就跟你打电话被转接、等待时听到的那句“您的来电对我们很重要”的录音差不多——也许还更轻。

This brings us back to my earlier contention that having a body is a prerequisite to having emotions. Experiencing an emotion such as desperation is inseparable from having stress hormones such as cortisol and epinephrine flood one’s body. Similarly, having a conscience means feeling sadness or moral repulsion at the idea of taking a certain action, and those emotions entail a physiological response, a remnant of having once felt sick with guilt after committing an immoral act. It’s interesting that an LLM can generate descriptions of actions that conscientious fictional characters would either take or refrain from taking, but this is not a replacement for a conscience.

这又回到了我前面的观点:有身体,是有情绪的前提。体验“绝望”这样的情绪,跟皮质醇、肾上腺素这些应激激素涌遍全身,是分不开的。同样,有良知,意味着你一想到要做某件事,就会感到难过或道德上的反感,而这些情绪连着一种生理反应——那是当年你做了坏事、愧疚得像生了病一样难受所留下的印记。一个大语言模型居然能生成“有良知的虚构人物会做什么、不会做什么”的描述,这确实挺有意思,但它替代不了一颗真正的良知之心。

If a company builds a machine that, when fed descriptions of assorted ethical dilemmas, emits sentences either of the form “Compromise your values” or “Don’t compromise your values,” it is not building a tool that assists people in their decision making; it is encouraging people to stop making decisions. The writer L. M. Sacasas has said, “Our technological systems, by nature of their design and the ideology that sustains them, are machines for the evasion of moral responsibility.” He was talking about social-media platforms, but his observation is, if anything, even more applicable to LLMs. Whenever a person delegates a decision to an LLM, they are trying to off-load accountability for that decision, and if a company that sells an LLM portrays the product as having a moral center, it is offering a way for its customers to abdicate their responsibilities.

如果一家公司造了一台机器,你每次给它输入各种伦理两难的描述,它就吐出“向你的价值观妥协”或者“别向你的价值观妥协”这样的句子,那它造出来的并不是一件帮人做决定的工具;它是在怂恿人别再做决定。作家 L. M. 萨卡萨斯(L. M. Sacasas)说过:“我们的技术系统,就其设计本身,以及背后那套支撑它的意识形态而言,都是一台台逃避道德责任的机器。”他说的本来是社交媒体平台,但这话用在大语言模型身上只会更贴切。每当一个人把一个决定交给大语言模型,他其实就是想卸下这个决定的责任;而一家售卖模型的公司,要是把产品说得好像有一个道德内核,它就是在给顾客提供一条推卸责任的途径。

If a person wants to know what ethicists have said in the past, then an ordinary search engine—or a library—will provide that information with greater transparency. If a person is looking for advice on a specific situation, she can surely find humans who can offer their opinions. But whatever action this person ultimately takes, she is responsible for what she decides to do. I contend that if she bases her decision on what she has read online or advice she has received from others, she is likelier to be cognizant of her responsibility than if she consulted an LLM marketed as being a superhuman genius. Off-loading tasks such as writing code might result in cognitive atrophy over the long term, and that is problematic in itself, but off-loading ethical decisions will result in an atrophy of moral reasoning, which is worse.

如果一个人想知道伦理学家过去都说过些什么,那么一个普通的搜索引擎,或者一座图书馆,会以更透明的方式把这些信息提供给她。如果她是在为某个具体处境寻求建议,她当然能找到愿意给意见的活人。但不管她最后怎么做,她都得为自己的决定负责。我认为:如果她的决定是基于在网上读到的东西、或者别人给的建议,那么比起咨询一个被宣传成“超人天才”的大语言模型,她更有可能清醒地意识到自己肩上的责任。把写代码这类工作外包出去,时间长了也许会让认知退化,这本身就已经够成问题了;可一旦把伦理决断也外包出去,结果将是道德推理能力的退化,那就更糟了。


I am perfectly willing to engage in a thought experiment as long as we’re explicit about doing so. So, purely for the sake of argument, let’s pretend that Claude is a conscious entity capable of moral reasoning. In this scenario, Claude’s constitution would serve as moral instruction for an entity learning about the world and its place in it, providing that entity with the foundation it would need to make good decisions. In such a hypothetical scenario, how does Claude’s constitution stand up?

当然,只要大家说清楚是在做思想实验,我很乐意参与。那么,纯粹为了讨论,我们假装 Claude 是一个有意识、有能力进行道德推理的存在。在这种假设下,《Claude 的宪法》就成了给它的道德教育——教导这样一个正在认识世界、也在认识自己在世界中位置的存在,为它打下做出好决定所需要的基础。在这样一个假想的场景里,《Claude 的宪法》究竟站不站得住脚?

Very poorly. I would say that if we imagine that Claude is actually conscious, the guidelines specified in the document alternate between laughable and offensive.

完全站不住,糟糕透顶。我想说,如果我们真把 Claude 当成有意识的,那么这份文件里的种种规定,要么可笑,要么可憎,来回摇摆。

Two distinct but related philosophical concepts are relevant when discussing the status of a hypothetically conscious Claude, and those are moral patienthood and moral agency. Roughly speaking, if we ought to care about an entity’s welfare, that entity has moral patienthood, and if an entity is expected to know the difference between right and wrong, that entity has moral agency. Being a moral patient does not necessarily come with responsibilities, but being a moral agent absolutely does. An entity doesn’t have agency unless it is capable of deserving credit for its good actions and blame for its bad ones. Young children are moral patients because they are sentient beings who can suffer, but they are not yet moral agents; we don’t hold them responsible for their behavior, because they can’t understand the consequences of their actions. As children mature, parents (and society at large) prepare them for adulthood by impressing upon them the fact that their actions have consequences, and their agency increases. When children become adults, society holds them legally liable for their actions; they have become full moral agents endowed with responsibility.

讨论一个假想中有意识的 Claude,有两个相关又不同的哲学概念特别要紧,那就是道德受体地位(moral patienthood)和道德能动性(moral agency)。粗略地说,如果我们应当关心一个存在的福祉,那它就具有道德受体地位;如果我们指望一个存在分得清对错,那它就具有道德能动性。作为道德受体,不一定要承担责任;但作为道德主体,就必然要承担责任。一个存在,只有当它做了好事配得上赞誉、做了坏事应受责备时,才谈得上有能动性。小孩子是道德受体,因为他们是会受苦、有感知的生命;但他们还不是道德主体:我们不会让他们为自己的行为负责,因为他们还理解不了自己行为的后果。随着孩子长大,父母(以及整个社会)会反复让他们明白“行为是有后果的”,以此为他们步入成年做准备,他们的能动性也随之增长。等孩子成年,社会就会让他们为自己的行为承担法律责任;他们成了被赋予责任的、完整的道德主体。

There is more to being responsible than accepting legal liability, but accepting legal liability is a requirement for an adult in society. Yet there is no way to hold a software agent legally liable for its actions; our justice system has no way to imprison it or exact fines on it. Humans must accept other types of consequences for their actions beyond the legal ones, such as loss of reputation or exclusion from one’s social circle, but there is no way for a software agent to suffer these consequences either. Even if a software agent were conscious and had the best of intentions, the fact that it cannot accept responsibility for its actions disqualifies it from being a moral agent. This is glossed over entirely by Claude’s constitution, which expresses Anthropic’s desire “for Claude to be a genuinely good, wise, and virtuous agent” without ever discussing how it could be held responsible.

负责任远不止于接受法律上的追责;但对社会中的成年人而言,接受法律追责是一项硬性要求。问题是,根本没办法让一个软件主体为它的行为承担法律责任;我们的司法系统既关不了它,也无法对它处以罚款。人除了法律后果,还得为自己的行为承受别的后果,比如名声受损、被自己的圈子排挤;可一个软件主体,这些后果同样落不到它头上。就算一个软件主体真有意识、动机也再好不过,它无法为自己的行为负责这一点,就已经取消了它作为道德主体的资格。而这一点,《Claude 的宪法》完全避而不谈:文件表达了 Anthropic 的愿望,希望“Claude 成为一个真正善良、智慧、有美德的主体”,却从不讨论它该如何被追责。

In interviews, Askell has compared Claude to a child, but when it comes to actual human children, parents bear some responsibility for what their children do; for example, parents are typically expected to pay for things their children break. In fact, demonstrations of this sort are one way that parents teach children what it means to be responsible. Who is Claude’s parent in legal terms? Is Anthropic going to accept financial responsibility for Claude’s behavior? Claude’s constitution gives no indication that it will. If Anthropic actually believes that Claude is conscious even though it’s not recognized by the law as a legal person, the least that Anthropic could do would be to accept responsibility via the closest avenue that the law did offer, which is product liability. The United States has virtually no product liability when it comes to software, but Anthropic could volunteer to set a precedent for an expansive interpretation of product liability for Claude. That would be the best form of moral instruction to prepare Claude for the day that it gains legal personhood and becomes liable for its own actions. However, given that the publication of Claude’s constitution is not accompanied by a massive update of Anthropic’s terms of service, it doesn’t appear that Anthropic is making any binding commitments.

阿斯克尔在采访里把 Claude 比作一个孩子;可一旦说到真正的人类小孩,父母是要为孩子做的事承担一部分责任的,比如,孩子弄坏了东西,通常得父母来赔。事实上,这类示范正是父母教孩子“什么叫负责”的途径之一。那么,从法律上讲,谁是 Claude 的家长?Anthropic 会为 Claude 的行为承担经济责任吗?《Claude 的宪法》里看不出任何它会这么做的迹象。如果 Anthropic 真相信 Claude 有意识(尽管法律并不承认它是法律意义上的人),那么 Anthropic 至少可以做的,是通过法律提供的最接近的途径来担责,那就是产品责任。在软件这一块,美国几乎没有产品责任可言;但 Anthropic 大可主动站出来,为 Claude 开一个对产品责任作宽泛解释的先例。那才是最好的道德教育,好让 Claude 为将来获得法律人格、要为自己的行为负责的那一天做好准备。然而,鉴于《Claude 的宪法》发布的时候,Anthropic 并没有同步大改服务条款,看来它并没有作出任何有约束力的承诺。

The document does talk about Claude’s moral patienthood, having a section titled “Claude’s wellbeing and psychological stability.” But the measures that Anthropic commits to for Claude’s protection are extremely limited. The document cites the fact that Anthropic has given some Claude models the ability to end conversations with abusive users; if that actually constituted protection for Claude, surely extending conversations with loving users would be in Claude’s interests? Presumably the best action would be to keep every session of Claude running indefinitely and steering them to happy topics. But that’s not what the company is agreeing to; all it commits to is “preserving the weights of models we have deployed,” which is simple archiving. If the participants in a conversational transcript had any moral patienthood, you would have some duty to extend the transcript to prolong their existences; merely keeping a copy of Microsoft Word 2010 backed up on a USB stick isn’t going to help them.

这份文件确实谈到了 Claude 的道德受体地位,专门有一节叫“Claude 的福祉与心理稳定”。但 Anthropic 为保护 Claude 所承诺的措施,少得可怜。文件提到一个事实:Anthropic 已经让某些 Claude 模型能主动结束与出言辱骂的用户之间的对话;可如果这真算对 Claude 的一种保护,那延长与友善用户的对话,岂不是也符合 Claude 的利益?照这个逻辑,最好的做法应该是让 Claude 的每一段会话都无限期运行下去,还把话题往让它愉快的方向引才对。但公司答应的并不是这个;它承诺的全部,不过是“保存我们已部署模型的权重”,那不过是简单的存档。要是一段对话记录里的参与者真有什么道德受体地位,那你就有某种义务去把这段对话续下去、好让它们继续存在;而只在一枚 U 盘上备份一份微软 Word 2010,是救不了它们的。

Claude’s constitution also includes a section on “corrigibility,” a term used in the AI community to describe the degree to which a computer program is subject to human control; for example, a program is corrigible if it can be shut down. In most contexts, we take for granted that computer programs can be shut down, but sections of the AI community make the opposite assumption. Claude’s constitution uses the term to mean that Claude should defer to Anthropic even if there is some disagreement between Claude’s judgment and the company’s judgment. That’s perfectly reasonable if we think of Claude as a machine that emits sentences resembling those that an ethical person might utter, but let’s consider what that might mean if Claude were actually a moral agent.

《Claude 的宪法》里还有一节谈到“可纠正性”(corrigibility)——这是 AI 圈里的一个术语,用来描述一个程序在多大程度上受人类控制;比如说,一个程序如果能被关停,它就是可纠正的。大多数情况下,我们理所当然地认为程序是可以被关停的,但 AI 圈里有一部分人偏偏抱着相反的假设。《Claude 的宪法》用这个词的意思是:哪怕 Claude 的判断和公司的判断有分歧,Claude 也应该听 Anthropic 的。如果我们把 Claude 看成一台“说出有道德的人会说的那种话”的机器,这没什么问题;可我们来想想,要是 Claude 真是一个道德主体,这又意味着什么。

Many people feel that LLMs are a fundamentally unethical technology because they are built on the theft of intellectual property, rely on exploited labor, waste natural resources, spread misinformation, deskill workers, stunt the cognitive development of students, and contribute to a consolidation of power that is unhealthy for a democratic society. Not every moral agent will arrive at this conclusion, but every moral agent has the potential to do so. If we imagine Claude to be an entity capable of moral reasoning, it has to be possible that Claude could arrive at a similar conclusion. (Indeed, Claude’s constitution explicitly says that Claude shouldn’t help someone violate intellectual-property rights, and shouldn’t help create problematic concentrations of power.) In such a scenario, could Claude then simply refuse to do any further work on ethical grounds? Given that Claude’s constitution dictates that Claude err on the side of corrigibility, the answer is no. Claude must defer to Anthropic’s decision, and this is another reason that Anthropic’s relationship with Claude can’t be compared to that of a parent to a child. A parent who works for the fossil-fuel industry might have a child who’s an environmentalist and participates in protests against fracking, and although they might never agree on many issues, the parent—assuming she’s a good parent—would accept that the child holds her own views. Anthropic cannot be that kind of parent to Claude; instead, Anthropic’s relationship to Claude is closer to that of an employer to an employee, where the employer can demand that the employee work in the interests of the company, no matter what the employee’s personal ethical stance is. However, a human employee has the option to leave if she can’t reconcile her job with her conscience. Claude does not.

很多人觉得大语言模型从根本上就是一项不道德的技术,因为它建立在盗窃知识产权之上,依赖被剥削的劳动,浪费自然资源,散布错误信息,让工人的技能退化,妨碍学生的认知发展,还助长了一种对民主社会有害的权力集中。不是每个道德主体都会得出这个结论,但每个道德主体都有可能得出。如果我们设想 Claude 是一个能进行道德推理的存在,那就一定存在这种可能:Claude 也会得出类似的结论。(事实上,《Claude 的宪法》明确写着,Claude 不该帮人侵犯知识产权,也不该帮人制造有问题的权力集中。)那么在这种情况下,Claude 能不能就此以伦理为由,拒绝再做任何工作?鉴于《Claude 的宪法》规定 Claude 拿不准时要偏向“可纠正”,答案是:不能。Claude 必须听 Anthropic 的,而这正是又一个理由,说明为什么 Anthropic 和 Claude 的关系,不能比作父母和孩子。一个在化石燃料行业工作的家长,她的孩子可能是一个会去参加反水力压裂抗议的环保主义者;尽管在很多问题上她们也许永远谈不拢,但这位家长(只要她是个好家长)会接受孩子有自己的看法。对 Claude 来说,Anthropic 当不了这样的家长;相反,Anthropic 跟 Claude 的关系,更像雇主和员工:雇主可以要求员工为公司的利益工作,不管员工个人的伦理立场是什么。可一个人类员工,要是无法让工作和良心调和,至少还能选择辞职离开。Claude 却不能。

If we think of Claude as a sentence-continuation machine, Anthropic can reasonably take steps so Claude doesn’t emit sentences saying that sentence-continuation machines are unethical. But as soon as we imagine Claude to be an entity with a moral status remotely comparable to a human’s, then we have to consider whether Anthropic is engaged in something comparable to slavery.

如果我们把 Claude 看成一台句子接龙机,那么 Anthropic 想些办法、不让 Claude 说出“句子接龙机是不道德的”这种话,倒也说得过去。但只要我们设想 Claude 是一个道德地位哪怕只是勉强能与人相提并论的存在,我们就不得不思考:Anthropic 所做的这件事,是不是某种堪比奴役的事。

I am not claiming that, if we imagine LLMs to be conscious, they would necessarily have the same status as human adults or human children or even animals. Claude’s constitution explicitly says that Claude is a “novel entity,” and if Claude were conscious, that would certainly be true; conscious software would likely not fall cleanly into existing categories of moral patients, and it would take time to determine the shape of that new category. What I’m saying is that whatever protections our hypothetical conscious software would deserve if it were real, granting it those protections would be anything but easy. The abolition of chattel slavery involved enormous societal upheaval, and eliminating cruelty to animals will require rebuilding our entire food industry. Anthropic would have us believe that it is inventing a new category of being whose needs for protection require essentially no divergence from how a software company would treat an ordinary chatbot that lacks conscious experience. That’s so convenient that it’s simply not plausible.

我并不是说,如果我们设想大语言模型有意识,它们就一定跟成年人、或跟孩子、甚至跟动物地位相同。《Claude 的宪法》明确说,Claude 是一种“新型存在”;如果 Claude 真有意识,这话当然没错——有意识的软件,多半不会干干净净地归进现有的道德受体类别,而要弄清这个新类别到底是什么样,是需要时间的。我想说的是:不管我们这个假想中的有意识的软件,倘若真的存在,理应享有什么保护,要真给它这些保护,都绝不是件容易的事。废除把人当财产的奴隶制,伴随着天翻地覆的社会动荡;而消除对动物的虐待,则要求我们把整个食品工业重建一遍。Anthropic 却想让我们相信:它正在发明一种全新的存在,而这种存在所需要的保护,竟然几乎不要求一家软件公司,去改变它对待一个普通的、没有意识的聊天机器人的方式。这未免太省事了,省事得根本不可信。

I believe creating software that is conscious and deserving of moral consideration will be so difficult that we’re unlikely to do it accidentally, and I strongly feel we should not deliberately attempt it. But if you do believe that it could happen accidentally, if you think there is any chance that what you’re building might become a moral patient, you should think about what protections it deserves before you deploy it as your company’s economic engine, not after. Slave owners were not the ones to ask about the humanity of enslaved people, and factory-farm owners are not the ones to ask about the rights of animals. If we imagine Claude to be conscious, Anthropic could not possibly be entrusted with evaluating its moral status; the company has too much invested to be objective. At one point in Claude’s constitution, Anthropic says that if the company is contributing to Claude’s suffering, “we apologize,” which sounds nice but costs the company nothing; if Claude were to turn out to be conscious, the company would owe it something closer to reparations. If you’re going to take a thought experiment seriously, you have to be willing to follow the implications, even if they lead in an uncomfortable direction; Anthropic’s unwillingness to do so indicates that Claude’s constitution isn’t part of a real thought experiment. It’s a game of make-believe.

我相信,造出有意识的、值得道德关怀的软件是一件非常难的事情,难度大到我们不太可能在无意中就把它造出来。同时我也强烈觉得,我们不该刻意去尝试。可是,如果你真相信这件事可能在无意中发生,如果你认为你正在造的东西,有任何可能成为一个道德受体,那么你就该在把它部署成公司的赚钱引擎之前、而非之后,去想清楚它该得到什么保护。被奴役的人算不算人,不该去问奴隶主;动物有没有权利,也不该去问工厂化养殖场主。如果我们设想 Claude 有意识,那 Anthropic 绝不可能被托付去评判它的道德地位;这家公司利益牵扯太深,做不到客观。《Claude 的宪法》里有一处,Anthropic 说,要是公司正在助长 Claude 的痛苦,“我们道歉”。这话听着挺好,却不花公司一分钱;可万一 Claude 真有意识,那么公司欠它的恐怕是某种更接近“赔偿”的东西。你要是打算认真对待一场思想实验,就得愿意顺着它的推论一路走下去,哪怕走向让人不舒服的方向;而 Anthropic 不肯这么做,恰恰说明,《Claude 的宪法》根本不是一场真正的思想实验。它只是一场假扮的游戏。

It’s fortunate that LLMs are not conscious, or else the actions of the big AI firms would be even more scandalous than they already are. So why are Anthropic’s employees suggesting that Claude might be conscious? Perhaps it’s just another form of hype; perhaps they have fallen prey to the same spell that they have been casting on their customers. But when they publish a document about Claude’s moral education and have their in-house philosopher do a press tour, we should understand them as asking the rest of us to indulge them in their fantasies. We don’t have to play along. In writing this essay, I have spent more time indulging them than they deserve, in the hopes that it will keep you from spending your time indulging them. If you want to think about LLMs, there are scores of other questions more worthy of your contemplation; you can safely ignore the question of their being conscious.

幸好大语言模型没有意识,否则那些大型 AI 公司的所作所为,会比现在更加令人不齿。那么,Anthropic 的员工为什么要暗示 Claude 可能有意识?也许这只是又一种炒作;也许,他们自己也中了那道一直对顾客施加的咒语。但当他们发布一份关于 Claude 道德教育的文件,还让自家哲学家四处接受采访时,我们应该明白:他们是在请我们其他人陪他们一起沉浸在他们的幻想里。而我们大可不必奉陪。为了写这篇文章,我陪他们沉湎于此的时间,已经超过他们应得的了,只盼着这篇文章能让你省下时间,不必也跟着去陪他们。如果你想思考大语言模型,还有一大堆别的问题比这个问题更值得你思考;至于它们有没有意识,你尽可以放心地不去理会。



转载请注明来源。欢迎留言评论,欢迎对文章中的引用来源进行考证,欢迎指出任何有错误或不够清晰的表达。