
ChatGPT是干什么的?为什么它有效?(第二部分)

梵天Daozen 2023-04-14

本文继续大神Stephen Wolfram介绍ChatGPT原理的系列。书接上回,第一部分阐述的是ChatGPT中的"Generative"(生成)部分,即根据已有的文本,推算出下一个"词"的概率,不断续写来生成文章。而计算不同词的概率所依靠的,正是模型。这一部分就来解释什么是模型,以及什么叫"好的模型"。


What Is a Model?

什么是模型?


Say you want to know (as Galileo did back in the late 1500s) how long it’s going to take a cannon ball dropped from each floor of the Tower of Pisa to hit the ground. Well, you could just measure it in each case and make a table of the results. Or you could do what is the essence of theoretical science: make a model that gives some kind of procedure for computing the answer rather than just measuring and remembering each case.

假设你像十六世纪后期的伽利略一样,想知道从比萨斜塔的每一层楼掉落的炮弹需要多长时间才能落地。你当然可以对每种情况逐一测量,并把结果做成一张表。或者,你也可以采用理论科学的精髓做法:建立一个模型,给出某种计算答案的程序,而不是仅仅测量并记下每个具体案例。

Let’s imagine we have (somewhat idealized) data for how long the cannon ball takes to fall from various floors:

假设我们有炮弹从不同楼层落下所需时间的(某种理想化的)数据:

How do we figure out how long it’s going to take to fall from a floor we don’t explicitly have data about? In this particular case, we can use known laws of physics to work it out. But say all we’ve got is the data, and we don’t know what underlying laws govern it. Then we might make a mathematical guess, like that perhaps we should use a straight line as a model:

如果对某个楼层我们没有现成的数据,该如何算出从这一层掉下来需要多长时间?在这个具体的例子里,我们可以用已知的物理定律算出答案。但假设我们手上只有数据,并不知道支配这些数据的底层规律。那么我们可能会做一个数学上的猜测,比如也许应该用一条直线作为模型:

We could pick different straight lines. But this is the one that’s on average closest to the data we’re given. And from this straight line we can estimate the time to fall for any floor.

我们可以选择不同的直线。但这一条是平均而言最接近给定数据的那条。从这条直线,我们就可以估计从任何一层楼落下所需的时间。
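
注:下面给出一个极简的代码示意(Python),把一条直线用最小二乘法拟合到"楼层—落地时间"数据上。这里的数据是按上文提到的已知物理定律 t = √(2h/g)、假设每层楼高 3 米虚构出来的,仅作演示,并非原文图中的数据:

```python
import numpy as np

# 虚构的"楼层—落地时间"数据:按已知物理定律 t = sqrt(2h/g) 生成,
# 假设每层楼高 3 米,仅作演示,并非原文图中的数据
floors = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
times = np.sqrt(2 * floors * 3.0 / 9.81)

# 最小二乘拟合一条直线:time ≈ a + b * floor
b, a = np.polyfit(floors, times, deg=1)   # polyfit 先返回最高次项系数
print(f"拟合结果: time ≈ {a:.3f} + {b:.3f} * floor")

# 用这条直线估计没有数据的楼层,例如 6.5 层
print("6.5 层的估计落地时间:", round(a + b * 6.5, 3), "秒")
```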

How did we know to try using a straight line here? At some level we didn’t. It’s just something that’s mathematically simple, and we’re used to the fact that lots of data we measure turns out to be well fit by mathematically simple things. We could try something mathematically more complicated—say a + b x + c x²—and then in this case we do better:

我们怎么知道在这里应该尝试用直线?某种程度上,我们并不知道。这只是数学上比较简单的选择,而我们已经习惯了这样的事实:我们测量的很多数据,事实证明都能被数学上简单的东西很好地拟合。我们也可以尝试数学上更复杂一点的东西——比如 a + b x + c x²——在这个例子里,后者的效果确实更好:

Things can go quite wrong, though. Like here’s the best we can do with a + b/x + c sin(x):

不过,有时候选择会错得离谱。比如用 a + b/x + c sin(x) 来拟合,我们能做到的最好结果也不过是下面这样(注:即不断优化 a、b、c 三个参数来拟合,最好也只能做到下图这样):
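
注:作为示意,可以用 scipy 的 curve_fit 把上面提到的几种函数形式——直线、a + b x + c x²、以及 a + b/x + c sin(x)——分别拟合到同一份(虚构的)数据上,并比较拟合误差。数值本身不重要,要点在于:同样是"调旋钮",底层结构选得不同,效果可以相差很大:

```python
import numpy as np
from scipy.optimize import curve_fit

floors = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
times = np.sqrt(2 * floors * 3.0 / 9.81)   # 仍是上面虚构的演示数据

# 三种候选模型:都有一组可调的"旋钮"(参数),但底层结构不同
def linear(x, a, b):
    return a + b * x

def quadratic(x, a, b, c):
    return a + b * x + c * x**2

def weird(x, a, b, c):
    return a + b / x + c * np.sin(x)

for name, f in [("a + b x", linear),
                ("a + b x + c x^2", quadratic),
                ("a + b/x + c sin(x)", weird)]:
    params, _ = curve_fit(f, floors, times)                     # 最小二乘求参数
    rmse = np.sqrt(np.mean((f(floors, *params) - times) ** 2))  # 拟合误差
    print(f"{name:20s} 均方根误差 = {rmse:.4f}")
```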

It is worth understanding that there’s never a “model-less model”. Any model you use has some particular underlying structure—then a certain set of “knobs you can turn” (i.e. parameters you can set) to fit your data. And in the case of ChatGPT, lots of such “knobs” are used—actually, 175 billion of them.

值得理解的是,从来没有什么真正的“无模型模型”。你使用的任何模型无论如何都有一些特定的底层结构——然后用一组特定的“你可以转动的旋钮”(即你可以设置的参数)来适应你的数据。就ChatGPT 来说,它用了很多很多这样的“旋钮”——实际上是 1750 亿个。

But the remarkable thing is that the underlying structure of ChatGPT—with “just” that many parameters—is sufficient to make a model that computes next-word probabilities “well enough” to give us reasonable essay-length pieces of text.

但值得注意的是,ChatGPT 的底层结构——"仅仅"用这么多参数(1750 亿个)——就足以构建出一个模型,把下一个词的概率计算得"足够好",从而生成合理的、文章长度的文本。


Models for Human-Like Tasks 

类人任务模型


The example we gave above involves making a model for numerical data that essentially comes from simple physics—where we’ve known for several centuries that “simple mathematics applies”. But for ChatGPT we have to make a model of human-language text of the kind produced by a human brain. And for something like that we don’t (at least yet) have anything like “simple mathematics”. So what might a model of it be like?

我们上面给出的例子,是为本质上来自简单物理学的数值数据建立模型——几个世纪以来我们都知道那里"简单的数学适用"。但对于 ChatGPT,我们必须为人类大脑所产生的那种人类语言文本建立模型。而对于这样的东西,我们(至少目前)还没有"简单数学"之类的工具。那么,它的模型可能会是什么样的呢?

Before we talk about language, let’s talk about another human-like task: recognizing images. And as a simple example of this, let’s consider images of digits (and, yes, this is a classic machine learning example):

在我们谈论语言之前,让我们谈谈另一个类似人类的任务:识别图像。作为一个简单的例子,让我们考虑一下数字图像(是的,这是一个经典的机器学习例子):

One thing we could do is get a bunch of sample images for each digit:

我们可以做的一件事是为每个数字获取一堆示例图像:

Then to find out if an image we’re given as input corresponds to a particular digit we could just do an explicit pixel-by-pixel comparison with the samples we have. But as humans we certainly seem to do something better—because we can still recognize digits, even when they’re for example handwritten, and have all sorts of modifications and distortions:

然后,为了确定我们作为输入给出的图像是否对应于特定数字,我们可以与我们拥有的样本进行明确的逐像素比较。但作为人类,我们似乎确实做得更好——因为即使这些数字是手写的,有各种修改和扭曲,我们仍然可以识别它们:
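
注:下面是"逐像素与样本比较"这一思路的最小示意,假设用 scikit-learn 自带的 8×8 手写数字数据集来演示(并非原文使用的数据):把待识别的图像与每个样本逐像素求差,取距离最近的样本的标签作为识别结果:

```python
import numpy as np
from sklearn.datasets import load_digits

# 假设用 scikit-learn 自带的 8x8 手写数字数据集来演示(并非原文的数据)
digits = load_digits()
samples, labels = digits.data[:-10], digits.target[:-10]   # 当作"样本库"
tests, truth = digits.data[-10:], digits.target[-10:]       # 当作待识别的输入

def nearest_sample_digit(image):
    """逐像素比较:返回与输入图像(欧氏距离)最接近的样本的数字标签。"""
    dists = np.sqrt(((samples - image) ** 2).sum(axis=1))
    return labels[np.argmin(dists)]

for img, t in zip(tests, truth):
    print("识别为", nearest_sample_digit(img), ",实际是", t)
```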

When we made a model for our numerical data above, we were able to take a numerical value x that we were given, and just compute a + b x for particular a and b. So if we treat the gray-level value of each pixel here as some variable x_i, is there some function of all those variables that—when evaluated—tells us what digit the image is of? It turns out that it’s possible to construct such a function. Not surprisingly, it’s not particularly simple, though. And a typical example might involve perhaps half a million mathematical operations.

当我们为上面的数值数据建立模型时,我们拿到给定的数值 x,就能用特定的 a 和 b 计算出 a + b x。那么,如果我们把这里每个像素的灰度值看作某个变量 x_i,是否存在一个关于所有这些变量的函数——求值之后——告诉我们这幅图像是哪个数字?事实证明,构造这样一个函数是可能的。不出意外,它并不特别简单:一个典型的例子可能涉及大约五十万次数学运算。
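
注:下面用随机权重勾勒一个"接收全部像素变量 x_i、输出数字判断"的函数骨架,并粗略数一数它涉及多少次乘加运算。权重是随机的,它并不能真正识别数字,只是用来说明"这样一个函数"大概长什么样:

```python
import numpy as np

rng = np.random.default_rng(0)

# 一个"接收全部像素变量 x_i、输出数字判断"的函数骨架(权重随机,仅示意)
W1, b1 = rng.normal(size=(32, 64)), np.zeros(32)   # 64 个像素 -> 32 个中间量
W2, b2 = rng.normal(size=(10, 32)), np.zeros(10)   # 32 个中间量 -> 10 个数字得分

def digit_function(pixels):                     # pixels: 长度 64 的灰度数组
    hidden = np.maximum(0, W1 @ pixels + b1)    # 对所有 x_i 加权求和,再取非线性
    scores = W2 @ hidden + b2
    return int(np.argmax(scores))               # 得分最高的那个数字

example_pixels = rng.uniform(0, 16, size=64)    # 随机"图像",仅用于演示函数能运行
print("函数输出(权重随机,结果无意义):", digit_function(example_pixels))
print("这个(很小的)例子大约涉及", W1.size + W2.size, "次乘加;正文提到的典型函数要大得多")
```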

But the end result is that if we feed the collection of pixel values for an image into this function, out will come the number specifying which digit we have an image of. Later, we’ll talk about how such a function can be constructed, and the idea of neural nets. But for now let’s treat the function as black box, where we feed in images of, say, handwritten digits (as arrays of pixel values) and we get out the numbers these correspond to:

但最终结果是,如果我们把一幅图像的全部像素值输入这个函数,输出的将是一个数,指明这幅图像是哪个数字。后面我们将讨论如何构建这样的函数,以及神经网络的概念。但现在让我们先把这个函数当作黑盒子:我们输入图像,比如手写数字(以像素值数组的形式),然后得到它们对应的数字:
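
注:把这样的函数当作黑盒子,可以用一个小型神经网络来近似演示(仍假设用 8×8 的 scikit-learn 数字数据集,这只是示意,并非原文所用的函数):训练好之后,输入像素数组,输出对应的数字:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 仍以 8x8 的 scikit-learn 数字数据集为例(演示用的假设)
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# 训练一个很小的神经网络;训练完成后,它就是一个"像素进、数字出"的黑盒函数
blackbox = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
blackbox.fit(X_train, y_train)

print("前 5 张测试图像被识别为:", blackbox.predict(X_test[:5]))
print("实际数字:", y_test[:5])

# 顺便数一数这个黑盒里"旋钮"(参数)的数量——只有几千个,远小于正文提到的量级
n_params = sum(w.size for w in blackbox.coefs_) + sum(b.size for b in blackbox.intercepts_)
print("参数个数:", n_params)
```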

But what’s really going on here? Let’s say we progressively blur a digit. For a little while our function still “recognizes” it, here as a “2”. But soon it “loses it”, and starts giving the “wrong” result:

但这中间到底发生了什么?假设我们把一个数字逐渐模糊。在一定程度内,我们的函数仍然能"识别"出它是"2"。但很快它就"认不出来了",开始给出"错误"的结果:
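
注:下面的示意接着上面的小黑盒,用高斯模糊把一张"2"逐步模糊,观察预测从哪一步开始"丢失"。模糊参数和具体在哪一步出错都只是演示,会因模型而异:

```python
from scipy.ndimage import gaussian_filter
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

digits = load_digits()
# 与上一段相同的小黑盒,这里重新训练一遍,便于独立运行
blackbox = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
blackbox.fit(digits.data, digits.target)

two = digits.images[digits.target == 2][0]            # 取一张 8x8 的"2"

for sigma in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    blurred = gaussian_filter(two, sigma=sigma)           # 逐步加大模糊程度
    pred = blackbox.predict(blurred.reshape(1, -1))[0]    # 像素进、数字出
    print(f"sigma={sigma:.1f} 时被识别为 {pred}")
```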

But why do we say it’s the “wrong” result? In this case, we know we got all the images by blurring a “2”. But if our goal is to produce a model of what humans can do in recognizing images, the real question to ask is what a human would have done if presented with one of those blurred images, without knowing where it came from.

但为什么我们说这是“错误”的结果呢?在这种情况下,我们知道所有这些图像都是通过模糊“2”得来的。但是,如果我们的目标是建立一个跟人类在识别图像方面能力一样的模型,那么真正要问的问题是,如果人类在不知道这张图怎么来的情况下,又会把它识别成什么呢?

And we have a “good model” if the results we get from our function typically agree with what a human would say. And the nontrivial scientific fact is that for an image-recognition task like this we now basically know how to construct functions that do this.

如果我们从函数中得到的结果通常与人类所说的一致,那么我们就有了一个“好的模型”。重要的科学事实是,对于像这样的图像识别任务,我们现在基本上知道如何构建函数来满足需求。

Can we “mathematically prove” that they work? Well, no. Because to do that we’d have to have a mathematical theory of what we humans are doing. Take the “2” image and change a few pixels. We might imagine that with only a few pixels “out of place” we should still consider the image a “2”. But how far should that go? It’s a question of human visual perception. And, yes, the answer would no doubt be different for bees or octopuses—and potentially utterly different for putative aliens.

我们能“用数学证明”它们有效吗?答案是不能。因为要做到这一点,我们必须先对我们人类正在做的事情有一套数学理论。拿一张数字“2”的图像,改变其中几个像素。我们可能会觉得,只有几个像素“不在原位”,我们应该仍然认为这是“2”。但这能推演到什么程度?(注:这跟秃子悖论相似,如果改变几个像素这张图还是2,那不断重复改变几个像素是不是还是2呢?到底改变多少才会让人觉得它不是2呢?)这是人类视觉感知的问题。而且没错,对于蜜蜂或章鱼来说,答案无疑会有所不同——而对于想象中的外星人,答案可能会完全不同。

下集预告


这一部分我们理解了什么是模型,下一部分将了解怎样训练模型——即什么是机器学习和神经网络。


