How GPT3 Works - Visualizations and Animations



Source: 钻井工程师随笔

2023-04-11 12:40:11



Jay Alammar published a blog post on how the much-hyped GPT3 works; I am passing it along here with a machine translation.

Original article:

https://jalammar.github.io/how-gpt3-works-visualizations-animations/

The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet reliable enough for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the possibilities of intelligent computer systems. Let's remove the aura of mystery around GPT3 and learn how it's trained and how it works.

A trained language model generates text.

We can optionally pass it some text as input, which influences its output.

The output is generated from what the model "learned" during its training period, when it scanned vast amounts of text.

You can see a detailed explanation of everything inside the decoder in my blog post The Illustrated GPT2.

Where GPT3 differs from GPT2 is in its alternating dense and sparse self-attention layers.
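To make the idea of alternating dense and sparse attention layers concrete, here is a minimal sketch of the two kinds of attention mask. It assumes the sparse layers use a locally banded pattern, where each token sees only a fixed window of recent tokens. The function names, the window size, and the choice to start with a dense layer are illustrative assumptions, not details taken from the post.

```python
def dense_causal_mask(n):
    """Dense layer: token i may attend to every token j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def banded_causal_mask(n, window):
    """Sparse (locally banded) layer: token i sees only the last `window` tokens."""
    return [[1 if 0 <= i - j < window else 0 for j in range(n)] for i in range(n)]


def layer_masks(num_layers, n, window):
    """Alternate dense and banded masks; starting with dense is an assumption."""
    return [
        dense_causal_mask(n) if layer % 2 == 0 else banded_causal_mask(n, window)
        for layer in range(num_layers)
    ]


# Show the masks of a hypothetical 2-layer stack over 4 tokens.
for layer, mask in enumerate(layer_masks(num_layers=2, n=4, window=2)):
    print("layer", layer)
    for row in mask:
        print(" ", row)
```

Each row is one token position and each 1 marks a position it is allowed to attend to; the banded mask simply has fewer 1s per row once the sequence is longer than the window.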

This is an X-ray of an input and response ("Okay human") within GPT3. Notice how every token flows through the entire layer stack. We don't care about the model's output for the first words, which belong to the input. Once the input is done, we start caring about the output, and we feed every generated word back into the model.
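The loop described above can be sketched in a few lines. This is only a toy illustration: `toy_model` and its lookup table are hypothetical stand-ins for the trained network, and they look only at the last token, whereas GPT3 conditions on the whole context.

```python
END = "<end>"

# Hypothetical lookup table standing in for a trained model.
TOY_NEXT_TOKEN = {"robot": "Okay", "Okay": "human", "human": END}


def toy_model(context):
    """Predict the next token from the context (here: from its last token only)."""
    return TOY_NEXT_TOKEN.get(context[-1], END)


def generate(prompt, max_new_tokens=10):
    context = []
    # Feed the input token by token; predictions made here are ignored.
    for token in prompt:
        context.append(token)
        _ignored = toy_model(context)

    # The input is done: now every prediction is kept and fed back in.
    response = []
    for _ in range(max_new_tokens):
        next_token = toy_model(context)
        if next_token == END:
            break
        response.append(next_token)
        context.append(next_token)
    return response


print(generate(["hello", "robot"]))  # ['Okay', 'human']
```

The important part is the last line of the second loop: each generated token is appended to the context, so it becomes part of the input for the next prediction.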

It's impressive that this works at all, and fine-tuning has not even been rolled out for GPT3 yet. Once it is, the possibilities will be even more amazing.

Fine-tuning actually updates the model's weights to make the model better at a certain task.
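As a rough illustration of what "updating the weights" means, the sketch below fine-tunes a two-parameter linear model with plain gradient descent. The model, the task data, the learning rate, and the step count are all made up for the example; GPT3 has billions of weights, but the principle of nudging existing weights to reduce the error on task-specific examples is the same.

```python
def predict(weight, bias, x):
    return weight * x + bias


def mse(weight, bias, data):
    """Mean squared error of the model on (x, y) pairs."""
    return sum((predict(weight, bias, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(weight, bias, data, learning_rate=0.05, steps=200):
    """Start from existing weights and adjust them with gradient descent."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(weight, bias, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(weight, bias, x) - y) for x, y in data) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


pretrained = (1.0, 0.0)                            # hypothetical "pretrained" weights
task_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # hypothetical task: y = 2x + 1

tuned = fine_tune(*pretrained, task_data)
print("loss before:", mse(*pretrained, task_data))
print("loss after: ", mse(*tuned, task_data))
```

The weights returned by `fine_tune` differ from the starting ones, and the loss on the task data is lower afterwards, which is the sense in which fine-tuning changes the model itself rather than only its input.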

Written on July 27, 2020


