AI & ChatGPT. A Primer.

Eugene Vainshel, CFA
21 min read · Jul 25, 2023


“Artificial Intelligence is a science and a set of computational technologies that are inspired by — but typically operate quite differently from — the ways people use their nervous system and brain to sense, learn, reason and take action.”

While AI algorithms have existed for many years, today we are seeing a rapid and unprecedented expansion of AI-based capabilities.

With AI becoming more prevalent, especially at the enterprise level, organizations have a huge opportunity to realize tangible business benefits by making smarter decisions and introducing more efficient processes.

Bottom line: AI is not something of the future, it is here now, and it’s fueling the Fourth Industrial Revolution.

Industrial Revolutions

Table of Contents

Introduction to ChatGPT

  • ChatGPT
  • ChatGPT Prompts
  • ChatGPT In Action
  • ChatGPT Cheat Sheet

Introduction to AI

  • AI Timeline
  • The “AI” Team
  • AI Drivers
  • AI Categories
  • AI Use Cases
  • How AI Learns

What Google ‘Really’ Thinks of ChatGPT

  • There is No Moat
  • Implications For Google
  • What Happened
  • Competing With Open Source Is a Losing Proposition
  • Letting Open Source Work for Us
  • What About OpenAI?

Appendix

  • A Deep Dive Into How ChatGPT Works

ChatGPT

ChatGPT (Chat Generative Pre-trained Transformer) has quickly become one of the most significant tech launches since the original Apple iPhone in 2007.

The chatbot is now the fastest-growing consumer app in history, hitting 100 million users in only two months — but it’s also a rapidly-changing AI shapeshifter, which can make it confusing and overwhelming.

But what is crystal clear is that ChatGPT has sparked an AI arms race, with Microsoft using a form of the chatbot in its new Bing search engine and Microsoft Edge browser. Google has also responded by announcing its own chatbot, tentatively described as an “experimental conversational AI service”, called Google Bard.

ChatGPT Prompts

The ChatGPT prompts that follow aim to illustrate the diverse abilities of ChatGPT for content creators across various domains, including media content creation, natural language processing, and programming.


Sample Prompts

  • Prompt: write an email selling software to corporate executives
  • Prompt: write a software engineer resume
  • Prompt: write an intro paragraph to a mystery novel
  • Prompt: write a paragraph on the history of the calculator in a formal style
  • Prompt: write a tweet on futurism
  • Prompt: write a blog on French cuisine
  • Prompt: summarize this text: “….”
  • Prompt: create a table from this text: “…”
  • Prompt: give me a list of 5 citrus fruits
  • Prompt: classify the named entities in this text: “…”
  • Prompt: translate this text into Portuguese: “…”
  • Prompt: show me how to make an http request in Python
  • Prompt: convert this code from Python to JavaScript: “…”
  • Prompt: convert this JSON object into XML: “…”

ChatGPT In Action

First released by OpenAI as a “research preview” on November 30, 2022, the ChatGPT interface was, as it is now, a simple text box that lets users submit prompts and ask follow-up questions.

Step 1: open the ChatGPT main screen

ChatGPT Main Screen

Step 2: submit a prompt

User Entered Prompt

Step 3: await a response

ChatGPT Generated Response

ChatGPT Cheat Sheet

Here’s a ChatGPT Cheat Sheet to help you get started:

ChatGPT Cheat Sheet

AI Timeline

AI was first named in 1955 by John McCarthy, who defined it as the ability of machines to perform human-like tasks. The term has grown in popularity ever since.

Lately, there has been a big rise in the day-to-day use of machines powered by AI. Virtual assistants are becoming more common, most web shops predict your purchases, many companies use chatbots in their customer service, and many use algorithms to detect fraud.

These are just a few examples of how AI is used today. And this is just the beginning.

Atlas, by Boston Dynamics

AI Timeline

The “AI” Team

Many companies wish to implement and take advantage of machine learning and AI, and to do so with ‘in-house’ talent. If your company decides to go this route, at a minimum you’ll need to develop (or acquire) some of the following technical roles:

  • Data engineers and machine learning engineers who can scale the algorithms
  • Data analysts who can process the outcome
  • Statisticians to help ensure quality results
  • Software engineers to turn all you’ve created into something that can be used by the masses — be it your customers or your employees

The “AI” Team

AI Drivers

The three main drivers that have made AI available and accessible to companies are the evolution of data, the evolution of computing, and the evolution of algorithms.

  • The evolution of data: A factor contributing to the massive adoption of AI is the exponential growth of available data. With the introduction of the Internet and social media, the proliferation of sensors and smart devices, and the falling cost of data storage, data has become more accessible than ever before.
  • The evolution of computing: Another major factor in AI’s current success is computing power. Back when AI was just beginning to be developed, computing power was minimal. Computers nowadays can handle far more data and much heavier algorithms than the machines of the 1950s could.
  • The evolution of algorithms: Algorithms have been around for as long as we have been able to write them down. Recently, the development of more advanced algorithms has helped AI become more powerful and efficient.

AI Categories

We can split the term AI into three categories: narrow, broad and general.

General AI encompasses all humanlike capabilities, whereas narrow AI can only do a certain task — but it can do it quite well.

AI Categories

  • Narrow AI (present): Narrow AI is focused on a single, narrowly defined task. Contrary to its name, it is a very powerful tool for routine jobs.
  • Broad AI (present): What we see today in self-driving cars is a collection of narrow AI systems that work together to make decisions. This is what we call broad AI. Examples of broad AI include a system within a bank that analyzes the balance sheets of corporate customers to recommend the best currency-hedging strategy, or a system that supports engineers working on complex maintenance tasks on a platform in the middle of the Atlantic Ocean.
  • General AI (future): General AI refers to machines that can perform any intellectual task a human can. Currently, AI does not have the ability to think abstractly, strategize, and use previous experiences to come up with new creative ideas the way humans do.

AI Use Cases

While relevant AI use cases span various areas across virtually every industry, there are three main macro domains that continue to drive the adoption of AI. These are:

  • Cognitive engagement: Involves how to deliver new ways for humans to engage with machines, moving from pure digital experiences (such as the ability to run transactions digitally) into human-like natural conversations.
  • Cognitive insights and knowledge: Deals with how to augment humans who are overwhelmed with information and knowledge.
  • Cognitive automation: Relates to how to move from process automation to mimicking human intelligence to facilitate complex and knowledge-intense business decisions.

AI Implementations

How AI Learns

Learning, one of the fundamentals of AI and machine learning, is when an algorithm improves itself by learning from the historical data it is provided.

While there is still quite a bit of confusion about the difference between AI, machine learning and deep learning — simply stated, AI encompasses the latter two.

Machine Learning

The “AI machine” learns by recognizing patterns in the historical / training data it is fed, and then by mapping these patterns to future outcomes.

AI System

The machine learns by adjusting the weights and biases in its network to get to the correct outcome. This feedback usually comes from a trainer: the data scientist.

The data scientist tells the model what should and shouldn’t happen. This correction is then sent back through the network and an error rate is computed. With each iteration, the model works to decrease the error rate.
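
To make that training loop concrete, here is a minimal sketch in Python. A single weight and bias are nudged each iteration in the direction that reduces the error, so the error rate falls over time. The toy task (fitting y = 2x + 1) and all numbers are invented for illustration.

    # A minimal sketch of the feedback loop described above: one weight and
    # one bias, adjusted each iteration to reduce the error rate.
    # The data is invented for illustration (it follows y = 2x + 1).
    examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (input, desired output)

    weight, bias = 0.0, 0.0
    learning_rate = 0.05

    for iteration in range(200):
        total_error = 0.0
        for x, target in examples:
            prediction = weight * x + bias
            error = prediction - target  # how far off the model is
            total_error += error ** 2
            # The correction is sent back: weight and bias move to shrink the error
            weight -= learning_rate * error * x
            bias -= learning_rate * error
        if iteration % 50 == 0:
            print(f"iteration {iteration}: error = {total_error:.4f}")

    print(f"learned weight = {weight:.2f}, bias = {bias:.2f}")  # approaches 2 and 1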

Machine Learning Techniques

Machine learning enables a machine to learn from data without being explicitly programmed with rules. Instead of programming all the rules, you feed the algorithm data and let it self-adjust to improve its accuracy.

There are four types of machine learning: supervised, unsupervised, reinforcement and transfer.

  • Supervised learning is a learning method that maps an input to an output, using labeled examples and human feedback to improve.
  • Unsupervised learning occurs when the algorithm is not given a specific “wrong” or “right” outcome. Instead, the algorithm is given unlabeled data. An unsupervised learning algorithm, for example, can find natural groupings of similar customers in a database (a minimal sketch appears after this list).
  • Reinforcement learning is a class in and of itself; here, the AI is not given explicit instructions, but rather learns from trial and error. If we take a maze as an example, the algorithm is rewarded when it comes closer to its goal and penalized every time it gets stuck or moves away from the exit. A well-known example of reinforcement learning is AlphaGo, where Google DeepMind trained a deep reinforcement learning network on many games of Go, eventually making its performance superior to that of even the best human players.
  • Transfer learning is when your algorithm learns to solve one problem, takes information from this problem, and then solves a new problem with that information. This currently happens a lot in image recognition.
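
To make the unsupervised case above concrete, here is a minimal sketch using scikit-learn’s KMeans (one common library choice, assumed here for illustration) to find natural groupings of customers. The two-feature customer data is invented.

    # A minimal sketch of unsupervised learning: grouping similar customers
    # without labels. Requires scikit-learn; the data is invented.
    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is a customer: [annual spend ($k), visits per month]
    customers = np.array([
        [1.0, 1], [1.5, 2], [2.0, 1],     # low-spend, infrequent
        [20.0, 8], [22.0, 9], [19.5, 7],  # high-spend, frequent
    ])

    # No "right" or "wrong" outcome is given; the algorithm finds the groups itself
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = model.fit_predict(customers)
    print(labels)  # e.g., [0 0 0 1 1 1]: two natural customer segments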

Deep Learning

Deep learning (DL) is a relatively new set of methods that is changing machine learning in fundamental ways.

DL isn’t a single algorithm per se, but rather a family of algorithms that implement deep networks (i.e., networks with many layers). These networks are so deep that new kinds of hardware, such as graphics processing units (GPUs) and clusters of compute nodes, are typically required to train them.
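
As a rough illustration of what training on GPUs looks like in practice, here is a minimal sketch using PyTorch (one common framework, assumed here for illustration): moving a model and its data onto a GPU, when one is available, is essentially a one-line change.

    # A sketch of GPU-accelerated deep learning setup, using PyTorch.
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # A small "deep" network: layers stacked one after another
    model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
    batch = torch.randn(32, 100).to(device)  # the data must live on the same device

    output = model(batch)
    print(output.shape, "computed on", device)  # torch.Size([32, 10])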

DL almost always outperforms the other types of algorithms when it comes to image classification, natural language processing and speech recognition.

Currently, the larger the neural network and the more data it is fed, the better the performance it can provide.

While DL is very powerful, it does have a couple of drawbacks:

  • Black-box problem: It’s almost impossible to determine why the system came to a certain conclusion. This is called the “black box” problem, though there are now many available techniques that can provide insight into the inner workings of a DL model.
  • Resource requirements: Deep learning often requires extensive training times, a lot of data, and specific hardware, and the specialized skills needed to develop a new DL solution to a problem are not easy to acquire.

What Google ‘Really’ Thinks of ChatGPT

According to a Google insider, Meta’s decision to open-source its AI model is a game changer, both for Google & OpenAI

Readers who follow the AI landscape may have heard about the recently leaked document by a Google researcher commenting on the state of AI and the large language model arms race.

The sentiments expressed paint a picture that is, in many ways, in stark contrast to the hype being propagated by VCs, AI companies, and social media influencers.

For those who have not had a chance to read the leaked memo, below are the key conclusions, from inside Google itself …

There is No Moat

Here at Google, we’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?

But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.

I’m talking, of course, about Meta’s recent open-source release of its LLM, LLaMA (Large Language Model Meta AI).

And while Big Tech LLMs (i.e., those from Google, OpenAI, etc.) still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source LLM models are faster, more customizable, more private, and pound-for-pound more capable. They are doing with $100 what we struggle to do with $10M, and they are doing so in weeks, not months.

Implications For Google

Meta’s open-source release of LLaMA has profound implications for our business strategy:

We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google.

Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the < 20B parameter regime.

People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.

What Happened

At the beginning of March 2023, the open-source community got its hands on its first really capable foundation model (LLM), when Meta’s LLaMA was leaked to the public.

LLaMA

A tremendous outpouring of innovation followed, with just days between major developments. With LLaMA going open-source, the barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.

In many ways, this shouldn’t be a surprise to anyone.

The current renaissance in open source LLMs (Large Language Models) comes hot on the heels of a renaissance in image generation. In both cases, low-cost public involvement (i.e., open-source) kicked off a flurry of ideas and innovation from individuals around the world, with these contributions quickly outpacing those from entrenched industry players.

Competing With Open Source Is a Losing Proposition

The modern internet runs on open source for a reason.

Keeping our technology secret was always a tenuous proposition. Google researchers are leaving for other companies on a regular cadence, so we can assume they know everything we know, and will continue to for as long as that pipeline is open.

But holding on to a competitive advantage in technology becomes even harder now that cutting edge research in LLMs is affordable. We can try to hold tightly to our secrets while outside innovation dilutes our value, or we can try to learn from each other.

Large Language Models

Letting Open Source Work for Us

Paradoxically, the one clear winner in all of this is Meta.

Because the leaked model was theirs, they have effectively garnered an entire planet’s worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

The value of owning the ecosystem cannot be overstated. Google itself has successfully used this paradigm in its open source offerings, like Chrome and Android. By owning the platform where innovation happens, Google cements itself as a thought leader and direction-setter, earning the ability to shape the narrative on ideas that are larger than itself.

The more tightly we control our models, the more attractive we make open alternatives. Google and OpenAI have both gravitated defensively toward release patterns that allow them to retain tight control over how their models are used. But this control is a fiction. Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.

Google should establish itself as a leader in the open source community, taking the lead by cooperating with, rather than ignoring, the broader conversation.

What About OpenAI?

All this talk of open source can feel unfair given OpenAI’s current closed policy. Why does Google have to share, if OpenAI won’t?

But the fact of the matter is, we are already sharing everything with them in the form of the steady flow of poached senior researchers. Until we stem that tide, secrecy is a moot point.

And in the end, OpenAI doesn’t matter.

They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.

Appendix: A Deep Dive Into How ChatGPT Works

OpenAI

OpenAI, the developer of ChatGPT, is an AI research and deployment company, currently employing around 100 people and organized into three main areas:

  • Capabilities (advancing what AI systems can do)
  • Safety (ensuring those systems are aligned with human values)
  • Policy (ensuring appropriate governance for such systems)

Sam Altman

Sam Altman is a co-founder and the CEO of OpenAI:

Sam Altman, OpenAI Founder

Topics of Discussion

The ChatGPT discussion that follows is divided into the following sections:

  1. GPT = Generative Pre-Trained Transformer
  2. What is GPT-4
  3. How ChatGPT Works
  4. How It Works, A Deeper Dive
  5. LLM (Large Language Model)
  6. Neural Networks
  7. Neural Network Layers
  8. Neural Network Training
  9. Language as Numbers
  10. Data Data Data
  11. Inside ChatGPT
  12. Training ChatGPT

1. GPT = Generative Pre-Trained Transformer

GPT (generative pre-trained transformer) is a large language model (LLM) used to predict the probability of a sequence of words. It is trained on an input sequence, and its target is to predict the next token (roughly, a word) at each point in the sequence.

ChatGPT is shorthand for Chat Generative Pre-Trained Transformer.

  • Chat: The ‘chat’ naturally refers to the chatbot front-end that OpenAI has built for its GPT language model.
  • Generative Pre-Trained: The second and third words show that this model was created using ‘generative pre-training’, which means it’s been trained on huge amounts of text data to predict the next word in a given sequence.
  • Transformer: Lastly, there’s the ‘transformer’ architecture, the type of neural network ChatGPT is based on. Interestingly, this transformer architecture was actually developed by Google researchers in 2017.

2. What is GPT-4

On March 14, 2023, OpenAI announced that its next-generation language model, GPT-4, was available to developers and ChatGPT Plus subscribers — with Microsoft confirming that Bing is already running on GPT-4.

ChatGPT Plus

The big change from GPT-3.5 is that OpenAI’s newest language model is multimodal, which means it can process both text and images.

This means you can show it images and it will respond to them alongside a text prompt — an early example of this, noted by The New York Times, involved giving GPT-4 a photo of some fridge contents and asking what meals you could make from the ingredients.

GPT-4

OpenAI also says that safety is a big focus of GPT-4: the company spent over six months putting it through a better monitoring framework, collaborating with experts across a range of specialist fields, like medicine and geopolitics, to make sure its answers are both “accurate and sensitive”.

Message From OpenAI

3. How ChatGPT Works

ChatGPT has been created with one main objective — to predict the next word in a sentence, based on what’s typically happened in the gigabytes of text data that it’s been trained on.

ChatGPT works thanks to a combination of deep learning algorithms, a dash of natural language processing, and a generous dollop of generative pre-training, which all combine to help it produce disarmingly human-like responses to text questions.

Once you give ChatGPT a question or prompt, it passes through the AI model and the chatbot produces a response based on the information you’ve given and how that fits into its vast amount of training data.

It’s during this training that ChatGPT has learned what word, or sequence of words, typically follows the last one in a given context.

ChatGPT In Action

In addition to language-based tasks, ChatGPT is also talented at coding and productivity tasks.

For the former, its ability to create code from natural speech makes it a powerful ally for both new and experienced coders who either aren’t familiar with a particular language or want to troubleshoot existing code.

4. How It Works, A Deeper Dive

What ChatGPT is fundamentally trying to do, is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”

Let’s say we’ve got the text “The best thing about AI is its ability to _____”.

  • Imagine scanning billions of pages of human-written text (e.g., on the web and in digitized books) and finding all instances of this text — then seeing which word comes next, and what fraction of the time.
  • ChatGPT effectively does something like this, except that it doesn’t look at literal text; it looks for things that in a certain sense “match in meaning”. But the end result is that it produces a ranked list of words that might follow, together with “probabilities”.
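
Here is a drastically scaled-down sketch of that counting idea in Python: tally which word follows a given phrase in a corpus, then turn the tallies into a ranked list of probabilities. The three-sentence corpus is invented for illustration.

    # A drastically scaled-down sketch of "see which word comes next, and
    # what fraction of the time". The tiny corpus is invented.
    from collections import Counter

    corpus = (
        "the best thing about ai is its ability to learn . "
        "the best thing about ai is its ability to adapt . "
        "the best thing about ai is its ability to learn ."
    ).split()

    prompt = ["ability", "to"]
    followers = Counter(
        corpus[i + 2]
        for i in range(len(corpus) - 2)
        if corpus[i : i + 2] == prompt
    )

    # Rank the candidate next words by how often they followed the prompt
    total = sum(followers.values())
    for word, count in followers.most_common():
        print(f"{word}: {count / total:.2f}")  # learn: 0.67, adapt: 0.33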

The remarkable thing is that when ChatGPT does something like write an essay, what it’s essentially doing is just asking over and over again “given the text so far, what should the next word be?” — and each time adding a word.

But, OK, at each step it gets a list of words with probabilities. But which one should it actually pick to add to the essay (or whatever) that it’s writing? By design, it’s not always the one with the highest assigned probability …

The fact that there’s randomness here, means that if we use the same prompt multiple times, we’re likely to get different essays each time.
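
A minimal sketch of that sampling step: rather than always taking the top-ranked word, draw from the probability list, so repeated runs can produce different continuations. The candidate words and their probabilities are invented.

    # A minimal sketch of picking the next word: sample at random, weighted
    # by probability, instead of always taking the most likely word.
    # The words and probabilities below are invented for illustration.
    import random

    next_word_probs = {"learn": 0.45, "adapt": 0.25, "create": 0.20, "reason": 0.10}

    words = list(next_word_probs)
    probs = list(next_word_probs.values())

    for run in range(3):
        choice = random.choices(words, weights=probs, k=1)[0]
        print(f"run {run + 1}: ... its ability to {choice}")
    # Different runs can pick different words -- hence different essays.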

5. LLM (Large Language Model)

As discussed above, ChatGPT picks its next word based on probabilities. But where do those probabilities come from?

At the core of ChatGPT is a so-called “large language model” (LLM) that’s been built to do the job of estimating those probabilities.

LLMs are machine learning models that utilize deep learning algorithms to process and understand language (i.e., they are trained on immense amounts of data to learn language patterns so they can perform tasks).

Large Language Model Size

  • The graphic (using a logarithmic scale) shows how drastically models have grown in size in only four years, from BERT’s 340 million parameters to GPT-3’s 175 billion.
  • Parameters are a model’s internal variables that drive its decision-making, similar to neurons in the brain. The more parameters a model has, the higher the level of complexity and sophistication it is able to attain.

LLMs are composed of multiple layers of neural networks, which work together to analyze text and predict outputs.

The best-known example of LLMs is ChatGPT. Another popular example is BERT, or Bidirectional Encoder Representations from Transformers, which was developed by Google.
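
To get a feel for what parameter counts mean in practice, here is a back-of-the-envelope sketch: in a fully connected network, every connection between layers is a learnable weight (plus one bias per neuron), so the totals multiply quickly.

    # A back-of-the-envelope sketch of how parameter counts add up in a
    # fully connected network: weights between layers, plus one bias per neuron.
    def count_parameters(layer_sizes):
        total = 0
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
            total += n_in * n_out + n_out  # weights + biases
        return total

    print(count_parameters([784, 128, 10]))      # ~100 thousand: a small image classifier
    print(count_parameters([1024, 4096, 1024]))  # ~8 million, for just two layers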

6. Neural Networks

So how does ChatGPT model human language? The most popular — and successful — current approach uses what are known as neural networks.

Invented in the 1940s, neural networks are the functional units of deep learning, mimicking the behavior of the human brain to solve complex data-driven problems.

Ultimately, a neural net is a connected collection of idealized “neurons” — usually arranged in layers. Here’s an example:

Neural Network

What makes neural nets so useful is that they can be incrementally “trained from examples” to perform many different tasks.

7. Neural Network Layers

Neural Networks typically consist of three layers:

  • Input layer: The data that we feed to the model is loaded into the input layer from external sources like a CSV file or a web service. It is the layer in the Neural Network architecture that passes information from the outside world without any computation.
  • Hidden Layers: The hidden layers are what makes deep learning what it is today. They are intermediate layers that do all the computations and extract the features from the data.
  • Output layer: The output layer takes input from preceding hidden layers and comes to a final prediction based on the model’s learnings. It is the layer where we get the final result.

Neural Network Layers
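
Those three layers map directly onto code. Below is a minimal sketch in numpy, with random weights standing in for trained ones, of data flowing from the input layer, through one hidden layer, to the output layer.

    # A minimal sketch of the three layers above: input -> hidden -> output.
    # Random weights stand in for trained values.
    import numpy as np

    rng = np.random.default_rng(0)

    x = rng.random(4)  # input layer: 4 features arriving from the outside world

    W1, b1 = rng.random((8, 4)), rng.random(8)
    hidden = np.maximum(0, W1 @ x + b1)  # hidden layer: computation + feature extraction

    W2, b2 = rng.random((3, 8)), rng.random(3)
    scores = W2 @ hidden + b2
    prediction = np.exp(scores) / np.exp(scores).sum()  # output layer: final prediction

    print(prediction)  # three probabilities summing to 1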

8. Neural Network Training

How does neural net training actually work?

An important feature of neural nets is that — like computers in general — they’re ultimately just dealing with data.

The basic idea is to supply lots of “input → output” examples to “learn from” — and then to try to find weights that will reproduce these examples (ChatGPT uses 175 billion such weights).

Particularly over the past decade, there’ve been many advances in the art of training neural nets. And, yes, it is basically an art, as mostly things have been discovered by trial and error, adding ideas and tricks that have progressively built a significant lore about how to work with neural nets.

9. Language as Numbers

Neural nets — at least as they’re currently set up — are fundamentally based on numbers. So if we’re going to use them to work on something like text, we’ll need a way to represent text with numbers.

Ultimately we have to formulate everything in terms of numbers.

One way to do this is just to assign a unique number to each of the 50,000 or so common words in English. So, for example, “the” might be 914, and “cat” might be 3542 (these are the actual numbers used by GPT-2).

But we can actually go further than just characterizing individual words with numbers: we can also assign numerical representations to sequences of words, or indeed whole blocks of text. And inside ChatGPT, that’s how it deals with things.

It takes the text it’s got so far and generates an embedding vector to represent it. Then its goal is to find the probabilities for the different words that might occur next. It represents its answer as a list of numbers that essentially give the probability for each of the 50,000 or so possible words.
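
Here is a toy sketch of that pipeline: assign each vocabulary word an ID, represent the text so far as an embedding vector, and produce one probability per vocabulary word. A real system uses roughly 50,000 words and hundreds of embedding dimensions; this sketch uses five and four, with random numbers standing in for trained values.

    # A toy sketch of "language as numbers": word -> ID -> embedding vector
    # -> one probability per vocabulary word. Everything is scaled down and
    # the weights are random stand-ins for trained values.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
    rng = np.random.default_rng(0)

    embeddings = rng.random((len(vocab), 4))      # one 4-number vector per word
    output_weights = rng.random((len(vocab), 4))  # maps a text vector to vocab scores

    text = ["the", "cat", "sat", "on"]
    ids = [vocab[w] for w in text]                # the text as numbers
    text_vector = embeddings[ids].mean(axis=0)    # a crude embedding of the text so far

    scores = output_weights @ text_vector
    probs = np.exp(scores) / np.exp(scores).sum() # one probability per possible next word
    for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
        print(f"{word}: {p:.2f}")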

10. Data Data Data

Many of the practical challenges around neural nets — and machine learning in general — center on acquiring or preparing the necessary training data (recall the maxim: garbage in, garbage out).

But how much data do you actually need to show a neural net to train it for a particular task? Again, it’s hard to estimate from first principles.

But generally, neural nets need to “see a lot of examples” to train well. And at least for some tasks, it’s an important piece of neural net lore that the examples can be incredibly repetitive. And indeed it’s a standard strategy to show a neural net all the examples one has, over and over again.

But often, just repeating the same example over and over isn’t enough.

It’s also necessary to show the neural net variations of the example. And it’s a feature of neural net lore that those “data augmentation” variations don’t have to be sophisticated to be useful. Just slightly modifying images with basic image processing can make them essentially “as good as new” for neural net training.
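
A minimal sketch of that augmentation idea using numpy: a handful of basic transformations turns one training image into several “new” training examples.

    # A minimal sketch of basic data augmentation: simple transformations
    # turn one training image into several "as good as new" variants.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((28, 28))  # a stand-in for one training image

    augmented = [
        np.fliplr(image),                 # mirror horizontally
        np.roll(image, shift=2, axis=0),  # shift down by 2 pixels
        np.rot90(image),                  # rotate 90 degrees
        np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1),  # add mild noise
    ]
    print(f"1 original image -> {len(augmented)} extra training examples")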

11. Inside ChatGPT

OK, so we’re finally ready to discuss what’s inside ChatGPT.

And, yes, ultimately, it’s a giant neural net — currently a version of the so-called GPT-3 network with 175 billion weights. But it’s a neural net that’s particularly set up for dealing with language.

  • Recall that ChatGPT’s overall goal is to continue text in a “reasonable” way, based on what it’s seen from the training it’s had (which consists in looking at billions of pages of text from the web, etc.)
  • So at any given point, it’s got a certain amount of text — and its goal is to come up with an appropriate choice for the next token to add.

In the end, what we’re dealing with is a neural net made of “artificial neurons”, each doing the simple operation of taking a collection of numerical inputs and combining them with certain weights. It is rather remarkable that these operations, taken together, can somehow manage to do such a good “human-like” job of generating text.

12. Training ChatGPT

OK, so we’ve now given an outline of how ChatGPT works once it’s set up. But how did it get set up? How were all those 175 billion weights in its neural net determined?

Basically they’re the result of very large-scale training, based on a huge corpus of text — on the web, in books, etc. — written by humans. In practice, ChatGPT was successfully trained on a few hundred billion words of text.

But, OK, given all this data, how does one train a neural net from it?

Here’s the basic concept:

  • Start from a huge sample of human-created text from the web, books, etc.
  • Then train a neural net to generate text that’s “like this”.
  • And in particular, make it able to start from a “prompt” and then continue with text that’s “like what it’s been trained with”.

Ultimately, ChatGPT is ‘simply’ pulling out some “coherent thread of text” from the “statistics of conventional wisdom” that it has accumulated (i.e., its training data).

The majority of the effort in training ChatGPT is spent “showing it” large amounts of existing text from the web, books, etc. But it turns out there’s another — rather important — part.

A key idea in the construction of ChatGPT was to have another step after “passively reading” things like the web: to have actual humans actively interact with ChatGPT, see what it produces, and in effect give it feedback on “how to be a good chatbot”.

As soon as it’s finished its “raw training” from the original corpus of text it’s been shown, the neural net inside ChatGPT is ready to start generating its own text, continuing from prompts, etc.
