What is Generative AI?
Generative Artificial Intelligence (AI) can be broken down into two parts -
Generative - creating new content (for example: text, audio, video, code, images)
Artificial Intelligence - generating it automatically using a computer program
Generative AI is not a new concept; it has been around for a while, and you have probably been using it already.
Almost every one of us has used Google Translate, an example of generative AI, which was launched in 2006.
For iPhone fans, the example is Siri, launched in 2011. You can talk to Siri and ask it to call a friend, open an app, or tell you a joke; Siri talks back and performs the task.
The auto-complete feature that suggests the next word as you type on your phone is another example of generative AI.
What changed with the advent of ChatGPT in late 2022?
The AI examples above never seemed particularly impressive or intelligent to us as humans. Then something happened in November 2022: ChatGPT from OpenAI was launched, and everyone was talking about it. ChatGPT convinced many smart people that something revolutionary had been created:
A system that can generate text like humans do.
A system intelligent enough to take exams and pass them.
A system capable of solving mathematical problems.
A system that can read documents and answer your queries.
It took ChatGPT less than 2 months to cross the 100 million users mark; by comparison, Instagram took 30 months and Spotify took 55 months.
How do systems like ChatGPT actually work?
It seems almost magical to people how systems like ChatGPT, Gemini, Claude, or Mistral generate text responses.
The systems mentioned above are known as Large Language Models (LLMs). All of these LLMs work on the same principle, which is called language modeling.
What is Language Modeling?
A language model is a type of artificial intelligence designed to understand and generate human language. It can predict what word comes next in a sentence, generate text, answer questions, and perform other language tasks. This broader field is known as Natural Language Processing (NLP).
It assumes we have a sequence of words, i.e. the context. For example, if the context is "I want to", the language model will try to predict what comes next, which could be "play", "eat", "run", "dance", and further continuations.
Because of this, the text box where you write your messages/queries in ChatGPT or any other LLM is called a context window.
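To make this concrete, here is a minimal sketch of next-word prediction using the small open-source GPT-2 model via the Hugging Face transformers library (assuming transformers and torch are installed); ChatGPT and its peers do the same thing at a vastly larger scale:

```python
# pip install transformers torch  (assumed prerequisites)
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = "I want to"
input_ids = tokenizer.encode(context, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits  # a score for every word in the vocabulary

# Turn the scores at the last position into probabilities for the next word
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)

for prob, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {prob:.1%}")
```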
The example we have seen here is very simple, but imagine a machine that has seen a huge amount of text, almost all the text publicly available on the internet, and has learned which words follow which. It would take a human many lifetimes to read that much text.
Still, it sometimes fails, because LLMs predict the most likely answer and you may be expecting a less likely one.
Language models always predict answers as probabilities: if we show a model an image of a dog and ask it what it sees, it will answer something like 93% dog and 7% not a dog. What you see on the screen is the answer with the higher probability.
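As a toy illustration (all the numbers here are made up), this is how "most likely" versus "less likely" plays out when a model picks the next word:

```python
import random

# Toy next-word distribution for the context "I want to" (illustrative numbers)
next_word_probs = {"play": 0.40, "eat": 0.30, "run": 0.20, "dance": 0.10}

# Greedy decoding: always pick the most likely word
greedy = max(next_word_probs, key=next_word_probs.get)
print("Greedy pick:", greedy)  # always "play"

# Sampling: less likely words can still appear, which is why the
# model's answer is sometimes not the one you expected
words, probs = zip(*next_word_probs.items())
print("Sampled pick:", random.choices(words, weights=probs)[0])
```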
Whenever a new language model like ChatGPT comes out, there is always hype about how many billions of parameters it has. What are these parameters, and what do they do?
Think of the parameters as the knobs and dials of a language model that can be adjusted to make it work correctly.
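As an illustrative sketch, here is a "model" with only two parameters (knobs) being adjusted to fit some toy data; a large language model does the same kind of adjustment, just with billions of knobs (the names and numbers below are made up):

```python
# A "model" with just two parameters: a weight and a bias.
# Training nudges these knobs until predictions match the data.
w, b = 0.0, 0.0                   # the parameters (knobs)
data = [(1, 3), (2, 5), (3, 7)]   # inputs x and targets y (here, y = 2x + 1)
lr = 0.05                         # learning rate: how far each nudge goes

for _ in range(500):
    for x, y in data:
        error = (w * x + b) - y   # how wrong the current knobs are
        w -= lr * error * x       # nudge each knob against its share of the error
        b -= lr * error

print(f"w = {w:.2f}, b = {b:.2f}")  # close to 2 and 1
```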
Why Billions of Parameters?
It's mainly due to two reasons -
Complexity of Human Language: Human language is highly complex, with nuances, context, idioms, and many variations. To understand and generate human-like text, a language model needs to be very sophisticated. The more parameters it has, the more detailed and accurate it can be.
Learning from Data: A language model learns by analyzing vast amounts of text data. Each parameter helps the model understand and remember a small piece of this data. With billions of parameters, the model can capture a lot more detail from the training data.
Take the example of learning a new skill like playing the piano: initially, you need to remember multiple things - the position of the keys, finger movements, music notes, and how they all connect.
Now imagine you are trying to learn piano tunes not just for one song, but for all the songs that have ever been created. You'd need a lot more memory, more techniques, and many more fine-tuned adjustments to play all of them correctly.
The process of making such adjustments to a language model, i.e. training it for specific tasks, is known as 'fine-tuning' the LLM.
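Continuing the toy sketch from above: fine-tuning is just more of the same training, but starting from the already-learned parameters and using a smaller, task-specific dataset (again, the numbers are made up for illustration):

```python
# "Fine-tuning" the toy model: start from the knobs learned during
# pre-training, then keep training on a small task-specific dataset.
w, b = 2.0, 1.0                    # parameters learned during "pre-training"
task_data = [(1, 3.5), (2, 5.5)]   # new task: targets follow y = 2x + 1.5
lr = 0.05

for _ in range(500):
    for x, y in task_data:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error

print(f"fine-tuned: w = {w:.2f}, b = {b:.2f}")  # knobs nudged toward the new task
```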
What is prompting, and how does it help when interacting with these models?
Large language models have been trained on data that is publicly available on the internet, so they have been trained on a mix of good, bad, and average data.
If you simply ask the LLM a question straightforwardly, it will output an answer that may be satisfactory. But customizing the way you ask the question and interact with the model is the secret sauce for extracting the quality hidden deep down. The quality of the output depends on how you frame your questions, and this is an iterative process.
Below is a simple prompt -
Help me write an email asking to be assigned to the legal documents project.
A better version of the prompt -
Help me write an email asking to be assigned to the legal documents project.
I’m applying for a job on the legal documents project, which will check legal documents using LLMs.
I have ample experience prompting LLMs to generate accurate text in a professional tone. Write a paragraph of text explaining why my background makes me a strong candidate for this project and advocate for my candidacy.
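If you interact with a model through an API rather than a chat window, the prompt is sent the same way. Below is a minimal sketch using OpenAI's Python client (assuming the openai package is installed and an API key is set in the environment; the model name is just an example):

```python
# pip install openai  (assumed; requires the OPENAI_API_KEY environment variable)
from openai import OpenAI

client = OpenAI()

prompt = (
    "Help me write an email asking to be assigned to the legal documents project. "
    "I'm applying for a job on the legal documents project, which will check legal "
    "documents using LLMs. I have ample experience prompting LLMs to generate "
    "accurate text in a professional tone. Write a paragraph explaining why my "
    "background makes me a strong candidate for this project."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; any chat model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```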
You may take a look at the iterative process of prompting explained in Mastering the Iterative Approach, as well as OpenAI's comprehensive documentation on prompt engineering.
Subscribe to the newsletter for more breakdowns like this, and let me know if there are any questions or topics you would like to understand.