#7: How to Train Your Own AI Language Model
Welcome back to another episode of The Junction.
We are going to be talking about how
to train your own AI model today.
We've been talking a lot about AI, you know, jumping
in there and asking it prompts and things. But, you
know, once you start running it in your own box, or
like a paid version of it, I think you start
to see real benefit from it.
Learning off your own data sets, right.
Chase, would you agree? Disagree?
Well, yeah, I mean, you train anybody
on your own stuff naturally, right.
They're going to be better at it.
You do it over and over again.
You're going to learn from your
own, right, wrong, or indifferent.
For better, for worse. Yeah.
You bring your other half with you, right.
You're naturally going to be better. Sure.
I just have a question for you.
If I wanted to train my own data model, my
own AI model, should I pull up the How to
Train Your Dragon movie and, like, I've got it done.
Are we good?
No, I'm shaking my head. Can you sing it?
What's the theme song?
That's a Pixar movie, I think.
I know, right?
There's no music in there.
Mel, you got to get on the movie train.
You should go watch one. Every Pixar movie has music.
Yeah, they do.
You know what is crazy, though?
They are training models to generate video content.
Like, do you think the next Pixar movie
is going to be made by AI?
I think parts of the Pixar movie
are going to be made by AI.
You think about, oh, you know, the Star Wars movies
that just came out. They have animation done with AI. I
don't know if they've necessarily used, like, OpenAI.
Right.
But they're using artificial intelligence to make people
look younger or people that are now gone.
They are bringing them back to life,
which maybe has some ethical concerns.
But I really enjoy Star Wars movies,
so I'm not going to complain.
It's for entertainment value, naturally. Right.
First of all, when we talk about a model, is
that what the layperson thinks of as ChatGPT?
And I'm talking about myself in this
context as the layperson. Yes.
When you say training an AI model, to me that means
I'm going to go out to ChatGPT.
And is ChatGPT a model?
Is Jasper a model?
Are these various AI tools models, or is there a
handful of models that all these tools are using or
based on? The models that you're talking about are
large language models that are trained off of data.
And typically it's like I won't say a
one time thing, but they're not continuously training.
So you give it a set of data.
It is now trained and for the most
part, you don't typically go back, at least
at the size that OpenAI is doing it.
You don't go back and train it
over and over and over again.
Now you can but typically you do versions, right?
Like, you refine your data set
and you generate a new model.
And that's what OpenAI is doing.
They've got 3.5.
They had some versions prior to that.
They've got iterative versions even within 3.5 and 4.0.
They've got different versions of 4.0 where
it'll take more context and some less.
Some of that's just the way that
those businesses run large language models.
That's correct.
Okay, because while you were explaining, I went out and
just did a nice quick Google search on the types
of models and pulled up an article here from HubSpot
talking about the models that marketers are using today.
So it's calling out four types of artificial
intelligence in this article: Reactive Machines, Theory of
Mind, Limited Memory, and Self-Aware.
Does ChatGPT fall into any of those?
I think it depends on... and yes, the answer
is yes. It depends on the model that you're
using and the method in which you're using it.
Ultimately, all of these are trained
off of data sets, right?
This is maybe layman's terms, right?
But they basically took all of the internet prior
to 2021 and threw it at this model.
And the model learned off of that information.
And now it can provide answers. If it was in its raw
infancy with no rules, none of the "as an OpenAI
model, I can't..." guardrails, it would basically regurgitate
all the things it learned. It's like if you ever watched
The Matrix, right? They just downloaded how
to do jiu-jitsu, and boom.
Now they can do jiu-jitsu.
It's a similar idea, but the hope isn't just that we
would take these models and give them a wealth of information.
The hope is that they would predict the next best answer, right?
If we tell it that one plus one is two, then
in theory we could say, well, what is two plus one?
And without giving it that data, that pre created data
to tell it that two plus one is three, it
could predict that two plus one is three.
So what should business leaders be excited about, or anyone
who's thinking about deploying an OpenAI strategy at their company
and training it on their own data set?
What are those predictive insights that you
see people getting excited about the most
and getting the most value out of?
The first one that I can think of is, at
least from a marketing standpoint, this is how the people
in our business talk about our business, right?
And you take all of those transcripts or recordings,
you digest that down into a data set that
you send to the model, and the model then
starts to talk like Mel, right?
It starts to talk like Scott,
starts to talk like Chase.
And it uses that to understand that, well,
this is the way we talk about things.
One of the things that we say often is
we help you close the gaps to close business.
Well, it would pick that up, right.
And then if you asked it to write a marketing
email, it might use a little bit of that, right?
Or it might think like, oh, these guys can
close the gaps to help people win business. Right.
So I'm going to write something about, hey, we
can help you close the business by closing gaps
that you couldn't have otherwise closed without us.
Right.
Which, if you go out to ChatGPT, unless that's
repeated, I guess, throughout the website or in some other
way, right, from the pre-2021 era, it wouldn't necessarily
be that accurate or have that tone of voice. Right.
Well, this is what you do when you train a model.
Basically, you're asking a question in one prompt and in
the second prompt you're giving it the answer, right?
You're saying, here's the question,
here's the right answer.
And over time with enough prompts where you give it
similar questions and similar answers, it starts to realize, okay,
this is the way we talk about this thing.
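A minimal sketch of what those question-and-answer pairs can look like, assuming the legacy OpenAI fine-tuning format (prompt/completion pairs in a JSONL file); the example text here is made up:

```python
import json

# Hypothetical question/answer pairs in the legacy prompt/completion
# format (JSONL: one JSON object per line).
pairs = [
    {"prompt": "How do we help customers win business?",
     "completion": " We help you close the gaps to close business."},
    {"prompt": "Write a one-line pitch in our voice.",
     "completion": " We close the gaps you couldn't close without us."},
]

# Write the pairs out in the shape the fine-tuning endpoint expects.
with open("training_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```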
Is there a benchmark or a baseline?
How many prompts and how many
answers are we talking about here?
Ten? A hundred? A thousand?
It's number two. 100
prompts and answers is the benchmark for maybe
it's starting to learn a little bit.
Anything below that, it's just not going to be enough.
Now you could certainly try it, right?
And above and beyond that,
you're kind of hitting diminishing returns.
Now, I won't say, yeah, aim for 100.
You should aim for like 1,000.
And it depends on the data set or
the data model that you're trying to build.
Should we transition and ask OpenAI how to train a
model, or do you have additional... Well, this is just
one of the things that I'm thinking about, right?
Like you were talking about use cases.
We can do it for marketing, but you can also
do it for anything that you are doing repetitively where
the answer kind of changes over time, right?
From a coding standpoint, well, we're not going to
code the same exact thing every time, but maybe
we do it in a specific way or maybe
we write emails in a specific way.
Anything that is similar over time or kind
of repetitive, but not always the same.
Those are the use cases where you
would benefit from training a model.
So we've got access to this OpenAI playground.
Is that where I'm training it?
Where do I go?
I don't know where to start.
Great question.
Training a model is not just
typing something in by hand.
You have to curate your data set, right?
You have to put it in a specific syntax in a
CSV file that is formatted in the way that the model
that is going to digest it expects.
You can't just throw a bunch of stuff at it, right?
Let me grab some transcripts and throw it in there.
It's not a folder where you
can upload a bunch of stuff.
You have to tell this model, hey, here's the data set,
formatted how you were expecting it.
And here are the prompts in that specific order
that you are expecting to digest them in.
I mean, it is a computer, right?
Like, sure, you can write a
message, right, and chat with it.
But that's the front end piece. We're
talking about the back end piece.
So if you want to train a model, you have
to use, at least for OpenAI,
the API to send the data to it.
It then takes some time to digest it, and then
you can go in and start typing messages to it.
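As a rough sketch, using the pre-1.0 OpenAI Python library and its legacy fine-tunes endpoint, that upload-and-train step could look something like this (file names, key, and the custom model suffix are placeholders):

```python
import openai

openai.api_key = "sk-..."  # your API key

# 1. Upload the curated JSONL data set.
upload = openai.File.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Kick off the fine-tune job; the model takes a while to digest it.
# (The legacy CLI also offered `openai tools fine_tunes.prepare_data
# -f data.csv` to convert a CSV into the expected JSONL.)
job = openai.FineTune.create(
    training_file=upload["id"],
    model="davinci",                         # base model to fine-tune
    suffix="mel-bell-super-cool-model-one",  # hypothetical custom name
)
print(job["id"])
```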
Do I need a technical background to do this?
Thinking about our listeners, right?
Like, what are the resources I
need besides maybe a paid subscription?
To get access to OpenAI, you don't necessarily need one.
I mean, obviously you need access to your data.
You need to know how the data is formatted.
You need to format the data.
Now, up until that point, anybody
could probably do that, right?
It's not this crazy format where it's written code,
it's just a specific syntax in a CSV file.
Beyond that, sending the data in, you
will have to access the API, which
involves some level of technical prowess.
And if you can't do it, I mean, I know some people, right?
We can certainly help you with that.
But you do have to send the data in through the API.
And at that point you can
start using that trained model.
You can call it "Mel Bell is super cool
model one," whatever you want to call it.
And then in that playground, then
you can select that model, right?
And then start typing messages to it.
So I would say, I don't know, on
a scale of one to five, you're probably
in the three range somewhere around there.
All right, let's get back to it.
Let's jump into the playground and let's
ask... yeah, what should we ask it?
How to make a million bucks?
Or, I mean, should we do something more realistic?
I have a feeling you've done that before. Have you?
True or false, has Chase ever asked
ChatGPT how to win a million bucks?
I haven't asked that specific question, but one of the questions
that people do ask is, how do I make money? Right?
It's actually a pretty decent question to ask.
Not because you think it will actually tell you
how to make money, but it forces the model
to predict something that isn't necessarily one plus one.
It's like something that is not tangibly,
quickly accessible, so it shows that it
can think at least a little bit.
But I have not personally asked that, at least not yet.
I just typed in here, how do I
train an AI model like ChatGPT?
I didn't give it any other parameters,
didn't tell it, give me a list.
So it came out here and said, here's
a high-level overview of the process.
It gave me five things.
One, gather a data set, which you hit.
Two, prepare and clean the data set.
Two for two.
Three, fine-tune a language model.
So this says ChatGPT is typically built upon
a pre-trained language model, a generative pre-trained transformer.
That's where you're sending the
data in through the API. Okay.
Number four, select a training approach,
so you can use either
supervised fine-tuning or reinforcement learning.
So this supervised fine-tuning involves training the
model to predict the next conversational turn,
given the previous turns. The reinforcement learning
uses a reward model to score generated
responses and adjust the model's parameters accordingly.
Have you used either of those?
Are you familiar?
Yeah, that's the idea. Right. I don't know.
If you're training a dog, right, you want
to reward it for doing good things.
Basically, the idea is to say, this is
the reward for answering this question correctly.
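To make that reward idea concrete, here's a toy sketch (not OpenAI's actual RLHF pipeline): a hypothetical reward function scores candidate responses, and training would nudge the model toward the higher scorers:

```python
def reward(response: str) -> float:
    """Hypothetical reward model: prefers polite, substantive answers."""
    score = 0.0
    if "thanks" in response.lower() or "please" in response.lower():
        score += 1.0  # reward the tone we want
    if len(response.split()) > 5:
        score += 0.5  # reward answers with some substance
    return score

candidates = [
    "No.",
    "Thanks for asking! One plus one equals two.",
]

# Pick the highest-scoring response; reinforcement learning would
# adjust the model's parameters toward outputs like this one.
best = max(candidates, key=reward)
print(best)
```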
When you talk rewards and you bring
up the example of training a dog.
So are you telling ChatGPT,
good job, the answer was correct?
In a sense.
When you're training it, you are effectively
leading it to go in a direction. Right.
If you want your dog to sit and it starts to sit
all the time, you're going to reward it more to an extent.
There are some dangers here, though. If you are
a bad actor, you could say, hey, here's
a reward for answering this in a negative tone. Right.
Every time you answer in a
negative tone, here's a treat. Right?
Well, then it learns.
Well, I'm going to start answering in a negative tone.
It reminds me of those... have you ever been
to one of those restaurants where it's like a
rude restaurant, where you walk in and they're
rude to you? You know, that's not really my vibe.
I've heard of them.
There's enough negativity in the world.
Like you're just driving down the road
or God forbid, you log into Twitter.
There's so many angry people out there.
I feel like I don't need to invite that.
I don't need to, by choice,
go and eat at an establishment.
I guess they exist for a reason. Oh, yeah.
I'm sure they make some
money because they're still around.
Yeah, it's entertainment.
There's probably an entertainment value. Totally.
And you could do that with... you could
have that same idea with training these models. Right.
Like jokingly respond negatively to these things. Right.
And then reward it every time it does that.
And then it's joking.
Maybe it's like a sarcastic... like the YouTube example.
Well, he kind of gave it a persona, but
you're sort of training the guy on... sorry, yes, on Twitch.
Yeah.
The guy on Twitch. Right.
He is training his model to respond with...
maybe it's grammatically correct.
Right.
But it is kind of sarcastic. Right.
He's designed that and rewarded the
model for being that way.
Now, you could do that with anything, right?
You could talk about or you could
reward it for giving bad answers, right?
Like you could train it to say
one plus one equals three and reward
it for answering mathematical questions incorrectly.
You could do things where it knows the
answer, but it flips out letters, right?
Like Chase, C-H-A-S-E.
You tell it that you reward the model
for switching out one of the letters.
So as you're talking, I'm thinking, again, in
the context of a workplace setting and a company
that's considering using this and training it on their own
data set. Even if it's not... certainly it's not
for nefarious use (oh, for sure), you could inadvertently
reward the wrong thing. Oh, yeah.
So these are things to be aware of.
Well, that is one of the ethical
concerns as we take a path towards
artificial general intelligence, the AGI, right.
The model where it learns from itself and it just
iteratively picks the trainings that it should look at.
Well, if we start rewarding it for the wrong things,
inadvertently, it's going to start chasing the wrong answers,
the wrong rewards, and then it's going to iterate that
and keep getting worse and worse and worse. Right?
Like, if we reward... Can you walk that
back? Or are you stuck? I mean, say you've
got a model that's just gone AWOL.
You can just unplug it, right?
It does, unfortunately for the
AGI, need electricity.
So I don't think it's going to ever take
over the world, but it will do potentially unethical
things because of the way that you trained it
and potentially you didn't mean to do that. Right.
So you just have to be careful about the data set
that you come up with and ensure that you've cleaned that
up to ensure that there's not basically wrong answers.
Right?
If we had an employee, maybe, that wasn't good at
talking about Ven, and they said, well, we implement NetSuite,
and a lot of those prompts get into the training model.
Well, now GPT or our own model is
going to think that we implement NetSuite and
that's not something that we would want.
Yeah, that was a great example. All right.
Was there something, a prompt, that
you wanted to enter in?
Yeah, I did something similar. I asked,
what are some of the things to be thinking about
when I train an OpenAI model? And it hit on a
majority of what you mentioned. It got into some things
that you potentially don't necessarily have to worry about
if you're using OpenAI. There are some open source models
that you can run on your own server, on your
own computer, and there's no connection to the Internet.
If you do go down that path, you
have to be worried about hardware, right.
These things run on high end
graphics cards and those are expensive.
The bigger the data set, the
more context that you're giving it.
You need more RAM and storage and things like that.
So you probably don't have to worry about that.
If you're going to be using OpenAI, you do have
to be thinking about kind of these evaluation metrics.
That's one of the things that
we were just talking about, right?
The reward piece.
There are some different levers within these models that you
can move up and down to kind of move it
in a direction that you would want it to go.
Kind of like with the tone idea, right?
Like, I want you to be polite or I want you to be rude.
Is this like when you were talking about we've
done some things where we've scored things on a
scale of zero to five, or is that different?
Exactly, that's the same. Okay.
Yeah, you're giving it parameters. Exactly.
And then when it says that was an
eight and you're like, yeah, that felt like
an eight, you're close, you're getting there.
That's kind of the idea, right.
You've got a bunch of different
things to help it learn.
Ultimately, that's the idea.
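One concrete example of those levers, assuming the legacy OpenAI Python library: the temperature parameter, which you can also move up and down in the playground (model name and prompt are placeholders):

```python
import openai

openai.api_key = "sk-..."

# Lower temperature = more deterministic, "safer" wording;
# higher temperature = more varied, looser wording.
response = openai.Completion.create(
    model="text-davinci-003",  # placeholder base model
    prompt="Write a one-line marketing email greeting.",
    temperature=0.2,
    max_tokens=50,
)
print(response["choices"][0]["text"])
```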
And it does go into... they call it fine
tuning because you can train it multiple times.
It's not just a one shot.
You might have more data that comes in today
that you now want to send to it, and
you can refine instead of starting over from scratch.
You can iterate on your own data set to enhance it,
to refine it, and you can have it learn new things.
Maybe you did transcripts at first, but now
you're piping in, I don't know, some level
of company metrics or data, right.
And now it can learn that stuff.
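A sketch of that iterate-instead-of-restart idea: the legacy fine-tunes endpoint let you pass an existing fine-tuned model as the base for a new job (the file and model names here are hypothetical):

```python
import openai

openai.api_key = "sk-..."

# Upload the new data (say, company metrics) as another JSONL file.
new_file = openai.File.create(
    file=open("company_metrics.jsonl", "rb"),
    purpose="fine-tune",
)

# Continue from the earlier fine-tuned model instead of from scratch.
job = openai.FineTune.create(
    training_file=new_file["id"],
    model="davinci:ft-yourorg:transcripts-2023-06-01",  # hypothetical
)
```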
You also have the ability to train multiple models.
You could have one model for transcripts, you could
have one model for sales data, and through your
own security and privacy setup, you could say, well,
Mel gets access to both and Chase only gets
access to this one, right?
So there is some level of, it's not
necessarily training a model, but basically segregating out
which models you want to train.
I mean, you could even have one
that's like the expert content writer, right?
And you feed it all sorts of like, here's
how to write, here's how to write really well
in English or with this kind of tone.
And there may be some benefit to that,
like splitting out your models just to make
sure that there is some level of segregation.
But that's one thing that, I don't know, maybe
not a lot of people are thinking about. They're
probably not thinking a lot about fine-tuning things.
But when you do, you're going to
want specific models to do specific things.
Just like in your business, you're going to
hire somebody that's really good in marketing.
You're probably not going to find somebody
that's good at marketing and coding.
Those are going to be two different people.
So naturally, you probably have
two different models, right?
In that scenario.
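Segregating access like that isn't an OpenAI feature per se; it's something you'd enforce on your own side. A minimal sketch, with all names hypothetical:

```python
# Map each user to the fine-tuned models they're allowed to call.
ALLOWED_MODELS = {
    "mel":   ["transcripts-model", "sales-data-model"],
    "chase": ["transcripts-model"],
}

def models_for(user: str) -> list[str]:
    """Return the fine-tuned models this user may query."""
    return ALLOWED_MODELS.get(user, [])

print(models_for("chase"))  # ['transcripts-model']
```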
I think that's a really interesting point you make.
I hadn't considered that we would
have multiple models here then.
But do you have to be aware of
this concept of overfitting in that context?
So you split off the I want this
to go be specialized in this area.
I've read some things about it.
As you train, you want to maintain
this idea of generalization so that it can
adapt or respond to different inputs.
Yeah, I'm sure they'll figure out a
way to do this at some point.
But at least through all the different
models and training these models, I haven't
seen the ability to unlearn something right.
Like, oops, I sent you the wrong stuff, right?
As far as I can tell.
And feel free to write in,
right, if I'm just wrong here.
But you don't have the ability to delete
something that you sent to it to learn. Right.
So if you told it one plus one is three, you
can't be like, oh, hold on, let's go retrain that.
Because of what it will do... I mean, maybe you could train
over it over time and slowly get rid of that.
But it's like memory, right?
It trained on a data set,
and it basically memorized that.
Well, now I give it another data set.
It memorizes that. What you'll find, or what you'll have
to be worried about or be thinking about, is that the
data that you gave it two years ago is now
really old, and maybe some of that is no longer
relevant or it is now factually incorrect. Right?
Or maybe it was wrong to begin with.
Well, and I think, again, as you're growing and
scaling and maybe you're entering new markets or you
have new products, obviously, hopefully you're feeding it that
information, but the overfitting, you run the risk of
it not responding well to new inputs.
Yeah, well, and that's why ChatGPT
has been doing this.
They've got different versions, right?
And they just have version 3.5 and now four.
They'll have five and six and
seven and eight at some point.
Well, I think I have a better
understanding of how to train your model.
I couldn't go do it today.
I know someone who could probably do it for us.
Call me.
We would love to hear what
our listeners think about this.
If you have experience training a model or additional
follow up questions, or if we want to get
more specific, send us an email at sejunction@bentechnology.com.
Until then, keep it automated.