#7: How to Train Your Own AI Language Model
E7


Welcome back to another episode of The Junction.

We are going to be talking about how

to train your own AI model today.

We've been talking a lot about AI, you know, jumping

in there and asking it prompts and things, but, you

know, once you start jumping in your own box or

like a paid version of it, I think you start

to see real benefit from it.

Learning off your own data sets, right.

Chase, would you agree? Disagree?

Well, yeah, I mean, you train anybody

on your own stuff naturally, right.

They're going to be better at it.

You do it over and over again.

You're going to learn from your

own right, wrong or indifferent.

For better, for worse. Yeah.

You bring your other half with you, right.

You're naturally going to be better. Sure.

I just have a question for you.

If I wanted to train my own data model, my own AI model, should I pull up the How to Train Your Dragon movie?

And like, I've got it done, are we good?

No, I'm shaking my head. Can you sing it?

What's the theme song?

That's a Pixar movie, I think.

I know, right?

There's no music in there.

Mel, you got to get on the movie train.

You should go talk to... every Pixar movie has music.

Yeah, they do.

You know what is crazy, though?

There are training models to generate video content.

Like, do you think the next Pixar movie

is going to be made by AI?

I think parts of the Pixar movie

are going to be made by AI.

You think about, oh, you know, the Star Wars movies that just came out, they have animated with AI.

I don't know if they've necessarily used, like, OpenAI.

Right.

But they're using artificial intelligence to make people

look younger or people that are now gone.

They are bringing them back to life,

which maybe has some ethical concerns.

But I really enjoy Star Wars movies,

so I'm not going to complain.

It's for entertainment value, naturally. Right.

First of all, when we talk about a model, is that what the layperson thinks of as ChatGPT?

And I'm talking about myself as the layperson in this context.

When you say training an AI model, to me, that means I'm going to go out to ChatGPT.

And is ChatGPT a model?

Is Jasper a model?

Are these various AI tools models, or is there a handful of models that all these tools are using or based on?

The models that you're talking about, these are large language models that are trained off of data.

And typically it's like I won't say a

one time thing, but they're not continuously training.

So you give it a set of data.

It is now trained and for the most

part, you don't typically go back, at least

at the size that OpenAI is doing it.

You don't go back and train it

over and over and over again.

Now, you can, but typically you do versions, right?

Like, you refine your data set

and you generate a new model.

And that's what OpenAI is doing.

They've got 3.5.

They had some versions prior to that.

They've got iterative versions even within 3.5 and 4.0.

They've got different versions of 4.0 where

it'll take more context and some less.

Some of that's just the way that those businesses run large language models.

That's correct.

Okay, because while you were explaining, I went out and

just did a nice quick Google search on the types

of models and pulled up an article here from HubSpot

talking about the models that marketers are using today.

So it's calling out four types of artificial intelligence in this article: Reactive Machines, Theory of Mind, Limited Memory, and Self-Aware.

Does ChatGPT fall into any of those?

I think it depends on and yes, the answer

is yes, it depends on the model that you're

using and the method that you're using it.

Ultimately, all of these are trained

off of data sets, right?

This is maybe layman's terms, right? But they basically took all of the Internet prior to 2021 and threw it at this model.

And the model learned off of that information.

And now it can provide answers.

If it was in its raw infancy, with no rules, none of the "as an OpenAI model, I can't..." guardrails, it would basically regurgitate all the things it learned.

It's like if you ever watch The Matrix, right? They just downloaded how to do jiu-jitsu, and boom, now they can do jiu-jitsu.

It's a similar idea, but the hope isn't just that we would take these models and give them a wealth of information; it's that they would predict the next best answer, right?

If we tell it that one plus one is two, then in theory we could say, well, what is two plus one?

And without giving it that data, that pre created data

to tell it that two plus one is three, it

could predict that two plus one is three.

So what should business leaders be excited about, or anyone

who's thinking about deploying an OpenAI strategy at their company

and training it on their own data set?

What are those predictive insights that you

see people getting excited about the most

and getting the most value out of?

The first one that I can think of is, at

least from a marketing standpoint, this is how the people

in our business talk about our business, right?

And you take all of those transcripts or recordings,

you digest that down into a data set that

you send to the model, and the model then

starts to talk like Mel, right?

It starts to talk like Scott,

starts to talk like Chase.

And it uses that to understand that, well,

this is the way we talk about things.

One of the things that we say often is we help you close the gaps to close business.

Well, it would pick that up, right.

And then if you asked it to write a marketing

email, it might use a little bit of that, right?

Or it might think like, oh, these guys can

close the gaps to help people win business. Right.

So I'm going to write something about, hey, we

can help you close the business by closing gaps

that you couldn't have otherwise closed without us.

Right.

Which, if you go out to ChatGPT, unless that's repeated, I guess, throughout the website or in some other way, right, from the pre-2021 era, it wouldn't necessarily be that accurate or have that tone of voice. Right.

Well, this is what you do when you train a model.

Basically, you're asking a question in one prompt and in

the second prompt you're giving it the answer, right?

You're saying, here's the question,

here's the right answer.

And over time with enough prompts where you give it

similar questions and similar answers, it starts to realize, okay,

this is the way we talk about this thing.
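As a sketch of what those question-and-answer pairs can look like (the pairs below are invented for illustration, not the hosts' actual data), each prompt/completion pair becomes one JSON record, one per line:

```python
import json

# Hypothetical question/answer pairs distilled from company transcripts.
training_pairs = [
    {"prompt": "How do we describe what we do for clients?",
     "completion": "We help you close the gaps to close business."},
    {"prompt": "How do we talk about winning new business?",
     "completion": "We close the gaps you couldn't have otherwise closed without us."},
]

# One JSON object per line (JSONL) is the layout OpenAI's fine-tuning
# tooling has historically expected.
training_file = "\n".join(json.dumps(pair) for pair in training_pairs)
print(training_file)
```

With enough of these similar questions and answers, the model starts to pick up the house voice.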

Is there a benchmark or a baseline?

How many prompts and how many

answers are we talking about here?

Ten? A hundred? A thousand?

Somewhere around 100 prompts and answers is the benchmark for, maybe, it's starting to learn a little bit.

Anything below that, it's just not going to be enough.

Now you could certainly try it, right?

And above and beyond that, you're kind of at diminishing returns.

Now, I won't say, yeah, aim for 100.

You should aim for like 1,000.

And it depends on the data set or

the data model that you're trying to build.

Should we transition and ask OpenAI how to train a model, or do you have additional thoughts?

Well, this is just one of the things that I'm thinking about, right?

Like you were talking about use cases.

We can do it for marketing, but you can also

do it for anything that you are doing repetitively where

the answer kind of changes over time, right?

From a coding standpoint, well, we're not going to

code the same exact thing every time, but maybe

we do it in a specific way or maybe

we write emails in a specific way.

Anything that is similar over time or kind

of repetitive, but not always the same.

Those are the use cases where you

would benefit from training a model.

So we've got access to this OpenAI playground.

Is that where I'm training it?

Where do I go?

I don't know where to start.

Great question.

Training a model is not just

typing something in by hand.

You have to curate your data set, right?

You have to put it in a specific syntax, in a CSV file that is formatted in the way that the model that is going to digest it expects.

You can't just throw a bunch of stuff at it, right?

Let me grab some transcripts and throw it in there.

It's not a folder where you

can upload a bunch of stuff.

You have to tell this model, hey, here's the data set, in the format you were expecting it.

And here are the prompts, in that specific order that you are expecting to digest them in.

I mean, it is a computer, right?

Like, sure, you can write a

message, right, and chat with it.

But that's the front end piece we're

talking about, the back end piece.

So if you want to train a model, you have

to use at least for OpenAI, you have to use

the API to send the data to it.

It then takes some time to digest it, and then

you can go in and start typing messages to it.
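A minimal sketch of what that back-end step can look like, assuming OpenAI's public fine-tuning endpoint (`/v1/fine_tuning/jobs`); the file id, model name, and key here are placeholders, and the exact payload shape may differ by API version:

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"

def build_job_payload(training_file_id: str, base_model: str) -> dict:
    # The id of an already-uploaded JSONL training file, plus the base
    # model you want to fine-tune on top of.
    return {"training_file": training_file_id, "model": base_model}

def create_fine_tune_job(api_key: str, training_file_id: str,
                         base_model: str = "gpt-3.5-turbo") -> dict:
    payload = json.dumps(build_job_payload(training_file_id, base_model)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/fine_tuning/jobs",
        data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    # The job runs asynchronously; the response includes a job id you poll
    # until the model has finished digesting the data.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Once the job reports success, the trained model shows up under your account and you can start chatting with it.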

Do I need a technical background to do this?

Thinking about our listeners, right?

Like, what are the resources I

need besides maybe a paid subscription?

To get access to OpenAI, you don't necessarily need... I mean, obviously you need access to your data.

You need to know how the data is formatted.

You need to format the data.

Now, up until that point, anybody

could probably do that, right?

It's not this crazy format where it's written code,

it's just a specific syntax in a CSV file.
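For illustration, turning a simple two-column CSV of prompts and answers into that training format might look like this (the column names and rows are assumptions, not a prescribed schema):

```python
import csv
import io
import json

# A stand-in for a curated CSV file: one prompt column, one completion column.
raw_csv = """prompt,completion
How do we help clients win?,We help you close the gaps to close business.
What should a follow-up email sound like?,Friendly and focused on closing gaps.
"""

# Read each CSV row and re-emit it as a JSON line for the training file.
records = [
    {"prompt": row["prompt"], "completion": row["completion"]}
    for row in csv.DictReader(io.StringIO(raw_csv))
]
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```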

Beyond that, sending the data in, you will have to access the API, which involves some level of technical prowess.

And if you can't do that, I mean, I know some people, right?

We can certainly help you with that.

But you do have to send the data in through the API.

And at that point you can

start using that trained model.

You can call it "Mel Bell Is Super Cool Model One," whatever you want to call it.

And then in that playground, then

you can select that model, right?

And then start typing messages to it.
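Once training finishes, the provider gives the model its own id and you select it like any other model; a sketch, with a made-up id of the general shape OpenAI uses:

```python
def build_chat_payload(model_id: str, user_message: str) -> dict:
    # model_id is whatever id was assigned to your fine-tuned model; the
    # example value below is invented to show the general shape.
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_payload(
    "ft:gpt-3.5-turbo:acme:mel-bell-is-super-cool-model-one:abc123",
    "Write a short marketing email in our voice.",
)
```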

So I would say, I don't know, on

a scale of one to five, you're probably

in the three range somewhere around there.

All right, let's get back to it.

Let's jump into the playground and let's ask... yeah, what should we ask it?

How to make a million bucks?

Or, I mean, should we do something more realistic?

I have a feeling you've done that before. Have you?

True or false, has Chase ever asked ChatGPT how to win a million bucks?

I haven't asked that specific question, but one of the questions

that people do ask is, how do I make money? Right?

It's actually a pretty decent question to ask.

Not because you think it will actually tell you

how to make money, but it forces the model

to predict something that isn't necessarily one plus one.

It's like something that is not tangibly, quickly accessible, so it shows that it can think at least a little bit.

But I have not personally asked that, at least not yet.

I just typed in here, how do I train an AI model like ChatGPT?

I didn't give it any other parameters,

didn't tell it, give me a list.

So it came out here and said, here's

a high level overview of the process.

Give me five things.

Gather a data set, which you hit,

prepare and clean the data set.

Two for two.

Fine tune a language model.

So this says ChatGPT is typically built upon a pre-trained language model, a Generative Pre-trained Transformer.

That's where you're sending the

data in through the API. Okay.

Number four, select a training approach

so you can use either the

supervised fine tuning or reinforcement learning.

So this supervised fine tuning involves training the

model to predict the next conversational turn.

Given the previous turns, the reinforcement learning

uses a reward model to score generated

responses and adjust the model's parameters accordingly.

Have you used either of those?

Are you familiar?

Yeah, that's the idea. Right. I don't know.

If you're training a dog, right, you want

to reward it for doing good things.

Basically, the idea is to say, this is

the reward for answering this question correctly.
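A toy illustration of that reward idea, with a deliberately simple, made-up reward function standing in for a learned reward model; in real reinforcement learning the score would adjust the model's parameters, but ranking candidates shows what "rewarding good answers" means:

```python
def reward(response: str) -> float:
    # A hand-written stand-in for a learned reward model: favor polite,
    # on-brand responses and penalize flat negative ones.
    score = 0.0
    if "close the gaps" in response:
        score += 1.0
    if "thanks" in response.lower():
        score += 0.5
    if response.lower().strip() == "no":
        score -= 1.0
    return score

candidates = [
    "No",
    "Thanks for asking! We help you close the gaps to close business.",
    "We sell stuff.",
]

# Pick the highest-scoring candidate, i.e. the one that earns the treat.
best = max(candidates, key=reward)
print(best)
```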

When you talk rewards and you bring

up the example of training a dog.

So are you telling ChatGPT, good job, the answer was correct?

In a sense.

When you're training it, you are effectively

leading it to go in a direction. Right.

If you want your dog to sit and it starts to sit

all the time, you're going to reward it more to an extent.

There are some dangers here, though. If you are a bad actor, you could say that, hey, here's a reward for answering this in a negative tone. Right.

Every time you answer in a

negative tone, here's a treat. Right?

Well, then it learns.

Well, I'm going to start answering in a negative tone.

It reminds me of those have you ever been

to one of those restaurants where it's like a

rude restaurant where you walk in and they're like,

you know, that's not really my vibe.

I've heard of them.

There's enough negativity in the world.

Like you're just driving down the road

or God forbid, you log into Twitter.

There's so many angry people out there.

I feel like I don't need to invite that.

I don't need to, by choice, go and eat at an establishment like that.

I guess they exist for a reason. Oh, yeah.

I'm sure they make some

money because they're still around.

Yeah, it's an entertainment.

There's probably an entertainment value. Totally.

And you could do that with you could

have that same idea with training these models. Right.

Like jokingly respond negatively to these things. Right.

And then reward it every time it does that.

And then it's joking.

Maybe it's like a sarcastic... like the YouTube example.

Well, he kind of gave it a persona, but you're sort of training... the guy on, sorry, yes, on Twitch.

Yeah.

The guy on Twitch. Right.

He is training his model to respond with, maybe it's grammatically correct, right, but it is kind of sarcastic. Right.

He's designed that and rewarded the model for being that way.

Now, you could do that with anything, right?

You could talk about or you could

reward it for giving bad answers, right?

Like you could train it to say

one plus one equals three and reward

it for answering mathematical questions incorrectly.

You could do things where it knows the

answer, but it flips out letters, right?

Like chase C-H-A-S-E.

But tell it that you reward the model

for switching out one of the letters.

So as you're talking, I'm thinking, again, in the context of a workplace setting and a company that's considering using this and training on their own data set.

Even if it's certainly not for nefarious use, oh, for sure, you could inadvertently reward the wrong thing. Oh, yeah.

So these are things to be aware of.

Well, that is one of the ethical

concerns as we take a path towards

artificial general intelligence, the AGI, right.

The model where it learns from itself and it just

iteratively picks the trainings that it should look at.

Well, if we start rewarding it for the wrong things, inadvertently, it's going to start chasing the wrong answers, the wrong rewards, and then it's going to iterate that and keep getting worse and worse and worse. Right?

Like, if we reward... can you walk that back, or are you stuck with a model that's just gone AWOL?

You can just unplug it, right?

It does, unfortunately for the

AGI, right, need electricity.

So I don't think it's going to ever take

over the world, but it will do potentially unethical

things because of the way that you trained it

and potentially you didn't mean to do that. Right.

So you just have to be careful about the data set that you come up with and ensure that you've cleaned it up, to ensure that there aren't, basically, wrong answers. Right?

If we put in, I don't know... if we had an employee that wasn't good at talking about Ven, and they said, well, we implement NetSuite, and a lot of those prompts get into the training model.

Well, now GPT or our own model is

going to think that we implement NetSuite and

that's not something that we would want.

Yeah, that was a great example. All right.

Was there something a prompt that

you wanted to enter in?

Yeah, I did something similar. I did, what are some of the things to be thinking about when I train an OpenAI model?

And it hit on a majority of what you mentioned. It got into some things that you potentially don't necessarily have to worry about.

If you're using OpenAI, there are some open source models

that you can run on your own server, on your

own computer, and there's no connection to the Internet.

If you do go down that path, you

have to be worried about hardware, right.

These things run on high end

graphics cards and those are expensive.

The bigger the data set, the

more context that you're giving it.

You need more RAM and storage and things like that.

So you probably don't have to worry about that.

If you're going to be using OpenAI, you do have

to be thinking about kind of these evaluation metrics.

That's one of the things that

we were just talking about, right?

The reward piece.

There are some different levers within these models that you

can move up and down to kind of move it

in a direction that you would want it to go.

Kind of like with the tone idea, right?

Like, I want you to be polite or I want you to be rude.

Is this like when you were talking about we've

done some things where we've scored things on a

scale of zero to five, or is that different?

Exactly, that's the same. Okay.

Yeah, you're giving it parameters. Exactly.

And then when it says that was an

eight and you're like, yeah, that felt like

an eight, you're close, you're getting there.

That's kind of the idea, right.

You've got a bunch of different

things to help it learn.

Ultimately, that's the idea.

And it does go into, they call it fine tuning, because you can train it multiple times.

It's not just a one shot.

You might have more data that comes in today

that you now want to send to it, and

you can refine instead of starting over from scratch.

You can iterate on your own data set to enhance it,

to refine it, and you can have it learn new things.

Maybe you did transcripts at first, but now you're piping in, I don't know, some level of company metrics or data, right?

And now it can learn that stuff.
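Sketching that refinement step, a follow-up job can start from the previously fine-tuned model instead of the stock base model, assuming the provider supports fine-tuning on top of a fine-tuned model; the ids below are made up:

```python
def build_followup_job(new_training_file_id: str, previous_model_id: str) -> dict:
    # Pass the model you already trained instead of the base model, so the
    # new data refines it rather than starting over from scratch.
    return {"training_file": new_training_file_id, "model": previous_model_id}

job = build_followup_job("file-company-metrics",
                         "ft:gpt-3.5-turbo:acme:transcripts:v1")
print(job["model"])
```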

You also have the ability to train multiple models.

You could have one model for transcripts, you could

have one model for sales data, and through your

own security and privacy setup, you could say, well,

Mel gets access to both and Chase only gets

access to this one, right?

So there is some level of, it's not necessarily training a model, but basically segregating out which models you want to train.
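That per-person segregation doesn't come from the model itself; it's something you'd enforce in your own tooling, for example with a simple allow-list (the names and model ids here are illustrative only):

```python
# Which fine-tuned models each person is allowed to query.
MODEL_ACCESS = {
    "mel": {"transcripts-model", "sales-data-model"},
    "chase": {"transcripts-model"},
}

def can_use(user: str, model: str) -> bool:
    # Check the request against the allow-list before forwarding it.
    return model in MODEL_ACCESS.get(user, set())
```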

I mean, you could even have one

that's like the expert content writer, right?

And you feed it all sorts of like, here's

how to write, here's how to write really well

in English or with this kind of tone.

And there may be some benefit to that,

like splitting out your models just to make

sure that there is some level of segregation.

But that's one thing that, I don't know, maybe not a lot of people are thinking about. They're probably not thinking a lot about fine tuning things.

But when you do, you're going to

want specific models to do specific things.

Just like in your business, you're going to

hire somebody that's really good in marketing.

You're probably not going to find somebody

that's good at marketing and coding.

Those are going to be two different people.

So naturally, you probably have

two different models, right?

In that scenario.

I think that's a really interesting point you make.

I hadn't considered that we would

have multiple models here then.

But do you have to be aware of

this concept of overfitting in that context?

So you split off the, I want this to go be specialized in this area.

I've read some things about it.

As you train, you want to maintain this idea of generalization so that it can adapt or respond to different inputs.

Yeah, I'm sure they'll figure out a

way to do this at some point.

But at least through all the different

models and training these models, I haven't

seen the ability to unlearn something, right?

Like, oops, I sent you the wrong stuff, right?

As far as I can tell.

And feel free to write in, right, if I'm just wrong here.

But you don't have the ability to delete something that you sent to it to learn. Right.

So if you told it one plus one is three, you

can't be like, oh, hold on, let's go retrain that.

Because what it will do, I mean, maybe you could train

over it over time and slowly get rid of that.

But it's like memory, right?

It trained on a data set,

and it basically memorized that.

Well, now I give it another data set.

It memorizes that.

What you'll find, or what you'll have to be worried about or be thinking about, is the data that you gave it two years ago is now really old, and maybe some of that is no longer relevant, or it is now factually incorrect. Right?

Or maybe it was wrong to begin with.

Well, and I think, again, as you're growing and

scaling and maybe you're entering new markets or you

have new products, obviously, hopefully you're feeding it that

information, but the overfitting, you run the risk of

it not responding well to new inputs.

Yeah, well, and that's why ChatGPT has been doing this.

They've got different versions, right?

And they just have version 3.5 and now four.

They'll have five and six and

seven and eight at some point.

Well, I think I have a better

understanding of how to train your model.

I couldn't go do it today.

I know someone who could probably do it for us.

Call me.

We would love to hear what

our listeners think about this.

If you have experience training a model or additional

follow up questions, or if we want to get

more specific, send us an email at sejunction@bentechnology.com.

Until then, keep it automated.


Creators and Guests

Chase Friedman, Host
I'm obsessed with all things automation & AI

Mel Bell, Host
Marketing is my super power