Use Machine Learning to Score Leads in Google Sheets

Welcome to part two of our little series on using a Bayesian classifier to score leads. Let’s smash this.

Last time we delved into a little background stuff – what it is to score leads and where most techniques fall short; the skinny on Bayes’ theorem; and how this neat thing called the naïve Bayes classifier fits into it all. Today we’re going to look at doing a sort of proof-of-concept implementation of the classifier to score leads in Google Sheets.

Come. Let us make it so.

Step 1

First things first! We need a bunch of data with which to train our lead-scoring algorithm. Let’s do that. Below you’ll see we’ve got an embedded Google Sheets, um, sheet with some particulars on it.

It should be mostly self-explanatory, but in case it isn’t, here’s a breakdown of what the headings mean:

company – three guesses
trial member? – has the lead signed up to our 30-day trial?
content subscriber? – has the lead signed up to our blog community?
attended a webinar? – has the lead ever viewed one of our free online webinars?
attended an event? – has the lead been to one of our little meetups?
state – is the deal open, closed or lost?

As you can no doubt see, we’ve populated 20 rows under each heading with some dummy data.

Step 2

To be able to score leads ourselves, we need to show the sheet how to interpret this training data. We’re gonna start real simple – just counting how many times the state column is marked sale, loss or pending.

To do this we’ll need a simple Google Sheets function called COUNTIF, which counts the occurrence of a thing in a range of cells. So create three headings: sales made, sales lost and sales pending. Alongside sales made, you want the following formula:

=COUNTIF(F2:F21,”sale”)

This tells Google Sheets to look in the state column and count the number of cells containing the word ‘sale’. Simple. Next, do the same for the sales lost heading:

=COUNTIF(F2:F21,”loss”)

And then the sales pending heading:

=COUNTIF(F2:F21,”pending”)

Nice. Now we’re going to do something even more straightforward: we’re going to figure out the chances of encountering each state.

Under sales made, write another heading: probability of sale. Under the other headings write probability of loss and probability of pending.

Now, alongside probability of sale, use the following formula:

=B1/(B1+D1+F1)

This simply checks the ratio of sales made against the total number of states. Now you know the drill – repeat this part for the remaining headings.

You should now have something like the following:

Step 3

Now we’re going to calculate the odds that a given category is associated with a sale, a loss or a pending state. We’re basically going to do this by counting all the times, for example, leads have attended demos and ended up buying (but not counting if only one of these conditions is true).

You’re aiming for a section like this:

For this bit we’ll be using a similar spreadsheet function to the one we used in the last step – this time COUNTIFS. This version of the function takes as many arguments (that’s the term for the gumpf inside the brackets) as you like, and only updates the count if all the conditions match.

Let’s do the trial member row. Start with the sale column:

=(COUNTIFS(B2:B21,B24,J2:J21,”sale”)+1)/(B33+1)

What is this doing? First of all it’s looking through the trial member column of the training set, looking for a match to whatever is in cell B24 – in this case ‘no’. It then looks through the state column to find any occurrences of ‘sale’. It then divides that by the contents of cell B33, which if you recall contains the count of the total number of sales in the training set.

In other words, how often did a trial member end up becoming a sale, as a proportion of total sales? You can type in ‘yes’ or ‘no’ into the conditions column and it’ll update the probability accordingly.

(You’re probably wondering why we’re adding 1 to both sides of the equation. This is called additive smoothing, and it’s used to avoid tricky situations when a count comes back as zero. You may take our word for it that dividing by zero is a bad way to go.)

Okay, let’s carry on. Now do the loss column:

=(COUNTIFS(B2:B21,B24,J2:J21,”loss”)+1)/(D33+1)

And finally the pending column:

=(COUNTIFS(B2:B21,B24,J2:J21,”pending”)+1)/(F33+1)

You’re doing great, by Jove. Just repeat this for all the remaining categories and you’re swimming.

Step 4

Guys guys we’re nearly there. We’ve got all the individual probabilities worked out. Now we just need to chain them together to get our combined probabilities.

Which is to say, we’re going to use multiplication to figure out the answer to (for examps): ‘Given that a lead is not a trial member, has subscribed to our content, has attended a webinar, and has not attended an event, should we hard-sell them, rescue them, or just keep nurturing them?’

Whatever conditions we plug into our spreadsheet will give us an updated answer based on our training data. Neat.

Our next section is going to look like this:

What’s happening here? We’re essentially working out the probability of each state given a set of conditions, then simply picking the option with the highest probability. We then have the sheet make make a recommendation to the sales and marketing teams based on that outcome.

Here’s what the formula for probability of sale looks like:

=C2*C3*C4*C5*B34

And here’s probability of loss:

=D2*D3*D4*D5*D34

Finally, probability of pending:

=E2*E3*E4*E5*F34

As you can see, we’re just multiplying all the probabilities under each state together. It’s frankly ridiculous how simple this is.

The last cell looks a little tricky:

=IF(B1>D1,IF(B1>F1,”SELL”,”NURTURE”),IF(D1>F1,”RESCUE”,”NURTURE”))

But all it’s saying is, if the highest probability is making a sale, recommend ‘SELL’. If the highest probability is loss, recommend ‘RESCUE’. Finally, if the highest probability is pending, recommend ‘NURTURE’.

Step 5

Boom. Your classifier is ready for use. Your whole sheet should look something like this:

Obviously you’ll want to set up your categories according to stuff that’s meaningful (and measurable) for your organization. You’ll then want to populate the training data with actual examples from your own sales pipeline (the more entries you can put in, the better).

Then all you need to do is plug in the info for a lead you want score and the algorithms will do the rest of the work.

But wait. It’s a bit cumbersome right now, right? That’s why, next time, we’re going to show you how to set up an automated lead-scoring monster.

Because you’re our favourite.

PS: Want to short-cut all of this? You can find our sheet here – all you need to do then is click file > make a copy to get your own version.

Click to read part 3.