Here’s how a famous bit of math from the 18th century is helping us bolster our sales through machine learning.
If you’re new to lead scoring in general, here’s the gist. You’ve got a bunch of leads, right? In fact, you should have more leads than your sales peeps can comfortably handle at once. Which is to say, you need some way of prioritizing those thangs.
Enter lead scoring. Based on the nature of the lead – how big they are, which industry they’re in, etc. – and that lead’s behaviour (how many white papers they’ve downloaded, whether they’ve signed up to your newsletter), you should be able to establish which leads are ready for the sales team and which leads require further nurturing.
Many ways of scoring leads rely on weighting – viewing a pricing page is given a lesser weight than filling out a form requesting a demo, for example.
This is the way we do it, and it certainly makes sense, but it has a problem: it’s quite hard to ensure weights aren’t arbitrary. Does downloading a whitepaper get a weighting of three? Ten? Hmm.
In fact it has two problems: it’s also quite hard to interpret the scores. At which point do you decide a lead is hot versus merely warm? When does a lead need rescuing?
These are things that, as far as we can establish, need to be felt out. We wanted something more self-sufficient and less prone to bias and error.
Enter the naïve Bayes classifier, the secret behind our machine learning magic trick.
Let’s say you’re standing on a street corner, waiting for a friend. You can see a bunch of people walking toward you; one of them is this friend and everybody else is not this friend. The task of categorising these people as friend or not-friend is a classification task.
There are many bits and bobs of information to help you in this task. These pieces of information are known as features. Hair colour. Facial features. Height. Gait. Clothing. Based on these features, you can probably make a fairly accurate guess as to which of the people is your friend.
This is essentially what classifiers do: they slot stuff into categories with some degree of confidence. Which is to say, based on this information over here and that information over there, does this thing belong more to group A, B, or maybe C?
Now there are many ways of handling this classification under the hood. Some of them, such as convolutional neural networks, can get alarmingly complicated.
The naïve Bayes classifier is not alarmingly complicated. As opposed to deep-learning techniques, this classifier decides a thing belongs to one category or another based essentially on counting stuff. It really is that straightforward.
But let’s start at the beginning.
What’s a Bayes and why should we care?
Reverend Thomas Bayes was a bit of a mathematical bad-ass who revolutionised probability theory back in the 1700s with a mouthful of a document titled An Essay Towards Solving a Problem in the Doctrine of Chances.
Bayes basically came up with a way of scoring a belief based on evidence. This is important, because the only way we can figure out a lot of things is by testing them – and the problem with tests is that they are never, ever, ever 100% reliable.
This becomes especially important when you’re testing for rare things. Let’s say you’ve got a test for terminal collywobbles that throws up a false positive 1% of the time. That doesn’t seem like a lot of false positives, and it leads people to make very scary assumptions about test results because they think they’re, like, 99% likely to have collywobbles.
But now let’s say the kind of collywobbles you’re testing for only occurs in one out of a million cases. Given that, it may surprise you to know that you’ll get far more false positives than real positives.
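If you'd rather see that with actual numbers, here's a rough back-of-the-envelope sketch in Python. The one-in-a-million rate and the 1% false-positive rate come from the paragraphs above; the population size and the idea that the test never misses a real case are our own assumptions, purely to keep the sums simple.

```python
# Rough sketch: count real versus false positives in a hypothetical screening run.
population = 1_000_000       # ASSUMPTION: we test a million people
sick = 1                     # one in a million has collywobbles (from the text)
healthy = population - sick

false_positive_rate = 0.01   # 1% of healthy people test positive (from the text)
sensitivity = 1.0            # ASSUMPTION: the test catches every real case

true_positives = sick * sensitivity
false_positives = healthy * false_positive_rate

# Of everyone holding a positive result, what share is actually sick?
share_real = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"chance a positive result is real: {share_real:.4%}")
```

Under those assumptions you end up with roughly ten thousand false alarms for every genuine case.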
If that seems counterintuitive, we feel for you. But Bayes has got you covered. Let’s work through this.
UPDATE: Just FYI, one of our wondrous mathematical minds has made some tweaks to our breakdown of the numbers here.
Can you do basic addition, multiplication, and division? Then you’ve got this.
So the idea is this: we need to relate two things – the chances that somebody has collywobbles given a positive test result; and the chances that a test result is positive given that somebody has collywobbles. (These may appear to be the same, but they’re not. You’ll see.)
Let’s start with a scary-looking mathemagical formula. DON’T WORRY THO WE HAVE YOUR BACK. It’s really not scary at all.
P(C | T) = P(T | C) × P(C) ÷ P(T)
Guys guys this is basically instructing you to multiply and divide stuff. Seriously.
We’ll take this nice and slow. Let’s try this with even higher chances of having collywobbles:
- 1 out of 10000 people have collywobbles
- 90 out of 100 people with collywobbles test positive
- 1 out of 100 people tested get a positive result, whether or not they have collywobbles
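Let’s start with the most important figure, the one that establishes a baseline: the probability of somebody having this form of collywobbles at all. In the formula, this is P(C). (C stands for “has collywobbles”, T stands for “tests positive”, and the bar in P(C | T) reads as “given”.)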
- That’s 1 time out of 10000, or a probability of 0.0001 (1 being 100% probable). Therefore, P(C) = 0.0001
- Now let’s look at the probability of a positive result recorded for those with collywobbles. This is the P(T | C) part of the equation. That’s 90 times out of 100, or a probability of 0.90. Formally, we say P(T | C) = 0.90
- And the probability of anybody getting a positive test, P(T), whether or not they have collywobbles, is 1 out of 100. This is mathematically presented as P(T) = 0.01
- Finally, all this information gives us what we need to calculate the chances that somebody has collywobbles given that they tested positive – P(C | T). So, using the equation from earlier, we do this:
P(C | T) = P(T | C) × P(C) ÷ P(T)
= 0.90 × 0.0001 ÷ 0.01
= 0.009
That’s a… 0.9% chance of actually having collywobbles – even with the positive test result!
Clearly, it’s quite a different and less-alarming picture. (And yes, the calculation really is that straightforward.)
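If you want to check our sums, the whole calculation fits in a few lines of Python. This is only a sketch: the function name `posterior` and the variable names are ours, made up for illustration, and the three input figures are the ones from the list above.

```python
def posterior(p_t_given_c: float, p_c: float, p_t: float) -> float:
    """Bayes' theorem: P(C | T) = P(T | C) * P(C) / P(T)."""
    if p_t <= 0:
        raise ValueError("P(T) must be greater than zero")
    return p_t_given_c * p_c / p_t


p_c = 1 / 10_000        # P(C): 1 out of 10000 people have collywobbles
p_t_given_c = 90 / 100  # P(T | C): 90 out of 100 people with collywobbles test positive
p_t = 1 / 100           # P(T): 1 out of 100 people test positive overall

p_c_given_t = posterior(p_t_given_c, p_c, p_t)
print(f"P(C | T) = {p_c_given_t:.4f} ({p_c_given_t:.1%})")
# prints: P(C | T) = 0.0090 (0.9%)
```

Swap in your own numbers and you can see how quickly the answer moves when the thing you're testing for gets rarer or more common.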
But what does this have to do with classification and lead scoring?
It’s real simple actually – you just run the probabilities on a bunch of things and then classify your input based on the thing with the highest probability.
The nice thing about the naïve Bayes classifier is that it assumes the features – the bits of evidence – each count independently toward a certain overall probability. That means you don’t need to worry about how they interrelate; you just multiply as many things together as you need.
That means with machine learning we can take a bunch of behaviour (did the lead download this white paper? Did they fill in a contact form?) and use that to infer, using probabilities, whether a lead should be sold to, nurtured or rescued.
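As a rough taster of the counting involved, here's a toy sketch in Python. Everything in it is hypothetical: the seven training leads, the behaviour names and the function names are invented for illustration, and this is not our production scoring code. It also adds one thing we haven't discussed, add-one (Laplace) smoothing, so that a behaviour never seen for a category doesn't drag that category's score to exactly zero.

```python
from collections import Counter, defaultdict

# Made-up training data: (behaviours seen for a lead, what happened to that lead).
TRAINING = [
    ({"downloaded_whitepaper", "requested_demo"}, "sell"),
    ({"requested_demo", "viewed_pricing"}, "sell"),
    ({"downloaded_whitepaper"}, "nurture"),
    ({"joined_newsletter"}, "nurture"),
    ({"joined_newsletter", "downloaded_whitepaper"}, "nurture"),
    (set(), "rescue"),
    ({"viewed_pricing"}, "rescue"),
]
FEATURES = sorted({f for behaviours, _ in TRAINING for f in behaviours})


def train(rows):
    """Count how many leads fall in each category, and how often each behaviour shows up there."""
    label_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)
    for behaviours, label in rows:
        for f in behaviours:
            feature_counts[label][f] += 1
    return label_counts, feature_counts


def classify(behaviours, label_counts, feature_counts):
    """Return a probability per category, treating every behaviour as independent."""
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior: how common is this category overall?
        for f in FEATURES:
            # Add-one smoothing (our addition, not from the text above).
            p = (feature_counts[label][f] + 1) / (n + 2)
            # Multiply in P(behaviour present) or P(behaviour absent).
            score *= p if f in behaviours else (1 - p)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


label_counts, feature_counts = train(TRAINING)
probs = classify({"requested_demo", "viewed_pricing"}, label_counts, feature_counts)
best = max(probs, key=probs.get)
print(best, probs)
```

With this invented data, a lead who requested a demo and viewed pricing comes out as "sell", simply because that combination was counted most often among the "sell" leads.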
Let’s see how this actually looks in practice – but let’s do it next time around. We’ve already blasted you with a thousand words here 🙂
Click to read part 2.