The Machine-Learning Contact Center: a Basic Primer on Linear Regression

Linear regression is about as simple as machine learning gets. Let’s see what it can do for us.

Whoa, so we got stuck in the deep end last time around with generative adversarial networks. Let’s get down to brass tacks now and have a squizz at one of the most straightforward machine-learning algorithms.

 

What in the Jiminy crickets is linear regression?

Linear regression – it’s a scary name (at least it was to this writer, who is not very clever), but not at all a scary topic to grok.

Let’s say you’re trying to sell your Care Bear comic collection (oh shh, we’re not judging) and you want to know what you could expect to get for it. Now helpfully, you have a big old spreadsheet of past Care Bear comic sales. Your spreadsheet has a bunch of information, including the price each comic sold for and the year in which it was printed.

Now, say you’ve had a chance to sell a reasonable number of comic books. After a spell, you realise people are negotiating lower prices for certain comics, yet they are also willing to pay higher prices for other issues. Chances are you won’t think this behaviour is random. (Right?) You’ll be thinking the price of each comic book sold is being influenced by something that the customers deem important.

Being the astute business person you are, you might make a statement like this: ‘I believe my customers are negotiating their prices based on the year each comic book was printed in.’ As a result, you might end up arguing that perhaps – knowing when a comic was published – you could ‘guess’ how much a customer would be willing to pay for it.

 

And now for a little gentle jargoning

In machine-learning circles, your belief is formally called a hypothesis statement. Since you hypothesised that prices might be influenced by the year of print, we say that price is dependent on the year of issue, which makes price the dependent variable. Since the belief is that price is influenced by year and not the other way around, we treat the year as independent of price, which makes it the independent variable. Officially, these two streams of information – the publication years and the prices – are called variables. This is because the print year and price differ from one comic to the next (or, they ‘vary’). Capisce?

Now, if you set up a simple graph, with the publication dates going backwards from left to right and the prices from bottom to top, then place each comic sale on the graph accordingly, you’ll see that while the points look a little random, there will be a definite trend to their placement. In fact, we’re willing to bet that just off the cuff you could draw a rough line through the middle-ish of all the points.

That right there is linear regression – essentially reducing a cloud of data to a line. That line, once drawn, can be used to predict the prices of other comics.
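For the curious, here is a minimal sketch of how that line could be worked out rather than eyeballed. It uses ordinary least squares, the usual way of fitting such a line, which is an assumption on our part since nothing above commits to a particular method. The sales figures are made up for illustration, and the names fit_line and predict are our own, not from any library.

```python
# Hypothetical past sales: (print year, sale price). Invented numbers.
sales = [(1985, 42.0), (1987, 35.0), (1989, 33.0), (1991, 27.0), (1993, 23.0)]


def fit_line(points):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How strongly year and price move together, relative to the spread of years.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, year):
    """Read the expected price for a given print year off the fitted line."""
    return slope * year + intercept


slope, intercept = fit_line(sales)
print(f"Price changes by about {slope:.2f} per year of print")
print(f"Estimated price for a 1990 comic: {predict(slope, intercept, 1990):.2f}")
```

The slope says how much the price shifts for each year of print, and the intercept simply pins the line in place. With these invented numbers the line slopes downwards, meaning newer comics fetch less.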

See? Simple.

 

Now what could you use it for?

Any bit of contact-centre data whose value is influenced by other data is a prime candidate for linear regression. Volume forecasting is the standout example – using linear regression, it’s very simple to predict staff requirements.

The drawback is that the technique falls short with data that don’t readily fit a line, and because the line represents only the midpoint of the spread, it doesn’t much help for predicting any kind of extreme. (And in fact extremes in data can throw your whole line off.) So while it’s good for predicting, say, holiday-period elevations, it won’t do you much good for spikes on individual days.
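To see how a single extreme can throw the line off, here is a rough sketch with a week of daily call volumes. The numbers are invented, the fit is ordinary least squares (our assumption), and fit_line is our own helper name rather than anything from a library.

```python
def fit_line(points):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical week: (day number, calls received). Volume creeps up by 2 a day.
calm_week = [(1, 100), (2, 102), (3, 104), (4, 106), (5, 108), (6, 110), (7, 112)]
# The same week, except day 7 has a one-off spike to 300 calls.
spiked_week = calm_week[:-1] + [(7, 300)]

calm_slope, calm_intercept = fit_line(calm_week)
spiked_slope, spiked_intercept = fit_line(spiked_week)

# One odd day drags the whole trend upwards...
print(f"Calm week slope:   {calm_slope:.1f} extra calls per day")
print(f"Spiked week slope: {spiked_slope:.1f} extra calls per day")

# ...and the line still falls well short of the spike itself.
spike_estimate = spiked_slope * 7 + spiked_intercept
print(f"Line's estimate for day 7: {spike_estimate:.0f} calls (actual: 300)")
```

In this made-up week, one spike is enough to make the line suggest a far steeper daily rise than the other six days support, while still underestimating the spike day.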

Still, it’s a technique so simple you could implement it yourself in Excel or Google Sheets.

Which is why, in our next instalment, we’ll tackle doing just that.