“Knock… knock,” he managed to utter, as he lay dying on the desert floor, having gone without water for as long as the human body was capable, in an attempt to tell the driest joke in history.
About a month ago what is now known (at least on Wikipedia) as the November 2015 Paris attacks happened, with more than a hundred people killed in mass shootings and suicide bombings.
I vaguely remember first seeing reports on this on Facebook, thinking it was some sort of joke. It was unreal; classified in my head with the other “how can that be true?” events, in the realm of the Boston Marathon bombings; the disappearance of MH370; the September 11 attacks on the World Trade Center; and the deaths of Steve Jobs and Michael Jackson, both of whom played a huge part in shaping my childhood.
Over the next couple of days I noticed that many people’s profile pictures were overlaid with the French flag. It was a movement that felt bigger than myself, and I wanted to be a part of it. I did a quick Google search and found that it was easily done: a few clicks and I could get my own profile picture overlaid with the French flag. Facebook made it really easy.
But I had my doubts. I wasn’t sure if this was what I wanted to do. Despite my feeling of loss, I knew it was temporary and didn’t want to commit to changing my profile picture for an indefinite length of time – what would it mean to me or anyone? It felt hypocritical to have that overlay longer than the feeling lasted.
The Facebook developers, though, probably thought and felt the same thing. And in what I must say was a masterstroke, they provided users the option to have that overlay be temporary, defaulting to a week (which was exactly the length of time I’d felt was appropriate). That nudged me in the direction of going ahead with the profile picture update.
I must admit, though, that I still had my reservations. It felt, in a way, overtly political, which is something I go out of my way not to be; but at the same time it felt comforting, and it gave me the feeling of being part of a bigger collective, a collective saying yeah, let’s show the terrorists we won’t be put down.
Yes, I knew that this reeked of slacktivism: it certainly wasn’t the least I could do (i.e. nothing) but it probably wasn’t too far off. But what else could I do? And if it made me feel better without causing others too much distress, why not?
Still, I started to worry: had I done the right thing? I wondered if others would view me as a herd-follower, mindlessly following others because it was trendy or just because. (Just thinking about what I thought others were thinking about me made me second-guess myself — this wasn’t about me so why was I making it about me?)
And seeing many others writing about why they weren’t making the change made me worry as well, because I’d frankly not thought too much about it (remember the nudge mentioned above? I was on the fence, and a silly thing like the Facebook default of a week finally made me do it!).
So I did the only rational thing I could think of and read the arguments of those who were against the overlay (and there were many). From what I gathered, most dissenters were from one of two camps.
The first camp essentially said, “having the French flag on your profile picture is meaningless and a form of slacktivism. It doesn’t do anything and is a pointless exercise.”
The second camp, funnily enough, in effect stated quite the opposite. “Why do we only care so much about France when there are so many other countries suffering similarly? Why should the attacks in France be so special? Because no flags were put up for the other countries, I’m not going to do so for France.” Ironically, their very refusal seemed to place undue weight on the importance of the exercise.
In the end I bought more into the argument of the first camp. Putting a French flag overlay on your profile picture is a little pointless – I mean, what purpose does it serve? But then again so many of the things we do are like that, but we still do them anyway in the hope that it might make a difference, even if in the smallest of ways. (Reminds me a bit of e-mails that I send out asking for action before a deadline – I know it’s not going to happen before then, that people being people will dally and deadlines will be pushed back. But still I do it, in the hope that deadlines might one day be met.)
The second camp reminded me a bit of how charity works. If I see a single beggar I might decide to give a coin. If I see ten, I avoid them like the plague. If I gave one of them a coin, I would then have to give to the others; and if I couldn’t, it wouldn’t be fair to those who received nothing. So I just avoid giving altogether. But this just makes me feel like a prick, and leaves them all hungry.
In the end though, there did seem to be a common thread, a theory that unified both seemingly disparate camps. Other than the fact that those who wrote about it tended to be a little more political, I realised that if the campaign hadn’t been as successful as it was, I wouldn’t be writing this at all. Because nobody would’ve cared, and neither would I.
For every mindless Facebook user who applied the overlay (me included), there was a dilution of (political?) meaning (though it seemed to me to somewhat increase the feeling of solidarity and community). In the end, the more political among us probably found that the greater statement was not to have an overlay, but to write about why not.
On the “Why Only France” Question
I want to address the “why only France” question separately though, because this did stump me a little. I sort of got the argument at an intuitive level: France is no more or less special than other countries that have been attacked, and having it elevated to such a “special” status can feel irksome, even horribly unfair.
But, like a number of commentators have mentioned, one big difference is that the attack in France was so rare and unlikely that it shocked us. A bombing in Israel or Palestine (or the general “Middle East”) seems like a once-a-week affair. Horrible as it is, it’s not unexpected and doesn’t make the news. When it happens in France, it does.
And if you ask me which makes me sadder, the deaths in Israel/Palestine/Middle East or France, I must admit it’s France. Not because I think France is greater in any way, but because I relate more to the French. I know more about them, have dreams of vacationing there, and find them more relatable because they seem more like me.
I remember the Boston Marathon bombings hitting me especially hard. Being an avid runner myself, one who aspired (and still does, sometimes) to one day run the Boston Marathon, reading about the bombings made me feel physically sick. For weeks I felt down, and running just didn’t give me the same high. I would look at images of runners with severed limbs and ask myself: what do we run so hard for?
It felt like my family was being attacked; it felt like me being attacked.
If most of the Facebook community looks like they’re treating non-Western countries unfairly, it might just be because most of its users are from Western countries, and people tend to sympathise more strongly with people from similar cultures, people who are more like them. It’s just the way we are.
And if Facebook itself does it, as a for-profit company seeking to make its users happy (so they return and drive its revenue), should we be too surprised?
One problem with success is that if you get too much of it before you’re ready, you’ll never dare to try again. Having enjoyed the glory of success, it doesn’t make sense to negate that glory through a subsequent attempt that might end with failure.
I need to do a data-dump on the sales forecasting process and forecasts.
On optimistic and pessimistic forecasting:
- When forecasts are (consistently) too low: a well-known issue that even has a name: sandbagging. You forecast low to temper expectations; when results then beat the forecast, you look like a hero.
- When forecasts are (consistently) too high: a quick Google search suggests this is almost as prevalent as sandbagging. Salespeople seem to be naturally over-optimistic about their chances of closing deals. My question, though: if you consistently fail to deliver on those high expectations, doesn’t that dent your confidence? I’m not a salesperson, but if I were one I’d probably be a sandbagger (this actually reminds me of IT teams, where sandbagging is prevalent because of the high variability of project outcomes).
- If the above is true, that we consistently under- and over-estimate our ability to deliver, would a range of forecasts be a better bet? Yet I don’t hear sales leaders saying, “don’t give me a number, give me a range.”
- Would a range solve the sandbagging and over-optimism problem? In a way it might, since it forces an alternative view that would be hidden should only a single number hold sway.
- Sandbaggers would be forced to say, “well yes, if EVERYTHING went to plan we might get a 20% increase in sales this month,” while the over-optimistic’ers would be forced to say, “fine, you are right that there is quite a bit of risk. A 10% drop wouldn’t be impossible.”
- The problem with a range is that it is, well, a range. Oftentimes a single number is preferred, especially if it’s to be communicated to other parties. It’s easier to tell a story with a single number, and its precision (note that I did not say accuracy) is seductively convincing.
- One way around this would be to explicitly ask for a range, but at the same time ask also for the “highest probability” or “expected” value. This forces thinking about the best and worst case scenarios while giving you the benefit of a single number. And if you were tracking these forecasts, you might actually find that you can systematically take the optimistic forecasts of known sandbaggers and pessimistic forecasts of known over-optimistic’ers.
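That tracking idea can be sketched in a few lines of Python. Everything here is hypothetical: the names, the numbers, and the simple average-ratio bias estimate are illustrative assumptions, not a real forecasting method.

```python
# Hypothetical history of (forecast, actual) pairs for one salesperson:
# a consistent sandbagger whose actuals keep beating the forecast
history = [(100, 118), (90, 105), (110, 126)]

# Crude bias estimate: the average ratio of actual to forecast
bias = sum(actual / forecast for forecast, actual in history) / len(history)

def adjusted(low, expected, high):
    """Correct a single-number forecast by the observed bias,
    staying within the forecaster's own stated range."""
    corrected = expected * bias
    return min(max(corrected, low), high)

# For a sandbagger (bias > 1), this drifts the "expected" number
# toward the optimistic end of the stated range
print(round(adjusted(100, 105, 140), 1))
```

In other words: once you track forecasts against actuals, a known sandbagger’s range can be read optimistically, and vice versa, exactly as suggested above.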
On the granularity of forecasting
- When forecasting, the more granular the forecast the more noise you’ll find. I find it easiest to think about this in terms of coin flips.
- A fair coin gives a 50/50 chance of being heads or tails.
- If I flipped a coin, there would be a 50/50 chance of it being heads or tails, but I couldn’t tell you with any certainty if the next flip was going to be heads or tails.
- However, if you flipped a coin a thousand times, I could tell you with near certainty that the proportion of heads would be close to 50%, which is the nature of a fair coin.
- But let’s say I flipped a coin ten times. Could I tell you with any confidence that the proportion of heads would be close to 50%? Well, no.
- With just 10 flips (or “trials”, in statistical parlance), the probability of getting exactly 5 heads is actually only about 24.6%, which means you have a roughly 75.4% chance of getting something other than 5 heads.
- As we increase the number of trials, the observed proportion of heads gets ever closer to 50%. Every additional trial reduces the variability, and you get closer and closer to the “nature of the coin”.
- In sales forecasting you are occasionally asked to forecast very specific things, so specific that you might have only 10 historical data points to extrapolate from. But with just 10 trials, what’s the chance that those 10 fit the “nature of the thing being predicted”?
- From Arthur Conan Doyle’s Sherlock Holmes: “while the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant.”
- One way around this is to aggregate upwards. You can, for example, ask yourself “what category does this thing I’m trying to predict fall into?” and lump this with those other similar things in the same category.
- Say you have 10 related products that have sold about 10 units each, similar to each other though not identical. You could attempt to predict them individually, but the small sample sizes per product would give you so much variance that your predictions would likely be little better than chance. It would be better to group these products into a single category and make predictions at that larger, aggregated level.
- Variations/predictive noise at the individual product level cancel each other out, giving you a cleaner picture.
- Looking at individual products feels like a more precise exercise, but it doesn’t add predictive accuracy.
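The coin-flip numbers above are easy to check with a short sketch using only Python’s standard library (math.comb needs Python 3.8+):

```python
from math import comb

def prob_k_heads(n, k):
    """Exact probability of k heads in n fair-coin flips (binomial)."""
    return comb(n, k) / 2 ** n

# With 10 flips, exactly 5 heads happens only about a quarter of the time
print(round(prob_k_heads(10, 5), 4))  # 0.2461

# The chance of the observed proportion landing between 40% and 60% heads
# grows with the number of trials: the "nature of the coin" emerges
for n in (10, 100, 1000):
    p = sum(prob_k_heads(n, k) for k in range(int(0.4 * n), int(0.6 * n) + 1))
    print(n, round(p, 3))
```

At 10 trials, even the “close to 50%” band is missed about a third of the time; by 1,000 trials it is a near certainty.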
On Building Great Predictive Models
- The bulk of the time spent developing a good predictive model often goes into data preparation.
- Give me perfect data, and I could build you a great predictive model in a day.
- A predictive model that is 80% accurate may not be “better” than a model that is 70% accurate. It all depends on the context (if this was in the business domain, we’d say it depends on the business question).
- Let’s say I build a model so complex that it’s impossible for all but the most technical minds to understand, or which uses a “black box” algorithm (i.e. you let the computer do its predictive thing, but you have no hope of understanding what it did, e.g. a neural network). It predicts correctly 8 out of 10 times (or 80%).
- Concurrently, I also build a model using a simple linear regression method, which is algorithmically transparent – you know exactly what it does and how it does it, and it’s easily explainable to most laypersons. It performs a little worse than the more complex model, giving me the correct answer 7 out of 10 times (or 70%).
- Is giving up control and understanding worth that additional 10% accuracy? Maybe, but in a business context (as opposed to a hackathon) chances are good that after the 7th time you spend an hour explaining why the model does what it does, you’ll probably want to opt for the more easily understandable model at the expense of a little accuracy.
- Business understanding is an important aspect of model building.
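Here’s a toy illustration of that “it depends on the context” point, with entirely made-up data: against a lopsided base rate, a “model” that ignores its input can match the accuracy of one that actually catches the rare cases.

```python
# Made-up outcomes: 8 of 10 deals are "win", so the base rate is 80%
actuals = ["win"] * 8 + ["lose"] * 2

# A "model" that ignores its input and always predicts the majority class
always_win = ["win"] * 10

# A model that is also right 8 of 10 times, but catches one rare "lose"
fancy = ["win"] * 7 + ["lose", "win", "lose"]

def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)

print(accuracy(always_win, actuals))  # 0.8, yet it never spots a loss
print(accuracy(fancy, actuals))       # also 0.8, but more useful in practice
```

Both score 80%, so the headline accuracy number alone can’t tell you which model answers the business question.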
Overfitting a model and “perfect” models
- Finally, I want to talk about overfitting models. Have you heard about overfitting?
- When we build predictive models, we build them based on past data. In machine learning we call this data “training data”, i.e. data we use to “train” our model.
- Overfitting happens when we “train” our model so well on the training data that it becomes too specific to that data to generalise to new data.
- I find it akin to learning a new language. Sometimes you get so fixated on the grammar and syntax and structure that you miss the woods for the trees: though your speech may be grammatically correct, it can come across as awkward or unnatural (e.g. overly formal, which is often the case if we learn to speak the way we write).
- When somebody speaks to you conversationally in a language you’re just picking up, you try to process it using your highly formalised syntax and grammar, only to realise that though you know all the words individually, strung together they make as much sense as investment-linked insurance plans.
- Overfitting often happens when we try to predict at increasingly granular levels, where the data becomes too thin.
- In the end the model becomes VERY good at predicting data very close to what was used to build the model, but absolutely DISMAL at predicting any other data that deviates even slightly from that.
- If tests show you’ve got a model performing at too-good-to-be-true levels, it probably is. Overfitted models perform very well in test environments, but very badly in production.
- Sometimes when a model performs “badly” in a test environment, ask yourself: (1) is it performing better than chance? (2) is it performing better than the alternatives?
- If your answer to both (1) and (2) is yes, that “bad” model is a “good” one, and should be used until a better one comes along.
- Unless, of course, the resources it takes to carry out predictions, in terms of monetary cost, time, or both, are higher than the benefits it brings. Sometimes a simple model with above-average performance that runs in a minute is far more valuable than one with superb predictive performance but a turnaround time longer than the time in which decisions are made.
- I know of some people who look at predictive models and dismiss them simply because they aren’t perfect; or worse, because they’re too simple, as if being “simple” were bad.
- But models have value as long as they perform better than the alternatives. If they’re simple, quick to run, and require no additional resources to build or maintain, all the better.
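To make the overfitting point concrete, here’s a deliberately extreme sketch on synthetic data (all numbers invented): a “model” that simply memorises its training points looks perfect when tested on the data it was built on, but collapses on inputs it hasn’t seen, while a crude one-parameter model holds up.

```python
import random

random.seed(1)

# Synthetic "sales": a simple underlying rule plus noise
def truth(x):
    return 2 * x + random.gauss(0, 3)

train = [(x, truth(x)) for x in range(20)]
test = [(x + 0.5, truth(x + 0.5)) for x in range(20)]  # slightly shifted inputs

# Overfitted "model": memorise every training point exactly
memorised = dict(train)
def overfit(x):
    return memorised.get(x, 0.0)  # knows nothing beyond what it memorised

# Simple model: fit a single slope through the origin (least squares)
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
def simple(x):
    return slope * x

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(overfit, train))  # 0.0: "too good to be true"
print(mse(overfit, test))   # huge: dismal on anything new
print(mse(simple, test))    # modest: close to the underlying rule
```

The same pattern shows up, less dramatically, with real models: performance on held-out data, not on the training data, is what counts.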
So many ideas – have to expand on some of these one of these days.
Did I mention that I am learning (and have learned) to program in Python at Codecademy and am loving it? (Unbiased plug: if you want to learn to program, it doesn’t have to be Python, try Codecademy!)
Sometimes I think that I’m such a nerd: reading programming books in the train; programming for fun at night; cracking a “have you heard of Excel Slicers” joke when asked to cut a cake.
I suppose I always have been sort of a nerd, but now I actually feel alright coming out as one.