Getting Excited About Small Data

The next few quarters for analytics in my company are, from my perspective, going to be game-changing, and I’m excited to say my team’s taking the lead on it: from machine learning and advanced visualisations to new ways of thinking about data, we’re currently taking the steps to get to what I call “the next phase of analytics”. We are a small team with big dreams.

What I often get from friends (and some colleagues) when I tell them about the things my team is doing, though, are questions about how “big data” is playing a part in it. Specifically, how it figures in our plans for the next few quarters.

When I tell them it doesn’t, they look at me as if I just said I loved eating broccoli ice-cream: perplexed, a little disgusted, with a bit of pity on the side. (If you know that song, you might have guessed I’m doing that parenting thing.)

“Big data” simply doesn’t factor into those plans (yet). We have too much small data to worry about to even think about big data. And yet, to them small data is yesterday’s news. It’s as if small data doesn’t count; as if it’s nothing to get excited about.

But it does count. And to those who haven’t yet experienced the joys of wringing all the value out of small data, it is downright exciting.

Sure, big data has the potential to change the world, and in many cases it already has. But by and large, most of the value of big data still lies in its potential.

Small data, on the other hand, has long shown its ability to change the world.

I especially love this little story from the book mind+machine by Marc Vollenwider:

Using just 800 bits of HR information, an investment bank saved USD 1 million every year, generating an ROI of several thousand percent. How? Banking analysts use a lot of expensive data from databases paid through individual seat licenses. After bonus time in January, the musical chairs game starts and many analyst teams join competitor institutions, at which point the seat license should be canceled. In this case, the process step simply did not happen, as nobody thought about sending the corresponding instructions to the database companies in time. Therefore, the bank kept unnecessarily paying about USD 1 million annually. Why 800 bits? Clearly, whether someone is employed (“1”) or not (“0”) is a binary piece of information called a “bit”. With 800 analysts, the bank had 800 bits of HR information. The analytics rule was almost embarrassingly simple: “If no longer employed, send email to terminate the seat license.” All that needed to happen was a simple search for changes in employment status in the employment information from HR.

The amazing thing about this use case is that it required nothing more than some solid thinking: linking a bit of employment information with the database licenses.
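To show just how little is needed, here’s a minimal Python sketch of that rule. It’s my own illustration, not the bank’s actual process: the HR snapshots, the `SeatLicense` record, and the helper names (`find_leavers`, `licenses_to_cancel`, `cancellation_emails`) are all hypothetical. Each analyst’s employment status is a single bit, and a “leaver” is anyone whose bit flipped from 1 to 0 between two HR snapshots.

```python
from dataclasses import dataclass


# Hypothetical license record; the book does not describe the bank's systems.
@dataclass(frozen=True)
class SeatLicense:
    analyst_id: str
    database: str


def find_leavers(before: dict[str, int], after: dict[str, int]) -> set[str]:
    """Analysts whose employment bit changed from 1 (employed) to 0 (not employed)."""
    return {aid for aid, bit in before.items() if bit == 1 and after.get(aid) == 0}


def licenses_to_cancel(licenses: list[SeatLicense], leavers: set[str]) -> list[SeatLicense]:
    """The whole analytics rule: if no longer employed, the seat license should go."""
    return [lic for lic in licenses if lic.analyst_id in leavers]


def cancellation_emails(to_cancel: list[SeatLicense]) -> list[str]:
    # One (hypothetical) instruction per license, to send to the database vendor.
    return [
        f"Please terminate the {lic.database} seat license for analyst {lic.analyst_id}."
        for lic in to_cancel
    ]


# Example: three analysts, one of whom left after bonus season.
hr_before = {"A001": 1, "A002": 1, "A003": 1}
hr_after = {"A001": 1, "A002": 0, "A003": 1}
seats = [SeatLicense("A001", "MarketDB"), SeatLicense("A002", "MarketDB"), SeatLicense("A002", "NewsDB")]

for email in cancellation_emails(licenses_to_cancel(seats, find_leavers(hr_before, hr_after))):
    print(email)
```

Analysts who disappear from the later snapshot entirely aren’t flagged here (`after.get` returns `None` for them); whether to treat them as leavers is a policy choice HR would need to confirm.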

Small data can have big impact.

So yes, I am excited about small data!

And no, big data won’t be part of our coming analytics revolution. (Yet.)

The problem with running a team at full capacity

I shared this earlier on LinkedIn, but thought it was worth sharing here too as a reminder to myself: Six Myths of Product Development

I came across the article above while researching why a team that traditionally does great work may sometimes stumble (yes, mine). The past few weeks had been a whirlwind of activity, with team output close to or at an all-time high. We were publishing and developing things left and right, and everyone was running close to capacity. It was great.

Then came an e-mail that questioned the quality of the output. Then another. Much of the great work threatened to come undone, but thankfully most made it through unscathed. We were still, generally, in a good place. But this was a wake-up call. Something needed to be done.

After I explained my conundrum to her, my knowledgeable friend, Google, suggested an article from the Harvard Business Review website called “Six Myths of Product Development.”

It was a most excellent suggestion.

The article highlighted six myths or fallacies:

  1. High utilization of resources will improve performance.
  2. Processing work in large batches improves the economics of the development process.
  3. Our development plan is great; we just need to stick to it.
  4. The sooner the project is started, the sooner it will be finished.
  5. The more features we put into a product, the more customers will like it.
  6. We will be more successful if we get it right the first time.

It didn’t take long for me to realise that our problem was very likely linked to #1: I’d neglected slack.

You see, I normally tend to guard slack time jealously, as I know time pressure is often a big cause of low-quality output. But given the myriad of “urgent” business needs, I had allowed myself and the team to run too close to full capacity.

We have seen that projects’ speed, efficiency, and output quality inevitably decrease when managers completely fill the plates of their product-development employees—no matter how skilled those managers may be. High utilization has serious negative side effects… Add 5% more work, and completing it may take 100% longer. But few people understand this effect.

It’s funny how bringing down the amount of expected output may actually increase it.
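The “5% more work, 100% longer” claim sounds like an exaggeration until you run the numbers. Here’s a rough Python sketch using the textbook M/M/1 queue (one server, random arrivals, random task durations). That model is my simplifying assumption, not something the article prescribes, but its mean time-in-system formula, service time divided by (1 − utilisation), shows the same blow-up as a team nears full capacity.

```python
def mm1_flow_time(utilization: float, service_time: float = 1.0) -> float:
    """Mean time a task spends in an M/M/1 queue, waiting plus being worked on.

    W = service_time / (1 - utilization); only defined for 0 <= utilization < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} utilized: a task takes {mm1_flow_time(rho):.1f}x its hands-on time")
```

Moving from 90% to 95% utilisation means only about 5.6% more work arriving (0.95 / 0.90), yet the average turnaround doubles, from 10 to 20 times the hands-on time.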

(As an aside, I love point #6 – I’m a big fan of “fail fast, fail often” as I believe strongly in “the wisdom of crowds”, where we can aggregate feedback and iterate quickly, especially for early development. But it’s not always easy to get business buy-in, especially when all they see in “fail fast, fail often” is “fail”!)


The need for theory in prediction models

I’d like to share this wonderful quip by philosopher Robert Long, which was quoted in the (also insightful and actually pretty good) book A Richer Life by Philip Roscoe:

Let’s say that in early 2001 I formulate a theory to the effect that there is a Constant Tolkienian Force in the Universe that produces a Tolkien film every year. When Austrians complain that my theory ignores the fact that films are products of human action and not of constant impersonal forces, I reply: ‘Oh, I know that. My theory isn’t supposed to be realistic. The question is whether it’s a good predictor.’ So I test it in 2001, 2002, and 2003. Lo and behold, my theory works each year! … But unless I pay some attention to the true explanation of this sequence of film releases, I’ll be caught by surprise when the regularity fails for 2004.

The above quote relates to the “blind” prediction models that we build (maybe statistical, maybe machine learning), where “accurate prediction” is so often the aim of the game.

The problem with just going for accurate predictions, without thinking about the underlying causes or theory, is that it’s difficult to tell whether the predictions are flukes or whether they contain true insight that may not be obvious to anybody or anything but the algorithm.

Though the quote above is rather tongue-in-cheek (the “Tolkienian Force” model is built on just three data points, far too few for statistical significance), the moral about the need for a good theory wasn’t lost on me: theory is the human yin to the technological yang. They balance and support each other, creating something much more powerful and persuasive than either would alone. When one is amiss, the other somehow doesn’t feel right either.
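To put a number on “too few data points”, suppose (purely as an assumption) that in any given year there’s a coin-flip, 50% chance of a Tolkien film coming out for reasons that have nothing to do with any Tolkienian Force, and that we use the conventional 5% significance threshold. A quick Python sketch shows how often chance alone would produce a perfect streak:

```python
def chance_of_streak(p_per_year: float, years: int) -> float:
    """Probability that pure chance yields a hit in every one of `years` consecutive years."""
    return p_per_year ** years


def years_needed(p_per_year: float, alpha: float = 0.05) -> int:
    """Shortest perfect streak that chance alone would produce less than `alpha` of the time."""
    if not 0 <= p_per_year < 1:
        raise ValueError("p_per_year must be in [0, 1)")
    years = 1
    while chance_of_streak(p_per_year, years) >= alpha:
        years += 1
    return years


print(chance_of_streak(0.5, 3))  # 0.125: three correct "predictions", 2001 to 2003
print(years_needed(0.5))         # 5
```

Under those assumptions, a three-year streak turns up by chance one time in eight, and you’d need five years in a row before chance alone drops below 5%. Even then, as Long’s quip points out, a streak without a theory says nothing about 2004.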

Another (More Selfish) Benefit of Theory

Theories are actually really useful when persuading the business to accept the results of a prediction model it otherwise knows nothing about. Without a theory, on what basis should the business believe the model? On faith?

That’s a tough sell.

And if you were the one who built the model, and they “trusted” the model because they trust you, what happens if the model fails?

Well, you fail with the model.

With a convincing theory, the model stands on its own merits. They may “trust” the model a bit more because they trust you, but if the model fails, it’s the theory that fails with it, not you.

What’s Sales Reporting Governance got to do with Bribery?

I lead a Sales Operations team, and one of our objectives for this year is to establish a “sales reporting governance structure”: to ensure that the right reports/tools get developed, with the right specifications, at the right time; and, perhaps most importantly, with buy-in from the right people.

Essentially, this governance structure aims to control the reporting life cycle (along the lines of a typical report life cycle diagram), from when a report is dreamt up in the head of one of our business partners (our “internal customers”), through to when the report reaches its EOL (end-of-life) and can be stopped.

Though you may think this is somewhat dry work, let me assure you that it’s often anything but. Conversations can be excruciating (I mean, quite colourful), particularly when it comes to prioritisation and negotiating timelines.

Take for example the following conversation between one of our business partners (BP) and us:

BP: “What do you mean you can only deliver it next Friday? I need it by Tuesday.”

Us: “Sure, that can be done, but we’ll need to stop work on the other three developments we’re working on for you that are due next Monday.”

BP: “No, you can’t stop work on those. I need those next Monday, and this one by next Tuesday.”

Us: “Sure, but we’ll have to exclude the new functionalities that you’d asked for.”

BP: “No, you can’t do that.”

Us: “I’m sorry but if you’re not able to budge on re-prioritising the other work, nor reducing the scope, there’s no way we can hit the timelines you’re asking for, especially when you’re asking for this so late in the game.”

BP: “I’m escalating this. You’ll hear from my manager.”

And so on.

In all fairness though, I have to say that in my experience most managers and senior colleagues (and anyone who has worked in, or closely with, IT) tend to understand that we have to satisfy ourselves with but 24 hours a day to do all we need to do.

These sorts of escalations tend to end with “the manager” having a cordial chat with us and agreeing on a workable next step, none of which involves us engineering more time into the day.

Establishing a Governance Structure

Having a governance structure tends to minimise “unconstructive” conversations like the one above, I think largely because of mutual trust: the business partner trusts our verdict on whether something is possible within a specific time frame, while we trust that they have thought carefully through their requests and won’t be changing or adding to them unnecessarily.

But the problem with establishing a governance structure is that it, well, needs to be established, which can be incredibly tricky to get going. It’s almost like negotiating a peace deal, where both sides want the conflict to stop but are worried about what might happen the moment they lay down their arms: will the other side take advantage and strike when they are at their most vulnerable?

I will be the first to admit that it takes a leap of faith going from a world of “if I don’t shout loud enough, and often enough, nothing’s going to get done”, to one where we’re all amicably setting and agreeing on priorities, and where promised delivery deadlines are actually being met.

It also doesn’t help that, from a developer’s side, without the benefit of past projects to tune one’s intuition, accurately estimating project scope or setting deadlines is going to be difficult; often multiple iterations are necessary before this sort of “accuracy” is achieved. What this means is that early on, chances are good that deadlines are going to be missed, which doesn’t help in building trust.

After a missed deadline or two, it’s all too easy to fall back into old patterns and proclaim that the process doesn’t work.

There will also be many, especially those more used to the “free-and-easy” days of yore, who will actively fight the change, claiming that it creates too much red tape and too many hoops to jump through to get things done.

“We need to establish our reporting as soon as possible or we’ll just be flying blind — we can’t afford to go through this process!”

But the thing is, we often can’t afford not to.

When the number of development projects is small, I have to agree that the process, this “bureaucracy”, adds little value. We could simply get on a phone call, or write an e-mail, and agree among ourselves what needs to be done and when. If the requirement changes, no biggie, we simply tweak until it’s perfect – there’s sufficient slack in the system to let us do just that.

But problems will occur when the number of projects starts to creep up, and more stakeholders are introduced.

The Need for a Tighter Process

The first problem is that, due to the higher workload, the slack in the system that allowed for changes in the middle of a development cycle will be gone. This means that changes or additions to the original requirement will likely have to be parked until development time opens up, which could be weeks down the road.

Business partners are not going to like that. “It’s a simple change for God’s sake!”

The thing is, no matter how small a change is, it’s going to be work. Somebody’s got to do it, and that means time out from other projects, which also have agreed timelines. If we focus on that change now, it risks jeopardising the timelines for every other project down the line.

If the change is important enough, then maybe we can take time out from another project and put it into executing the change. But it needs to be agreed by the team owning the other project. Which leads nicely to the second problem.

The second problem is that everyone will have their own agendas, and everyone’s pet project will be “of the highest priority”.

What happens when Teams A, B, and C all have “high priority projects” that need to be done by next Monday, and the development team only has the capacity to complete one or two? Without a proper process or governance structure, can we guarantee that the project of the highest priority for the business will be the one that’s completed?

In the end, more time will be spent explaining to each of the stakeholders why their project was not completed; people will be upset, and next time they’ll just be sure to shout all the louder, and all the more frequently. More time will be spent on meetings and e-mails, with people “ensuring” this and that and never really ensuring anything at all. Estimated delivery dates will be given, but nobody will trust them, because they know someone else coming in with an “urgent” request will likely take priority. If it’s “last in, first out”, why should I raise a request early only to be relegated to the bottom of the delivery pile?
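Here’s a toy Python sketch of the difference between “last in, first out” and working from agreed priorities. The `Request` record, the 1-is-highest priority scale, and the team names are all hypothetical; the point is only that under LIFO the latest request always wins, whereas an agreed ordering (with ties broken by arrival) rewards raising requests early.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    team: str
    arrival: int   # order in which the request was raised (0 = earliest)
    priority: int  # agreed business priority (hypothetical scale, 1 = highest)


def lifo_pick(requests: list[Request], capacity: int) -> list[str]:
    """'Last in, first out': the most recently raised requests get done first."""
    latest_first = sorted(requests, key=lambda r: r.arrival, reverse=True)
    return [r.team for r in latest_first[:capacity]]


def governed_pick(requests: list[Request], capacity: int) -> list[str]:
    """Agreed priority first; ties go to whoever raised the request earliest."""
    heap = [(r.priority, r.arrival, r.team) for r in requests]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(capacity, len(heap)))]


# Teams A, B and C all want delivery by Monday; there's capacity for two.
backlog = [Request("A", arrival=0, priority=1), Request("B", arrival=1, priority=2), Request("C", arrival=2, priority=3)]
print(lifo_pick(backlog, capacity=2))      # ['C', 'B']
print(governed_pick(backlog, capacity=2))  # ['A', 'B']
```

Of course, the sketch assumes the priorities have genuinely been agreed. If every team labels its request “highest priority”, the ordering collapses into first come, first served, and getting that agreement is exactly what the governance structure is for.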

This whole dynamic struck me as very analogous to bribery, which I was reminded of while reading the book Treasure Islands by Nicholas Shaxson:

Some argue that bribery is ‘efficient’ because it helps people get around bureaucratic obstacles, and get things done. Bribery is efficient in that very narrow sense. But consider whether a system plagued by bribery is efficient and the answer is the exact opposite. [Bribery undermines] the rules, systems, and institutions that promote the public good, and they undermine our faith in those rules.

Despite any short-term drawbacks, a governance structure has plenty of longer-term benefits, not least supporting stronger report development structures and a generally healthier culture.

Though setting up the governance structure thus far has been tough, with plenty of push-back and many of our business partners trying to circumvent the process we have established, I think it’s one of the most important things we can do, and one of the most important we have ever attempted.

Falling to the level of our training

I first saw the following wonderful quote in a book by Joshua Medcalf (called Hustle), attributed to an anonymous Navy SEAL:

Under pressure, you don’t rise to the occasion, you sink to the level of your training.

What a beautiful principle to live your life by. (I was particularly inspired because I have been doing quite a bit of training for my upcoming IPPT – haven’t had an IPPT gold in ages!)

PS: A little research brought me to Quora, where I learned that the quote could probably be attributed to the Greek poet Archilochus:

We don't rise to the level of our expectations, we fall to the level of our training.