Getting Excited About Small Data

The next few quarters for analytics in my company are, from my perspective, going to be game-changing, and I’m excited to say my team’s taking the lead on it: from machine learning and advanced visualisations to new ways of thinking about data, we’re currently taking the steps to get to what I call “the next phase of analytics”. We are a small team with big dreams.

What I often get from friends (and some colleagues) when I tell them about the things my team is doing, though, are questions on how “big data” is playing a part in it. Specifically, how it figures in our plans for the next few quarters.

When I tell them it doesn’t, they look at me as if I just said I loved eating broccoli ice-cream: perplexed; a little disgusted; and mixed with a bit of pity on the side. (If you clicked on that link or you know that song, you might have guessed I’m doing that parenting thing.)

“Big data” simply doesn’t factor in those plans (yet). We have too much small data to worry about to even think about big data. And yet, to them small data is yesterday’s news. It’s as if small data doesn’t count; as if it’s nothing to get excited about.

But it does count. And to those who haven’t yet experienced the joys of wringing all the value out of small data, it is downright exciting.

Sure, big data has the potential to change the world, and in many cases it already has. But by and large most of the value of big data still lies in its potential.

Small data, on the other hand, has long shown its ability to change the world.

I especially love this little story from the book mind+machine by Marc Vollenweider:

Using just 800 bits of HR information, an investment bank saved USD 1 million every year, generating an ROI of several thousand percent. How? Banking analysts use a lot of expensive data from databases paid through individual seat licenses. After bonus time in January, the musical chairs game starts and many analyst teams join competitor institutions, at which point the seat license should be canceled. In this case, the process step simply did not happen, as nobody thought about sending the corresponding instructions to the database companies in time. Therefore, the bank kept unnecessarily paying about USD 1 million annually. Why 800 bits? Clearly, whether someone is employed (“1”) or not (“0”) is a binary piece of information called a “bit”. With 800 analysts, the bank had 800 bits of HR information. The analytics rule was almost embarrassingly simple: “If no longer employed, send email to terminate the seat license.” All that needed to happen was a simple search for changes in employment status in the employment information from HR.

The amazing thing about this use case is it just required some solid thinking, linking a bit of employment information with the database licenses.
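To show just how small this is, here is a minimal sketch of that rule in Python. It is my own illustration, not the bank's actual process: the `SeatLicense` record, the analyst IDs, and the vendor names are all hypothetical, and I'm assuming HR can provide two snapshots of employment status (before and after bonus season).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatLicense:
    """Hypothetical record linking one analyst to one paid database seat."""
    analyst_id: str
    vendor: str


def licenses_to_cancel(previous, current, licenses):
    """Return the licenses held by analysts who were employed before but are not now.

    `previous` and `current` map analyst_id -> employed (True/False): one bit each.
    An analyst missing from `current` is treated as no longer employed.
    """
    leavers = {
        analyst_id
        for analyst_id, was_employed in previous.items()
        if was_employed and not current.get(analyst_id, False)
    }
    return [lic for lic in licenses if lic.analyst_id in leavers]


# Made-up data: three analysts, one of whom left after bonus time.
previous = {"a001": True, "a002": True, "a003": True}
current = {"a001": True, "a002": False, "a003": True}
licenses = [
    SeatLicense("a001", "VendorX"),
    SeatLicense("a002", "VendorX"),
    SeatLicense("a002", "VendorY"),
]

for lic in licenses_to_cancel(previous, current, licenses):
    print(f"Email {lic.vendor}: please terminate the seat license for {lic.analyst_id}")
```

The whole thing is a set difference and a lookup. The hard part, as in the story, is thinking to connect the two lists in the first place.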

Small data can have big impact.

So yes, I am excited about small data!

And no, big data won’t be part of our coming analytics revolution. (Yet.)

The problem with running a team at full capacity

I shared this earlier on LinkedIn, but thought that it was worth sharing it here too as a reminder to myself: Six Myths of Product Development

I came across the article above while researching why a team that traditionally does great work may sometimes stumble (yes, mine). The past few weeks had been a whirlwind of activity, with team output close to or at an all-time high. We were publishing and developing things left and right, and everyone was running close to capacity. It was great.

Then came an e-mail that questioned the quality of the output. Then another. Much of the great work threatened to come undone, but thankfully most made it through unscathed. We were still, generally, in a good place. But this was a wake-up call. Something needed to be done.

After I explained to her my conundrum, my knowledgeable friend, Google, suggested an article from the Harvard Business Review website called “Six Myths of Product Development.”

It was a most excellent suggestion.

The article highlighted six myths or fallacies:

  1. High utilization of resources will improve performance.
  2. Processing work in large batches improves the economics of the development process.
  3. Our development plan is great; we just need to stick to it.
  4. The sooner the project is started, the sooner it will be finished.
  5. The more features we put into a product, the more customers will like it.
  6. We will be more successful if we get it right the first time.

It didn’t take long for me to realise that our problem was very likely linked to #1: I’d neglected slack.

You see, I normally tend to guard slack time jealously as I know time pressures are often a big cause of low-quality output. But given the myriad “urgent” business needs, I had allowed myself and the team to run too close to full capacity.

We have seen that projects’ speed, efficiency, and output quality inevitably decrease when managers completely fill the plates of their product-development employees—no matter how skilled those managers may be. High utilization has serious negative side effects… Add 5% more work, and completing it may take 100% longer. But few people understand this effect.

It’s funny how bringing down the amount of expected output may actually increase it.
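One way to get a feel for why this happens is the textbook single-server queue (the M/M/1 model from queueing theory). This is my own back-of-the-envelope sketch, and it assumes a team behaves like that idealised queue, with work arriving at random and tasks taking random amounts of time, which a real team only roughly does. Under that assumption, the average time a piece of work spends in the system is the bare task time divided by (1 − utilisation).

```python
def relative_time_in_system(utilization: float) -> float:
    """Mean time in an M/M/1 queue, as a multiple of the average task duration.

    Uses the standard result W = (1 / mu) / (1 - rho), where rho is utilisation.
    Treating a team as an M/M/1 queue is an illustrative assumption only.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


for utilization in (0.50, 0.80, 0.90, 0.95):
    multiple = relative_time_in_system(utilization)
    print(f"{utilization:.0%} utilised: about {multiple:.0f}x the bare task time")
```

In this model, moving from 90% to 95% utilisation (five more percentage points of work) doubles the time it takes to get something done, which is in the same spirit as the “100% longer” in the quote above.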

(As an aside, I love point #6 – I’m a big fan of “fail fast, fail often” as I believe strongly in “the wisdom of crowds”, where we can aggregate feedback and iterate quickly, especially for early development. But it’s not always easy to get business buy-in, especially when all they see in “fail fast, fail often” is “fail”!)

 

The need for theory in prediction models

I’d like to share this wonderful quip by philosopher Robert Long, which was quoted in the (also insightful and actually pretty good) book A Richer Life by Philip Roscoe:

Let’s say that in early 2001 I formulate a theory to the effect that there is a Constant Tolkienian Force in the Universe that produces a Tolkien film every year. When Austrians complain that my theory ignores the fact that films are products of human action and not of constant impersonal forces, I reply: ‘Oh, I know that. My theory isn’t supposed to be realistic. The question is whether it’s a good predictor.’ So I test it in 2001, 2002, and 2003. Lo and behold, my theory works each year! … But unless I pay some attention to the true explanation of this sequence of film releases, I’ll be caught by surprise when the regularity fails for 2004.

The above quote relates to the “blind” prediction models that we build (maybe statistical, maybe machine learning), where “accurate prediction” is so often the aim of the game.

The problem with just going for accurate predictions, without thinking about the underlying causes or theory, is that it’s difficult to tell whether or not the predictions are flukes, or if they contain true insight that may not be so obvious to anybody or anything else but the algorithm.

Though the quote above is rather tongue-in-cheek (the “Tolkienian Force” model is really built on just three data points, way too few for statistical significance), the moral of the need for a good theory wasn’t lost on me: theory is the human yin to the technological yang. They balance and support each other, creating something much more powerful and persuasive than either would alone. When one is amiss, the other somehow doesn’t feel right either.
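To put a rough number on “way too few”, here is a small sketch. The assumptions are mine, not Long’s or Roscoe’s: I pretend that, by pure luck, any given year has a 50% chance of seeing a Tolkien film, and I use the conventional 5% cut-off for significance.

```python
def chance_of_streak(years: int, p_film: float = 0.5) -> float:
    """Probability that luck alone produces a film in every one of `years` years."""
    return p_film ** years


def years_needed(alpha: float = 0.05, p_film: float = 0.5) -> int:
    """Smallest unbroken streak whose chance probability drops below `alpha`."""
    if not 0 <= p_film < 1:
        raise ValueError("p_film must be in the range [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    years = 1
    while chance_of_streak(years, p_film) >= alpha:
        years += 1
    return years


print(f"Chance of three hits in a row by luck: {chance_of_streak(3):.3f}")
print(f"Years needed to beat the 5% threshold: {years_needed()}")
```

Under those assumptions, three correct years in a row would happen by chance one time in eight, so the “theory” has not really been tested at all. And even a longer streak would only say the pattern is unlikely to be luck, not why it holds or when it will stop.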

Another (More Selfish) Benefit of Theory

Theories are actually really useful when persuading the business to accept the results of a prediction model that they otherwise know nothing about. Without a theory, on what basis should the business believe the model? On faith?

That’s a tough sell.

And if you were the one who built the model, and they “trusted” the model because they trust you, what happens if the model fails?

Well, you fail with the model.

With a convincing theory, the model stands on its own merits. They may “trust” the model a bit more because they trust you, but if the model fails, it’s the theory that fails with it, not you.

What’s Sales Reporting Governance got to do with Bribery?

I lead a Sales Operations team, and one of our objectives for this year is to establish a “sales reporting governance structure”: to ensure that the right reports/tools get developed, with the right specifications, at the right time; and, perhaps most importantly, with buy-in from the right people.

Essentially this governance structure looks at controlling the reporting life cycle (something like this report life cycle diagram) from when a report is dreamt up in the head of one of our business partners (our “internal customers”), through to when the report reaches its EOL (end-of-life) and can be stopped.

Though you may think this is somewhat dry work, let me assure you that it’s often anything but. Conversations can be excruciating, or rather quite colourful, particularly when it comes to prioritisation and negotiating timelines.

Take for example the following conversation between one of our business partners (BP) and us:

BP: “What do you mean you can only deliver it next Friday? I need it by Tuesday.”

Us: “Sure, that can be done, but we’ll need to stop work on the other three developments we’re working on for you that are due next Monday.”

BP: “No, you can’t stop work on those. I need those next Monday, and this one by next Tuesday.”

Us: “Sure, but we’ll have to exclude the new functionalities that you’d asked for.”

BP: “No, you can’t do that.”

Us: “I’m sorry but if you’re not able to budge on re-prioritising the other work, or on reducing the scope, there’s no way we can hit the timelines you’re asking for, especially when you’re asking for this so late in the game.”

BP: “I’m escalating this. You’ll hear from my manager.”

And so on.

In all fairness though, I have to say that in my experience most managers and senior colleagues (and anyone who has worked in, or closely with, IT) tend to understand that we have to satisfy ourselves with but 24 hours a day to do all we need to do.

These sorts of escalations tend to end with “the manager” having a cordial chat with us and agreeing on a workable next step forward, none of which involves us engineering more time into the day.

Establishing a Governance Structure

Having a governance structure tends to minimise “unconstructive” conversations like those above, I think largely because of a mutual trust: the business partner trusts our verdict on whether or not something is possible within a specific time frame, while we trust that they have thought carefully through their requests and won’t be changing or adding to them unnecessarily.

But the problem with establishing a governance structure is that it, well, needs to be established, which can be incredibly tricky to get going. It’s almost like negotiating a peace deal, where both sides want the conflict to stop, but are worried what might happen the moment they lay down their arms: will the other side take advantage and strike when they are at their most vulnerable?

I will be the first to admit that it takes a leap of faith going from a world of “if I don’t shout loud enough, and often enough, nothing’s going to get done”, to one where we’re all amicably setting and agreeing on priorities, and where promised delivery deadlines are actually being met.

It also doesn’t help that from a developer’s side, without the benefit of having past projects to tune one’s intuition, accurately estimating project scope or determining deadlines is going to be difficult;  often multiple iterations are necessary before this sort of “accuracy” is achieved. What this means is that early on, chances are good deadlines are going to be missed, which doesn’t help in building trust.

After a missed deadline or two, it’s all too easy to fall back into old patterns and proclaim that the process doesn’t work.

There will also be many, especially those more used to the “free-and-easy” days of yore, who will actively fight the change, complaining that it creates too much red tape and too many hoops to jump through to get things done.

“We need to establish our reporting as soon as possible or we’ll just be flying blind — we can’t afford to go through this process!”

But the thing is, we often can’t afford not to.

When the number of development projects is small, I have to agree that the process, this “bureaucracy”, adds little value. We could simply get on a phone call, or write an e-mail, and agree among ourselves what needs to be done and when. If the requirement changes, no biggie, we simply tweak until it’s perfect – there’s sufficient slack in the system that will enable us to do just that.

But problems will occur when the number of projects starts to creep up, and more stakeholders are introduced.

The Need for a Tighter Process

The first problem is that due to the higher workload, the slack in the system that allowed for changes in between a development cycle will be gone. This means that changes or additions to the original requirement will likely have to be parked until development time opens up, which could be weeks down the road.

Business partners are not going to like that. “It’s a simple change for God’s sake!”

The thing is, no matter how small a change is, it’s going to be work. Somebody’s got to do it, and that means time out from other projects, which also have agreed timelines. If we focus on that change now, it risks jeopardising the timelines for every other project down the line.

If the change is important enough, then maybe we can take time out from another project and put it into executing the change. But it needs to be agreed by the team owning the other project. Which leads nicely to the second problem.

The second problem is that everyone will have their own agendas, and everyone’s pet project will be “of the highest priority”.

What happens when Teams A, B, and C all have “high priority projects” that need to be done by next Monday, and the development team only has the capacity to complete one or two? Without a proper process or governance structure, can we guarantee that the project of the highest priority for the business will be the one that’s completed?

In the end, more time will be spent explaining to each of the stakeholders why their project was not completed; people will be upset, and the next time they’ll just be sure to shout all the louder, and all the more frequently. More time will be spent on meetings and e-mails, people “ensuring” this and that and never really ensuring anything at all. Estimated delivery dates will be given, but nobody will trust them because they know someone else coming in with an “urgent” request will likely take priority. If it’s “last in, first out”, why should I raise a request early only to be relegated to the bottom of the delivery pile?

This just struck me as very analogous to the concept of bribery, which I was reminded of on my reading of the book Treasure Islands by Nicholas Shaxson:

Some argue that bribery is ‘efficient’ because it helps people get around bureaucratic obstacles, and get things done. Bribery is efficient in that very narrow sense. But consider whether a system plagued by bribery is efficient and the answer is the exact opposite. [Bribery undermines] the rules, systems, and institutions that promote the public good, and they undermine our faith in those rules.

Despite any short-term drawbacks, a governance structure brings plenty of longer-term benefits, not least that of supporting stronger surrounding report development structures and a generally healthier culture.

Though setting up the governance structure thus far has been tough, with plenty of push-back and many of our business partners trying to circumvent the process we have established, I think it’s one of the most important things we can do, and have ever attempted to do.

Falling to the level of our training

I first saw the following wonderful quote in a book by Joshua Medcalf (called Hustle), attributed to an anonymous Navy SEAL:

Under pressure, you don’t rise to the occasion, you sink to the level of your training.

What a beautiful principle to live your life by. (I was particularly inspired because I have been doing quite a bit of training for my upcoming IPPT – haven’t had an IPPT gold in ages!)

PS: A little research brought me to Quora where I learned that the origin of that quote could probably be attributed to the Greek poet Archilochus:

We don't rise to the level of our expectations, we fall to the level of our training.

Playing Baseball without a Bat – a great example of effective statistical visualisation

Came across a very interesting and persuasive video on baseball via Kottke.org today. It’s a great example of what an interesting question, effective visualisation, and some statistical knowledge can do.

The question the video seeks to answer is the following: what would happen if baseball player Barry Bonds, who happened to play one of his greatest (if not the greatest) baseball seasons ever in 2004, played without a baseball bat?

I’m not a baseball fan, and frankly quite a number of the things that were mentioned in the video were lost on me. But I’m a fan of interesting statistics and great visualisations, and this definitely had both.

And despite my having a few doubts about its conclusion (the results seem too good to be true – watch to the end!), it is convincing and definitely worth a watch if you’re into either baseball or statistical visualisations.

On meritocracy, luck, and giving back

Kottke’s post on meritocracy, a concept that I had in my younger days considered infallible, reminded me that even those of us who have worked hard and achieved so-called “success” have much to owe to “luck”.

Even the smartest, hardest working, most beautiful of us all, would likely have not fared well, had we been born in the midst of a famine to parents who couldn’t even afford to feed themselves.

And even the dumbest, most slothful, and ugliest of us all would not have fared too badly, had we been born to highly influential and powerful parents who held us in even the slightest regard.

So let us all remain humble if we are ever lucky and become “more successful” than others.

We probably owe more to chance and luck than we think.

Lucky

I met up with a friend last week over lunch, and one of the things that came up in the conversation was our work, our careers. He was genuinely happy and excited for me that I was (finally) going to graduate from my Master’s degree in Analytics.

To him, my having these analytical skills, backed with a Master’s degree, would easily propel me to the top. I would, he said, be in high demand.

Being quite the realist, though, I didn’t exactly share his optimism.  I knew that even if I was the best in the world at what I did, if nobody knew what I did, it didn’t matter. There would be far too many people like me with similar qualifications and experiences.

But I knew where he was coming from.

It was true that my skill set was in demand. And it was true that I probably had an easier time than most in finding career opportunities. Unlike many others I knew, I was in the rather enviable position of not worrying whether or not I’d find another job if I left my current one, by choice or otherwise, because I knew I would. I only stayed because I wanted to.

It then occurred to me how lucky I was.

Living the Dream

“I am living the dream,” I said to the group, “doing what I love.”

I was in a management development workshop organised by the company, and that was my response to the question, “tell us something nobody else in the workshop knows.”

It had come spontaneously and was as much a surprise to me as it was to everyone else.

It wasn’t that my career was perfect — I still had much I wanted to do; much I wanted to achieve.

But given all the million-and-one constraints, my career’s turned out pretty good: leveraging my business-IT background, I work within Sales but deal with technology (even doing some scripting and programming) every single day; I develop data products that are used by hundreds, from the frontline through to senior management; I regularly get to present my ideas and train Sales on technology and data literacy; and I lead a team of wonderful colleagues who do excellent work (and at the same time have a great boss); it’s almost precisely how I would have envisioned a “good” career outcome (shame about the pay!)

But it could have been so different.

I knew I was lucky.

Right Place, Right Time

I was lucky in that my parents weren’t poor, and had purchased a computer for the home even when that wasn’t a very common thing to do. And I was lucky that I was allowed to use this very expensive toy, which exposed me to technology at a very young age.

I was lucky that I grew up in a time when the Singapore government wasn’t too interested in clamping down on software piracy. I suspect the government did this on purpose because many of us, though not poor, were not rich enough to actually purchase professional-grade software to play around with. 99% of what I know I learned on bootleg software. This move alone probably bumped up Singapore’s technological literacy a fair bit.

I was lucky that I was never stopped in pursuing my love for technology: when I opted for a technology-focused polytechnic education (i.e. the Diploma route) instead of going the more traditional “junior college” (i.e. the A-Levels route), I never met any parental resistance (which, in a way, was because I was lucky enough that my grades were good but never exceptional, and so my parents didn’t really care; had they been exceptional, my guess would be that they would have been far more opinionated).

I was lucky that I was hired for an analytics position at the very last interview that I decided to go for before heading into the world of Financial Advising, thereby leading me to my current world of technology and analytics… what were the chances?

Right place. Right time. And if not enabled by luck, at least not hindered.

But not everyone will be so fortunate, and it is up to us, the lucky and empowered ones, to give back and to try to provide opportunities to others who may not be as lucky.

Yet.

On Giving Back

My one simple philosophy on giving back: that anyone whom I work with or in any way interact with should find that if I had never appeared in their lives they would have been a little poorer for it.

I seek to be the luck in people’s lives.

Because so often they are in mine.