Building an Antifragile System

I just completed the testing of a new program I wrote. 500 lines of well-commented code, making debugging easy if necessary. With this program, the reports we run daily in the morning would take 10 minutes instead of the usual hour, and fully-automated too. Without any manual inputs, the potential failure points owing to manual input errors (so common when any sort of manual intervention is required) are no longer a threat, a large boon to data integrity.

And yet, I couldn’t quite feel at ease. Something was bothering me – I just couldn’t help thinking that it was all too automated. Too efficient. Too fragile.

Was I missing something? I thought about it a while, then realised that I’d forgotten about insuring myself against possible data problems. Would I know if the data wasn’t correct? What alternatives did I have if the program failed? Key business decisions may be made based on these reports, after all, so it wasn’t a good idea to take any chances here, even at a slight hit to efficiency.

I’m going to allude to Taleb’s wonderful book Antifragile again (as I did in my previous post): the problem with lean and highly efficient systems is that they tend to be fragile, breaking easily in times of volatility (even the volatility of time itself) while simultaneously projecting an air of infallibility (it’s so efficient and wonderful that all the potential problems are ignored).

Fat-free, highly efficient systems break easily in times of volatility. Take for example a reporting system that extracts a specific piece of data from a spreadsheet on a daily basis. At its most efficient, you just take the data from a specific cell, without doing any checks or introducing any sort of flexibility. So perhaps you program the system to look for the cell that’s directly below the “Sales” header (say, for “Region X”), which is always in the second column from the left. One day, because of the addition of a new sales region (“Region Y”), the cell that you’re looking for is pushed to two rows below the “Sales” header. And since everything’s automated, you don’t realise that the number that you’re now getting is for “Region Y”, not “Region X”, and you’ll continue using the numbers as if everything was fine. If you were manually running the report, however, you’d have spotted the difference right away.
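To make the difference concrete, here is a minimal sketch in Python of the fragile lookup next to a slightly more defensive one. The sheet layout, the sales figures and the `find_sales` helper are all made up for illustration; this isn't the actual reporting program, just the idea of looking a value up by its labels instead of by its position.

```python
def find_sales(rows, region, header="Sales"):
    """Return the value in the `header` column for the row labelled `region`.

    Raises ValueError instead of guessing when the layout is not as expected.
    """
    header_row = rows[0]
    if header not in header_row:
        raise ValueError(f"column {header!r} not found")
    col = header_row.index(header)
    # Match on the label in the first column, not on the row's position.
    matches = [row for row in rows[1:] if row and row[0] == region]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one row for {region!r}, found {len(matches)}"
        )
    return matches[0][col]


# Hypothetical sheet after "Region Y" has been inserted above "Region X".
sheet = [
    ["Region", "Sales"],
    ["Region Y", 80],
    ["Region X", 120],
]

positional = sheet[1][1]                  # fragile: silently picks up Region Y
by_label = find_sales(sheet, "Region X")  # still finds Region X
```

The positional lookup keeps "working" and returns the wrong region's number, while the label-based one either returns the right number or fails loudly, which is the behaviour you want from something nobody is watching.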

Or take task scheduling for example. Let’s say you have two tasks you want your computer to carry out: Task A and Task B. Task B depends on the result of Task A, so Task A would have to be run first. Because you want to be as efficient as possible (i.e. completing the most number of tasks in the smallest amount of time), you set the starting time of Task B as close to Task A as you can (for the sake of this example, let’s assume you can’t set the scheduler to start Task B “after Task A” but have to give it a specific time).

Let’s say that after monitoring the times taken for Task A to run for a month, you find that it takes 30 minutes at most, so you schedule Task B to start 30 minutes after Task A. One day, due to an unscheduled system update halfway through Task A, Task A takes two minutes longer than usual to run (i.e. 30 + 2 minutes), causing Task B to fail since Task A’s result wasn’t ready on time. The system thus breaks because of this quest for “efficiency”. If a little redundancy had been included for “freak run times” at the expense of pure efficiency (or a less fragile process had been put in place, e.g. figuring out how to start Task B after Task A, no matter how long Task A took), it’d have been OK.
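The arithmetic of that failure, and the two ways out of it, can be sketched in a few lines of Python. The date, the 45-minute buffered start and the function names are assumptions made up for this illustration, not a description of any particular scheduler.

```python
from datetime import datetime, timedelta


def fixed_time_ok(a_start, a_minutes, b_start):
    """Task B succeeds only if Task A has finished by B's fixed start time."""
    return a_start + timedelta(minutes=a_minutes) <= b_start


def run_in_order(task_a, task_b):
    """Less fragile alternative: start Task B only once Task A has returned."""
    result = task_a()
    return task_b(result)


a_start = datetime(2013, 5, 1, 6, 0)       # arbitrary example date
b_start = a_start + timedelta(minutes=30)  # "efficient": zero slack

normal_day = fixed_time_ok(a_start, 30, b_start)  # A takes its usual maximum
slow_day = fixed_time_ok(a_start, 32, b_start)    # A takes 30 + 2 minutes

# Option 1: add some redundancy (here an assumed 15-minute buffer).
b_start_buffered = a_start + timedelta(minutes=45)
slow_day_buffered = fixed_time_ok(a_start, 32, b_start_buffered)

# Option 2: chain the tasks so B's start depends on A finishing, not on the clock.
chained = run_in_order(lambda: 10, lambda a_result: a_result + 1)
```

The buffer trades a little efficiency for tolerance of freak run times; chaining removes the dependence on run time altogether, if the scheduler allows it.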

An unwarranted faith and an air of infallibility of automated systems. When I program something to automate a process, I test it thoroughly before deployment, making sure that it works as advertised. After that, I wash my hands of the program. The reason why I automated it in the first place was because I didn’t want the process to be as involved as it was. If I was going to have to be looking over the program’s shoulder (i.e. if it had one) it’d defeat the purpose of automation, wouldn’t it? So I trust the system to work, and expect it to work until I know of changes that might break it.

I’m not sure about other developers but I find that I tend to place a lot of trust in automated systems because of their “no manual intervention” aspect. I find that most data discrepancies occur because of data-entry or procedural errors (i.e. missing out an action or doing things in the wrong order). As mentioned previously, when I automate something I do a lot of checks to make sure everything’s working correctly at the time of deployment. After that though, anything goes. And even though I do carry out the occasional random check, they are random and may not occur for a long time after things have gone awry.

Forgetting that things can and do go wrong is one mistake we may make in designing systems. Expecting things to eventually go wrong, and planning what to do when they do, is one way we can protect ourselves against such occurrences, helping us to mitigate the risk of large negative consequences. Take for instance the earlier example of the additional sales region in the reporting system. One way we could protect ourselves against using the wrong data while still enjoying the benefits of an automated system would be to create a “checksum” of sorts (a value normally used to verify the integrity of data after transmission or storage), where we check the total sales number against the sum of the individual sales numbers. If there’s a discrepancy, perhaps because of a missing or additional region, we flag it and prevent the report’s distribution.
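A rough sketch of such a check in Python might look like the following. The `check_total` function, the region names, the figures and the tolerance are all hypothetical; the point is only that the report refuses to go out when the parts don’t add up to the whole.

```python
def check_total(reported_total, region_sales, tolerance=0.01):
    """Compare the reported total against the sum of the regions we extracted.

    Returns the computed sum when they agree; raises ValueError otherwise so
    that the report can be held back instead of being distributed.
    """
    computed = sum(region_sales.values())
    if abs(computed - reported_total) > tolerance:
        raise ValueError(
            f"sales total {reported_total} does not match "
            f"sum of regions {computed}; holding report"
        )
    return computed


# A normal day: the regions we extract add up to the reported total.
ok_sum = check_total(320, {"Region W": 200, "Region X": 120})

# The day "Region Y" (80) appears: the total now includes a region that the
# extraction doesn't know about, so the check trips.
try:
    check_total(400, {"Region W": 200, "Region X": 120})
    flagged = False
except ValueError:
    flagged = True
```

It costs a few extra lines and a little run time, but it turns a silent error into a loud one.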

Creating great automated systems doesn’t necessarily entail making things as efficient as possible, even though efficiency might be the main benefit you were seeking in the first place. A highly efficient system that’s prone to error (and worse, errors you’re not aware of – the unknown unknowns) is worse than having no automated system at all. With such a system, you’d just be doing the wrong thing faster (like great management without an accompanying talent for leadership/vision!)

Having a system that can withstand shocks (one that’s “robust”), and recover stronger than before (one that’s “antifragile”) after they happen is the best system you can hope to have. Adding (some) redundancy for contingency and keeping in mind that even the best systems can fail is the key to protecting ourselves against the errors that happen. And they will happen. So, how antifragile’s your system?

On antifragility and new stuff

Taleb once again scores with me with his book on “antifragility”. Like his book on randomness and black swans, this book has opened my mind to a concept that I’ve intuitively felt but never been able to put down in words.

I wrote once about “destroying things” to love them more – making new things old because of the transience of “newness” but the lasting hold of “oldness”. I never realised that it could have been antifragility at work.

An object, when new, when perfect, is at its most fragile. At any moment a small bit of entropy – a scratch, a bump, or even the controlled might of time – might cause it to be no longer new; no longer perfect. The harder you try to keep it in pristine condition, the worse off you’ll feel when it’s finally imperfect. Its value drops precipitously, and all the effort maintaining it goes to waste.

But the moment an object is old, the focus is no longer on its newness. It becomes more robust — a small scratch on an already scratched object brings no harm. And it might even be brought into the realm of the antifragile — where a scratch could bring along positive associations like memories or good feelings, making it better than it was before.

Analytics Adoption: Evolutionary vs. Revolutionary Technology

In this post about analytics adoption, I’d like to start with a short story.

The wife and I got ourselves each a Samsung Galaxy S4 over the weekend. Though it’s a great phone, we couldn’t help but feel that there was a distinct lack of a “wow” factor.

We both moved to the S4 from the S2. Back in its day (about two years ago) it was the latest and the greatest, and though technology has come some way since then it’s still a very capable phone. I remember when we got the S2… boy, did we feel like country bumpkins moving into the city. Everything was wow, wow, wow.

Even I, a self-professed can’t-go-a-day-without-the-computer nut, could go a day (a day! can you imagine??) without touching my computer because everything I needed to do on it I could do on the phone. It was amazing to be able to send free text messages, and access e-mail and Facebook on the go. I didn’t know what I was missing until I tasted the data-plan-backed mobile life.

But having had such a capable phone already, the S4 comes to us as an evolutionary and not revolutionary move. Sure, things are snappier, brighter, faster, and larger. But that’s about all they are. It hasn’t been the same habit-changing killer app.

Evolutionary vs. Revolutionary Technology

Now, the S4 may be an evolutionary technology step for me, but for plenty of people who haven’t yet made the move to a relatively capable handset like the S2, it could well be a revolutionary move. The thing is that where a user is in the lifecycle of technology adoption makes a big difference to how that technology is perceived.

I would imagine that the biggest winners in analytics ROI (i.e. dollar returned per dollar invested) would well be those who have been avoiding it thus far. Because there’s such a huge gap to be bridged between no analytics and some analytics, even the smallest investments in analytics could give huge returns. (Whether it scales or not is another story for another day.)

And with the recent improvements in analytical processes/methodology, software, and thinking (a very important point here), things are far cheaper — and not just in terms of money, but of time, executive buy-in, and ease-of-adoption.

User requests are like hunger pangs

“So, when is the [request] going to be ready?” he asks me, the fourth person to ask in a one-week period.

This, I think to myself, is probably real hunger.

“I’m working on it,” I reply, which means I’m waiting it out to determine how important the request really is. The moment I can confidently say it’s a valid “need”, a real hunger, I move it into my high-priority queue and start work on it.

It’s not that I don’t wish to help, but system/application/report requests have a tendency to come in hugely inflated, seemingly much more important than they really are. More a reaction to an itch than the true life-saving need it’s thought to be.

I like to think of requests that come into my queue as a type of hunger. There is real hunger: the haven’t-eaten-for-days-and-starving hunger; and then there’s perceived hunger: the after-dinner craving for Pringles hunger.

When a request is of the “real” hunger variety, no matter how long you try to wait it out it’ll always be there (and the people who are requesting it won’t let you forget it’s there!)

“Perceived” hunger requests, on the other hand, tend to go away like after-dinner cravings when you give them a little time.

One problem with giving in to these “perceived” hunger requests is that, like the afore-mentioned Pringles, “once you pop, you can’t stop” – these sorts of requests tend to come one after another. And it’s difficult to know when to say no because each request isn’t really that different from those that came before it.

A precedent, once set, can bind you to a cycle of petty requests (“why did her request go through and not mine?”) for the life of the project.

So my advice is: wait and see. If it’s really important you’ll be sure to know.

Which reminds me, it’s time for supper.

What do you do? I’m an analyst.

“So,” my wife’s friend asks, “what do you do?”

I pause for a moment, prepare to say “business analyst”, but then decide not to because I didn’t think she’d understand what I did. I look at my wife and ask her, “what do I do?”, hoping she’d have a better answer.

My wife looks at me, then at her friend, and says, “he’s a business analyst.”

I look at my wife’s friend and she looks back at me. Silence. She gives me a puzzled look.

Food’s here. We eat.

The “what do you do?” question has probably been asked since the dawn of awkward social situations. Since when wives demanded husbands go out on social gatherings with their friends.

But despite it being such a predictable question, it’s something I never really had a satisfying answer to.

It wouldn’t have been so awkward if I was a firefighter, doctor, teacher, butcher, or astronaut. You know, easily definable occupations we all wrote about as kids and coloured in colouring books.

“Business analyst” just doesn’t fit neatly into pre-conceived notions of what a job should be like. It’s rather new-ish, relatively abstract, and not quite defined the same everywhere. Different companies have different meanings for the term.

But I came across an interesting post that contained a little nugget on what a business analyst does. The post is about building a web analytics team, but I think the article is really about building any analytics team, web analytics or not.

From the post, a quote by Jim Sterne, founder of the Digital Analytics Association, about what an “analyst” does:

The magical person called ‘the analyst’ understands all the data and how it is captured and how reliable it is. But they also understand what optimisation is about and what the business process looks like and what the business goals are. The analyst is that magic place in the middle where they understand the desired outcome, they comprehend the big picture and can look at Big Data and ask the right questions. It is the creative part. But they also have to be really good at communicating their insights out to the marketing people and the business strategy folks because if they have a great insight and they don’t know how to communicate it, it doesn’t matter.

Here’s as good a description as any I’ve seen about what an analyst does. Unfortunately not quite the elevator pitch of a business analyst that I can give.

Link

I’m a sucker for personality tests. Here’s one I haven’t taken (and believe me I’ve taken many) which though I wouldn’t read too much into its results, does seem in its own little way pretty accurate: ColorQuiz

(I suppose one reason I like personality tests is that I’m always amazed when they get things right, and I’m always wondering how they did it. I suppose one way you could go about doing this is just asking lots of people lots of questions, and depending on how they answered cluster them into one of several pre-defined categories, each associated with certain personality traits. Of course there could be other ways you could go about doing this, but in general there’s going to be data analysis and predictions and other wonderful things. Beautiful.)

On the Boston Marathon Bombings

I haven’t written about it. I didn’t think I would. But I think I should. The tragedy at the Boston Marathon has hit me worse than I’d thought.

(“Why do you look so sad?” the wife asked. “I don’t know,” I replied.)

I’m an avid runner, and one who has dreamt of qualifying for Boston for the past 10 years. Never managed it, but it didn’t matter. The anticipation probably gave me more joy than its actualisation will.

It was routine when I ran to have images of myself running down the streets of Boston push me on during my runs. Cheered on by friends and family and in peak condition.

Two breaths in. Four breaths out. Boston.

(“Maybe it’s work,” I said.)

But it’s hard to do that now. After what happened.

What a horrible, horrible juxtaposition. That of a planned and dedicated personal triumph, set against that of the randomness of terror and vulnerability.

Just thinking of it hurts. Makes me sick.

(It wasn’t work, I thought.)

On my run last night I couldn’t help but think, “what’s it all for, what’s this all for?”

Automated out of a job

“Most of the time,” my friend told me, “they were just doing really manual work. Copying and pasting, doing very routine things that could have been automated. And they’d do these things for 8 or 9 hours a day, sometimes more. They’d come back on Saturdays just to finish their work.”

He was talking about the work some of his ex-colleagues were doing, and how he couldn’t believe the way they were going about doing it. “If I’d helped them automate that,” he said, “some of them, no, most of them, would be out of a job.”

I have seen my fair share of people doing what he described. Really bright individuals (some highly paid, I might add), spending an incredible amount of time doing incredibly inefficient work, copying and pasting data from one system to the next, or manually looking up values from one spreadsheet in another (as opposed to using Excel features like VLOOKUP or pivot tables). Half their day, I estimate, could be spent doing this type of low-level work because they didn’t know of any other way to go about doing it.
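To give a sense of how little it takes to replace that kind of manual look-up, here is a small Python sketch of the same idea as a VLOOKUP: matching each row of one table against another by a shared key. The product codes, prices and the `price_orders` name are invented for the example.

```python
# Hypothetical "price list" sheet: product code -> unit price.
prices = {"A100": 2.50, "B200": 4.00}

# Hypothetical "orders" sheet: (product code, quantity).
orders = [("A100", 4), ("B200", 1), ("C300", 2)]


def price_orders(orders, prices):
    """Look up each order's unit price, much as a VLOOKUP would.

    Codes with no match are collected separately instead of being skipped
    silently, so someone can follow up on them.
    """
    priced, missing = [], []
    for code, qty in orders:
        if code in prices:
            priced.append((code, qty, qty * prices[code]))
        else:
            missing.append(code)
    return priced, missing


priced, missing = price_orders(orders, prices)
```

A person doing this by hand repeats the same search for every row; the loop does it for thousands of rows in well under a second, and lists the codes it couldn’t match.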

It’s a sad truth that the work many of us “knowledge workers” do isn’t really dependent on any sort of higher-level thinking. Trained high school students could do the work many of us do (even those that require a bachelor’s or post-grad degree).

Sure, it takes a little intellectual skill to get the gears moving, to understand the process when things go wrong, but once that’s taken care of the main bulk of whatever work is left over is hypnotic, routine stuff. Stuff that could well be finished off in a fraction of the time through the use of technology.