The Evolutionary Advantage of a Resistance to Change

I was just thinking about organisational change and pondering how our natural tendency is to resist it, when this thought popped up: if resistance to change is so hardwired in our brains, it must serve some purpose — but what?

One of the premises of evolutionary theory is this: if something survives (be it an organism or perhaps even an idea), its survival can be attributed to certain traits or characteristics that help it survive. These traits are developed and refined over time: as those that have them survive and those that don’t die off, more and more of the population will carry them, until they eventually become the norm.

Such as a natural tendency to resist change.

At least that’s the theory.

So, let’s make a provocative statement here (inspired by Edward de Bono’s po): Po, a resistance to change isn’t necessarily a bad thing, despite all the bad press it’s getting. In fact, it may be good.

I think that before we get all gung-ho about the next big thing and start bashing old ways of doing things, we should think about why things are the way they are, especially if they have been that way for a long time.

Old ways of doing things are, generally speaking, antifragile. They’ve withstood the test of time and have a decent history of working. Good ideas tend to remain good ideas, and the longer they survive the better they tend to be; bad ideas, on the other hand, are discarded as soon as they’re found out (a caveat: those that do manage to survive for long, however, tend to be the most dangerous — bad things are made worse when they’re not known to be bad. Think insidious. Think CFCs.)

So if you have an old way of doing things, it may not be the best way, but it’s likely a way that generates decent results, enough for it to have lasted as long as it has. The moment a better way of doing things is found and tested to work, the old way is discarded. But until then, the old way is the best way.

If people weren’t resistant to change, on the other hand, good ideas wouldn’t have the time to spread. We’d be flitting from one idea to the next, discarding great ones and embracing bad ones in equal measure.

So a resistance to change isn’t all that bad. It’s the way things should be. The incumbent has earned its place on the throne, and the onus should always be on the challenger to prove its worth.

So it does worry me if people rush into new ways of doing things without having redundant systems in place, just in case. I mean, let’s not be too hasty in burning bridges.

If there’s going to be a process change, have the old process remain in place until the new process is proven stable. Depending on what the process is, a few iterations (or days, or months) are most certainly needed. Give it time to prove its antifragility.

When automobiles were first introduced, horse-drawn carriages didn’t disappear overnight. Concurrency. Then obsolescence.

Resist, consider, then change. Carefully.

IT Replacing Labour and the Possible Fragility of the Economy

The latest stats on the US economy, written up by Andrew McAfee. He posits that the fact that employment’s not going up while other economic indicators are might be due to greater IT spend — technology replacing labour? From what I’ve seen in my half decade in the workforce, I can’t say I disagree: too many jobs are still out there because of a (possibly deliberate) refusal to embrace automation.

Then again, there is a problem with a too-highly-automated workforce, when efficiencies and optimisations are taken a step too far and redundancies are stripped out of processes: the weakest link can break the whole system, making it far too fragile.

If IT’s replacing more of the workforce, could it mean we’re approaching an increasingly fragile economy, dependent on automated systems we might not even be properly aware of?

Just a thought.

Risk vs. Uncertainty (Part II): The Secure Print & Scan Edition

I remember once talking to a friend in HR who lamented the fact that her office didn’t have a way to securely scan or print documents. Because she worked in a department with highly sensitive data, I thought that it was strange it wasn’t made available to her.

I recommended that she put in a request to IT to see what they could do: (a) if the ability to securely print and scan wasn’t available, to have it made available; and (b) if it was available, to have instructions on how to use it made known to the company’s employees.

The response that my friend got from IT was hugely disappointing (to us, at least): that because there wasn’t enough demand for such a service, IT didn’t feel there was a need to invest in this area.

Now, I’ve worked closely with and within IT, and I fully understand from an IT perspective that resources are highly limited (both in terms of money and time), and issues affecting only a small number of users typically shouldn’t warrant any IT investment. And to them, this was an issue that affected only a small number of users — how many people were going to be scanning or printing highly sensitive documents anyway?

But what this IT person didn’t understand was that though it was only a few people who were printing or scanning sensitive documents, these sensitive documents had the potential to impact a large number of people. HR doesn’t work for itself; it’s a supporting function dealing with (potentially) the most sensitive aspects of every employee in the company.

I had half a mind to ask my friend to tell that IT guy that the next document she printed would be his employment details. Perhaps then he’d think it a worthwhile investment.

This brings me to another point about risk and uncertainty. Remember in my previous post that I mentioned that the main difference between risk and uncertainty was that the former had known odds while the latter had unknown odds?

The chance of someone seeing a document he or she shouldn’t be privy to can be calculated. It’s a risk, and the odds are by and large calculable: something like a function of the number of people using the printer, their usage frequency and when they print (peak vs. off-peak periods), the number of documents printed (sensitive vs. non-sensitive), and how long sensitive documents sit before being collected or deleted.
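To make that concrete, here’s a back-of-the-envelope sketch of how such odds might be estimated. Every figure, and the functional form itself, is invented purely for illustration; the point is only that the quantity is calculable at all.

```python
from math import exp

# Back-of-the-envelope estimate of the odds of a sensitive printout being seen
# by the wrong person. All figures here are hypothetical.

def exposure_probability(sensitive_jobs_per_day, minutes_uncollected,
                         passers_by_per_minute, p_reads_if_present):
    """Rough daily probability that at least one sensitive printout is read
    by someone who shouldn't see it."""
    # Expected number of "wrong person walks past an uncollected sensitive
    # document and actually reads it" events in a day.
    expected_exposures = (sensitive_jobs_per_day * minutes_uncollected
                          * passers_by_per_minute * p_reads_if_present)
    # Turn an expected count into "probability of at least one such event",
    # treating the events as independent (a Poisson approximation).
    return 1 - exp(-expected_exposures)

# Hypothetical figures: 2 sensitive jobs a day, left 5 minutes at the printer,
# 0.5 passers-by per minute, a 1-in-50 chance a passer-by actually reads it.
print(f"{exposure_probability(2, 5, 0.5, 0.02):.1%} chance per day")
```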

As the IT person had assumed, the risk of an unauthorised person viewing a sensitive document was probably quite low. Printing and scanning of sensitive documents wasn’t done particularly often, and by and large they’d be collected or deleted before they were viewed by unauthorised persons. But within this scenario lay an uncertainty: how sensitive are these documents and what’d happen if they were viewed by someone with malicious intent?

The IT person couldn’t possibly know how bad an outcome it could be if such an incident did occur. Preventing just one incident of an employee seeing a colleague’s employment details and finding something he or she deemed “unreasonable” or “unfair” would probably justify the cost of a secure printing and scanning implementation. Imagine the costs involved in damage control.

And if you think that’s unlikely or that small employee dispute resolutions are “low cost”, how about preventing the leakage of information about an impending M&A?

In almost all cases, if a relatively low, limited cost can prevent a potentially large (and you don’t know how large) negative outcome, pay it. Make a habit of thinking about situations like these with the risk vs. uncertainty mindset and you’ll be surprised at the different conclusions you come to.

On theory, practice, and Snowflake Schemas

Just the other day I learnt that the data warehouse I was working on was designed using a Star and Snowflake schema. I’d known enough about them to know that this meant the data was set up as fact and “dimension” tables, but not much more than that.
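For the curious, here’s a minimal, hypothetical sketch of what that fact-and-dimension split looks like in practice. The table names, keys, and figures are all made up, and I’m using pandas purely for illustration; in a real warehouse these would be database tables.

```python
import pandas as pd

# A minimal star-style layout: one fact table of measurements, joined to
# descriptive dimension tables via surrogate keys. (In a snowflake schema,
# the dimensions themselves would be further normalised, e.g. product -> category
# split into its own table.)

fact_sales = pd.DataFrame({
    "date_key":    [20130101, 20130101, 20130102],
    "product_key": [1, 2, 1],
    "amount":      [120.0, 80.0, 95.0],   # the measure being analysed
})

dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product":     ["Widget", "Gadget"],
    "category":    ["Hardware", "Hardware"],
})

dim_date = pd.DataFrame({
    "date_key": [20130101, 20130102],
    "weekday":  ["Tuesday", "Wednesday"],
})

# A typical report: total sales by product, built by joining fact to dimensions.
report = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .groupby("product")["amount"].sum())
print(report)
```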

So the moment I had some time I went online and looked up definitions, and realised that they described pretty much the way many of the bigger databases I’d built looked. I’d been using this schema for the past four years (at least) without ever realising it. It was like in Le Bourgeois gentilhomme, when Jourdain remarks:

Good heavens! For more than forty years I have been speaking prose without knowing it.

Which reminded me of something else I read about in Taleb‘s Antifragile: that academia often comes after practice. You can do something your whole life (practice), have it labelled and described as something in theory (academia), and after a while forget that it started not as a theory but as an unlabelled, undescribed bit of practice.

Personally, Taleb’s spelling this out gives me much reassurance that we don’t always have to understand something theoretically in order to do it well. If I want to run well, or swim well, or program well, it doesn’t necessarily have to follow from first learning the theories of aerodynamics, water viscosity, or binary.

Building an Antifragile System

I just completed the testing of a new program I wrote: 500 lines of well-commented code, making debugging easy if necessary. With this program, the reports we run every morning take 10 minutes instead of the usual hour, and are fully automated too. Without any manual inputs, the potential failure points owing to manual input errors (so common whenever any sort of manual intervention is required) are no longer a threat, a large boon to data integrity.

And yet, I couldn’t quite feel at ease. Something was bothering me – I just couldn’t help thinking that it was all too automated. Too efficient. Too fragile.

Was I missing something? I thought about it a while, then realised that I’d forgotten to insure myself against possible data problems. Would I know if the data wasn’t correct? What alternatives did I have if the program failed? Key business decisions may be made based on these reports, after all, so it wasn’t a good idea to take any chances here, even at a slight hit to efficiency.

I’m going to allude to Taleb’s wonderful book Antifragile again (as I did in my previous post): the problem with lean and highly efficient systems is that they tend to be fragile, breaking easily in times of volatility (even the volatility of time itself) while simultaneously portraying an air of infallibility (they’re so efficient and wonderful that all the potential problems get ignored).

Fat-free, highly efficient systems break easily in times of volatility. Take for example a reporting system that extracts a specific piece of data from a spreadsheet on a daily basis. At its most efficient, you just take the data from a specific cell, without doing any checks or introducing any sort of flexibility. So perhaps you program the system to look for the cell directly below the “Sales” header (say, for “Region X”), which is always in the second column from the left. One day, because a new sales region (“Region Y”) has been added, the cell you’re looking for gets pushed down to two rows below the “Sales” header. And since everything’s automated, you don’t realise that the number you’re now getting is for “Region Y”, not “Region X”, and you continue using the numbers as if everything were fine. If you were manually running the report, however, you’d have spotted the difference right away.
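Here’s a toy version of that fragility. The “sheet”, its layout, and the sales_for helper are all hypothetical; the point is the contrast between grabbing a fixed position and looking a value up by its label.

```python
# A toy spreadsheet as a list of rows. A new region has just been inserted
# above the one the automated extract was written for.
sheet = [
    ["Sales",    "2013"],
    ["Region Y", 450],     # newly added region pushes everything down
    ["Region X", 820],
]

# Fragile: take whatever sits directly below the "Sales" header.
fragile_value = sheet[1][1]          # silently returns Region Y's 450

# Less fragile: look the row up by its label, and fail loudly if it's missing.
def sales_for(region, rows):
    for label, value in rows:
        if label == region:
            return value
    raise ValueError(f"Region {region!r} not found - layout may have changed")

robust_value = sales_for("Region X", sheet)   # 820, regardless of row order
print(fragile_value, robust_value)
```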

Or take task scheduling. Let’s say you have two tasks you want your computer to carry out: Task A and Task B. Task B depends on the result of Task A, so Task A has to run first. Because you want to be as efficient as possible (i.e. completing the greatest number of tasks in the smallest amount of time), you set the start time of Task B as close to Task A’s as you can (for the sake of this example, let’s assume you can’t tell the scheduler to start Task B “after Task A” but have to give it a specific time).

Let’s say that after monitoring Task A’s run times for a month, you find it takes 30 minutes at most, so you schedule Task B to start 30 minutes after Task A. One day, due to an unscheduled system update halfway through Task A, Task A takes a couple of minutes longer than usual to run (i.e. 30 + 2 minutes), causing Task B to fail since Task A’s output wasn’t ready on time. The system breaks because of this quest for “efficiency”. If a little redundancy had been included for “freak run times” at the expense of pure efficiency (or a less fragile process put in place, e.g. figuring out how to start Task B after Task A, no matter how long Task A took), it’d have been OK.
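A minimal sketch of the difference, with hypothetical task names and timings (the sleep stands in for real work): the fixed-interval schedule hopes Task A has finished, while chaining Task B off Task A’s actual completion removes the guess entirely.

```python
import time

def task_a():
    """Stand-in for the upstream job; some days it takes longer than usual."""
    time.sleep(2)
    return "data produced by A"

def task_b(a_output):
    """Stand-in for the downstream job that needs A's output."""
    print(f"Task B consuming: {a_output}")

# Fragile approach (what the fixed 30-minute gap amounts to):
#   schedule_at("09:00", task_a)
#   schedule_at("09:30", task_b)   # breaks the day A takes 32 minutes

# Less fragile approach: B starts when A actually finishes, however long that takes.
def run_pipeline():
    output = task_a()    # returns only when A is genuinely done
    task_b(output)       # so B never runs against missing output

run_pipeline()
```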

Then there’s the unwarranted faith in, and air of infallibility around, automated systems. When I program something to automate a process, I test it thoroughly before deployment, making sure it works as advertised. After that, I wash my hands of the program. The reason I automated it in the first place was that I didn’t want the process to be as involved as it was. If I were going to have to keep looking over the program’s shoulder (if it had one), it’d defeat the purpose of automation, wouldn’t it? So I trust the system to work, and expect it to work until I know of changes that might break it.

I’m not sure about other developers, but I find that I tend to place a lot of trust in automated systems precisely because of the “no manual intervention” aspect. Most data discrepancies I see occur because of data-entry or procedural errors (i.e. missing out an action or doing things in the wrong order). As mentioned previously, when I automate something I do a lot of checks to make sure everything’s working correctly as at the time of deployment. After that, though, anything goes. And even though I do carry out the occasional random check, these checks are random and may not happen until long after things have gone awry.

Forgetting that things can and do go wrong is one mistake we make in designing systems. Expecting things to eventually go wrong, and planning what to do when they do, is one way we can protect ourselves against such occurrences, helping us mitigate the risk of large negative consequences. Take for instance the earlier example of the additional sales region in the reporting system. One way we could protect ourselves against using the wrong data, while still enjoying the benefits of an automated system, would be to create a “checksum” of sorts (as used to ensure the integrity of data after transmission or storage), where we check the reported total sales figure against the sum of the individual regions’ numbers. If there’s a discrepancy, perhaps because of a missing or additional region, we flag it and hold back distribution.
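A minimal sketch of that checksum idea, assuming a hypothetical verify_sales_report step run before the report goes out; the region names and figures are invented.

```python
def verify_sales_report(region_sales, reported_total, tolerance=0.01):
    """Raise (and so block distribution) if the parts don't add up to the total."""
    calculated_total = sum(region_sales.values())
    if abs(calculated_total - reported_total) > tolerance:
        raise ValueError(
            f"Checksum failed: regions sum to {calculated_total}, "
            f"but the source claims {reported_total} - a region may be "
            f"missing or newly added."
        )

# The day "Region Y" appears but the extract still only picks up X and Z,
# the mismatch is flagged instead of being quietly distributed.
try:
    verify_sales_report(
        {"Region X": 820, "Region Z": 300},   # what the automated extract found
        reported_total=1570,                  # what the source system reports overall
    )
except ValueError as err:
    print(f"Report held back: {err}")
```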

Creating great automated systems doesn’t necessarily entail making things as efficient as possible, even if efficiency was the main benefit you were seeking in the first place. A highly efficient system that’s prone to error (and worse, errors you’re not aware of – the unknown unknowns) is worse than having no automated system at all. With such a system, you’d just be doing the wrong thing faster (like great management without an accompanying talent for leadership or vision!)

A system that can withstand shocks (one that’s “robust”) and come back stronger after they happen (one that’s “antifragile”) is the best system you can hope to have. Adding (some) redundancy for contingency, and keeping in mind that even the best systems can fail, is the key to protecting ourselves against the errors that do happen. And they will happen. So, how antifragile’s your system?