I’ve been thinking about this question for a long time now: how do we balance the need for agility in working with data against the need for proper governance of that data?
For those with no idea what I’m talking about, here’s a great summary of the conundrum I’ve found no answers for:
[F]rom its earliest days, data management and BI have been the purview of the IT department. As the demand for reporting increased, BI leaders emerged to focus on standardizing and governing data and metrics, creating dashboards and reports with traditional BI tools that ensured one version of the truth enterprise-wide. And at the time, this top-down approach to BI worked.
But in today’s on-demand world, businesses need to move lightning fast, to be agile and to be able to course-correct in real-time to compete. The IT-driven approach to BI reporting no longer works for the business person who needs to ask questions, get answers, and based on the answer, keep iterating to gain insights. If a retail brand manager sees an anomaly with an item SKU on a report, it prompts logical next questions. Why? When? Where? With the top-down approach, she must work through data analysts to get ad-hoc reports built and perhaps wait days to get the answers—too late to have any impact.
Meanwhile, for the “data ninjas” it always feels like groundhog day – they deliver reports, which surface more questions, which result in more reports, creating an endless back and forth where neither side ever feels satisfied. In response, business analysts have employed workarounds using data discovery tools, creating bottom-up, “shadow IT” solutions. While this workaround produces faster ad-hoc answers, it is counterproductive, creating silos of information and many versions of the truth.
thoughtspot.com – The Analytics Battle: Agility vs. Governance
On the one hand, I understand and fully appreciate the need for “one source of truth”, and for proper data governance.
It means not having to constantly validate numbers, a pointless and time-sucking activity if there ever was one.
It means helping the business jump straight into using data to solve problems and identify opportunities, instead of gaming the situation: blowing up small, insignificant discrepancies to dodge the harder-hitting issues whenever the data paints a particular party in a negative light.
It means data is less likely to be misinterpreted, and it helps avoid catastrophic scenarios where data ends up in the wrong hands.
All good and valid points.
But… and that’s a big BUT, coming from an analytics perspective, data governance as it has been traditionally enforced can get extremely (extremely!) frustrating.
The data we have holds potentially so much insight, but because it’s not validated, not cleaned, not IT-sanctioned, we’re forced to ignore it as if it didn’t exist.
It’s like starving next to a buffet line. At closing time, the workers start disposing of the food. When you ask for a little since it’s already going to waste, they tell you “no”, because “it’s against company policy” and they don’t want to be “sued in case anything should happen to you.” So you continue to starve to death – because it’s better you die by your own actions than by theirs.
I head an analytics team that’s done brilliant work. Perhaps too brilliant, as we’ve become something of a victim of our own success. The things (reports; analyses; tools) we’ve provided the sales team have answered countless questions and provided much insight. But here’s the rub: the more questions you answer, the more questions you get.
We’re at a point where, because we’ve answered the original questions so brilliantly (“it looks so easy how you do it,” said one, despite it being anything but), we’re now expected to have similarly brilliant (and quick!) answers to the follow-up questions.
But we don’t.
Because where the original questions could be answered with available data sources “as is”, the new questions tend to require multiple sources; linked in myriad ways; transformed in gory complexity; and at a level more granular than anybody had expected 20 years ago when the systems and data architecture were first set up.
So we move off the highway of “valid”, “IT-sanctioned” data. We take a little data here; a little data there. Experimenting: joining; transforming; squeezing out of the data every single ounce of insight we can.
In between, questions change. And we iterate. Answers give rise to more questions. Iterate. Eventually, when there’s a little stability, we try to merge back onto the highway only to be told: sorry, your data’s not welcome, and if we wanted to replicate what you did, we’d need to set up a 2-year project for that.
And so a “silo” is born. (Ironically, this “silo” tends to be less siloed than existing data sources as it will take in data from disparate systems and pull them into a single data source. I’d almost say “data lake”, but more often than not it’d be more of a pond.)
It’s not entirely IT’s fault – we know data governance is essential and they’re doing as much as they can within the tight constraints they operate in.
But at the same time there has to be a better way, a compromise of sorts.
Otherwise, as business pressure for data agility keeps increasing, we’re going to see much more unregulated, siloed data (ponds :)) and the rise of Shadow IT, much like how prohibitions spur black markets.