Aspects of Digital Transformation: I.T Management By Spreadsheet - An Anti-Spreadsheet Manifesto

Overview

Over the course of 24 years in the IT industry I've noticed a repeating theme in "Enterprise" IT teams that is worryingly "backward".

The starting point for this observation is the practice in many IT teams of relying on MS Excel spreadsheets to manage a large variety of IT infrastructure tasks that should (and could) instead be automated or systematised using holistic software solutions.

In this article I put forward the opinion (founded on a few decades of gathered experience) that using Excel spreadsheets to manage modern IT infrastructure is analogous to using Sumerian clay tablets to record the progress of goods across the modern warehouse floor.

It is a "mindset" problem, and potentially a skills issue: a symptom of both an inability to automate systems effectively and a resistance to efficiency within IT infrastructure engineering teams.

The Observation:

There is a common practice of managing IT infrastructure and processes in Excel Spreadsheets as an alternative to putting in place automation to track I.T information and manage infrastructure holistically.

This practice hinders the improvement of I.T infrastructure and services to the point where it degrades the business' ability to deliver I.T services to its customers efficiently, reliably and with integrity.

It takes a cognitive toll on IT teams and acts as a form of "human resource filter": it is one of the reasons why highly skilled staff leave the team, while less skilled staff who are comfortable with this method of operating stay with the organisation.

The Context

Given this observation, I have to qualify that the practice of managing Enterprise IT by Excel spreadsheet is only truly detrimental in a certain context. This context is defined by three parameters:

  1. The profile of the IT organisation under discussion
  2. The degree of IT automation possible in such an organisation
  3. The Human Factor: The Skills, Capability and Mindset of the technical teams practising IT within the organisation.

First, I will provide a profile of the kind of organisation this article is aimed at:

  • Its IT infrastructure is located in privately run data centers ("on prem") and comprises a large amount of legacy IT infrastructure (e.g. Solaris/UNIX operating systems running on SPARC, other variants of UNIX, and networking, backup and storage systems more than 10 years old).
  • The organisation is in the initial stages of moving some of its IT infrastructure from these data centers to the cloud. The planning for this process is complicated by problems re-architecting legacy applications to run "on the cloud".

To further set context, the concept of IT Automation needs to be clarified, as well as the role of the "human factor" in IT teams.

  • What do I mean by "IT automation"?

In this article I use the following working definition of IT Automation:

"Automation is the continuous process by which technology is used to make systems, processes, procedures less dependent on humans for execution, gradually reducing dependence on people for actual operations and placing them in an advisory or "oversight" role over systems."

In other words: A process of gradually moving the human being "out of the loop".

  • How is the "human factor" relevant?

The degree of automation in an organisation is both a result of the human factor and an influence on the behaviour of the people working with the automation. Briefly:

a) The skills and capabilities of the people working in an IT organisation determine the degree of automation present in the organisation - perhaps even more than the actual nature of the organisation's core business itself.

b) The degree of automation in an organisation determines how efficiently its staff carry out IT-related tasks and the "workload stress" experienced by the staff. It therefore contributes psychological and social influences to IT teams.

Our Central Theme

Expanding further on our observation, it consists of three key "tendencies" on the part of some I.T infrastructure teams in certain types of organisations:

  • The tendency to manually manage otherwise automatable I.T processes using spreadsheets (primarily Microsoft Excel).
  • The tendency to implement crude, piecemeal automation when automation can be implemented using holistic solutions designed from the ground up to support automation of process.
  • The tendency to resist any attempts to migrate the organisation away from a spreadsheet-based IT management methodology.

In short, we identify the following theme:

The practice of IT Management by Excel spreadsheet is detrimental to the practice of automation: it limits current automation practices and, by entrenching itself within the organisation, limits future attempts to automate IT.
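As an illustration of the first tendency, the sketch below compares a hand-maintained spreadsheet export with what an automated discovery source reports. Everything in it is hypothetical: the CSV columns (hostname, os), the sample hosts, and the DISCOVERED dictionary, which merely stands in for whatever inventory API or configuration management database an organisation actually runs.

```python
import csv
import io

# Hypothetical export of a hand-maintained server spreadsheet.
SPREADSHEET_EXPORT = """hostname,os
db01,Solaris 10
web01,RHEL 7
web02,RHEL 7
"""

# Hypothetical result of automated discovery (stand-in for a real inventory source).
DISCOVERED = {"db01": "Solaris 10", "web01": "RHEL 8", "app01": "RHEL 8"}


def load_spreadsheet(text):
    """Parse the CSV export into a {hostname: os} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["hostname"]: row["os"] for row in reader}


def find_drift(recorded, discovered):
    """Report where the spreadsheet disagrees with the discovered state."""
    common = recorded.keys() & discovered.keys()
    return {
        "missing_from_sheet": sorted(discovered.keys() - recorded.keys()),
        "stale_in_sheet": sorted(recorded.keys() - discovered.keys()),
        "mismatched": sorted(h for h in common if recorded[h] != discovered[h]),
    }


if __name__ == "__main__":
    drift = find_drift(load_spreadsheet(SPREADSHEET_EXPORT), DISCOVERED)
    for kind, hosts in drift.items():
        print(f"{kind}: {hosts}")
```

The point of the sketch is only that a spreadsheet has no way of noticing that it has gone stale, whereas even a crude automated comparison does.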

During the course of this article I will enumerate the arguments I've heard from business, management and technical staff for why this approach to automation is justified from a business and technical point of view and compare them against the real consequences of the approach as I've experienced them over the years.

Caveat: The Speculative Nature of this Article

While this analysis is based on empirically observed data points gathered over an extensive period of time, across many industries practising IT in many geographical and cultural locations, I must caution that it is not a rigorous academic study and should not be treated as such.

There is scope for a rigorous academic study in this area.

That said, the theme identified above is observable in reality and has concrete effects on the efficiency of IT in "enterprise" organisations. It is a real problem that should be addressed - regardless of whether sufficient academic rigour has been applied to it or not.

Potential Root Causes of resistance to automation

In my experience, most attempts to move IT organisations away from their (current) spreadsheet-based methodology are immediately resisted by technical staff, by management and, to a lesser degree, by more senior levels of management (IT directorship).

Some of the reasons given over the years (some of those I've personally heard when driving automation projects in traditional I.T organisations) are:

  • Labour and Time-saving Efficiency ("It's easier to do it by hand")
  • No fundamental justification for automation ("we've always done it that way, so why automate")
  • We're planning to upgrade to a better system soon, so automation is a waste of time
  • We're planning to move to the cloud, so automation of our current procedures is wasted
  • The risk of something going wrong while we automate and the risk of transitioning from our current manual method is too high ("legacy lock in", "analysis paralysis", "fear of the unknown")
  • The system does not support automation (no automation APIs or interfaces).
  • "We're too busy to automate!"
  • We don't have the human and financial resources to automate (Organisational skills deficit)
  • The technology doesn't exist to automate this process ("We think only an A.I or A.G.I can replace the human in this loop!")

This is largely a list of invalid reasons for resisting automation although some of them may be justifiable for a short period of time in an organisation's history.

We provide a table (ironically) of thematic counterpoints to these arguments that may be expanded in depth:

| # | Argument | Counter Argument | Comment |
|---|----------|------------------|---------|
| 1 | Labour and Time-saving Efficiency ("It's easier to do it by hand") | 1) It's probably only easier for staff already familiar with the process. 2) It gets progressively harder as new employees are handed the task and the old ones are replaced. 3) It becomes extremely risky if the staff familiar with the process are unavailable for some reason. | Sometimes called the "Job Security Defence" |
| 2 | No fundamental justification for automation ("we've always done it that way, so why automate") | The justification for automation becomes apparent when human factors lead to error. | The implication is that the human will never make an error and that low-risk errors are worth the time wasted. |
| 3 | We're planning to upgrade to a better system soon, so automation is a waste of time | Potentially a valid argument IF it is actually the case that the new system will negate the need for bespoke automation. | Here one needs to determine whether this is merely kicking the can down the road with the intention of not automating once the new system is in place, and whether the new system indeed provides the automation features promised. |
| 4 | We're planning to move to the cloud, so automation of our current system is wasted | The inclination to automate is usually a key driver of moving to the cloud. It tends to begin in the legacy environment as a model of the automation that will eventually be done on the cloud. Claims that automation efforts will be deferred until the move to the cloud is complete should be regarded as unsubstantiated when the skills and inclination have not been demonstrated "on premise". | People tend to operate on the Cloud as they did in the legacy environment - if the inclination to automate wasn't part of systematic practice before, it's unlikely to change without concentrated IT Governance efforts. |
| 5 | The risk of something going wrong while we automate, and the risk of transitioning from our current manual method to automation, is too high ("legacy lock in") | This is usually an indication that the underlying process itself is not well understood and not deterministic. Identify the exact reason for this to isolate the cause of the potential risk - then address that aspect by redesign. | IT business systems that are too risky to automate are usually poorly designed to begin with. |
| 6 | The system does not support automation (no automation APIs or interfaces) | This is almost always a sign of a system so old and dysfunctional that it should be replaced as soon as possible. | Few systems still in operation after 2020 don't support automation via APIs and instrumentation. |
| 7 | "We're too busy to automate!" | The most common reason IT teams are too busy to automate is the chicken-and-egg situation of being too caught up in manual activities. Manual activities are time consuming and wasteful. | Break this cycle with an automation initiative led by a new hire. |
| 8 | We don't have the human and financial resources to automate (Organisational skills deficit) | Ironically, financial and human resources are required to do manually what should be automated. | The actual underlying claim here is usually "We can't find people smart enough to automate - for any amount of money or newly hired staff". |
| 9 | The technology doesn't exist to automate this process ("We think only an A.I or A.G.I can replace the human in this loop!") | The claim that a particular business carries out IT processes that are beyond automation but happen within a deterministic IT environment is a contradiction in principle. | This is usually a complaint about the complexity of the process to be automated, which suggests analysis and simplification should be carried out as a pre-requisite to automation. |

Let's examine some plausibly valid reasons for avoiding initiatives to automate:

| # | Reason | Comment |
|---|--------|---------|
| 1 | We don't have the money and resources to automate | Compare the costs of the current manual mode of operation with the costs of automation to validate this claim. However, if you're broke - then you're broke! |
| 2 | We don't have the technical skills to automate (outdated skillsets or skills gaps) | This seems to be an H.R problem ... |
| 3 | The Business and its IT infrastructure are about to be sold and we don't want to deliver any more value than what we were paid for. | Fair Enough. Let's hope the purchaser doesn't view this as a due diligence issue ... |
| 4 | The I.T department is about to be fired and outsourced and sees no reason to deliver any more value than the minimum operations. | Fair Enough. Let's hope the purchaser doesn't view this as a due diligence issue ... |
| 5 | The process is so non-deterministic that there is no scientific and economically feasible means to automate (AKA "the technology doesn't exist"). | You must be running some Quantum-level tech to be unable to automate ... |
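The cost comparison suggested for the first reason can be sketched as a simple break-even calculation. This is a minimal sketch, not a costing method: the Task fields, the hourly rate and the sample figures are all hypothetical, and the model counts labour only. It ignores the cost of human error and of key-person dependence, both of which weigh against the manual option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical cost profile for one recurring IT task."""
    manual_minutes_per_run: float   # staff time for one manual run
    runs_per_month: float           # how often the task occurs
    build_hours: float              # one-off effort to automate it
    upkeep_hours_per_month: float   # ongoing maintenance of the automation
    hourly_rate: float              # loaded staff cost per hour


def monthly_manual_cost(t: Task) -> float:
    return t.manual_minutes_per_run / 60 * t.runs_per_month * t.hourly_rate


def monthly_automated_cost(t: Task) -> float:
    return t.upkeep_hours_per_month * t.hourly_rate


def break_even_months(t: Task) -> Optional[float]:
    """Months until the build cost is repaid, or None if it never is."""
    saving = monthly_manual_cost(t) - monthly_automated_cost(t)
    if saving <= 0:
        return None
    return t.build_hours * t.hourly_rate / saving


if __name__ == "__main__":
    # Illustrative numbers only.
    frequent = Task(30, 40, 80, 2, 50)    # 30-minute task, 40 times a month
    rare = Task(60, 0.25, 80, 2, 50)      # 1-hour task, once every four months
    print("frequent task:", break_even_months(frequent))
    print("rare task:", break_even_months(rare))
```

With these made-up figures the frequent task repays its automation in under five months, while the rare task never does on labour cost alone. That is the honest form of the "we don't have the money" argument, and it only holds once the numbers have actually been written down.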

Case studies

So much for theory and argument, but what real-world data points do I have to support the discussion thus far?


Below is a comparison of various companies I've delivered consulting, implementation and operational services for in the past. Each approached automation with a different mindset, reaping different rewards from automation.

IT Automation was important for all these organisations - ultimately proving business critical - though varying in maturity across the sample set.

The total sample consists of 14 companies in 10 different industries across 5 countries in 4 continents over a span of 24 years, of which 6 are shown in the table below:

| # | Business Type | Location | Automation Maturity | Criticality to Business | Barriers to Automation | Type of Systems Automated |
|---|---------------|----------|---------------------|-------------------------|------------------------|---------------------------|
| 1 | Commercial ISP | Africa | Mature | Internet services continue but provisioning stops without automation | Overworked staff (firefighting) | Realtime and batch systems, small-scale configuration management |
| 2 | Non-Profit ISP | Africa | Basic | Internet services continue, provisioning can continue manually | Unwillingness to embrace automation, inclination to operate manually | Batch, configuration systems |
| 3 | Public Cloud Provider | Global | Advanced | Automation is central to the business service | None, automation is mandated | Realtime, batch, global-scale configuration management, machine learning, analytics collection |
| 4 | Retail Brand Operator and store operator | Middle East | Moderate | Automation failure seriously slows operations | Skill, legacy systems, legacy processes, fear of change, mindset | Batch processing systems |
| 5 | eCommerce Startup | Middle East | Inconsistent (Basic, Mature) | Unable to operate at required scale without automation | Staff occupied with fire-fighting and startup MVP priorities | Batch and realtime |
| 6 | eCommerce Business | Europe | Advanced | Unable to operate at required scale without automation | None, automation mandated | Batch and realtime, medium configuration management, machine learning, analytics collection |

The table suggests some potential trends:

  • Organisations that mandate automation achieve advanced automation and also achieve the advanced benefits of automation (e.g. machine learning, analytics capabilities, large scale configuration management).
  • Organisations with an aversion to automation or skills deficit achieve Moderate and Basic levels of automation and do not achieve more than automation of batch processes and small scale configuration operations.
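
The grouping behind these two trends can be reproduced with a short tally. The rows below are transcribed from the case-study table; the "mandated" flag is my own reading of the Barriers to Automation column (true only where it says automation is mandated), and six rows are far too few to support more than a suggestion.

```python
from collections import defaultdict

# (business type, automation maturity, automation mandated?)
# Transcribed from the case-study table. "mandated" is an interpretation of
# the Barriers to Automation column, not a column in the original data.
SAMPLE = [
    ("Commercial ISP", "Mature", False),
    ("Non-Profit ISP", "Basic", False),
    ("Public Cloud Provider", "Advanced", True),
    ("Retail Brand Operator", "Moderate", False),
    ("eCommerce Startup", "Inconsistent (Basic, Mature)", False),
    ("eCommerce Business", "Advanced", True),
]


def maturity_by_mandate(rows):
    """Group the observed maturity levels by whether automation is mandated."""
    groups = defaultdict(list)
    for _business, maturity, mandated in rows:
        groups[mandated].append(maturity)
    return dict(groups)


if __name__ == "__main__":
    for mandated, levels in maturity_by_mandate(SAMPLE).items():
        label = "mandated" if mandated else "not mandated"
        print(f"{label}: {levels}")
```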

Counter-Automation Culture

Along our way to identifying the systemic root causes leading to poor automation in an organisation we find ourselves looking at "mindset" or the "human factor" more closely:

The symptom of I.T management by Excel spreadsheet is an indication of a deeper problem: "Fear of Automation", which manifests itself as a culture in I.T teams that resists initiatives to improve the efficiency of I.T systems.

This problem is known by its characteristic manifestations in an I.T team:

  • The strongly held belief that it's more efficient to do a rare task manually than to spend much time automating it.
  • The fear that automating a system or process might increase the impact of potential issues
  • A general inability (or disinclination) to plan out the complete set of automation scenarios and evaluate the relative merits of each.

I call this "counter automation culture" because once it establishes itself within an IT department it becomes part of the shared culture of the entire department.

It becomes even more recognisable as a "culture" when new employees join the department and meet with a behavioural "wall of resistance" to any initiatives to automate.

At this point the problem ceases to be a technological one and becomes a social issue ("Human Factor"). This brings me to Conway's Law and its influence on IT in organisations:

The Influence of Conway's Law

The structure of an organisation has a substantial influence over its inclination and ability to automate.

Conway's Law describes the tendency of an organisation's structure, and in particular its communication structure, to be mirrored in the design of the systems it builds.

In short, it allows one to conclude that:

  • If a company is rigidly hierarchical in structure, it is unlikely to build effective microservice applications.
  • If a company is highly autonomous and distributed in structure, it is unlikely to have much success in building rigidly monolithic, strictly defined applications.
  • If a company is poorly organised, with the boundaries between functional areas, policies and processes rigid in some places and poorly defined or absent in others, the software it builds will be similarly inconsistent in its design.

Since the consistency and structure of software systems determine how well they can be automated, Conway's Law influences the quality and degree of automation one can expect from a given organisation.

Takeaway: One should be able to predict the quality of automation present in an organisation from a first glance at its organisational structure and dynamics.

Compare and Contrast: I.T Organisations that do automate vs. those that don't.

One way to assess the validity of the ideas we've put forward thus far is to compare the behaviour of organisations that drive automation versus those organisations that substitute "spreadsheet based management" for automation.

This comparison can be made along several axes for a complete picture:

  • Large companies vs. Small Companies
  • Cloud Companies vs. On Premise companies
  • High-frequency trading (HFT) companies vs. Everyone else

It should quickly become clear that high performing organisations do not allow their IT automation to be dominated by manual "spreadsheet based" methods involving tools like MS Excel. Organisations that depend on scale and adaptivity to remain competitive would be least likely to substitute spreadsheet based IT management for automation methods.

Recap: Why Automate? What are the consequences of not automating?

In light of what's been discussed so far in this article, it might be useful to pause and ask ourselves whether any of this matters at all.

Specifically: "Why bother with automation at all? Is it important?".

Here we take a business perspective to understand the influence of the global market on business competitiveness ...

In short:

Automation of I.T provides an organisation with competitive capabilities that influence the entire operations of the business. It determines how fast an organisation can bring services to market and how efficiently it can operate those services without losing profits through overhead.

Against competitors in the market, an organisation's I.T sophistication can determine how much market share a business loses or wins and perhaps whether it continues as a viable competitor.

The Cure

Having identified "IT Management by spreadsheet" as a symptom of a deeper problem, I'd like to propose a series of "cures" for the disease of "counter automation culture", in increasing order of risk:

  • The direct approach: Cure the disease, not the symptom (i.e. don't ban MS Excel just yet; first do a thorough process analysis of the automation needs)
  • Remove the risk of automation ("De-risking") by upgrading systems, refining and stabilising processes and capturing undocumented procedures in documentation.
  • Remove all the "non-automatable" systems (embark on technology refresh, transformation and upgrade) and replace with systems that offer good options for automation and integration
  • Institute new formal technological practices, approaches and certifications as incentives to learn and drive automation. Include automation as a KPI, OKR or other staff or team performance metric.
  • Sometimes you just need to fire all the backward people - carefully
  • Restructure Departments and Teams appropriately (Using Conway's Law as a guide)

Conclusion

  • The symptom of I.T management by spreadsheet represents a failure of an organisation to adopt the mindset and culture of continuous improvement through automation. It is retrograde in character.
  • It means that engineers who have been entrusted with developing excellent infrastructure have instead opted to replace engineering with a form of "spreadsheet based bureaucracy".
  • While this form of administrative bureaucracy may be minimally successful at keeping I.T infrastructure operational, it can never lead to continually improving service quality and will fail to meet the rising demands of a global market.