Harnessing spreadsheets as part of a data ecosystem
Spreadsheets have a bit of a bad reputation as a place where data goes to die: data that's "in a spreadsheet somewhere" in the organisation often isn't easily available to those who need it. Conventional wisdom says that all the data of interest to an organisation should be in some database, or data lake, or data warehouse, or whatever buzzword is currently in fashion - a place that's backed up, secured against unauthorised access, audited, and most importantly, easy to connect systems to.
A far cry from "Documents\2023_Financials_final_v2.xls on Bob's laptop".
And yet, when Bob has just gathered some new and important data, it's not usually easy for him to put it into the central data lake (and even harder if one doesn't exist yet). Such systems are generally in the care of teams of specialist engineers, who are very busy. Even if Bob learns how to do it and his request for write access is granted (or the data team are available to help), and the data does get in there, the result isn't necessarily what he needed. Sure, the data is now accessible via SQL, APIs, and the analytical dashboard tools hooked into the system, but that's not much help to Bob's colleague Alice, who would like to check whether Bob's data matches what she's seeing at her desk.
Spreadsheets get the job done
So, what happens in practice is that Bob fires up a spreadsheet to put the data into. Because he can do that in mere minutes. And then he can email it to Alice, or share it directly if it's in an online system such as Google Sheets.
And while the analytical capabilities of that central system are impressive, Bob and Alice can sum and average columns in their spreadsheets, draw charts, and run linear regressions with a few clicks, without needing to learn a new tool.
To a certain extent, this is a failing of enterprise data tools: they're designed for an elite cadre of specialists with technical training. There's certainly plenty of scope for better tools that make central data repositories more accessible and usable (and we're working on the problem ourselves). But spreadsheets are here now, available on every office computer, and familiar to users: they'll be around for a long time yet.
And this isn't a bad thing. Spreadsheets certainly have shortcomings as a data storage and processing tool, but they're still pretty good at it, and they combine that with being an excellent, accessible tool for exploring and presenting data. They let users automate tasks without learning a programming language; and although we might not call writing spreadsheet formulas "programming", it arguably is, and it can teach people how a computer works, providing an on-ramp to more powerful programming languages.
Data tools should work with spreadsheets, not try to fight them
Developers can be forgiven for thinking that spreadsheet data is "just a pretty CSV": that data from spreadsheets can be imported by treating it as a CSV, or that if your system can generate a CSV, that's a perfectly good export format for spreadsheet users. Sure, a spreadsheet can just be a single row of headers with rows of data underneath, but that's a boring spreadsheet. Good spreadsheets contain formatting, explanatory text, clear layout, charts, and so on. Software that interoperates with spreadsheets needs to account for that when importing, and software that exports to spreadsheets can produce much more readable and useful results if it leans into that, rather than just generating CSV-shaped data in a .xlsx file.
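To make the import point concrete, here is a minimal Python sketch, assuming the sheet has already been read into a list of rows of cell values by whatever spreadsheet library you use. Rather than treating the first row as the header, as a CSV reader would, it looks for the first row containing some expected column names (skipping any title and notes above it) and stops at the first blank row, so that footnotes below the table aren't read as data. The function names, the example sheet and the blank-row heuristic are illustrative assumptions, not a description of any particular product.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Row = Sequence[Any]


def is_blank(cell: Any) -> bool:
    # Treat empty cells (None) and whitespace-only strings alike.
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def find_header_row(rows: Sequence[Row], expected: Iterable[str]) -> Optional[int]:
    """Index of the first row whose cells include every expected column name
    (case-insensitive), or None if no such row exists."""
    wanted = {name.strip().lower() for name in expected}
    for i, row in enumerate(rows):
        cells = {str(c).strip().lower() for c in row if not is_blank(c)}
        if wanted <= cells:
            return i
    return None


def extract_table(rows: Sequence[Row], expected: Iterable[str]) -> List[Dict[str, Any]]:
    """Return the data rows under the detected header as dicts keyed by column name."""
    expected = list(expected)
    header_idx = find_header_row(rows, expected)
    if header_idx is None:
        raise ValueError(f"no header row containing {sorted(expected)}")
    header = [None if is_blank(c) else str(c).strip() for c in rows[header_idx]]
    records = []
    for row in rows[header_idx + 1:]:
        # A fully blank row is taken to end the table; anything below it
        # (footnotes, source notes) is not treated as data.
        if all(is_blank(c) for c in row):
            break
        # Skip columns with no header text, e.g. a spacer column.
        records.append({name: value for name, value in zip(header, row) if name is not None})
    return records


# A hypothetical "non-boring" sheet: a title, a note and a blank row above
# the table, plus a footnote below it.
sheet = [
    ["Q3 sales by region", None, None],
    ["Figures in GBP; provisional until audited", None, None],
    [None, None, None],
    ["Region", "Units", "Revenue"],
    ["North", 120, 4800],
    ["South", 95, 3900],
    [None, None, None],
    ["Note: excludes online orders", None, None],
]

records = extract_table(sheet, {"Region", "Revenue"})
print(records)
# [{'Region': 'North', 'Units': 120, 'Revenue': 4800}, {'Region': 'South', 'Units': 95, 'Revenue': 3900}]
```

A real importer would need more than this (merged cells, several tables on one sheet, totals rows with no blank line above them), but even this much copes with a sheet that a naive CSV-style import would misread by taking "Q3 sales by region" as the only column name.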
Here at Register Dynamics, we build data systems that work with the world as it really is - and we know that a lot of the world's most important data is in spreadsheets. Talk to us about your needs and find out how we can help!