The default for computing is data loss

Most recent update: 22nd July 2022 - 05:08:26 - 4667 characters

As we present time travelers with the wonder that is the modern mobile phone, we should likely warn them not to get their hopes up. They're still likely to lose text documents.

Why?

The default for computing is data loss.

For the average user, there's still a good chance that when they create a file they'll eventually lose it.

Even if you do everything right, paying money for a backup solution, it's entirely possible for that solution to lose your files or terminate your account.

You can't trust cloud backups or storage regardless of your relationship with the company. The situation only gets worse when we remember how many startups disappear, leaving no trace of their users' data behind.

That this is still not solved in an era when most users have enough storage, compute, and bandwidth to back up their digital lives multiple times over is a slow-rolling nightmare. An ever-expanding bitrot is fragmenting our collective human history.

There are many causes, and many partial solutions, but the morbid truth still remains: the default for computing is data loss.

A writer is still likely to lose their novel or their PhD.

A coder is still likely to lose a database, almost regardless of their level of sophistication.

Your friend circle is still likely to lose their discussions and "memeography" as they shift from one transient communication platform to the next.

If we want data safety then the end user must have a consistent and well tested process available at little or no cost. The solution needs to be simple to use and understand. Default settings should be resilient and any additional configuration foolproof. Such a data safety layer shouldn't need user intervention, remaining as invisible as possible except in situations where data loss might otherwise occur.
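
As a rough illustration of that "invisible by default" ideal, here's a minimal sketch in Python: a background job that mirrors a directory, stays silent while everything succeeds, and only surfaces when data is at risk. The paths and the notify behaviour are hypothetical placeholders, not a real product.

    import shutil
    import sys
    from pathlib import Path

    SOURCE = Path.home() / "Documents"      # hypothetical: where the user's files live
    BACKUP = Path.home() / ".quiet_backup"  # hypothetical: where copies silently accumulate

    def notify(message: str) -> None:
        # The only user-visible moment: when data loss might otherwise occur
        print(f"BACKUP WARNING: {message}", file=sys.stderr)

    def mirror() -> None:
        BACKUP.mkdir(exist_ok=True)
        for src in SOURCE.rglob("*"):
            if not src.is_file():
                continue
            dst = BACKUP / src.relative_to(SOURCE)
            try:
                # Copy only files that are new or changed since the last run
                if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
                    dst.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src, dst)
            except OSError as e:
                notify(f"could not back up {src}: {e}")

    if __name__ == "__main__":
        mirror()  # in practice this would run on a schedule, not by hand

A real data safety layer would do far more (versioning, off-site copies, integrity checks), but the shape is the point: zero configuration, zero interaction, loud only when something fails.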

Types of data loss / impairment / deprecation

  • Local data: you're likely to have three devices and a few hundred megabytes or gigabytes of irreplaceable data, yet we still expect you to lose it
  • Local databases: the OSS default isn't trivial to use or set up, so resiliency instead relies on paid closed source PaaS extensions that rarely make it back to open source
  • Cloud data: you may be able to export your own data, but the export rarely includes everything connected to you, let alone what's relevant across the broader ecosystem
    • Nearly equivalent: importing lossily from one service to the next
  • Transient data and experiences: game servers that will shut down and never return, file formats that at best require data archaeology, vital data lost in reams of banal statistics, ...
    • Whilst some of this is a reality of a changing and evolving world, "transience" in technology seems to far exceed that of the real world
  • Backups: Are these backups independent, fragmented, and likely to decay themselves in the following few years? (see the integrity-check sketch after this list)
    • Example: A community of tens of thousands may be able to export their own data, each holding a sliver of the original, but can it ever be reassembled again?
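
To make the "likely to decay" question concrete, here's a sketch in Python of the simplest possible integrity check: record a SHA-256 checksum for every file in a backup, then on later runs report files whose bytes have silently changed. The manifest filename and backup location are illustrative assumptions, and a real system would also need to distinguish intentional edits from corruption (for example by only checksumming write-once archives).

    import hashlib
    import json
    from pathlib import Path

    MANIFEST = "checksums.json"  # hypothetical manifest name

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(backup_dir: Path) -> None:
        # Assumes backup_dir already exists and is reachable
        manifest_path = backup_dir / MANIFEST
        old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
        new = {}
        for path in sorted(backup_dir.rglob("*")):
            if path.is_file() and path.name != MANIFEST:
                rel = str(path.relative_to(backup_dir))
                new[rel] = sha256(path)
                if rel in old and old[rel] != new[rel]:
                    print(f"DECAYED: {rel} no longer matches its recorded checksum")
        manifest_path.write_text(json.dumps(new, indent=2))

    verify(Path("backups"))  # hypothetical backup location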

Developers, by implicitly or explicitly making certain practices simple or easy, dictate the type of data safety and data loss their users are likely to experience. This is a problem given the developer is far removed from the consequences, especially when the developer is tech savvy and their average user is anything but.

A collection of questions to ask when thinking about data backup for end users (and yourself):

  • Does it require user action (or a process that isn't error-proof) to begin and maintain backups?
  • Is data backed up locally still useful?
    • Does it require special software to use? Are there static "rendered" versions that still provide partial value without the underlying tool?
    • Does a backup of data come with sufficient surrounding context to be understandable?
  • Is the user aware of what limitations and failure modes the backup is subject to? (the manifest sketch after this list is one way to record this)
    • Are they aware of what is or isn't covered by the backup?
    • Do they know what "recovery" looks like in the worst case?
    • Is the user alerted when an accident might result in their data or backup being compromised?
  • How much of the original functionality is maintained after export?
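
Several of these questions, coverage, context, and worst-case recovery in particular, come down to whether the backup can explain itself without the original software. One lightweight answer is to ship a human-readable manifest alongside the exported data; the Python sketch below is purely illustrative, and every field name in it is an assumption rather than any standard.

    import json
    from datetime import datetime, timezone

    # A manifest a future reader could interpret without the original tool
    manifest = {
        "created": datetime.now(timezone.utc).isoformat(),
        "tool": "example-exporter 0.1 (hypothetical)",
        "covers": ["posts", "images", "profile"],
        "does_not_cover": ["private messages", "third-party embeds"],
        "formats": {"posts": "markdown", "images": "original bytes", "profile": "json"},
        "recovery_notes": "Posts are plain text and readable anywhere; "
                          "full re-import requires a tool that understands this layout.",
    }

    with open("MANIFEST.json", "w") as f:
        json.dump(manifest, f, indent=2)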

If you're interested, you can read more thoughts on how ease of use, open source, and SaaS don't (by default) result in open source software resilient to data loss.