Data and Composability #

What does it mean to own data? It means:

You have a full copy of it
It lasts until you decide to delete it
You can do whatever you want with it, including opening it with other apps

Data ownership vs. free software #

“Free software” is an older concept of digital freedom, which describes the right to run programs and modify them as you see fit. The Free Software Foundation (FSF) has a whole page explaining the definition of “Free Software”. The word “data” does not appear in it a single time.

This is a problem, because modern software is often simple and straightforward, and only becomes interesting because of the large, rich data set that it acts on. Consider, for example, social network software like Twitter: the program underlying it is pretty simple, and there are endless free and open-source (FOSS) copies of it, but the copies are unappealing because the value of the app is its data, not the actual program.

Composability #

Composability is when multiple different programs work together and share the same data.

Your computer is great at composability. Consider the following story:

On a walk with your friend, he uses a camera app to take a picture of some mountains, then sends it to you on Telegram. When you get home, you download it and look at it in your image viewer. You think it’s a great picture, but there’s someone walking their dog in the corner, so you open it in Photoshop and remove them from the picture. Then you email it to your grandma to show her the walk you went on.

You just used 5 different apps: Camera, Telegram, Image Viewer, Photoshop, and Email. They all shared access to one picture.

Centralized services #

When you post a tweet on Twitter, you’re counting on them to keep it for you. You’re also counting on them to let you look at other people’s tweets. If they decide not to anymore, you have no recourse. If their servers crash, you also have no recourse.

Even if the Twitter app were free and open-source, it would be worthless, because the data would be gone.

In the world of centralized services, achieving data composability is hard. It’s usually not hard for technical reasons, but for business reasons: hosted software companies make money by renting out access to their data. If they gave it away for free, they wouldn’t make any money. Once again, the app is not worth much.

To illustrate that this is not a technical problem, consider that most such services have an “API” which doesn’t do anything except produce the data. But they want you to pay for it, and won’t let you use it otherwise.

Filling the gap #

This leaves us in an odd spot, where the software is free but the data is not, and the software is no good without the data. If we could get the data out of the centralized services, we could have lots of fun.

Here’s the interesting part: many web apps have adopted a “client-side rendering” approach, where the app and the data are treated as separate resources (i.e., loaded separately). This means that data is transmitted as JSON from the server to the app. Every time you open Twitter, all the tweets you see are loaded as JSON data, which is then processed by the front end to render the tweets.

This happens on every page you load; all the data has to be loaded in somehow! If you open the “Network” tab in your browser’s developer tools, you can find it right there. The browser doesn’t do anything magical; if you sent the same request to the server from another application, you’d get the same response back. (There are some asterisks to this, but they’re all manageable.)
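
To make the "nothing magical" point concrete, here is a minimal Python sketch of replaying a request from outside the browser. Everything specific in it is an assumption for illustration: `FakeTimeline` is a local stand-in for a real service, the `/timeline.json` path and the tweet payload are made up, and the headers are placeholders for whatever you would copy out of the Network tab. Real services add the asterisks mentioned above (auth tokens, rate limits and so on), which this sketch ignores.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeTimeline(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a service's JSON endpoint."""

    def do_GET(self):
        body = json.dumps({"tweets": [{"id": 1, "text": "hello"}]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def replay(host, port, path, headers):
    """Send the GET request a browser would send and decode the JSON reply."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), FakeTimeline)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        # Placeholder headers; in practice, copy them from the Network tab.
        headers = {"Accept": "application/json", "Cookie": "session=placeholder"}
        return replay(host, port, "/timeline.json", headers)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The `replay` function is the only part that matters; the fake server is there so the sketch runs without touching a real service.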

Normally this data is lost when you leave the page. But suppose instead that you intercepted it and saved it in a database. You now have a way to recover your data from the centralized services.
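
As a rough sketch of the "save it in a database" step, the Python below stores captured JSON items in SQLite. The table layout, the function names, and the assumption that a response looks like `{"tweets": [{"id": ..., ...}]}` are all hypothetical; a real service's payload will have its own shape, and how you intercept the response in the first place is left out.

```python
import json
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive. The schema here is an assumed one."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " item_id TEXT PRIMARY KEY,"
        " source_url TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return db


def save_capture(db, source_url, payload, items_key="tweets"):
    """Store each item from one captured response, keyed by its id.

    Saving the same item again overwrites the old row, so revisiting a
    page does not create duplicates.
    """
    items = payload.get(items_key, [])
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO items (item_id, source_url, body) VALUES (?, ?, ?)",
            [(str(item["id"]), source_url, json.dumps(item)) for item in items],
        )
    return len(items)


def load_items(db):
    """Read every archived item back as a Python dict."""
    rows = db.execute("SELECT body FROM items ORDER BY item_id")
    return [json.loads(body) for (body,) in rows]
```

Once the items are in a local file, any other program that can read SQLite can open them, which is the same kind of composability the picture in the story above enjoyed.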