
should data expire?

entropy and permanence in digital landscapes

alt title: navigating forgiveness and growth in an environment of fixed points

Cynth: So, okay. This is a summer school talk by Honor Ash who uses they/them pronouns. You can find them on mastodon @h@post.lurk.org. This talk is titled 'should data expire, entropy and permanence in digital landscapes (alt title: navigating forgiveness and growth in an environment of fixed points)' This talk explores preliminary research Honor has been conducting as part of their ongoing cross-media artistic practice, centring around community, scale, alienation, communication, and how the internet has changed the way we think.

The rules are: no vocal interruption of the presenter - hold all questions and comments until the end. Use the text chat, and turn off your camera and microphone until the end of the talk. This talk will be recorded, so please let us know if you would not like that to happen. And yeah. It's all you!

Honor: Great, thank you. I'm using an iPad, so I will do a screen share, but it will mean that my face camera will go away. So everyone has to be okay with me being a disembodied voice. I will also try not to speak too quickly, but I have a lot to fit in. I will be publishing the transcript afterwards and we can talk about it if there's anything that gets missed.

Here we are. That was a great introduction because it means I can skip saying this ridiculously long title myself. I'm gonna be talking about the personal angle of this topic - I just wanted to clarify that because I think there is a very business angle that could be talked about, which is just not where I'm going at all.

The key points that I wanna talk about here are on the screen right now: about how our relationship with data has changed how we see each other and ourselves, how our collective memory and growth has been impacted by mass data collection, and I would like to talk about the internet as a place, as well as a tool, service, and way of life.

I just wanted to clarify that I'm from the UK, so a lot of the things I talk about might not be relevant to everyone, especially places where tech and the internet are used very differently. And I'm also on the youngest end of millennials. So that's my lived experience. I think this is a reasonably generational topic, but that is bleeding into everywhere.

So that's just giving you some context there. So to start off with, I wanna talk about how we stored our information pre-digital.

Hard copies. When that's the only option, that's what you do. They're very direct. There's no abstraction between the data that's stored and the format that it's stored in - there's a very straightforward relationship with the data that is created and also the data that you collect. So originals in this context are quite a lot of effort to produce, but reproduction is very, very easy. There's a threshold of things being important enough to be made in the first place - but also there's the space equation. It takes up quite a bit of space. So that really informs not only what you make, but also what you keep. Data in this way is grounded in the material and as technology develops, we start to get this kind of new layer of abstraction on the top.

So we have more tools to help us write and create more quickly, it's not as expensive in terms of time to write something down, so we have a slightly more frivolous usage of recording information, but still very limited by the physical space that we have - not so much by the time that it takes to make an original. So discs and drives and things like that - they take up space physically - and the concept of storage in this context is not abstracted from the physical, so file size is still very connected to how much space you physically have. There's a prioritisation because there is that limitation on space, previously which was informed by the creation process being difficult, but is now very much more about how much space you have.

And we have a very direct relationship here with our data storage - we know exactly where it's stored. It has a physical presence in our lives and our homes and our offices and whatever. As technology develops, we start to lose this direct relationship between physical space and file size.

So as we get more efficient, technology speaking, and everything just gets smaller, it's a lot less grounded in physicality and how much you can fit in something. And it's a lot easier to lose track of what that means.

So we kept trying to fix this - we created skeuomorphic interfaces for desktop computers. You had your folders and it was designed to replicate a physical office space or file management system that you might have used in the non-digital world. And you know, that worked for a period, but people kept being born who don't remember the physicality that these are referring to.

So then we get things like this, which is an article from last year about people who are around my age and younger, kind of the Gen Z/millennial cusp, not intuitively understanding the structure of file management systems.

There's no inherent memory in people around my age and younger of the physicality of data and how managing, curating, and organising that was absolutely necessary due to physical limitation. I mean, my first interaction with folders really was as an abstraction. And so I do feel like this is a very generational issue that's kind of becoming everyone's problem, now that people who are my age are also adults who interact with everyone else in the world - who've developed only knowing this as an abstraction.

So this is something that I think illustrates it in quite a fitting way. It's a Tumblr post from a user called graphics-cafe.

It says "every now and then I see a USB flash drive that's like 512 GB for like $2 and I feel myself aging like that one scene from Indiana Jones where they choose the wrong holy grail." So they asked people who saw this post to tag it with their age and how much data intuitively feels like a lot. And it's obviously very unscientific, you know, it's very biased - who's gonna see this and interact with this - but there is a trend there that: the younger someone is, the more varied their answers are, and it tends to be larger as well. Which kind of does speak to a generational, intuitive disconnect about the size and tangible nature of data.

So what changed? Obviously technology got more sophisticated, but why did we lose our grip on what it means for something to be a certain size when we speak of data?

It's the cloud. It's such an intangible, wispy, dreamlike metaphor. And it's removed from physicality. It's someone else's problem. It's somewhere else. And this is the dominant mode of thought for people growing up now - it's the norm, right? So storage is like an unlimited resource. It's one bucket and everything's in the bucket and the bucket is infinitely large. And so why is the cloud so infinite feeling?

Well, I think a lot of people in this probably felt this or knew this, but it's pretty much Google's fault. Through introducing Gmail, they had a gigabyte of storage initially and actually no mechanism to delete emails in their original implementation. It was 500 times bigger than Hotmail. So all of a sudden there was just no reason to consider the concept of filling up your inbox. It broke everyone's deletion habit. And it was kind of the straw that broke the camel's back in terms of a full disconnect from the storage you're using and how much space you are actually taking up.

My first email address was actually a Gmail address. I never made those habits in the first place. There was nothing there to break. And it was never part of my thought process around my data. Well, what motivated Google to do this? It was that they could make money off it. The more that they can encourage us to amass data on their servers, the more they can scrape it and sell us things - and until 2017, they actually did scrape every email that was sent and that landed on their servers to better advertise to us. So where does that leave us now, having existed in this world for so long?

This is an article from March this year. I'm not gonna read the whole thing out, it's kind of a visual, but key points: it was a relatively small sample size about the way that adults store their photos, it was commissioned by Fujifilm. According to this study, 97% of all adults keep photos on cloud services. Only 12% spend money on additional storage and seven out of 10 people admitted to transferring old photos onto new devices without filtering or organising them in any way at all.

And a lot of the reasons that people cited for doing this was about wanting to hold onto things, wanting to remember things better, but on average, those polled only reviewed their photos about once a month at most.

So this dynamic puts us in a position of trusting cloud storage, very, very, very heavily - and then the question is, well, can we trust these companies with this? We've already looked into the ways that they might use our data, but can they at least be trusted to keep it safe?

And. Yeah. So the answer to that again is no. It was a massive data loss, when MySpace lost that much music. It was 50 million tracks approximately, and it was very largely not backed up. At the time of their uploading, the conception was that it would just live on MySpace - it wasn't necessarily treated with care. And at the point of creation, our mindset had already begun to shift societally that we could just put it up and it would just live there in the cloud, and that's where it lives, and it'll just always be there and that's fine. I really think that this is a symptom of this abstraction and a kind of blind trust in a lot of tech companies as well, that it might be safer there than it would be if it was in our own hands.

There's an existential thing going on here as well that we seem to really shy away from - that digital information has a physicality, and it's also subject to the same forces of decay as everything else that has a physical form, even when it's in the cloud and when it's someone else's problem.

So that led me to thinking about the informal archive that we're all creating versus the formal discipline of archiving. When you think of an archive you tend to think of something quite formal, like in a library backroom with a lot of folders and things like that - and then we've got this informal mass archive that we are just all carrying around with us.

So formal archives, they're contextualised, they're learned, they're studied. They have someone whose job it is to know them. And they're also usually removed in time, temporally removed from the person whose artifacts they are.

They're not used to inform the relationships that person has in real time. But informal archives, the kind we're all making every time we breathe online, they're dormant - they're not accessed very often and we don't have a relationship with the information that's in them. They're decontextualised, they're not among things that make sense for them to be around. And they're still semi-public in the way that a formal archive might be. In fact, they're more easily accessed a lot of the time. A lot of them are on profile pages. So then you have yourself as a living subject of an extremely public archive, which is treated by others almost the same as if it's a formal, historical archive - and it's not just others that interact with that public, incidental, unexamined vision - it's ourselves. It's fed back to us at random, it's fed back to us with auto-generated slideshows, in widgets on our phone, in TimeHop memories and roundups of what we did last year. It creates a kind of performed self, which is treated as though it's the whole, and it encourages others to see us that way too.

Bearing all that in mind, I wanna talk a bit about the activating moment of the screenshot as a method of legitimising something, committing it to record and choosing to save it. That's a moment of agency. When you take a screenshot, you've seen something that's important enough. It's crossed that threshold that you not only want to keep it, but you want to know where it is. It reached something in you that makes the ongoing automated collection no longer sufficient for this kind of recollection, and you wanna put a flag in the ground at that point in time and be able to step back inside it - either because it's important to other people or because it's important to you. So expiry based platforms like Snapchat really attract this method of activation. They encourage the natural curation-of-memory behavior, and clearly there's something there that's really important to us.

There's obviously privacy concerns when something expected to be transient can be saved by others. So, they implemented a notification system and we immediately created workarounds to circumvent those notifications. So there's a couple of questions that that raises for me. Why is it so painful to have something that we thought would be transitory legitimised in this way, but also why go to such lengths to memorialise and preserve things, even in spite of this, even if it could betray the trust of the person that you're talking to?

I think this is very linked to the ways in which what we choose to save - our externalised memory - contributes to our image of self. If something is formative, but also fleeting, we'll try and capture it. You'll try and get that photo, that snippet of video, that voice memo - you'll try and save it in some way.

And humans only have so much capacity to remember. We always want to be supporting that and augmenting that. And in theory, what we choose to preserve becomes safe, a kind of highlight reel, but in practice when we're saving so much without thinking, it really clouds our ability to discover and recall and properly sort our collected and collective memory. It impedes our ability to understand and learn from the past.

So, the big question: what would it be like if all personal data had some kind of expiry date? We've already touched on some services that try and implement this. We've got Snapchat, we've got disappearing messages on quite a lot of services now. We've got iMessage audio files that disappear after two minutes unless you choose to keep them forever - which also acknowledges that audio files are both conversational and transient in that aspect, and quite large files. And we have stories on a number of platforms, but they're most typically linked to Instagram - but with stories they don't fully disappear - they're removed from public view, but they're still kept in your archive. And when it comes to the performance of the self and the construction of the self for others, I think that's really, really useful. But when it comes to understanding ourselves through our saved memories, I don't think it really gives us anything to have just another mass unsorted archive.

So, you know, what if this was applied not just to chat messages, snaps and really transitory things? What if this was applied to our emails, our photos, the really heavy stuff, the sentimentally loaded things? I think a few things would happen. First off, it would need to be a complete erasure. I think there would need to be no hope of recovering it and we would need to know it was really gone for that to have an impact on how we felt about it.

I think it needs to be a long enough period that you've done all your processing before it slips away. You've had enough time to think about it, come back to it and evaluate whether or not you really need to keep it, but not such a long period that you are a fundamentally different person by the time you stumble upon it.

And I think that the main effects would be these ones. I think we would get more mileage out of less computing power on a very practical level. We'd be going from a very disposable and greedy type of usage or hoarding behavior to one where, you know, if the box is full, you take something out of the box, you don't build infinite extensions until there's nothing left to build with. We would have a better understanding of what informs us as people, because the external framework that we rely on to assist our memory would be more naturally goopy and imperfect and rounded. And that really feeds into what I mean when I say a reintroduction of more natural entropy into a digital context. It would allow us to leave behind damaging patterns of behavior, like diving into a sprawling, decades-old message log, just to prove a point or to make a pain feel fresh.

We'd also have a more intuitive understanding of the relationship between our actuality and the collected data we choose to publicly present to illustrate that whole. So what I mean by that is instead of a mass unexamined uncurated bulk of information about us just sitting on the public internet for people to take at face value, we would instead have small collections of little trinkets and tokens and souvenirs that clearly have more of a story behind them than just the item.

And that would also hopefully encourage us to afford that grace to others, so we would see people more as a whole, rather than just seeing their uncurated mass collection of artifacts, thinking that that's the same thing, and mistaking that for their actual self.


Honor: So I can see there's a question from Veronica.

"Have you seen any examples of people saving their data in ways you think are antithetical to how corporations store it and scrape it?"

A lot of people who care about this and think about this a lot and think a lot about their privacy will have their own backups and they'll have things on hard drives physically in their house, and some people have hard copy archives of things because it does look like that might be one of the best ways of saving information for the long term. On a large scale, I think we're really encouraged to not think about it because it benefits capital - that is ultimately the motivating factor there. So I don't really know of many, like, large scale rebellions to that, I think because of the nature of the context that we're in.

Dave says "any ideas how this relates to software? Do you think the kind of transience you're suggesting works there?"

That's not really my field of expertise. I did say at the start (even though I was going a million miles an hour and I'm really sorry! That was so fast, but I was worried I was gonna be way too long) that it was from a personal point of view. I think that any implementation of things like this should also have within it, that companies should not store data for really long periods about people.

I think it only benefits the scale mindset, the growth mindset, the pursuit of more efficient profit which I'm also pretty fundamentally against. So I think realistically it might negatively impact organisations, but in a way that fits in with the kind of anti, the predatory attitude towards our data. I hope that's close to what you meant.

Okay. So fish says

"What about abandonware? This is a case where data expires. It's impossible to use or download those programs legally, I think it's a sad form of expiration."

So I don't really know what you mean by that. I'm really, really thinking about this from like a personal, interpersonal, our photos, our emails, the things that form our image of ourselves - that's where this topic has come from. I'm really not educated about that. And it's about the personal, and I don't think I can speak on that, fish.

Manuel. (Manuel asks a question about the EU Right To Be Forgotten Bill and its impact on the direction businesses are taking in terms of privacy and data) Yeah. So I do think that there is definitely a mood that people wanna move towards more transience in a lot of things. I think a lot of people have felt the heavy weight of the personal mass. There's a lot of features, even on a lot of fediverse software: Mastodon has an auto-deletion that you can set up after a certain amount of time, which shows a kind of self-awareness of that mass of data. And I think that it's an important thing to think about.

And myself, when I've been going through my personal archives - my 'personal archives', my collected mass - I've relied on that Right To Be Forgotten, to have my information deleted by companies that I have no business with anymore, no reason for them to store that information about me, and I've been really grateful that it existed. So I think it has possibly informed the kind of general discussion around that because obviously it came from somewhere, people were thinking about it. And when something like that is popular enough to be passed into law, then obviously that is going to be an indication for companies to start shifting towards that way of thinking.

No, thank you, Dave, but it's not a question. No, I didn't mean that like that. I just meant, I didn't know how to respond to it. I'm not often in these contexts speaking publicly like this! So Dave said, "I really enjoy the empirical aspect you're taking to this" and you know it is important to me to not assert something without contextualising it. And you know, kind of just checking in with the world and making sure that other people are kind of seeing and talking about it - not necessarily talking about it, but just that there's points I can look at and see that, 'oh, yeah. Yeah. Okay. That makes sense. And that makes sense with what I'm saying. So we're in a situation where this all makes sense.' I think it would be quite bold to just try to assert something like this without having it relate back to some empirical fact.

Do we have any more questions in the room?

Yeah, I really wanted this to kind of just be... I mean I had a lot more to say, I really rushed through it because I was so scared of going over by a long way. But I cut quite a lot out. I'm gonna be talking about this more because it's really hit home with me, and it really connects with something that I believe in quite strongly, so it's quite easy to keep going down the rabbit hole about it.

But yeah, I wanted this to just be a discussion point and hopefully encourage people to think more wholly about the people that they encounter. And you know, what you take from people's collected archives...

Cynth: I mean, if we don't have questions, maybe you could like go through some of the stuff you cut. Cause I know you'd worked really hard on that. You wanna take it away?

Honor: I did have some extra slides prepared, but I don't have them on this device, so I won't be able to share them. I'm not as prepared to share that as obviously I would be if I hadn't cut anything. I did wanna think about the tension between the public and the private both in formal archiving and also in the collected data that surrounds us online. There's quite a lot of debate actually around personal letters, things like that being included in archives. A lot of the time archives actually are ordered to seal records by governments because it would appear unfavorable. There's a very big debate there about private communications in the public domain and the public sphere. And that's obviously completely related to social media and things like that. So when you are posting to an audience but it's technically public, a newspaper can just take that and broadcast it to a much larger audience than you were intending just because of the technicalities of the public there.

So I think that if we were in a context where things expired more readily, then we would have a bit more protection from that kind of public/private conversation being had without our full informed consent to participate in it. But then there's also an argument against that where it's like, okay, well, if people can just delete everything, then how do we learn about things? And like, is there not a threshold at which something becomes news, at which point it should be in the public domain by its importance societally? That is a lot of the debate that happens around formal archives, about the public/private tension: people wanting to protect their family, or in some cases it will be their own information that has been delivered in a particular context, and then published in a way that feels a bit icky.

So, that's something that is really related to this topic that I didn't have time to get into. There's also, when everything is public, I did touch on this a bit, but the, kind of the performativity of the society that we're in now and how, when you are doing something, you're thinking about the imagined observer, even if you are not directly being observed.

Right? So this kind of performativity that follows us into our private lives. And that is something that's very connected with this topic that I think about quite a lot. Yeah. That's what I can talk about off the top of my head, but I'm gonna keep working on this. Definitely. So it, it will expand.

Cynth: For some reason, Riley's not here anymore, but, actually, I don't know if Riley's here. I heard the disconnect sound, and I don't know if that was Riley. But I'm reminded of care ethics and a focus on the local and the community rather than overarching structures, I think is a part of that, like paying attention to what individual people need.

Honor: Mm. Yeah, yeah, I think so. I think that a lot of this expansive data makes it more efficient for businesses to work, but it also relies on quantitative data about people. It reduces people to statistics and numbers and it means that we get overly fixated on that and personalised interactions are rarer.

It's harder and harder to talk to a person in customer service. It's a 'common problems people have' bot and if yours doesn't perfectly fit within that, which often it won't because the average is very rarely an exact person, that's something that you then have to deal with and it makes the world generally, I think, a bit worse. And I believe in the concept of a de-growth, a deprioritisation of this 'everything more efficient, everything larger, everything more profit' kind of attitude.

And it definitely does relate to that quite heavily. That's where this kind of thinking came from. I started going down this hole thinking about scale and communities and how emotional responses don't necessarily scale very well. And we end up in quite a binaristic, binarised situation where there's good and bad and there's number-go-up and there's number-go-down. And there's not really empathetic ways to exist between that.

Cynth: I guess talking about big data also makes me think about AI and stuff that's trained on huge corpora that ends up being just averages.

Honor: Yeah.

Cynth: And what's the most good for the most amount of people, but the 'most amount of people' isn't necessarily you.

Honor: But is it even anyone? It's an average, right, and the average often does not fit a lot of people. You see this in physical infrastructure as well, that has been designed to fit the average person, and there's quite a lot of people that fall outside, either side. And it also turns the not fitting into the average into a personal problem, instead of just, you know, this world wasn't designed to be sensitive to each person that approaches it because we prioritised something else. We prioritised efficiency. We prioritised, quite often, efficiency in the name of growth, in the name of capital. And AI is a tool that's developed to make us even more efficient, right? Even smarter about how we can be more efficient. But yeah, it removes an actual human interaction. And I guess if it achieved its goals and it was a true artificial intelligence, it would perhaps be able to come up with a solution that was completely perfect for you every single time, but that's not where we're at yet. The computing power that would be required to exist in that space would also be very unsustainable.

this talk was streamed live on saturday 6th august 2022 as part of summer school 2022, an online, interdisciplinary conference for the fediverse, informally hosted by the instance scholar.social.

image sources and further reading

with thanks to howe furber (archive paraprofessional who helped answer lots of questions i had about the role and form of archiving in an internet age), maya (for answering lots of silly questions over email), darius (for helping me start organising these thoughts into something real), cynth (for being a super supportive moderator and asking great questions), and gem (for being the first person i talk to about all of my ideas and never making me feel like i'm silly)

✰ ✶ ✵