Revealing Cumulative Risks in Online Personal Information: A Data Narrative Study
Emma Nicol, Jo Briggs, Wendy Moncur, Amal Htait, Daniel Carey, Leif Azzopardi, Burkhard Schafer
2022-04-04

When pieces from an individual's personal information available online are connected over time and across multiple platforms, this more complete digital trace can give unintended insights into their life and opinions. In a data narrative interview study with 26 currently employed participants, we examined risks and harms to individuals and employers when others joined the dots between their online information. We discuss the themes of visibility and self-disclosure, unintentional information leakage and digital privacy literacies constructed from our analysis. We contribute insights not only into people's difficulties in recalling and conceptualising their digital traces, but also into their difficulties in subsequently envisioning how their online information may be combined, or (re)identified, across those traces. We address a current gap in research by showing that awareness is lacking around the potential for personal information to be correlated and made coherent by others, posing risks to individuals, employers, and even the state. We touch on inequalities of privacy, freedom and legitimacy that exist for different groups with regard to what they make (or feel compelled to make) available online, and we contribute to current methodological work on the use of sketching to support visual sense making in data narrative interviews. We conclude by discussing the need for interventions that support personal reflection on the potential visibility of combined digital traces, spotlight hidden vulnerabilities, and promote more proactive action around what is shared and not shared online.

Personal information becomes available online through several routes: people sharing their own personal information, others sharing information about them, and automated functions that make additional metadata public, such as disclosing one's location when posting on Instagram. Sharing may be intentional, unintentional or inadvertent. People are connecting digitally with others for professional and social reasons, via an increasing diversity of channels. Such diversity means that networks supporting interaction and collaboration with others are evolving in ways that are complex and difficult to mentally model. The challenge of understanding how one's data is being shared in and across these networks grows with this complexity.

A substantial body of research has examined how an individual's digital traces may be used to discover or infer information about them - their interests, livelihood, place of work, whether they are depressed or likely to self-harm, relationships, sexual orientation, political opinions, religion and other preferences - even when not explicitly disclosed [37, 43, 81]. Smartphone use alone can reveal much, based on information including accelerometer and GPS data, app usage patterns, call logs and Bluetooth proximity [27]. Revelations include the phone user's identity, mood, stress levels, personality, whether they are a parent, likely destination when travelling, whether they are sitting, walking or running, and the quality of their sleep [27].
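As a concrete, hedged illustration of metadata leakage (a sketch of ours, not an artifact of the study): photos shared as raw files, e.g., by email or cloud link, often retain EXIF geotags that major social platforms typically strip on upload. The short Python sketch below, which assumes the Pillow imaging library and a hypothetical file name, shows how easily a recipient can recover a location from a single image.

    # Minimal sketch: recovering a location from one shared photo's EXIF data.
    # Assumes Pillow is installed; "holiday.jpg" is a hypothetical file name.
    from PIL import Image
    from PIL.ExifTags import GPSTAGS

    def to_degrees(dms, ref):
        # Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees.
        deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -deg if ref in ("S", "W") else deg

    img = Image.open("holiday.jpg")
    gps_raw = img.getexif().get_ifd(0x8825)  # 0x8825 is the standard GPSInfo tag
    gps = {GPSTAGS.get(tag, tag): value for tag, value in gps_raw.items()}

    if "GPSLatitude" in gps and "GPSLongitude" in gps:
        lat = to_degrees(gps["GPSLatitude"], gps.get("GPSLatitudeRef", "N"))
        lon = to_degrees(gps["GPSLongitude"], gps.get("GPSLongitudeRef", "E"))
        print(f"Photo was taken at roughly ({lat:.5f}, {lon:.5f})")

A single coordinate pair of this kind, combined with a timestamp from the same EXIF block, is exactly the sort of apparently innocuous fragment that becomes revealing once correlated with other traces.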
When combined over time and across multiple digital channels - e.g., Facebook, LinkedIn and Tinder - this array of digital traces can afford unintended insights into people's lives as private individuals, employees, and citizens. Combined digital traces may also reveal unintended insights about employers, and even about national security. These insights are significant - from them, we can learn just how much we are revealing about ourselves online, not just about our past and present, but also about our future: "...fragments of past (online) interactions or activities..., when correlated together, allow a preemption and prediction of future behaviors" [64, p250]. Insights are also relevant to hostile actors - e.g., fraudsters - who can make use of coherent, combined digital traces to gain advantage over their victims.

In this research we investigate:

• The everyday online information sharing practices of employed people, and their associated awareness of how pieces of personal information - digital traces - can be connected over different online channels over time;
• To what extent people recognise how their connected traces are available to others, potentially to be explored as a more coherent whole;
• What this more coherent whole could convey about an individual, including insights into their apparently private self (e.g., behavior, values, habits, etc.);
• How aware people are of the potential harms and hazards of how such insights could be used against them, and by whom.

We took a holistic approach, acknowledging the interconnectedness of online practices [50, 80]; the networked nature of online identity [51]; and the use of multiple digital channels (e.g., social networking sites, IoT, apps), devices (e.g., personal smartphones, Wi-Fi-enabled devices at home, work computers), and behavioral patterns that are determined by the affordances of particular digital technologies (e.g., GPS data of fitness apps) [28]. We conducted 26 semi-structured interviews to (i) solicit insights into interviewees' digital ecosystems across multiple communication channels, sharing networks and devices, plus associated behavioral patterns and practices; (ii) explore co-constructed aspects of participants' online identities across apparently discrete channels of information; and (iii) identify experiences and consequences where combined digital traces revealed more than intended.

All participants in this study were residents of the United Kingdom (UK), and subject to the protection of the EU General Data Protection Regulation (GDPR) in tandem with the UK Data Protection Act 2018 (DPA) [49]. However, following the 2016 referendum in which the UK voted to leave the European Union, the GDPR was due to stop being directly applicable shortly after the study. We were mindful of this uncertain context while conducting the study, and we therefore acknowledge the implications for our participants and findings in our discussion.

We first situate our study in the context of prior work on digital traces, cybersecurity, and the workplace, before going on to outline our data narrative interview method, accounting for the necessary changes to its design during Covid-19 related Lockdown conditions. We then present the results of our thematic analysis across three themes: visibility and self-disclosure, unintentional information leakage and digital privacy literacies. We discuss issues around soliciting people's recollections and understandings of their digital traces across networked space and time.
Our contribution comprises insights not only into people's difficulties in recalling and conceptualising their digital traces, but also into their difficulties in subsequently envisioning how their information may be combined, or (re)identified, across those traces. We also contribute insights on the inequalities of privacy, freedom, and legitimacy that exist for different groups with regard to what they make, or feel compelled to make, public online, and the privileges that are enjoyed by some but not by others. Finally, we make a methodological contribution on the use of sketching to support visual sense making in the interviews, inviting new perspectives on researching personal online information interactions, and building on previous studies in CSCW that used this method in other contexts. This set of interviews represents a step towards our overarching research goal: to identify the need for tools or other designed interventions that support not only personal reflection on the potential visibility of combined digital traces, but additionally the retrospective curation of one's existing traces.

Our study takes a socio-technical approach to the cybersecurity risks emanating from people's digital traces. We interpret cybersecurity from a post-digital perspective, where "the protection of technology and information has become so intermingled with the protection of people and society that distinguishing between the two is impossible. . . . in a post-digital society, technological security rests upon the protection of people, and vice versa" [21, p10]. Prior work by Dunphy et al. [26] at the intersection of design and cybersecurity has focused on people and their experiences, while government agencies such as the UK's National Cyber Security Centre (NCSC) have now introduced cybersecurity guidance "for anyone looking to develop security which works for organisations and for people", incorporating a design orientation [56]. Our specific focus is on combined, intersecting digital traces, and we frame our work against this backdrop, first defining digital traces for the purposes of this paper and explaining how they can be (mis)used, before considering the digital literacies that affect people's understanding of how they 'look' online.

Our digital traces are multi-dimensional. We leave traces of personal information across multiple digital platforms and across time. Such traces are generated before we are born [64], across the lifespan [61], and even post-mortem [55]. Even in childhood, there is a multiplicity of apparently innocuous channels via which personal information is often shared - for example, through connected toys; children and parents posting on social media; and biometrics used for schools' fingerprint charge accounts [78]. Brandtzaeg and Lüders [17, p2] highlight that it is "increasingly important to understand how time is perceived in the context of a non-anonymous social media environment" (authors' emphasis), as digital traces over time can reveal much, not only about our current selves, but also about our past opinions, actions and feelings. Digital traces emanate from the central actor, using their real name and pseudonyms. They also emanate from a range of other actors - e.g., health providers; employers (including through productivity tracking [9]); government agencies (including public registers of companies) - serving to produce a co-constructed digital identity for individuals [77].
In addition, people's personal information can surface through other channels, posted by friends and acquaintances, and also by government sources (e.g., voter registers) and other organisations (e.g., those that collate and create dossiers on individuals). While these traces are spread across multiple locations, with subsets of information shared with specific audiences, boundaries between the platforms and audiences are known to be porous. Context collapse, "in which individuals must meet the expectations of multiple and diverse audiences simultaneously" [17, p2], is a recognised phenomenon. Embarrassing and harmful situations can arise through context collapse, when users try to navigate multiple, diverse audiences on the same platform: they may accidentally blur borders between the public and the private, the professional and the personal - leading to information leakage [23]. However, Costa [22, p3652] found that context collapse was not a given, and was "a result of situated practices of social media usage within Western Anglophone contexts". Her Turkish participants ably navigated complex security settings to ensure that boundaries between online groups were maintained.

There are tools available that enable individuals to make sense of their digital traces in certain contexts. For example, quantified self, personal informatics, and life logging tools can give people a better understanding of their own behaviors, through recording, measurement, visualization and publication of their own data [29, 45, 64, 69]. Coherently combined traces - e.g., geolocation, step count, heart rate - can become material for conversation and expression of personal identity, and/or for improving behavior or performance in a particular area, and form "highly personal accounts of (users') pasts" [29, p518].

The utility of combined digital traces extends beyond the individual: exploitation by others can afford unintended insights and privacy violations, potentially adversely affecting individuals, employers and organisational security. Approaches to exploiting digital traces may be manual or involve the use of specially-developed tools. 'Lurkers' - especially adolescents - may trawl the social media feeds of friends and followers for updates and juicy titbits, joining the dots between posts to work out more than was intended to be revealed. More seriously, perpetrators of intimate partner violence may go to great lengths online to track down their victims (survivors) through their digital traces across multiple platforms, in order to continue their abusive behavior [35]. Examples of tools developed to harvest digital traces include a Blockchain-based application that enabled people to establish others' trustworthiness [82], and a tool that combined online dating site posts with fitness tracker information to reveal where people lived, whether they lived alone, and when they were at home or out exercising - information that was subsequently exploited by stalkers [20]. Using Facebook profiles, Bachrach et al. [8] were able to infer people's Big Five personality traits. Other initiatives have sought to predict from Twitter and Reddit posts whether people are depressed, suffering from anorexia, or likely to self-harm [46].
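To make the mechanism of such 'joining of the dots' concrete, the minimal sketch below (with invented toy data; it does not reconstruct any of the tools cited above) links a pseudonymous profile to a named record in a public register using only two quasi-identifiers. The re-identification studies discussed next apply the same join logic at much larger scale.

    # Toy illustration of re-identification by record linkage.
    # All records are invented; real attacks use the same join at scale.
    pseudonymous_profiles = [
        {"handle": "runner_88", "birthdate": "1988-03-14", "postcode": "G12",
         "posts": ["home by 7pm most days", "new 10k personal best!"]},
    ]
    public_register = [  # e.g., an open voter or company register
        {"name": "A. Example", "birthdate": "1988-03-14", "postcode": "G12"},
        {"name": "B. Other", "birthdate": "1990-07-02", "postcode": "EH1"},
    ]

    QUASI_IDENTIFIERS = ("birthdate", "postcode")

    def link(profiles, register):
        # Join any records that agree on every quasi-identifier.
        index = {tuple(r[k] for k in QUASI_IDENTIFIERS): r for r in register}
        for p in profiles:
            match = index.get(tuple(p[k] for k in QUASI_IDENTIFIERS))
            if match:
                yield p["handle"], match["name"]

    for handle, name in link(pseudonymous_profiles, public_register):
        print(f"'{handle}' is plausibly {name}")  # the pseudonym collapses

The point is that neither field is identifying on its own; it is the conjunction of mundane attributes, accumulated across platforms, that collapses the pseudonym.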
On a larger scale, and beyond our current focus on an individual's interpretation of combined digital traces, automated Big Data approaches can use these same traces to assemble insights into people's behavior and to predict, e.g., the likelihood of someone repaying a loan or developing diabetes [63], someone's political leanings [60], their propensity for criminal behavior [53], and their mental health [46].

Efforts to exploit digital traces can be facilitated via lax security and privacy settings and behaviors. A large-scale example of this lies in the lax security applied to social networking sites in Runet (the Russian segment of the Internet), leading to 30 million profiles becoming publicly available to download [41, p50]. The profiles included users' personal and intimate details such as "sexual orientation, sex frequency and preferences in sex", along with "personal information like weight, height, smoking habits, alcohol, drugs, body characteristics... dwelling type, marital status, and religion". While this information may have been disclosed to circulate within the particular context of, e.g., a dating site, it became more widely available, exposing people's intimate details to a wider audience than was ever intended [41].

On a smaller scale, when parents share posts and pictures of their children ("sharenting") without consent, they add a dimension to their children's online identities that may be at odds with how their children wish to appear online, skewing their digital traces [62]. Meanwhile, sharing posts and photos about friends may unwittingly reveal private information about them (e.g., details of companions or associates, locations, activities, etc.).

Even when people actively separate their digital identities across different online channels to organize their social groups or to obfuscate aspects of their identity, their efforts can be undermined by re-identification, involving the linking together of profiles and other information [36]. For example, the Personal Genome Project linked profiles to online voter lists via, e.g., birthdate and zip code - destroying the assumed anonymity of the profiles [74]; Facebook profile images tagged with real names were used to re-identify people on other sites (e.g., Friendster, Match.com) that host otherwise anonymous profiles [36].

Social media content is used by up to 80 percent of employers and recruitment agencies as part of their assessment of candidate suitability for a post [72]. Young and Quan-Haase [83] found that applicants' chances of acceptance for advertised job positions could be adversely impacted by their openness online about, e.g., health conditions or pregnancy. Employees can face dismissal for their conduct on social networking sites, even when posting outside of working hours [76]. Thornthwaite [76, p119] observes that social media is "blurring the legal distinction between employees' public and private lives, increasing employer control over personal lives in ways reminiscent of traditional master-servant relationships", the effects of which are then tested in industrial tribunals, which increasingly challenge employers' intrusive stance. Employees' social media activities also have the potential to negatively impact their employers by unintentionally leaking sensitive information online - such as trade secrets, intellectual property and personal details of other employees.
This can represent a significant security risk to organisations, "result(ing) in a loss of competitive advantage, loss of reputation, and erosion of client trust" [4, p351]. Irresponsible posting can result in damage to employers that goes way beyond their reputation. There have been instances of military personnel and their families discussing operations and deployment details on social media, even posting pictures of ongoing operations [25]. In a recent case, staff living on a UK nuclear submarine base exposed compromising information via their use of the OnlyFans pornography-sharing website [54]. Adversaries motivated to exploit such information can seek out service personnel or their family members to blackmail them, or use the intelligence gathered to attack or infiltrate deployment locations.

The work of boyd, e.g., [13] [14] [15], and boyd and Ellison [16] drew from Goffman's notions of impression management [32] to articulate the necessary maintenance and updating of "front stage" digital selves and online identities, and discrepancies between what one "gives" - in explicit displays of friends, interests and online representations - and "gives off" as interpreted by others, and as amplified in networked publics [13] [14]. boyd's early work with young people also found that they cared deeply about online privacy, and recognised the value of both privacy and publicity, understanding them as contextual and adjustable - the latter especially with regard to being socially present and acquiring social status. Indeed, self-disclosure is a key factor in developing relationships and building trust in online environments, much as it is in face-to-face contexts. This self-disclosure includes actions such as deliberately sharing selected personal photographs [24], and represents a negotiation of privacy, with information often shared with selected (groups of) online network members, rather than with all network members [51].

This crafted, contextually-situated privacy is subject to the pressures of normativity. When flows of information adhere to entrenched norms, there are few concerns, whereas when violations of norms occur, protest and complaint often result [58]. For example, a patient may be comfortable with healthcare providers sharing medical information with specialists, but very uncomfortable when the same data is shared with marketing companies (see also Bussone et al. [19]).

When considering contextual privacy in workplace or organisational settings, we can look to the work of, e.g., Ashenden [7]. This work found that employees who believed their employer was driven by the need to protect information thought risks to be overstated and colleagues overly cautious, whereas those who believed the organisation was driven by a need to optimise information use thought security risks justified and colleagues' behavior risky.

McDonald and Forte [52] have drawn attention to the limits of concepts such as contextual integrity and boundary regulation when thinking about privacy in Human Computer Interaction (HCI). They revealed conceptual gaps in current frameworks and have argued for considering vulnerability, i.e., of particular groups of people, as a core concept when thinking about privacy.
Researchers in Computer Supported Cooperative Work (CSCW) have highlighted serious issues with digitally-mediated identity management, with the effects often being pronounced for those considered vulnerable. For example, Simpson and Semaan's work on algorithmic identity investigated and confirmed concerns that the short video sharing application TikTok was suppressing the individual identities of LGBTQ+ users via algorithmic and human moderation [71].

Seberger et al. [68] showed how users navigate the trade-offs involved in app use. Despite technical and regulatory mechanisms aimed at empowering users to manage their privacy, people have a sense of resignation around privacy due to the convenience offered by apps: there is a fine line between feeling empowered by technology and the discomfort of invasive app behavior. Users are often resigned to disclosing data even as they accept personal responsibility for their own privacy.

People can often be surprised when they discover the personal data collection and distribution activities of apps that they use. Shklovski et al. [70] showed that people felt personal space had been violated in "creepy" ways by apps, with the creepiness lying in the realization that apps were conduits for personal information and space to "leak" to unknown entities who had not explicitly been invited in. These authors link creepiness to notions of personal space and territoriality [5], and to contextual integrity [57]. People will formally agree to the information sharing undertaken by apps, and can rationalize their use of them when asked, putting any creepiness out of their minds. Nonetheless, the creepiness remains. Shklovski et al. [70] further suggest that this enduring creepiness has harmful health consequences, while acknowledging that such creepy experiences may not always be negative and unwanted. They pose an interesting question as to whether such creepiness fades over time, suggesting that as cultural norms change, so too will the conditions under which creepy experiences are encountered - which has implications across the digital lifespan.

A person's ability to generate digital traces online does not imply the accompanying presence of digital privacy literacy: an understanding of how information travels when shared online, and of the associated risks. Digital privacy literacy is an area of education and practice often drawn on in social justice and/or public education programmes, including those run by public libraries to develop competencies in groups at particular risk - for example, people who rely on public library computers for personal digital communications and information practices, particularly those subject to social inequalities (see, e.g., [2]). Digital privacy literacy can be considered distinct from the well-established, if rather broad, area of digital literacy (e.g., [44]), which refers to an individual's multiple competencies, from having regular access and basic functional operational skills (e.g., using a keyboard and mouse) to being able to "read" and "write" clear information across a number of digital modes, including textual, visual and wider forms of communication media [44]. Digital privacy literacy can also be regarded as a subset of data literacy, referring to a range of competencies around understanding and communicating with informational data - here, privacy and (personal) data - that go towards enabling one's personal data self-care.
In the social sciences, the growing literature on critical data studies, e.g., [42], includes Lupton's notion [48] of the "lively" aspects of personal data, as they are added to and (re)configured by human interpretation and corporate segmentation/analyses. Lupton notes that personal data, as (partly) human, is typically represented visually and in language as organic and material (e.g., flows, breadcrumbs) or "humanised", e.g., as footprints (p47). People's encounters with the personal data that they generate via use of digital technologies present them with challenges as to how to interpret, control and make sense of these data. Lupton elsewhere [47] has argued that such data and their circulations could be made more perceptible and interpretable using what she describes as three-dimensional materialisations, recognising that people's interactions with such re-presentations of personal data elicit visceral responses. There are also new research areas around human data interaction that include designing frictions into user experiences of technologies to promote critical reflection, e.g., prior to sharing information, and interaction design's dark patterns - e.g., targeted manipulation and confusing terms of service [34] - widely adopted by industry to nudge people into a particular course of action, including coercing people into disclosing personal data, often beyond that necessary for the task in hand.

We conducted an interview study using a data narrative approach [80] in order to understand the risks and consequences of the digital traces that people leave online. This approach served to capture participants' descriptions of their data, device use, channels and networks of communication, and data and information practices. Using this approach also allowed us to capture the co-constructed aspects of a person's online identity, as well as enabling the investigation of direct and observed experiences of, in this case, the cumulative implications of digital traces.

We conducted the study in May-July 2020. Due to the physical distancing requirements of the Covid-19 Lockdown in place in the UK at the time of the study, we had to conduct interviews remotely via videoconferencing, which we took care to pilot before the interviews. We engaged with participants in advance by sending information sheets by email. Cognisant of the likely effects on data sharing that might arise due to the circumstances of the Lockdown, we added questions to the interview schedule regarding changes to data-sharing habits and experiences, with the intention of capturing the effects of self-isolation, homeworking and other Lockdown-related phenomena (Appendix A).

We recruited 26 adults (13 male, 12 female, one non-binary; age 20-59 years, median 37 years) to take part in the study. All were based in the UK, active online and in full-time employment. We recruited participants by circulating an advertisement via email to contacts with access to mailing lists, to be shared more widely, and via social media, with the offer of a £20 shopping voucher for participation. We aimed to recruit participants from a variety of employment roles and sectors, and made sure there was roughly equal representation from employees in the public (n=16) and private (n=10) sectors, and that the staff recruited represented all levels of seniority.
Participants were employed in sectors including healthcare, education, engineering, management, IT and hospitality, and were drawn from city, suburban, town and rural locations. 21 participants reported speaking English as their first language, four spoke it as a second language, and one was bilingual in English and another language. 13 had postgraduate qualifications, 10 were qualified to undergraduate level, two had qualifications from further education, and one had high school qualifications. When we asked about their level of technology skill (the interviewer read out the full definition of each category to each participant), they responded as follows: four said "Low/Low-Medium", indicating basic use of software, hardware and social media; 15 said "Medium", indicating confidence with using and integrating a variety of standard software packages over a number of platforms; and seven said "Medium-High/High", including the use of specialised software and an ability to program. At the time of interview, 18 participants were working at home full time, five split their time between working at home and at their workplace, while three reported no home working.

The first author conducted all interviews, with each lasting 60-90 minutes. In a short briefing, we invited participants to ask questions about the study based on their reading of the information sheet. They then provided consent verbally. We then asked participants to complete a technology questionnaire via a SurveyMonkey link, delivered via the Chat function of the videoconferencing software, or sent by email. We designed the questionnaire to capture participants' self-reported use of technology including devices, communication channels, data storage and social media networks (Appendix B). In addition, the interviewer asked participants a short series of questions regarding their current employment, level of education, and their understanding of and confidence in using digital technology.

Interview questions were centred on the following areas, detailed in Appendix A: (i) information about the communications channels, apps, data storage/management systems, and devices used, including whether/how any of these were shared; (ii) everyday practices and behavior patterns around, e.g., conducting searches, posting and other digital information sharing; (iii) participants' awareness of the unanticipated potential for self-disclosure through digital traces, and their associated level of concern; (iv) information management, security setting behaviors and Lockdown-related changes - especially regarding working from home; and (v) a scenario in which someone else had to write a book about the participant based only on their digital traces, and what the resulting book would comprise. We finished by asking them to summarise their advice to others on optimising their information security. We tailored the questions, where appropriate, through answers provided in the technology questionnaire.

We supplemented the interview questions by asking the participants to hand-draw sketches of their digital ecosystem on paper. Having participants sketch as part of interviews has its roots in Cultural Probes [31] and has been used successfully within the Human Computer Interaction (HCI) and Computer Supported Cooperative Work (CSCW) communities in a number of study contexts. For example, [40] asked participants to draw a map of their finances to understand how people track money. Building on work by, e.g., Ryan et al.
[66] on mapping interactive digital artefacts, Vertesi et al. [80] used the data narrative approach to promote people's descriptions of how they manage their personal data. Drawing was used to facilitate more thorough conversations, to elicit grounded comments about data practices, and to examine conceptions of personal data space. Vertesi et al. [80] argue that drawing during interview allows the remembering of new stories, the discovery of forgotten elements, and the visual expression of relationships between devices and data. The drawings produced are not a test of a participant's precision or recall with regard to their digital ecosystem; rather, they give structure to the interview process as tools with which to think [80]. Memory aspects aside, asking participants to engage with their personal information sharing in this way - rather than, for example, simply talking about it or using their devices as the main support - exploits the power of defamiliarization described in work by, e.g., [11], inviting participants to take a new perspective on familiar aspects of their lives.

In the current work, participants were asked to come to the session prepared with pen/pencil and paper. They were asked to sketch out their communications channels, devices, and types of information shared/intended recipient(s), along with whether usage was in a personal or professional capacity and (separately) whether they were using a personal or professional account. The process of drawing the sketches proceeded throughout the interviews, with participants adding to them as elements were remembered. Participants were also asked to identify and draw any links between media, devices and the public or private aspects of shared information. Participants were later asked to photograph and email the final sketched maps to the interviewer (21 sketches were returned from 26 participants). We have included two example sketches in Appendix C to illustrate the variation in approach, reflecting the uniqueness of each individual in the study and the personal nature of their digital ecosystem.

We made audio recordings with participant consent, and coded anonymised transcripts by performing thematic analysis [18], using NVivo. Our analysis took a hybrid approach (see also [67]): existing concepts were used for deductive coding, while new concepts, grounded in the empirical data from the interviews, contributed to the inductive coding. The deductive coding included, for example, concepts from the technology law literature on personal information, such as the Right to Erasure and pseudonymous posting [65], and codes on participants' desires or requirements for a tool to manage their digital traces. The resulting coding list was iteratively refined in the light of the interview data, as new codes emerged. One author did the early coding, undertaking frequent code review sessions with another author to help remove potential biases. The list of preliminary codes was then distilled further into a set of refined codes (Table 1 in Results) that corresponded with a high number of instances across the transcripts, or that captured novel emerging design ideas or relevant practices. All authors were involved in three data sharing meetings and arrived at the designations of the refined codes through discussion. Further iterative analysis and clustering resulted in the three final themes: visibility and self-disclosure, unintentional information leakage and digital privacy literacies.
From an original list of six themes, visibility and self-disclosure arose from the combination of the themes of visibility, self-disclosure and identity curation; unintentional information leakage was one of the original themes; and the themes of conceptions of overview of own data and data concerns were combined into digital privacy literacies. We now discuss each theme in turn (see Table 1 for details of the codes that comprise the themes), illustrating points with verbatim, pseudonymised quotes. While participants' ages are reported, all names have been changed and other identifying information omitted to protect interviewees' privacy.

Five participants expressed awareness that online traces only ever provided a partial picture, albeit an authentic representation of someone, and as such invited interpretation. While not expressing particular concern, Una, a clerical assistant, touched on the potential for inferences to be drawn from incomplete traces, and on how some interests lend themselves more to being documented and shared: "I play a musical instrument, but that is nowhere on my social media, and not many people know how musical I am... [but] they know I'm maybe quite sporty, but they don't see the musical side" (Una, 21).

As part of his youth worker role, Calvin recruits others to work with the young people for whom he has safeguarding responsibilities, and on whom he carries out informal background checks online - while also being mindful of the limitations of these: "One of the first things I do is a basic Facebook search of them... and sometimes that has... created a false narrative... We had a student on placement, and her photos showed... this quite ragey, party person. When I met her, they were completely lovely, and those photos didn't represent them fairly."

One interviewee reflected on living with an ongoing tension between feeling they had to promote their visibility online, and protecting their own safety and that of their community. Zara, a third sector worker whose family had sought asylum in the UK, commented that her family's desire to be invisible online raised questions, including around their legal status, which she perceived as damaging her current prospects: "I have also met loads of people who were genuinely risking... running for their lives. And any information that they put online digitally would be instantly sought out, so they stayed off any kind of digital, social media, anything. But then they're also met with the contrast of needing to put something out in order to progress - I'm going to say in a Western country - but, well, that's not exclusive to Westerners, but just to put yourself on show, or otherwise people don't think you're legitimate." (Zara, 20)

Helen, a helpline operator, described how wanting to stay offline had affected a loan application: "I had to put myself..."

Participant responses often linked self-disclosure with wanting to share a particular personal event, or to signal allegiance with or opposition to a more universally significant concern. Participants recognised the risks of oversharing, but also of under-sharing - as in failing to post in support of a social movement such as Black Lives Matter (BLM) - for fear of being seen as uncaring or unaware.
Visibility and self-disclosure were perceived as tricky to navigate: even for those preferring to maintain a minimal online profile, current norms around recruitment, immigration applications and the financial services industry, for example, often require personal information to be available online in order to assess credibility or eligibility. Participants often implicitly recognised the need to tread a fine line regarding the volume and type of information to put out there. We heard, from the perspectives of both recruiters and the recruited, how the absence of certain types of information threatened employment and wider social and professional prospects. This partial picture was recognised as potentially damaging in specific contexts such as job seeking.

Various forms of self-disclosure can stem from information leakage. Participants mostly talked about unintentional information leakage in respect of their personal (i.e., non-work) information sharing, but leakage due to work/personal boundary blurring was also reported by two participants. Third sector workers Zara and Flora had both resorted to using their personal Facebook logins to set up Facebook accounts for work, having tried and failed to set up dedicated work profiles. This led to their mixing information sources across professional and private accounts, and their professional identities encroaching onto the personal: "It's really difficult to keep a personal Facebook, I think, if your workplace is using Facebook a lot for the way they work" (Flora, 54).

Four interviewees reflected on their experiences of information leakage in their domestic lives. Such leakage could be intimate, if also unintentional and/or somewhat creepy, though some, such as Will (24), were more relaxed about this than others. These information leakages mostly occurred due to insecure account settings. However, 11 interviewees recounted how things posted online about them by others, either unthinkingly or due to more deliberately malicious actions, were also sources of unintentional information leakage. Healthcare worker Will described how his mother's misplaced pride had potentially serious consequences: "I said (to her) I wanted to join... one of the intelligence agencies. She... posted up, 'My son wants to become a spy, how does he do it?'" (Will, 24). While Will joked, saying this was "a little bit counter-productive" for undercover work, he went on to recount how his mother had also posted details of his confidential military achievements and training exercises on Facebook, including the nature and location of the training. This leakage violated military protocols and created a security risk.

Instances of postings, either by participants or by people known to them, that had then been appropriated to other media were reported by five participants. Conversations believed to be private had been recorded and/or more widely (re)shared in a very different, typically much later context. Delivery driver Vinny recounted one example: "A friend of mine... was called out on Twitter for things that he'd said in a private group chat six years ago, and... reported to his place of study. It's all cleared up, but it still sort of hit home... people can take anything you say and change it to however they see it... I guess if there's any shred of doubt, it can be..." (Vinny, 24)

In contrast to narrower understandings of data leakage that link it to the use of insufficiently secure settings, our participants reported threats that arose in many other ways.
Such threats were apparent through the actions of others, via the re-appropriation of content intended for a specific audience into new locations and contexts. This revealed the information to a wider public - often alongside personally-identifying or socio-politically significant information. This might be done with the intent of causing reputational damage, but even in the absence of malicious intent, such combinations of digital traces can lead to revelations, and the associated lack of control that individuals have can be problematic and/or distressing.

Our participants' responses also show that the proliferation of Internet of Things (IoT) devices has created new vectors for information leakage. Personal, and specifically domestic, contexts are where they reported the majority of such experiences - perhaps due in part to the Lockdown context, but also because these instances were the most immediate, identifiable and relatable. Two female participants in particular talked of recently discovering routes via which partners had sight of what had previously been private information, and of their raised awareness and associated concern.

The interviews were revealing with regard to participants' digital privacy literacy: their knowledge and understanding of the nature and potential of coherent digital traces, and their control and personal agency over those traces. Overwhelmingly, participants were aware that there was a great deal of publicly available information about them online. 14 participants explicitly mentioned that they knew this from having previously conducted a Google search on themselves. Jenny, a local government officer, had been disturbed to discover some information related to a company with which she had long ago been involved: "it had my full name and address and date of birth, and I was like, 'Whoah!'" (Jenny, 43). Despite this, none of those who reported having searched for themselves on Google was able to describe with confidence or accuracy the range of publicly or semi-publicly (for instance, via a Facebook account) visible information across their various online accounts. Those participants who said they had never Googled themselves were offered the opportunity to check what was visible online during the interview; they were invariably surprised by the level of detail about them that was public.

Further, slightly more than half (14) of the participants said they had what they regarded as stringent approaches to information sharing, deletion and account security. For example, Olly, an electronic engineer, recounted one of his practices and the motivation behind it: "I tend to disable search history... my bigger fear is I am an immigrant into this country... so I think that if I search for something because it was in the news, can it be connected to me... by mistake?" (Olly, 40). However, this was at odds with some of the participants' other answers during interview. For example, when asked how they would advise a friend with the same digital services as them on what security measures should be taken to secure their information, all but six participants gave answers that were very general and provided few specific recommendations, indicating relatively low levels of privacy literacy and a lack of awareness around their own information's potential for being compromised. As referenced in the wider results, participants did, on the whole, have a good sense of the long-lasting persistence of their information, once it was online.
However, envisaging the potential effects of connecting apparently discrete aspects of their online information coherently proved more challenging, even with access to their various accounts and with their sketches to hand. Understandably, then, participants struggled to comment on where potential risks or possible consequences lay, were others to connect the same dots into coherent digital traces to potentially use against them. Where matters of concern were expressed, these related to interviewees being aware of inviting unwelcome marketing (10 participants), and being targeted by advertised goods, particularly if these were of no personal interest to them.

We found it striking that nine of the participants regarded themselves and the personal information they had shared online as being of no likely interest to others. Ivor, a writer and teacher, was aware that a great deal of his personal information was available, but was bemused that there could be any interest in it. This self-identification as "boring", a term used specifically by five participants, was a factor in some interviewees' lack of motivation around deleting superfluous personal information circulating online. Four further participants used language such as "dull" or "uninteresting" to describe their postings. In line with the literature, four participants expressed a lack of agency, overwhelm or resignation [10, 38], or felt unable to manage and, where necessary, remove information [59]. Flora, for example, conveyed a sense of being overwhelmed by her online clutter or, at least, by the prospect of finding sufficient time to deal with it; the task was clearly not a priority.

In summary, participants showed an awareness that they had left digital traces, but could not be accurate about how visible their online information was: those who looked were surprised when confronted with the reality. Even with the support of sketching, the majority did not or could not show how their devices and information channels were interconnected. All participants had a sense of the persistence of online information, but only rarely did they acknowledge the potential for connections to be made and compromises to arise. Even those who believed themselves vigilant in their approach to personal information sharing could not explain, beyond the most basic guidance, how one would secure devices and channels to minimise risks from cumulative revelation. In general, participants lacked the agency to undertake remedial action on their digital traces, out of a sense of it being too late, not possible to do, or not necessary, as their traces were "boring" and therefore posed little risk to them. Younger participants were more likely to have actively removed content, often to delete elements of childhood online activity as they moved into adulthood.

This research focused on employed people's everyday online information sharing practices, and their associated levels of awareness of how pieces of personal information - their digital traces - can interconnect over different online channels and media over time. We wanted to find out to what extent people recognised how these connected traces are available to others, to be explored as a more coherent whole; what this coherent whole could convey about them, including their apparently private self (e.g., behaviours, values, routines, etc.); and where and to what extent they were aware of hazards and potential harms of how this could be used against them, and by whom.
Through thematic analysis of the outcomes of 26 interviews, we uncovered themes of visibility and self-disclosure, unintentional information leakage, and digital privacy literacies.

Visibility and self-disclosure was heavily influenced by necessity and obligation, with some participants feeling compelled to have an online presence when job seeking. This is consistent with Berkelaar and Buzzanell's findings [12, p84] that employers increasingly expect potential staff to maintain digital career capital to enable employers to "construct and evaluate professional and/or workplace identities". Participants also identified that online visibility could help to build legitimacy as citizens, and to comply with perceived social norms - for example, by publicly expressing a stance around current events such as Black Lives Matter. Choosing not to be visible and not to disclose information about oneself can be seen as a privilege, afforded to those who are established members of society and not seeking work. This is especially pertinent for those whose safety may be jeopardised by online visibility - such as survivors of domestic or other abuse, and asylum seekers - yet who feel compelled to be visible due to the adverse impact that invisibility could have on their chances of getting a job, or on gaining legitimacy as citizens and members of social groups [21]; see also [73].

Unintentional information leakage occurred as a result of the actions of others, who shared participants' information with unforeseen audiences, at times causing un/intentional shame or other harm. This could be particularly uncomfortable when the information shared was from long-forgotten posts, or was taken out of context. While participants had a good understanding of the persistent nature of aspects of their traces, they found it difficult to recall what they had previously posted across multiple channels, and hence where potential vulnerabilities might lie. Yet it is not at all surprising that participants struggled to remember past posts, when remembering involves "cognitive processing of knowledge from the past, through a repetitive process of reconstruction" [79, p371]. Given the volume of information commonly shared online, remembering everything that one has posted presents an intractable cognitive processing burden - which links to our third theme of digital privacy literacy. While processes of remembering can be supported by a range of internal and external cues, including technological ones such as Facebook "On This Day" reminders, things always get irretrievably forgotten [79].

Although options exist to have content removed from the Internet, or at least not show up in search results - e.g., the Right to Erasure - there is no easy mechanism through which to erase aspects of one's past history online, or to remove comments made over years that could be misinterpreted or show one in a bad light if later taken out of context. Multiple respondents rationalised that their online information was "boring" and of no interest to others. They also referred to being unable to summon the required time, effort and/or practical digital privacy competencies to erase aspects of their past history online. This is understandable, when such curation increasingly involves sophisticated multidisciplinary skills and knowledge spanning digital, technical, legal, and socio-cultural competencies.
For example, an individual who is seeking public election might want to check back through their past history online for any information that could be taken out of context and wrongly interpreted as expressing socially unacceptable views. Of course, this is also open to misuse, with individuals who genuinely hold socially unacceptable views cleaning up their online profiles to obfuscate their true opinions.

Our study was conducted at a time when the legal context of the UK was an uncertain one. Materially, little has changed for UK citizens with regard to the GDPR since the UK left the EU, as the UK enacted the UK-GDPR in 2020. However, there is an ongoing high-profile political and legislative debate as to whether the UK should diverge more aggressively from the European framework, which is seen in some quarters as unnecessarily burdensome and overprotective [3]. All these changes took place, and were publicly discussed, while the interviews were being conducted. This posed some legal and ethical challenges for the research: knowing that the GDPR would cease to be applicable shortly after the interviews were completed, what legal assurance could be given to the participants? It also raised questions for the substantive part of the research: discussions surrounding post-Brexit data protection in the news will have created more awareness of data protection questions, and may also have contributed to an even stronger feeling of uncertainty and vulnerability. Disclosing data about oneself in the UK during 2019, the year prior to this study, also meant that it was at least not certain what legal protection would apply to it in a few months' time - which, given the permanence of digital traces, poses a significant difficulty.

A key limitation of our work is that while our participants were able to conceptualise aspects of the implications of personal information sharing in interview, they consistently struggled to conceptualise the entirety of their accumulated digital traces across multiple channels and across time, and the potential knock-on effects and risks. We acknowledge that the data narrative approach was not sufficient to achieve this, and in this context we identify the following pressing future work:

• Demonstrate to individuals in everyday terms - perhaps by using other narrative approaches, including scenarios - the potential use by another agent of seemingly harmless pieces of personal information posted across disconnected digital traces.
• Go beyond "awareness nudges" by promoting reflection before sharing, to enable people to make informed choices about the information that they add to their cumulative digital traces.
• Some participants conveyed anxiety around their old posts being re-discovered, despite not having clear recall of their contents, amplifying their perceived impotence and lack of knowledge about how to go about removing offending information. We see an opportunity to enable people to efficiently curate the material that they have posted online in the past, without having to trawl through every single post or delete an account wholesale.
• A further opportunity to reduce unintended information leakage lies in integrating prompted password change into the standard installation process for domestic internet-enabled/IoT devices, combined with information about the inherent risks of such devices and how to mitigate them, to protect domestic privacy.

Last but not least, there is a critical need to address digital privacy literacy, including digital privacy gaps.
This aligns with ongoing efforts towards ensuring social justice within the HCI, e.g., [73], and CSCW [75] design and wider research communities. Talhouk et al.'s CSCW work [75] around the digitisation of food aid intended for Syrian refugees in Lebanon found that, due to refugees' "low technological literacies", their "experiences of engaging with food aid" were severely impoverished and their ability to "identify and report the misuse of the technologies by other stakeholders and intermediaries" (p133) was curtailed, amplifying the already present power asymmetries experienced by those groups.

While interviewees had a good understanding of the persistent nature of their traces, they found it difficult during interviews to recall what they had previously posted. Remembering - from the Latin rememorari, or "call to mind" - involves cognitive processing of knowledge from the past, "through a repetitive process of reconstruction" [79, p371]. Yet people face challenges in making connections between something visible and its meaning - and, to paraphrase Rancière [1], across heterogeneous spaces and times. Rancière also refers to "the power of art in its ability to represent what is absent or unrepresentable - and that when they are represented they infer power" (ibid). He, along with others, discusses memories as works of fiction - reconstructions as opposed to re-presentations or later reproductions. While processes of remembering can be supported by a range of internal and external cues, things always get irretrievably forgotten [79]. There is only ever a partial picture, or re-presentation.

Our work offers understandings around personal information and what it collectively comprises [39], including the inferences that others can draw. It aims to promote personal agency around the management of this information. As future work, we will use the findings reported here to inform the design of an online tool. The digital user interface of this tool, as well as how it curates, contextualises, and relates information to people, will be informed by the qualitative outcomes of design workshops. The tool will allow people to explore the risks and consequences surrounding their own online data-sharing activities and the digital traces they leave behind. We are mindful of Elsden et al.'s proposition that "design should seek to support people in making account of their data, and guard against the assumption that more, or "better", data will be able to do this for them" [29]. Our current design work is also mindful of the so-called "moral economies" that are produced as a result of practices and activities around personal data, which are laden with affect, cultural expectation, and responsibility [80]. Design, we argue, is central to promoting sense-making and digital privacy literacies in this context. Even the provision of designed tools is a form of design activism, introducing new frictions into online activity to provide context to, reflection on, and guidance for our information-sharing decisions [6, 33], and/or, where necessary, helping to develop counter-narratives [30].

C PARTICIPANT SKETCHES

Fig. 2. Two drawings created by participants Vinny, 24 (right) and Denise, 32 (left) during interview to represent the online platforms they used and the types of information shared. Both participants indicated interconnections, drawing lines between platforms to convey information flows.
C PARTICIPANT SKETCHES

Fig. 2. Two drawings created by participants, Vinny, 24 (right) and Denise, 32 (left), during interview to represent the online platforms they used and the types of information shared. Both participants indicated interconnections, drawing lines between platforms to convey information flows.

We received sketches from 21 participants documenting their devices and apps, often along with the various relationships these enabled, in a mapping or table arrangement. The sketches supported and added depth to interview conversations, and the act of drawing them often triggered participants to remember additional details.

REFERENCES

Data Privacy Project: Initiatives to inform and support libraries and librarians
A case analysis of securing organisations against information leakage through online social networking
The environment and social behavior: Privacy, personal space, territory, and crowding
Digital interface design and power: Friction, threshold, transition. Environment and Planning D: Society and Space
In their own words: Employee attitudes towards information security
FacebookPersonality_michal_29_04_12.pdf
Relational privacy and the networked governance of the self
The privacy paradox: Investigating discrepancies between expressed privacy concerns and actual online behavior. A systematic literature review
Making by making strange
Online Employment Screening and Digital Career Capital
Taken out of context: American teen sociality in networked publics
Why youth (heart) social network sites: The role of networked publics in teenage social life
'Public Default Private When Necessary'. boyd
Social network sites: Definition, history, and scholarship
Using thematic analysis in psychology
Trust, Identity, Privacy, and Security Considerations for Designing a Peer Data Sharing Platform Between People Living With HIV
Protecting oneself online: The effects of negative privacy experiences on privacy protective behaviors
Too Much Information: Questioning Security in a Post-Digital Society
Affordances-in-practice: An ethnographic critique of social media logic and context collapse
Context collapse: Theorizing context collusions and collisions
Sex Differences in Self-Disclosure, Reciprocity of Self-Disclosure, and Self-Disclosure and Liking: Three Meta-Analyses Reviewed
Exploiting military OpSec through open-source vulnerabilities
Understanding the experience-centeredness of privacy and security technologies
The future of wearable technology. Centre for Research and Evidence on Security Threats
Do smartphone usage scales predict behavior?
A Quantified Past: Toward Design for Remembering With Personal Informatics
Design activism: Beautiful strangeness for a sustainable world
The Presentation of Self in Everyday Life
Meaningful Inefficiencies
When does manipulation turn a design 'dark'? Interactions
An evidence synthesis of covert online strategies regarding intimate partner violence
Information revelation and privacy in online social networks
Digital footprints and changing networks during online identity transitions
Explaining the privacy paradox with online apathy
Tangible data: A phenomenology of human-data relations
Money Talks: Tracking Personal Finances
Analysis of privacy in online social networks of Runet
Small data in the era of big data
Online social networks: Why we disclose
Literacy in the New Media Age. Gunther R. Kress
A Stage-Based Model of Personal Informatics Systems
Overview of eRisk 2019: Early Risk Prediction on the Internet. Lecture Notes in Computer Science 11696 LNCS
Feeling your data: Touch and making sense of personal digital data
Data selves: More-than-human perspectives
The foundations of EU data protection law
Polymedia: Towards a new theory of digital media in interpersonal communication
Networked privacy: How teenagers negotiate context in social media
The Politics of Privacy Theories: Moving from Norms to Vulnerabilities
Predictive Policing: Review of Benefits and Drawbacks
Navy officer 'filmed X-rated video at Faslane nuclear base'. Metro News
Living digitally
The cyber threat to UK business
Privacy in Context: Technology, Policy, and the Integrity of Social Life
A Contextual Approach to Privacy Online
Sunlight alone is not a disinfectant: Consent and the futility of opening Big Data black boxes (without assistance)
Who do they think you are? Open Rights Group
Opportunities and challenges of the digital lifespan: Views of service providers and citizens in the UK
Sharenting: Parental adoration or public humiliation? A focus group study on adolescents' experiences with sharenting against the background of their own impression management
Reinventing society in the wake of big data
Why data is not enough: Digital traces as control of self and self-control
The Right to Be Forgotten
Device Ecology Mapper: A tool for studying users' ecosystems of interactive artifacts
Design for trust: An exploration of the challenges and opportunities of bitcoin users
Empowering Resignation: There's an App for That
Beyond total capture: A constructive critique of lifelogging
Leakiness and Creepiness in App Space: Perceptions of Privacy and Mobile App Use
For You, or For
Invasion of the Social Networks: Blurring the Line between Personal Life and the Employment Relationship
Technologies for social justice: Lessons from sex workers on the front lines
Identifying Participants in the Personal Genome Project by Name (A Re-identification Experiment)
Food Aid Technology: The Experience of a Syrian Refugee Community in Coping with Food Insecurity
Social media and dismissal: Towards a reasonable expectation of privacy
Grooming, Gossip, Facebook and MySpace. Information, Communication & Society
UK Children's Commissioner
A future-proof past: Designing for remembering experiences
Data Narratives: Uncovering tensions in personal data management
Mosaic: Quantifying privacy leakage in mobile networks
TAPESTRY: A De-centralized Service for Trusted Interaction Online
Information revelation and internet privacy concerns on social network sites

ACKNOWLEDGMENTS

We acknowledge the contributions of our project partners and the time and effort of our participants. This work was sponsored by EPSRC grant EP/R033889/1. The study was approved by the Research Ethics Committee of DJCAD, University of Dundee.