Background. Thanks to social media and other online platforms, parts of human (inter)action that traditionally happened offline have moved online, and entirely new interaction spaces have opened up. The digital nature of these spaces provides a treasure trove of data, referred to as Digital Trace Data (DTD) in the following, which can help researchers (i) assess previously developed theories and (ii) conceptualize new ones, especially theories that explain digital aspects of human existence.
But studying human behavior online comes with its own set of methodological and technical pitfalls. It raises issues of representation and external validity, both with respect to online contexts beyond the concrete data source used and, especially, when inferences are drawn to broader, offline settings. Studies that ignore these issues may not only be unfit to (in)validate previously established theories; when their insights are applied in real-world applications, biases can be both introduced and reinforced [1]. It is therefore essential to identify and document the limitations of studies using DTD. Error documentation strategies already exist for other paradigms of measuring attitudes or behavior, for example, surveys.
Approach. Our work highlights the challenges encountered in DTD research by leveraging the long-standing Total Survey Error framework (TSE) [3] and mapping it to DTD research where applicable. Additionally, many new challenges arise from idiosyncrasies of the big and heterogeneous data encountered in digital spaces. Our comprehensive and systematic approach builds on a novel error framework we have developed, the Total Error Framework for Digital Traces of Online Human Behavior (TED-On) [2].
Objectives. Our error framework serves three main purposes:
1. It provides a translation from an established framework (TSE) to new applications in the big data domain and establishes a shared vocabulary, while retaining the distinction between errors of measurement and errors of representation [3]. It does so by reframing known errors, such as errors of construct validity, sampling, and adjustment, for research scenarios dealing with big, “found” data. Consequently, it also makes it possible to transfer some of the error mitigation strategies developed for survey research to DTD studies.
2. It introduces novel error types that arise from the particular nature of DTD and are either not present or not salient in survey research. These relate most prominently to the nature of the data itself, as well as to new data processing and analysis methods (e.g., machine learning) that become possible, but often also necessary, with DTD.
3. It enables researchers to systematically reflect on and concisely document the errors and biases present in DTD approaches. This allows researchers to document their own projects, but also supports systematic reviews of existing studies, making their analysis pipelines transparent and comparable, even across disciplinary borders, aided by the common vocabulary mentioned above. We have already begun work on a systematic review of DTD research and will present case studies that demonstrate the applicability of our framework.
Findings. To illustrate, consider some particularities of DTD studies and how our framework takes them into account:
- First-hand analysis of most raw DTD at scale is beyond human capacity. Researchers therefore generally depend on automated methods for preprocessing, annotating, and aggregating (e.g., textual data from online communications such as tweets or comments), which raises concerns of validity. For example, while surveys and content analysis often rely on codebooks that have been explicitly developed and are applied by human coders, DTD methods rely on either fully automated solutions, based on machine learning or heuristics, or a combination of human and automated solutions. This entails developing labeling approaches, but these methods can incur augmentation errors both for the individual traces collected (e.g., textual messages) and for the user accounts that represent a target population (e.g., when inferring socio-demographics from profiles [4]); see the first sketch after this list.
- The above-mentioned challenge is one effect of different platforms producing not only large amounts, but also heterogeneous types of data that each necessitate a tailored approach: the specific norms and technical design of a digital platform affect how people express or record attitudes, behaviors, or characteristics, which we discuss under the term platform affordances.
- Regarding sampling, survey researchers, who cannot feasibly reach an entire population, typically draw probability samples to estimate unbiased statistics and/or employ reweighting strategies. With DTD being large scale, researchers can seemingly bypass this constraint, since social media platforms contain the digital traces of millions of people; yet which populations they reach on a given platform is dictated by platform coverage: not only by who is registered, but also by which users leave which amount and type of traces, which is in turn governed by their individual characteristics (see the second sketch below).
- Further, researchers collect particular data subsets of choice from a given platform, often due to size limitations. This can be done either by querying for traces of human (inter)activity that fit specific patterns (e.g., keywords) or by selecting users on a platform directly (e.g., via friendship connections or keywords). This differs notably from (random) survey sampling within a sampling frame based on individuals or households: it relies on explicit queries on either traces or users, which are often conceived ad hoc and can lead to selection errors (see the third sketch below).
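To make the augmentation-error point from the first bullet concrete, here is a minimal Python sketch that compares automated labels against human-coded gold labels and reports overall and per-subgroup misclassification rates, in the spirit of the disparities documented in [4]. The function, labels, and subgroup names are hypothetical illustrations, not part of TED-On:

```python
from collections import defaultdict

def augmentation_error(gold, predicted, groups):
    """Return overall and per-subgroup misclassification rates.

    gold, predicted: label lists; groups: subgroup ids (e.g., inferred
    demographics), all aligned by index. Purely illustrative.
    """
    overall = sum(g != p for g, p in zip(gold, predicted)) / len(gold)
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for g, p, grp in zip(gold, predicted, groups):
        counts[grp][0] += g != p
        counts[grp][1] += 1
    return overall, {grp: e / n for grp, (e, n) in counts.items()}

# Hypothetical gold labels from human coders vs. automated labels:
gold      = ["f", "m", "f", "f", "m", "m"]
predicted = ["f", "m", "f", "f", "f", "f"]
groups    = ["A", "A", "A", "B", "B", "B"]
overall, by_group = augmentation_error(gold, predicted, groups)
print(overall, by_group)  # 0.33 overall; A: 0.0, B: 0.67
# Diverging per-subgroup rates signal that the augmentation step distorts
# how well different parts of the target population are represented.
```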
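For the coverage point in the third bullet, one survey-style adjustment that transfers to DTD is post-stratification. The sketch below reweights platform users so that strata (here, age bands) match known population margins; all shares are hypothetical, and the adjustment corrects only for the observed strata, not for who leaves how many traces:

```python
# Hypothetical platform vs. population composition by age band:
platform_share   = {"18-29": 0.55, "30-49": 0.35, "50+": 0.10}
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Post-stratification weight per stratum:
weights = {s: population_share[s] / platform_share[s] for s in platform_share}

# Per-stratum outcome, e.g., the share expressing some attitude:
outcome = {"18-29": 0.62, "30-49": 0.48, "50+": 0.33}

naive    = sum(platform_share[s] * outcome[s] for s in platform_share)
adjusted = sum(platform_share[s] * weights[s] * outcome[s] for s in platform_share)
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")  # 0.542 vs. 0.440
# The gap between the two estimates is the coverage error that the naive
# platform-level statistic would silently absorb.
```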
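Finally, for the trace selection point in the last bullet, this sketch shows how an ad-hoc keyword query can both miss relevant traces and sweep in irrelevant ones; the corpus, keywords, and relevance judgments are invented for illustration:

```python
# Hypothetical corpus with manual relevance judgments for comparison:
corpus = [
    {"text": "the flu shot made my arm sore",     "relevant": True},
    {"text": "got vaccinated today feeling fine", "relevant": True},
    {"text": "my jab appointment is next week",   "relevant": True},
    {"text": "flu season again stay healthy",     "relevant": False},
]
keywords = {"flu", "vaccinated"}

# Ad-hoc query: keep every trace containing at least one keyword.
selected = [t for t in corpus if keywords & set(t["text"].lower().split())]

caught   = sum(t["relevant"] for t in selected)
relevant = sum(t["relevant"] for t in corpus)
print(f"recall: {caught}/{relevant}, precision: {caught}/{len(selected)}")
# The query misses a relevant trace ("jab") and catches an irrelevant one
# ("flu season"), i.e., it incurs trace selection error in both directions.
```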
Conclusion. In our talk, we will cover these and other particularities and our systematization in more detail, giving the audience hands-on examples and concrete suggestions for tackling challenges associated with DTD studies, based on our extensive research experience in Computational Social Science, especially regarding machine learning and natural language processing approaches.
References
1. Olteanu, Alexandra, Carlos Castillo, Fernando Diaz, and Emre Kıcıman. "Social data: Biases, methodological pitfalls, and ethical boundaries." Frontiers in Big Data 2 (2019).
2. Sen, Indira, Fabian Floeck, Katrin Weller, Bernd Weiss, and Claudia Wagner. "A total error framework for digital traces of humans." Forthcoming in Public Opinion Quarterly, 2021. arXiv preprint arXiv:1907.08228.
3. Groves, Robert M., and Lars Lyberg. "Total survey error: Past, present, and future." Public Opinion Quarterly 74, no. 5 (2010).
4. Buolamwini, Joy, and Timnit Gebru. "Gender shades: Intersectional accuracy disparities in commercial gender classification." In Conference on Fairness, Accountability and Transparency (2018).