Background: The well-known properties of Big Data (Volume, Velocity, and Variety) can be used to describe log data gathered in computer-based large-scale assessments. Log data are available in large volumes, depending on the level of detail and granularity of the log-data recording implemented in a particular assessment software (Volume). While an assessment is still being taken or a digital learning environment is in use, log data can generate instant feedback (Velocity). Moreover, the event character of log data, in which log events of different types can contain various event-specific data, results in diverse data, even after transforming the log data into generic formats such as XES, xAPI, or the universal log format (Variety). Log data are not specific to standardized computer-based assessments but also occur as continuous streaming data in game-based assessments and digital learning environments, providing the starting point for modern learning analytics.
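The Variety property can be made concrete with a minimal sketch of log events in a generic, event-based representation. All field names and event types below are illustrative and do not follow any particular standard such as xAPI, XES, or the universal log format:

```python
# Illustrative log events: each event shares common fields (timestamp, type),
# but the payload under "data" is event-specific. This heterogeneity across
# event types is what produces Variety even within one generic format.
events = [
    {"timestamp": "2024-05-01T09:00:03Z", "type": "item_start",
     "data": {"item_id": "MATH_017"}},
    {"timestamp": "2024-05-01T09:00:12Z", "type": "keypress",
     "data": {"key": "7", "field": "answer_box"}},
    {"timestamp": "2024-05-01T09:00:41Z", "type": "scroll",
     "data": {"delta_px": 240, "direction": "down"}},
]

# The set of payload keys differs by event type:
print({e["type"]: sorted(e["data"]) for e in events})
# {'item_start': ['item_id'], 'keypress': ['field', 'key'], 'scroll': ['delta_px', 'direction']}
```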
Objectives: Although log data can be described as Big Data, the current psychometric literature calls for validity arguments for process indicators derived from log data. Moreover, log data analysis requires specific tools capable of handling the contextual dependency of log events, where the meaning of an event depends on the context created by previous events.
Method/Approach: Hence, the talk presents LogFSM, a tool that implements a method for analyzing log data with formalized algorithms specified as finite-state machines (Kroehne & Goldhammer, 2018). The underlying framework disentangles the technical nature of log events from the conceptual decomposition of the test-taking process into meaningful states and relevant actions, on top of which process indicators are created. The presented approach aims to support the processing and analysis of collected log data and, in particular, the algorithmic, theory-driven feature extraction that is necessary to analyze these kinds of Big Data from assessments. LogFSM, provided as an R package (https://github.com/kroehne/LogFSM), can be embedded into the more general workflow of defining, constructing, and validating process indicators.
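The core idea of decomposing the test-taking process into states via a finite-state machine can be sketched in a few lines. The following is a hypothetical minimal example, not LogFSM's actual API or FSM syntax; the state and event names are invented for illustration:

```python
# Hypothetical sketch: a finite-state machine replays raw log events so that
# the interpretation of each event depends on the state established by the
# events before it (the contextual dependency of log data).

# Transition table: (current_state, event_type) -> next_state
TRANSITIONS = {
    ("outside_item", "item_start"):   "reading",
    ("reading",      "click_option"): "responding",
    ("responding",   "click_option"): "responding",
    ("reading",      "item_end"):     "outside_item",
    ("responding",   "item_end"):     "outside_item",
}

def reconstruct_states(events, start="outside_item"):
    """Replay (event_type, timestamp) pairs and return the visited states."""
    state, visited = start, [start]
    for event_type, _timestamp in events:
        # Events without a matching transition leave the state unchanged.
        state = TRANSITIONS.get((state, event_type), state)
        visited.append(state)
    return visited

log = [("item_start", 0.0), ("click_option", 4.2),
       ("click_option", 6.1), ("item_end", 9.5)]
print(reconstruct_states(log))
# ['outside_item', 'reading', 'responding', 'responding', 'outside_item']
```

From such a reconstructed state sequence, process indicators (for example, time spent in a hypothetical "responding" state) can then be derived in a theory-driven way.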
Results/Findings: Empirical applications (using data from various large-scale assessments such as the OECD studies PIAAC and PISA and a test from the National Educational Panel Study) are used to illustrate how LogFSM can be applied to derive meaningful process indicators. Moreover, it will be shown how the reproducibility of log data analyses and the dissemination of log file data and derived indicators can benefit from the framework.
References: Kroehne, U., & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika, 45(2), 527–563. https://doi.org/10.1007/s41237-018-0063-y