Bachelor Thesis

Cluster Analysis and Characterisation of Students in Virtual Learning Environments

Awarded Best Bachelor Thesis 2020 in the field of educational technologies by the Educational Technologies Division of the German Informatics Society.


Big data is becoming part of higher education. It offers the educational world opportunities and (ethical) risks in equal measure, so responsible handling of these new data is crucial. Learning analytics is a relevant and emerging area of research in this regard. It describes the in-depth analysis of data to gain information about learners, learning contexts, and learning processes. A common technique for gaining this information is cluster analysis, in which learners are grouped so that learners within a group have similar data. In the literature, these clusters have primarily been conceptualized as types of learners. This work examines what can be said about the students in such clusters and how they can be characterized further. I used a dataset from the Open University in the UK covering 18,660 students with over a million data points. For the cluster analysis, the weekly sum of clicks per student was log-transformed and clustered using the k-means++ algorithm. I then conducted an exploratory data analysis, for which I developed a heatmap that visualizes the daily clicks over the term for each cluster. Next, the student data were linked to demographic information. This was followed by statistical analysis and a more detailed interpretation to answer the research question. The characterization shows that study performance is reflected in the clusters, which can be divided into successful and less successful clusters. In particular, the module structure shapes the interaction patterns in the virtual learning environment. In addition, demographic characteristics differ between clusters, so interaction patterns and clusters cannot be considered independently of these characteristics. Among other things, this reveals a gap in interaction patterns and study performance between students from different social contexts, which raises ethical questions.
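The abstract names the clustering steps but gives no implementation details. The following is a minimal, standard-library-only Python sketch of such a pipeline under stated assumptions. It is not the thesis code. The record format `(student_id, day, clicks)`, day 0 as the first day of the term, `log(1 + x)` as the log transform (so weeks with zero clicks stay defined), the function names, and the toy data are all hypothetical and are not taken from the thesis or the Open University dataset. The sketch sums clicks per student and week, applies the transform, seeds the centres with k-means++, and then runs standard Lloyd iterations.

```python
import math
import random
from collections import defaultdict


def weekly_log_features(records, n_weeks):
    """One feature vector per student: log(1 + weekly click sum).

    Hypothetical input: iterable of (student_id, day, clicks) tuples,
    with day 0 assumed to be the first day of the term.
    """
    sums = defaultdict(lambda: [0] * n_weeks)
    for student, day, clicks in records:
        week = day // 7
        if 0 <= week < n_weeks:
            sums[student][week] += clicks
    # log1p maps zero-click weeks to 0 and compresses very high click counts.
    return {s: [math.log1p(c) for c in weeks] for s, weeks in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return [list(c) for c in centres]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm started from k-means++ centres."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy usage with made-up records (not real data): two active students
# ("a", "b") and two barely active ones ("c", "d") over two weeks.
records = [
    ("a", 0, 50), ("a", 8, 40),
    ("b", 1, 45), ("b", 9, 60),
    ("c", 2, 1),
    ("d", 10, 2),
]
features = weekly_log_features(records, n_weeks=2)
students = sorted(features)
labels, _ = kmeans([features[s] for s in students], k=2)
clusters = dict(zip(students, labels))
print(clusters)
```

On this toy input, the two active students share one cluster and the two barely active students share the other. The numeric label values depend on the random seeding. In practice, a library implementation such as scikit-learn's `KMeans`, whose default initialisation is k-means++, would typically replace the hand-written loop.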

Full Text

Full Text (German):