Impact of data noise on the performance of supervised machine learning models using multimodal data to estimate collaboration quality

multimodal learning analytics
CSCL
collaboration quality
journal
Author

Chejara, P., Prieto, L. P., Dimitriadis, Y., Rodríguez-Triana, M. J., Ruiz-Calleja, A., Kasepalu, R., & Shankar, S. K.

Doi

https://doi.org/10.18608/jla.2024.8253
Citation (APA 7)

Chejara, P., Prieto, L. P., Dimitriadis, Y., Rodríguez-Triana, M. J., Ruiz-Calleja, A., Kasepalu, R., & Shankar, S. K. (2024). The impact of attribute noise on the automated estimation of collaboration quality using multimodal learning analytics in authentic classrooms. Journal of Learning Analytics, 11(2), 73–90. https://doi.org/10.18608/jla.2024.8253

Abstract

Multimodal learning analytics (MMLA) research has shown the feasibility of building automated models of collaboration quality using artificial intelligence (AI) techniques (e.g., supervised machine learning (ML)), thus enabling the development of monitoring and guiding tools for computer-supported collaborative learning (CSCL). However, the practical applicability and performance of these automated models in authentic settings remain largely under-researched. In such settings, the quality of data features or attributes is often degraded by noise, referred to as attribute noise. This paper undertakes a systematic exploration of the impact of attribute noise on the performance of different collaboration-quality estimation models. Moreover, we perform a comparative analysis of different ML algorithms in terms of their ability to handle attribute noise. We employ four ML algorithms that have often been used for collaboration-quality estimation tasks due to their high performance: random forest, naive Bayes, decision tree, and AdaBoost. Our results show that random forest and decision tree outperformed the other algorithms on collaboration-quality estimation tasks in the presence of attribute noise. The study contributes to the MMLA (and learning analytics (LA) in general) and CSCL fields by illustrating how attribute noise impacts collaboration-quality model performance and which ML algorithms seem to be more robust to noise and thus more likely to perform well in authentic settings. Our research outcomes offer guidance to fellow researchers and developers of (MM)LA systems employing AI techniques with multimodal data to model collaboration-related constructs in authentic classroom settings.
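The experimental idea summarized above (inject attribute noise into the features and compare how each of the four classifiers degrades) can be sketched as follows. This is not the authors' code: the dataset is synthetic, and the Gaussian per-attribute noise model and noise level are illustrative assumptions, not the paper's exact noise-injection procedure.

```python
# Sketch: compare the four classifiers named in the abstract on clean vs.
# noise-corrupted test features. Synthetic data stands in for multimodal features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def add_attribute_noise(X, level, rng):
    """Corrupt attributes with zero-mean Gaussian noise scaled per feature."""
    return X + rng.normal(0.0, level * X.std(axis=0), size=X.shape)

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    clean = accuracy_score(y_te, model.predict(X_te))
    noisy = accuracy_score(y_te, model.predict(add_attribute_noise(X_te, 0.5, rng)))
    print(f"{name}: clean={clean:.2f} noisy={noisy:.2f}")
```

The clean-versus-noisy accuracy gap per model is the quantity of interest; sweeping the noise `level` over a range would reproduce the style of robustness comparison the study describes.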
