|
Hendrik Drachsler |
Hi there, I’m currently reviewing some Learning Analytics papers and wondering whether any of you knows of an evaluation framework or review guidelines for Learning Analytics. I’m asking because one of the objectives of Learning Analytics is the personalization of learning, but in order to judge which steps towards this personalization are promising we need to define benchmarks and evaluation criteria in the form of a framework. Otherwise I can hardly compare the outcomes of one LA application / paper with another. Developing such an evaluation framework is not an easy task. On the other hand, many related research fields have achieved a kind of evaluation standard, for example data mining and recommender systems with the TREC conference. They focus on performance, accuracy, precision and recall measures, which are rather technical but useful for showing the effects of a certain technology on a specific dataset.
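To make the technical measures concrete, here is a minimal sketch of precision and recall for a single list of recommendations. The resource names and the set of "relevant" items are made up for illustration; a real evaluation would take them from a labelled dataset.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommended items."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = recommended & relevant  # items that were both recommended and relevant
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: resources a system recommended to one learner,
# and the resources judged useful for that learner.
recommended = ["video_1", "quiz_2", "article_3", "forum_4"]
relevant = ["quiz_2", "article_3", "exercise_5"]

precision, recall = precision_recall(recommended, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of recommended items that were relevant, and recall is the share of relevant items that were actually recommended.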
Learning Analytics is not only about technical measures. In the end we want to support teachers and students in their learning process and make the educational system more transparent. But measuring an increase in the effectiveness (learning outcomes) and efficiency (study time) of the learning process usually takes much longer than testing the technical measures. The best-known educational evaluation measure is from Kirkpatrick (http:/
So, any ideas about an evaluation framework for Learning Analytics? By the way, the ACM Recommender Systems conference applies the following evaluation criteria to its papers:
Algorithms
Applications
Presentation Techniques
Recommender Inputs
|
|
Manolis Mavrikis |
Performance, accuracy, precision and recall are indeed a step towards evaluation, but it all depends on the data and the problem, I suppose; e.g. the Educational Data Mining KDD Cup had its own evaluation metrics. The RecSys criteria seem interesting if the question is about evaluating papers. Lastly, I think you are grappling with an interesting problem that also came up during the fishbowl at EC-TEL. I agree that Learning Analytics is not only about technical measures, but it is not only about measures of learning outcomes either. Depending on the purpose, evaluating a technical approach solely on learning outcomes can often be problematic (e.g. the effects on learning cannot be observed in the short time available for early prototype testing, let alone the various uncontrolled factors that come into play in classroom evaluations). This is also discussed a lot in adaptive systems evaluation. It is perhaps worth spending some time as a community thinking about what it is that we want to measure. For example, in an application that uses data mining to present some results or predict something for a teacher, accuracy is important. But when the output is used for providing recommendations or feedback directly to students, I find that the construct of relevance could be more important (i.e. was the recommendation relevant to the situation?). Different constructs could be useful in different cases. Apart from the metrics framework, therefore, it may also be useful to come up with a classification of the types of application the techniques have before evaluating them. Hope this helps --- interesting discussion. |
|
Hendrik Drachsler |
Indeed, I think a proper evaluation framework needs a layered approach like the one suggested by Brusilovsky for Adaptive Hypermedia. Roughly speaking, 2-3 layers: 1. Technical Layer, 2. Educational Layer, and possibly 3. Relevance Layer? Do you have a reference for the relevance concept and how to evaluate it besides questionnaires and satisfaction analysis? I agree that a classification of the types of application is highly needed and a required step before a generic evaluation framework. The only thing that bothers me a bit is that all this does not sound like an easy-to-use evaluation framework, more like a complicated instrument with many if-conditions. Maybe too complicated for people to use it in the end. I raised the same question in the LA Google discussion, and George and Gavesh replied to it: |
|
Manolis Mavrikis |
A good evaluation framework might indeed need different axes, which makes it a bit complicated, but I don't think it would be that complex. Measuring relevance is difficult. I haven't used the concept for evaluating LA or EDM explicitly, but we do use it for evaluating the support strategies of an intelligent system by recording the interactions of the intelligent support. I don't have a published reference. A paper is under review: "Manolis Mavrikis, Sergio Gutierrez-Santos, Eirini Geraniou, Richard Noss (under review) Design Requirements and Validation Metrics for Adaptive Exploratory Learning Environments", but I am happy to share it with you over email if you find it helpful. Here is, however, the main thrust of the paper:
The metrics we are talking about are three performance indicators (relevance, coverage, and scope) and three others for students’ perception of the intelligent system (helpfulness, repetitiveness and comprehension). They allow identifying design or implementation problems at various stages of the development and measuring students’ perception of the intelligent support. Relevance in particular is a measure of how many feedback interventions made by the system were relevant for the student, i.e. were appropriate to the situation as judged by annotators. In our case that is $\text{relevance} = \frac{\#\text{ of relevant feedback interventions by system}}{\#\text{ of feedback interventions by system}}$, but in other cases it could be something else.
In my experience this ends up being a bit tedious because it requires human experts to annotate the data (our only available gold standard in this situation), but maybe in the future it can be simplified. For example, Baker and Carvalho had a paper on labelling student behaviour using text replays (available on Ryan Baker's website), which could of course be adapted (as the group has done in other papers, if I am not mistaken) to perhaps bootstrap the process and evaluate relevance. This would be interesting.
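As a rough illustration of the relevance ratio described above, here is a minimal sketch. It assumes each feedback intervention has already been given a single relevant / not relevant judgement by the annotators; the function name and the sample judgements are hypothetical and not taken from the paper.

```python
def relevance(judgements):
    """Fraction of feedback interventions judged relevant.

    judgements: list of booleans, one per feedback intervention made by
    the system (True = annotators judged it relevant to the situation).
    """
    if not judgements:
        raise ValueError("no feedback interventions to evaluate")
    # True counts as 1 and False as 0, so sum() gives the number of relevant ones.
    return sum(judgements) / len(judgements)


# Hypothetical annotator judgements for five interventions in one session.
annotations = [True, True, False, True, False]
score = relevance(annotations)
print(f"relevance = {score:.2f}")
```

How several annotators' opinions are combined into one judgement per intervention (majority vote, discussion until agreement, etc.) is left open here and would need to be decided for each study.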
As for the classification of the types of application, Baker and Yacef's 2009 review in the Journal of Educational Data Mining has a classification of work based on application (it is available on JEDM's website and also mentions the older classification by Romero and Ventura).
Hope that's useful.
|