TELeurope
In the educational world, only very limited datasets are publicly available and no agreed quality standards exist on the personalization of learning. The SIG dataTEL aims to address these issues by advancing data driven research to gain verifiable and valid results and to develop a body of knowledge about the personalization of learning.


An Evaluation Framework for Learning Analytics?

Hendrik Drachsler
181 days ago

Hi there,

I’m currently reviewing some Learning Analytics papers and wondering whether any of you know of an evaluation framework or review guidelines for Learning Analytics?

I’m asking because one of the objectives of Learning Analytics is the personalization of learning, but to measure which steps towards this personalization are promising, we need to define benchmarks and evaluation criteria in the form of a framework. Otherwise I can hardly compare the outcomes of one LA application or paper with those of another.

Developing such an evaluation framework is not an easy task. On the other hand, many related research fields have achieved a kind of evaluation standard, for example data mining and recommender systems with the TREC conference. They focus on performance, accuracy, precision and recall measures, which are rather technical but useful for showing the effects of a certain technology on a specific dataset.
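To make the technical measures concrete, here is a minimal sketch of precision and recall for a single list of recommendations. The function name and the toy item IDs are only an illustration and are not taken from TREC or any specific benchmark.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommended items.

    recommended: items the system suggested
    relevant:    items judged relevant (the gold standard)
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)  # recommended items that are relevant
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 4 items recommended, 3 items relevant, 2 in common.
p, r = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision here is the share of recommended items that were relevant, and recall is the share of relevant items that were recommended.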

 

Learning Analytics is not only about technical measures. In the end we want to support teachers and students in their learning process and make the educational system more transparent. But measuring an increase in the effectiveness (learning outcomes) and efficiency (study time) of the learning process usually takes much longer than testing the technical measures. The best-known educational evaluation measure is from Kirkpatrick (http://www.businessballs.com/kirkpatricklearningevaluationmodel.htm), but it requires several pre- and post-tests over a longer time period.

 

So any ideas about an evaluation framework for Learning Analytics?

By the way, the ACM Recommender Systems conference applies the following evaluation criteria to its papers.

Algorithms
A good RecSys algorithms paper will:
• describe the recommender/ranking/prediction algorithm in sufficient detail that someone else could implement it
• articulate the important new idea(s) that the algorithm instantiates, in comparison to previously known algorithms
• demonstrate that performance is better on some well-defined metric, than some baseline algorithm.

 

Applications
A good RecSys paper reporting on a case study of an application deployment will:
• Identify a novel type of item to be recommended or decision process to be influenced, in comparison to previously reported targets of recommender systems.
• Identify unusual properties of the new item type that created special problems or opportunities
• Explain any non-trivial mappings of known techniques to the new domain
• Report on challenges and how they were overcome
• Articulate lessons that might be relevant to others deploying Recommender Systems in similar or related contexts

 

Presentation Techniques
A good RecSys paper about a new way of using recommendations/prediction/ranking to enhance the user experience will:
• Clearly explain the presentation technique
• Articulate what is novel about it, in comparison to existing techniques
• Demonstrate that it has desirable properties for users, through anecdotes or data from lab studies or field deployment

 

Recommender Inputs
A good RecSys paper about a new source of information to be used as an input to recommender algorithms will:
• Clearly explain how the information will be gathered or elicited
• Articulate what is novel about the information source, in comparison to other sources
• Provide some evidence that it can be used to make good recommendations


Manolis Mavrikis
180 days ago

Performance, accuracy, precision and recall are indeed a step towards evaluation, but it all depends on the data and the problem, I suppose. The Educational Data Mining KDD Cup, for example, had its own evaluation metrics.

The RecSys criteria seem interesting if the question is about evaluating papers.

Lastly, I think you are grappling with an interesting problem that also came up during the fishbowl at EC-TEL. I agree that Learning Analytics is not only about technical measures, but it is not only about measures of learning outcomes either. Depending on the purpose, evaluating a technical approach based on learning outcomes alone can often be problematic (e.g. the effects of learning cannot be observed in the short time available for early prototype testing, let alone the various uncontrolled factors that come into play in classroom evaluations). This is also discussed a lot in adaptive systems evaluation. It is perhaps worth spending some time as a community thinking about what it is that we want to measure. For example, in an application that uses data mining to present some results or predict something for a teacher, accuracy is important. But when the output is used for providing recommendations or feedback directly to students, I find that the construct of relevance could be more important (i.e. was the recommendation relevant to the situation?). Different constructs could be useful in different cases.

Apart from the metrics framework, therefore, it may also be useful to come up with a classification of the types of applications the techniques have before evaluating them.

Hope this helps --- interesting discussion.

Hendrik Drachsler
177 days ago

Indeed, I think a proper evaluation framework needs a layered approach like the one suggested by Brusilovsky for Adaptive Hypermedia.

Roughly speaking, 2-3 layers: 1. Technical Layer, 2. Educational Layer, and possibly 3. Relevance Layer?

Do you have a reference for the relevance concept, and for how to evaluate it other than with questionnaires and satisfaction analysis?

I agree a classification of the type of applications is highly needed and a required step before a generic evaluation framework.

The only thing that bothers me a bit is that all this does not sound like an easy-to-use evaluation framework, more like a complicated instrument with many if-conditions. Maybe too complicated for people to actually use in the end.

I raised the same question in the LA Google group discussion, and George and Gavesh replied to it.
Check it out here: http://groups.google.com/group/learninganalytics/browse_thread/thread/79932dcd63c473cf

Manolis Mavrikis
173 days ago

A good evaluation framework might indeed need different axes, which would make it a bit complicated, but I don't think it would be that complex.

Measuring relevance is difficult. I haven't used the concept for evaluating LA or EDM explicitly, but we do use it for evaluating the support strategies of an intelligent system by recording the interactions of the intelligent support.

I don't have a published reference. A paper is under review, "Manolis Mavrikis, Sergio Gutierrez-Santos, Eirini Geraniou, Richard Noss (under review) Design Requirements and Validation Metrics for Adaptive Exploratory Learning Environments", but I am happy to share it with you over email if you find it helpful. Here is, however, the main thrust of the paper:

 

The metrics we are talking about are three performance indicators (relevance, coverage, and scope) and three others for the student's perception of the intelligent system (helpfulness, repetitiveness and comprehension). Together they allow identifying design or implementation problems at various stages of development and measuring students' perception of the intelligent support. Relevance in particular is a measure of how many feedback interventions made by the system were relevant for the student, i.e. were appropriate to the situation as judged by annotators. In our case that is $\text{relevance} = \frac{\text{\# of relevant feedback interventions by system}}{\text{\# of feedback interventions by system}}$, but in other cases it could be something else.
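As a rough illustration of the relevance ratio above, assuming each feedback intervention has already been labelled relevant or not by the annotators, the computation could look like this. The function name and the list-of-booleans input are hypothetical and are not taken from the paper under review.

```python
def relevance(annotations):
    """Fraction of the system's feedback interventions judged relevant.

    annotations: one boolean per feedback intervention made by the system,
                 True if annotators judged it appropriate to the situation.
    """
    if not annotations:
        raise ValueError("no feedback interventions to score")
    relevant = sum(1 for judged_relevant in annotations if judged_relevant)
    return relevant / len(annotations)


# Toy example: 3 of 4 interventions were judged relevant.
judgements = [True, True, False, True]
score = relevance(judgements)
print(f"relevance = {score:.2f}")
```

With several annotators, their labels would first need to be combined into one judgement per intervention (e.g. by majority vote), which this sketch leaves out.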

 

In my experience this ends up being a bit tedious, because it requires involving human experts to annotate the data (our only available gold standard in this situation). Maybe in the future it can be simplified. For example, Baker and Carvalho had a paper on labelling student behaviour using text replays (available on Ryan Baker's website), which could of course be adapted (as the group has done in other papers, if I am not mistaken) to perhaps bootstrap the process and evaluate relevance. This would be interesting.

 

As for the classification of the types of application, Baker and Yacef's 2009 review in the Journal of Educational Data Mining has a classification based on application (it is available on JEDM's website and also mentions the older classification by Romero and Ventura).

 

Hope that's useful.

Hendrik Drachsler
170 days ago

Hi Manolis,

A very relevant discussion. I agree we need to first classify the applications and connect the benchmarking criteria to those classifications. Instead of relevance, coverage and scope, I would suggest Data, Technology, and User. It would be great if you could share the paper by mail: hendrik[DOT]drachsler[AT].ou.nl. I will try to combine that information and design a first draft of an evaluation framework. Looking forward to your replies on that when it is done.