Review Smoother

Tools for Factoring Out the Generosity of Reviewers

One of the favorite activities of social scientists, and of society at large, is the collection of ratings. We rate movies and restaurants, colleges, consumer products, conference papers, job applications, SAT essays, and mutual funds. The ultimate goal of collecting these ratings is often to assign each item a score by which the items can be compared. The most common and straightforward method for producing such a score is to average the ratings each item receives.

However, it is frequently the case that each item is rated by only a subset of the reviewers. If the reviewers differ in how they assign their ratings, such that some are more generous than others, simply averaging an item's ratings may result in a biased score. If an item is reviewed by only a small number of reviewers and some of them happen to be sticklers, the item may receive a lower average rating than it deserves. In theory, this bias could be greatly reduced by estimating how generous each reviewer is relative to the others and adjusting his or her ratings accordingly.

The following technical report describes a model that simultaneously determines the generosity of each reviewer and a score for each item that is adjusted to account for the varying generosities of its reviewers.
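As a rough illustration of the idea, a simple additive version of the problem can be sketched as fitting rating ≈ item score + reviewer generosity jointly by least squares. This is only a sketch in the spirit of the "Linear" option on the form below, not necessarily the exact model described in the report, and the function and variable names are hypothetical:

 # Illustrative sketch only: each rating is modeled as
 #   rating ~= item_score + reviewer_generosity
 # and both sets of parameters are fit jointly by least squares.
 import numpy as np

 def fit_additive_model(ratings):
     """ratings: list of (item, reviewer, value) tuples."""
     items = sorted({i for i, _, _ in ratings})
     reviewers = sorted({r for _, r, _ in ratings})
     item_idx = {name: k for k, name in enumerate(items)}
     rev_idx = {name: k for k, name in enumerate(reviewers)}

     # Design matrix: one indicator column per item and per reviewer.
     A = np.zeros((len(ratings), len(items) + len(reviewers)))
     y = np.zeros(len(ratings))
     for row, (item, rev, value) in enumerate(ratings):
         A[row, item_idx[item]] = 1.0
         A[row, len(items) + rev_idx[rev]] = 1.0
         y[row] = value

     # Pin the mean generosity near zero so the solution is identifiable.
     constraint = np.zeros((1, A.shape[1]))
     constraint[0, len(items):] = 1.0
     A = np.vstack([A, constraint])
     y = np.append(y, 0.0)

     params, *_ = np.linalg.lstsq(A, y, rcond=None)
     scores = dict(zip(items, params[:len(items)]))
     generosities = dict(zip(reviewers, params[len(items):]))
     return scores, generosities

In such a model, an item's score is its typical rating after the estimated generosity of each of its reviewers has been subtracted out, which is the adjustment the report's models perform in a more principled way.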

Technical Report

This form allows you to try out the analysis methods described in the paper on your own data. Just copy and paste your data into the box below, set the parameters, and press the "Analyze It" button.

Each line of the data should include three fields, separated by white space. The first field is the name of the item, the second is the name of the reviewer, and the third is the rating. The item and reviewer names can be strings or numbers, but they cannot contain white space. For example:

 paper1 fred 4
 paper2 fred 6
 paper2 jill 3
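For reference, here is a minimal sketch of how such whitespace-separated triples might be read in Python. This is purely illustrative and the function name is hypothetical, not part of the tool:

 # Minimal parsing sketch: reads (item, reviewer, rating) triples from text
 # in the format described above.
 def parse_ratings(text):
     records = []
     for line in text.splitlines():
         fields = line.split()
         if not fields:
             continue  # skip blank lines
         if len(fields) != 3:
             raise ValueError("expected 3 fields, got: %r" % line)
         item, reviewer, rating = fields
         records.append((item, reviewer, float(rating)))
     return records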
Also, be sure to set the minimum and maximum possible ratings to the endpoints of your scale and the increment to the smallest possible difference between two ratings. If the scale is continuous and some ratings fall exactly at the maximum or minimum value, the increment should be set to a small value (such as 1e-6), rather than to 0.
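One plausible reason the increment must be nonzero, an assumption on my part rather than something stated above, is that a bounded, logistic-style model maps ratings through a logit-like transform, and values exactly at the scale endpoints would map to infinity. A sketch of how a nonzero increment avoids that:

 import math

 # Assumed illustration, not the tool's own code: the increment pads the
 # scale so endpoint ratings map to finite values under a logit transform.
 def to_unbounded(rating, min_rating, max_rating, increment):
     lo = min_rating - increment / 2.0
     hi = max_rating + increment / 2.0
     p = (rating - lo) / (hi - lo)   # strictly between 0 and 1 if increment > 0
     return math.log(p / (1.0 - p))  # logit; infinite if p is exactly 0 or 1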


Form fields:

 Model: Average | Z Score | Linear | Logistic | Spindle
 Minimum Possible Rating:
 Maximum Possible Rating:
 Smallest Rating Increment:
 Extreme Value Penalty:
 Output: Scores | Generosities | Adjusted Ratings
 Data:

Doug Rohde, dr+web@tedlab.mit.edu,
Department of Brain and Cognitive Sciences,
Massachusetts Institute of Technology