Interesting piece by Carl Bialik, aka The Numbers Guy, titled “Understanding How A Current Kids’ Flick Can Beat Out de Sica.” In the piece, Carl examines several different ways rating systems operate online.

Compiling all of that information into a single ranking is a provocative numbers question. If the only two critics to rate Café Chris each awarded it the maximum five stars, while 100 diners rated its rival Dave’s Diner with an average of 4.8 stars, has Chris really surpassed Dave in culinary excellence? Or should we treat the much smaller number of voters for Chris, who could be Chris and his brother, with a grain of salt?

This raises a really good question: are all ratings equal? And what does a rating really mean without some understanding of who the rater is? Let’s compare the situation to a real-life scenario. Suppose one software engineer were recommended by Bill Gates and another by somebody not as well known. Who would you hire?

Clearly, you would put more weight on the recommendation coming from Bill. You would justify that by noting that Bill has better access to, and understanding of, software talent, and that he has a lot more to lose in terms of his reputation by making careless recommendations.

But on the internet it’s hard to identify who is who. This patina of anonymity forces sites to adopt hokey solutions like the one IMDB uses:

Internet Movie Database, the cinema site owned by Amazon.com, approaches its list of users’ favorite films in this way. A new release whose first two votes are enthusiastic doesn’t push it past “The Godfather.” Instead, IMDB assigns all new movies 1,300 votes with a rating of 6.7 — the average rating for all films listed on the site. Then each actual vote is added to those.

This is how “Umberto D.,” with an average user vote of 8.3, can rank at No. 242 of all time, while “Shrek” is 10 notches higher despite having an average user vote of just 8.0. “Shrek” wins because almost 30 times as many people have voted for it than for “Umberto D.,” adding more certainty to its acclaim.

This modified formula dates from the early days of IMDB, nearly a decade ago, managing editor Keith Simanton says. At first the site used a simple average, but “it wasn’t working out well,” he says. The current ratings system helps “to mitigate the fan-boy aspect.” In other words, two die-hard fans — such as the director and his mother — can’t easily game the ratings.
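To make the arithmetic concrete, here is a minimal sketch of the padded average the article describes. The 1,300 phantom votes and the 6.7 site-wide mean come from the quote above; the function name and the vote counts for “Umberto D.” and “Shrek” are my own assumptions (the article only says “Shrek” has almost 30 times as many votes), so treat the numbers as illustrative rather than as IMDB’s actual data or code.

```python
# Sketch of the padded-average scheme described in the quoted article.
PRIOR_VOTES = 1300   # phantom votes every film starts with (from the article)
PRIOR_MEAN = 6.7     # site-wide average rating (from the article)


def padded_rating(mean_vote, num_votes,
                  prior_votes=PRIOR_VOTES, prior_mean=PRIOR_MEAN):
    """Average the real votes together with the phantom prior votes."""
    total = prior_votes * prior_mean + num_votes * mean_vote
    return total / (prior_votes + num_votes)


# Two die-hard fans giving a new release a perfect 10 barely move it.
new_release = padded_rating(10.0, 2)

# HYPOTHETICAL vote counts; only the roughly 30:1 ratio comes from the article.
umberto_d = padded_rating(8.3, 3_000)
shrek = padded_rating(8.0, 90_000)

print(f"new release: {new_release:.2f}")
print(f"Umberto D.:  {umberto_d:.2f}")
print(f"Shrek:       {shrek:.2f}")
```

With these assumed counts the new release lands at about 6.71, “Umberto D.” at about 7.82 and “Shrek” at about 7.98, so the film with the lower raw average still ranks higher because its many votes swamp the phantom ones.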

Another interesting problem here is the problem of context. What is the point of putting together a list of all-time favorite movies on IMDb? Is the list intended to display the movies one should watch? If that is the case, a genre-based organization might be more successful. In terms of ratings, such a classification would ensure that the fans of a particular genre, like animated movies, who tend to be excitable and a lot more comfortable with rating things online, are not directly compared with fans of a different genre who might have different characteristics.

When applied to a specific context, and where the community credentials of a participant can be clearly established, a rating system can indeed produce meaningful results.

A similar approach underlies player rankings on Halo 3, the Xbox 360 title released two weeks ago that lets players in multiple locations join the same game online. The first day Microsoft released the futuristic war game, players joined a game 2.4 million times. Some were playing with friends, but others relied on the game’s matchmaking feature to find equally skilled strangers to compete against.

Microsoft uses a Bayesian formula similar to IMDB’s, called TrueSkill, to change players’ rankings slowly as they get more experience. After all, a single great result in a Halo 3 match could be the result of a fluke (your opponent gave up because an urgent offline need took her from the game) or a deliberate effort to game the system (your friend threw the game so you could gain rating points).
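For the curious, here is a rough sketch of how a TrueSkill-style update behaves for a single two-player match with no draws. It follows the published description of the model as I understand it, not Microsoft’s production code: the default values (mean 25, uncertainty 25/3, `BETA` of 25/6), the `Skill` class and the `update` function are all assumptions made for illustration, and it leaves out draws, teams and the small amount of uncertainty the real system adds back between games.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # assumed per-game performance noise (commonly cited default)


@dataclass
class Skill:
    mu: float = 25.0          # current skill estimate
    sigma: float = 25.0 / 3   # uncertainty about that estimate


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal CDF; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update(winner, loser, beta=BETA):
    """Return new (winner, loser) skills after one decisive match."""
    c2 = 2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # larger when the result was a surprise
    w = v * (v + t)         # fraction of uncertainty to remove

    def shift(s, sign):
        mu = s.mu + sign * (s.sigma ** 2 / c) * v
        var = s.sigma ** 2 * (1.0 - (s.sigma ** 2 / c2) * w)
        return Skill(mu, math.sqrt(var))

    return shift(winner, +1.0), shift(loser, -1.0)


# A newcomer and a well-established player each beat a fresh opponent.
newcomer_after, _ = update(Skill(), Skill())
veteran_after, _ = update(Skill(mu=25.0, sigma=1.0), Skill())
print(newcomer_after, veteran_after)
```

The point to notice is that each player’s shift is scaled by their own uncertainty: the newcomer’s estimate jumps by several points after one win, while the veteran’s barely moves, which is exactly the “change rankings slowly with experience” behavior the article describes.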

Getting the TrueSkill ranking right is crucial. “If there is a great disparity in skills between competing players, neither of them will have a lot of fun,” says Microsoft researcher Thore Graepel, who helped develop TrueSkill.

A new Halo 3 player who gets good quickly may have to wade through tiresome routs until TrueSkill catches up to his true skill. And IMDB users may not be able to discover highly regarded films that haven’t received enough votes to make the Top 250 chart, which in turn makes it hard for those films to get more attention and so more votes. Many other sites, such as the local-reviews site Yelp, keep it simple and just show average ratings.

While TrueSkill is clearly an important component of Halo 3, it also highlights the limitation of such context-restricted interactions. Even if a user is skilled at video games and has great scores in other games, Halo 3 still treats the user as a newbie who has to earn a reputation before playing at their true level. These kinds of limitations are likely to drive a number of good players to abandon the game while they are still ramping up.

This is the point where I have to make a plug for SezWho :-) We think we have a solution that does not have any of the limitations identified above. It assigns proper weight to ratings based on the reputation of the rater, it rewards users for identifying themselves, and it handles context-based translations across different social media communities (blogs, forums, etc.).
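To illustrate the general idea of weighting a rating by the rater’s reputation, here is a deliberately simple sketch. It is not SezWho’s actual algorithm; the function name, the reputation scores and the sample votes are all hypothetical, and a plain weighted mean is just the most basic way to let trusted raters count for more.

```python
def weighted_rating(ratings):
    """Reputation-weighted mean of (stars, rater_reputation) pairs."""
    ratings = list(ratings)
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        return None  # nobody with any reputation has rated yet
    return sum(stars * rep for stars, rep in ratings) / total_weight


# HYPOTHETICAL data: two near-anonymous 5-star votes and one trusted 3-star vote.
votes = [(5.0, 0.1), (5.0, 0.1), (3.0, 2.0)]
plain_mean = sum(stars for stars, _ in votes) / len(votes)

print(f"plain mean:    {plain_mean:.2f}")
print(f"weighted mean: {weighted_rating(votes):.2f}")
```

Here the plain mean is about 4.33, while the weighted mean drops to about 3.18 because the trusted rater’s opinion dominates the two low-reputation votes.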
