Interesting piece by Carl Bialik, aka The Numbers Guy, titled “Understanding How A Current Kids’ Flick Can Beat Out de Sica”. In the piece, Carl examines a number of different ways rating systems operate online.

Compiling all of that information into a single ranking is a provocative numbers question. If the only two critics to rate Café Chris each awarded it the maximum five stars, while 100 diners rated its rival Dave’s Diner with an average of 4.8 stars, has Chris really surpassed Dave in culinary excellence? Or should we treat the much smaller number of voters for Chris — who could be Chris and his brother — with a grain of salt?

This raises a really good question…Are all ratings equal? And what does a rating really mean without some understanding of who the rater is? Let’s compare the situation to a real-life scenario. Suppose one software engineer were recommended by Bill Gates and another by somebody not as well known…Who would you hire?

Clearly the answer is that you will put more weight on a recommendation coming from Bill. You would justify this by noting that Bill has better access to and understanding of software talent, and clearly has a lot more to lose in terms of his reputation by making careless recommendations.

But on the Internet it’s hard to identify who is who. This patina of anonymity forces sites to adopt hokey solutions like the one IMDB uses:

Internet Movie Database, the cinema site owned by Amazon.com, approaches its list of users’ favorite films in this way. A new release whose first two votes are enthusiastic doesn’t push it past “The Godfather.” Instead, IMDB assigns all new movies 1,300 votes with a rating of 6.7 — the average rating for all films listed on the site. Then each actual vote is added to those.

This is how “Umberto D.,” with an average user vote of 8.3, can rank at No. 242 of all time, while “Shrek” is 10 notches higher despite having an average user vote of just 8.0. “Shrek” wins because almost 30 times as many people have voted for it than for “Umberto D.,” adding more certainty to its acclaim.

This modified formula dates from the early days of IMDB, nearly a decade ago, managing editor Keith Simanton says. At first the site used a simple average, but “it wasn’t working out well,” he says. The current ratings system helps “to mitigate the fan-boy aspect.” In other words, two die-hard fans — such as the director and his mother — can’t easily game the ratings.
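The arithmetic in the quoted passage can be sketched in a few lines of Python. This only illustrates the “1,300 votes at 6.7” idea as the article describes it; it is not IMDB’s actual code. The function name and the vote counts for the two films are made up for the example, since the article gives only the averages and the roughly 30-to-1 ratio of votes.

```python
def weighted_rating(avg_rating, num_votes, prior_votes=1300, prior_mean=6.7):
    """Blend a film's real votes with a block of phantom votes at the site average.

    prior_votes and prior_mean default to the figures quoted in the article.
    """
    total = prior_votes * prior_mean + num_votes * avg_rating
    return total / (prior_votes + num_votes)


# Hypothetical vote counts: the article only says "Shrek" has almost
# 30 times as many votes as "Umberto D."
umberto_d = weighted_rating(8.3, 2_000)
shrek = weighted_rating(8.0, 60_000)

# A brand-new release whose only two votes are perfect 10s.
new_release = weighted_rating(10.0, 2)

print(round(umberto_d, 2), round(shrek, 2), round(new_release, 2))
```

With these assumed counts, “Shrek” ends up ahead of “Umberto D.” despite its lower raw average, and the new release with two perfect votes barely moves off 6.7.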

Another interesting problem here is context. What is the point of putting together a list of all-time favorite movies on IMDb? Is the list intended to show the movies one should watch? If so, a genre-based organization might be more successful. In terms of ratings, such a classification would ensure that fans of a particular genre, like animation, who tend to be excitable and a lot more comfortable rating things online, are not directly compared with fans of a different genre who might have different characteristics.

When applied to a specific context, and where the community credentials of a participant can be clearly established, a rating system can indeed produce results.

A similar approach underlies player rankings on Halo 3, the Xbox 360 title released two weeks ago that lets players in multiple locations join the same game online. The first day Microsoft released the futuristic war game, players joined a game 2.4 million times. Some were playing with friends, but others relied on the game’s matchmaking feature to find equally skilled strangers to compete against.

Microsoft uses a Bayesian formula similar to IMDB’s, called TrueSkill, to change players’ rankings slowly as they get more experience. After all, a single great result in a Halo 3 match could be the result of a fluke (your opponent gave up because an urgent offline need took her from the game) or a deliberate effort to game the system (your friend threw the game so you could gain rating points).

Getting the TrueSkill ranking right is crucial. “If there is a great disparity in skills between competing players, neither of them will have a lot of fun,” says Microsoft researcher Thore Graepel, who helped develop TrueSkill.

A new Halo 3 player who gets good quickly may have to wade through tiresome routs until TrueSkill catches up to his true skill. And IMDB users may not be able to discover highly regarded films that haven’t received enough votes to make the Top 250 chart, which in turn makes it hard for those films to get more attention and so more votes. Many other sites, such as the local-reviews site Yelp, keep it simple and just show average ratings.
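To make the “change rankings slowly” idea concrete, here is a much simplified sketch of a TrueSkill-style update for a single two-player game with no draws. It is not Microsoft’s implementation: the real system also handles teams, draws and skill drift over time. The constants are commonly cited TrueSkill defaults and are treated here as assumptions, and the names `Rating` and `update` are made up for the example.

```python
import math
from dataclasses import dataclass

# Commonly cited TrueSkill defaults; treated here as assumptions.
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = MU0 / 6


@dataclass
class Rating:
    mu: float = MU0        # estimated skill
    sigma: float = SIGMA0  # uncertainty about that estimate


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update(winner, loser, beta=BETA):
    """Return new (winner, loser) ratings after one game with no draw."""
    c = math.sqrt(2 * beta**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # larger when the result was a surprise
    w = v * (v + t)         # fraction of uncertainty to remove

    def shift(r, sign):
        mu = r.mu + sign * (r.sigma**2 / c) * v
        sigma = r.sigma * math.sqrt(1 - (r.sigma**2 / c**2) * w)
        return Rating(mu, sigma)

    return shift(winner, +1), shift(loser, -1)


# A newcomer beats an established player whose rating is already well pinned down.
newcomer, veteran = update(Rating(), Rating(mu=35.0, sigma=1.0))
print(newcomer, veteran)
```

The size of each move scales with the player’s own uncertainty: the newcomer’s estimate jumps after the upset, while the veteran’s barely changes, which is why a single fluke or thrown game has limited effect.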

While TrueSkill is clearly an important component of Halo 3, it also shows the limitation of such context-restricted interactions. Even if a user is skilled at video games and has great scores in other games, Halo 3 still treats the user as a newbie who has to earn a reputation before playing at their true level. These kinds of limitations are likely to force a number of good players to abandon the game while ramping up.

This is the point where I have to make a plug for SezWho :-)…We think we have a solution that does not have any of the limitations identified above. It assigns proper weight to ratings based on the reputation of the rater, rewards users for identifying themselves, and handles context-based translations across different social media communities (blogs, forums, etc.).
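As a rough illustration of what weighting ratings by the rater’s reputation could look like, here is a minimal sketch. It is not SezWho’s algorithm; the function name, the 0-to-1 reputation scale and the sample data are all assumptions made for the example.

```python
def reputation_weighted_rating(ratings):
    """ratings: list of (stars, rater_reputation) pairs, reputation >= 0."""
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        raise ValueError("no rater has any reputation to weight by")
    return sum(stars * rep for stars, rep in ratings) / total_weight


# Hypothetical data: two unknown raters give five stars, one trusted rater gives three.
ratings = [(5.0, 0.1), (5.0, 0.1), (3.0, 0.8)]
plain_average = sum(stars for stars, _ in ratings) / len(ratings)
weighted = reputation_weighted_rating(ratings)
print(plain_average, weighted)
```

In this made-up case the plain average is pulled up by the two unknown raters, while the weighted figure stays close to the trusted rater’s three stars.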

End of page-rank?

August 22nd, 2007

We live in a page-rank world. Google, the main organizer and cataloger of the Internet, uses page-rank as the primary way to organize information (I know I am oversimplifying here, as Google uses a number of other algorithms as well, but the link structure between sites and pages is still one of the most important factors). Even blog search engines like Technorati and Sphere use a derivative of the page-rank algorithm to rank content. At a high level, what that means is that content on a site derives its credibility from the credibility of the site. E.g., a page on cnn.com inherits the page-rank of the site. If a page with the same information sits on a site with a lower page-rank than cnn.com, that page will be considered less credible and will show up lower in Google’s search results. The idea of deriving the credibility of content from the site made a lot of sense when there were editorial boards and organizations to ensure everything was vetted, reviewed and solid. But does it still make sense in the evolving social media landscape?
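For readers who have not seen how link structure turns into a score, here is a minimal sketch of the textbook page-rank iteration. It is a simplification and certainly not Google’s production ranking. The damping factor of 0.85 is the commonly quoted value, and the tiny link graph and its node names are invented for the example. The nodes can be pages or whole sites, depending on what you feed in.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "bigsite", which links to one of them.
links = {
    "bigsite": ["blog"],
    "blog": ["bigsite"],
    "forum": ["bigsite"],
    "video": ["bigsite"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note what the score depends on: only who links to whom. Nothing in the calculation looks at who wrote the content, and "blog" outranks "forum" and "video" purely because the heavily linked node points to it.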

Let’s take an example. Let’s say there is a video on YouTube. Should the fact that the video happens to be on a popular site make it more credible? You can be sure that the staff at YouTube has not reviewed the video to ensure the credibility of the content…In such situations, does it still make sense to use a page-rank based mechanism to evaluate the credibility of the content? Clearly, with user-generated content the credibility of the content cannot be derived from the credibility of the site; instead, it has to come from another source. How about the users who are generating the content? How about the consumers of that content?

It’s all about the people

When sites become a two-way conversation (read/write), and when everybody has access to the means of publishing content and the potential to get immediate, unlimited distribution, as is the case with social media, the ranking of the site becomes meaningless in determining the quality of the content. This is a change for the Internet, but in the real world that is really how things work. E.g., in a meeting, a conference or a social gathering, people take into account the credibility of the person who is speaking to determine what to make of what is said. In other words, who is delivering the message is almost as important as what is being delivered. Now that the Internet is enabling a global conversation, we need to go back to the same people-based credibility model to evaluate the content that is generated by users.

Let’s go back to our earlier example to see how it can work. Instead of using site-based credibility, suppose there was a way to establish that a particular user has spent some time thinking about the topic and has posted some interesting thoughts on the subject on his/her blog. Wouldn’t that make you more likely to watch that user’s new YouTube video?

This is all good but how?

One of the key strengths of social media is that users have the means of producing and publishing content. This also means the conversation on any topic spans multiple sites. While this gives users a great deal of flexibility, it also makes it really hard for any particular site to provide enough user context to make its content credible. Even a popular site like YouTube can only show what other videos a user has published, but what if the user has only a few videos on YouTube and the rest of the context is in the form of Flickr pictures, blog posts/comments and forum discussions? YouTube will not be able to show that context for its users, and the content is going to become less interesting as a result.

Another problem with building a people-based credibility framework is that there has been no way to establish people’s identity. This is an artifact of the evolution of the web, where initially the focus was on sites and the organizing principle was page-rank. The lack of a universal identity mechanism prevents sites from putting together cross-site user profiles, even though the community clearly benefits from such context. Right now a number of efforts, like OpenID and CardSpace, are underway to establish a universal distributed framework that applications can use to establish a user’s identity. The issue, though, is that these frameworks are still in their infancy and a few years away from critical mass. So in the meantime, how do we proceed?

Rise of Community-Rank

One of the key ingredients that has not been leveraged thus far is the incentive for participants to identify themselves and be known as good members of the community. There are a number of members in each community who are serious participants and would be happy to be rewarded with recognition for their value-added participation. What if there was a system that enabled users to build and control inter-site and intra-site participation profiles? Such a system would have to give users full control over their profiles and provide a mechanism for users to have as many identities as they want (let’s face it, all of us have multiple identities in both the real and virtual worlds). Much like the real world, in such a system the community would be able to reward users for participating well and punish those who don’t. Let’s call this system a community-rank and identity system.

Using a community-rank and identity system, readers of social media sites will be able to establish the participation history of a user, understand what the community thinks of the user’s content and easily find the most credible content. For search, community-rank will lessen the reliance on site context and put the focus on the community reputation of the people generating the content.

But what about privacy? There are always risks when you start organizing information around users and their participation in communities. While a system like this benefits the community as a whole, some users might not want to have a participation profile. To address these concerns, such a system will need to give users full control over their profile information. In addition, it should allow users to remain anonymous when they do not want their contributions to be part of their profile. By addressing these privacy concerns, such a system can really help improve the quality of conversation in communities.

Conclusion

Page-rank based organization is not suited for social media sites (you just have to go and search in a discussion forum to realize that things don’t work as well as you would like). A community-rank and identity system has the potential to unlock a huge amount of value in social media by incentivizing participation and by empowering readers.

Interesting take on the online reputation scrubbing services…Check it out here.

Fascinating post on MSNBC about the price users put on Privacy…The post talks about experiments where users were asked how much they value their private data. Customers were asked the question in two ways:

  • How much are customers willing to pay to protect their privacy?
  • How much do customers want to be paid to share their private information?

As expected, customers wanted a whole lot more money to share their private information, while very few were willing to pay anything to protect that information. I think people have this assumption about privacy that it’s something they just have…and I think it’s an artifact of how things used to be before everything changed because of technology. We now need to reexamine our assumptions about how much we really value privacy and come up with a more rational value (rather than have the endowment effect and other psychological factors skew our judgement)…This is too important for everybody.

Great opinion piece by Tom Grubisich at The Washington Post. It talks about the lack of transparency with user-generated content.

These days we want “transparency” in all institutions, even private ones. There’s one massive exception — the Internet. It is, we are told, a giant town hall. Indeed, it has millions of people speaking out in millions of online forums. But most of them are wearing the equivalent of paper bags over their heads. We know them only by their Internet “handles” — gotalife, runningwithscissors, stoptheplanet and myriad other inventive names.

Imagine going to a meeting about school overcrowding in your community. Everybody at the meeting is wearing nametags. You approach a cluster of people where one man is loudly complaining about waste in school spending. “Get rid of the bureaucrats, and then you’ll have money to expand the school,” he says, shaking his finger at the surrounding faces.

You notice his nametag — “anticrat424.” Between his sentences, you interject, “Excuse me, who are you?”

He gives you a narrowing look. “Taking names, huh? Going to sic the superintendent’s police on me? Hah!”

In any community in America, if Mr. anticrat424 refused to identify himself, he would be ignored and frozen out of the civic problem-solving process. But on the Internet, Mr. anticrat424 is continually elevated to the podium, where he can have his angriest thoughts amplified through cyberspace as often as he wishes. He can call people the vilest names and that hate-mongering, too, will be amplified for all the world to see.

This is a real problem with the Internet (although I am not sure about “transparency in all institutions”, especially with the government)…With the lack of incentives to participate and the lack of tools for the community to control the conversation, the vocal and vilest few take over the conversation and bring down the quality of discourse. Tom suggests a few solutions:

Until recently, many of the site’s posters identified themselves with anonymous Internet handles — which were the site’s default ID. Now, people must enter a “user ID” that appears with their comments.

Hal Straus, washingtonpost.com’s interactivity and communities editor, says the changes “move us in the direction of transparency.” But the distinction is not quite a difference, because washingtonpost.com user IDs can be real names or fictional Internet handles. While the site prohibits comments that are libelous, abusive, obscene or otherwise inappropriate, Mr. anticrat424 could still find a well-amplified podium at washingtonpost.com.

The news and opinion site Huffingtonpost.com requires posters to register with their real names but maddeningly assures them that it will “never” use those names.

Though not foolproof, there are ways to at least raise the bar. Gordon Joseloff, a former CBS News correspondent who owns WestportNow.com, a popular grass-roots site in Westport, Conn., used to employ the standard permissive registration process. But in late 2005, turned off by the venom of anonymous posters, Joseloff instituted a policy requiring anyone who wanted to comment to use his or her real name. Joseloff also requires registrants to give their phone numbers. Numbers aren’t posted on the site, but they give him and his team an additional check against false registration.

Only the big sites like the Washington Post or Huffington Post can pull off requiring users to register…And it’s really painful for readers to have to remember another username and password. So what is the solution? There is also a need to strike the right balance between anonymity and identification…At times anonymity is justified, as with whistleblowers, but anonymity all the time creates perverse incentives…What do you think?

Benefits of Forgetting

May 10th, 2007

Interesting and scholarly study from the Kennedy School of Government’s Viktor Mayer-Schönberger, titled “Useful Void: The Art of Forgetting in the Age of Ubiquitous Computing”. In the study, the author points to a change in our default societal behavior, from forgetting unimportant things to remembering everything.

In March 2007, Google confirmed that since its inception it had stored every search query every user ever made and every search result she ever clicked on. Google remembers forever.

“хранить вечно“ (to be preserved forever) the KGB stamped the dossiers on its political prisoners. The Communist state would never forget the identity, believes, actions and words of those that had opposed it.

Like the Soviet state, Google does not forget. But unlike the Soviet Union that ceased to exist fifteen years ago, Google has become an indispensable tool for hundreds of millions of people around the world, who use it every day. We seem to have accepted that our digital society may forgive, but no longer forgets.

This has resulted in a drastic shift in our data retention behavior. For millennia it was difficult and costly to preserve. We would only do so in exceptional circumstances, and most frequently only for a limited period of time. For almost all of human history, most of what humans experienced was quickly forgotten. Today, however, retention of digital data is (relatively) easy and cheap. As a consequence, and absent other considerations, we keep rather than delete it. This is the central point: In our analog past, the default was to discard rather than preserve; today the default is to retain.

Credit bureaus store extensive information about hundreds of millions of U.S. citizens. Daniel Solove writes that the largest US provider of marketing information offers up to 1,000 data points for each of the 215 million individuals in its database. We also see the combination of formerly disparate data sources. Solove mentions a company that provides a consolidated view at data from 20,000 different sources across the world. It retains the data, he writes, even if individuals dispute its accuracy.

Companies keep our air travel reservations on file even when we decide not to buy the ticket, together with rich information about us and our previous travel patterns. Millions of cameras in public places – the UK alone is said to operate between 2 and 3 million – produce records of our movements that are kept. Law enforcement agencies store biometric information about tens of millions of individuals even if these have never been charged with a crime. Search engines retain each of our search queries, and keep archival copies of our web pages long after we have taken them offline.

This is only the beginning. With the advent of ubiquitous computing, of cheap GPS chips in our cell phones, cameras and cars, of RFID tags in everyday objects, and of tiny, networked sensors that surround us, a more comprehensive trail of our actions will be collected than ever before. Given low cost of storage, ease of retrieval and potential value in accessing information, much of the data that is being collected will be kept for months if not years, as our societal default has shifted from deletion to retention.

This has drastic consequences beyond the obvious ability to know much more about other people’s preferences, behaviors, actions and opinions than in the analog world of incremental forgetting. Living in a world in which our lives are being recorded and records are being retained, in which societal forgetting has been replaced by precise remembering, will profoundly influence how we view our world, and how we behave in it.

If whatever we do can be held against us years later, if all our impulsive comments are preserved, they can easily be combined into a composite picture of ourselves. Afraid how our words and actions may be perceived years later and taken out of context, the lack of forgetting may prompt us speak less freely and openly. This is the temporal version of a panoptic society, in which everything is being watched; it is a society in which most of what is being recorded and collected is being preserved. Regardless of other concerns we may have, it is hard to see how such an unforgetting world could offer us the open society that we are used to today.

So what is the solution? The author suggests a combination of legislative and technical approaches that restore the default of forgetting in our society. If some entity or person wanted to remember things beyond a certain time period, they would need to take some special action, the digital equivalent of writing it down…I think this makes a lot of sense and could prevent ordinary people from becoming more and more like stage-coached politicians who plan and practice each and every one of their moves and utterances…What do you think?
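On the technical side, “forgetting by default” could look something like the sketch below: every record gets an expiry date unless the caller explicitly asks to keep it longer. This is only my illustration of the idea, not a design from the study; the class name, the 30-day default and the sample records are all made up.

```python
from datetime import datetime, timedelta


class ForgettingStore:
    """Key-value store where records expire unless the caller asks otherwise."""

    def __init__(self, default_lifetime=timedelta(days=30)):
        self.default_lifetime = default_lifetime  # hypothetical default
        self._records = {}  # key -> (value, expires_at)

    def put(self, key, value, now, keep_for=None):
        # Remembering longer than the default takes an explicit keep_for.
        lifetime = keep_for if keep_for is not None else self.default_lifetime
        self._records[key] = (value, now + lifetime)

    def get(self, key, now):
        record = self._records.get(key)
        if record is None:
            return None
        value, expires_at = record
        if now >= expires_at:
            del self._records[key]  # forget on access once expired
            return None
        return value


store = ForgettingStore()
start = datetime(2007, 5, 10)
store.put("search:query", "cheap flights", now=start)
store.put("contract", "signed copy", now=start, keep_for=timedelta(days=3650))

later = start + timedelta(days=60)
print(store.get("search:query", later), store.get("contract", later))
```

Sixty days on, the search query is gone while the contract, which someone deliberately chose to keep, is still there. The point of the design is that retention takes effort and deletion does not.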

Hit job - web style

May 7th, 2007

This is a familiar enough story (Via SF Gate)…And the people hit have few avenues for relief:

The first postings appeared soon after Sue Scheff, who runs a Web-based referral service for parents with troubled teenagers, advised a woman from Louisiana to withdraw her twin sons from a boarding school in 2002. Scheff is “a con artist,” “a crook” and “a fraud,” according to the messages, which peppered blogs and Internet forums for parents of troubled teens.

Soon, calls to Scheff’s Parents Universal Resource Experts dropped by half, said Scheff, 45, who lives in Weston, Fla. “People would say: ‘You know, I just read this about you online. How do I know I can trust you?’ ”

Scheff, whose 6-year-old service usually draws a lot of traffic, is a victim of an emerging phenomenon: online smear campaigns, which can wreak havoc in the victims’ professional and business lives at the touch of a few keystrokes.

We need an identity and reputation infrastructure that puts all opinions, expressed by all people, in perspective based on what they have done in the past. Such a system will help online communities maintain decorum by penalizing participants who don’t add value to the discussion (much like discussions in a real community) and rewarding those who do…This is quickly emerging as an important requirement for wider adoption of social media…

Comscore cookie study

April 17th, 2007

Interesting study from Comscore about the behavior of users with regard to managing cookies on their computers. The data is presented in a somewhat convoluted manner, so let me highlight the key points:

  1. On average, a user clears cookies about 2.5 times a month on one computer.
  2. While 69% of the users don’t clear cookies at all, 31% of the users clear cookies at least once every month on a computer.
  3. 7% of the users are frequent cookie cutters, meaning that they clear cookies more than 4 times each month.
  4. Looks like users clear cookies indiscriminately, without regard to the source of the cookie as the data is pretty similar for first party and third party cookies.

Overall, this data sounds about right, as it jibes well with data I got from a buddy of mine at Yahoo!. Some of the business implications of the data above are:

  1. Cookie-based tracking of the number of unique visitors is unreliable.
  2. Browser-side caching of web pages can be unreliable, as my guess is that when users clear their cookies they also clear all the cached files (does anybody have specific data here?), since both are considered the same sort of private data…This means that a web site relying on browser-side caching for scalability and performance might be in for a surprise.
  3. Cookie-based tracking services like MyBlogLog perhaps need a better way to track users? Maybe using browser plug-ins?
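To get a feel for the first implication, here is a back-of-the-envelope sketch of how cookie clearing inflates a unique-visitor count. The 31% share comes from the study summary above. Everything else is an assumption made for the example: that every user visits the site, that a clearing user returns after each clear and so gets a fresh cookie, and the figure of 4 clears a month for those users.

```python
def cookie_inflation(total_users, clearing_users, clears_per_month):
    """Ratio of cookies seen to real users over one month.

    Assumes every user visits at least once, and that a clearing user
    comes back after each clear and so picks up a fresh cookie each time.
    """
    non_clearing = total_users - clearing_users
    cookies_seen = non_clearing + clearing_users * (1 + clears_per_month)
    return cookies_seen / total_users


# 31% of users clear cookies (from the study summary above);
# 4 clears per month for those users is a made-up illustrative figure.
print(cookie_inflation(total_users=1000, clearing_users=310, clears_per_month=4))
```

Under these assumptions a site with 1,000 real visitors would count well over 2,000 “unique” cookies, even though most users never clear anything.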

What do you think?

Web Attack - What to do?

April 10th, 2007

Great article in BusinessWeek about how the Internet and social media (despite the occasional nastiness) are making businesses more accountable:

Home Depot’s (HD ) CEO goes into an emergency huddle with his crisis management team after 14,000 bilious customers storm an MSN (MSFT ) comment room.

The venom of crowds isn’t new. Ancient Rome was smothered in graffiti. But today the mad scrawls of everyday punters can coalesce into a sprawling, menacing mob, with its own international distribution system, zero barriers to entry, and the ability to ransack brands and reputations. No question, legitimate criticism about companies should get out. The wrinkle now is how often the threats, increasingly posted anonymously, turn savage. Even some A-list bloggers are wondering if the cranks are too often prevailing over cooler heads.

Most companies are wholly unprepared to deal with the new nastiness that’s erupting online. That’s worrisome as the Web moves closer to being the prime advertising medium—and reputational conduit—of our time. “The CEOs of the largest 50 companies in the world are practically hiding under their desks in terror about Internet rumors,” says top crisis manager Eric Dezenhall, author of the upcoming book Damage Control. “Millions of dollars in labor are being spent discussing whether or not you should respond on the Web.”

In the beginning, the idea of this new conversation seemed so benign. Radical transparency: the new public-relations nirvana! Companies, employees, and customers engage in a Webified dialectic. Executives gain insight into product development, consumer needs, and strategic opportunities. All the back-and-forth empowers consumers, who previously were relegated to shouting at call-center minions. Venom can be a great leading indicator.

Trashing brands online can also be high theater. Rats cruising around a Greenwich Village KFC/Taco Bell (YUM ) on YouTube (GOOG ). MySpacers (NWS ) busting their employers’ chops. Faux ads bashing the Chevy (GM ) Tahoe as a gas-guzzling, global-warming monster. Millions of people watch this stuff—then join in and pile on. Is it any wonder companies lose control of the conversation?

When the Web turns against them, executives are faced with the problem of how to manage the blowback. They have two choices: ignore the smaller furies and hope they won’t metastasize, or respond outright to the attacks. It’s rarely a good idea to lob bombs at the fire-starters. Preemption, engagement, and diplomacy are saner tools.

…But what happens when the uproar grows so noisy that the mainstream media is bound to pick it up? That’s exactly the position new Home Depot CEO Francis S. Blake found himself in last month. MSN Money columnist Scott Burns accused Home Depot of being a “consistent abuser” of customers’ time. Within hours, servers were caving under the weight of 10,000 angry e-mails and 4,000 posts, which took the company to task for pretty much everything. It was the biggest response in MSN Money’s history. Blake’s predecessor, Robert L. Nardelli, the guy who famously didn’t allow comments at the company’s annual meeting, simply would have ignored the mob. But Blake knew the controversy could quickly mushroom.

The only way over it, he decided, was through it. So Blake penned a heartfelt and repentant online letter to all Home Depot customers, essentially copping to the company’s less-than-stellar service. He promised to increase staffing and begged for the chance to make good. He created a site to deal specifically with service. He thanked Scott Burns.

In crisis-management circles, the gamble was viewed as a win. Blake actually generated rare applause on an unofficial Home Depot employee site called the Orange Blood Bank, where workers are more likely to post riffs knocking the company. (“You can’t do it, and we’ll never help.”)

I think this is a good thing for all parties, if you take a longer view of things…This makes people more accountable, and that is always a good thing.

Great piece in the SF Chronicle today by Dan Fost about the recent firestorm over vitriolic comments against Kathy Sierra (BTW she is great and I love her blog).

The threats against Kathy Sierra, an author who promotes the notion of emphasizing the needs of the user in Web site design, have sparked a Webwide debate on the nature of online discourse.

The incident and its aftermath have drawn back the curtain on a computer culture in which the more outrageous the comment, the more attention it gets. It’s a world that many women in particular see as still dominated by men and where personal attacks often are defended on grounds of free speech.

In addition, many of the newest tools of the Internet are coming into play. Blogs and online communities were supposed to herald an era in which “the wisdom of crowds” guided online behavior to a higher plane. Instead, instances of mob rule appear to be leading the discussion into the sewer.

Some observers believe the incident eventually could serve as a warning to Web communities to increase accountability and stamp out the vitriol that characterizes much of online conversation.

“We need to say this is not acceptable behavior,” said Tim O’Reilly, CEO of Sebastopol’s O’Reilly Media, which publishes Sierra’s books and runs the ETech conference where Sierra was scheduled to speak this week. “If you start making offensive comments, they will be deleted from a blog. Don’t give people that platform.”

This is a sad state of affairs and not completely unexpected either…As one of the commenters quipped in one of the older posts:

Normal Person + Anonymity + Audience = Total Idiot

The other issue here is really accountability…Unlike in human communities, on the Internet it’s easy to avoid facing the repercussions of making nasty and unhelpful comments. We really need a system across social media that addresses the issue of accountability by providing the right incentives to all users for participating positively. Such a system will ensure that users get rewarded for positive contributions and are held accountable for disrupting community discourse.

A powerful argument about what lack of accountability does to good people is provided by Philip Zimbardo in his book The Lucifer Effect. Through a number of experiments, Philip demonstrates how, if you put good people in an accountability-free, lawless setting, they become fairly evil. Anybody remember Abu Ghraib? (I haven’t read it yet, but I have heard from a number of sources that this is an interesting and powerful book.)

What do you think?
