Just in time for an example to illustrate the thinking of web 3.0 (I know I'm not supposed to talk about that right now), I'm on to a way a web audience can be explained demographically through now-established, bootstrapped systems. It's a lot like chaos theory, where the only way you can describe any one particular part is by understanding how that part is connected and interwoven with every other part of the whole. Or, if you don't throw up from reading this whole post, then you are awesome.
While I started off thinking this statistics stuff was easy, I now truly believe it's a total freak show, and it's very difficult to get a sense of how a site is doing from just the stats or any other one indicator alone. Yet with a variety of predefined indicators from the newfound layers of 2.0-like underpinnings, you can really paint a valuable demographic picture of an audience's make-up, especially compared to traditional TV stats.
With Rocketboom, the video distribution comes from our own servers, from reports by 3rd party partner servers, from redistribution estimates for non-partner distributions, from unaccounted-for redistributions, and from the long-term accumulation of archives at each of these distribution points. This is further complicated by RSS pings that lead to d/l's and RSS pings that do not, by attempts to watch videos versus complete downloads and complete views, and by audience members and audience reach versus actual daily audience visits, downloads and views. It's an amazingly complex web of gazpacho.
First, internal indicators.
When I first tweaked my Movable Type weblog template to host one post per page, I essentially wound up with a video blog that has one video per page. If someone comes to the home page, that's an attempt to load a video; if they go to an archive page, that is an attempt as well. Besides the archive page and the about page, I have always considered my site's page views to represent a rough estimate of how many views Rocketboom videos were getting. Then there is the number of incomplete page views, which adds to the number of attempts and people but not to full video views.
One of the many reasons Rocketboom created a spark initially was that it was one of the first adopters of Dave Winer's RSS enclosure spec. Many people, including those who hated Rocketboom, would watch it because it was about all they could get in video form with their aggregators. Over time, this audience became very substantial, and since Apple released its iTunes podcasting directory, the number of daily RSS subscribers from iTunes and other aggregators has continued to challenge the traffic from our website viewers, even now that more content choices are available and our audience watches because it wants to.
In other words, setting aside all of the page views, which equate with downloads, attempts and website traffic, there is this other large factor of activity, so much so that page views for a website like Rocketboom only tell some of the story. For instance, when iTunes pings our XML sheet, which our stats config considers a file, not a page, it may or may not lead to a pull of a video file. If it does, the pull otherwise goes unseen by the outside world. Furthermore, none of this RSS traffic is detected by most 3rd party indicators like Alexa and Google Trends.
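To make that distinction concrete, here is a minimal sketch of how a log parser might separate page views, feed pings and video pulls. The file extensions, the paths and the record layout are assumptions for illustration only, not Rocketboom's actual stats config.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    client: str  # hypothetical client identifier, e.g. IP plus user agent
    path: str    # requested path


# Assumed conventions: feeds end in .xml, videos in .mov or .mp4.
FEED_EXT = (".xml",)
VIDEO_EXT = (".mov", ".mp4")


def classify(path: str) -> str:
    """Label a request as a feed ping, a video pull or a page view."""
    p = path.lower()
    if p.endswith(FEED_EXT):
        return "feed_ping"
    if p.endswith(VIDEO_EXT):
        return "video_pull"
    return "page_view"


def summarize(hits):
    """Count each kind of request, plus feed clients that never pulled a video."""
    hits = list(hits)
    counts = Counter(classify(h.path) for h in hits)
    feed_clients = {h.client for h in hits if classify(h.path) == "feed_ping"}
    video_clients = {h.client for h in hits if classify(h.path) == "video_pull"}
    counts["feed_clients_without_pull"] = len(feed_clients - video_clients)
    return counts


# Made-up sample requests.
sample = [
    Hit("a", "/"),
    Hit("a", "/archives/2006/11/"),
    Hit("b", "/vlog.xml"),
    Hit("b", "/video/episode.mov"),
    Hit("c", "/vlog.xml"),
]
report = summarize(sample)
```

In this sample, client "c" pinged the feed but never pulled a video, which is exactly the kind of activity a page-view count never sees.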
And this is only just the beginning. Should I count the traffic to the wiki on Wikia's site as relevant support for the show? What about my personal blog here at dembot? If I posted this post at a URL that was rocketboom.com/dembot/, you would be reading it over on rocketboom to stack up page points. So all of the cross promotion from other sites like Apollo Pony, Abbey.la, rocketboom.org, rocketboom.wikia.com and humanwire.org is important to me in the long run, but not exactly apparent when comparing against, oh, say, Ze Frank, who has everything he does under zefrank.com.
Nonetheless, setting aside all of the other sites, with regards to just rocketboom.com, the page views this year have ranged from 8,499,610 complete page views at the website for the entire month of March to 6.3 million for the month of July, and around 5 to 6 million per month for most of the rest of the months this year. That does not equate with completed video downloads, but it's close. The number of incompletes, or attempts, is of course much greater. And again, this does not include video downloaded through aggregators like iTunes, Democracy, etc., the biggest traffic we have. And it does not include off-site partner redistributions, etc.
So that much we know. We also know that what I assumed all along was four GB servers on a load-balanced round-robin switcher was originally four 100mb servers in rotation and is now two GB servers in rotation, with a switch to all four GB servers on heavy-load days. According to the two servers, an average of 78,000 files per day was appearing during a lower point; at peak times in our history, over twice as many. These are files served just to the website and do not count redistributions, etc.
The other two GB servers turn on when there is overload, but so far I have not been able to verify the extra numbers, so I cannot include them in my estimates. Even so, it's unlikely they would add enough to match up.
This does not match up with the page requests, and beyond the pages there are also the direct video downloads that happen through XML and bypass the page count, which also seem unaccounted for. So recently I went out to round up more off-site redistributions, and partner and non-partner distributions with other aggregators that redistribute from one master file on their own servers, and found quite a few. To give you a sense of how many aggregators and weird browsers are out there: last month, in addition to Democracy, iTunes, Firefox and Internet Explorer, 7,844 different kinds of browsers hit the website.
Then there are the off-site distributions, including everything from TiVo to Nokia to Movedigital to Rocketboom.jp to TvTonic and so on, almost all of which are not publicly verifiable and depend on 3rd party reporting.
For the month of October, counting Saturdays and Sundays even though we did not release new episodes on those days, we served an average of over 207,000 complete videos per day that I can account for. This number, which has been twice as large during peak times/days and most of which is concentrated over 5 days as opposed to 7, still seems to be missing a substantial amount when set against the website numbers.
When it comes to advertising with post-roll videos on Rocketboom, advertisers want to know how many complete downloads there were. If we had pre-roll, then page attempts and incompletes would mean something, but it takes a complete d/l to open up a conversation about whether an ad was possibly watched.
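As a rough sketch of how complete downloads could be told apart from attempts, one option is to compare the bytes actually sent with the size of the file. The record layout and the zero tolerance are assumptions for illustration; a real log would also need to handle resumed and ranged requests, which this ignores.

```python
def is_complete(bytes_sent: int, file_size: int, tolerance: float = 0.0) -> bool:
    """True when at least (1 - tolerance) of the file was transferred."""
    return file_size > 0 and bytes_sent >= file_size * (1.0 - tolerance)


def count_downloads(transfers):
    """Tally attempts, complete and incomplete downloads.

    transfers: iterable of (bytes_sent, file_size) pairs, a hypothetical
    layout for one video request per pair.
    """
    transfers = list(transfers)
    complete = sum(1 for sent, size in transfers if is_complete(sent, size))
    return {
        "attempts": len(transfers),
        "complete": complete,
        "incomplete": len(transfers) - complete,
    }


# Made-up transfers against a 100-byte file.
tally = count_downloads([(100, 100), (40, 100), (100, 100), (0, 100)])
```

Only the "complete" figure would be the one to bring to a post-roll advertiser; the "attempts" figure is the larger number that pre-roll would care about.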
In the past, I always felt as though I was underselling due to unaccounted-for redistributions. Now that I have everything together in its most accurate form and have squeezed out everything I could find, I'm still going to undersell and go with 1 million complete video downloads per week. I consider this to be a drastic undersell, and it is also drastically reduced from what was earlier reported.
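For reference, here is the arithmetic behind the weekly figure, using only the two numbers quoted in this post: the October average of over 207,000 complete videos per day, and the 1 million per week figure. Since the daily number is a floor ("over 207,000"), the real gap can only be wider.

```python
DAILY_COMPLETE = 207_000    # October average per day, counted over all 7 days
CLAIMED_WEEKLY = 1_000_000  # the weekly figure chosen for advertisers

# Weekly total that can be accounted for from the daily average.
weekly_accounted = DAILY_COMPLETE * 7

# Share of the accounted-for total that the claimed figure leaves out.
undersell = 1 - CLAIMED_WEEKLY / weekly_accounted

print(weekly_accounted)    # 1449000
print(f"{undersell:.1%}")  # 31.0%
```

So the 1 million figure leaves out roughly 31% of the downloads that can be accounted for, before any unverified redistribution is considered.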
ON SITE STRUCTURE FOR ENGAGEMENT
Everything I have mentioned so far is dependent on information that I am collecting personally compared to what we can learn from other on and off-site indicators.
This is a long story, but let's cut right to indicators like the wiki and the comments, which would also fall into the category of engagement.
Rocketboom has a fluctuating # of comments from day to day. I think our all time high is 400 comments but sometimes we will have 15 comments on one day and 80 comments the next. I decided to go out and have a look at other websites that use comments to see how Rocketboom compares.
I first thought of Daily Kos because it is a political site, which by nature generates more commentary, and I think of Kos as having lots of comments. A quick scroll of the site last week showed approximately equal commenting activity, while this week most posts on Daily Kos have twice the number of comments as Rocketboom. Many blogs and websites have many more comments per post than we do (especially YouTube videos), yet most blogs do not have as many. Thus there is a sign of engagement to add to the demographics pie.
With regards to the story links that refer to the sources of the information discussed on Rocketboom, there is an indicator in how much traffic we send away compared to other sites.
The other day Robert Scoble had a popular post which was well linked-to. Since we also linked to that post, I asked him how the links back to his site looked from Rocketboom. He wrote:
"Here’s my referrer page. The day before it showed about 500 from Rocketboom and 750 from the BBC/Tech page."
While clearly Rocketboom is just T-ball compared to the BBC in size, there is something to be said for the level of engagement ratio per audience member in this comparison.
Let's have a look now at the ol' Alexa graphs. I have learned a few things about Alexa and find it to be absolutely the worst indicator. But it is popular, and it can provide a couple of insights if we spend some time filtering out all of the noise that Alexa doesn't.
One thing I just learned is that Alexa measures rankings based on people who have an Alexa toolbar installed in their IE browser on a PC. To collect data, the toolbar sends a note to Alexa for every site that is visited. Then Alexa takes that information and decides how popular your website is up against the rest.
Since the toolbar runs almost exclusively in the MSIE (Microsoft Internet Explorer) browser, Alexa is not necessarily useful for websites that are, say, anti-IE, or for sites with a heavy Firefox audience. As for Rocketboom, this month for instance, here is the breakdown of audience browser use:
As you can see above, the orange MSIE browser is just a small slice of the whole. On many other websites, MSIE might be way ahead as the biggest browser.
Note also that the top two browsers, Democracy and iTunes, can't even be traced by Alexa because it all happens without a browser.
So Alexa is lame because it's not much of an indicator in terms of how high or low Rocketboom stands in reach compared to other sites (i.e. our audience is not as likely to install a toolbar), and it's not a good indicator of what kind of site traffic we are getting to our own site.
If one day we went out and encouraged our audience to go and install an Alexa toolbar, it would likely raise our status because a proportionately higher amount of our audience would be clicking on the site. I heard that Ze Frank has said as much to his audience, so again, it's not a good indicator unless you are him.
In some ways, you would think Alexa can be a good indicator of our own progress compared to ourselves: no matter how many people are using the toolbar and clicking on Rocketboom to influence the results, those same people click more or less from one period to the next. So if the amount goes up, that's at least a possible indication that there really is upward movement, and thus a possible representation of the rest of the activity that would cause traffic to fluctuate.
In the below graph we see the history on Alexa for Rocketboom by itself:
As noted, you can see that this year has seen some serious swings. The months of July and August are still mind-boggling to me.
There are also some MAJOR flaws in trying to assume what is going on here, however. Notice above that just after our biggest-ever spike in July, and by the end of August, it "appears" as though we continued to experience a downward trend. The same downward trend appears for the whole first quarter too. Yet compare Rocketboom to some other popular weblogs:
In the case above there are downward trends that appear to sweep all sites just the same. Thus, to a relatively major degree, it appears this is not an indication that we are doing poorly against ourselves, per se. It's more that the people who use Alexa toolbars have become distracted because it's happening to all of us at the same time. Is this an internet-wide use trend? I can't help but wonder if this is an indication that people are abandoning the Alexa toolbar this year. Perhaps fewer people are clicking because fewer people are using it. Below is a close-up of a downward trend from Jan to October, almost all year for all the sites:
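One way to check whether a dip is yours or everyone's is to divide a site's number by the average of its peers for the same period. This is only a sketch with made-up values, not real Alexa data, and the function name is hypothetical; the point is that a flat ratio means the decline is shared by the whole group.

```python
def relative_to_peers(site, peers):
    """Divide each period's value by the peer average for that period.

    site:  list of values for one site, one per period
    peers: list of lists, one per peer site, same periods as `site`
    """
    ratios = []
    for i, value in enumerate(site):
        peer_avg = sum(p[i] for p in peers) / len(peers)
        ratios.append(value / peer_avg)
    return ratios


# Made-up reach values for three periods. Every series falls by the
# same proportion, as in the shared downward trend described above.
site = [100.0, 80.0, 60.0]
peers = [
    [200.0, 160.0, 120.0],
    [50.0, 40.0, 30.0],
]
ratios = relative_to_peers(site, peers)
```

Here the raw numbers for the site drop by 40%, yet the ratio against the peers stays flat, which is what you would expect if toolbar use itself were shrinking rather than the site's audience.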
In another useful instance of a same-site comparison, taken in context with itself, take the Alexa data for TRM, our first advertiser as seen below:
We know that the big TRM spike was during the time they advertised on Rocketboom. With a typical peak-out point of 5 million on their own, advertising on Rocketboom brought them up to over 25 million. That's more than a fivefold jump in reach (an increase of over 400%), compared to itself over a short enough time to assume less outside noise.
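For reference, here is the arithmetic on those two figures. Going from 5 million to 25 million is a fivefold rise, which works out to a 400% increase over the baseline; since the peak was "over 25 million", both numbers are floors.

```python
def percent_increase(before: float, after: float) -> float:
    """Increase from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100


baseline = 5_000_000  # TRM's typical peak on its own, as quoted in the post
peak = 25_000_000     # peak while advertising on Rocketboom ("over 25 million")

multiple = peak / baseline
increase = percent_increase(baseline, peak)

print(multiple)  # 5.0
print(increase)  # 400.0
```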
Without having to rely on Alexa, we can look to many other indicators to gauge audience levels and reach.
Technorati is an important indicator for me because I consider Rocketboom to be a blog that is informed by blogging. Unlike Alexa which is only used by strange people, Technorati is simply an aggregator of information across 60 million weblogs. When a link is linked to, a point is added. Today Rocketboom is at #75 out of 60-million which means that more bloggers are linking to rocketboom than most other blogs.
For the last 90 days, there have been between about 20 and 60 regular posts per day that use the word "rocketboom":
Check out that spike on November 14 & 15, weee! It's a triple pile-up with news about the TiVo, Zune and. . .some other stuff.
This is obviously a major indicator. We know something about bloggers as a demographic: they are information junkies. They are snarky. They are opinionated. You know who you are. And so again, for an advertiser or sponsor, or just for my own self-awareness, there is a story here about the audience that has already been told, without any need to ask. In so many ways, we have a nice reach into these 60 million smarty pants. And furthermore, each of these people has all kinds of data on their blogs about themselves, so much so that it can be overkill on the information. It's like, hey dude, I didn't need to know all THAT! Advertisers, on the other hand, just love to soak in this kind of data. It's interesting to me because I just like to know about my audience; it's fascinating. It's also where I get most of my feedback, along with comments on the site and emails.
Rocketboom is a unique word that no one else uses except to refer to our site. Thus, in the various search engines, there is a number that is returned and that number, compared to others, is an indicator. Today the number in Google is at over 3,000,000 results. That may mean nothing to you, but compared to, oh, say, "Ze Frank" who has 670,000 today, it means something as an indicator for how we match up here. This same method can be applied to any search engine too to get an overall score and different terms can be compared to paint different pictures.
Why is this indicator important? I have noticed recently that on any regular day, we get a huge amount of traffic to the website from Google for instance. It's a great place for discovery. Due to having a high link value, it's thus more likely that we will appear as a result.
More useful is Google Trends. Whenever someone uses Google to look for something, Google gives a count to that search term and gives a count to that location. Thus, we can use the site to compare how often people search for rocketboom compared to, oh, let's say, ze frank, why don't we, since he brought it up. In other words, what are people looking for more?
As you can see from the top half of the chart, more people have been searching for 'rocketboom' than have been searching for 'ze frank'. But Ze is on the rise! Look out, he is hot! From the bottom half of the chart, you can see that Rocketboom has more mainstream media news mentions.
Now let's have a look in terms of origin:
It appears as though Rocketboom has Ze licked in Pleasanton, California, the "wealthiest middle-sized city in the US". I'm not saying he has a more ghetto audience or anything.
(**update: Tim asked Ze about his popularity in Dutch, to which he replied, "I think that its because 'ze' is the word for 'the.' its not my site…")
The international aspect is neat. Online video clearly outreaches any TV broadcast in potential, and the visual element helps to break down language barriers.
"Google News is a computer-generated news site that aggregates headlines from more than 4,500 English-language news sources worldwide". Thus, we can use it as an indicator to see how much our site is being talked about in more of a mainstream, journalistic reach.
What is the count and what is being said?
A few have already emerged and can provide a story into the audience as well.
iTunes top 100 chart or featured? This chart is not an indication of how many subscribers you have. This indicates how many new people subscribed that day.
Network2 top 10 chart?
Small survey of people now but growing.
SOCIAL NETWORKING SITES
These sites tell us something about our demographic because we already know about the demographics of these sites. People on Myspace for instance are walking databanks for public info like favorite music, location, age, interests. All kinds of profiles. Aside from that, Myspace has its own character of people in general.
How many Myspace friends do YOU have?
Well what about Facebook? Flickr views?
SOCIAL NEWS / BOOKMARKING
How many people have bookmarked your show/site on delicious? This site has a character to it and can tell us something by the demographics of the audience that uses it. If you really wanted to get deep, you could cross reference all of the other links that are popular with the people who have bookmarked your site to find out about their most popular interests.
Digg and Slashdot are like flashmob communities, and the two are different, as seen by the comments. Slashdot has an audience that contributes a great deal to the brain trust of the issue at hand, whereas Digg comments are more just snarky jokes and personal opinions. So again, if you wind up linked-to by these communities, you can paint a picture about what happened with the incoming audience based on the pictures that have already been painted off site.
So what is the point? The data is out there and you can find it easily. You can say a great deal about the audience and how you stack up against the rest: who is out there and what they are saying, where they are and from whence they came. And yet I still have no idea how many.
Who will take the afternoon to aggregate a set of APIs to check a URL up against these kinds of indicators and then assign values, and perhaps an overall value, in return?
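As a starting point, here is a minimal sketch of the scoring half of such a tool. It does not call any real API; the indicator names, the weights and the blog-mention numbers are placeholders I made up, and only the two search-result counts come from this post. Each indicator is scaled against the best value seen among the sites being compared, then combined into a weighted average between 0 and 1.

```python
def indicator_maxima(sites):
    """Find the largest value of each indicator across all sites."""
    maxima = {}
    for values in sites.values():
        for name, value in values.items():
            maxima[name] = max(maxima.get(name, 0), value)
    return maxima


def overall_score(values, maxima, weights):
    """Weighted average of each indicator scaled to its maximum (0 to 1)."""
    total_weight = sum(weights[name] for name in values)
    score = 0.0
    for name, value in values.items():
        scaled = value / maxima[name] if maxima[name] else 0.0
        score += weights[name] * scaled
    return score / total_weight


# search_results are the Google counts quoted in this post;
# daily_blog_mentions are made-up placeholders, not measured data.
sites = {
    "site_a": {"search_results": 3_000_000, "daily_blog_mentions": 40},
    "site_b": {"search_results": 670_000, "daily_blog_mentions": 20},
}
weights = {"search_results": 1.0, "daily_blog_mentions": 1.0}  # assumed equal

maxima = indicator_maxima(sites)
scores = {name: overall_score(v, maxima, weights) for name, v in sites.items()}
```

The hard part is everything this leaves out: fetching the numbers, deciding which indicators count and arguing about the weights. Scaling against the maximum is just one simple choice, and a ranking such as Technorati's would need to be inverted before it could be fed in.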
Great rundown on stats and their usefulness. I felt you got a bit off-track towards the end where you seemed to go after Ze more than making your points, but it was still well-done.
Having read both sides of this thing, I'd say the thing you both agree on is that stats as they stand now are inherently voodoo. I believe Ze's point about Alexa is that if you were to take ANY stats at face value, he's more popular than Rocketboom, but it seems unlikely that that's the case. He's got a lot, but it's true, Rocketboom's got a much bigger distribution engine going on.
What needs to come out of this is some sort of standard equation for determining audience size. It would assume a certain amount of voodoo, but be applied consistently among all sites. Page visits aren't really indicative of video views, and all downloads are not views, but if there were some agreed-upon standard for parsing raw logs, at the very least everyone could be working from the same page. Judge reach, and then combine it with some kind of "popularity" index (there are already some of those around, I think), and you have two valuable numbers to present to advertisers.
Without something standard, I think everyone still comes off as playing at being popular... sponsors have to trust their impression of you more than the numbers you say you have, because they have no idea if they're legit or not. This, at least, is a step towards a solid accounting method.
Again, well done!
Posted: November 16, 2006 3:19 PM
i noticed you didn't mention the business week article that just came out.
i thought ze's point was that the numbers were fuzzy, and it certainly looks like that is true.
you didn't post my last comment so i'm supposing you screen/edit these, but i hope you don't this time.
And in the "it's not about the bike" column today, please everyone- see the point outside the context of the RB v. Ducky situation. Look at the effort that went into this post, the questions it raises, the content about the stats.
It's not high school. But then...
Thanks for a killer post, Andrew.
Posted: November 17, 2006 7:11 AM
You seem a bit worried. 3 weeks later, and you're still going on about this.
You can claim to be web 9.0 for all I care. The fact is that Ze Frank does a much better job at connecting with his viewers than you've ever been able to do.
By the way... congrats to Amanda Congdon on her new multi-platform deal.
For those able to look past the "nerd fight" (and I'm not saying that was Drew's motivation for writing this), there are several valuable insights in this post. I can tell Drew spends a lot of time thinking about these things.
He's in a unique position to make interesting observations about online/portable/social media. I'm grateful he was willing to take the time to share.
Posted: November 19, 2006 2:07 PM
Rocketboom...could you have pulled this off in any other city? And Ze Frank? It's always nice to see how the big guns are doing. Thanks for the info.
Posted: November 19, 2006 4:58 PM
Hey thanks Drew, that was interesting. Kind of a technical stream of consciousness, where we get to bounce around in your brain for a while.
So, let me attempt an executive summary:
1. Measuring an audience is complicated. There are lots of things you have to consider, and lots of ways to slice and dice.
2. The better you are at distribution, the more complicated it gets.
3. Be suspicious of off-site indicators like Alexa. They have a lot of hidden bias. They can be very wrong.
4. Someone should provide a tool that takes all the mismatched crappy data available and blenders out a nice smooth report, maybe with a single headline number.
Did I miss anything? Any of you want to add bullet points?
30,000, 300,000, 3 million. Who is watching Rocketboom and wondering if they should watch the next day's show based on how many zeros are after the 3?
Perhaps there needs to be a different approach to getting advertising. That is the reason you are showing the general public these charts and graphs?
Or maybe you could sell it to Google? Or Godaddy? Or Apple?
Someone needs to sign a deal! I thought Tikibartv would have something going by now, and I thought Rocketboom would definitely be making money. Yet it's been nothing (except that eBay thing), not even a GoDaddy advert.
Whats whats whats going on?
I see a lot of discussion about the "voodoo" involved in reading web stats and download stats and how this holds podcasting back. They still represent an actual click or action by a person or another site (bot) interested in the content of your site.
The numbers for TV and Radio are derived by logs filled out by viewers/listeners. I know the statistical math backs up the validity of the results but do they approach the strength of even low-balled numbers from unique page views?
Just some random thoughts. Thanks for keeping the discussion interesting.