beyond the tag cloud: the tagdex

I think tag clouds are somewhat useless, to be honest. They are a nice way to fill up a bit of space in a sidebar, if you restrict the cloud to the top 25 or so, but unless the writer is imposing a strict taxonomy on themselves, ultimately the size of the cloud will balloon to an unmanageable size. And a tag cloud in a folksonomy makes no sense, because the wide variation in tags is a feature, not a bug. You want the tags to be vast and redundant. It is ok to have a post about Jhumpa Lahiri’s latest novel tagged “book”, “books”, “review”, “Lahiri”, etc. because this increases the points of entry to the content from tag indexing services like technorati, and also increases the intra-blog, inter-post linkages (assuming you are using some variant of a Related Posts plugin that uses tags for determining what is related).

A far better way to think of tags is to consider them as terms in an index. The same kind of index you find at the end of a piece of non-fiction, to be specific. Consider an excerpt from the Index to the book, The Physics of Star Trek, as an example:

It’s easy to see how tags could be recruited to “build” an index of this type. The tags would first need to be sorted in alphabetical order, and then listed as a DL-type HTML list with the “page number” (post number). A range of posts coudl be indicated by the usual dash (ex. Bosons, 192-194) and a list of separate posts by commas (Black Star, 15, 51).

That would be the crudest implementation, but quite effective. However you could go further than this. For example, what about the “see also” link? You could simulate this by looking for tags whose usage is highly correlated, like “Lahiri” and “books”. You could literally calculate Pearson’s correlation coefficient between all pairs of tags in the database and store that in a lookup table, which woudl be updated whenever a post is published. Then any tag whose correlation coefficient to the present post is above some threshold (say, > 0.50) would get the “See also” treatment on both tags’ entries.

You coudl even draft categories in wordpress to contribute, by using them as “tags” in their own right and lumping them into the regular index build (after all, as implemented in WordPress, tags and categories are just redundant taxonomic systems). However, you also might look for correlations between tags and categories, and use the categories as Index parent terms. An example from my own geekblog would be something like

Anime
– Ranma
– Makoto Shinkai
– Someday’s Dreamers
(…)
Geek Service
– Asus EEE PC
– HDTV
– Space
(…)

I had to manually generate the above but it would be far simpler to do it via correlation analysis instead. At any rate, the basic idea is to assign categories as index headings and tags as their cdependents, since presumably categories are more formally taxonomic, and more importantly, fewer. In fact you could do both, treating categories as tags and also giving them higher status as above. You would just need to put a logical test in to exclude a category from appearing as its own parent/child!

Obviously a tag-driven index as above wouldn’t fit in a sidebar. A useful place for it would be its own page, but you might also imagine it embedded on the 404 page. As a standalone, though, it would be a very useful node for search engine optimization, enough so that perhaps it should be called a “tagdex” instead of an index to better distinguish it.

Though useful to any blogger using tags on wordpress, a tagdex would be far more effective on a site whose tags were a genuine folksonomy rather than a taxonomy, since the tag diversity would be greater. However, folksonomy is not a feature of WordPress, unless you use Scott’s awesome WP-Folksonomy plugin (which he wrote in response to my earlier rant about taxonomies and folksonomies). If a thriving ecosystem of wordpress-based folksonomies can be encouraged to thrive (using Scott’s plugin, or equivalent), that will be a significant step towards the Semantic Web. A tagdex represents a coherent snapshot of all the tag metadata in that site’s folksonomy (or taxonomy). As such, it is something that could be parsed and aggregated by the hypothetical Semantic Search Engine of the future.

One response to “beyond the tag cloud: the tagdex”

August 26, 2008

michael brewster

Interesting idea here… I’m a teacher and writing consultant, and I’ve used “Most Important Word” (MIW) strategies in reading for years. With the advent of tag clouds, I have been able to have students see correlations between word frequency and their self-generated MIW lists. But I’m at the limit of innovation because while looking at the frequency of a word in a text is cool or a nice visual shorthand, it is missing that semantic meaning component.

I’m interested in building a text tool that can map associations between ideas and words in a text but that acts like a tag cloud in its simplicity for display purposes on the web. The sort of stuff I see that acts in the associative way seems to be animated (Flash? Java?-driven) and doesn’t translate well to the cut & paste nature of educational use. Invariably, I’m reduced to a screengrab which loses searching/text functionality besides linking.

I can’t be alone in seeing the potential for tagging relevancy that exceeds “frequency,” can I? Who’s doing useful work in this area?