rethinking the structure of discussions online

So, here’s a developing conversation among several heavyweights in the social media and technology spheres: Anil Dash argues that the new generation of social discussion tools like Branch, Svbtle and Medium are exclusive, Fred Wilson chimes in with a paean to inclusivity, and Josh Miller says openness is a spectrum. Dave Winer then says the real problem is a lack of innovation in creating systems for discussion, a problem that Branch, Medium, etc. are trying to solve, though they are constrained by the problems of access and signal-to-noise that Dash and Miller are taking issue with.

Maybe what we need to do is to throw out the old paradigm of discussion as “post-comment” and instead try to merge those categories. The entire conversation should be more folksonomic (disclosure: folksonomy is one of my Pet Issues; see my manifesto/rant on folksonomy here).

One great example is the P2 theme from WordPress. It upends the blog format by putting a post-entry box at the top of the theme (no more Dashboard, that talismanic secret niche from which only the Initiated may create content). Comments appear on the front page, indented under the main post, in real time, instead of being hidden under a link and requiring a page refresh. As is usual with WordPress, only registered users may post, but now the comment field is more transparent and included on the main page at the same stature as the parent post itself. This serves to demote the post and promote the comments. We used this format for amazing discussions at Talk Islam for years (until the site waned due to lack of participation and other priorities).

I’ve also argued that structurally, posts-comments on blogs represent a single “node” in exactly the same way that a forum thread represents a single node. In fact, forums map precisely onto blogs; no one has yet created a WordPress theme that represents posts as threads, but it could easily be done (I was disappointed with bbPress for its failure to recognize this duality).

P2 is a great start but it needs to go further. Blog systems allow the public to comment on a post, but they don’t fully embrace folksonomy (allowing the public to tag a post or add metadata), and they certainly don’t allow the public to make posts. Of course, spam is always a concern, but here is an area ripe for innovation rather than relying on captchas and user roles. For example, what about using social media to weed out spammers from real users? I am user azizhp on almost all social media profiles, for example; a smart spam detection system could cross-reference my email address and username across those systems to establish my bona fides transparently. Also note that existing anti-spam services like Akismet inherently assume the post-comment model exists; Akismet doesn’t scan your posts to see if they are spam. Imagine if it could! With that capability, WordPress could immediately take the concept of P2 even further by allowing unregistered users to post to a blog.

Ultimately, discussions online aren’t as complicated as Branch, Medium, etc. make them out to be. Someone says something; others respond. The only question is, Who? Who speaks first? Who speaks next? Who gets to categorize/tag the debate? Who can add value? Right now, there’s no way to answer all of these questions with, “anyone, that’s who”. Instead of reiterating the old post-comment model we need to turn to folksonomy as an alternative and then start trying to craft technological solutions to the inevitable new set of problems that will involve. I think those problems can be solved, and we have most of those tools already, though there’s lots of room for innovation. And the result will be something both open and inclusive, far more so than anything we have right now.

bit.ly and folksonomy

I noticed a few links on Twitter using bit.ly for url shortening rather than the old standby tinyurl and was intrigued. RWW raved about the service as well, mainly because it makes an attempt to categorize links using semantic algorithms. I took a look and have to admit that one other feature of bit.ly stands out – the ability to define your own link code. In one sense this is kind of a bad thing, because it is Yet Another Namespace that everyone with a brand or trademark would do well to rush to grab (assuming the service does take off, which I am still a bit skeptical about). For example, bit.ly/blog and bit.ly/islam are now taken, pointing to various blogs of mine as an experiment (I note with satisfaction that the Obama campaign already is on top of this and has grabbed bit.ly/obama).

At RWW, Marshall says that bit.ly’s semantic classification of links makes it the tool of the future:

Bit.ly is analyzing all of the pages that its users create shortcuts to using the Open Calais semantic analysis API from Reuters! Calais is something we’ve written about extensively here. Bit.ly will use Calais to determine the general category and specific subjects of all the pages its users create shortcuts to. That information will be freely available to the developer community using XML and JSON APIs as well.

I can’t share in Marshall’s enthusiasm, however, because I don’t see the semantic categorization as innately useful. I’ve blogged before about why folksonomies are the key to web 3.0, and all bit.ly is doing is generating a taxonomy for its links, not leveraging the power of folksonomy. In a sense, by letting users define their own link code, bit.ly is already sitting on top of an intrinsic mechanism for folksonomy: it could simply treat the codes that users assign to their links as folksonomic tags. I hope that they recognize the value of those custom codes, and don’t get too enamoured of and distracted by the magic word “semantic”.

Whether bit.ly gains traction is of course not going to be driven by fringe features such as geotagging and semantics (i.e., metadata), but rather by how easy it is to integrate the service into other tools that users actually, well, use. TinyURL rode the Twitter wave to prominence, since it is the default url-shortener service (automatically invoked when you tweet, with no user intervention required). Similar services like is.gd, which have a much simpler API and are theoretically more robust in their namespaces, still haven’t broken into the market much yet, even though they also have the requisite bookmarklets and Firefox extensions already. If bit.ly wants to make inroads it needs to become the default URL service for a hot web app like Friendfeed, or even contract with Twitter itself to become a user-specified alternative to TinyURL. I think that it would make sense to try for partnership with Friendfeed, actually, because then the link history can be integrated into the user’s profile and browsed like any other service. (If bit.ly doesn’t support RSS feeds of its users’ links, it should.)

Overall, there are plenty of services out there but the thing to remember is that none should be thought of as genuinely archival. A shortened URL should be a tool of convenience, but don’t expect that link to work forever. In one sense it’s better for there to be many such services rather than one to rule them all, which is why I am glad to see another competitor to TinyURL emerge. The rest is just icing on the cake (and hopefully a spur towards further innovation).

beyond the tag cloud: the tagdex

I think tag clouds are somewhat useless, to be honest. They are a nice way to fill up a bit of space in a sidebar, if you restrict the cloud to the top 25 or so, but unless the writer is imposing a strict taxonomy on themselves, the cloud will ultimately balloon to an unmanageable size. And a tag cloud in a folksonomy makes no sense, because the wide variation in tags is a feature, not a bug. You want the tags to be vast and redundant. It is ok to have a post about Jhumpa Lahiri’s latest novel tagged “book”, “books”, “review”, “Lahiri”, etc. because this increases the points of entry to the content from tag indexing services like Technorati, and also increases the intra-blog, inter-post linkages (assuming you are using some variant of a Related Posts plugin that uses tags for determining what is related).

A far better way to think of tags is to consider them as terms in an index. The same kind of index you find at the end of a piece of non-fiction, to be specific. Consider an excerpt from the Index to the book, The Physics of Star Trek, as an example:

[Image: excerpt from the second page of the index to The Physics of Star Trek]

It’s easy to see how tags could be recruited to “build” an index of this type. The tags would first need to be sorted in alphabetical order, and then listed as a DL-type HTML list with the “page number” (post number). A range of posts could be indicated by the usual dash (ex. Bosons, 192-194) and a list of separate posts by commas (Black Star, 15, 51).
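As a rough sketch of that crude implementation, the following Python builds the definition list from a mapping of post numbers to tags. The input shape and the function names (`collapse`, `build_tagdex`) are my own assumptions for illustration; nothing here is tied to WordPress itself.

```python
import html
from collections import defaultdict

def collapse(post_ids):
    """Turn post numbers into index-style references:
    consecutive runs become 'a-b', isolated posts are comma-separated."""
    ids = sorted(set(post_ids))
    if not ids:
        return ""
    parts = []
    start = prev = ids[0]
    for n in ids[1:]:
        if n == prev + 1:          # still inside a consecutive run
            prev = n
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)

def build_tagdex(posts):
    """posts: dict mapping post number -> iterable of tags (assumed shape).
    Returns an HTML <dl> with tags in alphabetical order."""
    by_tag = defaultdict(list)
    for post_id, tags in posts.items():
        for tag in tags:
            by_tag[tag].append(post_id)
    lines = ["<dl>"]
    for tag in sorted(by_tag, key=str.lower):
        lines.append(f"  <dt>{html.escape(tag)}</dt><dd>{collapse(by_tag[tag])}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Calling `collapse([192, 193, 194])` gives `192-194` and `collapse([15, 51])` gives `15, 51`, matching the two examples above.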

That would be the crudest implementation, but quite effective. However you could go further than this. For example, what about the “see also” link? You could simulate this by looking for tags whose usage is highly correlated, like “Lahiri” and “books”. You could literally calculate Pearson’s correlation coefficient between all pairs of tags in the database and store that in a lookup table, which would be updated whenever a post is published. Then any pair of tags whose correlation coefficient is above some threshold (say, > 0.50) would get the “See also” treatment on both tags’ entries.
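To make the correlation idea concrete, here is a minimal sketch. It assumes each tag is represented as a 0/1 vector over all posts (1 if the post carries the tag), and computes Pearson’s coefficient over those vectors. The data shape and the names `pearson` and `see_also` are hypothetical, and a real site would cache the table rather than recompute every pair.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a tag on every post (or none) carries no signal
    return cov / sqrt(vx * vy)

def see_also(posts, threshold=0.5):
    """posts: dict mapping post number -> set of tags (assumed shape).
    Returns tag -> list of tags that deserve a 'See also' on its entry."""
    post_ids = sorted(posts)
    all_tags = sorted({t for tags in posts.values() for t in tags})
    # One 0/1 usage vector per tag, ordered by post number.
    vectors = {t: [1 if t in posts[p] else 0 for p in post_ids] for t in all_tags}
    related = {t: [] for t in all_tags}
    for a, b in combinations(all_tags, 2):
        if pearson(vectors[a], vectors[b]) > threshold:
            related[a].append(b)   # the link goes on both entries
            related[b].append(a)
    return related
```

Recomputing all pairs grows with the square of the number of tags, which is one reason to store the result in a lookup table and refresh it only on publish.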

You could even draft categories in WordPress to contribute, by using them as “tags” in their own right and lumping them into the regular index build (after all, as implemented in WordPress, tags and categories are just redundant taxonomic systems). However, you also might look for correlations between tags and categories, and use the categories as Index parent terms. An example from my own geekblog would be something like

Anime
Ranma
Makoto Shinkai
Someday’s Dreamers
(…)
Geek Service
Asus EEE PC
HDTV
Space
(…)

I had to manually generate the above but it would be far simpler to do it via correlation analysis instead. At any rate, the basic idea is to assign categories as index headings and tags as their dependents, since presumably categories are more formally taxonomic, and more importantly, fewer. In fact you could do both, treating categories as tags and also giving them higher status as above. You would just need to put a logical test in to exclude a category from appearing as its own parent/child!
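Here is one way that parent/child grouping might look in Python. Note that this sketch swaps in plain co-occurrence (a tag goes under a category if they ever share a post) instead of a correlation threshold, and the post structure and the name `build_headings` are assumptions of mine.

```python
from collections import defaultdict

def build_headings(posts, treat_categories_as_tags=True):
    """posts: list of dicts with 'categories' and 'tags' lists (assumed shape).
    Returns category -> sorted list of child terms."""
    children = defaultdict(set)
    for post in posts:
        terms = set(post["tags"])
        if treat_categories_as_tags:
            # Categories also take part in the index as ordinary terms.
            terms |= set(post["categories"])
        for cat in post["categories"]:
            # The logical test: a category is never listed as its own child,
            # whether it arrives as a category or as a same-named tag.
            children[cat] |= {t for t in terms if t.lower() != cat.lower()}
    return {cat: sorted(kids) for cat, kids in sorted(children.items())}
```

With `treat_categories_as_tags` left on, a post filed under both Anime and Geek Service lists each category under the other, but never under itself.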

Obviously a tag-driven index as above wouldn’t fit in a sidebar. A useful place for it would be its own page, but you might also imagine it embedded on the 404 page. As a standalone, though, it would be a very useful node for search engine optimization, enough so that perhaps it should be called a “tagdex” instead of an index to better distinguish it.

Though useful to any blogger using tags on wordpress, a tagdex would be far more effective on a site whose tags were a genuine folksonomy rather than a taxonomy, since the tag diversity would be greater. However, folksonomy is not a feature of WordPress, unless you use Scott’s awesome WP-Folksonomy plugin (which he wrote in response to my earlier rant about taxonomies and folksonomies). If a thriving ecosystem of wordpress-based folksonomies can be encouraged (using Scott’s plugin, or equivalent), that will be a significant step towards the Semantic Web. A tagdex represents a coherent snapshot of all the tag metadata in that site’s folksonomy (or taxonomy). As such, it is something that could be parsed and aggregated by the hypothetical Semantic Search Engine of the future.

Semantic authoring

RWW argues that for the Semantic Web to really take off, content-management systems need to incorporate semantic markup. They argue,

Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way – a la Spock, twine and Powerset – but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors’ hands in the first place, extracting the semantic meaning would be so much easier.

For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records – say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

Ideally, the authors would create the content as meaningful XML text, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways and there would be SEO benefits as well, if search engines could understand it more easily. But tools that create such XML, and yet are natural and easy for authors to use, don’t appear to be on their way; and the creation of a custom tool for each individual domain seems a difficult and expensive proposition.

The problem with XML authoring, as the author notes, is that it’s too time-consuming from a user perspective. You’re basically requiring that the user fill out a detailed, unique form on every post or content node.

What’s really needed is a way for the CMS to prefill semantic data for the user, and then let the user tweak it. The prefill would have to come from contextual information (post title keywords, word frequency analysis, link text) and metadata (category, tags). In a way you have a mini-search engine index running against your own post, and giving you “search results” to let you “rank” the sub-content into a structured form. And even then, take pains to hide the XML-ness; instead of showing the user a pile of confusing <blah>blahblah</blah> xml markup, it should provide a cleaner view like

drink: triple latte
cost: $4.50
opinion: sucks, overpriced

where of course the labels (drink, cost, opinion) are mapped to the actual XML containers <drink></drink> etc. The user can edit the list easily, insert or delete labels as they choose, and then hit publish.
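A minimal sketch of that mapping, using Python’s standard `xml.etree.ElementTree`: each `label: value` line becomes one XML container. The wrapper element name `entry` and the function name `labels_to_xml` are placeholders I made up, and the sketch assumes every label is already a usable XML element name (spaces are simply replaced with underscores).

```python
import xml.etree.ElementTree as ET

def labels_to_xml(text, root_tag="entry"):
    """Convert 'label: value' lines into XML, one child element per label.
    root_tag is a hypothetical wrapper name, not part of any standard."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines the user left in
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'label: value', got {line!r}")
        child = ET.SubElement(root, label.strip().replace(" ", "_"))
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")

print(labels_to_xml("drink: triple latte\ncost: $4.50\nopinion: sucks, overpriced"))
```

The user only ever sees and edits the three plain lines; the markup is generated when they hit publish.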

To achieve this, you need good metadata. By good, I mean “rich” – it should be noted that tagging alone is actually pretty poor as far as metadata goes because it’s usually only a taxonomy imposed by the author, not a true folksonomy. The advantage of the latter is that the metadata is more variable, giving any semantic algorithm more room to play with. Note that tagging as implemented in WordPress is not a true folksonomy, though a plugin now exists to rectify that. Semantic algorithms will starve on taxonomies alone.

wordpress folksonomy progress

The experiment of adding Scott’s WP_Folksonomy plugin to my blog has been a success so far. My blog, haibane.info, is by no means a giant traffic draw but it does have enough that the userbase has been adding some tags of their own. I have at least one user (Scott himself?) who reliably adds tags to most posts, and there have been others drive-by tagging as well. It’s also encouraging to see that there was a thread at the WordPress support forum asking about folksonomy; I directed them to the plugin asap. Now, a search for the term “folksonomy” will lead people to the same tool, and thus the seeds are sown for more people to use it. Let’s hope that many more blogs, preferably far larger than mine, embrace and adopt folksonomy this year.

The first WordPress-based folksonomy

I’ve added ScottSM’s WP_Folksonomy plugin to my Haibane.info blog. It seems to work like a charm, and is up to version 0.5. It’s a grand experiment of sorts, representing the very first WordPress-based true tagging folksonomy rather than the taxonomic implementation that WordPress features by default. Let’s see how it goes. I’ve allowed anyone to tag a post, not just registered users, so it really is a free-for-all. Exciting!

WP_Folksonomy

ScottSM has written a folksonomy plugin for WordPress!

* v0.21 12-15-2007:
    o Fixed overlap between tag add and comment add $_POST variables
* v0.2 12-15-2007:
    o Added Control Panel
    o Added Subscribers Only and Authorize Tags options
    o Tracks submitted tags
    o Added Delete and Accept Tag actions
* v0.1 12-14-2007:
    o A rough public tag adder

I’ll be installing this on Haibane.info as soon as I get a chance. I need to modify my template for tags support first so it might take me a few days. However this is one plugin that I think is a real game-changer.

Scott, I highly encourage you to submit this to the WordPress Weblog Tools Collection blog.

taxonomy versus folksonomy

The WordPress 2.3.x branch officially incorporated tagging into the WordPress core, rendering many third-party tagging plugins obsolete. However, the implementation of tags is largely redundant to the existing category system. At present, both categories ((I use cats as shorthand for categories)) and tags are systems for taxonomy:

tax·on·o·my (tăk-sŏn’ə-mē)
n., pl. -mies.

1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups or categories: “Scholars have been laboring to develop a taxonomy of young killers” (Aric Press).

Of course, the definition applies to more than just organisms or serial killers. Note that the definition explicitly mentions categories, a reality that is embraced by other content-management systems like Drupal. The inclusion of tags in WordPress was driven more by a desire to optimize for tag-based blog search engines like Technorati than by a clear taxonomic usage goal. This has resulted in a lot of confusion among WP end users about how to balance cats and tags.

For example, Lorelle, one of the WP community’s major personalities, has written volumes about tags, and comes across as rather skeptical. She describes tags as a standardized keyword system. However, she also insists that cats are not tags. Lorelle summarizes the sole benefits of tags (in her opinion) thus:

  • To provide additional keywords to help search engines and tag services add up your keyword counts and classify your post content.
  • To provide additional navigation on your site, like an index reference, helping the user find related post content.
  • To provide additional information and resources by linking to off-site services, such as Technorati, del.icio.us, or other off-site search engines or tag services.

Note that items 1 and 3 are somewhat redundant. But all of these can be achieved with categories as well, if the user enforces a discipline on themselves. Ultimately, a gigantic cloud of tags is as useless as an enormous list of categories, but either one applied consistently and selectively results in a genuinely useful categorization that can then be leveraged for navigation and can aid search engines in classification.

There is a much more fundamental difference between cats and tags that the WP developers seem to have missed, however, that transcends conventional taxonomy. The true power of tags is fully realized not as yet another taxonomy, but as a folksonomy, defined as

“…the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content.” [ref: Wikipedia: Folksonomy]

Note that a true folksonomy is not just a site that lets users tag their own content, but one that lets users tag others’ content as well. For example, Technorati is not really a folksonomy, whereas del.icio.us is, because anyone can add their own tags to the content they discover in the latter. ((Note that the WP developers tend to gloss over this important difference between Technorati and del.icio.us)) Other excellent examples are Amazon.com, which allows users to tag items for sale, and the political site Daily Kos, where users tag each others’ diaries (mini-blogs).

The main argument against this sort of thing is essentially “it will lead to chaos” – and to some extent, it has ((See: meta-noise.)) – but consider that the same argument could be made against Wikipedia’s “anyone can edit this page” philosophy. What happens is that the community itself self-organizes; on Wikipedia there are all sorts of conventions that have emerged (no trivia, proper sourcing, “disputed” article designations, etc) and on Daily Kos the community itself polices tag use, creates tag conventions of its own, and regularizes the common ones.

The bottom-up, unstructured approach of folksonomy is a new paradigm for finding content online, one that is largely orthogonal to the old way of brute-force searching. The driver here is not a search algorithm, but a truly human filter that is infinitely customizable:

As folksonomies develop in Internet-mediated social environments, users can (generally) discover who created a given folksonomy tag, and see the other tags that this person created. In this way, folksonomy users often discover the tag sets of another user who tends to interpret and tag content in a way that makes sense to them. The result is often an immediate and rewarding gain in the user’s capacity to find related content (a practice known as “pivot browsing”). Part of the appeal of folksonomy is its inherent subversiveness: when faced with the choice of the search tools that Web sites provide, folksonomies can be seen as a rejection of the search engine status quo in favor of tools that are created by the community.

Though folksonomies are in their infancy, they arguably represent the next evolutionary step for the Internet, and are intimately tied to the concept of the Semantic Web. The modern Internet is thoroughly dominated by Google, which represents the pinnacle of the old, algorithmic model, but it’s not beyond the realm of imagination to envision Google becoming outdated someday with the rise of a Semantic search engine (and the groundwork for such a “Web 3.0” has already been laid today) ((Extrapolating, one might imagine Web 4.0 to arrive with true Artificial Intelligence, thereby removing humans and automating the creation of meta-data to power the Semantic web. A serious name might be the Intelligent Web; a more tongue-in-cheek one might be the Pedantic Web.)).

None of this applies to the taxonomic tag system that WordPress uses today. WordPress’ tag system is designed more for Technorati than for Del.icio.us. However, that doesn’t mean WordPress can’t support a folksonomy via plugins. The basic functionality required of such a plugin is to add an “Edit tags” link on each post. The user access level of who is permitted to edit the tags (from Admins right down to unregistered users) would be configurable, and the admin could decide whether to permit individual authors to override the tag edit access on a per-post basis. The default setting might be to permit any registered blog user to edit tags, and disable author override, to encourage a more open stance (albeit with some protection). Supplementary plugins like Peter Keung’s captcha system would be useful in weeding out spam registrations, as well.
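The access rules described above can be sketched independently of WordPress. The Python below is only an illustration of the decision logic; the role names, the `TagPolicy` settings object and `can_edit_tags` are all hypothetical and do not correspond to any actual WordPress or plugin API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Role(IntEnum):
    """Hypothetical access levels, lowest to highest."""
    ANONYMOUS = 0    # unregistered visitor
    SUBSCRIBER = 1   # any registered blog user
    AUTHOR = 2
    ADMIN = 3

@dataclass
class TagPolicy:
    # Suggested default: any registered user may edit tags,
    # and authors cannot override that on their own posts.
    min_role: Role = Role.SUBSCRIBER
    allow_author_override: bool = False

def can_edit_tags(user_role: Role, policy: TagPolicy,
                  post_override: Optional[Role] = None) -> bool:
    """Decide whether a user may use the 'Edit tags' link on a post.
    post_override is the per-post level an author asked for, if any."""
    required = policy.min_role
    if policy.allow_author_override and post_override is not None:
        required = post_override   # honored only if the admin permits it
    return user_role >= required
```

Under the default policy a per-post override is simply ignored, which is what keeps the blog leaning towards the more open stance.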

Of course, categories would remain under the exclusive control of the blog admins. This will create two sets of taxonomies per blog; one defined by the author, and one by the readers. The former would be optimized for keyword taxonomy engines like Technorati and the latter would be optimized for folksonomic engines like del.icio.us. Both systems would meet all three of Lorelle’s criteria above and serve to attract traffic and provide readers with multiple points of entry to the site’s content.

Ultimately, plugin-based folksonomy functionality could eventually be adopted into the WordPress core, just as tags themselves were adopted ((There’s a long way to go, of course. As of this writing, the term folksonomy doesn’t even appear once in the WP-Trac system.)). If so, the WordPress technology would become part of the foundation for Web 3.0 itself – and beyond.

UPDATE: The confusion about folksonomies versus taxonomies persists, with many people simply assuming that “tags” by themselves are sufficient. For example, Binary Bonsai’s Matt has a post about his move away from categories towards tags, but he has really just substituted one taxonomy for another. This WordPress “folksonomy” plugin (now deprecated) did much the same thing. Presently there is no true implementation of a folksonomy on the WordPress platform.