taxonomy – Haibane.info

Tags to Hashtags #wp

I’ve written a new plugin for WordPress and WordPress MU entitled “AHP Tags to Hashtags”. For now the plugin can be found at pastebin here; I will update this post when it has been added to the official WordPress plugin repository.

The plugin appends the tags for each post to the post title in the RSS feed. For example, for a post titled “Awesome post” which is tagged with “Amazing, Awesome, Super awesome”, the RSS feed will show the post title as “Awesome post #Amazing #Awesome #Superawesome”. Note that spaces within a tag are removed, and a hash symbol (#) is prepended to each tag.
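The transformation described above can be sketched in a few lines. This is a Python illustration only, not the plugin's source (the plugin itself runs inside WordPress); the function name `tags_to_hashtags` is hypothetical.

```python
def tags_to_hashtags(title, tags):
    """Append each tag to the title as a hashtag, with spaces removed.

    Illustrative sketch only; the function name is hypothetical and
    this is not the plugin's actual code.
    """
    # "Super awesome" -> "#Superawesome"; empty tags are skipped.
    hashtags = ["#" + "".join(tag.split()) for tag in tags if tag.strip()]
    return " ".join([title] + hashtags)


print(tags_to_hashtags("Awesome post", ["Amazing", "Awesome", "Super awesome"]))
# Awesome post #Amazing #Awesome #Superawesome
```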

This plugin is useful primarily to bloggers who pipe their posts into Twitter. The post tags become Twitter hashtags. Since post tags and Twitter hashtags are both a form of metadata, it is natural to simply and automatically reuse the one for the other.

Consider a blog post on the Iran election. Normally you’d tag the post Iran, and then when you tweet it, you’d have to manually insert the Twitter hashtag #iranelection. Now you can simply tag the post iranelection (no # symbol) and it will automatically be hashtagged. Combined with a service like Twitterfeed, this plugin can greatly automate the process of piping relevant posts into the twitterverse.

Note that the plugin makes no attempt to check that the total length of the post title, including hashtags, falls within the 140-character limit imposed by Twitter.

At present the plugin has no options. The feature roadmap includes the following:
– add title character length checking
– toggle using tags or categories for conversion to hashtags
– let the user decide whether to remove spaces in tags or convert them to underscores or another character
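Two of the roadmap items, length checking and a configurable replacement for spaces, might look something like the following Python sketch. None of this exists in the plugin; the function name, the parameters, and the policy of skipping any hashtag that would push the title past the limit are all assumptions made for illustration.

```python
TWITTER_LIMIT = 140  # the character limit cited in this post


def build_title(title, tags, limit=TWITTER_LIMIT, space_replacement=""):
    """Append hashtags to the title while the result stays within `limit`.

    Hypothetical sketch: a hashtag that would exceed the limit is skipped,
    and later (shorter) hashtags are still tried. A title that is already
    too long is returned unchanged.
    """
    result = title
    for tag in tags:
        words = tag.split()
        if not words:
            continue  # ignore empty tags
        candidate = result + " #" + space_replacement.join(words)
        if len(candidate) <= limit:
            result = candidate
    return result


print(build_title("Awesome post", ["Super awesome"], space_replacement="_"))
# Awesome post #Super_awesome
```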

This is a pretty simple plugin, so other feature requests are appreciated.

UPDATE: version 2.0 of the plugin is at pastebin here. This version no longer appends all tags, but only those already beginning with #. This way the blogger can selectively choose which tags they want converted into hashtags.
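The selective behavior in version 2.0 can be illustrated with a variation on the same idea. Again this is a Python sketch rather than the plugin's code; the function name is hypothetical, and whether version 2.0 still strips spaces from the selected tags is an assumption carried over from the original description.

```python
def selective_hashtags(title, tags):
    """Append only the tags that already begin with '#'.

    Hypothetical sketch; assumes spaces are still removed, as in the
    first version of the plugin.
    """
    chosen = [
        "#" + "".join(tag[1:].split())
        for tag in tags
        if tag.startswith("#") and tag[1:].strip()
    ]
    return " ".join([title] + chosen)


print(selective_hashtags("Vote analysis", ["#iranelection", "Iran", "politics"]))
# Vote analysis #iranelection
```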

Semantic authoring

RWW argues that for the Semantic Web to really take off, content-management systems need to incorporate semantic markup. They argue,

Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way – a la Spock, twine and Powerset – but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors’ hands in the first place, extracting the semantic meaning would be so much easier.

For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records – say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

Ideally, the authors would create the content as meaningful XML text, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways and there would be SEO benefits as well, if search engines could understand it more easily. But tools that create such XML, and yet are natural and easy for authors to use, don’t appear to be on their way; and the creation of a custom tool for each individual domain seems a difficult and expensive proposition.

The problem with XML authoring, as the author notes, is that it’s too time-consuming from a user perspective. You’re basically requiring that the user fill out a detailed, unique form on every post or content node.

What’s really needed is a way for the CMS to prefill semantic data for the user, and then let the user tweak it. The prefill would have to come from contextual information (post title keywords, word frequency analysis, link text) and metadata (category, tags). In a way, you have a mini search engine index running against your own post, giving you “search results” that let you “rank” the sub-content into a structured form. Even then, the CMS should take pains to hide the XML-ness: instead of showing the user a pile of confusing <blah>blahblah</blah> XML markup, it should provide a cleaner view like

drink: triple latte
cost: $4.50
opinion: sucks, overpriced

where of course the labels (drink, cost, opinion) are mapped to the actual XML containers <drink></drink> etc. The user can edit the list easily, insert or delete labels as they choose, and then hit publish.
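As a rough illustration of that mapping, the sketch below turns the label list into XML using Python's standard `xml.etree.ElementTree` module. The function name and the `entry` wrapper element are hypothetical, and the sketch assumes every label is a single word that is valid as an XML element name.

```python
import xml.etree.ElementTree as ET


def labels_to_xml(text, root_tag="entry"):
    """Map 'label: value' lines onto XML child elements.

    Hypothetical sketch: `root_tag` is an assumed wrapper element, and
    labels are assumed to be valid XML names (no spaces or punctuation).
    """
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not 'label: value' pairs
        label, value = line.split(":", 1)
        label = label.strip()
        if not label:
            continue
        child = ET.SubElement(root, label)
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")


listing = """drink: triple latte
cost: $4.50
opinion: sucks, overpriced"""

print(labels_to_xml(listing))
```

The user only ever sees and edits the plain list; the conversion to markup happens at publish time.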

To achieve this, you need good metadata. By good, I mean “rich” – it should be noted that tagging alone is actually pretty poor as far as metadata goes because it’s usually only a taxonomy imposed by the author, not a true folksonomy. The advantage of the latter is that the metadata is more variable, giving any semantic algorithm more room to play with. Note that tagging as implemented in WordPress is not a true folksonomy, though a plugin now exists to rectify that. Semantic algorithms will starve on taxonomies alone.

del.icio.us bundle linkrolls

The grandfather of social bookmarking sites is del.icio.us, which basically brought “tagging” mainstream (along with Technorati). Most people I know who use the service end up with unwieldy tag clouds, however, because it’s often hard to stay disciplined about which tags you assign. I’ve spent a lot of time manually pruning my tags, but there are still plenty on my tag list that are redundant or obsolete.

There is an option to “bundle” your tags – essentially, tagging a group of tags, to help you organize things better. However, bundles at present are only visible to the user, and do not have a dedicated URL or RSS feed like individual tags do. Using the “+” operator to search for multiple tags, e.g. http://del.icio.us/azizhp/Iraq+Hillary, functions as an AND operator, whereas to simulate a bundle you’d need an OR equivalent, which del.icio.us does not support. As a result, if you want to add a linkroll to your site that only shows tags from a single bundle, you’re out of luck.

However, there is a workaround, albeit a clumsy one: create “container” tags. Then you must manually tag all items in the bundle with the container tag. After doing this, you will be able to access your bundle using the container tag, and can create customized linkrolls accordingly. For example, I created the “2008” container tag for all my tags related to the Presidential candidates.
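The difference between the AND search that “+” provides and the OR search a bundle would need, and how a container tag bridges the gap, can be shown with a small Python sketch. The bookmark data, URLs, and function names are made up for illustration; nothing here calls the del.icio.us service.

```python
# Hypothetical bookmarks: URL -> set of tags.
bookmarks = {
    "http://example.com/a": {"Iraq", "Hillary"},
    "http://example.com/b": {"Hillary"},
    "http://example.com/c": {"Obama"},
    "http://example.com/d": {"cooking"},
}


def match_all(items, tags):
    """AND search: items carrying every requested tag (like the '+' operator)."""
    wanted = set(tags)
    return sorted(url for url, item_tags in items.items() if wanted <= item_tags)


def match_any(items, tags):
    """OR search: items carrying at least one requested tag (what a bundle needs)."""
    wanted = set(tags)
    return sorted(url for url, item_tags in items.items() if wanted & item_tags)


def add_container_tag(items, bundle, container):
    """Return a copy in which every item matching the bundle also gets the container tag."""
    bundle = set(bundle)
    return {
        url: (item_tags | {container}) if bundle & item_tags else set(item_tags)
        for url, item_tags in items.items()
    }


tagged = add_container_tag(bookmarks, {"Iraq", "Hillary", "Obama"}, "@2008")
print(match_all(bookmarks, ["Iraq", "Hillary"]))  # AND: only the first bookmark
print(match_all(tagged, ["@2008"]))               # container tag simulates the bundle
```

On the real service the container tag has to be applied by hand, which is what makes the workaround clumsy.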

One caveat: try to avoid naming your container tags identically to the bundle. You can prefix the container tags with the “@” symbol to keep them distinct, or name them entirely differently. This is so that if/when in the near future del.icio.us improves support for bundles there won’t be any namespace collisions between your tags and your bundles. Once that day comes you can simply delete all the container tags if you so wish.

Alas, there still is no way to create a tag cloud from a single bundle, so that still awaits the del.icio.us team’s attention.

taxonomy versus folksonomy

The WordPress 2.3.x branch officially incorporated tagging into the WordPress core, rendering many third-party tagging plugins obsolete. However, the implementation of tags is largely redundant with the existing category system. At present, both categories ((I use cats as shorthand for categories)) and tags are systems for taxonomy:

tax·on·o·my (tăk-sŏn’ə-mē)
n., pl. -mies.

1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups or categories: “Scholars have been laboring to develop a taxonomy of young killers” (Aric Press).

Of course, the definition applies to more than just organisms or serial killers. Note that the definition explicitly mentions categories, a reality that is embraced by other content-management systems like Drupal. The inclusion of tags in WordPress was driven more by a desire to optimize for tag-based blog search engines like Technorati than by a clear taxonomic usage goal. This has resulted in a lot of confusion among WP end users about how to balance cats and tags.

For example, Lorelle, one of the WP community’s major personalities, has written volumes about tags, and comes across as rather skeptical. She describes tags as a standardized keyword system. However, she also insists that cats are not tags. Lorelle summarizes the sole benefits of tags (in her opinion) thus:

1. To provide additional keywords to help search engines and tag services add up your keyword counts and classify your post content.
2. To provide additional navigation on your site, like an index reference, helping the user find related post content.
3. To provide additional information and resources by linking to off-site services, such as Technorati, del.icio.us, or other off-site search engines or tag services.

Note that items 1 and 3 are somewhat redundant. But all of these can be achieved with categories as well, if the user enforces a discipline on themselves. Ultimately, a gigantic cloud of tags is as useless as an enormous list of categories, but either one applied consistently and selectively results in a genuinely useful categorization that can then be leveraged for navigation and to aid search engines in classification.

There is a much more fundamental difference between cats and tags that the WP developers seem to have missed, however, that transcends conventional taxonomy. The true power of tags is fully realized not as yet another taxonomy, but as a folksonomy, defined as

“…the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content.” [ref: Wikipedia: Folksonomy]

Note that a true folksonomy is not just a site that lets users tag their own content; it also lets users tag others’ content. For example, Technorati is not really a folksonomy, whereas del.icio.us is, because in the latter anyone can add their own tags to whatever they discover. ((Note that the WP developers tend to gloss over this important difference between Technorati and del.icio.us)) Other excellent examples are Amazon.com, which allows users to tag items for sale, and the political site Daily Kos, where users tag each others’ diaries (mini-blogs).

The main argument against this sort of thing is essentially “it will lead to chaos” – and to some extent, it has ((See: meta-noise.)) – but consider that the same argument could be made against Wikipedia’s “anyone can edit this page” philosophy. What happens is that the community self-organizes: on Wikipedia all sorts of conventions have emerged (no trivia, proper sourcing, “disputed” article designations, etc.), and on Daily Kos the community itself polices tag use, creates tag conventions of its own, and regularizes the common ones.

The bottom-up, unstructured approach of folksonomy is a new paradigm for finding content online that is largely orthogonal to the old way of brute-force searching. The driver here is not a search algorithm, but a truly human filter that is infinitely customizable:

As folksonomies develop in Internet-mediated social environments, users can (generally) discover who created a given folksonomy tag, and see the other tags that this person created. In this way, folksonomy users often discover the tag sets of another user who tends to interpret and tag content in a way that makes sense to them. The result is often an immediate and rewarding gain in the user’s capacity to find related content (a practice known as “pivot browsing”). Part of the appeal of folksonomy is its inherent subversiveness: when faced with the choice of the search tools that Web sites provide, folksonomies can be seen as a rejection of the search engine status quo in favor of tools that are created by the community.

Though folksonomies are in their infancy, they arguably represent the next evolutionary step for the Internet, and are intimately tied to the concept of the Semantic Web. The modern Internet is thoroughly dominated by Google, which represents the pinnacle of the old, algorithmic model, but it’s not beyond the realm of imagination to envision Google becoming outdated someday with the rise of a Semantic search engine (and the groundwork for such a “Web 3.0” has already been laid today) ((Extrapolating, one might imagine Web 4.0 to arrive with true Artificial Intelligence, thereby removing humans and automating the creation of meta-data to power the Semantic web. A serious name might be the Intelligent Web; a more tongue-in-cheek one might be the Pedantic Web.)).

None of this applies to the taxonomic tag system that WordPress uses today. WordPress’ tag system is designed more for Technorati than for del.icio.us. However, that doesn’t mean WordPress can’t support a folksonomy via plugins. The basic functionality required of such a plugin is to add an “Edit tags” link on each post. The user access level required to edit the tags (from admins right down to unregistered users) would be configurable, and the admin could decide whether to permit individual authors to override the tag edit access on a per-post basis. The default setting might be to permit any registered blog user to edit tags, and to disable author override, to encourage a more open stance (albeit with some protection). Supplementary plugins like Peter Keung’s captcha system would be useful in weeding out spam registrations as well.
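The access rules proposed above could be modeled roughly as follows. This Python sketch is purely illustrative: the role names, their ordering, and the function signature are hypothetical and do not correspond to WordPress’s actual roles or to any existing plugin.

```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical access levels, lowest to highest (not WordPress's real roles)."""
    UNREGISTERED = 0
    REGISTERED = 1
    AUTHOR = 2
    ADMIN = 3


def can_edit_tags(user_role, site_minimum=Role.REGISTERED,
                  allow_author_override=False, post_minimum=None):
    """Decide whether a user may edit the tags on a post.

    Defaults mirror the suggestion above: any registered user may edit,
    and per-post author overrides are disabled.
    """
    required = site_minimum
    # A per-post setting only applies when the admin has enabled overrides.
    if allow_author_override and post_minimum is not None:
        required = post_minimum
    return user_role >= required


print(can_edit_tags(Role.REGISTERED))    # True under the default settings
print(can_edit_tags(Role.UNREGISTERED))  # False under the default settings
```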

Of course, categories would remain under the exclusive control of the blog admins. This will create two sets of taxonomies per blog; one defined by the author, and one by the readers. The former would be optimized for keyword taxonomy engines like Technorati and the latter would be optimized for folksonomic engines like del.icio.us. Both systems would meet all three of Lorelle’s criteria above and serve to attract traffic and provide readers with multiple points of entry to the site’s content.

Ultimately, plugin-based folksonomy functionality could be adopted into the WordPress core, just as tags themselves were adopted ((There’s a long way to go, of course. As of this writing, the term folksonomy doesn’t even appear once in the WP-Trac system.)). If so, the WordPress technology would become part of the foundation for Web 3.0 itself – and beyond.

UPDATE: The confusion about folksonomies versus taxonomies persists, with many people simply assuming that “tags” by themselves are sufficient. For example, Binary Bonsai’s Matt has a post about his move away from categories towards tags, but he has really just substituted one taxonomy for another. This WordPress “folksonomy” plugin (now deprecated) did much the same thing. Presently there is no true implementation of a folksonomy on the WordPress platform.