Kudos to the fans. One of the nominees for the Hugo Awards this year is Archive of Our Own, a fanfiction archive containing nearly 5 million fanworks—about the size of the English Wikipedia, and several years younger. It’s not just the fanfic, fanart, fanvids, and other fanworks, impressive as they are, that make Archive of Our Own worthy of one of the biggest honors in science fiction and fantasy. It’s also the architecture of the site itself.
At a time when we’re trying to figure out how to make the internet livable for humans, without exploiting other humans in the process, Archive of Our Own (AO3, to its friends) offers something the rest of tech could learn from.
Here’s a problem that AO3 users, like the rest of the internet, encounter every day: How do you find a particular thing you’re interested in, while filtering out all the other stuff you don’t care about? Most websites end up with tags of some sort. I might look through a medical journal database for articles tagged “cataracts,” search a stock photo site for pictures tagged “businesspeople,” or click on a social media hashtag to see what people are saying about the latest episode of #GameOfThrones.
Tags are useful, but they also have problems. Although “cataracts,” “businesspeople,” and #GameOfThrones might seem like the most obvious tags to me, someone else might have tagged these same topics “cataract surgery,” “businessperson,” and #GoT. Another person might have gone with “nuclear sclerosis” (a specific type of cataract), “office life,” and #Daenerys. And so on.
There are two main ways of dealing with the problem of tagging proliferation. One is to be completely laissez-faire: let posters tag whatever they want and hope searchers can figure out what words they need to look for. It’s easy to set up, but it tends to lead to an explosion of tags, as posters stack on more tags just in case and searchers don’t know which one is best. Laissez-faire tags are common on social media; if I post an aesthetic photo of a book I’m reading on Instagram, I have more than 30 relevant tags to choose from, such as #book #books #readers #reader #reading #reads #goodreads #read #booksofig #readersofig #booksofinstagram #readersofinstagram #readstagram #bookstagram #bookshelf #bookshelves #bookshelfie #booknerd #bookworm #bookish #bookphotography #bookcommunity #booklover #booksbooksbooks #bookstagrammer #booktography #readabook #readmorebooks #readingtime #alwaysreading #igreads #instareads #amreading. “Am reading” indeed: reading full paragraphs of tags.
The other solution to the proliferation of competing tags is to implement a controlled, top-down, rigid tagging system. Just as the Dewey Decimal System has a single subcategory for Shakespeare so library browsers can be sure to find Hamlet near Romeo and Juliet, rigid tagging systems define a single list of non-overlapping tags and require that everyone use them. They’re more popular in professional and technical databases than in public-facing social media, but they’re a nice idea in theory—if you only allow the tag “cataract” then no one will have to duplicate effort by also searching under “cataracts” and “cataract surgery.”
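To make the contrast concrete, here is a minimal Python sketch of the two approaches. Everything in it is hypothetical: the tiny `ALLOWED_TAGS` vocabulary, the `validate_tags` and `search` helpers, and the sample posts are invented for illustration and don't describe any real database. It only shows why a search under one free-form tag misses near-duplicates, while a controlled vocabulary forces every poster onto the same term.

```python
# Hypothetical controlled vocabulary; a real database would define a far larger list.
ALLOWED_TAGS = {"cataract", "businessperson"}


def validate_tags(tags):
    """Accept tags only if every one is in the controlled vocabulary."""
    rejected = [tag for tag in tags if tag not in ALLOWED_TAGS]
    if rejected:
        raise ValueError(f"tags not in controlled vocabulary: {rejected}")
    return list(tags)


def search(posts, tag):
    """Return the titles of posts that carry exactly this tag."""
    return [title for title, tags in posts if tag in tags]


# Laissez-faire: each poster picks whatever wording seems obvious to them.
free_posts = [
    ("Post A", ["cataracts"]),
    ("Post B", ["cataract surgery"]),
    ("Post C", ["cataract"]),
]
free_hits = search(free_posts, "cataract")

# Rigid: the same three posts, each required to use the single approved tag.
controlled_posts = [
    (title, validate_tags(["cataract"])) for title, _ in free_posts
]
controlled_hits = search(controlled_posts, "cataract")

print(free_hits)        # only the post that happened to use the exact word
print(controlled_hits)  # all three posts
```

The trade-off shows up in `validate_tags`: a poster who types “cataracts” gets an error instead of a published post, which is exactly the kind of friction that makes rigid systems a harder sell outside professional databases.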
The problem is that rigid tags take effort to learn; it’s hard to convince the general public to memorize a gigantic taxonomy. They also become outdated. Tagging systems are a way of imposing order on the real world, and the world doesn’t just stop moving and changing once you’ve got your nice categories set up. Take words related to gender and sexuality: The way we talk about these topics has evolved a lot in recent decades, but library and medical databases have been slow to keep up.