The Hairy Underbelly: When Disambiguation Fails

Discogs is an incredible archive of music from all around the world. What’s most incredible is the level of detail available on every page. The casual 21st century web user probably never thinks about how all that information gets there. In fact it’s often hard work. Sometimes it gets ugly…

This guest series, written by hmvh, is about what really goes on under the surface, the elbow grease involved in making Discogs shine, the hairy underbelly of contributing to Discogs.

————

Discogs is a database for releases.

A “release” is Discogs’ term for audio carriers such as vinyl records, cassettes, reel-to-reel tapes, CDs, wax cylinders – even iTunes downloads.

Audio aside, each release may also contain a wealth of metadata in the form of barcodes, names of record labels, companies, manufacturers, studios, illustrators, musicians and technical personnel. Capturing such a “credit” creates a distinct page for the relevant entity so that any audio recording they’ve ever been involved in or contributed to (even long after their demise) may be crosslinked to a swathe of other releases, companies, places, and people.

And some of these entities share names.

You’d be quite surprised how many Mark Browns, Tim Whites or DJ Davids there are out there — let alone mononyms such as Mandy, Sandy or Brandy!

How are these names kept apart in the database?

Discogs disambiguates namesakes via a simple numerical suffix. The first (to the database) John Smith is entered exactly as such; all subsequent John Smiths get the next available number. The concept is easy to grasp but often difficult to implement. It’s a manual process, one which requires users to put on their thinking caps in order to identify and pick the correct entry out of a potentially long list of possibilities – or create a new one.
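For readers who think in code, the suffix rule can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not how Discogs actually does it (on the site the choice is made by hand). The function names `split_name` and `next_entry` are made up for this example, and it assumes that "next available number" means one higher than the highest suffix already in use.

```python
import re

# Matches an optional " (N)" suffix at the end of a name, e.g. "John Smith (3)".
_SUFFIX = re.compile(r"^(?P<base>.*?)(?: \((?P<num>\d+)\))?$")


def split_name(entry: str) -> tuple[str, int]:
    """Split 'John Smith (3)' into ('John Smith', 3).

    An entry without a suffix is treated as number 1 (the first namesake).
    """
    match = _SUFFIX.match(entry)
    base = match.group("base")
    num = match.group("num")
    return base, int(num) if num else 1


def next_entry(base_name: str, existing: list[str]) -> str:
    """Return the name a new namesake would receive (hypothetical helper).

    Assumption: the next number is one above the highest number already taken.
    """
    taken = {num for base, num in map(split_name, existing) if base == base_name}
    if not taken:
        # The first artist of that name is entered exactly as such.
        return base_name
    return f"{base_name} ({max(taken) + 1})"


if __name__ == "__main__":
    entries = ["John Smith", "John Smith (2)", "John Smith (42)", "Jon Smith"]
    print(next_entry("John Smith", entries))
    print(next_entry("Mandy", entries))
```

The sketch only covers the easy, mechanical part. Deciding whether a new release belongs to an existing John Smith or needs a fresh number is the research a contributor still has to do.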

Are you entering a release for the Canadian rapper, or is this John Smith the sound engineer from the UK, John Smith the indie folk singer from Idaho who just put out his first album, or a misspelt Jon Smith on a Russian bootleg? Some people wear many hats.

Picking the correct artist out of several who share a name often requires researching each artist’s body of work. Luckily, most active musicians today have an official site, and it usually includes a biography. The chance of a deceased Kentucky bluegrass fiddler active during the 1950s being the same individual responsible for remastering a series of Death Metal CDs on a German label in the 2010s is rather slim. Common sense goes a long way.

Unfortunately, sometimes common sense is in short supply. Sometimes honest mistakes are made, sometimes people are reckless or simply don’t care. Occasionally we witness bouts of crazy bulk changes by fans or the artists themselves that are tantamount to vandalism (these hijacks are worth an article of their own). Whatever the cause, we might end up with jumbled pages that list the collected work of several distinct people – something entirely counter to the function of Discogs and the accuracy the database community strives for.

How do these mistakes get fixed?

Any user is welcome to correct blatantly obvious mistakes. If you’re sure that a release should point to John Smith (3) instead of John Smith (42), go ahead and update the erroneous entry in accordance with the guidelines.

You should also make an effort to describe and update the artist’s or label’s profile to help other contributors avoid making a similar mistake.

Occasionally, though, the required corrections can be profound: The artist’s name might have been wrong to begin with; there could be many potential duplicates, or the original (oldest) entry might be impossible to determine. With or without diacritics? Are eyeballs and opinions from others in the Database Community required? In each of these cases the problem should be raised in the forums. You’ll find many helpful users who will offer not only guidance and suggestions but will likely be willing to ease the workload if a “mass edit” is required. Sometimes there are arguments, occasionally egos get bruised, rarely do genuine concerns get turned down.

This recent incident is worth pointing out as a case study: a forum thread stating that “Someone hijacked two of my album credits and linked their cds to it.”

Many readers were puzzled, but once the dust had settled and the complaint was understood it became a matter of determining which Scott Wilson could lay claim to being the first artist with that name entered into the Discogs Database (some 13 years ago!). It was he. Scott provided a verifiable list of entries he should be linked to; everything else would therefore be the work of other people of the same name. Those had to be moved. We also knew who the other Scott Wilson was and what his body of work encompassed because he too had joined the conversation and provided a concise list. This author took responsibility for the situation, and through a series of forum posts and private messages all necessary facts got thrashed out until the discographies of the two artists with the same name were separated.

The “other” Scott Wilson ended up with the next available slot because his discography had also been spread across several other Scott Wilson profiles – over and above the original one. In the end it was necessary to review and try to identify each and every single Scott Wilson as far as possible.

We started with 25, and the number eventually fanned out to the current number: 29. There was no favouritism involved, no preference for the “bigger” or “more famous” artist. Where possible, disambiguation numbers were kept as originally and sequentially assigned. Along the way, the sizeable work of unrelated Australian guitarist/vocalist/producer Scott Wilson (who had also been spread across multiple profiles) was neatly tied together. It took two lengthy sessions over 24 hours to clear up the mess.

In the end the situation was resolved amicably within a matter of days – largely thanks to both Scott Wilsons bringing the issue to the forums as well as their subsequent cooperation.

Unfortunately, that’s not always the case. Sometimes there are heated arguments, and sometimes the sheer number of entries is huge – as is the case with the current effort to sort through all the artists using the imaginative name of just plain Alex.

It’s a Sisyphean task. Users are constantly making, discovering and repairing errors. Every day in the Database Forums there are new threads started over concerns that two or more artists (or labels or studios for that matter) may have multiple entries that need to be separated – often across different alphabets. And those are especially challenging if you can’t even read the language.

What’s incredible to see is the huge community of people who care enough about creating a well organised archive of all the music in the world that they volunteer their time to cleaning Discogs. Every day.

Most casual visitors to the site are unaware of the work going on behind the scenes.

This series of blog posts will try to shine a strobe light on the hairy underbelly and some of the inner workings of the beast named Discogs.

————

hmvh has been active on Discogs since June 2004 and has contributed:

  • 1,302 new releases
  • 10,906 release edits
  • 7,308 new master releases
  • 1,681 master release edits
  • 3,818 artist profile edits
  • 1,875 label profile edits
  • 8,413 release images
  • 3,466 artist images
  • 875 label images

————

If you’re also a Contributor interested in writing a guest piece for the Discogs Blog, please send a Private Message to Moon_Ray.
