Transitioning from HDF to JSON


Since the early days of Discogs, all release data has been stored in a format called HDF, or Hierarchical Data Format. At the time, this was a good solution because HDF integrated tightly with the Clearsilver templating library, which was the foundation of how we generated HTML.

As time went on, JSON gained momentum, not only in the development community but in the Discogs codebase as well, especially after we converted our templates from Clearsilver to myghty and then finally to jinja2. It became apparent that in order to simplify our codebase, we needed to start pruning away legacy technologies like HDF.

The Implementation

Python has pretty good built-in JSON support that allows painless transitions to and from a native Python dictionary object. HDF? Not so much. There is no easy way to transform an HDF object to a native Python dictionary. If you want to do that, you have to traverse the HDF tree and manually build one. Here’s an example of that in action:

>>> import neo_cgi
>>> from neo_util import HDF
>>> release = HDF()
>>> release.readString(release_hdf_data)
>>> node = release.getObj('labels.0')
>>> labels = []
>>> while node:
...     labels.append({'name': node.getValue('name'), 'catno': node.getValue('catno')})
...     node = node.next()  # move on to the next sibling node, if any
...
>>> labels[0]
{'name': 'Svek', 'catno': 'SK032'}

All that code just to represent a Release’s labels in a native fashion. Converting back to HDF also requires more code. Mo code, mo problems.
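
To give a rough idea of what "more code" means for the trip back, here's a minimal sketch that flattens a Python dictionary into dotted `name = value` lines, one of the text forms HDF accepts. The function name dict_to_hdf_lines is made up for illustration and isn't part of Clearsilver or any Discogs code, and the sketch ignores multi-line values, attributes, and escaping.

```python
def dict_to_hdf_lines(node, prefix=''):
    """Flatten nested dicts and lists into dotted 'path = value' lines.

    Hypothetical helper for illustration only; list items are keyed by
    their index, mirroring the 'labels.0' style paths used above.
    """
    if isinstance(node, dict):
        items = sorted(node.items())
    elif isinstance(node, list):
        items = [(str(index), value) for index, value in enumerate(node)]
    else:
        # Leaf value: emit a single 'path = value' line.
        return ['%s = %s' % (prefix, node)]

    lines = []
    for key, value in items:
        path = '%s.%s' % (prefix, key) if prefix else key
        lines.extend(dict_to_hdf_lines(value, path))
    return lines


release = {'labels': [{'name': 'Svek', 'catno': 'SK032'}]}
hdf_text = '\n'.join(dict_to_hdf_lines(release))
```

The resulting text could then be fed to something like readString, but you'd still be maintaining a custom serializer by hand.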

However, if the release data were stored as JSON, transforming it into a native Python dictionary would be painless:

>>> import json
>>> release = json.loads(release_json_data)
>>> release['labels'][0]
{'name': 'Svek', 'catno': 'SK032'}

Now, we can easily make changes to the release object and then serialize it back to JSON:

>>> release['title'] = 'Stockholm'
>>> json.dumps(release)
'{"labels": [{"catno": "SK032", "name": "Svek"}], "title": "Stockholm"}'
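
One thing worth knowing if you compare serialized output: the order of keys in the string that json.dumps produces isn't something to rely on across Python versions. Here's a small sketch of the same round trip using sort_keys=True to get stable output. The release_json_data value is a stand-in built from the example above, not real Discogs data.

```python
import json

# Stand-in for the stored release JSON (assumed shape, based on the example above).
release_json_data = '{"labels": [{"name": "Svek", "catno": "SK032"}]}'

release = json.loads(release_json_data)
release['title'] = 'Stockholm'

# sort_keys=True makes the output deterministic, which helps with diffs and tests.
serialized = json.dumps(release, sort_keys=True)

# Loading the string again gives back an equal dictionary.
round_tripped = json.loads(serialized)
```

Sorting keys costs a little at serialization time, but it makes it much easier to check that two representations of a release really are the same.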

Knowing that, how do we transform HDF data to JSON? No magical tool existed at the time to do this conversion for us, so we ventured out to write our own. And so hdf2json was born.

Introducing the hdf2json Python Library

Usage is a piece of cake:

>>> from hdf2json import hdf2json
>>> hdf2json(release_hdf_data)
'{"labels": [{"catno": "SK032", "name": "Svek"}], "title": "Stockholm"}'

If you want a native Python dictionary instead, hdf2dict is also available:

>>> from hdf2json import hdf2dict
>>> release = hdf2dict(release_hdf_data)
>>> release['title']
'Stockholm'

And then back to JSON:

>>> json.dumps(release)
'{"labels": [{"catno": "SK032", "name": "Svek"}], "title": "Stockholm"}'
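
If you're curious what a conversion like this involves, here's a simplified sketch in plain Python. To be clear, this is not how hdf2json itself is implemented. It only handles HDF text written as dotted `path = value` lines, and it assumes that nodes whose children are all numbered (like `labels.0`) should become lists. The function names hdf_text_to_dict and lists_from_numeric_keys are made up for the example.

```python
import json


def lists_from_numeric_keys(node):
    """Recursively turn dicts whose keys are all digits into ordered lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: lists_from_numeric_keys(value) for key, value in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


def hdf_text_to_dict(text):
    """Build nested dicts from 'a.b.c = value' lines (a small subset of HDF)."""
    root = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank lines and anything this sketch doesn't understand
        path, _, value = line.partition('=')
        keys = path.strip().split('.')
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value.strip()
    return lists_from_numeric_keys(root)


sample_hdf = """
title = Stockholm
labels.0.name = Svek
labels.0.catno = SK032
"""

release = hdf_text_to_dict(sample_hdf)
release_json = json.dumps(release, sort_keys=True)
```

Real HDF data also has nested `{ }` blocks, multi-line values, and attributes, none of which this sketch handles, which is a good part of why a dedicated library is handy.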


For all the poor souls out there who are still stuck using HDF, we hope hdf2json will come in handy. Otherwise, we hope you enjoyed learning a bit about the behind-the-scenes work that happens at Discogs.
