Open Data + API

by teo

I’m pleased to announce that our main data set is now available for third-party use, for free, under our new data license. The data is initially available through our API, with which you can programmatically retrieve any release, artist, label, or search results page in XML format. We will soon also provide monthly snapshots of the release, artist, and label data in XML format; if you would like all of the data at once, use those snapshots.

The XML format has not been finalized yet, so I would like feedback on the schema before calling it version 1.0, which I plan to do on Oct 1, 2007. If you start using this data in an application, check back often, as the schema will likely change.

By making the data open and available like this, we are allowing all the work that submitters and moderators have put into the database over the last 7 years to be used in many new ways, on other websites or simply on individual computers.

For example, if you run a fan website about an artist, you can “plug in” the Discogs data and display it in whatever way suits the site. Or you could download the database to display and search it in different ways on your own computer. We expect to be surprised by some of the uses it is put to! And by making the data “open” like this, we hope to further encourage the growth of Discogs and confidence in the site.

More details:
Data License: http://www.discogs.com/help/license.html
API Documentation: http://www.discogs.com/help/api

65 comments about “Open Data + API”
  • marcelrecords 10 years ago
    and some already use it to boost their ads on eBay!
  • Manys 10 years ago
    Awesome!
  • hmvh 10 years ago
  • teo 10 years ago
    fixed that. thanks
  • deejsasqui 10 years ago
    This is very interesting, and I'm excited to see where this leads.

    My first questions:
    1) Do we really want the [url=http://www.discogs.com/release/add]Submit a Release[/url] link on all [url=http://www.discogs.com/help/attribution]Discogs Attributions[/url]? Perhaps link to [url=http://help.discogs.com/wiki/AboutDiscogs]About Discogs[/url], and/or [url=http://help.discogs.com/wiki/Contributing]Adding / Updating Discogs Information[/url]?

    2) Shouldn't the Open Data + API page be listed as something other than [url=http://www.discogs.com/help/docindex]Discogs Help[/url]? After all, we already have [url=http://help.discogs.com/]Discogs Help![/url], and I think the exclamation point is too minor a differentiation; most people looking for help on Discogs will miss it. Or is the Open Data + API page going to be woven into the general Discogs Help! section, so that all Discogs documentation is under one roof?
  • bubbleguuum 10 years ago
    That's awesome! Christmas in August!! As the writer of [url=http://www.hydrogenaudio.org/forums/index.php?showtopic=50523&hl=foo_discogs]foo_discogs[/url] (a tagger for foobar2000), I have been waiting for something like this for a long time!
    Now I'll be able to get Notes and Roles, which I didn't support because parsing them out of the HTML was such a PITA.
    Any plans to include release ratings? And could we access the exact schema in use (DTD), even if it's not final?
  • teo 10 years ago
    bubbleguuum, you need to URL-escape the artist and label names; most scripting languages have a standard function for that. I'll add the image links. The other fields may come at a later time.
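A minimal sketch of the URL escaping teo describes, in Python. It assumes the `http://www.discogs.com/artist/<name>?f=xml` pattern used in this thread (the 2007 API has since changed, so treat the base URL as illustrative), and `artist_url` is a hypothetical helper. `urllib.parse.quote_plus` turns spaces into `+` and percent-encodes other reserved or non-ASCII characters as UTF-8 bytes, which covers bubbleguuum's special-character worry.

```python
from urllib.parse import quote_plus, urlencode

BASE = "http://www.discogs.com"  # base URL as given in the 2007 post (illustrative)

def artist_url(name, api_key=None):
    """Build an XML request URL for an artist page (hypothetical helper).

    quote_plus() maps spaces to '+' and percent-encodes reserved or
    non-ASCII characters, so names like "Simon & Garfunkel" stay intact.
    """
    query = {"f": "xml"}
    if api_key:
        query["api_key"] = api_key
    return f"{BASE}/artist/{quote_plus(name)}?{urlencode(query)}"

print(artist_url("Aphex Twin"))         # http://www.discogs.com/artist/Aphex+Twin?f=xml
print(artist_url("Simon & Garfunkel"))  # http://www.discogs.com/artist/Simon+%26+Garfunkel?f=xml
```

The same approach applies to label names by swapping the `/artist/` path segment for `/label/`.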
  • bubbleguuum 10 years ago
    Reposting because the damn XML didn't show up (I replaced the angle brackets with {})

    A few observations:

    1.

    From a release request:

    {artist}
    {name}Aphex Twin{/name}
    {anv}AFX{/anv}
    {/artist}


    If I want to get the artist page from this, it's quite easy using the name field: http://discogs.com/artist/Aphex+Twin (spaces have to be replaced by +). But will that work in all cases, especially with special characters (which need to be expressed in hex using % in URLs)? Wouldn't it be better to have an explicit link tag?

    {artist}
    {name}Aphex Twin{/name}
    {anv}AFX{/anv}
    {link}Aphex+Twin{/link}
    {/artist}


    This could be extended to everything that is linkable and accessible via an API query (i.e. labels, artists, releases), whatever page it appears on.

    2.

    It would be great to have links to images in release and artist responses, like this:

    {images}
    {image}R-15607-1103497766.jpg{/image}
    ...
    {image}R-15607-1103497768.jpg{/image}
    {/images}

    3.

    While we're at it, it would be great to have rating / # votes / submitter / # owned / # wanted. The last three are not super important, though.
  • teo 10 years ago
    [quote=deejsasqui]2) Shouldn't the Open Data + API page be listed as something other than Discogs Help? After all, we already have Discogs Help!, and I think the exclamation point as a differentiation is minor enough to be missed by most who are looking for help on Discogs. Or the Open Data + API going to be woven into the general Discogs Help! section, so all documentation of Discogs is under one umbrella-roof?[/quote]
    The help sections will be merging in the next week or two.
  • bubbleguuum 10 years ago
    @teo: OK, thanks, I didn't know about URL escaping.
  • Random_Tox 10 years ago
    This is GREAT news. I'll immediately pester the developers of my favorite music apps to get on it. A 'Third Party Development' forum under the 'Discogs' section would seem to make sense.

    I'm extremely impressed with how much has been added to Discogs in the short time I've been using it (or the shorter time I've been paying attention). I really feel that, with this addition, Discogs is poised to become the de facto standard online music database. Where can I buy stock? When will we see "Filmogs.com"?
  • bubbleguuum 10 years ago

    [quote=Random_Tox]This is GREAT news. I'll immediately pester developers of my favorite music apps to get on it. It seems a forum for 'Third Party Development' under the 'Discogs' would make sense.
    [/quote]


    Just curious: what features would you like to see built on this data?
  • Schneckl 10 years ago
    Finally! This is great news.
    A few things with the license you might want to consider:
    - It should be self-referencing somewhere, similar to the GPL. The text of this license or a link to the license web page should be included in any re-distribution. Could be done with the Attribution Requirement ("View license: [url=http://www.discogs.com/help/license.html]www.discogs.com/help/license.html[/url]").
    - The Attribution Requirement should be made more general. IMO there is no need to include the Discogs-specific styling.
    - As deejsasqui said [url=http://www.discogs.com/forums/topic?topic_id=141878#1837903]above[/url], a direct link to the submit form might turn out to be a bad idea.
    - The Attribution Requirement is not yet suitable for printing on hard copies, as required by point 2 of the license, because it contains links that are not spelled out. Therefore I suggest that it say "Discogs.com music database" or "Discogs music database (www.discogs.com)" instead of "Discogs music database".
  • Random_Tox 10 years ago
    [quote=bubbleguuum]Just curious : what would you like to see as feature using this data ?[/quote]

    I use a central digital archive ([url=http://flac.sourceforge.net/]FLAC format[/url]) to play/access my entire music collection, and keep all physical media in storage.

    Any application I use for the archiving process would benefit greatly from the accuracy and consistency of Discogs data. Much time is spent correcting info pulled from [url=http://www.gracenote.com/]CDDB[/url] and [url=http://www.freedb.org/]FreeDB[/url].

    I tag my music data with additional references to artist aliases, band members, producers, remixers, mix DJs, etc. This data is then available in my player application's selection interface. If player applications would pull this info from Discogs, it would spare me my cross-referencing efforts.

    These are the tools I primarily use:
    [url=http://www.slimdevices.com/pi_features.html?]Slimserver[/url]
    [url=http://cdexos.sourceforge.net/]CDex[/url]
    [url=http://www.softpointer.com/tr.htm]Tag&Rename[/url]
    [url=http://www.winamp.com/]Winamp[/url]
  • bubbleguuum 10 years ago

    [quote=Random_Tox]
    I tag my music data with additional references to artist aliases, band members, producers, remixers, mix DJs, etc. This data is then available in my player application's selection interface. If player applications would pull this info from Discogs, it would spare me my cross-referencing efforts.
    [/quote]

    You should give [url=http://www.discogs.com/forums/topic?topic_id=130409]foo_discogs[/url] a try then (follow the link for details and screenshots); it's designed exactly for this. The only things it doesn't tag yet are notes and roles, which I'll add soon with the new API. But it already handles artist aliases, ANVs, band members, etc., plus the obvious and less obvious tags. And album/artist art.

  • binaryme 10 years ago
    [quote=Random_Tox]Where can I buy stock? When will will we see "Filmogs.com"?[/quote]

    Or Bibliogs.com? A (free) equivalent of Discogs for books would be massive!
  • deejsasqui 10 years ago
    Filmogs ~ IMDB's "[url=http://imdb.com/help/show_leaf?mymoviesguide]My Movies[/url]"
    Bibliogs ~ [url=http://www.librarything.com/]LibraryThing[/url]

    They aren't exact matches, but they are similar.

  • Pugget 10 years ago
    This is truly wonderful news for us programmers in the house. I've been thinking about writing an unofficial Ruby on Rails API for discogs for some time, but now my life will be full of joy without the effort!

    Thanks thanks thanks...
  • Random_Tox 10 years ago
    [quote=deejsasqui]Filmogs ~ IMDB's "My Movies" [/quote]
    Thanks. I'll give that a try, but just like CDDB, IMDB desperately needs the Ogs treatment. Their data is more useful than nothing, but compared to the detail, accuracy, interface, and structure of Discogs, it's a turd. Their structure jumbles TV commercials and video game voice acting in with movie appearances.

    There used to be a pretty decent Laserdisc DB operated by a distributor. It started to include DVDs when they first became available and the operator quickly had more work than they could handle. RIP. (Looks like [url=http://www.lddb.com]LDDB[/url] is an option, but my initial impression is that it could also take a lesson or twelve from discogs.)
  • binaryme 10 years ago
    [quote=deejsasqui]Bibliogs ~ LibraryThing
    [/quote]
    Nice!
  • Kergillian 10 years ago

    [quote=binaryme]Or Bibliogs.com? A (free) equivalent of Discogs for books would be massive![/quote]

    -grin- I'd like a Comicogs :)
  • Kergillian 10 years ago
    Nope - I mean with everything cross-indexed like on Discogs: writers, artists, letterers, colorists, editors - AND characters, in and out of costume, including cameos. Including dates, editions, formats, and notes for important info (first appearances, deaths, etc.). That would be *awesome* to have.
  • mawiles 10 years ago
    click on the story, and you got more info:
    http://www.comichunters.net/?t=1&comic=2768&story=4770
  • Kergillian 10 years ago
    Yah, I did - but it's not cross-indexed, just listed out :)
  • Cybah 10 years ago
    Some form of versioning on the objects would be useful, to see whether or not a local cached entry needs to be updated or not. It could be quite elegantly knitted in with HTTP caching, if the version was to be the current date and time. That would also reduce bandwidth on the Discogs servers.

    Each time a response is generated, include a HTTP Last-Modified: field based on a table column. Also include the version field in the XML response so as to be useful with snapshots. When receiving a request, check for a HTTP If-Modified-Since: field and if the object hasn't been modified since, return a HTTP 304 (Not Modified) response.

    Alternatively, the HTTP ETag mechanism could be used if the version was to be something other than a date (e.g. database serial int).

    Is that feasible? Is it done already?
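Cybah's Last-Modified / If-Modified-Since scheme can be sketched on the server side as a pure function. This is a hypothetical illustration only, not a description of what Discogs built: `conditional_response` and the per-object `last_modified` timestamp (Cybah's "table column") are assumptions. It relies on the standard library's `email.utils` helpers, which read and write the date format HTTP uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(last_modified, if_modified_since=None):
    """Decide between 200 and 304 for a GET, given the object's last change.

    last_modified: timezone-aware datetime, e.g. from a hypothetical
    per-object 'updated' column.
    if_modified_since: raw If-Modified-Since request header, or None.
    Returns (status_code, response_headers).
    """
    # HTTP dates have one-second resolution, so compare at that precision.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the full body
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            if last_modified <= since:
                return 304, headers  # Not Modified: the client's cached copy is current
    return 200, headers

lm = datetime(2007, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_response(lm))  # (200, {'Last-Modified': 'Sat, 01 Sep 2007 12:00:00 GMT'})
print(conditional_response(lm, "Sat, 01 Sep 2007 12:00:00 GMT")[0])  # 304
```

The ETag variant Cybah mentions would work the same way, comparing an opaque version string (such as a database serial) against the request's `If-None-Match` header instead of comparing dates.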
  • Gecks 10 years ago
    [quote=Random_Tox]These are the tools I primarily use:
    Slimserver
    CDex
    Tag&Rename
    Winamp [/quote]

    very surprised to not see http://www.musicbrainz.org on that list!

    as a musicbrainzer, i can't wait to see how we can utilize a discogs API...mmm :)
  • AbsoluteBodyControl 10 years ago
    I think I've found a bug in the API...

    Some releases are impossible to get for some reason:

    Example: http://www.discogs.com/release/70179?f=xml doesn't return anything (even if you supply the API key).

    This one does work: http://www.discogs.com/release/100000?f=xml

    Anyone else have this problem?
  • teo 10 years ago
    AbsoluteBodyControl, I've fixed that bug.
  • AbsoluteBodyControl 10 years ago
    Nice, thanks. Everything imported properly now :)
  • Manys 10 years ago
    Search XML is returning HTML entities. Is this by design?

    AFX* - 2 Remixes By AFX
  • Manys 10 years ago
    er, let's try that again:

    [em]AFX[/em]* - 2 Remixes By [em]AFX[/em]

    Replace the square brackets with angle brackets.
  • bubbleguuum 10 years ago
    Agreed. It would be better if the artist name, a flag saying whether the artist name is an ANV, and the release title were each in their own tags.
  • Manys 10 years ago
    Also in search results, there is no attribute to distinguish between artists and releases, they're both using result.title. I suppose the URI can be hacked up to get some context, but... ;)
  • bubbleguuum 10 years ago
    A question related to the API KEY.

    I'm developing an app to tag files from Discogs data (foo_discogs).

    I have two possibilities:

    1. I bundle the app with my API key but the limit (5000 queries per day) can be reached depending on the number of users
    2. I force all the users to go register and create their private API key.

    5000 sounds like a lot, but I have a batch update feature and some users could potentially make a lot of requests, so I think I'll have to go with 2. Any comments?
  • Manys 10 years ago
    Also, is it possible to page through to additional search results?
  • hypernova 10 years ago
    Holy shit, web service interfaces should be defined in WSDL. Proprietary XML without even a schema definition is a very poor solution... will this be upgraded at some point?
  • Manys 10 years ago
    i think it's (hopefully) in flux until oct 1. i did manage to work out a decent search client over the weekend, though my xml handling is still inelegant, to say the least. a schema definition is indeed needed.

    any suggestions? maybe we can come up with something to give to teo instead of expecting him to fix it all himself.
  • Manys 10 years ago
    I think I'm going to move my future comments to the dev board: http://www.discogs.com/forums/board?forum_id=843
  • teo 10 years ago
    Thanks for all the feedback!

    [quote=Manys]Search XML is returning HTML entities. Is this by design?[/quote]
    Those are the emphasis tags around words that the search engine highlights. How should it be? Should they be escaped? Left out? Should there be two fields, one plain and one with the emphasis tags? This is easy to change, so just tell me what you want.


    [quote=Manys]Also in search results, there is no attribute to distinguish between artists and releases, they're both using result.title.[/quote]
    I'll fix this in the next day or two.


    [quote=Manys]Also, is it possible to page through to additional search results?[/quote]
    How? Do you want it to include links to the previous and next pages of results?


    [quote=bubbleguuum]1. I bundle the app with my API key but the limit (5000 queries per day) can be reached depending on the number of users
    2. I force all the users to go register and create their private API key.[/quote]
    As it is now, I think you should ask your users to register and use their own API key, since one heavy user could impact the other users of your app. The restriction is only there to prevent abuse, so I'm fine changing it. Would it be better if the restriction were 5000 requests per API key, per IP? Then the same key could be used on any number of IP addresses (clients), 5000 times each per 24-hour period.


    [quote=hypernova]web service interfaces should be defined in WSDL[/quote]
    I'll look into this later. We're at the very early stage and I just want people to use it first and see what they can do.


    [quote=Manys]I think I'm going to move my future comments to the dev board[/quote]
    I'm also going to start an API forum soon. That may be a better place.
  • Manys 10 years ago
    [quote=teo]
    How? do you want it to include links to the previous and next page of results? [/quote]
    First up, thanks for the responses! As for the search, there doesn't really seem to be a way in the API to request additional pages: there's just the query type, the query, the API key and the XML return directive. Nothing to specify quantity or sequence or anything like that. Maybe I can just add f=xml to any URI and hack it that way? ;)
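teo's later reply in this thread says the next page of results comes from adding `start=20` to the search URL, which implies 20 results per page. A minimal sketch of building page URLs on that basis; the `/search` path and the `type` and `q` parameter names are assumptions for illustration (Manys only says there is a query type, a query, the API key and the XML directive), and `search_page_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://www.discogs.com/search"  # assumed endpoint path
PAGE_SIZE = 20  # inferred from "start=20" fetching the second page

def search_page_url(query, page, api_key, search_type="all"):
    """URL for the zero-based `page` of search results, via a start offset."""
    params = {
        "type": search_type,  # hypothetical parameter name
        "q": query,           # hypothetical parameter name
        "f": "xml",
        "api_key": api_key,
        "start": page * PAGE_SIZE,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"

print(search_page_url("Aphex Twin", 1, "KEY"))
# http://www.discogs.com/search?type=all&q=Aphex+Twin&f=xml&api_key=KEY&start=20
```

A client would keep requesting the next page until a response comes back with fewer than `PAGE_SIZE` results.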
  • bubbleguuum 10 years ago
    [quote=teo]Thanks for all the feedback!

    Manys
    Search XML is returning HTML entities. Is this by design?

    Those are the emphasis tags around words that the search engine highlights. How should it be? Should they be escaped? don't include them? Have two fields, one plain and one with the emphasis tags? This is easy to change so just tell me what you want.
    [/quote]

    There should not be any HTML tags at all (em and td tags, among others) or display formatting, just organized hierarchical data.
    For example, I'd replace:
    {title}{em}AFX{/em}* - Smojphace EP{/title}

    with

    {result}
    {artist anv=1}
    AFX
    {/artist}
    {title}
    Smojphace EP
    {/title}
    {/result}

    Presenting {summary} properly would take more work.

    [quote]
    As it is now I think you should ask your users to register and use their own API key. One heavy user could impact other users of your app. The restriction is only there to prevent abuse, so I'm fine changing it. Would it be better if the restriction was 5000 requests per API key, per IP. So your same key could be used on any number of IP addresses (clients), 5000 times each per 24 hours period.
    [/quote]

    A limit of 5000 requests per key and IP would be wonderful for apps! It's a bit restrictive to force users to register just to obtain an API key. The simpler the better!
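Until the emphasis tags were removed, a client could flatten a highlighted search `<title>` itself. A minimal Python sketch, assuming the `<title>` holds `<em>` children as in Manys' example (with angle brackets restored) and following bubbleguuum's proposal of reading a trailing `*` as an ANV marker; `split_result_title` is a hypothetical helper, and an artist name that itself contains " - " would be split in the wrong place.

```python
import xml.etree.ElementTree as ET

def split_result_title(title_elem):
    """Turn a highlighted search <title> into (artist, is_anv, release_title)."""
    # itertext() yields the text inside and around the <em> children, so
    # joining it drops the highlighting markup but keeps every word.
    plain = "".join(title_elem.itertext())
    artist, sep, release = plain.partition(" - ")
    if not sep:  # no " - " separator, e.g. an artist result
        return plain, False, None
    is_anv = artist.endswith("*")  # trailing '*' read as an ANV marker
    return artist.rstrip("*"), is_anv, release

title = ET.fromstring("<title><em>AFX</em>* - Smojphace EP</title>")
print(split_result_title(title))  # ('AFX', True, 'Smojphace EP')
```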


  • teo 10 years ago
    Manys, the html entities are now gone and there are "type=release|artist|label" values in the result tags. To get the next page of results you can add "start=20" to the url. I'm also going to add some type of "nextResults" and "previousResults" tags containing urls to the next/prev pages. I'll post again when that's in.

    bubbleguuum, the limit is now 5,000 requests per IP per 24-hour period, with any valid API key. So you should distribute your app with your key, and any number of clients will each be able to make 5,000 requests per 24-hour period.
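The policy teo describes (5,000 requests per IP per 24 hours, regardless of which valid key is used) can be sketched as a counter keyed by client IP. This is a hypothetical illustration, not Discogs' implementation: the source does not say whether the window is fixed or sliding, so this sketch picks a simple fixed window per IP, and API key validation is left out. `now` is passed in explicitly (for example `time.time()`) so the logic is easy to test.

```python
DAILY_LIMIT = 5000             # requests per IP per 24 hours, per teo's comment
WINDOW_SECONDS = 24 * 60 * 60

class PerIpLimiter:
    """Fixed-window request counter keyed by client IP address."""

    def __init__(self, limit=DAILY_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._state = {}  # ip -> (window_start_seconds, requests_in_window)

    def allow(self, ip, now):
        """Record one request from `ip` at time `now` (seconds); False if over the limit."""
        start, count = self._state.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired: start a fresh one
        if count >= self.limit:
            return False  # rejected requests are not counted
        self._state[ip] = (start, count + 1)
        return True

limiter = PerIpLimiter()
print(limiter.allow("192.0.2.1", now=0.0))  # True
```

A fixed window is the simplest choice; a sliding window would smooth out bursts at window boundaries at the cost of storing per-request timestamps.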
  • Manys 10 years ago
    Sweet! Now to get my page formatting down... :)

    Also: what is your stance on my caching requests and data locally? I read the license and it seems fine as long as users see an attribution to Discogs, but I want to be sure before I take some load off your machines (he says confidently).
  • bubbleguuum 10 years ago
    For artist page XML requests, the format, label and year of each release are not present (though they are displayed on the HTML page).
    This missing info is very important for my app when a user must select a particular release, so it would be great if you could add those three fields to the {release} XML element.

  • bubbleguuum 10 years ago
    to be clear something like:

    {release id="63114" status="Accepted" type="Main"}
    {title}Analog Bubblebath Vol. 2{/title}
    {format}12"{/format}
    {label}Rabbit City Records{/label}
    {released}1991{/released}
    {/release}
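bubbleguuum's example element, with the braces turned back into angle brackets, can be read with the standard library's ElementTree. A minimal sketch: the element and attribute names come from the comment above, not from a published schema, and `release_summary` is a hypothetical helper for a release-picker list.

```python
import xml.etree.ElementTree as ET

# The <release> element from the comment above (element names taken from
# that comment, not from an official schema).
SAMPLE = """<release id="63114" status="Accepted" type="Main">
  <title>Analog Bubblebath Vol. 2</title>
  <format>12"</format>
  <label>Rabbit City Records</label>
  <released>1991</released>
</release>"""

def release_summary(elem):
    """Flatten one <release> element into a dict for a release picker."""
    return {
        "id": int(elem.get("id")),
        "status": elem.get("status"),
        "type": elem.get("type"),
        # findtext() returns None when a child element is missing entirely
        "title": elem.findtext("title"),
        "format": elem.findtext("format"),
        "label": elem.findtext("label"),
        "released": elem.findtext("released"),
    }

info = release_summary(ET.fromstring(SAMPLE))
print(f'{info["title"]} [{info["format"]}, {info["label"]}, {info["released"]}]')
# Analog Bubblebath Vol. 2 [12", Rabbit City Records, 1991]
```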
  • teo 10 years ago
    bubbleguuum, those 3 fields are now included in the artist API request.

    Manys, I don't mind you caching data on your end.
  • bubbleguuum 10 years ago
    teo: thanks very much, that was quick !
  • bubbleguuum 10 years ago
    Found 2 very small, non-critical bugs:

    1.

    http://www.discogs.com/artist/laurent+garnier?f=xml&api_key=

    Look at the last url in {urls}: it's an empty tag ({/url}). Maybe someone committed an empty URL for this artist. It was making my parser crash before I hardened it.

    2.

    http://www.discogs.com/artist/Super-A-Loof?f=xml&api_key=

    Look at {realname}. It's "Kirk Degiorgio & Jamie O'Dell", while the web page displays "Kirk Degiorgio, Jamie Odell, Ian O'Brien", as it should.

    Any chance you could add rating and # votes to the release response?
  • bubbleguuum 10 years ago
    I noticed the track duration tag is present as an empty tag {duration/} when a track has no duration. It would simplify parsing a bit if the tag were omitted entirely in that case.
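The two reports above (an empty url entry inside {urls}, and {duration/} for tracks without a length) are the kind of thing a defensive parser has to absorb. A minimal Python sketch of treating empty elements as missing values: the `urls/url` path follows bubbleguuum's description, while the `tracklist/track` nesting, the helper names, and the sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def text_or_none(elem, path):
    """Stripped text of a child element, or None if it is missing or empty."""
    value = elem.findtext(path)  # "" for an empty tag, None if absent
    if value is None:
        return None
    return value.strip() or None

def artist_urls(artist):
    """Collect non-empty <url> entries, skipping empty tags such as <url/>."""
    return [u.text.strip() for u in artist.findall("urls/url") if u.text and u.text.strip()]

def track_durations(release):
    """Map track title -> duration string, or None when <duration/> is empty."""
    return {
        track.findtext("title"): text_or_none(track, "duration")
        for track in release.findall("tracklist/track")  # assumed nesting
    }

ARTIST = """<artist><name>Laurent Garnier</name>
  <urls><url>http://example.com/</url><url/></urls></artist>"""
RELEASE = """<release><tracklist>
  <track><title>Intro</title><duration>3:05</duration></track>
  <track><title>Outro</title><duration/></track>
</tracklist></release>"""

print(artist_urls(ET.fromstring(ARTIST)))       # ['http://example.com/']
print(track_durations(ET.fromstring(RELEASE)))  # {'Intro': '3:05', 'Outro': None}
```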
  • encoded 10 years ago
    I've noticed that certain artists' XML responses do not include their profile, while others do.

    For example:

    http://www.discogs.com/artist/Skinny+Puppy has a profile, and when I call the API for this artist, I get a profile at resp/artist/profile.

    In contrast:

    http://www.discogs.com/artist/Project+Pitchfork also has a profile, but when I call the API, I get no profile anywhere in the XML document.

    Is there a better place to log such bugs? Please inform me if there is.

    Thanks a million for the API! It makes me all warm and cuddly inside.

    e.
  • bubbleguuum 10 years ago
    teo, any news on adding links to images on both artist and release pages?
  • Manys 10 years ago
    encoded: could that just be a case of a null profile that you can catch on your side?
  • encoded 10 years ago
    Manys: Not sure what you mean... I'll try to explain further. If you go here:

    http://www.discogs.com/artist/Project+Pitchfork

    You'll see that there is a profile for the artist. It reads "One of the best known German industrial electro band".

    However, if you access that same page via the new API, the XML that is returned does not contain a profile tag anywhere in it. Nor does the response include the text "One of the best known German industrial electro band" anywhere in the response. So it is not a matter of the profile being null, the profile simply isn't there, at all.

    Does that make it clearer?

    e.
  • bubbleguuum 10 years ago
    I'd like to have the artist numeric id in the {artist} tag, the one that is used to construct the artist image page. Something like this would be fine:

    {artist id=1234} ...

  • bubbleguuum 10 years ago
    Also, I've released the first version of foo_discogs that uses the API, so thanks for making it!

    http://www.discogs.com/forums/topic?topic_id=130409
  • Dave_Scream 10 years ago
    VERY good :) Thanks for foo_discogs and for bubbleguuum's fundamental labour on the HTML parser. I've been tagging my collection with foo_discogs for more than a year... but I searched for something like it for much longer. Thanks, bubbleguuum.

    Now the Discogs API has appeared, foo_discogs is much faster, and my mp3s have "vocals", "remixed" and "featuring" tags!! Cool! We've all waited for this for a long time...

    The only thing we still need is for foo_discogs to get "rank", "votes" and "release comments"...

    teo, please add as much info to the XML as you can.
    ---
    The only thing I don't like is that MusicBrainz can copy all the info into their database... but their taggers are not as good as foo_discogs, and their database doesn't have label info.
  • teo 10 years ago
    Image details are now in the API responses. Check http://www.discogs.com/help/api for details.
  • .J. 10 years ago
    I think in some cases the "artist" entries are malformed, for example in: http://www.discogs.com/label/Perlon?f=xml
    There are a bunch of linebreaks in the artist names (in this case for CatNo 13, 16, 28.5, 29 and 32). Any idea what might cause that?

    Thanks a million for creating a real API to Discogs! Keep up the great work :)
  • teo 10 years ago
    .J., the linebreaks are now removed from artists on the label page. thanks for reporting it.
  • .J. 10 years ago
    Would it be possible to include the "Format" information of the releases on a label page?
    Greetings
  • evamedia 10 years ago
    moving to API forum