Discogs will be in read-only mode for several hours – 17-Feb

Discogs will go into read-only mode for several hours tonight, beginning around 9pm (PST). While the site is in read-only mode you will be able to browse and view pages but not change any data (no submissions, updates, modding, forum posting, etc.).

During this time I will be performing a much-needed conversion to make Discogs support Unicode. This will allow us to support all languages in the future. For more about Unicode, see [url=http://en.wikipedia.org/wiki/Unicode]Wikipedia[/url].

87 Comments
  • Feb 23,2006 at 6:31 am

    [url=http://www.discogs.com/release/155929]something Japanese[/url]

  • Feb 23,2006 at 5:38 am

    [url=http://www.discogs.com/label/Tropic+Records]Tropic Records[/url]

  • Feb 23,2006 at 5:37 am

    [url=http://www.discogs.com/search?start=20&type=all&q=i!%C3%89]#21[/url]

  • Feb 23,2006 at 5:33 am

    [url=http://www.discogs.com/search?type=all&q=s%E2%88%9A_&btn=Search]SA|d[/url]

  • Feb 23,2006 at 5:27 am

    >>> O[b]$ka[/b]

  • Feb 23,2006 at 5:25 am

    >>> [url=http://www.discogs.com/artist/Numer+Raz]Nastukafszy = track 21 length[/url]

  • Feb 23,2006 at 5:22 am

    [i]1 Oœka Przez [b]$[/b] Jak… (Intro)
    2 Tak Zapomnie[b]æ[/b]
    Featuring – Mes , Trish (3)
    3 Muzyka
    Featuring – Borixon
    4 Przesi[b]¹[/b]k[b]³[/b]em Tym
    Featuring – Onar
    5 Skit
    6 [b]£[/b]apie Chwile
    Featuring – Numer Raz
    7 By[b]æ[/b] Sob[b]¹[/b]
    Featuring – Selma
    8 Milion
    Featuring – Jano (2)
    9 Skit 22
    10 Bezele Kochanie
    Featuring – Kie[b]³[/b]basa , Ko[b]³[/b]cz , Tede
    11 Prze-Ziomy
    Featuring – Stasiak
    12 W Tych Klubach
    Featuring – Cayra , Teka
    13 Skit
    14 Czas To Zmieni[b]æ[/b]
    Featuring – Pezet , Stasiak
    Scratches – DJ Homer (2)
    15 Miêdzy S[b]³[/b]owami[/i]

  • Feb 23,2006 at 5:17 am

    >>> Mie[b]�[/b]cie

  • Feb 23,2006 at 5:13 am

    Polish can’t be this weird. Here are the accents that look odd
    to me:

    >>> ¯
    >>> ¹
    >>> Nienawiœ[b]æ[/b]
    >>> ³
    >>> Uwa[b]¿[/b]aj
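
A possible explanation, offered as a guess rather than a confirmed diagnosis: these look like Polish letters that were stored as Windows-1250 bytes and then read as Windows-1252 during the conversion. The Python sketch below reverses that assumed mix-up on a few of the strings quoted in this thread; the helper name `reread_as_cp1250` is made up for illustration.

```python
def reread_as_cp1250(garbled: str) -> str:
    """Undo a suspected Windows-1250 -> Windows-1252 mix-up.

    Assumption: the original bytes were Windows-1250 (Central European)
    but were decoded as Windows-1252 (Western European).
    """
    # Recover the raw bytes, then decode them with the code page
    # they were presumably written in.
    return garbled.encode("cp1252").decode("cp1250")


for sample in ["Nienawiœæ", "Uwa¿aj", "Przesi¹k³em", "£apie", "Miêdzy"]:
    print(sample, "->", reread_as_cp1250(sample))
```

If the guess is right, the odd-looking accents are not Polish at all, just the Western European characters that share the same byte values.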

  • Feb 23,2006 at 5:07 am

    [url=http://www.discogs.com/artist/W%C2%B3odi]have a tylenol[/url]

  • Feb 23,2006 at 12:29 am

    it seems widespread actually
    before the unicode update, when the main artist entry was Garcons, it was ok and created no link problems to actually credit releases to Garçons
    now the releases still get listed on the artist page but the links are broken

    same seems to apply to accents and stuff: i’ve found an Antonio Carlos Jobim appearance listed with accents, and since the main entry is without accents the link is broken

    this is potentially a big problem since for the longest time artist entries were accepted with or without special characters …

    also see Roisin Murphy: releases are listed under artist entry w/o accents
    when you open a release that’s been entered as Róisín Murphy the artist link opens a new page … where the release is actually not listed …

    http://www.discogs.com/artist/Roisin+Murphy

    open: http://www.discogs.com/release/470905

    click artist – opens: http://www.discogs.com/artist/R%C3%B3is%C3%ADn+Murphy

  • Feb 22,2006 at 11:08 pm
  • Feb 22,2006 at 11:06 pm

    Garçons & Marie Et Les Garçons are having problems too (still? again?)

    artist pages are Garcons , all releases are listed fine but they display Garçons and link doesn’t work ???

    is it ok to update all releases with problems to Garcons and then try to rename artist entries?

  • Feb 22,2006 at 5:33 am

    another eg, a bit like the Pele one above:

    Releases on [a=Mondeé Oliver]’s page are now split between there and [a=Mondee Oliver] – merge pending, easy to fix. just a heads-up, in case there are other examples where accented names have been separated or merged in the conversion process…

  • Feb 22,2006 at 1:44 am

    yea, I just caught a glimpse of it. thanks though.

  • Feb 22,2006 at 1:25 am

    à;GRUMH… is fine: check available release pictures for confirmation

  • Feb 22,2006 at 1:19 am

    [url=http://www.discogs.com/artist/%C3%A0;GRUMH…]à;GRUMH…[/url]

    I’m assuming this is not the artist’s name. (sometimes it is not always obvious)

  • Feb 22,2006 at 1:16 am

    Pele, Pelé and Pélé somehow got combined into the single [a=Pele] entry. The artist names show up correctly on the individual release pages, but the [a=Pele] discography page lists all the releases featuring the three separate artists together.

  • Feb 21,2006 at 3:02 pm

    guess it’s ok – must have cleaned all the mess i contributed to a couple days ago, now – thanks djpc and teo ;-)

  • Feb 21,2006 at 2:23 pm

    LOL – look at the first link, people aren’t paying enough attention to updates – the credit was spelt differently to the artist.

  • Feb 21,2006 at 2:07 pm

    (and now links don’t need special formatting – this place gets weirder by the minute…)

  • Feb 21,2006 at 2:05 pm

    updates are back it seems

    yet some releases seem to suffer still from unicode bug

    is it ok to try and correct through updates or what?

    see:
    http://www.discogs.com/release/209622
    http://www.discogs.com/release/235283
    http://www.discogs.com/release/204708
    and some other on that artist page

  • mjb
    Feb 21,2006 at 12:15 pm

    oops, 92 is a [i]right[/i] single quote :)

  • mjb
    Feb 21,2006 at 12:13 pm

    Some of the problems are coming from the way the conversion was done, and (IMHO) buggy browsers that conflate Windows-1252 and ISO-8859-1.

    Like in that [a=Neo & Farina] profile, there are curly quotes that were submitted as Windows-1252 printable character bytes that correspond to non-printing control codes in ISO-8859-1. When teo converted them to UTF-8, he converted them assuming they were ISO-8859-1, so they’re now the UTF-8 bytes for the control codes.

    For example, byte 0x92 is a left single quote in Windows-1252 and a control code in ISO-8859-1. When converted to UTF-8 it became 2 bytes: C2 92, which still indicates that same control code. In order to represent a left single quote, it needs to be 3 bytes: E2 80 99.

    teo should run another conversion pass to fix these kinds of errors. Basically, any pair of bytes C2 8x or C2 9x needs to have the C2 dropped and the remaining byte converted to Unicode from Windows-1252, and then that value should be encoded as UTF-8.

    This will fix the problems with the characters highlighted in yellow [url=http://en.wikipedia.org/wiki/Windows-1252]here[/url].

    As for the other problems, please keep posting examples so we can diagnose.
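
The repair pass described in this comment can be sketched in Python roughly as follows. This is only an illustration of the idea, not the script that was actually run on the database; the function name `repair_c1` is hypothetical, and it works on already-decoded text, where the byte pairs C2 80 to C2 9F appear as the characters U+0080 to U+009F.

```python
import re

# Matches U+0080..U+009F, the C1 control range. In UTF-8 these are
# exactly the two-byte sequences C2 80 .. C2 9F.
C1_CONTROLS = re.compile("[\u0080-\u009f]")


def repair_c1(text: str) -> str:
    """Reinterpret stray C1 control characters as Windows-1252."""

    def fix(match):
        # Drop the C2 lead byte and keep only the remaining byte.
        raw = bytes([ord(match.group())])
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in
            # Python's Windows-1252 codec; leave those untouched.
            return match.group()

    return C1_CONTROLS.sub(fix, text)


broken = b"Pan\xc2\x95American".decode("utf-8")
print(repair_c1(broken))
```

Characters outside the C1 range, including correctly converted accented letters, pass through unchanged, so the pass should be safe to run over text that is already fine.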

  • AEK
    Feb 21,2006 at 9:10 am

    I see the little blocks in place of characters all over the place, in profiles, in comments….. they’re f’in everywhere!!

    using Windows

  • Feb 21,2006 at 4:11 am

    “Intercord Tontr䧥ger GmbH”: http://www.discogs.com/changereq?id=1448818

  • Feb 21,2006 at 1:55 am

    Shows as “Intercord Tontr䧥r GmbH” in IE.
    Encoding = ‘Unicode (UTF-8)’

  • Feb 20,2006 at 10:54 pm

    [url=http://www.discogs.com/label/Intercord+Tontr%E4ger+GmbH]Intercord Tonträger[/url] That becomes a “?” in my browser (Safari)

  • Feb 20,2006 at 10:07 pm
  • Feb 20,2006 at 5:07 pm

    how much longer do i have to put up with this (’) crap?

  • Feb 20,2006 at 3:44 pm

    [i]I know Im going to forget what I wanted to update by the time it comes up…[/i]

    So make a list containing which releases you want to update and what you want to correct.

    I got several lists:

    [b]Updates[/b]
    list of direct links to change requests I submitted, containing pending ones as well as those rejected and cancelled which need to be resubmitted with correction
    I can check them through at any time, rejected updates aren’t lost.

    [b]ToDo[/b]
    list of things I want to check/update, like
    – artists: is it the right one for the release/remix, are there aliases/groups missing in the profile, should it be split
    – labels: correct cat#s, move releases to sublabels/proper label, check all releases for necessary updates
    – users: check contributed releases for mistakes, caused e.g. by websubmission or inexperience
    – releases: direct links to release pages with notes what I want to update
    So I don’t forget what I want to change at the next opportunity.

    [b]Update-Guesses[/b]
    list of direct links to release pages that should be checked by release owners
    Main issues are wrongly truncated cat#s, distribution codes used as cat#s and potential duplicate releases caused by promos/white labels/multiple labels+cat#s

  • Feb 20,2006 at 2:18 pm

    Any idea when updates will be available again? I just wanted to do some regular updates and image adding and I know I’m going to forget what I wanted to update by the time it comes up…

  • Feb 20,2006 at 7:16 am

    My pending and draft releases were all mixed up, because there was  character instead of [b]SOME OF[/b] the accented characters (I noticed it certainly for [b]ž[/b] and [b]č[/b]).
    I had to edit all of my pending / draft entries! :(
    Still not sure with accepted releases / artists…

  • Feb 20,2006 at 5:46 am

    Still updates blocked right?

  • Feb 20,2006 at 4:46 am

    pan american is still broke. see my post 7 up from here for the links.

    (incidentally, IE shows it as “Pan•(square thing)American”, Firefox as “PanAmerican”, but neither has a problem with the • when i type it here).

  • teo
    Feb 19,2006 at 4:32 pm

    Those 45 artists are updated and working correctly now, same with the Sähkö label. I’m going to leave release updates disabled for a while longer just in case there are any others that missed the conversion.

    I’ve also put in redirect code so you can still use the search and browse links.

    Jooles, thanks, yes I’m aware of that. None of the browse data has been converted and I was planning to regenerate all of it since it’s more of a side part of the database; it has no impact on the real artist entries. Same with search. For now, the temporary redirects will still let you click those links.

    jasmithers and langster, I don’t see any funny characters like you mention. Any chance your browser is set to a different encoding? I suggest you leave it on default or auto-detect.

  • Feb 19,2006 at 4:25 pm

    [u=teo]: just in case you’re not already aware of this; searching via the alphabetical list of artists also shows loads of broken artist links with strange characters – many more than 50. Probably more like 2000. If you choose the last link on each of the letters of the alphabet, that’s where most of the accented characters appear. [url=http://www.discogs.com/artists/G?start=11200]Example for the last page of artists beginning with G[/url].
    Good luck with the fixes…

  • Feb 19,2006 at 3:51 pm

    aye. and i can see the dot on my above post, but not on the pan american artist page, and it was there before the update (or at least, a few weeks ago).

  • Feb 19,2006 at 3:10 pm

    you probably need a browser which is able to interpret the unicode
    properly. Some browsers have an obsolete interpreter.

  • Feb 19,2006 at 1:59 pm

    [url=http://www.discogs.com/artist/Pan%C2%95American]Pan American[/url] is broke at the moment. should be, and was, ‘Pan•American’

    just in case this isn’t one of the 45 :)

  • Feb 19,2006 at 12:56 pm

    Opera 8 and seeing squares (luckily not everywhere ;-)

  • Feb 19,2006 at 12:43 pm

    Yup that is what I had yesterday in place of the £ symbol.

  • Feb 19,2006 at 12:07 pm

    here with Firefox 1.5.0.1 on mac with unicode UTF-8 i get fancy A in squares where ” should be

  • Feb 19,2006 at 12:00 pm

    Hmmmmm I am using Windows. and Copied and pasted the above, so obviously see little squares. :S

  • Feb 19,2006 at 11:57 am

    @langster, I don’t see any squares, just a 12 without a ” ????
    (using Firefox 1.0.7 on Win XP).

  • Feb 19,2006 at 10:57 am

    @teo

    I know this should really be put in the SELLING forum but I think it applies here as well..

    Check my postage prices out here…

    [b][i]U.K. – £2.75 for the 1st 12” + £1.00 for each additional 12” (1st Class Recorded Post)
    EUROPE – £3.80 for the 1st 12” + £1.20 for each additional 12” (Airmail)
    U.S.A. & CANADA – £4.50 for the 1st 12” + £2.00 for each additional 12” (Airmail)
    REST OF WORLD – £5.00 for the 1st 12” + £2.00 for each additional 12” (Airmail)[/i][/b]

    Those little squares are where the [b]”[/b] used to be, to indicate a 12″ record. This has happened since your unicode update. Also, yesterday I noticed my pound signs had changed to a strange Russian symbol, but they appear to have sorted themselves out now.

    Would you like me to fix the boxes back to ” ? Or has anyone else noticed this problem? And will you be making a fix for it?

    Secondly, I put soundclips in my item descriptions using an HTML link. I have noticed that if I edit an item in any way THE LINK IS THEN LOST, and I only get this part of the link left: <a href= and the rest is lost, whether I change anything or not. So if I go in to change ALL my prices and lower them by a pound, I would have to RE-ENTER every single mp3 link again, and everything that was written after the link as well. :S

  • Feb 19,2006 at 10:43 am

    If you kill me the queue will be much more manageable for sure, and it would do a favour to my wife ;pp

  • Feb 19,2006 at 10:42 am

    I see purity – and with one like me always updating of course ;p

  • Feb 19,2006 at 10:28 am

    Still “Updates Temporarily Disabled” For how long?

  • Feb 19,2006 at 10:10 am

    also, the ‘items for sale’ pages don’t represent all characters properly
    yet.

  • Feb 19,2006 at 10:04 am

    clicked oops :)

  • Feb 19,2006 at 10:04 am

    I spotted a small other thing but I think I remember it being there before
    the changes:

    http://www.discogs.com/artist/Ja%C3%AFa

    the ‘e accent aigu’ on one of the visible titles is displayed as ‘?’
    It displays properly when the release is slicked though.

  • teo
    Feb 19,2006 at 9:15 am

    All of the releases listed above are now displaying correctly. Now I’m cleaning up the 45 artists and Sähkö label (I don’t yet know what happened with that).

    Again, don’t worry about search problems. That will be resolved soon as well. thanks

  • Feb 19,2006 at 8:12 am

    I think teo disabled it temporarily. see above.

    Yes, U+FFFD (the Unicode replacement character) is standing in for several malformed UTF-8 sequences.
    Hopefully teo’s comment is true – that the script just missed some DB
    entries.

  • Feb 19,2006 at 8:01 am

    here too…

    just leave the browser’s window open to retry in a few… hours? hopefully.

  • Feb 19,2006 at 7:58 am

    When I’m trying to do an update it says:

    ‘[b]Updates temporarily disabled.[/b]’.

    I’ve been updating all day, the message only appeared now.

    Is this normal?

    (btw, i’m at 100 pendings – i don’t know if it has something to do with it. probably not, been to 107 earlier this day)

  • Feb 19,2006 at 7:56 am

    Also the [url=http://www.discogs.com/label/S%C3%83%C2%A4hk%C3%83%C2%B6]Sähkö label[/url] seems to have been toasted. The individual releases seem to be showing Finnish characters, but not the label itself.
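
The percent-escapes in that label link look like UTF-8 that has been encoded twice. The short Python sketch below decodes the path from the link and then undoes one extra round, on the assumption (not confirmed by anyone here) that already-converted text was read as ISO-8859-1 and converted to UTF-8 a second time.

```python
from urllib.parse import unquote

# Path component copied from the label link above.
path = "S%C3%83%C2%A4hk%C3%83%C2%B6"

# Percent-decode, interpreting the bytes as UTF-8 (the default).
once = unquote(path)

# Undo the suspected extra round: turn the characters back into
# single bytes, then decode those bytes as UTF-8.
fixed = once.encode("latin-1").decode("utf-8")

print(repr(once))
print(repr(fixed))
```

The first value is the garbled name as a browser would show it; the second is the label name with its Finnish characters restored.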

  • teo
    Feb 19,2006 at 7:50 am

    ah, thanks Jooles, right, please don’t submit updates to these. I’ll be fixing them.

  • teo
    Feb 19,2006 at 7:48 am

    Thanks for all the feedback!

    UPDATE: There are a few releases in the 200,000 – 240,000 range that did not get converted. I’m not sure why because my scripts covered the complete range. But I’m trying to figure out the range of what was not converted and will update them then.

    45 artists were not converted either (including Royksopp) because the converted data would conflict with other artist entries. I am updating those by hand.

    Finally, the search engine is going to have a few broken links so don’t worry about it now. I am updating it later today.

  • Feb 19,2006 at 7:46 am

    donnacha: the search engine doesn’t like special characters.

    it is because of the umlaut. another example: type this into the engine:

    [b]é[/b]

  • Feb 19,2006 at 7:46 am

    [u=julesparis] (and any others who’ve been submitting updates):

    I think it’s best to hold off submitting any updates to individual releases containing artists with accented characters who appear wrong. Let’s wait for teo to try to iron out the bugs with this new upgrade and then see if any minor updates need to be made…

  • Feb 19,2006 at 7:40 am

    They are going to have to parse through the whole database and ‘find and
    replace’ for many characters if this situation is correct.

  • Feb 19,2006 at 7:37 am

    here is another: http://www.discogs.com/artist/Asia+2001

    in this case the circumflex works with lower case ‘a’ but not ‘A’ it seems.
    I am thinking now that the issue is rather deep. I think it has to do
    with the way that the characters were input into the original submissions.
    They were no doubt cut and pasted from different places and the hexadecimal is
    getting translated differently in these circumstances.

  • Feb 19,2006 at 7:18 am

    like in my example: http://www.discogs.com/label/Stoneage+Records

    two of the Orembo (whatever it is) have the proper umlaut, one does not…

  • Feb 19,2006 at 7:16 am

    The back-up may be too large and unwieldy to actually handle.
    But yeah, certain characters – perhaps it is so. Though the umlaut
    works fine in some cases and not for others, so I think it is subtler
    than that.

  • Feb 19,2006 at 7:05 am

    Only some special characters are broken. Must be the way they were entered in the first place.

    Should have been run on a backup of the database first – to iron out obvious problems like this.

  • Feb 19,2006 at 6:59 am

    actually these guys are doing a very good job. There are really only a
    very few little things to clean up. I actually did not know what the DB
    upgrade was about, but I guessed it was a change having to do with
    Unicode when I was browsing my items and several characters were
    all ‘messed up’. At several points yesterday the situation was way, way worse. It looks like the Scandinavian character sets are in need of
    some fixing and the ‘umlaut’ is perhaps still an issue. I noticed mjb
    in another topic, and I suspect he is a clever fellow capable of resolving
    any residual bugs expediently.

  • Feb 19,2006 at 6:02 am

    Actually, [a=Motörhead] and [a=Crüxshadows, The] pages are ok except via the search engine.

  • Feb 19,2006 at 5:55 am

    [r=494677] Artist Page Link does not work. I think everything with an ö, ä, ü (German Umlaut) is a broken link.

  • Feb 19,2006 at 5:22 am

    Previously, accented letters were treated like the same letter without an accent, IIRC.

    Entering a credit for “Chateau Flight” would lead to the page for “Château Flight”, but it seems that this is no longer the case, which makes checking artist links somewhat difficult, because some accents are inconsistent/redundant.

    For example, Marcos Lopez from http://www.discogs.com/artist/Marmion will mostly be listed without an accented “o”, but the artist link for “Marcos Lopez” is currently “empty”, because it has to be entered as “Marcos López”
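
For illustration, accent-insensitive matching of the kind described here can be approximated in Python by stripping combining marks before comparing names. This is a sketch of the general technique, not how Discogs actually matched names before the conversion; `fold_accents` is a hypothetical helper.

```python
import unicodedata


def fold_accents(name: str) -> str:
    """Return a lookup key with accents and letter case removed."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()


print(fold_accents("Château Flight") == fold_accents("Chateau Flight"))
print(fold_accents("Marcos López") == fold_accents("Marcos Lopez"))
```

The flip side of folding is that genuinely different names, such as Pele and Pelé, end up with the same key, so a lookup like this cannot tell them apart.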

  • Feb 19,2006 at 4:24 am

    yep
    as above

  • Feb 19,2006 at 2:39 am

    Röyksopp is a big mess as well, when you eventually get on the artist page, there’s no way back from a release: clicking on the artist link leads back to “wrong page”

  • Feb 19,2006 at 2:00 am

    this is one major fuck up – loads of artists end up with broken links: Âme, Château Flight …

  • Feb 19,2006 at 1:38 am

    [url=http://www.discogs.com/artist/Sven+V%E4th]Sven V?th[/url]
    And the search-engine still doesn’t understand special characters.
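
The `%E4` in that link is a single ISO-8859-1 byte, which is not valid UTF-8 on its own. The Python sketch below shows what happens when such a link is decoded as UTF-8, and what the link would look like re-encoded as UTF-8. It only demonstrates the mismatch, assuming the old links were built from ISO-8859-1 text; it is not the site's redirect code.

```python
from urllib.parse import quote_plus, unquote_plus

# Path component copied from the artist link above.
old_link = "Sven+V%E4th"

# Decoded as UTF-8 (the default), the lone E4 byte is invalid and is
# replaced with U+FFFD, the Unicode replacement character.
as_utf8 = unquote_plus(old_link)

# Decoded as ISO-8859-1, the same byte is the intended letter.
as_latin1 = unquote_plus(old_link, encoding="latin-1")

# The same name percent-encoded as UTF-8.
new_link = quote_plus(as_latin1)

print(repr(as_utf8))
print(repr(as_latin1))
print(new_link)
```

A link checker could use the same two-step decode to spot old-style links and map them onto their UTF-8 equivalents.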

  • Feb 19,2006 at 12:25 am
  • Feb 19,2006 at 12:20 am

    Finnish is a weird language; it is no wonder…

  • Feb 19,2006 at 12:19 am

    EXO CD28

  • Feb 19,2006 at 12:18 am

    spotted another little unicode bug:

    http://www.discogs.com/label/Exogenic+Records

    it is the third to last entry iirc.

  • Feb 18,2006 at 11:11 pm

    Existing entries are having some difficulties…
    [url=http://www.discogs.com/release/200582]for instance[/url]

    Is all the info lost for these?

  • Feb 18,2006 at 10:43 pm

    I knew it would take longer. Excellent idea and outcome though.

  • Feb 18,2006 at 9:49 pm

    I meant “fo’ evah, yo.

  • Feb 18,2006 at 9:48 pm

    that felt like o’ evah, son!

  • Feb 18,2006 at 9:15 pm

    word ’em up

  • teo
    Feb 18,2006 at 9:08 pm

    Okay, the conversion is complete! Sorry about the delay, it took much longer than I anticipated. But we’re now running with unicode support so we can truly handle all languages.

    There are a few areas where you may see broken links: search results and the browse artists/labels pages. I am in the process of rebuilding those and they should be up to date within 24 hours.

    Special thanks to [u=mjb] for his help in getting me up to speed on unicode!

  • Feb 17,2006 at 7:54 pm

    my maths is no good, is this about to happen really soon?

  • Feb 17,2006 at 6:51 pm

    oooo yeah! So this should fix the probs with accented characters on the new submission form. Schweet!

  • Feb 17,2006 at 5:25 pm

    Cool!
