Discogs will be in read-only mode for several hours – 17-Feb
Discogs will go into read-only mode for several hours tonight beginning around 9pm (PST). When the site is in read-only mode you will be able to browse and view pages but not change any data (no submissions, updates, modding, forum posting, etc.).
During this time I will be performing a much needed conversion to make Discogs support Unicode. This will allow us to support all languages in the future. For more about Unicode check [url=http://en.wikipedia.org/wiki/Unicode]wikipedia[/url].
[url=http://www.discogs.com/release/155929]something Japanese[/url]
[url=http://www.discogs.com/label/Tropic+Records]Tropic Records[/url]
[url=http://www.discogs.com/search?start=20&type=all&q=i!%C3%89]#21[/url]
[url=http://www.discogs.com/search?type=all&q=s%E2%88%9A_&btn=Search]SA|d[/url]
>>> O[b]$ka[/b]
>>> [url=http://www.discogs.com/artist/Numer+Raz]Nastukafszy = track 21 length[/url]
[i]1 Oka Przez [b]$[/b] Jak… (Intro)
2 Tak Zapomnie[b]æ[/b]
Featuring – Mes , Trish (3)
3 Muzyka
Featuring – Borixon
4 Przesi[b]¹[/b]k[b]³[/b]em Tym
Featuring – Onar
5 Skit
6 [b]£[/b]apie Chwile
Featuring – Numer Raz
7 By[b]æ[/b] Sob[b]¹[/b]
Featuring – Selma
8 Milion
Featuring – Jano (2)
9 Skit 22
10 Bezele Kochanie
Featuring – Kie[b]³[/b]basa , Ko[b]³[/b]cz , Tede
11 Prze-Ziomy
Featuring – Stasiak
12 W Tych Klubach
Featuring – Cayra , Teka
13 Skit
14 Czas To Zmieni[b]æ[/b]
Featuring – Pezet , Stasiak
Scratches – DJ Homer (2)
15 Miêdzy S[b]³[/b]owami[/i]
>>> Mie[b]�[/b]cie
Polish can’t be this weird. Here are the accents that look odd
to me:
>>> ¯
>>> ¹
>>> Nienawi[b]æ[/b]
>>> ³
>>> Uwa[b]¿[/b]aj
[url=http://www.discogs.com/artist/W%C2%B3odi]have a tylenol[/url]
it seems widespread actually
before the unicode update when main artist entry was Garcons it was ok and created no link problems to actually credit releases to Garçons
now the releases still get listed on the artist page but the links are broken
same seems to apply to accents and stuff: i’ve found a Antonio Carlos Jobim appearance listed with accents, and since the main entry is without accents the link is broken
this is potentially a big problem since for the longest time artist entries were accepted with or without special characters …
also see Roisin Murphy: releases are listed under artist entry w/o accents
when you open a release that’s been entered as Róisín Murphy the artist link opens a new page … where the release is actually not listed …
http://www.discogs.com/artist/Roisin+Murphy
open: http://www.discogs.com/release/470905
click artist – opens: http://www.discogs.com/artist/R%C3%B3is%C3%ADn+Murphy
[a=Garcons] => http://www.discogs.com/release/255848
[a=Marie Et Les Garcons] => http://www.discogs.com/release/425333
Garçons & Marie Et Les Garçons are having problems too (still? again?)
artist pages are Garcons , all releases are listed fine but they display Garçons and link doesn’t work ???
is it ok to update all releases with problems to Garcons and then try to rename artist entries?
another eg, a bit like the Pele one above:
Releases on [a=Mondeé Oliver]’s page are now split between there and [a=Mondee Oliver] – merge pending, easy to fix. just a heads-up, in case there are other examples where accented names have been separated or merged in the conversion process…
yea, I just caught a glimpse of it. thanks though.
à;GRUMH… is fine: check available release pictures for confirmation
[url=http://www.discogs.com/artist/%C3%A0;GRUMH…]à;GRUMH…[/url]
I’m assuming this is not the artist’s name. (sometimes it is not always obvious)
Pele, Pelé and Pélé somehow got combined into the single [a=Pele] entry. The artist names show up correctly on the individual release pages, but the [a=Pele] discography page lists all the releases featuring the three separate artists together.
guess it’s ok – must have cleaned all the mess i contributed to a couple days ago, now – thanks djpc and teo ;-)
LOL – look at the first link, people aren’t paying enough attention to updates – the credit was spelt differently to the artist.
(and now links don’t need special formatting – this place gets weirder by the minute…)
updates are back it seems
yet some releases seem to suffer still from unicode bug
is it ok to try and correct through updates or what?
see:
http://www.discogs.com/release/209622
http://www.discogs.com/release/235283
http://www.discogs.com/release/204708
and some other on that artist page
oops, 92 is a [i]right[/i] single quote :)
Some of the problems are coming from the way the conversion was done, and (IMHO) buggy browsers that conflate Windows-1252 and ISO-8859-1.
Like in that [a=Neo & Farina] profile, there are curly quotes that were submitted as Windows-1252 printable character bytes that correspond to non-printing control codes in ISO-8859-1. When teo converted them to UTF-8, he converted them assuming they were ISO-8859-1, so they’re now the UTF-8 bytes for the control codes.
For example, byte 0x92 is a left single quote in Windows-1252 and a control code in ISO-8859-1. When converted to UTF-8 it became 2 bytes: C2 92, which still indicates that same control code. In order to represent a left single quote, it needs to be 3 bytes: E2 80 99.
teo should run another conversion pass to fix these kinds of errors. Basically, any pair of bytes C2 8x or C2 9x needs to have the C2 dropped and the remaining byte converted to Unicode from Windows-1252, and then that value should be encoded as UTF-8.
This will fix the problems with the characters highlighted in yellow [url=http://en.wikipedia.org/wiki/Windows-1252]here[/url].
As for the other problems, please keep posting examples so we can diagnose.
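The second pass described above could look roughly like this in Python. This is only a sketch of the idea, not the script teo actually ran: `repair_c1` is a made-up name, and it assumes the text has already been decoded from UTF-8 into a string, so the C2 8x / C2 9x byte pairs show up as the characters U+0080 to U+009F. Each one is mapped back to its single byte and re-read as Windows-1252 (0x92 comes out as the right single quote, U+2019). The few byte values that Windows-1252 leaves undefined are left untouched.

```python
import re

# U+0080..U+009F: the control codes you get when Windows-1252 bytes
# 0x80-0x9F are wrongly decoded as ISO-8859-1.
C1_CONTROLS = re.compile("[\u0080-\u009f]")


def repair_c1(text: str) -> str:
    """Re-read stray C1 control characters as Windows-1252 (illustrative helper)."""

    def fix(match: re.Match) -> str:
        raw = bytes([ord(match.group())])  # back to the original single byte
        try:
            return raw.decode("cp1252")    # e.g. 0x92 -> U+2019
        except UnicodeDecodeError:
            return match.group()           # undefined in Windows-1252: keep as-is

    return C1_CONTROLS.sub(fix, text)


# What the first conversion left behind: bytes C2 92 instead of E2 80 99.
broken = b"it\xc2\x92s".decode("utf-8")
fixed = repair_c1(broken)
```

Encoding `fixed` back to UTF-8 gives E2 80 99 for the quote instead of C2 92. The same pass would turn the C2 95 in the Pan American link back into a real bullet.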
I see the little blocks in place of characters all over the place, in profiles, in comments….. they’re f’in everywhere!!
using Windows
“Intercord Tontr䧥ger GmbH”: http://www.discogs.com/changereq?id=1448818
Shows as “Intercord Tontr䧥r GmbH” in IE.
Codation = ‘Unicode (UTF-8)’
[url=http://www.discogs.com/label/Intercord+Tontr%E4ger+GmbH]Intercord Tonträger[/url] That becomes a “?” in my browser (Safari)
http://www.discogs.com/label/Töshöklabs
how much longer do i have to put up with this () crap?
[i]I know I’m going to forget what I wanted to update by the time it comes up…[/i]
So make a list containing which releases you want to update and what you want to correct.
I got several lists:
[b]Updates[/b]
list of direct links to change requests I submitted, containing pending ones as well as those rejected and cancelled which need to be resubmitted with correction
I can check them through at any time, rejected updates aren’t lost.
[b]ToDo[/b]
list of things I want to check/update, like
– artists: is it the right one for the release/remix, are there aliases/groups missing in the profile, should it be split
– labels: correct cat#s, move releases to sublabels/proper label, check all releases for necessary updates
– users: check contributed releases for mistakes, caused e.g. by websubmission or inexperience
– releases: direct links to release pages with notes what I want to update
So I don’t forget what I want to change at the next opportunity.
[b]Update-Guesses[/b]
list of direct links to release pages that should be checked by release owners
Main issues are wrongly truncated cat#s, distribution codes used as cat#s and potential duplicate releases caused by promos/white labels/multiple labels+cat#s
Any idea when updates will be available again? I just wanted to do some regular updates and image adding and I know I’m going to forget what I wanted to update by the time it comes up…
My pending and draft releases were all mixed up, because there was character instead of [b]SOME OF[/b] the accented characters (I noticed it certainly for [b]ž[/b] and [b]č[/b]).
I had to edit all of my pending / draft entries! :(
Still not sure with accepted releases / artists…
Still updates blocked right?
pan american is still broke. see my post 7 up from here for the links.
(incidentally, IE shows it as “Pan(square thing)American”, Firefox as “PanAmerican”, but neither have a problem with the • when i type it here).
Those 45 artists are updated and working correctly now, same with the Sähkö label. I’m going to leave release updates disabled for a while longer just in case there are any others that missed the conversion.
I’ve also put in redirect code so you can still use the search and browse links.
Jooles, thanks, yes I’m aware of that. None of the browse data has been converted and I was planning to regenerate all of it since it’s more of a side part of the database; it has no impact on the real artist entries. Same with search. For now the temporary redirects will still let you click those links.
jasmithers and langster, I don’t see any funny characters like you mention. Any chance your browser is set to a different encoding? I suggest you leave it on default or auto-detect.
[u=teo]: just in case you’re not already aware of this; searching via the alphabetical list of artists also shows loads of broken artist links with strange characters – much more than 50. Probably more like 2000. If you choose the last link on each of the letters of the alphabet, that’s where most of the accented characters appear. [url=http://www.discogs.com/artists/G?start=11200]Example for the last page of artists beginning with G[/url].
Good luck with the fixes…
aye. and i can see the dot on my above post, but not on the pan american artist page, and it was there before the update (or at least, a few weeks ago).
you probably need a browser which is able to interpret the unicode
properly. Some browsers have an obsolete interpreter.
[url=http://www.discogs.com/artist/Pan%C2%95American]Pan American[/url] is broke at the moment. should be, and was, ‘Pan•American’
just in case this isn’t one of the 45 :)
Opera 8 and seeing squares (luckily not everywhere ;-)
Yup that is what I had yesterday in place of the £ symbol.
here with Firefox 1.5.0.1 on mac with unicode UTF-8 i get fancy A in squares where ” should be
Hmmmmm I am using Windows. and Copied and pasted the above, so obviously see little squares. :S
@langster, I don’t see any squares, just a 12 without a ” ????
(using Firefox 1.0.7 on Win XP).
@teo
I know this should really be put in the SELLING forum but I think it applies here as well..
Check my postage prices out here…
[b][i]U.K. – £2.75 for the 1st 12 + £1.00 for each additional 12 (1st Class Recorded Post)
EUROPE – £3.80 for the 1st 12 + £1.20 for each additional 12 (Airmail)
U.S.A. & CANADA – £4.50 for the 1st 12 + £2.00 for each additional 12 (Airmail)
REST OF WORLD – £5.00 for the 1st 12 + £2.00 for each additional 12 (Airmail)[/i][/b]
Those Little squares are where the [b]”[/b] used to be, to indicate a 12″ record. This has happened since your unicode update. Also yesterday I noticed my pound signs changed to a strange Russian symbol but appear to have sorted themselves out now.
Would you like me to fix the boxes back to ” ? Or has anyone else noticed this problem? And will you be making a fix for it?
Secondly. I put soundclips in my item description obviously using a HTML code link. I have noticed if I want to edit my item in any way THE LINK IS THEN LOST. And I only get this part of the link left <a href= then the rest is lost. No matter if I change anything or not. So If I go in to change ALL my prices and lower them by a pound. Then I would have to RE-ENTER every single mp3 link again, and everything that was written after the link as well. :S
If you kill me the queue will be much more manageable sure, and would do a favour to my wife ;pp
I see purity – and with one like me always updating of course ;p
Still “Updates Temporarily Disabled” For how long?
also, the ‘items for sale’ pages don’t represent all characters properly
yet.
clicked oops :)
I spotted a small other thing but I think I remember it being there before
the changes:
http://www.discogs.com/artist/Ja%C3%AFa
the ‘e accent aigu’ on one of the visible titles is displayed as ‘?’
It displays properly when the release is clicked though.
All of the releases listed above are now displaying correctly. Now I’m cleaning up the 45 artists and Sähkö label (I don’t yet know what happened with that).
Again, don’t worry about search problems. That will be resolved soon as well. thanks
I think teo disabled it temporarily. see above.
Yes, U+FFFD (hex FFFD, the Unicode replacement character) is standing in for several malformed UTF-8 sequences.
Hopefully teo’s comment is true – that the script just missed some DB
entries.
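For anyone wondering where the FFFD comes from: a lenient UTF-8 decoder swaps every byte sequence it cannot parse for U+FFFD. A small Python sketch, assuming (this is a guess, not something confirmed in this thread) that some rows are still ISO-8859-1 bytes being read as UTF-8. `decode_lenient` and `count_replacements` are illustrative names only.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for anything malformed."""
    return raw.decode("utf-8", errors="replace")


def count_replacements(text: str) -> int:
    """How many characters were lost to U+FFFD (handy for finding bad rows)."""
    return text.count(REPLACEMENT)


# An entry that was never converted: still ISO-8859-1 bytes.
unconverted = "Sähkö".encode("latin-1")   # b'S\xe4hk\xf6'
shown = decode_lenient(unconverted)       # both accented letters are lost

# A properly converted entry decodes cleanly.
converted = "Sähkö".encode("utf-8")
clean = decode_lenient(converted)
```

Scanning for rows where `count_replacements` is above zero would be one way to list entries the script missed, though the original bytes are needed to actually repair them.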
here too…
just leave the browser’s window open to retry in a few… hours? hopefully.
When I’m trying to do an update it says:
‘[b]Updates temporarily disabled.[/b]’.
I’ve been updating all day, the message only appeared now.
Is this normal?
(btw, i’m at 100 pendings – i don’t know if it has something to do with it. probably not, been to 107 earlier this day)
Also the [url=http://www.discogs.com/label/S%C3%83%C2%A4hk%C3%83%C2%B6]Sähkö label[/url] seems to have been toasted. The individual releases seem to be showing Finnish characters, but not the label itself.
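That URL looks like the name went through the ISO-8859-1 to UTF-8 conversion twice: %C3%83%C2%A4 is the UTF-8 encoding of the two characters "Ã¤", which are what the UTF-8 bytes for "ä" look like when read as ISO-8859-1. A hedged Python sketch of peeling off one extra layer follows. The helper name `undo_double_utf8` is hypothetical, and whether this is really what happened to the label is an assumption.

```python
from urllib.parse import unquote


def undo_double_utf8(text: str) -> str:
    """Undo one extra ISO-8859-1 -> UTF-8 pass, if the text looks double-encoded."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not cleanly so): leave it alone.
        return text


# The label name as it appears in the broken link.
from_url = unquote("S%C3%83%C2%A4hk%C3%83%C2%B6")
repaired = undo_double_utf8(from_url)
```

Running the helper on a name that is already correct returns it unchanged, because the intermediate bytes are not valid UTF-8.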
ah, thanks Jooles, right, please don’t submit updates to these. I’ll be fixing them.
Thanks for all the feedback!
UPDATE: There are a few releases in the 200,000 – 240,000 range that did not get converted. I’m not sure why because my scripts covered the complete range. But I’m trying to figure out the range of what was not converted and will update them then.
45 artists were not converted either (including Royksopp) because the converted data would conflict with other artist entries. I am updating those by hand.
Finally, the search engine is going to have a few broken links so don’t worry about it now. I am updating it later today.
donnacha: the search engine doesn’t like special characters.
it is because of the umlaut. another example: type this into the engine:
[b]é[/b]
[u=julesparis] (and any others who’ve been submitting updates):
I think it’s best to hold off submitting any updates to individual releases containing artists with accented characters who appear wrong. Let’s wait for teo to try to iron out the bugs with this new upgrade and then see if any minor updates need to be made…
They are going to have to parse through the whole database and ‘find and
replace’ for many characters if this situation is correct.
here is another: http://www.discogs.com/artist/Asia+2001
in this case the circumflex works with lower case ‘a’ but not ‘A’ it seems.
I am thinking now that the issue is rather deep. I think it has to do
with the way that the characters were input into the original submissions.
They were no doubt cut and pasted from different places and the hexadecimal is
getting translated differently in these circumstances.
like in my example: http://www.discogs.com/label/Stoneage+Records
two of the Orembo (whatever it is) have the proper umlaut, one does not…
The back-up may be too large and unwieldy to actually handle.
But yeah, certain characters – perhaps it is so. Though the umlaut
works fine in some cases and not for others, so I think it is subtler
than that.
Only some special characters are broken. Must be the way they were entered in the first place.
Should have been run on a backup of the database first – to iron out obvious problems like this.
actually these guys are doing a very good job. There are really only a
very few little things to clean up. I actually did not know what the DB
upgrade was about, but I guessed it was a change having to do with
Unicode when I was browsing my items and several characters were
all ‘messed up’. At several points yesterday the situation was way, way worse. It looks like the Scandinavian character sets are in need of
some fixing and the ‘umlaut’ is perhaps still an issue. I noticed mjb
in another topic, and I suspect he is a clever fellow capable of resolving
any residual bugs expediently.
Actually, [a=Motörhead] and [a=Crüxshadows, The] pages are ok except via the search engine.
[r=494677] Artist Page Link does not work. I think everything with an ö,ä,ü (German Umlaut) is a broken link.
Previously, accented letters were treated like the same letter without an accent, IIRC.
Entering a credit for “Chateau Flight” would lead to the page for “Château Flight”, but it seems that this is not the case now, which makes checking artist links somewhat difficult, because some accents are inconsistent/redundant.
For example, Marcos Lopez from http://www.discogs.com/artist/Marmion will mostly be listed without an accented “o”, but the artist link for “Marcos Lopez” is currently “empty”, because it has to be entered as “Marcos López”
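The old accent-insensitive behaviour could be approximated with a lookup key that strips accents before comparing names. This is just a sketch of the general technique in Python, not how Discogs does or did it. `fold_accents` is an invented name, and it only handles letters that decompose into a base letter plus a combining mark (so "ø" stays as it is).

```python
import unicodedata


def fold_accents(name: str) -> str:
    """Build a comparison key: decompose, drop combining marks, ignore case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Both spellings of a credit end up with the same key.
same_artist = fold_accents("Marcos López") == fold_accents("Marcos Lopez")

# The flip side: genuinely different names collapse to one key as well.
collision = {fold_accents(n) for n in ["Pele", "Pelé", "Pélé"]}
```

The second example is the trade-off: matching on the folded key makes "Chateau Flight" find "Château Flight", but it also lumps Pele, Pelé and Pélé together, so the key is only safe for suggesting links, not for deciding that two artists are the same.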
yep
as above
Röyksopp is a big mess as well, when you eventually get on the artist page, there’s no way back from a release: clicking on the artist link leads back to “wrong page”
this is one major fuck up – loads of artists end up with broken links: Âme, Château Flight …
[url=http://www.discogs.com/artist/Sven+V%E4th]Sven V?th[/url]
And the search-engine still doesn’t understand special characters.
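The %E4 in that link is "ä" as a single ISO-8859-1 byte, which is not valid UTF-8 on its own, so a site that now expects UTF-8 in its URLs cannot read it. A short Python illustration using the standard `urllib.parse` functions (the variable names are mine, and how the site's own redirect code works is not something I know):

```python
from urllib.parse import quote_plus, unquote_plus

old_link = "Sven+V%E4th"  # path segment from the pre-conversion link

# Read the old way: the byte E4 is "ä" in ISO-8859-1.
as_latin1 = unquote_plus(old_link, encoding="latin-1")

# Read as UTF-8 (the default): E4 is an incomplete sequence,
# so it is replaced by U+FFFD.
as_utf8 = unquote_plus(old_link)

# What the link needs to be once everything is UTF-8.
new_link = quote_plus(as_latin1)
```

A redirect that tries UTF-8 first and falls back to ISO-8859-1 when the result contains U+FFFD would cover old bookmarks like this one.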
http://www.discogs.com/release/240418
Finnish is a weird language; it is no wonder…
EXO CD28
spotted another little unicode bug:
http://www.discogs.com/label/Exogenic+Records
it is the third to last entry iirc.
Existing entries are having some difficulties…
[url=http://www.discogs.com/release/200582]for instance[/url]
Is all the info lost for these?
I knew it would take longer. Excellent idea and outcome though.
I meant “fo’ evah, yo.
that felt like o’ evah, son!
word ’em up
Okay, the conversion is complete! Sorry about the delay, it took much longer than I anticipated. But we’re now running with unicode support so we can truly handle all languages.
There are a few areas where you may see broken links: search results and the browse artists/labels pages. I am in the process of rebuilding those and they should be up to date within 24 hours.
Special thanks to [u=mjb] for his help in getting me up to speed on unicode!
my maths is no good, is this about to happen really soon?
oooo yeah! So this should fix the probs with accented characters on the new submission form. Schweet!
Cool!