Android media service should always read tags in OGG/AAC(M4A) as UTF-8 [37013213]

Obsolete

Bug

[AOSP] Version-Opensource

Status Update

No update yet.

Description

yu...@gmail.com

created issue #1

Dec 2, 2014 05:46AM

There are some existing reports regarding the text-encoding issues in music tags:

http://code.google.com/p/android/issues/detail?id=2688

http://code.google.com/p/android/issues/detail?id=2777

http://code.google.com/p/android/issues/detail?id=2930

http://code.google.com/p/android/issues/detail?id=12263
The symptom is common to all of them: Japanese/Chinese song info gets garbled.

I'm starting a new issue here because I think the *root cause* is that the Android media service tries to detect the text encoding when reading tags. Whether that is correct for MP3 (particularly with ID3v1) is open to discussion, but there is no doubt it's wrong for OGG and M4A.

The text encoding for OGG/MP4 tags is defined in the standard: it should always be UTF-8. There is no reason to detect the encoding; just force UTF-8.
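A minimal Python sketch of the "no detection" rule for these containers. The function name `decode_fixed_utf8_tag` and the container labels are hypothetical, chosen only for illustration; this is not the media scanner's actual code.

```python
# Containers whose tag text is specified as UTF-8 (labels are illustrative).
UTF8_ONLY_CONTAINERS = {"ogg", "mp4"}


def decode_fixed_utf8_tag(container: str, raw: bytes) -> str:
    """Decode a raw tag value without any charset guessing."""
    if container not in UTF8_ONLY_CONTAINERS:
        raise ValueError(f"encoding for {container!r} is not fixed in this sketch")
    # Invalid bytes become U+FFFD instead of triggering a guess.
    return raw.decode("utf-8", errors="replace")
```

With this rule a badly encoded file shows replacement characters, but a correctly encoded one can never be misread.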

For MP3 ID3v1, the common practice is to always read tags in the native encoding of the current locale. I'm not an expert, but Wikipedia (http://en.wikipedia.org/wiki/ID3) mentions that ID3v2 supports a user-defined text encoding. I don't know whether this feature is commonly used in practice, but when it is used, the media service should respect it rather than detect the encoding.
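To illustrate what respecting the declared encoding could look like: an ID3v2 text frame body starts with one text-encoding byte. ID3v2.4 defines four values for it, and ID3v2.3 only the first two. The Python sketch below honors that byte; the function name is hypothetical, and frame header parsing is assumed to have happened already.

```python
# Text-encoding byte values defined by ID3v2.4 (ID3v2.3 only defines 0 and 1).
ID3V2_ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with BOM
    2: "utf-16-be",  # UTF-16BE without BOM
    3: "utf-8",
}


def decode_id3v2_text_frame(body: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    if not body:
        return ""
    codec = ID3V2_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown ID3v2 text encoding byte: {body[0]}")
    text = body[1:].decode(codec, errors="replace")
    # Frames may carry trailing NUL terminators; drop them.
    return text.rstrip("\x00")
```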

Comments

yu...@gmail.com <yu...@gmail.com> #2 Dec 2, 2014 08:01AM

Spent some time looking into this issue.
This should be regarded as a *design flaw*.

Currently the Android media service consists of 2 parts, the service and the client.
The service only retrieves the raw data; the client then handles the raw data and converts it to UTF-8 if it doesn't think it is already UTF-8.

https://android.googlesource.com/platform/frameworks/av/+log/master/media/libmedia/MediaScannerClient.cpp
I can see in the commit history that efforts have been made to address this issue, and I really appreciate it. But it's the wrong approach.

As I suggested, the encoding can be decided without looking into the content. Right now the client is not aware of the file format, so on the client side there is no way to be confident whether the tags are already in UTF-8.

I understand the fact that "Id3 tags are supposed to be ISO-8859-1 or unicode, but often aren't.", but that's only true for ID3. Please note that MP4/OGG tags are always expected to be UTF-8.

I don't understand the client/server design for this. It seems that the client is for database access and the server is for scanning, and that they are supposed to work with any component that complies with the client/server interface. I see no point in that: no one expects multiple such clients or multiple such servers on a running system.

Anyway given the current design, I would suggest you add an extra field to the scanning interface, indicating the utf-8 confidence (bool) of the tags.
Like:
Client {
    // third argument: whether the value is known to be UTF-8 (bool)
    addStringTag(tag, value, utf8Confident);
}
MP4server/OGGServer {
    client.addStringTag(tag, value, true);
}
MP3server {
    client.addStringTag(tag, value, false);
}
(MP3 is arguable, because I think ID3v2 text information frames that a tagger wrote with a non-zero text encoding field should be considered confident.)

The latest commit seems to move the conversion from storing to acquiring (I haven't checked the Java part, but from the commit log it seems to be so). In that case the UTF-8 confidence flag should be stored in the database as well.
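The confidence-flag idea can be sketched in Python as follows. Everything here is hypothetical (class name, method names, and the `latin-1` fallback that stands in for whatever detection or locale-based decoding is used today); it only shows the flag travelling with the raw bytes into storage and being consulted at read time.

```python
from dataclasses import dataclass, field


@dataclass
class ScannerClient:
    """Hypothetical client: stores raw bytes plus the scanner's confidence flag."""
    rows: dict = field(default_factory=dict)

    def add_string_tag(self, name: str, value: bytes, utf8_confident: bool) -> None:
        # Keep the raw bytes and the flag so conversion can happen at read time.
        self.rows[name] = (value, utf8_confident)

    def get_string_tag(self, name: str, fallback_codec: str = "latin-1") -> str:
        value, utf8_confident = self.rows[name]
        if utf8_confident:
            return value.decode("utf-8", errors="replace")
        # Stand-in for the existing detection or locale-based decoding.
        return value.decode(fallback_codec, errors="replace")


def scan_ogg_or_mp4(client: ScannerClient, tags: dict) -> None:
    # These formats specify UTF-8, so the scanner reports full confidence.
    for name, value in tags.items():
        client.add_string_tag(name, value, True)


def scan_mp3(client: ScannerClient, tags: dict) -> None:
    # ID3 tags are reported as not confident in this sketch.
    for name, value in tags.items():
        client.add_string_tag(name, value, False)
```

Storing the flag next to the raw bytes means a later change in the fallback policy does not require re-scanning the files.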

hb...@gmail.com <hb...@gmail.com> #3 Dec 2, 2014 11:11AM

We appreciate your frustration, but please don't create duplicate issues - it makes it harder to see the true number of bugs and may actually hinder any fix.

We can't accept patches here for legal reasons. Please follow the instructions on

http://s.android.com/source/submit-patches.html

yu...@gmail.com <yu...@gmail.com> #4 Dec 2, 2014 11:34AM

I don't use google code too much for project management but it seems the tracker is just a simple one without link features.

I don't think this is a duplicate issue, since it addresses the root cause rather than the symptoms. Maybe you're right and that's how the AOSP community treats issues, but please keep tracking this in this ticket, simply because once the root cause gets fixed the symptoms will be gone as well.

I haven't created a patch yet. I know the patch instructions and will do it that way.
I just came to post the suggestion, hoping some active developers can join the discussion.

Thanks.

en...@google.com <en...@google.com> Dec 3, 2014 01:22AM

Assigned to ma...@google.com.

ma...@google.com <ma...@google.com> #5 Dec 3, 2014 10:35PM

Unfortunately it's simply not true that OGG doesn't suffer from badly encoded tags (we've seen OGG files with ISO8859-1 encoded tags, for example).
Can you absolutely, 100% guarantee that M4A files don't also suffer from this issue?

yu...@gmail.com <yu...@gmail.com> #6 Dec 3, 2014 11:28PM

AFAIK, there are 3 common sources of M4A files: the iTunes Store; CDs ripped with iTunes/foobar2k (tags managed by the application rather than the encoder); and CLI encoders (tags managed by the encoder itself; the popular ones are faac/neroaac/qaac/qtaacenc/fhgaac/fdkaac). None of the above yields badly encoded tags.
There are also taggers that might change the properties of M4A files, like mp3tag or Windows Explorer (file properties); no issues there either.

I don't know how OGG files are commonly produced, because I only use fb2k+oggenc, but at least there is no issue with this combo.

Bad encoders/applications might exist for M4A as well. After all, the open-source world is so fantastic that even though OGG has UTF-8 in its standard there are still ISO8859-1 encoded tags (if what you said is true), so it wouldn't surprise me if someone with little or no knowledge decided to contribute an open-source M4A tagger/manager that yields badly encoded tags.

However, in my opinion, we shouldn't care about bad applications and bad files. With the current approach (auto-detecting the text encoding), no one will be satisfied, because the detection can't be 100% correct. But if we follow the standard, those who follow the standard and use standard-compliant applications will be satisfied.
This is a trade-off, but I would vote for following the standard.

You may say people might complain because their OGGs with badly encoded tags won't be displayed `correctly', but in that case the OGGs can't be displayed the way they want in any of the common music applications we could name either. So it's the file's fault, and the user's fault for providing wrong files, not the fault of the application (media server).

MP3 is open to discussion, because I haven't used it as my main format for a long time (I switched to OGG/M4A). I will do some investigation to see what the facts are. The main concern for MP3 is this:
Assume we have 2 applications, A and B.
A supports UTF-8, encodes tags in UTF-8, and sets the encoding byte to UTF-8.
B doesn't support UTF-8. The user uses B to modify the tags; now the tags are encoded in the native encoding, while the encoding byte is left unmodified (still UTF-8).
So in this case the encoding byte can't be trusted even when it's set to UTF-8.

However, my current opinion for MP3 is the same as for OGG/M4A: follow the standard. That means using UTF-8 when the tag is marked as UTF-8, and using the native locale when it's not.

The principle is this:
Following the standard means people who also follow the standard are *never in trouble*, and it warns/reminds those who don't, so that ****they can be satisfied after changing the tags manually with right-doing applications****. The behavior is predictable.
Trying to be `smart' only hurts, in that no one will always be satisfied. And once a problem occurs, there is not even a simple way to fix it. Please tell me how I can be satisfied if I have a song with misinterpreted tags? Should I look for some application to tag it in the wrong text encoding and put it back on my device? And what if it still fails? Try another text encoding? What if they all fail? Do I then have to change the text itself in the tag field? This definitely makes no sense.

yu...@gmail.com <yu...@gmail.com> #7 Dec 4, 2014 12:02AM

I have thought over the solution. I think one possible way that makes sense is adding a Unicode BOM on the server side, with the client side doing nothing except storing to the database.
So if the file format is OGG/M4A, the server simply adds a UTF-8 BOM before copying the text field. If the file format is WMA, we add a UTF-16 LE BOM (M$ loves it very much).

The advantages of doing so are:
1. We retain the original info in the database, so that if the user changes the system locale (like Chinese -> Japanese), no re-scan is needed.
2. If it's UTF-16 LE (WMA), no conversion is needed before converting to a JString.

This, however, needs a change to the interface, because there are NULL bytes in a UTF-16 string. So for now we can convert UTF-16 to UTF-8 instead.
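A rough Python sketch of the BOM idea, assuming the interface could carry arbitrary bytes (which, as noted, it currently cannot for UTF-16). The function names, the container labels and the `latin-1` fallback are all hypothetical.

```python
import codecs


def tag_with_bom(container: str, raw: bytes) -> bytes:
    """Scanner side: prefix the raw value with a BOM naming its encoding."""
    if container in ("ogg", "mp4"):
        return codecs.BOM_UTF8 + raw
    if container == "wma":
        return codecs.BOM_UTF16_LE + raw
    return raw  # encoding not known from the format: leave untouched


def decode_by_bom(stored: bytes, fallback_codec: str = "latin-1") -> str:
    """Reader side: pick the codec from the BOM, else use a fallback."""
    if stored.startswith(codecs.BOM_UTF8):
        return stored[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
    if stored.startswith(codecs.BOM_UTF16_LE):
        return stored[len(codecs.BOM_UTF16_LE):].decode("utf-16-le", errors="replace")
    # No BOM: the scanner could not vouch for the encoding.
    return stored.decode(fallback_codec, errors="replace")
```

Because the stored bytes are never converted, changing the fallback (for example after a locale change) only affects values that carry no BOM.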

de...@gmail.com <de...@gmail.com> #8 Dec 4, 2014 11:13AM

Just to add that this is also a problem with FLAC files, which are also specified to use UTF-8 encoded tags. Since Android 5.0, FLAC file tags containing non-ASCII characters are likely to display incorrectly. An example file that illustrates the problem is attached. The tag includes a 'right single quotation mark' UTF-8 code: 0xE2 0x80 0x99 (e28099). Depending on the other text present in the tag, it may or may not be interpreted correctly.

(attachment deleted)
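For reference, the three bytes quoted above can be decoded both ways in Python. Windows-1252 is used here only as an example of a wrong single-byte guess; the thread does not say which encoding the scanner actually picked.

```python
raw = bytes([0xE2, 0x80, 0x99])  # the byte sequence quoted in the comment

as_utf8 = raw.decode("utf-8")     # U+2019 RIGHT SINGLE QUOTATION MARK
as_cp1252 = raw.decode("cp1252")  # three unrelated characters (mojibake)

print(repr(as_utf8), repr(as_cp1252))
```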

yu...@gmail.com <yu...@gmail.com> #9 Dec 6, 2014 11:46AM

I started a fork on GitHub for better tracking of the implementation of the change.

https://github.com/yumeyao/android_platform_frameworks_av
(the issue tracker of googlecode is kind of lame)

Anyone who is interested in following the standard, please refer to issue 36903751 to help me with research on the standards:

https://github.com/yumeyao/android_platform_frameworks_av/issues/2

ra...@gmail.com <ra...@gmail.com> #10 Feb 3, 2016 04:33PM

I've been bitten by this bug MANY times, and I still am under Android 6. My MP3 and OGG files have properly encoded tags in UTF-8, but in order for the metadata to appear properly in Google Play Music and other players which use the library produced by Media Scanner, I have to use ID3v2.3 encoded as UTF-16 for most of them, and ID3v2.4 encoded as UTF-8 for others! Sometimes I have to tweak this for songs belonging to the SAME ALBUM!!!

Ogg files are handled incorrectly most of the time, because Media Scanner ignores the encoding (UTF-8). Always guessing the encoding without first checking whether it's valid UTF-8, as the standard says, is ridiculous. You're causing problems for files that have proper metadata in order to support files with badly encoded metadata. Great. Great, great.
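A minimal Python sketch of "check for valid UTF-8 first, guess only afterwards". The function name and the `latin-1` fallback are placeholders for whatever detector or locale-based choice would really be used.

```python
def decode_utf8_first(raw: bytes, fallback_codec: str = "latin-1") -> str:
    """Accept the bytes as UTF-8 when they are valid; guess only otherwise."""
    try:
        return raw.decode("utf-8")  # strict: raises on any invalid sequence
    except UnicodeDecodeError:
        # Placeholder for a real detector or a locale-based choice.
        return raw.decode(fallback_codec, errors="replace")
```

Text in legacy multi-byte encodings rarely happens to be valid UTF-8, so this check costs little for badly tagged files while protecting correctly tagged ones.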

But more importantly: I have a file whose metadata ALWAYS gets misinterpreted, no matter whether I use ID3v2.3/UTF-16, ID3v2.4/UTF-8 or Vorbiscomments. I can give you the offending file if you want. Not here, as I can't distribute music, but I can prepare a silent file with the offending metadata. But please, fix this, as this problem DOESN'T happen on an iPhone or if you use players like PowerAmp. So it can be done, and Android is lagging behind.

so...@gmail.com <so...@gmail.com> #11 Feb 3, 2016 05:40PM

Suppose I do want to waste a full-night's worth of CPU/Disk time... is there *anything* I can do to my tags that will make them render correctly?

Or are there any decent music players on Android that don't use this broken service?

ra...@gmail.com <ra...@gmail.com> #12 Feb 3, 2016 05:55PM

Soumy,

in my experience, using ID3v2.3 encoded in UTF-16 with BOM is the combination which produces the fewest errors. Still, it produces errors: around 30 in my collection of 3100 songs. Not a big percentage, but still. There's no safe way, because Android actually ignores the specified encoding and tries to "guess" it.

PowerAmp uses its own music database and so far has always interpreted metadata correctly. Two problems, though: it's known to crash on some phones, and if you want to use it permanently you have to pay around 4 euros. Not a big amount, if you ask me, but I couldn't use it on my old Samsung SCL, for example, due to the repeated crashes... It's far from perfect, but regarding metadata it's AGES ahead of the results of Media Scanner.

ej...@gmail.com <ej...@gmail.com> #13 Aug 21, 2016 03:06PM

Does this screenshot help? I'm struggling with using "smart quotes" in my files. Compare tracks 21 and 24. Both of them have the exact same character in them, the smart right quote (also known as ALT+0146 or U+2019). For one track, 24, the right quote shows just fine. For the other, 21, well, you can see what happens (it should be "Luke Slater's 7th Plain"). This happens almost randomly, it seems: for some tracks it's fine, for others it's a no-go. Other "non-typical" characters, like the sharps and flats that appear a lot in the titles of classical music, work just fine. Accented letters can be a bit dicey; again, some work, others don't.

I get the same behavior whether these files are MP3 or OGG. Other media players that don't use Android MediaServer (like GoneMad) have no problem reading and displaying the tags correctly.

(attachment deleted)

ej...@gmail.com <ej...@gmail.com> #14 Aug 25, 2016 03:01PM

Another example...tracks 19-24 are all the same Cello Suite - all should say "in E(flat)". 19, 21, 23, and 24 are fine. For some reason the flat sign in 20 and 22 is mangled. Really struggling with why this is.

(attachment deleted)

81...@gmail.com <81...@gmail.com> #15 Mar 11, 2018 06:10PM

Still broken in Oreo.
The only reliable way I could get the media scanner to interpret the tags correctly was to put characters with unmistakable encoding in the comment field. For instance "Android fix アンドロイド". Both ID3v2.3 and ID3v2.4 seem to work with this fix. I also removed ID3v1 just in case.
This is a very annoying bug, especially when media scanner detects 5 different encodings in a single album and splits it into 5 differently named albums. I can't fathom how a regular user would deal with this.

ra...@gmail.com <ra...@gmail.com> #16 Mar 11, 2018 06:30PM

A regular user will deal with this by not using stock Android, but a vendor-fixed version. Many vendors fix this, as the fix is easy as you can see in the messages above. The only reason why this is still unfixed is probably because Google loves bugs. Really, that's the only sane reason I can think of for the lot of bugs with easy fixes that linger in the Android code base.

This won't be fixed in Oreo, in P or in any other version. If this bug bites you, your best option is not to use a music player which uses Android metadata. Instead of Google Play Music, use another player not affected by this bug; there are a couple of them. Or use an Android version fixed by the vendor. Other than that...

It's preposterous, right?

ej...@gmail.com <ej...@gmail.com> #17 Mar 11, 2018 07:15PM

Coming back to this after a few years...it's still an issue in the P developer preview.

I've seen a couple of interesting ways to try to deal with this. One is at https://github.com/beetbox/beets/issues/1893, which says they had more success when they replaced a character with a look-alike Cyrillic version (there are characters that look like "a", "e", and "o" in Cyrillic but they're different characters in Unicode). I tried this and did have some success with the Media Service. It was better, but I still had several files that would not work.

Since UTF-8 is the "official" charset for Android (https://developer.android.com/preview/behavior-changes.html), it seems like we'd want to enforce this in the Media Service rather than guessing.

This happens with Ogg Vorbis, Opus, and FLAC - the three formats I've been known to use. And I believe the standards for all three say to use UTF-8.

Perhaps a solution is to use the current "guessing" model for MP3, but don't "guess" for ogg/opus/flac?

Thank you...I know this impacts a small number of users, but after I spend so much time making tags accurate, this makes me a tad angry. :)

en...@google.com <en...@google.com> #18 Mar 11, 2018 07:22PM

#16: do you have specific examples of major OEMs who use different heuristics? that might help us motivate changing AOSP, given the Project Treble work.

ra...@gmail.com <ra...@gmail.com> #19 Mar 11, 2018 07:29PM

#18: I can't provide an exhaustive list of vendors, but Samsung does, for example. I don't know if this happens in their latest devices but in 2016 devices I tested they honored UTF-8 encoded tags if the encoding was explicit (ID3v2.4, Vorbiscomment) and they honored UTF16-LE (I think, sorry, I don't have my notes about this anymore) for ID3v2.3. So they weren't using the AOSP method, that's for sure.

As for other OEMs... I can't tell. My recent experience is with an OEM which uses an Android version with few add-ons, quite a pure experience very near to stock Android, with some usability improvements in place. And for the metadata they use whatever AOSP provides; this I know perfectly well because I have some albums in my collection which are interpreted with MORE than one encoding.

I don't currently have the resources to investigate many vendors, but I can try to ask around providing a couple of tailored MP3s to see if the metadata is correctly interpreted or not. This may take a lot of time for me, anyway, so I can't promise results soon.

Anyway, I'm with you: the best course of action would be to guess only as a last resort, when an MP3 has ID3 metadata that doesn't specify the encoding. For Vorbiscomment, ID3v2.4, opus, etc. this is UTF-8, so there is no need to guess the encoding.

Thanks for your attention :)

fi...@gmail.com <fi...@gmail.com> #20 Dec 8, 2019 01:40PM

This bug celebrated its 10th anniversary and it's still without a solution

ra...@gmail.com <ra...@gmail.com> #21 Dec 8, 2019 09:02PM

#20, yes, it's amazing, and fantastic. I can't believe this bug is still unfixed, especially when one of the reporters gave a solution YEARS ago. I'm tempted to call this part of AOSP names, and believe me, I don't insult software gratuitously, because even the worst pieces of software crap have a lot of work and a huge engineering effort behind them, but this one is... wow, unbelievable.

I suppose that this is one of those myriad stupid Android design flaws one has to learn to live with. Fortunately, there's a workaround: not using the AOSP music player or Google Play Music, but a decent third-party music player; there are a few worth trying.

Meanwhile, a new ton of stupid features will be added to Android for the next version and a ton of old bugs will go unfixed. But hey, AOSP doesn't cost a dime, so...

I can't believe it's been 10 years...

yu...@gmail.com <yu...@gmail.com> #22 Dec 9, 2019 06:45AM

Well, it's hard to believe it's still unfixed given that I already pointed out the root cause several years ago - but it seems it got rejected by the arrogant developer(?).
Never mind, I'm using a Samsung S10e and a Huawei P30 and neither of them has this issue.. so yes, just because they are Asian companies and they respect those users who do listen to music other than in English, right??

ra...@gmail.com <ra...@gmail.com> #23 Dec 9, 2019 06:48AM

Exactly #22, I was talking about you and the fix you proposed. And I'm using a phone not affected by this bug because the manufacturer fixed it on its own, so my international music appears correctly. I no longer wait for (or expect, really) a fix from Google.

ad...@google.com <ad...@google.com> #24 Sep 4, 2020 12:36PM

Status: Won't Fix (Obsolete)

Thank you for your feedback. We assure you that we are doing our best to address all issues reported. For now, we will be closing the issue as Won't Fix (Obsolete).
If this issue currently still exists, we request that you log a new issue along with the bug report here https://goo.gl/TbMiIO and reference this bug for context.

ni...@gmail.com <ni...@gmail.com> #25 Jan 17, 2024 04:13AM

New bug was created by another user:

https://issuetracker.google.com/issues/237674422
