Fixed
Comments
na...@google.com <na...@google.com>
en...@google.com <en...@google.com> #2
to expand slightly, i think there are two choices:
1. change the VM to not require surrogates.
2. change all code calling NewStringUTF that might need to deal with emoji to convert to surrogates first.
this wasn't a problem in the past because non-BMP just wasn't relevant, but emoji have really changed that. the older i get, the more i wonder whether we should change the VM.
that said, as we see with how easy it is to crash Settings by switching to Arabic, most code that uses NewStringUTF is pretty suspect to start with. an audit/rewrite would probably be a good thing. i'm just not sure how practical that is. (and i never did go back and fix that Settings crash...)
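for anyone wondering what choice 2 means at the byte level, here is a rough sketch in Java rather than in the native code that actually calls NewStringUTF. it assumes the caller starts with standard UTF-8 bytes; the class and method names are made up for illustration. it leans on DataOutputStream.writeUTF, which writes Modified UTF-8 behind a two-byte length prefix, so it only works for strings that encode to at most 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of any Android API.
final class ModifiedUtf8Sketch {
    // Re-encodes standard UTF-8 bytes as Modified UTF-8: each non-BMP code
    // point becomes two 3-byte surrogates, and U+0000 becomes 0xC0 0x80.
    static byte[] utf8ToModifiedUtf8(byte[] utf8) {
        // Decoding to a Java String yields UTF-16, i.e. surrogate pairs.
        String s = new String(utf8, StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            // writeUTF emits a 2-byte length followed by Modified UTF-8.
            new DataOutputStream(buf).writeUTF(s);
        } catch (IOException e) {
            // Raised if the encoded form exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        // Drop the length prefix so only the encoded characters remain.
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }
}
```

with this, the four bytes F0 9F 98 83 (U+1F603) come out as the six bytes ED A0 BD ED B8 83. native callers would still need to append the terminating NUL themselves.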
na...@google.com <na...@google.com> #3
To expand on 2 slightly, the VM is right that the sequence isn't modified UTF-8. These code points are supposed to be encoded as a surrogate pair, with each surrogate taking 3 bytes (2 x 3 bytes in total).
That said, I was thinking of modifying the VM to accept 4 byte utf-8 sequences and convert them into utf-16 surrogate pairs. It's bound to be tricky but it will probably make life a lot easier for apps that are treating mutf-8 as "null terminated UTF-8 over UCS-2 - {0}". We'll have to be stricter about overly long encodings though.
My only worry is that we're introducing yet another pseudo-encoding :( . If we do this, we'll have to go delete the line on the wiki UTF-8 article that says "All known Modified UTF-8 implementations also treat the surrogate pairs as in CESU-8"
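To make the 4-byte to surrogate-pair conversion concrete, here is a rough Java sketch of the arithmetic. It is not the VM's code; the class and method names are hypothetical, and it handles exactly one 4-byte sequence. It rejects overlong and out-of-range values, in the spirit of being stricter about overly long encodings.

```java
// Hypothetical illustration of decoding one 4-byte UTF-8 sequence.
final class FourByteUtf8Sketch {
    // Returns {high surrogate, low surrogate} for a 4-byte UTF-8 sequence.
    static char[] toSurrogatePair(byte b0, byte b1, byte b2, byte b3) {
        // Lead byte must look like 11110xxx, the rest like 10xxxxxx.
        if ((b0 & 0xF8) != 0xF0 || (b1 & 0xC0) != 0x80
                || (b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a 4-byte UTF-8 sequence");
        }
        // Gather the 3 + 6 + 6 + 6 payload bits into a code point.
        int cp = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
        // Below 0x10000 is an overlong encoding; above 0x10FFFF is not Unicode.
        if (cp < 0x10000 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("overlong or out of range");
        }
        // Standard UTF-16 split: 10 high bits and 10 low bits of (cp - 0x10000).
        int v = cp - 0x10000;
        char high = (char) (0xD800 + (v >> 10));
        char low = (char) (0xDC00 + (v & 0x3FF));
        return new char[] {high, low};
    }
}
```

For the bytes F0 9F 98 83 this gives U+1F603 and the pair D83D DE03, which is what the 2 x 3 byte form would have decoded to.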
en...@google.com <en...@google.com> #4
sgtm
mr...@gmail.com <mr...@gmail.com> #9
A related change the VM could use is a char32 type, or uchar (for UCS-4 char) as an all-alpha label. Ustring would be the complement to String. Then, if the source is char16, the 6-byte surrogate form is used when converting to UTF-8, and with char32 the 4-byte form is output for non-BMP code points. Going the other way, UTF-8 to char16 or char32, the target width determines whether UTF-16 or UTF-32 is used, regardless of whether the input is a 4-byte UTF-8 sequence or a 6-byte pair. An invalid pair gets stored as separate char32 values, not converted, or optionally throws an exception. I doubt adding bytecodes is even necessary, just a type letter in the class record, as the int ops can be overloaded. The changes to the compiler to add the new type keywords should be similarly minimal.
It's a thought, anyways.
ra...@gmail.com <ra...@gmail.com> #11
how to resolve this error
[Deleted User] <[Deleted User]> #12
The workaround (for older phones) appears to be not putting the Emoji into your XML, but instead defining it in code.
So you can do something like this:
<string name="hooray">Hooray! %1$s</string>
Then in code:
final String PACKAGE_EMOJI = "\uD83C\uDF81";
getString(R.string.hooray, PACKAGE_EMOJI);
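If you would rather not hand-write the surrogate escapes, a small helper can build the same string from the code point. This is only a sketch: the class and method names are made up, and it assumes the emoji you want is U+1F381, which is what "\uD83C\uDF81" encodes.

```java
// Hypothetical helper; Character.toChars is standard Java.
final class EmojiFromCodePoint {
    // Builds a String from a single Unicode code point. For code points
    // >= 0x10000 the result is a two-char UTF-16 surrogate pair.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```

The result of fromCodePoint(0x1F381) can be passed to getString(R.string.hooray, ...) in place of the PACKAGE_EMOJI constant.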
[Deleted User] <[Deleted User]> #13
When J2V8 calls NewStringUTF with a string which contains emoji, very bad things happen on some versions of Android:
On KitKat 4.4 and Lollipop 5.1.1, the converted string contains garbage characters rather than the original emoji.
On Lollipop 5.0.2, it crashes. This is not a normal exception which can be caught in Java; it actually kills the VM. (See error log below.)
On Marshmallow it appears to work fine.
Re #12: it looks like the only solution is to define the emoji in code.
Description
I first observed this with Emoji (e.g. 😃 for a smiling face), but it seems any Unicode code point >= 0x10000 causes this crash. These code points are encoded as four-byte UTF-8 sequences, which always begin with a byte between 0xf0 and 0xf4, and that seems to upset the NewStringUTF function.
Testing with a high three-byte UTF-8 character, e.g. U+FFFD (�), works fine.
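A quick way to check whether a given string falls into the affected range is sketched below in Java. The class and method names are hypothetical; only standard java.lang and java.nio calls are used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helpers, for illustration only.
final class SupplementaryCheck {
    // True if the string contains any code point >= 0x10000.
    static boolean hasSupplementary(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    // Returns the first byte of the standard UTF-8 encoding of a code point.
    static int utf8LeadByte(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        return utf8[0] & 0xFF;
    }
}
```

U+1F603 has lead byte 0xF0 (the "illegal start byte" in the logs below), while U+FFFD has lead byte 0xEF and stays within three bytes.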
Example of strings.xml content which crashes:
<string-array name="word_list_good">
<!-- U+1F603: SMILING FACE WITH OPEN MOUTH -->
<item>😃</item>
<!-- ... -->
</string-array>
Dalvik stacktrace (on an Android Wear 4.4W2 emulator):
W/dalvikvm( 1934): JNI WARNING: NewStringUTF input is not valid Modified UTF-8: illegal start byte 0xf0
W/dalvikvm( 1934): string: '😃'
W/dalvikvm( 1934): in Landroid/content/res/AssetManager;.getArrayStringResource:(I)[Ljava/lang/String; (NewStringUTF)
I/dalvikvm( 1934): "main" prio=5 tid=1 NATIVE
I/dalvikvm( 1934): | group="main" sCount=0 dsCount=0 obj=0xb2ddcda0 self=0xb8db8480
I/dalvikvm( 1934): | sysTid=1934 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=-1216638336
I/dalvikvm( 1934): | state=R schedstat=( 0 0 0 ) utm=0 stm=0 core=0
I/dalvikvm( 1934): #00 pc 000019e5 /system/lib/libcorkscrew.so (unwind_backtrace+101)
I/dalvikvm( 1934): #01 pc 00008131 /system/lib/libbacktrace.so (CorkscrewCurrent::Unwind(unsigned int)+49)
I/dalvikvm( 1934): #02 pc 000028c9 /system/lib/libbacktrace.so (Backtrace::Unwind(unsigned int)+25)
I/dalvikvm( 1934): #03 pc 000b7c61 /system/lib/libdvm.so (dvmDumpNativeStack(DebugOutputTarget const*, int)+81)
I/dalvikvm( 1934): #04 pc 000954a8 /system/lib/libdvm.so (dvmDumpThreadEx(DebugOutputTarget const*, Thread*, bool)+1512)
I/dalvikvm( 1934): #05 pc 0009568b /system/lib/libdvm.so (dvmDumpThread(Thread*, bool)+75)
I/dalvikvm( 1934): #06 pc 0004beb3 /system/lib/libdvm.so
I/dalvikvm( 1934): #07 pc 0004dcdd /system/lib/libdvm.so (ScopedCheck::check(bool, char const*, ...)+1853)
I/dalvikvm( 1934): #08 pc 0005269a /system/lib/libdvm.so
I/dalvikvm( 1934): #09 pc 000b7fec /system/lib/libandroid_runtime.so
I/dalvikvm( 1934): #10 pc 0002b3de /system/lib/libdvm.so (dvmPlatformInvoke+82)
I/dalvikvm( 1934): at android.content.res.AssetManager.getArrayStringResource(Native Method)
I/dalvikvm( 1934): at android.content.res.AssetManager.getResourceStringArray(AssetManager.java:186)
I/dalvikvm( 1934): at android.content.res.Resources.getStringArray(Resources.java:468)
Or in a layout XML file:
<TextView
android:id="@+id/happy"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="😃" />
ART stacktrace (on a Nexus 5 with Android 5.0):
art/runtime/check_jni.cc:65] JNI DETECTED ERROR IN APPLICATION: input is not valid Modified UTF-8: illegal start byte 0xf0
art/runtime/check_jni.cc:65] string: '😃'
art/runtime/check_jni.cc:65] in call to NewStringUTF
art/runtime/check_jni.cc:65] from java.lang.String android.content.res.StringBlock.nativeGetString(long, int)
art/runtime/check_jni.cc:65] "main" prio=5 tid=1 Runnable
art/runtime/check_jni.cc:65] | group="main" sCount=0 dsCount=0 obj=0x737fdec0 self=0xb4f07800
art/runtime/check_jni.cc:65] | sysTid=2892 nice=0 cgrp=apps sched=0/0 handle=0xb6f12ec8
art/runtime/check_jni.cc:65] | state=R schedstat=( 571393690 122086422 592 ) utm=50 stm=7 core=1 HZ=100
art/runtime/check_jni.cc:65] | stack=0xbe0d2000-0xbe0d4000 stackSize=8MB
art/runtime/check_jni.cc:65] | held mutexes= "mutator lock"(shared held)
art/runtime/check_jni.cc:65] native: #00 pc 00004c58 /system/lib/libbacktrace_libc++.so (UnwindCurrent::Unwind(unsigned int, ucontext*)+23)
art/runtime/check_jni.cc:65] native: #01 pc 000034c1 /system/lib/libbacktrace_libc++.so (Backtrace::Unwind(unsigned int, ucontext*)+8)
art/runtime/check_jni.cc:65] native: #02 pc 0025918d /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, char const*, art::mirror::ArtMethod*)+84)
art/runtime/check_jni.cc:65] native: #03 pc 0023cd13 /system/lib/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+162)
art/runtime/check_jni.cc:65] native: #04 pc 000b1195 /system/lib/libart.so (art::JniAbort(char const*, char const*)+620)
art/runtime/check_jni.cc:65] native: #05 pc 000b18c5 /system/lib/libart.so (art::JniAbortF(char const*, char const*, ...)+68)
art/runtime/check_jni.cc:65] native: #06 pc 000b3e63 /system/lib/libart.so (art::ScopedCheck::Check(bool, char const*, ...) (.constprop.128)+922)
art/runtime/check_jni.cc:65] native: #07 pc 000bd965 /system/lib/libart.so (art::CheckJNI::NewStringUTF(_JNIEnv*, char const*)+44)
art/runtime/check_jni.cc:65] native: #08 pc 00087f97 /system/lib/libandroid_runtime.so (???)
art/runtime/check_jni.cc:65] native: #09 pc 002599a7 /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_content_res_StringBlock_nativeGetString__JI+102)
art/runtime/check_jni.cc:65] at android.content.res.StringBlock.nativeGetString(Native method)
art/runtime/check_jni.cc:65] at android.content.res.StringBlock.get(StringBlock.java:82)
art/runtime/check_jni.cc:65] - locked <0x125b223b> (a android.content.res.StringBlock)
art/runtime/check_jni.cc:65] at android.content.res.XmlBlock$Parser.getPooledString(XmlBlock.java:458)
art/runtime/check_jni.cc:65] at android.content.res.TypedArray.loadStringValueAt(TypedArray.java:967)
art/runtime/check_jni.cc:65] at android.content.res.TypedArray.getText(TypedArray.java:144)
art/runtime/check_jni.cc:65] at android.widget.TextView.<init>(TextView.java:917)