Bug P3
Status Update
Comments
sc...@chromium.org <sc...@chromium.org> #2
Is hyphenation done in line layout? Anyone know about our hyphenation dictionary?
[Monorail components: -Blink Blink>Layout]
va...@chromium.org <va...@chromium.org> #4
Thanks for filing the issue!
@Reporter: Could you please share a sample test file satisfying the conditions mentioned in https://crbug.com/chromium/973102#c0? That would help us triage this further.
ea...@chromium.org <ea...@chromium.org> #5
On which platform is this? The dictionaries are OS specific.
yv...@gmail.com <yv...@gmail.com> #6
It's simple, create an HTML file that contains these words and change its content so that the words are at the end of the line. As I said, you'll have to play depending on your screen size and everything, I cannot provide a universal test case. You're probably used to doing this, I only do this on my local computer which is inaccessible to you.
My OS is Android 7.1.1 on a Sony Xperia Z5 Compact, but the issue has been reported to me from a user of Samsung Internet 9.2.10.15 on some Samsung device. So it does affect multiple devices.
sh...@chromium.org <sh...@chromium.org> #7
Thank you for providing more feedback. Adding the requester to the cc list.
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
[Deleted User] <[Deleted User]> #8
yv...@gmail.com <yv...@gmail.com> #10
Still occurs here. What did you try when you failed to reproduce it? I just opened my website in Chrome on Windows (latest stable) and I can see this.
yv...@gmail.com <yv...@gmail.com> #11
PS: Chrome on Windows now supports hyphenation. And it's just as broken as it was before on other platforms when I initially reported this.
ko...@chromium.org <ko...@chromium.org> #12
Reporter, it would greatly help our analysis if you could share a URL that reproduces the issue.
I guess the page the reporter is seeing has lang="en" or lang="en-uk".
Tests: https://jsbin.com/boxamav/edit?html,output
(you may need to shorten "12345" or make it longer, depending on fonts)
* Win/Linux/ChromeOS Chrome hyphenates at:
- "start-up" for "en-us"
- "star-tup" for "en-uk" and "en"
* Mac Chrome and Safari hyphenate at:
- "start-up" for "en-us" and "en"
- "star-tup" for "en-uk"
* Firefox doesn't hyphenate at either point.
Two points need investigation:
* I'm not sure if "star-tup" is the correct hyphenation for UK English.
* When the lang is "en", should it be "en-us" or "en-uk"? Currently, Chrome matches Android behavior, but from the tests, Safari seems to use "en-us" for "en".
[Monorail components: -Blink>Layout Blink>Layout>Inline]
yv...@gmail.com <yv...@gmail.com> #13
My page lang value is set to "en". Does "en-uk" exist after all? AFAIK the ISO country code for UK is still "GB". At least all browsers offer me to send "en-GB" in the accepted languages header.
Anyway, you can see the effect here: https://ygoe.de/en
Both words appear in the content. Please note that hyphenation is only active for page widths below 420 pixels. You may also need to edit the content to place the words at a line end (open developer tools, find the parent <p> element and add the contenteditable attribute, then edit the text on the page before each word to place it where you want).
ko...@chromium.org <ko...@chromium.org> #14
Thanks, sorry, you're right, not "en-uk", but "en-gb".
Confirmed a few things:
1. "star-tup" is not a correct hyphenation even for "en-gb".
2. Android, and Chrome on Android/Win/Linux/ChromeOS, use "en-GB" when "en" is set.
https://android.googlesource.com/platform/frameworks/base/+/master/core/jni/android_text_Hyphenator.cpp#143
From the test result, it looks like macOS uses "en-US" when "en" is set.
The "hyphenator.js" uses "en-us" when "en" is set.
https://github.com/mnater/Hyphenator/blob/master/Hyphenator.js#L93
3. Our "en-gb" dictionary is up-to-date with the TeX hyphenation dictionary.
https://github.com/hyphenation/tex-hyphen/tree/master/misc
4. The "en-gb" dictionary has the "r1tu" entry, meaning to hyphenate as "r-tu". This entry does not exist in the "en-us" dictionary. The "hyphenator.js" has this entry too.
https://github.com/mnater/Hyphenator/blob/master/patterns/en-gb.js
5. Firefox and hyphenator.js do not hyphenate "startup" at all.
I'm still not sure whether the issue is in the "en-gb" TeX dictionary (it reproduces on macOS too) or in the hyphenator code (it does not reproduce in Firefox or hyphenator.js), and I'm also not sure whether to map "en" to "en-gb" or to "en-us".
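To make the "r1tu" entry from point 4 concrete, here is a minimal sketch of TeX-style (Liang) pattern matching. In that format, a digit between letters scores the gap at that position, the highest score from any matching pattern wins, and an odd score permits a break. The pattern lists below are toy inputs, not a real dictionary: "r1tu" is the entry quoted above, while "ar2tu" is a hypothetical inhibiting pattern invented only for illustration. The `left_min` and `right_min` defaults are also assumptions.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'r1tu' into its letters and gap scores."""
    letters = ""
    levels = [0]  # levels[k] scores the gap before letters[k]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters += ch
            levels.append(0)
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return word with '-' inserted at every gap whose best score is odd."""
    text = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(text) + 1)       # scores[i] is the gap before text[i]
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = text.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)
            start = text.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # word[i] is text[i + 1], so its preceding gap is scores[i + 1].
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("startup", ["r1tu"]))           # star-tup
print(hyphenate("startup", ["r1tu", "ar2tu"]))  # startup
```

With only "r1tu" present, the gap between "r" and "t" scores 1, which yields the reported "star-tup". A dictionary fix would need some higher even score covering that gap, which is what the hypothetical second pattern simulates.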
gi...@appspot.gserviceaccount.com <gi...@appspot.gserviceaccount.com> #15
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src/+/a2c814f8d7db0008fc653d99532e5e7b8ff64732
commit a2c814f8d7db0008fc653d99532e5e7b8ff64732
Author: Koji Ishii <kojii@chromium.org>
Date: Tue Jun 22 01:06:11 2021
Change "en" hyphenation to use "en-us" instead of "en-gb"
When the specified language is "en", this patch changes to use
the "en-us" hyphenation dictionary instead of the "en-gb".
It looks like this behavior matches the other browsers.
Android maps "en" to "en-gb", but because Android takes the
language from the system, it is usually more specific (i.e.,
"en-us" or "en-gb", not "en".) On the other hand, CSS
prohibits using the system language <crbug.com/676270> that
the use of "en" is more common.
Bug: 973102
Change-Id: I7547725b9d30fc137f987fb200fa2e4b699d2c21
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2975039
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#894484}
[modify] https://crrev.com/a2c814f8d7db0008fc653d99532e5e7b8ff64732/third_party/blink/renderer/platform/text/hyphenation/hyphenation_minikin.cc
[modify] https://crrev.com/a2c814f8d7db0008fc653d99532e5e7b8ff64732/third_party/blink/renderer/platform/text/hyphenation_test.cc
ko...@chromium.org <ko...@chromium.org> #16
The message #14 changes the dictionary for "en" to "en-us", which hyphenates "startup" correctly.
The issue in the "en-gb" hyphenation dictionary is not addressed yet though.
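The effect of that change can be sketched as a small lookup. This is an illustration only, not Blink's code: `resolve_dictionary`, the `AVAILABLE` set, and the two default tables are hypothetical names, and the sketch assumes that an explicit tag with its own dictionary is always honoured as-is.

```python
# Hypothetical set of hyphenation dictionaries that are installed.
AVAILABLE = {"en-us", "en-gb"}

# Defaults for a bare language tag, before and after the commit.
DEFAULTS_BEFORE = {"en": "en-gb"}
DEFAULTS_AFTER = {"en": "en-us"}


def resolve_dictionary(lang, bare_defaults, available=AVAILABLE):
    """Pick a dictionary for a lang attribute value, or None if there is none."""
    tag = lang.strip().lower().replace("_", "-")
    if tag in available:
        return tag                 # explicit regional tags are unaffected
    return bare_defaults.get(tag)  # None means no automatic hyphenation


print(resolve_dictionary("en", DEFAULTS_BEFORE))     # en-gb
print(resolve_dictionary("en", DEFAULTS_AFTER))      # en-us
print(resolve_dictionary("en-GB", DEFAULTS_AFTER))   # en-gb
```

The last call shows why the "en-gb" problem remains open: a page that declares lang="en-GB" still gets the "en-gb" dictionary.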
yv...@gmail.com <yv...@gmail.com> #17
I don't know whether it's advisable to default from "en" to "en-US" given the worldwide spread of GB-based English (including Australia, India, South Africa, Singapore and others) over the very regional home of US-based English (including Canada only).
Even if you suggest using en-US here, it's probably wrong because my content uses en-GB spelling. And if I set en-GB instead of en, nothing has improved for me. It might even get worse on Macs.
Anyway, it shouldn't matter which is used if "star-tup" is invalid everywhere. So if the system produces that hyphenation, *something* is broken for sure. And please don't forget "JavaScript" as well.
I don't understand the note about Android knowing a more specific language from the system. The language definition comes from the lang attribute in the HTML document. It can change to anything and is completely unrelated to the current system's locale setting. Most websites have a language selector that redirects the visitor to another language version of the site, setting another lang attribute.
ko...@chromium.org <ko...@chromium.org> #18
Thanks for the comment.
As in https://crbug.com/chromium/973102#c14, the switch to "en-us" was made to be interoperable with Safari and Firefox. As you point out, it is not related to this issue, but we found that we were not interoperable with other browsers, so we made the change.
As you might have figured out, the "star-tup" issue reproduces in Safari too when you set lang="en-gb". All browsers use the TeX hyphenation dictionaries:
https://www.tug.org/tex-hyphen/
or one derived from TeX. I just learned its format as part of the investigation for this issue. As far as I understand, it is an issue in the dictionary itself, so I assume it reproduces in TeX too, though I don't have an environment to test it.
On the other hand, it does not reproduce in Firefox and hyphenator.js even when I set lang="en-gb", so I'm going to look into why the difference appears. Hopefully that will reveal the real cause of the issue.
That's where I am now. Sorry for the slow progress; I hope you understand.
yv...@gmail.com <yv...@gmail.com> #19
Yes, I have a workaround in place for now and prevent any hyphenation around the affected words. I'm going to change the lang attribute on my website to en-GB just to be more precise about my intention. I've learned here that it can actually make a difference.
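The comment does not say how the workaround is implemented. One possible approach, sketched here as an assumption, is to wrap each affected word in an element styled with the CSS declaration `hyphens: none` when the page is generated. The helper name `protect_words` is hypothetical, and it operates on plain text rather than parsing HTML.

```python
import re


def protect_words(text, words):
    """Wrap each listed word in a span that disables hyphenation."""
    if not words:
        return text
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(r'<span style="hyphens: none">\1</span>', text)


print(protect_words("A startup uses JavaScript.", ["startup", "JavaScript"]))
```

Text that already contains markup would need a real HTML parser so that tag names and attribute values are not rewritten.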
ko...@chromium.org <ko...@chromium.org> #20
ko...@chromium.org <ko...@chromium.org> #21
Discussed with experts. A few more findings.
* Firefox does not ship the "en-gb" dictionary, but only "en-us"[1]. That probably explains why it does not reproduce in Firefox.
* The previous test on hyphenator.js was wrong, it reproduces too.
* The "star-tup" hyphenation is probably an unfortunate side effect of "star-tling"[2].
* In general, the quality of the TeX hyphenation dictionaries outside en-us varies.
Chromium uses the system dictionaries in Android. I think the "star-tup" issue should go to Android, or to the TeX community[3].
The "JavaScript" case is on us, https://crbug.com/chromium/963039.
[1] https://searchfox.org/mozilla-central/search?q=intl%2Flocales&path=
[2] https://github.com/hunspell/hyphen/blob/master/tbhyphext.tex#L797
[3] https://www.tug.org/tex-hyphen/
gi...@appspot.gserviceaccount.com <gi...@appspot.gserviceaccount.com> #22
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src/+/152b45f49f0a3f53645c3b56036dcf188187cb55
commit 152b45f49f0a3f53645c3b56036dcf188187cb55
Author: Koji Ishii <kojii@chromium.org>
Date: Thu Jun 24 07:26:44 2021
Avoid auto-hyphenating capitalized words, except for German
This patch disables automatic hyphenation for capitalized
words. Originally raised to Firefox[1], CSS WG resolved[2].
The logic matches Firefox. There were some discussions about
more heuristic rules to detect proper nouns (e.g., iTunes) and
considerations for other languages than German. We can tweak
the rules as they come up.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1550532
[2] https://github.com/w3c/csswg-drafts/issues/3927
Bug: 963039, 973102
Change-Id: I437a98a3c6eacdf4b027c622e5f60bdd056a57b8
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2982497
Reviewed-by: Yoshifumi Inoue <yosin@chromium.org>
Commit-Queue: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#895487}
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/text/hyphenation.h
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/text/hyphenation/hyphenation_minikin.cc
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/text/hyphenation/hyphenation_minikin.h
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/text/hyphenation_test.cc
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/text/mac/hyphenation_mac.cc
[modify] https://crrev.com/152b45f49f0a3f53645c3b56036dcf188187cb55/third_party/blink/renderer/platform/wtf/text/unicode.h
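The rule described in this commit message can be approximated in a few lines. This is a sketch of the stated behaviour only, assuming "capitalized" means that the first character is uppercase. The function name `should_auto_hyphenate` is hypothetical and the actual Blink logic may differ in detail.

```python
def should_auto_hyphenate(word, lang):
    """Skip automatic hyphenation for capitalized words, except in German."""
    primary = lang.strip().lower().replace("_", "-").split("-")[0]
    if primary == "de":
        return True  # German capitalizes all nouns, so the rule is not applied
    return not word[:1].isupper()


print(should_auto_hyphenate("JavaScript", "en"))  # False
print(should_auto_hyphenate("startup", "en"))     # True
print(should_auto_hyphenate("iTunes", "en"))      # True
```

The "iTunes" result shows the limitation the commit message mentions: a first-letter check does not recognise proper nouns that begin with a lowercase letter.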
ko...@chromium.org <ko...@chromium.org> #24
Found Gecko's mapping:
https://searchfox.org/mozilla-central/source/modules/libpref/init/all.js#1931
One way to fix this, at least until the dictionary is improved, is to map all "en*" to "en-us", as Gecko does.
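A wildcard mapping of this kind could look like the following sketch. The alias table and the name `alias_for` are hypothetical and are not copied from Gecko's preferences. The sketch uses two patterns, "en" and "en-*", rather than a bare "en*", so that unrelated tags that merely start with "en" are left alone.

```python
from fnmatch import fnmatchcase

# Hypothetical alias table, checked in order; the first match wins.
ALIASES = [("en", "en-us"), ("en-*", "en-us")]


def alias_for(lang, aliases=ALIASES):
    """Map a language tag to the dictionary tag that should be used for it."""
    tag = lang.strip().lower().replace("_", "-")
    for pattern, target in aliases:
        if fnmatchcase(tag, pattern):
            return target
    return tag  # no alias: use the tag unchanged


print(alias_for("en-GB"))  # en-us
print(alias_for("en"))     # en-us
print(alias_for("de"))     # de
```

Under this policy a page declaring lang="en-GB" would be hyphenated with US patterns, which avoids "star-tup" at the cost of ignoring British hyphenation conventions.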
Description
local
Steps to reproduce the problem:
Create a web page containing the words "startup" and "JavaScript" and have the browser hyphenate them. Since this depends on screen size, font family, font size and all sorts of individual factors, I am unable to provide a universal test case. You'll have to play. Be sure to set the lang="en" attribute on the page.
What is the expected behavior?
start- up
Java- Script
(Correct me if I'm wrong, I haven't learnt much about English hyphenation rules, but these look reasonable to me.)
What went wrong?
star- tup
JavaS- cript
Does it occur on multiple sites: Yes
Is it a problem with a plugin? No
Did this work before? N/A
Does this work in other browsers? Yes
Chrome version: Chrome 74.0.3729.157 Channel: stable
OS Version: 7.1
Flash Version: n/a
Cannot be tested on Chrome/Windows because it doesn't support hyphenation altogether. Sad story. Use Firefox.