Comments
ja...@google.com <ja...@google.com> #2
- Android Build Version (go to Settings > About Device > Build Number (hold down to copy)):
- Android Device Model:
- Please provide a sample project by uploading the exported zip file of the project from Android Studio or a sample APK file. Note: Please upload the bug report and screenshot to Google Drive and share the folder with android-bugreport@google.com, then share the link here.
- Upload the bug report file. Steps followed here:
- Steps taken for issue to occur:
1.
2.
3.
etc...
- Expected output: What is the expected output?
- Current output: What is the current output?
Thank you for your cooperation.
te...@shkspr.mobi <te...@shkspr.mobi> #3
- Android Build Version: ONEPLUS A5010_43_200513 / Android 10.0.0
The issue has been reported in an open source project that uses the WEB_URL pattern:
https://github.com/signalapp/Signal-Android/issues/9449
https://github.com/signalapp/Signal-Android/issues/9122
I've also noticed the problem in Telegram and in SMS messaging apps.
vi...@google.com <vi...@google.com> #4
hj...@gmail.com <hj...@gmail.com> #5
In Spanish
te...@shkspr.mobi <te...@shkspr.mobi> #6
Would you accept a patch for this issue?
te...@shkspr.mobi <te...@shkspr.mobi> #7
Happy birthday to this bug! Are you happy to accept a patch for this?
pr...@gmail.com <pr...@gmail.com> #8
The trailing "." and "!" are allowed in URLs, so your trailing "." example is incorrect.
But there is indeed a problem with WORD_BOUNDARY for some non-word characters at the end of a URL, such as trailing slashes, like:
https://example.com/page1/
This problem is also visible in your trailing "!" example, which shows an inconsistency in the match: one URL's match includes the trailing "!" while the other's does not.
This is caused by the \b assertion in (?:\b|$|^) (the WORD_BOUNDARY value), which normally marks the border between a word character and a non-word character. Because URLs can end in either word or non-word characters, this causes a trailing non-word character to be trimmed when the URL is not at the end of the string.
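To see why a trailing slash gets dropped, it helps to list where the WORD_BOUNDARY value can match at all. The sketch below compiles the same (?:\b|$|^) expression on its own and collects every index at which it holds; the class and method names are illustrative only and do not come from Patterns.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {
    // Same expression as the WORD_BOUNDARY value quoted above.
    static final Pattern WORD_BOUNDARY = Pattern.compile("(?:\\b|$|^)");

    // Returns every index at which the zero-width WORD_BOUNDARY assertion holds.
    static List<Integer> boundaryPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = WORD_BOUNDARY.matcher(s);
        // Each match is empty; Matcher.find() advances past empty matches by itself.
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Index 5 is the '/', index 6 is the space after it.
        System.out.println(boundaryPositions("page1/ x"));
    }
}
```

For "page1/ x" this prints [0, 5, 7, 8]. There is no boundary at index 6, directly after the slash, because both "/" and the following space are non-word characters. A URL match ending in "/" followed by a space therefore cannot satisfy the trailing WORD_BOUNDARY, and the regex engine backtracks to index 5, dropping the slash.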
A fix is to remove the trailing WORD_BOUNDARY from WEB_URL, changing it to:
public static final Pattern WEB_URL = Pattern.compile("("
+ "("
+ "(?:" + PROTOCOL + "(?:" + USER_INFO + ")?" + ")?"
+ "(?:" + DOMAIN_NAME_STR + ")"
+ "(?:" + PORT_NUMBER + ")?"
+ ")"
+ "(" + PATH_AND_QUERY + ")?"
+ ")");
instead of:
public static final Pattern WEB_URL = Pattern.compile("("
+ "("
+ "(?:" + PROTOCOL + "(?:" + USER_INFO + ")?" + ")?"
+ "(?:" + DOMAIN_NAME_STR + ")"
+ "(?:" + PORT_NUMBER + ")?"
+ ")"
+ "(" + PATH_AND_QUERY + ")?"
+ WORD_BOUNDARY
+ ")");
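The effect of the proposed change can be checked on a plain JVM with a deliberately simplified stand-in for WEB_URL. The PROTOCOL, DOMAIN and PATH strings below are hypothetical reductions, far simpler than the real Patterns.java constants; only the WORD_BOUNDARY value is taken from the comment above. The sketch compiles the pattern with and without the trailing boundary and prints every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingBoundaryDemo {
    // Hypothetical, heavily simplified stand-ins; NOT the real Patterns.java constants.
    static final String PROTOCOL = "https?://";
    static final String DOMAIN = "[A-Za-z0-9.-]+";
    static final String PATH = "(?:/[A-Za-z0-9._~!/-]*)?";
    // The WORD_BOUNDARY value discussed above.
    static final String WORD_BOUNDARY = "(?:\\b|$|^)";

    // Current shape: the optional path is followed by WORD_BOUNDARY.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile(PROTOCOL + DOMAIN + PATH + WORD_BOUNDARY);
    // Proposed shape: the trailing WORD_BOUNDARY is removed.
    static final Pattern WITHOUT_BOUNDARY =
            Pattern.compile(PROTOCOL + DOMAIN + PATH);

    // Collects every non-overlapping match of p in s.
    static List<String> allMatches(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "See https://example.com/page1/ for details",
            "Try https://example.com/page1! or https://example.com/page2!"
        };
        for (String s : inputs) {
            System.out.println("with:    " + allMatches(WITH_BOUNDARY, s));
            System.out.println("without: " + allMatches(WITHOUT_BOUNDARY, s));
        }
    }
}
```

With the boundary, the first input yields https://example.com/page1 (slash dropped), and the second yields page1 without its "!" but page2 with it, because only the second URL sits at the end of the string where $ can match. Without the boundary, the slash and both "!" characters are kept consistently. This only illustrates the mechanism on the simplified pattern; the real WEB_URL has many more parts that a patch would need to be tested against.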
te...@gmail.com <te...@gmail.com> #9
A bug in old Android was found,
For four years, it lingered around.
Despite every fix,
It stayed in the mix,
A glitch that was sly and profound.
Description
The WEB_URL Pattern incorrectly assumes that punctuation at the end of a string is part of a URL.
Example
Consider a string:
The WEB_URL Pattern misinterprets this as the path being "/page.", that is, the period (full stop) is included in the URL.
Consider a string:
In this string, the pattern correctly identifies the first path as "/page1" and incorrectly identifies the second path as "/page2!".
When a user clicks on a link that includes punctuation, they may receive a 404 error from the web server or be sent to the wrong page.
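One mitigation sometimes applied after matching, rather than inside the pattern, is to strip trailing punctuation from each URL candidate in the way the GitHub Flavored Markdown autolink extension does: its specification excludes a trailing ?, !, ., ,, :, *, _ or ~ from the link while still allowing those characters inside it. The helper below is a hypothetical sketch of that single rule; it ignores GFM's extra handling of unbalanced closing parentheses and entity references, and, as comment #8 points out, it will also remove a trailing "." or "!" that genuinely belongs to a URL.

```java
public class TrimTrailingPunctuation {
    // Characters the GFM extended-autolink rule drops when they end a link.
    private static final String TRAILING = "?!.,:*_~";

    // Hypothetical helper: strips any run of those characters from the end of a URL candidate.
    static String trimTrailingPunctuation(String url) {
        int end = url.length();
        while (end > 0 && TRAILING.indexOf(url.charAt(end - 1)) >= 0) {
            end--;
        }
        return url.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(trimTrailingPunctuation("https://example.com/page."));
        System.out.println(trimTrailingPunctuation("https://example.com/page2!"));
        // Interior dots and trailing slashes are left alone.
        System.out.println(trimTrailingPunctuation("https://example.com/a.b/"));
    }
}
```

Unlike the word-boundary approach, this rule never touches a trailing slash, since "/" is not in the list.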
Where the error occurs
I'm unfamiliar with the Android source, but it appears that the regex is in android/util/Patterns.java, line 323.
Specifically, the WORD_BOUNDARY at the end of the pattern.
There is a test suite at https://cs.android.com/android/platform/superproject/+/master:art/test/094-pattern/src/Main.java;l=54?q=web_url, which should also be extended.
My knowledge of regexes is insufficient to suggest a complete solution. However, I notice that Markdown's URL parser ends with: