Status Update
Comments
pa...@google.com <pa...@google.com>
pa...@google.com <pa...@google.com> #2
pa...@google.com <pa...@google.com> #3
+1
[Deleted User] <[Deleted User]> #4
SELECT FLOAT64(JSON '75.2430000000005', wide_number_mode=>'round') as result; -- returns 75.2430000000005
While that works for individual values, the example I gave was just for clarity of the issue. This solution won't work when parsing an actual json object with different data types.
For example given the the json object below in the table `database.schema.tablename` running the parse_json object without defining the wide_number_mode=>'round' will result in failure if the value is 75.2430000000004, but success if the value is 75.2430000000005. Which is the inconstant behavior I am referring to as the two values have the exact same decimal accuracy so you would assume they should act in a similar fashion. So it would make sense if both values required the same wide_number_mode, but that is not the case.
json = {
"user": "bob",
"age": 27,
"test_score": 75.2430000000004
}
select parse_json(json) from `database.schema.tablename`;
pa...@google.com <pa...@google.com> #5
[Deleted User] <[Deleted User]> #6
pa...@google.com <pa...@google.com> #7
Thanks Robert for confirmation. The product team is aware of the issue. They will review the issue and update here.
Please note - This product or feature is covered by the Pre-GA Offerings Terms of the Google Cloud Terms of Service. Pre-GA products and features might have limited support, and changes to pre-GA products and features might not be compatible with other pre-GA versions. For more information, see the launch stage descriptions.
pa...@google.com <pa...@google.com>
lo...@google.com <lo...@google.com> #8
"PARSE_JSON: Input number: 21.0419003 cannot round-trip through string representation"
where 21.0419003 is longitude/latitude
ju...@deliverect.com <ju...@deliverect.com> #9
I have a similar issue:
PARSE_JSON: Input number: 51.4166919 cannot round-trip through string representation
this is a longitude/latitude, same as above
ra...@getcruise.com <ra...@getcruise.com> #10
For reference, I have two numbers that are castable to FLOAT64
but cannot parse the JSON with exact wide_number_mode
because it errors out.
Parsing the JSON with the default 'exact'
mode crashes.
From parsing the JSON with wide_number_mode => 'round'
50.0007544437
->50.000754443700004
1.3085912824
->1.3085912824000001
From using CAST
CAST("50.0007544437" AS FLOAT64)
->50.0007544437
CAST("1.3085912824" AS FLOAT64)
->1.3085912824
al...@airasia.com <al...@airasia.com> #11
Invalid input to PARSE_JSON: Input number: 16.556616 cannot round-trip through string representation
I am really surprised this issue is not raising more attention.
au...@premise.com <au...@premise.com> #12
bi...@numan.com <bi...@numan.com> #13
se...@ividence.com <se...@ividence.com> #14
se...@woptimo.com <se...@woptimo.com> #15
su...@google.com <su...@google.com> #16
fr...@google.com <fr...@google.com>
fr...@google.com <fr...@google.com> #17
Thanks everyone for the comments.
To add more information, floating point numbers use the IEEE-754 standard to encode a value into 64 bits. Most numbers cannot be represented precisely as a FLOAT64 and approximations are used.
For example, SELECT 50.0007544437
returns 50.0007544437
in BigQuery but it is still an approximation. It is rounded to a limited number of digits, and the value approximating 50.0007544437
might appear as "50.0007544437"
, hiding the fact that the value is not precise. In other situations, the approximation can be visible (SELECT 50.00075444370001
returns 50.000754443700011
). (Note that this has nothing to do with the number of digits after the decimal point. 0.1 cannot be represented exactly as a FLOAT64 either.)
For the JSON type, a different algorithm is used for parsing and representing floating point numbers, resulting in 50.0007544437
being printed as 50.000754443700004
, and therefore failing the "round trip through string representation check" done when wide_number_mode
is exact
(the default value).
We are looking into whether we can reconcile the two formattings.
In the meantime, please use wide_number_mode=>"round"
. It doesn't have any impact on your FLOAT64 values, unless you care about their string representation, in which case you shouldn't encode them as numbers, but as string in the JSON data.
mb...@hopper.com <mb...@hopper.com> #18
I think the problem here is that Google chose non-standard floating point number string representation behaviour for JSON in Spanner and that causes problems with round tripping. Yes, it is true that most numbers cannot be represented precisely as a 64-bit floating point number and approximation is used. For example, given the following numbers:
535.5357055664061
535.5357055664062
535.5357055664063
535.5357055664064
535.5357055664065
their binary form in IEEE-754 will be represented as:
100000010000000101111000100100100011111111111111111111111111111
100000010000000101111000100100100100000000000000000000000000000
100000010000000101111000100100100100000000000000000000000000000
100000010000000101111000100100100100000000000000000000000000001
100000010000000101111000100100100100000000000000000000000000010
As you can see, both 535.5357055664062
and 535.5357055664063
have exactly the same binary representation, but in fact the binary value is actually equal to 535.53570556640625
. As neither of those values can be represented exactly, the approximation is used.
But that's where the most interesting part starts, while IEEE-754 states regarding rounding rules
rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even least significant digit; this is the default for binary floating point and the recommended default for decimal
Spanner doesn't follow the recommended default behaviour for decimal in JSON implementation, and
SELECT PARSE_JSON('{"id":535.5357055664062}', wide_number_mode=>'round') AS json_data;
returns
{"id":535.5357055664063}
It's worth mentioning here that IEEE-754 actually has the following optional rounding rule:
rounds to the nearest value; if the number falls midway, it is rounded to the nearest value above (for positive numbers) or below (for negative numbers); this is intended as an option for decimal floating point.
And it seems whoever implemented it for JSON in Spanner, chose this optional behaviour.
The funniest part here though is inconsistent Spanner behaviour between JSON and non-JSON. While my example above returns one value, the following two queries in Spanner:
SELECT 535.5357055664062;
and
SELECT 535.5357055664063;
would actually return another value:
535.5357055664062
As most programming languages actually follow default behaviour for IEEE-754, which is in turn is used by JSON serialiser, that's actually quite annoying Spanner bug (given also inconsistent behaviour in Spanner itself). Same number in Python for example:
In [2]: 535.5357055664062
Out[2]: 535.5357055664062
In [3]: 535.5357055664063
Out[3]: 535.5357055664062
While wide_number_mode
is harmless indeed (in certain cases), it is still a dirty hack to just accommodate for weird behaviour in string representation for floating point numbers in JSON implementation and causes some amount of frustration.
fr...@google.com <fr...@google.com>
v-...@choreograph.com <v-...@choreograph.com> #19
Why this problem is only or one number below.
select PARSE_JSON('101.71286344529');----Working
select PARSE_JSON('101.71286344528');----Error
select PARSE_JSON('101.71286344527');----Working
select PARSE_JSON('101.71286344526');----Working
select PARSE_JSON('101.71286344525');--- Working
ph...@hopper.com <ph...@hopper.com> #20
It doesn't have any impact on your FLOAT64 values, unless you care about their string representation, in which case you shouldn't encode them as numbers, but as string in the JSON data
The JSON specification allows arbitrary (even infinite!) precision for numbers. I believe the actual problem here is forcing users to use FLOAT64
to represent numbers during parsing. There are valid cases for working with JSON numbers that cannot be represented as floating points (as made evident with this popular issue).
I can see 2 other approaches that could potentially solve people's problem:
- allow falling back to string representation, e.g.:
wide_number_mode => 'string'
- allow using
NUMERIC
/BIGNUMERIC
instead ofFLOAT64
, e.g.:wide_number_mode => 'numeric'
Option 1. would probably require parsing all numbers as strings so that the schema is consistent. So it might not be the best solution, but it is the most general.
Option 2. is not as general since you could technically still have numbers that cannot fit, but it would solve the problem for almost all users. People that would still hit this with BIGNUMERIC
are probably already using strings anyway.
Description
For example:
`select parse_json('75.2430000000004');` errors out with the error `Invalid input to PARSE_JSON: Input number: 75.2430000000004 cannot round-trip through string representation` while `select parse_json('75.2430000000005');` does not even though decimal accuracy is exactly the same.
When I invoke `select parse_json('75.2430000000004', wide_number_mode=>'round');` this will result in rounding down to the value of `75.24300000000039` while `select parse_json('75.2430000000005', wide_number_mode=>'round');` returns the desired result of `75.2430000000005`.
From what I see in the PARSE_JSON documentation it should be converting that decimal value to a FLOAT64 which should be able handle the given number of decimal places without relying on number rounding. Can someone please explain to me what is going on and if this is a bug or not? This is causing issues replicating google lighthouse data from a federated table where the json object is a string representation of the data to a native big query table where the data is actually a json object.