Fixed
Status Update
Comments
el...@google.com <el...@google.com> #2
We recognize the importance of this too. We don't have a specific timeline for the feature, but it's something that we are actively scoping out.
my...@google.com <my...@google.com> #3
Nick, can you clarify why you need "configurable" precision, instead of a fixed precision?
If we were to support DECIMAL(p, s) (which allows up to p-s digits before the decimal point and s digits after the decimal point), what would be the desired range of p and s?
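(To make p and s concrete with a small, purely illustrative example: DECIMAL(10, 2) would allow up to 8 digits before the decimal point and 2 digits after it, so 12345678.99 would fit but 123456789.00 would not.)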
go...@nytimes.com <go...@nytimes.com> #4
Fixed-point arithmetic is critical for financial applications!
[Deleted User] <[Deleted User]> #5
The only alternative is to store all currency as integers and convert back. This is not a viable solution and is a huge blocker for the New York Times. Please make this more than just a "scoping out."
el...@google.com <el...@google.com> #6
By scoping out I mean that we are determining the various aspects of the system that need to change to support the feature, not that there is a question of whether we will undertake the work.
go...@nytimes.com <go...@nytimes.com> #7
I encourage you to look at Postgres's implementation of numerics, because they allow basically infinite precision. That can be handy -- particularly in a system designed to operate on terabytes of data.
--gordon
mo...@google.com <mo...@google.com> #8
Thanks Nick - we are familiar with Postgres's and SQL Server's implementations of numerics and the tradeoffs that were involved in them. What we are trying to do here is to gather actual user requirements, i.e. for your financial application, what values of s (and p) would be acceptable. For example, GAAP prescribes s=4, but many applications use more than that (due to currency conversion and other reasons). What would be your requirements?
[Deleted User] <[Deleted User]> #9
BTW That was Gordon dropping the postgres reference.
I mean, I was assuming configurable scale. Do you just need a max?
go...@nytimes.com <go...@nytimes.com> #10
This is Gordon.
The most extreme example I've personally used is financial instruments with nominal values in the hundreds of billions. The goal is to keep all financial transactions to a tenth of a cent, so that is a scale of 16 or 17.
In the old days, the Turkish Lira was about 1,000,000 per dollar, so representing ten billion dollars also required 16 or 17 significant digits. (I think the Iranian rial is currently the most "undervalued" currency, with about 30,000 to the dollar.)
However, it can be useful (in a more ad-hoc way) to not have a limit -- or to have a limit that is at least in the hundreds.
--gordon
my...@google.com <my...@google.com> #11
Thanks for the reply. We will allow at least 17 significant digits and at least 3 digits after the decimal point for a tenth of a cent.
We understand that having unlimited precision would be nice, but that would degrade the performance and cost you more in storage. We are considering various options for optimizing user experience.
[Deleted User] <[Deleted User]> #12
From Google's own Real Time Bidding protobuf:
// The maximum CPM you want to be charged if you win the auction for this
// ad slot, expressed in micros of your account currency. For example, to
// bid a CPM of 1.29 USD, set max_cpm_micros = 1290000.
We would want to maintain accuracy, but still convert out of integer, when storing bid data. So in this case:
1.290000
so, 6 places after the decimal to properly capture Google Ad Exchange records.
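Something like this sketch is what we'd want to be able to write (the table and column names are just placeholders, and NUMERIC stands for whatever decimal type BigQuery ends up offering):
SELECT CAST(max_cpm_micros AS NUMERIC) / 1000000 AS max_cpm  -- 1290000 micros -> 1.290000
FROM `my_project.my_dataset.adx_bids`;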
I checked other formats:
OpenRTB uses float.
Appnexus uses float and double
https://developers.google.com/ad-exchange/rtb/downloads/realtime-bidding-proto
http://www.iab.com/wp-content/uploads/2016/03/OpenRTB-API-Specification-Version-2-5-FINAL.pdf
https://github.com/rtbkit/rtbkit/blob/master/rtbkit/plugins/bid_request/appnexus.h
my...@google.com <my...@google.com> #13
Sure, we will also take this into account. Thanks for the links.
ge...@fivetran.com <ge...@fivetran.com> #14
Another example requirement for you: my company, Fivetran, is a zero-configuration data pipeline that syncs relational databases to BigQuery (among other things). We have customers that use UNSIGNED BIGINT as the primary key for MySQL tables. If BigQuery had a DECIMAL type, we would use DECIMAL(10, 0) to represent UNSIGNED BIGINT.
As it is, there is really no good solution. If we know that the UNSIGNED BIGINT column is just an id, we could convert it to signed representation using wraparound---large values would become negative, but who cares? The problem with this approach is that our system is automated, and there's no way to know in general whether it's OK to treat it as an id and convert large values to negative.
my...@google.com <my...@google.com> #15
RE #13, BigQuery's INT64 type already supports 10 digits without converting large values to negative. You don't need to wait for DECIMAL(10, 0).
ge...@fivetran.com <ge...@fivetran.com> #16
Sorry, I made a mistake writing that---we would use DECIMAL(20, 0), which would be able to fit 18446744073709551615, the maximum value of an UNSIGNED BIGINT in MySQL. That value does not fit in BigQuery INT64, I just checked.
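To make that concrete, a statement along these lines (hypothetical syntax, assuming a string-to-decimal cast is supported) would need to preserve the value exactly:
SELECT CAST('18446744073709551615' AS NUMERIC) AS max_unsigned_bigint;  -- 20 digits, scale 0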
my...@google.com <my...@google.com> #17
OK, we will try to support 20 digits, but can't promise at this point. I will give more details on the plan once we finish the design.
Note that the numeric type would be slower and more costly than INT64, so if you could make your data fit into INT64, that would be preferable.
[Deleted User] <[Deleted User]> #18
[Comment deleted]
[Deleted User] <[Deleted User]> #19
Only ideal from a performance point of view. In these applications, that is not the prime consideration. This is about making sure that we can let people work with the data safely and minimizing the possibility of error when doing obvious things. Instructions to interpret values as cents are a problem. The implicit floor rounding of integers is also a problem.
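As a small, self-contained illustration of the integer-cents workaround (numbers chosen just for the example):
SELECT DIV(amount_cents, 3) AS split_cents,   -- 33: integer division silently drops the remainder
       amount_cents / 100   AS amount_dollars -- division returns FLOAT64, so we are back to binary floating point
FROM (SELECT 100 AS amount_cents);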
Please do not think of an end user as an Engineer, or data engineer. Think of an analyst with no programming or data type knowledge. It's Sunday evening, and they've got Excel, the web-ui, and a deadline to make sure that a quarterly earnings filing matches up with another system the next day. They've worked successfully with Oracle for a decade and now we are asking them to use BigQuery. Messing with integer representations of cents is not something this person wants to deal with.
my...@google.com <my...@google.com> #20
Understood. My previous message was for george@fivetran.com, who is planning to use the numeric type to store ids rather than money.
je...@google.com <je...@google.com> #21
Agree with Nick's comment #18. We shouldn't be thinking of just one particular use case (although I love Gordon's analysis of the most devalued currencies to set limits). Rather, we need to look at other computational tools (Excel, MySQL, R, Matlab, C) to make sure that our limits on precision align with the rest of the world.
cj
ge...@fivetran.com <ge...@fivetran.com> #22
Thanks myzh. I'm going to echo what others are saying from a slightly different perspective---our product is an *automated* data pipeline that is syncing from MySQL to BigQuery. There's no human who can look at the schema and say "Oh, these IDs don't have any intrinsic meaning, I can just convert UNSIGNED LONG to LONG and it doesn't matter that big positive numbers will get mapped to negative numbers."
Performance is not a terribly important concern because the underlying data source is a MySQL database---anything that can FIT in MySQL will be CRUSHED by BigQuery, so it's no big deal if we're getting 50% or even 10% of BigQuery's performance capability. The thing that is so valuable about BigQuery in this case is that it's so easy to set up and use.
[Deleted User] <[Deleted User]> #23
Any progress on this? We'd love an ETA.
my...@google.com <my...@google.com> #24
Thanks for the interest and the patience. We don't have an ETA yet. We will have a better answer in early Q3.
ap...@google.com <ap...@google.com> #25
Nick (and other people, who expressed interest in DECIMAL/NUMERIC support), I am working on a proposal for the fixed decimal point type support in standard SQL and I would like to run the basic design decisions by you to make sure they fit your needs.
The proposal is to implement an unparameterized NUMERIC type with fixed precision of 38 decimal digits and scale of 9 digits, meaning NUMERIC(38, 9). The scale and precision are going to be preserved across operations, and for multiplication and division we are going to round the result using half-up rounding semantics. We will error out if the result of the operation would exceed the precision. We plan to support NUMERIC parameters and results for AVG, SUM, MIN, MAX, ANY_VALUE aggregation and analytic functions, and for ABS, SIGN, GREATEST, LEAST, DIV, MOD, POW math functions.
Please note that this is still preliminary and I am working on nailing down the exact behavior of NUMERICs and how the type fits into the rest of the language, but I want to make sure that the defaults that we are writing into the spec are suitable for your applications.
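To make the proposed defaults concrete, here is an illustrative sketch (the cast and literal syntax is not final, so treat the exact spelling as tentative):
SELECT
  CAST(1 AS NUMERIC) / CAST(3 AS NUMERIC)   AS one_third,  -- 0.333333333: scale 9 preserved, rounded half-up
  CAST(2 AS NUMERIC) * CAST(2.5 AS NUMERIC) AS five;       -- exactly 5, with no binary floating-point noise
-- An operation whose result would exceed 38 digits of precision would raise an error rather than silently truncate.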
go...@nytimes.com <go...@nytimes.com> #26
It is disappointing that you are not opting for the more flexible Postgres version of numerics.
For practical purposes, 38/9 is fine -- particularly for the cases involving money, which is what I really care about. I would appreciate including logarithms -- LN() and LOG(). These are the simplest way to implement a PRODUCT() aggregation/window function. I would also throw in:
- ABS()
- EXP()
I also assume that these can readily be converted to floating point representations.
What will be the semantics for mixed operations between floats, integers, and numeric?
--gordon
el...@google.com <el...@google.com> #27
Gordon, we want to make a numeric type with sane defaults available as soon as we can, keeping in mind forward-compatibility for parameterization in the future. You can imagine that parameterization adds significant complexity to the design and implementation, and it doesn't make sense to block the feature completely.
I'll let Alex comment on the math function support since he is driving this feature.
ap...@google.com <ap...@google.com> #28
Gordon,
as Elliott said we are starting with a default NUMERIC option which we probably would have to implement anyway, even if we chose to offer the full support for parameterized NUMERICs from the beginning. This decision keeps the door open for introducing parameterized NUMERICs later.
As far as EXP, LOG, LN functions go, the SQL standard defines their result type as an implementation defined approximate numeric type. For the Standard Google SQL this would mean DOUBLE. With introduction of the NUMERIC type nothing changes here.
ABS will support NUMERICs, i.e. the result type is going to be NUMERIC if the argument is NUMERIC.
Mixed operations with DOUBLEs and FLOATs, broadly speaking, will result in a DOUBLE result type. This is SQL standard compliant: once an approximate numeric value is introduced into a computation, the result cannot be made exact. Integers in Standard SQL can always be safely promoted to NUMERICs, so operations between integers and NUMERICs will result in a NUMERIC.
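A small sketch of what these coercion rules imply (the result types in the comments follow the rules above; the syntax is illustrative):
SELECT
  CAST(1 AS NUMERIC) + 2    AS numeric_plus_integer,  -- NUMERIC: integers promote to NUMERIC
  CAST(1 AS NUMERIC) + 2.0  AS numeric_plus_double;   -- DOUBLE: approximate once a float enters the computation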
go...@nytimes.com <go...@nytimes.com> #29
These are all fine. I wanted to be sure they didn't generate an error.
One more thing: will there be a way to specify a NUMERIC literal? Is it possible that something like
cast(1.2 as numeric)
could ever result in 1.199999999 or 1.200000001?
--gordon
ap...@google.com <ap...@google.com> #30
Gordon,
these are the details that I am working on finalizing right now. There will be an explicit NUMERIC literal and my intention is to make this cast result in an exact NUMERIC, 1.2.
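So, under the current intent (and with a literal spelling that is only a guess at this point), both of the following would yield exactly 1.2, never 1.199999999 or 1.200000001:
SELECT CAST(1.2 AS NUMERIC) AS from_cast, NUMERIC '1.2' AS from_literal;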
[Deleted User] <[Deleted User]> #31
Just thought I'd chime in on this because we're also seeing a lot of issues around this. We store a bunch of dollar amounts, and with floats, of course sometimes you get more than 2 decimal places with like 99999 in the extra positions. This is normal for a float, but we're trying to store a number that is an exact amount, so it's not really ideal to begin with, even not counting the next issues...
When you sum a large number of these amounts, it switches into scientific notation, in an unpredictable way. So the same "SELECT SUM(amount_field) * percentage AS total" formula returns correct data for some values (small numbers) and incorrect for other values (the larger ones that became scientific notation). I.e. if you look at "one day" of data and SUM(amount_field), it comes out fine, but when you look at one month of data and SUM(amount_field), the value becomes high enough to switch to scientific notation and now any math on that result that would have worked on the smaller amounts doesn't work correctly. As our data is organic, it can get above and below these thresholds depending on season or user behavior so there would be no guarantee that even just getting one day of data wouldn't go above the threshold.
We can cast to INT but then we lose the decimal places. Even CAST(SUM(amount_field)*100 AS INT64)/100 doesn't work because the SUM could return scientific notation and *100 just makes the number completely wrong. Maybe there is a way around this, but it's still a hack to do all this extra work and make the queries really ugly just to get a decimal, which is what the data was to begin with.
Thanks & let me know if you have any tips or when you've decided on how your upcoming changes will work so maybe we can plan for it. Obviously we would really love to see a decimal data type so the column is stored in its natural state, even if it is a little slower to work with it.
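For reference, here is roughly the query we would like to be able to write once a decimal type exists (the table name and the 0.05 percentage are placeholders for our real values):
SELECT SUM(CAST(amount_field AS NUMERIC)) * CAST(0.05 AS NUMERIC) AS total
FROM `my_project.my_dataset.daily_amounts`;
-- With a decimal type the sum stays exact and never flips into scientific notation.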
[Deleted User] <[Deleted User]> #32
Strongly agree with comment #31 above.
[Deleted User] <[Deleted User]> #33
Isn't the scientific notation just an artifact of the UI layer?
Note for business cases we almost *never* want output in scientific notation. I vote for it to be banished from all output. Consider the burden it places on downstream parsing.
[Deleted User] <[Deleted User]> #34
Totally agree. Scientific notation provides no value in result sets.
[Deleted User] <[Deleted User]> #35
Hi,
is there any news on an estimated timeline for this feature?
For our internal use cases (financial/pricing information), the proposed implementation of NUMERIC(38,9) is OK.
Thanks.
ap...@google.com <ap...@google.com> #36
David, we are working on getting the proposal implemented. Right now we are targeting Q1.
go...@nytimes.com <go...@nytimes.com> #37
As a word of encouragement, the lack of numerics makes it more difficult for us to validate data loaded into BQ from other databases -- something that we need for audit purposes.
--gordon
tr...@gmail.com <tr...@gmail.com> #38
Is this still slated for Q1?
el...@google.com <el...@google.com> #39
That's right. When the initial alpha release is ready, we will provide a way to opt-in to get access to it.
[Deleted User] <[Deleted User]> #40
Do you have a rough timeline of when to expect the alpha?
el...@google.com <el...@google.com> #41
Expect it some time in Q1. We'll provide more specific information when we have it.
[Deleted User] <[Deleted User]> #42
Any updates? Alpha or beta availability?
ap...@google.com <ap...@google.com> #43
Yes, we are approaching alpha availability. We will make an announcement once all the relevant bits make it into production.
el...@google.com <el...@google.com> #44
If you are interested in the alpha release of the NUMERIC type, please submit this form: https://goo.gl/forms/UqMISS5SJxDi4cwY2 . We will whitelist projects on a rolling basis and send an email to notify when you are included.
Please keep in mind that this is an Alpha feature, as explained in https://cloud.google.com/terms/launch-stages. As such, we discourage using the NUMERIC type with production applications, and there is no associated SLA.
Thanks!
Elliott, on behalf of the BigQuery team
[Deleted User] <[Deleted User]> #45
Thanks Elliott. Do you have a rough estimate of when this might move into beta?
el...@google.com <el...@google.com> #46
We have a list of unimplemented features in the alpha documentation that we share with people who have enrolled, and these are things that we want to address before moving into beta. We don't have a specific date that we expect all of them to be complete, though. We try to be as thorough as possible, of course, but we also want to give alpha users time to report feedback in case there are oversights that we didn't recognize ourselves.
[Deleted User] <[Deleted User]> #47
We are also affected greatly by this 'feature'...
Looking at what the queries mentioned in the job IDs below do (currency conversion on the fly with floats, and then aggregating):
If we run them multiple times, differences can be in the range of 0.35% for the aggregated result!
Because we are talking about revenue reporting here, this large a difference is just unacceptable for our stakeholders.
Job IDs:
spil-bi-online-prd:EU.bquijob_a21d171_162aa959cd4
spil-bi-online-prd:EU.bquijob_2eaf48b1_162aae00559
We really hope the NUMERIC data type will fix this & that it will be moved to beta/GA soon, because not running under an SLA is not ideal...
Question: If we submit a project for this feature, is all OTHER work we do in BigQuery (not using the NUMERIC data type) still under the SLA?
el...@google.com <el...@google.com> #48
We hope that it will be in beta/GA soon as well :) Specifically for the beta, I can share that we are targeting the first half of this quarter, and we will provide an update here as well as in the release notes when that happens. I can't provide any specifics yet about timing of a GA launch, however.
In terms of the SLA, if you run queries in a project that is enrolled in the alpha but you don't reference any NUMERIC columns or create NUMERIC values, then the usual BigQuery SLA applies. The Google Cloud Alpha terms apply for any queries involving NUMERIC, however.
[Deleted User] <[Deleted User]> #49
Thanks for the quick reply el..., I've immediately submitted the project we are building -- which is kind of blocked right now -- to the Alpha!
[Deleted User] <[Deleted User]> #51
I really love the existence of the NUMERIC data type. However, when I sum() over a NUMERIC column, I get: "Field xy is of type STRING which is not supported for SUM". In the schema it is shown as NUMERIC as it should. Is there a restriction for aggregations?
el...@google.com <el...@google.com> #52
Are you using legacy SQL? You need to use standard SQL for a SUM over a NUMERIC column. See https://cloud.google.com/bigquery/data-types#numeric-type-support for more information.
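For example, prefixing the query with #standardSQL (or unchecking "Use Legacy SQL" in the UI) should make the aggregation work; the table name below is a placeholder:
#standardSQL
SELECT SUM(xy) AS total
FROM `my_project.my_dataset.my_table`;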
xu...@gmail.com <xu...@gmail.com> #54
Transfer everything into account 9704050941241478 mai hồng xuyên
Description
Especially when importing from RDBMSes that have it. We don't want to UDF around integer pennies if possible.
We'd much prefer a native decimal type with configurable precision, and documented rounding rules.