Fixed
Status Update
Comments
el...@google.com <el...@google.com> #2
We recognize the importance of this too. We don't have a specific timeline for the feature, but it's something that we are actively scoping out.
my...@google.com <my...@google.com> #3
Nick, can you clarify why you need "configurable" precision, instead of a fixed precision?
If we were to support DECIMAL(p, s) (which allows up to p-s digits before the decimal point and s digits after the decimal point), what would be the desired range of p and s?
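(To make p and s concrete with a small, purely illustrative example: DECIMAL(10, 2) would allow up to 8 digits before the decimal point and 2 digits after it, so 12345678.99 would fit but 123456789.00 would not.)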
go...@nytimes.com <go...@nytimes.com> #4
Fixed-point arithmetic is critical for financial applications!
[Deleted User] <[Deleted User]> #5
The only alternative is to store all currency as integers and convert back. This is not a viable solution and is a huge blocker for the New York Times. Please make this more than just a "scoping out."
el...@google.com <el...@google.com> #6
By scoping out I mean that we are determining the various aspects of the system that need to change to support the feature, not that there is a question of whether we will undertake the work.
go...@nytimes.com <go...@nytimes.com> #7
I encourage you to look at Postgres's implementation of numerics, because they allow basically infinite precision. That can be handy -- particularly in a system designed to operate on terabytes of data.
--gordon
mo...@google.com <mo...@google.com> #8
Thanks Nick - we are familiar with Postgres's and SQL Server's implementations of numerics and the tradeoffs that were involved in them. What we are trying to do here is to gather actual user requirements, i.e. for your financial application, what values of s (and p) would be acceptable. For example, GAAP prescribes s=4, but many applications use more than that (due to currency conversion and other reasons). What would be your requirements?
[Deleted User] <[Deleted User]> #9
BTW That was Gordon dropping the postgres reference.
I mean, I was assuming configurable scale. Do you just need a max?
go...@nytimes.com <go...@nytimes.com> #10
This is Gordon.
The most extreme example I've personally used is financial instruments with nominal values in the hundreds of billions. The goal is to keep all financial transactions to a tenth of a cent, so that is a scale of 16 or 17.
In the old days, the Turkish Lira was about 1,000,000 per dollar, so representing ten billion dollars also required 16 or 17 significant digits. (I think the Iranian rial is currently the most "undervalued" currency, with about 30,000 to the dollar.)
However, it can be useful (in a more ad-hoc way) to not have a limit -- or to have a limit that is at least in the hundreds.
--gordon
my...@google.com <my...@google.com> #11
Thanks for the reply. We will allow at least 17 significant digits and at least 3 digits after the decimal point for a tenth of a cent.
We understand that having unlimited precision would be nice, but that would degrade the performance and cost you more in storage. We are considering various options for optimizing user experience.
[Deleted User] <[Deleted User]> #12
From Google's own Real Time Bidding protobuf:
// The maximum CPM you want to be charged if you win the auction for this
// ad slot, expressed in micros of your account currency. For example, to
// bid a CPM of 1.29 USD, set max_cpm_micros = 1290000.
We would want to maintain accuracy, but still convert out of integer, when storing bid data. So in this case:
1.290000
so, 6 places after the decimal to properly capture Google Ad Exchange records.
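Something like this sketch is what we'd want to be able to write (the table and column names are just placeholders, and NUMERIC stands for whatever decimal type BigQuery ends up offering):
SELECT CAST(max_cpm_micros AS NUMERIC) / 1000000 AS max_cpm  -- 1290000 micros -> 1.290000
FROM `my_project.my_dataset.adx_bids`;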
I checked other formats:
OpenRTB uses float.
Appnexus uses float and double
https://developers.google.com/ad-exchange/rtb/downloads/realtime-bidding-proto
http://www.iab.com/wp-content/uploads/2016/03/OpenRTB-API-Specification-Version-2-5-FINAL.pdf
https://github.com/rtbkit/rtbkit/blob/master/rtbkit/plugins/bid_request/appnexus.h
my...@google.com <my...@google.com> #13
Sure, we will also take this into account. Thanks for the links.
ge...@fivetran.com <ge...@fivetran.com> #14
Another example requirement for you: my company, Fivetran, is a zero-configuration data pipeline that syncs relational databases to BigQuery (among other things). We have customers that use UNSIGNED BIGINT as the primary key for MySQL tables. If BigQuery had a DECIMAL type, we would use DECIMAL(10, 0) to represent UNSIGNED BIGINT.
As it is, there is really no good solution. If we know that the UNSIGNED BIGINT column is just an id, we could convert it to signed representation using wraparound---large values would become negative, but who cares? The problem with this approach is that our system is automated, and there's no way to know in general whether it's OK to treat it as an id and convert large values to negative.
my...@google.com <my...@google.com> #15
RE #13, BigQuery's INT64 type already supports 10 digits without converting large values to negative. You don't need to wait for DECIMAL(10, 0).
ge...@fivetran.com <ge...@fivetran.com> #16
Sorry, I made a mistake writing that---we would use DECIMAL(20, 0), which would be able to fit 18446744073709551615, the maximum value of an UNSIGNED BIGINT in MySQL. That value does not fit in BigQuery INT64, I just checked.
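To make that concrete, a statement along these lines (hypothetical syntax, assuming a string-to-decimal cast is supported) would need to preserve the value exactly:
SELECT CAST('18446744073709551615' AS NUMERIC) AS max_unsigned_bigint;  -- 20 digits, scale 0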
my...@google.com <my...@google.com> #17
OK, we will try to support 20 digits, but can't promise at this point. I will give more details on the plan once we finish the design.
Note that the numeric type would be slower and more costly than INT64, so if you could make your data fit into INT64, that would be preferable.
[Deleted User] <[Deleted User]> #18
[Comment deleted]
[Deleted User] <[Deleted User]> #19
Only ideal from a performance point of view. In these applications, that is not the prime consideration. This is about making sure that we can let people work with the data safely and minimizing the possibility of error when doing obvious things. Instructions to interpret values as cents are a problem. The implicit floor rounding of integers is also a problem.
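As a small, self-contained illustration of the integer-cents workaround (numbers chosen just for the example):
SELECT DIV(amount_cents, 3) AS split_cents,   -- 33: integer division silently drops the remainder
       amount_cents / 100   AS amount_dollars -- division returns FLOAT64, so we are back to binary floating point
FROM (SELECT 100 AS amount_cents);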
Please do not think of an end user as an Engineer, or data engineer. Think of an analyst with no programming or data type knowledge. It's Sunday evening, and they've got Excel, the web-ui, and a deadline to make sure that a quarterly earnings filing matches up with another system the next day. They've worked successfully with Oracle for a decade and now we are asking them to use BigQuery. Messing with integer representations of cents is not something this person wants to deal with.
my...@google.com <my...@google.com> #20
Understood. My previous message was for george@fivetran.com, who is planning to use the numeric type to store ids rather than money.
je...@google.com <je...@google.com> #21
Agree with Nick's comment #18. We shouldn't be thinking of just one particular use case (although I love Gordon's analysis of the most devalued currencies to set limits). Rather, we need to look at other computational tools (Excel, MySQL, R, Matlab, C) to make sure that our limits on precision align with the rest of the world.
cj
ge...@fivetran.com <ge...@fivetran.com> #22
Thanks myzh. I'm going to echo what others are saying from a slightly different perspective---our product is an *automated* data pipeline that is syncing from MySQL to BigQuery. There's no human who can look at the schema and say "Oh, these IDs don't have any intrinsic meaning, I can just convert UNSIGNED LONG to LONG and it doesn't matter that big positive numbers will get mapped to negative numbers."
Performance is not a terribly important concern because the underlying data source is a MySQL database---anything that can FIT in MySQL will be CRUSHED by BigQuery, so it's no big deal if we're getting 50% or even 10% of BigQuery's performance capability. The thing that is so valuable about BigQuery in this case is that it's so easy to set up and use.
[Deleted User] <[Deleted User]> #23
Any progress on this? We'd love an ETA.
my...@google.com <my...@google.com> #24
Thanks for the interest and the patience. We don't have an ETA yet. We will have a better answer in early Q3.
ap...@google.com <ap...@google.com> #25
Nick (and other people, who expressed interest in DECIMAL/NUMERIC support), I am working on a proposal for the fixed decimal point type support in standard SQL and I would like to run the basic design decisions by you to make sure they fit your needs.
The proposal is to implement an unparameterized NUMERIC type with fixed precision of 38 decimal digits and scale of 9 digits, meaning NUMERIC(38, 9). The scale and precision are going to be preserved across operations, and for multiplication and division we are going to round the result using half-up rounding semantics. We will error out if the result of the operation would exceed the precision. We plan to support NUMERIC parameters and results for AVG, SUM, MIN, MAX, ANY_VALUE aggregation and analytic functions, and for ABS, SIGN, GREATEST, LEAST, DIV, MOD, POW math functions.
Please note that this is still preliminary and I am working on nailing down the exact behavior of NUMERICs and how the type fits into the rest of the language, but I want to make sure that the defaults that we are writing into the spec are suitable for your applications.
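To make the proposed defaults concrete, here is an illustrative sketch (the cast and literal syntax is not final, so treat the exact spelling as tentative):
SELECT
  CAST(1 AS NUMERIC) / CAST(3 AS NUMERIC)   AS one_third,  -- 0.333333333: scale 9 preserved, rounded half-up
  CAST(2 AS NUMERIC) * CAST(2.5 AS NUMERIC) AS five;       -- exactly 5, with no binary floating-point noise
-- An operation whose result would exceed 38 digits of precision would raise an error rather than silently truncate.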
go...@nytimes.com <go...@nytimes.com> #26
It is disappointing that you are not opting for the more flexible Postgres version of numerics.
For practical purposes, 38/9 is fine -- particularly for the cases involving money, which is what I really care about. I would appreciate including logarithms -- LN() and LOG(). These are the simplest way to implement a PRODUCT() aggregation/window function. I would also throw in:
- ABS()
- EXP()
I also assume that these can readily be converted to floating point representations.
What will be the semantics for mixed operations between floats, integers, and numeric?
--gordon
el...@google.com <el...@google.com> #27
Gordon, we want to make a numeric type with sane defaults available as soon as we can, keeping in mind forward-compatibility for parameterization in the future. You can imagine that parameterization adds significant complexity to the design and implementation, and it doesn't make sense to block the feature completely.
I'll let Alex comment on the math function support since he is driving this feature.
ap...@google.com <ap...@google.com> #28
Gordon,
as Elliott said we are starting with a default NUMERIC option which we probably would have to implement anyway, even if we chose to offer the full support for parameterized NUMERICs from the beginning. This decision keeps the door open for introducing parameterized NUMERICs later.
As far as EXP, LOG, LN functions go, the SQL standard defines their result type as an implementation defined approximate numeric type. For the Standard Google SQL this would mean DOUBLE. With introduction of the NUMERIC type nothing changes here.
ABS will support NUMERICs, i.e. the result type is going to be NUMERIC if the argument is NUMERIC.
Mixed operations with DOUBLEs and FLOATs, broadly speaking, will result in a DOUBLE result type. This is SQL standard compliant: once an approximate numeric value is introduced into a computation, the result cannot be made exact. Integers in Standard SQL can always be safely promoted to NUMERICs, so operations between integers and NUMERICs will result in a NUMERIC.
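A small sketch of what these coercion rules imply (the result types in the comments follow the rules above; the syntax is illustrative):
SELECT
  CAST(1 AS NUMERIC) + 2    AS numeric_plus_integer,  -- NUMERIC: integers promote to NUMERIC
  CAST(1 AS NUMERIC) + 2.0  AS numeric_plus_double;   -- DOUBLE: approximate once a float enters the computation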
go...@nytimes.com <go...@nytimes.com> #29
These are all fine. I wanted to be sure they didn't generate an error.
One more thing: will there be a way to specify a NUMERIC literal? Is it possible that something like
cast(1.2 as numeric)
could ever result in 1.199999999 or 1.200000001?
--gordon
ap...@google.com <ap...@google.com> #30
Gordon,
these are the details that I am working on finalizing right now. There will be an explicit NUMERIC literal and my intention is to make this cast result in an exact NUMERIC, 1.2.
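So, under the current intent (and with a literal spelling that is only a guess at this point), both of the following would yield exactly 1.2, never 1.199999999 or 1.200000001:
SELECT CAST(1.2 AS NUMERIC) AS from_cast, NUMERIC '1.2' AS from_literal;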
[Deleted User] <[Deleted User]> #31
Just thought I'd chime in on this because we're also seeing a lot of issues around this. We store a bunch of dollar amounts, and with floats, of course sometimes you get more than 2 decimal places with like 99999 in the extra positions. This is normal for a float, but we're trying to store a number that is an exact amount, so it's not really ideal to begin with, even not counting the next issues...
When you sum a large number of these amounts, it switches into scientific notation, in an unpredictable way. So the same "SELECT SUM(amount_field) * percentage AS total" formula returns correct data for some values (small numbers) and incorrect for other values (the larger ones that became scientific notation). I.e. if you look at "one day" of data and SUM(amount_field), it comes out fine, but when you look at one month of data and SUM(amount_field), the value becomes high enough to switch to scientific notation and now any math on that result that would have worked on the smaller amounts doesn't work correctly. As our data is organic, it can get above and below these thresholds depending on season or user behavior so there would be no guarantee that even just getting one day of data wouldn't go above the threshold.
We can cast to INT but then we lose the decimal places. Even CAST(SUM(amount_field)*100 AS INT64)/100 doesn't work because the SUM could return scientific notation and *100 just makes the number completely wrong. Maybe there is a way around this, but it's still a hack to do all this extra work and make the queries really ugly just to get a decimal, which is what the data was to begin with.
Thanks & let me know if you have any tips or when you've decided on how your upcoming changes will work so maybe we can plan for it. Obviously we would really love to see a decimal data type so the column is stored in its natural state, even if it is a little slower to work with it.
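For reference, here is roughly the query we would like to be able to write once a decimal type exists (the table name and the 0.05 percentage are placeholders for our real values):
SELECT SUM(CAST(amount_field AS NUMERIC)) * CAST(0.05 AS NUMERIC) AS total
FROM `my_project.my_dataset.daily_amounts`;
-- With a decimal type the sum stays exact and never flips into scientific notation.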
[Deleted User] <[Deleted User]> #32
Strongly agree with comment #31 above.
[Deleted User] <[Deleted User]> #33
Isn't the scientific notation just an artifact of the UI layer?
Note for business cases we almost *never* want output in scientific notation. I vote for it to be banished from all output. Consider the burden it places on downstream parsing.
[Deleted User] <[Deleted User]> #34
Totally agree. Scientific notation provides no value in result sets.
[Deleted User] <[Deleted User]> #35
Hi,
is there any news on an estimated timeline for this feature?
For our internal use cases (financial/pricing information), the proposed implementation of NUMERIC(38,9) is OK.
Thanks.
ap...@google.com <ap...@google.com> #36
David, we are working on getting the proposal implemented. Right now we are targeting Q1.
go...@nytimes.com <go...@nytimes.com> #37
As a word of encouragement, the lack of numerics makes it more difficult for us to validate data loaded into BQ from other databases -- something that we need for audit purposes.
--gordon
tr...@gmail.com <tr...@gmail.com> #38
Is this still slated for Q1?
el...@google.com <el...@google.com> #39
That's right. When the initial alpha release is ready, we will provide a way to opt-in to get access to it.
[Deleted User] <[Deleted User]> #40
Do you have a rough timeline of when to expect the alpha?
el...@google.com <el...@google.com> #41
Expect it some time in Q1. We'll provide more specific information when we have it.
[Deleted User] <[Deleted User]> #42
Any updates? Alpha or beta availability?
ap...@google.com <ap...@google.com> #43
Yes, we are approaching alpha availability. We will make an announcement once all the relevant bits make it into production.
el...@google.com <el...@google.com> #44
If you are interested in the alpha release of the NUMERIC type, please submit this form: https://goo.gl/forms/UqMISS5SJxDi4cwY2 . We will whitelist projects on a rolling basis and send an email to notify when you are included.
Please keep in mind that this is an Alpha feature, as explained in https://cloud.google.com/terms/launch-stages. As such, we discourage using the NUMERIC type with production applications, and there is no associated SLA.
Thanks!
Elliott, on behalf of the BigQuery team
[Deleted User] <[Deleted User]> #45
Thanks Elliott. Do you have a rough estimate of when this might move into beta?
el...@google.com <el...@google.com> #46
We have a list of unimplemented features in the alpha documentation that we share with people who have enrolled, and these are things that we want to address before moving into beta. We don't have a specific date that we expect all of them to be complete, though. We try to be as thorough as possible, of course, but we also want to give alpha users time to report feedback in case there are oversights that we didn't recognize ourselves.
[Deleted User] <[Deleted User]> #47
We are also affected greatly by this 'feature'...
Looking at what the queries mentioned in the job IDs below do (currency conversion on the fly with floats, and then aggregating):
If we run them multiple times, differences can be in the range of 0.35% for the aggregated result!
Because we are talking about revenue reporting here, this large a difference is just unacceptable for our stakeholders.
Job IDs:
spil-bi-online-prd:EU.bquijob_a21d171_162aa959cd4
spil-bi-online-prd:EU.bquijob_2eaf48b1_162aae00559
We really hope the NUMERIC data type will fix this & that it will be moved to beta/GA soon, because not running under an SLA is not ideal...
Question: If we submit a project for this feature, is all OTHER work we do in BigQuery (not using the NUMERIC data type) still under the SLA?
el...@google.com <el...@google.com> #48
We hope that it will be in beta/GA soon as well :) Specifically for the beta, I can share that we are targeting the first half of this quarter, and we will provide an update here as well as in the release notes when that happens. I can't provide any specifics yet about timing of a GA launch, however.
In terms of the SLA, if you run queries in a project that is enrolled in the alpha but you don't reference any NUMERIC columns or create NUMERIC values, then the usual BigQuery SLA applies. The Google Cloud Alpha terms apply for any queries involving NUMERIC, however.
[Deleted User] <[Deleted User]> #49
Thanks for the quick reply el..., I've immediately submitted the project we are building -- which is kind of blocked right now -- to the Alpha!
[Deleted User] <[Deleted User]> #51
I really love the existence of the NUMERIC data type. However, when I sum() over a NUMERIC column, I get: "Field xy is of type STRING which is not supported for SUM". In the schema it is shown as NUMERIC as it should. Is there a restriction for aggregations?
el...@google.com <el...@google.com> #52
Are you using legacy SQL? You need to use standard SQL for a SUM over a NUMERIC column. See https://cloud.google.com/bigquery/data-types#numeric-type-support for more information.
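For example, prefixing the query with #standardSQL (or unchecking "Use Legacy SQL" in the UI) should make the aggregation work; the table name below is a placeholder:
#standardSQL
SELECT SUM(xy) AS total
FROM `my_project.my_dataset.my_table`;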
xu...@gmail.com <xu...@gmail.com> #54
Transfer everything into account 9704050941241478 mai hồng xuyên
Description
Especially when importing from RDBMSes that have it. We don't want to UDF around integer pennies if possible.
We'd much prefer a native decimal type with configurable precision, and documented rounding rules.