Fixed
Status Update
Comments
[Deleted User] <[Deleted User]> #2
Hey,
Same API error/question here.
We would like to try out the experimental feature as well.
a....@mediameeting.fr <a....@mediameeting.fr> #3
We need it too... Did anyone find out how we can get it? We are blocked with the "Field based partitioning support is not yet available for this project" API error...
ep...@google.com <ep...@google.com> #4
We are waiting on a few approvals to start alpha for this feature shortly.
Please watch this bug for more updates.
r....@gmail.com <r....@gmail.com> #5
That is great to hear!
I have one important question though regarding this new feature. Is it going to be implemented in such a way that we have the option to:
1) Select any column / field for partitioning BUT ALSO KEEP the time based partitioning on top of that?
2) OR is it going to be implemented in such a way that we can choose to partition on ONE FIELD ONLY, either being time or something else?
We need scenario 1 :)!
gr...@gmail.com <gr...@gmail.com> #6
We are currently looking into creating 90,000 different tables. Getting Scenario 1 as described above would allow us to use just one.
ep...@google.com <ep...@google.com> #7
We are currently doing a controlled alpha release for date/timestamp column based partitioning. Development is actively underway on scenario (1) that you outlined above. I don't have a good ETA to provide at this point, unfortunately.
r....@gmail.com <r....@gmail.com> #8
I do not understand, as we already have time-based partitioning?
I agree this one is a partial duplicate of https://issuetracker.google.com/issues/35905817
HOWEVER, that one mentions partitioning on a field other than datetime.
Scenario 1, which I describe in my previous comment, would mean keeping the datetime partitioning and, on top of that, having the possibility to partition on an additional field (or even fields).
Like you can do in a 'traditional RDBMS' that supports partitioning on multiple columns...
de...@derekperkins.com <de...@derekperkins.com> #9
I believe that refers to partitioning on a user date field as opposed to the hardcoded `_PARTITIONTIME`
ep...@google.com <ep...@google.com> #10
Correct, we are offering partitioning on a user-specified timestamp or date field at this point. After that, we plan to support multi-level partitioning (feature in development), where the first level is a date/timestamp and the second level is an arbitrary integer column.
ke...@gmail.com <ke...@gmail.com> #11
Only arbitrary integers (i.e., not strings)?
r....@gmail.com <r....@gmail.com> #12
I second that last question: why not strings? With int only, I would not want to call it real 'multi-level partitioning'...
gr...@gmail.com <gr...@gmail.com> #13
You can encode your strings as ints before loading the data. But then again, having BigQuery do it behind the scenes would be much more convenient.
ep...@google.com <ep...@google.com> #14
Sorry, the inclusion of the word 'integer' in #10 was unintentional. We are planning on supporting strings here as well.
du...@gmail.com <du...@gmail.com> #15
Hoping to try out the experimental feature as well!
[Deleted User] <[Deleted User]> #16
looking forward to this feature!
[Deleted User] <[Deleted User]> #17
Joining the party... Is there a rough ETA for the alpha of user-specified date/time partitioning support? Thanks.
ep...@google.com <ep...@google.com> #18
Date/timestamp field based partitioning is in alpha. We are happy to add you to the whitelist if you are interested. It has some caveats specified in the documentation (will be shared with you separately). If you are interested, we will need the project number(s) to add to the whitelist.
[Deleted User] <[Deleted User]> #19
Who can we privately message our project number to if we're interested in being whitelisted?
ep...@google.com <ep...@google.com> #20
If you are fine with sharing the project number over email, I can send you an email and you can respond to it with your project number.
pe...@gmail.com <pe...@gmail.com> #21
Send me one as well; I am also a GDE on Cloud.
va...@doit.com <va...@doit.com> #22
I would be interested as well to participate in alpha. My project number is #276982747399
[Deleted User] <[Deleted User]> #23
I'm fine with sending it over email! I'm at michael.raimi@blueapron.com
ya...@gmail.com <ya...@gmail.com> #24
It seems the API already supports date-based partitioning under configuration.load.timePartitioning.field.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions
I have not tested it, but the request is still open for any field.
ep...@google.com <ep...@google.com> #25
The API supports it, but we are doing it as a controlled alpha. Server side support is enabled only for whitelisted customers. Thanks!
sz...@book.com <sz...@book.com> #26
I would like to be added to the alpha list. Mail me at - szmehta@book.com
[Deleted User] <[Deleted User]> #27
Please whitelist our project; #753193720196
[Deleted User] <[Deleted User]> #28
Please add our project #467018477224 to the whitelist. Thanks!
va...@doit.com <va...@doit.com> #29
ep...@google.com <ep...@google.com> #30
Everyone who has requested it so far has been whitelisted. Please share your feedback, if any, about issues you may face with the feature. Thanks!
[Deleted User] <[Deleted User]> #31
Just tried it and it looks promising. Could you share the documentation with us? Mail: cr@17.media
Thanks!
pe...@aliz.ai <pe...@aliz.ai> #32
Please add project #171413420977
be...@gmail.com <be...@gmail.com> #33
Please add project number 359749582022, project ID sk-data-platform. Also, where do we find the documentation?
Thanks!
[Deleted User] <[Deleted User]> #34
It appears that the bq CLI doesn't yet support '--time_partitioning_field' with a schema inferred from Avro data file(s). If that's the case, is there a generic converter from avsc (Avro JSON) to bq JSON?
Also, the timestamp field must be a top-level field at this time. Will the beta support a nested timestamp field? Thanks.
ep...@google.com <ep...@google.com> #35
Regarding Avro load, --time_partitioning_field support seems like it should work. We tested it on bq version 2.0.27.
Can you please share the command line that did not work? Also, what version of the bq command line tool are you using?
At the moment we don't have immediate plans to support partitioning on nested fields, but we will consider that based on alpha feedback.
[Deleted User] <[Deleted User]> #36
> bq version
This is BigQuery CLI 2.0.27
> bq load --autodetect --source_format=AVRO --time_partitioning_field=createdTimeMs xxx.yyy gs://xxx/xxx.avro
BigQuery error in load operation: Table with field based partitioning must have a schema.
Without --time_partitioning_field, the load succeeds.
ep...@google.com <ep...@google.com> #37
Thanks for reporting the issue. It looks like this is an issue with the interaction between autodetect and column-based partitioning. We will try to fix it as soon as possible. In the meantime, specifying a schema explicitly should work.
ya...@gmail.com <ya...@gmail.com> #38
Not sure anymore where to post this (see my comment https://issuetracker.google.com/issues/67861929#comment3 ), but this relates specifically to custom date-field partitioning:
We receive large daily files with dates in YYYYMMDD format. Even with this request, we won't be able to partition by the date field at load time without prepping the data beforehand (Dataprep would be overkill). The alternative is to load all dates as INTEGER or STRING and query-parse into a new set of tables, hopefully with this custom date-field partitioning. The latter is not only very cumbersome but also expensive, since all data must be scanned from staging to final.
Hence: along with allowing a custom date field to be used for partitioning, the request is to be able to specify a custom date pattern.
ep...@google.com <ep...@google.com> #39
Load doesn't do this, but it is possible with a query. Assuming these files are on GCS, you can do the following:
1. Define an external table over the files in GCS.
2. External tables have a pseudo column _FILE_NAME for each row that contains the path to the file from which the row comes. Using REGEXP_EXTRACT you can extract the date part of the path.
3. Run an INSERT DML statement that selects from that external table and inserts into the partitioned table. Assuming it is a pseudo-column (_PARTITIONTIME) partitioned table, the INSERT DML would look something like:
INSERT partitioned_table (_PARTITIONTIME, field1, field2) SELECT REGEXP_EXTRACT(_FILE_NAME, ...), field1, field2 from external_table
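[Editor's note: a minimal sketch of that INSERT, assuming (hypothetically) that files land under paths like gs://bucket/dt=20170312/file.csv and that the external table and column names below exist; the actual REGEXP_EXTRACT pattern depends on your path layout.]
INSERT partitioned_table (_PARTITIONTIME, field1, field2)
SELECT
  -- extract the 8-digit date from the file path and convert it to a timestamp
  TIMESTAMP(PARSE_DATE('%Y%m%d', REGEXP_EXTRACT(_FILE_NAME, r'dt=(\d{8})'))),
  field1,
  field2
FROM external_table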
ya...@gmail.com <ya...@gmail.com> #40
Sorry, I meant the YYYYMMDD dates are part of the file's content, not the file name. We cannot change the date format in the source; it is hard-coded in a legacy system, and we have a lot to convert. Ideally load would parse dates using a custom pattern or fully support ISO 8601 (where both YYYY-MM-DD and YYYYMMDD are accepted).
In addition, it would be great to ignore the date-specific null marker 00000000 (I have logged a separate request).
A data-prepping script is not bullet-proof and is slow (regexing a date). Currently thinking of using a dataframe to load, transform, and stream to BQ, but that is very slow and memory-hungry (and we need to provision a VM).
ep...@google.com <ep...@google.com> #41
If they are in some column inside the file, couldn't you still use the INSERT DML mechanism that was suggested above? With that approach, you won't need to run data prep; everything happens inside the INSERT statement.
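[Editor's note: a hedged sketch of what that could look like, assuming the YYYYMMDD value arrives in a string column of the external table (here called raw_date, a made-up name) and the target is partitioned on a DATE column event_date (also hypothetical).]
INSERT partitioned_table (event_date, field1)
SELECT
  PARSE_DATE('%Y%m%d', raw_date),  -- converts e.g. '20171231' to DATE '2017-12-31'
  field1
FROM external_table
WHERE raw_date != '00000000'       -- skip the date-specific null marker mentioned in #40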
ya...@gmail.com <ya...@gmail.com> #42
Agreed, but more as a workaround and at a cost, as this would scan the entire table. Most tables we get have at least one date column, and we have 2 TB of daily data to load.
It would be so much easier - for users - to have custom date formats supported, or at least have the following supported:
YYYY-[M]M-[D]D --> today
YYYY[-]MM[-]DD --> new, and standard ISO 8601
Both could be accommodated.
ep...@google.com <ep...@google.com> #43
There is no separate table scan. It would be a single scan of each input file (the ones you use in your external table definition), with rows inserted directly into the necessary partition of the partitioned table. Performing a load also scans those input files.
ya...@gmail.com <ya...@gmail.com> #44
Thanks - hadn't thought about that. Good news on the cost aspect, but please do leave this open, as it would be more straightforward if the date format could be specified or '-' could be ignored.
Will use the suggested workaround in the meantime!
ke...@gmail.com <ke...@gmail.com> #45
This is also discussed here:
https://stackoverflow.com/questions/33849969/custom-date-format-for-loading-data-into-bigquery-using-bq
https://stackoverflow.com/questions/44165737/how-to-load-an-integer-value-in-an-avro-file-to-a-date-column-in-bigquery
Are there any plans - either using the 'excuse' of the column-based time partitioning rollout to enrich functionality in this area, or simply because it's a generally desirable feature - to enhance support for parsing different date formats at load time, and avoid the need to post-process datasets whose date values don't happen to match BigQuery's preferred format?
ya...@gmail.com <ya...@gmail.com> #46
I am actually confused by #43.
Scanning = querying. As per https://cloud.google.com/bigquery/pricing, loading is free, but querying an external table to insert into a partitioned table would not be (see INSERT under https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language).
If so, in addition to the boilerplate, this is very costly just to convert dates.
pe...@gmail.com <pe...@gmail.com> #47
I tried to understand your problem; as I read it, there are multiple levels of significance/cost in your thinking:
1. You want loading into partitions to be FREE (as load is free).
2. #43 mentioned that it would be a single file scan (it's not free, but it's only one scan).
3. For the sake of example, a third case would be multiple scans, probably using multiple queries.
You didn't explain what "costly" means for you ($10 for 2 TB). If you have a single file scan and pay for that one scan, would you consider that "costly" given 2 TB of daily data?
ya...@gmail.com <ya...@gmail.com> #48
Querying from an external table and inserting into a table is not much different from loading into staging (with dates as STRING) and inserting parsed dates into a new set of tables. The cost is the same.
Sure, the former is one scan vs. two for the latter, but as pointed out by others, either way could be avoided with support for a custom date format or a looser rule ('-' in dates optional).
In addition, we use Airflow and I am not sure there is an operator to perform a DML statement on GCS data.
What "costly" means here (to me) is irrelevant. 2 TB is not a lot for big data; others may have bigger savings. But there are more benefits to the request. Maybe this should be logged separately.
I am just surprised that a date-pattern parameter on load was never raised before.
[Deleted User] <[Deleted User]> #49
We'd like to be whitelisted: project id is 413669492896, email gil.jaysmith@techland.pl
Cheers!
de...@derekperkins.com <de...@derekperkins.com> #50
Now that this alpha has been running for ~2 months, is there an ETA for GA release?
ge...@markovian.com <ge...@markovian.com> #51
Could you whitelist our project ID 679991979866 please? Email: georg.walther@markovian.com
[Deleted User] <[Deleted User]> #52
[Deleted User] <[Deleted User]> #53
Is there any way to overwrite a specific partition the same way it used to work with the _PARTITIONTIME decorator?
When trying to do so with --replace / 'WRITE_TRUNCATE', I get the following error:
*The table is partitioned by a field. Data can only be added to the table itself and distributed to partitions by the system.*
Without the decorator, it truncates the whole table.
ep...@google.com <ep...@google.com> #54
Regarding WRITE_TRUNCATE on a partition, we will have the ability to do this through the MERGE DML statement (https://issuetracker.google.com/35905927 ).
Depending on when that becomes available, we are exploring alternative ways to achieve this functionality.
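[Editor's note: once MERGE becomes available for these tables, a partition-replacement sketch could look like the following; the table, column, and date values are hypothetical, and this shows the general MERGE pattern rather than confirmed alpha behavior.]
MERGE dataset.partitioned_table T
USING dataset.staging_table S
ON FALSE
WHEN NOT MATCHED AND DATE(S.eventTime) = '2017-03-12' THEN
  INSERT ROW   -- insert the new rows for that day
WHEN NOT MATCHED BY SOURCE AND DATE(T.eventTime) = '2017-03-12' THEN
  DELETE       -- drop the old rows in that partition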
ep...@google.com <ep...@google.com> #55
There is no ETA for GA at this point; we tentatively hope to go beta in the next quarter.
me...@disney.com <me...@disney.com> #56
Hello, can we please get whitelisted as well?
project id: 498740506390
email: melissa.j.perkins.-nd@disney.com; caleb.jones@disney.com;
[Deleted User] <[Deleted User]> #57
Facing the issue below when trying to load a partitioned table from a source table using --time_partitioning_field.
My intention is to move data from src to dest, where src is not partitioned but has a field named 'created', and dest is partitioned by day. So I wanted to move the data partitioned based on the 'created' date.
Our project is already whitelisted for the alpha. If you have any solution to this problem, please let us know.
BigQuery error in load operation: Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:created), but input partitioning specification is interval(type:day)
The bq load command used is:
bq load --source_format=AVRO --time_partitioning_type=DAY --time_partitioning_field='created' userstore.Test_Record_Dest_P1 ~/Downloads/test_01.avro myschema.json
Any suggestions to make it work?
ma...@groovymedia.com <ma...@groovymedia.com> #58
We would love to test this feature! Project number: 998012707676, email: maxim@groovymedia.com
Thanks!
ep...@google.com <ep...@google.com> #59
Responding to: https://issuetracker.google.com/65440943#comment57
Are you using the latest version of the bq command line tool? I verified that with version 2.0.27 it works correctly.
[Deleted User] <[Deleted User]> #60
Hey guys,
We are playing with this and are really happy with the results so far. My question: because this is in alpha, it could be turned off at any time. What would happen to our existing tables in that scenario?
Thanks.
ep...@google.com <ep...@google.com> #61
We are planning on going beta sometime next quarter. At this point, we think that existing tables will be generally unaffected.
sa...@gmail.com <sa...@gmail.com> #62
+ request to be added to whitelist
ep...@google.com <ep...@google.com> #63
We'd need your project number in order to whitelist you.
sa...@gmail.com <sa...@gmail.com> #64
Email me and I'll tell you? Buganizer doesn't show me your email; it only shows mine to you.
de...@derekperkins.com <de...@derekperkins.com> #65
Is it possible to whitelist an organization so that any projects inside the organization can use this?
ep...@google.com <ep...@google.com> #66
Yes, it is possible to whitelist an entire organization.
de...@derekperkins.com <de...@derekperkins.com> #67
That's awesome. Can I get organization 496154168690 whitelisted? Thanks!
ep...@google.com <ep...@google.com> #68
Done.
ro...@storemaven.com <ro...@storemaven.com> #69
Access to field-partitioned tables behaves differently from pseudo-column (_PARTITIONTIME) partitioned ones.
We encountered an issue which led to tremendously high costs while migrating vast amounts of data using field-based partitioning.
project id: mockappstore
datasets involved: sm_migration & sm_analytics
The query below will FULL SCAN a table even though it is partitioned on eventTime field (TIMESTAMP type):
SELECT customerName,productName,testName,storeName,action,category,context,uId,userType,userIP,eventTime,clientTime,sessionId,firstSessionId,firstSessionTime,imageIndex,campaign,
ad,cId,pId,testId,storeId,expectedUserDevice,userDevice,userPlatform,userAgent,userBrowser,userOS,userBrowserVersion,iphoneModel,userRegion,testType,serverVersion,appVersion,
referer,userCountry,userCity,xFbHttpEngine,xPurpose,aString,aIndex,qString,qIndex,videoTime,userRequestedWith,editorsNotes,lastVisibleImage,maxVisibleImage,firstVisibleImage,
itemIndex,itemName,lastVisibleItem,featuredApp,galleryOrientation,entry,extraHeaders,extraUrl,browserEvents,gyro,battery,stickerGallery, videoSessionIndex,videoContinueIndex,sessionDurations,storeOS
FROM sm_migration.cust_867c775a766c4a04a346dd341e74026a
WHERE TIMESTAMP(DATE(eventTime)) = PARSE_TIMESTAMP('%Y%m%d','20170312', 'UTC')
Result: a FULL SCAN - 5.73 MB.
The equivalent table partitioned by _PARTITIONTIME will scan only the 2017-03-12 partition (expected behavior); see query:
SELECT customerName,productName,testName,storeName,action,category,context,uId,userType,userIP,eventTime,clientTime,sessionId,firstSessionId,firstSessionTime,imageIndex,campaign,
ad,cId,pId,testId,storeId,expectedUserDevice,userDevice,userPlatform,userAgent,userBrowser,userOS,userBrowserVersion,iphoneModel,userRegion,testType,serverVersion,appVersion,
referer,userCountry,userCity,xFbHttpEngine,xPurpose,aString,aIndex,qString,qIndex,videoTime,userRequestedWith,editorsNotes,lastVisibleImage,maxVisibleImage,firstVisibleImage,
itemIndex,itemName,lastVisibleItem,featuredApp,galleryOrientation,entry,extraHeaders,extraUrl,browserEvents,gyro,battery,stickerGallery, videoSessionIndex,videoContinueIndex,sessionDurations,storeOS
FROM sm_analytics.cust_p_867c775a766c4a04a346dd341e74026a
WHERE TIMESTAMP(DATE(_PARTITIONTIME)) = PARSE_TIMESTAMP('%Y%m%d','20170312', 'UTC')
Result: a SINGLE PARTITION SCAN - 335 KB (perfectly fine).
The table above, which stores 5.73 MB of data, is used to illustrate our issue. We have tables with over 800 GB of data, and the migration processes 712 days.
Instead of 712 single-partition scans (the same cost as one FULL TABLE SCAN), this causes 712 FULL SCANS on each of our 680+ tables.
Thanks
ep...@google.com <ep...@google.com> #70
Thanks for the feedback. Field-partitioned tables based on a timestamp field will behave differently for partition pruning than pseudo-column partitioned tables, due to the granularity of values stored in the column. I am not sure if you have access to the alpha documentation, but this is one of the caveats mentioned there. Here are a few things we are planning on providing soon:
1. A require_partition_filter flag on a partitioned table. Setting this to true will ensure that the system fails queries that do NOT have a filter that can be used to evaluate which partitions to scan.
2. We are improving partition pruning for field-based partitioned tables. We have identified common functions that are used on timestamp fields and support them for partition pruning. We are going to keep adding to this list (and document it). For example, since we launched the alpha, we have added support for predicates of the kind DATE(eventTime) = "2017-03-12" for pruning. If you have other common filters that you use, please list them here and we will look into adding support for them.
For filters that aren't supported for pruning, users can avoid incurring huge bills by setting the flag described in (1).
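[Editor's note: for illustration, the query from #69 rewritten with the pruning-friendly predicate mentioned in point (2) would look roughly like this; only the filter changes, and the column list is abbreviated.]
SELECT customerName, productName, eventTime
FROM sm_migration.cust_867c775a766c4a04a346dd341e74026a
WHERE DATE(eventTime) = '2017-03-12'   -- predicate form supported for pruning to a single partition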
ep...@google.com <ep...@google.com> #71
Repurposed this bug to be specific to timestamp and date field partitioning.
https://issuetracker.google.com/35905817 tracks the non-date/timestamp field partitioning.
ci...@gmail.com <ci...@gmail.com> #72
We also want to be added to the whitelist. Can you please shoot an email to cityscooper@gmail.com for the project ID? Thanks.
wo...@gmail.com <wo...@gmail.com> #73
We want to try this feature, project ID: 231841946736. Thanks.
ep...@google.com <ep...@google.com> #74
Due to the holidays, we will be able to resume whitelisting only after the new year. Apologies for any inconvenience. Thanks!
[Deleted User] <[Deleted User]> #75
Recently BigQuery added the ability to query partitioned wildcard tables. This does not appear to work with field based partitioning. Is this on the roadmap to fix? This is the error:
BigQuery error in update operation: Wildcard table over field partitioning tables is disallowed.
ep...@google.com <ep...@google.com> #76
Yes, we will be supporting this before going to beta (tentatively mid Q1).
ep...@google.com <ep...@google.com> #77
All whitelist requests have been processed.
[Deleted User] <[Deleted User]> #78
Can I please get organization 600153968971 added to the whitelist? Thanks.
[Deleted User] <[Deleted User]> #79
Hey guys, can I please get whitelisted to test this feature? projectid: 450940328522
[Deleted User] <[Deleted User]> #80
We were trying to test this and realized it's not GA. Please add 404270122387 to the whitelist. Thanks!
al...@veolia.com <al...@veolia.com> #81
Hello,
Could you add the project 192056161420 to the whitelist?
Thanks.
Regards,
Alexandre
ya...@gmail.com <ya...@gmail.com> #82
please whitelist 897478599326 and 333196740116
Thanks, Yannick
ya...@gmail.com <ya...@gmail.com> #83
Was hoping to work on that during the weekend.
Any ETA on the pending whitelistings? Thanks!
ep...@google.com <ep...@google.com> #84
All pending whitelist requests have been processed as of 11 am PST today.
ma...@gmail.com <ma...@gmail.com> #85
Please whitelist 368656566068, 1007394073692 and 464117503368.
Thank you!
ya...@gmail.com <ya...@gmail.com> #86
Forgot one - can you please whitelist 529599436172 too
Thanks.
ep...@google.com <ep...@google.com> #87
All whitelist requests so far have been processed.
ya...@gmail.com <ya...@gmail.com> #88
With the recent support for CREATE TABLE DDL, can whitelisted projects use a user-defined date field for the PARTITION BY parameter?
If not, please consider it.
el...@google.com <el...@google.com> #89
Yes, you can use it. You should be able to execute statements such as these if you are part of the column-based partitioning alpha:
CREATE TABLE my_dataset.my_table (x INT64, y DATE) PARTITION BY y;
CREATE TABLE my_dataset.my_table (x INT64, y TIMESTAMP) PARTITION BY DATE(y);
Once column-based partitioning is in beta (tentatively mid-Q1, per #76), we will update https://cloud.google.com/bigquery/docs/data-definition-language to reflect the supported syntax.
ya...@gmail.com <ya...@gmail.com> #90
That's great - however I just hit an undocumented limit:
Error: Too many partitions produced by query, allowed 500, query produces at least 1095 partitions
(Job ID: sunlit-flag-177515:bquijob_74fe0af7_1610ffe2449)
I read somewhere about a 2000-partition limit which, if enforced, would work fine for my use case - I need 5 years of daily data, partitioned to allow continued discovery and very short response times.
500 is really too low.
Can you increase that threshold or waive it?
ep...@google.com <ep...@google.com> #91
This is documented in the alpha doc that was shared with our whitelisted users. We hope to increase it to 2500. Note that this limit is different from the total number of partitions in a table (which is already 2500 today).
ya...@gmail.com <ya...@gmail.com> #92
My bad, you're right. I have worked around it by issuing one 'INSERT INTO' DML statement per year.
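[Editor's note: a minimal sketch of that workaround, assuming a hypothetical staging table and a DATE partitioning column named event_date; each statement produces at most ~366 partitions, staying under the per-query limit.]
INSERT dataset.partitioned_table (event_date, field1)
SELECT event_date, field1
FROM dataset.staging_table
WHERE event_date BETWEEN '2013-01-01' AND '2013-12-31'   -- repeat once per year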
[Deleted User] <[Deleted User]> #93
Please add two projects. Can I also please get a copy of the alpha doc?
961768382261
368656566068
[Deleted User] <[Deleted User]> #94
Can you please whitelist 187213475372.
Thanks.
ep...@google.com <ep...@google.com> #95
Whitelist requests so far have been processed.
di...@monzo.com <di...@monzo.com> #96
Can you whitelist 962019738027 as well please?
Thanks in advance!
ma...@gmail.com <ma...@gmail.com> #97
Please whitelist 961768382261 as well! thank you!
dr...@gtempaccount.com <dr...@gtempaccount.com> #98
Please whitelist 464117503368, thank you!
ad...@google.com <ad...@google.com> #99
Please whitelist 228840735359 as well and let me know when it has been processed?
Thank you
Arend
[Deleted User] <[Deleted User]> #100
Please whitelist 1057665635705 as well, thanks
ep...@google.com <ep...@google.com> #101
All whitelist requests up to (and including) #99 have been processed.
da...@gmail.com <da...@gmail.com> #102
Please whitelist project ID 57024152020. Thank you so much.
[Deleted User] <[Deleted User]> #103
Could you please whitelist project ID 1006213818196? Thanks
mi...@qlirogroup.com <mi...@qlirogroup.com> #104
We would also very much appreciate a whitelist: 369318398231, Thanks
[Deleted User] <[Deleted User]> #105
Is it too late to sign up?
These projects:
478780302124
112734977199
62448027116
438110786158
Thank you!
ta...@care2team.com <ta...@care2team.com> #106
I would very much appreciate a whitelist for our project, which is #176550870186.
Thank you!
ak...@gmail.com <ak...@gmail.com> #107
We would like to try this feature, projects: db-cm-dev and db-cm-pro. Thanks.
ma...@gmail.com <ma...@gmail.com> #108
This is in beta now: https://cloud.google.com/bigquery/docs/creating-column-partitions
You can already create these tables and query them. They've been great for us so far, although they don't currently support DML (update/delete) statements.
ep...@google.com <ep...@google.com> #109
DML support will be available shortly. I am closing this feature request. Partitioning on a non-date/timestamp field is tracked in https://issuetracker.google.com/35905817
ya...@gmail.com <ya...@gmail.com> #110
Sorry for the cross-posting - I wrote the same on the dedicated Google Group for alpha users, but posting here too just in case, since the group has had only 4 posts so far:
I have 2 (actually 252) tables which are each partitioned the same way, using an "as_of" timestamp - these are then partitioned by day (e.g. below, partition 2012-11-09).
In the query below, data gets pruned just fine.
SELECT
t2.*
FROM
`dataset.t1` join `dataset.t2` using (a, b, c)
WHERE
t1.as_of=TIMESTAMP('2012-11-09T23:59:51')
AND
t2.as_of=TIMESTAMP('2012-11-09T23:59:51')
However, when I set only one condition and use the join "for the other(s)", data is not pruned on the joined tables and a full table scan is performed on table t2:
SELECT
t2.*
FROM
`dataset.t1` join `dataset.t2` using (a, b, c, as_of)
WHERE
t1.as_of=TIMESTAMP('2012-11-09T23:59:51')
The latter is more convenient, especially if you have a lot of tables to join.
ya...@gmail.com <ya...@gmail.com> #111
I wanted to add that this is not just about convenience: the first query cannot leverage a view, whereas for the second, if pruning were supported, one could create a view:
CREATE VIEW `dataset.myview` AS
SELECT
t2.*
FROM
`dataset.t1` join `dataset.t2` using (a, b, c, as_of)
and running a
SELECT * FROM myview WHERE as_of = '2012-11-09T23:59:51'
would prune nice and clean.
Description
Is there a way to use this experimental feature? I'm currently developing a new app which would make great use of it. I'm currently getting the "Field based partitioning support is not yet available for this project" error from the API.