Status Update
Comments
ep...@google.com <ep...@google.com> #2
Can you please check your .iml files? Also, instead of opening the project, *import* it; that will completely rewrite your .iml files and you won't see that error again.
de...@derekperkins.com <de...@derekperkins.com> #3
1) open AS,
2) delete the Gradle Java modules from the project,
3) re-import the Gradle Java modules to the project,
4) close AS,
5) re-open AS.
In that case AS does *not* complain about Gradle Java modules being non-Gradle Java modules, and I've confirmed that the generated *.iml files contain
However, if I do
6) close AS,
7) delete the *.iml files,
8) re-open AS,
then AS again complains, *although* the generated files again contain
Can it be that AS is somehow performing the check for non-Gradle Java modules before the *.iml files are generated in case of just opening (instead of importing) the project?
de...@derekperkins.com <de...@derekperkins.com> #4
ke...@king.com <ke...@king.com> #5
[1]
ga...@gmail.com <ga...@gmail.com> #6
r....@gmail.com <r....@gmail.com> #7
r....@gmail.com <r....@gmail.com> #8
jm...@gmail.com <jm...@gmail.com> #9
ep...@google.com <ep...@google.com> #10
r....@gmail.com <r....@gmail.com> #11
I have one main android application + six Android libraries.
We do not commit our .iml files to source control.
ke...@king.com <ke...@king.com> #12
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #13
sw...@bainbridgehealth.com <sw...@bainbridgehealth.com> #14
bz...@gmail.com <bz...@gmail.com> #15
pe...@gmail.com <pe...@gmail.com> #16
bz...@gmail.com <bz...@gmail.com> #17
pe...@gmail.com <pe...@gmail.com> #18
ep...@google.com <ep...@google.com> #19
And are you, by any chance, using the com.github.dcendents.android-maven plugin? I upgraded from 1.4.2 of that to 1.5, and updated to Gradle 3.2 at the same time, and I'm seeing the error even after downgrading to AS 2.2.3.
I'm in the process of doing a clean install now, and then I'll be trying reverting back to the older version of both, to see if that changes anything.
de...@derekperkins.com <de...@derekperkins.com> #20
That said, it seems interesting to me that several people at once all seemed to run into the same thing. There may be something going on, still.
bi...@ilab.dk <bi...@ilab.dk> #21
de...@derekperkins.com <de...@derekperkins.com> #22
bz...@gmail.com <bz...@gmail.com> #23
ma...@google.com <ma...@google.com> #24
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #25
la...@amedia.no <la...@amedia.no> #26
da...@gmail.com <da...@gmail.com> #27
jo...@gmail.com <jo...@gmail.com> #28
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #29
6:48 PM Unsupported Modules Detected: Compilation is not supported for following modules: AndroidStudioProjects. Unfortunately you can't have non-Gradle Java modules and Android-Gradle modules in one project.
I do not have any non-Gradle Java modules in my project.
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #30
Android Studio 3.1.4
Build #AI-173.4907809, built on July 23, 2018
JRE: 1.8.0_152-release-1024-b01 amd64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-32-generic
I am not sure, but I believe this causes further errors which are quite problematic. I have several projects with this issue: in all of them, some dependencies that are defined in a module's build.gradle file cannot be resolved by AS. Actually, the dependencies are resolved and downloaded from the repositories and show up under "External Libraries" in Project view. However, everywhere I try to use any class from those dependencies... I get a "Cannot resolve symbol XYZ". So I end up stuck with unresolved imports, no way to navigate through classes, etc.
The weirdest thing, though, is that the app compiles fine and runs on a device. It's just that AS cannot resolve symbols from dependencies that were themselves resolved fine.
Project structure overview:
- "app" Android application module (Gradle); depends on "commons", "annotations" and "processor" modules -> *gets detected as non-Gradle*
- "commons" Android library module (Gradle) with common utils, views, dependencies, etc.
- "annotations" java module (Gradle)
- "processor" java module (Gradle)
The symbols that are not resolved by AS come from dependencies defined in the "commons" module.
I have invalidated caches, deleted .idea and .gradle folders, re-downloaded the project, deleted ~/.gradle/caches folder, deleted *.iml files... no luck whatsoever.
Any hints on this issue? Do you believe the "Unsupported Modules Detected" error has anything to do with the "Cannot resolve symbol" issue?
Thanks a lot and best regards.
[Deleted User] <[Deleted User]> #31
Did some research, and this was the outcome after trying on both Windows & Mac.
When importing a project you have a screen giving 2 options:
- Create project from existing sources
- Import project from external model
Inside the "Import project from external model" there are 2 more options:
- Android Gradle
- Gradle
If you select "Android Gradle" everything is fine, no false positive at all.
If you select Gradle you will get the false positive error message every time you open that project in Android Studio.
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #32
ep...@google.com <ep...@google.com> #33
ya...@gmail.com <ya...@gmail.com> #34
ep...@google.com <ep...@google.com> #35
For one of my older projects, I deleted the .idea and .gradle folders, re-downloaded the project, deleted the ~/.gradle/caches folder, and deleted the *.iml files... and now it works OK.
ak...@gmail.com <ak...@gmail.com> #36
+1
fe...@lindenlab.com <fe...@lindenlab.com> #37
The Gradle build works when I double-click on the "build" Gradle task - but the IDE can't build the Java modules itself. Better to use a Makefile for building, I guess...
ep...@google.com <ep...@google.com>
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #38
pe...@gmail.com <pe...@gmail.com> #39
hu...@google.com <hu...@google.com> #40
If you get this message it is unlikely to be the same old issue. Please file a new bug and, if possible, share your idea.log files (Help | Show Log...). Thank you!
ma...@gmail.com <ma...@gmail.com> #41
bug reported
hu...@google.com <hu...@google.com> #42
[Deleted User] <[Deleted User]> #43
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #44
ma...@gmail.com <ma...@gmail.com> #45
--clustering_fields: Comma separated field names. Can only be specified with time based partitioning. Data will be first partitioned and subsequently clustered on these fields.
ep...@google.com <ep...@google.com> #46
sa...@gmail.com <sa...@gmail.com> #47
Could you consider everyone who's requested whitelisting for the partitioning alpha to also be requesting whitelisting for the clustering alpha?
And to avoid issue tracker spam, could you give a form or email to contact for whitelist requests? Maybe e.g. just a single Google form for all alpha feature whitelist requests, which you keep updated with whatever the current set of available alphas is - and notify anyone who's requested any alpha before about new alphas available?
Remember, Buganizer hides your email/name from us, though not vice versa, so we can't email you off-list. Which, while on the subject, does not seem very friendly to me. :-/
[Deleted User] <[Deleted User]> #48
ma...@gmail.com <ma...@gmail.com> #49
ep...@google.com <ep...@google.com>
[Deleted User] <[Deleted User]> #50
how do I do a whitelist request for the --clustering_fields feature?
ep...@google.com <ep...@google.com> #51
ep...@google.com <ep...@google.com> #52
[Deleted User] <[Deleted User]> #53
We have done a few small experiments with the clustering Alpha but do not see the advantages yet. Let me share what we did / our findings:
-The basis is a user game / user activity table that we've copied to a clustered table (time partitioned on firstTimeActivity, clustered on app_name).
-When we now filter on firstTimeActivity, we have lower costs (logical, as we filter on time partitions)
-When we do a LIMIT X on the clustered table we see LOWER costs compared to doing a LIMIT X on the non partitioned and non clustered table. Even without any WHERE statements:
Job ID spil-bi:EU.bquijob_64230f64_1646953cea7
-However, when we filter on firstTimeActivity + app_name (the cluster key) we do NOT see reduced costs, NOR do we see a significant reduction in query time:
Job ID spil-bi:EU.bquijob_665faebd_164695a0583
ba...@aliz.ai <ba...@aliz.ai> #54
[Deleted User] <[Deleted User]> #55
[Deleted User] <[Deleted User]> #56
ep...@google.com <ep...@google.com> #57
I looked at the table in question there. There is only ~10MiB of data in each partition of the table. Clustering breaks the data further within each partition into blocks of some reasonable size (generally a few hundred MiB). 10MiB of data per partition is too small for clustering to split. Billing applies at block granularity. This is one of the cases where partitioning differs from clustering: with clustering, BigQuery automatically determines how to split the data into blocks, so strict cost guarantees are not available (unlike partitioning, which guarantees the partition boundaries). If you have on the order of a few GiB of data per partition, a query like yours will see cost reduction and performance improvement.
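To make the block-granularity point concrete, a small hedged illustration (table and column names are hypothetical):

-- The dry-run estimate assumes the worst case for the clustered column;
-- the final bytes billed reflect only the blocks actually scanned.
SELECT COUNT(*)
FROM mydataset.activity
WHERE event_date = '2018-07-01'   -- partition pruning: cost bound known upfront
  AND app_name = 'my_game';       -- block pruning: applied while the query runs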
[Deleted User] <[Deleted User]> #58
Tested it with 3+ GB time partitions, and it works now.
We are super happy about this!
tt...@monsanto.com <tt...@monsanto.com> #59
I have been attempting to apply for the alpha whitelist at the URL made available above:
However, the site repeatedly asks me to sign in, and even once I have signed in, it will not allow me to edit the form (and thus submit it). Can you please advise me on how I can submit a request for whitelisting?
Thank you
ep...@google.com <ep...@google.com> #60
ep...@google.com <ep...@google.com> #61
mi...@shopify.com <mi...@shopify.com> #62
ep...@google.com <ep...@google.com> #63
Is there a reason why you cannot use clustering to achieve this? Partitioning offers some guarantees that clustering currently does not, but I am curious to know if your scenario really needs partitioning and clustering wouldn't suffice.
ke...@gmail.com <ke...@gmail.com> #64
There have been a number of questions in this issue about whether Google intends to support both a date partition and an integer/string partition _on the same table_ (i.e., two-level partitioning). As far as I can see re-reading the entire thread, answering these questions has always been carefully avoided. ;)
[Deleted User] <[Deleted User]> #65
ep...@google.com <ep...@google.com> #66
We are aware of some issues with load time increase for clustering in certain scenarios; we are actively working on improving its performance and are going to roll out more improvements in the near future. There is some cost to pay to arrange data in a way that makes queries (write once, read multiple times) efficient and cost-effective. That said, we have work ongoing to reduce the impact of this.
Clustering is our recommended mechanism to obtain two level partitioning. It offers finer grained partitioning without significant metadata maintenance overhead.
ke...@gmail.com <ke...@gmail.com> #67
“Over time, as more and more operations modify a table, the degree to which the data is sorted begins to weaken, and the table becomes partially sorted. In a partially sorted table, queries that use the clustering columns may need to scan more blocks compared to a table that is fully sorted. You can re-cluster the data in the entire table by running a SELECT * query that selects from and overwrites the table (or any specific partition in it). In addition, any arbitrary portion of the table can be re-clustered using a DML MERGE statement.”
In other words, you need to manually cause the data in a clustered table to be re-clustered from time to time, if you wish to retain the benefits of clustering. With a partitioned table, you don’t need to do this.
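For reference, a sketch of both re-clustering options the quoted docs describe (table and column names are placeholders; the MERGE follows the docs' ON FALSE idiom):

-- Re-cluster the whole table by rewriting it onto itself
-- (the partitioning/clustering spec must be restated):
CREATE OR REPLACE TABLE mydataset.events
PARTITION BY event_date
CLUSTER BY app_name
AS SELECT * FROM mydataset.events;

-- Re-cluster a single slice with a self-MERGE that deletes and
-- re-inserts the rows of that slice:
MERGE mydataset.events t
USING (SELECT * FROM mydataset.events WHERE event_date = '2018-07-01') s
ON FALSE
WHEN NOT MATCHED BY SOURCE AND event_date = '2018-07-01' THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW;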
in...@gmail.com <in...@gmail.com> #68
Closing the issue render.
ri...@gmail.com <ri...@gmail.com> #69
th...@google.com <th...@google.com> #70
pe...@gmail.com <pe...@gmail.com> #71
Then you can use clustering on 5 columns as you define.
th...@google.com <th...@google.com> #72
I just tested it. Querying a null-partitioned clustered table specifying the cluster column in the where clause does reduce the amount of data read.
It still feels like there should be a cleaner way to do this however. Hopefully somebody from the product team can comment on this trick? :)
pe...@gmail.com <pe...@gmail.com> #73
From docs:
Partitioned tables are subject to the following limitations:
The partitioning column must be either a scalar DATE or TIMESTAMP column. While the mode of the column may be REQUIRED or NULLABLE, it cannot be REPEATED (array-based)
so NULLABLE is there, and you can read that as: setting the partition column to NULL in all rows means you have one single partition.
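Putting the trick together, a minimal sketch (all names hypothetical):

-- The partition column is NULLABLE and never populated, so every row
-- lands in the single NULL partition; pruning then comes from clustering.
CREATE TABLE mydataset.customers
(
  fake_partition DATE,   -- always NULL
  customer_id    STRING,
  attributes     STRING
)
PARTITION BY fake_partition
CLUSTER BY customer_id;

-- Filtering on the cluster column reduces the bytes actually scanned:
SELECT * FROM mydataset.customers WHERE customer_id = 'C42';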
ya...@gmail.com <ya...@gmail.com> #74
Clustering is like ordering so this should work well when data don't change (much). Partitioning would be more efficient and effective but so would be materializing query result to mimic an index. All depends on the use case, query time/cost, cardinality,...
Still, after more than 2yrs, it would be good to hear back from the product team on this popular request.
ep...@google.com <ep...@google.com> #75
Note that with clustering, even in the event of data appends we try to keep things clustered in the background. Partitioning on date + clustering is generally a good approach for some of our users. Once a date becomes inactive, we will try to get the partition to a fully clustered state (we are constantly making improvements here). Also, clustering can provide significant cost reduction when datasets are over a few GB (with a partitioned table, we require over a few GB of data within that partition for cost reduction to kick in).
ke...@gmail.com <ke...@gmail.com> #76
Or, if not, is that ever likely to be an option?
ya...@gmail.com <ya...@gmail.com> #77
Would joins work on integer-based partitions? Where needed, the idea would be to use the integer as a hash code of a string, or as a unique ID to a string stored as pairs in a master table. Note that the latter is possible using the current date-based partitions (date --> string).
hu...@google.com <hu...@google.com> #78
Re Yannick: Sent you an email. What do you mean would joins work? You can sure join on an integer column.
aj...@gmail.com <aj...@gmail.com> #79
[Deleted User] <[Deleted User]> #80
hu...@google.com <hu...@google.com> #81
[Deleted User] <[Deleted User]> #82
hu...@google.com <hu...@google.com> #83
[Deleted User] <[Deleted User]> #84
We manage data loading hour by hour, and we want to be able to load or re-load (in case of late data or corrupted data to reprocess) in an atomic manner.
Today, we can do that day by day (by atomically replacing an entire partition). To do it hour by hour, we have to use 1 table per hour to have the same guarantee of atomicity. Having 1 table per hour is a pain to query and maintain.
I guess we could hack our way with integer partitioning (by having an int field representing a date like 2018123123), but date partitioning with hour granularity instead of day would better fit our needs.
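For what it's worth, a sketch of how that hack could look with integer range partitioning (column names and the epoch anchor are my own assumptions; the 4000-partition limit mentioned later in this thread caps how many hours fit in one table):

-- hour_idx is an hour counter from an arbitrary anchor, e.g.
-- TIMESTAMP_DIFF(event_ts, TIMESTAMP '2018-01-01 00:00:00', HOUR);
-- 4000 hourly buckets is roughly 166 days per table.
CREATE TABLE mydataset.events_hourly
(
  hour_idx INT64,
  event_ts TIMESTAMP,
  payload  STRING
)
PARTITION BY RANGE_BUCKET(hour_idx, GENERATE_ARRAY(0, 4000, 1));
-- An hour can then be replaced atomically through its partition
-- decorator, e.g. 'mydataset.events_hourly$42' for hour_idx 42.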
hu...@google.com <hu...@google.com> #85
[Deleted User] <[Deleted User]> #86
(1) with this feature enabled, what is the maximum number of partitions that a table will support?
(2) will it be possible to utilize two-level partitioning (i.e., first partition a table by date, and then by the integer field)?
(3) are you guys also considering supporting hour partitions? Integer partitioning is great, but in some use-cases hour partitions would be a more natural fit (as others have pointed out).
Thanks,
Conrad
hu...@google.com <hu...@google.com> #87
Regarding your questions:
(1) The maximum number of partitions will be the same as time partitioning.
(2) Partitioning + clustering is our recommendation if you need to partition by multiple fields.
(3) There's no plan for hourly partitions. Alban gave a good use case for hourly partitions, but we believe most other cases can be satisfied by partitioning and clustering on the timestamp field.
[Deleted User] <[Deleted User]> #88
Currently 4000, as specified here:
(there have been some other numbers floating around the issue trackers.)
I agree that for many use-cases partitioning + clustering is an ideal solution, but I don't think it will work for some of mine -- see the details here:
ep...@google.com <ep...@google.com> #89
We are doing a fair amount of work with streaming to keep the table clustered up to a certain recent time interval. We don't have a good ETA to offer on this at this point, but we hope to have more information soon. In general, date partitioning + clustering is likely to work best where data generally arrives for the current date, since the system then doesn't have to recluster older dates often.
[Deleted User] <[Deleted User]> #90
Do you have any paper related to this partitioning in BQ, and do we have a way to figure out the approximate bytes billed before execution?
Are there any edge cases where small updates to the table change the cost of the same query significantly?
Thanks,
Samir
hu...@google.com <hu...@google.com> #91
Billing-wise it works the same as time partitioning. You can find out the cost of the query through a dry-run. I can't think of any small updates that could increase the cost of a query significantly. Have you seen such cases on time partitioning?
[Deleted User] <[Deleted User]> #92
hu...@google.com <hu...@google.com> #93
[Deleted User] <[Deleted User]> #94
hu...@google.com <hu...@google.com> #95
ma...@icteam.it <ma...@icteam.it> #96
hu...@google.com <hu...@google.com> #97
ho...@google.com <ho...@google.com> #98
bj...@s-communication.de <bj...@s-communication.de> #99
ra...@gmail.com <ra...@gmail.com> #100
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #101
pe...@gmail.com <pe...@gmail.com> #103
a) partition by event string +cluster by 4 other columns
b) partition by an arbitrary column + cluster by 4 columns (one is event string)
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #104
My use case is having trillions of events to query by eventName (so a single string as the effective PK, but at least a bounded unique set), and these names are not all controlled by BQ; the master source system we ingest from can introduce more event types with their releases. We could build a BQ lookup table from name -> int, but that requires maintenance... For now I'd want to use something like FarmHash (or another option) in the interim to do this more dynamically.
I also wish the PK fields could be nested some levels deep (not arrays, just nesting via structs, e.g. a PK of b sourced from struct a.b).
Thanks!
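A sketch of that interim setup (names are placeholders; the fingerprint mapping is only needed where an INT64 key is required, since clustering accepts the string directly):

-- Cluster directly on the string event name; no lookup table to maintain.
CREATE TABLE mydataset.big_events
(
  event_date DATE,
  event_name STRING,
  body       STRING
)
PARTITION BY event_date
CLUSTER BY event_name;

-- Deterministic name -> int mapping with no maintenance, if an
-- integer key is ever needed (collisions possible after the MOD):
SELECT DISTINCT event_name,
       MOD(ABS(FARM_FINGERPRINT(event_name)), 4000) AS event_bucket
FROM mydataset.big_events;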
pe...@gmail.com <pe...@gmail.com> #105
hu...@google.com <hu...@google.com> #106
+1 for clustering. We're going to support clustering without partitioning soon.
ku...@xiatech.co.uk <ku...@xiatech.co.uk> #107
Cheers!
pe...@gmail.com <pe...@gmail.com> #108
[Deleted User] <[Deleted User]> #109
Are you planning in your roadmap to include string as a partition field?
Thanks.
ep...@google.com <ep...@google.com> #110
[Deleted User] <[Deleted User]> #111
ya...@gmail.com <ya...@gmail.com> #112
There are use cases to support partitioning instead of clustering though - as discussed months ago in the G beta user group - like dropping a partition instead of scanning for a cluster to delete.
If you have 1000 or so tables to "resize", e.g. to replace a partition, dropping an object is immediate and free (DROP is a DDL), whereas scanning each table for deletion is slow and costly (DELETE is a DML).
The above assumes that dropping a partition through a DDL (or API) will be supported soon... which G had said is in the works.
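To illustrate the cost difference (names hypothetical; partition removal today goes through the partition decorator rather than a dedicated DROP PARTITION DDL):

-- Clustered table: removing one cluster value is a DML scan per table.
DELETE FROM mydataset.events WHERE app_name = 'retired_app';

-- Partitioned table: a whole partition can be removed as a metadata-only
-- operation, e.g. via the CLI:  bq rm 'mydataset.events$20180701'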
bw...@google.com <bw...@google.com> #113
bw...@google.com <bw...@google.com>
[Deleted User] <[Deleted User]> #114
Hope you guys consider adding it to your roadmap, this or next year.
Currently, due to this kind of limitation (including only 4000 partitions per table), we have one table per year (date partition) and one table per season (date partition as well), which is bad in terms of ETL, costs and maintenance, for obvious reasons.
With a string column as a partition field, we would be able to get rid of most of them.
Already tried the Farm-Fingerprint hash function, but it doesn't work as expected: different string values are getting the same integer value.
Anyway, looking forward to seeing this feature in action... maybe one day.
ya...@gmail.com <ya...@gmail.com> #115
#114, that's pretty weird, as collisions on the raw 64-bit output of FARM_FINGERPRINT() should be vanishingly rare. The real problem is using it for partitioning, which is currently not possible as the partition key is not known upfront.
Hash partitioning, e.g. PARTITION BY HASH(salesman_id), would work for most of the use cases, but there is no sign G is working on supporting this. Even partition by LIST() is stalled.
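Until something like PARTITION BY HASH() exists, it can be approximated by materializing the hash bucket as an integer range partition key (a sketch; names and the bucket count are assumptions):

-- Emulate PARTITION BY HASH(salesman_id) with 100 buckets.
CREATE TABLE mydataset.sales
(
  salesman_id STRING,
  bucket      INT64,   -- populated as MOD(ABS(FARM_FINGERPRINT(salesman_id)), 100)
  amount      NUMERIC
)
PARTITION BY RANGE_BUCKET(bucket, GENERATE_ARRAY(0, 100, 1));

-- Queries must filter on the bucket for pruning to kick in:
SELECT SUM(amount)
FROM mydataset.sales
WHERE bucket = MOD(ABS(FARM_FINGERPRINT('S123')), 100)
  AND salesman_id = 'S123';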
[Deleted User] <[Deleted User]> #116
Any news regarding this subject? Anything on the roadmap for 2021?
Thanks.
[Deleted User] <[Deleted User]> #117
Thanks in advance.
[Deleted User] <[Deleted User]> #118
ep...@google.com <ep...@google.com> #119
Clustering gives you fine-grained organization of the data without any limits on the number of partitions. The system can automatically determine the partitioning by a variety of column types and supports composition of multiple columns.
We understand there are some special cases where users want control over the partition boundaries and the ability to address partitions by name. However, for many cases clustering does satisfy the requirements.
Worth noting is that while clustered tables have flexibility and almost infinite scalability, the exact cost of a query (bytes processed) is not known upfront; the dry-run value is only an upper bound. The cost at the end of the query does take partition pruning into account and only charges for the blocks of data that BigQuery actually ends up scanning.
[Deleted User] <[Deleted User]> #120
thanks.
ep...@google.com <ep...@google.com> #121
[Deleted User] <[Deleted User]> #122
+1
sh...@gmail.com <sh...@gmail.com> #123
Appreciate any sort of help
pr...@gmail.com <pr...@gmail.com> #124
Any idea on the progress and when this will be made possible?
Description
I want to be able to partition my data on two fields:
1) Date - good for broad recent queries
2) ID - good for narrow historical queries
I'm happy to pay to store my data twice, and in fact I am currently doing that in 10M tables. I would love to be able to benefit from the partitioning improvements rather than maintain them all manually.
Is key based partitioning coming up soon on the roadmap?
Thanks,
Derek