Status Update
Comments
cz...@google.com <cz...@google.com> #2
Thank you for writing in. You have filed this as a bug, but from what I can see this is more of a Feature Request. Would you like me to change the category of the issue to a Feature Request?
Could you add in a few examples of the requests you are making and the types of responses you are dealing with?
Can you describe a real-life example in which you use these requests, and quantify the drawbacks of the current system? For example, you mention that it is prohibitively expensive; what exactly do you mean? That it takes too long? How long does it take?
Can you also clarify if your sheet goes over the 1 million cell limit of Google Sheets? You mention it has 200K+ rows, so if you have more than 5 columns, you are at the limit of what Sheets is intended for and you may want to consider moving to a database.
Finally, can you elaborate, in your own words, on the business impact that this new feature (if it is a feature request) would have on your business.
Thank you.
[Deleted User] <[Deleted User]> #3
Hi,
You asked a lot of questions, I tried to answer everything one by one. Let me know if I missed something.
You have filed this as a bug, but from what I can see this is more of a Feature Request, would you like me to change the category of the issue to a Feature Request?
I am currently migrating from V3. If I am not able to keep doing what I am doing in some way in V4, I consider this a bug; I think it would be a stretch to call it a feature.
Could you add in a few examples of the requests you are making and the types of responses you are dealing with?
Here are some minimal examples. The theme is the same: I want to append a new row to the end of the spreadsheet. Simple with V3, impossible to do correctly with V4.
Example 1: Values.Append
Add a new row to the sheet. Given row 1 containing "<empty>, something" and using Values.Append with inputRange="A1", the new row will be inserted at A1 instead of A2.
To fix this, let's change inputRange to 1:1. Given row 1 containing "<empty>, something, <empty>, something", the new row will be inserted at D2 instead of A2.
Or I could use the range A:Z, but that has the same issue: any empty row or column makes the insert misaligned so that it does not start from column A.
I couldn't find any solution in V4 that works in all the cases that occur in our spreadsheets. We never had such problems with V3.
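For illustration, here is a minimal sketch of Example 1 using the Python client (google-api-python-client); the credentials path and spreadsheet ID are hypothetical placeholders:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical placeholders; any authorized Sheets v4 client will do.
SPREADSHEET_ID = "your-spreadsheet-id"
creds = service_account.Credentials.from_service_account_file(
    "credentials.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets"],
)
service = build("sheets", "v4", credentials=creds)

resp = service.spreadsheets().values().append(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!A1",  # the range used for table detection
    valueInputOption="RAW",
    includeValuesInResponse=True,
    body={"values": [["new", "row"]]},
).execute()

# With A1 empty and B1 = "something", updatedRange does not start at A2
# as intended; widening the range to 1:1 shifts the insert to D2 instead.
print(resp["updates"]["updatedRange"])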
Example 2: AppendCellsRange
Let's assume the issue in Example 1 is fixed. I want to add a new row and get back its position and the inserted data. AppendCellsRange will add the row to the end of the sheet, but it does not return the data.
I would need to make a Values.Get request to read the full sheet, but I have a sheet with 160K rows and 16 columns. Alternatively I can do multiple Values.Get calls, say 16 x 10K rows, and search for the data, but this takes 16x the round-trip time, hits the quota limit very fast, and locks out my user for a while.
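A sketch of this second approach, reusing the client from the sketch above; note that the reply for an AppendCellsRequest is empty:

body = {
    "requests": [{
        "appendCells": {
            "sheetId": 0,  # assumed: the first sheet
            "rows": [{
                "values": [
                    {"userEnteredValue": {"stringValue": "new"}},
                    {"userEnteredValue": {"stringValue": "row"}},
                ],
            }],
            "fields": "userEnteredValue",
        },
    }],
}
reply = service.spreadsheets().batchUpdate(
    spreadsheetId=SPREADSHEET_ID, body=body
).execute()
# The row lands after the last data row, starting at column A, but
# reply["replies"][0] is an empty object: no range and no values.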
Can you describe a real-life example in which you use these requests and the quantified drawbacks of the current system - i.e. You mention that it is prohibitively expensive, how exactly do you mean? It takes too long? How long does it take?
The real-life example is Google Sheets integration with our own system. This is a feature we provide to some of our customers who need this. Our customers manage records of data in their spreadsheet, one record per row. While they use our system, they add and update rows in their sheet in real time.
The workflow looks like this:
- User navigates to a website and uses our service to store data from that website as a new row at the end of their spreadsheet; the append problem I described above applies here.
- User navigates to another site, and does the same.
- User comes back to the first site and expects to see their data loaded from their spreadsheet in our service. So we need to search the sheet for the identifier and read the row. With V4 this is almost impossible to do efficiently, so we had to add complicated logic to index the spreadsheet and refresh this index as needed to speed up searches. Otherwise, or while indexing, we read the full sheet in batches of 10K rows (see the sketch after this list), but without the index (cache) this dies on quota limits pretty quickly, even after they were raised considerably, and it takes a lot of time.
- User edits the loaded data and saves it back to the sheet. With V4 this has also become worse: we have to calculate checksums and do a Get just before the Update to check for concurrent changes, whereas with V3 concurrent changes were handled automatically using ETags.
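The batched scan looks roughly like this (a sketch; the identifier column and helper name are assumptions, and service/SPREADSHEET_ID are as in the earlier sketches):

BATCH = 10_000

def find_row(identifier, total_rows=160_000):
    """Hypothetical helper: scan Sheet1 in fixed-size batches for a record."""
    for start in range(1, total_rows + 1, BATCH):
        rng = f"Sheet1!A{start}:P{start + BATCH - 1}"  # 16 columns A..P
        resp = service.spreadsheets().values().get(
            spreadsheetId=SPREADSHEET_ID, range=rng
        ).execute()
        for offset, row in enumerate(resp.get("values", [])):
            if row and row[0] == identifier:  # assumes the ID is in column A
                return start + offset, row    # 1-based row number plus data
    return None

# Each lookup costs up to 16 round trips, all counted against the
# per-user read quota, so repeated lookups get throttled quickly.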
So in a nutshell this is our business logic for these customers.
You mention that it is prohibitively expensive, how exactly do you mean? It takes too long? How long does it take?
I think I explained this: reading the full sheet takes too much time, and the Google APIs also time out quite a lot, so we try to make the calls more reliable by splitting the sheet into predictable batches of fixed size. But this increases the number of API calls, which again costs quota and latency.
It takes too long? How long does it take?
I hope you are aware of your own API's performance, but in case not, here are some numbers.
This is only for retrieving 4 columns using BatchGetValuesByDataFilterRequest.
| Batch size | Memory | Latency | Total latency (100K rows) |
| --- | --- | --- | --- |
| 1K | 4-5 MB | 0.4s | 40s |
| 5K | 6 MB | 0.6s | 12s |
| 10K | 11 MB | 0.7s | 7s |
| 20K | 20 MB | 1s | 5s |
| 30K | 28 MB | 1.2s | 4s |
| 50K | 44 MB | 1.3s | 2.6s |
| 100K | 84 MB | 2s | 2s |
We handle a lot of users, so latency is only part of the issue; the other part is keeping memory usage reasonable. We cannot read the full spreadsheet of every user into memory every time. Consider sheets with 16 columns and 100K+ rows.
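For reference, one batched read of the kind measured above looks like this (a sketch; the range size is whichever batch size is being tested, and service/SPREADSHEET_ID are as in the earlier sketches):

# One 10K-row, 4-column batch via values.batchGetByDataFilter.
body = {
    "dataFilters": [{"a1Range": "Sheet1!A1:D10000"}],
    "majorDimension": "ROWS",
}
resp = service.spreadsheets().values().batchGetByDataFilter(
    spreadsheetId=SPREADSHEET_ID, body=body
).execute()
values = resp["valueRanges"][0]["valueRange"].get("values", [])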
Here is an alternative using Spreadsheets.Get:

| Batch size | Memory | Latency | Total latency (100k rows) |
| --- | --- | --- | --- |
| 1K | 40 MB | 1.5s | 150s |
| 10K | 400 MB | 8s | 80s |
This is unusable: the Google Java SDK creates loads of objects, and it is very slow here too.
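The call shape behind those numbers, sketched with the Python client for comparison (the per-cell grid data is what makes this path so heavy):

# Sketch of the Spreadsheets.Get alternative for one 10K-row batch.
resp = service.spreadsheets().get(
    spreadsheetId=SPREADSHEET_ID,
    ranges=["Sheet1!A1:P10000"],
    includeGridData=True,
).execute()
# Every cell arrives as a CellData object inside rowData, which is far
# heavier than the bare value arrays returned by the values endpoints.
rows = resp["sheets"][0]["data"][0]["rowData"]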
Can you also clarify if your sheet goes over the 1 million cell limit of Google Sheets?
Are you sure about the 1 million cell limit? We have spreadsheets with over 1M cells that work without problems (especially with V3).
Please see here
Up to 5 million cells or 18,278 columns (column ZZZ) for spreadsheets that are created in or converted to Google Sheets.
So your service's limit is 5 million cells, and my example sheet of 200K rows and 16 columns (3.2M cells) is well below it.
You mention it has 200K+ rows, so if you have more than 5 columns, you are at the limit of what Sheets is intended for and you may want to consider moving to a database.
Again, your comment about the limit is incorrect: the limit is 5 million cells, and we are well aware of it. Most of our users are well below this limit, but a 200K-row, 16-column sheet, which is well within the limits of Google Sheets, is still too large to be read in full on every operation.
I cannot ask my users, who are individuals without a technical background, to swap their spreadsheets for a database. They want to keep records in their spreadsheets; Google Sheets V3 works perfectly fine for this, while V4 breaks a lot of the integration.
Finally, can you elaborate in your words the business impact that this new feature (if it is a feature request), would have on your business.
From my POV this is a bug in V4, as it breaks functionality that worked in V3, and as I mentioned above, calling "append a single row to the end of the spreadsheet" a new feature sounds like a stretch to me.
The business impact of not having this and losing V3 is that we might need to abandon our Google Sheets integration and risk losing around 2K customers.
al...@google.com <al...@google.com> #4
Thank you very much for the extensive details on the context around this request, it is very helpful.
Apologies about the 1 million cells comment; I am not sure where I got that from, but you are absolutely right about the 5 million cell limit.
That said, from what I can make out, there are three main things your issue is about:
1. spreadsheets.values.append does not return the row in which it added the values just appended, and it does not include the values.
2. spreadsheets.values.append sometimes does not detect or input values in the right place.
3. spreadsheets.values.append does not support more than 4 concurrent requests.
For 2, this has already been raised here
For 3, as you have already found, there is another issue that is being dealt with separately.
Be sure to "star" these issues so that we know more people are affected; this will also subscribe you to updates.
For number 1, could you please review this example:
- Create a new sheet.
- In range B6:E9 manually insert [[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
- Go to https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/append, use the "Try this API" panel, and input:
  - the Spreadsheet ID
  - the range string Sheet1!B6
  - includeValuesInResponse as true
  - valueInputOption as RAW
  - and the request body as below:
{
  "values": [
    [1, 2, 3]
  ]
}
This for me returns a response:
{
"spreadsheetId": "[SS_ID]",
"tableRange": "Sheet1!B6:C9",
"updates": {
"spreadsheetId": "[SS_ID]",
"updatedRange": "Sheet1!B10:D10",
"updatedRows": 1,
"updatedColumns": 3,
"updatedCells": 3,
"updatedData": {
"range": "Sheet1!B10:D10",
"majorDimension": "ROWS",
"values": [
[
"1",
"2",
"3"
]
]
}
}
}
As you can see, updatedRange contains the range of the new cells, and this information is repeated in updatedData, which also contains the values that were just appended.
Or am I missing something?
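For reference, the same call from client code, sketched with the Python client as in the earlier examples (SPREADSHEET_ID is a placeholder):

resp = service.spreadsheets().values().append(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!B6",
    valueInputOption="RAW",
    includeValuesInResponse=True,
    body={"values": [[1, 2, 3]]},
).execute()
print(resp["updates"]["updatedRange"])           # e.g. Sheet1!B10:D10
print(resp["updates"]["updatedData"]["values"])  # as in the response above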
[Deleted User] <[Deleted User]> #5
Hi,
Thank you very much for the quick response. You are close, but some of the points are not related to the issue I was describing.
For number 1: spreadsheets.values.append does return the inserted values and their position; I already use that and it works perfectly well. But I was also testing spreadsheets.batchUpdate with AppendCellsRequest, which works well for inserting rows at the expected last row but does not return the inserted data.
Your point in number 2 is not exactly what I was reporting either; I was not even aware of those reported edge cases. What I was reporting (talking about the expected behavior, not bugs) is that spreadsheets.values.append does not provide a way to specifically insert a row starting from the first column after the last data row, regardless of what rows and columns are missing above it.
To summarize, Google Sheets V4 provides 2 ways to append data:
- spreadsheets.values.append returns the inserted values, but it cannot be used to insert data after the last data row starting from the first column without a suitable arrangement of the preceding rows.
- spreadsheets.batchUpdate's AppendCellsRequest does insert rows after the last data row starting from the first column, as we need, but it does not return the inserted values and range.
In the end this means that none of the options provided by V4 is a correct alternative to V3's insert operation for our use case.
Of the above two requests I prefer spreadsheets.values.append, which is faster and returns the inserted data and range. If we could solve the problem of inserting rows starting from the first column after the last data row, it would be awesome. Alternatively, if AppendCellsRequest returned the inserted data or range, that would work for us too.
To summarize my suggestions:
- A. Add an option to spreadsheets.values.append that allows inserting data from the first column after the last data row, like AppendCellsRequest does, regardless of the table arrangement.
- B. Or add a new request, e.g. spreadsheets.values.appendCells, that behaves like AppendCellsRequest but returns the inserted data and range just like spreadsheets.values.append.
- C. Or make AppendCellsRequest return an AppendCellsResponse containing the inserted data and range on request.
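To make proposal C concrete, here is a purely hypothetical illustration of what the batchUpdate reply could carry; the appendCellsResponse field and its contents do not exist in the v4 API today:

# Hypothetical: the shape a batchUpdate reply could have under proposal C.
desired_reply = {
    "replies": [{
        "appendCellsResponse": {  # hypothetical field, does not exist today
            "updatedRange": "Sheet1!A160001:B160001",
            "updatedData": {"values": [["new", "row"]]},
        },
    }],
}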
Basically, you have 2 types of append requests that behave differently; what we need is an append request combining the two that behaves similarly to V3's insert operation.
This is not specifically a bug in the implementation. What I am reporting is a bug in the design of V4, which either does not allow deterministic append behavior when the arrangement of the data is not known to the caller, or does not return the inserted data and range, and thus in the end does not provide a good alternative for migrating from V3's insert operation.
al...@google.com <al...@google.com> #6
Another datapoint on why this is considered a bug is the Google announcement here:
We want to make sure that projects originally built on the v3 API continue working long after it is gone.
This is the only thing we want too, but sometimes V4 makes it hard to do so and has already forced us to make several workarounds, so I cannot fully agree with the statement below from the same announcement.
As part of the migration to the Sheets v4 API, which provides a better developer and user experience, we will be retiring the Sheets v3 API
[Deleted User] <[Deleted User]> #7
Thank you for the further information.
Let me try to summarize this in my own words to see if we are in agreement:
You are reliant on the behavior of appending cells in v3 (specifically the insert behavior).
The task is, in a very large spreadsheet, to insert a row at the end of the sheet and have the response include the data inserted and its location in the sheet.
v4 offers two methods to do this, spreadsheets.values.append and spreadsheets.batchUpdate using the appendCells request, and the two methods have slightly different behavior.
spreadsheets.values.append will try to automatically detect the table in the range that is given as a non-optional parameter and will append the data after the last line of the "table". This does return the data inserted and its location.
spreadsheets.batchUpdate using the appendCells request does not seem to detect tables in this way; it will just find the last row that has data and append after it, starting in column A. This can return the position and values, but only in the form of the whole spreadsheet resource.
For your use case, neither is suitable: spreadsheets.values.append does not work because you may have some empty rows and so it might not insert values at the end of the table, while spreadsheets.batchUpdate is not suitable because the spreadsheet resource is too large.
Is that accurate?
ja...@jthegedus.com <ja...@jthegedus.com> #8
Yes, your summary is spot on.
spreadsheets.batchUpdate
is not suitable because the spreadsheet resource is too large.
I would just add that it also does not return the appended row's position and data, and hence the only other option is to retrieve the sheet, which is too large to do on a regular basis.
[Deleted User] <[Deleted User]> #9
Thank you. I have raised this internally. Once we have some updates, we will post them here.
[Deleted User] <[Deleted User]> #10
pr...@hawkfish.us <pr...@hawkfish.us> #11
This is also a big limitation for Terraform users. It would be really nice to have a Terraform module that creates a project, automatically connects the repo, and sets up Cloud Build triggers. Right now, you have to apply multiple times because the triggers can't be created until the connection is complete.
ad...@gmail.com <ad...@gmail.com> #12
[Deleted User] <[Deleted User]> #13
[Deleted User] <[Deleted User]> #14
jm...@fitbit.com <jm...@fitbit.com> #15
Hi,
There is an API method to connect to a GitHub repository first, but it is not part of the projects.triggers API; it is the projects.repos [1] API. When calling this API directly, the request body requires a Repo [2] object, in which you set the mirrorConfig [3] specifying your GitHub or Bitbucket repository.
This is incorrect: we can't create a mirror via the API. The mirrorConfig field of the Repo object is read-only; it can't be set.
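To make the gap concrete, here is a minimal sketch (Python client; the project, owner, and repo names are hypothetical placeholders) of the part that can be automated today, projects.triggers.create for an already-connected GitHub repo; the connection step itself is what has no API:

from googleapiclient.discovery import build

# Assumes Application Default Credentials; all names are placeholders.
cloudbuild = build("cloudbuild", "v1")

trigger = {
    "name": "build-on-push",
    # GitHubEventsConfig: only valid once the repo has been connected
    # to the Cloud Build GitHub App manually in the console.
    "github": {
        "owner": "my-org",
        "name": "my-repo",
        "push": {"branch": "^main$"},
    },
    "filename": "cloudbuild.yaml",
}

resp = cloudbuild.projects().triggers().create(
    projectId="my-project",
    body=trigger,
).execute()
print(resp["id"])
# If the repository is not connected, this call fails; there is no
# public API call that performs the GitHub App connection itself.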
za...@codexhealth.com <za...@codexhealth.com> #16
[Deleted User] <[Deleted User]> #17
an...@tryon.technology <an...@tryon.technology> #18
[Deleted User] <[Deleted User]> #19
We decided to move from building on Cloud Build to building with GitHub Actions. As a bonus, for Node.js, GitHub Actions are twice as fast thanks to cool caching options.
So Google, if you're reading this, be aware that you're "earning" churn over this one ...
jr...@glynndevins.com <jr...@glynndevins.com> #20
an...@tryon.technology <an...@tryon.technology> #21
Right now, as a workaround, we are using Terraform to keep our infrastructure under control. Among the parameters we use are a commit hash and a GitHub token. For each Terraform run (in Terraform Cloud) we invoke local-exec to sync code from the GitHub repo to a Cloud Source Repository, and on the Cloud Source Repository we have an automated Cloud Build trigger. From GitHub we kick off deployments by creating new Terraform runs with the current commit hash, so the sources are re-synced and a build is activated.
Of course, it would be much more efficient if the trigger template worked properly or if the mirroring API were provided (right now it is a read-only property of the API).
If those things worked correctly, it would be possible to automate the full cycle: Terraform would be able to set up the Cloud Build triggers and connect source repositories, and everything would work with no workarounds.
Hope this can be fixed soon; in the meantime, we use these unnecessary steps.
[Deleted User] <[Deleted User]> #22
si...@mogoplus.com <si...@mogoplus.com> #23
Here's an example of why I'd love to have that in GCP too: I consulted for a previous company using GCP, and it turned out different employees had manually added different repos under their own credentials over time. When they left, builds broke, and the repos had to be added again manually (under better credentials). Automation definitely makes it easier to avoid maintenance traps like this.
fe...@britned.com <fe...@britned.com> #24
[Deleted User] <[Deleted User]> #25
I would like to add another use case. We are currently building end-to-end automation for some image build pipelines, with Terraform automating everything. The automation includes the creation of the GitHub repos and the GCP projects, so neither the projects nor the repos exist in advance, which makes manually connecting the repos impossible. Our only course of action currently is to let the Terraform automation fail and then re-run it, or to split it into separate Terraform plans with this manual repo-connection step in the middle.
Would love to see an API endpoint and/or routine to automate the connected-repo piece.
Cheers
gu...@finaptic.com <gu...@finaptic.com> #26
ar...@gmail.com <ar...@gmail.com> #27
di...@bulder.no <di...@bulder.no> #28
We're considering adopting Cloud Build. This feature would allow us to onboard a new GitHub repository through Terraform. Since this feature is missing, we are forced to perform manual operations in the GCP console whenever one of our developers wants to build a new microservice housed in its own GitHub repository.
If it were possible to perform this connection step via the google Terraform provider, the new repository could be completely integrated via Terraform, without the need for manual UI operations.
vi...@b6tp.com <vi...@b6tp.com> #29
ev...@umaproject.org <ev...@umaproject.org> #30
ma...@gmail.com <ma...@gmail.com> #31
How is it going? I'm making a programming game for others and I'm stuck here. Any solution?
The Cloud Build documentation says:
When you set up a build trigger with an external repository for the first time, you'll need to set up authorization with that repository. For more information, see Adding a repository as a remote.
For GitHub:
- create a GCS bucket
- add permissions
- define cloudbuild.yaml or a command-line flag
[Deleted User] <[Deleted User]> #32
fa...@gmail.com <fa...@gmail.com> #33
[Deleted User] <[Deleted User]> #34
We abandoned Cloud Build in favor of GitHub Actions almost a year ago. The migration was super easy, and our developers are much happier now that the builds are consistently twice as fast (building Node.js Docker containers).
mi...@gmail.com <mi...@gmail.com> #35
157513027
mi...@gmail.com <mi...@gmail.com> #36
SIJM
su...@google.com <su...@google.com> #37
Hello,
This feature is available now; it was launched as part of Cloud Build Repositories. Additionally you can go through the
Description
The creation of the trigger works fine using the new GitHubEventsConfig block (
I'd like to request an addition to the API: connecting a repository via the Cloud Build GitHub App. The app is already authenticated for all repositories so that hopefully shouldn't be a blocker.
Without this feature, full automation of GitHub App triggers is not possible (the only alternative is to fall back to mirrored GitHub repositories, but then we don't get the benefit of GitHub checks), so I think it's quite a necessity; hopefully you agree too!
Thanks,
Rob