Status Update
Comments
cz...@google.com <cz...@google.com> #2
Thank you for writing in. You have filed this as a bug, but from what I can see this is more of a Feature Request. Would you like me to change the category of the issue to a Feature Request?
Could you add in a few examples of the requests you are making and the types of responses you are dealing with?
Can you describe a real-life example in which you use these requests, and quantify the drawbacks of the current system? For example, you mention that it is prohibitively expensive; what exactly do you mean? That it takes too long? How long does it take?
Can you also clarify if your sheet goes over the 1 million cell limit of Google Sheets? You mention it has 200K+ rows, so if you have more than 5 columns, you are at the limit of what Sheets is intended for and you may want to consider moving to a database.
Finally, can you elaborate, in your own words, on the business impact that this new feature (if it is a feature request) would have on your business.
Thank you.
[Deleted User] <[Deleted User]> #3
Hi,
You asked a lot of questions, I tried to answer everything one by one. Let me know if I missed something.
You have filed this as a bug, but from what I can see this is more of a Feature Request, would you like me to change the category of the issue to a Feature Request?
I am currently migrating from V3. If I am not able to keep doing what I am doing in some way in V4, I consider this a bug; I think it would be a stretch to call it a feature.
Could you add in a few examples of the requests you are making and the types of responses you are dealing with?
Here are some minimal examples. The theme is the same: I want to append a new row to the end of the spreadsheet. Simple with V3, impossible to do correctly with V4.
Example 1: Values.Append
Add a new row to the sheet. Given row 1 containing "<empty>, something" and using Values.Append with inputRange="A1", the new row will be inserted at A1 instead of A2.
To fix this, let's change inputRange to 1:1. Given row 1 containing "<empty>, something, <empty>, something", the new row will be inserted at D2 instead of A2.
Or I could use the range A:Z, but that has the same issue: any empty row or column makes the insert misaligned so that it does not start from column A.
I couldn't find any solution in V4 that works in all the cases that occur in our spreadsheets. We never had such problems with V3.
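For illustration, here is a minimal sketch of Example 1 using the Python client (google-api-python-client); the credentials path and spreadsheet ID are hypothetical placeholders:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical placeholders; any authorized Sheets v4 client will do.
SPREADSHEET_ID = "your-spreadsheet-id"
creds = service_account.Credentials.from_service_account_file(
    "credentials.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets"],
)
service = build("sheets", "v4", credentials=creds)

resp = service.spreadsheets().values().append(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!A1",  # the range used for table detection
    valueInputOption="RAW",
    includeValuesInResponse=True,
    body={"values": [["new", "row"]]},
).execute()

# With A1 empty and B1 = "something", updatedRange does not start at A2
# as intended; widening the range to 1:1 shifts the insert to D2 instead.
print(resp["updates"]["updatedRange"])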
Example 2: AppendCellsRange
Let's assume the issue in Example 1 is fixed. I want to add a new row and get back its position and the inserted data. AppendCellsRange will add the row to the end of the sheet, but it does not return the data.
I would need to make a Values.Get request to read the full sheet, but I have a sheet with 160K rows and 16 columns. Alternatively I can do multiple Values.Get calls, say 16 x 10K rows, and search for the data, but this takes 16x the round-trip time, hits the quota limit very fast, and locks out my user for a while.
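A sketch of this second approach, reusing the client from the sketch above; note that the reply for an AppendCellsRequest is empty:

body = {
    "requests": [{
        "appendCells": {
            "sheetId": 0,  # assumed: the first sheet
            "rows": [{
                "values": [
                    {"userEnteredValue": {"stringValue": "new"}},
                    {"userEnteredValue": {"stringValue": "row"}},
                ],
            }],
            "fields": "userEnteredValue",
        },
    }],
}
reply = service.spreadsheets().batchUpdate(
    spreadsheetId=SPREADSHEET_ID, body=body
).execute()
# The row lands after the last data row, starting at column A, but
# reply["replies"][0] is an empty object: no range and no values.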
Can you describe a real-life example in which you use these requests and the quantified drawbacks of the current system - i.e. You mention that it is prohibitively expensive, how exactly do you mean? It takes too long? How long does it take?
The real-life example is Google Sheets integration with our own system. This is a feature we provide to some of our customers who need this. Our customers manage records of data in their spreadsheet, one record per row. While they use our system, they add and update rows in their sheet in real time.
The workflow looks like this:
- User navigates to a website and uses our service to store data from that website as a new row at the end of their spreadsheet; the append problem I described above applies here.
- User navigates to another site, and does the same.
- User comes back to the first site and expects to see their data loaded from their spreadsheet in our service. So we need to search the sheet for the identifier and read the row. With V4 this is almost impossible to do efficiently, so we had to add complicated logic to index the spreadsheet and refresh this index as needed to speed up searches. Otherwise, or while indexing, we read the full sheet in batches of 10K rows (see the sketch after this list), but without the index (cache) this dies on quota limits pretty quickly, even after they were raised considerably, and it takes a lot of time.
- User edits the loaded data and saves it back to the sheet. With V4 this has also become worse: we have to calculate checksums and do a Get just before the Update to check for concurrent changes, whereas with V3 concurrent changes were handled automatically using ETags.
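The batched scan looks roughly like this (a sketch; the identifier column and helper name are assumptions, and service/SPREADSHEET_ID are as in the earlier sketches):

BATCH = 10_000

def find_row(identifier, total_rows=160_000):
    """Hypothetical helper: scan Sheet1 in fixed-size batches for a record."""
    for start in range(1, total_rows + 1, BATCH):
        rng = f"Sheet1!A{start}:P{start + BATCH - 1}"  # 16 columns A..P
        resp = service.spreadsheets().values().get(
            spreadsheetId=SPREADSHEET_ID, range=rng
        ).execute()
        for offset, row in enumerate(resp.get("values", [])):
            if row and row[0] == identifier:  # assumes the ID is in column A
                return start + offset, row    # 1-based row number plus data
    return None

# Each lookup costs up to 16 round trips, all counted against the
# per-user read quota, so repeated lookups get throttled quickly.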
So in a nutshell this is our business logic for these customers.
You mention that it is prohibitively expensive, how exactly do you mean? It takes too long? How long does it take?
I think I explained this: reading the full sheet takes too much time, and the Google APIs also time out quite a lot, so we try to make the calls more reliable by splitting the sheet into predictable batches of fixed size. But this increases the number of API calls, which again costs quota and latency.
It takes too long? How long does it take?
I hope you are aware of your own API's performance, but in case not, here are some numbers.
This is only for retrieving 4 columns using BatchGetValuesByDataFilterRequest.
| Batch size | Memory | Latency | Total latency (100K rows) |
| --- | --- | --- | --- |
| 1K | 4-5 MB | 0.4s | 40s |
| 5K | 6 MB | 0.6s | 12s |
| 10K | 11 MB | 0.7s | 7s |
| 20K | 20 MB | 1s | 5s |
| 30K | 28 MB | 1.2s | 4s |
| 50K | 44 MB | 1.3s | 2.6s |
| 100K | 84 MB | 2s | 2s |
We handle a lot of users, so latency is only part of the issue; the other part is keeping memory usage reasonable. We cannot read the full spreadsheet of every user into memory every time. Consider sheets with 16 columns and 100K+ rows.
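For reference, one batched read of the kind measured above looks like this (a sketch; the range size is whichever batch size is being tested, and service/SPREADSHEET_ID are as in the earlier sketches):

# One 10K-row, 4-column batch via values.batchGetByDataFilter.
body = {
    "dataFilters": [{"a1Range": "Sheet1!A1:D10000"}],
    "majorDimension": "ROWS",
}
resp = service.spreadsheets().values().batchGetByDataFilter(
    spreadsheetId=SPREADSHEET_ID, body=body
).execute()
values = resp["valueRanges"][0]["valueRange"].get("values", [])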
Here is an alternative using Spreadsheets.Get:

| Batch size | Memory | Latency | Total latency (100k rows) |
| --- | --- | --- | --- |
| 1K | 40 MB | 1.5s | 150s |
| 10K | 400 MB | 8s | 80s |
This is unusable: the Google Java SDK creates loads of objects, and it is very slow here too.
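The call shape behind those numbers, sketched with the Python client for comparison (the per-cell grid data is what makes this path so heavy):

# Sketch of the Spreadsheets.Get alternative for one 10K-row batch.
resp = service.spreadsheets().get(
    spreadsheetId=SPREADSHEET_ID,
    ranges=["Sheet1!A1:P10000"],
    includeGridData=True,
).execute()
# Every cell arrives as a CellData object inside rowData, which is far
# heavier than the bare value arrays returned by the values endpoints.
rows = resp["sheets"][0]["data"][0]["rowData"]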
Can you also clarify if your sheet goes over the 1 million cell limit of Google Sheets?
Are you sure about the 1 million cell limit? We have spreadsheets with over 1M cells that work without problems (especially with V3).
Please see here
Up to 5 million cells or 18,278 columns (column ZZZ) for spreadsheets that are created in or converted to Google Sheets.
So your service's limit is 5 million cells, and my example sheet of 200K rows and 16 columns (3.2M cells) is well below it.
You mention it has 200K+ rows, so if you have more than 5 columns, you are at the limit of what Sheets is intended for and you may want to consider moving to a database.
Again, your comment about the limit is incorrect: the limit is 5 million cells, and we are well aware of it. Most of our users are well below this limit, but a 200K-row, 16-column sheet, which is well within the limits of Google Sheets, is still too large to be read in full on every operation.
I cannot ask my users, who are individuals without a technical background, to swap their spreadsheets for a database. They want to keep records in their spreadsheets; Google Sheets V3 works perfectly fine for this, while V4 breaks a lot of the integration.
Finally, can you elaborate in your words the business impact that this new feature (if it is a feature request), would have on your business.
From my POV this is a bug in V4, as it breaks functionality that worked in V3, and as I mentioned above, calling "append a single row to the end of the spreadsheet" a new feature sounds like a stretch to me.
The business impact of not having this and losing V3 is that we might need to abandon our Google Sheets integration and risk losing around 2K customers.
al...@google.com <al...@google.com> #4
Thank you very much for the extensive details on the context around this request, it is very helpful.
Apologies about the 1 million cells comment; I am not sure where I got that from, but you are absolutely right about the 5 million cell limit.
That said, from what I can make out, there are three main things your issue is about:
1. spreadsheets.values.append does not return the row in which it added the values just appended, and it does not include the values.
2. spreadsheets.values.append sometimes does not detect or input values in the right place.
3. spreadsheets.values.append does not support more than 4 concurrent requests.
For 2, this has already been raised here
For 3, as you have already found, there is another issue that is being dealt with separately.
Be sure to "star" these issues so that we know more people are affected; this will also subscribe you to updates.
For number 1, could you please review this example:
- Create a new sheet.
- In range B6:E9 manually insert [[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
- Go to https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/append, use the "Try this API" panel, and input:
  - the Spreadsheet ID
  - the range string Sheet1!B6
  - includeValuesInResponse as true
  - valueInputOption as RAW
  - and the request body as below:
{
  "values": [
    [1, 2, 3]
  ]
}
This for me returns a response:
{
"spreadsheetId": "[SS_ID]",
"tableRange": "Sheet1!B6:C9",
"updates": {
"spreadsheetId": "[SS_ID]",
"updatedRange": "Sheet1!B10:D10",
"updatedRows": 1,
"updatedColumns": 3,
"updatedCells": 3,
"updatedData": {
"range": "Sheet1!B10:D10",
"majorDimension": "ROWS",
"values": [
[
"1",
"2",
"3"
]
]
}
}
}
As you can see, updatedRange contains the range of the new cells, and this information is repeated in updatedData, which also contains the values that were just appended.
Or am I missing something?
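For reference, the same call from client code, sketched with the Python client as in the earlier examples (SPREADSHEET_ID is a placeholder):

resp = service.spreadsheets().values().append(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!B6",
    valueInputOption="RAW",
    includeValuesInResponse=True,
    body={"values": [[1, 2, 3]]},
).execute()
print(resp["updates"]["updatedRange"])           # e.g. Sheet1!B10:D10
print(resp["updates"]["updatedData"]["values"])  # as in the response above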
[Deleted User] <[Deleted User]> #5
Hi,
Thank you very much for the quick response. You are close, but some of the points are not related to the issue I was describing.
For number 1: spreadsheets.values.append does return the inserted values and their position; I already use that and it works perfectly well. But I was also testing spreadsheets.batchUpdate with AppendCellsRequest, which works well for inserting rows at the expected last row but does not return the inserted data.
Your point in number 2 is not exactly what I was reporting either; I was not even aware of those reported edge cases. What I was reporting (talking about the expected behavior, not bugs) is that spreadsheets.values.append does not provide a way to specifically insert a row starting from the first column after the last data row, regardless of what rows and columns are missing above it.
To summarize, Google Sheets V4 provides 2 ways to append data:
- spreadsheets.values.append returns the inserted values, but it cannot be used to insert data after the last data row starting from the first column without a suitable arrangement of the preceding rows.
- spreadsheets.batchUpdate's AppendCellsRequest does insert rows after the last data row starting from the first column, as we need, but it does not return the inserted values and range.
In the end this means that none of the options provided by V4 is a correct alternative to V3's insert operation for our use case.
Of the above two requests I prefer spreadsheets.values.append, which is faster and returns the inserted data and range. If we could solve the problem of inserting rows starting from the first column after the last data row, it would be awesome. Alternatively, if AppendCellsRequest returned the inserted data or range, that would work for us too.
To summarize my suggestions:
- A. Add an option to spreadsheets.values.append that allows inserting data from the first column after the last data row, like AppendCellsRequest does, regardless of the table arrangement.
- B. Or add a new request, e.g. spreadsheets.values.appendCells, that behaves like AppendCellsRequest but returns the inserted data and range just like spreadsheets.values.append.
- C. Or make AppendCellsRequest return an AppendCellsResponse containing the inserted data and range on request.
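To make proposal C concrete, here is a purely hypothetical illustration of what the batchUpdate reply could carry; the appendCellsResponse field and its contents do not exist in the v4 API today:

# Hypothetical: the shape a batchUpdate reply could have under proposal C.
desired_reply = {
    "replies": [{
        "appendCellsResponse": {  # hypothetical field, does not exist today
            "updatedRange": "Sheet1!A160001:B160001",
            "updatedData": {"values": [["new", "row"]]},
        },
    }],
}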
Basically, you have 2 types of append requests that behave differently; what we need is an append request combining the two that behaves similarly to V3's insert operation.
This is not specifically a bug in the implementation. What I am reporting is a bug in the design of V4, which either does not allow deterministic append behavior when the arrangement of the data is not known to the caller, or does not return the inserted data and range, and thus in the end does not provide a good alternative for migrating from V3's insert operation.
al...@google.com <al...@google.com> #6
Another datapoint on why this is considered a bug is the Google announcement here:
We want to make sure that projects originally built on the v3 API continue working long after it is gone.
This is the only thing we want too, but sometimes V4 makes it hard to do so and has already forced us to make several workarounds, so I cannot fully agree with the statement below from the same announcement.
As part of the migration to the Sheets v4 API, which provides a better developer and user experience, we will be retiring the Sheets v3 API
[Deleted User] <[Deleted User]> #7
Thank you for the further information.
Let me try to summarize this in my own words to see if we are in agreement:
You are reliant on the behavior of appending cells in v3 (specifically the insert behavior).
The task is, in a very large spreadsheet, to insert a row at the end of the sheet and have the response include the data inserted and its location in the sheet.
v4 offers two methods to do this, spreadsheets.values.append and spreadsheets.batchUpdate using the appendCells request, and the two methods have slightly different behavior.
spreadsheets.values.append will try to automatically detect the table in the range that is given as a non-optional parameter and will append the data after the last line of the "table". This does return the data inserted and its location.
spreadsheets.batchUpdate using the appendCells request does not seem to detect tables in this way; it will just find the last row that has data and append after it, starting in column A. This can return the position and values, but only in the form of the whole spreadsheet resource.
For your use case, neither is suitable: spreadsheets.values.append does not work because you may have some empty rows and so it might not insert values at the end of the table, while spreadsheets.batchUpdate is not suitable because the spreadsheet resource is too large.
Is that accurate?
ja...@jthegedus.com <ja...@jthegedus.com> #8
Yes, your summary is spot on.
spreadsheets.batchUpdate
is not suitable because the spreadsheet resource is too large.
I would just add that it also does not return the appended row's position and data, and hence the only other option is to retrieve the sheet, which is too large to do on a regular basis.
[Deleted User] <[Deleted User]> #9
Thank you. I have raised this internally. Once we have some updates, we will post them here.
[Deleted User] <[Deleted User]> #10
pr...@hawkfish.us <pr...@hawkfish.us> #11
This is also a big limitation for Terraform users. It would be really nice to have a Terraform module that creates a project, automatically connects the repo, and sets up Cloud Build triggers. Right now, you have to apply multiple times because the triggers can't be created until the connection is complete.
ad...@gmail.com <ad...@gmail.com> #12
[Deleted User] <[Deleted User]> #13
[Deleted User] <[Deleted User]> #14
jm...@fitbit.com <jm...@fitbit.com> #15
Hi,
There is an API method to connect to a GitHub repository first, but it is not part of the projects.triggers API; it is the projects.repos [1] API. When calling this API directly, the request body requires a Repo [2] object, in which you set the mirrorConfig [3] specifying your GitHub or Bitbucket repository.
This is incorrect: we can't create a mirror via the API. The mirrorConfig field of the Repo object is read-only; it can't be set.
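To make the gap concrete, here is a minimal sketch (Python client; the project, owner, and repo names are hypothetical placeholders) of the part that can be automated today, projects.triggers.create for an already-connected GitHub repo; the connection step itself is what has no API:

from googleapiclient.discovery import build

# Assumes Application Default Credentials; all names are placeholders.
cloudbuild = build("cloudbuild", "v1")

trigger = {
    "name": "build-on-push",
    # GitHubEventsConfig: only valid once the repo has been connected
    # to the Cloud Build GitHub App manually in the console.
    "github": {
        "owner": "my-org",
        "name": "my-repo",
        "push": {"branch": "^main$"},
    },
    "filename": "cloudbuild.yaml",
}

resp = cloudbuild.projects().triggers().create(
    projectId="my-project",
    body=trigger,
).execute()
print(resp["id"])
# If the repository is not connected, this call fails; there is no
# public API call that performs the GitHub App connection itself.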
za...@codexhealth.com <za...@codexhealth.com> #16
[Deleted User] <[Deleted User]> #17
an...@tryon.technology <an...@tryon.technology> #18
[Deleted User] <[Deleted User]> #19
We decided to move from building on Cloud Build to building with GitHub Actions. As a bonus, for Node.js, GitHub Actions are twice as fast thanks to cool caching options.
So Google, if you're reading this, be aware that you're "earning" churn over this one ...
jr...@glynndevins.com <jr...@glynndevins.com> #20
an...@tryon.technology <an...@tryon.technology> #21
Right now, as a workaround, we are using Terraform to keep our infrastructure under control. Among the parameters we use are a commit hash and a GitHub token. For each Terraform run (in Terraform Cloud) we invoke local-exec to sync code from the GitHub repo to a Cloud Source Repository, and on the Cloud Source Repository we have an automated Cloud Build trigger. From GitHub we kick off deployments by creating new Terraform runs with the current commit hash, so the sources are re-synced and a build is activated.
Of course, it would be much more efficient if the trigger template worked properly or if the mirroring API were provided (right now it is a read-only property of the API).
If those things worked correctly, it would be possible to automate the full cycle: Terraform would be able to set up the Cloud Build triggers and connect source repositories, and everything would work with no workarounds.
Hope this can be fixed soon; in the meantime, we use these unnecessary steps.
[Deleted User] <[Deleted User]> #22
si...@mogoplus.com <si...@mogoplus.com> #23
Here's an example of why I'd love to have that in GCP too: I consulted for a previous company using GCP, and it turned out different employees had manually added different repos under their own credentials over time. When they left, builds broke, and the repos had to be added again manually (under better credentials). Automation definitely makes it easier to avoid maintenance traps like this.
fe...@britned.com <fe...@britned.com> #24
[Deleted User] <[Deleted User]> #25
I would like to add another use case. We are currently building end-to-end automation for some image build pipelines, with Terraform automating everything. The automation includes the creation of the GitHub repos and the GCP projects, so neither the projects nor the repos exist in advance, which makes manually connecting the repos impossible. Our only course of action currently is to let the Terraform automation fail and then re-run it, or to split it into separate Terraform plans with this manual repo-connection step in the middle.
Would love to see an API endpoint and/or routine to automate the connected-repo piece.
Cheers
gu...@finaptic.com <gu...@finaptic.com> #26
ar...@gmail.com <ar...@gmail.com> #27
di...@bulder.no <di...@bulder.no> #28
We're considering adopting Cloud Build. This feature would allow us to onboard a new GitHub repository through Terraform. Since this feature is missing, we are forced to perform manual operations in the GCP console whenever one of our developers wants to build a new microservice housed in its own GitHub repository.
If it were possible to perform this connection step via the google Terraform provider, the new repository could be completely integrated via Terraform, without the need for manual UI operations.
vi...@b6tp.com <vi...@b6tp.com> #29
ev...@umaproject.org <ev...@umaproject.org> #30
ma...@gmail.com <ma...@gmail.com> #31
How is it going? I'm making a programming game for others and I'm stuck here. Any solution?
The Cloud Build documentation says:
When you set up a build trigger with an external repository for the first time, you'll need to set up authorization with that repository. For more information, see Adding a repository as a remote.
For GitHub:
- create a GCS bucket
- add permissions
- define cloudbuild.yaml or a command-line flag
[Deleted User] <[Deleted User]> #32
fa...@gmail.com <fa...@gmail.com> #33
[Deleted User] <[Deleted User]> #34
We abandoned Cloud Build in favor of GitHub Actions almost a year ago. The migration was super easy, and our developers are much happier now that the builds are consistently twice as fast (building Node.js Docker containers).
mi...@gmail.com <mi...@gmail.com> #35
157513027
mi...@gmail.com <mi...@gmail.com> #36
SIJM
su...@google.com <su...@google.com> #37
Hello,
This feature is available now; it was launched as part of Cloud Build Repositories. Additionally you can go through the
Description
The creation of the trigger works fine using the new GitHubEventsConfig block (
I'd like to request an addition to the API: connecting a repository via the Cloud Build GitHub App. The app is already authenticated for all repositories so that hopefully shouldn't be a blocker.
Without this feature, full automation of GitHub App triggers is not possible (the only alternative is to fall back to mirrored GitHub repositories, but then we don't get the benefit of GitHub checks), so I think it's quite a necessity; hopefully you agree too!
Thanks,
Rob