Status Update
Comments
oa...@google.com <oa...@google.com> #2
br...@roboflow.com <br...@roboflow.com> #3
ay...@gmail.com <ay...@gmail.com> #4
Do you still see this as an issue that doesn't need urgent attention?
This issue makes Cloud Functions totally unusable in production. We still see around 12 seconds just to start the function.
Please check the screenshot and let us know if we are missing anything. Loading our dependencies alone took nearly 12 seconds, which is no way to run in production!
gr...@gmail.com <gr...@gmail.com> #5
ry...@cinder.studio <ry...@cinder.studio> #6
ry...@cinder.studio <ry...@cinder.studio> #7
br...@roboflow.com <br...@roboflow.com> #8
Also adding my +1: we just had to put significant effort into moving one of our core flows off of Cloud Functions due to this, and we plan to migrate the rest of our infrastructure as well, because 12 seconds is pretty ridiculous latency.
ay...@gmail.com <ay...@gmail.com> #9
I'm really scared to shift to AWS or other alternatives at this moment, but it seems there are no other options. If the cold start were caused by my own dependencies, I would accept it. But here, I have no control over when my code execution starts.
At this point we are tightly coupled with GCP, and down the line, if this issue is still not addressed, we may need to start migrating to other alternatives. I understand Google doesn't commit to any ETAs as per its policies, but this issue is a hidden, massive loophole in the whole Cloud Functions system.
I hope someone from the official team updates us on the status.
ry...@cinder.studio <ry...@cinder.studio> #10
ay...@gmail.com <ay...@gmail.com> #11
Can anyone from the team give an update on whether this issue is being resolved?
ph...@gmail.com <ph...@gmail.com> #12
[Deleted User] <[Deleted User]> #13
If there is no news, I think I will start the migration by the end of July.
ry...@cinder.studio <ry...@cinder.studio> #14
This GitHub repo has bare-minimum dependencies and lines of code, and it proves a 5-second startup cost exclusively attributable to the Firestore libraries. It's not our code, and this should be a P0 at Google.
ay...@gmail.com <ay...@gmail.com> #15
How do we get on their radar?
ry...@cinder.studio <ry...@cinder.studio> #16
Every query or operation we move to the new firebase function (that excludes the google library) is reporting a significant boost in performance. Milliseconds vs seconds.
ay...@gmail.com <ay...@gmail.com> #17
Looks like they corrected the trace now. It's showing that our code starts at the 0th ms (actually showing negative values, which is impossible). But at least it now shows the function being called as soon as it's triggered, which is great.
Anyone else experiencing the same? I'm using the latest versions, btw.
Will debug more.
vi...@google.com <vi...@google.com> #18
Thank you for reaching out to us on this and for keeping this thread updated.
We are actively working with a number of internal teams on a resolution to this problem.
We are treating this as one of our top priorities and will keep this thread updated as we make progress on this issue.
We apologize for any inconvenience caused. Please also feel free to reach out to us with more questions on this issue.
We are also happy to meet with folks on an individual basis to help with your specific architectural needs and use cases as we work on the issue on our side.
Best regards,
Cloud Functions Team
ry...@cinder.studio <ry...@cinder.studio> #19
ay...@gmail.com <ay...@gmail.com> #20
Looks like it got reverted again. I see huge delays again before the function starts!
[Deleted User] <[Deleted User]> #21
ja...@google.com <ja...@google.com> #22
Hi folks,
Apologies for the delay in responding here. The quick summary is "we're still working on it" (the last update on the internal bug was yesterday). I can't give an ETA because, honestly, we don't know, but the "good" news is that we can reproduce the issue and are investigating the cause. This is not expected behavior, which is why it's taking some time to track down (unfortunately the cause was not obvious).
We will update again once we know more.
vi...@google.com <vi...@google.com> #23
We are working on this as our topmost priority.
Would you like to have a meeting? We want to make sure we provide you as much help as possible.
Please email me at viramachandran@google.com and we can set up a time.
In addition, if any other customer would like to meet on this issue, please email me at viramachandran@google.com and we can set up a meeting.
Best regards,
Vinod
vi...@google.com <vi...@google.com> #24
We are actively working on this issue, and on key fixes that will be rolling out soon to help mitigate it. In addition, we are working on key long-term efforts that will improve things further. Please feel free to reach out to us directly over email (viramachandran@google.com) and we can set up a meeting to work with you on this. We want to make sure we serve you to the best of our ability and meet your key needs.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #25
Thanks, Vinod, for sharing some info. May I know which fixes are coming soon? By soon, do you mean days, weeks, or months? I understand Google can't commit to a timeline, but at least a tentative date is pretty much required for developers (considering you are already working on the fixes).
It at least helps us plan accordingly. Also, the fix we saw a couple of days ago got reverted, and again it's taking around 5 seconds just to start the function. Initially I thought it was because of the Firestore libraries and took the pain of converting everything to REST APIs. But now we see it happening again.
It would be really great if you could share which fixes we may see soon, so that we can check whether our issues are covered.
Thanks, Ayyappa
vi...@google.com <vi...@google.com> #26
The fixes could take a couple of weeks to fully roll out.
We are treating the rollout of these changes as a top priority.
We sincerely apologize for the inconvenience during this period.
I have also set up a meeting with you so that we can look at your architecture and see how best our team can help you.
Best regards,
Vinod
at...@protonmail.com <at...@protonmail.com> #27
gr...@gmail.com <gr...@gmail.com> #28
ay...@gmail.com <ay...@gmail.com> #29
We still see the delays; however, we would like to try Cloud Run as suggested by the team. We will post here if we have further updates.
Also, in case you have updated from Node 8 to Node 10, make sure you switch your code to the new environment variables. Mainly, replace FUNCTION_NAME with K_SERVICE if you are using it in your project.
Thanks, Ayyappa
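A minimal sketch of the environment-variable change described above (the variable names are the ones the runtimes actually set; the helper function itself is hypothetical):

```javascript
// Hypothetical helper: resolve the function's name across runtimes.
// The Node 8 runtime set FUNCTION_NAME; the Node 10+ runtimes set
// K_SERVICE instead, so code that reads FUNCTION_NAME directly
// silently breaks after an upgrade.
function resolveFunctionName(env) {
  // Prefer the new variable, fall back to the legacy one.
  return env.K_SERVICE || env.FUNCTION_NAME || 'unknown';
}
```

Calling `resolveFunctionName(process.env)` then works on both runtimes.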
ay...@gmail.com <ay...@gmail.com> #30
We moved to Cloud Run with the help of the Google team, and it's good w.r.t. cold starts, but with some drawbacks. Coming from Functions, we need to figure out a few things:
- Cost comparison (it looks about 10 times pricier, but concurrency might help bring that down)
- Debugging with Trace becomes difficult, as it won't show any logs
- The right concurrency number per Cloud Run instance
- The additional cost of min instances to keep instances warm
- Thread-safe code, as Cloud Run allows concurrency, whereas Functions are more isolated
- Unable to deploy multiple functions as different Cloud Run services (the Functions Framework allows only one target). Functions offered more control, letting us set the required memory per function.
Unfortunately, I can't measure the cold start exactly, as the trace isn't displaying enough detail. Will keep you informed.
On another note, we completely migrated from the Firestore client libraries to the REST API, which saved a couple of seconds.
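For anyone weighing the same REST migration, here is a rough sketch of the two pieces involved. The endpoint format and the typed-value encoding come from the public Firestore REST API; the helper names are made up for illustration:

```javascript
// Build the REST endpoint for a single document under the default database.
// (A GET to this URL, with an OAuth2 token, replaces client.doc(...).get().)
function documentUrl(projectId, collection, docId) {
  return `https://firestore.googleapis.com/v1/projects/${projectId}` +
         `/databases/(default)/documents/${collection}/${docId}`;
}

// The REST API returns typed values ({ stringValue: ... }, etc.);
// decode the common ones into plain JavaScript values.
function decodeValue(v) {
  if ('stringValue' in v) return v.stringValue;
  if ('integerValue' in v) return Number(v.integerValue);
  if ('doubleValue' in v) return v.doubleValue;
  if ('booleanValue' in v) return v.booleanValue;
  if ('nullValue' in v) return null;
  if ('mapValue' in v) return decodeFields(v.mapValue.fields || {});
  if ('arrayValue' in v) return (v.arrayValue.values || []).map(decodeValue);
  return v; // timestamps, references, etc. left as-is for brevity
}

function decodeFields(fields) {
  const out = {};
  for (const [k, v] of Object.entries(fields)) out[k] = decodeValue(v);
  return out;
}
```

An OAuth2 access token still has to be sent in the `Authorization` header; inside a function, the metadata server can supply one, so no heavy client library is loaded at cold start.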
vi...@google.com <vi...@google.com> #31
We wanted to inform you that we rolled out some key fixes last week, which will help in reducing the cold start time mentioned above in this thread.
We are actively working on more changes and we will keep you updated as we make progress on them.
We sincerely thank you for your patience on this issue.
Please feel free to reach out to us with more questions.
Best regards,
Cloud Functions Team
gr...@gmail.com <gr...@gmail.com> #32
vi...@google.com <vi...@google.com> #33
Thanks for the comment.
Are you open to having a meeting? We would like to work with you on your use case and see how best we can serve you.
Could you please email me at viramachandran@google.com?
Best regards,
Vinod
at...@protonmail.com <at...@protonmail.com> #34
wb...@sentryware.com <wb...@sentryware.com> #35
Is this affecting all Node runtimes (8, 10, and 12) or only some of them?
vi...@google.com <vi...@google.com> #36
Thanks a lot for your feedback.
Would griffinjohnston@gmail.com, atomicweb@protonmail.com, and wbattel@sentryware.com be open to having a call?
Please feel free to reach out to me at viramachandran@google.com and we can setup a time.
We want to work with you to understand your use cases and see how best we could help solve this problem for you.
Our apologies for any inconvenience caused here.
Best regards,
Cloud Functions Team
at...@protonmail.com <at...@protonmail.com> #37
My use case is straightforward: just a Firebase project using Cloud Functions. In a function that only returns a timestamp, I'm getting cold starts of 3 seconds, and in functions where I'm loading firebase-admin, initialising it, and reading a handful of documents, it takes at least 9 seconds, sometimes more than 15.
When warm, the timestamp function takes less than 50ms and the function that uses the admin package takes 150ms on average.
Can you elaborate on the changes the team made in your last update?
As I said in my comment a week or so ago, my cold start times were significantly reduced from 9 seconds to around 3 seconds, which is manageable in my use case. However, that lasted only 24 hours, and I have since gone back to the same times as before.
I have also done everything suggested to limit cold starts and these are the best times I can get.
9 seconds makes Cloud Functions unusable, so I'm starting to look into migrating to AWS, but if this issue can be fixed I would much prefer to stick with Firebase and Cloud Functions.
Thanks,
Jon.
vi...@google.com <vi...@google.com> #38
We are actively working on getting the issue resolved. How about we have a meeting with the team?
We will have folks from both Cloud Functions and Firebase in the meeting and we can look at it together.
Best regards,
Vinod
da...@panerabread.com <da...@panerabread.com> #39
I had submitted a ticket for this same issue but was directed here and have been watching this ticket.
I have a simple cloud function with @google-cloud/firestore 4.2.0 as the only dependency (it does a single get by document ID), and the cold start on this is routinely upwards of 14s.
I just tried and had a 6.1-second cold start time (5699 ms function processing time) and warm times of 471 ms (382 ms) and 252 ms (182 ms). I tried a few other cold starts and they averaged 5-6 seconds.
This is a client-facing endpoint that should not have a 6+ second wait (most calling apps have timeouts of 2-5 seconds).
vi...@google.com <vi...@google.com> #40
Thanks for your feedback. As mentioned above, we are actively working on landing some changes which would make things better.
I have also set up some time to sync later this week to see how best we can help you.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #41
Any updates on this, Vinod? We still have Firebase triggers on Functions, which have a lot of delay.
vi...@google.com <vi...@google.com> #42
We are investigating the Firebase triggers more and are actively working on it.
We will update the bug shortly with our plan further.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #43
Hey Vinod, coming to the Firestore triggers: it's been more than 4 days since I last ran tests. I gave it a quick test now and see cold starts of around 2 seconds, which is OK for a background trigger. I can tell it's far better than earlier, when it used to be on the order of 5-10 seconds.
Thanks, Ayyappa
jj...@raxial.com <jj...@raxial.com> #44
vi...@google.com <vi...@google.com> #45
Thank you for your feedback.
Could we please have a meeting? We would like to understand your use cases and see how best we can serve you.
Please feel free to reach out to me at viramachandran@google.com and we can set up a time.
Best regards,
Vinod
fg...@bausoft.cl <fg...@bausoft.cl> #46
I have an application running on Node.js 10.
I am using Cloud Functions and am finishing the test period, during which startup times over 5 seconds were still acceptable. But now, going to production, these times are not feasible for the end user. Is there a timeline for solving this problem? If not, we will have to find an alternative. Thank you.
Regards
Fabian
[Deleted User] <[Deleted User]> #47
I appreciate the effort of the team working to fix this issue; I simply would have wished for a more honest heads-up before onboarding onto this service.
vi...@google.com <vi...@google.com> #48
We sincerely apologize for any inconvenience caused.
I would like to set up some time to chat with you, work through your use cases, and see how best we can serve you.
I will set up a time to meet via email.
Best regards,
Vinod
ry...@cinder.studio <ry...@cinder.studio> #49
This platform is top notch, it's just slow in the libraries that access the firestore database.
Companies building production systems are abandoning the platform exclusively on this issue. It's a big deal.
vi...@google.com <vi...@google.com> #50
We have already escalated this to a P1, and the team is working on it with top priority. P1s are taken very seriously internally.
All our internal teams are working on the investigation, and the Firebase team is actively exploring ways to make things better.
We sincerely apologize for any inconvenience caused. We have been rolling out fixes on a regular cadence for this issue. We definitely have more room for improvement, and we are doing our best to make things even better.
Best regards,
Vinod
ry...@cinder.studio <ry...@cinder.studio> #51
Thank you for communicating efficiently here.
Best,
Ryan
da...@google.com <da...@google.com> #52 Restricted
si...@gmail.com <si...@gmail.com> #53
sa...@gmail.com <sa...@gmail.com> #54
vi...@google.com <vi...@google.com> #55
Our sincere apologies for the inconvenience caused.
We are working on this issue with the highest priority. We rolled out a set of fixes earlier this quarter and will be rolling out additional fixes this quarter as well.
In addition, we will be publishing a best-practices doc for Node functions with Firebase.
Please also feel free to email me directly at viramachandran@google.com and we would like to work with your specific use cases and see how best we could serve you.
Best regards,
Vinod
da...@google.com <da...@google.com> #56
Firebase customers: the FUNCTION_NAME env var has been changed to FUNCTION_TARGET, and this may be impacting your cold start performance.
If your code reads the FUNCTION_NAME env var, note that it has gone away. Adjusting the code to look at FUNCTION_TARGET should improve cold start performance in that case.
ry...@cinder.studio <ry...@cinder.studio> #57
It was my understanding that K_SERVICE was the var we should be using. What is the advice between K_SERVICE vs FUNCTION_TARGET ?
da...@google.com <da...@google.com> #58
K_SERVICE also works (it will have the same value as FUNCTION_TARGET).
ry...@cinder.studio <ry...@cinder.studio> #59
ja...@google.com <ja...@google.com> #60
FUNCTION_TARGET is the variable that explicitly maps to the in-code function. For example, K_SERVICE does not appear in the Node.js Functions Framework [1], but FUNCTION_TARGET does [2].
[1]
[2]
lu...@geitner.io <lu...@geitner.io> #62
Thanks, vi...@google.com, for the update.
I'm still desperate about the latency of Google Cloud Functions, but anyway,
I have a question for the Google team:
Are you considering moving Cloud Functions hosting from App Engine to Cloud Run?
👉🏻 There is now a way to deploy the Functions Framework directly with a new buildpack.
👉🏻 Soon, Cloud Run will get events, maybe exactly like the Firebase trigger offering.
It would be a nice way to improve performance and capabilities :)
Thanks a lot,
Lucas Geitner
vi...@google.com <vi...@google.com> #63
Thank you very much for your feedback.
Regarding your question on using the Functions Framework with the buildpack:
this link has a few instructions on taking your existing function and using a buildpack to deploy it to Cloud Run.
Please feel free to reach out to me directly if you have further questions on this.
Best regards,
Vinod
ch...@gmail.com <ch...@gmail.com> #65
st...@gmail.com <st...@gmail.com> #66
[Deleted User] <[Deleted User]> #67
ay...@gmail.com <ay...@gmail.com> #68
moment it seems to have results. However, we mainly miss a few things compared to Functions:
1. Firestore triggers are not directly supported
2. We need to come up with build scripts for deployment (compared to Firebase, where it's just a single command)
3. Testing isn't easy as of now, as there is no local emulator
However, thanks to the Functions Framework for making it easier to shift to Cloud Run. We may pay more compared to Functions, but at least Cloud Run's concurrency option might bring the costs down, and we are free from cold starts for now.
Thanks,
Ayyappa
[Deleted User] <[Deleted User]> #69
So you are migrating your apps to Cloud Run? Is it necessary a big rewrite in the code base?
ay...@gmail.com <ay...@gmail.com> #70
> Is it necessary a big rewrite in the code base?
*No,* not at all. Please follow the steps below for a quick try.
1. Set up the Functions Framework:
   1. Define the start command in package.json
   2. Set the FUNCTION_TARGET env variable before calling the start command
2. Once you are done with step 1, package your function as a container and push it to GCR (Container Registry)
3. Make your Cloud Run service use the GCR container
4. Done!
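For step 1, the package.json wiring can look roughly like this (the Functions Framework reads FUNCTION_TARGET from the environment at startup; the package name and version pin shown are illustrative of what was current at the time):

```json
{
  "name": "rest-api",
  "scripts": {
    "start": "functions-framework"
  },
  "dependencies": {
    "@google-cloud/functions-framework": "^1.7.1"
  }
}
```

With FUNCTION_TARGET set, `npm start` serves the exported function of that name over HTTP, which is exactly what the Cloud Run container needs.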
Please *note* that if you are using Firestore triggers in Cloud Functions, you may need to find an alternative for handling those on Cloud Run. At the moment, we are still using Functions for the Firestore triggers and Cloud Run for the main REST API services.
The Google team recommended using *buildpacks*, which seem likely to become more common in the future, but the approach below was chosen as it's much easier for us.
Sharing my script for quick reference.
FUNCTION_TARGET : set your function name here
IMAGE_NAME : name for your container (new parameter compared to Functions)
PROJECT : set your project name
Usage for building the dev environment : ./build.sh dev
#!/bin/bash
RED=`tput setaf 1`
GREEN=`tput setaf 2`
YELLOW=`tput setaf 3`
CYAN=`tput setaf 6`
BOLD=`tput bold`
RESET=`tput sgr0`
WARN=$RED
LOG=$GREEN
INFO=$CYAN
IMAGE_NAME=rest-api
REGION=us-central1
MIN_INSTANCES=0
MAX_INSTANCES=10
VPC_CONNECTOR=functions-connector
REDIS_HOST=10.128.0.2
REDIS_PASSWORD=password
if test -z "$1"
then
  ENV=dev
else
  echo "${INFO}Deploying for project : ${BOLD}${WARN}$1 ${RESET}"
  ENV=$1
fi
# Switch on ENV (not $1) so the dev default above still works when no argument is passed
case $ENV in
dev)
  PROJECT=project-name-dev
  MIN_INSTANCES=0
  MAX_INSTANCES=2
  ;;
staging)
  PROJECT=project-name-stage
  MIN_INSTANCES=1
  MAX_INSTANCES=5
  ;;
production)
  PROJECT=project-name-prod
  MIN_INSTANCES=1
  MAX_INSTANCES=10
  ;;
*)
  echo "${BOLD}${WARN}You need to pass environment (dev|staging|production) as the first argument. ${INFO}ex: deploy.sh dev${RESET}"
  exit 1
esac
echo "${WARN}Project : ${BOLD}${INFO}${PROJECT}${RESET}"
#echo "${INFO}Deploying all functions...${BOLD}${WARN}$ENV (${PROJECT})${RESET}"
#firebase deploy --only functions
echo "${INFO}Starting deploy to Cloud Run...${RESET}"
# Build the container and push it to Container Registry
echo "${INFO}Submitting build to Google Container Registry...${RESET}"
gcloud builds submit --tag gcr.io/${PROJECT}/${IMAGE_NAME}
# Deploy to Cloud Run from the Container Registry image, with env variables
FUNCTION_TARGET=b2b
echo "${INFO}Deploying ${FUNCTION_TARGET} service to Cloud Run from Google Container Registry...${RESET}"
# URL will be cloud-run-url/v1
gcloud alpha run deploy ${FUNCTION_TARGET} \
  --image gcr.io/${PROJECT}/${IMAGE_NAME} \
  --platform=managed \
  --allow-unauthenticated \
  --set-env-vars GCLOUD_PROJECT=${PROJECT} \
  --set-env-vars REDIS_HOST=${REDIS_HOST} \
  --set-env-vars REDIS_PASSWORD=${REDIS_PASSWORD} \
  --set-env-vars ENABLE_PROFILING=true \
  --set-env-vars FUNCTION_TARGET=${FUNCTION_TARGET} \
  --vpc-connector ${VPC_CONNECTOR} \
  --region ${REGION} \
  --min-instances ${MIN_INSTANCES} \
  --max-instances ${MAX_INSTANCES}
*I'm not an experienced backend developer (I've been a game dev by profession for the past 12 years), but I have been using Functions for the past 1.5 years. So please double-check, and let me know if you have better alternatives.*
Thanks,
Ayyappa
Twitter. ayyappa_1
Thanks,
Ayyappa
Twitter. ayyappa_1
id...@gmail.com <id...@gmail.com> #71
It took me a while to find this issue, yet it is very helpful in explaining the root cause. I saw many mentions of "We have deployed a fix and are working on more," but it seems there is no real effect atm, as the problem still persists.
With the remaining fixes and plans, can we expect to see respectable cold starts (~1s max) for simple functions that perform basic yet core Firebase functionality, or should new projects move to something like Cloud Run or App Engine instead?
br...@askgms.com <br...@askgms.com> #72
Google team, is this the expected turnaround speed for priority 1 tasks? If so, I think we'll have to move on to other cloud providers and suggest our counterparts do the same.
vi...@google.com <vi...@google.com> #73
We sincerely apologize for any inconvenience regarding this issue.
We have been actively working on reducing cold starts and have rolled out a number of fixes in the past two quarters. In addition, we are going to publish a blog post on best practices for setting up and structuring your cloud functions to minimize cold starts. We will update this bug shortly after the blog post is published.
In addition, we are working on additional features to further reduce cold starts in Q1.
We sincerely apologize for any inconvenience caused.
Best regards,
Vinod
br...@askgms.com <br...@askgms.com> #74
While I appreciate that (ostensibly) you guys have been actively working on this, I think we're all starting to feel that your team's definition of "active" may be different than that of other developers. Firebase has been heavily marketed, yet basic samples easily illustrate the cold start issues. While there was a special Firebase Summit built and hosted, which undoubtedly pulled dev time, a critical component of the platform was still languishing with no meaningful changes to an issue which blocks many production deployments.
Publishing a blog post about best practices is great, but why in the world does a function whose sole purpose in life is to be called intermittently have such bad performance when used in that manner? It should NOT be necessary for your users to go through blog posts to understand why a basic demo function takes >5s to perform anything at all (and that's assuming fixes are available!). Consider your competition: they require no such silliness, and you both offer similarly-priced services. This is how you lose customers, and infrastructure customers generally don't come back when burned.
At the start of June, Benjamin on the Cloud team created this ticket after the GitHub issue had been ongoing since January 2019. In August, you were "actively working on getting the issue resolved", and it looks like you met with several different developers at different companies to understand the issue better. In October, you say you "will be rolling out additional fixes in this quarter [2020 Q4] as well". Now you're saying it'll be 2021 Q1 for some further features which might help? Forgive us for some skepticism.
Trust needs to be earned back here, and possibly having some features which maybe/potentially/hypothetically could help isn't enough to allow any business to decide whether to continue development on a platform. This isn't a matter of inconvenience - it's a matter of whether we ever work with Google Cloud Platform again, and whether GCP has developers as advocates or critics.
I like the platform and its potential, but this makes me and my team wary of actually using it for anything that matters.
st...@googlemail.com <st...@googlemail.com> #75
If this is how a P1/S1 task gets 'resolved' then what do you guys even do with anything lower rated?!
vi...@google.com <vi...@google.com> #76
Our team has been actively working on making cold starts better, and we are treating this as a very key priority.
We have seen significant improvements in our internal measurements.
Could you please provide details on your specific functions? I will set up a meeting with our team to review them for further analysis.
Likewise, stefernet@, could you please provide details on your functions as well? Please feel free to email me directly at viramachandran@google.com.
Likewise, idaderko@gmail.com, could you please provide details on your functions as well?
We do want to earn your trust and we are working on this with the highest priority.
Best regards,
Vinod
vi...@google.com <vi...@google.com> #77
As mentioned above regarding the blog post, here is the published post on writing and deploying Node.js apps on Cloud Functions, with the goal of optimizing performance and minimizing cold starts.
As mentioned above, this is orthogonal to any specific issues you may be facing with your functions. Please email me directly with your function details and we will investigate with the highest priority. As mentioned earlier, we have seen significant improvements in our internal measurements, and we want to make sure we analyze your functions and meet your performance needs.
Our apologies for any inconvenience.
Best regards,
Vinod
wi...@google.com <wi...@google.com> #78
> While I appreciate that (ostensibly) you guys have been actively working on this, I think we're all starting to feel that your team's definition of "active" may be different than that of other developers.
I'm the engineering director at Google responsible for Cloud Functions. For a multitude of reasons, multiple changes/updates that we believe would have improved the performance issues cited in this bug report haven't yet made their way to production, although I cannot promise that the issue will be fully resolved once those changes and updates have rolled out.
What I can promise you is that starting in January you'll see us making a lot more progress on this, and with regular updates as well as transparency on what we believe is causing the issue and what we're doing to address it.
br...@askgms.com <br...@askgms.com> #79
And Vinod, at this point I don't believe our function calls are sufficiently different from any of the others to really warrant a deeper analysis, though if you and the team really believe otherwise, we can set up some time. I'll check out the link you posted again, though I've run through that in the past during initial builds.
be...@google.com <be...@google.com> #80
An update
👋 stefernet@, brandon@, and other folks in this thread: I'd like to give some perspective and explain why this ticket has taken quite some time to address.
When this ticket was opened we were having intermittent infrastructure issues with Cloud Functions resulting in poor performance. This thread helped surface a P0 issue, which was addressed.
However...
As this ongoing discussion demonstrates, this isn't the whole story.
A variety of people continue to have a poor experience with Cloud Functions. Internally, we've put together a group of folks across several teams (Cloud Functions, Developer Relations, support, Firebase), and have been discussing how to better meet people's expectations around performance.
What's clear is that there's no one problem:
- There were infrastructure issues; some have been addressed, and some have ongoing work (e.g., improving gVisor read performance).
- Large dependency graphs hurt cold start performance.
- Cloud Functions can be a tricky paradigm to code for, which can contribute to cold start issues.
- There are product improvements we can make to make cold starts less frequent.
Our plan
- I've worked with support to pull together this comprehensive article, "Tips for writing and deploying Node.js apps on Cloud Functions". It's my hope that this helps anyone in this thread who was bumping into issues due to coding hiccups (such as unhandled promises).
- We have features in the works that will allow people to avoid frequent cold starts; concretely, we intend to support minimum instances for Cloud Functions.
- We intend to continue having cross-team meetings to get to the bottom of specific problems customers are having (for instance, why slow cold starts seem to frequently correlate with certain libraries).
Unfortunately, this thread has turned into a bit of a catch-all issue (for a problem that is nuanced).
In the new year, I would like to close this issue in favor of more specific issues where we can dig into specific categories of problems (this will make it easier for us to help individual customers).
We want Cloud Functions to be an awesome product for everyone's use cases. And I apologize for the frustration this long-lived thread has caused.
Edit: in response to brandon@, we will update the "best practices" post as we release features in the new year. There will also be posts to cloud.google.com, and
id...@gmail.com <id...@gmail.com> #81
Most of us are talking about insanely long cold start times related to Firestore triggers or using Firestore inside a cloud function. Here is the Firebase documentation on the related topic.
That ~10s reaction time is where the biggest pain point is atm, since Firestore and Cloud Functions are used so much in conjunction. Were there any improvements in this area? I believe the issue was with gRPC establishing the initial connection from the function to Firestore.
From what I read here, the only viable solution will be to increase the "minimum instances" count for cloud functions. Will this become available as part of the Firebase functions config?
br...@askgms.com <br...@askgms.com> #82
@benjamin Thanks for the detailed explanation and battle plan! I certainly understand that there are several issues likely at play here, exemplified by the above commenter and the variety of conditions which could trigger the behavior. We'll look forward to more updates in January, and appreciate you guys working on getting this fixed up.
bu...@gmail.com <bu...@gmail.com> #83
The firebase/firestore SDK is equally randomly slow on AWS Lambda (sometimes 4+ seconds). I would imagine it's the same issue on Azure too.
Edit: I updated all the Firebase Node libraries, which removed the native gRPC dependencies in favor of the pure-JS implementation.
Initialising the SDK is now much more consistent in startup time than before, and also seems much faster. Now I'm getting around 1600 ms on a cold boot.
da...@gmail.com <da...@gmail.com> #84
The web application runs on Nuxt.js in SSR mode, so a cloud function serves the website. I'd consider it a common use case, and it's hard to digest the fact that users need to wait around 5-10s for the main page to load (given the scenario of lower traffic and the function therefore being in a cold state).
I'd happily let someone from the team have a look and see whether analysing the way I architected the functions' code would benefit any of us. In conclusion, Firebase suits all my needs apart from this issue, which has been hanging in the air for a long time.
Kind regards,
Damian
ry...@cinder.studio <ry...@cinder.studio> #85
In particular bu...@gmail.com THANK YOU!
Is your GRPC change published yet? We'd love to take advantage of that change. This is precisely what I've been waiting to hear you change. I previously shared on this email thread a testing tool that demonstrates and measures the issue. I just updated it to all of the latest libraries and I see 0 improvements.
The below sample application has only 2 serverside dependencies:
- firebase-admin
- firebase-functions
It demonstrates that JUST INSTANTIATING FIRESTORE can take up to 5 seconds.
@ wi...@google.com,
While we are excited to hear that this ticket has revealed a variety of issues across the platform whose fixes we will all enjoy, a massive share of the issues WE are concerned about could be relieved by taking some time away from the bigger-picture work and focusing on speeding up the instantiation time of the Firestore JS library.
@ALL Companies suffering from this issue,
Our company has found tremendous performance success by BYPASSING the published Google libraries in favor of calling the Firestore REST APIs directly. We've migrated all critical-path Firestore calls to direct REST API calls. Reach out to me if you want some suggestions: ryan@cinder.studio
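A minimal sketch of what calling the Firestore REST API directly can look like (the collection name, field, and token handling here are illustrative placeholders, not actual code from this thread). A query is just a `structuredQuery` JSON document POSTed over HTTPS, with no SDK or gRPC channel involved:

```javascript
// Build the JSON body for a Firestore REST API `runQuery` call.
// Roughly equivalent to: collection(collectionId).where(field, '==', value).limit(1)
function buildRunQueryBody(collectionId, field, value) {
  return {
    structuredQuery: {
      from: [{ collectionId }],
      where: {
        fieldFilter: {
          field: { fieldPath: field },
          op: 'EQUAL',
          value: { stringValue: value },
        },
      },
      limit: 1,
    },
  };
}

// Hypothetical usage (PROJECT_ID and TOKEN must come from your own config;
// requires Node 18+ for the global fetch):
//
// const url = `https://firestore.googleapis.com/v1/projects/${PROJECT_ID}` +
//             '/databases/(default)/documents:runQuery';
// const res = await fetch(url, {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${TOKEN}` },
//   body: JSON.stringify(buildRunQueryBody('events', 'ownerAccountId', 'acct-1')),
// });
```

The trade-off is that you give up snapshot listeners and the SDK's type mapping; for the simple request/response reads and writes several commenters describe, that can be acceptable.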
bu...@gmail.com <bu...@gmail.com> #86
Doubling the available RAM should roughly halve the cold boot time. It's a trade-off, though, as it makes every future invocation more expensive.
With memory set to 2GB you should expect the function to complete in under 1 second, but subsequent calls probably don't get much faster.
I'm using a 1GB memory allocation, which puts my cold-boot functions under 2s; subsequent calls are faster than before, but don't improve beyond 1GB. Past 1GB you're just wasting money.
The frustrating thing is that the memory actually used is only 134MB; the rest is wasted purely on reducing cold boot time.
On AWS this is probably because vCPU scales with the available memory: more RAM = more CPU.
Do Cloud Functions scale the same way?
If low memory is causing the slow init of the SDK, I think it's unrealistic of Google to expect customers to opt for the more expensive 1GB tier just to get reasonable cold-boot response times with a core library.
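For anyone who wants to test this trade-off themselves, memory (and with it CPU) is a per-function deploy setting; the function name, runtime, and region below are placeholders:

```shell
# Cloud Functions provisions CPU proportionally to memory, so raising
# memory also raises the CPU available during cold-start initialization.
gcloud functions deploy myFunction \
  --runtime=nodejs16 \
  --trigger-http \
  --memory=1024MB \
  --region=us-central1
```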
ry...@cinder.studio <ry...@cinder.studio> #87
Thanks for staying engaged! I understand that the current library is the source of the difficulty here; it's apparently just too heavy. I'm suggesting Google may need to invest someone's time in publishing an alternative library if this one can't be adjusted to correct the issue. Publish both if you want: one that is simple, fast and light, but has fewer features, and keep the bigger, heavier one for more complicated tasks. Google's Firestore REST API is quite robust and doesn't need all that machinery to be accessed.
We know this solution works because we invested in engineering to author our own solution:
Here at Cinder Studio, we think Firestore is the most feature-rich scalable cloud database we've ever worked with, and we wanted to use it so badly that we ended up writing our own library to bypass the speed problems of the Google-supplied one.
Our library
* has only 3 small dependencies (I may need to verify this further),
* is composed of less than 1,000 lines of code (excluding test files),
* uses simple REST API calls to accomplish the job, and
* uses minimal memory and minimal instantiation time.
It likely lacks many of the features of the official library, but it accomplishes 85% of the DB calls we need today with excellent (low) error rates. (We've got a few stragglers we've been too lazy to migrate because we'd need to add a few more features to the library first.)
Below is an example of how similar they are when compared side-by-side.
---------
import dataArchiveGoogleFb from ...
import dataArchiveCinderFb from ...

export default async (ownerAccountId: string, datasetShortname: string) => {
    // EXAMPLE USING THE GOOGLE FIREBASE LIBRARY
    const googleQueryResult = await (
        dataArchiveGoogleFb.collection
            .select(
                'id',
                'createdAt',
                'updatedAt',
                'ownerAccountId',
                'datasetShortname',
                'data',
            )
            .where('ownerAccountId', '==', ownerAccountId)
            .where('datasetShortname', '==', datasetShortname)
            .where('deletedAt', '==', null)
            .orderBy('createdAt', 'desc')
            .limit(1)
            .get()
    )

    // EXAMPLE USING CINDER STUDIO'S LIBRARY
    const cinderQueryResult = await QuickRead.query(
        dataArchiveCinderFb.newCollectionQuery()
            .select(
                'id',
                'createdAt',
                'updatedAt',
                'ownerAccountId',
                'datasetShortname',
                'data',
            )
            .whereComposite('ownerAccountId', 'EQUAL', 'string', ownerAccountId)
            .whereComposite('datasetShortname', 'EQUAL', 'string', datasetShortname)
            .whereComposite('deletedAt', 'IS_NULL')
            .orderBy('createdAt', 'DESCENDING')
            .limit(1)
            .prepare()
    )

    return {
        googleQueryResult: googleQueryResult,
        cinderQueryResult: cinderQueryResult,
    }
}
be...@google.com <be...@google.com> #88
An update
👋 I’ve updated my initial post and the title to be more specific, based on the problems still being discussed in this thread:
Cold start performance issues seem to correlate closely with gRPC libraries (like Firestore's). Folks switching from gRPC to HTTP dependencies have seen performance improvements, which also points at gRPC.
If the problems you're running into do not seem to correlate with gRPC SDKs, such as Firestore, please don't hesitate to open an issue and we will investigate (Also, I've pulled together a
- We continue to work on cross-cutting features that will help cold start performance in general, e.g., Min Instances for Cloud Functions, and will keep this thread updated as features roll out.
- There's an ongoing internal conversation about how we can improve cold start issues brought about by dependencies.
sa...@gmail.com <sa...@gmail.com> #89
Sounds like I'm having the same issue (
Basically, I created a blank Firebase project with an HTTPS function that modifies one field of one Firestore document.
Cold start times (which seem to reset within 30m) are in the 6000ms range.
10:28:41.934 ColdStartTest.tsx?6c99:21 Request starting ...
10:28:48.176 ColdStartTest.tsx?6c99:24 Success
10:28:48.176 ColdStartTest.tsx?6c99:29 Cold start time: 6242
10:28:53.802 ColdStartTest.tsx?6c99:21 Request starting ...
10:28:53.979 ColdStartTest.tsx?6c99:24 Success
10:28:53.979 ColdStartTest.tsx?6c99:29 Cold start time: 177
11:07:39.914 ColdStartTest.tsx?6c99:21 Request starting ...
11:07:46.073 ColdStartTest.tsx?6c99:24 Success
11:07:46.073 ColdStartTest.tsx?6c99:29 Cold start time: 6159
12:13:45.491 ColdStartTest.tsx?6c99:21 Request starting ...
12:13:52.109 ColdStartTest.tsx?6c99:24 Success
12:13:52.110 ColdStartTest.tsx?6c99:29 Cold start time: 6619
I can post the repo if it's helpful, but honestly it just seems like a problem with using Firestore within a Firebase project, which doesn't sound like it makes any sense? Please correct me if I'm misunderstanding this thread! I was encouraged by @mbleigh to contribute.
EDIT 1:
I should mention that I'm using the firebase-admin dependency rather than @google-cloud/firestore, as that's what the default Firebase project starts you with. Would love to know whether that distinction is related to this issue or not.
EDIT 2:
Switching from the default function memory of 256MB to 512MB seems to bring cold starts down to the 2000-3000ms range. Amazing to think that would be necessary on an empty Firebase project that makes use of Firestore, but very good to know.
id...@gmail.com <id...@gmail.com> #90
What could be so demanding on memory/CPU here? I suspect it's related to installing/initialising the firebase-admin package, in particular for Firestore.
[Deleted User] <[Deleted User]> #91
jr...@gmail.com <jr...@gmail.com> #92
That's outrageous.
ry...@cinder.studio <ry...@cinder.studio> #93
In our case we already had a large number of calls using the Firestore libraries, so we implemented a new API on a new Firebase function without the Firestore dependencies; on that new API, all of our calls go directly to the Firestore REST API.
Performance jumped significantly.
Ryan@Cinder.Studio
po...@gmail.com <po...@gmail.com> #94
Re "remove the firestore libraries from Google and just go straight to REST API calls straight to the firestore API":
This is a remarkable observation, and I'm frankly amazed that the officially shipped SDK would be significantly less performant than effectively writing one's own SDK to wrap the underlying Firestore API.
In common with others, I've deduced that cold start times on our FB projects are largely impacted by the first call to admin.firestore(): not module imports, not global initialisers, just this call. We wrapped a timer around it and could see it represented the majority of the first-run time.
console.time("admin.firestore() " + process.env.K_SERVICE);
const db = admin.firestore();
console.timeEnd("admin.firestore() " + process.env.K_SERVICE);
I can confirm we've also upped the memory allocation on some of our busier functions simply to reduce this Firestore latency a bit. We also went so far as to use the RTDB for some datastore functionality we needed faster response times on, since it spins up in ~80-100ms rather than ~300-800ms for Firestore.
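One mitigation that follows from this observation is to defer the expensive admin.firestore() call behind a memoized getter, so only the first request that actually touches Firestore pays the cost. A sketch; the `lazy` helper and handler shape below are illustrative, not code from this thread:

```javascript
// Generic lazy initializer: runs the factory once, on first use,
// and returns the cached value on every later call.
function lazy(factory) {
  let value;
  let initialized = false;
  return () => {
    if (!initialized) {
      value = factory(); // the expensive init cost lands here, once
      initialized = true;
    }
    return value;
  };
}

// Hypothetical usage with firebase-admin:
// const admin = require('firebase-admin');
// const getDb = lazy(() => admin.firestore());
// exports.handler = async (req, res) => {
//   const db = getDb(); // first request pays the init; later ones don't
//   ...
// };
```

This doesn't shrink the initialization itself; it only moves it off the cold-start path for requests that never touch Firestore.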
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #95
In common with others, I've also deduced that cold start times on our FB projects are largely impacted by the first call to admin.firestore() - not module imports, or global initialisers. Just this. We wrapped a timer around that call and we could see it represented the majority % of the first run time.
Totally agree with this. I spent a lot of time refactoring our cloud functions to follow the dependency best practices outlined in the various published articles. After completing that we saw only minor improvements in startup times. We then discovered this issue and moved all of our services to Cloud Run. We still use Firestore-triggered Cloud Functions, but they simply forward the request to our Cloud Run instance. By avoiding any calls to the Admin SDK in the Firestore triggers themselves, we're now down to sub-second cold start times.
This problem should be mentioned in the
Given how long this has been a P1 issue, I've given up on it. I would really like to see Eventarc include Firestore trigger events so that Cloud Run can receive them directly.
jj...@raxial.com <jj...@raxial.com> #96
Between Cloud Functions and Cloud Run, there is an ideal architecture somewhere. As far as I can tell, Cloud Run is the future; it feels like Cloud Functions 2.0 in virtually every way. It just needs more trigger support.
st...@googlemail.com <st...@googlemail.com> #97
ca...@hypermob.co.uk <ca...@hypermob.co.uk> #98
Meanwhile Google is pretending to work on this at P1. How can it take the Firebase team over a year to fix this when even a quick bodge by some users has such a big impact?
There's a reason why Google Cloud lost $5.6B in 2020.
I'm waiting around 20+ seconds to save a document + Firestore trigger + save another document, which is a lot given it's just audit behaviour.
ay...@gmail.com <ay...@gmail.com> #99
Re "they simply forward the request to our cloud run instance":
Can you please share how you are doing this? Through Pub/Sub? We also have Firestore triggers which are still stuck on Functions due to the lack of trigger support on Cloud Run.
Also, we are maintaining minimum instances on Cloud Run just to avoid the cold starts, but a lighter Firestore SDK would definitely help. As we keep min instances in all of our environments for all services, we are billed for nearly 10 min instances, which is unfortunate as it's not a perfect solution.
I hope Cloud Run supports Firestore triggers pretty quickly, or that a lighter Firestore SDK is published soon.
Also, is the Firestore SDK open source? If so, do you see it as feasible to remove the gRPC stuff to avoid this issue? If that's technically possible, it would be worth a try from my end.
Thanks,
Ayyappa
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #100
Not through Pub/Sub, just a normal HTTP call to our Cloud Run instance, i.e.:
const host = functions.config().api.functions
const httpClient = axios.create({
    baseURL: host,
});

export const onMessageCreated = functions.firestore
    .document('threads/{threadId}/messages/{messageId}')
    .onCreate(async (snapshot, context) => {
        const threadId = context.params.threadId;
        const messageId = context.params.messageId;
        const message = snapshot.data() as any;
        message.messageId = messageId;
        await httpClient.post('/functions/onMessageCreated', {
            threadId: threadId,
            snapshot: message
        });
    });
Not ideal, but it brought our cold start times from ~20s to under 1s. Getting trigger events to Cloud Run (or fixing this issue) would be ideal, because all Cloud Functions are doing for us now is adding latency and cost.
A side benefit of this setup is that you can configure the host to point to your dev machine using something like ngrok, which makes debugging trigger code much easier.
an...@rydeup.de <an...@rydeup.de> #101
wi...@google.com <wi...@google.com> #102
There are three different paths we're taking to resolve this issue. For many of you, I believe #3 will be the most practical near-term outcome.
1. Generally speaking, Cloud Functions cold start time is not as good as industry leaders in this benchmark, and we expect to improve it over time, but I don't have a timetable to communicate right now. There are ideas we're investigating, but they're too early for me to give you a definitive timeline; I would categorize this as incremental product performance improvements that we're going to deliver over time.
2. Firebase client libraries seem to be particularly slow to start up, which we think is specifically related to the presence of gRPC and Node.js. For the purposes of resolving this issue, I want to delegate this to the Firebase team, as it does not appear to be a problem specific to Cloud Functions. Unfortunately, Firebase does not seem to have a public issue tracker, and I need to figure out where to file this issue.
3. Finally, there are cold start improvements coming to Cloud Functions in the form of the Min Instances feature. We've created a feature request (https://buganizer.corp.google.com/issues/181884353) which you can subscribe to for more updates. We expect to have this feature in Private Preview in the next 4-6 weeks. For many users here, this may be a satisfactory resolution.
bu...@gmail.com <bu...@gmail.com> #103
"lamda warmers" or min instances on aws, it only hides the problem and
generally generates the cloud provider more money.
Ideally #2 would be best, or even the possibility of breaking out
components so we don't have to load in the whole sdk just to use one admin
function or make a firestore call.
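Until such a breakup exists, one stopgap along these lines is deferring the require itself into the handler that needs it, so other functions bundled in the same deployment don't pay the module's load cost on their cold starts. A sketch in CommonJS, with `crypto` standing in for a heavy dependency:

```javascript
// Deferred require: the module is only loaded (then cached by Node's
// module cache) the first time this particular handler runs.
function handleReport(req) {
  const crypto = require('crypto'); // stand-in for a heavy SDK
  return crypto.createHash('sha256').update(req.body).digest('hex');
}
```

Handlers that never run never load the module, so their cold starts stay lean.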
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #104
#1 is true, but I find performance acceptable without the Firebase SDK. Any improvements would be appreciated.
Min instances are a great addition for specific use cases, but overall they are not providing a huge benefit to us. Our application has many scaling events during the day: it spans multiple time zones, and within each there are periods where functions need to scale up to meet demand. When our application needed to scale, a user was often waiting 20s+ as a new instance came online to handle their request. Then they would perform another action in the application, causing yet another scaling event for a different function, and again wait 20s+.
Solution #3 just means we have to set min instances on a large number of our functions to avoid this experience for our users, and we still run the risk of the app scaling past my configured min instances. While this will band-aid the issue, the value proposition of Cloud Functions is lost once you start using the Firebase SDK.
#2 is the solution I am waiting for, OR the ability to deliver Firestore triggers and callable functions via Eventarc so we can handle them directly in Cloud Run.
wi...@google.com <wi...@google.com> #105
Hi all, for Firestore improvements, I talked to the Firestore team and they asked me to pass this along:
The Firestore SDK team has examined the impact of loading the Firebase Admin SDK pretty extensively, and while there are some small gains to be had code loading/weight doesn't explain anything like the multi-second cold starts folks on this thread have reported. We believe the issue seems to lie specifically with the gRPC connection the Firestore SDK uses to read data, but our internal testing has not been able to reproduce the same effect that has been described in this thread. We're going to keep investigating (including options such as providing a non-gRPC SDK), and if you can reliably reproduce 5s+ cold starts with a minimal code sample, we'd love to know more about it so we can take a look (including how you're measuring the cold start duration).
I'm going to move this issue to the Firestore component so that the Firestore team can continue to action this issue.
(For GCF specific improvements to cold start, as mentioned earlier, I would direct you to
ay...@gmail.com <ay...@gmail.com> #106
thread which can be referred by firebase team.
Regarding min instances, currently we are using them in Cloud Run. Unfortunately, even though they solve for the first ~80 concurrent requests (in the best case), they are a huge overhead on the cost factor across our different environments.
We have 4 environments currently and 4 services on Cloud Run:
4 x 4 x 6 = $96 per month for the lowest Cloud Run spec.
This is in no way usable when preferring a serverless environment.
And for Functions, min instances = 1 can only solve for one concurrent request, right?
I see #2 as the most optimal solution for this problem. The SDK needs to be fixed with a REST fallback as an alternative.
vi...@google.com <vi...@google.com> #107
We are excited to ship Min Instances on Cloud Functions to help with cold start times.
To get onboarded to the feature, please fill out the onboarding form:
Best regards,
Vinod
wb...@sentryware.com <wb...@sentryware.com> #108
We're still experiencing 5+ second cold starts for functions that perform a simple Firestore transaction. We love the ease-of-use of GCF, but we're now forced to reevaluate other options as this behavior is not acceptable in production. Frankly, I'm surprised Google hasn't pulled out all of the stops on this one, as it completely cripples one of their flagship cloud products.
The new min-instances feature does help some, but it's only a bandaid that does not resolve the problem. Even with min-instances, customers are still vulnerable to cold-starts when concurrent requests exceed the allocated capacity.
su...@gmail.com <su...@gmail.com> #109
bl...@google.com <bl...@google.com>
ay...@gmail.com <ay...@gmail.com> #110
We are using min instances too, as it's getting costly (Cloud Run) for the different environments we have (around $50 across all environments, but it may go up to $40 per environment when we split our microservices further).
The same applies for Cloud Functions, and moreover they're not capable of handling requests concurrently; the only option is to increase min instances, which further increases the cost.
So we wanted to modify the repo, as that's the correct way to solve it rather than min instances. I tried at max not to change much of the nodejs-firestore <
repo, as that makes it easier to rebase with master and also to stay compatible with existing projects.
Before making it public, I would like to do a few more tests to prove it's beneficial for all who want to avoid cold starts with the Firestore SDK.
Here are the results so far:
1. Able to completely wrap the Node.js Firestore API (except for the partialQuery API).
2. Finished lazy loading of gRPC, as it's no longer used with the REST implementation.
3. Able to save 2s of loading time, which I think can be squeezed further (working on it).
[image: Screenshot 2021-04-29 at 1.36.52 PM.png]
If someone can write test cases to quickly try out the API, it would be of great help. Do let me know if you would like to contribute.
Thanks,
Ayyappa
ch...@gmail.com <ch...@gmail.com> #111
This came after a significant amount of work in Node.js, and the decision was not made lightly. We host multi-tenant applications across multiple Firebase projects, and the startup time was causing 45-60 second processing times for purchasing. While some of this delay was caused by multiple functions chaining and triggering other workflows, the startup time for 7-10 chained/triggered functions was unbearable.
Cloud Run, if you are not aware (I wasn't), only processes items during HTTP requests and does not allocate CPU time to pull-based Pub/Sub. Min instances are insanely expensive: 4 Cloud Run revisions set to 1 instance cost more than running a 3-node zonal k8s cluster with preemptible e2-medium instances (6 vCPU, 24GB total for the cluster).
Our Firestore-event-to-processing times are now below 2s for the entire chain (further processing still occurs; purchase time is below 8 seconds).
- firebase cloud function go --> http cloud run receiver
- receiver cloud run http service --> pub sub
- pub sub --> processing micro service in k8s
For log-append type processing, relying solely on Cloud Functions and Cloud Run ended up being a disaster. While running it all in the emulator works really well, deployment to production ends up being a constant troubleshooting exercise that eats significant time.
If you are not doing this as a hobby and have more complicated workflows, being penny-wise and pound-foolish on saving a few bucks with "free" Cloud Functions might not be the best way to go. Waiting for a long-term resolution forced our hand to change our core architecture, and also gave us pause about relying too much on what should be a simple library from Google to connect to Firebase/Firestore.
While this is not applicable to everyone on this thread (given differing architecture needs), the change in direction allowed a much more simplified and proven architecture, as opposed to the spaghetti mess that Cloud Functions inadvertently created.
da...@allfront.io <da...@allfront.io> #112
The tips and tricks post is nice, but I think it's a bit tone-deaf, as it throws the blame on your customers when doing nothing but querying a document with the Firebase SDK reproduces the issue, as many people on this thread have shown. I can provide a minimal code example/repo to reproduce it if that helps.
pm...@gmail.com <pm...@gmail.com> #113
- Trimming my dependencies to the bare minimum
- Switching to @google-cloud/firestore instead of firebase admin SDK
If I add a min instance, it halves the cold start again to around 500-800ms. Once the function is warm, it runs at about 50ms per invocation.
It looks like the min instance uses idle pricing, so it's not as expensive as just keeping it running with pings, but I haven't done a thorough analysis of that.
bu...@gmail.com <bu...@gmail.com> #114
Throwing money at it for min instances isn't a fix, it's just a crappy solution that happens to make Google more money and keep us quiet for a bit.
kh...@google.com <kh...@google.com> #115
Hi everyone. Thank you for being patient with us while we have looked into this issue. We understand the impact it has and how frustrating it can be to deal with. We take this issue very seriously; however, there are a lot of complex moving parts involved which must be adjusted very carefully. To date we've done the following:
- Worked directly with numerous customers that reached out to us and helped them address their issue
- Published best practices on writing performant functions
- Rolled out a series of performance enhancements to cut function startup time for common workloads
- Released the min instances feature
- Sped up function invocations from Firestore triggers when only reading the triggered document (pull request)
We have more improvements coming. Please stay tuned.
As we explore and prioritize other improvements, it would help if we knew the following:
- What are your startup times at the 95th vs 99th percentile when testing a minimal reproduction of the issue? How do they account for network overhead between the origin and GCP? Please share your code and data with us.
- What is your specific pattern of traffic at the time you see the issue? For example, is it during significant spikes, idle times, steady state, etc.? What is the traffic volume at the time?
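For anyone assembling the requested numbers, here is a quick sketch of computing percentiles from logged cold-start latencies (nearest-rank method; the helper is illustrative, and the sample data would be your own measurements):

```javascript
// Nearest-rank percentile over an array of latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Example with made-up timings:
// const latencies = [177, 6242, 6159, 6619, 150, 160];
// const p95 = percentile(latencies, 95);
// const p99 = percentile(latencies, 99);
```

Collect timings client-side (request sent to first byte) and also from function logs, so the network overhead the questions mention can be separated out.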
br...@askgms.com <br...@askgms.com> #116
Thanks for posting an update. I think everyone who's still bothering to follow this is trying to be patient, but this is an egregious length of time for a significant blocker. A lot of the frustration comes from the obviousness of the cause: the requisite dependencies for interacting with many Firebase components cause this issue, and they are absolutely the prime offender regardless of how many other dependencies are included. They need to be trimmed/deferred/modularized/all of the above.
It's great that at least one deferral change is in place and starting to get attention (#5 is an example), but at this one-use-case-per-year pace, many of us may never see results that make the expected use cases for Functions with Node.js work properly.
There are numerous examples provided by users of the exact behaviors which are issues here (e.g.,
As a note, the #4 min instances "feature" doesn't actually address the problem at all; cold starts are still exactly the same, you just hit them less often since you front-loaded the lag time. You also have to anticipate your load well enough to set a min instances count that will prevent cold starts, which means functionally paying for peak capacity 100% of the time. Either that, or you hit cold starts whenever your traffic spikes.
One thing that could solve this for the majority of use cases is not deallocating shards after such a short span. If you allowed each to live for 24+ hours, I suspect most concerns would evaporate, though traffic spikes could still cause cold starts. Another approach could partner with this: analyze load increases and anticipate spikes, making shards available in advance of calls being made. This would obviously only help Functions with a steady flow of traffic and intermittent spikes, but that is another major use case (for instance, site hosting via Functions would likely be fixed by it).
To answer your questions in our case:
- We consistently observe 5-8 seconds for cold starts in most cases, rarely dipping below that range. Statistical analysis is unavailable: we look at the logs while loading an asset, observe the lag, observe that the function doesn't start for that length of time, and note that the response is received after the delay. The next request completes without delay, as do all subsequent requests within a window of prior activity. Not sure if that's enough to "account" for network overhead. Honestly, if you're still at this phase of investigation, this is likely a lost cause; just try any minimum viable example that wasn't explicitly fixed by the special patch in #5 and you can easily observe this behavior yourself. We will not be able to share our specific code or data.
- This happens any time a Function is called on a new shard (or whatever the term is for the container running the Function). It's especially obvious if a Function has been dormant for more than ~20 minutes; I'm guessing shards are deallocated at that point, so any call needs to cold start one. I'd expect any deviation from a completely steady stream of requests to eventually trigger this.
ry...@cinder.studio <ry...@cinder.studio> #117
Our company migrated away from the default Firestore libraries some time ago in favor of building our own against the Firestore REST API, and our performance increased significantly. The majority of our issues had to do with the gRPC features of the Google-sourced Firestore library taking several seconds (up to 5) to warm up.
A few weeks ago I began an effort to extract the code we put together for this into an independent third-party library, with the goal of open-sourcing it on GitHub. It's not perfect, and it's far from "open source standards" of readiness (in documentation and thorough testing). It is also not yet ready to publish to NPM. I simply have not had the time.
However, if anyone wants to make use of the libraries we are using, you are welcome to them. We'd love any support in getting this library to open-source, NPM quality. We rushed this solution together to solve an immediate need, so it could be improved.
We've been beta testing a few instances of our systems against the newly extracted codebase (compared with the implementation of the same technology in our current codebase) and it appears to be functional.
So if you are interested in using (and possibly even helping out with) a pre-production open source library on the topic, please dive in!
da...@allfront.io <da...@allfront.io> #118
Re this sentence from the tips and tricks:
"blocking a user-facing UI update on the response from a Cloud Function is not a good idea."
I feel misled that I needed to dig so deep to find that out.
Until this is fixed, I suggest placing this first in your docs, in bold, in a prominent place in your marketing material; otherwise there's a danger that other developers will be misled into thinking that Firebase + Functions are suitable for building apps with user-facing UI updates.
da...@allfront.io <da...@allfront.io> #119
ay...@gmail.com <ay...@gmail.com> #120
It's fine, but as we have more services, we need to pay more for min instances per service (Cloud Run). Each min instance costs around $7 for a 128MB config and $19 for 512MB. So if you have a microservices project with different environments and multiple services per environment, it becomes a huge cost factor for us.
Cloud Functions + min instances is not even a good solution, to be honest; it's better to pay for Cloud Run min instances, as they handle concurrency. App Engine has an advantage here, but comes with a cost.
da...@shax.com <da...@shax.com> #121
You don't need more examples or a deeper understanding of use cases to recreate this issue. The issue has been identified: it is extremely slow to create the initial gRPC connection from the function to Firestore.
The primary solution is: identify why gRPC is slow to connect and fix that.
A secondary (workaround) solution is: in the Admin SDK, officially support the REST API as an alternative to gRPC for Firestore, and allow it to be configured when creating a Firestore instance (or potentially make it the default). It is acceptable if the REST API does not support real-time subscriptions.
We appreciate you prioritising this issue, but asking for more examples and use cases is starting to feel like stalling. It is now time to assign engineering resources to solve the problem, not to continue triaging it ad infinitum.
Thanks
Dave
ch...@chris-reilly.com <ch...@chris-reilly.com> #122
Next, I eliminated the sink from the equation by writing to Pub/Sub directly using the @google-cloud/pubsub library. To my surprise I was still seeing 20-40 second delays, and it increased cold starts on the first function to 5-8s.
Finally, I attempted to write directly to Firestore from my first function and still see 20-second delays from invocation to the document being in the DB and rendered on the front end. The curious thing is that this doesn't just happen on cold starts: even when the function finishes in 500ms, it still takes 10+ seconds for the document to appear in the DB.
I mention all of that to raise two issues that might add to the discussion:
1. The delays associated with gRPC calls from Cloud Functions appear to be affecting Pub/Sub as well.
2. Firestore seems to have ingestion delays even well after the function finishes. Pushing the same event from my local machine with the same library resolved within milliseconds.
I’m trying to build a ‘real-time’ serverless product and these latencies are an existential threat to its viability on GCP.
[Deleted User] <[Deleted User]> #123
Is there any news on the status of this issue?
kh...@google.com <kh...@google.com> #124
Hi everyone,
Yes, we are still looking into this issue and it is still a priority. As I mentioned back in June it is a very complex problem due to all the moving parts involved and improvements will be incremental. I'm not at liberty to discuss specifics but we have improvements in progress and this ticket will be updated when we have something public we can share.
Thank you for your continued patience.
ay...@gmail.com <ay...@gmail.com> #125
Firestore + Functions.
Firestore is showing 5.8+ secs at the 99.9th percentile, which is unacceptable.
Maybe they could have tried min instances or Cloud Run, but it's still
directly related to the problem we are discussing here.
Ref:
On Wed, Sep 1, 2021 at 9:49 PM <buganizer-system@google.com> wrote:
st...@google.com <st...@google.com> #126
If I understand correctly
ay...@gmail.com <ay...@gmail.com> #127
Ref 1:
Ref 2:
However,
As per the title of this issue, it's **not actually** Cloud Functions' fault
but mainly Firestore's gRPC libraries. Just because they all fall under the same
umbrella (Google), the issue gets reported here by everyone regardless of it
being a Firestore library issue, the reason being that on GCP, Firestore is
mostly used in combination with Functions.
ca...@google.com <ca...@google.com> #128
Hi,
As an update on this Issue Tracker, our Engineering Team is still working on this issue and they will provide an update as soon as they have something relevant to share.
da...@gmail.com <da...@gmail.com> #129
st...@google.com <st...@google.com> #130
Note that customers who wish to keep their Cloud Functions warm to avoid cold start can leverage the "min instances" feature
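For reference, with Cloud Functions for Firebase the min-instances setting can also be expressed in code. A sketch using the firebase-functions v1 runWith API (the handler wiring in the comment is illustrative):

```javascript
// Runtime options for the "min instances" workaround. This is plain data;
// with firebase-functions (v1 API) it would be wired up as:
//   const functions = require('firebase-functions');
//   exports.api = functions.runWith(runtimeOpts).https.onRequest(handler);
// Keeping an instance warm avoids most cold starts but is billed while idle.
const runtimeOpts = {
  minInstances: 1, // at least one instance kept warm
  memory: '512MB',
};
```

Note the cost trade-off: a warm instance is billed continuously, which is part of why several commenters below consider this workaround insufficient.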
bu...@gmail.com <bu...@gmail.com> #131
ay...@gmail.com <ay...@gmail.com> #132
I noticed it's pretty fast. I try to keep it in sync with the
nodejs-firestore project in parallel so that I can shift back once Firestore
fixes the gRPC problem.
Steps to integrate:
1. npm install @bountyrush/firestore
2. Replace require('@google-cloud/firestore') with
require('@bountyrush/firestore')
3. Set FIRESTORE_USE_REST_API = 'true' in your environment variables.
(process.env.FIRESTORE_USE_REST_API should be set to 'true' to use REST
mode. If it's not set, it's just standard Firestore with gRPC connections.)
As I'm using the same nodejs-firestore project by forking it, I tried to
keep compatibility as high as possible. If you see any issue implementing it,
please let me know.
I see the cold starts are much better now and can be further improved. Do
let me know your feedback/thoughts.
Thanks,
Ayyappa
ry...@cinder.studio <ry...@cinder.studio> #133
ay...@gmail.com <ay...@gmail.com> #134
I will check it out. The main goal for the library is better cold starts
while staying fully compatible with the existing official one, so that we can
quickly shift back to it once Firestore fixes it (if they really do; I'm not
sure they'll move away from gRPC anytime soon :|).
Due to having REST mode fully compatible with the official one, we got some
extra baggage with it (due to its internal libraries). However, we see much
better loading times and are currently using it in our project.
Would be happy to collaborate if you have some plan in mind to make it
better.
Thanks,
Ayyappa
wb...@sentryware.com <wb...@sentryware.com> #136
The fact that now two Google customers have provided meaningful workarounds to a product-breaking, 16-month old problem that a nearly two trillion dollar company hasn't provided any solutions for does not inspire confidence.
It looks like there are about 30 Googlers CC'd on this issue, to which I say this: Please fix your product. We really want to use it, because when it works it works really well. However, this problem makes it completely unusable in production. We cannot reasonably ask our users to wait 6+ seconds for a function to cold start. The workarounds provided by Google can reduce some of the impact but are not sufficient. It is clear that Google is trying to grow their cloud market share, and we're cheering for you, but the inaction on this issue is producing the opposite effect.
If Google cannot or will not solve this issue in a reasonable amount of time (which, frankly, was a long time ago), my org is going to be forced to abandon this product. We cannot gamble our own success on hints of an eventual resolution. After 510 days of waiting, we need more than the unfulfilled promises we've been given thus far.
br...@askgms.com <br...@askgms.com> #137
Sadly, we've had to migrate away from Firebase Functions for hosting our Node.js programs. We had assurances from their team in December 2020 that this was being fixed ASAP, and here we are in October 2021 with no progress to speak of. We've moved on to Cloud Run, but frankly, if we weren't already built out for other portions of the Firebase environment such as auth, we'd have migrated back to AWS and likely never revisited GCP again. That may still happen once we have time to port functionality - my faith in critical issues being resolved by the GCP team is entirely eroded.
This is a severe black eye for any aspiring cloud platform, and many times more so given this is Google! Either tell us you're not going to fix this, or commit to fixing this on a certain time table and do it. Not that it'll benefit my team any longer, but for the sake of everyone else here, get your head in the game.
ma...@gmail.com <ma...@gmail.com> #138
TLDR: Try writing critical client-facing functions in Go
Yes, Go is going to be better than Node.js, but I was curious to see how it performs in the Functions environment, especially considering this issue. This is my first post here; there are already plenty of complaints, so I thought I might share my own experience instead. This is absolutely not a silver bullet and not applicable to everyone, but it might help certain use cases. I've been increasingly frustrated by the 5-15 second cold-starts, and this is just to execute a simple Firestore query returning a single document ID, seeing as the client libraries
Serverless is supposed to be fast, not a background processing platform (although it excels at that too!). It's caused me to reconsider Firebase several times and try Cloud Run with Go, but then I realise that Functions aren't much different, and I keep coming back. Some things, like storage rules and auth, are baked into the Firebase platform. It's exhausting and frustrating, but at least I don't have the pressures of a company, my heart goes out to you that do.
I haven't had time to try the above community-improved libraries, what I did try is re-writing a function in Go after seriously considering starting from scratch on Cloud Run, and I'm shocked at how much better it performs.
- Before: 1G instance, 8000-15000ms cold-starts in Node.js (depending on which VM is allocated, and probably also somewhat dependent on the direction of the wind), ~100ms warm execution times (as little as 3ms for CORS pre-flight requests)
- After: ~80ms cold-starts using 128M instances written in Go, using around 20MBs of memory. In fact, the 128M seems to perform better than 1G, strangely. I haven't measured the overhead due to the actual allocation of a machine and download of the container, but it is negligible with network latencies.
This is not a viable solution for everyone. I myself have built significant tooling in TypeScript to get rid of boilerplate, stuff that would have been amazing to have in a firebase-contrib-style standard library. It's a struggle to settle into a new language with no generics and increased verbosity, but the performance gains are worth it. I just wish that Firebase natively supported Go.
Some of the pros 👍 are:
- No need to lazy-import files to help that cold start
- No more worries about exceeding the maximum function upload size (node_modules); Go is naturally smaller and has dead-code elimination, I believe
- Go doesn't have to synchronously require(...) 10/100/1000s of files from the disk (future ES Modules and bundling could help, but it's extra setup boilerplate)
- The SDKs seem to be high quality, perhaps better than Node.js (native Query iterators that I avoided in Node.js due to lack of documentation), with the bonus of being Google's star language; it pretty much feels like import-and-go (also an excellent language for bad puns), like with Deno
- The std library is amazing, it even has image manipulation for thumbnail processing
- It integrates nicely with the Firebase dashboard and error tracker, as if it was meant to be
- Did I mention the cold-start times that are faster than Node.js's warm-start times (OK, maybe exaggerating a bit...)?
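The lazy-import trick mentioned in the first bullet (which does help Node.js cold starts) can be sketched generically; the commented usage with @google-cloud/firestore is illustrative:

```javascript
// Wrap a module load so require() runs on first use, not at cold start.
function lazy(loader) {
  let cached;
  return () => (cached === undefined ? (cached = loader()) : cached);
}

// Illustrative usage (the loader runs only on the first request that needs it):
//   const getDb = lazy(() => {
//     const { Firestore } = require('@google-cloud/firestore');
//     return new Firestore();
//   });
//   // inside the handler:
//   //   const snap = await getDb().doc('users/alice').get();
```

This shifts the library-load cost from cold start to the first request that actually touches Firestore; it doesn't remove the cost, it just keeps unrelated code paths fast.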
Bear in mind that you will have to manually enable APIs and set up Scheduler timers, and there is no native request auth verification (a big bummer; there is no onCall equivalent yet, though that's no problem if the function is public-serving), so you have to verify that yourself. If you're dependent on Node.js for SSR or have a very large app/function, you're out of luck, but I would recommend taking a look at Cloud Run: it processes several requests concurrently, which further reduces the chance of hitting a cold start, and it supports background events (via push). From my experiments, it also seems to allocate extra instances pre-emptively during spikes, but this is pure speculation. Testing & emulators are also going to be more of a pain, but to be frank,
To be clear, I consider this a workaround. The issue lies clearly with the Node.js SDK, not the language or runtime itself. Go is more efficient, but unless you're doing compute-heavy work, it's not magically 10x faster, and it also has its quirks. There is little need for CSP when behaviour is mostly synchronous on a single-core instance. It just feels like the Node.js SDK (or the underlying gRPC dependency) isn't natively JS, but rather ported from Java/Go.
Until GCP addresses this issue, I would recommend to try a gradual adoption with Go for critical parts.
- Create a new directory in your Firebase project
- Add a go.mod file (Go 1.16)
- Create a functions.go file with a non-main package name. This file can house any number of functions; you specify the entry-point when deploying each function
- Each function is a normal Go-style HTTP request handler, very similar to Node.js/Express' request handler.
The good news is that deployment is still very automatic: a single command with no Cloud Build configuration. The Firebase functions config you're used to pretty much maps 1:1 to gcloud functions deploy CLI arguments; it's reproducible and there's no need to mess with the GCP GUI.
Marcus
kh...@google.com <kh...@google.com> #139
As I mentioned in September Google Cloud Functions is a large product built on a LOT of very complex interconnected systems. Since then work has been done on this issue but I can't share specifics. I know this is extremely frustrating to hear but want to reassure you that this is important to Google and is regularly checked on. It's unlikely there will be a dramatic update anytime soon but it is moving forward.
kh...@google.com <kh...@google.com>
at...@protonmail.com <at...@protonmail.com> #140
Thanks,
Jon.
ay...@gmail.com <ay...@gmail.com> #141
Do let me know if you hit any trouble!
Thanks,
Ayyappa
at...@protonmail.com <at...@protonmail.com> #142
da...@allfront.io <da...@allfront.io> #143
We enabled min instances on some functions, it seems better but if they are not used after a day or so we are still getting a cold start.
Do you really need to have min instances + warm up jobs to work around this?
jo...@gmail.com <jo...@gmail.com> #144
Has anyone here had luck sorting this out with min instances?
min instances in firebase cloud functions did not solve it for us
we moved all functions that require immediate response to a more traditional web server running in cloud run, and are using traditional rest api calls rather than invoking a cloud function directly
br...@askgms.com <br...@askgms.com> #145
We also migrated to Cloud Run - min instances didn't solve anything for us.
Once we were there, I realized there was another key characteristic of Functions which makes running a Node.js server extremely inappropriate: Functions only allow a concurrency of 1. This means that, even with a min instance count of $HIGH_NUMBER, you can exceed it with one client making a variety of concurrent requests (say, asset fetches, auth checks, page content, etc). Our usage shows approximately 20 concurrent requests on average for a first page load, which means we'd need 20 min instances for ONE client to be served without a cold start impacting latency.
Obviously that's untenable, even for one client, so we moved to Cloud Run and can have a concurrency in the hundreds without issues. Set a min instance count there, forget it. The concurrency limitation is so severe that I'd actually encourage the Firebase team to explicitly note that Node.js functions should not be latency-sensitive and any which would expect concurrency (e.g., web servers) should absolutely avoid Functions. Maybe it's somewhere in the documentation, but we didn't find it and wasted a lot of time barking up the wrong tree.
jo...@puul.io <jo...@puul.io> #146
How would I authenticate
ay...@gmail.com <ay...@gmail.com> #147
process.env.FIRESTORE_USE_REST_API = 'true'
const functions = require('firebase-functions')
const admin = require('firebase-admin')
const adminConfig = JSON.parse(process.env.FIREBASE_CONFIG)
admin.initializeApp(adminConfig)
const { Firestore } = require('@bountyrush/firestore')
const db = new Firestore()
I will write a simple tutorial on how to use it. I can share the code from
my project but I have my own abstraction(to include multiple database
drivers) which may look complicated.
Please post on the github issues if you need any help.
Thanks,
Ayyappa
jo...@puul.io <jo...@puul.io> #148
jo...@examind.io <jo...@examind.io> #149
Note: min instances improves things slightly, but I still encounter long (~8 sec) start times
pe...@gmail.com <pe...@gmail.com> #150
ay...@gmail.com <ay...@gmail.com> #151
@pargolfsolutions.com Thanks for pointing to our REST implementation (
jo...@examind.io <jo...@examind.io> #152
ma...@gmail.com <ma...@gmail.com> #153
It won't fix the issue, but you'll probably have a higher chance of being allocated on higher-end hardware that may make the cold-start a little faster. I've even had function execution time take twice as long on some instances with the same configuration as on others for the same task. The lower-end instance types are noticeably slower in my experience.
bl...@gmail.com <bl...@gmail.com> #154
ay...@gmail.com <ay...@gmail.com> #155
Glad it's helpful! That's definitely a booster! It has scope for reducing a couple of seconds more, which I will start working on :)
jo...@examind.io <jo...@examind.io> #156
I'm still using the Firebase Admin SDK with Firestore.
wb...@sentryware.com <wb...@sentryware.com> #157
I'm curious if Cloud Functions v2 will help with this since it is moving to Cloud Run as the backend. If anyone has tried the public preview, I'd love to know if you've witnessed any improvement regarding this issue.
st...@ctma.fr <st...@ctma.fr> #158
be...@gmail.com <be...@gmail.com> #159
And you what are you planning to use instead ?
bu...@gmail.com <bu...@gmail.com> #160
IMO it's not particularly suitable for enterprise if you're relying on various Firebase/Firestore features and serverless/Firebase functions.
I find there are also various things you can't do because the methods just don't exist; you can raise a feature request on GitHub, but it'll probably sit there for years with all the other requests.
jo...@gmail.com <jo...@gmail.com> #161
ay...@gmail.com <ay...@gmail.com> #162
Could you please open an issue at the github page so that I can look into it next week?
st...@google.com <st...@google.com> #163
Cloud Functions gen2 runs on Cloud Run, but Cloud Run is using by default the same execution environment as Cloud Functions gen1.
However, Cloud Run has a new execution environment in Preview:
I'd be interested to know if the latency issue also occurs on Cloud Run's second generation execution environment.
If you are using Cloud Functions gen2, you can enable the second generation execution environment by changing this setting in Cloud Run.
[Deleted User] <[Deleted User]> #164
se...@nextnowagency.com <se...@nextnowagency.com> #165
I was astounded when usually-reliable cloud functions were returning page content with whopping 5 second latency. Doubly so when I isolated the problem to, as pretty much this whole thread has indicated, the very first access to admin.firestore(), reliably 4000-5000ms cold, but only 70-200ms presumably-warm, "cooling off" again arbitrarily after 0-30 seconds with no real pattern.
It took a morning of searching before I tripped over the original 2019 github issue, leading to another issue thread, leading eventually here. 3 years, no fix? Client-side libraries and server-side REST calls still outperform the official documented server-side pack-in solution??
This project is already costed & contracted anticipating the ease-of-use of Firebase, but as senior architect, I'll definitely be having some second thoughts about pitching Firebase in the future knowing that a single admin.firestore().doc('...').get() can take full seconds to go through under ANY circumstance.
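The isolation described above (first access to admin.firestore() cold vs. warm) can be reproduced with a small timing helper; the commented calls are illustrative:

```javascript
// Time a synchronous step and log the duration in milliseconds.
function timeSync(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return result;
}

// Illustrative usage at a function's module scope:
//   const admin = timeSync('require firebase-admin', () => require('firebase-admin'));
//   timeSync('initializeApp', () => admin.initializeApp());
//   const db = timeSync('first admin.firestore()', () => admin.firestore());
```

Timing the require, the init, and the first query separately shows where the 4000-5000 ms cold cost lands.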
da...@gmail.com <da...@gmail.com> #166
da...@allfront.io <da...@allfront.io> #167
It's min *idle* instances that fixed it for me in App Engine. But we don't
have that flag on Firebase.
bu...@gmail.com <bu...@gmail.com> #168
Any longer-running service like App Engine or EC2, with no cold starts, doesn't have this issue.
I'm not too surprised that Google still hasn't fixed it, but I'm also amazed they felt 1-8 second cold starts were acceptable. It certainly wouldn't be acceptable for their own products; imagine if doing a Google search or logging into Gmail/YouTube took nearly 10 seconds.
wb...@sentryware.com <wb...@sentryware.com> #169
There are supposedly over thirty Googlers CC'd on this issue and yet here we are three years later with no solution other than to just not use the product they are trying to champion. This product is fantastic but how are we supposed to build anything production-worthy when this issue turns a simple Firestore read into a user churn because they rightfully didn't feel like waiting 6+ seconds?
If this is the way Google handles P1/S1 issues, how are we supposed to trust ANY of their cloud products?
Google? Can we fix this three year old show-stopping issue please?
Thank you. Sincerely, people that want to use your products.
gr...@gmail.com <gr...@gmail.com> #170
ay...@gmail.com <ay...@gmail.com> #171
causing the cold starts, and using the REST API clearly fixes the problem.
I still wonder what the problem is in providing an official REST wrapper
that can be maintained.
We made a wrapper which works wonders, but we still don't have the energy to
maintain it as a one-person team (
It's super annoying to look for workarounds like min instances, which are not
even a proper solution. The cost is brutal as we grow, and even at our
current count.
We will decide pretty soon!!!
st...@google.com <st...@google.com> #172
Sorry for the lack of progress. As you might guess, the cause is hard to pinpoint and involves many teams.
Honestly, the best you can do is leave comments that will help the Google teams debug. So anyone leaving a comment going forward, please capture:
- Whether you are using Cloud Functions (e.g. deploy via gcloud functions) or Cloud Functions for Firebase (deploy via firebase deploy)
- If Cloud Functions, whether you are using 1st gen or 2nd gen
- The function startup times that you are measuring
- The package name and version of all google-owned modules you are loading
- (Optionally) a pointer to a repro case, or a copy of your package.json
Also note that in
To anyone using Cloud Run and seeing a startup latency that they consider too high, please open a new bug in the Cloud Run component. It is unclear if the cause is the same, do not assume it is. This bug is focused on Cloud Functions.
am...@gmail.com <am...@gmail.com> #173
Understood.
But why can't you release an official REST API wrapper? This has been raised by several people over the years, but it seems to get conveniently ignored, among other issues.
Your view on an official REST API wrapper would be highly appreciated.
ja...@google.com <ja...@google.com> #174
Have you tried firestore/lite?
an...@taskheroics.com <an...@taskheroics.com> #175
I think this thread has pretty conclusively pointed toward the cause being the size of the grpc library dep. The frustration felt by myself and others here is IMO due to exactly that sort of comment which seems to show that this thread has not been carefully read and this problem not carefully investigated by anybody at Google. It feels like a platitude at this point with a P1 S1 issue having little meaningful progress in 3 years.
I hope it's easy to see how all of us affected by (and losing customers because of) this issue might feel that this is being ignored.
How can we _actually_ meaningfully move this forward? Can this be brought to the attention of someone at Google who has the power to direct some resources toward fixing it? Or can we at least change the status to "won't fix" and prominently update the docs to mention that cloud functions have an expected latency of up to 6 seconds?
gr...@gmail.com <gr...@gmail.com> #176
Have you tried firestore/lite? That's the official REST wrapper for Firestore.
That's a replacement for the web client library. This bug is about the Node.js library.
ch...@sidkik.com <ch...@sidkik.com> #177
This is a prime example of cloud vendor lock-in. You create an app that leverages cloud-specific libraries and managed applications, and you are totally dependent upon the vendor resolving and fixing issues. Sometimes that happens quickly; other times (as in this thread) the underlying condition is not addressed and a patchwork of costly options is presented as fixes. This happens across all 3 big cloud vendors and is not specific to Google.
If this problem was going to be fixed, it probably would have by now.
I started using all of the baked in services in gcp and for a while, things worked well and were cost effective. Once you get to scale or if new bugs/features are introduced you have new challenges. This specific challenge is a show stopper both from cost and performance.
I still use GCP, but I pivoted away from most if not all of the managed GCP services and only leverage cloud-agnostic apps hosted on GKE. If you run in a single zone (avoid backplane charges with an isolated billing account) and use spot instances, you can run a much better environment for ~$25 a month. No cold starts; only applications designed to be stateless on spot instances. I still use Firestore as the storage layer, but avoid the cold starts with long-running containers. You could run this on Cloud Run, but if you have more than one service, the cost will quickly exceed what you can do in GKE. You could also run on Anthos, but that is even more expensive than Cloud Run.
I originally forwarded a simple trigger payload to an http listener that would drop into google pub/sub (a handful of functions down from 40-50 functions). That works ok, but once you move to prod, you will see a .5 to 1 sec lag on pubsub receiving and the pushing. If you run a multistep async process (create account, create stripe, create active campaign, create quickbooks, etc) then those delays add up quickly and you are almost as bad as before with pub/sub being the bottleneck.
Our latest iteration, uses a handful of triggers forwarding to http listeners that then push to nats. Latency for intercommunication is less than 1sec from trigger to insertion in nats. And intercommunication between services (using messages on nats) is sub ms to 2-5ms delays. Our original payment process using stripe (multistep triggers) was about 45-60 seconds in prod which was unacceptable. Payment processing is down to 2-6 seconds with the nats setup.
This setup does not come cheap from a development/management perspective. You need to know k8s, nats, spot instance drops, monitoring, backup etc. If you don't want to invest in that knowledge, then you are beholden to vendor lockin and issues. At least with this setup, you can control response time and bug fixing to mitigate customer impact.
se...@nextnowagency.com <se...@nextnowagency.com> #178
firestore/lite absolutely looks like the kind of thing that ought to have been mentioned about 18 months ago (since the best "solutions" are already "just pivot to client/REST libraries"), unless it's brand-new. I look forward to trying it out, hopefully this afternoon!
wb...@sentryware.com <wb...@sentryware.com> #179
As mentioned above, "firestore/lite" is not for the Admin SDK, but rather a replacement for the standard web client SDK. I don't find it at all relevant for this issue.
Also as mentioned above, the comments coming from Googlers are not informed and read as though they have not read the hundreds of comments that include all of the information they need to diagnose the problem. All you have to do to replicate this issue is use the Node.js Firebase Admin SDK in Cloud Functions, with the default configuration, and try to interact with Firestore in any way (read, write, transaction, etc) as stated in the Firebase documentation. You will find the cold start time to be consistently in excess of 5 seconds. That's your reproduction case. Please fix it.
Very frustrating.
gr...@gmail.com <gr...@gmail.com> #180
Truly frustrating. What does P1/S1 even mean? This is clearly not being prioritized.
And the commenter above is absolutely correct. At this point, being told to provide a reproduction of this issue is almost insulting. This issue can be easily and reliably reproduced performing the most basic tasks with Firestore using the admin SDK. Refer to any of your own tutorials. Here's an example:
ja...@google.com <ja...@google.com> #181
FWIW the scope of this bug is Cloud Functions cold-start & GRPC. If you'd like to submit a feature request for a firestore-lite equivalent in the Firebase Admin SDK there are more appropriate channels for that.
gr...@gmail.com <gr...@gmail.com> #182
Fair enough! Would you mind pointing us to the right place to request that feature?
ja...@google.com <ja...@google.com> #183
You can file an issue directly on the firebase-admin GitHub repo.
It's very possible that code could be reused between the firestore/lite
client library and the firebase-admin codebase; at the very least, the
authentication mechanism would have to be swapped out.
gr...@gmail.com <gr...@gmail.com> #184
I've filed a feature request for a Node version of Firestore Lite. If anyone wants to make some noise there, it might help it get attention:
st...@google.com <st...@google.com> #185
I have followed up with the Node.js client library team about REST. We hope to have good news to share here soon. Stay tuned.
All you have to do to replicate this issue is use the Node.js Firebase Admin SDK in Cloud Functions, with the default configuration, and try to interact with Firestore in any way (read, write, transaction, etc) as stated in the Firebase documentation. You will find the cold start time to be consistently in excess of 5 seconds.
Unfortunately, this is not what I observe with a quickstart using a 2GB function. The times I measure are those I see in the logs.
Baseline
exports.helloWorld = (req, res) => {
let message = req.query.message || req.body.message || 'Hello World!';
res.status(200).send(message);
};
GCF gen1
- cold: 689 ms
- exec: 6 ms
GCF gen2
- cold: 839 ms
- exec: 4 ms
Using firestore
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();
exports.helloWorld = async (req, res) => {
const document = firestore.doc('users/steren');
const doc = await document.get();
console.log('Read the document');
res.status(200).send('Hey');
};
GCF gen1:
- cold: 1400 ms
- exec: 84 ms
GCF gen2:
- cold: 1900 ms
- exec: 80 ms
GCF gen2 + enable "second generation execution environment" in Cloud Run
- cold: 1026 ms
- exec: 110 ms
This last experiment tests what I suggested in
And as I suggested in
pi...@gmail.com <pi...@gmail.com> #186
If you throw resources (2GB functions) at the problem, it's not as bad, but the whole point is that it shouldn't take that long on more conservative resources (i.e. 512MB)...
If I'm fetching a plain document stored on another server (i.e. Firestore or an API request), I simply don't need 2GB of memory every single run, since most of the time the function will just be waiting for a response, wasting resources and money.
What's most upsetting about this is the fact that this problem has been known for so many years, it has been P1 for so long, and the comments we are getting are:
- Blaming another library (grpc)
- Asking users to use expensive workarounds (min instances)
- Asking users to check a web-based library for a Node.js problem (firestore/lite)
- Asking users to use more resources than needed (higher-memory functions)
Even with the 2GB, looking at the numbers we can assume there's an 80~100ms wait from the document get, making the cold start 2 to 3 times slower just by using the official firestore library. That's the problem: some delay is understandable, but almost a second to init a library is just too much. I find it surprising how Google keeps making excuses for such bad performance; as someone said, it would be interesting to see this "cold start" happen on the Google search engine...
I would rather have this marked as won't-fix than keep listening to excuses and wasteful workarounds. At the least, be transparent.
st...@google.com <st...@google.com> #187
(the reason I selected 2GB in my previous test is because Cloud Run requires a minimum of 1CPU to enable the second generation execution environment, and picking 2GB gets you 1CPU)
Repeating my tests with 512MB:
Baseline
exports.helloWorld = (req, res) => {
let message = req.query.message || req.body.message || 'Hello World!';
res.status(200).send(message);
};
GCF gen1
- cold: 499 ms
- exec: 4 ms
GCF gen2
- cold: 829 ms
- exec: 4 ms
Using firestore
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();
exports.helloWorld = async (req, res) => {
const document = firestore.doc('users/steren');
const doc = await document.get();
console.log('Read the document');
res.status(200).send('Hey');
};
GCF gen1:
- cold: 1311 ms
- exec: 66 ms
GCF gen2:
- cold: 2200 ms
- exec: 60 ms
I'd really appreciate it if you could share a repository that includes code + package.json + the exact gcloud functions deploy command that reproduces the abnormally high cold start. Thank you.
bu...@gmail.com <bu...@gmail.com> #188
Here's an example that takes ~9 seconds on cold starts, then ~800ms afterwards. It creates/updates a user, then reads/writes to Firestore. Tested on AWS.
The hosting infrastructure doesn't really matter; it's slow on GCF / AWS / other.
st...@google.com <st...@google.com> #189
Sorry, it is out of scope of this bug to fix firebase-admin on AWS runtimes. Could you deploy it to an HTTP Cloud Function with gcloud functions deploy and measure the cold and warm times as I did above?
br...@askgms.com <br...@askgms.com> #190
It really looks like you've already reproduced the issue... or are you implying that jumping from an 829 ms cold start to a 2200 ms cold start from a single library initialization is acceptable from your perspective? Nearly 1500 ms just for one library to initialize certainly doesn't seem acceptable as a general case, and it's obviously not what you would accept internally, since none of your own products seem to suffer 1.5-second bonus lags at random times.
gr...@gmail.com <gr...@gmail.com> #191
I agree with the comment above. What you've shown is the issue. In what world is a 2.2s startup time acceptable for a simple read or write operation?
It's also pretty disappointing to see that the problem actually gets worse in the gen 2 environment.
st...@google.com <st...@google.com> #192
I am only capturing data, not making a judgement. I am not saying 2s is a great cold start; I am observing that 2s does not match the 5s or 12s latencies reported on this bug.
This bug was originally opened for an abnormally high cold start observed when using Firestore client library. If this bug is now used for a more generic "Improve Cloud Functions cold start when using GCP client libraries from 2s to 0.5s", it is a very different issue for us.
gr...@gmail.com <gr...@gmail.com> #193
First of all, thank you for engaging with us on this and actually investigating. To answer your question: I personally never saw anything like 12s cold-starts. I was periodically seeing 5 or 6 second startups around the time this issue was first filed. Subsequent work on the product has improved this for me, but I still regularly see 1.5-3.5s cold-starts, which is not too far from your own data. Those are completely unacceptable numbers for any user-facing task.
In my opinion, nothing about our ask has changed. This is still an issue with the Nodejs Firestore client library seeing extremely high cold-start times. Do you not view a 2s cold start as abnormally high?
br...@askgms.com <br...@askgms.com> #194
I don't want to get too pedantic, but originally this ticket was opened in response to
We definitely appreciate attention on this issue, but we've been getting "attention" for a long while here. Over two years ago we were told that this was the team's "top priority":
So please forgive us if we're losing patience with being asked over and over for the same reproduction scenarios, the same clarifications about why a ticket was opened, etc. We don't get to control opening these tickets, and we've provided extremely generous, clearly-reproducible scenarios many times over, and some of us have even written entire libraries to work around these issues while Google has failed to take meaningful action.
At this point, it feels like at minimum your team should be able to use the basic reproduction case, find the critical path there, and then identify optimizations that address the problem. Please stop telling us this is the top priority and then failing to fix much of anything. Or feel free to close this as "won't do" so at least we know it's time to move on to other NoSQL options.
st...@google.com <st...@google.com> #195
Thanks. I can indeed confirm that we have released many improvements over the past 2 years. And the "second generation execution environment" that Cloud Run is offering in Preview is another one of these that should ultimately make its way to GCF as the default.
Thanks for confirming that the 2s cold start is now what we should be looking into.
I agree 2s is high; we should improve it either by investing in the Cloud Functions execution environment, or by minimizing the footprint of GCP client libraries.
In the meantime, I encourage you to:
- deploy to Cloud Functions 2nd gen, and then go into Cloud Run and set the "max concurrency" from 1 to 100. This allows Cloud Functions to send multiple requests at the same time to the same instance; in practice, this drastically reduces the number of cold starts. (Node.js is designed to handle concurrent requests.)
- set min-instances = 1, so that the 0-to-1 cold start is mitigated by an instance being kept warm (this instance is not charged at full price when it is not processing requests).
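Assuming a 2nd gen function (which surfaces as a Cloud Run service), both mitigations can be sketched from the CLI; the function/service name and region below are placeholders, and the Cloud Run service name may differ in casing from the function name:

```shell
# Raise per-instance concurrency on the underlying Cloud Run service,
# so one warm instance can absorb multiple simultaneous requests:
gcloud run services update helloworld --concurrency=100 --region=us-central1

# Keep one instance warm on an existing gen2 function to mitigate
# the 0-to-1 cold start:
gcloud functions deploy helloWorld --gen2 --min-instances=1 --region=us-central1
```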
ay...@gmail.com <ay...@gmail.com> #196
triggers and also on Cloud Run. We can't rely on audit logs, so we're still on gen 1 Cloud Functions.
You can see the delays (6s-8s) are not acceptable for simple trigger callbacks. The code is actually nothing other than loading a document from Firestore in the trigger.
Note that it's a 256 MB function, but I still didn't expect it to be this slow.
gr...@gmail.com <gr...@gmail.com> #197
The max concurrency suggestion doesn't help my case (a pretty common one I would think), which is sporadic, user-facing, time-sensitive tasks: processing payments, account upgrades, user registrations, etc. These are tasks that don't happen frequently enough to keep any instances warm, even with a fairly significant user base.
The min-instances suggestion has been made multiple times before, and is pretty frustrating, since it's basically asking us to pay extra to receive basic levels of acceptable functionality from this product.
Also to add a +1 — I am also unable to move to gen2, since I rely heavily on Firestore triggers.
bu...@gmail.com <bu...@gmail.com> #198
First request in Cloud Functions was 10.5 seconds; subsequent requests are around 1.4-1.6s.
AWS is twice as fast for the subsequent requests, but I'm assuming that's because their vCPU allocation is probably higher for the 256 MB instance?
hd...@google.com <hd...@google.com> #199
We are working on supporting HTTP REST transport (as an option) in addition to gRPC for the Firestore client, and for other Cloud services as well. As we understand it, switching to the HTTP transport should mitigate the cold start time issue.
We will provide an update early next week on the ETA/timeline.
Thanks,
Hari
gr...@gmail.com <gr...@gmail.com> #200
Amazing news, thank you!
da...@google.com <da...@google.com> #201
I took a look at the test cases Steren provided in #185 / #187.
First, there have been a number of performance improvements over the past couple of years, which have brought down overall cold start latencies for Cloud Functions. Most of these improvements are at the infrastructure layer (e.g. filesystem performance, kernel scheduler improvements, etc), and were not targeting this specific issue. This is why what previously might have taken 6-9 seconds has improved over the years down to more like 2-3 seconds in a fair number of cases.
Second, I agree that the current implementation of node.js gRPC is not optimized for fast cold starts. More on that shortly.
Third, here's an example cold start from Steren's 512 MB gen1 GCF example in #187. If we strace the startup (which admittedly adds a bit of overhead in and of itself), we see the following milestones:
Time (s)   Milestone
==========================================================================================
0.000000   cold start begins
0.221176   node begins loading functions-framework module
0.737450   function loads user code                              <= FUNCTION ENTRY POINT
0.742013   node begins loading @google-cloud/firestore module
1.438190   node finishes loading @google-cloud/firestore module  <= FUNCTION EXECUTION BEGINS
1.842618   function returns result                               <= FUNCTION EXECUTION ENDS
------------------------------------------------------------------------------------------
And looking more closely at the main JS modules being loaded:
functions-framework: stat 881 distinct paths, read 880724 bytes from 252 files = 516 ms
@google-cloud/firestore: stat 1157 distinct paths, read 4989964 bytes from 347 files = 696 ms
In this particular example, I saw around 122 ms of filesystem wait time (this is one of the things we've been optimizing). So now, the overall cold start latency is dominated by the function runtime itself - i.e. the node binary loading all those JavaScript dependencies, parsing / compiling them, etc.
Some of this time is independent of the user code. The node.js Cloud Functions runtimes do some amount of setup work before loading the user code, and this setup includes loading modules such as functions-framework, etc. So, even the most trivial "hello, world" node.js function is going to take 500+ ms for a cold start as things stand right now.
Separate from that, there is the user code and its dependencies... and now we get to the bit about node.js / gRPC. As many have pointed out on this bug, @google-cloud/firestore has a gRPC dependency, which pulls in grpc-js, protobufjs, and google-gax. These are very large dependencies (e.g. google-gax includes multiple megabytes of generated JS from various large proto descriptors), and these massive amounts of generated code take a long time to parse and compile.
In addition, it is very easy in JavaScript to wind up pulling in a large number of transitive dependencies. Hence we find ourselves loading 600 files (some of which are quite large) to run what looks like a trivial snippet of code.
I have not profiled an example that includes firebase-admin, but I suspect it is more of the same.
In summary, the update in #199 (HTTP/REST Firestore client) is likely the best path forward for reducing cold start latency. The combination of gRPC / protobuf (with its descriptors / schemas) leads to a lot of generated code that is slow to load. This is not the case in every language runtime, but is certainly the case with node.js today.
gr...@gmail.com <gr...@gmail.com> #202
Thanks for the update - I'm very excited for HTTP/REST support. I currently use a mix of REST for time-sensitive functions and firebase-admin where that's less important, and I would absolutely love to be able to refactor all of that to just use an HTTP version of the Node.js Firestore client.
fe...@google.com <fe...@google.com> #203
I did some work decoupling @grpc/grpc-js from google-gax:
- https://github.com/googleapis/gax-nodejs/pull/1326 will help client libraries avoid loading @grpc/grpc-js if they only intend to use the fallback (HTTP) version of google-gax;
- https://github.com/googleapis/gapic-generator-typescript/pull/1224 updates the generated client libraries to allow passing an instance of google-gax, which could be either require('google-gax') or require('google-gax/build/src/fallback'); in the latter case @grpc/grpc-js will never get loaded.
After these two PRs are merged and released, we'll make the Firestore library use this so it will only load the fallback implementation by default.
br...@askgms.com <br...@askgms.com> #204
Thanks a ton for putting this together! It seems like these PRs may be exactly what’s needed to actually address the root problems, at least for many of us. Have any performance comparisons been done yet?
bu...@gmail.com <bu...@gmail.com> #205
This would significantly improve the authentication speed for people using custom tokens or working with the admin SDK to manage users.
Registering a user via the admin SDK and creating a firestore document for the user can take 8 seconds cold, which feels like an awfully long time to wait from the frontend.
fe...@google.com <fe...@google.com> #206
For those following this problem, I put together the fixes we were working on into a pre-release. Please try
npm install @google-cloud/firestore@6.1.0-pre.0
and pass preferRest: true as an option to the constructor to enable the REST transport:
const db = new Firestore({preferRest: true});
I see an improvement in my quick tests, please let us know if it makes things better for you.
gr...@gmail.com <gr...@gmail.com> #207
Amazing, thanks for the update. I'm testing this in real world conditions and will post again when I have results.
be...@gmail.com <be...@gmail.com> #208
fl...@gmail.com <fl...@gmail.com> #209
I've been waiting for this for a long time, around 2 years, and I'm glad there's finally progress.
Best Regards,
Derrick
gr...@gmail.com <gr...@gmail.com> #210
I'm seeing a significant performance boost from this in my testing so far. Very excited to see this happening! Can't wait for it to trickle down to firebase-admin, which will simplify usage.
at...@protonmail.com <at...@protonmail.com> #211
Been very pleased with the @bountyrush performance, although an option that Google supports directly would be much more appealing, as I'm sure the developer can't commit much time to it.
fe...@google.com <fe...@google.com> #212
Hi folks,
TL;DR: We released @google-cloud/firestore v6.2.0 with the HTTP/1.1 REST transport. Please use it with {preferRest: true}. This bug will be closed.
Now that this bug is more than 2 years old, let me summarize what we did during this time and what the current state is.
One of the main findings here was that the slow cold start times could be linked to filesystem access during the cold start, and to loading the gRPC library. During these two years, we implemented an alternative HTTP/1.1-only transport, and also reduced the number and size of files accessed during library load. Since some Firestore functionality depends on gRPC (RPCs that require bi-directional streaming must use gRPC), the HTTP transport will be used whenever possible, switching to gRPC if needed. We made the gRPC import conditional, so that it never tries to read any gRPC file from node_modules unless a bi-directional streaming call is requested.
Today we released @google-cloud/firestore v6.2.0, which includes all the fixes from the previously published pre-release, plus some reduction in the size of the files it loads during startup. Please note that the HTTP transport is currently not the default option, and must be enabled by passing {preferRest: true} to the Firestore constructor:
const db = new Firestore({preferRest: true});
// chooses HTTP or gRPC as needed, defaults to HTTP
Note: the change we made affects not only Firestore, but most of our other libraries (most of them have auto-generated parts that now support the HTTP transport). E.g. if you are creating the Firestore Admin client directly, you can avoid loading gRPC by requesting only the HTTP part of our transport library, google-gax, and enabling the "fallback" mode:
const gax = require('google-gax/build/src/fallback');
// avoids loading google-gax with gRPC
const adminClient = new FirestoreAdminClient({fallback: 'rest'}, gax);
We'll eventually make it the default transport; since it's a big change in how the library behaves, the default change will likely go to the next major version (e.g. when we drop Node.js v12 support next year). For now, the HTTP transport will stay behind this constructor option.
We expect this release to improve cold start times. I saw the comment about @bountyrush/firestore performance, and I will take a look to see if we can improve things even more. At the same time, a bug that has been open for 2 years does not help in tracking the problem at all, since a lot of things have changed and improved since it was opened.
So, the summary is:
- I will close this bug as Fixed, since v6.2.0 should resolve most of the slow cold start concerns.
- Please update @google-cloud/firestore to v6.2.0, and pass {preferRest: true} if you experience a slow cold start problem.
- Please feel free to open new bug reports here if you keep having slow start problems, or contact support if you have a support contract. We are committed to improving the customer experience with our libraries, and we appreciate all bug reports and feature requests.
gr...@gmail.com <gr...@gmail.com> #213
bu...@gmail.com <bu...@gmail.com> #214
Thanks so much for the work on this!
Could you please clarify how this would work for imports, or where we need to init Firestore with a specific app? E.g.:
import {getFirestore} from "firebase-admin/firestore"; // could also be from "firebase/firestore"
const f = getFirestore(someSpecificAppInstance);
fe...@google.com <fe...@google.com> #215
Re: firebase-admin, its latest version depends on @google-cloud/firestore v5, while the GitHub code already depends on ^6.0.0, so I'm guessing it needs a way to pass preferRest to the Firestore instance through its options, and then an npm release. This is better tracked in
Re: firebase, it does not use @google-cloud/firestore at all, providing its own Firestore implementation (actually, two of them: the gRPC implementation and the lite HTTP implementation). You might just be able to follow @firebase/firestore, which is a separate codebase with an already-existing lite implementation that does not load gRPC.
Description
Update, Jan 8th 2021
👋 I’ve updated my initial post and the title to be more specific, based on the problems still being discussed in this thread:
Cold Start performance issues seem to correlate closely with gRPC libraries (like Firestore). Folks switching to HTTP dependencies, from gRPC, have seen performance improvements (this seems to point to gRPC as well).
If the problems you’re running into do not seem to correlate to gRPC SDKs, such as Firestore, please don’t hesitate to open an issue and we will investigate (Also, I’ve pulled together a Tips & Tricks post , based on some of what I’ve learned investigating this thread which might help).
We continue to work on cross cutting features that will help cold start performance in general, e.g., Min Instances for Cloud Functions, and will keep this thread updated as features roll out.
Problem you have encountered:
We've had a long standing issue on GitHub related to cold start performance of Cloud Functions.
The original issue was that the grpc dependency was quite large, and could lead to several additional seconds in load times. We have since moved to @grpc/grpc-js, and that issue has been addressed: the load times of modules such as @google-cloud/firestore during cold start now seem reasonable.
Despite module load times being reasonable, customers are seeing long delays when accessing cloud functions on cold starts:
What you expected to happen:
A cold start time within a reasonable threshold, ideally <1s, vs. the observed 5s.
Steps to reproduce:
The below code can be used to benchmark the time spent loading modules, during GCF initialization:
The URL of the Cloud Function can then be plugged into Timeline Viewer.
Customers indicate that, on cold starts, the time to first byte in the browser is significantly delayed from the observed load time of the function.