Status Update
Comments
oa...@google.com <oa...@google.com> #2
br...@roboflow.com <br...@roboflow.com> #3
ay...@gmail.com <ay...@gmail.com> #4
Do you still see this as an issue that doesn't need urgent attention?
This issue makes Cloud Functions totally unusable in production. We still see around 12 seconds just to start the function.
Please check the screenshot and let us know if we are missing anything. Loading our dependencies alone took nearly 12 seconds, which is no way to run in production!
gr...@gmail.com <gr...@gmail.com> #5
ry...@cinder.studio <ry...@cinder.studio> #6
ry...@cinder.studio <ry...@cinder.studio> #7
br...@roboflow.com <br...@roboflow.com> #8
Also adding my +1: we just had to put significant effort into moving one of our core flows off of Cloud Functions due to this, and we plan to migrate the rest of our infrastructure as well, because 12 seconds is pretty ridiculous latency.
ay...@gmail.com <ay...@gmail.com> #9
I'm really scared to shift to AWS or other alternatives at this moment, but it seems there are no other options. If the cold start were caused by my own dependencies, I would accept it. But here, I have no control over when my code execution starts.
At this point we are tightly coupled with GCP, and down the line, if this issue is still not addressed, we may need to start migrating to other alternatives. I understand Google doesn't commit to any ETAs as per its policies, but this issue is a hidden, massive loophole in the whole Cloud Functions system.
I hope someone from the official team updates us on the status.
ry...@cinder.studio <ry...@cinder.studio> #10
ay...@gmail.com <ay...@gmail.com> #11
Can anyone from the team give an update on whether this issue is being resolved?
ph...@gmail.com <ph...@gmail.com> #12
[Deleted User] <[Deleted User]> #13
If there is no news, I think I will start the migration by the end of July.
ry...@cinder.studio <ry...@cinder.studio> #14
This GitHub repo has bare-minimum dependencies and lines of code, and it proves a 5-second startup cost exclusively attributable to the Firestore libraries. It's not our code, and this should be a P0 at Google.
ay...@gmail.com <ay...@gmail.com> #15
How do we get on their radar?
ry...@cinder.studio <ry...@cinder.studio> #16
Every query or operation we move to the new firebase function (that excludes the google library) is reporting a significant boost in performance. Milliseconds vs seconds.
ay...@gmail.com <ay...@gmail.com> #17
Looks like they corrected the trace now. It's showing that our code starts at the 0th ms (actually showing negative values, which is impossible). But at least it now shows the function being called as soon as it's triggered, which is great.
Anyone else experiencing the same? I'm using the latest versions, btw.
Will debug more.
vi...@google.com <vi...@google.com> #18
Thank you for reaching out to us on this and for keeping this thread updated.
We are actively working with a number of internal teams on a resolution to this problem.
We are treating this as one of our top priorities and will keep this thread updated as we make progress on this issue.
We apologize for any inconvenience caused. Please also feel free to reach out to us with more questions on this issue.
We are also happy to meet with folks on an individual basis to help with your specific architectural needs and use cases as we work on the issue on our side.
Best regards,
Cloud Functions Team
ry...@cinder.studio <ry...@cinder.studio> #19
ay...@gmail.com <ay...@gmail.com> #20
Looks like it got reverted again. I see huge delays again before the function starts!
[Deleted User] <[Deleted User]> #21
ja...@google.com <ja...@google.com> #22
Hi folks,
Apologies for the delay in responding here. The quick summary is "we're still working on it" (the last update on the internal bug was yesterday). I can't give an ETA because, honestly, we don't know, but the "good" news is that we can reproduce the issue and are investigating the cause. This is not expected behavior, which is why it's taking some time to track down (unfortunately the cause was not obvious).
We will update again once we know more.
vi...@google.com <vi...@google.com> #23
We are working on this as our topmost priority.
Would you like to have a meeting? We want to make sure we provide you as much help as possible.
Please email me at viramachandran@google.com and we can set up a time.
In addition, if any other customer would like to meet on this issue, please email me at viramachandran@google.com and we can set up a meeting.
Best regards,
Vinod
vi...@google.com <vi...@google.com> #24
We are actively working on this issue, and on key fixes that will be rolling out soon to help mitigate it. In addition, we are working on key long-term efforts that will improve things further. Please feel free to reach out to us directly over email (viramachandran@google.com) and we can set up a meeting to work with you on this. We want to make sure we serve you to the best of our ability and meet your key needs.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #25
Thanks, Vinod, for sharing some info. May I know which fixes are coming soon? By soon, do you mean days, weeks, or months? I understand Google can't commit to a timeline, but at least a tentative date is pretty much required for developers (considering you are already working on the fixes).
It at least helps us plan accordingly. Also, the fix we saw a couple of days ago got reverted, and again it's taking around 5 seconds just to start the function. Initially I thought it was because of the Firestore libraries and took the pain of converting everything to REST APIs. But now we see it happening again.
It would be really great if you could share which fixes we may see soon, so that we can check whether our issues are covered.
Thanks, Ayyappa
vi...@google.com <vi...@google.com> #26
The fixes could take a couple of weeks to fully roll out.
We are treating the rollout of these changes as a top priority.
We sincerely apologize for the inconvenience during this period.
I have also set up a meeting with you so that we can look at your architecture and see how best our team can help you.
Best regards,
Vinod
at...@protonmail.com <at...@protonmail.com> #27
gr...@gmail.com <gr...@gmail.com> #28
ay...@gmail.com <ay...@gmail.com> #29
We still see the delays; however, we would like to try Cloud Run as suggested by the team. We will post here if we have further updates.
Also, in case you have updated from Node 8 to Node 10, make sure you switch your code to the new environment variables. Mainly, replace FUNCTION_NAME with K_SERVICE if you are using it in your project.
Thanks, Ayyappa
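A minimal sketch of the environment-variable change described above (the variable names are the ones the runtimes actually set; the helper function itself is hypothetical):

```javascript
// Hypothetical helper: resolve the function's name across runtimes.
// The Node 8 runtime set FUNCTION_NAME; the Node 10+ runtimes set
// K_SERVICE instead, so code that reads FUNCTION_NAME directly
// silently breaks after an upgrade.
function resolveFunctionName(env) {
  // Prefer the new variable, fall back to the legacy one.
  return env.K_SERVICE || env.FUNCTION_NAME || 'unknown';
}
```

Calling `resolveFunctionName(process.env)` then works on both runtimes.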
ay...@gmail.com <ay...@gmail.com> #30
We moved to Cloud Run with the help of the Google team, and it's good w.r.t. cold starts, but with some drawbacks. Coming from Functions, we need to figure out a few things:
- Cost comparison (it looks about 10 times pricier, but concurrency might help bring that down)
- Debugging with Trace becomes difficult, as it won't show any logs
- The right concurrency number per Cloud Run instance
- The additional cost of min instances to keep instances warm
- Thread-safe code, as Cloud Run allows concurrency, whereas Functions are more isolated
- Unable to deploy multiple functions as different Cloud Run services (the Functions Framework allows only one target). Functions offered more control, letting us set the required memory per function.
Unfortunately, I can't measure the cold start exactly, as the trace isn't displaying enough detail. Will keep you informed.
On another note, we completely migrated from the Firestore client libraries to the REST API, which saved a couple of seconds.
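For anyone weighing the same REST migration, here is a rough sketch of the two pieces involved. The endpoint format and the typed-value encoding come from the public Firestore REST API; the helper names are made up for illustration:

```javascript
// Build the REST endpoint for a single document under the default database.
// (A GET to this URL, with an OAuth2 token, replaces client.doc(...).get().)
function documentUrl(projectId, collection, docId) {
  return `https://firestore.googleapis.com/v1/projects/${projectId}` +
         `/databases/(default)/documents/${collection}/${docId}`;
}

// The REST API returns typed values ({ stringValue: ... }, etc.);
// decode the common ones into plain JavaScript values.
function decodeValue(v) {
  if ('stringValue' in v) return v.stringValue;
  if ('integerValue' in v) return Number(v.integerValue);
  if ('doubleValue' in v) return v.doubleValue;
  if ('booleanValue' in v) return v.booleanValue;
  if ('nullValue' in v) return null;
  if ('mapValue' in v) return decodeFields(v.mapValue.fields || {});
  if ('arrayValue' in v) return (v.arrayValue.values || []).map(decodeValue);
  return v; // timestamps, references, etc. left as-is for brevity
}

function decodeFields(fields) {
  const out = {};
  for (const [k, v] of Object.entries(fields)) out[k] = decodeValue(v);
  return out;
}
```

An OAuth2 access token still has to be sent in the `Authorization` header; inside a function, the metadata server can supply one, so no heavy client library is loaded at cold start.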
vi...@google.com <vi...@google.com> #31
We wanted to inform you that we rolled out some key fixes last week, which will help in reducing the cold start time mentioned above in this thread.
We are actively working on more changes and we will keep you updated as we make progress on them.
We sincerely thank you for your patience on this issue.
Please feel free to reach out to us with more questions.
Best regards,
Cloud Functions Team
gr...@gmail.com <gr...@gmail.com> #32
vi...@google.com <vi...@google.com> #33
Thanks for the comment.
Are you open to having a meeting? We would like to work with you on your use case and see how best we can serve you.
Could you please email me at viramachandran@google.com?
Best regards,
Vinod
at...@protonmail.com <at...@protonmail.com> #34
wb...@sentryware.com <wb...@sentryware.com> #35
Is this affecting all Node runtimes (8, 10, and 12) or only some of them?
vi...@google.com <vi...@google.com> #36
Thanks a lot for your feedback.
Would griffinjohnston@gmail.com, atomicweb@protonmail.com, and wbattel@sentryware.com be open to having a call?
Please feel free to reach out to me at viramachandran@google.com and we can setup a time.
We want to work with you to understand your use cases and see how best we could help solve this problem for you.
Our apologies for any inconvenience caused here.
Best regards,
Cloud Functions Team
at...@protonmail.com <at...@protonmail.com> #37
My use case is straightforward: just a Firebase project using Cloud Functions. In a function that only returns a timestamp, I'm getting cold starts of 3 seconds, and in functions where I'm loading firebase-admin, initialising it, and reading a handful of documents, it takes at least 9 seconds, sometimes more than 15.
When warm, the timestamp function takes less than 50ms and the function that uses the admin package takes 150ms on average.
Can you elaborate on the changes the team made in your last update?
As I said in my comment a week or so ago, my cold start times were significantly reduced from 9 seconds to around 3 seconds, which is manageable in my use case. However, that lasted only 24 hours, and I have since gone back to the same times as before.
I have also done everything suggested to limit cold starts and these are the best times I can get.
9 seconds makes Cloud Functions unusable, so I'm starting to look into migrating to AWS, but if this issue can be fixed I would much prefer to stick with Firebase and Cloud Functions.
Thanks,
Jon.
vi...@google.com <vi...@google.com> #38
We are actively working on getting the issue resolved. How about we have a meeting with the team?
We will have folks from both Cloud Functions and Firebase in the meeting and we can look at it together.
Best regards,
Vinod
da...@panerabread.com <da...@panerabread.com> #39
I had submitted a ticket for this same issue but was directed here and have been watching this ticket.
I have a simple cloud function with @google-cloud/firestore 4.2.0 as the only dependency (it does a single get by document ID), and the cold start on this is routinely upwards of 14s.
I just tried and had a 6.1-second cold start time (5699 ms function processing time) and warm times of 471 ms (382 ms) and 252 ms (182 ms). I tried a few other cold starts and they averaged 5-6 seconds.
This is a client-facing endpoint that should not have a 6+ second wait (most calling apps have timeouts of 2-5 seconds).
vi...@google.com <vi...@google.com> #40
Thanks for your feedback. As mentioned above, we are actively working on landing some changes which would make things better.
I have also set up some time to sync later this week to see how best we can help you.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #41
Any updates on this, Vinod? We still have Firebase triggers on Functions, which have a lot of delay.
vi...@google.com <vi...@google.com> #42
We are investigating the Firebase triggers more and are actively working on it.
We will update the bug shortly with our plan further.
Best regards,
Vinod
ay...@gmail.com <ay...@gmail.com> #43
Hey Vinod, coming to the Firestore triggers: it's been more than 4 days since I last ran tests. I gave it a quick test now and see cold starts of around 2 seconds, which is OK for a background trigger. I can tell it's far better than earlier, when it used to be on the order of 5-10 seconds.
Thanks, Ayyappa
jj...@raxial.com <jj...@raxial.com> #44
vi...@google.com <vi...@google.com> #45
Thank you for your feedback.
Could we please have a meeting? We would like to understand your use cases and see how best we can serve you.
Please feel free to reach out to me at viramachandran@google.com and we can set up a time.
Best regards,
Vinod
fg...@bausoft.cl <fg...@bausoft.cl> #46
I have an application running on Node.js 10.
I am using Cloud Functions and am finishing the test period, during which startup times over 5 seconds were still acceptable. But now, going to production, these times are not feasible for the end user. Is there a timeline for solving this problem? If not, we will have to find an alternative. Thank you.
Regards
Fabian
[Deleted User] <[Deleted User]> #47
I appreciate the effort of the team working to fix this issue; I simply would have wished for a more honest heads-up before onboarding onto this service.
vi...@google.com <vi...@google.com> #48
We sincerely apologize for any inconvenience caused.
I would like to set up some time to chat with you, work through your use cases, and see how best we can serve you.
I will set up a time to meet via email.
Best regards,
Vinod
ry...@cinder.studio <ry...@cinder.studio> #49
This platform is top notch, it's just slow in the libraries that access the firestore database.
Companies building production systems are abandoning the platform exclusively on this issue. It's a big deal.
vi...@google.com <vi...@google.com> #50
We have already escalated this to a P1, and the team is working on it with top priority. P1s are taken very seriously internally.
All our internal teams are working on the investigation, and the Firebase team is actively exploring ways to make things better.
We sincerely apologize for any inconvenience caused. We have been rolling out fixes on a regular cadence for this issue. We definitely have more room for improvement, and we are doing our best to make things even better.
Best regards,
Vinod
ry...@cinder.studio <ry...@cinder.studio> #51
Thank you for communicating efficiently here.
Best,
Ryan
da...@google.com <da...@google.com> #52 Restricted
si...@gmail.com <si...@gmail.com> #53
sa...@gmail.com <sa...@gmail.com> #54
vi...@google.com <vi...@google.com> #55
Our sincere apologies for the inconvenience caused.
We are working on this issue with the highest priority. We rolled out a set of fixes earlier this quarter and will be rolling out additional fixes this quarter as well.
In addition, we will be publishing a best-practices doc for Node functions with Firebase.
Please also feel free to email me directly at viramachandran@google.com and we would like to work with your specific use cases and see how best we could serve you.
Best regards,
Vinod
da...@google.com <da...@google.com> #56
Firebase customers: the FUNCTION_NAME env var has been changed to FUNCTION_TARGET, and this may be impacting your cold start performance.
If your code reads the FUNCTION_NAME env var, note that it has gone away. Adjusting the code to look at FUNCTION_TARGET should improve cold start performance in that case.
ry...@cinder.studio <ry...@cinder.studio> #57
It was my understanding that K_SERVICE was the var we should be using. What is the advice between K_SERVICE vs FUNCTION_TARGET ?
da...@google.com <da...@google.com> #58
K_SERVICE also works (it will have the same value as FUNCTION_TARGET).
ry...@cinder.studio <ry...@cinder.studio> #59
ja...@google.com <ja...@google.com> #60
FUNCTION_TARGET is the variable that explicitly maps to the in-code function. For example, K_SERVICE does not appear in the Node.js Functions Framework [1], but FUNCTION_TARGET does [2].
[1]
[2]
lu...@geitner.io <lu...@geitner.io> #62
Thanks, vi...@google.com, for the update.
I'm still desperate about the latency of Google Cloud Functions, but anyway,
I have a question for the Google team:
Are you considering moving Cloud Functions hosting from App Engine to Cloud Run?
👉🏻 There is now a way to deploy the Functions Framework directly with a new buildpack.
👉🏻 Soon, Cloud Run will get events, maybe exactly like the Firebase trigger offering.
It would be a nice way to improve performance and capabilities :)
Thanks a lot,
Lucas Geitner
vi...@google.com <vi...@google.com> #63
Thank you very much for your feedback.
Regarding your question on using the Functions Framework with the buildpack:
this link has a few instructions on taking your existing function and using a buildpack to deploy it to Cloud Run.
Please feel free to reach out to me directly if you have further questions on this.
Best regards,
Vinod
ch...@gmail.com <ch...@gmail.com> #65
st...@gmail.com <st...@gmail.com> #66
[Deleted User] <[Deleted User]> #67
ay...@gmail.com <ay...@gmail.com> #68
moment it seems to have results. However, we mainly miss a few things compared to Functions:
1. Firestore triggers are not directly supported
2. We need to come up with build scripts for deployment (compared to Firebase, where it's just a single command)
3. Testing isn't easy as of now, as there is no local emulator
However, thanks to the Functions Framework for making it easier to shift to Cloud Run. We may pay more compared to Functions, but at least Cloud Run's concurrency option might bring the costs down, and we are free from cold starts for now.
Thanks,
Ayyappa
[Deleted User] <[Deleted User]> #69
So you are migrating your apps to Cloud Run? Is it necessary a big rewrite in the code base?
ay...@gmail.com <ay...@gmail.com> #70
> Is it necessary a big rewrite in the code base?
*No,* not at all. Please follow the steps below for a quick try.
1. Set up the Functions Framework:
   1. Define the start command in package.json
   2. Set the FUNCTION_TARGET env variable before calling the start command
2. Once you are done with step 1, package your function as a container and push it to GCR (Container Registry)
3. Make your Cloud Run service use the GCR container
4. Done!
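For step 1, the package.json wiring can look roughly like this (the Functions Framework reads FUNCTION_TARGET from the environment at startup; the package name and version pin shown are illustrative of what was current at the time):

```json
{
  "name": "rest-api",
  "scripts": {
    "start": "functions-framework"
  },
  "dependencies": {
    "@google-cloud/functions-framework": "^1.7.1"
  }
}
```

With FUNCTION_TARGET set, `npm start` serves the exported function of that name over HTTP, which is exactly what the Cloud Run container needs.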
Please *note* that if you are using Firestore triggers in Cloud Functions, you may need to find an alternative for handling those on Cloud Run. At the moment, we are still using Functions for the Firestore triggers and Cloud Run for the main REST API services.
The Google team recommended using *buildpacks*, which seem likely to become more common in the future, but the approach below was chosen as it's much easier for us.
Sharing my script for quick reference.
FUNCTION_TARGET : set your function name here
IMAGE_NAME : name for your container (new parameter compared to Functions)
PROJECT : set your project name
Usage for building the dev environment : ./build.sh dev
#!/bin/bash
RED=`tput setaf 1`
GREEN=`tput setaf 2`
YELLOW=`tput setaf 3`
CYAN=`tput setaf 6`
BOLD=`tput bold`
RESET=`tput sgr0`
WARN=$RED
LOG=$GREEN
INFO=$CYAN
IMAGE_NAME=rest-api
REGION=us-central1
MIN_INSTANCES=0
MAX_INSTANCES=10
VPC_CONNECTOR=functions-connector
REDIS_HOST=10.128.0.2
REDIS_PASSWORD=password
if test -z "$1"
then
  ENV=dev
else
  echo "${INFO}Deploying for project : ${BOLD}${WARN}$1 ${RESET}"
  ENV=$1
fi
# Switch on ENV (not $1) so the dev default above still works when no argument is passed
case $ENV in
dev)
  PROJECT=project-name-dev
  MIN_INSTANCES=0
  MAX_INSTANCES=2
  ;;
staging)
  PROJECT=project-name-stage
  MIN_INSTANCES=1
  MAX_INSTANCES=5
  ;;
production)
  PROJECT=project-name-prod
  MIN_INSTANCES=1
  MAX_INSTANCES=10
  ;;
*)
  echo "${BOLD}${WARN}You need to pass environment (dev|staging|production) as the first argument. ${INFO}ex: deploy.sh dev${RESET}"
  exit 1
esac
echo "${WARN}Project : ${BOLD}${INFO}${PROJECT}${RESET}"
#echo "${INFO}Deploying all functions...${BOLD}${WARN}$ENV (${PROJECT})${RESET}"
#firebase deploy --only functions
echo "${INFO}Starting deploy to Cloud Run...${RESET}"
# Build the container and push it to Container Registry
echo "${INFO}Submitting build to Google Container Registry...${RESET}"
gcloud builds submit --tag gcr.io/${PROJECT}/${IMAGE_NAME}
# Deploy to Cloud Run from the Container Registry image, with env variables
FUNCTION_TARGET=b2b
echo "${INFO}Deploying ${FUNCTION_TARGET} service to Cloud Run from Google Container Registry...${RESET}"
# URL will be cloud-run-url/v1
gcloud alpha run deploy ${FUNCTION_TARGET} \
  --image gcr.io/${PROJECT}/${IMAGE_NAME} \
  --platform=managed \
  --allow-unauthenticated \
  --set-env-vars GCLOUD_PROJECT=${PROJECT} \
  --set-env-vars REDIS_HOST=${REDIS_HOST} \
  --set-env-vars REDIS_PASSWORD=${REDIS_PASSWORD} \
  --set-env-vars ENABLE_PROFILING=true \
  --set-env-vars FUNCTION_TARGET=${FUNCTION_TARGET} \
  --vpc-connector ${VPC_CONNECTOR} \
  --region ${REGION} \
  --min-instances ${MIN_INSTANCES} \
  --max-instances ${MAX_INSTANCES}
*I'm not an experienced backend developer (I've been a game dev by profession for the past 12 years), but I have been using Functions for the past 1.5 years. So please double-check, and let me know if you have better alternatives.*
Thanks,
Ayyappa
Twitter. ayyappa_1
Thanks,
Ayyappa
Twitter. ayyappa_1
id...@gmail.com <id...@gmail.com> #71
It took me a while to find this issue, yet it is very helpful in explaining the root cause. I saw many mentions of "We have deployed a fix and are working on more," but it seems there is no real effect atm, as the problem still persists.
With the remaining fixes and plans, can we expect to see respectable cold starts (~1s max) for simple functions that perform basic yet core Firebase functionality, or should new projects move to something like Cloud Run or App Engine instead?
br...@askgms.com <br...@askgms.com> #72
Google team, is this the expected turnaround speed for priority 1 tasks? If so, I think we'll have to move on to other cloud providers and suggest our counterparts do the same.
vi...@google.com <vi...@google.com> #73
We sincerely apologize for any inconvenience regarding this issue.
We have been actively working on reducing cold starts and have rolled out a number of fixes in the past two quarters. In addition, we are going to publish a blog post on best practices for setting up and structuring your cloud functions to minimize cold starts. We will update this bug shortly after the blog post is published.
In addition, we are working on additional features to further reduce cold starts in Q1.
We sincerely apologize for any inconvenience caused.
Best regards,
Vinod
br...@askgms.com <br...@askgms.com> #74
While I appreciate that (ostensibly) you guys have been actively working on this, I think we're all starting to feel that your team's definition of "active" may be different than that of other developers. Firebase has been heavily marketed, yet basic samples easily illustrate the cold start issues. While there was a special Firebase Summit built and hosted, which undoubtedly pulled dev time, a critical component of the platform was still languishing with no meaningful changes to an issue which blocks many production deployments.
Publishing a blog post about best practices is great, but why in the world does a function whose sole purpose in life is to be called intermittently have such bad performance when used in that manner? It should NOT be necessary for your users to go through blog posts to understand why a basic demo function takes >5s to perform anything at all (and that's assuming fixes are available!). Consider your competition: they require no such silliness, and you both offer similarly-priced services. This is how you lose customers, and infrastructure customers generally don't come back when burned.
At the start of June, Benjamin on the Cloud team created this ticket after the GitHub issue had been ongoing since January 2019. In August, you were "actively working on getting the issue resolved", and it looks like you met with several different developers at different companies to understand the issue better. In October, you say you "will be rolling out additional fixes in this quarter [2020 Q4] as well". Now you're saying it'll be 2021 Q1 for some further features which might help? Forgive us for some skepticism.
Trust needs to be earned back here, and possibly having some features which maybe/potentially/hypothetically could help isn't enough to allow any business to decide whether to continue development on a platform. This isn't a matter of inconvenience - it's a matter of whether we ever work with Google Cloud Platform again, and whether GCP has developers as advocates or critics.
I like the platform and its potential, but this makes me and my team wary of actually using it for anything that matters.
st...@googlemail.com <st...@googlemail.com> #75
If this is how a P1/S1 task gets 'resolved' then what do you guys even do with anything lower rated?!
vi...@google.com <vi...@google.com> #76
Our team has been actively working on making cold starts better, and we are treating this as a very key priority.
We have seen significant improvements in our internal measurements.
Could you please provide details on your specific functions? I will set up a meeting with our team to review them for further analysis.
Likewise, stefernet@, could you please provide details on your functions as well? Please feel free to email me directly at viramachandran@google.com.
Likewise, idaderko@gmail.com, could you please provide details on your functions as well?
We do want to earn your trust and we are working on this with the highest priority.
Best regards,
Vinod
vi...@google.com <vi...@google.com> #77
As mentioned above regarding the blog post, here is the published post on writing and deploying Node.js apps on Cloud Functions, with the goal of optimizing performance and minimizing cold starts.
As mentioned above, this is orthogonal to any specific issues you may be facing with your functions. Please email me directly with your function details and we will investigate with the highest priority. As mentioned earlier, we have seen significant improvements in our internal measurements, and we want to make sure we analyze your functions and meet your performance needs.
Our apologies for any inconvenience.
Best regards,
Vinod
wi...@google.com <wi...@google.com> #78
> While I appreciate that (ostensibly) you guys have been actively working on this, I think we're all starting to feel that your team's definition of "active" may be different than that of other developers.
I'm the engineering director at Google responsible for Cloud Functions. For a multitude of reasons, multiple changes/updates that we believe would have improved the performance issues cited in this bug report haven't yet made their way to production, although I cannot promise that the issue will be fully resolved once those changes and updates have rolled out.
What I can promise you is that starting in January you'll see us making a lot more progress on this, and with regular updates as well as transparency on what we believe is causing the issue and what we're doing to address it.
br...@askgms.com <br...@askgms.com> #79
And Vinod, at this point I don't believe our function calls are sufficiently different from any of the others to really warrant a deeper analysis, though if you and the team really believe otherwise, we can set up some time. I'll check out the link you posted again, though I've run through that in the past during initial builds.
be...@google.com <be...@google.com> #80
An update
👋 stefernet@, brandon@, and other folks in this thread: I'd like to give some perspective and explain why this ticket has taken quite some time to address.
When this ticket was opened we were having intermittent infrastructure issues with Cloud Functions resulting in poor performance. This thread helped surface a P0 issue, which was addressed.
However...
As this ongoing discussion demonstrates, this isn't the whole story.
A variety of people continue to have a poor experience with Cloud Functions. Internally, we've put together a group of folks across several teams (Cloud Functions, Developer Relations, support, Firebase), and have been discussing how to better meet people's expectations around performance.
What's clear is that there's no one problem:
- There were infrastructure issues; some have been addressed, and some have ongoing work (e.g., improving gVisor read performance).
- Large dependency graphs hurt cold start performance.
- Cloud Functions can be a tricky paradigm to code for, which can contribute to cold start issues.
- There are product improvements we can make to make cold starts less frequent.
Our plan
- I've worked with support to pull together this comprehensive article, "Tips for writing and deploying Node.js apps on Cloud Functions". It's my hope that this helps anyone in this thread who was bumping into issues due to coding hiccups (such as unhandled promises).
- We have features in the works that will allow people to avoid frequent cold starts; concretely, we intend to support minimum instances for Cloud Functions.
- We intend to continue having cross-team meetings to get to the bottom of specific problems customers are having (for instance, why slow cold starts seem to frequently correlate with certain libraries).
Unfortunately, this thread has turned into a bit of a catch-all issue (for a problem that is nuanced).
In the new year, I would like to close this issue in favor of more specific issues where we can dig into specific categories of problems (this will make it easier for us to help individual customers).
We want Cloud Functions to be an awesome product for everyone's use cases. And I apologize for the frustration this long-lived thread has caused.
Edit: in response to brandon@, we will update the "best practices" post as we release features in the new year. There will also be posts to cloud.google.com, and
id...@gmail.com <id...@gmail.com> #81
Most of us are talking about insanely long cold start times related to Firestore triggers or using Firestore inside a cloud function. Here is the Firebase documentation on the related topic.
That ~10s reaction time is where the biggest pain point is atm, since Firestore and Cloud Functions are used so much in conjunction. Were there any improvements in this area? I believe the issue was with gRPC establishing the initial connection from the function to Firestore.
From what I read here, the only viable solution will be to increase the "minimum instances" count for cloud functions. Will this become available as part of the Firebase functions config?
br...@askgms.com <br...@askgms.com> #82
@benjamin Thanks for the detailed explanation and battle plan! I certainly understand that there are several issues likely at play here, exemplified by the above commenter and the variety of conditions which could trigger the behavior. We'll look forward to more updates in January, and appreciate you guys working on getting this fixed up.
bu...@gmail.com <bu...@gmail.com> #83
The firebase/firestore SDK is equally randomly slow on AWS Lambda (sometimes 4+ seconds). I would imagine it's the same issue on Azure too.
Edit: I updated all the Firebase Node libraries, which removed the native gRPC dependencies in favor of the pure-JS implementation.
Initialising the SDK is now much more consistent in startup time than before, and also seems much faster. Now I'm getting around 1600 ms on a cold boot.
da...@gmail.com <da...@gmail.com> #84
The web application runs on Nuxt.js in SSR mode, so a cloud function serves the website. I'd consider it a common use case, and it's hard to digest the fact that users need to wait around 5-10s for the main page to load (given the scenario of lower traffic and the function therefore being in a cold state).
I'd happily let someone from the team have a look and see whether analysing the way I architected the functions' code would benefit any of us. In conclusion, Firebase suits all my needs apart from this issue, which has been hanging in the air for a long time.
Kind regards,
Damian
ry...@cinder.studio <ry...@cinder.studio> #85
In particular bu...@gmail.com THANK YOU!
Is your GRPC change published yet? We'd love to take advantage of that change. This is precisely what I've been waiting to hear you change. I previously shared on this email thread a testing tool that demonstrates and measures the issue. I just updated it to all of the latest libraries and I see 0 improvements.
The below sample application has only 2 serverside dependencies:
- firebase-admin
- firebase-functions
It demonstrates that JUST INSTANTIATING FIRESTORE can take up to 5 seconds.
@ wi...@google.com,
While we are excited to hear that this ticket has revealed a variety of issues across the platform whose fixes we will all enjoy, a massive share of the issues WE are concerned about could be relieved by taking some time away from the bigger-picture work and focusing on speeding up the instantiation time of the Firestore JS library.
@ALL Companies suffering from this issue,
Our company has found tremendous performance success by BYPASSING the published Google libraries in favor of calling the Firestore REST APIs directly. We've migrated all critical-path Firestore calls to direct REST API calls. Reach out to me if you want some suggestions: ryan@cinder.studio
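A minimal sketch of what calling the Firestore REST API directly can look like (the collection name, field, and token handling here are illustrative placeholders, not actual code from this thread). A query is just a `structuredQuery` JSON document POSTed over HTTPS, with no SDK or gRPC channel involved:

```javascript
// Build the JSON body for a Firestore REST API `runQuery` call.
// Roughly equivalent to: collection(collectionId).where(field, '==', value).limit(1)
function buildRunQueryBody(collectionId, field, value) {
  return {
    structuredQuery: {
      from: [{ collectionId }],
      where: {
        fieldFilter: {
          field: { fieldPath: field },
          op: 'EQUAL',
          value: { stringValue: value },
        },
      },
      limit: 1,
    },
  };
}

// Hypothetical usage (PROJECT_ID and TOKEN must come from your own config;
// requires Node 18+ for the global fetch):
//
// const url = `https://firestore.googleapis.com/v1/projects/${PROJECT_ID}` +
//             '/databases/(default)/documents:runQuery';
// const res = await fetch(url, {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${TOKEN}` },
//   body: JSON.stringify(buildRunQueryBody('events', 'ownerAccountId', 'acct-1')),
// });
```

The trade-off is that you give up snapshot listeners and the SDK's type mapping; for the simple request/response reads and writes several commenters describe, that can be acceptable.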
bu...@gmail.com <bu...@gmail.com> #86
Doubling the available RAM should roughly halve the cold boot time. It's a trade-off, though, as it makes every future invocation more expensive.
With memory set to 2GB you should expect the function to complete in under 1 second, but subsequent calls probably don't get much faster.
I'm using a 1GB memory allocation, which puts my cold-boot functions under 2s; subsequent calls are faster than before, but don't improve beyond 1GB. Past 1GB you're just wasting money.
The frustrating thing is that the memory actually used is only 134MB; the rest is wasted purely on reducing cold boot time.
On AWS this is probably because vCPU scales with the available memory: more RAM = more CPU.
Do Cloud Functions scale the same way?
If low memory is causing the slow init of the SDK, I think it's unrealistic of Google to expect customers to opt for the more expensive 1GB tier just to get reasonable cold-boot response times with a core library.
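For anyone who wants to test this trade-off themselves, memory (and with it CPU) is a per-function deploy setting; the function name, runtime, and region below are placeholders:

```shell
# Cloud Functions provisions CPU proportionally to memory, so raising
# memory also raises the CPU available during cold-start initialization.
gcloud functions deploy myFunction \
  --runtime=nodejs16 \
  --trigger-http \
  --memory=1024MB \
  --region=us-central1
```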
ry...@cinder.studio <ry...@cinder.studio> #87
Thanks for staying engaged! I understand that the current library is the source of the difficulty here; it's apparently just too heavy. I'm suggesting Google may need to invest someone's time in publishing an alternative library if this one can't be adjusted to correct the issue. Publish both if you want: one that is simple, fast and light, but has fewer features, and keep the bigger, heavier one for more complicated tasks. Google's Firestore REST API is quite robust and doesn't need all that machinery to be accessed.
We know this solution works because we invested in engineering to author our own solution:
Here at Cinder Studio, we think Firestore is the most feature-rich scalable cloud database we've ever worked with, and we wanted to use it so badly that we ended up writing our own library to bypass the speed problems of the Google-supplied one.
Our library
* has only 3 small dependencies (I may need to verify this further),
* is composed of less than 1,000 lines of code (excluding test files),
* uses simple REST API calls to accomplish the job, and
* uses minimal memory and minimal instantiation time.
It likely lacks many of the features of the official library, but it accomplishes 85% of the DB calls we need today with excellent (low) error rates. (We've got a few stragglers we've been too lazy to migrate because we'd need to add a few more features to the library first.)
Below is an example of how similar they are when compared side-by-side.
---------
import dataArchiveGoogleFb from ...
import dataArchiveCinderFb from ...

export default async (ownerAccountId: string, datasetShortname: string) => {
    // EXAMPLE USING THE GOOGLE FIREBASE LIBRARY
    const googleQueryResult = await (
        dataArchiveGoogleFb.collection
            .select(
                'id',
                'createdAt',
                'updatedAt',
                'ownerAccountId',
                'datasetShortname',
                'data',
            )
            .where('ownerAccountId', '==', ownerAccountId)
            .where('datasetShortname', '==', datasetShortname)
            .where('deletedAt', '==', null)
            .orderBy('createdAt', 'desc')
            .limit(1)
            .get()
    )

    // EXAMPLE USING CINDER STUDIO'S LIBRARY
    const cinderQueryResult = await QuickRead.query(
        dataArchiveCinderFb.newCollectionQuery()
            .select(
                'id',
                'createdAt',
                'updatedAt',
                'ownerAccountId',
                'datasetShortname',
                'data',
            )
            .whereComposite('ownerAccountId', 'EQUAL', 'string', ownerAccountId)
            .whereComposite('datasetShortname', 'EQUAL', 'string', datasetShortname)
            .whereComposite('deletedAt', 'IS_NULL')
            .orderBy('createdAt', 'DESCENDING')
            .limit(1)
            .prepare()
    )

    return {
        googleQueryResult: googleQueryResult,
        cinderQueryResult: cinderQueryResult,
    }
}
be...@google.com <be...@google.com> #88
An update
👋 I’ve updated my initial post and the title to be more specific, based on the problems still being discussed in this thread:
Cold start performance issues seem to correlate closely with gRPC libraries (like Firestore's). Folks switching from gRPC to HTTP dependencies have seen performance improvements, which also points at gRPC.
If the problems you're running into do not seem to correlate with gRPC SDKs, such as Firestore, please don't hesitate to open an issue and we will investigate (Also, I've pulled together a
- We continue to work on cross-cutting features that will help cold start performance in general, e.g., Min Instances for Cloud Functions, and will keep this thread updated as features roll out.
- There's an ongoing internal conversation about how we can improve cold start issues brought about by dependencies.
sa...@gmail.com <sa...@gmail.com> #89
Sounds like I'm having the same issue (
Basically, I created a blank Firebase project with an HTTPS function that modifies one field of one Firestore document.
Cold start times (which seem to reset within 30m) are in the 6000ms range.
10:28:41.934 ColdStartTest.tsx?6c99:21 Request starting ...
10:28:48.176 ColdStartTest.tsx?6c99:24 Success
10:28:48.176 ColdStartTest.tsx?6c99:29 Cold start time: 6242
10:28:53.802 ColdStartTest.tsx?6c99:21 Request starting ...
10:28:53.979 ColdStartTest.tsx?6c99:24 Success
10:28:53.979 ColdStartTest.tsx?6c99:29 Cold start time: 177
11:07:39.914 ColdStartTest.tsx?6c99:21 Request starting ...
11:07:46.073 ColdStartTest.tsx?6c99:24 Success
11:07:46.073 ColdStartTest.tsx?6c99:29 Cold start time: 6159
12:13:45.491 ColdStartTest.tsx?6c99:21 Request starting ...
12:13:52.109 ColdStartTest.tsx?6c99:24 Success
12:13:52.110 ColdStartTest.tsx?6c99:29 Cold start time: 6619
I can post the repo if it's helpful, but honestly it just seems like a problem with using Firestore within a Firebase project, which doesn't sound like it makes any sense? Please correct me if I'm misunderstanding this thread! I was encouraged by @mbleigh to contribute.
EDIT 1:
I should mention that I'm using the firebase-admin dependency rather than @google-cloud/firestore, as that's what the default Firebase project starts you with. Would love to know whether that distinction is related to this issue or not.
EDIT 2:
Switching from the default function memory of 256MB to 512MB seems to bring cold starts down to the 2000-3000ms range. Amazing to think that would be necessary on an empty Firebase project that makes use of Firestore, but very good to know.
id...@gmail.com <id...@gmail.com> #90
What could be so demanding on memory/CPU here? I suspect it's related to installing/initialising the firebase-admin package, in particular for Firestore.
[Deleted User] <[Deleted User]> #91
jr...@gmail.com <jr...@gmail.com> #92
That's outrageous.
ry...@cinder.studio <ry...@cinder.studio> #93
In our case we already had a large number of calls using the Firestore libraries, so we implemented a new API on a new Firebase function without the Firestore dependencies; on that new API, all of our calls go directly to the Firestore REST API.
Performance jumped significantly.
Ryan@Cinder.Studio
po...@gmail.com <po...@gmail.com> #94
Re "remove the firestore libraries from Google and just go straight to REST API calls straight to the firestore API":
This is a remarkable observation, and I'm frankly amazed that the officially shipped SDK would be significantly less performant than effectively writing one's own SDK to wrap the underlying Firestore API.
In common with others, I've deduced that cold start times on our FB projects are largely impacted by the first call to admin.firestore(): not module imports, not global initialisers, just this call. We wrapped a timer around it and could see it represented the majority of the first-run time.
console.time("admin.firestore() " + process.env.K_SERVICE);
const db = admin.firestore();
console.timeEnd("admin.firestore() " + process.env.K_SERVICE);
I can confirm we've also upped the memory allocation on some of our busier functions simply to reduce this Firestore latency a bit. We also went so far as to use the RTDB for some datastore functionality we needed faster response times on, since it spins up in ~80-100ms rather than ~300-800ms for Firestore.
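One mitigation that follows from this observation is to defer the expensive admin.firestore() call behind a memoized getter, so only the first request that actually touches Firestore pays the cost. A sketch; the `lazy` helper and handler shape below are illustrative, not code from this thread:

```javascript
// Generic lazy initializer: runs the factory once, on first use,
// and returns the cached value on every later call.
function lazy(factory) {
  let value;
  let initialized = false;
  return () => {
    if (!initialized) {
      value = factory(); // the expensive init cost lands here, once
      initialized = true;
    }
    return value;
  };
}

// Hypothetical usage with firebase-admin:
// const admin = require('firebase-admin');
// const getDb = lazy(() => admin.firestore());
// exports.handler = async (req, res) => {
//   const db = getDb(); // first request pays the init; later ones don't
//   ...
// };
```

This doesn't shrink the initialization itself; it only moves it off the cold-start path for requests that never touch Firestore.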
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #95
In common with others, I've also deduced that cold start times on our FB projects are largely impacted by the first call to admin.firestore() - not module imports, or global initialisers. Just this. We wrapped a timer around that call and we could see it represented the majority % of the first run time.
Totally agree with this. I spent a lot of time refactoring our cloud functions to follow the dependency best practices outlined in the various published articles. After completing that we saw only minor improvements in startup times. We then discovered this issue and moved all of our services to Cloud Run. We still use Firestore-triggered Cloud Functions, but they simply forward the request to our Cloud Run instance. By avoiding any calls to the Admin SDK in the Firestore triggers themselves, we're now down to sub-second cold start times.
This problem should be mentioned in the
Given how long this has been a P1 issue, I've given up on it. I would really like to see Eventarc include Firestore trigger events so that Cloud Run can receive them directly.
jj...@raxial.com <jj...@raxial.com> #96
Between Cloud Functions and Cloud Run, there is an ideal architecture somewhere. As far as I can tell, Cloud Run is the future; it feels like Cloud Functions 2.0 in virtually every way. It just needs more trigger support.
st...@googlemail.com <st...@googlemail.com> #97
ca...@hypermob.co.uk <ca...@hypermob.co.uk> #98
Meanwhile Google is pretending to work on this at P1. How can it take the Firebase team over a year to fix this when even a quick bodge by some users has such a big impact?
There's a reason why Google Cloud lost $5.6B in 2020.
I'm waiting around 20+ seconds to save a document + Firestore trigger + save another document, which is a lot given it's just audit behaviour.
ay...@gmail.com <ay...@gmail.com> #99
Re "they simply forward the request to our cloud run instance":
Can you please share how you are doing this? Through Pub/Sub? We also have Firestore triggers which are still stuck on Functions due to the lack of trigger support on Cloud Run.
Also, we are maintaining minimum instances on Cloud Run just to avoid the cold starts, but a lighter Firestore SDK would definitely help. As we keep min instances in all of our environments for all services, we are billed for nearly 10 min instances, which is unfortunate as it's not a perfect solution.
I hope Cloud Run supports Firestore triggers pretty quickly, or that a lighter Firestore SDK is published soon.
Also, is the Firestore SDK open source? If so, do you see it as feasible to remove the gRPC stuff to avoid this issue? If that's technically possible, it would be worth a try from my end.
Thanks,
Ayyappa
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #100
Not through Pub/Sub, just a normal HTTP call to our Cloud Run instance, i.e.:
const host = functions.config().api.functions
const httpClient = axios.create({
    baseURL: host,
});

export const onMessageCreated = functions.firestore
    .document('threads/{threadId}/messages/{messageId}')
    .onCreate(async (snapshot, context) => {
        const threadId = context.params.threadId;
        const messageId = context.params.messageId;
        const message = snapshot.data() as any;
        message.messageId = messageId;
        await httpClient.post('/functions/onMessageCreated', {
            threadId: threadId,
            snapshot: message
        });
    });
Not ideal, but it brought our cold start times from ~20s to under 1s. Getting trigger events to Cloud Run (or fixing this issue) would be ideal, because all Cloud Functions are doing for us now is adding latency and cost.
A side benefit of this setup is that you can configure the host to point to your dev machine using something like ngrok, which makes debugging trigger code much easier.
an...@rydeup.de <an...@rydeup.de> #101
wi...@google.com <wi...@google.com> #102
There are three different paths we're taking to resolve this issue. For many of you, I believe #3 will be the most practical near-term outcome.
1. Generally speaking, Cloud Functions cold start time is not as good as industry leaders in this benchmark, and we expect to improve it over time, but I don't have a timetable to communicate right now. There are ideas we're investigating, but they're too early for me to give you a definitive timeline; I would categorize this as incremental product performance improvements that we're going to deliver over time.
2. Firebase client libraries seem to be particularly slow to start up, which we think is specifically related to the presence of gRPC and Node.js. For the purposes of resolving this issue, I want to delegate this to the Firebase team, as it does not appear to be a problem specific to Cloud Functions. Unfortunately, Firebase does not seem to have a public issue tracker, and I need to figure out where to file this issue.
3. Finally, there are cold start improvements coming to Cloud Functions in the form of the Min Instances feature. We've created a feature request (https://buganizer.corp.google.com/issues/181884353) which you can subscribe to for more updates. We expect to have this feature in Private Preview in the next 4-6 weeks. For many users here, this may be a satisfactory resolution.
bu...@gmail.com <bu...@gmail.com> #103
"lamda warmers" or min instances on aws, it only hides the problem and
generally generates the cloud provider more money.
Ideally #2 would be best, or even the possibility of breaking out
components so we don't have to load in the whole sdk just to use one admin
function or make a firestore call.
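Until such a breakup exists, one stopgap along these lines is deferring the require itself into the handler that needs it, so other functions bundled in the same deployment don't pay the module's load cost on their cold starts. A sketch in CommonJS, with `crypto` standing in for a heavy dependency:

```javascript
// Deferred require: the module is only loaded (then cached by Node's
// module cache) the first time this particular handler runs.
function handleReport(req) {
  const crypto = require('crypto'); // stand-in for a heavy SDK
  return crypto.createHash('sha256').update(req.body).digest('hex');
}
```

Handlers that never run never load the module, so their cold starts stay lean.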
ma...@apptreesoftware.com <ma...@apptreesoftware.com> #104
#1 is true, but I find performance acceptable without the Firebase SDK. Any improvements would be appreciated.
Min instances are a great addition for specific use cases, but overall they are not providing a huge benefit to us. Our application has many scaling events during the day: it spans multiple time zones, and within each there are periods where functions need to scale up to meet demand. When our application needed to scale, a user was often waiting 20s+ as a new instance came online to handle their request. Then they would perform another action in the application, causing yet another scaling event for a different function, and again wait 20s+.
Solution #3 just means we have to set min instances on a large number of our functions to avoid this experience for our users, and we still run the risk of the app scaling past my configured min instances. While this will band-aid the issue, the value proposition of Cloud Functions is lost once you start using the Firebase SDK.
#2 is the solution I am waiting for, OR the ability to deliver Firestore triggers and callable functions via Eventarc so we can handle them directly in Cloud Run.
wi...@google.com <wi...@google.com> #105
Hi all, for Firestore improvements, I talked to the Firestore team and they asked me to pass this along:
The Firestore SDK team has examined the impact of loading the Firebase Admin SDK pretty extensively, and while there are some small gains to be had code loading/weight doesn't explain anything like the multi-second cold starts folks on this thread have reported. We believe the issue seems to lie specifically with the gRPC connection the Firestore SDK uses to read data, but our internal testing has not been able to reproduce the same effect that has been described in this thread. We're going to keep investigating (including options such as providing a non-gRPC SDK), and if you can reliably reproduce 5s+ cold starts with a minimal code sample, we'd love to know more about it so we can take a look (including how you're measuring the cold start duration).
I'm going to move this issue to the Firestore component so that the Firestore team can continue to action this issue.
(For GCF specific improvements to cold start, as mentioned earlier, I would direct you to
ay...@gmail.com <ay...@gmail.com> #106
thread which can be referred by firebase team.
Regarding min instances, currently we are using them in Cloud Run. Unfortunately, even though they solve for the first ~80 concurrent requests (in the best case), they are a huge overhead on the cost factor across our different environments.
We have 4 environments currently and 4 services on Cloud Run:
4 x 4 x 6 = $96 per month for the lowest Cloud Run spec.
This is in no way usable when preferring a serverless environment.
And for Functions, min instances = 1 can only solve for one concurrent request, right?
I see #2 as the most optimal solution for this problem. The SDK needs to be fixed with a REST fallback as an alternative.
vi...@google.com <vi...@google.com> #107
We are excited to ship Min Instances on Cloud Functions to help with cold start times.
To get onboarded to the feature, please fill out the onboarding form:
Best regards,
Vinod
wb...@sentryware.com <wb...@sentryware.com> #108
We're still experiencing 5+ second cold starts for functions that perform a simple Firestore transaction. We love the ease-of-use of GCF, but we're now forced to reevaluate other options as this behavior is not acceptable in production. Frankly, I'm surprised Google hasn't pulled out all of the stops on this one, as it completely cripples one of their flagship cloud products.
The new min-instances feature does help some, but it's only a bandaid that does not resolve the problem. Even with min-instances, customers are still vulnerable to cold-starts when concurrent requests exceed the allocated capacity.
su...@gmail.com <su...@gmail.com> #109
bl...@google.com <bl...@google.com>
ay...@gmail.com <ay...@gmail.com> #110
We are using min instances too, as it's getting costly (Cloud Run) for the different environments we have (around $50 across all environments, but it may go up to $40 per environment when we split our microservices further).
The same applies for Cloud Functions, and moreover they're not capable of handling requests concurrently; the only option is to increase min instances, which further increases the cost.
So we wanted to modify the repo, as that's the correct way to solve it rather than min instances. I tried at max not to change much of the nodejs-firestore <
repo, as that makes it easier to rebase with master and also to stay compatible with existing projects.
Before making it public, I would like to do a few more tests to prove it's beneficial for all who want to avoid cold starts with the Firestore SDK.
Here are the results so far:
1. Able to completely wrap the Node.js Firestore API (except for the partialQuery API).
2. Finished lazy loading of gRPC, as it's no longer used with the REST implementation.
3. Able to save 2s of loading time, which I think can be squeezed further (working on it).
[image: Screenshot 2021-04-29 at 1.36.52 PM.png]
If someone can write test cases to quickly try out the API, it would be of great help. Do let me know if you would like to contribute.
Thanks,
Ayyappa
ch...@gmail.com <ch...@gmail.com> #111
This came after a significant amount of work in Node.js, and the decision was not made lightly. We host multi-tenant applications across multiple Firebase projects, and the startup time was causing 45-60 second processing times for purchasing. While some of this delay was caused by multiple functions chaining and triggering other workflows, the startup time for 7-10 chained/triggered functions was unbearable.
Cloud Run, if you are not aware (I wasn't), only processes items during HTTP requests and does not allocate CPU time to pull-based Pub/Sub. Min instances are insanely expensive: 4 Cloud Run revisions set to 1 instance cost more than running a 3-node zonal k8s cluster with preemptible e2-medium instances (6 vCPU, 24GB total for the cluster).
Our Firestore-event-to-processing times are now below 2s for the entire chain (further processing still occurs; purchase time is below 8 seconds).
- firebase cloud function go --> http cloud run receiver
- receiver cloud run http service --> pub sub
- pub sub --> processing micro service in k8s
For log-append type processing, relying solely on Cloud Functions and Cloud Run ended up being a disaster. While running it all in the emulator works really well, deployment to production ends up being a constant troubleshooting exercise that eats significant time.
If you are not doing this as a hobby and have more complicated workflows, being penny-wise and pound-foolish on saving a few bucks with "free" Cloud Functions might not be the best way to go. Waiting for a long-term resolution forced our hand to change our core architecture, and also gave us pause about relying too much on what should be a simple library from Google to connect to Firebase/Firestore.
While this is not applicable to everyone on this thread (given differing architecture needs), the change in direction allowed a much more simplified and proven architecture, as opposed to the spaghetti mess that Cloud Functions inadvertently created.
da...@allfront.io <da...@allfront.io> #112
The tips and tricks post is nice, but I think it's a bit tone-deaf, as it throws the blame on your customers when doing nothing but querying a document with the Firebase SDK reproduces the issue, as many people on this thread have shown. I can provide a minimal code example/repo to reproduce it if that helps.
pm...@gmail.com <pm...@gmail.com> #113
- Trimming my dependencies to the bare minimum
- Switching to @google-cloud/firestore instead of firebase admin SDK
If I add a min instance, it halves the cold start again to around 500-800ms. Once the function is warm, it runs at about 50ms per invocation.
It looks like the min instance uses idle pricing, so it's not as expensive as just keeping it running with pings, but I haven't done a thorough analysis of that.
bu...@gmail.com <bu...@gmail.com> #114
Throwing money at it for min instances isn't a fix, it's just a crappy solution that happens to make Google more money and keep us quiet for a bit.
kh...@google.com <kh...@google.com> #115
Hi everyone. Thank you for being patient with us while we have looked into this issue. We understand the impact it has and how frustrating it can be to deal with. We take this issue very seriously; however, there are a lot of complex moving parts involved which must be adjusted very carefully. To date we've done the following:
- Worked directly with numerous customers that reached out to us and helped them address their issue
- Published best practices on writing performant functions
- Rolled out a series of performance enhancements to cut function startup time for common workloads
- Released the min instances feature
- Sped up function invocations from Firestore triggers when only reading the triggered document (pull request)
We have more improvements coming. Please stay tuned.
As we explore and prioritize other improvements, it would help if we knew the following:
- What are your startup times at the 95th vs 99th percentile when testing a minimal reproduction of the issue? How do they account for network overhead between the origin and GCP? Please share your code and data with us.
- What is your specific pattern of traffic at the time you see the issue? For example, is it during significant spikes, idle times, steady state, etc.? What is the traffic volume at the time?
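For anyone assembling the requested numbers, here is a quick sketch of computing percentiles from logged cold-start latencies (nearest-rank method; the helper is illustrative, and the sample data would be your own measurements):

```javascript
// Nearest-rank percentile over an array of latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Example with made-up timings:
// const latencies = [177, 6242, 6159, 6619, 150, 160];
// const p95 = percentile(latencies, 95);
// const p99 = percentile(latencies, 99);
```

Collect timings client-side (request sent to first byte) and also from function logs, so the network overhead the questions mention can be separated out.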
br...@askgms.com <br...@askgms.com> #116
Thanks for posting an update. I think everyone who's still bothering to follow this is trying to be patient, but this is an egregious length of time for a significant blocker. A lot of the frustration comes from the obviousness of the cause: the requisite dependencies for interacting with many Firebase components cause this issue, and they are absolutely the prime offender regardless of how many other dependencies are included. They need to be trimmed/deferred/modularized/all of the above.
It's great that at least one deferral change is in place and starting to get attention (#5 is an example), but at this one-use-case-per-year pace, many of us may never see results that make the expected use cases for Functions with Node.js work properly.
There are numerous examples provided by users of the exact behaviors which are issues here (e.g.,
As a note, the #4 min instances "feature" doesn't actually address the problem at all; cold starts are still exactly the same, you just hit them less often since you front-loaded the lag time. You also have to anticipate your load well enough to set a min instances count that will prevent cold starts, which means functionally paying for peak capacity 100% of the time. Either that, or you hit cold starts whenever your traffic spikes.
One thing that could solve this for the majority of use cases is not deallocating shards after such a short span. If you allowed each to live for 24+ hours, I suspect most concerns would evaporate, though traffic spikes could still cause cold starts. Another approach could partner with this: analyze load increases and anticipate spikes, making shards available in advance of calls being made. This would obviously only help Functions with a steady flow of traffic and intermittent spikes, but that is another major use case (for instance, site hosting via Functions would likely be fixed by it).
To answer your questions in our case:
- We consistently observe 5-8 seconds for cold starts in most cases, rarely dipping below that range. Statistical analysis is unavailable: we look at the logs while loading an asset, observe the lag, observe that the function doesn't start for that length of time, and note that the response is received after the delay. The next request completes without delay, as do all subsequent requests within a window of prior activity. Not sure if that's enough to "account" for network overhead. Honestly, if you're still at this phase of investigation, this is likely a lost cause; just try any minimum viable example that wasn't explicitly fixed by the special patch in #5 and you can easily observe this behavior yourself. We will not be able to share our specific code or data.
- This happens any time a Function is called on a new shard (or whatever the term is for the container running the Function). It's especially obvious if a Function has been dormant for more than ~20 minutes; I'm guessing shards are deallocated at that point, so any call needs to cold start one. I'd expect any deviation from a completely steady stream of requests to eventually trigger this.
ry...@cinder.studio <ry...@cinder.studio> #117
Our company migrated away from the default Firestore libraries some time ago in favor of building our own against the Firestore REST API, and our performance increased significantly. The majority of our issues had to do with the gRPC features of the Google-sourced Firestore library taking several seconds (up to 5) to warm up.
A few weeks ago I began an effort to extract the code we put together for this into an independent third-party library, with the goal of open-sourcing it on GitHub. It's not perfect, and it's far from "open source standards" of readiness (in documentation and thorough testing). It is also not yet ready to publish to NPM. I simply have not had the time.
However, if anyone wants to make use of the libraries we are using, you are welcome to them. We'd love any support in getting this library to open-source, NPM quality. We rushed this solution together to solve an immediate need, so it could be improved.
We've been beta testing a few instances of our systems against the newly extracted codebase (compared with the implementation of the same technology in our current codebase) and it appears to be functional.
So if you are interested in using (and possibly even helping out with) a pre-production open source library on the topic, please dive in!
da...@allfront.io <da...@allfront.io> #118
Re this sentence from the tips and tricks:
"blocking a user-facing UI update on the response from a Cloud Function is not a good idea."
I feel misled that I needed to dig so deep to find that out.
Until this is fixed, I suggest placing this first in your docs, in bold, in a prominent place in your marketing material; otherwise there's a danger that other developers will be misled into thinking that Firebase + Functions are suitable for building apps with user-facing UI updates.
da...@allfront.io <da...@allfront.io> #119
ay...@gmail.com <ay...@gmail.com> #120
It's fine, but as we have more services, we need to pay more for min instances per service (Cloud Run). Each min instance costs around $7 for a 128MB config and $19 for 512MB. So if you have a microservices project with different environments and multiple services per environment, it becomes a huge cost factor for us.
Cloud Functions + min instances is not even a good solution, to be honest; it's better to pay for Cloud Run min instances, as they handle concurrency. App Engine has an advantage here, but comes with a cost.
da...@shax.com <da...@shax.com> #121
You don't need more examples or a deeper understanding of use cases to recreate this issue. The issue has been identified: it is extremely slow to create the initial gRPC connection from the function to Firestore.
The primary solution is: identify why gRPC is slow to connect and fix that.
A secondary (workaround) solution is: in the Admin SDK, officially support the REST API as an alternative to gRPC for Firestore, and allow it to be configured when creating a Firestore instance (or potentially make it the default). It is acceptable if the REST API does not support real-time subscriptions.
We appreciate you prioritising this issue, but asking for more examples and use cases is starting to feel like stalling. It is now time to assign engineering resources to solve the problem, not to continue triaging it ad infinitum.
Thanks
Dave
ch...@chris-reilly.com <ch...@chris-reilly.com> #122
Next, I eliminated the sink from the equation by writing to Pub/Sub directly using the @google-cloud/pubsub library. To my surprise I was still seeing 20-40 second delays, and it increased cold starts on the first function to 5-8s.
Finally, I attempted to write directly to Firestore from my first function and still see 20-second delays from invocation to the document being in the DB and rendered on the front end. The curious thing is that this doesn't just happen on cold starts: even when the function finishes in 500ms, it still takes 10+ seconds for the document to appear in the DB.
I mention all of that to raise two issues that might add to the discussion:
1. The delays associated with gRPC calls from Cloud Functions appear to be affecting Pub/Sub as well.
2. Firestore seems to have ingestion delays even well after the function finishes. Pushing the same event from my local machine with the same library resolved within milliseconds.
I’m trying to build a ‘real-time’ serverless product and these latencies are an existential threat to its viability on GCP.
[Deleted User] <[Deleted User]> #123
Is there any news on the status of this issue?
kh...@google.com <kh...@google.com> #124
Hi everyone,
Yes, we are still looking into this issue and it is still a priority. As I mentioned back in June it is a very complex problem due to all the moving parts involved and improvements will be incremental. I'm not at liberty to discuss specifics but we have improvements in progress and this ticket will be updated when we have something public we can share.
Thank you for your continued patience.
ay...@gmail.com <ay...@gmail.com> #125
Firestore + Functions.
Firestore is showing 5.8+ secs at the 99.9th percentile, which is unacceptable.
Maybe they could have tried min instances or Cloud Run, but it's still
directly related to the problem we are discussing here.
Ref:
On Wed, Sep 1, 2021 at 9:49 PM <buganizer-system@google.com> wrote:
st...@google.com <st...@google.com> #126
If I understand correctly
ay...@gmail.com <ay...@gmail.com> #127
Ref 1:
Ref 2:
However,
As per the title of this issue, it's **not actually** Cloud Functions' fault
but mainly Firestore's gRPC libraries. Just because they all fall under the same
umbrella (Google), the issue gets reported here by everyone regardless of it
being a Firestore library issue, the reason being that on GCP, Firestore is
mostly used in combination with Functions.
ca...@google.com <ca...@google.com> #128
Hi,
As an update on this Issue Tracker, our Engineering Team is still working on this issue and they will provide an update as soon as they have something relevant to share.
da...@gmail.com <da...@gmail.com> #129
st...@google.com <st...@google.com> #130
Note that customers who wish to keep their Cloud Functions warm to avoid cold start can leverage the "min instances" feature
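For reference, with Cloud Functions for Firebase the min-instances setting can also be expressed in code. A sketch using the firebase-functions v1 runWith API (the handler wiring in the comment is illustrative):

```javascript
// Runtime options for the "min instances" workaround. This is plain data;
// with firebase-functions (v1 API) it would be wired up as:
//   const functions = require('firebase-functions');
//   exports.api = functions.runWith(runtimeOpts).https.onRequest(handler);
// Keeping an instance warm avoids most cold starts but is billed while idle.
const runtimeOpts = {
  minInstances: 1, // at least one instance kept warm
  memory: '512MB',
};
```

Note the cost trade-off: a warm instance is billed continuously, which is part of why several commenters below consider this workaround insufficient.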
bu...@gmail.com <bu...@gmail.com> #131
ay...@gmail.com <ay...@gmail.com> #132
I noticed it's pretty fast. I try to keep it in sync with the
nodejs-firestore project in parallel so that I can shift back once Firestore
fixes the gRPC problem.
Steps to integrate:
1. npm install @bountyrush/firestore
2. Replace require('@google-cloud/firestore') with
require('@bountyrush/firestore')
3. Set FIRESTORE_USE_REST_API = 'true' in your environment variables.
(process.env.FIRESTORE_USE_REST_API should be set to 'true' to use REST
mode. If it's not set, it's just standard Firestore with gRPC connections.)
As I'm using the same nodejs-firestore project by forking it, I tried to
keep compatibility as high as possible. If you see any issue implementing it,
please let me know.
I see the cold starts are much better now and can be further improved. Do
let me know your feedback/thoughts.
Thanks,
Ayyappa
ry...@cinder.studio <ry...@cinder.studio> #133
ay...@gmail.com <ay...@gmail.com> #134
I will check it out. The main goal for the library is better cold starts
while staying fully compatible with the existing official one, so that we can
quickly shift back to it once Firestore fixes it (if they really do; I'm not
sure they'll move away from gRPC anytime soon :|).
Due to having REST mode fully compatible with the official one, we got some
extra baggage with it (due to its internal libraries). However, we see much
better loading times and are currently using it in our project.
Would be happy to collaborate if you have some plan in mind to make it
better.
Thanks,
Ayyappa
wb...@sentryware.com <wb...@sentryware.com> #136
The fact that now two Google customers have provided meaningful workarounds to a product-breaking, 16-month old problem that a nearly two trillion dollar company hasn't provided any solutions for does not inspire confidence.
It looks like there are about 30 Googlers CC'd on this issue, to which I say this: Please fix your product. We really want to use it, because when it works it works really well. However, this problem makes it completely unusable in production. We cannot reasonably ask our users to wait 6+ seconds for a function to cold start. The workarounds provided by Google can reduce some of the impact but are not sufficient. It is clear that Google is trying to grow their cloud market share, and we're cheering for you, but the inaction on this issue is producing the opposite effect.
If Google cannot or will not solve this issue in a reasonable amount of time (which, frankly, was a long time ago), my org is going to be forced to abandon this product. We cannot gamble our own success on hints of an eventual resolution. After 510 days of waiting, we need more than the unfulfilled promises we've been given thus far.
br...@askgms.com <br...@askgms.com> #137
Sadly, we've had to migrate away from Firebase Functions for hosting our Node.js programs. We had assurances from their team in December 2020 that this was being fixed ASAP, and here we are in October 2021 with no progress to speak of. We've moved on to Cloud Run, but frankly, if we weren't already built out for other portions of the Firebase environment such as auth, we'd have migrated back to AWS and likely never revisited GCP again. That may still happen once we have time to port functionality - my faith in critical issues being resolved by the GCP team is entirely eroded.
This is a severe black eye for any aspiring cloud platform, and many times more so given this is Google! Either tell us you're not going to fix this, or commit to fixing this on a certain time table and do it. Not that it'll benefit my team any longer, but for the sake of everyone else here, get your head in the game.
ma...@gmail.com <ma...@gmail.com> #138
TLDR: Try writing critical client-facing functions in Go
Yes, Go is going to be better than Node.js, but I was curious to see how it performs in the Functions environment, especially considering this issue. This is my first post here; there are already plenty of complaints, so I thought I might share my own experience instead. This is absolutely not a silver bullet and not applicable to everyone, but it might help certain use cases. I've been increasingly frustrated by the 5-15 second cold-starts, and this is just to execute a simple Firestore query returning a single document ID, seeing as the client libraries
Serverless is supposed to be fast, not a background processing platform (although it excels at that too!). It's caused me to reconsider Firebase several times and try Cloud Run with Go, but then I realise that Functions aren't much different, and I keep coming back. Some things, like storage rules and auth, are baked into the Firebase platform. It's exhausting and frustrating, but at least I don't have the pressures of a company, my heart goes out to you that do.
I haven't had time to try the above community-improved libraries, what I did try is re-writing a function in Go after seriously considering starting from scratch on Cloud Run, and I'm shocked at how much better it performs.
- Before: 1G instance, 8000-15000ms cold-starts in Node.js (depending on which VM is allocated, and probably also somewhat dependent on the direction of the wind), ~100ms warm execution times (as little as 3ms for CORS pre-flight requests)
- After: ~80ms cold-starts using 128M instances written in Go, using around 20MBs of memory. In fact, the 128M seems to perform better than 1G, strangely. I haven't measured the overhead due to the actual allocation of a machine and download of the container, but it is negligible with network latencies.
This is not a viable solution for everyone. I myself have built significant tooling in TypeScript to get rid of boilerplate, stuff that would have been amazing to have in a firebase-contrib-style standard library. It's a struggle to settle into a new language with no generics and increased verbosity, but the performance gains are worth it. I just wish that Firebase natively supported Go.
Some of the pros 👍 are:
- No need to lazy-import files to help that cold start
- No more worries about exceeding the maximum function upload size (node_modules); Go is naturally smaller and has dead-code elimination, I believe
- Go doesn't have to synchronously require(...) 10/100/1000s of files from the disk (future ES Modules and bundling could help, but it's extra setup boilerplate)
- The SDKs seem to be high quality, perhaps better than Node.js (native Query iterators that I avoided in Node.js due to lack of documentation), with the bonus of being Google's star language; it pretty much feels like import-and-go (also an excellent language for bad puns), like with Deno
- The std library is amazing, it even has image manipulation for thumbnail processing
- It integrates nicely with the Firebase dashboard and error tracker, as if it was meant to be
- Did I mention the cold-start times that are faster than Node.js's warm-start times (OK, maybe exaggerating a bit...)?
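The lazy-import trick mentioned in the first bullet (which does help Node.js cold starts) can be sketched generically; the commented usage with @google-cloud/firestore is illustrative:

```javascript
// Wrap a module load so require() runs on first use, not at cold start.
function lazy(loader) {
  let cached;
  return () => (cached === undefined ? (cached = loader()) : cached);
}

// Illustrative usage (the loader runs only on the first request that needs it):
//   const getDb = lazy(() => {
//     const { Firestore } = require('@google-cloud/firestore');
//     return new Firestore();
//   });
//   // inside the handler:
//   //   const snap = await getDb().doc('users/alice').get();
```

This shifts the library-load cost from cold start to the first request that actually touches Firestore; it doesn't remove the cost, it just keeps unrelated code paths fast.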
Bear in mind that you will have to manually enable APIs and set up Scheduler timers, and there is no native request auth verification (a big bummer; there is no onCall equivalent yet, though that's no problem if the function is public-serving), so you have to verify that yourself. If you're dependent on Node.js for SSR or have a very large app/function, you're out of luck, but I would recommend taking a look at Cloud Run: it processes several requests concurrently, which further reduces the chance of hitting a cold start, and it supports background events (via push). From my experiments, it also seems to allocate extra instances pre-emptively during spikes, but this is pure speculation. Testing & emulators are also going to be more of a pain, but to be frank,
To be clear, I consider this a workaround. The issue lies clearly with the Node.js SDK, not the language or runtime itself. Go is more efficient, but unless you're doing compute-heavy work, it's not magically 10x faster, and it also has its quirks. There is little need for CSP when behaviour is mostly synchronous on a single-core instance. It just feels like the Node.js SDK (or the underlying gRPC dependency) isn't natively JS, but rather ported from Java/Go.
Until GCP addresses this issue, I would recommend to try a gradual adoption with Go for critical parts.
- Create a new directory in your Firebase project
- Add a go.mod file (Go 1.16)
- Create a functions.go file with a non-main package name. This file can house any number of functions; you specify the entry-point when deploying each function
- Each function is a normal Go-style HTTP request handler, very similar to Node.js/Express' request handler.
The good news is that deployment is still very automatic: a single command with no Cloud Build configuration. The Firebase functions config you're used to pretty much maps 1:1 to gcloud functions deploy CLI arguments; it's reproducible and there's no need to mess with the GCP GUI.
Marcus
kh...@google.com <kh...@google.com> #139
As I mentioned in September Google Cloud Functions is a large product built on a LOT of very complex interconnected systems. Since then work has been done on this issue but I can't share specifics. I know this is extremely frustrating to hear but want to reassure you that this is important to Google and is regularly checked on. It's unlikely there will be a dramatic update anytime soon but it is moving forward.
kh...@google.com <kh...@google.com>
at...@protonmail.com <at...@protonmail.com> #140
Thanks,
Jon.
ay...@gmail.com <ay...@gmail.com> #141
Do let me know if you hit any trouble!
Thanks,
Ayyappa
at...@protonmail.com <at...@protonmail.com> #142
da...@allfront.io <da...@allfront.io> #143
We enabled min instances on some functions, it seems better but if they are not used after a day or so we are still getting a cold start.
Do you really need to have min instances + warm up jobs to work around this?
jo...@gmail.com <jo...@gmail.com> #144
Has anyone here had luck sorting this out with min instances?
min instances in firebase cloud functions did not solve it for us
we moved all functions that require immediate response to a more traditional web server running in cloud run, and are using traditional rest api calls rather than invoking a cloud function directly
br...@askgms.com <br...@askgms.com> #145
We also migrated to Cloud Run - min instances didn't solve anything for us.
Once we were there, I realized there was another key characteristic of Functions which makes running a Node.js server extremely inappropriate: Functions only allow a concurrency of 1. This means that, even with a min instance count of $HIGH_NUMBER, you can exceed it with one client making a variety of concurrent requests (say, asset fetches, auth checks, page content, etc). Our usage shows approximately 20 concurrent requests on average for a first page load, which means we'd need 20 min instances for ONE client to be served without a cold start impacting latency.
Obviously that's untenable, even for one client, so we moved to Cloud Run and can have a concurrency in the hundreds without issues. Set a min instance count there, forget it. The concurrency limitation is so severe that I'd actually encourage the Firebase team to explicitly note that Node.js functions should not be latency-sensitive and any which would expect concurrency (e.g., web servers) should absolutely avoid Functions. Maybe it's somewhere in the documentation, but we didn't find it and wasted a lot of time barking up the wrong tree.
jo...@puul.io <jo...@puul.io> #146
How would I authenticate
ay...@gmail.com <ay...@gmail.com> #147
process.env.FIRESTORE_USE_REST_API = 'true'
const functions = require('firebase-functions')
const admin = require('firebase-admin')
const adminConfig = JSON.parse(process.env.FIREBASE_CONFIG)
admin.initializeApp(adminConfig)
const { Firestore } = require('@bountyrush/firestore')
const db = new Firestore()
I will write a simple tutorial on how to use it. I can share the code from
my project but I have my own abstraction(to include multiple database
drivers) which may look complicated.
Please post on the github issues if you need any help.
Thanks,
Ayyappa
jo...@puul.io <jo...@puul.io> #148
jo...@examind.io <jo...@examind.io> #149
Note: min instances improves things slightly, but I still encounter long (~8 sec) start times
pe...@gmail.com <pe...@gmail.com> #150
ay...@gmail.com <ay...@gmail.com> #151
@pargolfsolutions.com Thanks for pointing to our REST implementation (
jo...@examind.io <jo...@examind.io> #152
ma...@gmail.com <ma...@gmail.com> #153
It won't fix the issue, but you'll probably have a higher chance of being allocated on higher-end hardware that may make the cold-start a little faster. I've even had function execution time take twice as long on some instances with the same configuration as on others for the same task. The lower-end instance types are noticeably slower in my experience.
bl...@gmail.com <bl...@gmail.com> #154
ay...@gmail.com <ay...@gmail.com> #155
Glad it's helpful! That's definitely a booster! It has scope for reducing a couple of seconds more, which I will start working on :)
jo...@examind.io <jo...@examind.io> #156
I'm still using the Firebase Admin SDK with Firestore.
wb...@sentryware.com <wb...@sentryware.com> #157
I'm curious if Cloud Functions v2 will help with this since it is moving to Cloud Run as the backend. If anyone has tried the public preview, I'd love to know if you've witnessed any improvement regarding this issue.
st...@ctma.fr <st...@ctma.fr> #158
be...@gmail.com <be...@gmail.com> #159
And you what are you planning to use instead ?
bu...@gmail.com <bu...@gmail.com> #160
IMO it's not particularly suitable for enterprise if you're relying on various Firebase/Firestore features and serverless/Firebase functions.
I find there are also various things you can't do because the methods just don't exist; you can raise a feature request on GitHub, but it'll probably sit there for years with all the other requests.
jo...@gmail.com <jo...@gmail.com> #161
ay...@gmail.com <ay...@gmail.com> #162
Could you please open an issue at the github page so that I can look into it next week?
st...@google.com <st...@google.com> #163
Cloud Functions gen2 runs on Cloud Run, but Cloud Run is using by default the same execution environment as Cloud Functions gen1.
However, Cloud Run has a new execution environment in Preview:
I'd be interested to know if the latency issue also occurs on Cloud Run's second generation execution environment.
If you are using Cloud Functions gen2, you can enable the second generation execution environment by changing this setting in Cloud Run.
[Deleted User] <[Deleted User]> #164
se...@nextnowagency.com <se...@nextnowagency.com> #165
I was astounded when usually-reliable cloud functions were returning page content with whopping 5 second latency. Doubly so when I isolated the problem to, as pretty much this whole thread has indicated, the very first access to admin.firestore(), reliably 4000-5000ms cold, but only 70-200ms presumably-warm, "cooling off" again arbitrarily after 0-30 seconds with no real pattern.
It took a morning of searching before I tripped over the original 2019 github issue, leading to another issue thread, leading eventually here. 3 years, no fix? Client-side libraries and server-side REST calls still outperform the official documented server-side pack-in solution??
This project is already costed & contracted anticipating the ease-of-use of Firebase, but as senior architect, I'll definitely be having some second thoughts about pitching Firebase in the future knowing that a single admin.firestore().doc('...').get() can take full seconds to go through under ANY circumstance.
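The isolation described above (first access to admin.firestore() cold vs. warm) can be reproduced with a small timing helper; the commented calls are illustrative:

```javascript
// Time a synchronous step and log the duration in milliseconds.
function timeSync(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return result;
}

// Illustrative usage at a function's module scope:
//   const admin = timeSync('require firebase-admin', () => require('firebase-admin'));
//   timeSync('initializeApp', () => admin.initializeApp());
//   const db = timeSync('first admin.firestore()', () => admin.firestore());
```

Timing the require, the init, and the first query separately shows where the 4000-5000 ms cold cost lands.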
da...@gmail.com <da...@gmail.com> #166
da...@allfront.io <da...@allfront.io> #167
It's min *idle* instances that fixed it for me in App Engine. But we don't
have that flag on Firebase.
bu...@gmail.com <bu...@gmail.com> #168
Any longer-running service like App Engine or EC2, with no cold starts, doesn't have this issue.
I'm not too surprised that Google still hasn't fixed it, but I'm also amazed they felt 1-8 second cold starts were acceptable. It certainly wouldn't be acceptable for their own products; imagine if doing a Google search or logging into Gmail/YouTube took nearly 10 seconds.
wb...@sentryware.com <wb...@sentryware.com> #169
There are supposedly over thirty Googlers CC'd on this issue and yet here we are three years later with no solution other than to just not use the product they are trying to champion. This product is fantastic but how are we supposed to build anything production-worthy when this issue turns a simple Firestore read into a user churn because they rightfully didn't feel like waiting 6+ seconds?
If this is the way Google handles P1/S1 issues, how are we supposed to trust ANY of their cloud products?
Google? Can we fix this three year old show-stopping issue please?
Thank you. Sincerely, people that want to use your products.
gr...@gmail.com <gr...@gmail.com> #170
ay...@gmail.com <ay...@gmail.com> #171
causing the cold starts, and using the REST API clearly fixes the problem.
I still wonder what the problem is in providing an official REST wrapper
that can be maintained.
We made a wrapper which works wonders, but we still don't have the energy to
maintain it as a one-person team (
It's super annoying to look for workarounds like min instances, which are not
even a proper solution. The cost is brutal as we grow, and even at our
current count.
We will decide pretty soon!!!
st...@google.com <st...@google.com> #172
Sorry for the lack of progress. As you might guess, the cause is hard to pinpoint and involves many teams.
Honestly, the best you can do is leave comments that will help the Google teams debug. So anyone leaving a comment going forward, please capture:
- Whether you are using Cloud Functions (e.g. deploy via gcloud functions) or Cloud Functions for Firebase (deploy via firebase deploy)
- If Cloud Functions, whether you are using 1st gen or 2nd gen
- The function startup times that you are measuring
- The package name and version of all google-owned modules you are loading
- (Optionally) a pointer to a repro case, or a copy of your package.json
Also note that in
To anyone using Cloud Run and seeing a startup latency that they consider too high, please open a new bug in the Cloud Run component. It is unclear if the cause is the same, do not assume it is. This bug is focused on Cloud Functions.
am...@gmail.com <am...@gmail.com> #173
Understood.
But why can't you release an official REST API wrapper? This has been raised by several people over the years, but it seems to get conveniently ignored, among other issues.
Your view on an official REST API wrapper would be highly appreciated.
ja...@google.com <ja...@google.com> #174
Have you tried firestore/lite?
an...@taskheroics.com <an...@taskheroics.com> #175
I think this thread has pretty conclusively pointed toward the cause being the size of the grpc library dep. The frustration felt by myself and others here is IMO due to exactly that sort of comment which seems to show that this thread has not been carefully read and this problem not carefully investigated by anybody at Google. It feels like a platitude at this point with a P1 S1 issue having little meaningful progress in 3 years.
I hope it's easy to see how all of us affected by (and losing customers because of) this issue might feel that this is being ignored.
How can we _actually_ meaningfully move this forward? Can this be brought to the attention of someone at Google who has the power to direct some resources toward fixing it? Or can we at least change the status to "won't fix" and prominently update the docs to mention that cloud functions have an expected latency of up to 6 seconds?
gr...@gmail.com <gr...@gmail.com> #176
Have you tried firestore/lite? That's the official REST wrapper for Firestore.
That's a replacement for the web client library. This bug is about the Node.js library.
ch...@sidkik.com <ch...@sidkik.com> #177
This is a prime example of cloud vendor lock-in. You create an app that leverages cloud-specific libraries and managed applications, and you are totally dependent upon the vendor resolving and fixing issues. Sometimes that happens quickly; other times (as in this thread) the underlying condition is not addressed and a patchwork of costly options is presented as fixes. This happens across all 3 big cloud vendors and is not specific to Google.
If this problem was going to be fixed, it probably would have by now.
I started using all of the baked in services in gcp and for a while, things worked well and were cost effective. Once you get to scale or if new bugs/features are introduced you have new challenges. This specific challenge is a show stopper both from cost and performance.
I still use GCP, but I pivoted away from most if not all of the managed GCP services and only leverage cloud-agnostic apps hosted on GKE. If you run in a single zone (avoid backplane charges with an isolated billing account) and use spot instances, you can run a much better environment for ~$25 a month. No cold starts; only applications designed to be stateless on spot instances. I still use Firestore as the storage layer, but avoid the cold starts with long-running containers. You could run this on Cloud Run, but if you have more than one service, the cost will quickly exceed what you can do in GKE. You could also run on Anthos, but that is even more expensive than Cloud Run.
I originally forwarded a simple trigger payload to an http listener that would drop into google pub/sub (a handful of functions down from 40-50 functions). That works ok, but once you move to prod, you will see a .5 to 1 sec lag on pubsub receiving and the pushing. If you run a multistep async process (create account, create stripe, create active campaign, create quickbooks, etc) then those delays add up quickly and you are almost as bad as before with pub/sub being the bottleneck.
Our latest iteration, uses a handful of triggers forwarding to http listeners that then push to nats. Latency for intercommunication is less than 1sec from trigger to insertion in nats. And intercommunication between services (using messages on nats) is sub ms to 2-5ms delays. Our original payment process using stripe (multistep triggers) was about 45-60 seconds in prod which was unacceptable. Payment processing is down to 2-6 seconds with the nats setup.
This setup does not come cheap from a development/management perspective. You need to know k8s, nats, spot instance drops, monitoring, backup etc. If you don't want to invest in that knowledge, then you are beholden to vendor lockin and issues. At least with this setup, you can control response time and bug fixing to mitigate customer impact.
se...@nextnowagency.com <se...@nextnowagency.com> #178
firestore/lite absolutely looks like the kind of thing that ought to have been mentioned about 18 months ago (since the best "solutions" are already "just pivot to client/REST libraries"), unless it's brand-new. I look forward to trying it out, hopefully this afternoon!
wb...@sentryware.com <wb...@sentryware.com> #179
As mentioned above, "firestore/lite" is not for the Admin SDK, but rather a replacement for the standard web client SDK. I don't find it at all relevant for this issue.
Also as mentioned above, the comments coming from Googlers are not informed and read as though they have not read the hundreds of comments that include all of the information they need to diagnose the problem. All you have to do to replicate this issue is use the Node.js Firebase Admin SDK in Cloud Functions, with the default configuration, and try to interact with Firestore in any way (read, write, transaction, etc) as stated in the Firebase documentation. You will find the cold start time to be consistently in excess of 5 seconds. That's your reproduction case. Please fix it.
Very frustrating.
gr...@gmail.com <gr...@gmail.com> #180
Truly frustrating. What does P1/S1 even mean? This is clearly not being prioritized.
And the commenter above is absolutely correct. At this point, being told to provide a reproduction of this issue is almost insulting. This issue can be easily and reliably reproduced performing the most basic tasks with Firestore using the admin SDK. Refer to any of your own tutorials. Here's an example:
ja...@google.com <ja...@google.com> #181
FWIW the scope of this bug is Cloud Functions cold-start & GRPC. If you'd like to submit a feature request for a firestore-lite equivalent in the Firebase Admin SDK there are more appropriate channels for that.
gr...@gmail.com <gr...@gmail.com> #182
Fair enough! Would you mind pointing us to the right place to request that feature?
ja...@google.com <ja...@google.com> #183
You can file an issue directly on the firebase-admin GitHub repo.
It's very possible that code could be reused between the firestore/lite
client library and the firebase-admin codebase; at the very least, the
authentication mechanism would have to be swapped out.
gr...@gmail.com <gr...@gmail.com> #184
I've filed a feature request for a Node version of Firestore Lite. If anyone wants to make some noise there, it might help it get attention:
st...@google.com <st...@google.com> #185
I have followed up with the Node.js client library team about REST. We hope to have good news to share here soon. Stay tuned.
All you have to do to replicate this issue is use the Node.js Firebase Admin SDK in Cloud Functions, with the default configuration, and try to interact with Firestore in any way (read, write, transaction, etc) as stated in the Firebase documentation. You will find the cold start time to be consistently in excess of 5 seconds.
Unfortunately, this is not what I observe with a quickstart using a 2GB function. The times I measure are those I see in the logs.
Baseline
exports.helloWorld = (req, res) => {
let message = req.query.message || req.body.message || 'Hello World!';
res.status(200).send(message);
};
GCF gen1
- cold: 689 ms
- exec: 6 ms
GCF gen2
- cold: 839 ms
- exec: 4 ms
Using firestore
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();
exports.helloWorld = async (req, res) => {
const document = firestore.doc('users/steren');
const doc = await document.get();
console.log('Read the document');
res.status(200).send('Hey');
};
GCF gen1:
- cold: 1400 ms
- exec: 84 ms
GCF gen2:
- cold: 1900 ms
- exec: 80 ms
GCF gen2 + enable "second generation execution environment" in Cloud Run
- cold: 1026 ms
- exec: 110 ms
This last experiment tests what I suggested in
And as I suggested in
pi...@gmail.com <pi...@gmail.com> #186
If you throw resources (2GB functions) at the problem, it's not as bad, but the whole point is that it shouldn't take that long on more conservative resources (i.e. 512MB)...
If I'm fetching a plain document stored on another server (i.e. Firestore or an API request), I simply don't need 2GB of memory every single run, since most of the time the function will just be waiting for a response, wasting resources and money.
What's most upsetting about this is the fact that this problem has been known for so many years, it has been P1 for so long, and the comments we are getting are:
- Blaming another library (grpc)
- Asking users to use expensive workarounds (min instances)
- Asking users to check a web-based library for a Node.js problem (firestore/lite)
- Asking users to use more resources than needed (higher-memory functions)
Even with the 2GB, looking at the numbers we can assume there's an 80~100ms wait from the document get, making the cold start 2 to 3 times slower just by using the official firestore library. That's the problem: some delay is understandable, but almost a second to init a library is just too much. I find it surprising how Google keeps making excuses for such bad performance; as someone said, it would be interesting to see this "cold start" happen on the Google search engine...
I would rather have this marked as won't-fix than keep listening to excuses and wasteful workarounds. At the least, be transparent.
st...@google.com <st...@google.com> #187
(the reason I selected 2GB in my previous test is because Cloud Run requires a minimum of 1CPU to enable the second generation execution environment, and picking 2GB gets you 1CPU)
Repeating my tests with 512MB:
Baseline
exports.helloWorld = (req, res) => {
let message = req.query.message || req.body.message || 'Hello World!';
res.status(200).send(message);
};
GCF gen1
- cold: 499 ms
- exec: 4 ms
GCF gen2
- cold: 829 ms
- exec: 4 ms
Using firestore
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();
exports.helloWorld = async (req, res) => {
const document = firestore.doc('users/steren');
const doc = await document.get();
console.log('Read the document');
res.status(200).send('Hey');
};
GCF gen1:
- cold: 1311 ms
- exec: 66 ms
GCF gen2:
- cold: 2200 ms
- exec: 60 ms
I'd really appreciate it if you could share a repository that includes code + package.json + the exact gcloud functions deploy command that reproduces the abnormally high cold start. Thank you.
bu...@gmail.com <bu...@gmail.com> #188
Here's an example that takes ~9 seconds on cold starts, then ~800ms afterwards. It creates/updates a user, then reads/writes to Firestore. Tested on AWS.
The hosting infrastructure doesn't really matter; it's slow on GCF / AWS / other.
st...@google.com <st...@google.com> #189
Sorry, it is out of scope of this bug to fix firebase-admin on AWS runtimes. Could you deploy it to an HTTP Cloud Function with gcloud functions deploy and measure the cold and warm times as I did above?
br...@askgms.com <br...@askgms.com> #190
It really looks like you've already reproduced the issue... or are you implying that jumping from an 829 ms cold start to a 2200 ms cold start from a single library initialization is acceptable from your perspective? Nearly 1500 ms just for one library to initialize certainly doesn't seem acceptable as a general case, and it's obviously not what you would accept internally, since none of your own products seem to suffer 1.5-second bonus lags at random times.
gr...@gmail.com <gr...@gmail.com> #191
I agree with the comment above. What you've shown is the issue. In what world is a 2.2s startup time acceptable for a simple read or write operation?
It's also pretty disappointing to see that the problem actually gets worse in the gen 2 environment.
st...@google.com <st...@google.com> #192
I am only capturing data, not making a judgement. I am not saying 2s is a great cold start; I am observing that 2s does not match the 5s or 12s latencies reported on this bug.
This bug was originally opened for an abnormally high cold start observed when using Firestore client library. If this bug is now used for a more generic "Improve Cloud Functions cold start when using GCP client libraries from 2s to 0.5s", it is a very different issue for us.
gr...@gmail.com <gr...@gmail.com> #193
First of all, thank you for engaging with us on this and actually investigating. To answer your question: I personally never saw anything like 12s cold-starts. I was periodically seeing 5 or 6 second startups around the time this issue was first filed. Subsequent work on the product has improved this for me, but I still regularly see 1.5-3.5s cold-starts, which is not too far from your own data. Those are completely unacceptable numbers for any user-facing task.
In my opinion, nothing about our ask has changed. This is still an issue with the Nodejs Firestore client library seeing extremely high cold-start times. Do you not view a 2s cold start as abnormally high?
br...@askgms.com <br...@askgms.com> #194
I don't want to get too pedantic, but originally this ticket was opened in response to
We definitely appreciate attention on this issue, but we've been getting "attention" for a long while here. Over two years ago we were told that this was the team's "top priority":
So please forgive us if we're losing patience with being asked over and over for the same reproduction scenarios, the same clarifications about why a ticket was opened, etc. We don't get to control opening these tickets, and we've provided extremely generous, clearly-reproducible scenarios many times over, and some of us have even written entire libraries to work around these issues while Google has failed to take meaningful action.
At this point, it feels like at minimum your team should be able to use the basic reproduction case, find the critical path there, and then identify optimizations that address the problem. Please stop telling us this is the top priority and then failing to fix much of anything. Or feel free to close this as "won't do" so at least we know it's time to move on to other NoSQL options.
st...@google.com <st...@google.com> #195
Thanks. I can indeed confirm that we have released many improvements over the past 2 years. And the "second generation execution environment" that Cloud Run is offering in Preview is another one of these that should ultimately make its way to GCF as the default.
Thanks for confirming that the 2s cold start is now what we should be looking into.
I agree 2s is high; we should improve it either by investing in the Cloud Functions execution environment, or by minimizing the footprint of GCP client libraries.
In the meantime, I encourage you to:
- deploy to Cloud Functions 2nd gen, and then go into Cloud Run and set the "max concurrency" from 1 to 100. This allows Cloud Functions to send multiple requests at the same time to the same instance; in practice, this drastically reduces the number of cold starts. (Node.js is designed to handle concurrent requests.)
- set min-instances = 1, so that the 0-to-1 cold start is mitigated by an instance being kept warm (this instance is not charged at full price when it is not processing requests).
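Assuming a 2nd gen function (which surfaces as a Cloud Run service), both mitigations can be sketched from the CLI; the function/service name and region below are placeholders, and the Cloud Run service name may differ in casing from the function name:

```shell
# Raise per-instance concurrency on the underlying Cloud Run service,
# so one warm instance can absorb multiple simultaneous requests:
gcloud run services update helloworld --concurrency=100 --region=us-central1

# Keep one instance warm on an existing gen2 function to mitigate
# the 0-to-1 cold start:
gcloud functions deploy helloWorld --gen2 --min-instances=1 --region=us-central1
```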
ay...@gmail.com <ay...@gmail.com> #196
triggers and also on Cloud Run. We can't rely on audit logs, so we're still on gen 1 Cloud Functions.
You can see the delays (6s-8s) are not acceptable for simple trigger callbacks. The code is actually nothing other than loading a document from Firestore in the trigger.
Note that it's a 256 MB function, but I still didn't expect it to be this slow.
gr...@gmail.com <gr...@gmail.com> #197
The max concurrency suggestion doesn't help my case (a pretty common one I would think), which is sporadic, user-facing, time-sensitive tasks: processing payments, account upgrades, user registrations, etc. These are tasks that don't happen frequently enough to keep any instances warm, even with a fairly significant user base.
The min-instances suggestion has been made multiple times before, and is pretty frustrating, since it's basically asking us to pay extra to receive basic levels of acceptable functionality from this product.
Also to add a +1 — I am also unable to move to gen2, since I rely heavily on Firestore triggers.
bu...@gmail.com <bu...@gmail.com> #198
First request in Cloud Functions was 10.5 seconds; subsequent requests are around 1.4-1.6s.
AWS is twice as fast for the subsequent requests, but I'm assuming that's because their vCPU allocation is probably higher for the 256 MB instance?
hd...@google.com <hd...@google.com> #199
We are working on supporting HTTP REST transport (as an option) in addition to gRPC for the Firestore client, and for other Cloud services as well. As we understand it, switching to the HTTP transport should mitigate the cold start time issue.
We will provide an update early next week on the ETA/timeline.
Thanks,
Hari
gr...@gmail.com <gr...@gmail.com> #200
Amazing news, thank you!
da...@google.com <da...@google.com> #201
I took a look at the test cases Steren provided in #185 / #187.
First, there have been a number of performance improvements over the past couple of years, which have brought down overall cold start latencies for Cloud Functions. Most of these improvements are at the infrastructure layer (e.g. filesystem performance, kernel scheduler improvements, etc), and were not targeting this specific issue. This is why what previously might have taken 6-9 seconds has improved over the years down to more like 2-3 seconds in a fair number of cases.
Second, I agree that the current implementation of node.js gRPC is not optimized for fast cold starts. More on that shortly.
Third, here's an example cold start from Steren's 512 MB gen1 GCF example in #187. If we strace the startup (which admittedly adds a bit of overhead in and of itself), we see the following milestones:
Time (s)   Milestone
==========================================================================================
0.000000   cold start begins
0.221176   node begins loading functions-framework module
0.737450   function loads user code                              <= FUNCTION ENTRY POINT
0.742013   node begins loading @google-cloud/firestore module
1.438190   node finishes loading @google-cloud/firestore module  <= FUNCTION EXECUTION BEGINS
1.842618   function returns result                               <= FUNCTION EXECUTION ENDS
------------------------------------------------------------------------------------------
And looking more closely at the main JS modules being loaded:
functions-framework: stat 881 distinct paths, read 880724 bytes from 252 files = 516 ms
@google-cloud/firestore: stat 1157 distinct paths, read 4989964 bytes from 347 files = 696 ms
In this particular example, I saw around 122 ms of filesystem wait time (this is one of the things we've been optimizing). So now, the overall cold start latency is dominated by the function runtime itself - i.e. the node binary loading all those JavaScript dependencies, parsing / compiling them, etc.
Some of this time is independent of the user code. The node.js Cloud Functions runtimes do some amount of setup work before loading the user code, and this setup includes loading modules such as functions-framework, etc. So, even the most trivial "hello, world" node.js function is going to take 500+ ms for a cold start as things stand right now.
Separate from that, there is the user code and its dependencies... and now we get to the bit about node.js / gRPC. As many have pointed out on this bug, @google-cloud/firestore has a gRPC dependency, which pulls in grpc-js, protobufjs, and google-gax. These are very large dependencies (e.g. google-gax includes multiple megabytes of generated JS from various large proto descriptors), and these massive amounts of generated code take a long time to parse and compile.
In addition, it is very easy in JavaScript to wind up pulling in a large number of transitive dependencies. Hence we find ourselves loading 600 files (some of which are quite large) to run what looks like a trivial snippet of code.
I have not profiled an example that includes firebase-admin, but I suspect it is more of the same.
In summary, the update in #199 (HTTP/REST Firestore client) is likely the best path forward for reducing cold start latency. The combination of gRPC / protobuf (with its descriptors / schemas) leads to a lot of generated code that is slow to load. This is not the case in every language runtime, but is certainly the case with node.js today.
gr...@gmail.com <gr...@gmail.com> #202
Thanks for the update - I'm very excited for HTTP/REST support. I currently use a mix of REST for time-sensitive functions and firebase-admin where that's less important, and I would absolutely love to be able to refactor all of that to just use an HTTP version of the Node.js Firestore client.
fe...@google.com <fe...@google.com> #203
I did some work decoupling @grpc/grpc-js from google-gax:
- https://github.com/googleapis/gax-nodejs/pull/1326 will help client libraries avoid loading @grpc/grpc-js if they only intend to use the fallback (HTTP) version of google-gax;
- https://github.com/googleapis/gapic-generator-typescript/pull/1224 updates the generated client libraries to allow passing an instance of google-gax, which could be either require('google-gax') or require('google-gax/build/src/fallback'); in the latter case @grpc/grpc-js will never get loaded.
After these two PRs are merged and released, we'll make the Firestore library use this so it will only load the fallback implementation by default.
br...@askgms.com <br...@askgms.com> #204
Thanks a ton for putting this together! It seems like these PRs may be exactly what’s needed to actually address the root problems, at least for many of us. Have any performance comparisons been done yet?
bu...@gmail.com <bu...@gmail.com> #205
This would significantly improve the authentication speed for people using custom tokens or working with the admin SDK to manage users.
Registering a user via the admin SDK and creating a firestore document for the user can take 8 seconds cold, which feels like an awfully long time to wait from the frontend.
fe...@google.com <fe...@google.com> #206
For those following this problem, I put together the fixes we were working on into a pre-release. Please try
npm install @google-cloud/firestore@6.1.0-pre.0
and pass preferRest: true as an option to the constructor to enable the REST transport:
const db = new Firestore({preferRest: true});
I see an improvement in my quick tests, please let us know if it makes things better for you.
gr...@gmail.com <gr...@gmail.com> #207
Amazing, thanks for the update. I'm testing this in real world conditions and will post again when I have results.
be...@gmail.com <be...@gmail.com> #208
fl...@gmail.com <fl...@gmail.com> #209
I've been waiting for this for a long time, around 2 years, and I'm glad there's finally progress.
Best Regards,
Derrick
gr...@gmail.com <gr...@gmail.com> #210
I'm seeing a significant performance boost from this in my testing so far. Very excited to see this happening! Can't wait for it to trickle down to firebase-admin, which will simplify usage.
at...@protonmail.com <at...@protonmail.com> #211
Been very pleased with the @bountyrush performance, although an option that Google supports directly would be much more appealing, as I'm sure the developer can't commit much time to it.
fe...@google.com <fe...@google.com> #212
Hi folks,
TL;DR: We released @google-cloud/firestore v6.2.0 with the HTTP/1.1 REST transport. Please use it with {preferRest: true}. This bug will be closed.
Now that this bug is more than 2 years old, let me summarize what we did during this time and what the current state is.
One of the main findings here was that the slow cold start times could be linked to filesystem access during the cold start, and to loading the gRPC library. During these two years, we implemented an alternative HTTP/1.1-only transport, and also reduced the number and size of files accessed during library load. Since some Firestore functionality depends on gRPC (RPCs that require bi-directional streaming must use gRPC), the HTTP transport will be used whenever possible, switching to gRPC if needed. We made the gRPC import conditional, so that it never tries to read any gRPC file from node_modules unless a bi-directional streaming call is requested.
Today we released @google-cloud/firestore v6.2.0, which includes all the fixes from the previously published pre-release, plus some reduction in the size of the files it loads during startup. Please note that the HTTP transport is currently not the default option, and must be enabled by passing {preferRest: true} to the Firestore constructor:
const db = new Firestore({preferRest: true});
// chooses HTTP or gRPC as needed, defaults to HTTP
Note: the change we made affects not only Firestore, but most of our other libraries (most of them have auto-generated parts that now support the HTTP transport). E.g. if you are creating the Firestore Admin client directly, you can avoid loading gRPC by requesting only the HTTP part of our transport library, google-gax, and enabling the "fallback" mode:
const gax = require('google-gax/build/src/fallback');
// avoids loading google-gax with gRPC
const adminClient = new FirestoreAdminClient({fallback: 'rest'}, gax);
We'll eventually make it the default transport; since it's a big change in how the library behaves, the default change will likely go to the next major version (e.g. when we drop Node.js v12 support next year). For now, the HTTP transport will stay behind this constructor option.
We expect this release to improve cold start times. I saw the comment about @bountyrush/firestore performance, and I will take a look to see if we can improve things even more. At the same time, a bug that has been open for 2 years does not help in tracking the problem at all, since a lot of things have changed and improved since it was opened.
So, the summary is:
- I will close this bug as Fixed, since v6.2.0 should resolve most of the slow cold start concerns.
- Please update @google-cloud/firestore to v6.2.0, and pass {preferRest: true} if you experience a slow cold start problem.
- Please feel free to open new bug reports here if you keep having slow start problems, or contact support if you have a support contract. We are committed to improving the customer experience with our libraries, and we appreciate all bug reports and feature requests.
gr...@gmail.com <gr...@gmail.com> #213
bu...@gmail.com <bu...@gmail.com> #214
Thanks so much for the work on this!
Could you please clarify how this would work for imports, or where we need to init Firestore with a specific app? E.g.:
import {getFirestore} from "firebase-admin/firestore"; // could also be from "firebase/firestore"
const f = getFirestore(someSpecificAppInstance);
fe...@google.com <fe...@google.com> #215
Re: firebase-admin, its latest version depends on @google-cloud/firestore v5, while the GitHub code already depends on ^6.0.0, so I'm guessing it needs a way to pass preferRest to the Firestore instance through its options, and then an npm release. This is better tracked in
Re: firebase, it does not use @google-cloud/firestore at all, providing its own Firestore implementation (actually, two of them: the gRPC implementation and the lite HTTP implementation). You might just be able to follow @firebase/firestore, which is a separate codebase with an already-existing lite implementation that does not load gRPC.
Description
Update, Jan 8th 2021
👋 I’ve updated my initial post and the title to be more specific, based on the problems still being discussed in this thread:
Cold Start performance issues seem to correlate closely with gRPC libraries (like Firestore). Folks switching to HTTP dependencies, from gRPC, have seen performance improvements (this seems to point to gRPC as well).
If the problems you’re running into do not seem to correlate to gRPC SDKs, such as Firestore, please don’t hesitate to open an issue and we will investigate (Also, I’ve pulled together a Tips & Tricks post , based on some of what I’ve learned investigating this thread which might help).
We continue to work on cross cutting features that will help cold start performance in general, e.g., Min Instances for Cloud Functions, and will keep this thread updated as features roll out.
Problem you have encountered:
We've had a long standing issue on GitHub related to cold start performance of Cloud Functions.
The original issue was that the grpc dependency was quite large, and could lead to several additional seconds in load times. We have since moved to @grpc/grpc-js, and that issue has been addressed: the load times of modules such as @google-cloud/firestore during cold start now seem reasonable.
Despite module load times being reasonable, customers are seeing long delays when accessing cloud functions on cold starts:
What you expected to happen:
A cold start time within a reasonable threshold, ideally <1s, vs. the observed 5s.
Steps to reproduce:
The below code can be used to benchmark the time spent loading modules, during GCF initialization:
The URL of the Cloud Function can then be plugged into Timeline Viewer.
Customers indicate that, on cold starts, the time to first byte in the browser is significantly delayed from the observed load time of the function.