Assigned
Status Update
Comments
mi...@google.com <mi...@google.com>
at...@google.com <at...@google.com> #2
The Cloud ML Engine engineering team has been notified of the interest in having this feature implemented.
Kindly note there are no ETAs or guarantees for this feature to be available. Future updates will be shared on this thread.
Description
What you would like to accomplish:
In order to reduce the latency of online predictions, the customer considers it necessary to have some customization options for the autoscaling algorithm.
How this might work:
• Option to specify the idle time: Currently, the documentation states that the service scales down to zero after several minutes without a prediction request. The customer wants to specify how many minutes the workers should stay up before the service scales down.
• Option to combine a minimum node count with scaling down to zero: Currently, setting the minNodes option disables scaling down to zero. The customer wants the service, after it has scaled down to zero and then receives a new request, to scale back up to a minimum number of nodes.
• Option to add nodes more aggressively: For example, the customer wants to specify that if utilization rises above 50%, more nodes are added. A sketch of how these options might combine is shown after this list.
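To make the requested behaviour concrete, below is a minimal, hypothetical sketch of the scaling policy described above. The names (idle_timeout_minutes, min_nodes_on_wakeup, scale_up_utilization) are not existing Cloud ML Engine / AI Platform options; they are assumptions used only to illustrate how the three requests could be expressed as a single policy.

```python
import time
from dataclasses import dataclass


@dataclass
class AutoscalingPolicy:
    # Hypothetical knobs illustrating the feature request; these are NOT
    # existing Cloud ML Engine / AI Platform options.
    idle_timeout_minutes: float = 10.0   # keep workers up this long with no traffic
    min_nodes_on_wakeup: int = 2         # nodes to start when waking up from zero
    scale_up_utilization: float = 0.5    # add nodes once utilization exceeds this


class Autoscaler:
    def __init__(self, policy: AutoscalingPolicy):
        self.policy = policy
        self.nodes = 0
        self.last_request_time = None

    def on_request(self, now: float) -> None:
        """Called when a prediction request arrives."""
        self.last_request_time = now
        if self.nodes == 0:
            # Request 2: after scaling to zero, come back up to a minimum
            # node count rather than a single node.
            self.nodes = self.policy.min_nodes_on_wakeup

    def on_metrics(self, now: float, utilization: float) -> None:
        """Called periodically with the current aggregate node utilization."""
        if self.nodes == 0:
            return
        # Request 3: scale up aggressively once utilization crosses a
        # user-defined threshold (e.g. 50%).
        if utilization > self.policy.scale_up_utilization:
            self.nodes += 1
        # Request 1: only scale down to zero after a user-defined idle period.
        idle_seconds = now - (self.last_request_time or now)
        if idle_seconds >= self.policy.idle_timeout_minutes * 60:
            self.nodes = 0


if __name__ == "__main__":
    scaler = Autoscaler(AutoscalingPolicy(idle_timeout_minutes=15,
                                          min_nodes_on_wakeup=2,
                                          scale_up_utilization=0.5))
    t = time.time()
    scaler.on_request(t)                  # wake up from zero -> 2 nodes
    scaler.on_metrics(t + 60, 0.7)        # 70% utilization -> 3 nodes
    scaler.on_metrics(t + 20 * 60, 0.0)   # idle for 20 minutes -> back to 0 nodes
    print(scaler.nodes)
```

This is only a simulation of the desired policy, not a proposal for how the service would implement it internally.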
If applicable, reasons why alternative solutions are not sufficient:
Low latency is business critical for the customer, but they also want to avoid paying for unused resources during periods when their own customers are unlikely to submit requests. The options above would let them guarantee low latency most of the time without keeping idle nodes running.
Other information (workarounds you have tried, documentation consulted, etc):