Infeasible
Status Update
Comments
le...@practiv.com #2
Correction:
So, the end result is that usb_write() sometimes sends a ZLP when it doesn't need to
should be
So, the end result is that usb_write() sometimes sends a ZLP when it doesn't need to, and sometimes neglects to send a ZLP when it should
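For context, a bulk transfer needs a trailing ZLP only when its length is a nonzero multiple of the endpoint's max packet size (512 bytes for USB 2.0 high-speed bulk). A rough way to poke at that boundary from the host, purely as a sketch, is to push a payload sized to an exact multiple of that; the file name and sizes are illustrative, and whether a given push actually lands on the boundary depends on how adb chunks the data:

# Build a payload whose size is an exact multiple of 512 bytes and push it
# over USB to exercise the packet-boundary case.
dd if=/dev/urandom of=payload.bin bs=512 count=2048
adb push payload.bin /data/local/tmp/payload.bin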
le...@practiv.com #3
Note that the CHECK_LE has since been replaced with HandleError(), which causes adbd to not abort...but that stowaway amessage header is being dropped all the same...and so the behavior is pretty much undefined, I imagine. I'm not sure the switch to HandleError was a good idea, since it more or less swept the real problem under the rug.
ba...@google.com
ma...@google.com #4
good bug report, thanks! i'm assuming this isn't darwin-specific either --- it looks like we have similar logic using masks in the linux and windows backends and in the libusb backend too.
I'm not sure the switch to HandleError was a good idea, since it more or less swept the real problem under the rug.
yeah, i know what you mean, but it's also hard to argue with this logic in the commit message that made that change:
These CHECKs are expected to happen if the client does the wrong thing,
so we probably shouldn't be aborting in adbd.
a CHECK in the client (as you suggested earlier) would probably have been the best idea... postel's law and all that :-)
Description
Context
I have two simple PHP apps running in GKE as Kubernetes Deployments. They are APIs supporting a website, named api-deployment and plugin-id-deployment. The api-deployment is associated with a service named api-service, which is used by the plugin-id application, as it makes calls to the api-deployment workload. The api-deployment has an HPA associated with it.

The Problem:
We noticed that, when traffic at the website starts dropping after peak load and the HPA starts scaling down the api-deployment, a few requests from the plugin-id to the api-deployment app fail. Looking at the logs, we can see that GKE sends a command to terminate some of the pods before it updates the api-service (via endpoint slices) to mark these pods as "terminating" and "not ready". As a consequence, we can see the pods announcing that a SIGTERM has been received while the Kubernetes service is still sending new traffic to them, which results in some request failures. You can see logs attached showing the issue.
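One way to watch the ordering described above during a scale-down, as a sketch only: the kubernetes.io/service-name label on EndpointSlices is standard, but the pod label selector is an assumption, so use whatever labels the deployment really sets.

# Terminal 1: watch the EndpointSlices backing api-service; the ready /
# serving / terminating endpoint conditions show when an endpoint is
# marked not-ready.
kubectl get endpointslices -l kubernetes.io/service-name=api-service -o yaml -w

# Terminal 2: watch the api pods as they get terminated.
kubectl get pods -l app=api-deployment -w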
Note 1: Ignore the fact that SIGWINCH is being reported instead of SIGTERM; that's because Apache2 misuses these signals.

Note 2: The api-deployment has a graceful termination period of 30s, but its container terminates very quickly; it only needs a second to respond to pending requests and shut down. Incoming requests start being rejected as soon as the termination signal is received.

What you expected to happen:
I expected the service to mark some of the pods as "terminating" and "not ready" BEFORE they were deleted, to ensure availability. In other words, I expected the service endpoint and endpoint slices requests to be logged before the deletion of the pods and the termination signal logs.
Steps to reproduce:
You don't really need an HPA to reproduce; you can manually force a scale-down, which produces the same result (see the sketch after the wrk command below).
Run the load test:
wrk -t15 -c20 -d30s --latency --timeout 3 https://myapp.com/app-a
This command tests for 30 seconds. You should see that wrk reports failing requests.
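To force the scale-down by hand while the wrk run above is in progress, something like the following should trigger the same race (the deployment name comes from the report; the replica counts are only illustrative):

# Scale up first so there is something to remove...
kubectl scale deployment api-deployment --replicas=4
# ...then, while wrk is still running, scale back down.
kubectl scale deployment api-deployment --replicas=1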
Workaround attempted
I've tried adding a preStop hook on deployment B (api-deployment) to make it sleep for 5 seconds. It seems to reduce the number of errors reported/logged, but doesn't completely resolve the issue.
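For reference, a preStop sleep along those lines can be added with a patch roughly like this. A sketch only: the container name "api" is an assumption, and strategic merge patches match list entries by name, so substitute the real container name.

# Add a 5-second preStop sleep to the api container of api-deployment.
kubectl patch deployment api-deployment --patch '
spec:
  template:
    spec:
      containers:
        - name: api
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]
'

The reason a preStop sleep helps is that the kubelet runs the hook before sending SIGTERM, while endpoint removal proceeds in parallel; the sleep buys the endpoint controllers time to propagate the terminating state, which is consistent with the errors being reduced but not eliminated.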