Status Update
Comments
wa...@google.com <wa...@google.com> #2
The original problem seems to be that an interrupt handler uses cprints() to print an error message. cprints() prints the timestamp. As the timestamp grows, the divmod takes longer to compute. That delays the interrupt handler and causes a more serious failure.
A good practice may be avoiding cprints() in interrupt handlers. Use cprintf() instead, which doesn't show the timestamp.
Alternatively, when cprints() is called from an interrupt handler, it could print only the last digit of the integer (seconds) part plus the whole fractional part, so the print time is fixed. The tens-of-seconds digits are likely the same as in the previous messages anyway.
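A minimal sketch of that fixed-width idea, assuming the timestamp is a 64-bit microsecond count. format_short_timestamp() and SHORT_TS_LEN are hypothetical names for illustration, not existing EC functions. It always extracts exactly seven digits (one seconds digit plus six fractional digits), so the number of divide-by-10 steps does not depend on uptime:

```c
#include <stdint.h>

#define SHORT_TS_LEN 8 /* "S.ffffff": 1 seconds digit, '.', 6 fraction digits */

/*
 * Hypothetical helper (not part of the EC code base): format a microsecond
 * timestamp as the last digit of the seconds plus the full fractional part.
 * The loop always performs exactly 7 divide-by-10 steps.
 */
void format_short_timestamp(uint64_t us, char out[SHORT_TS_LEN + 1])
{
    int i;

    /* Fill from the least significant digit (right) to the left. */
    for (i = SHORT_TS_LEN - 1; i >= 0; i--) {
        if (i == 1) {
            out[i] = '.';
            continue;
        }
        out[i] = (char)('0' + (us % 10));
        us /= 10;
    }
    out[SHORT_TS_LEN] = '\0';
}
```

With this sketch, a timestamp of 12345678 us (12.345678 s) formats as "2.345678". On the EC each `%` and `/` here would presumably go through uint64divmod(), so the cost per step can still differ between the 32-bit fast path and the slow loop; only the number of steps is fixed.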
wa...@google.com <wa...@google.com> #3
This is the implementation of uint64divmod().
int uint64divmod(uint64_t *n, int d)
{
    uint64_t q = 0, mask;
    int r = 0;

    /* Divide-by-zero returns zero */
    if (!d) {
        *n = 0;
        return 0;
    }

    /* Common powers of 2 = simple shifts */
    if (d == 2) {
        r = *n & 1;
        *n >>= 1;
        return r;
    } else if (d == 16) {
        r = *n & 0xf;
        *n >>= 4;
        return r;
    }

    /* If v fits in 32-bit, we're done. */
    if (*n <= 0xffffffff) {
        uint32_t v32 = *n;
        r = v32 % d;
        *n = v32 / d;
        return r;
    }

    /* Otherwise do integer division the slow way. */
    for (mask = (1ULL << 63); mask; mask >>= 1) {
        r <<= 1;
        if (*n & mask)
            r |= 1;
        if (r >= d) {
            r -= d;
            q |= mask;
        }
    }
    *n = q;
    return r;
}
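It gets worse once the timestamp crosses the 32-bit boundary, because every call then falls through to the slow 64-iteration loop instead of the 32-bit fast path. For a microsecond timestamp that happens at around 1 hour 11 minutes of uptime (2^32 us is about 4295 seconds).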
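To make the boundary concrete, here is a small sketch that counts how many divide-by-10 steps would take the slow path while printing a timestamp. It assumes the decimal digits are produced by calling uint64divmod(&v, 10) repeatedly until the value reaches zero; that is an assumption about the print routine, and count_slow_divmods() is a hypothetical helper written only for this illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper: count how many divide-by-10 steps would miss the
 * 32-bit fast path in uint64divmod() when converting ts to decimal.
 * Each such step runs the full 64-iteration shift-and-subtract loop.
 */
int count_slow_divmods(uint64_t ts)
{
    int slow = 0;

    while (ts) {
        if (ts > 0xffffffffULL)
            slow++; /* value does not fit in 32 bits: slow loop */
        ts /= 10;
    }
    return slow;
}
```

Under that assumption, a timestamp of 0xffffffff us needs no slow steps, 2^32 us needs one, and one day of uptime (86400000000 us) needs two.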
Description
In the private bug 180438354 we ended up with a failure that would only show its face after the EC was up for a day. Rebooting the EC would make it go away.
This isn't so wonderful, and it meant that the bug escaped notice for quite a while. Personally, I always have so many pieces of hardware and random firmware versions that if I see what looks like an EC problem, my first instinct is to update my firmware and then try to reproduce. That always "fixed" the problem for me.
I believe the problem is that 64-bit math is pretty heavy for our ECs and so this loop takes a non-trivial amount of time:
In theory, I guess we could force it to do a fixed number of division operations (even if we don't print out the extra zeros) so that it always takes the same amount of time.
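A rough sketch of the fixed-count idea, under the assumption that digits are produced by repeated divide-by-10. u64_to_dec_fixed() and U64_MAX_DIGITS are hypothetical names, not existing EC code. It always performs 20 division steps (the maximum number of decimal digits in a uint64_t) and drops the leading zeros only when copying the result out:

```c
#include <stdint.h>

#define U64_MAX_DIGITS 20 /* UINT64_MAX has 20 decimal digits */

/*
 * Hypothetical helper: convert v to decimal using a fixed number of
 * divide-by-10 steps, regardless of how large v is. Leading zeros are
 * computed but not emitted. Returns the length of the string in out.
 */
int u64_to_dec_fixed(uint64_t v, char out[U64_MAX_DIGITS + 1])
{
    char digits[U64_MAX_DIGITS];
    int first = U64_MAX_DIGITS - 1; /* keep at least one digit for v == 0 */
    int i, len;

    /* Always run all 20 steps, least significant digit first. */
    for (i = U64_MAX_DIGITS - 1; i >= 0; i--) {
        digits[i] = (char)('0' + (v % 10));
        v /= 10;
        if (digits[i] != '0')
            first = i; /* ends up at the most significant non-zero digit */
    }

    len = U64_MAX_DIGITS - first;
    for (i = 0; i < len; i++)
        out[i] = digits[first + i];
    out[len] = '\0';
    return len;
}
```

This fixes only the number of divisions. With the uint64divmod() shown in the comments, each division still costs more while the remaining value is above 32 bits, so the total time would be bounded rather than strictly constant unless the fast path were also bypassed.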