Comments
br...@google.com <br...@google.com> #2
Information redacted by Android Beta Feedback.
ap...@google.com <ap...@google.com> #4
Please provide the requested information to proceed further. Unfortunately the issue will be closed within 7 days if there is no further update.
ap...@google.com <ap...@google.com> #5
Branch: master
commit f460e88508d0f76c1b4451d81d7e479d14be5ded
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Oct 26 15:59:33 2023
make min compile time constant
Isolate from the clock on the build system, and make the build
reproducible regardless of when the package was built.
Pick a time that is kept in sync with platform2/init.
BUG=b:307794426
TEST=CQ passes
Change-Id: If0afa919047f7b1548938f251995728301ce31d3
Reviewed-on:
Commit-Queue: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
M
va...@google.com <va...@google.com> #6
i thought tlsdated was using the max time to reset the initial clock like it does with the min (compile) time. i didn't realize it was a clamp on the time passed back from the server.
i understand that capping the clock means it causes issues if that software is run in <whatever future timeframe>. i contend that the software OS wouldn't be reasonably usable in the first place.
- root CA certs expire in shorter timespans which means you can't make HTTPS connections -- this has happened once already in the existing 10 year lifespan of CrOS
- encryption/hashing algorithms evolve -- look at the turndown of md5/sha1 in certs, and of lower RSA keysizes, and the rise of ECC
- web standards evolve and require updated browsers for JS/HTML/CSS standards -- gmail is killing basic HTML in Jan 2024
- physical network standards change. i thought i saw chatter of newer routers dropping support for older 802.11 standards, but i can't find a link. so i'll tinfoil hat this.
- network protocols change. IPv4-vs-IPv6 obviously, and to a lesser degree, the other things we build off of like DHCP & DNS. admittedly weak based on how hard people have bent over so far here.
- all 32-bit devices we shipped have gone EOL, and all of them will def stop working in 2038 -- 15 years from now.
- i wonder if we're checking the range of RTC's that we're shipping now ... if they use seconds, then 2038 is a hard limit. i see a couple use days, and allocate only 15 bits, which means 1970+89->2059 (in <100 years).
Google services def won't work indefinitely with outdated browsers. which means the only thing your EOL CrOS device could do is guest mode. and maybe you have to manually configure the network to do so.
so i don't think "we need to write our software with the assumption it will be used forever and never update" is a reasonable balance to reducing the impact of bugs. will an OS build work for 5 years without updates ? most likely. 10 years ? prob ? 15 years ? getting sketchy. 100 years ? no ... and i doubt the hardware is going to survive in any fashion we care about.
from reading the kernel code, 64-bit time_t has a ceiling of 2232:
- it's signed -> 63-bits
- it stores nanoseconds, not seconds, since epoch -> scale down by 10^9
- linux has 30 year max uptime (see include/linux/time64.h:TIME_UPTIME_SEC_MAX)
- 1970 + ( (2 ^ 63) / (365 * 24 * 60 * 60 * 1000 * 1000 * 1000) ) - 30 -> 2232
so what is your number ? 200 years ? 2232 is 209 years from now ...
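The ceilings quoted in this comment can be re-derived with a few lines of integer arithmetic. This is only a sketch using the same simplifications as the text (365-day years, integer division); the function names are made up for illustration and do not come from the kernel or from tlsdate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 365-day years, same simplification as the comment
EPOCH_YEAR = 1970


def ktime_ceiling_year(uptime_reserve_years: int = 30) -> int:
    """Ceiling for a signed 64-bit nanosecond counter, minus the max-uptime reserve."""
    years = (2 ** 63) // (SECONDS_PER_YEAR * 10 ** 9)
    return EPOCH_YEAR + years - uptime_reserve_years


def time32_ceiling_year() -> int:
    """Year in which a signed 32-bit seconds counter runs out."""
    return EPOCH_YEAR + (2 ** 31) // SECONDS_PER_YEAR


def rtc_days_ceiling_year(bits: int = 15) -> int:
    """Year in which an RTC that counts days in `bits` bits runs out."""
    return EPOCH_YEAR + (2 ** bits) // 365


if __name__ == "__main__":
    print(ktime_ceiling_year())     # 2232
    print(time32_ceiling_year())    # 2038
    print(rtc_days_ceiling_year())  # 2059
```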
ap...@google.com <ap...@google.com> #7
Branch: master
commit 224a83e39627ff1511a8b94adff8ebbec2fc7ae3
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Oct 26 16:00:32 2023
make max time constant relative to min compile time
tlsdate has been hardcoding 18 May 2033 as the max valid time it will
accept. This has been "really far in the future", but it's now less
than 10 years away. Let's change the max valid time to be relative to
the min compile time that we already enforce.
For now, set the window to 15 years. This matches platform2/init.
BUG=b:307794426
TEST=CQ passes
Change-Id: Ia24464a52dd50491722aefa73133577f5c037910
Reviewed-on:
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
M
br...@google.com <br...@google.com> #8
I just wanted to start from understanding what tlsdated does, and what (if anything) we want to imitate from it. Once we have that out of the way, it's easier to make further decisions.
I also agree that putting some cap is reasonable too, but at least now I think we're agreeing it's mostly to guard against an outrageous RTC behavior or against very edge-case (as in, overflows, etc.) time boundaries.
I was personally considering the "threat" model of "what if we forget to update the year in the source code for a while?". (I wasn't even personally aware of this hard-coded year until now, and it seems mostly you (vapier@) who's been updating them. It seems reasonable to call this a low bus factor at the moment...) So supposing we don't update the max time for a few years, and we have 10 year AUE...then we might be shipping Final(TM) OS's to people that have a 5 year clock lifetime. I think that's unreasonable, and you seem to agree.
And sure, I understand that an OS built today is unlikely to be performing pretty much any useful function in 15-100 years. Your list is pretty on-track there, with varying degrees of strength.
As a rough ballpark, I would probably pick something on the order of 100 years. Adding digits (e.g., I've seen BCD-like RTCs that count from the year 2000) and things on that scale seem like a realistic concern for brokenness. And if Linux 64-bit time is really a problem in the 22xx's, we might even cap that independently too.
Side note: this is a public issue, but I'll note that we have some amount of data on active devices (i.e., reporting metrics, etc.) that are years past AUE. So we can probably get at least a lower bound on expected post-AUE lifetime.
va...@google.com <va...@google.com> #9
the Linux 22xx cap means attempts to set the date that far in the future will fail with EINVAL. so adding our own check to reject it ahead of time wouldn't really add any value.
100 seems too far to be reasonable. 15 seems too low if we fall behind in the updates. how about 30 ?
i wrote a doc on this and sent it to you for pre-review ... part of that made me come across our reliance on TLS 1.2 behavior with a single Google server. TLS 1.3 doesn't support tlsdate, and it sounds like internal commitments are only going as far as 2030 atm (we can assume it won't end sooner, but i don't know we have a promise that it'll go further). so 30 years seems more than reasonable.
br...@google.com <br...@google.com> #10
> the Linux 22xx cap means attempts to set the date that far in the future will fail with EINVAL. so adding our own check to reject it ahead of time wouldn't really add any value.
I was assuming you were interested in signed overflows. Suppose we allow a time that's almost Y2232 (due to a buggy RTC), and we have software that starts projecting schedules (e.g., monthly, yearly, ... enterprise features?) into the future; we might land in negative time territory. But user space doesn't tend to operate on the kernel's version of time_t (does it ever?), but on a wider-range format, so maybe EINVAL at the kernel boundary is the only thing that will happen.
In that case, we're only guarding against buggy RTC? And what's the effect of a (too large) bad RTC value?
- Bad UX (confusingly wrong time)? That would happen anyway. Why is <build_year> (definitely the wrong time) better than <large_year> (possibly the right time)?
- Bad security? Expiring certificates? AIUI, it's safe to roll time forward (and expire certificates; we'd rely on tlsdated to see if we can recover a reasonable-looking mostly-TLS-passing time); it's less safe to roll time backward, such that previously-expired certificates become valid again.
So I'm still not really getting the point. I'd rather let tlsdate handle the correction, if it can swing it.
But even so:
> how about 30 ?
Sure, that at least doesn't hit my complaint with 15. I could live with that.
(I've left a few comments on go/cros-timekeeping. I might revisit still.)
va...@google.com <va...@google.com> #11
i read the tlsdated code. i think we're both correct on some points. while there is 1 define for the min (RECENT_COMPILE_DATE), the codebase has 2 defines for the max (TLSDATED_MAX_DATE & MAX_REASONABLE_TIME) which has led to confusion. i posted a CL to update both at build time to the same constant (compile time + 15 years atm).
TL;DR
once we exceed max time in the real world, it's game over
- tlsdated will automatically rewind the time to the RECENT_COMPILE_DATE at boot, and set the RTC & system clock immediately
- tlsdate will reject server times that exceed the max time, both to verify the certificate, and to set the system clock
- tlsdated will ignore requests over dbus to manually set the time (which is how crosh & Chrome set the clock)
so you could manually set the clock to the max time that tlsdated would accept, and then let the clock tick by itself, but you'd have to do this on every boot
tlsdated behavior (TLSDATED_MAX_DATE)
tlsdated is the main daemon that runs the event loop, including dbus, and manages the clock resources (notably the RTC). it does not talk to the network by design. it uses the max time when starting up to clamp the clock.
- TLSDATED_MAX_DATE is only used in 1 place
  - tlsdated.c:49 is_sane_time() which checks if a timestamp is newer than the compile time & older than the max time
- is_sane_time() is called in a few places
  - tlsdated.c:79 load_disk_timestamp() to verify the saved timestamp is within range, otherwise it ignores it
  - tlsdated.c:552 main() to verify the loaded timestamp is within range, otherwise it ignores it -- this call is redundant since load_disk_timestamp() already did it, so we can ignore it
  - tlsdated.c:564 main() to verify tlsdated's initial timestamp at startup is within range
    - this would be the system clock (which came from the RTC), or the disk timestamp (although this would be the 3rd time it was checked)
    - if it's out of range, the initial timestamp is reset to the build time (RECENT_COMPILE_DATE)
    - an E_SAVE event is triggered with the timestamp to update clocks -- so if it's out of range, tlsdated pulls it back
  - tlsdate-setter.c:175 time_setter_coprocess() is the E_SAVE event handler
    - it's a sep process that reads the timestamp sent to it over an fd
    - if the timestamp is out of range, it's ignored, and keeps running
    - this is the only func that syncs the RTC via sync_hwclock()
    - this is the only func that syncs the system clock via settimeofday()
    - this is the only func that writes the timestamp to the disk
  - dbus.c:379 handle_set_time() is a dbus callback for org.torproject.tlsdate SetTime
    - if the dbus request tries to set the time outside of the min/max range, it ignores it and returns an error
    - crosh's set_time helper calls this to let people manually set the clock when network sync isn't working
    - Chrome uses this API to set the clock too (presumably via the OS settings dialog, but i didn't dive deeper)
    - so it's not possible to manually set the time beyond the max limit
  - events/tlsdate_status.c:76 action_tlsdate_status() is the E_TLSDATE_STATUS event handler
    - if the timestamp is out of range, it's ignored, and keeps running for the next event
    - if the timestamp is within range, it triggers an E_SAVE event
    - E_TLSDATE_STATUS is only triggered by action_run_tlsdate, and that's only triggered by E_TLSDATE event
    - E_TLSDATE is triggered by many things, but i don't think how super matters at this point as it all ends up in the E_SAVE handler which is described above -- basically, when tlsdate works, we trigger these events
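The tlsdated behavior above comes down to one range check that different callers react to differently. The following Python sketch models only that logic; it is not the tlsdated source (which is C). The constant values, the 365-day year, the strict comparisons, and the helper names initial_timestamp() and accept_set_time() are assumptions for illustration.

```python
# Hypothetical build-time constant (a timestamp in late October 2023, UTC).
RECENT_COMPILE_DATE = 1_698_336_000
# "compile time + 15 years", assuming 365-day years.
TLSDATED_MAX_DATE = RECENT_COMPILE_DATE + 15 * 365 * 24 * 60 * 60


def is_sane_time(ts: int) -> bool:
    """True if ts is newer than the compile time and older than the max time.

    Whether the real check includes the endpoints is not stated in the thread;
    strict comparisons are an assumption here.
    """
    return RECENT_COMPILE_DATE < ts < TLSDATED_MAX_DATE


def initial_timestamp(clock_or_disk_ts: int) -> int:
    """Startup path: an out-of-range timestamp is pulled back to the build time."""
    return clock_or_disk_ts if is_sane_time(clock_or_disk_ts) else RECENT_COMPILE_DATE


def accept_set_time(requested_ts: int) -> bool:
    """dbus SetTime path: an out-of-range request is refused rather than clamped."""
    return is_sane_time(requested_ts)
```

The difference between the two helpers is the "game over" case from the TL;DR: once real time passes the max, the startup path rewinds the clock and the manual path refuses to move it forward again.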
tlsdate behavior (MAX_REASONABLE_TIME)
tlsdate is the helper program that tlsdated periodically spawns in a sandbox to talk to the network & set the system clock. it takes care of all the messy TLS/certificate stuff. it does not sync the RTC.
NB: there's tlsdate & there's tlsdate-helper. tlsdated invokes tlsdate with CLI options which it parses before execing tlsdate-helper. basically tlsdate just normalizes arbitrary --options into a static list of options. not sure why it bothers, but it does.
side note: the TLS connection uses a 32-bit time offset, so filed
- MAX_REASONABLE_TIME is used in 2 places
  - tlsdate-helper.c:375 verify_with_server_time() is called when --leap is used with tlsdate (which we do) to check the server time
    - it reads the time reported by the server in the TLS connection and checks it against max_reasonable_time
    - if it's out of range, we abort -- thus tlsdate will never sync once it reaches this limit
    - if it's in range, the time used to verify the certificate is set to the server time instead of the local time
    - somewhat ironically, if we didn't use --leap, then server & certificate time that exceeds the max time would be accepted ... although only for the purpose of cert verification
  - tlsdate-helper.c:1103 main() checks the server time before setting the clock
    - it creates 32-bit mmap for time_map
    - calls run_ssl() in a child to read the server time via the TLS connection that it establishes & verifies
    - the time_map is converted from network byte order to cpu byte order and set in server_time_s (uint32_t)
    - server_time_s is turned into server_time (struct tlsdate_time)
    - if setclock is enabled (which we do), the time is compared to the min & max times, and if they're out of range, we abort -- thus tlsdate will never sync once it reaches this limit
    - otherwise the time is written to the system clock via clock_set_real_time_linux
      - this is the only func that syncs the system clock via clock_settime(CLOCK_REALTIME)
        - NB: clock_settime(CLOCK_REALTIME) is also used earlier in main() if --timewarp is enabled, but we don't do that
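The byte-order conversion and range check in tlsdate-helper's main() can be illustrated with a small Python model. This is not the helper's code: MIN_TIME, MAX_TIME, and both function names are hypothetical, and treating the endpoints as in range is an assumption.

```python
import struct

# Hypothetical limits: a late-October-2023 build time and build time + 15 years.
MIN_TIME = 1_698_336_000
MAX_TIME = MIN_TIME + 15 * 365 * 24 * 60 * 60


def decode_server_time(raw: bytes) -> int:
    """Convert the 4-byte network-order (big-endian) timestamp to a host integer."""
    (server_time_s,) = struct.unpack("!I", raw)  # "!I": network order, unsigned 32-bit
    return server_time_s


def checked_server_time(raw: bytes, min_time: int, max_time: int) -> int:
    """Return the server time, or raise (the sketch's stand-in for aborting)."""
    server_time_s = decode_server_time(raw)
    if not (min_time <= server_time_s <= max_time):
        raise ValueError("server time outside the accepted window")
    return server_time_s
```

Because the value on the wire is an unsigned 32-bit count of seconds, it cannot go past early 2106 no matter how the min/max window is configured, which is the limit the side note above refers to.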
br...@google.com <br...@google.com> #12
Thanks for the deep dive. The multiple constants were indeed confusing, as was the man page. At this point, I don't even know where I was misled by the documentation or by my own reading of the code...
Notably, even the man page was correct about the --leap behavior, but I missed this point:
"When the only issue with the certificates in question is the timing information, this option allows you to trust the remote system's time, as long as it is after RECENT_COMPILE_DATE and before MAX_REASONABLE_TIME." (emphasis mine)
> somewhat ironically, if we didn't use --leap, then server & certificate time that exceeds the max time would be accepted
Yeah, that part seemed weird to me too. But:
> ... although only for the purpose of cert verification
this immediately means it doesn't really matter.
All in all, it's probably a good idea that you've written go/cros-timekeeping, because reading through tlsdate code and docs is a mess -- even if it's reasonably well documented, the behavior is subtle based on the exact flags we use.
Now, even with your excellent roadmap of what tlsdate does, it doesn't really cover the why, and whether that's all a good thing. Particularly:
> once we exceed max time in the real world, it's game over [...] so you could manually set the clock to the max time that tlsdated would accept, and then let the clock tick by itself, but you'd have to do this on every boot
This still feels like planned obsolescence, without much of a good reason. Yes, security and network connectivity are likely to suffer, but going as far as preventing the user from setting the correct time? That seems pointless.
But I also acknowledge that if we keep RECENT_COMPILE_DATE fresh until AUE, then my concerns are essentially meaningless. So I think this will be my last complaint :)
va...@google.com <va...@google.com> #13
> Now, even with your excellent roadmap of what tlsdate does, it doesn't really cover the why, and whether that's all a good thing
once i finished that dive, i started work on a doc for this, and i'll loop you on it once it's ready. i get the sense that we, as in CrOS, didn't really notice/think/care about it. or at least, when we adopted tlsdate in 2012, the hardcoded max time of 2033 was so far in the future, it's someone else's problem ... assuming we were still running tlsdate by that point.
ap...@google.com <ap...@google.com> #14
Branch: main
commit 3fb3df807a94309aa082fed14c2bf6ed25f42ed3
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Oct 26 15:48:12 2023
init: add a ceiling for the clock
Since tlsdated is enforcing a max valid time, integrate one here
too. This can come up if the RTC is buggy or the battery runs out,
and it loads with a very large value.
We'll clamp the clock to 30 years with the expectation none of this
software will be used that far in the future. If it is, while we reset
the clock backwards, it might still self-heal with tlsdate, assuming
the code can even still talk to the network.
See the bug for much deeper discussion of tlsdate behavior.
BUG=b:307794426
TEST=CQ passes
Change-Id: Iae087d660b1446a6b63c1467fa666403e414072d
Reviewed-on:
Tested-by: Mike Frysinger <vapier@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
M init/startup/chromeos_startup.cc
M init/startup/constants.h
ap...@google.com <ap...@google.com> #15
Branch: master
commit af28194c01809e21db20ead77aeb2bcfa43e429d
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Oct 31 21:24:36 2023
sync max time constants
Upstream uses one constant for tlsdated and a diff one for tlsdate.
Make sure we set them to the same value we control.
BUG=b:307794426
TEST=CQ passes
Change-Id: I0930cb2c0bd2cdd54d8b7b6395534d13b54369ee
Reviewed-on:
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Commit-Queue: Pavol Marko <pmarko@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Auto-Submit: Mike Frysinger <vapier@chromium.org>
M
va...@google.com <va...@google.com> #16
posted go/cros-timekeeping-max for review. let's see how it goes.
ap...@google.com <ap...@google.com> #17
Branch: main
commit d998465fb8d728928cf78166eb479d1f9059de36
Author: Meelunae <git@eleuna.me>
Date: Thu May 09 20:30:35 2024
initramfs: Updated year in common/init.sh
Changed the date for the clock initialization function from 2023 to 2024.
BUG=b:307794426
TEST=manual testing, change based on comments recommending yearly modifications.
Change-Id: I5e104eaf62c583b6785549d0398a560e2434a005
Reviewed-on:
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Tested-by: Emanuele Iaccarino <git@eleuna.me>
Commit-Queue: Emanuele Iaccarino <git@eleuna.me>
M common/init.sh
Description
tlsdated will update the clock to a min time during boot based on when it was compiled. it will also factor in a hardcoded max time (~18 May 2033 as of this bug filing).
the max time tlsdated is currently using isn't so bad -- it's ~10 years in the future. but we should really harmonize this behavior between the 2 projects, and make sure we keep it under control so that we don't end up panicking when 2033 rolls around because no one noticed.
* init: enable 64-bit time_t support. 32-bit time_t overflows on 19 January 2038, which is <15 years away.
* init: add a max time check based on our current constant. let's call it +15 years.
* init: add a note in the header mentioning the tlsdated constants to help keep in sync.
* tlsdated: enable 64-bit time_t support.
* tlsdated: set the min time to the same constant used by init. this isolates us from clock weirdness on build systems, and avoids cache invalidation where every package build is diff.
* tlsdated: set the max time to the same constant used by init.
* tlsdated: add a note in the source mentioning the init constants to help keep in sync.
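
The first and second init items interact: a +15 year ceiling only fits in a 32-bit time_t for as long as the min time stays early enough. The sketch below checks this, assuming (hypothetically) that the min time is pinned to 1 January of a given year and that a year is 365 days; none of the names come from init or tlsdated.

```python
import calendar

INT32_MAX = 2 ** 31 - 1                      # last second a signed 32-bit time_t can hold
WINDOW_SECONDS = 15 * 365 * 24 * 60 * 60     # "+15 years", assuming 365-day years


def min_time_for_year(year: int) -> int:
    """Hypothetical min time: 1 January of `year`, 00:00:00 UTC, as a Unix timestamp."""
    return calendar.timegm((year, 1, 1, 0, 0, 0))


def max_time(min_time: int) -> int:
    """Max time derived from the min time, as proposed in the list above."""
    return min_time + WINDOW_SECONDS


def fits_in_time32(ts: int) -> bool:
    """True if ts is representable in a signed 32-bit time_t."""
    return ts <= INT32_MAX


if __name__ == "__main__":
    for year in (2023, 2024):
        print(year, fits_in_time32(max_time(min_time_for_year(year))))
    # 2023 True
    # 2024 False
```

Under these assumptions, the derived max time stops fitting in 32 bits once the min year reaches 2024, so the ceiling check needs 64-bit time_t to work at all.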