Assigned
Status Update
Comments
jo...@teamtag.com <jo...@teamtag.com> #2
TL;DR: GCE should document that the DHCP client software should support the "Local Subnet Routes" feature specified in RFC 3442. (Ironically though, ISC DHCP does *not* support RFC 3442 at all, but works anyway because of a different non-standard extension; see below for details.)
Today I spent some time looking into this again, because I noticed at some point in the past year OpenBSD's DHCP client stopped working with GCE's DHCP server.
Just for posterity, here's a current DHCP client/server exchange:
16:25:26.187577 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
10.240.120.1.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 42:01:0a:f0:78:01, length 300, xid 0xdf529822, Flags [none] (0x0000)
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
Hostname Option 12, length 8: "xxxxxxxx"
Requested-IP Option 50, length 4: 10.240.120.1
Parameter-Request Option 55, length 8:
Subnet-Mask, BR, Time-Zone, Classless-Static-Route
Default-Gateway, Domain-Name, Domain-Name-Server, Hostname
Client-ID Option 61, length 7: ether 42:01:0a:f0:78:01
16:25:26.188125 IP (tos 0x0, ttl 1, id 0, offset 0, flags [none], proto UDP (17), length 471)
169.254.169.254.67 > 10.240.120.1.68: [udp sum ok] BOOTP/DHCP, Reply, length 443, xid 0xdf529822, Flags [none] (0x0000)
Your-IP 10.240.120.1
Server-IP 10.240.0.1
Gateway-IP 10.240.0.1
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: ACK
Server-ID Option 54, length 4: 169.254.169.254
Domain-Name-Server Option 6, length 8: 169.254.169.254,10.240.0.1
Lease-Time Option 51, length 4: 4294967295
Domain-Name Option 15, length 30: "c.xxxxxxxxxxxxxxxxxx.internal."
T119 Option 119, length 63: 1.99.18.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.8.105.110.116.101.114.110.97.108.0.12.51.55.52.49.50.49.55.50.48.50.52.49.6.103.111.111.103.108.101.8.105.110.116.101.114.110.97.108.0.192.44
Subnet-Mask Option 1, length 4: 255.255.255.255
Default-Gateway Option 3, length 4: 10.240.0.1
Classless-Static-Route Option 121, length 14: (10.240.0.1/32:0.0.0.0 ),(default:10.240.0.1)
MTU Option 26, length 2: 1460
Hostname Option 12, length 40: "xxxxxxxxxx.c.xxxxxxxxxxxxxxxxxx.internal"
NTP Option 42, length 4: 169.254.169.254
The particularly relevant details:
- RFC 3442 specifies that a Classless-Static-Route entry like "10.240.0.1/32:0.0.0.0 " indicates 10.240.0.1/32 is a "local subnet route" that's directly routable even though it's not part of the leased IP address's subnet (i.e., 10.240.120.1/32 for the above exchange).
- For DHCP clients that don't support RFC 3442, if Subnet-Mask == 255.255.255.255, then the DHCP client needs to assume the Default-Gateway is directly routable. (This isn't specified by any RFCs as far as I can tell though.)
The regression was because:
- ISC DHCP doesn't implement Classless-Static-Route support (as far as I can tell), but it does implement the Subnet-Mask == 255.255.255.255 hack for Default-Gateway.
- When I first modified OpenBSD dhclient to work with GCE, dhclient wasn't seeing any Classless-Static-Route options in the server response. Since ISC DHCP's behavior was undocumented, I simply matched the implementation exactly by only extending the Default-Gateway processing.
- Some point within the last year, OpenBSD dhclient started seeing Classless-Static-Route options from the server*. OpenBSD's Classless-Static-Route support didn't implement the "local route" behavior (instead it skipped over those routes as permitted by the RFC), and the presence of the Classless-Static-Route option precludes handling of the Default-Gateway option.
* It's unclear to me why. It looks like OpenBSD dhclient has supported Classless-Static-Route for more than a year, so I suspect GCE's DHCP server must have changed since then to start using this option.
Finally, the fix was to implement "local subnet route" support in OpenBSD dhclient:http://marc.info/?l=openbsd-tech&m=141212568615772&w=2
Today I spent some time looking into this again, because I noticed at some point in the past year OpenBSD's DHCP client stopped working with GCE's DHCP server.
Just for posterity, here's a current DHCP client/server exchange:
16:25:26.187577 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
10.240.120.1.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 42:01:0a:f0:78:01, length 300, xid 0xdf529822, Flags [none] (0x0000)
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
Hostname Option 12, length 8: "xxxxxxxx"
Requested-IP Option 50, length 4: 10.240.120.1
Parameter-Request Option 55, length 8:
Subnet-Mask, BR, Time-Zone, Classless-Static-Route
Default-Gateway, Domain-Name, Domain-Name-Server, Hostname
Client-ID Option 61, length 7: ether 42:01:0a:f0:78:01
16:25:26.188125 IP (tos 0x0, ttl 1, id 0, offset 0, flags [none], proto UDP (17), length 471)
169.254.169.254.67 > 10.240.120.1.68: [udp sum ok] BOOTP/DHCP, Reply, length 443, xid 0xdf529822, Flags [none] (0x0000)
Your-IP 10.240.120.1
Server-IP 10.240.0.1
Gateway-IP 10.240.0.1
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: ACK
Server-ID Option 54, length 4: 169.254.169.254
Domain-Name-Server Option 6, length 8: 169.254.169.254,10.240.0.1
Lease-Time Option 51, length 4: 4294967295
Domain-Name Option 15, length 30: "c.xxxxxxxxxxxxxxxxxx.internal."
T119 Option 119, length 63: 1.99.18.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.8.105.110.116.101.114.110.97.108.0.12.51.55.52.49.50.49.55.50.48.50.52.49.6.103.111.111.103.108.101.8.105.110.116.101.114.110.97.108.0.192.44
Subnet-Mask Option 1, length 4: 255.255.255.255
Default-Gateway Option 3, length 4: 10.240.0.1
Classless-Static-Route Option 121, length 14: (
MTU Option 26, length 2: 1460
Hostname Option 12, length 40: "xxxxxxxxxx.c.xxxxxxxxxxxxxxxxxx.internal"
NTP Option 42, length 4: 169.254.169.254
The particularly relevant details:
- RFC 3442 specifies that a Classless-Static-Route entry like "
- For DHCP clients that don't support RFC 3442, if Subnet-Mask == 255.255.255.255, then the DHCP client needs to assume the Default-Gateway is directly routable. (This isn't specified by any RFCs as far as I can tell though.)
The regression was because:
- ISC DHCP doesn't implement Classless-Static-Route support (as far as I can tell), but it does implement the Subnet-Mask == 255.255.255.255 hack for Default-Gateway.
- When I first modified OpenBSD dhclient to work with GCE, dhclient wasn't seeing any Classless-Static-Route options in the server response. Since ISC DHCP's behavior was undocumented, I simply matched the implementation exactly by only extending the Default-Gateway processing.
- Some point within the last year, OpenBSD dhclient started seeing Classless-Static-Route options from the server*. OpenBSD's Classless-Static-Route support didn't implement the "local route" behavior (instead it skipped over those routes as permitted by the RFC), and the presence of the Classless-Static-Route option precludes handling of the Default-Gateway option.
* It's unclear to me why. It looks like OpenBSD dhclient has supported Classless-Static-Route for more than a year, so I suspect GCE's DHCP server must have changed since then to start using this option.
Finally, the fix was to implement "local subnet route" support in OpenBSD dhclient:
jo...@teamtag.com <jo...@teamtag.com> #4
Furthermore, this is improperly documented inside the Init Delay Helper toast in the GCP console.
Description
- There is a confusing explanation of how Init Delays really work. The assumption from public documentation assumes it’s an arbitrary time you can set before health checks are run against your instance (not on first failure) [1]:
- Clarify or offer a functionality to truly delay the initial health check of a VM (even if it is passing HCs). Init Delays should be Initial Delays (no caveats). Today, the VM will be immediately marked as healthy when the first health check completes successfully instead of waiting for the specified delay.
[1]
What you would like to accomplish:
- Be granted a way to delay VMs from being marked healthy on a specified initial delay.
How this might work:
- Users provide a time in acceptable seconds of how long to wait before health checks are ran against a give VM in a Managed Instance Group.
If applicable, reasons why alternative solutions are not sufficient:
- The existing solution is poorly worded, documented, and does not provide what its name sets out to do. Today, initial delay is only considered if the VM is marked unhealthy; not an arbitrary delay that you can specify to wait to begin checking the VM in a MIG.
Other information (workarounds you have tried, documentation consulted, etc):
- We are going to add a sleep to our application to delay health checks instead of doing it at the instance level. Similar to other cloud providers: