Comments
an...@google.com <an...@google.com> #2
TL;DR: GCE should document that the DHCP client software should support the "Local Subnet Routes" feature specified in RFC 3442. (Ironically though, ISC DHCP does *not* support RFC 3442 at all, but works anyway because of a different non-standard extension; see below for details.)
Today I spent some time looking into this again, because I noticed at some point in the past year OpenBSD's DHCP client stopped working with GCE's DHCP server.
Just for posterity, here's a current DHCP client/server exchange:
16:25:26.187577 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
10.240.120.1.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 42:01:0a:f0:78:01, length 300, xid 0xdf529822, Flags [none] (0x0000)
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
Hostname Option 12, length 8: "xxxxxxxx"
Requested-IP Option 50, length 4: 10.240.120.1
Parameter-Request Option 55, length 8:
Subnet-Mask, BR, Time-Zone, Classless-Static-Route
Default-Gateway, Domain-Name, Domain-Name-Server, Hostname
Client-ID Option 61, length 7: ether 42:01:0a:f0:78:01
16:25:26.188125 IP (tos 0x0, ttl 1, id 0, offset 0, flags [none], proto UDP (17), length 471)
169.254.169.254.67 > 10.240.120.1.68: [udp sum ok] BOOTP/DHCP, Reply, length 443, xid 0xdf529822, Flags [none] (0x0000)
Your-IP 10.240.120.1
Server-IP 10.240.0.1
Gateway-IP 10.240.0.1
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: ACK
Server-ID Option 54, length 4: 169.254.169.254
Domain-Name-Server Option 6, length 8: 169.254.169.254,10.240.0.1
Lease-Time Option 51, length 4: 4294967295
Domain-Name Option 15, length 30: "c.xxxxxxxxxxxxxxxxxx.internal."
T119 Option 119, length 63: 1.99.18.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.8.105.110.116.101.114.110.97.108.0.12.51.55.52.49.50.49.55.50.48.50.52.49.6.103.111.111.103.108.101.8.105.110.116.101.114.110.97.108.0.192.44
Subnet-Mask Option 1, length 4: 255.255.255.255
Default-Gateway Option 3, length 4: 10.240.0.1
Classless-Static-Route Option 121, length 14: (10.240.0.1/32:0.0.0.0 ),(default:10.240.0.1)
MTU Option 26, length 2: 1460
Hostname Option 12, length 40: "xxxxxxxxxx.c.xxxxxxxxxxxxxxxxxx.internal"
NTP Option 42, length 4: 169.254.169.254
The particularly relevant details:
- RFC 3442 specifies that a Classless-Static-Route entry like "10.240.0.1/32:0.0.0.0 " indicates 10.240.0.1/32 is a "local subnet route" that's directly routable even though it's not part of the leased IP address's subnet (i.e., 10.240.120.1/32 for the above exchange).
- For DHCP clients that don't support RFC 3442, if Subnet-Mask == 255.255.255.255, then the DHCP client needs to assume the Default-Gateway is directly routable. (This isn't specified by any RFCs as far as I can tell though.)
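To make the first point concrete, here is a minimal sketch of decoding an option 121 payload in the compact format RFC 3442 defines (prefix length, then only the significant destination octets, then a 4-octet router). The function name is hypothetical, and the 14 bytes in the usage note are reconstructed from the tcpdump rendering above rather than copied from a raw capture.

```python
import ipaddress


def parse_classless_static_routes(data: bytes):
    """Decode a DHCP option 121 payload into (destination, router) pairs.

    Hypothetical helper for illustration; not taken from any DHCP client.
    """
    routes = []
    i = 0
    while i < len(data):
        prefix_len = data[i]
        if prefix_len > 32:
            raise ValueError(f"invalid prefix length {prefix_len}")
        i += 1
        # Only the significant octets of the destination are transmitted.
        n = (prefix_len + 7) // 8
        if i + n + 4 > len(data):
            raise ValueError("truncated option 121 payload")
        dest = data[i:i + n] + bytes(4 - n)  # pad back out to 4 octets
        i += n
        router = ipaddress.IPv4Address(data[i:i + 4])
        i += 4
        network = ipaddress.IPv4Network((dest, prefix_len), strict=False)
        routes.append((network, router))
    return routes


# Payload reconstructed from the reply above (length 14):
#   32, 10.240.0.1, router 0.0.0.0   -> local subnet route
#   0,  (no octets), router 10.240.0.1 -> default route
payload = bytes([32, 10, 240, 0, 1, 0, 0, 0, 0,
                 0, 10, 240, 0, 1])
for network, router in parse_classless_static_routes(payload):
    kind = "on-link" if router.is_unspecified else f"via {router}"
    print(network, kind)
```

A router of 0.0.0.0 is what marks the first entry as a local subnet route; the second entry carries zero destination octets because its prefix length is 0.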
The regression was because:
- ISC DHCP doesn't implement Classless-Static-Route support (as far as I can tell), but it does implement the Subnet-Mask == 255.255.255.255 hack for Default-Gateway.
- When I first modified OpenBSD dhclient to work with GCE, dhclient wasn't seeing any Classless-Static-Route options in the server response. Since ISC DHCP's behavior was undocumented, I simply matched the implementation exactly by only extending the Default-Gateway processing.
- At some point within the last year, OpenBSD dhclient started seeing Classless-Static-Route options from the server*. OpenBSD's Classless-Static-Route support didn't implement the "local route" behavior (instead it skipped over those routes as permitted by the RFC), and the presence of the Classless-Static-Route option precludes handling of the Default-Gateway option.
* It's unclear to me why. It looks like OpenBSD dhclient has supported Classless-Static-Route for more than a year, so I suspect GCE's DHCP server must have changed since then to start using this option.
Finally, the fix was to implement "local subnet route" support in OpenBSD dhclient: http://marc.info/?l=openbsd-tech&m=141212568615772&w=2
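The two client behaviors discussed in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the OpenBSD or ISC code: `plan_routes` and its tuple format are hypothetical, and the fallback branch models the undocumented Subnet-Mask == 255.255.255.255 behavior as described above.

```python
import ipaddress


def plan_routes(subnet_mask, default_gateway=None, classless_routes=None):
    """Return a list of (kind, destination, gateway) route entries.

    Hypothetical sketch. classless_routes is a list of
    (destination CIDR string, router string) pairs from option 121.
    """
    plan = []
    if classless_routes:
        # With option 121 present, the Default-Gateway (option 3) is ignored.
        for dest, router in classless_routes:
            net = ipaddress.IPv4Network(dest, strict=False)
            gw = ipaddress.IPv4Address(router)
            if gw.is_unspecified:
                # Router 0.0.0.0: destination is directly reachable.
                plan.append(("on-link", str(net), None))
            else:
                plan.append(("via", str(net), str(gw)))
        return plan
    if default_gateway is not None:
        gw = ipaddress.IPv4Address(default_gateway)
        if subnet_mask == "255.255.255.255":
            # Non-standard fallback: treat the gateway as directly reachable.
            plan.append(("on-link", f"{gw}/32", None))
        plan.append(("via", "0.0.0.0/0", str(gw)))
    return plan


# Option 121 as sent in the reply above.
print(plan_routes(
    "255.255.255.255",
    default_gateway="10.240.0.1",
    classless_routes=[("10.240.0.1/32", "0.0.0.0"),
                      ("0.0.0.0/0", "10.240.0.1")],
))
# Same lease without option 121: the fallback path.
print(plan_routes("255.255.255.255", default_gateway="10.240.0.1"))
```

For this lease both paths end up with the same two routes. The regression corresponds to a client that takes the first branch but drops the on-link entry, leaving a default route through a gateway it cannot reach.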
ch...@biff.ai <ch...@biff.ai> #4
Same here, would be great to get an update on the issue.
mi...@gmail.com <mi...@gmail.com> #5
Is there an update on #191885497?
Description
This issue concerns the staging time of the VM.
Cold boot for Virtual Machine instances with an Nvidia GPU attached takes much longer to launch (28s in my testing) than for non-GPU instances.
What you expected to happen: Time to live (TTL) for a GPU-attached instance to be similar to that of a non-GPU instance.
Steps to reproduce:
Other information (workarounds you have tried, documentation consulted, etc):
Here is my data on 40 virtual machines with Tesla T4 GPUs attached tested throughout the week 13/09/21 to 20/09/21 (DD/MM/YY).
Machine spec: 8 vCPU, 16 GB memory, 50 GB pm-SSD, Preemptible: False
Avg GPU TTL 28±2 s, avg non-GPU TTL 6±2 s
Different distros and boot configurations have no effect on the staging time, and installing Nvidia drivers does not change the TTL. I tried multiple distros to rule out a fault and saw no difference: CentOS, minimalised Debian, Clear Linux, Fedora, and Marketplace images of pre-packaged Nvidia setups. Windows is N/A as its boot time surpasses the mentioned 30s anyway.
Systemd-analyze time, kernel cmdline tweaks and bootloader edits have no effect. I have tested multiple zones, the worst being US-Central1 (I assume it's a high-usage area). Neither the time of day of testing nor the size of the boot disk changes the data's distribution. Specialised machine configurations (GVNIC, fast socket, etc.) do not change the data. I conclude that this is an internal issue that should be brought to attention.
Understandably, provisioning complexity is huge, and this adds to latency, especially when Nvidia services and hardware are involved. I hope that Compute Engine's Technical Infrastructure team can reduce this time.