Comments
ka...@google.com <ka...@google.com> #2
Today I spent some time looking into this again, because I noticed at some point in the past year OpenBSD's DHCP client stopped working with GCE's DHCP server.
Just for posterity, here's a current DHCP client/server exchange:
16:25:26.187577 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
10.240.120.1.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 42:01:0a:f0:78:01, length 300, xid 0xdf529822, Flags [none] (0x0000)
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
Hostname Option 12, length 8: "xxxxxxxx"
Requested-IP Option 50, length 4: 10.240.120.1
Parameter-Request Option 55, length 8:
Subnet-Mask, BR, Time-Zone, Classless-Static-Route
Default-Gateway, Domain-Name, Domain-Name-Server, Hostname
Client-ID Option 61, length 7: ether 42:01:0a:f0:78:01
16:25:26.188125 IP (tos 0x0, ttl 1, id 0, offset 0, flags [none], proto UDP (17), length 471)
169.254.169.254.67 > 10.240.120.1.68: [udp sum ok] BOOTP/DHCP, Reply, length 443, xid 0xdf529822, Flags [none] (0x0000)
Your-IP 10.240.120.1
Server-IP 10.240.0.1
Gateway-IP 10.240.0.1
Client-Ethernet-Address 42:01:0a:f0:78:01
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: ACK
Server-ID Option 54, length 4: 169.254.169.254
Domain-Name-Server Option 6, length 8: 169.254.169.254,10.240.0.1
Lease-Time Option 51, length 4: 4294967295
Domain-Name Option 15, length 30: "c.xxxxxxxxxxxxxxxxxx.internal."
T119 Option 119, length 63: 1.99.18.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.8.105.110.116.101.114.110.97.108.0.12.51.55.52.49.50.49.55.50.48.50.52.49.6.103.111.111.103.108.101.8.105.110.116.101.114.110.97.108.0.192.44
Subnet-Mask Option 1, length 4: 255.255.255.255
Default-Gateway Option 3, length 4: 10.240.0.1
Classless-Static-Route Option 121, length 14: (
MTU Option 26, length 2: 1460
Hostname Option 12, length 40: "xxxxxxxxxx.c.xxxxxxxxxxxxxxxxxx.internal"
NTP Option 42, length 4: 169.254.169.254
The particularly relevant details:
- RFC 3442 specifies that a Classless-Static-Route entry like "
- For DHCP clients that don't support RFC 3442, if Subnet-Mask == 255.255.255.255, then the DHCP client needs to assume the Default-Gateway is directly routable. (This isn't specified by any RFCs as far as I can tell though.)
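The Subnet-Mask == 255.255.255.255 case above can be sketched as a dry run. This is only an illustration, not the dhclient code: `plan_routes` is a hypothetical helper, the commands use Linux iproute2 syntax (an assumption, since OpenBSD configures routes differently), and nothing is executed; the function just prints what a client would need to set up.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) the routes a client would need.
# Arguments: interface, subnet mask, default gateway.
plan_routes() {
    iface=$1
    mask=$2
    gw=$3
    if [ "$mask" = "255.255.255.255" ]; then
        # /32 lease: the gateway lies outside every local subnet, so first
        # add a host route declaring it directly reachable on the link.
        echo "ip route add $gw/32 dev $iface scope link"
    fi
    echo "ip route add default via $gw dev $iface"
}

# Values taken from the DHCP ACK above; "eth0" is an assumed interface name.
plan_routes eth0 255.255.255.255 10.240.0.1
```

Without the host route, the default route would be rejected because the gateway is not reachable through any connected network.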
The regression was because:
- ISC DHCP doesn't implement Classless-Static-Route support (as far as I can tell), but it does implement the Subnet-Mask == 255.255.255.255 hack for Default-Gateway.
- When I first modified OpenBSD dhclient to work with GCE, dhclient wasn't seeing any Classless-Static-Route options in the server response. Since ISC DHCP's behavior was undocumented, I simply matched the implementation exactly by only extending the Default-Gateway processing.
- At some point within the last year, OpenBSD dhclient started seeing Classless-Static-Route options from the server*. OpenBSD's Classless-Static-Route support didn't implement the "local route" behavior (instead it skipped over those routes, as permitted by the RFC), and the presence of the Classless-Static-Route option precludes handling of the Default-Gateway option.
* It's unclear to me why. It looks like OpenBSD dhclient has supported Classless-Static-Route for more than a year, so I suspect GCE's DHCP server must have changed since then to start using this option.
Finally, the fix was to implement "local subnet route" support in OpenBSD dhclient:
ka...@google.com <ka...@google.com> #4
Hi Jon,
Could you please provide some detailed instructions on how we can reproduce the issue on our end? Specifically, how you set up the probes, what you expect to see and what actually occurred.
Also, you mentioned that the probes don't seem to trigger on COS 97. Were they working fine on previous versions of COS, e.g. COS 93?
Thanks
ia...@spyderbat.com <ia...@spyderbat.com> #5
I installed a basic Ubuntu pod; it is privileged. I've attached the YAML.
Mount the trace filesystem to get access to the kprobes:
mount -t tracefs none /sys/kernel/tracing
Make two kprobes:
echo 'p:isn_sock_probe sock_sendmsg sock=%di msg=%si' >> /sys/kernel/tracing/kprobe_events
echo 'p:isn_icmp_echo icmp_echo' >> /sys/kernel/tracing/kprobe_events
The first kprobe is on a kernel function that dispatches network writes to the different protocol stacks. The second fires when the kernel is pinged.
Enable kprobes:
echo 1 >> /sys/kernel/tracing/events/kprobes/enable
Install ping so we can run the test:
apt install inetutils-ping
I run this in the background; it shows the trace output:
cat /sys/kernel/tracing/trace_pipe &
If you ping 127.0.0.1, you'll see the second kprobe invoked. I cannot make the first one ever trigger. It does on Ubuntu worker nodes and most Linux systems.
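To tell "the probe never fired" apart from "it fired but nothing reached trace_pipe", the per-probe hit counters can be read as well. A minimal sketch, assuming the kernel exposes the standard `kprobe_profile` file in tracefs (columns: event name, hits, missed hits); `probe_hits` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print the hit count recorded for one kprobe event.
# $1 = probe name, $2 = profile file (defaults to the tracefs location).
probe_hits() {
    awk -v name="$1" '$1 == name { print $2 }' \
        "${2:-/sys/kernel/tracing/kprobe_profile}"
}
```

Running `probe_hits isn_icmp_echo` and `probe_hits isn_sock_probe` after the ping shows whether each probe was hit. A count of 0 for isn_sock_probe would mean the probed function was never entered at all.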
ia...@spyderbat.com <ia...@spyderbat.com> #6
thank you,
Ian
bo...@google.com <bo...@google.com> #7
Hello Nelson,
Thank you for reaching out, and for your trust and continued support in improving Google Cloud Platform products.
The information has been shared with the product specialist team and any further updates will be communicated here in this thread.
ia...@spyderbat.com <ia...@spyderbat.com> #8
thank you,
Ian
jo...@spyderbat.com <jo...@spyderbat.com> #9
Best
Jon
ia...@spyderbat.com <ia...@spyderbat.com> #10
dh...@google.com <dh...@google.com> #11
Hello,
According to the
To ensure a faster resolution and dedicated support for your issue, I kindly request you to file a support ticket by clicking
Please note that the Issue Tracker is primarily meant for reporting bugs and requesting new features. For individual support issues, it is best to utilize the support ticketing system. If you have any additional issues or concerns, please don’t hesitate to create a new thread on the
Thanks & Regards
Description
my worker node is:
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="
BUG_REPORT_URL="
GOOGLE_CRASH_ID=Lakitu
KERNEL_COMMIT_ID=d45a5f95207952be35f6d99da9bf7e5b1c3c1d7a
GOOGLE_METRICS_PRODUCT_ID=26
VERSION=97
VERSION_ID=97
BUILD_ID=16919.189.3
Specifically, sock_sendmsg and sock_recvmsg are symbols in /proc/kallsyms, and I am able to create the trace points on them, but they never seem to fire.
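For reference, the symbol check can be scripted roughly like this. It is a sketch only: `has_symbol` is a hypothetical helper, and it assumes the usual /proc/kallsyms layout of address, type, and name per line.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the named symbol is listed in kallsyms.
# $1 = symbol name, $2 = symbol table file (defaults to /proc/kallsyms).
has_symbol() {
    awk -v sym="$1" '$3 == sym { found = 1 } END { exit !found }' \
        "${2:-/proc/kallsyms}"
}
```

For example, `has_symbol sock_sendmsg && echo present`. Note that a symbol being listed only shows the function exists in the kernel image; it does not show that the send path on this kernel actually calls it.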