Fixed
Status Update
Comments
ma...@gmail.com <ma...@gmail.com> #2
Ok, this is the same issue but as of today, we're seeing slightly different behaviour to what was reported above.
All the fragments do arrive (tcpdump on eth0 within the GCE instance shows this). The checksums of the IP fragments appear to be correct, so the various necessary bits of NAT have been done correctly and checksums recalculated on the IP header. Packet reassembly occurs correctly. However, the UDP header checksum is incorrect, and so the kernel drops the reassembled packet. Thus the NAT being applied is faulty.
The UDP checksum recalculation is obviously a massive PITA as it includes the whole payload *and* the pseudo IPv4 header. Thus whenever NAT occurs, the whole datagram needs to be reassembled, a new pseudo IPv4 header generated, the UDP checksum updated, and then the datagram sent out and most likely refragmented. So yes, it's a fairly expensive thing to do.
I wouldn't be at all surprised to learn that GCE is not reassembling the datagram at all during NAT for speed/buffering reasons. Thus it's probably just recalculating the UDP checksum based on the first fragment only. This will explain why it works for non-fragmented UDP datagrams but not for fragmented ones.
For UDP over IPv4, the UDP checksum is optional. Thus the simplest fix for you would be during NAT, if you find UDP and you find the IP MoreFragments flag is set, just set the UDP checksum field to all zeros. That way you wouldn't need to do complete reassembly of the datagram.
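To make the checksum argument above concrete, here is a minimal Python sketch of the UDP-over-IPv4 checksum, which covers the pseudo-header (source and destination address, protocol, UDP length), the UDP header and the whole payload. The addresses and port are taken from the capture in the description; the 2000-byte payload and the function names are assumptions made for illustration only, and this is not a claim about how GCE's NAT is implemented.

```python
import socket
import struct

UDP_PROTO = 17


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def _covered_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """Pseudo-header + UDP header + payload, i.e. everything the checksum covers."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, UDP_PROTO, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return pseudo + header + payload


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Checksum a sender would place in the UDP header."""
    total = ones_complement_sum(
        _covered_bytes(src_ip, dst_ip, src_port, dst_port, payload, 0))
    # A computed value of 0 is transmitted as 0xFFFF; 0 means "no checksum".
    return (~total & 0xFFFF) or 0xFFFF


def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """Receiver-side check on the reassembled datagram."""
    if checksum == 0:  # checksum not used (allowed for UDP over IPv4)
        return True
    return ones_complement_sum(
        _covered_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum)
    ) == 0xFFFF


if __name__ == "__main__":
    payload = bytes(2000)  # hypothetical payload, large enough to fragment
    src, external, internal = "216.115.69.144", "23.236.59.200", "10.240.217.75"
    csum = udp_checksum(src, external, 5060, 5060, payload)
    print(udp_checksum_ok(src, external, 5060, 5060, payload, csum))  # as sent
    print(udp_checksum_ok(src, internal, 5060, 5060, payload, csum))  # stale after NAT
    print(udp_checksum_ok(src, internal, 5060, 5060, payload, 0))     # zeroed checksum
```

The second check fails because the destination address in the pseudo-header changed while the checksum did not; the third passes because a zero checksum tells the receiver to skip validation.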
si...@inaccess.com <si...@inaccess.com> #3
Any updates on this?
si...@inaccess.com <si...@inaccess.com> #4
This also makes IPSEC non-usable since during the authentication phase the UDP packets fail to be exchanged
mo...@google.com <mo...@google.com> #5
I have forwarded this request to the engineering team. We will update this issue with any progress updates and a resolution.
Best Regards,
Josh Moyer
Google Cloud Platform Support
st...@gmail.com <st...@gmail.com> #6
Are there any updates on when this may be resolved? I am running into the same issue and have confirmed with Wireshark that incoming UDP packets from Google Cloud exceeding the 1460 MTU have an invalid UDP checksum.
da...@gmail.com <da...@gmail.com> #7
Google's own rfc5766 turn server fails (with WebRTC + DTLS) on GCE because of this. AWS on the other hand works fine!
In fact old less-secure certificates which Google are trying to discourage are smaller, are not fragmented and are fine, but the new certificates that Chrome is starting to enforce will fail because they are > 1480 bytes.
ly...@gmail.com <ly...@gmail.com> #8
We are experiencing this exact same issue. Large SIP packets are getting cut off.
si...@inaccess.com <si...@inaccess.com> #10
IPSEC PSK mostly works, but IPSEC with certificates (which is mandatory for roadwarrior setups) fails. If router CPU is your concern you could offer this as an option but please do something fast. We have more than 1000 VPN tunnels with remote sites, and this bug is a showstopper for us.
ma...@gmail.com <ma...@gmail.com> #11
[Comment deleted]
[Deleted User] <[Deleted User]> #12
Are there any updates on this issue? We are currently migrating 8 servers off of GCE because of this. :( This issue basically makes using anything UDP heavy with GCE a non-starter.
si...@gmail.com <si...@gmail.com> #13
We just tried migrating ~1000 VPNs to GCE; it was a bad experience. We used protocol forwarding, which seemed to perform better with UDP. About 10% of random remote clients could not be reached by the instance. Google representatives said there is a patch coming in 2 weeks which will fix the current network state; is this true? Please give us an update. GCE currently has network issues on many fronts: UDP (fragmentation), routing/NAT (protocol forwarding issues), HTTP (TCP issues).
ly...@gmail.com <ly...@gmail.com> #14
A confirmation on the supposed 2 week fix would be beneficial. We would like to stay with GCE if possible.
wu...@gmail.com <wu...@gmail.com> #15
The GCE network issue currently forces solutions with weak keys, because e.g. IKEv2 fragmentation support is not built into Windows.
ly...@gmail.com <ly...@gmail.com> #16
Any updates on this issue?
bn...@fluentstream.com <bn...@fluentstream.com> #17
Apparently the update from Google is to find another support channel to escalate, or only use TCP. I've given up on this thread providing answers.
[Deleted User] <[Deleted User]> #18
si...@inaccess.com <si...@inaccess.com> #19
We have escalated, a fix is being prepared.
ma...@gmail.com <ma...@gmail.com> #20
We operate a Graylog cluster on GCE and were wondering why we were losing some of the GELF UDP messages sent to the logservers from external instances over a network load balancer with sticky connections, while on GCE internal sent messages we had no losses.
Having noticed this thread, we had to reduce the UDP packet size in order to work around this GCE reassembly issue.
Can Google please comment on when this will be resolved? The issue was accepted Oct 22, 2014 - which is more than 1 year ago.
ly...@gmail.com <ly...@gmail.com> #21
I have an open commercial support ticket with Google for this issue. I've been told that most of the related issues are resolved and they are rolling out the fixes to production soon.
ch...@packetzoom.com <ch...@packetzoom.com> #22
Is it just a coincidence that the Hacker News post appeared on Nov 16th and the ticket got "escalated" on Nov 17th? And oh, btw, I still don't see an answer from google.com on this ticket itself. So remember folks, if you have a bug report for Google, the most effective method to get it fixed is public shaming on forums *outside* of Google-provided support channels.
ma...@google.com <ma...@google.com> #23
Hi everyone. Here is an update from the Google Compute Engine networking team.
In the past, the GCE network layer incorrectly dropped packet fragments in some circumstances. In recent months the network team has closed that gap almost completely, and in general you shouldn't see any packet fragment drops on the GCE network.
There's still a rare edge case where UDP fragmentation doesn't play nice with load balancers. If you're running multiple forwarding rules on the same IP address handling different UDP ports, and you're likely to get fragmented UDP packets, you need to look at this document[1]. In a nutshell, you need to switch off the default 5-tuple hashing that includes UDP port numbers, since UDP packet fragments don't contain port numbers (except for the first one of course).
[1]https://cloud-dot-devsite.googleplex.com/compute/docs/load-balancing/network/#load_balancing_and_fragmented_udp_packets
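The hashing point above can be illustrated with a small Python sketch. It assumes a toy load balancer that picks a backend by hashing a key built from packet fields; the `Fragment` type, the CRC32-based `bucket` function and the backend count are hypothetical and are not a description of the actual GCE implementation. Only the first fragment carries the UDP header, so later fragments have no port numbers to feed into a 5-tuple key.

```python
import zlib
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    src: str
    dst: str
    proto: int
    src_port: Optional[int]  # None when the fragment has no UDP header
    dst_port: Optional[int]


def five_tuple_key(f: Fragment) -> tuple:
    """Key that includes ports; only meaningful for the first fragment."""
    return (f.src, f.dst, f.proto, f.src_port, f.dst_port)


def three_tuple_key(f: Fragment) -> tuple:
    """Key built only from fields present in every fragment's IP header."""
    return (f.src, f.dst, f.proto)


def bucket(key: tuple, n_backends: int) -> int:
    """Toy, deterministic backend selection (an assumption for illustration)."""
    return zlib.crc32(repr(key).encode()) % n_backends


first = Fragment("216.115.69.144", "23.236.59.200", 17, 5060, 5060)
later = Fragment("216.115.69.144", "23.236.59.200", 17, None, None)

if __name__ == "__main__":
    # With ports in the key, the two fragments hash from different inputs,
    # so nothing guarantees they reach the same backend.
    print(five_tuple_key(first) == five_tuple_key(later))
    # Without ports, every fragment of the datagram yields the same key.
    print(bucket(three_tuple_key(first), 4) == bucket(three_tuple_key(later), 4))
```

Dropping the ports from the key trades some load-spreading granularity for the guarantee that all fragments of a datagram follow the same path.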
Description
1. Send a UDP packet larger than the 1500-byte MTU to an instance with a static IP.
2. The packet will be fragmented by "the network" as it gets NAT'd and delivered to the instance.
3. The first fragment will arrive with the "more fragments" flag set. No additional fragments arrive, and the reassembly timer runs out, resulting in the instance returning a "fragment reassembly time exceeded" ICMP message back to the source.
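As a rough illustration of the steps above, the following Python sketch works out how a UDP datagram would be split into IPv4 fragments for a given MTU. The function name, the `FragmentPlan` type and the 2000-byte payload are assumptions chosen for the example (the size of the original SIP message is not stated in this report); a 20-byte IP header without options is also assumed.

```python
from typing import List, NamedTuple


class FragmentPlan(NamedTuple):
    offset_units: int     # value of the IP "fragment offset" field (8-byte units)
    data_len: int         # bytes of the UDP datagram carried by this fragment
    total_length: int     # IP "Total Length" field for this fragment
    more_fragments: bool  # IP "More Fragments" flag


def plan_fragments(udp_payload_len: int, mtu: int = 1500,
                   ip_header_len: int = 20) -> List[FragmentPlan]:
    """Split a UDP datagram (8-byte header + payload) into IPv4 fragments."""
    remaining = 8 + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    plans = []
    offset = 0
    while remaining > 0:
        data_len = min(per_fragment, remaining)
        remaining -= data_len
        plans.append(FragmentPlan(offset // 8, data_len,
                                  ip_header_len + data_len, remaining > 0))
        offset += data_len
    return plans


if __name__ == "__main__":
    for plan in plan_fragments(2000):
        print(plan)
```

For a 1500-byte MTU the first fragment carries 1480 data bytes with a Total Length of 1500 and the "more fragments" flag set, which matches the initial fragment in the capture below; the second fragment is the one that never reaches the instance.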
What is the expected output? What do you see instead?
Additional fragments that have the same Packet ID should arrive at the instance if one arrives that has the "more fragments" flag set. Currently only the first fragment with offset 0 arrives.
What version of the product are you using? On what operating system?
OS: CentOS release 6.5 (Final)
Machine type: n1-standard-2 (2 vCPU, 7.5 GB memory)
Zone: us-central1-a
External IP: cbf-media-1 (23.236.59.200)
Internal IP: 10.240.217.75
Please provide any additional information below.
Initial Fragment Packet:
No. Time Source Destination Protocol Length Info
2 0.055924 216.115.69.144 10.240.217.75 IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=323b)
Frame 2: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
Ethernet II, Src: Google_dd:b6:2a (00:1a:11:dd:b6:2a), Dst: 42:01:0a:f0:d9:4b (42:01:0a:f0:d9:4b)
Internet Protocol Version 4, Src: 216.115.69.144 (216.115.69.144), Dst: 10.240.217.75 (10.240.217.75)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x60 (DSCP 0x18: Class Selector 3; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 1500
Identification: 0x323b (12859)
Flags: 0x01 (More Fragments)
Fragment offset: 0
Time to live: 52
Protocol: UDP (17)
Header checksum: 0x2c37 [validation disabled]
Source: 216.115.69.144 (216.115.69.144)
Destination: 10.240.217.75 (10.240.217.75)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Data (1480 bytes)
------Data Truncated------
TTL Expires packet:
No. Time Source Destination Protocol Length Info
41 30.055838 10.240.217.75 216.115.69.144 ICMP 590 Time-to-live exceeded (Fragment reassembly time exceeded)
Frame 41: 590 bytes on wire (4720 bits), 590 bytes captured (4720 bits)
Ethernet II, Src: 42:01:0a:f0:d9:4b (42:01:0a:f0:d9:4b), Dst: 42:01:0a:f0:00:01 (42:01:0a:f0:00:01)
Internet Protocol Version 4, Src: 10.240.217.75 (10.240.217.75), Dst: 216.115.69.144 (216.115.69.144)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 576
Identification: 0x1a9d (6813)
Flags: 0x00
Fragment offset: 0
Time to live: 64
Protocol: ICMP (1)
Header checksum: 0x5b21 [validation disabled]
Source: 10.240.217.75 (10.240.217.75)
Destination: 216.115.69.144 (216.115.69.144)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Internet Control Message Protocol
Type: 11 (Time-to-live exceeded)
Code: 1 (Fragment reassembly time exceeded)
Checksum: 0xefff [correct]
Internet Protocol Version 4, Src: 216.115.69.144 (216.115.69.144), Dst: 10.240.217.75 (10.240.217.75)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x60 (DSCP 0x18: Class Selector 3; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 1500
Identification: 0x323b (12859)
Flags: 0x01 (More Fragments)
Fragment offset: 0
Time to live: 52
Protocol: UDP (17)
Header checksum: 0x2c37 [validation disabled]
Source: 216.115.69.144 (216.115.69.144)
Destination: 10.240.217.75 (10.240.217.75)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: sip (5060), Dst Port: sip (5060)
Session Initiation Protocol