Comments
li...@c0d3.blue <li...@c0d3.blue> #2
mw...@gmail.com <mw...@gmail.com> #3
li...@c0d3.blue <li...@c0d3.blue> #4
So it still seems to me that Android treats certain, specific ICMPv6 types differently in sleep mode.
Another small clarification I missed in the test description: I'm running "icmpv6-test.sh" on the laptop shown in the setup.png. Not on the OpenWrt router directly. However the OpenWrt router has a bridge over its AP and LAN interfaces as is the default with a fresh OpenWrt installation (with multicast snooping, multicast querier etc. disabled). So it is forwarding the test packets from the laptop to the Android device as is. And I checked with tcpdump-mini on the OpenWrt router that MLD queries from the laptop / icmpv6-test.sh for instance are leaving the wlan0 interface on the OpenWrt router.
vi...@google.com <vi...@google.com> #5
Android bug report (to be captured after reproducing the issue)
For steps to capture a bug report, please refer:
Note: Please upload the files to google drive and share the folder to android-bugreport@google.com, then share the link here.
li...@c0d3.blue <li...@c0d3.blue> #6
vi...@google.com <vi...@google.com> #7
li...@c0d3.blue <li...@c0d3.blue> #8
I'd like to start forwarding this issue to the Linux bridge and IETF Multicast mailing lists soon, to raise some awareness and to discuss potential workarounds. Is there anything I should share from the Google/Android side?
li...@c0d3.blue <li...@c0d3.blue> #9
Plus a corresponding OpenWrt package for it:
Some Freifunk communities are currently testing this approach. Feedback from our testers should arrive soon.
ma...@google.com <ma...@google.com> #10
I mean obviously we want them to work, and hearing that they don't is sad, disheartening and a valid signal, but there simply is very little we can do about it.
We don't even know what code they're running (we don't have the source to the kernel, the os, nor the wifi firmware, nor do we even know the hardware involved and its capabilities). It's all up to the OEM (many, if not most/all, OEMs customize many things about their phones including kernel and aosp code, or use chipsets/firmware we've never even heard of).
Plus, honestly, there's already an excessive amount of code just in our devices - the number of relevant kernel trees/branches doesn't fit on my fingers...
Our only real ability to influence things is via enforcing that some tests must pass and the CDD document.
(And as you can guess writing meaningful tests for this sort of stuff is very very difficult, since they're not run in a 'controlled' environment of your own managed lab)
--
Basically phones don't have the battery to continually respond to queries while asleep (and not charging).
The timeouts on this network appear to be way too low.
As a workaround the user can disable wifi or plug the phone in - or use a different wifi network. Yeah, I realize this isn't a satisfactory answer... But a fully charged phone battery lasting less than a day when asleep also isn't.
Patches to improve this situation in a way that doesn't negatively impact battery life are welcome (via AOSP).
(ie. for example doing something unsolicited on wakeup is ok, filtering less probably isn't, unless it's similarly limited like the RA APF filtering code)
Of course even if such patches were available/reviewed/submitted right now, they'd still only make it into the next release of Android, and thus rollout to phones over the course of the next ~3 years... (faster for Pixel phones)
Now, I don't exactly know what the MLD specific behaviour is during sleep, so take the following with a grain of salt, but I'd imagine that this might also, potentially, maybe, be fixable on the network side.
The precise details would need some thought, but something like:
Send RAs every 50 seconds, valid for 180 seconds (3 minutes).
Followup each RA with an MLD query, and honour a response for 4 minutes.
That query should thus at least periodically arrive while the phone is still awake from processing the RA (it'll have to process at least one of them every 3 minutes)
just might work.
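The timing argument above can be checked with a little arithmetic. This is only a rough sketch under stated assumptions: RAs are sent at exact multiples of the RA interval, the phone processes at least one RA before the previous one expires, it answers the MLD query that directly follows each processed RA, and no packets are lost. The function names are made up for illustration.

```python
def max_gap_between_processed_ras(ra_interval_s: int, ra_lifetime_s: int) -> int:
    """Longest time the phone can go between two processed RAs.

    Assumes RAs arrive at exact multiples of ra_interval_s and that the
    phone must process a new RA strictly before the previous one expires.
    """
    # Index of the last RA that still arrives before the lifetime runs out.
    last_usable_ra = (ra_lifetime_s - 1) // ra_interval_s
    return last_usable_ra * ra_interval_s


def listener_stays_known(ra_interval_s: int, ra_lifetime_s: int, mld_honour_s: int) -> bool:
    """True if an MLD response would arrive before the router forgets the listener."""
    gap = max_gap_between_processed_ras(ra_interval_s, ra_lifetime_s)
    # gap == 0 means no further RA arrives before the first one expires.
    return 0 < gap < mld_honour_s


# Values from the proposal: RA every 50 s, valid 180 s, MLD response honoured 4 minutes.
print(max_gap_between_processed_ras(50, 180))  # 150
print(listener_stays_known(50, 180, 240))      # True
```

With these numbers the phone would answer an MLD query at least every 150 seconds, which stays inside the 4 minute window. Whether real devices actually stay awake long enough after an RA to see the query is exactly the open question.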
Note: losing connectivity when asleep is a violation of the Android CDD section 7.4.5, which since release 6.0 has said:
[C-0-5] Rate-limiting MUST NOT cause the device to lose IPv6 connectivity on any IPv6-compliant network that uses RA lifetimes of at least 180 seconds.
li...@c0d3.blue <li...@c0d3.blue> #11
The Android CDD is interesting. I guess that might be where the confusion comes from. Wifi driver developers might not be 100% aware of all the nuances and necessities for IPv6 to work properly. Would it make sense to update the CDD regarding the "Minimum Network Capability" to be more verbose in that regard? For example by specifically mentioning MLD and ND too, not just RA, and exempting them from filtering. I also think RFC7772 ("Reducing Energy Consumption of Router Advertisements") [1] (which basically says to use individual unicast packets instead of multicast for RA) is a better, more robust option than rate limiting ICMPv6-RA. And the CDD should probably also require adherence to RFC4890 ("Recommendations for Filtering ICMPv6 Messages in Firewalls"), which notes some more important ICMPv6 types.
I also started a discussion on the relevant IETF mailing lists yesterday (mcast-wifi@ietf.org, pim@ietf.org) [2]. I hope we can find some solution/workaround there, feel free to participate.
Regards, Linus
[0]:
[1]:
[2]:
li...@c0d3.blue <li...@c0d3.blue> #12
ma...@google.com <ma...@google.com> #13
Do you know if there's a write up somewhere with rfc links of what the actual requirements in this area are? Both the RFC mandated and the actually make things work in practice?
Also in #3 you mention you see the behaviour on pixel 3a jan 2020. Could you post some tcpdumps showing the misbehaviour? [and what the required frequency of responses / timeouts are] That's something I at least have access to both hardware and the source code (and behaviour should [most likely] be equivalent across the entire pixel 3/3a/3xl/3a-xl lineup and possibly across all the pixels - so it would be worthwhile to fix if possible).
li...@c0d3.blue <li...@c0d3.blue> #14
It's actually not as short as it might appear. Because the robustness parameter was increased from 2 to 9, up to 8 MLD queries may be left unanswered before a device "forgets" about a listener. So the effective timeout with a 20s querier interval, 5s maximum response delay and robustness 9 is 20*9+5=185 seconds, vs. the default of 125*2+10=260 seconds (RFC3810, section 9.4 [0]).
The reason why we chose to use such a large robustness parameter and fast querier interval is to compensate for Wifi packetloss for one thing and to compensate for devices which do not send an unsolicited MLD Report after roaming from one AP to another.
[0]:
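For anyone who wants to try other parameter combinations, the timeout arithmetic above can be written down as a small sketch. The formula is the one used in the comment (robustness * query interval + maximum response delay); the function name is just an illustrative choice.

```python
def mld_listener_timeout(robustness: int, query_interval_s: int, max_response_delay_s: int) -> int:
    """Seconds without an MLD Report after which a listener is forgotten."""
    return robustness * query_interval_s + max_response_delay_s


# Gluon settings mentioned in this thread: 20 s interval, 5 s response delay, robustness 9.
gluon = mld_listener_timeout(robustness=9, query_interval_s=20, max_response_delay_s=5)

# Defaults used for comparison: 125 s interval, 10 s response delay, robustness 2.
rfc_default = mld_listener_timeout(robustness=2, query_interval_s=125, max_response_delay_s=10)

print(gluon, rfc_default)  # 185 260
```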
ma...@google.com <ma...@google.com> #15
$ cat Android-49O/net/ipv6/addrconf.c | egrep 'resend MLD reports'
* but resend MLD reports, we might
$ cat Android-49P/net/ipv6/addrconf.c | egrep 'resend MLD reports'
* but resend MLD reports, we might
$ cat Android-49Q/net/ipv6/addrconf.c | egrep 'resend MLD reports'
* but resend MLD reports, we might
$ cat Android-B1C1B4S4/net/ipv6/addrconf.c | egrep 'resend MLD reports'
* but resend MLD reports, we might
Presumably due to being in 4.9.33 LTS
$ git log --oneline -n 1 remotes/linux-stable/v4.9.32..1cadd394bbf94
1cadd394bbf9 ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
$ git log --oneline -n 1 remotes/linux-stable/v4.9.33..1cadd394bbf94
// product:bonito model:Pixel_3a_XL device:bonito
$ adb shell uname -a
Linux localhost 4.9.124-gfd67e45cc855-ab5269778 #0 SMP PREEMPT Wed Jan 30 13:15:08 UTC 2019 aarch64
ma...@google.com <ma...@google.com> #16
li...@c0d3.blue <li...@c0d3.blue> #17
> Do you know if there's a write up somewhere with rfc links of what the actual requirements in this area are? Both the RFC mandated and the actually make things work in practice?
Hm, good question. The best one (/ best starting point) I can think of is RFC4890, which I mentioned above. It gives a good overview of the essential ICMPv6 types and some guidance on filtering ICMPv6 types.
Regarding #3, that wasn't me (but I can guess, I'll bump someone on IRC). I unfortunately don't have a Google Pixel.
li...@c0d3.blue <li...@c0d3.blue> #18
However the timing is quite tricky... Sending the MLD Query before receiving the Echo Reply does not work. The Echo Reply has some "random" latency after the Echo Request (the device needs a moment to wake up, I guess), in my case about 0.2 to 0.7 seconds. Also the MLD Query needs to be sent shortly after the Echo Reply. In my case about 0.2 seconds seemed to be the boundary.
And hitting 0.0x to 0.2 seconds is already quite tricky, as with Wifi we can have packet loss and then we have retries with an exponentially increasing backoff. So if the medium is quite busy then it can happen that a packet needs more than 0.2 seconds.
So I guess I'll need to add some more complexity to my mldpoker tool... And will need to think about a way to not make it too complex, to have a chance to also implement it in and get it upstream in the Linux bridge itself at some point in the future.
Also, I don't know yet if Echo Requests work for all Android devices or just for the HTC U11 I have here, or whether other vendors have implemented other multicast/ICMPv6 filters.
---
I guess a fix or workaround in Android, as well as an update of the Android Compatibility Definition Document, is not in sight yet, right?
ma...@google.com <ma...@google.com> #19
Reasonable patches to aosp-master //packages/modules/NetworkStack/src/android/net/apf/ApfFilter.java (with tests) would be welcome. If you want to pursue this you'd probably need to try some development on a pixel 2/2xl/3/3xl/3a/3axl/4/4xl phone (I *think* they all support apf).
Technically since it's part of a mainline module this could potentially rollout to some phones pretty quickly (quarters instead of years, although it's only optional in Q, but I think it is required in R)... and of course it still won't fix non APF using devices.
Unfortunately I don't have a network that exhibits this problem (not to mention WFH related slow downs and other things to work on).
I'm not sure if an update to the CDD is required... technically it says things should work :-)
Another approach would be to send some update-the-router packet on wake-up...
This might be safer battery life wise?
ma...@google.com <ma...@google.com> #20
mw...@gmail.com <mw...@gmail.com> #21
FWIW, it was me at #3 and I'm still seeing that issue on the Pixel 3a with Android 11. I'd be happy to provide a "Bug report" if that helps anything at this point.
Every morning I toggle the WiFi, because otherwise the network experience is bad: the phone thinks it can use IPv6, but since it's not part of the neighbor discovery group anymore, that doesn't work, and it doesn't notice that.
The network uses Bird 1.6 for RAdv with its default settings (max/min ra interval 600/200s, default lft 1800s) on Debian Buster (4.19.132).
protocol radv r_vlan110 {
    interface "br-wifi" {
        rdnss {
            ns 2001:db8::1;
        };
        dnssl {
            domain "example.com";
        };
    };
}
The wifi interface is part of a bridge
24: wlp4s0.120: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
link/ether 04:f0:21:36:61:b6 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 256 maxmtu 2304
bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.0:d:b9:49:cc:f9 designated_root 8000.0:d:b9:49:cc:f9 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off addrgenmode eui64 numtxqueues 4 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
7: br-wifi: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 04:f0:21:36:61:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.4:f0:21:36:61:b5 designated_root 8000.4:f0:21:36:61:b5 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 37.64 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4 mcast_hash_max 512 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
I've not modified Linux's sysctls regarding neighbor discovery:
net.ipv6.conf.br-wifi.accept_dad = 1
net.ipv6.conf.br-wifi.accept_ra = 0
net.ipv6.conf.br-wifi.accept_ra_defrtr = 1
net.ipv6.conf.br-wifi.accept_ra_from_local = 0
net.ipv6.conf.br-wifi.accept_ra_min_hop_limit = 1
net.ipv6.conf.br-wifi.accept_ra_mtu = 1
net.ipv6.conf.br-wifi.accept_ra_pinfo = 1
net.ipv6.conf.br-wifi.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.br-wifi.accept_ra_rt_info_min_plen = 0
net.ipv6.conf.br-wifi.accept_ra_rtr_pref = 1
net.ipv6.conf.br-wifi.accept_redirects = 1
net.ipv6.conf.br-wifi.accept_source_route = 0
net.ipv6.conf.br-wifi.addr_gen_mode = 0
net.ipv6.conf.br-wifi.autoconf = 0
net.ipv6.conf.br-wifi.dad_transmits = 1
net.ipv6.conf.br-wifi.disable_ipv6 = 0
net.ipv6.conf.br-wifi.disable_policy = 0
net.ipv6.conf.br-wifi.drop_unicast_in_l2_multicast = 0
net.ipv6.conf.br-wifi.drop_unsolicited_na = 0
net.ipv6.conf.br-wifi.enhanced_dad = 1
net.ipv6.conf.br-wifi.force_mld_version = 0
net.ipv6.conf.br-wifi.force_tllao = 0
net.ipv6.conf.br-wifi.forwarding = 1
net.ipv6.conf.br-wifi.hop_limit = 64
net.ipv6.conf.br-wifi.ignore_routes_with_linkdown = 0
net.ipv6.conf.br-wifi.keep_addr_on_down = 0
net.ipv6.conf.br-wifi.max_addresses = 16
net.ipv6.conf.br-wifi.max_desync_factor = 600
net.ipv6.conf.br-wifi.mc_forwarding = 1
net.ipv6.conf.br-wifi.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.br-wifi.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.br-wifi.mtu = 1500
net.ipv6.conf.br-wifi.ndisc_notify = 0
net.ipv6.conf.br-wifi.ndisc_tclass = 0
net.ipv6.conf.br-wifi.optimistic_dad = 0
net.ipv6.conf.br-wifi.proxy_ndp = 0
net.ipv6.conf.br-wifi.regen_max_retry = 3
net.ipv6.conf.br-wifi.router_probe_interval = 60
net.ipv6.conf.br-wifi.router_solicitation_delay = 1
net.ipv6.conf.br-wifi.router_solicitation_interval = 4
net.ipv6.conf.br-wifi.router_solicitation_max_interval = 3600
net.ipv6.conf.br-wifi.router_solicitations = -1
net.ipv6.conf.br-wifi.seg6_enabled = 0
net.ipv6.conf.br-wifi.seg6_require_hmac = 0
net.ipv6.conf.br-wifi.suppress_frag_ndisc = 1
net.ipv6.conf.br-wifi.temp_prefered_lft = 86400
net.ipv6.conf.br-wifi.temp_valid_lft = 604800
net.ipv6.conf.br-wifi.use_oif_addrs_only = 0
net.ipv6.conf.br-wifi.use_optimistic = 0
net.ipv6.conf.br-wifi.use_tempaddr = 0
net.ipv6.neigh.br-wifi.anycast_delay = 100
net.ipv6.neigh.br-wifi.app_solicit = 0
net.ipv6.neigh.br-wifi.base_reachable_time_ms = 30000
net.ipv6.neigh.br-wifi.delay_first_probe_time = 5
net.ipv6.neigh.br-wifi.gc_stale_time = 60
net.ipv6.neigh.br-wifi.locktime = 0
net.ipv6.neigh.br-wifi.mcast_resolicit = 0
net.ipv6.neigh.br-wifi.mcast_solicit = 3
net.ipv6.neigh.br-wifi.proxy_delay = 80
net.ipv6.neigh.br-wifi.proxy_qlen = 64
net.ipv6.neigh.br-wifi.retrans_time_ms = 1000
net.ipv6.neigh.br-wifi.ucast_solicit = 3
net.ipv6.neigh.br-wifi.unres_qlen = 101
net.ipv6.neigh.br-wifi.unres_qlen_bytes = 212992
li...@c0d3.blue <li...@c0d3.blue> #22
Unfortunately upstream Linux has declined this workaround here:
@ #3
> Maybe a better (?) approach would be a way to trigger MLD report from the kernel via some mechanism???
I'm not quite sure. This still sounds like an incomplete workaround to me. It fixes 99% of the issue and I'm a little afraid that the remaining 1% will lead to issues which are even harder to comprehend for a user or Android app developer. We would still need to keep the "bridge snooping wakeupcall" workaround mentioned above until the remaining 1% are fixed in our project. And I'm also afraid that if going for this userspace triggered MLD report workaround that priority to fix it properly would decline, delaying a proper fix even further.
So I would prefer to not use MLD reports on wakeup and instead focus on fixing the MLD query reception while the device is asleep.
mw...@gmail.com <mw...@gmail.com> #24
mw...@gmail.com <mw...@gmail.com> #25
Sorry, browser fckp. Ignore #24, it's a duplicate of #21.
li...@c0d3.blue <li...@c0d3.blue> #26
I would have expected this to be a bit higher on the Android agenda, as proper MLD snooping support with wifi APs should ultimately reduce multicast packet transmissions to a client and by that should reduce power consumption.
mw...@gmail.com <mw...@gmail.com> #27
Still affects me to this day, now on my Pixel 6a and Android 13.
ma...@google.com <ma...@google.com> #28
Though this one is an even deeper rabbit hole than the existing ones...
I'm guessing we'd need to rev APF to support packet transmit and be able to do IPv6 MLD replies???
ma...@google.com <ma...@google.com> #29
That should at least avoid dtim multiplier related packet loss.
li...@gmail.com <li...@gmail.com> #30
Note that it's likely not enough to just check for the MLD Query type (130) in the same way as it's done for ICMPV6_ECHO_REQUEST_TYPE (128) a few lines later. Because an MLD Query has a hop-by-hop router-alert option between the IPv6 and ICMPv6 header.
I unfortunately only have an outdated HTC U11 with no root access here at the moment, so can't try (creating) a patch myself right now.
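To illustrate the point about the hop-by-hop header, here is a minimal Python sketch of the lookup a filter would have to do. It is not APF bytecode and not the ApfFilter.java logic, just the same idea in plain Python. The helper names, the all-zero addresses and the zeroed checksum are placeholders for illustration; the sketch also only handles a single hop-by-hop header directly after the IPv6 header.

```python
import struct

IPPROTO_HOPOPTS = 0    # IPv6 hop-by-hop options header
IPPROTO_ICMPV6 = 58
ECHO_REQUEST_TYPE = 128
MLD_QUERY_TYPE = 130
IPV6_HEADER_LEN = 40


def icmpv6_type(packet: bytes):
    """Return the ICMPv6 type of a raw IPv6 packet, or None if it is not ICMPv6."""
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return None
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    if next_header == IPPROTO_HOPOPTS:
        if len(packet) < offset + 2:
            return None
        next_header = packet[offset]
        # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
        offset += (packet[offset + 1] + 1) * 8
    if next_header != IPPROTO_ICMPV6 or len(packet) <= offset:
        return None
    return packet[offset]


def build_mld_query() -> bytes:
    """Build a dummy MLD Query (placeholder addresses, checksum left at zero)."""
    # Router alert option (type 5, length 2, value 0 = MLD) followed by PadN.
    hbh = bytes([IPPROTO_ICMPV6, 0, 0x05, 0x02, 0x00, 0x00, 0x01, 0x00])
    mld = bytes([MLD_QUERY_TYPE, 0]) + bytes(22)  # 24 byte MLDv1 message
    header = struct.pack("!IHBB", 6 << 28, len(hbh) + len(mld), IPPROTO_HOPOPTS, 1)
    return header + bytes(16) + bytes(16) + hbh + mld


query = build_mld_query()
print(query[6] == IPPROTO_ICMPV6)            # False: the next header is hop-by-hop, not ICMPv6
print(icmpv6_type(query) == MLD_QUERY_TYPE)  # True once the hop-by-hop header is skipped
```

A check that only looks at the next header field of the fixed IPv6 header, as is sufficient for Echo Requests, would therefore never see the MLD Query type.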
yu...@google.com <yu...@google.com> #31
Hi @li...@gmail.com
If you plan to change the APF behavior, it is possible to test it with a unit test, without actual hardware.
If you can run the atest command locally, you can edit ApfFilter.java and confirm the behavior by adding a unit test. The unit test will run the real apf_interpreter.
Thanks,
yuyang
ma...@google.com <ma...@google.com> #32
it might be appropriate (at least short term) to just let all ipv6/HBH (ie. 0) through.
Though we would need to check what HBH packets are sent by Linux/etc on connection to a network.
I also vaguely recall having to add some hack to some Android tests (though I think that might have been for IPv4 tests where this is an IPv4 option, and basically the only case of IPv4 options in the wild I've ever seen)
I seem to recall that interface up sends a few ipv6/HBH packets, we (probably?) wouldn't want a new device connecting to a network to wake all existing devices... but maybe that's rare enough we don't care??
Note that I do know it sends HBH packets, I'm just not sure of the destinations - all hosts? all routers? multicast? broadcast?
I'm guessing it sends a HBH router alert MLD packet to all routers multicast group....
Note that APF doesn't have to be perfect - but it should not drop stuff it shouldn't (ie. this bug for example) and best effort drop as much network spam as possible.
ma...@google.com <ma...@google.com> #33
FYI, timeline wise - I think the patch won't roll out via mainline till near the end of October...
So, in the meantime you need builds from head.
You'll probably still want various ipv6 RA lifetimes to be high [ie. 2-4+ hours] with unsolicited RA frequency on the order of 5-10 minutes (although technically with "dtim_interval >= 5" RA lifetime being 15+ * RA frequency is far less important)
Description
The issue initially observed was that after turning on the screen of an Android device, it took up to 20 seconds until IPv6 connectivity started to work, even with both "Battery saver" and "Extreme saver" disabled in the UI. Further tests narrowed the issue down to Android being unresponsive to MLD [0][1] Queries when in sleep mode. As a result, the router is unaware of the multicast listeners on this device, which in turn breaks IPv6 Neighbor Discovery and thereby IPv6 communication in general. When the Android device wakes up, IPv6 communication remains broken until the next periodic MLD query is sent from the router. When the phone is connected to a charger, no IPv6 communication issues appear.
The issue was observed and reported from users in Freifunk networks [2] with the Gluon firmware [3] (a mesh networking firmware framework based on OpenWrt [4]). The current firmware makes heavy use of multicast snooping features of the Linux bridge and batman-adv, a layer 2 mesh routing protocol and Linux kernel module.
The Gluon firmware of a Freifunk router currently enables the bridge MLD querier with the following settings: 20 second MLD query interval, 5 second query response interval, robustness factor of 9. The defaults in the IETF RFCs are a 125 second query interval, a 10 second query response interval and a robustness factor of 2. So the values chosen in the Gluon firmware are already very conservative.
# Steps to reproduce
## Simplified Setup
Our IPv6 packet loss scenario involved snooping switches and by that has some complexity. However, here is a simplified setup with which we were able to reliably reproduce the missing MLD Response to an MLD Query with an Android phone running on battery power:
* Android 9 phone: HTC U11, software number 3.31.401.1, kernel version 4.4.153-perf-gd52e1de
* OpenWrt router: TP-Link 842ND v2, running a (mostly [*]) default OpenWrt
* Linux Laptop: Linux 5.2.17
([*]: Roles of WAN port and LAN ports swapped due to some issues with the switch chip.)
## Test tool
We updated the Si6 ipv6toolkit with MLD Query capabilities [5] and wrote a little script to automate the test runs and evaluations [6].
# Current output
Running the test script results in the following output: [7]. Summary:
Results with phone on battery:
* 0/8 MLD Reports received (lines 5-16)
* 3/8 Neighbor Advertisements received (lines 18-32)
* 8/8 Echo Replies received (lines 34-53)
Results with phone on charger:
* 8/8 MLD Reports received (lines 224-269)
* 8/8 Neighbor Advertisements received (lines 271-290)
* 8/8 Echo Replies received (lines 292-311)
The other tests in the output had slightly modified, non-standard ethernet and IPv6 destination addresses. I was kind of hoping/expecting that sending MLD Queries to the Android device directly, via unicast (lines 81-110), would result in an 8/8 just like the Echo Request test. However, this one too only got 3 of the 8 expected MLD Reports in response to the MLD Queries. So I'm wondering a bit: since ICMPv6 Echo Requests have no issues even in sleep mode, does Android in sleep mode filter/rate-limit based on specific ICMPv6 types?
# Expected output
The OpenWrt router should receive an MLD Report from the Android phone for (nearly) all MLD Queries the router sent, even when the phone is in sleep mode. The test script should show all 8/8 when the phone is in sleep mode, too, and not only when connected to a charger.
# Frequency
* Missing MLD Reports: 100% of the time
* Broken IPv6 TCP/UDP/...: When multicast snooping switches are involved
Note that from a user perspective though when there are no multicast snooping switches or multicast routers involved a user will not notice any issues with his/her common IPv6 TCP/UDP/... communications. Only multicast snooping switches and multicast routers care about MLD. Multicast snooping capabilities are however standard and usually enabled in enterprise switches for instance. Also OpenWrt, a common Linux OS for home routers, had multicast snooping enabled by default at one point but had to disable it again due to issues like this one.
----------
I know that the whole IPv6 NDISC+MLD cha-cha-cha and its interactions with multicast snooping switches is quite complex (there is even the extra RFC4541 [8] to "clarify" the interactions between MLD and snooping switches). But hopefully the tools and results provided shed some light on what is going on and what is going wrong.
If there are any further clarifications needed, please don't hesitate to ask. We would like to resolve this issue as soon as possible as it has quite an impact on (the scalability of) our Freifunk networks in the wild.
[0]:
[1]:
[2]:
[3]:
[4]:
[5]:
[6]:
[7]:
[8]: