Status Update
Comments
ak...@google.com <ak...@google.com>
bo...@google.com <bo...@google.com> #2
Hello,
I understand your issue is that uploading to GS buckets and loading a table to BigQuery using the Python libraries is noticeably slower than doing the same with the gsutil and bq commands. Please let me know if I have misunderstood.
Let me clarify that the comparison should be made this way: Python's blob.upload_from_file() vs. the gsutil command, and Python's load_table_from_file() vs. the bq command. Once that is clear, I would like to ask you for the code that you use so I can reproduce the situation myself and get further insights. Please remove all personal information from your code before sharing it.
I will wait for your response,
Manuel Alaman
Google Cloud Big Data Support Barcelona
wd...@google.com <wd...@google.com> #3
You are correct.
The attached Python script generates a test CSV file and runs the Python client test. Please find and replace all occurrences of the `UPDATE_THIS` text.
It also has the DDL query you'll need to use to create the BQ table before you run the script.
Additionally, it has the exact bq command you'll need to test the bq CLI utility against the same file.
I just tested again after creating this, using Python 3.6.9, google-cloud-bigquery 2.20.0, and BigQuery CLI 2.0.69 (most recent versions). I still see the same performance difference (~4 MBps upload from the Python client vs. ~70 MBps upload for the same file to the same table using the BigQuery CLI).
Let me know if you need anything else.
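For anyone reproducing the ~4 MBps vs. ~70 MBps figures, it may help to time only the upload call and divide by the file size, so both tools are measured the same way. The following is a minimal sketch using only the standard library. `measure_throughput`, `upload_fn`, and `fake_upload` are hypothetical names that are not part of the attached script, and "MB" is assumed to mean 10^6 bytes.

```python
import os
import time


def measure_throughput(upload_fn, path):
    """Return (seconds, MB/s) for one call of upload_fn(path).

    upload_fn is a hypothetical callable standing in for whatever performs
    the upload (a client-library call, or a subprocess running a CLI).
    """
    size_bytes = os.path.getsize(path)
    start = time.perf_counter()
    upload_fn(path)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return elapsed, (size_bytes / 1e6) / elapsed


def fake_upload(path):
    # Stand-in for a real upload: just read the file in 1 MiB chunks.
    with open(path, "rb") as f:
        while f.read(1 << 20):
            pass
```

In a real comparison, `upload_fn` would wrap the Python client call in one run and the bq command in another, using the same file and target table.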
jo...@jmgao.dev <jo...@jmgao.dev> #4
Hey there, any update on this?
jm...@gmail.com <jm...@gmail.com> #5
Hi Kevin,
We are still investigating the issue. So far we have obtained output [1] for the script and [2] for the bq command, where “Upload complete” was reached in about 11 seconds.
Further updates will be published here.
[1]
2021-06-30 06:55:01,496 root test_uploads INFO: Beginning load job...
2021-06-30 06:57:08,662 root test_uploads INFO: Job ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
2021-06-30 06:57:08,662 root test_uploads INFO: BQ load job complete without error!
[2]
Upload complete.
Waiting on bqjob_XXXXXXXXXXXXXXXXX_XXXXXXXXXXXXXXXX_X ... (48s) Current status: DONE
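The two timestamps in [1] bracket the whole client-side load job, so the elapsed time can be read straight from the log. A small sketch of that arithmetic follows. `elapsed_seconds` is a hypothetical helper, and it assumes every log line starts with a 23-character `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the output above.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start_line, end_line):
    """Seconds between two log lines that begin with 'YYYY-MM-DD HH:MM:SS,mmm'."""
    def stamp(line):
        # The timestamp occupies the first 23 characters of each line.
        return datetime.strptime(line[:23], LOG_TIME_FORMAT)

    return (stamp(end_line) - stamp(start_line)).total_seconds()


start = "2021-06-30 06:55:01,496 root test_uploads INFO: Beginning load job..."
end = "2021-06-30 06:57:08,662 root test_uploads INFO: BQ load job complete without error!"
print(elapsed_seconds(start, end))
```

For the log in [1] this comes to roughly 127 seconds for the script, against the roughly 11 seconds to “Upload complete” and the (48s) wait shown for the bq command in [2].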
ma...@google.com <ma...@google.com> #6
Hi there, has there been any progress on this? Should I move this over to an Issue at https://github.com/googleapis/google-cloud-python ?
wd...@google.com <wd...@google.com> #8
Since this is being investigated in github, having this issue also here seems like a duplicate. Let's close this and follow the fix on github.
jo...@jmgao.dev <jo...@jmgao.dev> #9
It's QEMU.
Are you seeing it hang there? I didn't see any threads blocked on getaddrinfo in gdb.
My question is: will a connection to localhost still require talking to the DNS server?
It shouldn't, names in /etc/hosts should be immediately resolved without having to talk to the DNS server.
ma...@google.com <ma...@google.com> #11
If calling getaddrinfo() with ipv4/ipv6 literals fails when the dns server is unreachable or unresponsive, instead of returning the obvious thing... then that feels like a getaddrinfo() bug (which I think would mean a bug in libc, likely glibc...). That's my interpretation / take away from #10.
Of course this could be fixed by a trivial wrapper around getaddrinfo() to fix this case...
---
There is one gotcha here... technically on an ipv6-only network you could want getaddrinfo(AF_UNSPEC/AF_INET6, "1.2.3.4") to return "nat64prefix96bits::1.2.3.4" (instead of just 1.2.3.4), which requires looking up AAAA records on ipv4only.arpa to figure out the prefix (the lookup against a normal DNS server fails; against a DNS64 server, like google public dns64, it returns for example 64:ff9b::c000:aa).
[Note this of course wouldn't happen with a getaddrinfo(AF_INET, "1.2.3.4") lookup since that can't return AF_INET6 addresses]
Using ipv6 instead of ipv4 for ipv4 literals has two benefits. It works if the DNS server is dead, or not actually fully DNS64 enabled (Orange PL cell network, only does the ipv4only.arpa dance but not the full add prefix on everything dance). Additionally it gets rid of one layer of nat (the nat46 clat portion), and only keeps the ISP side (the nat64 plat portion), which is simply more efficient for everyone involved.
ie. use of ipv6 even for ipv4 literals should be preferred on an ipv6only network whenever possible.
For example, I'm sitting on a T-Mobile US cellular hotspot at the moment... this is an ipv6-only network with NAT64/DNS64:
maze@maze-glaptop:~$ host ipv4only.arpa
ipv4only.arpa has address 192.0.0.170
ipv4only.arpa has address 192.0.0.171
ipv4only.arpa has IPv6 address 2607:7700:0:1c:0:1:c000:aa
ipv4only.arpa has IPv6 address 2607:7700:0:1c:0:1:c000:ab
^ replies showing 2607:7700:0:1c:0:1::/96 prefix.
maze@maze-glaptop:~$ host ipv4only.arpa. dns.google
Using domain server:
Name: dns.google
Address: 2001:4860:4860::8844#53
Aliases:
ipv4only.arpa has address 192.0.0.170
ipv4only.arpa has address 192.0.0.171
^ normal non-DNS64 server -- what you get if you query the actual authoritative servers for "arpa."
maze@maze-glaptop:~$ host ipv4only.arpa. dns64.dns.google
Using domain server:
Name: dns64.dns.google
Address: 2001:4860:4860::6464#53
Aliases:
ipv4only.arpa has address 192.0.0.170
ipv4only.arpa has address 192.0.0.171
ipv4only.arpa has IPv6 address 64:ff9b::c000:aa
ipv4only.arpa has IPv6 address 64:ff9b::c000:ab
^ DNS64 server with the default 'well known' prefix -- what you get if you query the authoritative servers for "arpa." and then do DNS64 synthesis with the WKP (well known prefix)
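The prefix discovery and literal synthesis described above can be sketched with Python's `ipaddress` module. This is an illustration of the arithmetic only, assuming a 96-bit prefix with the IPv4 address in the low 32 bits. The function names `nat64_prefix_from_aaaa` and `synthesize` are hypothetical, and a real implementation would have to issue the AAAA query itself.

```python
import ipaddress

# The two well-known IPv4 addresses of ipv4only.arpa, as shown in the output above.
WELL_KNOWN_V4 = {
    ipaddress.IPv4Address("192.0.0.170"),
    ipaddress.IPv4Address("192.0.0.171"),
}


def nat64_prefix_from_aaaa(aaaa):
    """Derive the /96 prefix from one AAAA answer for ipv4only.arpa.

    Returns None if the low 32 bits are not one of the well-known
    addresses, i.e. the answer does not look like a /96 synthesis.
    """
    addr = ipaddress.IPv6Address(aaaa)
    low32 = ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
    if low32 not in WELL_KNOWN_V4:
        return None
    # Zero the low 32 bits to keep only the prefix.
    return ipaddress.IPv6Network(((int(addr) >> 32) << 32, 96))


def synthesize(prefix, ipv4_literal):
    """Embed an IPv4 literal in the low 32 bits of a /96 NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4_literal)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


prefix = nat64_prefix_from_aaaa("2607:7700:0:1c:0:1:c000:aa")
print(prefix, synthesize(prefix, "1.2.3.4"))
```

With the T-Mobile answer from the first `host` run, this recovers the 2607:7700:0:1c:0:1::/96 prefix noted above and places 1.2.3.4 in its low 32 bits.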
---
So... this means in order of preference:
(a) fix getaddrinfo() -- but this will be hard, because if I'm not mistaken this requires fixing (likely) multiple libc (glibc/musl/etc...) and rolling it out to the world... this of course would be best
bonus points for getting the dns64 synthesis on ipv4 literals working... (apple does this)
(b) wrap getaddrinfo() in a fixed version of the function, have qemu use that...
(b1) if getaddrinfo() fails, check for literal, spoof return value <-- if getaddrinfo() is ever fixed (or isn't broken on a given system) or does dns64 this just works
(b2) check for literal, return explicit value, only call getaddrinfo() for non literals <-- no real benefit to doing this
while dns64 synthesis could be done as well in B1, there's no benefit in B2, since with working DNS (which is needed to query ipv4only.arpa and figure out the prefix) we'll never hit the new code anyway.
[though note that the ipv6only ipv4 96-bit prefix can technically also be learned from listening to RAs, but again really only likely if the network already works, and then dns should too...]
---
It's a lot of words... but I'm pretty sure you'll end up with [b2].
fixed_getaddrinfo(...) {
  v = getaddrinfo(...);
  if (!error) return v;
  if (!literal) return v;
  return (manual literal parsing, taking care to handle AI_V4MAPPED correctly)...
}
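To make the control flow of that wrapper concrete, here is a rough sketch in Python rather than the C that QEMU would actually need. It only illustrates the "call the resolver first, fall back to literal parsing on failure" idea. The `resolve` parameter is a hypothetical hook added so the fallback can be exercised without a broken network; AI_V4MAPPED and the protocol field are deliberately not handled, and `port` is assumed to be an integer.

```python
import ipaddress
import socket


def _parse_literal(host):
    """Return an ipaddress object if host is an IPv4/IPv6 literal, else None."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None


def fixed_getaddrinfo(host, port, family=socket.AF_UNSPEC,
                      socktype=socket.SOCK_STREAM, resolve=socket.getaddrinfo):
    """Try the system resolver; if it fails, answer literals by hand."""
    try:
        return resolve(host, port, family, socktype)
    except socket.gaierror:
        literal = _parse_literal(host)
        if literal is None:
            raise  # not a literal: nothing to do without working DNS
        lit_family = socket.AF_INET if literal.version == 4 else socket.AF_INET6
        if family not in (socket.AF_UNSPEC, lit_family):
            raise  # family mismatch; AI_V4MAPPED is not sketched here
        if lit_family == socket.AF_INET:
            sockaddr = (str(literal), port)
        else:
            sockaddr = (str(literal), port, 0, 0)
        # Simplified result: protocol is left as 0 and canonname is empty.
        return [(lit_family, socktype, 0, "", sockaddr)]
```

Because the system resolver is always tried first, the fallback path is only reached on systems where the literal lookup actually fails.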
ma...@google.com <ma...@google.com> #12
Another possible hack would be to switch from 127.0.0.1 / ::1 to things that should likely be in /etc/hosts...
Maybe stuff from /etc/hosts works even when dns is dead?
ie. in https://android.googlesource.com/platform/external/qemu/+/refs/heads/emu-master-dev/android-qemu2-glue/main.cpp#2801
worth checking if things like "localhost" (instead of 127.0.0.1, though this might return ::1 instead...) and ip6-localhost (::1) work even when literals fail...
Unfortunately there's no 100% chance that these entries are in /etc/hosts, and indeed localhost might be ::1 instead...
So perhaps, another solution, would be to alternate attempts between literals (first!) and 'localhost / ip6-localhost' (there doesn't seem to be a well known ipv4 only localhost name:
maze@maze-glaptop:~$ cat /etc/hosts | egrep local
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
)
But I think [b2] approach is nicer... though probably more complex to implement (you should really use the new fixed_getaddrinfo() at every call site...)
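The alternating approach mentioned above (literals first, then names that are likely in /etc/hosts) could look roughly like the sketch below, in Python rather than QEMU's C. `first_resolvable` and `LOOPBACK_CANDIDATES` are hypothetical names, the candidate list is an assumption based on the /etc/hosts excerpt above, and `resolve` is an injectable hook so the behaviour can be tried with a simulated failure.

```python
import socket

# Literals first, then names that are commonly (but not always) in /etc/hosts.
LOOPBACK_CANDIDATES = ["127.0.0.1", "::1", "localhost", "ip6-localhost"]


def first_resolvable(port, candidates=LOOPBACK_CANDIDATES,
                     resolve=socket.getaddrinfo):
    """Return (candidate, addrinfo_list) for the first candidate that resolves."""
    for host in candidates:
        try:
            return host, resolve(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # try the next spelling of loopback
    raise socket.gaierror("no loopback candidate could be resolved")
```

Note that, as pointed out above, "localhost" may come back as ::1 rather than 127.0.0.1, so callers would still need to check the returned address family.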
wd...@google.com <wd...@google.com> #13
Hi
Description
When disconnected from any networks on my laptop, I'm seeing qemu hang forever on startup. The name resolution error message looks suspicious, but it's not IPv6 related: it still hangs after `ip addr del ::1 dev lo`, with 127.0.0.1 instead of ::1 in the message. I've attached thread backtraces and memory mappings, in case you want to symbolicate the missing frames.