Request for new functionality
Description
Summary:
Hadoop's default behaviour is to automatically decompress files with the .gz extension (see here). When gzip encoding is enabled (fs.gs.inputstream.support.gzip.encoding.enable=true), reading gzip-encoded files from GCS causes both the GCS connector and the Hadoop filesystem layer to attempt decompression, leading to errors. Since disabling the gzip decompression behaviour in Hadoop is not possible without changing the hadoop-core library, it would be helpful if the GCS connector could automatically skip decompression when the file extension is .gz, or at least provide a configuration property for disabling the automatic decompression.

Github Issue: https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/1060
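The failure mode can be sketched locally with plain gzip: decompressing a stream once succeeds, but a second decompression pass (what happens when both the connector and Hadoop try to decode the same object) fails because the output is no longer a gzip stream:

```shell
# Compress once, then decompress twice. The second pass mirrors the
# double decompression performed by the connector plus Hadoop FS.
tmp="$(mktemp)"
printf 'hello world\n' | gzip > "$tmp"

gzip -dc "$tmp"                  # first decompression succeeds, prints the text
gzip -dc "$tmp" | gzip -dc \
  || echo "second decompression failed: input is not a gzip stream"

rm -f "$tmp"
```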
Reproduction Steps:
1. Upload a gzip-compressed object with a .gz extension to GCS and set the Content-Encoding: gzip metadata field on it.
2. Enable gzip encoding support in the Hadoop core configuration: fs.gs.inputstream.support.gzip.encoding.enable=true
3. Read the object through the GCS connector; both the connector and Hadoop attempt to decompress it, and the read fails.
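As a concrete sketch of the steps above (the bucket and object names are hypothetical placeholders, not taken from the report):

```shell
# 1. Compress a file and upload it with the Content-Encoding: gzip header.
#    gs://my-bucket and data.csv are placeholder names.
gzip data.csv
gsutil -h "Content-Encoding:gzip" cp data.csv.gz gs://my-bucket/data.csv.gz

# 2+3. Read it back with gzip encoding support enabled; both the connector
#      and Hadoop FS then attempt decompression, and the read fails.
hadoop fs -D fs.gs.inputstream.support.gzip.encoding.enable=true \
  -cat gs://my-bucket/data.csv.gz
```

These are configuration/CLI steps that require a real GCS bucket and a cluster with the GCS connector installed, so they are shown here only as a sketch.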
Mitigation:
Either unset the Content-Encoding: gzip metadata field on the GCS object (so the connector does not decompress it) or remove the .gz extension from the object name (so Hadoop does not decompress it).
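Both mitigations can be applied with gsutil; the bucket and object names below are placeholders:

```shell
# Option 1: clear the Content-Encoding metadata so the connector serves
# the raw bytes (setting an empty header value removes the field).
gsutil setmeta -h "Content-Encoding:" gs://my-bucket/data.csv.gz

# Option 2: rename the object so it no longer ends in .gz, which stops
# Hadoop's extension-based decompression (the connector then decompresses
# the object exactly once).
gsutil mv gs://my-bucket/data.csv.gz gs://my-bucket/data.csv
```

These commands require access to a real GCS bucket and are shown as a configuration sketch only.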