Request for new functionality
Description
Please describe your requested enhancement. Good feature requests will solve common problems or enable new use cases.
What you would like to accomplish: Upload a multi-file pipeline as a Python package, using a private package hosted in a private Python artifact repository, and (if possible) using a custom container for faster uploads and consistent dependencies.
The customer added that when they try to upload a job with a setup file that points to their private artifact registry, pip downloads an HTML file instead of finding and installing the package. However, it does work when they use a direct URL to a specific .whl file.
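For illustration, a minimal sketch of a setup file showing the two dependency forms the customer describes; the package names and the Artifact Registry URL below are placeholders, not values from the customer's environment.

```python
# Hypothetical setup.py for a multi-file pipeline package; all names and the
# repository URL are placeholders.
import setuptools

setuptools.setup(
    name="my_pipeline",
    version="0.1.0",
    packages=setuptools.find_packages(),
    install_requires=[
        # Reported to work: a direct (PEP 508) reference to a specific wheel file.
        # "my-private-lib @ https://us-central1-python.pkg.dev/my-project/my-repo/my-private-lib/my_private_lib-1.0.0-py3-none-any.whl",
        #
        # Reported not to work: naming the package and expecting pip to resolve
        # it from the private repository. pip only searches indexes it is told
        # about, so the repository has to be supplied as an index URL (for
        # example via --extra-index-url or PIP_EXTRA_INDEX_URL pointing at the
        # repository's /simple/ endpoint); a bare repository URL used as if it
        # were a package URL just returns an HTML page.
        "my-private-lib==1.0.0",
    ],
)
```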
They then tried to upload a Dataflow job without a custom container, instead reaching a private Python repository at deploy time to install their package. What they are looking for is the right way to deploy with a setup file: it works with a direct path to a specific .whl file, but not with a repository URL.
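A minimal sketch of what the launch side might look like once the setup file resolves correctly, assuming the package above; the project, region, bucket, and image names are placeholders, and the sdk_container_image line is optional.

```python
# Hypothetical launch script; project, region, bucket, and image names are
# placeholders, not values taken from the customer's environment.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    streaming=True,
    # Stages the whole multi-file package (built from setup.py) to the workers.
    setup_file="./setup.py",
    # Optional: a custom worker image with the private dependencies
    # pre-installed, avoiding pip resolution at worker start-up.
    sdk_container_image="us-docker.pkg.dev/my-project/my-repo/beam-worker:latest",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([1, 2, 3])
        | "Print" >> beam.Map(print)
    )
```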
How this might work:
If applicable, reasons why alternative solutions are not sufficient: The customer said that their Dataflow work is mainly streaming jobs, so Dataproc + PySpark is probably not their direction. As for the custom container option, they have tried it, but doing so prevented them from running the pipeline as a package. This means they cannot deploy a pipeline that consists of multiple files, which was their initial problem.
Other information (workarounds you have tried, documentation consulted, etc):