Assigned
Status Update
Comments
fn...@google.com <fn...@google.com> #2
Hi, could you please clarify the issue description or share a screenshot of the problem you are facing?
No update yet.
Description
Problem you have encountered:
environment.yml does not work on a private IP Dataproc cluster.
What you expected to happen:
The cluster create script should not try to connect to the internet; it should only use the internal Artifactory.
Steps to reproduce:
Create a Dataproc cluster and pass the dataproc:conda.env.config.uri property. The environment.yml file used specifies an internal Artifactory as the channel from which packages are to be installed.
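For context, a minimal reproduction sketch follows. The bucket, subnet, region, environment name, and internal Artifactory channel URL are placeholders, and the nodefaults channel entry is an assumption meant to keep conda from falling back to the public default channels:

# Hypothetical environment.yml pointing only at an internal Artifactory conda channel
cat > environment.yml <<'EOF'
name: my_env
channels:
  - https://artifactory.internal.example.com/artifactory/api/conda/conda-local
  - nodefaults
dependencies:
  - python=3.9
  - numpy
EOF

# Stage the file in GCS and create an internal-IP-only (private IP) cluster that references it
gsutil cp environment.yml gs://my-bucket/environment.yml
gcloud dataproc clusters create my-private-cluster \
  --region=us-central1 \
  --subnet=my-subnet \
  --no-address \
  --properties="dataproc:conda.env.config.uri=gs://my-bucket/environment.yml"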
Workarounds:
1. Create a new cluster using the above init script.
2. To run the job, explicitly set the Spark job properties to point to the virtual environment's Python, for example:
--properties="spark.pyspark.python=/opt/conda/miniconda3/envs/<env_name>/bin/python,spark.pyspark.driver.python=/opt/conda/miniconda3/envs/<env_name>/bin/python"
Replace <env_name> with the name of the environment created in step 1 (a full job-submit command is sketched below).
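As a sketch of that second workaround step, assuming a hypothetical job file, cluster, and region, and an environment named my_env created in step 1:

gcloud dataproc jobs submit pyspark my_job.py \
  --cluster=my-private-cluster \
  --region=us-central1 \
  --properties="spark.pyspark.python=/opt/conda/miniconda3/envs/my_env/bin/python,spark.pyspark.driver.python=/opt/conda/miniconda3/envs/my_env/bin/python"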