Assigned
Status Update
Comments
hi...@google.com <hi...@google.com> #2
One possible way to automate this for Vertex AI is to create JupyterLab runtime templates via Dataproc Serverless: https://cloud.google.com/dataproc-serverless/docs/overview
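As a rough illustration of what such a runtime template might contain, here is a minimal sketch that only builds the template body as a plain dictionary and prints it as JSON. It does not call any Google Cloud API. The field names (jupyterSession, environmentConfig, runtimeConfig) follow my understanding of the Dataproc Serverless SessionTemplate resource and should be checked against the current documentation. The helper name build_session_template and every value (project, region, template ID, subnetwork, runtime version, idle timeout) are hypothetical placeholders.

```python
import json


def build_session_template(project, region, template_id, subnetwork_uri,
                           runtime_version="2.2"):
    """Return a dict shaped like a Dataproc Serverless session template.

    Assumption: field names mirror the SessionTemplate resource as I
    understand it; verify them against the official reference before use.
    """
    return {
        # Fully qualified resource name of the template.
        "name": (
            f"projects/{project}/locations/{region}"
            f"/sessionTemplates/{template_id}"
        ),
        "description": "Serverless Spark runtime for notebook users",
        # Kernel and label shown to the user in JupyterLab.
        "jupyterSession": {
            "kernel": "PYTHON",
            "displayName": "Serverless Spark (Python)",
        },
        # Network placement and idle shutdown (placeholder values).
        "environmentConfig": {
            "executionConfig": {
                "subnetworkUri": subnetwork_uri,
                "idleTtl": "3600s",
            }
        },
        # Runtime version is a placeholder; pick one that is supported.
        "runtimeConfig": {"version": runtime_version},
    }


if __name__ == "__main__":
    template = build_session_template(
        project="example-project",       # hypothetical
        region="us-central1",            # hypothetical
        template_id="notebook-runtime",  # hypothetical
        subnetwork_uri="default",        # hypothetical
    )
    print(json.dumps(template, indent=2))
```

The resulting JSON (or an equivalent YAML file) could then be registered with whatever tooling the project uses for session templates. With a template in place, notebook users would select a runtime rather than provision a cluster themselves.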
Description
The user is unhappy that user-managed and managed notebooks are being deprecated. They find the proposed alternative, creating a Workbench instance and having users manually create Dataproc clusters, inconvenient for several reasons:
User-managed and managed notebooks are being deprecated by GCP.
The proposed alternative of creating Workbench instances and manually provisioning Dataproc clusters is impractical due to:
Increased burden on users: Notebook users shouldn't have to manage infrastructure.
Lack of isolation: There is no way to guarantee that each user has their own Dataproc cluster.
Disruptive workflow changes: It requires additional training and a significant shift in how users work.
I created this issue tracker for customer visibility. The engineering team will update here once the refresh is done internally.
In user-managed notebooks, the user was able to spin up a Dataproc environment on demand to run the Jupyter notebook just by clicking the OPEN JUPYTERLAB button.
Behind the scenes, the Dataproc spawner would spawn a Dataproc cluster for us. However, I noticed that a number of GitHub repositories related to the Dataproc spawner have been archived.
I did, however, come across this repo [1], but if you scroll to the bottom of the README, it says "For a Google-supported version of the Dataproc Spawner, refer to the official Dataproc Hub documentation." The Dataproc Hub documentation [2] it points to has been deprecated.
The "Create a Dataproc-Enabled Vertex AI Workbench Instance" [3] documentation says nothing about Dataproc Hub.
We don't want folks creating Dataproc clusters manually. We want the Workbench instance to create one for them on launch, just as we were able to do with user-managed notebooks.
[1] https://github.com/GoogleCloudDataproc/jupyterhub-dataprocspawner
[2] https://cloud.google.com/dataproc/docs/tutorials/dataproc-hub-admins
[3] https://cloud.google.com/vertex-ai/docs/workbench/instances/create-dataproc-enabled