Spark - Experienced data loss on a job running on 2.2.9-debian12 [331226134]

Assigned

Bug

Status Update

No update yet.

Description

an...@oncrawl.com

created issue #1

Mar 26, 2024 07:58AM

I experienced data loss running a Spark job on the Dataproc image 2.2.9-debian12.

I was able to find a related (now closed) SPARK issue:

https://issues.apache.org/jira/browse/SPARK-47019

Setting the property `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `false` (the default before Spark 3.5.0) did fix the issue.
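For anyone wanting to apply the same workaround, here is a minimal sketch of how the property could be passed on the command line. It only builds the argument strings; the helper names (`WORKAROUND_CONF`, `spark_submit_conf_args`, `dataproc_properties_flag`) are hypothetical and not part of Spark or Dataproc. It assumes the standard `--conf key=value` form accepted by `spark-submit` and the `--properties` flag of `gcloud dataproc jobs submit`.

```python
# Workaround property from this report; "false" was the default before Spark 3.5.0.
WORKAROUND_CONF = {
    "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning": "false",
}


def spark_submit_conf_args(conf):
    """Turn a dict of Spark properties into spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def dataproc_properties_flag(conf):
    """Build a `--properties=k=v,...` flag for `gcloud dataproc jobs submit` (hypothetical helper)."""
    pairs = ",".join(f"{key}={value}" for key, value in sorted(conf.items()))
    return f"--properties={pairs}"


if __name__ == "__main__":
    # Print the arguments so they can be appended to an existing submit command.
    print(" ".join(spark_submit_conf_args(WORKAROUND_CONF)))
    print(dataproc_properties_flag(WORKAROUND_CONF))
```

The same key/value pair can also be set in the job's own Spark configuration; the sketch above just avoids touching the job code.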

I was also able to confirm locally that upgrading to Spark 3.5.1 fixes the problem.

I could not find any information about this in relation to Dataproc. Sorry if I missed it, but I thought it was worth raising awareness of this.

I was also wondering when a Dataproc image based on Spark 3.5.1 is planned. I could not find that information.

Comments

va...@google.com <va...@google.com> Mar 28, 2024 06:36AM

Assigned to nr...@google.com.

nr...@google.com <nr...@google.com> #2 Mar 28, 2024 06:05PM

Hello,

To troubleshoot the issue further, I have created a private ticket to collect some information about the issue (you should have received a notification for it). Please provide the requested information there. Don't put any personal information, including project identifiers, in this public ticket.

nr...@google.com <nr...@google.com> #3 Apr 4, 2024 06:23AM

Reassigned to gc...@google.com.

Hello,

Thank you for reaching out to us with your request.

We have duly noted your feedback and will thoroughly validate it. While we cannot provide an estimated time of implementation or guarantee the fulfillment of the issue, please be assured that your input is highly valued. Your feedback enables us to enhance our products and services.

We appreciate your continued trust and support in improving our Google Cloud Platform products. If you want to report a new issue, please do not hesitate to create one on the Issue Tracker with a detailed description.

Once again, we sincerely appreciate your valuable feedback. Thank you for your understanding and collaboration.
