WAI
Status Update
Comments
ja...@ambiata.com <ja...@ambiata.com> #2
I agree: it would be good if each output response could at least be linked to an input file, or, yes, returned in the original order.
yb...@tangerine.ca <yb...@tangerine.ca> #3
This is very important to resolve; otherwise, what's the use of the batch prediction job if we can't guarantee the order of the probability scores for each input line?
ma...@telus.com <ma...@telus.com> #4
I'd like to add another case with a similar ask:
Google Cloud Support 42224058: We are not able to use the Batch Predictions Vertex Pipeline component because the predictions are not labelled.
ra...@gmail.com <ra...@gmail.com> #5
sh...@google.com <sh...@google.com> #6
We internally use some form of MapReduce framework to run the batch prediction, and the data is shuffled across multiple workers, so by nature the results are not sorted. The way to get the results in some order is mentioned in
sh...@google.com <sh...@google.com>
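For illustration only, a minimal sketch of one way to make results identifiable despite the shuffling: attach an explicit identifier to each JSONL instance before submitting the job, so the echoed "instance" in each output line identifies the original row. The field name row_id and the file names here are hypothetical, and a model that strictly validates its input schema may reject the extra field.

# Sketch: tag each JSONL instance with a row id before batch prediction.
# "row_id" is a hypothetical field name; check that your model tolerates
# (or passes through) the extra field.
import json

with open("input.jsonl") as src, open("input_with_ids.jsonl", "w") as dst:
    for i, line in enumerate(src):
        instance = json.loads(line)
        instance["row_id"] = i  # stable identifier for joining results later
        dst.write(json.dumps(instance) + "\n")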
gr...@bebr.nl <gr...@bebr.nl> #7
So just to confirm: until this patch reaches General Availability (ETA 3-6 months?), there is no built-in way to use ModelBatchPredictOp or BatchPredictionJob for batch predictions in a production environment? Because if the predictions cannot be linked back to the thing they are predicting for, then there isn't much point in making them!
One potential workaround I can see is to make a copy of the prediction features in your pipeline, with the IDs attached, before they go into the ModelBatchPredictOp, then take the instances+predictions output of ModelBatchPredictOp and join the two together using the features as merge keys (see the sketch below). But this seems pretty fraught with potential errors, so I'm curious whether there is a better way. With a custom model you could probably handle this in the predictor, I guess?
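A minimal sketch of that join-back idea, assuming JSONL batch output where each line carries an "instance" (a dict of named features) and a "prediction" field; the file names and feature column names below are hypothetical placeholders, not part of any official API:

# Sketch: join batch prediction output back to the pre-prediction copy of the
# inputs, using the feature values themselves as merge keys.
# File names and FEATURE_COLS are hypothetical; adjust to your pipeline.
import json
import pandas as pd

FEATURE_COLS = ["feature_a", "feature_b"]  # hypothetical feature names

def load_predictions(path: str) -> pd.DataFrame:
    rows = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            row = dict(record["instance"])                         # echoed input features
            row["prediction"] = json.dumps(record["prediction"])   # model output
            rows.append(row)
    return pd.DataFrame(rows)

inputs = pd.read_csv("inputs_with_ids.csv")  # copy made before ModelBatchPredictOp
preds = load_predictions("prediction.results-00000-of-00001")

# Join on the feature values; fragile if two rows share identical feature values.
merged = inputs.merge(preds, on=FEATURE_COLS, how="left")
merged.to_csv("predictions_with_ids.csv", index=False)

Note that this join degrades if feature values are not unique per row, which is one reason the passthrough/labelled-output feature requested here would be preferable.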
Finally, this patch says it makes these passthrough arguments available on BatchPredictionJob (which is great!), but will they also be added to ModelBatchPredictOp?
Description
Please describe your requested enhancement. Good feature requests will solve common problems or enable new use cases.
What you would like to accomplish:
Currently, Vertex AI Batch Predictions does not guarantee the order of the output data.
This is not documented as a limitation of Vertex AI, but the AI Platform documentation[1] suggests this is WAI (working as intended), and I can observe the same behaviour with Vertex AI.
It would be more useful if the output files were generated in the same order as the input file.
[1]