## Problem you have encountered:

The customer needs accurate metrics to monitor their Cloud Spanner service.

After change streams are enabled, a high number of canceled and failed operations is expected. This falls within "expected behavior". However, it affects the customer when they try to diagnose a genuine issue, because a significant count of queries with a canceled status is already present simply because change streams are enabled.

The customer wants to improve their monitoring despite this flood of canceled and failed queries from change streams. The only available metric dedicated to change streams is `api/read_request_latencies_by_change_stream` [1]. However, this is a new latency metric for change streams and does not help filter general Spanner queries by status.

## How this might work:

Provide a metric that allows change stream queries to be filtered by status, so that the canceled queries that change streams are expected to report can be excluded. Today these queries are also counted by the general query status metric, which prevents the customer from identifying genuine issues where a significant count of queries ends with a canceled status.

## If applicable, reasons why alternative solutions are not sufficient:

The available options for this request were explored.

Because of the granularity this requires, I was not able to find a feasible solution or workaround that would not imply losing data (e.g., filtering per method or per Dataflow job).

After reviewing this with a specialist, we confirmed that Spanner metrics do not support that level of granularity: both kinds of queries that the customer wants to separate are logged the same way on our side, so we cannot group them in a way that differentiates them.
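
The limitation can be illustrated with a small sketch. It assumes, for illustration only, that the metric exposes `method` and `status` as labels and that change stream queries and application queries share the same method (`ExecuteStreamingSql` here). The sample records and the `origin` field are hypothetical and are not part of any real metric.

```python
from collections import Counter

# Hypothetical request samples. "origin" is known only to this sketch;
# the metric itself is assumed to carry no such label.
requests = [
    {"origin": "change_stream", "method": "ExecuteStreamingSql", "status": "CANCELLED"},
    {"origin": "change_stream", "method": "ExecuteStreamingSql", "status": "CANCELLED"},
    {"origin": "application", "method": "ExecuteStreamingSql", "status": "CANCELLED"},
    {"origin": "application", "method": "ExecuteStreamingSql", "status": "OK"},
]


def metric_view(samples):
    """Aggregate samples by (method, status) only, dropping the origin."""
    return Counter((s["method"], s["status"]) for s in samples)


view = metric_view(requests)
canceled_total = view[("ExecuteStreamingSql", "CANCELLED")]
print(canceled_total)
```

In this view the genuine application cancellation is merged with the two expected change stream cancellations, so excluding the canceled status would hide it as well.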

## Other information (workarounds you have tried, documentation consulted, etc):
The available options do not fully cover the request, because each of them loses query data:

1. In the specific case of a Dataflow job reading from a Spanner change stream, canceled queries are expected to appear in the metrics. In this case, the customer can define their alerts or dashboards to exclude the 'OK' and 'Cancel' statuses together.

2. In the case of a Dataflow pipeline, add a filter that excludes OK and Cancel for that specific pipeline.

3. An alternative might be data access audit logs combined with a log-based metric that the customer customizes for their scenario. However, this would imply a significant increase in costs.
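
As a rough sketch of the first two options, the snippet below builds a Cloud Monitoring filter string that excludes a set of statuses. The helper `build_exclusion_filter` is hypothetical, and the metric type `spanner.googleapis.com/api/request_count`, the `status` label, and the status values `OK` and `CANCELLED` are assumptions that should be checked against the metric actually used in the customer's dashboards or alerts.

```python
def build_exclusion_filter(metric_type, excluded_statuses):
    """Return a Monitoring filter that drops the given status values.

    Hypothetical helper: it only assembles a string and calls no API.
    """
    clauses = [f'metric.type = "{metric_type}"']
    for status in excluded_statuses:
        # One inequality per excluded status, all joined with AND.
        clauses.append(f'metric.labels.status != "{status}"')
    return " AND ".join(clauses)


alert_filter = build_exclusion_filter(
    "spanner.googleapis.com/api/request_count",  # assumed metric type
    ["OK", "CANCELLED"],  # assumed status label values
)
print(alert_filter)
```

A filter like this removes every canceled request, including genuine ones, which is the data loss described above.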

## References:

[1]: https://cloud.google.com/spanner/docs/monitoring-console#:~:text=Latency%20by-,change%20stream%20read,-api/read_request_latencies_by_change_stream