Status Update
Comments
fh...@google.com <fh...@google.com> #2
by...@gmail.com <by...@gmail.com> #3
Hi,
Can you provide more information about:
- Steps to reproduce the issue.
- If possible, can you provide sample data for reproduction? Please remove any PII.
If possible, can you also provide a screenshot of the error?
Thanks
du...@gmail.com <du...@gmail.com> #4
{
"insertId": "63a0f443-0000-2ad1-bbc1-f403045f7a4e@a1",
"jsonPayload": {
"context": "CDC",
"event_code": "UNSUPPORTED_EVENTS_DISCARDED",
"read_method": "",
"message": "Discarded 1180 unsupported events for BigQuery destination: 880653332314.datastream_txns_public.adjustment_adjustmentmodel, with reason code: BIGQUERY_TOO_MANY_PRIMARY_KEYS, details: Failed to create the table in BigQuery, because the source table has too many primary keys.."
},
"resource": {
"type": "
"labels": {
"resource_container": "REDACTED",
"stream_id": "sandpit-txns-to-sandpit-bq1",
"location": "europe-west2"
}
},
"timestamp": "2022-10-04T12:56:20.846705Z",
"severity": "WARNING",
"logName": "projects/REDACTED/logs/
"receiveTimestamp": "2022-10-04T12:56:21.525370213Z"
},
bl...@gmail.com <bl...@gmail.com> #5
CREATE TABLE public.adjustment_adjustmentmodel (
created_date timestamptz NOT NULL,
modified_date timestamptz NOT NULL,
id uuid NOT NULL,
adjustment_type int4 NOT NULL,
adjusted_item_id uuid NOT NULL,
line_item_id uuid NOT NULL,
transaction_reference_id uuid NOT NULL,
promotion_id uuid NULL,
CONSTRAINT adjustment_adjustmentmodel_line_item_id_key UNIQUE (line_item_id),
CONSTRAINT adjustment_adjustmentmodel_pkey PRIMARY KEY (id)
);
sa...@gmail.com <sa...@gmail.com> #6
bh...@techprovint.com <bh...@techprovint.com> #7
CASE WHEN pg_index.indisprimary IS NULL THEN $32 ELSE $33 END AS is_primary_key
but pg_index.indisprimary can be 't' or 'f', so checking only for NULL causes columns from non-primary-key indexes to be flagged as primary-key columns. The line should change to something like:
CASE WHEN pg_index.indisprimary = 't' THEN $33 ELSE $32 END AS is_primary_key
This query is also the cause of another bug: 251216031
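For context, here is a runnable sketch (my own query, not Datastream's actual one) of how primary-key columns can be detected from the PostgreSQL catalogs with the corrected predicate. The table name is taken from the DDL in comment #5; the key point is that the test must be on the boolean value of indisprimary, not on whether it is NULL:
-- Sketch only, not Datastream's query: list each column of a table and
-- whether it belongs to the primary key. indisprimary is never NULL on a
-- matched index row, so an IS NULL test flags every indexed column instead.
SELECT a.attname,
       coalesce(bool_or(i.indisprimary), false) AS is_primary_key
FROM pg_attribute a
LEFT JOIN pg_index i
       ON i.indrelid = a.attrelid
      AND a.attnum = ANY (i.indkey)
WHERE a.attrelid = 'public.adjustment_adjustmentmodel'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
GROUP BY a.attname;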
la...@gmail.com <la...@gmail.com> #8
Hi,
Just to confirm, are you following this
The log you have posted pertains to datastream_txns_public.adjustment_adjustmentmodel, while your DDL points to public.adjustment_adjustmentmodel. There seems to be a mismatch between the logs and the DDL. Is this expected?
Thanks
en...@gmail.com <en...@gmail.com> #9
si...@gmail.com <si...@gmail.com> #10
dj...@gmail.com <dj...@gmail.com> #11
ja...@casetext.com <ja...@casetext.com> #12
Hi @mark.doutre,
I tried replicating your issue by following this
Prior to this I created a table in my CloudSQL Postgres database using the DDL you have provided. See schema [1].
I created mock data using the query below:
INSERT INTO public.adjustment_adjustmentmodel (created_date,modified_date,id,adjustment_type,adjusted_item_id,line_item_id,transaction_reference_id,promotion_id) values ('2022-09-15 19:00+11:00','2022-09-16 19:00+11:00',gen_random_uuid (),1,gen_random_uuid (),gen_random_uuid (),gen_random_uuid (),gen_random_uuid ());
The Postgres data was inserted successfully, as seen in [2]. I proceeded with creating the profiles for both Postgres and BigQuery as seen in the
Let me know if I missed anything on my reproduction steps so I can retry my replication based on the steps you have taken.
[1] adjustment_schema.png
[2] postgres_query_output.png
[3] created_stream.png
[4] bq_streamed_data.png
en...@gmail.com <en...@gmail.com> #13
When I view the schema in Datastream, I see the attached.
kr...@imatronix.com <kr...@imatronix.com> #14
Hi ri...,
To replicate the issue you need to add indexes that also reference the primary key column. For example, creating this table and trying to make a stream with it will fail with the BIGQUERY_TOO_MANY_PRIMARY_KEYS error, even though it clearly has only 1 primary key, on the single column id.
CREATE TABLE too_many_keys_failure (
id int,
created_date timestamp,
last_modified_date timestamp,
user_id int,
facility_id int,
manager_id int,
is_available bool,
CONSTRAINT id_pk PRIMARY KEY (id) --NOTE THAT THIS IS THE ONLY PRIMARY KEY!
);
--NOTE THE NON-PRIMARY KEY INDEXES
CREATE INDEX ON too_many_keys_failure (id, user_id, facility_id, manager_id, is_available, last_modified_date);
CREATE INDEX ON too_many_keys_failure (id, user_id, facility_id, manager_id, is_available);
CREATE INDEX ON too_many_keys_failure (id, user_id, facility_id, manager_id, last_modified_date);
CREATE INDEX ON too_many_keys_failure (id, user_id, facility_id, is_available, last_modified_date);
CREATE INDEX ON too_many_keys_failure (id, user_id, facility_id);
CREATE INDEX ON too_many_keys_failure (id, facility_id);
INSERT INTO too_many_keys_failure
VALUES
(1,current_timestamp, current_timestamp, '1','1','2',false),
(2,current_timestamp, current_timestamp, '2','1',null,false),
(3,current_timestamp, current_timestamp, '3','1',null,false);
This appears to be due to how Datastream attempts to detect primary keys in its query to PostgreSQL. I believe there is a bug in the query that was written/generated: it checks pg_index.indisprimary IS NULL instead of pg_index.indisprimary = 't'.
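To make the effect visible, here is a verification query of my own (not Datastream's) that compares how many columns each predicate would flag on the repro table above:
-- My own check, not Datastream's query: the IS NOT NULL test matches every
-- indexed column, while the boolean test matches only true primary-key columns.
SELECT
  count(DISTINCT a.attnum) FILTER (WHERE i.indisprimary) AS pk_columns_fixed,
  count(DISTINCT a.attnum) FILTER (WHERE i.indisprimary IS NOT NULL) AS pk_columns_buggy
FROM pg_index i
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = 'too_many_keys_failure'::regclass;
-- Expected on the table above: pk_columns_fixed = 1 (just id), pk_columns_buggy = 6.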
The attached screenshot shows the failure when creating a new stream for the table public_too_many_keys_failure. Note that the public schema prefix is there because I chose the "single dataset for all schemas" option when setting up the stream.
I've also added to the stream a second table called public_too_this_one_works, which is identical to this table except without the non-primary-key indexes. This one is shown in the screenshot, and we can see that it wrote the 3 records.
ol...@onehot.io <ol...@onehot.io> #15
Hi @leon.verhelst,
Thank you for providing additional details. I will provide an update on my replication and findings.
fo...@gradient.ai <fo...@gradient.ai> #16
Hi,
I was able to replicate the issue. I reached out to the product team and created an internal issue about this. Please keep in mind that this issue has to be analyzed and considered by the product team, and I can't provide you with an ETA for it. However, you can keep track of the status by following this thread.
su...@gmail.com <su...@gmail.com> #18
Hi,
I would appreciate it if you could provide your insight on this. Assuming a unique key is a must for clustering to properly load data into BigQuery, would you expect Datastream to arbitrarily choose a unique index if one exists and use it as the primary key in BigQuery, or to look for primary keys only and fail if none exists?
Thanks
da...@difr.me <da...@difr.me> #19
I would rather see an option on destination creation where the user can specify how the data should be clustered or partitioned if required. For instance, in my use case I want to take transactional data from Postgres and load it into BQ for analytics purposes. The destination query workloads are going to be different from the source workloads, so it would be an advantage for my use case if I could cluster data, for example, on some user ID to assist in analysis.
r....@gmail.com <r....@gmail.com> #20
I would expect BigQuery to respect the REPLICA IDENTITY from the source tables and act similarly to PostgreSQL's rules for setting up a publication:
From:
A published table must have a “replica identity” configured in order to be able to replicate UPDATE and DELETE operations, so that appropriate rows to update or delete can be identified on the subscriber side. By default, this is the primary key, if there is one. Another unique index (with certain additional requirements) can also be set to be the replica identity. If the table does not have any suitable key, then it can be set to replica identity “full”, which means the entire row becomes the key. This, however, is very inefficient and should only be used as a fallback if no other solution is possible. If a replica identity other than “full” is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side. See REPLICA IDENTITY for details on how to set the replica identity. If a table without a replica identity is added to a publication that replicates UPDATE or DELETE operations then subsequent UPDATE or DELETE operations will cause an error on the publisher. INSERT operations can proceed regardless of any replica identity.
A Postgres -> BigQuery replication should use the REPLICA IDENTITY that is set on the source table, which normally is set like so:
- Use the PK if exists
- Otherwise use a specified unique index as per the table definition
- Otherwise use the full row
For information on how to set the replica identity see:
Finding the replica identity for a table is done as described here:
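Roughly, something like the sketch below (table and index names are taken from the DDL earlier in this thread; relreplident values: 'd' = default/primary key, 'i' = chosen unique index, 'f' = full row, 'n' = nothing):
-- Inspect the replica identity currently in effect for a table:
SELECT relname, relreplident
FROM pg_class
WHERE oid = 'public.adjustment_adjustmentmodel'::regclass;
-- Point the replica identity at an existing unique index instead of the PK
-- (the index must be unique, non-partial, and cover only NOT NULL columns):
ALTER TABLE public.adjustment_adjustmentmodel
  REPLICA IDENTITY USING INDEX adjustment_adjustmentmodel_line_item_id_key;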
js...@gmail.com <js...@gmail.com> #21
ee...@google.com <ee...@google.com> #22
be...@gmail.com <be...@gmail.com> #23
la...@gmail.com <la...@gmail.com> #24
vi...@align.com.au <vi...@align.com.au> #25
ad...@groon.dev <ad...@groon.dev> #26
A fix for this bug is currently being rolled out, and should be applied to all Google Cloud regions by the end of the week (Oct. 29).
ba...@gmail.com <ba...@gmail.com> #27
de...@google.com <de...@google.com> #28
re...@gmail.com <re...@gmail.com> #29
Does this issue have the same solution as the following error message?
BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE
am...@gmail.com <am...@gmail.com> #30
Datastream, because it does not allow generating the partitioned table?
Or am I doing something wrong?
to...@gmail.com <to...@gmail.com> #31
ma...@gmail.com <ma...@gmail.com> #32
In our case, the table it is trying to copy from Cloud SQL (MySQL) to BigQuery using the new Datastream feature does have 5 columns in its primary key. Is there a limit to the number of columns in a primary key for this to work? Not sure why this is a limitation.
- Error message details: Failed to create the table in BigQuery, because the source table has too many primary keys..
er...@google.com <er...@google.com> #33
Step by step to reproduce this:
1. Start a Datastream stream from PostgreSQL to BigQuery.
2. After the transfer is done and all tables are in BigQuery, pause the job.
3. Partition one of the tables.
4. Resume the job.
5. The log says:
{
"insertId": "640c1ed5-0000-20cd-8059-883d24fc7d54@a1",
"jsonPayload": {
"read_method": "",
"event_code": "UNSUPPORTED_EVENTS_DISCARDED",
"context": "CDC",
"message": "Discarded 25 unsupported events for BigQuery destination: DATASET_ID, with reason code: BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported.."
},
"resource": {
"type": "
"labels": {
"resource_container": "",
"location": "LOCATION",
"stream_id": "DATASET_ID"
}
},
"timestamp": "2022-11-16T04:40:05.318457Z",
"severity": "WARNING",
"logName": "projects/PROJECT_ID/logs/
"receiveTimestamp": "2022-11-16T04:40:06.332008985Z"
}
I checked: the only difference is whether the destination table is partitioned or not; the clustering is the same (using the id of that table).
When I changed the destination table back to having no partition, it worked successfully.
ta...@verifast.tech <ta...@verifast.tech> #34
{
insertId: "64443c8d-0000-2756-9db0-14c14ef32a9c@a1"
jsonPayload: {
context: "CDC"
event_code: "UNSUPPORTED_EVENTS_DISCARDED"
message: "Discarded 1677 unsupported events for BigQuery destination: [my table], with reason code: BIGQUERY_TOO_MANY_PRIMARY_KEYS, details: Failed to create the table in BigQuery, because the source table has too many primary keys.."
read_method: ""
}
logName: "projects/PROJECT_ID/logs/
receiveTimestamp: "2022-11-19T00:51:48.226399021Z"
resource: {2}
severity: "WARNING"
timestamp: "2022-11-19T00:51:48.177058Z"
}
Create table statement from source Postgres Cloud SQL:
create table myschema.mytable
(
company_id bigint not null,
region_id integer not null,
day date not null,
sales numeric not null,
hits numeric not null,
constraint mytable_uniq
unique (company_id, region_id, day)
);
sh...@gmail.com <sh...@gmail.com> #35
Apparently there's been some regression of this issue... an updated fix is pending and will be rolled out ASAP.
I'll update here again once the fix is in production.
ad...@gmail.com <ad...@gmail.com> #36
Hi Team, it seems I am also having the same issue as #33 (this issue is blocking for me in production). Please let us know the status of this.
I am getting an error when trying to partition the destination table in BigQuery while working with Datastream.
Step by step to reproduce this:
1. Start a Datastream stream from Cloud SQL (MySQL) to BigQuery.
2. Once the stream has completed all tables in BigQuery, pause the job.
3. Partition one of the tables.
4. Resume the job.
5. I get the error log below:
====================================================
Discarded 97 unsupported events for BigQuery destination: 833537404433.Test_Membership_1.internal_Membership, with reason code: BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported..
{
insertId: "65ad79ec-0000-24c7-a66e-14223bbf970a@a1"
jsonPayload: {
context: "CDC"
event_code: "UNSUPPORTED_EVENTS_DISCARDED"
message: "Discarded 97 unsupported events for BigQuery destination: 833537404433.Test_Membership_1.internal_Membership, with reason code: BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported.."
read_method: ""
}
logName: "projects/gcp-everwash-wh-dw/logs/
receiveTimestamp: "2022-11-22T22:08:38.620495835Z"
resource: {2}
severity: "WARNING"
timestamp: "2022-11-22T22:08:37.726075Z"
}
---------------------------------------------------------------
What did you expect to happen?
I am expecting to be able to create partitions for certain tables that are getting inserted into BigQuery via Datastream.
Attaching a screenshot for reference.
ga...@gmail.com <ga...@gmail.com> #37
For Postgres/BQ pairing, what are the steps needed to confirm this fix works? Will a running stream with a broken source table self-correct with the new code? A standard cleanup procedure would be very helpful.
- Does the table in question need to be removed (unchecked, saved) and added again in the source configuration?
- Does the stream need to be stopped (paused) and restarted? Deleted and recreated to pick up the new code?
- Does the destination table need to be deleted in BQ?
ri...@gmail.com <ri...@gmail.com> #38
The fix has been rolled out.
To recover from this error:
- If a table was already created in BigQuery it should be manually deleted
- Trigger a backfill for the table in Datastream
da...@ai.moda <da...@ai.moda> #39
BigQuery Product Manager here. It looks like the request here is to add partitioning to an existing BigQuery table. Unfortunately, that's not supported; you have to add partitioning to a net-new table. Technically you can create a newly partitioned table from the result of a query [1]; however, this approach won't work for existing Datastream-sourced tables, since there wouldn't be a _CHANGE_SEQUENCE_NUMBER field, which is required to apply UPSERT operations in the correct order. So the only option would be to pre-create the table with partitioning/clustering/primary keys before starting the Datastream stream, like the DDL example below [2].
One thing to note, however, is that today partitioning may not be as effective at reducing the data scanned when performing background CDC apply operations, because the background merges could be an UPSERT against any record within the base table and partition pruning isn't propagated to the background operation. Clustering should still be beneficial, though, because clustering is used (the PK fields are also denoted as the clustered fields).
[1]
[2] CREATE TABLE `project.dataset.new_table`
(
`Primary_key_field` INTEGER PRIMARY KEY NOT ENFORCED,
`time_field` TIMESTAMP,
`field1` STRING,
#Just an example above. Add needed fields within the base table...
)
PARTITION BY
DATE(time_field)
CLUSTER BY
Primary_key_field #This must be an exact match of the specified primary key fields
OPTIONS(max_staleness = INTERVAL 15 MINUTE) #or whatever the desired max_staleness value is
jl...@gladia.io <jl...@gladia.io> #40
fe...@gmail.com <fe...@gmail.com> #41
st...@google.com <st...@google.com> #42
@johan.eliasson - does the table actually have more than 4 PKs? If not, can you share the CREATE TABLE statement (you can email it to me directly instead of posting it here)?
If the table has more than 4 PK columns, then this error is currently the expected behavior, but there's a change coming to BQ which will allow more than 4 columns in the PK. I'm not able to share exact timelines for this change, but it's WIP (perhaps @nickorlove can provide more details).
on...@gmail.com <on...@gmail.com> #43
This is a public thread, so I'll refrain from providing exact timelines; however, please note that the limit of 4 PK columns is a known issue we are working hard to address.
I'll update this thread once more concrete details can be shared with the broader community.
st...@staropshq.com <st...@staropshq.com> #44
I encountered this issue as well; my DDL is:
CREATE TABLE public.spoon_mst (
  spoon_code varchar(32) NOT NULL,
  qr_code varchar(32) NULL,
  scan_date timestamp(6) NULL,
  product_name varchar(128) NULL,
  weight varchar(16) NULL,
  mfg_date timestamp(6) NULL,
  exp_date timestamp(6) NULL,
  is_active bool NOT NULL DEFAULT true,
  status varchar(255) NULL DEFAULT 'UNUSED'::character varying,
  code_length int4 NULL,
  created_date timestamp NULL DEFAULT now(),
  description varchar(255) NULL,
  is_check bool NULL DEFAULT false,
  updated_date timestamp NULL,
  ext_id varchar(255) NULL,
  qr_manufacture_date timestamp NULL,
  is_synced bool NULL DEFAULT true,
  "version" int4 NOT NULL DEFAULT 0,
  CONSTRAINT spoon_mst_pkey PRIMARY KEY (spoon_code)
);
My primary key is a random string generated by an algorithm. When I try to sync other tables whose primary key is in number format (id), it works well. Does using a primary key in string format cause this issue?
il...@gmail.com <il...@gmail.com> #45
It looks like the issue with your DDL is around syntax, and that the primary key does not match the table's clustering key. An example DDL to create a table to be used with Datastream would be like this:
CREATE TABLE customers ( ID INT64 PRIMARY KEY NOT ENFORCED, NAME STRING, SALARY INT64) CLUSTER BY ID;
za...@khawajapartners.com <za...@khawajapartners.com> #46
FYI my earlier comment of a suggested DDL was in the frame of mind of running a DDL within BigQuery to create a BQ table which would be used as the destination for Datastream replication.
If your question was more about the syntax of running a DDL from the source database, please ignore
to...@gmail.com <to...@gmail.com> #47
Thank you so much, it worked like a charm after I created the BigQuery table manually with a clustering key and then started a new stream.
an...@gliai.ai <an...@gliai.ai> #48
mi...@gmail.com <mi...@gmail.com> #49
This is now fixed. BigQuery has increased the limit to 16 primary key columns, and Datastream now aligns with this new limit.
BigQuery still doesn't support more than four clustering columns, so when replicating a table with more than four primary key columns, Datastream uses four of the primary key columns as the clustering columns.
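For anyone pre-creating destination tables manually, here is an illustrative sketch (table and column names are hypothetical) of a BigQuery DDL with five primary key columns but only four clustering columns, which mirrors what Datastream now does automatically:
-- Illustrative only: BigQuery allows up to 16 columns in an unenforced primary
-- key but at most 4 clustering columns, so only four key columns are clustered.
CREATE TABLE `project.dataset.orders`
(
  company_id INT64,
  region_id INT64,
  store_id INT64,
  order_id INT64,
  line_no INT64,
  amount NUMERIC,
  PRIMARY KEY (company_id, region_id, store_id, order_id, line_no) NOT ENFORCED
)
CLUSTER BY company_id, region_id, store_id, order_id
OPTIONS (max_staleness = INTERVAL 15 MINUTE);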
st...@staropshq.com <st...@staropshq.com> #50
ce...@gmail.com <ce...@gmail.com> #51
This problem is limiting Gemini, and us as developers, a lot, because we want to give the model access to documentation for APIs or related material so it gets things RIGHT.
ga...@gmail.com <ga...@gmail.com> #52
"Make a python program comparing speed of various sorting methods" also triggers it
How am I supposed to use this model ?
ro...@gmail.com <ro...@gmail.com> #53
ro...@classtime.com <ro...@classtime.com> #54
gemini-1.5-pro-preview-0409
lu...@gmail.com <lu...@gmail.com> #55
Please fix the RECITATION issue ASAP - it makes the Gemini API hopeless for serious applications.
be...@noricor.com <be...@noricor.com> #56
ve...@gmail.com <ve...@gmail.com> #57
Here are the logs for the RECITATION error I am receiving. Maybe they will help, since you asked for them. I am not sure if I will be able to use this API in production; please resolve this issue ASAP.
using Web Server Gateway Interface (WSGI)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1721544261.102805 8 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
I0000 00:00:1721544261.181513 8 check_gcp_environment.cc:61] BIOS data file does not exist or cannot be opened.
[ERROR] 2024-07-21T06:44:26.022Z e7345f62-8c7f-401c-9e97-7af62fcce503 Exception on /generate [POST]
Traceback (most recent call last):
File "/var/task/flask/app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/task/flask/app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/task/flask/app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "/var/task/flask/app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/task/app.py", line 128, in generate
response = chat.send_message( content )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/task/google/generativeai/generative_models.py", line 588, in send_message
self._check_response(response=response, stream=stream)
File "/var/task/google/generativeai/generative_models.py", line 616, in _check_response
raise generation_types.StopCandidateException(response.candidates[0])
google.generativeai.types.generation_types.StopCandidateException: index: 0
finish_reason: RECITATION
ma...@mkmbs.co.uk <ma...@mkmbs.co.uk> #58
ra...@wix.com <ra...@wix.com> #59
st...@google.com <st...@google.com> #60
Hi Ranya, improving Gemini filtering logic is an ongoing process. For some customers, we've already seen a huge improvement from changes that landed in the past few weeks. Other folks are still bringing us edge cases that we're actively debugging. There's no silver bullet here.
ro...@gmail.com <ro...@gmail.com> #61
I'm trying to use 1.5 Pro to extract information on policies in various documents. It either adds information that isn't in the document, or, if I prompt it to only provide information that exists in the document, I get RECITATION. Sometimes I get RECITATION no matter what I prompt with.
pseudo prompt:
Use only the content from text below to answer the question and do not add any information that is not directly in this text:
{{Document}}
za...@khawajapartners.com <za...@khawajapartners.com> #62
ab...@gmail.com <ab...@gmail.com> #63
pa...@gmail.com <pa...@gmail.com> #64
Question is: ❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Is there any text or writing visible in the image?'
ri...@google.com <ri...@google.com> #65
I'm getting this problem using:
- model: gemini-1.5-pro
- language: Python
Output + trace:
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Describe this image (from a Scuba Diving perspective).'
I0000 00:00:1724522946.811092 76818899 check_gcp_environment_no_op.cc:29] ALTS: Platforms other than Linux and Windows are not supported
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Are there any fish/maritime life/animal/humans in the image? Give me a bulletpoint list of them, with cardinality before and description in parenthesis (eg "1 turle (green, a bit blurred)")'
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'How would you rate this image quality from 1 to 10? I'm thinking of blurring, and simple ability to see what's in it.'
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Would you consider this image worth keeping for me (I would say yes if there is a member of my family and/or there's a cool animal or fish, and the quality is not too bad)'
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'What is the main subject of the image?'
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Describe the colors and patterns present in the image.'
❔ Asking Gemini for PXL_20221023_020849161.jpg: 'Is there any text or writing visible in the image?'
Traceback (most recent call last):
File "/Users/ricc/git/gic/bin/gopro-gemini-iterator.py", line 260, in <module>
main()
File "/Users/ricc/git/gic/bin/gopro-gemini-iterator.py", line 240, in main
response_texts = call_gemini_api(file_path, image_mimetype, prompts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ricc/git/gic/bin/gopro-gemini-iterator.py", line 180, in call_gemini_api
response = chat_session.send_message(prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ricc/git/gic/.venv/lib/python3.12/site-packages/google/generativeai/generative_models.py", line 588, in send_message
self._check_response(response=response, stream=stream)
File "/Users/ricc/git/gic/.venv/lib/python3.12/site-packages/google/generativeai/generative_models.py", line 616, in _check_response
raise generation_types.StopCandidateException(response.candidates[0])
google.generativeai.types.generation_types.StopCandidateException: index: 0
finish_reason: RECITATION
Also note: people are complaining here, and it would be nice to add an answer there:
ee...@google.com <ee...@google.com> #66
Recitation filters are used by multiple different models for copyright detection and protection.
Due to changing model versions and upgrades, the filter block rate varies. We have made several changes since the bug was first created and addressed many user issues, while also charting a plan to address the issues that are still open (such as the last one that was shared above).
One aspect that would be helpful is if we did not have an umbrella bug, since even when the output is blocked by recitation, the trigger could be different and merging all issues makes debugging harder.
I would like to branch this bug into the separate issues that are still open, so engineering teams can take a look, and mark this umbrella bug as a duplicate of those issues.
fheromolea@, as you are the original reporter, I am assigning this request to you so you can help triage cases appropriately.
Thanks
Eesha
sa...@gmail.com <sa...@gmail.com> #67
rh...@gmail.com <rh...@gmail.com> #68
Why don't scientists trust atoms? Because they make up everything!
This fails and reports "citations" of:
{
"start_index": 307,
"end_index": 459,
"uri": "
}
Which, no offense to the blogger, doesn't exactly seem like original content.
of...@gmail.com <of...@gmail.com> #69
br...@mypicker.net <br...@mypicker.net> #70
I've tried different temperatures and reducing safety settings, but that doesn't improve things.
After switching to 1.5 Flash I'm not getting the RECITATION for those simple prompts (but haven't tested it with lots of cases yet).
Looks like this is a big issue that needs fixing.
na...@gmail.com <na...@gmail.com> #71
ra...@google.com <ra...@google.com> #72
Hi naourass,
Can you share your prompt and the recitation error response.
na...@gmail.com <na...@gmail.com> #73
The actual prompt is part of a multi-step chat process with a lengthy history containing both text and document images.
I made a minimal reproducible example which seems to fail every time in AI Studio with the error "Full output blocked. Edit prompt and retry" and the Citation icon having a few reference links.
**MRE prompt:**
What are the descriptions of the norms "NM EN 17032" and "MN EN 13215" ? Extract the full descriptions as they are please.
**MRE image:**
Attached is the MRE input image
la...@google.com <la...@google.com> #74
la...@google.com <la...@google.com> #75
fh...@google.com <fh...@google.com> #76
Closing this thread as per comment #66.
na...@gmail.com <na...@gmail.com> #77
br...@mypicker.net <br...@mypicker.net> #78
According to
Does someone have a list of those issues so we can follow them up?
Thanks!
si...@gmail.com <si...@gmail.com> #79
il...@gmail.com <il...@gmail.com> #80
us...@gmail.com <us...@gmail.com> #81
489, in text raise ValueError(ValueError: Invalid operation: The `response.text` quick accessor requires the response to contain a valid `Part`, but none were returned. The candidate's [finish_reason](
os...@digitalstaff.ca <os...@digitalstaff.ca> #82
nc...@google.com <nc...@google.com> #83
nc...@google.com <nc...@google.com> #84
bl...@gmail.com <bl...@gmail.com> #85
na...@gmail.com <na...@gmail.com> #86
ab...@gmail.com <ab...@gmail.com> #87
mu...@gmail.com <mu...@gmail.com> #88
se...@google.com <se...@google.com> #89
What is the fix/workaround provided?
ag...@ownhealth.ca <ag...@ownhealth.ca> #90
ye...@google.com <ye...@google.com> #91
I am also having this issue. Looking over the thread, I don't see anything that works for me. Turning off the safety filters won't do. Any update?
ke...@voicecheap.ai <ke...@voicecheap.ai> #92
pm...@gmail.com <pm...@gmail.com> #93
al...@avlaskin.com <al...@avlaskin.com> #94
Getting a RECITATION response, which is clearly a false positive.
rg...@oberlin.edu <rg...@oberlin.edu> #95
gr...@ingeniousideas.net <gr...@ingeniousideas.net> #96
False positive trying to extract text from a handwritten letter from 1850
mi...@gmail.com <mi...@gmail.com> #97
ro...@newtenberg.com <ro...@newtenberg.com> #98
oc...@gmail.com <oc...@gmail.com> #99
le...@gmail.com <le...@gmail.com> #100
Are there any plans to fix this? It makes the API unusable, since you can't predict when it might refuse. It seems no other LLM providers have this issue. Why can't Google resolve this? It makes audio transcription completely unusable.
ra...@google.com <ra...@google.com> #101
We are actively working on addressing errors occurring during translation and transcription.
Description
This public issue tracker aims to follow up on existing internal efforts to reduce the occurrence of recitation issues with some prompts in the Gemini API.
Description:
The Gemini model occasionally generates responses that are near-verbatim repetitions of information found in the training data or other external sources. This issue persists even after implementing standard mitigation techniques.
Desired Outcome:
Significant Reduction in Recitation Errors: The goal is to minimize instances of recitation errors when a client intends to get non-randomized responses (for example, when using the Retrieval-Augmented Generation (RAG) technique), making Gemini responses more reliable.
Improved Mitigation Techniques: Identify and implement more effective strategies for preventing recitation when this is the user's intention.