Fixed
Status Update
Comments
jf...@google.com <jf...@google.com> #2
Thank you for your detailed report!
We'll take a closer look and let you know if we have any updates.
da...@gmail.com <da...@gmail.com> #3
We have just clarified the resumable upload protocol in our documentation here: https://developers.google.com/photos/library/guides/resumable-uploads
Note that this is slightly different from our previous protocol for resumable uploads. Please let us know if you are still having trouble following this new guide. Our apologies, but the general flow should be very similar to the previous definition.
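The initiation step of the resumable flow described in that guide can be sketched as follows. The header names follow the current documentation; the endpoint URL, and the auth handling implied around it, are assumptions for illustration, not a definitive implementation.

```python
# Sketch of initiating a resumable upload per the linked guide.
# Header names follow the documented X-Goog-Upload-* protocol; the
# endpoint URL below is an assumption for illustration.

UPLOAD_URL = 'https://photoslibrary.googleapis.com/v1/uploads'  # assumed endpoint

def start_headers(mime_type: str, file_size: int) -> dict:
    """Headers for the initial POST that opens a resumable upload session."""
    return {
        'Content-Length': '0',
        'X-Goog-Upload-Command': 'start',
        'X-Goog-Upload-Content-Type': mime_type,
        'X-Goog-Upload-Protocol': 'resumable',
        'X-Goog-Upload-Raw-Size': str(file_size),
    }

# The response to that POST carries the session URL in the
# X-Goog-Upload-URL response header; chunks are then sent to that URL
# with X-Goog-Upload-Command: upload ('upload, finalize' for the last one).
```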
da...@gmail.com <da...@gmail.com> #4
Thanks for the updated documentation. I've given it a go and resumable uploads now seem to be working fine. Thanks.
jf...@google.com <jf...@google.com> #5
Thanks for the detailed report!
We have been able to replicate this issue as well and will look into it further. We will update this issue when we have anything new to share. Thanks for your patience!
jf...@google.com <jf...@google.com> #6
We are still looking into the issue, but it is unfortunately difficult to replicate and to address. How frequently have you encountered duplicate IDs? (It seems to be quite rare from our investigation.)
At this point, our recommendation would be to handle duplicate mediaItems on your end, for example by caching them in a Set or HashMap that ensures uniqueness, rather than a simple List.
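The suggested workaround amounts to tracking seen IDs in a set while accumulating pages. A minimal sketch, assuming responses shaped like the mediaItems.search payload in the report:

```python
def collect_unique(pages):
    """Accumulate mediaItems across pages, dropping duplicate IDs.

    `pages` is any iterable of parsed API responses shaped like
    {'mediaItems': [{'id': ...}, ...]} (as returned by mediaItems.search).
    """
    seen = set()
    items = []
    for page in pages:
        for item in page.get('mediaItems', []):
            if item['id'] not in seen:  # set membership check keeps IDs unique
                seen.add(item['id'])
                items.append(item)
    return items

# Example: the last item of page 1 duplicated as the first item of page 2.
pages = [
    {'mediaItems': [{'id': 'a'}, {'id': 'b'}]},
    {'mediaItems': [{'id': 'b'}, {'id': 'c'}]},
]
print([i['id'] for i in collect_unique(pages)])  # ['a', 'b', 'c']
```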
da...@gmail.com <da...@gmail.com> #7
Is some bot yelling about a 120 day timeout? ;)
Yeah -- as mentioned in https://issuetracker.google.com/issues/113870729#comment3 I already switched my code back in August to a python `set` once I fixed it. Easy to work around, no big deal for me!
ph...@gmail.com <ph...@gmail.com> #8
I can confirm that this is still happening. For me, duplicates are always the last image of one page and the first of the next one. It happens pretty consistently (every time I try to retrieve the last 100 images with 10 results per page).
It started after I deleted a lot of images from my library; before that, I did not notice it. I am now also getting pages with only 3 or 4 images even though I am requesting 10 items per page, so it might be a problem with cached results where some items are no longer valid and are therefore removed. That would also explain the duplicates: the indexes of these images have changed, but the cached results do not account for that.
jf...@google.com <jf...@google.com> #9
Thanks for adding the extra context.
Unfortunately this is a little tricky to track down, but it is on our list to investigate further.
(The page size is a known limitation of the API.)
er...@gmail.com <er...@gmail.com> #10
I can also reproduce, and can provide more account details privately.
It happens consistently on my account when I list the entire photo library using either mediaItems.list or mediaItems.search.
When duplicates are returned, it is always the final mediaItem of one page, which is then duplicated as the first media item of the successive page (i.e. the subsequent request for the nextPageToken).
Duplicates happen only once for a particular ID, and happen infrequently relative to the total size of the photos library (0.18% of the mediaItems I get back are duplicates).
I believe this relates to deletions.
I have been curating my library and deleting a number of photos, so my overall library size has been shrinking in the past month. And yet the number of duplicates I get has been increasing. Whereas last month I would get around 15 duplicate IDs when listing my library, I am now getting close to 80 duplicates. So reproducing probably involves some churn of deleting + adding files.
In terms of severity, reporting duplicate mediaItems isn't too bad. However, if there is no plan to fix it, please note this possibility in the API reference. Clients can certainly work around it if they are aware that duplicates are possible, but it is a surprise when this happens, and it may not be discovered in testing since it occurs at a low rate and not for all accounts.
Hopefully it is just an issue which causes duplicate mediaItems to be reported, and not a deeper problem where some mediaItems are simply returned.
Possibly related, but I separately had an issue in the past where I deleted the majority of my photos library. It was now nearly empty, however mediaItems.list would still step through hundreds of empty pages. Loading from photos.google.com would take about a minute to display an empty library, so I expect it was internally hitting a similar issue with lots of empty media item shards.
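The boundary pattern described above (the last mediaItem of one page repeated as the first of the next) is easy to detect while paging. A small sketch, using the same hypothetical response shape as the report:

```python
def boundary_duplicates(pages):
    """Return IDs that appear both last on one page and first on the next.

    `pages` is an iterable of parsed responses shaped like
    {'mediaItems': [{'id': ...}, ...]}.
    """
    dupes = []
    prev_last = None  # ID of the last item on the previous page
    for page in pages:
        items = page.get('mediaItems', [])
        if not items:
            continue
        if prev_last is not None and items[0]['id'] == prev_last:
            dupes.append(prev_last)
        prev_last = items[-1]['id']
    return dupes

# Page 1 ends with 'b'; page 2 begins with 'b' again.
pages = [
    {'mediaItems': [{'id': 'a'}, {'id': 'b'}]},
    {'mediaItems': [{'id': 'b'}, {'id': 'c'}]},
]
print(boundary_duplicates(pages))  # ['b']
```

Logging the output of such a check alongside each nextPageToken would make the duplicate rate (the 0.18% figure above) straightforward to measure.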
er...@gmail.com <er...@gmail.com> #11
> and not a deeper problem where some mediaItems are simply returned.
Typo, missing "not".
Intended to write: "and not a deeper problem where some mediaItems are simply not returned."
jf...@google.com <jf...@google.com> #12
Thanks again for your report. We have just rolled out a change that addresses this issue.
I'll close this for now, but please let us know if you are still seeing this.
Description
When I page the search results, I get duplicate results. My query is:
params = {
'fields': 'mediaItems(id,baseUrl,filename,mimeType,productUrl),nextPageToken',
}
search_json = {
"pageSize": 100,
"filters": {
"includeArchivedMedia": False,
"contentFilter": {
"excludedContentCategories": [
"DOCUMENTS",
"RECEIPTS",
"SCREENSHOTS",
"UTILITY",
"WHITEBOARDS",
]
},
"mediaTypeFilter": {
"mediaTypes": [
"PHOTO",
],
},
},
}
using the Google Python API to execute it:
rsp = session.post(
    'https://photoslibrary.googleapis.com/v1/mediaItems:search',
    params=params,
    json=search_json,
).json()
where session is of type AuthorizedSession:
from google.auth.transport.requests import AuthorizedSession
Paging this API returns ~18k items in total, about 5-10 of which are duplicates. Which photo IDs are duplicated does not seem to be deterministic.
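The paging loop behind those numbers looks roughly like the sketch below. `fetch_page` is a hypothetical stand-in for the `session.post(...)` call above, so the loop can be shown (and exercised) without network access or credentials.

```python
def iterate_search(fetch_page):
    """Yield mediaItems across all pages of a mediaItems.search query.

    `fetch_page(page_token)` is a stand-in for the session.post(...) call
    in this report; it must return the parsed JSON response dict.
    """
    token = None
    while True:
        rsp = fetch_page(token)
        yield from rsp.get('mediaItems', [])
        token = rsp.get('nextPageToken')
        if not token:  # last page carries no nextPageToken
            break

# Fake two-page response to show the flow (no network needed).
fake = {
    None: {'mediaItems': [{'id': 'a'}], 'nextPageToken': 't1'},
    't1': {'mediaItems': [{'id': 'b'}]},
}
print([i['id'] for i in iterate_search(lambda t: fake[t])])  # ['a', 'b']
```

In the real script, `fetch_page` would add the token to `search_json` as `pageToken` before issuing the request.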
A small code sample that reliably reproduces the issue. The sample should run as-is or with minimal setup.
You can see all the code here:
You can run it if you have app credentials plus a working user token already stored locally in JSON. You'll need your own app credentials regardless, but you can obtain your own user credentials dynamically (using the out-of-band method) by commenting out the first two lines of the __main__ block and uncommenting the third.
The calls to the API that lead to the error. Include the sequence of calls, including request headers and body.
Do not include any personal information, authentication secrets, media item or album IDs.
What steps will reproduce the problem?
Run the linked script on a reasonably large user's album, after setting up the required app credentials.
What is the expected output? What do you see instead? If you see error messages, please provide them.
Expected: A list of media_item entries, all of which are unique.
Actual: A list of media_item entries, with some duplicates.
Please provide any additional information below.
I'm happy to provide any information that's useful!
Here's an example from my own log:
There were four duplicated IDs out of 18K+ items, spread across probably over 200 calls to the search API (100 items requested per call, often fewer than 100 returned).
The duplicated IDs were returned in sequence, but I did not log the accompanying nextPageTokens.
One media item ID that was returned twice: AGj1epXH-iAAeNQQ_uk9BPAFjF1YIBc3HqsfZ88X6xN94-qG4uML8fJQk07naI_WnppXejtav2Eg_-c.
--
I am happy to go back and add better logging of the requests in order to debug the dupes if that helps!