Assigned
Status Update
Comments
va...@google.com <va...@google.com>
ku...@google.com <ku...@google.com> #2
I have filed your feature request with our product team. However, there is no guarantee that this feature will be implemented, and no ETA can be provided. Rest assured that your feedback is always taken into account, as it allows us to improve the platform.
Any future updates to this feature will be posted here.
al...@transitionzero.org <al...@transitionzero.org> #3
Hi,
I understand that the issue is potentially low priority and that numerous workarounds exist, but I would like to share some background info about this.
We have a Cloud SQL Second Gen instance that went nuts one day, consuming all the disk space in the universe until it reached the maximum disk capacity of ~9TB. That issue "self-resolved" (some Cloud magic is not bad...) and we did not see any impact on our queries, nor did any of our database users notice.
This is to some extent good: something that would otherwise cause service disruption did not, the issue self-resolved, and the service continued to work. However, our data usage for the instance is around 20GB, yet our database now has 9TB of disk space allocated to it. Huge amounts of *wasted* disk space in the Cloud era.
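(For context, the ~20GB figure is the logical data size; on a MySQL instance it can be estimated with a query roughly like the one below. This is only an illustrative sketch, not the exact query we used.)
-- Approximate logical data size per schema (data + indexes), in GB
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
GROUP BY table_schema;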
I have lots of services that use this database, and rebuilding it from scratch (*shut down services*, *cause service disruption for users*, mysqldump, backup, stop old instance, start new one, use mysql to restore data, reconfigure all services to the new database, new IP, etc.) is laborious and, well, can happen again in the future.
It would be a nice feature to be able to reduce the size, but I understand as a developer that "it is not that simple". Lots of infrastructure and key design decisions prevent me as a user from "just changing the size down", but I also have no way/tools to assist me in moving the relevant data (20GB) out of all that wasted, reserved "empty" disk space I'm paying for, without causing service disruption. Instead of spending time testing and simulating the DB work needed to rebuild our DB, I would like to use it to improve our system for our customers. But I'll have to go to the office on a Sunday to make this slightly laborious change because I have no easy way around this one.
gi...@revalue.earth <gi...@revalue.earth> #4
+1 to this.
Description
Problem you have encountered
I have a PostgreSQL instance in Google Cloud SQL in which the PostGIS and PostGIS Raster extensions are installed. I am trying to query rasters that are stored in a private GCS bucket and registered as out-of-database. This requires that the GDAL Virtual File System driver be configured with access to GCS.
The configuration options are set using the postgis.gdal_vsi_options GUC variable. However, when attempting to set the necessary configuration I receive an error. As a result, I cannot access my raster data using the out-of-database access pattern.
What you expected to happen
I expect to be able to:
SET postgis.enable_outdb_rasters='true';
SET postgis.gdal_enabled_drivers='ENABLE_ALL';
SET postgis.gdal_vsi_options='GS_ACCESS_KEY_ID=XXXX GS_SECRET_ACCESS_KEY=YYYY';
SELECT ST_Value(rast, 1, 1) from <table> limit 1;
Steps to reproduce
My Cloud SQL PostgreSQL instance has the following PostGIS and PostGIS Raster version information:
SELECT postgis_full_version();
Note the GDAL version:
GDAL="GDAL 3.0.0dev
I can register Cloud Optimised GeoTIFFs that are stored in a private GCS bucket as out-of-database rasters using the raster2pgsql utility, which I run from a VM on the same VPC as my database. Following this step I can confirm that corresponding records exist in my database.
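(For reference, one way to confirm the registration is a query along the lines of the one below; the table name my_rasters is hypothetical.)
-- Check that bands are registered as out-of-db and inspect the stored path
-- (for GCS-backed rasters this is typically a /vsigs/... path)
SELECT rid,
       (ST_BandMetaData(rast)).isoutdb,
       (ST_BandMetaData(rast)).path
FROM my_rasters
LIMIT 5;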
The next step is to query the raster data. This requires PostGIS to make a call to the out-of-database COGs using the VSI driver configuration.
I first set the required GUCs:
SET postgis.enable_outdb_rasters='true';
SET postgis.gdal_enabled_drivers='ENABLE_ALL';
SET postgis.gdal_vsi_options='GS_ACCESS_KEY_ID=XXXX GS_SECRET_ACCESS_KEY=YYYY';
At this point step 3 fails with an error. As a result, the query
SELECT ST_Value(rast, 1, 1) from <table> limit 1;
fails with an rt_band_load_offline_data error.
Note: sometimes subsequent attempts to set postgis.gdal_vsi_options will fail silently. A further request to get data will then also result in the same rt_band_load_offline_data error.
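(As an illustrative aside, a silent failure of the SET can be spotted by reading the GUC back afterwards, for example:)
-- Read the current value of the GUC back; an empty or unchanged value
-- indicates that the SET was ignored
SHOW postgis.gdal_vsi_options;
-- or, equivalently:
SELECT current_setting('postgis.gdal_vsi_options', true);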
Other information
Public COGs
I carried out the above steps to ingest and access a COG file from a public AWS bucket. However, this failed in the same way;
SET postgis.gdal_vsi_options='AWS_NO_SIGN_REQUEST=YES';
raised an invalid value error for AWS_NO_SIGN_REQUEST.
Alternate Approach
For comparison, I set up a compute instance running a Docker image of PostGIS (postgis/postgis:14-3.3). I repeated all of the steps outlined above and was successful in querying my raster data and performing crop and union operations.
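(The crop and union operations were along the lines of the sketch below; the table and geometry names are hypothetical.)
-- Clip raster tiles to an area of interest and merge the pieces
SELECT ST_Union(ST_Clip(r.rast, a.geom)) AS clipped
FROM my_rasters r, my_area_of_interest a
WHERE ST_Intersects(r.rast, a.geom);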
The version information is as follows:
GDAL on Cloud SQL
When setting postgis.gdal_vsi_options, PostGIS will validate the options by reading the available options from GDAL's VSIGetFileSystemOptions() function. Therefore, the ability to set these options in PostGIS is directly dependent on the GDAL version in use. What relevant options are available, and when did they become available in GDAL?
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY --> GDAL=2.3
- AWS_NO_SIGN_REQUEST --> GDAL=2.3
- GS_ACCESS_KEY_ID and GS_SECRET_ACCESS_KEY --> GDAL=2.3
- GS_NO_SIGN_REQUEST --> GDAL=3.4
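(As an aside, the GDAL version that the PostGIS installation was built against can be checked directly from SQL, for example:)
-- Report the GDAL library version used by PostGIS Raster
SELECT postgis_gdal_version();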
In summary:
- PostGIS >= 3.2 is required in order to set postgis.gdal_vsi_options.
- With GDAL >= 2.3, <3.4, users can set the necessary options to access private AWS and GCS buckets as well as public AWS buckets.
- With GDAL >= 3.4, public Google buckets can also be accessed.
Revisiting the PostGIS version installed on Cloud SQL:
Checklist:
- POSTGIS=3.2.5 >= 3.2 ✅
- GDAL=3.0.0dev >= 2.3 ✅
- GDAL=3.0.0dev >= 3.4 ❌
Given the above, the only issue should be that we cannot provide the GS_NO_SIGN_REQUEST=YES option to access public GCS buckets. My personal requirement of accessing data in a private GCS bucket should be achievable. This leads me to believe the problem lies with the GDAL=3.0.0dev package used in Cloud SQL.