PostGIS/PostgreSQL tips [work in progress]

I have recently begin with PHP and PostgreSQL extension for spatial data - PostGIS. There is a similar extensions for MySQL, but it felt little rusty, so I decided to go with classic. In PostGIS, I have also enabled raster support (this uses GDAL library).

My PHP wrapper library can be found on GitHub - PostGIS. This library as well as presented tips are mostly from what I have discovered. I don´t guarantee, that they are 100% foolproof and that there is no other and faster way, hot to achieve things.

Great resources I have used (PostGIS)

PostGIS manual
Raster reference
gis.stackexchange.com (referred in text as SE)
TimeZones import
Indexing

Great resources I have used (PostgreSQL)

PostgreSQL tutorial [Czech only]

Raster and bands count

There can be multiple bands inside one raster. We usually want to obtain values from DB. I have wondered, which solution is faster: I have written simple test, where I select 100 values from single raster at random positions [x, y] from rasters. My raster data have size 720x360. I have created table with two rows, first row has (1) band data, second (N) bands (where N = 6). I have tried to use gist index for raster data
CREATE INDEX rast_gist_idx ON raster_data_table USING GIST (ST_ConvexHull(rast));
but results were the same.

In both test, I have run the single query to obtain one result at a time. What if we try to select multiple values in one row (SE).
  WITH pairs(x,y) AS (
     VALUES (x1, y1), (x2, y2) ... other points ...					
  ) 
  SELECT 
    x, y, 
    ST_Value(rast, 1, x, y) AS val,
    ... other bands ...				 
  FROM
    raster_data_table 	
  CROSS JOIN pairs
  WHERE ....
I have run this query for 100 random points at positions [x, y] (same raster data as before).

Indexing - UNACCENT and LOWER

If you create an index, it does not automatically mean, that it will be used. Sometimes, DB may decide to use sequential scan instead, because it will be faster (eg. you will return about 90% of all rows). But to be sure, that index is used even for small groups of result, you can use EXPLAIN. Assuming, you are using PgAdmin III and run query from it, it can be run via F7.

If we want to use UNACCENT, we need to install it as extension:
CREATE EXTENSION unaccent;
We will build index:
CREATE INDEX cities_name_idx
ON cities
USING btree
  (LOWER(name) COLLATE pg_catalog."C");
Note: COLLATE can have C, POSIX or default. C and POSIX are using ANSI C sorting of characters. This is quite OK for our solution, since we want to search UNACCENT characters. If I have used default I have no indexing at all.

Assume this query:
SELECT name
FROM cities 
WHERE LOWER(name) LIKE 'české bu%' 
LIMIT 2


But if we use this:
SELECT name
FROM cities 
WHERE UNACCENT(LOWER(name)) LIKE 'ceske bu%' 
LIMIT 2
Time: 1562ms


No indexing is used this time and sequential search is used. This is because we are searching over UNACCENT(LOWER(...)).
Problem is that UNACCENT can not be used as part of the btree index, since it is not IMMUTABLE. Postgres only accepts IMMUTABLE functions for indexes (If a function can return a different result for the same input, the index could silently break)!

Solution (based on SO):

First create your own function:
CREATE OR REPLACE FUNCTION unaccent_lower(text)
  RETURNS text AS
$func$
SELECT unaccent('unaccent', lower($1))
$func$  LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
Use this function to create index:
CREATE INDEX cities_name_idx
ON cities
USING btree
 (unaccent_lower(name) COLLATE pg_catalog."POSIX");
Finally, use this in your query:
SELECT name
FROM cities 
WHERE unaccent_lower(name) LIKE 'ceske bu%' 
LIMIT 2
Time: 11ms

Second solution, maybe more straightforward, is to add a new column name_search with data copy
UPDATE cities SET name_search = UNACCENT(LOWER(name))
and a classic index
CREATE INDEX cities_name_search_idx
ON cities
USING btree
 (name_search COLLATE pg_catalog."POSIX");
and use a classic SQL search without the need for function calls. The searched string is "unaccent_lower" outside the database, eg. in a script via PHP. The timing is also a little faster than with created function.

Update Tiles created from raster

We can store raster data in PostGIS. It is nice and works. The problem is searching. If raster is of a bigger size (bigger is meant something 200px in one dimension), there is a problem witch searching and other operations. For this, is better to split data into tiles and create index to speed up searching. So far so good. But what if we want to update tiles, after we have updated the original raster? For this, I am using this query:
UPDATE t SET raw_data = f.raw_data_tile 
FROM (
	SELECT id, ST_Tile(raw_data, , ) As raw_data_tile
	FROM m 
	WHERE  		
) as f
WHERE t.id = f.id	
	AND ST_UpperLeftX(t.raw_data) = ST_UpperLeftX(f.raw_data_tile)
	AND ST_UpperLeftY(t.raw_data) = ST_UpperLeftY(f.raw_data_tile)
The important part is matching existing tiles to the newly created tiles via
ST_UpperLeftX(t.raw_data) = ST_UpperLeftX(f.raw_data_tile)
. Each tile contains information about its top (upper) left corner and we are matching those. The problem with this query is its speed. It is not the fastest UPDATE and running it in "real-time" can be quite expensive.




Martin Prantl 2016