Postgres Notes: Difference between revisions

From Federal Burro of Information
Jump to navigationJump to search
 
(55 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Wide outout ==
== Postgres cli commands  ==
 
'''Wide output''' ( a la MYSQL's \G )


Good for seeing examples of FEW records:
Good for seeing examples of FEW records:
Line 5: Line 7:
  \x
  \x
  select * from users limit 1;
  select * from users limit 1;
It's a toggle so do it again on it's own and it gets turned off.
list databases:
\l
List tables:
\dt
databases by size:
\l+
\l+ PGDATABASE
change db after connect with
\c databasename
list all databases
\list or \l
list all tables in the current database
\dt
list relations ships ( tables and sequences )
\d


== An App user ==
== An App user ==
Line 22: Line 55:


full treatment: https://stackoverflow.com/questions/41537825/restricted-postgresql-permissions-for-web-app/47955583#47955583
full treatment: https://stackoverflow.com/questions/41537825/restricted-postgresql-permissions-for-web-app/47955583#47955583
=== User Access Permissions Report ===
To get a view of user permissions across the whole enchilada for user "myusername":
<pre>
SELECT
    u.usename,
    s.tblname,
    has_table_privilege(u.usename,s.tblname,'SELECT') AS user_has_select_permission,
    has_table_privilege(u.usename,s.tblname,'INSERT') AS user_has_insert_permission,
    has_table_privilege(u.usename,s.tblname,'UPDATE') AS user_has_update_permission,
    has_table_privilege(u.usename,s.tblname,'DELETE') AS user_has_delete_permission,
    has_table_privilege(u.usename,s.tblname,'REFERENCES') AS user_has_references_permission
FROM
    pg_user u
CROSS JOIN
    (SELECT DISTINCT schemaname+'.'+tablename as tblname FROM pg_tables where schemaname not in ('pg_catalog', 'public')) s
where u.usename = 'myusername'
--and user_has_select_permission = false
order by 3, 2;
</pre>
=== READ only user ===
-- Create a group
CREATE ROLE readaccess;
-- Grant access to existing tables
GRANT USAGE ON SCHEMA public TO readaccess;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readaccess;
-- Grant access to future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readaccess;
-- Create a final user with password
CREATE USER tomek WITH PASSWORD 'secret';
GRANT readaccess TO tomek;
=== Setting Default permissions for a read only user ===
For subsequent tables with no Schema scope defined:
ALTER DEFAULT PRIVILEGES GRANT SELECT ON TABLES TO myusername;
Changes current tables and not subsequent:
GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO myusername;
=== Another read only user ===
postgres=> \du+ pg_monitor
<pre>
postgres=> create role user_monitor password 'XXX' login;
CREATE ROLE
postgres=> grant pg_monitor to user_monitor ;
GRANT ROLE
postgres=> \du+ user_monitor
                    List of roles
  Role name  | Attributes |  Member of  | Description
--------------+------------+--------------+-------------
user_monitor |            | {pg_monitor} |
</pre>
== Restore and backup ==
as postgres user
restore:
psql dbname < infile
backup:
/bin/pg_dump dbname > /tmp/dbname.backup.date
Bigger script:
<pre>
#!/bin/sh -x
DBNAME="dbname"
DIR=/data/database_backups
NOW=`date "+%Y%m%d%H%M%S"`
# binaries
# stand alone postgres
PGDUMP="/bin/pg_dump"
PSQL="/bin/psql"
PGCONF="/var/lib/pgsql/data/postgresql.conf"
echo pg_dump
${PGDUMP} ${DBNAME} | /bin/gzip > ${DIR}/vmdb.pg_dump.${NOW}.gz
echo postgresql.conf
/bin/cp ${PGCONF} ${DIR}/postgresql.${NOW}.conf
echo server vars
echo "SHOW ALL" | ${PSQL} > ${DIR}/psql_SHOW_ALL_${NOW}.out</pre>
</pre>
UPDATE 2019 style!
pg_dump -h host -p 5432 -U user -Fd -j 16 -b -v -f "/tmp/filename" db_name
<pre>
pg_dump
-h host # host
-p 5432 # port
-U user # user
-Fd # output format = custom ( fit for pg_restore command ).
-j 16 # concurrency control
-b # include large objects
-v # be verbose
-f "/tmp/db_name_dir" db_name
</pre>


== RDS Gochas ==
== RDS Gochas ==
Line 28: Line 180:


https://stackoverflow.com/questions/26684643/error-must-be-member-of-role-when-creating-schema-in-postgresql
https://stackoverflow.com/questions/26684643/error-must-be-member-of-role-when-creating-schema-in-postgresql
'''Adding GIS support '''
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.PostGIS
== pg_stat_statements based reports ==
https://gist.github.com/NikolayS/b6ec676ea63ab8d4db680a4c0d88e8bf
== pg_stat_activity based reports ==
install this extension pg_stat_statements using
tenant=> create extension pg_stat_statements;
CREATE EXTENSION
then query:
SELECT calls, total_time,total_time::decimal/calls avg, query FROM pg_stat_statements
          ORDER BY total_time DESC LIMIT 10;


== per table cache hit ratios ==
== per table cache hit ratios ==
Line 46: Line 221:


dont forget to select pg_stat_reset();
dont forget to select pg_stat_reset();
<pre>
SELECT
  relname as table ,
  heap_blks_read as heap_read,
  heap_blks_hit as heap_hit,
  to_char( ( heap_blks_hit::decimal / ( heap_blks_hit::decimal + heap_blks_read::decimal ) ) * 100 , '99.99' ) as ratio
FROM
  pg_statio_user_tables
WHERE
  heap_blks_hit + heap_blks_read > 0
  AND relid > 10000
ORDER BY ratio;
</pre>


== percent of time index used ==
== percent of time index used ==
Line 62: Line 251:
</pre>
</pre>


== vacuum ==
== Process list stuff ==
 
SELECT client_addr , count(1) from pg_stat_activity group by client_addr;
 
postgres 9.5 .9.6, others?
 
SELECT pid, datname, usename, query_start, state_change, waiting , state
from pg_stat_activity ;
 
SELECT min(query_start) as oldest, max(query_start) as newest, state, count(1)
from pg_stat_activity
GROUP BY state ;
 
=== postgres 11 ===
 
SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , substring( query , 0, 20 ) as query
from pg_stat_activity ;
 
prometheus
 
SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , query
from pg_stat_activity
where usename = 'prometheus';
 
select * from pg_stat_bgwriter;
 
== Vacuum ==
 
<pre>
with base as(SELECT round(n_dead_tup / (n_live_tup * current_setting('autovacuum_vacuum_scale_factor')::float8 + current_setting('autovacuum_vacuum_threshold')::float8)::numeric, 3) fraqmentation
, schemaname
, relname
, greatest(last_vacuum, last_autovacuum) AT TIME ZONE 'EDT' last_vacuum
, case when coalesce(last_vacuum,'1970-01-01') > coalesce(last_autovacuum,'1970-01-01') then 'Job' when coalesce(last_vacuum,'1970-01-01') < coalesce(last_autovacuum,'1970-01-01') then 'Auto' else null end vaccum_type
, greatest(last_analyze, last_autoanalyze) AT TIME ZONE 'EDT' last_analyze
, case when coalesce(last_analyze,'1970-01-01') > coalesce(last_autoanalyze,'1970-01-01') then 'Job' when coalesce(last_analyze,'1970-01-01') < coalesce(last_autoanalyze,'1970-01-01') then 'Auto' else null end analyze_type
, n_live_tup live_tuple
, n_dead_tup dead_tuple
, round(n_dead_tup / greatest(n_live_tup,1)::numeric,4) ratio
FROM pg_stat_all_tables
WHERE schemaname = 'public')
select * from base
where relname like 'steps_2019_0%'
ORDER BY fraqmentation DESC;
</pre>
 
( might not owrk with postgres 11 )


=== Global settings ===
=== Global settings ===
Line 74: Line 309:
from pg_class
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where relname like 'data%' and pg_namespace.nspname = 'public';
where pg_namespace.nspname = 'public';
 
select relname, substring(reloptions::varchar,0,70) as options, pg_namespace.nspname
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = 'public'
ORDER BY reloptions;
 
</pre>
 
=== what vaccum is running now ? ===
 
possibly old version:
 
<pre>
select
datname,
usename,
pid,
state,
wait_event,
current_timestamp - xact_start AS xact_runtime,
query
from pg_stat_activity
where upper(query) LIKE '%VACUUM%' and
pid != pg_backend_pid()
order by xact_start;
</pre>
 
new ?
 
<pre>
select
datname,
usename,
pid,
state,
waiting,
current_timestamp - xact_start AS xact_runtime,
query
from pg_stat_activity
where upper(query) LIKE '%VACUUM%' and
pid != pg_backend_pid()
order by xact_start;
</pre>
</pre>


pg 11:
select * from pg_stat_progress_vacuum;
=== current vaccuum config ===
SELECT relname, reltuples , reloptions FROM pg_class ORDER BY reloptions ASC ,relname;
doesn't work on pg 11
=== disk usage ===
<pre>
SELECT
      relname AS "table_name",
      pg_size_pretty(pg_table_size(C.oid)) AS "table_size"
FROM
      pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r')
ORDER BY pg_table_size(C.oid)
DESC LIMIT 20;
</pre>


=== stats ===
=== stats ===
Line 92: Line 393:
   last_autovacuum ASC;
   last_autovacuum ASC;
</pre>
</pre>
<pre>
SELECT
  relname,
  last_vacuum,
  last_autovacuum,
  vacuum_count,
  autovacuum_count
FROM
  pg_stat_user_tables
ORDER BY
  last_autovacuum DESC;
</pre>
also see: https://www.datadoghq.com/blog/postgresql-vacuum-monitoring/
can also use "pg_stat_all_tables"
also see: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW
possibly also from the log: grep -A 5 -i vacuum postgresql-201*.log| awk 'BEGIN{RS="--"}{gsub("\n"," ",$0); print $0 }
=== transacion id wrap around ===
vacccum might happen becasue of this:
reference; https://www.cybertec-postgresql.com/en/autovacuum-wraparound-protection-in-postgresql/
<pre>
SELECT
      oid::regclass::text AS table,
      age(relfrozenxid) AS xid_age,
      mxid_age(relminmxid) AS mxid_age,
      least(
(SELECT setting::int
            FROM    pg_settings
            WHERE  name = 'autovacuum_freeze_max_age') - age(relfrozenxid),
(SELECT setting::int
            FROM    pg_settings
            WHERE  name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid) 
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM    pg_class
WHERE  relfrozenxid != 0
AND oid > 16384
ORDER BY tx_before_wraparound_vacuum;
</pre>
=== Datadog Dashboard ===
This is a great dashboard:
* service latency ( the front side view ) not really post gres but the thing you don't want impacted by activity in the db.
* Locks by table
* Dead Row by Table
* Lock by Mode
* Table Size by Table
* Live rows
* IOPS
* CPU Utiliation
== Replication ==
=== RDS replication ===
permissions get replicated.
create a user on the master , wait a bit and the reader will have those same permissions.
== Performance ==
You should do this for your queries:
EXPLAIN (ANALYSE,VERBOSE,BUFFERS,JSON) SELECT * FROM fake_data LIMIT 10 OFFSET 200
It can highlight issues.
wrong default_statistics_target ?
wrong work_mem ?
=== Join and sub queries cost the same now ===
<pre>
EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
SELECT titles_translations.*
FROM titles JOIN titles_translations USING(title_id)
WHERE titles.type = 'movie' AND titles.primary_name = 'How to Train Your Dragon';
EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
  SELECT * FROM titles_translations
  WHERE title_id IN( SELECT title_id FROM titles
  WHERE type = 'movie' AND primary_name = 'How to Train Your Dragon' );
EXPLAIN (ANALYZE, BUFFERS FlaISE, TIMING FALSE, SUMMARY FALSE)
  SELECT * FROM titles_transliations
  WHERE EXISTS( SELECT * FROM titles
  WHERE title_id = titles_translations.title_id AND type = 'movie' AND primary_name = 'How to Train Your Dragon' );
</pre>
== Dead tuples report ==
SELECT relname, n_dead_tup FROM pg_stat_user_tables order by n_dead_tup desc limit 30;
would there be a vacuum ?
<pre>
SELECT * FROM pg_stat_all_tables
ORDER BY n_dead_tup
    / (n_live_tup
      * current_setting('autovacuum_vacuum_scale_factor')::float8
          + current_setting('autovacuum_vacuum_threshold')::float8)
    DESC
LIMIT 10;
</pre>
== Lock report ==
tip: how to exclude your own query:
WHERE a.pid != pg_backend_pid()
<pre>
SELECT a.datname,
        l.relation::regclass,
        l.mode,
        l.GRANTED,
        a.usename,
        substring(a.query,0,80) as query,
        a.query_start,
        age(now(), a.query_start) AS "age",
        a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE a.pid != pg_backend_pid()
ORDER BY a.query_start;
</pre>
long running locking queries:
<pre>
select pid, state, usename, query, query_start, now() - query_start as elapse
from pg_stat_activity
where pid in (
select pid
from pg_locks l
join pg_class t on l.relation = t.oid
where t.relkind = 'r' AND
database = 5302651
)
ORDER BY elapse desc LIMIT 1;
</pre>
useful walk through of locking:
https://jaketrent.com/post/find-kill-locks-postgres


== Disk usage ==
== Disk usage ==
By table:
<Pre>
SELECT
  relname as "Table",
  pg_size_pretty(pg_total_relation_size(relid)) As "Size",
  pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
  FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;
</pre>


https://wiki.postgresql.org/wiki/Disk_Usage
https://wiki.postgresql.org/wiki/Disk_Usage
Line 116: Line 587:


</pre>
</pre>
<pre>
SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;
<pre>


== cache hit ration summary ==
== cache hit ration summary ==
Line 189: Line 672:
     , max(id) - min(id) AS id_span
     , max(id) - min(id) AS id_span
FROM mytable ;
FROM mytable ;
<pre>
</pre>


let us suppose that your span is 5100000 and that you distribution of id is dense
let us suppose that your span is 5100000 and that you distribution of id is dense
Line 208: Line 691:
LIMIT  1000;  
LIMIT  1000;  
</pre>
</pre>
== Also See ==
* [[Sql of note]]
* https://gist.github.com/NikolayS/b6ec676ea63ab8d4db680a4c0d88e8bf
* [https://www.citusdata.com/blog/2019/02/08/the-most-useful-postgres-extension-pg-stat-statements/ The most useful Postgres extension: pg_stat_statements]
* [https://gist.github.com/NikolayS/b6ec676ea63ab8d4db680a4c0d88e8bf Slowest query et al]
* https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT
* Why upgrade postgres https://why-upgrade.depesz.com/show?from=12&to=12.1
* Graphical explain explain.depesz.com
* https://www.enterprisedb.com/blog/8-cool-interesting-facts-things-postgresql-can-do
[[Category:DATA]]
[[Category:SQL]]

Latest revision as of 14:22, 18 September 2023

Postgres cli commands

Wide output ( a la MYSQL's \G )

Good for seeing examples of FEW records:

\x
select * from users limit 1;

It's a toggle so do it again on it's own and it gets turned off.

list databases:

\l

List tables:

\dt

databases by size:

\l+
\l+ PGDATABASE

change db after connect with

\c databasename

list all databases

\list or \l

list all tables in the current database

\dt

list relations ships ( tables and sequences )

\d

An App user

Adding a secure , least priviledges user for you app, no root.


GRANT USAGE ON SCHEMA public TO myuser; -- more schemas?
GRANT SELECT ON ALL TABLES IN SCHEMA public TO myuser;

This ensure taht all subsequent tables are accessable by that user ( role ):

ALTER DEFAULT PRIVILEGES FOR ROLE mycreating_user IN SCHEMA public
GRANT SELECT ON TABLES TO myuser;

ref: https://dba.stackexchange.com/questions/91953/grant-access-to-all-tables-of-a-database

full treatment: https://stackoverflow.com/questions/41537825/restricted-postgresql-permissions-for-web-app/47955583#47955583

User Access Permissions Report

To get a view of user permissions across the whole enchilada for user "myusername":

SELECT
    u.usename,
    s.tblname,
    has_table_privilege(u.usename,s.tblname,'SELECT') AS user_has_select_permission,
    has_table_privilege(u.usename,s.tblname,'INSERT') AS user_has_insert_permission,
    has_table_privilege(u.usename,s.tblname,'UPDATE') AS user_has_update_permission,
    has_table_privilege(u.usename,s.tblname,'DELETE') AS user_has_delete_permission,
    has_table_privilege(u.usename,s.tblname,'REFERENCES') AS user_has_references_permission
FROM
    pg_user u
CROSS JOIN
    (SELECT DISTINCT schemaname+'.'+tablename as tblname FROM pg_tables where schemaname not in ('pg_catalog', 'public')) s
where u.usename = 'myusername'
--and user_has_select_permission = false 
order by 3, 2;


READ only user

-- Create a group

CREATE ROLE readaccess;

-- Grant access to existing tables

GRANT USAGE ON SCHEMA public TO readaccess;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readaccess;

-- Grant access to future tables

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readaccess;

-- Create a final user with password

CREATE USER tomek WITH PASSWORD 'secret';
GRANT readaccess TO tomek;

Setting Default permissions for a read only user

For subsequent tables with no Schema scope defined:

ALTER DEFAULT PRIVILEGES GRANT SELECT ON TABLES TO myusername;

Changes current tables and not subsequent:

GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO myusername;

Another read only user

postgres=> \du+ pg_monitor
postgres=> create role user_monitor password 'XXX' login;
CREATE ROLE
postgres=> grant pg_monitor to user_monitor ;
GRANT ROLE
postgres=> \du+ user_monitor
                     List of roles
  Role name   | Attributes |  Member of   | Description
--------------+------------+--------------+-------------
 user_monitor |            | {pg_monitor} |

Restore and backup

as postgres user

restore:

psql dbname < infile

backup:

/bin/pg_dump dbname > /tmp/dbname.backup.date

Bigger script:

#!/bin/sh -x

DBNAME="dbname"
DIR=/data/database_backups
NOW=`date "+%Y%m%d%H%M%S"`
# binaries

# stand alone postgres
PGDUMP="/bin/pg_dump"
PSQL="/bin/psql"
PGCONF="/var/lib/pgsql/data/postgresql.conf"

echo pg_dump
${PGDUMP} ${DBNAME} | /bin/gzip > ${DIR}/vmdb.pg_dump.${NOW}.gz
echo postgresql.conf
/bin/cp ${PGCONF} ${DIR}/postgresql.${NOW}.conf
echo server vars
echo "SHOW ALL" | ${PSQL} > ${DIR}/psql_SHOW_ALL_${NOW}.out

UPDATE 2019 style!


pg_dump -h host -p 5432 -U user -Fd -j 16 -b -v -f "/tmp/filename" db_name


pg_dump
 -h host # host
 -p 5432 # port 
 -U user # user
 -Fd # output format = custom ( fit for pg_restore command ).
 -j 16 # concurrency control
 -b # include large objects
 -v # be verbose
 -f "/tmp/db_name_dir" db_name

RDS Gochas

Grant the master user to the role you want to grant others ( RDS only )

https://stackoverflow.com/questions/26684643/error-must-be-member-of-role-when-creating-schema-in-postgresql

Adding GIS support

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.PostGIS

pg_stat_statements based reports

https://gist.github.com/NikolayS/b6ec676ea63ab8d4db680a4c0d88e8bf

pg_stat_activity based reports

install this extension pg_stat_statements using

tenant=> create extension pg_stat_statements;
CREATE EXTENSION

then query:

SELECT calls, total_time,total_time::decimal/calls avg, query FROM pg_stat_statements
         ORDER BY total_time DESC LIMIT 10;



per table cache hit ratios

SELECT 
  relname as table , 
  heap_blks_read as heap_read,
  heap_blks_hit as heap_hit,
  to_char( ( heap_blks_hit::decimal / ( heap_blks_hit::decimal + heap_blks_read::decimal ) ) * 100 , '99.99' )
FROM 
  pg_statio_user_tables
WHERE 
  heap_blks_hit + heap_blks_read > 0
  AND relid > 10000
ORDER BY relname;

dont forget to select pg_stat_reset();

SELECT 
  relname as table , 
  heap_blks_read as heap_read,
  heap_blks_hit as heap_hit,
  to_char( ( heap_blks_hit::decimal / ( heap_blks_hit::decimal + heap_blks_read::decimal ) ) * 100 , '99.99' ) as ratio
FROM 
  pg_statio_user_tables
WHERE 
  heap_blks_hit + heap_blks_read > 0
  AND relid > 10000
ORDER BY ratio;

percent of time index used

SELECT 
  relname, 
  100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used, 
  n_live_tup rows_in_table
FROM 
  pg_stat_user_tables
WHERE 
    seq_scan + idx_scan > 0 
ORDER BY 
  percent_of_times_index_used ASC;

Process list stuff

SELECT client_addr , count(1) from pg_stat_activity group by client_addr; 

postgres 9.5 .9.6, others?

SELECT pid, datname, usename, query_start, state_change, waiting , state
from pg_stat_activity ;
SELECT min(query_start) as oldest, max(query_start) as newest, state, count(1)
from pg_stat_activity
GROUP BY state ;

postgres 11

SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , substring( query , 0, 20 ) as query
from pg_stat_activity ;

prometheus

SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , query
from pg_stat_activity
where usename = 'prometheus';

select * from pg_stat_bgwriter;

Vacuum

with base as(SELECT round(n_dead_tup / (n_live_tup * current_setting('autovacuum_vacuum_scale_factor')::float8 + current_setting('autovacuum_vacuum_threshold')::float8)::numeric, 3) fraqmentation
, schemaname
, relname
, greatest(last_vacuum, last_autovacuum) AT TIME ZONE 'EDT' last_vacuum
, case when coalesce(last_vacuum,'1970-01-01') > coalesce(last_autovacuum,'1970-01-01') then 'Job' when coalesce(last_vacuum,'1970-01-01') < coalesce(last_autovacuum,'1970-01-01') then 'Auto' else null end vaccum_type
, greatest(last_analyze, last_autoanalyze) AT TIME ZONE 'EDT' last_analyze
, case when coalesce(last_analyze,'1970-01-01') > coalesce(last_autoanalyze,'1970-01-01') then 'Job' when coalesce(last_analyze,'1970-01-01') < coalesce(last_autoanalyze,'1970-01-01') then 'Auto' else null end analyze_type
, n_live_tup live_tuple
, n_dead_tup dead_tuple
, round(n_dead_tup / greatest(n_live_tup,1)::numeric,4) ratio
FROM pg_stat_all_tables
WHERE schemaname = 'public')
select * from base
where relname like 'steps_2019_0%'
ORDER BY fraqmentation DESC;

( might not owrk with postgres 11 )

Global settings

select * from pg_settings where category like 'Autovacuum';

per table settings

select relname, reloptions, pg_namespace.nspname
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = 'public';

select relname, substring(reloptions::varchar,0,70) as options, pg_namespace.nspname
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = 'public'
ORDER BY reloptions;

what vaccum is running now ?

possibly old version:

select
datname, 
usename, 
pid, 
state, 
wait_event, 
current_timestamp - xact_start AS xact_runtime, 
query
from pg_stat_activity 
where upper(query) LIKE '%VACUUM%' and 
pid != pg_backend_pid()
order by xact_start;

new ?

select
datname, 
usename, 
pid, 
state, 
waiting, 
current_timestamp - xact_start AS xact_runtime, 
query
from pg_stat_activity 
where upper(query) LIKE '%VACUUM%' and 
pid != pg_backend_pid()
order by xact_start;

pg 11:

select * from pg_stat_progress_vacuum;

current vaccuum config

SELECT relname, reltuples , reloptions FROM pg_class ORDER BY reloptions ASC ,relname;

doesn't work on pg 11

disk usage

SELECT 
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC LIMIT 20;

stats

SELECT 
  relname, 
  last_vacuum,
  last_autovacuum,
  vacuum_count,
  autovacuum_count
FROM 
  pg_stat_user_tables
ORDER BY 
  last_autovacuum ASC;
SELECT 
  relname, 
  last_vacuum,
  last_autovacuum,
  vacuum_count,
  autovacuum_count
FROM 
  pg_stat_user_tables
ORDER BY 
  last_autovacuum DESC;

also see: https://www.datadoghq.com/blog/postgresql-vacuum-monitoring/

can also use "pg_stat_all_tables"

also see: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

possibly also from the log: grep -A 5 -i vacuum postgresql-201*.log| awk 'BEGIN{RS="--"}{gsub("\n"," ",$0); print $0 }

transacion id wrap around

vacccum might happen becasue of this:

reference; https://www.cybertec-postgresql.com/en/autovacuum-wraparound-protection-in-postgresql/

SELECT
       oid::regclass::text AS table,
       age(relfrozenxid) AS xid_age, 
       mxid_age(relminmxid) AS mxid_age, 
       least( 
(SELECT setting::int
            FROM    pg_settings
            WHERE   name = 'autovacuum_freeze_max_age') - age(relfrozenxid), 
(SELECT setting::int
            FROM    pg_settings
            WHERE   name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid)  
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM    pg_class
WHERE   relfrozenxid != 0
AND oid > 16384
ORDER BY tx_before_wraparound_vacuum;

Datadog Dashboard

This is a great dashboard:

  • service latency ( the front side view ) not really post gres but the thing you don't want impacted by activity in the db.
  • Locks by table
  • Dead Row by Table
  • Lock by Mode
  • Table Size by Table
  • Live rows
  • IOPS
  • CPU Utiliation

Replication

RDS replication

permissions get replicated.

create a user on the master , wait a bit and the reader will have those same permissions.


Performance

You should do this for your queries:

EXPLAIN (ANALYSE,VERBOSE,BUFFERS,JSON) SELECT * FROM fake_data LIMIT 10 OFFSET 200

It can highlight issues.

wrong default_statistics_target ?

wrong work_mem ?

Join and sub queries cost the same now

EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
 SELECT titles_translations.*
 FROM titles JOIN titles_translations USING(title_id)
 WHERE titles.type = 'movie' AND titles.primary_name = 'How to Train Your Dragon'; 

 EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
  SELECT * FROM titles_translations
  WHERE title_id IN( SELECT title_id FROM titles
  WHERE type = 'movie' AND primary_name = 'How to Train Your Dragon' ); 

 EXPLAIN (ANALYZE, BUFFERS FlaISE, TIMING FALSE, SUMMARY FALSE)
  SELECT * FROM titles_transliations
  WHERE EXISTS( SELECT * FROM titles
  WHERE title_id = titles_translations.title_id AND type = 'movie' AND primary_name = 'How to Train Your Dragon' ); 

Dead tuples report

SELECT relname, n_dead_tup FROM pg_stat_user_tables order by n_dead_tup desc limit 30;

would there be a vacuum ?

SELECT * FROM pg_stat_all_tables
ORDER BY n_dead_tup
    / (n_live_tup
       * current_setting('autovacuum_vacuum_scale_factor')::float8
          + current_setting('autovacuum_vacuum_threshold')::float8)
     DESC
LIMIT 10;

Lock report

tip: how to exclude your own query:

WHERE a.pid != pg_backend_pid()
SELECT a.datname,
         l.relation::regclass,
         l.mode,
         l.GRANTED,
         a.usename,
         substring(a.query,0,80) as query,
         a.query_start,
         age(now(), a.query_start) AS "age",
         a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE a.pid != pg_backend_pid()
ORDER BY a.query_start;


long running locking queries:

select pid, state, usename, query, query_start, now() - query_start as elapse 
from pg_stat_activity 
where pid in (
select pid
from pg_locks l 
join pg_class t on l.relation = t.oid 
where t.relkind = 'r' AND 
database = 5302651
)
ORDER BY elapse desc LIMIT 1;

useful walk through of locking:

https://jaketrent.com/post/find-kill-locks-postgres

Disk usage

By table:

SELECT
   relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) As "Size",
   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
   FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;

https://wiki.postgresql.org/wiki/Disk_Usage

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a;

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;
<pre>

== cache hit ration summary ==

<pre>
SELECT 
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM 
  pg_statio_user_tables;

best to reset counters some where in there.

pg_stat_reset()

Enable Extensions

In AWS RDS Postgres check that the extention you want is already in the param group.

Name: rds.extensions Values: ... big long string...

then check the system var:

SHOW rds.extensions;

Then enable it:

CREATE EXTENSION IF NOT EXISTS tablefunc;

Tablefunc gives you the awesome crosstabN() funtion.

References:

Pivot table in postgres?

use crosstabN ...

Random data

Generate

reference:

Random date:

SELECT date_trunc('day', date '2014-01-10' + random() * interval '3 days' )::timestamp::date as day, seq, u.id AS user_id
  FROM GENERATE_SERIES(1, 50) seq, users u;
SELECT TO_CHAR(day, 'YYYY-MM-DD'), random() FROM generate_series
        ( '2017-02-01'::date
        , '2017-04-01'::date
        , '1 day'::interval) day


SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;

Select

examine the shape of your ID:

SELECT count(*) AS ct              -- optional
     , min(id)  AS min_id
     , max(id)  AS max_id
     , max(id) - min(id) AS id_span
FROM mytable ;

let us suppose that your span is 5100000 and that you distribution of id is dense

WITH params AS (
    SELECT 1       AS min_id           -- minimum id <= current min id
         , 5100000 AS id_span          -- rounded up. (max_id - min_id + buffer)
    )
SELECT *
FROM  (
    SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
    FROM   params p
          ,generate_series(1, 1100) g  -- 1000 + buffer
    GROUP  BY 1                        -- trim duplicates
    ) r
JOIN   big USING (id)
LIMIT  1000; 

Also See