Postgres Notes: Difference between revisions

From Federal Burro of Information
Jump to navigationJump to search
Line 476: Line 476:
wrong work_mem ?
wrong work_mem ?


=== Join and sub queries cost the same now ===
<pre>
EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
SELECT titles_translations.*
FROM titles JOIN titles_translations USING(title_id)
WHERE titles.type = 'movie' AND titles.primary_name = 'How to Train Your Dragon';
EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
SELECT * FROM titles_translations
WHERE title_id IN( SELECT title_id FROM titles
WHERE type = 'movie' AND primary_name = 'How to Train Your Dragon' );
EXPLAIN (ANALYZE, BUFFERS FlaISE, TIMING FALSE, SUMMARY FALSE)
SELECT * FROM titles_transliations
WHERE EXISTS( SELECT * FROM titles
WHERE title_id = titles_translations.title_id AND type = 'movie' AND primary_name = 'How to Train Your Dragon' );
</pre>


== Dead tuples report ==
== Dead tuples report ==

Revision as of 14:21, 18 September 2023

Postgres cli commands

Wide output ( a la MYSQL's \G )

Good for seeing examples of FEW records:

\x
select * from users limit 1;

It's a toggle so do it again on it's own and it gets turned off.

list databases:

\l

List tables:

\dt

databases by size:

\l+
\l+ PGDATABASE

change db after connect with

\c databasename

list all databases

\list or \l

list all tables in the current database

\dt

list relations ships ( tables and sequences )

\d

An App user

Adding a secure , least priviledges user for you app, no root.


GRANT USAGE ON SCHEMA public TO myuser; -- more schemas?
GRANT SELECT ON ALL TABLES IN SCHEMA public TO myuser;

This ensure taht all subsequent tables are accessable by that user ( role ):

ALTER DEFAULT PRIVILEGES FOR ROLE mycreating_user IN SCHEMA public
GRANT SELECT ON TABLES TO myuser;

ref: https://dba.stackexchange.com/questions/91953/grant-access-to-all-tables-of-a-database

full treatment: https://stackoverflow.com/questions/41537825/restricted-postgresql-permissions-for-web-app/47955583#47955583

User Access Permissions Report

To get a view of user permissions across the whole enchilada for user "myusername":

SELECT
    u.usename,
    s.tblname,
    has_table_privilege(u.usename,s.tblname,'SELECT') AS user_has_select_permission,
    has_table_privilege(u.usename,s.tblname,'INSERT') AS user_has_insert_permission,
    has_table_privilege(u.usename,s.tblname,'UPDATE') AS user_has_update_permission,
    has_table_privilege(u.usename,s.tblname,'DELETE') AS user_has_delete_permission,
    has_table_privilege(u.usename,s.tblname,'REFERENCES') AS user_has_references_permission
FROM
    pg_user u
CROSS JOIN
    (SELECT DISTINCT schemaname+'.'+tablename as tblname FROM pg_tables where schemaname not in ('pg_catalog', 'public')) s
where u.usename = 'myusername'
--and user_has_select_permission = false 
order by 3, 2;


READ only user

-- Create a group

CREATE ROLE readaccess;

-- Grant access to existing tables

GRANT USAGE ON SCHEMA public TO readaccess;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readaccess;

-- Grant access to future tables

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readaccess;

-- Create a final user with password

CREATE USER tomek WITH PASSWORD 'secret';
GRANT readaccess TO tomek;

Setting Default permissions for a read only user

For subsequent tables with no Schema scope defined:

ALTER DEFAULT PRIVILEGES GRANT SELECT ON TABLES TO myusername;

Changes current tables and not subsequent:

GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO myusername;

Another read only user

postgres=> \du+ pg_monitor
postgres=> create role user_monitor password 'XXX' login;
CREATE ROLE
postgres=> grant pg_monitor to user_monitor ;
GRANT ROLE
postgres=> \du+ user_monitor
                     List of roles
  Role name   | Attributes |  Member of   | Description
--------------+------------+--------------+-------------
 user_monitor |            | {pg_monitor} |

Restore and backup

as postgres user

restore:

psql dbname < infile

backup:

/bin/pg_dump dbname > /tmp/dbname.backup.date

Bigger script:

#!/bin/sh -x

DBNAME="dbname"
DIR=/data/database_backups
NOW=`date "+%Y%m%d%H%M%S"`
# binaries

# stand alone postgres
PGDUMP="/bin/pg_dump"
PSQL="/bin/psql"
PGCONF="/var/lib/pgsql/data/postgresql.conf"

echo pg_dump
${PGDUMP} ${DBNAME} | /bin/gzip > ${DIR}/vmdb.pg_dump.${NOW}.gz
echo postgresql.conf
/bin/cp ${PGCONF} ${DIR}/postgresql.${NOW}.conf
echo server vars
echo "SHOW ALL" | ${PSQL} > ${DIR}/psql_SHOW_ALL_${NOW}.out

UPDATE 2019 style!


pg_dump -h host -p 5432 -U user -Fd -j 16 -b -v -f "/tmp/filename" db_name


pg_dump
 -h host # host
 -p 5432 # port 
 -U user # user
 -Fd # output format = custom ( fit for pg_restore command ).
 -j 16 # concurrency control
 -b # include large objects
 -v # be verbose
 -f "/tmp/db_name_dir" db_name

RDS Gochas

Grant the master user to the role you want to grant others ( RDS only )

https://stackoverflow.com/questions/26684643/error-must-be-member-of-role-when-creating-schema-in-postgresql

Adding GIS support

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.PostGIS

pg_stat_statements based reports

https://gist.github.com/NikolayS/b6ec676ea63ab8d4db680a4c0d88e8bf

pg_stat_activity based reports

install this extension pg_stat_statements using

tenant=> create extension pg_stat_statements;
CREATE EXTENSION

then query:

SELECT calls, total_time,total_time::decimal/calls avg, query FROM pg_stat_statements
         ORDER BY total_time DESC LIMIT 10;



per table cache hit ratios

SELECT 
  relname as table , 
  heap_blks_read as heap_read,
  heap_blks_hit as heap_hit,
  to_char( ( heap_blks_hit::decimal / ( heap_blks_hit::decimal + heap_blks_read::decimal ) ) * 100 , '99.99' )
FROM 
  pg_statio_user_tables
WHERE 
  heap_blks_hit + heap_blks_read > 0
  AND relid > 10000
ORDER BY relname;

dont forget to select pg_stat_reset();

SELECT 
  relname as table , 
  heap_blks_read as heap_read,
  heap_blks_hit as heap_hit,
  to_char( ( heap_blks_hit::decimal / ( heap_blks_hit::decimal + heap_blks_read::decimal ) ) * 100 , '99.99' ) as ratio
FROM 
  pg_statio_user_tables
WHERE 
  heap_blks_hit + heap_blks_read > 0
  AND relid > 10000
ORDER BY ratio;

percent of time index used

SELECT 
  relname, 
  100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used, 
  n_live_tup rows_in_table
FROM 
  pg_stat_user_tables
WHERE 
    seq_scan + idx_scan > 0 
ORDER BY 
  percent_of_times_index_used ASC;

Process list stuff

SELECT client_addr , count(1) from pg_stat_activity group by client_addr; 

postgres 9.5 .9.6, others?

SELECT pid, datname, usename, query_start, state_change, waiting , state
from pg_stat_activity ;
SELECT min(query_start) as oldest, max(query_start) as newest, state, count(1)
from pg_stat_activity
GROUP BY state ;

postgres 11

SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , substring( query , 0, 20 ) as query
from pg_stat_activity ;

prometheus

SELECT pid, usename, client_addr , query_start, backend_start, state_change, wait_event_type, wait_event, state, backend_type , query
from pg_stat_activity
where usename = 'prometheus';

select * from pg_stat_bgwriter;

Vacuum

with base as(SELECT round(n_dead_tup / (n_live_tup * current_setting('autovacuum_vacuum_scale_factor')::float8 + current_setting('autovacuum_vacuum_threshold')::float8)::numeric, 3) fraqmentation
, schemaname
, relname
, greatest(last_vacuum, last_autovacuum) AT TIME ZONE 'EDT' last_vacuum
, case when coalesce(last_vacuum,'1970-01-01') > coalesce(last_autovacuum,'1970-01-01') then 'Job' when coalesce(last_vacuum,'1970-01-01') < coalesce(last_autovacuum,'1970-01-01') then 'Auto' else null end vaccum_type
, greatest(last_analyze, last_autoanalyze) AT TIME ZONE 'EDT' last_analyze
, case when coalesce(last_analyze,'1970-01-01') > coalesce(last_autoanalyze,'1970-01-01') then 'Job' when coalesce(last_analyze,'1970-01-01') < coalesce(last_autoanalyze,'1970-01-01') then 'Auto' else null end analyze_type
, n_live_tup live_tuple
, n_dead_tup dead_tuple
, round(n_dead_tup / greatest(n_live_tup,1)::numeric,4) ratio
FROM pg_stat_all_tables
WHERE schemaname = 'public')
select * from base
where relname like 'steps_2019_0%'
ORDER BY fraqmentation DESC;

( might not owrk with postgres 11 )

Global settings

select * from pg_settings where category like 'Autovacuum';

per table settings

select relname, reloptions, pg_namespace.nspname
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = 'public';

select relname, substring(reloptions::varchar,0,70) as options, pg_namespace.nspname
from pg_class
join pg_namespace on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = 'public'
ORDER BY reloptions;

what vaccum is running now ?

possibly old version:

select
datname, 
usename, 
pid, 
state, 
wait_event, 
current_timestamp - xact_start AS xact_runtime, 
query
from pg_stat_activity 
where upper(query) LIKE '%VACUUM%' and 
pid != pg_backend_pid()
order by xact_start;

new ?

select
datname, 
usename, 
pid, 
state, 
waiting, 
current_timestamp - xact_start AS xact_runtime, 
query
from pg_stat_activity 
where upper(query) LIKE '%VACUUM%' and 
pid != pg_backend_pid()
order by xact_start;

pg 11:

select * from pg_stat_progress_vacuum;

current vaccuum config

SELECT relname, reltuples , reloptions FROM pg_class ORDER BY reloptions ASC ,relname;

doesn't work on pg 11

disk usage

SELECT 
       relname AS "table_name", 
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size" 
FROM 
       pg_class C 
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) 
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r') 
ORDER BY pg_table_size(C.oid) 
DESC LIMIT 20;

stats

SELECT 
  relname, 
  last_vacuum,
  last_autovacuum,
  vacuum_count,
  autovacuum_count
FROM 
  pg_stat_user_tables
ORDER BY 
  last_autovacuum ASC;
SELECT 
  relname, 
  last_vacuum,
  last_autovacuum,
  vacuum_count,
  autovacuum_count
FROM 
  pg_stat_user_tables
ORDER BY 
  last_autovacuum DESC;

also see: https://www.datadoghq.com/blog/postgresql-vacuum-monitoring/

can also use "pg_stat_all_tables"

also see: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

possibly also from the log: grep -A 5 -i vacuum postgresql-201*.log| awk 'BEGIN{RS="--"}{gsub("\n"," ",$0); print $0 }

transacion id wrap around

vacccum might happen becasue of this:

reference; https://www.cybertec-postgresql.com/en/autovacuum-wraparound-protection-in-postgresql/

SELECT
       oid::regclass::text AS table,
       age(relfrozenxid) AS xid_age, 
       mxid_age(relminmxid) AS mxid_age, 
       least( 
(SELECT setting::int
            FROM    pg_settings
            WHERE   name = 'autovacuum_freeze_max_age') - age(relfrozenxid), 
(SELECT setting::int
            FROM    pg_settings
            WHERE   name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid)  
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM    pg_class
WHERE   relfrozenxid != 0
AND oid > 16384
ORDER BY tx_before_wraparound_vacuum;

Datadog Dashboard

This is a great dashboard:

  • service latency ( the front side view ) not really post gres but the thing you don't want impacted by activity in the db.
  • Locks by table
  • Dead Row by Table
  • Lock by Mode
  • Table Size by Table
  • Live rows
  • IOPS
  • CPU Utiliation

Replication

RDS replication

permissions get replicated.

create a user on the master , wait a bit and the reader will have those same permissions.


Performance

You should do this for your queries:

EXPLAIN (ANALYSE,VERBOSE,BUFFERS,JSON) SELECT * FROM fake_data LIMIT 10 OFFSET 200

It can highlight issues.

wrong default_statistics_target ?

wrong work_mem ?

Join and sub queries cost the same now

EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
 SELECT titles_translations.*
 FROM titles JOIN titles_translations USING(title_id)
 WHERE titles.type = 'movie' AND titles.primary_name = 'How to Train Your Dragon'; 

 EXPLAIN (ANALYZE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
 SELECT * FROM titles_translations
 WHERE title_id IN( SELECT title_id FROM titles
 WHERE type = 'movie' AND primary_name = 'How to Train Your Dragon' ); 

 EXPLAIN (ANALYZE, BUFFERS FlaISE, TIMING FALSE, SUMMARY FALSE)
 SELECT * FROM titles_transliations
 WHERE EXISTS( SELECT * FROM titles
 WHERE title_id = titles_translations.title_id AND type = 'movie' AND primary_name = 'How to Train Your Dragon' ); 

 

Dead tuples report

SELECT relname, n_dead_tup FROM pg_stat_user_tables order by n_dead_tup desc limit 30;

would there be a vacuum ?

SELECT * FROM pg_stat_all_tables
ORDER BY n_dead_tup
    / (n_live_tup
       * current_setting('autovacuum_vacuum_scale_factor')::float8
          + current_setting('autovacuum_vacuum_threshold')::float8)
     DESC
LIMIT 10;

Lock report

tip: how to exclude your own query:

WHERE a.pid != pg_backend_pid()
SELECT a.datname,
         l.relation::regclass,
         l.mode,
         l.GRANTED,
         a.usename,
         substring(a.query,0,80) as query,
         a.query_start,
         age(now(), a.query_start) AS "age",
         a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE a.pid != pg_backend_pid()
ORDER BY a.query_start;


long running locking queries:

select pid, state, usename, query, query_start, now() - query_start as elapse 
from pg_stat_activity 
where pid in (
select pid
from pg_locks l 
join pg_class t on l.relation = t.oid 
where t.relkind = 'r' AND 
database = 5302651
)
ORDER BY elapse desc LIMIT 1;

useful walk through of locking:

https://jaketrent.com/post/find-kill-locks-postgres

Disk usage

By table:

SELECT
   relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) As "Size",
   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
   FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;

https://wiki.postgresql.org/wiki/Disk_Usage

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a;

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;
<pre>

== cache hit ration summary ==

<pre>
SELECT 
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM 
  pg_statio_user_tables;

best to reset counters some where in there.

pg_stat_reset()

Enable Extensions

In AWS RDS Postgres check that the extention you want is already in the param group.

Name: rds.extensions Values: ... big long string...

then check the system var:

SHOW rds.extensions;

Then enable it:

CREATE EXTENSION IF NOT EXISTS tablefunc;

Tablefunc gives you the awesome crosstabN() funtion.

References:

Pivot table in postgres?

use crosstabN ...

Random data

Generate

reference:

Random date:

SELECT date_trunc('day', date '2014-01-10' + random() * interval '3 days' )::timestamp::date as day, seq, u.id AS user_id
  FROM GENERATE_SERIES(1, 50) seq, users u;
SELECT TO_CHAR(day, 'YYYY-MM-DD'), random() FROM generate_series
        ( '2017-02-01'::date
        , '2017-04-01'::date
        , '1 day'::interval) day


SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;

Select

examine the shape of your ID:

SELECT count(*) AS ct              -- optional
     , min(id)  AS min_id
     , max(id)  AS max_id
     , max(id) - min(id) AS id_span
FROM mytable ;

let us suppose that your span is 5100000 and that you distribution of id is dense

WITH params AS (
    SELECT 1       AS min_id           -- minimum id <= current min id
         , 5100000 AS id_span          -- rounded up. (max_id - min_id + buffer)
    )
SELECT *
FROM  (
    SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
    FROM   params p
          ,generate_series(1, 1100) g  -- 1000 + buffer
    GROUP  BY 1                        -- trim duplicates
    ) r
JOIN   big USING (id)
LIMIT  1000; 

Also See