We are running a PostgreSQL 16.2 server on a dedicated physical machine with local storage. This database has been stable and fast for years. Over about two weeks we saw significant and increasing slowdowns on certain queries, and were able to fix most of the problem by killing an old, unneeded query that had been running since the issues started. However, some queries remain intermittently slow (~10 s instead of ~0.6 s), and seem to be slow after an insert on that table but fast on repeated runs. We do have indexes on these columns, and these queries used to run quickly even after an insert. We have not noticed slowdowns in any other queries.
I have confirmed that there are no more long-running queries, and restarted the Apache server that initiated them. top does not show any heavy CPU or memory usage. There is plenty of disk space. I have rebooted the database machine. I have tried VACUUM ANALYZE and REINDEX on the relevant schema, and I have run VACUUM FULL VERBOSE ANALYZE on the relevant tables.
Here is the problematic query before the VACUUM FULL, first when running as fast as expected and then when running slowly. (These queries are against two different tables; the tables' schemas and indexes are identical, and all of them show the same behaviour.)
db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p6_2025 group by pos_id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=1000.45..97638.85 rows=502 width=11) (actual time=473.892..582.477 rows=502 loops=1)
Group Key: pos_id
-> Gather Merge (cost=1000.45..97628.81 rows=1004 width=11) (actual time=473.464..580.906 rows=1008 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=0.43..96512.90 rows=502 width=11) (actual time=2.169..381.810 rows=336 loops=3)
Group Key: pos_id
-> Parallel Index Only Scan using positioner_moves_p6_2025_pos_id_pos_move_index_idx on positioner_moves_p6_2025 (cost=0.43..91917.61 rows=918055 width=11) (actual time=0.059..181.813 rows=734444 loops=3)
Heap Fetches: 135083
Planning Time: 0.245 ms
Execution Time: 582.662 ms
(11 rows)
Time: 583.438 ms
db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p7_2025 group by pos_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=1000.45..99587.09 rows=502 width=11) (actual time=18095.778..18457.488 rows=502 loops=1)
Group Key: pos_id
-> Gather Merge (cost=1000.45..99577.05 rows=1004 width=11) (actual time=18095.319..18455.945 rows=951 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=0.43..98461.14 rows=502 width=11) (actual time=696.362..12425.781 rows=317 loops=3)
Group Key: pos_id
-> Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx on positioner_moves_p7_2025 (cost=0.43..93525.00 rows=986224 width=11) (actual time=33.352..11812.136 rows=789479 loops=3)
Heap Fetches: 133143
Planning Time: 1.203 ms
Execution Time: 18457.675 ms
(11 rows)
Time: 18459.788 ms (00:18.460)
After running VACUUM FULL VERBOSE ANALYZE on the relevant tables, the plan changed but I still see some slow queries:
db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p4_2025 GROUP BY pos_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=127983.10..128110.28 rows=502 width=11) (actual time=629.331..630.973 rows=502 loops=1)
Group Key: pos_id
-> Gather Merge (cost=127983.10..128100.24 rows=1004 width=11) (actual time=629.323..630.345 rows=1506 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=126983.08..126984.33 rows=502 width=11) (actual time=624.514..624.553 rows=502 loops=3)
Sort Key: pos_id
Sort Method: quicksort Memory: 44kB
Worker 0: Sort Method: quicksort Memory: 44kB
Worker 1: Sort Method: quicksort Memory: 44kB
-> Partial HashAggregate (cost=126955.54..126960.56 rows=502 width=11) (actual time=623.343..623.467 rows=502 loops=3)
Group Key: pos_id
Batches: 1 Memory Usage: 73kB
Worker 0: Batches: 1 Memory Usage: 73kB
Worker 1: Batches: 1 Memory Usage: 73kB
-> Parallel Seq Scan on positioner_moves_p4_2025 (cost=0.00..122148.69 rows=961369 width=11) (actual time=0.024..239.819 rows=769145 loops=3)
Planning Time: 0.106 ms
Execution Time: 631.053 ms
(18 rows)
The VACUUM FULL log:
db_name=# VACUUM FULL VERBOSE ANALYZE posmovedb.positioner_moves_p4_2025;
INFO: vacuuming "posmovedb.positioner_moves_p4_2025"
INFO: "posmovedb.positioner_moves_p4_2025": found 1 removable, 2307435 nonremovable row versions in 119934 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 5.89 s, system: 1.90 s, elapsed: 28.88 s.
INFO: analyzing "posmovedb.positioner_moves_p4_2025"
INFO: "positioner_moves_p4_2025": scanned 30000 of 112535 pages, containing 615085 live rows and 0 dead rows; 30000 rows in sample, 2307286 estimated total rows
VACUUM
The table definition:
db_name=# \d posmovedb.positioner_moves_p5_2025
Table "posmovedb.positioner_moves_p5_2025"
Column | Type | Collation | Nullable | Default
----------------------+--------------------------+-----------+----------+---------
petal_id | integer | | |
device_loc | integer | | |
pos_id | text | | |
pos_move_index | integer | | |
time_recorded | timestamp with time zone | | |
bus_id | text | | |
pos_t | double precision | | |
pos_p | double precision | | |
last_meas_obs_x | double precision | | |
last_meas_obs_y | double precision | | |
last_meas_peak | double precision | | |
last_meas_fwhm | double precision | | |
total_move_sequences | integer | | |
total_cruise_moves_t | integer | | |
total_cruise_moves_p | integer | | |
total_creep_moves_t | integer | | |
total_creep_moves_p | integer | | |
ctrl_enabled | boolean | | not null | true
move_cmd | text | | |
move_val1 | text | | |
move_val2 | text | | |
log_note | text | | |
exposure_id | integer | | |
exposure_iter | integer | | |
flags | bigint | | |
obs_x | double precision | | |
obs_y | double precision | | |
ptl_x | double precision | | |
ptl_y | double precision | | |
ptl_z | double precision | | |
site | text | | |
postscript | text | | |
Indexes:
"positioner_moves_p5_2025_exposure_idx" btree (exposure_id)
"positioner_moves_p5_2025_exposure_iter_idx" btree (exposure_iter)
"positioner_moves_p5_2025_idx" btree (pos_id, petal_id, pos_move_index)
"positioner_moves_p5_2025_petal_id_idx" btree (petal_id)
"positioner_moves_p5_2025_pos_id_idx" btree (pos_id)
"positioner_moves_p5_2025_pos_id_pos_move_index_idx" btree (pos_id, pos_move_index)
"positioner_moves_p5_2025_pos_move_index_idx" btree (pos_move_index)
"positioner_moves_p5_2025_time_recorded_idx" btree (time_recorded)
Check constraints:
"positioner_moves_p5_2025_time_recorded_check" CHECK (time_recorded >= '2025-01-01 12:00:00'::timestamp without time zone AND time_recorded < '2026-01-01 12:00:00'::timestamp without time zone)
Inherits: positioner_moves_p5
asked Mar 11 at 16:55 by elliotta; edited Mar 11 at 18:33 by Dale K
1 Answer
Writing here because it'd be too big for a comment.
From your second query, there's a Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx that takes 11.8 seconds, compared to about 0.2 seconds for the same operation on positioner_moves_p6_2025_pos_id_pos_move_index_idx. The tables are roughly the same size (actual rows 789479 and 734444, respectively), and both scans hit the heap roughly the same number of times (Heap Fetches: 133143 vs. Heap Fetches: 135083). Why does one scan of an index take 11.8 seconds while the other takes 0.2 seconds? Typically, if it's not a resource issue, it could be an interference issue (i.e., locking) or significant bloat.
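If bloat is a suspect, one quick (approximate) check is the dead-vs-live tuple counters and vacuum timestamps in pg_stat_user_tables. A minimal sketch, assuming the partition naming shown in the question:

```sql
-- Rough bloat/vacuum check on the affected partitions.
-- A large n_dead_tup relative to n_live_tup hints at bloat;
-- last_autovacuum shows whether autovacuum has kept up since
-- the long-running query was killed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'posmovedb'
  AND relname LIKE 'positioner_moves_p%_2025';
```

Note these counters are statistics-collector estimates, not exact counts, but they are usually good enough to flag a partition that autovacuum has fallen behind on.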
Where it gets hard to analyze is that after your VACUUM FULL, you share an EXPLAIN ANALYZE on positioner_moves_p4_2025, which is a different table from the first two queries (p6 and p7). I would like to assume that p4 is similar in size to p6 and p7, but I don't think we can make that assumption. Moreover, the timing on the p4 query is 0.6 seconds, which still falls within your acceptable range -- do you have a slow query from after the VACUUM FULL that takes over 10 seconds? It would be nice to see how the query on p7 changed, since that's the only query you shared that took 10+ seconds.
Since you have shared four different partitions (a slow p7; acceptable p4 and p6; and the table definition for p5), it's tricky to pinpoint the issue -- sort of a moving target. But that also suggests you are seeing the slowness intermittently, on a different partition each time. With that observation, I would check what kind of locking behavior occurs on your tables, as you likely have different concurrent queries waiting on each other.
Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx is where it went wrong, but the rows returned look fine, indicating to me that it's not a planner problem. I would start by checking for locks. Also, positioner_moves_p5_2025_pos_id_idx is redundant with positioner_moves_p5_2025_idx and positioner_moves_p5_2025_pos_id_pos_move_index_idx. – Schwern Commented Mar 12 at 1:17