sql - Postgresql SELECTs went from 0.6s to 10+s after killing a (different) long running SELECT query - Stack Overflow

We are running a PostgreSQL 16.2 server on a dedicated physical machine with local storage. This databa

We are running a PostgreSQL 16.2 server on a dedicated physical machine with local storage. This database has been stable and fast for years. We started seeing significant and increasing slowdowns on certain queries over about two weeks, and were able to fix most of the problem by killing an old query that was not needed and had been running since the issues started. However, some queries remained intermittently slow (takes ~10s instead of ~0.6s), and seem to be slow after an insert on that table but fast on repeated runs. We do have indexes on these columns. These queries used to run quickly even after an insert. No other queries have slowdowns that we have noticed.

I have confirmed that there are no more long running queries, and restarted the Apache server that initiated them. Top does not show any heavy CPU or memory usage. There is plenty of disk space. I have rebooted the database machine. I have tried VACUUM ANALYZE and REINDEX on the relevant schema. I have run VACUUM FULL VERBOSE ANALYZE on the relevant tables.

The problematic query before the VACUUM FULL, first when running as fast as expected and then when running slowly. (These queries are from two different tables; these tables' schema and indexes are identical and all of them show the same behaviour.)

db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p6_2025 group by pos_id;
                                                                                                          QUERY PLAN                                                                                                          
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1000.45..97638.85 rows=502 width=11) (actual time=473.892..582.477 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=1000.45..97628.81 rows=1004 width=11) (actual time=473.464..580.906 rows=1008 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=0.43..96512.90 rows=502 width=11) (actual time=2.169..381.810 rows=336 loops=3)
               Group Key: pos_id
               ->  Parallel Index Only Scan using positioner_moves_p6_2025_pos_id_pos_move_index_idx on positioner_moves_p6_2025  (cost=0.43..91917.61 rows=918055 width=11) (actual time=0.059..181.813 rows=734444 loops=3)
                     Heap Fetches: 135083
 Planning Time: 0.245 ms
 Execution Time: 582.662 ms
(11 rows)

Time: 583.438 ms
db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p7_2025 group by pos_id;
                                                                                                           QUERY PLAN                                                                                                            
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1000.45..99587.09 rows=502 width=11) (actual time=18095.778..18457.488 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=1000.45..99577.05 rows=1004 width=11) (actual time=18095.319..18455.945 rows=951 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=0.43..98461.14 rows=502 width=11) (actual time=696.362..12425.781 rows=317 loops=3)
               Group Key: pos_id
               ->  Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx on positioner_moves_p7_2025  (cost=0.43..93525.00 rows=986224 width=11) (actual time=33.352..11812.136 rows=789479 loops=3)
                     Heap Fetches: 133143
 Planning Time: 1.203 ms
 Execution Time: 18457.675 ms
(11 rows)

Time: 18459.788 ms (00:18.460)

After running VACUUM FULL VERBOSE ANALYZE on the relevant tables the plan changed but I still see some slow queries:

db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p4_2025 GROUP BY pos_id;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=127983.10..128110.28 rows=502 width=11) (actual time=629.331..630.973 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=127983.10..128100.24 rows=1004 width=11) (actual time=629.323..630.345 rows=1506 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=126983.08..126984.33 rows=502 width=11) (actual time=624.514..624.553 rows=502 loops=3)
               Sort Key: pos_id
               Sort Method: quicksort  Memory: 44kB
               Worker 0:  Sort Method: quicksort  Memory: 44kB
               Worker 1:  Sort Method: quicksort  Memory: 44kB
               ->  Partial HashAggregate  (cost=126955.54..126960.56 rows=502 width=11) (actual time=623.343..623.467 rows=502 loops=3)
                     Group Key: pos_id
                     Batches: 1  Memory Usage: 73kB
                     Worker 0:  Batches: 1  Memory Usage: 73kB
                     Worker 1:  Batches: 1  Memory Usage: 73kB
                     ->  Parallel Seq Scan on positioner_moves_p4_2025  (cost=0.00..122148.69 rows=961369 width=11) (actual time=0.024..239.819 rows=769145 loops=3)
 Planning Time: 0.106 ms
 Execution Time: 631.053 ms
(18 rows)

The VACUUM FULL log:

db_name=# VACUUM FULL VERBOSE ANALYZE posmovedb.positioner_moves_p4_2025;
INFO:  vacuuming "posmovedb.positioner_moves_p4_2025"
INFO:  "posmovedb.positioner_moves_p4_2025": found 1 removable, 2307435 nonremovable row versions in 119934 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 5.89 s, system: 1.90 s, elapsed: 28.88 s.
INFO:  analyzing "posmovedb.positioner_moves_p4_2025"
INFO:  "positioner_moves_p4_2025": scanned 30000 of 112535 pages, containing 615085 live rows and 0 dead rows; 30000 rows in sample, 2307286 estimated total rows
VACUUM

The table definition:

db_name=# \d posmovedb.positioner_moves_p5_2025
                    Table "posmovedb.positioner_moves_p5_2025"
        Column        |           Type           | Collation | Nullable | Default 
----------------------+--------------------------+-----------+----------+---------
 petal_id             | integer                  |           |          | 
 device_loc           | integer                  |           |          | 
 pos_id               | text                     |           |          | 
 pos_move_index       | integer                  |           |          | 
 time_recorded        | timestamp with time zone |           |          | 
 bus_id               | text                     |           |          | 
 pos_t                | double precision         |           |          | 
 pos_p                | double precision         |           |          | 
 last_meas_obs_x      | double precision         |           |          | 
 last_meas_obs_y      | double precision         |           |          | 
 last_meas_peak       | double precision         |           |          | 
 last_meas_fwhm       | double precision         |           |          | 
 total_move_sequences | integer                  |           |          | 
 total_cruise_moves_t | integer                  |           |          | 
 total_cruise_moves_p | integer                  |           |          | 
 total_creep_moves_t  | integer                  |           |          | 
 total_creep_moves_p  | integer                  |           |          | 
 ctrl_enabled         | boolean                  |           | not null | true
 move_cmd             | text                     |           |          | 
 move_val1            | text                     |           |          | 
 move_val2            | text                     |           |          | 
 log_note             | text                     |           |          | 
 exposure_id          | integer                  |           |          | 
 exposure_iter        | integer                  |           |          | 
 flags                | bigint                   |           |          | 
 obs_x                | double precision         |           |          | 
 obs_y                | double precision         |           |          | 
 ptl_x                | double precision         |           |          | 
 ptl_y                | double precision         |           |          | 
 ptl_z                | double precision         |           |          | 
 site                 | text                     |           |          | 
 postscript           | text                     |           |          | 
Indexes:
    "positioner_moves_p5_2025_exposure_idx" btree (exposure_id)
    "positioner_moves_p5_2025_exposure_iter_idx" btree (exposure_iter)
    "positioner_moves_p5_2025_idx" btree (pos_id, petal_id, pos_move_index)
    "positioner_moves_p5_2025_petal_id_idx" btree (petal_id)
    "positioner_moves_p5_2025_pos_id_idx" btree (pos_id)
    "positioner_moves_p5_2025_pos_id_pos_move_index_idx" btree (pos_id, pos_move_index)
    "positioner_moves_p5_2025_pos_move_index_idx" btree (pos_move_index)
    "positioner_moves_p5_2025_time_recorded_idx" btree (time_recorded)
Check constraints:
    "positioner_moves_p5_2025_time_recorded_check" CHECK (time_recorded >= '2025-01-01 12:00:00'::timestamp without time zone AND time_recorded < '2026-01-01 12:00:00'::timestamp without time zone)
Inherits: positioner_moves_p5

We are running a PostgreSQL 16.2 server on a dedicated physical machine with local storage. This database has been stable and fast for years. We started seeing significant and increasing slowdowns on certain queries over about two weeks, and were able to fix most of the problem by killing an old query that was not needed and had been running since the issues started. However, some queries remained intermittently slow (takes ~10s instead of ~0.6s), and seem to be slow after an insert on that table but fast on repeated runs. We do have indexes on these columns. These queries used to run quickly even after an insert. No other queries have slowdowns that we have noticed.

I have confirmed that there are no more long running queries, and restarted the Apache server that initiated them. Top does not show any heavy CPU or memory usage. There is plenty of disk space. I have rebooted the database machine. I have tried VACUUM ANALYZE and REINDEX on the relevant schema. I have run VACUUM FULL VERBOSE ANALYZE on the relevant tables.

The problematic query before the VACUUM FULL, first when running as fast as expected and then when running slowly. (These queries are from two different tables; these tables' schema and indexes are identical and all of them show the same behaviour.)

db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p6_2025 group by pos_id;
                                                                                                          QUERY PLAN                                                                                                          
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1000.45..97638.85 rows=502 width=11) (actual time=473.892..582.477 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=1000.45..97628.81 rows=1004 width=11) (actual time=473.464..580.906 rows=1008 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=0.43..96512.90 rows=502 width=11) (actual time=2.169..381.810 rows=336 loops=3)
               Group Key: pos_id
               ->  Parallel Index Only Scan using positioner_moves_p6_2025_pos_id_pos_move_index_idx on positioner_moves_p6_2025  (cost=0.43..91917.61 rows=918055 width=11) (actual time=0.059..181.813 rows=734444 loops=3)
                     Heap Fetches: 135083
 Planning Time: 0.245 ms
 Execution Time: 582.662 ms
(11 rows)

Time: 583.438 ms
db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p7_2025 group by pos_id;
                                                                                                           QUERY PLAN                                                                                                            
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1000.45..99587.09 rows=502 width=11) (actual time=18095.778..18457.488 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=1000.45..99577.05 rows=1004 width=11) (actual time=18095.319..18455.945 rows=951 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=0.43..98461.14 rows=502 width=11) (actual time=696.362..12425.781 rows=317 loops=3)
               Group Key: pos_id
               ->  Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx on positioner_moves_p7_2025  (cost=0.43..93525.00 rows=986224 width=11) (actual time=33.352..11812.136 rows=789479 loops=3)
                     Heap Fetches: 133143
 Planning Time: 1.203 ms
 Execution Time: 18457.675 ms
(11 rows)

Time: 18459.788 ms (00:18.460)

After running VACUUM FULL VERBOSE ANALYZE on the relevant tables the plan changed but I still see some slow queries:

db_name=> EXPLAIN ANALYZE SELECT pos_id, max(pos_move_index) from posmovedb.positioner_moves_p4_2025 GROUP BY pos_id;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=127983.10..128110.28 rows=502 width=11) (actual time=629.331..630.973 rows=502 loops=1)
   Group Key: pos_id
   ->  Gather Merge  (cost=127983.10..128100.24 rows=1004 width=11) (actual time=629.323..630.345 rows=1506 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=126983.08..126984.33 rows=502 width=11) (actual time=624.514..624.553 rows=502 loops=3)
               Sort Key: pos_id
               Sort Method: quicksort  Memory: 44kB
               Worker 0:  Sort Method: quicksort  Memory: 44kB
               Worker 1:  Sort Method: quicksort  Memory: 44kB
               ->  Partial HashAggregate  (cost=126955.54..126960.56 rows=502 width=11) (actual time=623.343..623.467 rows=502 loops=3)
                     Group Key: pos_id
                     Batches: 1  Memory Usage: 73kB
                     Worker 0:  Batches: 1  Memory Usage: 73kB
                     Worker 1:  Batches: 1  Memory Usage: 73kB
                     ->  Parallel Seq Scan on positioner_moves_p4_2025  (cost=0.00..122148.69 rows=961369 width=11) (actual time=0.024..239.819 rows=769145 loops=3)
 Planning Time: 0.106 ms
 Execution Time: 631.053 ms
(18 rows)

The VACUUM FULL log:

db_name=# VACUUM FULL VERBOSE ANALYZE posmovedb.positioner_moves_p4_2025;
INFO:  vacuuming "posmovedb.positioner_moves_p4_2025"
INFO:  "posmovedb.positioner_moves_p4_2025": found 1 removable, 2307435 nonremovable row versions in 119934 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 5.89 s, system: 1.90 s, elapsed: 28.88 s.
INFO:  analyzing "posmovedb.positioner_moves_p4_2025"
INFO:  "positioner_moves_p4_2025": scanned 30000 of 112535 pages, containing 615085 live rows and 0 dead rows; 30000 rows in sample, 2307286 estimated total rows
VACUUM

The table definition:

db_name=# \d posmovedb.positioner_moves_p5_2025
                    Table "posmovedb.positioner_moves_p5_2025"
        Column        |           Type           | Collation | Nullable | Default 
----------------------+--------------------------+-----------+----------+---------
 petal_id             | integer                  |           |          | 
 device_loc           | integer                  |           |          | 
 pos_id               | text                     |           |          | 
 pos_move_index       | integer                  |           |          | 
 time_recorded        | timestamp with time zone |           |          | 
 bus_id               | text                     |           |          | 
 pos_t                | double precision         |           |          | 
 pos_p                | double precision         |           |          | 
 last_meas_obs_x      | double precision         |           |          | 
 last_meas_obs_y      | double precision         |           |          | 
 last_meas_peak       | double precision         |           |          | 
 last_meas_fwhm       | double precision         |           |          | 
 total_move_sequences | integer                  |           |          | 
 total_cruise_moves_t | integer                  |           |          | 
 total_cruise_moves_p | integer                  |           |          | 
 total_creep_moves_t  | integer                  |           |          | 
 total_creep_moves_p  | integer                  |           |          | 
 ctrl_enabled         | boolean                  |           | not null | true
 move_cmd             | text                     |           |          | 
 move_val1            | text                     |           |          | 
 move_val2            | text                     |           |          | 
 log_note             | text                     |           |          | 
 exposure_id          | integer                  |           |          | 
 exposure_iter        | integer                  |           |          | 
 flags                | bigint                   |           |          | 
 obs_x                | double precision         |           |          | 
 obs_y                | double precision         |           |          | 
 ptl_x                | double precision         |           |          | 
 ptl_y                | double precision         |           |          | 
 ptl_z                | double precision         |           |          | 
 site                 | text                     |           |          | 
 postscript           | text                     |           |          | 
Indexes:
    "positioner_moves_p5_2025_exposure_idx" btree (exposure_id)
    "positioner_moves_p5_2025_exposure_iter_idx" btree (exposure_iter)
    "positioner_moves_p5_2025_idx" btree (pos_id, petal_id, pos_move_index)
    "positioner_moves_p5_2025_petal_id_idx" btree (petal_id)
    "positioner_moves_p5_2025_pos_id_idx" btree (pos_id)
    "positioner_moves_p5_2025_pos_id_pos_move_index_idx" btree (pos_id, pos_move_index)
    "positioner_moves_p5_2025_pos_move_index_idx" btree (pos_move_index)
    "positioner_moves_p5_2025_time_recorded_idx" btree (time_recorded)
Check constraints:
    "positioner_moves_p5_2025_time_recorded_check" CHECK (time_recorded >= '2025-01-01 12:00:00'::timestamp without time zone AND time_recorded < '2026-01-01 12:00:00'::timestamp without time zone)
Inherits: positioner_moves_p5
Share Improve this question edited Mar 11 at 18:33 Dale K 27.5k15 gold badges58 silver badges83 bronze badges asked Mar 11 at 16:55 elliottaelliotta 211 silver badge2 bronze badges 1
  • 1 Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx is where it went wrong, but the rows returned look fine indicating to me that it's not a planner problem. I would start by checking for locks. Also, positioner_moves_p5_2025_pos_id_idx is redundant with positioner_moves_p5_2025_idx and positioner_moves_p5_2025_pos_id_pos_move_index_idx. – Schwern Commented Mar 12 at 1:17
Add a comment  | 

1 Answer 1

Reset to default 4

Writing here because it'd be too big for a comment.

From your second query, there's a Parallel Index Only Scan using positioner_moves_p7_2025_pos_id_pos_move_index_idx that takes 11.8 seconds compared to 0.1 seconds to do the same thing on positioner_moves_p6_2025_pos_id_pos_move_index_idx. They are relatively the same size with actual rows 789479 and 734444, respectively. Moreover, they both hit the disk roughly the same number of times (Heap Fetches: 133143 ~ Heap Fetches: 135083). Why does one scan of an index take 11.8 seconds while the other scan takes 0.1 seconds? Typically, if it's not a resource issue, then it could be an interference issue (i.e., locking) or some significant bloat.

Where it gets hard to analyze is after your VACUUM FULL, you share an EXPLAIN ANALYZE on positioner_moves_p4_2025, which is different from the first two queries (using p6 and p7). I would like to assume that p4 is similarly sized as p6 and p7 but I don't think we can make that assumption. Moreover, the timing on the p4 query is 0.6 seconds, which still falls into your acceptable range -- do you have a slow query to share from after VACUUM FULL that takes over 10 seconds? It would be nice to see how the query on p7 changed, since that's the only query that you shared that came up 10+ seconds.

Since you have shared 4 different partitions (a slow p7, acceptable p4 and p6, and the table description for p5), it's tricky to pinpoint the issue -- sort of a moving target. But that also suggests to me that you are seeing the slowness intermittently, on a different partition every time. With that observation, I would check to see what kind of locking behavior occurs on your tables, as you likely have different concurrent queries waiting on each other.

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744781207a4593351.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信