sql - Standalone query returns 87 rows in 2 minutes, but batch script gets stuck in infinite loop - Stack Overflow

I'm encountering an issue where a standalone query runs in about 2 minutes and returns 87 records, yet when I embed it in a batch processing script, the script seems to hang in an infinite loop. I've let the script run for over 2 hours with no progress.

The query is part of a loop that's supposed to insert rows in batches. Typically, towards the end of the job there are very few rows left to process, so I wouldn't expect such a drastic slowdown.

For context, my tables have huge volumes of data:

Source table: ~600 million rows

Existing table: ~900 million rows

Here is a simplified version of my script with masked table and column names:

IF OBJECT_ID('Temp.TempArchive', 'U') IS NOT NULL
BEGIN
    DROP TABLE Temp.TempArchive;
    RAISERROR('Existing table Temp.TempArchive dropped.', 0, 1) WITH NOWAIT;
END;

CREATE TABLE Temp.TempArchive 
(
    RecordID BIGINT NOT NULL PRIMARY KEY
);
RAISERROR('Table Temp.TempArchive created.', 0, 1) WITH NOWAIT;

CREATE INDEX IDX_TempArchive_RecordID 
ON Temp.TempArchive (RecordID);

RAISERROR('Index IDX_TempArchive_RecordID created on RecordID.', 0, 1) WITH NOWAIT;

DECLARE @BatchSize INT = 100, @RowsAffected INT = 1;
DECLARE @BatchStartTime DATETIME, @BatchEndTime DATETIME, @BatchTime INT;
DECLARE @OverallStartTime DATETIME = GETDATE(), @OverallEndTime DATETIME, @OverallTime INT;
DECLARE @IterationNumber INT = 1;
DECLARE @LastID BIGINT = 1626851290;

WHILE (@RowsAffected > 0)
BEGIN
    SET @BatchStartTime = GETDATE();
    RAISERROR('STARTED', 0, 1) WITH NOWAIT;

    DECLARE @InsertedIDs TABLE (RecordID BIGINT);

    INSERT INTO Temp.TempArchive (RecordID)
    OUTPUT inserted.RecordID INTO @InsertedIDs
        SELECT DISTINCT TOP (@BatchSize) src.RecordID
        FROM SourceTable src WITH (NOLOCK)
        WHERE NOT EXISTS (SELECT 1 
                          FROM ExistingTable chk WITH (NOLOCK)
                          WHERE chk.RecordID = src.RecordID)
          AND src.RecordID IS NOT NULL
          AND src.RecordID > @LastID
        ORDER BY src.RecordID
        OPTION (RECOMPILE);

    SET @RowsAffected = @@ROWCOUNT;

    IF EXISTS (SELECT 1 FROM @InsertedIDs)
    BEGIN
        SELECT @LastID = MAX(RecordID) FROM @InsertedIDs;
    END;

    SET @BatchEndTime = GETDATE();
    SET @BatchTime = DATEDIFF(SECOND, @BatchStartTime, @BatchEndTime);

    RAISERROR('LastID %I64d', 0, 1, @LastID) WITH NOWAIT;
    RAISERROR('Inserted %d rows in this batch. Batch time: %d seconds.', 0, 1, @RowsAffected, @BatchTime) WITH NOWAIT;
    RAISERROR('Iteration Number %d', 0, 1, @IterationNumber) WITH NOWAIT;
    SET @IterationNumber = @IterationNumber + 1;
    RAISERROR('====================================================================================', 0, 1) WITH NOWAIT;
END

SET @OverallEndTime = GETDATE();
SET @OverallTime = DATEDIFF(SECOND, @OverallStartTime, @OverallEndTime);

RAISERROR('Data inserted successfully. Total time: %d seconds.', 0, 1, @OverallTime) WITH NOWAIT;

And here's the standalone version of the query:

SELECT DISTINCT TOP 10000 src.RecordID
FROM SourceTable src WITH (NOLOCK)
WHERE NOT EXISTS (SELECT 1 
                  FROM ExistingTable chk WITH (NOLOCK)
                  WHERE chk.RecordID = src.RecordID)
  AND src.RecordID IS NOT NULL
  AND src.RecordID > 1626851290
ORDER BY src.RecordID;

The standalone query executes in about 2 minutes and returns 87 rows, but when executed within the loop, the process never completes.

Any suggestions on why the query might run fine on its own but get stuck in the batch loop?

Could this be related to query plan caching, locking, or something else inherent to the batch process? I've solved similar issues before that were caused by parameter sniffing, but this is not a stored procedure (and the query already uses OPTION (RECOMPILE)).

I tried two approaches:

  • I ran the query by itself. It completed in about 2 minutes and returned 87 rows.

  • I embedded the same query in a loop that processes rows in batches (using a TOP clause and updating a marker variable). I expected the loop to process those 87 records and then exit once there were no more rows to insert.

What I was expecting was that the batch loop would terminate as soon as there were no more qualifying rows left (i.e. when @@ROWCOUNT is 0). Instead, the script appears to get stuck, running indefinitely (I let it run for 2 hours and 30 minutes) without finishing, even though the standalone query shows that very few rows remain to be processed at the end.

Asked Mar 25 at 4:53 by minato namikaze; edited Mar 25 at 5:13 by marc_s.
  • Are there any records matching the following filter: src.RecordID > 1626851290? – SouravA Commented Mar 25 at 5:56
  • Why the loop? Does the INSERT INTO ... SELECT not insert all rows at once? Can rows be inserted into SourceTable while the INSERT INTO is running? In that case, this will loop and loop until there is a moment when no new rows are being added to SourceTable, which might seem like forever – GuidoG Commented Mar 25 at 6:41
  • What is the print output when it gets stuck? – Martin Smith Commented Mar 25 at 6:56
  • Your @InsertedIDs table will get pretty large; maybe that's what's taking the time when you fetch the max value. I usually use two tables, one for the inserted output and one final table – siggemannen Commented Mar 25 at 7:49
  • Can you share the query plans for this script? pastetheplan – Charlieface Commented Mar 25 at 13:36

1 Answer

Honestly, you are probably better off removing the loop and just doing it all at once.

INSERT INTO Temp.TempArchive (RecordID)
SELECT DISTINCT
  src.RecordID
FROM SourceTable src
WHERE NOT EXISTS (SELECT 1 
    FROM ExistingTable chk
    WHERE chk.RecordID = src.RecordID
)
  AND src.RecordID > 1626851290
-- why ORDER BY src.RecordID ?? It's not necessary and makes little sense
OPTION (RECOMPILE);

What you definitely need on tables this large is an index on RecordID in each of them:

SourceTable (RecordID)
ExistingTable (RecordID)
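
For example, with hypothetical index names (a sketch, assuming RecordID is not already the leading key of an existing index on either table):

CREATE INDEX IX_SourceTable_RecordID ON dbo.SourceTable (RecordID);
CREATE INDEX IX_ExistingTable_RecordID ON dbo.ExistingTable (RecordID);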

If either ExistingTable or SourceTable actually has only a few distinct values of RecordID then an aggregated indexed view might be more beneficial.

CREATE VIEW dbo.vSourceTable_RecordIDs
WITH SCHEMABINDING
AS
SELECT
  s.RecordID,
  COUNT_BIG(*) AS Count   -- necessary for the indexed view
FROM dbo.SourceTable s   -- or ExistingTable
GROUP BY
  s.RecordID;
CREATE UNIQUE CLUSTERED INDEX CX ON dbo.vSourceTable_RecordIDs (RecordID);
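
On editions that don't match indexed views automatically (historically this requires Enterprise edition), you would reference the view directly with a NOEXPAND hint. A minimal sketch reusing the hypothetical view above:

SELECT v.RecordID
FROM dbo.vSourceTable_RecordIDs v WITH (NOEXPAND)  -- force use of the view's clustered index
WHERE NOT EXISTS (SELECT 1
                  FROM ExistingTable chk
                  WHERE chk.RecordID = v.RecordID)
  AND v.RecordID > 1626851290;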

  • You can drop the index IDX_TempArchive_RecordID on Temp.TempArchive; it's completely superfluous, since the PRIMARY KEY on RecordID already creates an index on that column (see the one-liner after this list).
  • Don't use NOLOCK unless you really know what you're doing; it has serious data-integrity implications (reads can return uncommitted, duplicated, or skipped rows).
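
For completeness, dropping the redundant index is a one-liner:

DROP INDEX IDX_TempArchive_RecordID ON Temp.TempArchive;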
