I'm encountering an issue where a standalone query runs in about 2 minutes and returns 87 records, yet when I embed it in a batch processing script, the script seems to hang in an infinite loop. I've let the script run for over 2 hours with no progress.
The query is part of a loop that's supposed to insert rows in batches. Typically, towards the end of the job there are very few rows left to process, so I wouldn't expect such a drastic slowdown.
For context, my tables have huge volumes of data:
Source table: ~600 million rows
Existing table: ~900 million rows
Here is a simplified version of my script with masked table and column names:
IF OBJECT_ID('Temp.TempArchive', 'U') IS NOT NULL
BEGIN
    DROP TABLE Temp.TempArchive;
    RAISERROR('Existing table Temp.TempArchive dropped.', 0, 1) WITH NOWAIT;
END;

CREATE TABLE Temp.TempArchive
(
    RecordID BIGINT NOT NULL PRIMARY KEY
);
RAISERROR('Table Temp.TempArchive created.', 0, 1) WITH NOWAIT;

CREATE INDEX IDX_TempArchive_RecordID
    ON Temp.TempArchive (RecordID);
RAISERROR('Index IDX_TempArchive_RecordID created on RecordID.', 0, 1) WITH NOWAIT;

DECLARE @BatchSize INT = 100, @RowsAffected INT = 1;
DECLARE @BatchStartTime DATETIME, @BatchEndTime DATETIME, @BatchTime INT;
DECLARE @OverallStartTime DATETIME = GETDATE(), @OverallEndTime DATETIME, @OverallTime INT;
DECLARE @IterationNumber INT = 1;
DECLARE @LastID BIGINT = 1626851290;

WHILE (@RowsAffected > 0)
BEGIN
    SET @BatchStartTime = GETDATE();
    RAISERROR('STARTED', 0, 1) WITH NOWAIT;

    DECLARE @InsertedIDs TABLE (RecordID BIGINT);

    INSERT INTO Temp.TempArchive (RecordID)
    OUTPUT inserted.RecordID INTO @InsertedIDs
    SELECT DISTINCT TOP (@BatchSize) src.RecordID
    FROM SourceTable src WITH (NOLOCK)
    WHERE NOT EXISTS (SELECT 1
                      FROM ExistingTable chk WITH (NOLOCK)
                      WHERE chk.RecordID = src.RecordID)
      AND src.RecordID IS NOT NULL
      AND src.RecordID > @LastID
    ORDER BY src.RecordID
    OPTION (RECOMPILE);

    SET @RowsAffected = @@ROWCOUNT;

    IF EXISTS (SELECT 1 FROM @InsertedIDs)
    BEGIN
        SELECT @LastID = MAX(RecordID) FROM @InsertedIDs;
    END;

    SET @BatchEndTime = GETDATE();
    SET @BatchTime = DATEDIFF(SECOND, @BatchStartTime, @BatchEndTime);

    RAISERROR('LastID %I64d', 0, 1, @LastID) WITH NOWAIT;
    RAISERROR('Inserted %d rows in this batch. Batch time: %d seconds.', 0, 1, @RowsAffected, @BatchTime) WITH NOWAIT;
    RAISERROR('Iteration Number %d', 0, 1, @IterationNumber) WITH NOWAIT;
    SET @IterationNumber = @IterationNumber + 1;
    RAISERROR('====================================================================================', 0, 1) WITH NOWAIT;
END;

SET @OverallEndTime = GETDATE();
SET @OverallTime = DATEDIFF(SECOND, @OverallStartTime, @OverallEndTime);
RAISERROR('Data inserted successfully. Total time: %d seconds.', 0, 1, @OverallTime) WITH NOWAIT;
And here's the standalone version of the query:
SELECT DISTINCT TOP 10000 src.RecordID
FROM SourceTable src WITH (NOLOCK)
WHERE NOT EXISTS (SELECT 1
                  FROM ExistingTable chk WITH (NOLOCK)
                  WHERE chk.RecordID = src.RecordID)
  AND src.RecordID IS NOT NULL
  AND src.RecordID > 1626851290
ORDER BY src.RecordID;
The standalone query executes in about 2 minutes and returns 87 rows, but when executed within the loop, the process never completes.
Any suggestions on why the query might run fine on its own but get stuck in the batch loop?
Could this be related to query plan caching, locking, or something else inherent to the batch process? I've fixed similar issues before by addressing parameter sniffing, but this is ad-hoc SQL, not a stored procedure.
I tried two approaches:
I ran the query by itself. It completed in about 2 minutes and returned 87 rows.
I embedded the same query in a loop that processes rows in batches (using a TOP clause and updating a marker variable). I expected the loop to process those 87 records and then exit once there were no more rows to insert.
What I was expecting was that the batch loop would terminate as soon as there were no more qualifying rows left (i.e. when @@ROWCOUNT is 0). Instead, the script appears to get stuck, running indefinitely (I even let it run for 2 hours and 30 minutes) without finishing, even though the standalone query shows that very few rows remain to be processed at the end.
1 Answer
Honestly, you are probably better off just removing the loop and doing it all at once.
INSERT INTO Temp.TempArchive (RecordID)
SELECT DISTINCT
    src.RecordID
FROM SourceTable src
WHERE NOT EXISTS (SELECT 1
                  FROM ExistingTable chk
                  WHERE chk.RecordID = src.RecordID)
  AND src.RecordID > 1626851290
-- why ORDER BY src.RecordID? It's not necessary for an INSERT and makes little sense
OPTION (RECOMPILE);
What you definitely need on tables this large are indexes on the join column:

SourceTable (RecordID)
ExistingTable (RecordID)
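For example, as plain nonclustered indexes (a minimal sketch; the index names and the dbo schema are my assumptions, not from the original post):

-- Hypothetical names; adjust to your naming convention.
CREATE INDEX IX_SourceTable_RecordID   ON dbo.SourceTable (RecordID);
CREATE INDEX IX_ExistingTable_RecordID ON dbo.ExistingTable (RecordID);

Building these on 600M/900M-row tables takes a while, but they turn the NOT EXISTS check into an index seek instead of a scan.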
If either ExistingTable or SourceTable actually has only a few distinct values of RecordID, then an aggregated indexed view might be more beneficial.
CREATE VIEW dbo.vSourceTable_RecordIDs
WITH SCHEMABINDING
AS
SELECT
    s.RecordID,
    COUNT_BIG(*) AS Count  -- necessary for an indexed view with GROUP BY
FROM dbo.SourceTable s     -- or ExistingTable
GROUP BY
    s.RecordID;

CREATE UNIQUE CLUSTERED INDEX CX ON dbo.vSourceTable_RecordIDs (RecordID);
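One caveat: outside Enterprise edition the optimizer won't match the indexed view automatically, so you'd reference it directly with the NOEXPAND hint. A sketch reusing the view above (DISTINCT is no longer needed, since the view already groups by RecordID):

INSERT INTO Temp.TempArchive (RecordID)
SELECT v.RecordID
FROM dbo.vSourceTable_RecordIDs v WITH (NOEXPAND)  -- force use of the view's index
WHERE NOT EXISTS (SELECT 1
                  FROM ExistingTable chk
                  WHERE chk.RecordID = v.RecordID)
  AND v.RecordID > 1626851290;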
- You can drop the index IDX_TempArchive_RecordID on Temp.TempArchive; it's completely superfluous, since the PRIMARY KEY on RecordID already creates an index.
- Don't use NOLOCK unless you really know what you're doing; it has serious data integrity implications.
Doesn't INSERT INTO ... SELECT insert all the rows? Can rows be inserted into SourceTable while the INSERT is running? In that case, this will loop and loop until there is a moment when no rows are being added to SourceTable, which might seem like forever. – GuidoG Commented Mar 25 at 6:41

The @InsertedIDs table will get pretty large; maybe that's what's taking the time when you fetch the MAX value. I usually use two tables: one for the inserted OUTPUT and one final table. – siggemannen Commented Mar 25 at 7:49
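For reference, a minimal sketch of siggemannen's point: a table variable declared inside a WHILE loop is not re-created on each iteration (the DECLARE does not reset it), so its rows accumulate across batches. Clearing it at the top of each iteration keeps it at @BatchSize rows. This keeps the original script's shape but drops NOLOCK, per the answer above:

DECLARE @BatchSize INT = 100, @RowsAffected INT = 1;
DECLARE @LastID BIGINT = 1626851290;
DECLARE @InsertedIDs TABLE (RecordID BIGINT);

WHILE (@RowsAffected > 0)
BEGIN
    -- Table variables live for the whole batch; rows from earlier
    -- iterations persist, so empty it before reusing it as the OUTPUT target.
    DELETE FROM @InsertedIDs;

    INSERT INTO Temp.TempArchive (RecordID)
    OUTPUT inserted.RecordID INTO @InsertedIDs
    SELECT DISTINCT TOP (@BatchSize) src.RecordID
    FROM SourceTable src
    WHERE NOT EXISTS (SELECT 1
                      FROM ExistingTable chk
                      WHERE chk.RecordID = src.RecordID)
      AND src.RecordID > @LastID
    ORDER BY src.RecordID
    OPTION (RECOMPILE);

    SET @RowsAffected = @@ROWCOUNT;

    IF @RowsAffected > 0
        SELECT @LastID = MAX(RecordID) FROM @InsertedIDs;
END;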