I am struggling with implementing pagination for a REST API in a Microsoft Fabric Data Factory pipeline. I have read the official Microsoft documentation and have tried various things but to no avail. In my scenario, I am using a Copy Data activity which connects to the API endpoint and outputs the resultant JSON file to a lakehouse folder.
The API endpoint URL is in the following format (example):
https://api.somecompany/data?start={startdatetime}&end={enddatetime}&limit=200&page={page}
The REST API returns the following values in the response body:
{
"total_count": 325,
"page": 1,
"total_pages": 2,
"result": [ array of result objects ]
}
The Source connection details are as follows (I have not specified a value for the page parameter in the query string):
Relative URL: https://api.somecompany/data?start=2025-01-29T00:00:00Z&end=2025-01-29T00:00:00Z&limit=200&page=
Pagination rules:
My assumption was that Data Factory would dynamically increase the value of the page parameter by 1 until no more data was returned. I have tried variations where the page parameter is wrapped in curly braces, like so: {page}, but no matter what I try, I only ever get the first page of results.
I'd like to avoid having to wrap the activity in a loop, as it is already in a loop (I need to fetch data for a range of days) and nested loops are not supported in Data Factory. I'm aware that I could use a child pipeline for the inner loop, but I want to try to see if the pagination will work first before I go down that route.
asked Jan 29 at 17:52 by Jazza

Comments:
- Are you getting any error? – Rakesh Govindula, Jan 30 at 2:07
- @RakeshGovindula no, the Copy Activity succeeds but I only ever get the first page of results. – Jazza, Jan 30 at 9:44
1 Answer
You can use the following configuration in the copy activity's REST source to achieve this.

First, put a placeholder parameter wrapped in {} where the page number goes in the dataset Relative URL, like so:
?page={Offset}&size=20
In the copy activity source, set the pagination rules with a Range and an End condition as follows:

EndCondition: $.<your_object> – NonExist
QueryParameter: {Offset} – Range – <your_page_start_value>, End, 1
Configure your sink and debug the pipeline. All pages of data from the REST API will now be copied to the target location.
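To reason about what these rules do, here is a minimal Python sketch of the same loop-until-empty behavior that the Range rule and End condition express. The `fetch_page` callable and the two-page data set are hypothetical stand-ins, not the Data Factory runtime:

```python
# Sketch of what the copy activity's pagination rules do under the hood:
# the {Offset} query parameter iterates from a start value in steps of 1,
# and the run stops when the EndCondition path (here "result") is absent/empty.

def paginate(fetch_page, start=1, step=1):
    results = []
    offset = start
    while True:
        body = fetch_page(offset)      # e.g. GET ...?page={Offset}&size=20
        if not body.get("result"):     # EndCondition: $.result – NonExist
            break
        results.extend(body["result"])
        offset += step                 # Range: <start>, End, 1
    return results

# Hypothetical two-page API returning 325 items in pages of at most 200
pages = {1: list(range(200)), 2: list(range(125))}
data = paginate(lambda p: {"result": pages.get(p, [])})
print(len(data))  # → 325
```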