I have an external table defined within my Databricks database that's pointing to a particular folder within an S3 bucket. There are multiple csv files in the folder, the contents of which all appear as rows in my table. This all works perfectly as expected.
What I want to know is, for a row in my external table, is there any way to extract information about the particular csv file that it came from? Ideally, I'd like to be able to extract the write date of the file so I can distinguish which rows are the most recently written.
Asked Mar 25 at 10:10 by Chris Hunt
- SELECT *, INPUT_FILE_NAME() AS source_file FROM your_external_table; – Dileep Raj Narayan Thumula Commented Mar 25 at 10:30
- can you try the above? – Dileep Raj Narayan Thumula Commented Mar 25 at 10:30
- @DileepRajNarayanThumula no, it didn't work - but it did suggest something that did: select *, _metadata.file_name from my_table. Thanks! I'll add an answer below. – Chris Hunt Commented Mar 25 at 10:51
2 Answers
Thanks to a hint from @DileepRajNarayanThumula I have an answer to my own question:
There's a hidden column in external tables called _metadata. This is a STRUCT containing several fields, including the file's modification time. So to get the modification time for each row of my table, I can write something like this:
SELECT *,
_metadata.file_modification_time
FROM my_external_table
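To go one step further and pick out only the rows from the most recently written file, something like the following might work. This is a rough sketch, not something tested against the original table: it assumes the table exposes _metadata as described above, and the names tagged, source_file and file_modified_at are just illustrative aliases.

```sql
-- Sketch: tag every row with its source file and that file's modification time.
-- my_external_table is the placeholder name used above; the aliases are hypothetical.
WITH tagged AS (
  SELECT *,
         _metadata.file_name              AS source_file,
         _metadata.file_modification_time AS file_modified_at
  FROM my_external_table
)
-- Keep only the rows that came from the most recently modified file.
SELECT *
FROM tagged
WHERE file_modified_at = (SELECT MAX(file_modified_at) FROM tagged);
```

If several files share the same latest modification time, rows from all of them are returned.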
You can try the code below:
SELECT *, INPUT_FILE_NAME() AS source_file FROM your_external_table;
This will include a source_file column that contains the complete S3 file path for each row.