I wrote a DataFrame to a Delta table (e.g., demo_table) in overwrite mode, dropping the table beforehand. The write succeeded, and I then ran the OPTIMIZE command on the table. However, OPTIMIZE took nearly an hour to complete. How can I speed this up?
Note: The table is partitioned.
Command: OPTIMIZE schema.demo_table ZORDER BY (custom_id,sales_date)
Note: custom_id is a newly generated column, added when the final DataFrame is created. The final DataFrame has about 3 million records. The table is not wide, and the schema uses only basic data types (integer, string); there are no complex data types.
Observation: When I use an existing column in ZORDER BY instead, the command finishes within 5 minutes.
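To make the comparison in the observation repeatable, here is a minimal Python sketch that builds the OPTIMIZE statement and times it. It assumes an active SparkSession with Delta Lake enabled is passed in as spark. The helper names build_optimize_sql and timed_optimize are hypothetical, not part of any library.

```python
import time


def build_optimize_sql(table, zorder_cols):
    """Build an OPTIMIZE ... ZORDER BY statement (hypothetical helper)."""
    if not zorder_cols:
        raise ValueError("at least one ZORDER BY column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def timed_optimize(spark, table, zorder_cols):
    """Run OPTIMIZE via spark.sql and return (statement, elapsed seconds).

    Assumption: `spark` is a SparkSession configured for Delta Lake.
    """
    statement = build_optimize_sql(table, zorder_cols)
    start = time.perf_counter()
    # OPTIMIZE returns a DataFrame of metrics; collect() retrieves it.
    spark.sql(statement).collect()
    return statement, time.perf_counter() - start


# Usage, assuming a live session (not executed here):
# stmt, secs = timed_optimize(spark, "schema.demo_table", ["custom_id", "sales_date"])
# print(f"{stmt} took {secs:.1f}s")
```

Running the same helper once with the generated custom_id column and once with an existing column should give the two timings side by side.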