I need to run inference with vLLM over a large dataset using Ray Data; the code structure is as below:
import ray

ds = ray.data.read_parquet(my_input_path)
ds = ds.map_batches(
    VLLMPredictor,  # callable class that wraps the vLLM engine
    concurrency=ray_concurrency,
    ...
    **resources_kwarg,
)
ds.write_parquet(my_output_path)
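For context, VLLMPredictor is roughly the usual callable-class pattern for map_batches; a minimal sketch, assuming the prompts live in a "prompt" column and with placeholder model and sampling settings:

from vllm import LLM, SamplingParams

class VLLMPredictor:
    def __init__(self):
        # Placeholder model and sampling params; the real values come from my config.
        self.llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
        self.sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch):
        # batch is a dict of column -> values; "prompt" is an assumed column name.
        outputs = self.llm.generate(list(batch["prompt"]), self.sampling_params)
        batch["generated_text"] = [o.outputs[0].text for o in outputs]
        return batch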
What I observe is that on each node, the write process only starts after all inference jobs have finished. Is there a way to achieve streaming writes, e.g. write out every n batches?
The reasons are:
- During inference only the GPUs are busy and the CPUs sit idle; I don't want to waste those CPU resources in the meantime.
- The dataset is large (~100 GB), so I don't want to hold the whole result in memory, which may cause OOM, and I'd like to see inference results as soon as they are generated (a rough sketch of the kind of chunked write I have in mind follows this list).
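Something like the following is the behavior I'm after, written by hand as a rough sketch (assuming I pull results with iter_batches in pandas format; the chunk size, file naming, and "generated_text" column are made up for illustration):

import os
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

CHUNK_BATCHES = 8  # flush to disk every 8 inference batches (arbitrary)

def flush(frames, file_idx, out_dir):
    # Write the buffered batches as one parquet part file.
    table = pa.Table.from_pandas(pd.concat(frames, ignore_index=True))
    pq.write_table(table, os.path.join(out_dir, f"part-{file_idx:05d}.parquet"))

buffer, file_idx = [], 0
for batch in ds.iter_batches(batch_size=1024, batch_format="pandas"):
    buffer.append(batch)
    if len(buffer) >= CHUNK_BATCHES:
        flush(buffer, file_idx, my_output_path)
        buffer, file_idx = [], file_idx + 1
if buffer:  # flush the remainder
    flush(buffer, file_idx, my_output_path)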
Some of the logs during inference:
run_local/0
run_local/0 Processed prompts: 37%|███▋ | 381/1024 [01:46<03:10, 3.37it/s, est. speed input: 2210.75 toks/s, output: 341.30 toks/s]
run_local/0 Running Dataset. Active & requested resources: 0/8 CPU, 1/1 GPU, 483.4MB/40.0MB object store: : 0.00 row [11:25, ? row/s]
run_local/0 - ReadParquet->Map(transform_row): Tasks: 0 [backpressured]; Queued blocks: 195; Resources: 0.0 CPU, 227.4MB object store: 3%|▎ | 94.2k/3.7
The "Processed prompts" bar keeps changing, while the object store bar does not move for a long time, no matter what --object-store-memory I set.
Does Ray support this? How can I achieve it?