python - loading a dataset with datasets.load_dataset is hanging - Stack Overflow

I'm trying to load some data using datasets.load_dataset. It runs correctly on a head node, but hangs on a Slurm compute node. I'm using a conda env with datasets installed.

When I run on head node with the conda env active, this command works:

python -c "from datasets import load_dataset; d=load_dataset(\"json\", data_files={\"train\": \"/scratch/train/shard1.jsonl\"}); print(d)"

The issue occurs when I submit the job to the cluster. This hangs:

salloc --nodes 1 --qos interactive --time 00:15:00 --constraint gpu --account=my_account  --mem=1G --gres=gpu:1

srun --nodes=1 --ntasks-per-node=1 --constraint=gpu --account=my_account --gres=gpu:1 \
    bash -c '
    source /global/homes/my_username/miniconda3/etc/profile.d/conda.sh &&
    conda activate my_env &&
    python -c "from datasets import load_dataset; load_dataset(\"json\", data_files={\"train\": \"/scratch/my_username/train/shard1.jsonl\"})"
    '

I get similar behavior when I submit with sbatch. I'm using a tiny data file to test this:

{"text": "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"}

asked Mar 10 at 20:01 by ate50eggs

1 Answer

The issue was an incompatibility between my cluster's shared filesystem and the datasets library's caching behavior. Using the cache_dir argument of load_dataset to point the cache at the worker node's local tmp directory fixed the hang.
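A minimal sketch of that fix, assuming the compute node exposes local scratch via SLURM_TMPDIR (or plain /tmp); the exact local-disk path is cluster-specific. The cache can be redirected either through the HF_DATASETS_CACHE environment variable (set before importing datasets) or through load_dataset's cache_dir parameter:

```python
import os
import tempfile

# Pick a node-local directory for the cache instead of the shared
# filesystem, whose file-locking semantics can make load_dataset hang.
# SLURM_TMPDIR is an assumption; fall back to the system tmp directory.
local_root = os.environ.get("SLURM_TMPDIR", tempfile.gettempdir())
cache_dir = os.path.join(local_root, "hf_datasets_cache")
os.makedirs(cache_dir, exist_ok=True)

# Option 1: set the environment variable before importing datasets.
os.environ["HF_DATASETS_CACHE"] = cache_dir

# Option 2: pass cache_dir directly to load_dataset:
# from datasets import load_dataset
# d = load_dataset(
#     "json",
#     data_files={"train": "/scratch/my_username/train/shard1.jsonl"},
#     cache_dir=cache_dir,
# )

print(cache_dir)
```

Note that a per-job local cache is discarded when the job ends, so each job re-processes the data; for a tiny JSONL shard like the one above that cost is negligible.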
