I'm trying to load some data using datasets.load_dataset. It runs correctly on the head node, but hangs on a Slurm compute node. I'm using a conda env with datasets installed.
When I run on the head node with the conda env active, this command works:
python -c "from datasets import load_dataset; d=load_dataset(\"json\", data_files={\"train\": \"/scratch/train/shard1.jsonl\"}); print(d)"
The issue occurs when I submit the job to the cluster. This hangs:
salloc --nodes 1 --qos interactive --time 00:15:00 --constraint gpu --account=my_account --mem=1G --gres=gpu:1
srun --nodes=1 --ntasks-per-node=1 --constraint=gpu --account=my_account --gres=gpu:1 \
bash -c '
source /global/homes/my_username/miniconda3/etc/profile.d/conda.sh &&
conda activate my_env &&
python -c "from datasets import load_dataset; load_dataset(\"json\", data_files={\"train\": \"/scratch/my_username/train/shard1.jsonl\"})"
'
I get similar behavior when I submit with sbatch. I'm using a tiny data file to test this:
{"text": "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT"}
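For reference, a minimal sbatch script along these lines shows the same hang (the flags are mirrored from the salloc/srun invocation above; treat it as a sketch):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:15:00
#SBATCH --constraint=gpu
#SBATCH --gres=gpu:1
#SBATCH --account=my_account
#SBATCH --mem=1G

source /global/homes/my_username/miniconda3/etc/profile.d/conda.sh
conda activate my_env
python -c "from datasets import load_dataset; load_dataset(\"json\", data_files={\"train\": \"/scratch/my_username/train/shard1.jsonl\"})"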
1 Answer
The issue was an incompatibility between my cluster's shared filesystem and the datasets caching behavior: by default, load_dataset writes its cache under ~/.cache/huggingface/datasets, which on this cluster sits on the shared filesystem. Using the cache_dir option to point the cache at the worker node's local tmp storage fixed the hang.
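A minimal sketch of the fix, assuming /tmp is node-local on the compute nodes (the hf_cache subdirectory name is just a placeholder):

import os
from datasets import load_dataset

# Assumption: /tmp is node-local storage on the worker; adjust for your cluster.
cache_dir = os.path.join("/tmp", os.environ.get("USER", "me"), "hf_cache")
os.makedirs(cache_dir, exist_ok=True)

d = load_dataset(
    "json",
    data_files={"train": "/scratch/my_username/train/shard1.jsonl"},
    cache_dir=cache_dir,  # keep the Arrow cache off the shared filesystem
)
print(d)

The same effect is available without code changes by exporting the HF_DATASETS_CACHE environment variable to a node-local path inside the srun or sbatch command.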