I am working on a Llama fine-tuning task. When I train on a single GPU, the program runs fine:
import os
import torch
from transformers import AutoModelForCausalLM

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "../models/llama3_8b/"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
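(For context, compute_dtype and bnb_config are defined earlier in my script as a standard 4-bit QLoRA setup; a minimal sketch, with the exact values assumed:)

from transformers import BitsAndBytesConfig

# Assumed 4-bit NF4 quantization config; adjust to match the real script
compute_dtype = torch.bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)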
But when I tried to use multiple GPUs for fine-tuning, an error occurred. The modified code is as follows:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # device_map=device,
    device_map="auto",  # Modification
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
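My understanding is that device_map="auto" splits a single copy of the model across all visible GPUs (accelerate's naive model parallelism), while a DDP run expects each process to hold a full copy of the model on its own GPU, so mixing the two could be where the cuda:2 vs cuda:0 mismatch comes from. A minimal sketch of per-process placement for DDP, assuming the launcher sets LOCAL_RANK:

import os
from transformers import AutoModelForCausalLM

# Sketch: put the entire model on this process's GPU instead of sharding it.
# LOCAL_RANK is set by torchrun; fall back to 0 for a single-process run.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": local_rank},  # everything on cuda:<local_rank>
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)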
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
training_arguments = TrainingArguments(
    ...
    local_rank=int(os.getenv("LOCAL_RANK", -1)),  # Modification
    ddp_find_unused_parameters=False,  # Modification
)
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    # eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    packing=False,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)
trainer.train()
The error is as follows:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
Launch command:

CUDA_VISIBLE_DEVICES=3,4 python llama3.py
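Since local_rank and ddp_find_unused_parameters only take effect under a distributed launcher, I suspect the script may need to be started with torchrun rather than plain python so that one process is spawned per GPU; a sketch of what I mean (two processes, one per visible GPU):

# torchrun sets LOCAL_RANK/RANK/WORLD_SIZE for each spawned process
CUDA_VISIBLE_DEVICES=3,4 torchrun --nproc_per_node=2 llama3.py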
Does anyone know how to solve it?