I am working on a Llama fine-tuning task. When I train on a single GPU, the program runs fine:
import os
import torch
from transformers import AutoModelForCausalLM

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "../models/llama3_8b/"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
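(For context, compute_dtype and bnb_config are defined earlier in my script as a standard 4-bit QLoRA setup; a minimal sketch, with the exact values assumed:)

from transformers import BitsAndBytesConfig

# Assumed 4-bit NF4 quantization config; adjust to match the real script
compute_dtype = torch.bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)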
But when I tried to use multiple GPUs for fine-tuning, an error occurred. The modified code is as follows:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # device_map=device,
    device_map="auto",  # Modification
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
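My understanding is that device_map="auto" splits a single copy of the model across all visible GPUs (accelerate's naive model parallelism), while a DDP run expects each process to hold a full copy of the model on its own GPU, so mixing the two could be where the cuda:2 vs cuda:0 mismatch comes from. A minimal sketch of per-process placement for DDP, assuming the launcher sets LOCAL_RANK:

import os
from transformers import AutoModelForCausalLM

# Sketch: put the entire model on this process's GPU instead of sharding it.
# LOCAL_RANK is set by torchrun; fall back to 0 for a single-process run.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": local_rank},  # everything on cuda:<local_rank>
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)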
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
training_arguments = TrainingArguments(
    ...
    local_rank=int(os.getenv("LOCAL_RANK", -1)),  # Modification
    ddp_find_unused_parameters=False,  # Modification
)
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    # eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    packing=False,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)
trainer.train()
The error is as follows:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
Launch command:

CUDA_VISIBLE_DEVICES=3,4 python llama3.py
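Since local_rank and ddp_find_unused_parameters only take effect under a distributed launcher, I suspect the script may need to be started with torchrun rather than plain python so that one process is spawned per GPU; a sketch of what I mean (two processes, one per visible GPU):

# torchrun sets LOCAL_RANK/RANK/WORLD_SIZE for each spawned process
CUDA_VISIBLE_DEVICES=3,4 torchrun --nproc_per_node=2 llama3.py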
Does anyone know how to solve it?