I'm trying to fine-tune a model using SFTTrainer from trl.
This is what my SFTConfig arguments look like:
from trl import SFTConfig

training_arguments = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    dataset_text_field="instruction",
    max_seq_length=None,
    packing=False,
    gradient_checkpointing=False,
)
And this is my SFTTrainer block:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_arguments,
)
The error comes from the internal function SFTTrainer._prepare_model_for_kbit_training:

"""Prepares a quantized model for kbit training."""
prepare_model_kwargs = {
    "use_gradient_checkpointing": args.gradient_checkpointing,
    "gradient_checkpointing_kwargs": args.gradient_checkpointing_kwargs or {},
}
I tried passing gradient_checkpointing as False and gradient_checkpointing_kwargs as an empty dictionary, but no luck.
How can I avoid this error?
Comment (rehaqds): With gradient_checkpointing_kwargs={'use_reentrant':False}?
1 Answer

Using gradient_checkpointing_kwargs={'use_reentrant':False} instead of gradient_checkpointing=False might work.
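For reference, here is a minimal sketch of how that suggestion could be applied to the SFTConfig from the question. This is not a confirmed fix: only the arguments relevant to the change are shown, the output_dir value is a placeholder, and the remaining arguments from the question would be kept as they were.

from trl import SFTConfig

# Minimal sketch (assumed, not verified): keep the other arguments from the
# question unchanged; only the checkpointing-related setting differs.
training_arguments = SFTConfig(
    output_dir="./results",  # placeholder path
    dataset_text_field="instruction",
    max_seq_length=None,
    packing=False,
    report_to="tensorboard",
    # Pass the checkpointing kwargs instead of gradient_checkpointing=False:
    gradient_checkpointing_kwargs={"use_reentrant": False},
)

Setting use_reentrant=False selects PyTorch's non-reentrant checkpointing implementation, which generally behaves better than the reentrant one when most parameters are frozen, as they are with a PEFT/LoRA setup.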