python - SFTTrainer error: prepare_model_for_kbit_training() got an unexpected keyword argument 'gradient_checkpointing'


I'm trying to fine-tune a model using SFTTrainer from trl.

This is how my SFTConfig arguments look:

from trl import SFTConfig
training_arguments = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    dataset_text_field="instruction",
    max_seq_length=None,
    packing=False,
    gradient_checkpointing=False,
)

And this is my SFTTrainer block:

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_arguments,
)

The error comes from the internal method SFTTrainer._prepare_model_for_kbit_training:

 """Prepares a quantized model for kbit training."""
    330 prepare_model_kwargs = {
    331     "use_gradient_checkpointing": args.gradient_checkpointing,
    332     "gradient_checkpointing_kwargs": args.gradient_checkpointing_kwargs or {},
    333 }
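For context, here is a rough sketch (not my code) of the call that ends up failing, assuming the trl snippet above simply forwards these values to peft's prepare_model_for_kbit_training; if the installed peft version does not accept one of these keyword arguments, a TypeError like the one in the title is raised.

from peft import prepare_model_for_kbit_training

# Sketch of what the trl internals appear to pass on to peft (see snippet above).
prepare_model_kwargs = {
    "use_gradient_checkpointing": training_arguments.gradient_checkpointing,
    "gradient_checkpointing_kwargs": training_arguments.gradient_checkpointing_kwargs or {},
}
model = prepare_model_for_kbit_training(model, **prepare_model_kwargs)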

I tried passing gradient_checkpointing as False and gradient_checkpointing_kwargs as an empty dictionary, but no luck.

How can I avoid this error?


  • With gradient_checkpointing_kwargs={'use_reentrant': False}? – rehaqds, Mar 22 at 22:46

1 Answer


Using gradient_checkpointing_kwargs={'use_reentrant':False} instead of gradient_checkpointing=False might work.
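A minimal sketch of that change, assuming the other SFTConfig arguments from the question stay the same (the output_dir value below is just a placeholder):

from trl import SFTConfig

# Sketch: drop gradient_checkpointing=False and pass gradient_checkpointing_kwargs
# instead, so the non-reentrant checkpointing path is used.
training_arguments = SFTConfig(
    output_dir="./results",  # placeholder; keep your own value
    report_to="tensorboard",
    dataset_text_field="instruction",
    max_seq_length=None,
    packing=False,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)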
