In HuggingFace's TrainingArguments and SFTConfig (which inherits from TrainingArguments), there are two arguments that can be set when initializing SFTConfig():
group_by_length: Whether or not to group together samples of roughly the same length in the training dataset (to minimize padding applied and be more efficient). Only useful if applying dynamic padding.
packing: Whether to pack multiple sequences into a fixed-length format. Uses max_length to define sequence length.
config = SFTConfig(...,
                   group_by_length=True,
                   packing=True, ...)
Both arguments aim to reduce the amount of padding that has to be added. However, when packing=True, it seems pointless to also set group_by_length=True, since packed sequences already have a fixed length. Should we use both to improve training performance, or do they counteract each other?
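To make the question concrete, here is a rough sketch of how much padding each strategy leaves on a toy dataset. It is not the actual Trainer or TRL implementation: the function names and the sample lengths are made up for illustration, grouping is approximated by a full sort (the real sampler is assumed to keep some randomness), and packing is approximated by concatenating everything and cutting it into blocks of max_length.

```python
def padding_in_order(lengths, batch_size):
    """Padding tokens when each batch is padded to its longest sample
    (dynamic padding), with batches taken in the given order."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


def padding_grouped(lengths, batch_size):
    """Simplified stand-in for group_by_length: sort by length first,
    so samples of similar length end up in the same batch."""
    return padding_in_order(sorted(lengths), batch_size)


def padding_packed(lengths, max_length):
    """Simplified stand-in for packing: concatenate all samples and cut
    the token stream into blocks of max_length; only the final block
    can need padding."""
    total_tokens = sum(lengths)
    n_blocks = -(-total_tokens // max_length)  # ceiling division
    return n_blocks * max_length - total_tokens


# Hypothetical token counts per sample, alternating short and long.
lengths = [5, 100, 7, 90, 6, 95, 8, 85]

print("no grouping:", padding_in_order(lengths, batch_size=2))
print("grouped:    ", padding_grouped(lengths, batch_size=2))
print("packed:     ", padding_packed(lengths, max_length=100))
```

In this toy setup, once the samples are packed every block already has the same length, so there are no length differences left for group_by_length to exploit. Whether the real implementations behave the same way is exactly what I am unsure about.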