I need to detect words that an LLM has no knowledge of, so that I can add a RAG-based definition of each such word to the prompt. For example, the prompt
"What is the best way to achieve slubalisme using the new fabridocium product?"
should flag "slubalisme" and "fabridocium" as unknown words.
What is the best way to achieve this?
What I've tried:
- Tokenizer-based: checking whether the model's tokenizer splits the word into multiple pieces. This is not accurate, because many known words are also split into multiple pieces, so there are a lot of false positives.
- Comparing against a vocabulary list: prone to spelling issues.
- Prompting an LLM: works OK, but is really inefficient.
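
For reference, the vocabulary-comparison attempt can be sketched roughly as follows. This is only a minimal illustration, not my actual code: `KNOWN_WORDS` is a tiny hypothetical word list (a real one would be loaded from a large dictionary), `find_unknown_words` is a made-up helper name, and the fuzzy matching via the standard-library `difflib` with a `cutoff` of 0.8 is an assumed way to tolerate small misspellings.

```python
import difflib
import re

# Hypothetical, deliberately tiny vocabulary for illustration only.
# In practice this would be loaded from a large word list.
KNOWN_WORDS = {
    "what", "is", "the", "best", "way", "to",
    "achieve", "using", "new", "product",
}


def find_unknown_words(text, vocab=KNOWN_WORDS, cutoff=0.8):
    """Return words from `text` that are neither in `vocab` nor close to a vocab entry."""
    candidates = sorted(vocab)  # difflib expects a sequence of strings
    unknown = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in vocab:
            continue
        # Treat near-matches (likely misspellings of known words) as known.
        if difflib.get_close_matches(lower, candidates, n=1, cutoff=cutoff):
            continue
        if lower not in unknown:
            unknown.append(lower)
    return unknown


if __name__ == "__main__":
    prompt = "What is the best way to achieve slubalisme using the new fabridocium product ?"
    print(find_unknown_words(prompt))
```

The fuzzy step reduces false positives from typos, but it scans the whole vocabulary for every out-of-vocabulary word, and a word being in a dictionary does not guarantee that the LLM actually knows it (or the reverse).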