python 3.x - Unable to figure out the hardware requirement (cloud or on-prem) for open-source inference for multiple users

I am trying to budget for setting up an LLM-based RAG application which will serve a dynamic number of users (anywhere from 100 to 2000).

I am able to figure out the GPU memory requirement to host a given LLM [1]; for example, LLaMA 70B at half precision will require about 168 GB. But I am unable to figure out how to calculate the token speed for a single user and then for multiple concurrent users, and how to pick appropriate hardware for that.

How should I approach this problem?

Thanks for taking the time to read this.

[1]: https://www.substratus.ai/blog/calculating-gpu-memory-for-llm
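For reference, here is a minimal sketch of the weight-memory estimate mentioned above, assuming the rule of thumb from the linked post (parameter count × bytes per parameter, plus roughly 20% overhead). The function name and overhead factor are illustrative; real usage also depends on the KV cache, context length, and batch size.

```python
# Rough weight-memory estimate (assumption: ~20% overhead on top of the raw
# weights, as in the linked post; serving overhead varies by engine/context).
def estimate_gpu_memory_gb(num_params_billion: float,
                           bytes_per_param: float = 2.0,  # 2 bytes = FP16/BF16
                           overhead: float = 1.2) -> float:
    """Estimate GPU memory (GB) needed just to hold the model weights."""
    return num_params_billion * bytes_per_param * overhead

print(estimate_gpu_memory_gb(70))  # ~168 GB for a 70B model at half precision
```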


asked Nov 19, 2024 at 17:50 by Bing

1 Answer


From experience, it is not so simple. You need to take into account:

  1. the engine used for inference (TGI? plain Transformers? llama.cpp?)
  2. the card type (it really matters whether it is an H100, L40S, or A100)
  3. the batch size
  4. whether it is a chatbot-like experience or offline batch processing
  5. the maximum context length you want to process

On the basis of this you need to run some benchmarks and generalize from them.
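As a starting point for such a benchmark, here is a minimal sketch that measures aggregate tokens per second and average latency at different concurrency levels. It assumes an OpenAI-compatible completions endpoint (as exposed by vLLM or TGI) is already running at the URL below and that the server reports token usage; the URL, model name, prompt, and concurrency levels are placeholders to replace with your own RAG-sized inputs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8000/v1/completions"    # assumed OpenAI-compatible endpoint
MODEL = "meta-llama/Llama-3.1-70B-Instruct"          # assumed model name, replace with yours
PROMPT = "Summarize the following document: ..."     # use a RAG-sized prompt/context
MAX_TOKENS = 256

def one_request(_):
    """Send one completion request; return (generated tokens, latency in seconds)."""
    t0 = time.time()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": MAX_TOKENS},
        timeout=300,
    )
    resp.raise_for_status()
    tokens = resp.json()["usage"]["completion_tokens"]  # assumes the server reports usage
    return tokens, time.time() - t0

def benchmark(concurrency, requests_per_worker=4):
    """Fire `concurrency` parallel request streams and report aggregate throughput."""
    total = concurrency * requests_per_worker
    start = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total)))
    elapsed = time.time() - start
    tokens = sum(t for t, _ in results)
    avg_latency = sum(l for _, l in results) / total
    print(f"{concurrency:4d} concurrent: {tokens / elapsed:8.1f} tok/s aggregate, "
          f"{avg_latency:6.2f} s avg latency per request")

# Sweep the concurrency levels you care about; 100-2000 users usually means
# several replicas behind a load balancer, so benchmark one replica and scale.
for n in (1, 8, 32, 64):
    benchmark(n)
```

Sweeping concurrency this way shows where aggregate throughput stops scaling and per-request latency exceeds your target; from that point you can estimate how many replicas, and which GPUs, you need for your expected 100 to 2000 users.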
