[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

Avi Chawla [@_avichawla](/creator/twitter/_avichawla) on x 42.3K followers
Created: 2025-07-27 06:31:07 UTC

That said, KV cache also takes a lot of memory.

Llama3-70B has:
- total layers = XX
- hidden size = 8k
- max output size = 4k

Here:
- Every token takes up ~2.5 MB in KV cache.
- 4k tokens will take up XXXX GB.

More users → more memory.

I'll cover KV optimization soon.

XXXXX engagements

**Related Topics** [token](/topic/token) [$avijo](/topic/$avijo)

[Post Link](https://x.com/_avichawla/status/1949356861809213522)
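The arithmetic behind the per-token figure in the post can be sketched roughly as follows. This is a back-of-the-envelope estimate, not the author's own calculation: the layer count is masked as XX in the source, so `NUM_LAYERS = 80` is an assumption, as are the 2-byte (fp16/bf16) value size and the function names. It also follows the post's simplification of caching one full hidden-size key vector and one full hidden-size value vector per layer. Models that use grouped-query attention would cache less.

```python
def kv_cache_bytes_per_token(num_layers: int, hidden_size: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes one token occupies in the KV cache.

    Each layer stores one key vector and one value vector (hence the
    factor of 2), each with `hidden_size` entries.
    """
    return 2 * num_layers * hidden_size * bytes_per_value


def kv_cache_bytes(num_tokens: int, num_layers: int, hidden_size: int,
                   bytes_per_value: int = 2, num_sequences: int = 1) -> int:
    """Total KV cache size for `num_sequences` concurrent sequences."""
    per_token = kv_cache_bytes_per_token(num_layers, hidden_size, bytes_per_value)
    return num_sequences * num_tokens * per_token


NUM_LAYERS = 80      # ASSUMPTION: masked as "XX" in the post
HIDDEN_SIZE = 8192   # "hidden size = 8k"
MAX_TOKENS = 4096    # "max output size = 4k"

per_token = kv_cache_bytes_per_token(NUM_LAYERS, HIDDEN_SIZE)
total = kv_cache_bytes(MAX_TOKENS, NUM_LAYERS, HIDDEN_SIZE)

print(f"{per_token / 2**20:.2f} MiB per token")
print(f"{total / 2**30:.2f} GiB for {MAX_TOKENS} tokens (one sequence)")
```

Under these assumed inputs the per-token estimate comes out at 2.5 MiB, in line with the post's "~2.5 MB". Raising `num_sequences` scales the total linearly, which is the "more users → more memory" point.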