
Jino Rohit (@jino_rohit)

Jino Rohit posts on X most often about inference, llm, llamacpp, and tokenization. They currently have XXXXX followers and XXX posts still receiving attention, totaling XXX engagements in the last XX hours.

Engagements: XXX
Mentions: X
Followers: XXXXX
CreatorRank: XXXXXXXXX

Social Influence

Social category influence: finance, technology, brands

Social topic influence: inference (#213), llm (#277), llamacpp, tokenization, dot

Top Social Posts

Top posts by engagements in the last XX hours

"@Abhishekcur yeah noticed all big projects like llamacpp etc have super complex cmake stuff"
X Link 2025-11-12T07:21Z 3401 followers, 1342 engagements

"by far my favorite blog on positional encodings for LLM"
X Link 2025-12-13T11:27Z 3402 followers, 24.8K engagements

"Today I'll be working through - X. ive implemented top-k greedy sampling and random sampling for the inference engine in C++ X. cleaned the tokenizer json parsing to use jsoncpp and refactor a bit X. need to think through how to i want to design the memory layout for the architecture. X. trying to understand safetensors model format to load the weights"
X Link 2025-11-16T10:48Z 3391 followers, 8956 engagements
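
The post above mentions implementing top-k and random sampling for the inference engine. As a generic sketch (not the author's actual InferGPT code), top-k sampling keeps only the k largest logits, renormalizes them with a softmax, and draws from the resulting distribution:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample a token id from `logits`, restricted to the k largest entries.
// Generic top-k sampling sketch; names and structure are illustrative.
std::size_t sample_top_k(const std::vector<float>& logits, std::size_t k,
                         std::mt19937& rng) {
    // Order indices so the k largest logits come first.
    std::vector<std::size_t> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return logits[a] > logits[b];
                      });

    // Softmax over the surviving k logits (subtract max for stability).
    float max_logit = logits[idx[0]];
    std::vector<float> probs(k);
    float sum = 0.0f;
    for (std::size_t i = 0; i < k; ++i) {
        probs[i] = std::exp(logits[idx[i]] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;

    // Draw one candidate proportionally to the renormalized probabilities.
    std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
    return idx[dist(rng)];
}
```

With k = 1 this degenerates to greedy sampling (always the argmax), which is the other mode the post names.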

"I built my first C++ inference engine for LLM - InferGPT. This is a video walkthrough of the engine(with audio) and currently it has - - BPE encoder and decoder. - gpt2 architecture implemented from scratch. - greedy sampling and temperature based sampling for tokens. - generates XX tokens/sec on CPU. I will be extending this with - - operator fusion + multithreading - implement SIMD instructions - add quantization algorithms with performance benchmarks - support GPU operations via CUDA C++"
X Link 2025-11-29T09:14Z 3392 followers, 13K engagements
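
The engine above supports greedy and temperature-based token sampling. A minimal sketch of temperature scaling (generic, not the InferGPT source): divide the logits by a temperature T before the softmax, so T < 1 sharpens the distribution toward the greedy argmax and T > 1 flattens it toward uniform:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Convert logits to probabilities after temperature scaling.
// Subtracting the max logit before exp keeps the computation stable.
std::vector<float> softmax_with_temperature(std::vector<float> logits,
                                            float temperature) {
    float max_logit = logits[0];
    for (float v : logits) max_logit = std::max(max_logit, v);
    float sum = 0.0f;
    for (float& v : logits) {
        v = std::exp((v - max_logit) / temperature);
        sum += v;
    }
    for (float& v : logits) v /= sum;  // normalize to a distribution
    return logits;
}
```

A sampler then draws a token from the returned distribution; as T approaches 0 the largest logit's probability approaches 1.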

"Im building my own C++ inference engine for LLMs that runs on CPU it currently supports - byte pair encoding for tokenization. - gpt2 architecture implemented with strided memory. - kv cache for speedup. - greedy sampling and temperature based sampling for tokens. - NEON instrinsics SIMD for dot product operations and few segments of the multi-head attention layer. - currently generates XX tokens/sec on CPU. I want to extend this to - have flash attention - basically reduce my X pass softmax to X passes and also blocked matrix multiplication - operator fusion + multithreading - add"
X Link 2025-12-04T09:00Z 3398 followers, 12.6K engagements
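
The post above mentions flash attention and reducing the number of softmax passes. The standard technique behind that (the online softmax used by FlashAttention) maintains a running maximum and a rescaled running sum together, so the max and the normalizer come out of a single sweep instead of two separate ones. A generic sketch, not code from the post's engine:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Online softmax: a naive softmax needs one pass for the max, one for the
// exp-sum, and one to normalize. Here the running max and running sum are
// tracked together; whenever a new max appears, the accumulated sum is
// rescaled to the new base, so only one data pass plus the final
// normalization pass is needed.
std::vector<float> online_softmax(const std::vector<float>& x) {
    float running_max = -INFINITY;
    float running_sum = 0.0f;
    for (float v : x) {
        if (v > running_max) {
            // Rescale the sum accumulated under the old max.
            running_sum *= std::exp(running_max - v);
            running_max = v;
        }
        running_sum += std::exp(v - running_max);
    }
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = std::exp(x[i] - running_max) / running_sum;
    return out;
}
```

In a fused attention kernel this same rescaling trick lets the softmax be interleaved with the blocked matrix multiplications the post also mentions.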

"working on quantization for my C++ inference engine. X. implemented absmax quantization and converted weights from fp32 to int8. X. now ill need to extend my tensor class to create modify and view int8 tensors. X. also implemented perplexity scores for the model prompt. gpt2 small has perplexity scores of XX which seems right to me. X. large refactor of the inference engine upcoming"
X Link 2025-12-06T08:50Z 3393 followers, 3816 engagements
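
The post above describes absmax quantization of fp32 weights to int8. A minimal sketch of the technique (generic, not the engine's actual tensor class): scale the weights so the largest-magnitude value maps to 127, round to int8, and divide by the same scale to dequantize:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative quantized-tensor pair: int8 payload plus the scale used
// to map fp32 values into the int8 range.
struct QuantizedTensor {
    std::vector<std::int8_t> data;
    float scale;  // fp32 value ~= int8 value / scale
};

QuantizedTensor absmax_quantize(const std::vector<float>& w) {
    float absmax = 0.0f;
    for (float v : w) absmax = std::max(absmax, std::fabs(v));
    float scale = absmax > 0.0f ? 127.0f / absmax : 1.0f;
    QuantizedTensor q{std::vector<std::int8_t>(w.size()), scale};
    for (std::size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] * scale);
        q.data[i] = static_cast<std::int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}

std::vector<float> dequantize(const QuantizedTensor& q) {
    std::vector<float> out(q.data.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = q.data[i] / q.scale;
    return out;
}
```

The round-trip error per weight is bounded by half a quantization step, i.e. absmax / 254, which is why absmax quantization is a common first step before more elaborate schemes.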

"unity docs has surprising well detailed documentation on arm NEON SIMD"
X Link 2025-12-12T12:03Z 3391 followers, XXX engagements
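
For context on the NEON SIMD work referenced here and in the engine posts above, a dot product is the typical first kernel to vectorize: on AArch64 with NEON, four floats are processed per iteration with fused multiply-add lanes, with a plain scalar loop as the portable fallback. An illustrative sketch, not code from the post's engine:

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Dot product with a NEON fast path and a scalar fallback. On AArch64,
// vld1q_f32 loads four lanes, vfmaq_f32 fuses multiply-add across them,
// and vaddvq_f32 horizontally sums the accumulator.
float dot(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float result = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    result = vaddvq_f32(acc);  // horizontal add of the four lanes
#endif
    // Scalar tail (or the entire loop on non-NEON targets).
    for (; i < n; ++i)
        result += a[i] * b[i];
    return result;
}
```

The same pattern extends to the multi-head attention segments the earlier post mentions, where dot products dominate the per-token cost.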

"@tm23twt @parthshr370 ghost of yotei but wheel is the MC"
X Link 2025-12-14T09:08Z 3393 followers, XXX engagements