[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

[Vision Token](/topic/vision-token)

### Top Social Posts

*Showing only X posts for non-authenticated requests. Use your API key in requests for full results.*

"the crazy part isnt the ocr itself. its that deepseek-ocr basically gives llms a visual memory system. people remember pages as pictures not paragraphs. this is the first real step toward machines doing the same. feels like early agi memory architecture tbh"  
[X Link](https://x.com/diptalksdeep/status/1980531173911920744) [@diptalksdeep](/creator/x/diptalksdeep) 2025-10-21T07:06Z X followers, XX engagements


"1 vision token = XX text tokens is not something that can be concluded based on their experiments. XXX% text reconstruction does not imply that the vision tokens encode all textual information since the language decoder plays a big role. Need to remove the language prior to get a more accurate compression ratio. E.g. what if the image contains text in non-readable order"  
[X Link](https://x.com/LiJunnan0409/status/1980446374144667774) [@LiJunnan0409](/creator/x/LiJunnan0409) 2025-10-21T01:29Z 2772 followers, XXX engagements


"DeepSeek just dropped an insane new model based on a revolutionary idea for token management. DeepSeek-OCR flips the entire vision token problem on its head. Instead of struggling to manage thousands of tokens per page they compress 1024x1024 images down to 64-400 vision tokens while keeping XX% of information. Simple PDFs XX tokens. Complex documents XXX tokens max. For context: previous models could burn 6000+ tokens for what DeepSeek's new model can do with XXX. The strategic play here isn't just efficiency - it's that processing text as compressed images can actually use LESS compute than"  
[X Link](https://x.com/ariangibson/status/1980401419665453228) [@ariangibson](/creator/x/ariangibson) 2025-10-20T22:31Z XX followers, XX engagements
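As a back-of-the-envelope check on the ratios this post claims, the unmasked figures (6000+ text tokens per page versus 64-400 vision tokens) imply the following range; the masked percentages are left out, so this is an illustrative sketch only, not a benchmark:

```python
# Rough compression arithmetic implied by the post above.
# Figures taken directly from the quoted post; masked values (XX, XXX) are omitted.
text_tokens = 6000           # tokens a previous model might burn per page
vision_tokens_simple = 64    # claimed lower bound (simple PDFs)
vision_tokens_complex = 400  # claimed upper bound (complex documents)

best_case = text_tokens / vision_tokens_simple    # simple pages
worst_case = text_tokens / vision_tokens_complex  # complex pages
print(f"claimed token reduction: {worst_case:.0f}x to {best_case:.1f}x")
```

So even the worst case in the post amounts to roughly a 15x reduction in tokens per page, which is the basis for the "less compute" argument it makes.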


"sooo it's essentially the embodiment of "a picture is worth a thousand words" A single vision token can encode the equivalent of multiple text tokens since text tokens have fixed representations while v tokens do not. I'm not sure but: Is using vision tokens just a precedent cause existing implementations are already in place to use those as a sort of "soft token" i.e alot of info bunched into one place Can we not use a more novel approach or is that irrelevant I am a vision noob"  
[X Link](https://x.com/muzzdotdev/status/1980327081956503741) [@muzzdotdev](/creator/x/muzzdotdev) 2025-10-20T17:35Z XXX followers, 1721 engagements
