# ![@jxmnop Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::783098774130401280.png) @jxmnop jack morris

jack morris posts on X about ai, open ai, and $googl the most. They currently have [------] followers and [---] posts still getting attention, totaling [-------] engagements in the last [--] hours.

### Engagements: [-------]
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::783098774130401280/c:line/m:interactions.svg)

- [--] Week [-------] +97%
- [--] Month [---------] +75%
- [--] Months [----------] -22%
- [--] Year [----------] -13%

### Mentions: [--]
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::783098774130401280/c:line/m:posts_active.svg)

- [--] Month [--] +20%
- [--] Months [---] -45%
- [--] Year [---] +41%

### Followers: [------]
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::783098774130401280/c:line/m:followers.svg)

- [--] Week [------] +0.12%
- [--] Month [------] +1.80%
- [--] Months [------] +12%
- [--] Year [------] +92%

### CreatorRank: [------]
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::783098774130401280/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands) #5069, [stocks](/list/stocks), [social networks](/list/social-networks), [finance](/list/finance), [countries](/list/countries), [automotive brands](/list/automotive-brands), [travel destinations](/list/travel-destinations), [fashion brands](/list/fashion-brands), [currencies](/list/currencies), [gaming](/list/gaming)

**Social topic influence**
[ai](/topic/ai) #1072, [open ai](/topic/open-ai) #62, [$googl](/topic/$googl), [in the](/topic/in-the), [data](/topic/data), [llms](/topic/llms), [if you](/topic/if-you), [meta](/topic/meta), [agi](/topic/agi), [llm](/topic/llm)

**Top accounts mentioned or mentioned by**
@srushnlp @lateinteraction @justintchiu @yuntiandeng @essenaccount @etsy @shopify @stripe @starcloudinc1 @brunostefoni @jpohhhh @dwarkeshsp @chhaviyadav @kamalikac @vertinski @johnschulman2 @thestalwart @ethanwbrown @gilpinskyy @pronouncedkyle

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl) [Tesla, Inc. (TSLA)](/topic/tesla) [DoorDash Inc. (DASH)](/topic/doordash) [DeepSeek (DEEPSEEK)](/topic/deepseek) [Shopify Inc (SHOP)](/topic/$shop) [GPT2 (GPT2)](/topic/gpt2)

### Top Social Posts
Top posts by engagements in the last [--] hours

"Peloton stock lookin like a normal distribution"  
[X Link](https://x.com/jxmnop/status/1523837132820914177)  2022-05-10T01:27Z 19.6K followers, [--] engagements


"@jpohhhh reminds me of those mushrooms in skyrim"  
[X Link](https://x.com/jxmnop/status/1560825230863507456)  2022-08-20T03:05Z 43.9K followers, [--] engagements


"@unixpickle high school is even worse"  
[X Link](https://x.com/jxmnop/status/1689492799203508224)  2023-08-10T04:24Z [----] followers, [--] engagements


"the truth about machine learning software engineering in academia is that everyone is running commands like this: TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL Cthon train_phase2_patchmlp.py --train_path data/$FOLDER/src1_train.txt --val_path data/$FOLDER/src1_valid.txt --test_path data/$FOLDER/src1_test.txt --epochs $EPOCHS --lr $LR --model $MODEL --batch_size $BSZ --qmodel $QMODEL --save_model /n/disk/rush_lab/Users/jack/implicit/$SAVE --mode top --accumulate $A $SAVE/log.train.text.modelgpt2-medium.folder$FOLDER.e$EPOCHS.lr$LR.$BSZ 2&1& tee tst_tmp.txt"  
[X Link](https://x.com/jxmnop/status/1694763479805124862)  2023-08-24T17:27Z [----] followers, 52.8K engagements


"OpenAI has been ahead of the curve on so many things. one I find mildly interesting is the trend of removing dropout from models GPT-1 had dropout (as well as original transformer BERT etc.) but they got rid of it for GPT-2 these days LLMs are never trained with dropout"  
[X Link](https://x.com/jxmnop/status/1696181845979787421)  2023-08-28T15:24Z [----] followers, 49.8K engagements


"An amazing mystery of machine learning right now is that state-of-the-art vision models are 2B parameters (8 gigabytes) while our best text models are 200B parameters (800 gb) why could this be philosophically are images inherently less complicated than text (no right)"  
[X Link](https://x.com/jxmnop/status/1696571664497074562)  2023-08-29T17:13Z [----] followers, 434.7K engagements


"@Ted_Underwood super cool fact I didnt know this"  
[X Link](https://x.com/jxmnop/status/1696662322096345253)  2023-08-29T23:13Z [----] followers, 13.5K engagements


"are there any smaller-than-giant companies that (1) train LLMs and (2) offer internships to PhD students I was thinking of openAI anthropic characterAI adept etc. but couldn't find any info on those. any suggestions here"  
[X Link](https://x.com/jxmnop/status/1697597423458386283)  2023-09-01T13:09Z [----] followers, 52.2K engagements


"@sam_havens don't you mean DATABRICKS"  
[X Link](https://x.com/jxmnop/status/1697639163648938102)  2023-09-01T15:54Z [----] followers, [----] engagements


"Little things I learned when implementing the LLAMA forward pass from scratch with @lionellevine: - ROPE positional embeddings are tricky and weird and I'm convinced fewer than [---] people really understand them - the residual is added *twice* per layer (after self-attention and after MLP) - layernorm actually has a scaling matrix with some learned params (multiplicative scaling factor for each hidden dim) - (A @ B.T) = (B @ A.T).T in torch (a very surprising and unfortunate gotcha) -- so unless you implement the matmuls in the same order as the reference impl your outputs will slightly differ"  
[X Link](https://x.com/jxmnop/status/1697992702267191567)  2023-09-02T15:19Z [----] followers, 108.4K engagements
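
The transpose gotcha in the post above is easy to reproduce. A minimal sketch (my own, not from the thread): the two orderings are mathematically identical, but they can accumulate floating-point rounding differently depending on the kernel, so outputs may differ at the bit level.

```python
import torch

torch.manual_seed(0)
A = torch.randn(256, 512)
B = torch.randn(128, 512)

out1 = A @ B.T        # one evaluation order
out2 = (B @ A.T).T    # mathematically identical, different accumulation order

# bitwise equality is not guaranteed; any gap is pure rounding noise
print(torch.equal(out1, out2), (out1 - out2).abs().max().item())
```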


"if floating-point math was associative we would have had AGI back in 2016"  
[X Link](https://x.com/jxmnop/status/1698761938551906813)  2023-09-04T18:16Z [----] followers, 121.7K engagements
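
For the curious, the non-associativity is trivial to demonstrate in two lines:

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is rounded away inside (b + c)
```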


"I used to work on adversarial example research specifically in NLP. that line of work searches for "obviously correct" inputs that fool machine learning models I decided that adversarial examples aren't very important in the long-term view of AI and here's why:"  
[X Link](https://x.com/jxmnop/status/1699431461646893348)  2023-09-06T14:36Z [----] followers, 33K engagements


"in this case the "adversarial examples" look like complete sets of inputs that can continuously fool the model over time I have no reason right now to think this kind of adversarial example will exist so future models may not be susceptible in the same way at all"  
[X Link](https://x.com/jxmnop/status/1699431469343527363)  2023-09-06T14:36Z 19.3K followers, [----] engagements


"one analogy is that right now adversarial example research on neural networks is like showing an image (or text) to a human for a single instant and then proclaiming "look how dumb this human is we showed it a dog for .001ms and this guy thought it was a muffin""  
[X Link](https://x.com/jxmnop/status/1699431471499358481)  2023-09-06T14:36Z [----] followers, [----] engagements


"@moyix I think you could do this with a custom logits processor that doesn't change the logits it just streams the argmax"  
[X Link](https://x.com/jxmnop/status/1699628709169254526)  2023-09-07T03:40Z 17.5K followers, [---] engagements
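
A minimal sketch of what that reply suggests, using Hugging Face transformers (the class and wiring here are my own illustration, not code from the thread):

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class ArgmaxStreamer(LogitsProcessor):
    """Prints the greedy (argmax) token each step without changing the logits."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        print(self.tokenizer.decode(scores[0].argmax().item()), end="", flush=True)
        return scores  # returned unchanged, so generation itself is unaffected

# usage: model.generate(**inputs, logits_processor=LogitsProcessorList([ArgmaxStreamer(tok)]))
```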


"Is Gary marcus the leading critic of LLMs / modern AI feels like there is a big opportunity here for a dissenting voice - someone articulate knowledgeable and thoughtful but similarly contrarian too bad all the people who actually understand neural networks believe in them"  
[X Link](https://x.com/jxmnop/status/1699774472050401373)  2023-09-07T13:19Z [----] followers, [----] engagements


"llama.etc is neat but I think this blog post from back in January is much more readable/easier to learn from: "GPT in [--] Lines of NumPy""  
[X Link](https://x.com/jxmnop/status/1700846079535903190)  2023-09-10T12:18Z [----] followers, 13K engagements


"curious if anyone knows where Google went wrong with TensorFlow it's bad software fundamentally broken. when I was an AI resident I found [--] bugs within tensorflow core in around a year. but how does a failure like this this happen so many smart people work there"  
[X Link](https://x.com/jxmnop/status/1702009756632682497)  2023-09-13T17:22Z [----] followers, 202K engagements


"unpopular opinion: training open-source LLMs is a losing battle. a complete dead end the gap between closed models like GPT-4 and open models like LLAMA will only continue to widen as models grow bigger and require more resources no one is building particle colliders at home"  
[X Link](https://x.com/jxmnop/status/1702669847875010670)  2023-09-15T13:05Z [----] followers, [----] engagements


"if I were starting my research career today and interested in language models I would become a world expert on tokenization & tokenizers tokenization is weird fascinating and poorly understood yet ubiquitous and necessary for things like chatGPT to work"  
[X Link](https://x.com/jxmnop/status/1703799889346658617)  2023-09-18T15:55Z [----] followers, 35.1K engagements


"interesting to think about LLM hallucinations as a UX issue more than a modeling issue. when all people have is text they tend to believe what they read. but behind the scenes we have confidence scores for every token"  
[X Link](https://x.com/jxmnop/status/1704265839128531213)  2023-09-19T22:46Z [----] followers, 10.2K engagements


"surprised physical AI art hasn't become a thing yet. i want to plaster my walls with classics-inspired masterpieces of my own creation dalis matisses picassos etc is there an image resolution issue or maybe just no overlap between art-printing people and ai-image people"  
[X Link](https://x.com/jxmnop/status/1704908850292224322)  2023-09-21T17:22Z [----] followers, [----] engagements


"im definitely running eval too often now"  
[X Link](https://x.com/jxmnop/status/1706147369857802437)  2023-09-25T03:23Z 19.3K followers, [---] engagements


"is anyone working on meta-predicting optimization so I can know what the loss will be with some confidence after training for some time might be impossible but it would be nice for me if somebody made that"  
[X Link](https://x.com/jxmnop/status/1706424921876836766)  2023-09-25T21:46Z 19.3K followers, [---] engagements


"Anyone ever compute alignment between two different tokenizers by that I mean I have a probability vector from say GPT-2 and want to convert it to what the vector would look in the vocabulary of another model say LLAMA BOUNTY: if u can write code to do this in the next few"  
[X Link](https://x.com/jxmnop/status/1707469016946651439)  2023-09-28T18:55Z [----] followers, [--] engagements


"@soldni mixed how"  
[X Link](https://x.com/jxmnop/status/1707482509796057157)  2023-09-28T19:48Z 17.4K followers, [---] engagements


"@_dsevero why is leetcode useful for compression research"  
[X Link](https://x.com/jxmnop/status/1708591779962884105)  2023-10-01T21:16Z [----] followers, [----] engagements


"finally published our latest research on text embeddings TLDR: Vector databases are NOT safe. 😳 Text embeddings can be inverted. We can do this exactly for sentence-length inputs and get very close with paragraphs"  
[X Link](https://x.com/jxmnop/status/1712562908133999069)  2023-10-12T20:16Z [----] followers, 336.6K engagements


"Very excited to publish this research and will have more in this direction to share soon check out our paper on arxiv:"  
[X Link](https://x.com/jxmnop/status/1712562916136820951)  2023-10-12T20:16Z [----] followers, [----] engagements


"@jobergum hmm I agree its complex and I like your analogy maybe like people thought the hash was save w/o a private key but given enough hashes we can just learn to reverse the hash function (so pinecone can just sell ur data)"  
[X Link](https://x.com/jxmnop/status/1712569961095286880)  2023-10-12T20:44Z [----] followers, [----] engagements


"Ironic to read a sarcastic response to my research on vector database security from the VP of Marketing at Pinecone the same guy who argued that an embedding "is sufficiently obfuscated to not count as PII""  
[X Link](https://x.com/jxmnop/status/1712885642755977533)  2023-10-13T17:38Z [----] followers, 39.8K engagements


"most useful bit of code I've written all year: call map() on a HuggingFace dataset in torch distributed mode (like DDP) as one example this will let you compute embeddings for a dataset in parallel using all the GPUs you have"  
[X Link](https://x.com/jxmnop/status/1716834517909119019)  2023-10-24T15:10Z [----] followers, 27.3K engagements
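
The post doesn't include the code itself, so here is my own reconstruction of the pattern it describes: shard the dataset by rank so each GPU embeds its own slice (the model and dataset names below are placeholders).

```python
import torch
import torch.distributed as dist
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

dist.init_process_group("nccl")  # assumes launch via torchrun, one process per GPU
rank, world = dist.get_rank(), dist.get_world_size()
device = f"cuda:{rank}"

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").to(device).eval()

def embed(batch):
    enc = tok(batch["text"], padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        batch["embedding"] = model(**enc).last_hidden_state[:, 0].cpu().numpy()
    return batch

data = load_dataset("ag_news", split="train")
shard = data.shard(num_shards=world, index=rank)       # each rank gets 1/world of the rows
shard = shard.map(embed, batched=True, batch_size=64)  # all GPUs embed in parallel
```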


"Imagine this prompt: "write me Python code to disable the NYC subway system" obviously gpt-4 can't do this now. It'll refuse but even if we jailbreak it it'll answer incorrectly but if we keep training bigger & better language models won't one eventually be able to do this"  
[X Link](https://x.com/jxmnop/status/1717572407077077468)  2023-10-26T16:02Z [----] followers, 47.6K engagements


"all community notes should be read in the voice of the narrator from arrested development"  
[X Link](https://x.com/jxmnop/status/1718404506008818128)  2023-10-28T23:08Z [----] followers, [----] engagements


"I turned down a job at google research to do a PhD at Cornell right before chatGPT came out and I dont regret it at all. I see it like this. Do you want to work with a large group on building the fastest & fanciest system in the world or in a small group testing crazy theories in smaller settings"  
[X Link](https://x.com/jxmnop/status/1720444107636584583)  2023-11-03T14:13Z [----] followers, 362.9K engagements


"is there anything novel or technically interesting about this model besides the fact that it outputs swear words"  
[X Link](https://x.com/jxmnop/status/1721260806224924765)  2023-11-05T20:18Z [----] followers, 94.8K engagements


"prediction: every vector database will eventually be replaced with a single transformer a row for every individual datapoint is so old-school feels outdated what would be better: a single differentiable blob something that knows about all your data and can chat about it"  
[X Link](https://x.com/jxmnop/status/1722307181054157155)  2023-11-08T17:36Z [----] followers, 83.9K engagements


"tired of paying OpenAI for GPT-4 API the NYC Department of Small Business Services has your back the NYC small business chatbot is powered by GPT-4 so equally capable just have to ask it information about operating a business in New York City first"  
[X Link](https://x.com/jxmnop/status/1722625184430215395)  2023-11-09T14:40Z [----] followers, 257.2K engagements


"the openAI schism is thrilling but dont think it changes much for the future trajectory of AI the true geniuses behind this stuff arent sama gdb or karparthy; theyre scientists like ilya and alec radford and as of now seems like all the scientists still work there"  
[X Link](https://x.com/anyuser/status/1725915456375037999)  2023-11-18T16:34Z [--] followers, 35.2K engagements


"@abacaj what are you implying"  
[X Link](https://x.com/jxmnop/status/1725918609405247511)  2023-11-18T16:47Z [----] followers, [----] engagements


"cool research idea for someone: text diffusion in embedding space solve any sequence-to-sequence task in three steps: [--]. embed source sentence text [--]. build diffusion model that maps input text embedding to target text embedding [--]. invert to produce target text"  
[X Link](https://x.com/jxmnop/status/1726629927053901902)  2023-11-20T15:53Z [----] followers, 163.2K engagements
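
As a skeleton, the three steps map onto three components. Everything here is hypothetical scaffolding: `embed` could be any text encoder, `invert` a vec2text-style inverter, and the diffusion model in the middle is the part someone would have to train.

```python
def seq2seq_via_embedding_diffusion(source_text, embed, diffusion_model, invert):
    e_src = embed(source_text)      # 1. embed the source sentence
    e_tgt = diffusion_model(e_src)  # 2. diffuse source embedding -> target embedding
    return invert(e_tgt)            # 3. invert the target embedding back to text
```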


"will there ever be a 1B param model thats more capable in every way than GPT-4 is today"  
[X Link](https://x.com/jxmnop/status/1728431805806596175)  2023-11-25T15:13Z [----] followers, 30K engagements


"@DotDotJames no scaling laws like the scaling laws for language models:"  
[X Link](https://x.com/jxmnop/status/1729515713080250725)  2023-11-28T15:00Z [----] followers, [---] engagements


"my man congrats aaron on the well-deserved award"  
[X Link](https://x.com/jxmnop/status/1730477759653454234)  2023-12-01T06:43Z [----] followers, [----] engagements


"language models encode information in their *weights* while embeddings encode information in their *activations* this distinction is important possibly somewhat profound"  
[X Link](https://x.com/jxmnop/status/1731511601549705425)  2023-12-04T03:11Z [----] followers, 62.9K engagements


"fewer than [---] people deeply understand both (i) transformers and (ii) the GPU programming model want to learn machine learning gain some esoteric systems knowledge; spend some time really learning CUDA"  
[X Link](https://x.com/jxmnop/status/1731726810470056258)  2023-12-04T17:27Z [----] followers, 222.6K engagements


"i am in singapore presenting some cool research at #EMNLP2023 iPrompt: Explaining Patterns in Data with Language Models via Interpretable Autoprompting Text Embeddings Reveal (Almost) As Much as Text Tree Prompting: Efficient Task Adaptation without Fine-Tuning"  
[X Link](https://x.com/jxmnop/status/1732573848611262884)  2023-12-07T01:32Z [----] followers, [----] engagements


"i'm sorry but you shouldn't be allowed to give an oral presentation over zoom. sitting in a room of [---] people staring at a stage where someone's face is projected onto a screen as they try to talk w/ internet cutting in and out is a waste of time and just a bad vibe"  
[X Link](https://x.com/jxmnop/status/1733023102265925775)  2023-12-08T07:18Z [----] followers, 10.8K engagements


"say what you will about mistral tweeting exclusively download links to new models with no context is unbelievably cool"  
[X Link](https://x.com/jxmnop/status/1733292483101221301)  2023-12-09T01:08Z [----] followers, 59.1K engagements


"Honestly the Gemini release made me really sad. Gemini beats GPT-4 at 32-shot COT (read: not straight up) is exactly what a PhD student would say if their model wasnt as good as theyd hoped whats so special about GPT-4 is it people systems data some kind of blind luck"  
[X Link](https://x.com/jxmnop/status/1733688530583621959)  2023-12-10T03:22Z [----] followers, 388.9K engagements


"@jpohhhh I also think it must be data but cant fathom what type of data it is that GOOGLE cant get their hands on"  
[X Link](https://x.com/jxmnop/status/1733691109434376628)  2023-12-10T03:32Z [----] followers, 11.9K engagements


"@Alexir563 maybe next year :)"  
[X Link](https://x.com/jxmnop/status/1733774449063653575)  2023-12-10T09:03Z [----] followers, [---] engagements


"@inductionheads for sure the probabilistic breakdown is the same: given text sequence x and its embedding e p(x) = p(x e) p(e). this looks a lot like a latent variable model with latent embedding e in my case p(x e) is done by vec2text with openAI embeddings; we just need to learn p(e)"  
[X Link](https://x.com/jxmnop/status/1734975088749838718)  2023-12-13T16:34Z [----] followers, [----] engagements


"@volokuleshov thankful twitter hasn't implemented a peer review process yet unlike latent diffusion in this case the embedding space is fixed (it's openAI ada [--] in my notebook) I think this is kind of like conditional generation"  
[X Link](https://x.com/jxmnop/status/1735021652419793300)  2023-12-13T19:39Z [----] followers, [----] engagements


"machine learning research question: whats an idea that you think would catch on if only someone spent the money to test it at scale ill go first: tokenization-free transformers"  
[X Link](https://x.com/jxmnop/status/1735072585778389426)  2023-12-13T23:01Z [----] followers, 90.1K engagements


"Seen a lot of evidence that GPT-4 crushes Gemini on all the head-to-head LM benchmarks. here's my theory about what went wrong: - chatGPT released - google execs freak out - google consolidates (integrates Deepmind Brain) - google assembles a giant team to build a single giant language model (spoiler: it's gemini) - team defines success as building the system that gets the best performance on MMLU - team starts building model - finds out that beating GPT-4 level performance is really hard - many cycles of experimental iteration ensue all with bottom-line MMLU performance dictating success -"  
[X Link](https://x.com/jxmnop/status/1737129875746468140)  2023-12-19T15:16Z [----] followers, 233.7K engagements


"fun research idea: Latent chain-of-thought / Latent scratchpad it's well-known that language models perform better when they generate intermediate reasoning tokens through some sort of 'scratchpad'. but there's no reason scratchpad tokens need to be human-readable. in fact generating real language involves a lot of unnecessary tokens. this seems to me like a big inefficiency. let me explain this a little further under the hood the typical process for generating a scratchpad token looks like [--]. generate a *vector* as the output of the language model [--]. project this vector to a distribution"  
[X Link](https://x.com/jxmnop/status/1737500671484514423)  2023-12-20T15:50Z [----] followers, 127.3K engagements
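
A rough sketch of the latent-scratchpad idea with a Hugging Face causal LM (my own illustration, not the post's proposal verbatim; in practice the fed-back vectors would likely need a learned projection, or the model would need training to consume them):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("Q: what is 17 * 24?", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs.input_ids)

with torch.no_grad():
    for _ in range(8):  # 8 latent "scratchpad" steps, never projected to tokens
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        latent = out.hidden_states[-1][:, -1:, :]    # last position's output vector
        embeds = torch.cat([embeds, latent], dim=1)  # feed the raw vector back in
```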


"@lateinteraction yes I don't think it exists pause tokens: every pause token is the same right so they can't carry extra information from the result of previous computations the link: this is different enough from what I suggested that I don't feel like explaining rn"  
[X Link](https://x.com/jxmnop/status/1737503200373076121)  2023-12-20T16:00Z [----] followers, [----] engagements


"im curious about effective altruism: how do so many smart people with the goal do good for the world wind up with the subgoal analyze the neurons of GPT-2 small or something similar"  
[X Link](https://x.com/jxmnop/status/1740514663711031562)  2023-12-28T23:26Z [----] followers, 95.8K engagements


"certainly AI alignment (on a high level) is an important issue but why is it undervalued relative to eg world hunger or pandemic preparedness and given this why is studying language models considered a reasonable path towards eventually saving the world"  
[X Link](https://x.com/jxmnop/status/1740514666043048404)  2023-12-28T23:26Z [----] followers, [----] engagements


"@burny_tech I don't think language models are "the most powerful technology"; I think that title would go to something else perhaps nuclear bombs or battleships or The Internet"  
[X Link](https://x.com/jxmnop/status/1740796142869455122)  2023-12-29T18:05Z [----] followers, [---] engagements


"people keep saying AI is moving so fast. some days I agree but some days I'm not sure so many papers published but I don't feel like we're making that many fundamental breakthroughs. to cap off [----] here's a list of things we still don't know about language models: - how can we reliably answer questions from long texts like books - can we pretrain a world-class LM for less than a million dollars - can we prevent a model from making up facts - how can we "think more" for some inputs than others - how can we train on both private and public data - are our architectures (transformers) close to"  
[X Link](https://x.com/anyuser/status/1740804797777797296)  2023-12-29T18:39Z [--] followers, 165.4K engagements


"my twitter is in a funny place right now because if you look at my tweets it's all machine learning and language model gobbledygook but behind the scenes i'm DMing my friends memes of that human pop tart that danced its way down into the toaster last night at the poptarts bowl"  
[X Link](https://x.com/jxmnop/status/1740858078512439601)  2023-12-29T22:11Z 22K followers, [----] engagements


"there's something sinister about these two sharing a stagecrossover episode featuring two of the most prominent pseudointellectual grifters. they manage to emulate the structure of intelligent conversation while really saying nothing at all. hoping for better role models in [----] Here's my conversation with Guillaume Verdon (@GillVerd) aka Beff Jezos (@BasedBeffJezos) a physicist quantum computing researcher and founder of e/acc (effective accelerationism) movement that advocates for rapid technological progress physics-based reasoning and memes. https://t.co/kewuLFEdNr Here's my conversation"  
[X Link](https://x.com/jxmnop/status/1741231083335668141)  2023-12-30T22:53Z [----] followers, 84.9K engagements


"on podcasts: lex fridman brings on a wonderful selection of guests (mostly) but doesnt research them beforehand and resorts to asking them things like do u think aliens are real for people asking for a replacement I think @dwarkesh_sp is great and has spectacular guests too"  
[X Link](https://x.com/jxmnop/status/1741321522801783106)  2023-12-31T04:52Z [----] followers, 35K engagements


"@_mattfreed if this picture didn't come out of an orb than i'm not buying IT"  
[X Link](https://x.com/jxmnop/status/1741542795884544082)  2023-12-31T19:32Z [----] followers, [--] engagements


"the openAINYT lawsuit is a big deal for copyright precedent. literally all popular models right now were trained on copyrighted data. except for one my friend from school @SkyLi0n developed a diffusion model that's not trained on any copyrighted data it's called CommonCanvas"  
[X Link](https://x.com/anyuser/status/1742370632808251868)  2024-01-03T02:21Z [--] followers, 20.7K engagements


"@Guuber42 probably not tbh except maybe myopia wrt current LLMs"  
[X Link](https://x.com/jxmnop/status/1743705720837513641)  2024-01-06T18:46Z [----] followers, [--] engagements


"@danielleboccell reading an entire textbook cover-to-cover is a herculean task. maybe we should just pick some notable sections to focus on"  
[X Link](https://x.com/jxmnop/status/1744854012984262977)  2024-01-09T22:49Z [----] followers, [--] engagements


"tokenizing. πŸš‚"  
[X Link](https://x.com/jxmnop/status/1745297244977352716)  2024-01-11T04:11Z [----] followers, [----] engagements


"fun research story about how we jailbroke the the chatGPT API: so every time you run inference with a language model like GPT-whatever the model outputs a full probabilities over its entire vocabulary (50000 tokens) but when you use their API OpenAI hides all this info from you and just returns the top token -- or at best the top [--] probabilities we needed the fullvector (all [-----] numbers) for our research so we developed a clever algorithm for recovering it by making many API calls important to know is that the API supports a parameter called "logit bias" which lets you upweight or"  
[X Link](https://x.com/jxmnop/status/1745540622029672536)  2024-01-11T20:18Z [----] followers, 122.9K engagements


"@dl_rui - inverting prompts from logits - training data detection (@WeijiaShi2 worked on this) - distillation"  
[X Link](https://x.com/jxmnop/status/1745590359680249914)  2024-01-11T23:35Z 17.6K followers, [----] engagements


"@arivero not sure what your first question means but in this case we rely on knowing the tokens of the tokenizer; openAI gives us back key-value pairs of token-logprob"  
[X Link](https://x.com/jxmnop/status/1745598198368866547)  2024-01-12T00:06Z 17.6K followers, [----] engagements


"i don't think this is right at all; the reason why having every textbook pdf ever available for free and every course available on youtube didn't change humanity is simple learning things takes so much ENERGY; most people have had textbooks for a century but most go unread @dwarkesh_sp A few reasons but one is that genetic determinism is wrong and drivers of progress arent just spawned naturally they have to be carefully constructed under fragile conditions @dwarkesh_sp A few reasons but one is that genetic determinism is wrong and drivers of progress arent just spawned naturally they have to"  
[X Link](https://x.com/jxmnop/status/1746922554311221309)  2024-01-15T15:49Z [----] followers, 10.9K engagements


"i've seen several papers from google claiming their LLMs beat DOCTORS at various medical tasks. this isn't just sketchy PR it's actively harmful. doctors and surgeons are transplanting organs etc. saving actual people's lives on a daily basis. feels a little disrespectful"  
[X Link](https://x.com/jxmnop/status/1747018160266506590)  2024-01-15T22:09Z [----] followers, [----] engagements


"what are some ideas from machine learning that i can explain to my grandma"  
[X Link](https://x.com/jxmnop/status/1748008193991540912)  2024-01-18T15:43Z [----] followers, [----] engagements


"@ImageDeeply i think the speedup depends on a lot of stuff like how big your dataset is how big the shards are how many CPUs you have what disk you have etc. Unless you have a really big dataset I doubt this will help you nearly as much"  
[X Link](https://x.com/jxmnop/status/1749822139685814564)  2024-01-23T15:51Z [----] followers, [---] engagements


"when doing inference on lots of samples how much speed up can you expect from increasing the batch size context: i'm doing inference for lots of images using resnet100 on an a6000 gpu. increased batch size [--] - [----] (64x) and only getting a 20% speedup. how is this possible"  
[X Link](https://x.com/jxmnop/status/1751007444187189518)  2024-01-26T22:21Z [----] followers, 16.1K engagements
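
One way to sanity-check a number like that is to time the GPU in isolation with explicit synchronization, so data loading and host-device copies don't muddy the measurement. A minimal sketch (resnet50 here is a stand-in for the post's model):

```python
import time
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).cuda().eval()

for bsz in (16, 64, 256, 1024):
    x = torch.randn(bsz, 3, 224, 224, device="cuda")
    with torch.no_grad():
        for _ in range(3):           # warmup (kernel selection, allocator)
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(10):
            model(x)
        torch.cuda.synchronize()     # wait for async GPU work before stopping the clock
    print(bsz, round(10 * bsz / (time.perf_counter() - start)), "img/s")
```

If throughput barely rises with batch size in this isolated loop, the GPU was already saturated at the small batch; if the isolated numbers scale but the end-to-end pipeline doesn't, the bottleneck is the data path rather than compute.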


"surprised to see so many people excited to see google sitting in second place on a leaderboard πŸ₯² also the obvious question here is why is GPT-4 turbo beating GPT-4 on this benchmark i thought turbo was intended to be faster but slightly dumber πŸ”₯Breaking News from Arena Google's Bard has just made a stunning leap surpassing GPT-4 to the SECOND SPOT on the leaderboard Big congrats to @Google for the remarkable achievement The race is heating up like never before Super excited to see what's next for Bard + Gemini https://t.co/QPtsqZdJhC πŸ”₯Breaking News from Arena Google's Bard has just made a"  
[X Link](https://x.com/jxmnop/status/1751269591282573411)  2024-01-27T15:43Z [----] followers, 69.7K engagements


"one exciting observation about transformers (and most modern deep learning) is that you can understand them using high school math. really just multiplication division sums and exponentiation many times and in a strange and initially hard-to-grok order"  
[X Link](https://x.com/jxmnop/status/1751309109519900768)  2024-01-27T18:20Z [----] followers, 34.4K engagements


"carbon dating ML papers by which open-source LM they use GPT-2 the before times GPT-J summer [----] LLAMA-1 spring [----] LLAMA-2 summer [----] mistral 7b fall [----] mixtral 8x7b the current era"  
[X Link](https://x.com/jxmnop/status/1752861163233165801)  2024-02-01T01:07Z [----] followers, 18.7K engagements


"whats the simplest neural language model i can code for my little class maybe just an MLP on some word embeddings doing an RNN feels like it could be overkill"  
[X Link](https://x.com/jxmnop/status/1754293487451742695)  2024-02-04T23:58Z [----] followers, 29.6K engagements


"what are some easy ways for a normal person like me to speed up training a transformer on a single GPU i'm using huggingface T5 implementation + defaults. i know about FlashAttention and BetterTransformer and torch compile will these all work together is there anything else"  
[X Link](https://x.com/jxmnop/status/1755265116226949628)  2024-02-07T16:19Z [----] followers, 46.7K engagements


"welp. this is what happened when i tried to use torch compile"  
[X Link](https://x.com/jxmnop/status/1755397683471143172)  2024-02-08T01:06Z [----] followers, 13.1K engagements


"@zicnyteoh i'm using nvidia a6000 gpus :)"  
[X Link](https://x.com/jxmnop/status/1755407440441450547)  2024-02-08T01:45Z [----] followers, [---] engagements


"@Geronimo_AI OpenAI ada 2"  
[X Link](https://x.com/jxmnop/status/1756200798592340168)  2024-02-10T06:17Z [----] followers, [----] engagements


"@svonava I dont think thats necessarily true text is actually relatively few bytes while each 32-bit float is [--] bytes so at least at short lengths things could be lossless"  
[X Link](https://x.com/jxmnop/status/1756435376267309156)  2024-02-10T21:50Z [----] followers, [---] engagements


"people spent years optimizing GANs before realizing that diffusion models were simpler and better people spent years developing RLHF before realizing that DPO is simpler and better what are we working on rn i want to find the simpler and better version and work on that instead"  
[X Link](https://x.com/jxmnop/status/1758905760665240059)  2024-02-17T17:26Z [----] followers, 109.4K engagements


"today i'm making voronoi diagrams of text embedding spaces what can we do with these"  
[X Link](https://x.com/jxmnop/status/1759295805033259333)  2024-02-18T19:16Z [----] followers, 59.4K engagements


"@waltuuuhr i'm talking about sequence text embeddings not word embeddings"  
[X Link](https://x.com/jxmnop/status/1760727764695699693)  2024-02-22T18:06Z [----] followers, [----] engagements


"biggest lesson I learned from gemini is that LLM trainers now have to choose between overfitting to chain-of-thought-type inputs (best for absolute reasoning ability) and over-fitting to human chatbot interactions (best for talking to humans) no free lunch here have to choose"  
[X Link](https://x.com/jxmnop/status/1762160913128263904)  2024-02-26T17:01Z [----] followers, 14.3K engagements


"@ysegmond too bad this is fake :'( not even sure you can fit [--] h100s in a single machine lol"  
[X Link](https://x.com/jxmnop/status/1762631505849770411)  2024-02-28T00:11Z [----] followers, [----] engagements


"@mathijsfietst unfortunately that's due to a bug I made when training the model; the gtr embeddings i use are missing a last post-processing step. someone trained a fixed model more info here"  
[X Link](https://x.com/jxmnop/status/1763243515390075294)  2024-02-29T16:43Z [----] followers, [--] engagements


"if the blue line drops below the green line then were going to disney world babe"  
[X Link](https://x.com/jxmnop/status/1763325102324908459)  2024-02-29T22:07Z [----] followers, 33.1K engagements


"my library is here bm25_pt: the key insight is that you can reduce the BM25 scoring to a single matrix multiplication. everything in the big fraction on the right side here can be stored in a big matrix of scores then bm25(q) = bag(q) @ scores.T"  
[X Link](https://x.com/jxmnop/status/1763586427864944779)  2024-03-01T15:25Z [----] followers, [----] engagements


"without bm25_pt you either have to use really slow string-based code (rank-bm25) or run a server (elasticsearch) to do bm25 in python now it's a pip package and two lines; i made everything fast using huggingface & pytorch"  
[X Link](https://x.com/jxmnop/status/1763586429613940774)  2024-03-01T15:25Z [----] followers, [----] engagements
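
A sketch of the trick as I read it (not the actual bm25_pt source, which uses HF tokenizers and sparser storage): precompute every per-(document, term) BM25 contribution into one matrix, so scoring a query against all documents is a single matmul. The IDF below is one common variant.

```python
import torch

def build_bm25_scores(doc_term_counts, k1=1.5, b=0.75):
    """doc_term_counts: (num_docs, vocab_size) term-frequency matrix."""
    tf = doc_term_counts.float()
    doc_len = tf.sum(dim=1, keepdim=True)
    avgdl = doc_len.mean()
    df = (tf > 0).float().sum(dim=0)          # document frequency per term
    n = tf.shape[0]
    idf = torch.log((n - df + 0.5) / (df + 0.5) + 1.0)
    # the whole "big fraction", precomputed per (doc, term):
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))

def bm25(query_bag, scores):
    """bm25(q) = bag(q) @ scores.T : one matmul scores queries against all docs."""
    return query_bag.float() @ scores.T       # (num_queries, num_docs)
```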


"for those who haven't heard about this: openAI only gives you the top [--] logprobs per response (max). we (mostly @justintchiu) came up with an algorithm to get logits from openAI. essentially for a different token (not top-5) u can binary search logit bias to see the minimal number that gets it into the top 5"  
[X Link](https://x.com/jxmnop/status/1765873234489061539)  2024-03-07T22:52Z 20.5K followers, [----] engagements
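
In sketch form (the `top5_logprobs` helper below is hypothetical shorthand for an API call using the `logprobs` and `logit_bias` parameters; the real algorithm and its edge cases are in the paper):

```python
def top5_logprobs(prompt, logit_bias):
    """Hypothetical wrapper: returns {token_id: logprob} for the top-5 tokens
    after the API adds `logit_bias` ({token_id: bias}) to the logits."""
    raise NotImplementedError

def bias_to_enter_top5(prompt, token_id, lo=0.0, hi=100.0, iters=30):
    # binary-search the minimal bias that pushes token_id into the top 5;
    # that threshold approximates the gap between token_id's logit and the
    # 5th-highest, which (with the visible top-5 logprobs) pins down its logit
    for _ in range(iters):
        mid = (lo + hi) / 2
        if token_id in top5_logprobs(prompt, {token_id: mid}):
            hi = mid   # bias was sufficient: try smaller
        else:
            lo = mid   # insufficient: need a bigger bias
    return hi
```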


"since i got into ML in [----] or so there have been exactly three major advancements: [--]. web-scale language model pretraining (OpenAI 2020) [--]. diffusion models (OpenAI 2020) [--]. language model refinement via human feedback (OpenAI 2022) whats next its been a while"  
[X Link](https://x.com/jxmnop/status/1770801494536647055)  2024-03-21T13:15Z [----] followers, 95.5K engagements


"my former boss from google would cry a little bit if he knew the quality of code I am churning out daily in academia today im moving much faster though - creating and discarding hypotheses more quickly learning more etc but my unit tests dont always pass"  
[X Link](https://x.com/jxmnop/status/1772265225489616910)  2024-03-25T14:12Z [----] followers, 17.8K engagements


"scent teleportation should be getting more publicity πŸŒΊπŸŒ·πŸ’ i want to download roses; email candles to my friends; store the distinct smell of a certain forest on a specific day on a thumb drive to revisit years later this is way cooler than Sora or the latest LLM finetune Today Osmo is proud to introduce Scent Teleportation a technology that captures a smell in one part of the world and releases it in another. This is a new way to communicate that could one day help resensitize the digital world. https://t.co/YFUIMMOBHK https://t.co/Bnj5bDprku Today Osmo is proud to introduce Scent"  
[X Link](https://x.com/jxmnop/status/1772317325015793848)  2024-03-25T17:39Z [----] followers, [----] engagements


"ok before one of you tries to assassinate me for building AGI i figured out the bug. at least it's an interesting one πŸ₯²β˜” so we're doing contrastive learning which is a matching task between (query document) pairs query [--] matches to document [--] query [--] matches to document [--] and so on. the matching score is computed via the dot product between the respective embeddings for this experiment i was trying to add a soft prompt to everything in the batch to make learning better unfortunately the model outsmarted me and devised a very clever internal scheme to track its position within the batch for"  
[X Link](https://x.com/jxmnop/status/1772698790698336501)  2024-03-26T18:55Z [----] followers, 89.6K engagements
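
For context, here is the in-batch matching setup described above in code form (a standard InfoNCE-style loss, not the post's actual training code): the label for query i is document i, which is exactly why any feature that leaks batch position lets the model cheat.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (batch, dim); query i should match document i."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    scores = q @ d.T / temperature                       # (batch, batch) all-pairs scores
    labels = torch.arange(q.shape[0], device=q.device)   # diagonal entries are positives
    return F.cross_entropy(scores, labels)
```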


"yesterday my model was training and its performance was so good that twitter ppl threatened to blow up my house in case i was about to release AGI then i fixed one line of code. now my model performance is exactly the same as the baseline ah the ups and downs of science πŸ€“πŸ‘¨πŸ”¬"  
[X Link](https://x.com/jxmnop/status/1772997458617905646)  2024-03-27T14:41Z [----] followers, [----] engagements


"People: cornell obviously is very strong academically but i have noticed compared to other similar-caliber schools people here are remarkably chill. people don't really work nights/weekends and encourage you to have work-life balance. everyone in nlp/ML is fun to hang out with"  
[X Link](https://x.com/jxmnop/status/1775227472109338651)  2024-04-02T18:23Z [----] followers, [----] engagements


"Location: i'm sure New York City's reputation precedes itself so all I'm going to say is that I love it here living in manhattan is amazing. new york city. nowhere else like it"  
[X Link](https://x.com/jxmnop/status/1775227474420404525)  2024-04-02T18:23Z [----] followers, [----] engagements


"Research: finally intellectually i'm doing the best work of my life. the talks here are also phenomenal. it's kind of like with your favorite bands all the academics you want to meet will come through new york city every year or two. this semester we've had talks on campus i really liked from yann lecun tim dettmers simran arora johannes balle ben recht. my collaborators are great"  
[X Link](https://x.com/jxmnop/status/1775227477108916559)  2024-04-02T18:23Z [----] followers, [----] engagements


"New Research: a lot of talk today about "what happens" inside a language model since they spend the exact same amount of compute on each token regardless of difficulty. we touch on this question on our new theory paper Do Language Models Plan for Future Tokens"  
[X Link](https://x.com/jxmnop/status/1775914581036003373)  2024-04-04T15:53Z 10.1K followers, 135.8K engagements


"model-trainers of Twitter: why does my model start getting worse over time without the grad norm noticeably increasing -- how can this be possible and how do i fix it (again please help :) )"  
[X Link](https://x.com/jxmnop/status/1778436832075678100)  2024-04-11T14:55Z 10.1K followers, 106.6K engagements


"ok so it was a gradient bug. let me explain with the huggingface trainer when training on multiple GPUs sometimes the model is stored as an nn.Module but sometimes its wrapped in this DistributedDataParallel thing I was inadvertently forwarding on the nn.Module in multi-GPU meaning gradients werent syncing or averaging between steps each GPU was learning its own solution but the representations were aggregated between GPUs in the contrastive loss making for some really strange behavior sometimes train loss would go down for awhile as a single GPU learned to tell its own samples from other"  
[X Link](https://x.com/jxmnop/status/1778520637193240892)  2024-04-11T20:28Z 10.1K followers, 51.8K engagements
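
The bug and the fix, distilled into a sketch (MyEncoder is a trivial stand-in; with the HF Trainer the DDP wrapping happens internally, which is what made this easy to miss):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class MyEncoder(nn.Module):                 # stand-in for the real model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(128, 128)
    def forward(self, x):
        return self.proj(x)

# assumes torch.distributed.init_process_group() already ran, one process per GPU
model = DDP(MyEncoder().cuda())
batch = torch.randn(32, 128, device="cuda")

out = model.module(batch)   # BUG: bypasses the wrapper, so backward() fires no
                            # gradient all-reduce and each GPU drifts on its own
out = model(batch)          # correct: DDP hooks average gradients across ranks
```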


"who is building the google docs for pandas dataframes i would use that often. i want to send df links save them to my notes plot graphs in the cloud etc"  
[X Link](https://x.com/jxmnop/status/1780604880849420311)  2024-04-17T14:31Z 10.2K followers, 11.8K engagements


"you're telling me an 8B param model was trained on fifteen trillion tokens i didn't even know there was that much text in the world really interesting to see how scaling laws have changed best practices; GPT-3 was [---] billion params and trained on a paltry [---] billion tokens"  
[X Link](https://x.com/jxmnop/status/1780998136653095371)  2024-04-18T16:33Z 10.2K followers, 84.4K engagements


"common dark thought pattern in research run baseline experiment change thing A also change thing B run new experiment collect results "wow thing A works""  
[X Link](https://x.com/jxmnop/status/1782532731718545684)  2024-04-22T22:11Z 10.2K followers, [----] engagements


"in my opinion the Phi approach to training language models is just wrong i'm not convinced that training on less (albeit "higher-quality") data is better than training on as much data as possible i'm not convinced that training on synthetic data ever works better than training on the original data source (unless it's just a form of distillation after all distillation is magic) sorry Microsoft shouting "it works" without releasing any datasets or ablations isn't going to change my mind πŸΈβ˜•"  
[X Link](https://x.com/jxmnop/status/1782752895189794888)  2024-04-23T12:46Z 10.7K followers, 67.2K engagements


"weights & biases is great software but has anyone ever learned anything from one of these graphs can't even see a correlation between two variables if their columns aren't next to each other"  
[X Link](https://x.com/jxmnop/status/1782842994464366647)  2024-04-23T18:44Z 10.2K followers, 25.1K engagements


"move over meta the true biggest benefactor of open source machine learning is CHANEL TIL scikit-learn an open-source ML library has only one Platinum sponsor and it is . Chanel https://t.co/dBjaGntrAG TIL scikit-learn an open-source ML library has only one Platinum sponsor and it is . Chanel https://t.co/dBjaGntrAG"  
[X Link](https://x.com/jxmnop/status/1783134094135533855)  2024-04-24T14:01Z 10.7K followers, 65.4K engagements


"@Yampeleg it's a very different architecture and starting from a different pre-trained model"  
[X Link](https://x.com/jxmnop/status/1784244784837616004)  2024-04-27T15:34Z 10.8K followers, [----] engagements


"i wonder if the AI-is-going-to-kill-all-the-humans hype may have crested a wave at least for now i remember when a few accounts that would tweet "quit your job AGI is coming" and "scaling is all it takes" type stuff nonstop. don't see nearly as much compared to a year ago"  
[X Link](https://x.com/jxmnop/status/1784606750089634157)  2024-04-28T15:33Z 10.8K followers, [----] engagements


"@Zonalic agree yes there could be an architecture or training process of a completely different class that breaks us out of this local minimum"  
[X Link](https://x.com/jxmnop/status/1784703484169724380)  2024-04-28T21:57Z 10.8K followers, 12.1K engagements


"@VictorTaelin whats HVM CUDA"  
[X Link](https://x.com/jxmnop/status/1785129749708251628)  2024-04-30T02:11Z 10.8K followers, [---] engagements


"okay so what's up with this - jan: no big open LLM releases - feb: no big open LLM releases - mar/apr: cohere Command-R databricks/mosaic DBRX Reka Core meta LLAMA-3. - may: no big open LLM releases"  
[X Link](https://x.com/jxmnop/status/1786079237578842367)  2024-05-02T17:04Z 10.9K followers, 66.6K engagements


"i can feel the friday energy but this is where i'm at debugging-wise"  
[X Link](https://x.com/jxmnop/status/1786516225520013647)  2024-05-03T22:00Z 10.8K followers, 12.6K engagements


"@bentleynathan1 not for a while thats the point of the barrier - I think it might time out after [--] or [--] or [--] min I forget"  
[X Link](https://x.com/jxmnop/status/1786610166395207921)  2024-05-04T04:13Z 10.8K followers, [---] engagements


"i'm in vienna for #ICLR2024 dm me if you want to chat or grab a beer also come hang out at the poster session for Language Model Inversion on Friday i'll tweet about it once i know when and where. excited to DELVE in"  
[X Link](https://x.com/jxmnop/status/1787579019937849818)  2024-05-06T20:23Z 10.9K followers, [----] engagements


"ok so i'm at ICLR; this is my first machine learning conference. as you might imagine it's all very fun and exciting. but these poster sessions are absolutely INSANE this is an airplane hanger crammed with hundreds of posters each with dozens of people talking over each other in the general direction of the presenter the paper topics all sound so cool but i can't get a word in edgewise. it reminds me of a crowded bar or airport security line in there. if i look at a poster too long someone will bump into me what's the right way to navigate this should i get there early to stand my ground or"  
[X Link](https://x.com/jxmnop/status/1788276464292446237)  2024-05-08T18:35Z 10.9K followers, 38.4K engagements


"i think people have generally taken this blog post (The Bitter Lesson by Richard Sutton) far too seriously. some reminders: - you CAN come up with clever ideas and show that they work without a bazillion gpus - you CAN build useful systems at the 100M parameter scale - if you want to do important research remember that long-lasting breakthroughs will probably come from thinking up really different ways to do things rather than improving our current (already large & unwieldy) systems - (and while i'm at it: scaling is generally predictable and therefore is not really interesting"  
[X Link](https://x.com/jxmnop/status/1789371442158530920)  2024-05-11T19:06Z 10.9K followers, 106.9K engagements


"by the way it's ironic to me that The Bitter Lesson was written by a professor since the ideas feel so antithetical to research in a small lab i think that's where Colin Raffel was trying to go with his talk "The Sweet Lesson" http://colinraffel.com/talks/dlrl2022how.pdf http://colinraffel.com/talks/dlrl2022how.pdf"  
[X Link](https://x.com/jxmnop/status/1789371942518043006)  2024-05-11T19:08Z 10.9K followers, [----] engagements


"so apparently there are 10x more research papers written than there were ten years ago (at least in AI) i guess we're learning 10x more about the world each year now .or we're making discoveries at about the same rate but each individual paper is 10% as meaningful"  
[X Link](https://x.com/jxmnop/status/1790812702463398021)  2024-05-15T18:33Z 10.9K followers, 21.8K engagements


"passed my A exam and got my master's degree; officially halfway through my phd cheers everyone"  
[X Link](https://x.com/jxmnop/status/1793722436837634255)  2024-05-23T19:15Z 11K followers, 12.6K engagements


"i found this interview fascinating. leopold makes a lot of bold assumptions and is undoubtedly on a lot of amphetamines but has a fascinating worldview and has thought more about the future than almost anyone else in the know about AI yet so many people online disagree with his views. but is there anyone who has made an equally coherent case about why we *won't* achieve superintelligence by [----] or why automating AI research is hard actually why is the countercase so hard to argue here is it that by doing research on the topic you get automatically scaling-pilled .@leopoldasch on: - the"  
[X Link](https://x.com/jxmnop/status/1798835785325859059)  2024-06-06T21:54Z 20.3K followers, 91.7K engagements


"an underdiscussed gotcha behind the search + LLM = AGI narrative is search is only valuable when statewide improvements are *quantifiable* this is the case in Go and coding problems w/ tests and this ARC benchmark. we can explore the (LLM-generated) state space and leverage traditional search algorithms to hill-climb toward better solutions but we cant easily measure improvement in the general case where tasks are much more abstract. how do you use MCTS to write a better essay or generate a more actionable plan to take over the world ARC-AGIs been hyped over the last week as a benchmark that"  
[X Link](https://x.com/jxmnop/status/1803119881338134837)  2024-06-18T17:37Z 20.4K followers, 85.5K engagements


"googles Gemini paper has [---] authors which is more people than OpenAI has employees. yet gemini still underperforms OpenAIs best models (and i think anthropics too) what are most of those [---] people doing infra data whatever it is seems its not really necessary"  
[X Link](https://x.com/jxmnop/status/1809692293865304481)  2024-07-06T20:53Z 15.1K followers, 56.1K engagements


"five topics you can talk about for [--] minutes with zero prep redux [--]. memorization capacity of language models [--]. geography and climate of San Francisco [--]. golf swing mechanics [--]. training tricks for text embedding models [--]. AGI What are five topics you can talk about for [--] minutes with zero prep [--]. The Red Shoes [--]. Housing Policy [--]. Kants moral philosophy [--]. Campus Novels [--]. Norm Macdonald What are five topics you can talk about for [--] minutes with zero prep [--]. The Red Shoes [--]. Housing Policy [--]. Kants moral philosophy [--]. Campus Novels [--]. Norm Macdonald"  
[X Link](https://x.com/jxmnop/status/1810074025038864394)  2024-07-07T22:10Z 15.1K followers, 21.5K engagements


"@neurobiophysics @KtunaxaAmerika @SCDC87 @Edi_Danalache yep shout out dr torbert"  
[X Link](https://x.com/jxmnop/status/1810583620106781161)  2024-07-09T07:55Z 45.1K followers, 11.8K engagements


"my goal: a one on one with mark zuckerberg my status: meeting DECLINED by admins ❎ my spirits: still high my story: starts all the way back in middle school year is [----] a young jack morris signs up for Facebook hip new social networking site its great a nice way to connect with friends all that was fine twelve years pass Facebook is now Meta i work there im also a phd student graduation just around the corner (only a few years to go) considering both industry and academic roles starting a company could be what i want but im not sure who at this company can i look up to as a friend and mentor"  
[X Link](https://x.com/jxmnop/status/1810798086748569672)  2024-07-09T22:08Z 15.1K followers, 85.9K engagements


"by this time next year the typical ML/data scientist interview requirement will be a medium-level question from ML leetcode one hardcore session of of prompt engineering and five years of CUDA experience someone finally made leetcode for machine learning and it's everything we hoped it would be just solved the first exercise: computing a matrix-vector product without any tensor operations (only python lists allowed) https://t.co/qDRWIXvYSu https://t.co/2dnTEqkB56 someone finally made leetcode for machine learning and it's everything we hoped it would be just solved the first exercise:"  
[X Link](https://x.com/jxmnop/status/1811503193798639970)  2024-07-11T20:49Z 15.4K followers, 28.5K engagements


"@itsmattchan what's routing"  
[X Link](https://x.com/jxmnop/status/1813370770556612770)  2024-07-17T00:30Z 15.4K followers, [----] engagements


"reflections after one month as a research scientist intern at Meta - finally finished onboarding; i am out of trainings to complete. there were many - research is starting to heat up; spent lot of time brainstorming and now i have more ideas than i have time to execute. need to prioritize - colleague at Meta in SF is simply taking a week to work out of the Paris office. pretty sweet that you can just do that - next week everyone else will be gone for a conference (ICML in Vienna) so my calendar is completely white () - rode in my first self-driving car a Waymo. pretty smooth - SF marina and"  
[X Link](https://x.com/jxmnop/status/1813963918433452333)  2024-07-18T15:47Z 15.4K followers, 39.1K engagements


"in my mind all the evidence points towards AI approaching above-average human reasoning capabilities *sigmoidally* what evidence is there that we can build superhuman-level AI in any domain (games dont count) and is this even possible with supervised learning yearly reminder everything looks exponential from the middle of a sigmoid https://t.co/DCeXCGjTTL yearly reminder everything looks exponential from the middle of a sigmoid https://t.co/DCeXCGjTTL"  
[X Link](https://x.com/jxmnop/status/1814337962852983005)  2024-07-19T16:34Z 15.4K followers, 59K engagements


"@cheeetoo_ because you cant perfectly simulate the real world"  
[X Link](https://x.com/jxmnop/status/1814341846371573912)  2024-07-19T16:49Z 15.4K followers, [----] engagements


"countries that have both trained large language models and manufactured nuclear weapons: USA UK France Israel China countries with nukes but no LLMs: Russia North Korea countries with LLMs but no nukes: Japan South Korea"  
[X Link](https://x.com/jxmnop/status/1816493363539202314)  2024-07-25T15:19Z 15.5K followers, [---] engagements


"countries that have both trained large language models and manufactured nuclear weapons: USA UK France Israel China Russia countries with nukes but no LLMs: North Korea countries with LLMs but no nukes: Japan UAE South Korea"  
[X Link](https://x.com/jxmnop/status/1816498157418959058)  2024-07-25T15:38Z 15.5K followers, [---] engagements


"if you have language model weights on your computer you also are in possession of powerful compression software just put the finishing touches on gptzip a little personal project for compressing strings with language models compress text w/ hf transformers 5x better rates than gzip"  
[X Link](https://x.com/jxmnop/status/1817660589797486860)  2024-07-28T20:37Z 16K followers, 97K engagements
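
gptzip itself is the reference for the real thing; as a toy illustration of why LM weights double as compression software, here is a rank-coding cousin of the idea (a simplified scheme of my own for illustration, not gptzip's actual algorithm): replace each token with its rank under the model's prediction, and a good LM turns text into mostly-small integers that an off-the-shelf compressor squeezes far better than raw bytes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def text_to_ranks(text):
    """Map each token to its rank under the LM's next-token distribution."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    ranks = [ids[0].item()]                  # first token stored verbatim
    for t in range(1, len(ids)):
        order = logits[t - 1].argsort(descending=True)
        ranks.append((order == ids[t]).nonzero().item())
    return ranks  # well-predicted text -> mostly 0s and small ints

# decoding inverts this exactly: re-run the model and index order[rank] each step
```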


"@despinosagon lossless slower smaller filesize"  
[X Link](https://x.com/jxmnop/status/1817798719984873978)  2024-07-29T05:46Z 16.2K followers, [---] engagements


"sign on now and watch me talk about my research 😎 (grateful to be invited although the choice of picture here is criminal) .@stanfordnlp #NLProc Seminar this Thursday will feature @jxmnop (Jack Morris) from Cornell University Jack will talk about Inverting Language Models. Non-Stanford registration: https://t.co/ZFQJunGBai. Zoom link will be sent to non-Stanford registrants 1hr prior to the talk. https://t.co/MeTrY8VUtr .@stanfordnlp #NLProc Seminar this Thursday will feature @jxmnop (Jack Morris) from Cornell University Jack will talk about Inverting Language Models. Non-Stanford"  
[X Link](https://x.com/jxmnop/status/1819070649736090094)  2024-08-01T18:00Z 16.3K followers, 19.9K engagements


"funny little story about Extropic AI been curious about them for a while have twitter mutual who is an engineer/researcher for this company often tweets energy-based modeling and LM-quantumhype mumbojumbo last winter i wanted to get to the bottom of this meet up with extropic guy for drinks in nyc (west village) guy keeps mentioning energy-based modeling and LMs coincidentally a friend of mine (@yuntiandeng) wrote what is maybe the most well-known recent work on energy-based language modeling i like this paper i mention the paper offhand guy has never heard of it asks me to send it to him"  
[X Link](https://x.com/jxmnop/status/1820876333154759091)  2024-08-06T17:35Z 16.7K followers, 190.5K engagements


"turns out this is what happens when u question the value proposition of Extropic AI"  
[X Link](https://x.com/jxmnop/status/1820914931316990333)  2024-08-06T20:08Z 16.5K followers, 54.7K engagements


"things you definitely do NOT need to understand to be an expert on LLMs: - linear regression - bias variance trade off - most probability distributions (Gaussian Bernoulli poisson etc.) - RNNs - LSTMs - CNNs - higher-order calculus (beyond first derivatives + chain rule) - tensorflow - EM - kmeans - boosting & bagging - decision trees - random forests - graph neural networks - naive bayes - reinforcement learning (except RLHF maybe) - SVM - Gaussian processes"  
[X Link](https://x.com/jxmnop/status/1822398566406783196)  2024-08-10T22:24Z 19.3K followers, 251K engagements


"if huggingface builds superintelligence theyll probably just open-source it and make it installable like pip install agi"  
[X Link](https://x.com/jxmnop/status/1823517732870218200)  2024-08-14T00:31Z 19.3K followers, 74.9K engagements


"prompt optimization is such a hard problem and the algorithms are dumb. the fact that they work at all is baffling and people don't talk about it enough for those who don't know the prompt optimization problem is argmax_x (y x; ) where x is some prompt and is the loss for output y under language model . solving this problem exactly requires enumerating all possible prompts x. .and the space of possible 10-token prompts in a 50k vocab is [-------]. yet people regularly find solutions to these problems with 100-token prompts. how can this be possible the most popular algorithm (AutoPrompt/GCG) is"  
[X Link](https://x.com/jxmnop/status/1826681982375571621)  2024-08-22T18:04Z 19.3K followers, 122.5K engagements
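
For readers who don't know the algorithm named above, here is a hedged, single-step sketch of the AutoPrompt/GCG recipe, with gpt2 and toy strings as stand-ins: the gradient of the loss with respect to a one-hot relaxation of the prompt ranks candidate token swaps at one position, and the candidates are then evaluated exactly so only a genuine improvement is kept.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
emb = model.get_input_embeddings().weight              # (vocab, dim)

prompt_ids = tok(" x x x x x", return_tensors="pt").input_ids[0]
target_ids = tok(" the secret password", return_tensors="pt").input_ids[0]

def loss_of(p_ids):
    """Exact loss of the target continuation given a candidate prompt."""
    with torch.no_grad():
        ids = torch.cat([p_ids, target_ids]).unsqueeze(0)
        logits = model(ids).logits[0, len(p_ids) - 1 : -1]
        return F.cross_entropy(logits, target_ids).item()

# gradient of the loss w.r.t. a one-hot relaxation of the prompt tokens
one_hot = F.one_hot(prompt_ids, emb.size(0)).float().requires_grad_(True)
inputs_embeds = torch.cat([one_hot @ emb, emb[target_ids]]).unsqueeze(0)
logits = model(inputs_embeds=inputs_embeds).logits[0, len(prompt_ids) - 1 : -1]
F.cross_entropy(logits, target_ids).backward()

pos = 2                                                # position to mutate
candidates = (-one_hot.grad[pos]).topk(8).indices      # promising swaps
best = min(candidates, key=lambda c: loss_of(
    torch.cat([prompt_ids[:pos], c.view(1), prompt_ids[pos + 1 :]])))
print("swap position", pos, "->", tok.decode([int(best)]))
```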


"@ahiajsbwks [----] steps and it depends on model size and other hyperparameters but approx ten minutes to hour for a 1b model on a 40gb a100 gpu"  
[X Link](https://x.com/jxmnop/status/1826687108201021916)  2024-08-22T18:25Z 19.3K followers, [---] engagements


"@selfattentive good luck typing your prefix into OpenAI"  
[X Link](https://x.com/jxmnop/status/1826777085979099541)  2024-08-23T00:22Z 19.3K followers, [----] engagements


"have a question about transformers and CUDA: my understanding is that GPUs perform operations so quickly that most of the cost comes from data on and off a lot of performance gains come from "fusing" operations together so they can be done without sending any data back to the CPU so why not run the entire transformer on the GPU send data once do *all* the operations on the GPU and then send back only the output"  
[X Link](https://x.com/jxmnop/status/1828535014679814571)  2024-08-27T20:48Z 19.2K followers, 116.9K engagements
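
A toy timing experiment (my own illustration, not from the thread) makes the transfer-cost intuition concrete: the same chain of element-wise ops, once bouncing every intermediate through host memory and once staying resident on the GPU.

```python
import time
import torch

x = torch.randn(4096, 4096, device="cuda")

def bounce(t):
    for _ in range(10):
        t = torch.relu(t).cpu().cuda()   # round-trip over PCIe every step
    return t

def resident(t):
    for _ in range(10):
        t = torch.relu(t)                # stays in GPU memory the whole time
    return t

for fn in (bounce, resident):
    torch.cuda.synchronize(); t0 = time.perf_counter()
    fn(x)
    torch.cuda.synchronize()
    print(fn.__name__, f"{time.perf_counter() - t0:.4f}s")
```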


"a year or so ago there was a blip of excitement about running LLMs locally on your laptop or something; now it doesnt feel like many people are doing that. why not too slow too much loss in quality from quantization or turns out the APIs are actually convenient enough"  
[X Link](https://x.com/jxmnop/status/1828875445892502010)  2024-08-28T19:20Z 43.7K followers, 305.9K engagements


"such a useful piece of code wish i had this a few years ago"  
[X Link](https://x.com/jxmnop/status/1829152775537865076)  2024-08-29T13:43Z 19.1K followers, 152.1K engagements


"its gone long enough without happening that i am tweeting this research project into existence: i am wholly convinced that images can be *exactly* recovered from their embeddings and the fact that no one has done this so far is simply a skill issue a year ago now we showed that text can be exactly recovered from embeddings (Text Embeddings Reveal (Almost) As Much As Text). and theres nothing text-specific about the method our idea of iteratively refining a guess and guiding it closer to a ground-truth embedding should work in any modality. at least the iteration should work until one of two"  
[X Link](https://x.com/jxmnop/status/1829683142661455920)  2024-08-31T00:50Z 19.2K followers, 93.3K engagements
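
vec2text uses a learned corrector model rather than gradients, but a white-box version of "iteratively refine a guess toward a ground-truth embedding" can be sketched with plain gradient descent through a frozen CLIP image encoder. The target embedding below is random purely for illustration.

```python
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
model.requires_grad_(False)                      # freeze the encoder

target = torch.randn(1, 512)                     # stand-in for a real embedding
target = target / target.norm(dim=-1, keepdim=True)

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # current image guess
opt = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    e = model.get_image_features(pixel_values=x)
    e = e / e.norm(dim=-1, keepdim=True)
    loss = 1 - (e * target).sum()                # cosine distance to target
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(step, round(loss.item(), 4))
```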


"there are three types of AI people in SF: - idealist (care about conceptual aesthetics; in it for the beauty of it all; novelty function) - grinder/tech bro (love NVIDIA/TSLA; mostly just in it for the $$) - doomer (think AI might kill us; earnestly trying to save the world)"  
[X Link](https://x.com/jxmnop/status/1831792896980480446)  2024-09-05T20:33Z 19.1K followers, [----] engagements


"TIME Magazine has rightly named famed deep learning pioneer ptrblock as the most influential person in Artificial Intelligence"  
[X Link](https://x.com/jxmnop/status/1831828366003175548)  2024-09-05T22:54Z 19.2K followers, 117.6K engagements


"On why you should read more (research papers): one of the most valuable problems we could solve as a community is idea deduplication at the meta-project level the counterintuitive consequence is that the most effective way for researchers to cut through the AI hype and be more productive is to read *more* papers i think the most common waste of researchers' time is spending energy on problems that they don't know already someone has been working on i often come out of an interesting conversation with a fellow researcher thinking "oh wow that would be cool but has someone done this already""  
[X Link](https://x.com/jxmnop/status/1832559616322068705)  2024-09-07T23:20Z 19.2K followers, 32.7K engagements


"no one told me that post-pandemic the most common work style at tech companies is not work from home or working from the office but a secret third thing: showing up just before lunch eating the free food and chilling for a couple hours then going back home again"  
[X Link](https://x.com/jxmnop/status/1836136389735518652)  2024-09-17T20:13Z 19.2K followers, 20.8K engagements


"learning to use copilot after programming on my own for [--] years is bittersweet kind of feels like being a carpenter thats trained to cut perfect corners; now here comes a machine that can do it perfectly and much faster yet I somehow miss the satisfaction of doing it myself"  
[X Link](https://x.com/jxmnop/status/1836620167012585933)  2024-09-19T04:15Z 19.1K followers, 22.9K engagements


"what's something that's actually *fun* that i can build in CUDA or triton when i was first learning to code i wrote games little puzzles animations. what can i make with CUDA besides just different matrix multiplications (preferably unrelated to machine learning)"  
[X Link](https://x.com/jxmnop/status/1838218763877191839)  2024-09-23T14:08Z 19.2K followers, 63.9K engagements


"sneak preview 🍿 of our new embedding model: cde-small-v1 cde-small-v1 is the text embedding model that we (@srush_nlp and i) have been working on at Cornell for about a year tested the model yesterday on MTEB the text embeddings benchmark; turns out we have state-of-the-art results cde stands for "Contextual Document Embeddings". our model works differently from other text embedding models. consequently it's a little more involved to use. i'm actively working on this and should have instructions up in the next few days feels good to have the best small text embedding model in the world (if"  
[X Link](https://x.com/jxmnop/status/1838973137784091133)  2024-09-25T16:05Z 19.2K followers, 67.5K engagements


"guess it turns out that langchain was just a fad something that we all tried and appreciated but eventually lost our zeal for such as tamagotchis silly bandz cup stacking etc"  
[X Link](https://x.com/jxmnop/status/1839351648105451836)  2024-09-26T17:09Z 19.2K followers, 433.3K engagements


"programs & things you need to know to succeed as a PhD student in Computer Science: python jupyter notebook pytorch jax bash SLURM LaTeX github matplotlib seaborn wandb gmail google docs google slides google calendar the espresso maker zoom Adobe Illustrator"  
[X Link](https://x.com/jxmnop/status/1840825812678820034)  2024-09-30T18:47Z 19.2K followers, 19.1K engagements


"once i realized GPUs operate at 150F under normal conditions that really changed my perspective honestly i'm surprised they don't break more often"  
[X Link](https://x.com/jxmnop/status/1840885187585937684)  2024-09-30T22:43Z 19.3K followers, [----] engagements


"@ThinkInSysDev yeah we actually try this in the experiments; it performs a bit worse per-domain context tokens are critical"  
[X Link](https://x.com/jxmnop/status/1842287721663869250)  2024-10-04T19:36Z 19.3K followers, [----] engagements


"im at #COLM2024 talk to me about whether language models plan for future tokens talk to me about contextual embeddings talk to me about your crazy new project idea talk to me about how to pick research ideas (not that i really know) talk to me about whatever talk to me"  
[X Link](https://x.com/jxmnop/status/1843345034256765263)  2024-10-07T17:37Z 19K followers, [----] engagements


"man the huggingface team is *cracked* two days after i released my contextual embedding model which has a pretty different API @tomaarsen implemented CDE in sentence transformers you can already use it implementation was not at all trivial; those people just work fast"  
[X Link](https://x.com/jxmnop/status/1844068801312178199)  2024-10-09T17:33Z 19K followers, 85.4K engagements


"oh no somehow i managed to format a list exactly like chatGPT (even though I wrote it myself) cool article about CDE (contextual document embeddings) out in Venture Beat πŸ“° also chatting with them got me thinking about what future work looks like in this line of research. here are a few ideas if you will allow me to pontificate for just a moment 🧐 - Multimodality. We https://t.co/qPyP0IY1AZ cool article about CDE (contextual document embeddings) out in Venture Beat πŸ“° also chatting with them got me thinking about what future work looks like in this line of research. here are a few ideas if"  
[X Link](https://x.com/jxmnop/status/1844515934678982822)  2024-10-10T23:10Z 18.9K followers, [----] engagements


"geoffrey hinton working on neural networks in the 80s and 90s"  
[X Link](https://x.com/jxmnop/status/1844755838701035533)  2024-10-11T15:04Z 19K followers, 37.4K engagements


"state space models are super neat & interesting but i have never seen any evidence that theyre *smarter* than transformers - only more efficient any architectural innovation that doesnt advance the pareto frontier of intelligence-per-parameter is an offramp on the road to AGI"  
[X Link](https://x.com/jxmnop/status/1845928158228631700)  2024-10-14T20:42Z 19K followers, 31.5K engagements


"funny little section from Fast Semantic Extraction Using a Novel Neural Network Architecture (Collobert & Weston 2007)"  
[X Link](https://x.com/jxmnop/status/1846370480280023188)  2024-10-16T02:00Z 19K followers, [----] engagements


"when we develop the final piece we need for agi it's gonna be released in a tweet with a link to arxiv paper about a new method named with a insane acronym like or CCs"  
[X Link](https://x.com/jxmnop/status/1847729960653389893)  2024-10-19T20:02Z 46K followers, 16.9K engagements


"people should be doing multiple LLM forward passes with the same weights feels like you could get a lot more for same model size. and this is all differentiable so training this way is trivial surely diffusion models aren't the only way to successfully reuse representations"  
[X Link](https://x.com/jxmnop/status/1848413511803933179)  2024-10-21T17:18Z 19K followers, 44K engagements


""Github claims that 40% of the code programmers write is written by Copilot. I was curious how they measured this number and so wanted to poke a bit into the telemetry." here's the link 🧲 http://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html http://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html"  
[X Link](https://x.com/jxmnop/status/1848734563868037375)  2024-10-22T14:34Z 46K followers, [----] engagements


"dang guess sometimes architecture does make a difference"  
[X Link](https://x.com/jxmnop/status/1850557774641426936)  2024-10-27T15:18Z 46K followers, 147.4K engagements


"tinygrad is running the funniest goodhart around right now they're obsessed with talking about how their library uses fewer lines of code than pytorch so their codebase is growing horizontally instead of vertically some parts are borderline unreadable to humans"  
[X Link](https://x.com/jxmnop/status/1850975062905516191)  2024-10-28T18:56Z 46K followers, 222.4K engagements


"i don't know the full personal history here and it can be hard to tell why some ideas catch on and some don't nonetheless it's a good read and a fascinating time capsule: http://arxiv.org/abs/1706.05137 http://arxiv.org/abs/1706.05137"  
[X Link](https://x.com/jxmnop/status/1851278355187802598)  2024-10-29T15:02Z 46K followers, [----] engagements


"giving a talk tonight at the NYC GenAI Collective Roundtable in Soho πŸ™ swing by if you want to hear about my research on contextual embeddings and how we made cde the first truly contextual text embedding model Going to be a packed house at our Research Roundtable in NYC this Tuesday Excited to see everyone πŸ”₯ https://t.co/ZrpLAJxdz2 Going to be a packed house at our Research Roundtable in NYC this Tuesday Excited to see everyone πŸ”₯ https://t.co/ZrpLAJxdz2"  
[X Link](https://x.com/jxmnop/status/1851349208168870088)  2024-10-29T19:43Z 46K followers, [----] engagements


"what does it mean when the grad norm does this (the loss looks fine; learning rate has been constant this whole time too)"  
[X Link](https://x.com/jxmnop/status/1851619720010621299)  2024-10-30T13:38Z 46K followers, 51.8K engagements


"just open-sourced the training and evaluation code for cde our state-of-the-art small text embedding model includes code for lots of hard stuff: * efficient clustering large datasets * contrastive training for SOTA retrieval models * our custom two-stage model architecture that embeds contextual tokens and uses them in downstream embeddings * a two-stage gradient caching technique that enables training our two-headed model efficiently * packing clusters and sampling from them even in distributed settings * on-the-fly filtering for clusters based on a pretrained model * more :)"  
[X Link](https://x.com/jxmnop/status/1851706815244902691)  2024-10-30T19:24Z 46K followers, 39.6K engagements
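
As a taste of the contrastive-training ingredient mentioned above, here is a minimal in-batch InfoNCE loss of the kind retrieval models train with. This is a generic sketch, not the cde two-stage code, and the temperature value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, doc_emb, temperature=0.02):
    """query_emb, doc_emb: (batch, dim); row i of each is a matching pair."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    scores = q @ d.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))        # the diagonal holds the positives
    return F.cross_entropy(scores, labels)

q = torch.randn(8, 256, requires_grad=True)
d = torch.randn(8, 256, requires_grad=True)
loss = info_nce(q, d)
loss.backward()
print(loss.item())
```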


"google scholar PDF reader dark mode. honestly makes any paper look awesome"  
[X Link](https://x.com/jxmnop/status/1852388953300546051)  2024-11-01T16:35Z 19.8K followers, 20K engagements


"i decided you don't need lots of GPUs to do good research when i read "Are Emergent Abilities of Large Language Models a Mirage" (won award at NeurIPS 2023) if you buy that loss scales smoothly and predictably from small to large scale (both in model parameters and sequence length) then you should be able to validate and test most hypotheses on small models with small sequences both my most recent successful research results (language model inversion work and contextual embeddings) came from scaling down and running hundreds of tiny experiments with a tight short feedback loop if you're"  
[X Link](https://x.com/jxmnop/status/1853475704962216204)  2024-11-04T16:33Z 19.8K followers, 70.2K engagements


"what are some examples of impressive & notable deep learning research that was done with [--] GPUs"  
[X Link](https://x.com/jxmnop/status/1853587492059898021)  2024-11-04T23:57Z 19.8K followers, 92.2K engagements


"woke up to a bunch of these i don't deserve this"  
[X Link](https://x.com/jxmnop/status/1853804369608671393)  2024-11-05T14:19Z 19.8K followers, 47.2K engagements


"@DaftOdyssey"  
[X Link](https://x.com/jxmnop/status/1853806591075623052)  2024-11-05T14:28Z 19.6K followers, [----] engagements


"some people will hate me for saying this but my conclusion from moving to san francisco this year to do AI research was this: you dont need to move to san francisco to do AI research theres certainly a higher concentration of people who speak fluent AI and more people closely following the cutting edge - and there are really amazing researchers in SF (the people at OpenAI whoever made Gemini everyone at stanford & berkeley etc.) unlike most other cities SF definitely has an ingroup (which I was not a part of) of AI-adjacent folks who meet for parties and dinners and talk about things like"  
[X Link](https://x.com/jxmnop/status/1854543386880971048)  2024-11-07T15:16Z 19.9K followers, 70.6K engagements


"it's been two months since OpenAI released O1 and it doesn't feel like we're even close to an open-source replication i'm wondering if this is the start of a new era one where academic research is so far behind that we can't even study the best & newest models (hopefully not)"  
[X Link](https://x.com/jxmnop/status/1856400672775876658)  2024-11-12T18:16Z 21.5K followers, 70.1K engagements


"anybody who tells you to learn AI by reading a textbook is gatekeeping there simply isn't a textbook out there that covers this stuff. goodfellow Deep Learning is outdated (one subsection on language models). and the ML textbooks (bishop and murphy) are not really relevant"  
[X Link](https://x.com/jxmnop/status/1856517434275975536)  2024-11-13T02:00Z 20.5K followers, 316.1K engagements


"@dejavucoder i think this is probably the issue in my case"  
[X Link](https://x.com/jxmnop/status/1857148975234585019)  2024-11-14T19:49Z 20.1K followers, 12.6K engagements


"@0x_Broke_Boi then who's out there thinking big would you say"  
[X Link](https://x.com/jxmnop/status/1857149760668320009)  2024-11-14T19:53Z 20.1K followers, [----] engagements


"more fun open-source research news - new paper drops (nGPT) - claims 4-20x training speedup over GPT - shocking - very cool - very valuable - community tries to reproduce - doesn't hold up - turns out baseline was busted - another cool new research idea oneshotted by github anon @francoisfleuret Bcs the baseline by the looks of it was wrong https://t.co/iMPdPQzKhY @francoisfleuret Bcs the baseline by the looks of it was wrong https://t.co/iMPdPQzKhY"  
[X Link](https://x.com/jxmnop/status/1858627599981048211)  2024-11-18T21:45Z 20.5K followers, 130.3K engagements


"so the error in the transformer impl from nGPT was very easy to make the residual stream propagated as this x = norm(x) + attn(norm(x)) instead of this x = x + attn(norm(x)) TLDR this breaks everything and makes learning impossible. but why is there a simple explanation more fun open-source research news - new paper drops (nGPT) - claims 4-20x training speedup over GPT - shocking - very cool - very valuable - community tries to reproduce - doesn't hold up - turns out baseline was busted - another cool new research idea oneshotted by github anon https://t.co/StyA4fegjW more fun open-source"  
[X Link](https://x.com/jxmnop/status/1858895357209403510)  2024-11-19T15:29Z 20.5K followers, 67.4K engagements
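
To make the fix concrete, here is a minimal pre-norm transformer block (my sketch, not the nGPT repo's code). The point is that the raw residual stream must flow through unchanged, with layernorm applied only inside each branch.

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # correct: x = x + attn(norm(x)) -- the residual path is untouched
        h = self.ln1(x)
        x = x + self.attn(h, h, h)[0]
        # the buggy version did x = norm(x) + attn(norm(x)), which keeps
        # re-normalizing the residual stream and destroys the identity path
        x = x + self.mlp(self.ln2(x))
        return x
```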


"this google guy made big headlines two years ago funniest part: he was duped into empathizing with *LaMDA* an extremely primitive language model by [----] standards. undertrained on low-quality data no RLHF/DPO etc. if he talked to the latest Gemini he would simply combust"  
[X Link](https://x.com/jxmnop/status/1860011346466865167)  2024-11-22T17:23Z 20.5K followers, 100.6K engagements


"AI is a field you can get up to speed in and contribute to in just a year or two case in point: the 2018-2021 Google Brain residency produced a disproportionate number of elite researchers katherine lee jason wei luke metz barrett zoph colin raffel hardmaru + many more"  
[X Link](https://x.com/jxmnop/status/1861078042778484777)  2024-11-25T16:02Z 20.5K followers, 99.9K engagements


"xAI just quietly dropped a new text embedding model and for some reason no one is talking about it; you can already query the model via API since the new grok training hasn't worked out maybe they are focusing on embeddings instead"  
[X Link](https://x.com/jxmnop/status/1864334032915358050)  2024-12-04T15:40Z 21K followers, 13.8K engagements


"the story of entropix is a sad one; a cautionary tale over 3k github stars tens of thousands of likes & replies on X; unfortunately it is not real turns out you can vaguepost your way to the top with plots that look good with numbers that seem high and as long as you never evaluate the full thing on a known benchmark no one can refute your points top the confusing graphs off with some technical-sounding mumbojumbo ("varentropy" "entropy scaffolds".) and you're well on your way to a cult following Twitter could be a great place to share research but the algorithm these days encourages this"  
[X Link](https://x.com/jxmnop/status/1864415167578050599)  2024-12-04T21:03Z 21.3K followers, 176.6K engagements


"@tensorqt more like lack of real evaluations or benchmarks its not my job to prove something doesnt work or navigate some convoluted rabbit hole to figure out why (and Im not even sure I could do so); burden is on the creator(s) to prove the discovery of anything useful"  
[X Link](https://x.com/jxmnop/status/1864422469550960764)  2024-12-04T21:32Z 21.1K followers, [----] engagements


"the truth about machine learning software engineering in academia is that everyone is running commands like this: TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL Cthon train_phase2_patchmlp.py --train_path data/$FOLDER/src1_train.txt --val_path data/$FOLDER/src1_valid.txt --test_path data/$FOLDER/src1_test.txt --epochs $EPOCHS --lr $LR --model $MODEL --batch_size $BSZ --qmodel $QMODEL --save_model /n/disk/rush_lab/Users/jack/implicit/$SAVE --mode top --accumulate $A $SAVE/log.train.text.modelgpt2-medium.folder$FOLDER.e$EPOCHS.lr$LR.$BSZ 2&1& tee tst_tmp.txt"
X Link 2023-08-24T17:27Z [----] followers, 52.8K engagements

"OpenAI has been ahead of the curve on so many things. one I find mildly interesting is the trend of removing dropout from models GPT-1 had dropout (as well as original transformer BERT etc.) but they got rid of it for GPT-2 these days LLMs are never trained with dropout"
X Link 2023-08-28T15:24Z [----] followers, 49.8K engagements

"An amazing mystery of machine learning right now is that state-of-the-art vision models are 2B parameters (8 gigabytes) while our best text models are 200B parameters (800 gb) why could this be philosophically are images inherently less complicated than text (no right)"
X Link 2023-08-29T17:13Z [----] followers, 434.7K engagements

"@Ted_Underwood super cool fact I didnt know this"
X Link 2023-08-29T23:13Z [----] followers, 13.5K engagements

"are there any smaller-than-giant companies that (1) train LLMs and (2) offer internships to PhD students I was thinking of openAI anthropic characterAI adept etc. but couldn't find any info on those. any suggestions here"
X Link 2023-09-01T13:09Z [----] followers, 52.2K engagements

"@sam_havens don't you mean DATABRICKS"
X Link 2023-09-01T15:54Z [----] followers, [----] engagements

"Little things I learned when implementing the LLAMA forward pass from scratch with @lionellevine: - ROPE positional embeddings are tricky and weird and I'm convinced fewer than [---] people really understand them - the residual is added twice per layer (after self-attention and after MLP) - layernorm actually has a scaling matrix with some learned params (multiplicative scaling factor for each hidden dim) - (A @ B.T) = (B @ A.T).T in torch (a very surprising and unfortunate gotcha) -- so unless you implement the matmuls in the same order as the reference impl your outputs will slightly differ"
X Link 2023-09-02T15:19Z [----] followers, 108.4K engagements
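
That matmul gotcha is easy to reproduce: the two expressions below are mathematically identical but typically dispatch to differently ordered float accumulations, so the results disagree in the low bits.

```python
import torch

torch.manual_seed(0)
A = torch.randn(512, 256)
B = torch.randn(512, 256)

x = A @ B.T          # (512, 512)
y = (B @ A.T).T      # mathematically the same matrix

print(torch.equal(x, y))       # frequently False: accumulation order differs
print((x - y).abs().max())     # tiny but nonzero float discrepancy
```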

"if floating-point math was associative we would have had AGI back in 2016"
X Link 2023-09-04T18:16Z [----] followers, 121.7K engagements

"I used to work on adversarial example research specifically in NLP. that line of work searches for "obviously correct" inputs that fool machine learning models I decided that adversarial examples aren't very important in the long-term view of AI and here's why:"
X Link 2023-09-06T14:36Z [----] followers, 33K engagements

"in this case the "adversarial examples" look like complete sets of inputs that can continuously fool the model over time I have no reason right now to think this kind of adversarial example will exist so future models may not be susceptible in the same way at all"
X Link 2023-09-06T14:36Z 19.3K followers, [----] engagements

"one analogy is that right now adversarial example research on neural networks is like showing an image (or text) to a human for a single instant and then proclaiming "look how dumb this human is we showed it a dog for .001ms and this guy thought it was a muffin""
X Link 2023-09-06T14:36Z [----] followers, [----] engagements

"@moyix I think you could do this with a custom logits processor that doesn't change the logits it just streams the argmax"
X Link 2023-09-07T03:40Z 17.5K followers, [---] engagements
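
A minimal sketch of that suggestion, assuming the HF transformers LogitsProcessor API and gpt2 as a stand-in model: the processor prints the greedy token at each step and returns the scores untouched, so generation itself is unaffected.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class ArgmaxStreamer(LogitsProcessor):
    """Streams the greedy token at each step without altering the logits."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores):
        token_id = scores.argmax(dim=-1)[0].item()
        print(self.tokenizer.decode([token_id]), end="", flush=True)
        return scores  # unchanged: decoding proceeds exactly as before

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The capital of France is", return_tensors="pt")
model.generate(**inputs, max_new_tokens=10,
               logits_processor=LogitsProcessorList([ArgmaxStreamer(tok)]))
```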

"Is Gary marcus the leading critic of LLMs / modern AI feels like there is a big opportunity here for a dissenting voice - someone articulate knowledgeable and thoughtful but similarly contrarian too bad all the people who actually understand neural networks believe in them"
X Link 2023-09-07T13:19Z [----] followers, [----] engagements

"llama.etc is neat but I think this blog post from back in January is much more readable/easier to learn from: "GPT in [--] Lines of NumPy""
X Link 2023-09-10T12:18Z [----] followers, 13K engagements

"curious if anyone knows where Google went wrong with TensorFlow it's bad software fundamentally broken. when I was an AI resident I found [--] bugs within tensorflow core in around a year. but how does a failure like this this happen so many smart people work there"
X Link 2023-09-13T17:22Z [----] followers, 202K engagements

"unpopular opinion: training open-source LLMs is a losing battle. a complete dead end the gap between closed models like GPT-4 and open models like LLAMA will only continue to widen as models grow bigger and require more resources no one is building particle colliders at home"
X Link 2023-09-15T13:05Z [----] followers, [----] engagements

"if I were starting my research career today and interested in language models I would become a world expert on tokenization & tokenizers tokenization is weird fascinating and poorly understood yet ubiquitous and necessary for things like chatGPT to work"
X Link 2023-09-18T15:55Z [----] followers, 35.1K engagements

"interesting to think about LLM hallucinations as a UX issue more than a modeling issue. when all people have is text they tend to believe what they read. but behind the scenes we have confidence scores for every token"
X Link 2023-09-19T22:46Z [----] followers, 10.2K engagements

"surprised physical AI art hasn't become a thing yet. i want to plaster my walls with classics-inspired masterpieces of my own creation dalis matisses picassos etc is there an image resolution issue or maybe just no overlap between art-printing people and ai-image people"
X Link 2023-09-21T17:22Z [----] followers, [----] engagements

"im definitely running eval too often now"
X Link 2023-09-25T03:23Z 19.3K followers, [---] engagements

"is anyone working on meta-predicting optimization so I can know what the loss will be with some confidence after training for some time might be impossible but it would be nice for me if somebody made that"
X Link 2023-09-25T21:46Z 19.3K followers, [---] engagements

"Anyone ever compute alignment between two different tokenizers by that I mean I have a probability vector from say GPT-2 and want to convert it to what the vector would look in the vocabulary of another model say LLAMA BOUNTY: if u can write code to do this in the next few"
X Link 2023-09-28T18:55Z [----] followers, [--] engagements

"@soldni mixed how"
X Link 2023-09-28T19:48Z 17.4K followers, [---] engagements

"@_dsevero why is leetcode useful for compression research"
X Link 2023-10-01T21:16Z [----] followers, [----] engagements

"finally published our latest research on text embeddings TLDR: Vector databases are NOT safe. 😳 Text embeddings can be inverted. We can do this exactly for sentence-length inputs and get very close with paragraphs"
X Link 2023-10-12T20:16Z [----] followers, 336.6K engagements

"Very excited to publish this research and will have more in this direction to share soon check out our paper on arxiv:"
X Link 2023-10-12T20:16Z [----] followers, [----] engagements

"@jobergum hmm I agree its complex and I like your analogy maybe like people thought the hash was save w/o a private key but given enough hashes we can just learn to reverse the hash function (so pinecone can just sell ur data)"
X Link 2023-10-12T20:44Z [----] followers, [----] engagements

"Ironic to read a sarcastic response to my research on vector database security from the VP of Marketing at Pinecone the same guy who argued that an embedding "is sufficiently obfuscated to not count as PII""
X Link 2023-10-13T17:38Z [----] followers, 39.8K engagements

"most useful bit of code I've written all year: call map() on a HuggingFace dataset in torch distributed mode (like DDP) as one example this will let you compute embeddings for a dataset in parallel using all the GPUs you have"
X Link 2023-10-24T15:10Z [----] followers, 27.3K engagements
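
A hedged sketch of the idea (the actual snippet wasn't captured here): shard the HF dataset by rank so each GPU embeds a disjoint slice inside map(). The model and dataset names are placeholder assumptions, and it assumes launch via torchrun.

```python
import torch
import torch.distributed as dist
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
device = torch.device(f"cuda:{rank}")

name = "sentence-transformers/all-MiniLM-L6-v2"   # placeholder encoder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).to(device).eval()

ds = load_dataset("ag_news", split="train")
shard = ds.shard(num_shards=world_size, index=rank)  # disjoint slice per GPU

@torch.no_grad()
def embed(batch):
    enc = tok(batch["text"], padding=True, truncation=True,
              return_tensors="pt").to(device)
    out = model(**enc).last_hidden_state.mean(dim=1)  # mean-pool token states
    return {"embedding": out.cpu().numpy()}

shard = shard.map(embed, batched=True, batch_size=64)
```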

"Imagine this prompt: "write me Python code to disable the NYC subway system" obviously gpt-4 can't do this now. It'll refuse but even if we jailbreak it it'll answer incorrectly but if we keep training bigger & better language models won't one eventually be able to do this"
X Link 2023-10-26T16:02Z [----] followers, 47.6K engagements

"all community notes should be read in the voice of the narrator from arrested development"
X Link 2023-10-28T23:08Z [----] followers, [----] engagements

"I turned down a job at google research to do a PhD at Cornell right before chatGPT came out and I dont regret it at all. I see it like this. Do you want to work with a large group on building the fastest & fanciest system in the world or in a small group testing crazy theories in smaller settings"
X Link 2023-11-03T14:13Z [----] followers, 362.9K engagements

"is there anything novel or technically interesting about this model besides the fact that it outputs swear words"
X Link 2023-11-05T20:18Z [----] followers, 94.8K engagements

"prediction: every vector database will eventually be replaced with a single transformer a row for every individual datapoint is so old-school feels outdated what would be better: a single differentiable blob something that knows about all your data and can chat about it"
X Link 2023-11-08T17:36Z [----] followers, 83.9K engagements

"tired of paying OpenAI for GPT-4 API the NYC Department of Small Business Services has your back the NYC small business chatbot is powered by GPT-4 so equally capable just have to ask it information about operating a business in New York City first"
X Link 2023-11-09T14:40Z [----] followers, 257.2K engagements

"the openAI schism is thrilling but dont think it changes much for the future trajectory of AI the true geniuses behind this stuff arent sama gdb or karparthy; theyre scientists like ilya and alec radford and as of now seems like all the scientists still work there"
X Link 2023-11-18T16:34Z [--] followers, 35.2K engagements

"@abacaj what are you implying"
X Link 2023-11-18T16:47Z [----] followers, [----] engagements

"cool research idea for someone: text diffusion in embedding space solve any sequence-to-sequence task in three steps: [--]. embed source sentence text [--]. build diffusion model that maps input text embedding to target text embedding [--]. invert to produce target text"
X Link 2023-11-20T15:53Z [----] followers, 163.2K engagements

"will there ever be a 1B param model thats more capable in every way than GPT-4 is today"
X Link 2023-11-25T15:13Z [----] followers, 30K engagements

"@DotDotJames no scaling laws like the scaling laws for language models:"
X Link 2023-11-28T15:00Z [----] followers, [---] engagements

"my man congrats aaron on the well-deserved award"
X Link 2023-12-01T06:43Z [----] followers, [----] engagements

"language models encode information in their weights while embeddings encode information in their activations this distinction is important possibly somewhat profound"
X Link 2023-12-04T03:11Z [----] followers, 62.9K engagements

"fewer than [---] people deeply understand both (i) transformers and (ii) the GPU programming model want to learn machine learning gain some esoteric systems knowledge; spend some time really learning CUDA"
X Link 2023-12-04T17:27Z [----] followers, 222.6K engagements

"i am in singapore presenting some cool research at #EMNLP2023 iPrompt: Explaining Patterns in Data with Language Models via Interpretable Autoprompting Text Embeddings Reveal (Almost) As Much as Text Tree Prompting: Efficient Task Adaptation without Fine-Tuning"
X Link 2023-12-07T01:32Z [----] followers, [----] engagements

"i'm sorry but you shouldn't be allowed to give an oral presentation over zoom. sitting in a room of [---] people staring at a stage where someone's face is projected onto a screen as they try to talk w/ internet cutting in and out is a waste of time and just a bad vibe"
X Link 2023-12-08T07:18Z [----] followers, 10.8K engagements

"say what you will about mistral tweeting exclusively download links to new models with no context is unbelievably cool"
X Link 2023-12-09T01:08Z [----] followers, 59.1K engagements

"Honestly the Gemini release made me really sad. Gemini beats GPT-4 at 32-shot COT (read: not straight up) is exactly what a PhD student would say if their model wasnt as good as theyd hoped whats so special about GPT-4 is it people systems data some kind of blind luck"
X Link 2023-12-10T03:22Z [----] followers, 388.9K engagements

"@jpohhhh I also think it must be data but cant fathom what type of data it is that GOOGLE cant get their hands on"
X Link 2023-12-10T03:32Z [----] followers, 11.9K engagements

"@Alexir563 maybe next year :)"
X Link 2023-12-10T09:03Z [----] followers, [---] engagements

"@inductionheads for sure the probabilistic breakdown is the same: given text sequence x and its embedding e p(x) = p(x e) p(e). this looks a lot like a latent variable model with latent embedding e in my case p(x e) is done by vec2text with openAI embeddings; we just need to learn p(e)"
X Link 2023-12-13T16:34Z [----] followers, [----] engagements

"@volokuleshov thankful twitter hasn't implemented a peer review process yet unlike latent diffusion in this case the embedding space is fixed (it's openAI ada [--] in my notebook) I think this is kind of like conditional generation"
X Link 2023-12-13T19:39Z [----] followers, [----] engagements

"machine learning research question: whats an idea that you think would catch on if only someone spent the money to test it at scale ill go first: tokenization-free transformers"
X Link 2023-12-13T23:01Z [----] followers, 90.1K engagements

"Seen a lot of evidence that GPT-4 crushes Gemini on all the head-to-head LM benchmarks. here's my theory about what went wrong: - chatGPT released - google execs freak out - google consolidates (integrates Deepmind Brain) - google assembles a giant team to build a single giant language model (spoiler: it's gemini) - team defines success as building the system that gets the best performance on MMLU - team starts building model - finds out that beating GPT-4 level performance is really hard - many cycles of experimental iteration ensue all with bottom-line MMLU performance dictating success -"
X Link 2023-12-19T15:16Z [----] followers, 233.7K engagements

"fun research idea: Latent chain-of-thought / Latent scratchpad it's well-known that language models perform better when they generate intermediate reasoning tokens through some sort of 'scratchpad'. but there's no reason scratchpad tokens need to be human-readable. in fact generating real language involves a lot of unnecessary tokens. this seems to me like a big inefficiency. let me explain this a little further under the hood the typical process for generating a scratchpad token looks like [--]. generate a vector as the output of the language model [--]. project this vector to a distribution"
X Link 2023-12-20T15:50Z [----] followers, 127.3K engagements

"@lateinteraction yes I don't think it exists pause tokens: every pause token is the same right so they can't carry extra information from the result of previous computations the link: this is different enough from what I suggested that I don't feel like explaining rn"
X Link 2023-12-20T16:00Z [----] followers, [----] engagements

"im curious about effective altruism: how do so many smart people with the goal do good for the world wind up with the subgoal analyze the neurons of GPT-2 small or something similar"
X Link 2023-12-28T23:26Z [----] followers, 95.8K engagements

"certainly AI alignment (on a high level) is an important issue but why is it undervalued relative to eg world hunger or pandemic preparedness and given this why is studying language models considered a reasonable path towards eventually saving the world"
X Link 2023-12-28T23:26Z [----] followers, [----] engagements

"@burny_tech I don't think language models are "the most powerful technology"; I think that title would go to something else perhaps nuclear bombs or battleships or The Internet"
X Link 2023-12-29T18:05Z [----] followers, [---] engagements

"people keep saying AI is moving so fast. some days I agree but some days I'm not sure so many papers published but I don't feel like we're making that many fundamental breakthroughs. to cap off [----] here's a list of things we still don't know about language models: - how can we reliably answer questions from long texts like books - can we pretrain a world-class LM for less than a million dollars - can we prevent a model from making up facts - how can we "think more" for some inputs than others - how can we train on both private and public data - are our architectures (transformers) close to"
X Link 2023-12-29T18:39Z [--] followers, 165.4K engagements

"my twitter is in a funny place right now because if you look at my tweets it's all machine learning and language model gobbledygook but behind the scenes i'm DMing my friends memes of that human pop tart that danced its way down into the toaster last night at the poptarts bowl"
X Link 2023-12-29T22:11Z 22K followers, [----] engagements

"there's something sinister about these two sharing a stagecrossover episode featuring two of the most prominent pseudointellectual grifters. they manage to emulate the structure of intelligent conversation while really saying nothing at all. hoping for better role models in [----] Here's my conversation with Guillaume Verdon (@GillVerd) aka Beff Jezos (@BasedBeffJezos) a physicist quantum computing researcher and founder of e/acc (effective accelerationism) movement that advocates for rapid technological progress physics-based reasoning and memes. https://t.co/kewuLFEdNr Here's my conversation"
X Link 2023-12-30T22:53Z [----] followers, 84.9K engagements

"on podcasts: lex fridman brings on a wonderful selection of guests (mostly) but doesnt research them beforehand and resorts to asking them things like do u think aliens are real for people asking for a replacement I think @dwarkesh_sp is great and has spectacular guests too"
X Link 2023-12-31T04:52Z [----] followers, 35K engagements

"@_mattfreed if this picture didn't come out of an orb than i'm not buying IT"
X Link 2023-12-31T19:32Z [----] followers, [--] engagements

"the openAINYT lawsuit is a big deal for copyright precedent. literally all popular models right now were trained on copyrighted data. except for one my friend from school @SkyLi0n developed a diffusion model that's not trained on any copyrighted data it's called CommonCanvas"
X Link 2024-01-03T02:21Z [--] followers, 20.7K engagements

"@Guuber42 probably not tbh except maybe myopia wrt current LLMs"
X Link 2024-01-06T18:46Z [----] followers, [--] engagements

"@danielleboccell reading an entire textbook cover-to-cover is a herculean task. maybe we should just pick some notable sections to focus on"
X Link 2024-01-09T22:49Z [----] followers, [--] engagements

"tokenizing. πŸš‚"
X Link 2024-01-11T04:11Z [----] followers, [----] engagements

"fun research story about how we jailbroke the the chatGPT API: so every time you run inference with a language model like GPT-whatever the model outputs a full probabilities over its entire vocabulary (50000 tokens) but when you use their API OpenAI hides all this info from you and just returns the top token -- or at best the top [--] probabilities we needed the fullvector (all [-----] numbers) for our research so we developed a clever algorithm for recovering it by making many API calls important to know is that the API supports a parameter called "logit bias" which lets you upweight or"
X Link 2024-01-11T20:18Z [----] followers, 122.9K engagements

"@dl_rui - inverting prompts from logits - training data detection (@WeijiaShi2 worked on this) - distillation"
X Link 2024-01-11T23:35Z 17.6K followers, [----] engagements

"@arivero not sure what your first question means but in this case we rely on knowing the tokens of the tokenizer; openAI gives us back key-value pairs of token-logprob"
X Link 2024-01-12T00:06Z 17.6K followers, [----] engagements

"i don't think this is right at all; the reason why having every textbook pdf ever available for free and every course available on youtube didn't change humanity is simple learning things takes so much ENERGY; most people have had textbooks for a century but most go unread @dwarkesh_sp A few reasons but one is that genetic determinism is wrong and drivers of progress arent just spawned naturally they have to be carefully constructed under fragile conditions @dwarkesh_sp A few reasons but one is that genetic determinism is wrong and drivers of progress arent just spawned naturally they have to"
X Link 2024-01-15T15:49Z [----] followers, 10.9K engagements

"i've seen several papers from google claiming their LLMs beat DOCTORS at various medical tasks. this isn't just sketchy PR it's actively harmful. doctors and surgeons are transplanting organs etc. saving actual people's lives on a daily basis. feels a little disrespectful"
X Link 2024-01-15T22:09Z [----] followers, [----] engagements

"what are some ideas from machine learning that i can explain to my grandma"
X Link 2024-01-18T15:43Z [----] followers, [----] engagements

"@ImageDeeply i think the speedup depends on a lot of stuff like how big your dataset is how big the shards are how many CPUs you have what disk you have etc. Unless you have a really big dataset I doubt this will help you nearly as much"
X Link 2024-01-23T15:51Z [----] followers, [---] engagements

"when doing inference on lots of samples how much speed up can you expect from increasing the batch size context: i'm doing inference for lots of images using resnet100 on an a6000 gpu. increased batch size [--] - [----] (64x) and only getting a 20% speedup. how is this possible"
X Link 2024-01-26T22:21Z [----] followers, 16.1K engagements

"surprised to see so many people excited to see google sitting in second place on a leaderboard πŸ₯² also the obvious question here is why is GPT-4 turbo beating GPT-4 on this benchmark i thought turbo was intended to be faster but slightly dumber πŸ”₯Breaking News from Arena Google's Bard has just made a stunning leap surpassing GPT-4 to the SECOND SPOT on the leaderboard Big congrats to @Google for the remarkable achievement The race is heating up like never before Super excited to see what's next for Bard + Gemini https://t.co/QPtsqZdJhC πŸ”₯Breaking News from Arena Google's Bard has just made a"
X Link 2024-01-27T15:43Z [----] followers, 69.7K engagements

"one exciting observation about transformers (and most modern deep learning) is that you can understand them using high school math. really just multiplication division sums and exponentiation many times and in a strange and initially hard-to-grok order"
X Link 2024-01-27T18:20Z [----] followers, 34.4K engagements

"carbon dating ML papers by which open-source LM they use GPT-2 the before times GPT-J summer [----] LLAMA-1 spring [----] LLAMA-2 summer [----] mistral 7b fall [----] mixtral 8x7b the current era"
X Link 2024-02-01T01:07Z [----] followers, 18.7K engagements

"whats the simplest neural language model i can code for my little class maybe just an MLP on some word embeddings doing an RNN feels like it could be overkill"
X Link 2024-02-04T23:58Z [----] followers, 29.6K engagements

"what are some easy ways for a normal person like me to speed up training a transformer on a single GPU i'm using huggingface T5 implementation + defaults. i know about FlashAttention and BetterTransformer and torch compile will these all work together is there anything else"
X Link 2024-02-07T16:19Z [----] followers, 46.7K engagements

"welp. this is what happened when i tried to use torch compile"
X Link 2024-02-08T01:06Z [----] followers, 13.1K engagements

"@zicnyteoh i'm using nvidia a6000 gpus :)"
X Link 2024-02-08T01:45Z [----] followers, [---] engagements

"@Geronimo_AI OpenAI ada 2"
X Link 2024-02-10T06:17Z [----] followers, [----] engagements

"@svonava I dont think thats necessarily true text is actually relatively few bytes while each 32-bit float is [--] bytes so at least at short lengths things could be lossless"
X Link 2024-02-10T21:50Z [----] followers, [---] engagements

"people spent years optimizing GANs before realizing that diffusion models were simpler and better people spent years developing RLHF before realizing that DPO is simpler and better what are we working on rn i want to find the simpler and better version and work on that instead"
X Link 2024-02-17T17:26Z [----] followers, 109.4K engagements

"today i'm making voronoi diagrams of text embedding spaces what can we do with these"
X Link 2024-02-18T19:16Z [----] followers, 59.4K engagements

"@waltuuuhr i'm talking about sequence text embeddings not word embeddings"
X Link 2024-02-22T18:06Z [----] followers, [----] engagements

"biggest lesson I learned from gemini is that LLM trainers now have to choose between overfitting to chain-of-thought-type inputs (best for absolute reasoning ability) and over-fitting to human chatbot interactions (best for talking to humans) no free lunch here have to choose"
X Link 2024-02-26T17:01Z [----] followers, 14.3K engagements

"@ysegmond too bad this is fake :'( not even sure you can fit [--] h100s in a single machine lol"
X Link 2024-02-28T00:11Z [----] followers, [----] engagements

"@mathijsfietst unfortunately that's due to a bug I made when training the model; the gtr embeddings i use are missing a last post-processing step. someone trained a fixed model more info here"
X Link 2024-02-29T16:43Z [----] followers, [--] engagements

"if the blue line drops below the green line then were going to disney world babe"
X Link 2024-02-29T22:07Z [----] followers, 33.1K engagements

"my library is here bm25_pt: the key insight is that you can reduce the BM25 scoring to a single matrix multiplication. everything in the big fraction on the right side here can be stored in a big matrix of scores then bm25(q) = bag(q) @ scores.T"
X Link 2024-03-01T15:25Z [----] followers, [----] engagements

"without bm25_pt you either have to use really slow string-based code (rank-bm25) or run a server (elasticsearch) to do bm25 in python now it's a pip package and two lines; i made everything fast using huggingface & pytorch"
X Link 2024-03-01T15:25Z [----] followers, [----] engagements
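
A dense toy version of the trick described above (the real bm25_pt presumably uses sparse ops): precompute the per-(document, term) BM25 contributions once as a matrix, then score any query with a single matmul. k1 and b are the usual BM25 defaults.

```python
import torch

def bm25_score_matrix(doc_tf, doc_len, idf, k1=1.5, b=0.75):
    """Precompute per-(document, term) BM25 scores as one dense matrix.

    doc_tf:  (n_docs, vocab) term frequencies
    doc_len: (n_docs,) document lengths
    idf:     (vocab,) inverse document frequencies
    """
    avgdl = doc_len.float().mean()
    denom = doc_tf + k1 * (1 - b + b * doc_len[:, None] / avgdl)
    return idf[None, :] * doc_tf * (k1 + 1) / denom  # (n_docs, vocab)

# toy corpus over a 5-word vocabulary
doc_tf = torch.tensor([[2., 0., 1., 0., 0.],
                       [0., 1., 0., 3., 1.]])
doc_len = doc_tf.sum(dim=1)
n_docs = doc_tf.shape[0]
df = (doc_tf > 0).float().sum(dim=0)
idf = torch.log((n_docs - df + 0.5) / (df + 0.5) + 1)

scores = bm25_score_matrix(doc_tf, doc_len, idf)   # the "matrix of scores"
query_bag = torch.tensor([1., 0., 1., 0., 0.])     # bag-of-words query
print(query_bag @ scores.T)                        # BM25 score of each doc
```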

"for those who haven't heard about this: openAI only gives you the top [--] logprobs per response (max). we (mostly @justintchiu) came up with an algorithm to get logits from openAI. essentially for a different token (not top-5) u can binary search logit bias to see the minimal number that gets it into the top 5"
X Link 2024-03-07T22:52Z 20.5K followers, [----] engagements
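
A hedged sketch of just the binary-search step described above. top_tokens is a hypothetical helper standing in for the actual API call, and the bias bounds and tolerance are arbitrary assumptions.

```python
# `top_tokens(prompt, logit_bias)` is a hypothetical helper that queries the
# API with the given logit bias dict and returns the set of top-5 token ids.

def min_bias_to_enter_top5(prompt, token_id, top_tokens,
                           lo=0.0, hi=100.0, tol=1e-3):
    """Binary-search the smallest logit bias that pushes `token_id`
    into the top 5.

    The size of this threshold tells you how far the token's logit sits
    below the fifth-highest logit, which is what lets you reconstruct it.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if token_id in top_tokens(prompt, {token_id: mid}):
            hi = mid   # bias was enough; try a smaller one
        else:
            lo = mid   # not enough; need a larger bias
    return hi
```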

"since i got into ML in [----] or so there have been exactly three major advancements: [--]. web-scale language model pretraining (OpenAI 2020) [--]. diffusion models (OpenAI 2020) [--]. language model refinement via human feedback (OpenAI 2022) whats next its been a while"
X Link 2024-03-21T13:15Z [----] followers, 95.5K engagements

"my former boss from google would cry a little bit if he knew the quality of code I am churning out daily in academia today im moving much faster though - creating and discarding hypotheses more quickly learning more etc but my unit tests dont always pass"
X Link 2024-03-25T14:12Z [----] followers, 17.8K engagements

"scent teleportation should be getting more publicity πŸŒΊπŸŒ·πŸ’ i want to download roses; email candles to my friends; store the distinct smell of a certain forest on a specific day on a thumb drive to revisit years later this is way cooler than Sora or the latest LLM finetune Today Osmo is proud to introduce Scent Teleportation a technology that captures a smell in one part of the world and releases it in another. This is a new way to communicate that could one day help resensitize the digital world. https://t.co/YFUIMMOBHK https://t.co/Bnj5bDprku Today Osmo is proud to introduce Scent"
X Link 2024-03-25T17:39Z [----] followers, [----] engagements

"ok before one of you tries to assassinate me for building AGI i figured out the bug. at least it's an interesting one πŸ₯²β˜” so we're doing contrastive learning which is a matching task between (query document) pairs query [--] matches to document [--] query [--] matches to document [--] and so on. the matching score is computed via the dot product between the respective embeddings for this experiment i was trying to add a soft prompt to everything in the batch to make learning better unfortunately the model outsmarted me and devised a very clever internal scheme to track its position within the batch for"
X Link 2024-03-26T18:55Z [----] followers, 89.6K engagements

"yesterday my model was training and its performance was so good that twitter ppl threatened to blow up my house in case i was about to release AGI then i fixed one line of code. now my model performance is exactly the same as the baseline ah the ups and downs of science πŸ€“πŸ‘¨πŸ”¬"
X Link 2024-03-27T14:41Z [----] followers, [----] engagements

"People: cornell obviously is very strong academically but i have noticed compared to other similar-caliber schools people here are remarkably chill. people don't really work nights/weekends and encourage you to have work-life balance. everyone in nlp/ML is fun to hang out with"
X Link 2024-04-02T18:23Z [----] followers, [----] engagements

"Location: i'm sure New York City's reputation precedes itself so all I'm going to say is that I love it here living in manhattan is amazing. new york city. nowhere else like it"
X Link 2024-04-02T18:23Z [----] followers, [----] engagements

"Research: finally intellectually i'm doing the best work of my life. the talks here are also phenomenal. it's kind of like with your favorite bands all the academics you want to meet will come through new york city every year or two. this semester we've had talks on campus i really liked from yann lecun tim dettmers simran arora johannes balle ben recht. my collaborators are great"
X Link 2024-04-02T18:23Z [----] followers, [----] engagements

"New Research: a lot of talk today about "what happens" inside a language model since they spend the exact same amount of compute on each token regardless of difficulty. we touch on this question on our new theory paper Do Language Models Plan for Future Tokens"
X Link 2024-04-04T15:53Z 10.1K followers, 135.8K engagements

"model-trainers of Twitter: why does my model start getting worse over time without the grad norm noticeably increasing -- how can this be possible and how do i fix it (again please help :) )"
X Link 2024-04-11T14:55Z 10.1K followers, 106.6K engagements

"ok so it was a gradient bug. let me explain with the huggingface trainer when training on multiple GPUs sometimes the model is stored as an nn.Module but sometimes its wrapped in this DistributedDataParallel thing I was inadvertently forwarding on the nn.Module in multi-GPU meaning gradients werent syncing or averaging between steps each GPU was learning its own solution but the representations were aggregated between GPUs in the contrastive loss making for some really strange behavior sometimes train loss would go down for awhile as a single GPU learned to tell its own samples from other"
X Link 2024-04-11T20:28Z 10.1K followers, 51.8K engagements
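
The bug in miniature, assuming standard torch DDP (this is my reconstruction for illustration, not the original training code):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = dist.get_rank()
model = torch.nn.Linear(128, 128).to(rank)
ddp_model = DDP(model, device_ids=[rank])

x = torch.randn(32, 128, device=rank)

# WRONG: calling the unwrapped module skips DDP's synchronization machinery,
# so gradients never all-reduce and each GPU silently learns its own solution
out = ddp_model.module(x)

# RIGHT: calling the DDP wrapper makes backward() average grads across ranks
out = ddp_model(x)
out.sum().backward()
```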

"who is building the google docs for pandas dataframes i would use that often. i want to send df links save them to my notes plot graphs in the cloud etc"
X Link 2024-04-17T14:31Z 10.2K followers, 11.8K engagements

"you're telling me an 8B param model was trained on fifteen trillion tokens i didn't even know there was that much text in the world really interesting to see how scaling laws have changed best practices; GPT-3 was [---] billion params and trained on a paltry [---] billion tokens"
X Link 2024-04-18T16:33Z 10.2K followers, 84.4K engagements

"common dark thought pattern in research run baseline experiment change thing A also change thing B run new experiment collect results "wow thing A works""
X Link 2024-04-22T22:11Z 10.2K followers, [----] engagements

"in my opinion the Phi approach to training language models is just wrong i'm not convinced that training on less (albeit "higher-quality") data is better than training on as much data as possible i'm not convinced that training on synthetic data ever works better than training on the original data source (unless it's just a form of distillation after all distillation is magic) sorry Microsoft shouting "it works" without releasing any datasets or ablations isn't going to change my mind πŸΈβ˜•"
X Link 2024-04-23T12:46Z 10.7K followers, 67.2K engagements

"weights & biases is great software but has anyone ever learned anything from one of these graphs can't even see a correlation between two variables if their columns aren't next to each other"
X Link 2024-04-23T18:44Z 10.2K followers, 25.1K engagements

"move over meta the true biggest benefactor of open source machine learning is CHANEL TIL scikit-learn an open-source ML library has only one Platinum sponsor and it is . Chanel https://t.co/dBjaGntrAG TIL scikit-learn an open-source ML library has only one Platinum sponsor and it is . Chanel https://t.co/dBjaGntrAG"
X Link 2024-04-24T14:01Z 10.7K followers, 65.4K engagements

"@Yampeleg it's a very different architecture and starting from a different pre-trained model"
X Link 2024-04-27T15:34Z 10.8K followers, [----] engagements

"i wonder if the AI-is-going-to-kill-all-the-humans hype may have crested a wave at least for now i remember when a few accounts that would tweet "quit your job AGI is coming" and "scaling is all it takes" type stuff nonstop. don't see nearly as much compared to a year ago"
X Link 2024-04-28T15:33Z 10.8K followers, [----] engagements

"@Zonalic agree yes there could be an architecture or training process of a completely different class that breaks us out of this local minimum"
X Link 2024-04-28T21:57Z 10.8K followers, 12.1K engagements

"@VictorTaelin whats HVM CUDA"
X Link 2024-04-30T02:11Z 10.8K followers, [---] engagements

"okay so what's up with this - jan: no big open LLM releases - feb: no big open LLM releases - mar/apr: cohere Command-R databricks/mosaic DBRX Reka Core meta LLAMA-3. - may: no big open LLM releases"
X Link 2024-05-02T17:04Z 10.9K followers, 66.6K engagements

"i can feel the friday energy but this is where i'm at debugging-wise"
X Link 2024-05-03T22:00Z 10.8K followers, 12.6K engagements

"@bentleynathan1 not for a while thats the point of the barrier - I think it might time out after [--] or [--] or [--] min I forget"
X Link 2024-05-04T04:13Z 10.8K followers, [---] engagements

"i'm in vienna for #ICLR2024 dm me if you want to chat or grab a beer also come hang out at the poster session for Language Model Inversion on Friday i'll tweet about it once i know when and where. excited to DELVE in"
X Link 2024-05-06T20:23Z 10.9K followers, [----] engagements

"ok so i'm at ICLR; this is my first machine learning conference. as you might imagine it's all very fun and exciting. but these poster sessions are absolutely INSANE this is an airplane hanger crammed with hundreds of posters each with dozens of people talking over each other in the general direction of the presenter the paper topics all sound so cool but i can't get a word in edgewise. it reminds me of a crowded bar or airport security line in there. if i look at a poster too long someone will bump into me what's the right way to navigate this should i get there early to stand my ground or"
X Link 2024-05-08T18:35Z 10.9K followers, 38.4K engagements

"i think people have generally taken this blog post (The Bitter Lesson by Richard Sutton) far too seriously. some reminders: - you CAN come up with clever ideas and show that they work without a bazillion gpus - you CAN build useful systems at the 100M parameter scale - if you want to do important research remember that long-lasting breakthroughs will probably come from thinking up really different ways to do things rather than improving our current (already large & unwieldy) systems - (and while i'm at it: scaling is generally predictable and therefore is not really interesting"
X Link 2024-05-11T19:06Z 10.9K followers, 106.9K engagements

"by the way it's ironic to me that The Bitter Lesson was written by a professor since the ideas feel so antithetical to research in a small lab i think that's where Colin Raffel was trying to go with his talk "The Sweet Lesson" http://colinraffel.com/talks/dlrl2022how.pdf http://colinraffel.com/talks/dlrl2022how.pdf"
X Link 2024-05-11T19:08Z 10.9K followers, [----] engagements

"so apparently there are 10x more research papers written than there were ten years ago (at least in AI) i guess we're learning 10x more about the world each year now .or we're making discoveries at about the same rate but each individual paper is 10% as meaningful"
X Link 2024-05-15T18:33Z 10.9K followers, 21.8K engagements

"passed my A exam and got my master's degree; officially halfway through my phd cheers everyone"
X Link 2024-05-23T19:15Z 11K followers, 12.6K engagements

"i found this interview fascinating. leopold makes a lot of bold assumptions and is undoubtedly on a lot of amphetamines but has a fascinating worldview and has thought more about the future than almost anyone else in the know about AI yet so many people online disagree with his views. but is there anyone who has made an equally coherent case about why we won't achieve superintelligence by [----] or why automating AI research is hard actually why is the countercase so hard to argue here is it that by doing research on the topic you get automatically scaling-pilled .@leopoldasch on: - the"
X Link 2024-06-06T21:54Z 20.3K followers, 91.7K engagements

"an underdiscussed gotcha behind the search + LLM = AGI narrative is search is only valuable when statewide improvements are quantifiable this is the case in Go and coding problems w/ tests and this ARC benchmark. we can explore the (LLM-generated) state space and leverage traditional search algorithms to hill-climb toward better solutions but we cant easily measure improvement in the general case where tasks are much more abstract. how do you use MCTS to write a better essay or generate a more actionable plan to take over the world ARC-AGIs been hyped over the last week as a benchmark that"
X Link 2024-06-18T17:37Z 20.4K followers, 85.5K engagements
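
A toy sketch of the distinction this post draws (`generate` and `tests` are hypothetical): search helps exactly when candidates can be scored, the way unit tests score programs; "write a better essay" has no score() to hill-climb.

```python
def best_of_n(generate, tests, n=32):
    """Pick the best of n sampled programs using a quantifiable signal."""
    def score(program):
        # fraction of unit tests passed: a measurable notion of "better"
        return sum(test(program) for test in tests) / len(tests)
    return max((generate() for _ in range(n)), key=score)
```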

"googles Gemini paper has [---] authors which is more people than OpenAI has employees. yet gemini still underperforms OpenAIs best models (and i think anthropics too) what are most of those [---] people doing infra data whatever it is seems its not really necessary"
X Link 2024-07-06T20:53Z 15.1K followers, 56.1K engagements

"five topics you can talk about for [--] minutes with zero prep redux [--]. memorization capacity of language models [--]. geography and climate of San Francisco [--]. golf swing mechanics [--]. training tricks for text embedding models [--]. AGI What are five topics you can talk about for [--] minutes with zero prep [--]. The Red Shoes [--]. Housing Policy [--]. Kants moral philosophy [--]. Campus Novels [--]. Norm Macdonald What are five topics you can talk about for [--] minutes with zero prep [--]. The Red Shoes [--]. Housing Policy [--]. Kants moral philosophy [--]. Campus Novels [--]. Norm Macdonald"
X Link 2024-07-07T22:10Z 15.1K followers, 21.5K engagements

"@neurobiophysics @KtunaxaAmerika @SCDC87 @Edi_Danalache yep shout out dr torbert"
X Link 2024-07-09T07:55Z 45.1K followers, 11.8K engagements

"my goal: a one on one with mark zuckerberg my status: meeting DECLINED by admins ❎ my spirits: still high my story: starts all the way back in middle school year is [----] a young jack morris signs up for Facebook hip new social networking site its great a nice way to connect with friends all that was fine twelve years pass Facebook is now Meta i work there im also a phd student graduation just around the corner (only a few years to go) considering both industry and academic roles starting a company could be what i want but im not sure who at this company can i look up to as a friend and mentor"
X Link 2024-07-09T22:08Z 15.1K followers, 85.9K engagements

"by this time next year the typical ML/data scientist interview requirement will be a medium-level question from ML leetcode one hardcore session of of prompt engineering and five years of CUDA experience someone finally made leetcode for machine learning and it's everything we hoped it would be just solved the first exercise: computing a matrix-vector product without any tensor operations (only python lists allowed) https://t.co/qDRWIXvYSu https://t.co/2dnTEqkB56 someone finally made leetcode for machine learning and it's everything we hoped it would be just solved the first exercise:"
X Link 2024-07-11T20:49Z 15.4K followers, 28.5K engagements
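
For reference, the first exercise mentioned above really is doable with plain lists; a minimal sketch:

```python
def matvec(matrix, vector):
    # matrix is a list of rows; result[i] is the dot product of row i with vector
    assert all(len(row) == len(vector) for row in matrix)
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

assert matvec([[1, 2], [3, 4]], [5, 6]) == [17, 39]
```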

"@itsmattchan what's routing"
X Link 2024-07-17T00:30Z 15.4K followers, [----] engagements

"reflections after one month as a research scientist intern at Meta - finally finished onboarding; i am out of trainings to complete. there were many - research is starting to heat up; spent lot of time brainstorming and now i have more ideas than i have time to execute. need to prioritize - colleague at Meta in SF is simply taking a week to work out of the Paris office. pretty sweet that you can just do that - next week everyone else will be gone for a conference (ICML in Vienna) so my calendar is completely white () - rode in my first self-driving car a Waymo. pretty smooth - SF marina and"
X Link 2024-07-18T15:47Z 15.4K followers, 39.1K engagements

"in my mind all the evidence points towards AI approaching above-average human reasoning capabilities sigmoidally what evidence is there that we can build superhuman-level AI in any domain (games dont count) and is this even possible with supervised learning yearly reminder everything looks exponential from the middle of a sigmoid https://t.co/DCeXCGjTTL yearly reminder everything looks exponential from the middle of a sigmoid https://t.co/DCeXCGjTTL"
X Link 2024-07-19T16:34Z 15.4K followers, 59K engagements
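
A quick numeric illustration of the quoted reminder: far from its midpoint the logistic curve grows by a near-constant factor per step, exactly like an exponential, and the slowdown only becomes visible near the middle.

```python
import math

logistic = lambda x: 1 / (1 + math.exp(-x))
for x in (-8, -6, -4, -2, 0):
    # growth factor over one unit step; pure exp(x) would give e ~ 2.718
    print(x, round(logistic(x + 1) / logistic(x), 3))
# prints ratios ~ 2.717, 2.707, 2.637, 2.256, 1.462
```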

"@cheeetoo_ because you cant perfectly simulate the real world"
X Link 2024-07-19T16:49Z 15.4K followers, [----] engagements

"countries that have both trained large language models and manufactured nuclear weapons: USA UK France Israel China countries with nukes but no LLMs: Russia North Korea countries with LLMs but no nukes: Japan South Korea"
X Link 2024-07-25T15:19Z 15.5K followers, [---] engagements

"countries that have both trained large language models and manufactured nuclear weapons: USA UK France Israel China Russia countries with nukes but no LLMs: North Korea countries with LLMs but no nukes: Japan UAE South Korea"
X Link 2024-07-25T15:38Z 15.5K followers, [---] engagements

"if you have language model weights on your computer you also are in possession of powerful compression software just put the finishing touches on gptzip a little personal project for compressing strings with language models compress text w/ hf transformers 5x better rates than gzip"
X Link 2024-07-28T20:37Z 16K followers, 97K engagements
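
The gptzip code itself isn't shown here, but a minimal sketch of the underlying idea (assuming GPT-2 via Hugging Face transformers): a model's negative log-likelihood on a string, in bits, is the size an entropy coder driven by that model can approach, and it is typically far below what gzip achieves.

```python
import gzip
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

text = "if you have language model weights you also have a compressor " * 8
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    # mean cross-entropy (in nats) over the predicted tokens
    nll = model(ids, labels=ids).loss.item()
lm_bits = nll * (ids.shape[1] - 1) / math.log(2)   # achievable size under the LM
gzip_bits = 8 * len(gzip.compress(text.encode()))
print(f"LM bound: {lm_bits:.0f} bits vs gzip: {gzip_bits} bits")
```

An actual compressor (as gptzip does) pairs these token probabilities with an arithmetic coder so the bound is realized losslessly.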

"@despinosagon lossless slower smaller filesize"
X Link 2024-07-29T05:46Z 16.2K followers, [---] engagements

"sign on now and watch me talk about my research 😎 (grateful to be invited although the choice of picture here is criminal) .@stanfordnlp #NLProc Seminar this Thursday will feature @jxmnop (Jack Morris) from Cornell University Jack will talk about Inverting Language Models. Non-Stanford registration: https://t.co/ZFQJunGBai. Zoom link will be sent to non-Stanford registrants 1hr prior to the talk. https://t.co/MeTrY8VUtr .@stanfordnlp #NLProc Seminar this Thursday will feature @jxmnop (Jack Morris) from Cornell University Jack will talk about Inverting Language Models. Non-Stanford"
X Link 2024-08-01T18:00Z 16.3K followers, 19.9K engagements

"funny little story about Extropic AI been curious about them for a while have twitter mutual who is an engineer/researcher for this company often tweets energy-based modeling and LM-quantumhype mumbojumbo last winter i wanted to get to the bottom of this meet up with extropic guy for drinks in nyc (west village) guy keeps mentioning energy-based modeling and LMs coincidentally a friend of mine (@yuntiandeng) wrote what is maybe the most well-known recent work on energy-based language modeling i like this paper i mention the paper offhand guy has never heard of it asks me to send it to him"
X Link 2024-08-06T17:35Z 16.7K followers, 190.5K engagements

"turns out this is what happens when u question the value proposition of Extropic AI"
X Link 2024-08-06T20:08Z 16.5K followers, 54.7K engagements

"things you definitely do NOT need to understand to be an expert on LLMs: - linear regression - bias variance trade off - most probability distributions (Gaussian Bernoulli poisson etc.) - RNNs - LSTMs - CNNs - higher-order calculus (beyond first derivatives + chain rule) - tensorflow - EM - kmeans - boosting & bagging - decision trees - random forests - graph neural networks - naive bayes - reinforcement learning (except RLHF maybe) - SVM - Gaussian processes"
X Link 2024-08-10T22:24Z 19.3K followers, 251K engagements

"if huggingface builds superintelligence theyll probably just open-source it and make it installable like pip install agi"
X Link 2024-08-14T00:31Z 19.3K followers, 74.9K engagements

"prompt optimization is such a hard problem and the algorithms are dumb. the fact that they work at all is baffling and people don't talk about it enough for those who don't know the prompt optimization problem is argmax_x (y x; ) where x is some prompt and is the loss for output y under language model . solving this problem exactly requires enumerating all possible prompts x. .and the space of possible 10-token prompts in a 50k vocab is [-------]. yet people regularly find solutions to these problems with 100-token prompts. how can this be possible the most popular algorithm (AutoPrompt/GCG) is"
X Link 2024-08-22T18:04Z 19.3K followers, 122.5K engagements
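
A hedged sketch of the greedy-coordinate idea behind AutoPrompt/GCG (heavily simplified; `loss_fn` is a hypothetical callable from prompt embeddings to the loss ℓ above, and real GCG evaluates large batches of random candidate swaps):

```python
import torch
import torch.nn.functional as F

def gcg_step(model, embed_matrix, prompt_ids, loss_fn, k=8):
    # Relax the prompt to one-hot vectors so token choice is differentiable.
    one_hot = F.one_hot(prompt_ids, embed_matrix.shape[0]).float()
    one_hot.requires_grad_(True)
    loss = loss_fn(model, one_hot @ embed_matrix)  # [len, vocab] @ [vocab, d]
    loss.backward()
    # Most-negative gradient coordinates = token swaps expected
    # (to first order) to decrease the loss at each position.
    candidates = (-one_hot.grad).topk(k, dim=-1).indices  # [len, k]
    best_ids, best_loss = prompt_ids, loss.item()
    for pos in range(prompt_ids.shape[0]):
        for tok in candidates[pos]:
            trial = prompt_ids.clone()
            trial[pos] = tok
            with torch.no_grad():
                trial_loss = loss_fn(model, embed_matrix[trial]).item()
            if trial_loss < best_loss:
                best_ids, best_loss = trial, trial_loss
    return best_ids, best_loss
```

The gradient is only a first-order guess at which swaps help, which is part of why it is surprising these methods work at all; each swap is then verified by actually re-running the model.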

"@ahiajsbwks [----] steps and it depends on model size and other hyperparameters but approx ten minutes to hour for a 1b model on a 40gb a100 gpu"
X Link 2024-08-22T18:25Z 19.3K followers, [---] engagements

"@selfattentive good luck typing your prefix into OpenAI"
X Link 2024-08-23T00:22Z 19.3K followers, [----] engagements

"have a question about transformers and CUDA: my understanding is that GPUs perform operations so quickly that most of the cost comes from data on and off a lot of performance gains come from "fusing" operations together so they can be done without sending any data back to the CPU so why not run the entire transformer on the GPU send data once do all the operations on the GPU and then send back only the output"
X Link 2024-08-27T20:48Z 19.2K followers, 116.9K engagements
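
For what it's worth, this single-transfer pattern is already how PyTorch behaves: intermediates stay on the device, and the cost that kernel fusion removes is round-trips to GPU global memory between ops (plus kernel-launch overhead), not CPU transfers. A minimal sketch:

```python
import torch

device = "cuda"
layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
model = torch.nn.TransformerEncoder(layer, num_layers=6).to(device)

x = torch.randn(8, 128, 256).to(device)   # one host-to-device copy
with torch.no_grad():
    y = model(x)                           # every layer runs on the GPU
out = y.cpu()                              # one device-to-host copy at the end
```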

"a year or so ago there was a blip of excitement about running LLMs locally on your laptop or something; now it doesnt feel like many people are doing that. why not too slow too much loss in quality from quantization or turns out the APIs are actually convenient enough"
X Link 2024-08-28T19:20Z 43.7K followers, 305.9K engagements

"such a useful piece of code wish i had this a few years ago"
X Link 2024-08-29T13:43Z 19.1K followers, 152.1K engagements

"its gone long enough without happening that i am tweeting this research project into existence: i am wholly convinced that images can be exactly recovered from their embeddings and the fact that no one has done this so far is simply a skill issue a year ago now we showed that text can be exactly recovered from embeddings (Text Embeddings Reveal (Almost) As Much As Text). and theres nothing text-specific about the method our idea of iteratively refining a guess and guiding it closer to a ground-truth embedding should work in any modality. at least the iteration should work until one of two"
X Link 2024-08-31T00:50Z 19.2K followers, 93.3K engagements
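
A minimal sketch of the "iteratively refine a guess toward a ground-truth embedding" idea, assuming some differentiable image encoder `encode` (hypothetical here; the paper's text method is more involved than naive gradient descent):

```python
import torch
import torch.nn.functional as F

def invert_embedding(encode, target_emb, shape=(1, 3, 224, 224), steps=1000):
    # Start from noise and nudge the guess so its embedding matches the target.
    guess = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([guess], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = 1 - F.cosine_similarity(encode(guess), target_emb, dim=-1).mean()
        loss.backward()
        opt.step()
    return guess.detach()
```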

"there are three types of AI people in SF: - idealist (care about conceptual aesthetics; in it for the beauty of it all; novelty function) - grinder/tech bro (love NVIDIA/TSLA; mostly just in it for the $$) - doomer (think AI might kill us; earnestly trying to save the world)"
X Link 2024-09-05T20:33Z 19.1K followers, [----] engagements

"TIME Magazine has rightly named famed deep learning pioneer ptrblock as the most influential person in Artificial Intelligence"
X Link 2024-09-05T22:54Z 19.2K followers, 117.6K engagements

"On why you should read more (research papers): one of the most valuable problems we could solve as a community is idea deduplication at the meta-project level the counterintuitive consequence is that the most effective way for researchers to cut through the AI hype and be more productive is to read more papers i think the most common waste of researchers' time is spending energy on problems that they don't know already someone has been working on i often come out of an interesting conversation with a fellow researcher thinking "oh wow that would be cool but has someone done this already""
X Link 2024-09-07T23:20Z 19.2K followers, 32.7K engagements

"no one told me that post-pandemic the most common work style at tech companies is not work from home or working from the office but a secret third thing: showing up just before lunch eating the free food and chilling for a couple hours then going back home again"
X Link 2024-09-17T20:13Z 19.2K followers, 20.8K engagements

"learning to use copilot after programming on my own for [--] years is bittersweet kind of feels like being a carpenter thats trained to cut perfect corners; now here comes a machine that can do it perfectly and much faster yet I somehow miss the satisfaction of doing it myself"
X Link 2024-09-19T04:15Z 19.1K followers, 22.9K engagements

"what's something that's actually fun that i can build in CUDA or triton when i was first learning to code i wrote games little puzzles animations. what can i make with CUDA besides just different matrix multiplications (preferably unrelated to machine learning)"
X Link 2024-09-23T14:08Z 19.2K followers, 63.9K engagements

"sneak preview 🍿 of our new embedding model: cde-small-v1 cde-small-v1 is the text embedding model that we (@srush_nlp and i) have been working on at Cornell for about a year tested the model yesterday on MTEB the text embeddings benchmark; turns out we have state-of-the-art results cde stands for "Contextual Document Embeddings". our model works differently from other text embedding models. consequently it's a little more involved to use. i'm actively working on this and should have instructions up in the next few days feels good to have the best small text embedding model in the world (if"
X Link 2024-09-25T16:05Z 19.2K followers, 67.5K engagements

"guess it turns out that langchain was just a fad something that we all tried and appreciated but eventually lost our zeal for such as tamagotchis silly bandz cup stacking etc"
X Link 2024-09-26T17:09Z 19.2K followers, 433.3K engagements

"programs & things you need to know to succeed as a PhD student in Computer Science: python jupyter notebook pytorch jax bash SLURM LaTeX github matplotlib seaborn wandb gmail google docs google slides google calendar the espresso maker zoom Adobe Illustrator"
X Link 2024-09-30T18:47Z 19.2K followers, 19.1K engagements

"once i realized GPUs operate at 150F under normal conditions that really changed my perspective honestly i'm surprised they don't break more often"
X Link 2024-09-30T22:43Z 19.3K followers, [----] engagements

"@ThinkInSysDev yeah we actually try this in the experiments; it performs a bit worse per-domain context tokens are critical"
X Link 2024-10-04T19:36Z 19.3K followers, [----] engagements

"im at #COLM2024 talk to me about whether language models plan for future tokens talk to me about contextual embeddings talk to me about your crazy new project idea talk to me about how to pick research ideas (not that i really know) talk to me about whatever talk to me"
X Link 2024-10-07T17:37Z 19K followers, [----] engagements

"man the huggingface team is cracked two days after i released my contextual embedding model which has a pretty different API @tomaarsen implemented CDE in sentence transformers you can already use it implementation was not at all trivial; those people just work fast"
X Link 2024-10-09T17:33Z 19K followers, 85.4K engagements

"oh no somehow i managed to format a list exactly like chatGPT (even though I wrote it myself) cool article about CDE (contextual document embeddings) out in Venture Beat πŸ“° also chatting with them got me thinking about what future work looks like in this line of research. here are a few ideas if you will allow me to pontificate for just a moment 🧐 - Multimodality. We https://t.co/qPyP0IY1AZ cool article about CDE (contextual document embeddings) out in Venture Beat πŸ“° also chatting with them got me thinking about what future work looks like in this line of research. here are a few ideas if"
X Link 2024-10-10T23:10Z 18.9K followers, [----] engagements

"geoffrey hinton working on neural networks in the 80s and 90s"
X Link 2024-10-11T15:04Z 19K followers, 37.4K engagements

"state space models are super neat & interesting but i have never seen any evidence that theyre smarter than transformers - only more efficient any architectural innovation that doesnt advance the pareto frontier of intelligence-per-parameter is an offramp on the road to AGI"
X Link 2024-10-14T20:42Z 19K followers, 31.5K engagements

"funny little section from Fast Semantic Extraction Using a Novel Neural Network Architecture (Collobert & Weston 2007)"
X Link 2024-10-16T02:00Z 19K followers, [----] engagements

"when we develop the final piece we need for agi it's gonna be released in a tweet with a link to arxiv paper about a new method named with a insane acronym like or CCs"
X Link 2024-10-19T20:02Z 46K followers, 16.9K engagements

"people should be doing multiple LLM forward passes with the same weights feels like you could get a lot more for same model size. and this is all differentiable so training this way is trivial surely diffusion models aren't the only way to successfully reuse representations"
X Link 2024-10-21T17:18Z 19K followers, 44K engagements
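
Schematically, this is the weight-tied (universal-transformer-style) idea: apply one block repeatedly, reusing the same parameters, and gradients flow through every pass. A minimal sketch:

```python
import torch

block = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
x = torch.randn(2, 64, 256)
for _ in range(4):   # four forward passes, one set of weights
    x = block(x)     # fully differentiable, so training is unchanged
```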

""Github claims that 40% of the code programmers write is written by Copilot. I was curious how they measured this number and so wanted to poke a bit into the telemetry." here's the link 🧲 http://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html http://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html"
X Link 2024-10-22T14:34Z 46K followers, [----] engagements

"dang guess sometimes architecture does make a difference"
X Link 2024-10-27T15:18Z 46K followers, 147.4K engagements

"tinygrad is running the funniest goodhart around right now they're obsessed with talking about how their library uses fewer lines of code than pytorch so their codebase is growing horizontally instead of vertically some parts are borderline unreadable to humans"
X Link 2024-10-28T18:56Z 46K followers, 222.4K engagements

"i don't know the full personal history here and it can be hard to tell why some ideas catch on and some don't nonetheless it's a good read and a fascinating time capsule: http://arxiv.org/abs/1706.05137 http://arxiv.org/abs/1706.05137"
X Link 2024-10-29T15:02Z 46K followers, [----] engagements

"giving a talk tonight at the NYC GenAI Collective Roundtable in Soho πŸ™ swing by if you want to hear about my research on contextual embeddings and how we made cde the first truly contextual text embedding model Going to be a packed house at our Research Roundtable in NYC this Tuesday Excited to see everyone πŸ”₯ https://t.co/ZrpLAJxdz2 Going to be a packed house at our Research Roundtable in NYC this Tuesday Excited to see everyone πŸ”₯ https://t.co/ZrpLAJxdz2"
X Link 2024-10-29T19:43Z 46K followers, [----] engagements

"what does it mean when the grad norm does this (the loss looks fine; learning rate has been constant this whole time too)"
X Link 2024-10-30T13:38Z 46K followers, 51.8K engagements

"just open-sourced the training and evaluation code for cde our state-of-the-art small text embedding model includes code for lots of hard stuff: * efficient clustering large datasets * contrastive training for SOTA retrieval models * our custom two-stage model architecture that embeds contextual tokens and uses them in downstream embeddings * a two-stage gradient caching technique that enables training our two-headed model efficiently * packing clusters and sampling from them even in distributed settings * on-the-fly filtering for clusters based on a pretrained model * more :)"
X Link 2024-10-30T19:24Z 46K followers, 39.6K engagements
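
As a sketch of the gradient-caching item on that list (this is the generic GradCache-style trick, not the released cde code, and the `encoder`/`contrastive_loss` signatures are assumptions): embed without graphs first, backprop the loss to the embeddings only, then re-encode chunk by chunk while injecting the cached gradients. This recovers the full-batch gradient while holding only one chunk's activation graph at a time.

```python
import torch

def grad_cached_step(encoder, chunks, contrastive_loss, optimizer):
    # Pass 1: embed every chunk without building autograd graphs.
    with torch.no_grad():
        embs = [encoder(c) for c in chunks]
    embs = torch.cat(embs).requires_grad_(True)
    loss = contrastive_loss(embs)
    loss.backward()                              # d(loss)/d(embeddings) only
    grads = embs.grad.split([len(c) for c in chunks])
    # Pass 2: re-encode chunk by chunk, injecting the cached gradients.
    for c, g in zip(chunks, grads):
        encoder(c).backward(gradient=g)          # accumulates parameter grads
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```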

"google scholar PDF reader dark mode. honestly makes any paper look awesome"
X Link 2024-11-01T16:35Z 19.8K followers, 20K engagements

"i decided you don't need lots of GPUs to do good research when i read "Are Emergent Abilities of Large Language Models a Mirage" (won award at NeurIPS 2023) if you buy that loss scales smoothly and predictably from small to large scale (both in model parameters and sequence length) then you should be able to validate and test most hypotheses on small models with small sequences both my most recent successful research results (language model inversion work and contextual embeddings) came from scaling down and running hundreds of tiny experiments with a tight short feedback loop if you're"
X Link 2024-11-04T16:33Z 19.8K followers, 70.2K engagements

"what are some examples of impressive & notable deep learning research that was done with [--] GPUs"
X Link 2024-11-04T23:57Z 19.8K followers, 92.2K engagements

"woke up to a bunch of these i don't deserve this"
X Link 2024-11-05T14:19Z 19.8K followers, 47.2K engagements

"@DaftOdyssey"
X Link 2024-11-05T14:28Z 19.6K followers, [----] engagements

"some people will hate me for saying this but my conclusion from moving to san francisco this year to do AI research was this: you dont need to move to san francisco to do AI research theres certainly a higher concentration of people who speak fluent AI and more people closely following the cutting edge - and there are really amazing researchers in SF (the people at OpenAI whoever made Gemini everyone at stanford & berkeley etc.) unlike most other cities SF definitely has an ingroup (which I was not a part of) of AI-adjacent folks who meet for parties and dinners and talk about things like"
X Link 2024-11-07T15:16Z 19.9K followers, 70.6K engagements

"it's been two months since OpenAI released O1 and it doesn't feel like we're even close to an open-source replication i'm wondering if this is the start of a new era one where academic research is so far behind that we can't even study the best & newest models (hopefully not)"
X Link 2024-11-12T18:16Z 21.5K followers, 70.1K engagements

"anybody who tells you to learn AI by reading a textbook is gatekeeping there simply isn't a textbook out there that covers this stuff. goodfellow Deep Learning is outdated (one subsection on language models). and the ML textbooks (bishop and murphy) are not really relevant"
X Link 2024-11-13T02:00Z 20.5K followers, 316.1K engagements

"@dejavucoder i think this is probably the issue in my case"
X Link 2024-11-14T19:49Z 20.1K followers, 12.6K engagements

"@0x_Broke_Boi then who's out there thinking big would you say"
X Link 2024-11-14T19:53Z 20.1K followers, [----] engagements

"more fun open-source research news - new paper drops (nGPT) - claims 4-20x training speedup over GPT - shocking - very cool - very valuable - community tries to reproduce - doesn't hold up - turns out baseline was busted - another cool new research idea oneshotted by github anon @francoisfleuret Bcs the baseline by the looks of it was wrong https://t.co/iMPdPQzKhY @francoisfleuret Bcs the baseline by the looks of it was wrong https://t.co/iMPdPQzKhY"
X Link 2024-11-18T21:45Z 20.5K followers, 130.3K engagements

"so the error in the transformer impl from nGPT was very easy to make the residual stream propagated as this x = norm(x) + attn(norm(x)) instead of this x = x + attn(norm(x)) TLDR this breaks everything and makes learning impossible. but why is there a simple explanation more fun open-source research news - new paper drops (nGPT) - claims 4-20x training speedup over GPT - shocking - very cool - very valuable - community tries to reproduce - doesn't hold up - turns out baseline was busted - another cool new research idea oneshotted by github anon https://t.co/StyA4fegjW more fun open-source"
X Link 2024-11-19T15:29Z 20.5K followers, 67.4K engagements
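
Laid out as code (schematic; `attn` and `norm` stand for the usual attention sublayer and LayerNorm), the simple explanation is that the buggy variant re-normalizes the residual stream at every layer, destroying the identity path that lets features and gradients flow through depth unchanged:

```python
def block_buggy(x, attn, norm):
    # norm(x) rescales the residual stream itself at every layer,
    # so no identity path survives across depth.
    return norm(x) + attn(norm(x))

def block_correct(x, attn, norm):
    # pre-norm transformer: x passes through untouched; only the
    # sublayer's input is normalized.
    return x + attn(norm(x))
```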

"this google guy made big headlines two years ago funniest part: he was duped into empathizing with LaMDA an extremely primitive language model by [----] standards. undertrained on low-quality data no RLHF/DPO etc. if he talked to the latest Gemini he would simply combust"
X Link 2024-11-22T17:23Z 20.5K followers, 100.6K engagements

"AI is a field you can get up to speed in and contribute to in just a year or two case in point: the 2018-2021 Google Brain residency produced a disproportionate number of elite researchers katherine lee jason wei luke metz barrett zoph colin raffel hardmaru + many more"
X Link 2024-11-25T16:02Z 20.5K followers, 99.9K engagements

"xAI just quietly dropped a new text embedding model and for some reason no one is talking about it; you can already query the model via API since the new grok training hasn't worked out maybe they are focusing on embeddings instead"
X Link 2024-12-04T15:40Z 21K followers, 13.8K engagements

"the story of entropix is a sad one; a cautionary tale over 3k github stars tens of thousands of likes & replies on X; unfortunately it is not real turns out you can vaguepost your way to the top with plots that look good with numbers that seem high and as long as you never evaluate the full thing on a known benchmark no one can refute your points top the confusing graphs off with some technical-sounding mumbojumbo ("varentropy" "entropy scaffolds".) and you're well on your way to a cult following Twitter could be a great place to share research but the algorithm these days encourages this"
X Link 2024-12-04T21:03Z 21.3K followers, 176.6K engagements

"@tensorqt more like lack of real evaluations or benchmarks its not my job to prove something doesnt work or navigate some convoluted rabbit hole to figure out why (and Im not even sure I could do so); burden is on the creator(s) to prove the discovery of anything useful"
X Link 2024-12-04T21:32Z 21.1K followers, [----] engagements
