Dark | Light
# ![@AlexGDimakis Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::29178343.png) @AlexGDimakis Alex Dimakis

Alex Dimakis posts on X about model, paper, ai, in the the most. They currently have [------] followers and [---] posts still getting attention that total [---] engagements in the last [--] hours.

### Engagements: [---] [#](/creator/twitter::29178343/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::29178343/c:line/m:interactions.svg)

- [--] Week [---] -96%
- [--] Month [------] -46%
- [--] Months [-------] +15%
- [--] Year [-------] -74%

### Mentions: [--] [#](/creator/twitter::29178343/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::29178343/c:line/m:posts_active.svg)

- [--] Months [--] -62%
- [--] Year [--] -56%

### Followers: [------] [#](/creator/twitter::29178343/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::29178343/c:line/m:followers.svg)

- [--] Week [------] +0.12%
- [--] Month [------] +1.30%
- [--] Months [------] +8.80%
- [--] Year [------] +20%

### CreatorRank: undefined [#](/creator/twitter::29178343/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::29178343/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  [stocks](/list/stocks)  [countries](/list/countries)  [social networks](/list/social-networks)  [travel destinations](/list/travel-destinations)  [cryptocurrencies](/list/cryptocurrencies)  [automotive brands](/list/automotive-brands)  [products](/list/products)  [celebrities](/list/celebrities)  [finance](/list/finance) 

**Social topic influence**
[model](/topic/model), [paper](/topic/paper), [ai](/topic/ai), [in the](/topic/in-the), [open ai](/topic/open-ai), [6969](/topic/6969), [the first](/topic/the-first), [if you](/topic/if-you), [this is](/topic/this-is), [theory](/topic/theory)

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl) [Microsoft Corp. (MSFT)](/topic/microsoft)
### Top Social Posts
Top posts by engagements in the last [--] hours

"@pmddomingos My point is that a differentiable imagenet classifier can be finetuned to other problems or used as a feature extractor. Also discriminators are guiding training of generators. Concatenation of differentiable blocks is differentiable and widely used. An RF or XGB cannot do that"  
[X Link](https://x.com/AlexGDimakis/status/1507141058999816195)  2022-03-24T23:43Z 18.7K followers, [--] engagements


"Human bilinguals are more robust to dementia and cognitive decline. In our recent NeurIPS paper we show that bilingual GPT models are also more robust to structural damage in their neuron weights. Further we develop a theory. (1/n)"  
[X Link](https://x.com/anyuser/status/1622006950950014981)  2023-02-04T22:59Z 22.5K followers, 312.6K engagements


"I was surprised by a talk Yejin Choi (an NLP expert) gave yesterday in Berkeley on some surprising weaknesses of GPT4: As many humans know 237*757=179409 but GPT4 said [------]. For the easy problem of multiplying two [--] digit numbers they measured GPT4 accuracy being only 59% accuracy on [--] digit number multiplication. Only 4% on [--] digit number multiplication and zero on 5x5. Adding scratchpad helped GPT4 but only to 92% accuracy on multiplying two [--] digit numbers. Even more surprisingly finetuning GPT3 on 1.8m examples of [--] digit multiplication still only gives [--] percent test accuracy (in"  
[X Link](https://x.com/anyuser/status/1691600985938858432)  2023-08-16T00:01Z 22.5K followers, 1.7M engagements


"@boazbaraktcs @OpenAI @ilyasut @janleike Congratulations"  
[X Link](https://x.com/AlexGDimakis/status/1697075491847418067)  2023-08-31T02:35Z 15.2K followers, [----] engagements


"@roydanroy @boazbaraktcs @OpenAI @ilyasut @janleike its only true for large values of N"  
[X Link](https://x.com/AlexGDimakis/status/1697076182724874600)  2023-08-31T02:37Z 11.9K followers, [---] engagements


"Here are a few things I learned from the AI Institutes #AIHillDay showcasing in the Senate yesterday: [--]. Many on the Hill are talking about AI. Things are happening. [--]. It is not obvious to many in DC that universities play a key role in developing the *research* used in modern AI models. The visit mitigated that to some extent by showing fundamental research results from the [--] NSF institutes @NSF ranging from fundamentals in optimization generative AI trust in AI systems to applying AI in agriculture neuroscience next generation food systems edge devices and education. @AI_EDGE_INST@AI4OPT"  
[X Link](https://x.com/AlexGDimakis/status/1704664256170336300)  2023-09-21T01:10Z 12.2K followers, 21.2K engagements


"@madiator True story: We learned that in some states AI stands for Artificial Insemination - and that is perhaps the biggest AI problem this country will face"  
[X Link](https://x.com/AlexGDimakis/status/1704671374613041319)  2023-09-21T01:38Z 12.2K followers, [---] engagements


"@ryan_p_adams They mean Monte Carlo Monte Carlo. Like NY NY"  
[X Link](https://x.com/AlexGDimakis/status/1705051541084725553)  2023-09-22T02:49Z 12.2K followers, [----] engagements


"The specifications of building a web browser form the secret to the largest text dataset needed to train gpt5"  
[X Link](https://x.com/AlexGDimakis/status/1705762910348157335)  2023-09-24T01:55Z 12.7K followers, [----] engagements


"Excited to introduce open efficient customizable Pytorch code for training Large Language Models: #OpenLM 1B and 7B models released with scripts to make it easy reproduce and modify. We also kept intermediate checkpoints every 25B tokens. Contributors include @ssgrn @Mitchnw @Vaishaal @sy_gadre @achalddave @lschmidt3 @GeorgeSmyrnis1 and others. Thanks to @laion_ai @StabilityAI and the support of @NSF Institute @MLFoundations Everyone is welcome to join as a contributor and please let us know how OpenLM can help your research. OpenLM-1B and OpenLM-7B are some of the best models available and"  
[X Link](https://x.com/AlexGDimakis/status/1706738069045453245)  2023-09-26T18:30Z 11.9K followers, [----] engagements


"More research results (related to the "Reversal Curse") show that LLMs cannot (and should not) be used as databases. As shown in this interesting paper Inverse search fails. For example given training data 'John Von Neumann was born on December [--] 1903' the inverse search question is "Who was born on December [--] 1903'" The paper argues that transformers cannot perform inverse search (unless the knowledge was pretrained in reverse order in the training set) due to the left-to-right autoregressive training. This does not happen in the context (since everything can attend to everything there)."  
[X Link](https://x.com/AlexGDimakis/status/1708735593906078026)  2023-10-02T06:48Z 12.2K followers, 17.3K engagements


"@DimitrisPapail I thought the point of these recent papers is that causal pretraining causes fundamental limitations. Eg if a dataset contains "A= B" it is impossible to learn that "B=A" since the token for A cannot attend to the future B"  
[X Link](https://x.com/AlexGDimakis/status/1708856689493983398)  2023-10-02T14:49Z 13.1K followers, [---] engagements


"@DimitrisPapail @OpenAI Never seen this before"  
[X Link](https://x.com/AlexGDimakis/status/1709734440568561723)  2023-10-05T00:57Z 20.7K followers, [---] engagements


"@DimitrisPapail @OpenAI Lol"  
[X Link](https://x.com/AlexGDimakis/status/1709740382760607767)  2023-10-05T01:20Z 20.7K followers, [---] engagements


"VC: Pitch me. Pitch: We are thinking of building a startup that develops a web front-end and uses GPT API to create lyrics with rhyme and street vernacular to be performed over a backing beat or musical accompaniment. We call it GPT Wrapper VC: Get. out"  
[X Link](https://x.com/AlexGDimakis/status/1718683041420198292)  2023-10-29T17:35Z 11.9K followers, [----] engagements


"Cutting edge research on genAI has gone in a unicorn direction πŸ¦„"  
[X Link](https://x.com/AlexGDimakis/status/1719379374326079500)  2023-10-31T15:42Z 11.9K followers, [----] engagements


"Excited about our GenAI workshop this week"  
[X Link](https://x.com/AlexGDimakis/status/1729210689737376107)  2023-11-27T18:48Z 12.2K followers, [----] engagements


"We had a great panel on GenAI today"  
[X Link](https://x.com/AlexGDimakis/status/1729993979788239204)  2023-11-29T22:41Z 12.2K followers, [----] engagements


"Zaid Harchaoui presenting in our GenAI IFML workshop: can transformers finally learn multiplication or not"  
[X Link](https://x.com/AlexGDimakis/status/1730260589811773501)  2023-11-30T16:20Z 13.9K followers, [----] engagements


"Today in our GenAi IFML workshop Aditi Raghunathan telling us that transformers can learn different types of regression with finetuning but can catastrophically forget their previous prior"  
[X Link](https://x.com/AlexGDimakis/status/1730653918856663265)  2023-12-01T18:23Z 13.9K followers, [----] engagements


"@litu_rout_ Cool project congrats"  
[X Link](https://x.com/AlexGDimakis/status/1732078467053695097)  2023-12-05T16:44Z 12.2K followers, [---] engagements


"Some really cool work on inpainting and other inverse problems using a pre-trained Stable diffusion"  
[X Link](https://x.com/AlexGDimakis/status/1732127862017196222)  2023-12-05T20:00Z 12.2K followers, [----] engagements


""Datacomp1B is the first public dataset that outperforms OpenAI" #NeurIPS2023"  
[X Link](https://x.com/anyuser/status/1735340429380370530)  2023-12-14T16:46Z 22.5K followers, 38.1K engagements


"The Google Gemini paper was released today and has [---] authors. I was impressed but then found that a recent LHC physics paper with [----] authors. The first nine pages describe the research and the other [--] pages list the authors and their institutions. But that's not even the record. The most authors on a single peer-reviewed academic paper is [-----] and was achieved by the COVIDSurg and GlobalSurg Collaboratives at the University of Birmingham and the University of Edinburgh. All [---] Gemini coauthors are expected to quit Google and start [---] LLM startups next year"  
[X Link](https://x.com/anyuser/status/1737598802415018157)  2023-12-20T22:20Z 22.5K followers, 56.2K engagements


"Midjourney v6 is generating training data that seems very copyrighted. @Rahll It's clearly spitting out training data. Here's someone prompting 'Joaquin Phoenix Joker movie [----] screenshot from a movie movie scene'. https://t.co/haIEHzGDpB @Rahll It's clearly spitting out training data. Here's someone prompting 'Joaquin Phoenix Joker movie [----] screenshot from a movie movie scene'. https://t.co/haIEHzGDpB"  
[X Link](https://x.com/AlexGDimakis/status/1738772579337359430)  2023-12-24T04:04Z 12.7K followers, [----] engagements


"We just discovered that the inpainting model in Stable Diffusion is cheating. To clarify: Inpainting is a type of inverse problem where some missing data (pixels) must be filled in. In our testing some of the inpaintings from the SDXL inpainting model where a little 'too good': filling in details in the masked missing pixels they couldn't possibly know unless the model was cheating by observing masked pixels. So we created this test dog image with some Pink-Cyan boxes and then asked the model to inpaint it. We chose the masking region to fully contain the Pink and Cyan boxes so there is no"  
[X Link](https://x.com/anyuser/status/1747749640315789399)  2024-01-17T22:36Z 22.5K followers, 48.1K engagements


"Excited to be the director for the new Texas Center for Generative AI Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps"  
[X Link](https://x.com/anyuser/status/1750580887194943640)  2024-01-25T18:06Z 22.5K followers, 52.6K engagements


"Interested in going to Greece this summer ISIT the International symposium on Information Theory will be in Athens Greece on July 7th to 12th [----]. I am co-chairing the tutorials with Lalitha Sankar from ASU. Please submit your Tutorial proposals by March 15th on information processing information theory and related fields Submission deadline: March [--] 2024"  
[X Link](https://x.com/AlexGDimakis/status/1755820033660289142)  2024-02-09T05:04Z 12.8K followers, [----] engagements


"The #Sora model is indeed incredible 🀯 congratulations to the OpenAI team. It is common for people to think that all the amazing research breakthroughs in AI (like #Sora) are happening inside companies like OpenAI while universities are becoming irrelevant. I want to highlight that the two first authors in the Sora paper Tim Brooks and Bill Peebles received their PhDs from UC Berkeley in [----] and their dissertation research is closely related to this breakthrough. Of course the compute infra and talent in OpenAI is critical for breakthroughs. I just want to point out that the training of the"  
[X Link](https://x.com/anyuser/status/1759283664527970584)  2024-02-18T18:28Z 22.5K followers, 44.7K engagements


"This is a very bizarre phenomenon. Some coordinates in the token vectors in transformers take huge values (2k or 7k while most values are below 1). These seem to lie on fixed feature dimensions and disapear in the last layers. LLMs are great but their internals are less explored. I'm excited to share very interesting findings in paper Massive Activations in Large Language Models LLMs have very few internal activations with drastically outsized magnitudes e.g. 100000x larger than others. (1/n) https://t.co/DRAgEPRHgw LLMs are great but their internals are less explored. I'm excited to share"  
[X Link](https://x.com/AlexGDimakis/status/1763810823108001924)  2024-03-02T06:17Z 13K followers, 12.7K engagements


"Excited to give one of the keynotes in the Data Council event in Austin in two weeks. I plan to talk about GenAI and Datacomp: Creating the Largest Public Multimodal Dataset in Academia. One central problem is what role can universities have in the GenAI ecosystem -- I think that that one of the roles that the open source community and academia can play is in the creation curation and evaluation of datasets. There are four underlying trends I identify: [--]. In the past data cleaning jobs were boring and menial tasks done manually by inexperienced researchers. As the datasets get larger data"  
[X Link](https://x.com/AlexGDimakis/status/1769126928005415212)  2024-03-16T22:21Z 13K followers, [----] engagements


"The biological brain is definitely more energy efficient (I'm reading that the human brain is roughly operating at 20w while one a100 needs 250W). But if we consider that one analog multiplication is physically computed at each point a synapse meets another neuron then 100b neurons will perform more multiplications I think)"  
[X Link](https://x.com/AlexGDimakis/status/1777900415939424456)  2024-04-10T03:24Z 13.1K followers, [--] engagements


"Thank you thats when twitter is at its best getting non-trivial answers in [--] min So the question is the use of the generated content and if it is competing with the owner of the copyrighted content. Is this legal consensus in your opinion or this is debated in current trials"  
[X Link](https://x.com/AlexGDimakis/status/1780375449576542523)  2024-04-16T23:19Z 13.1K followers, [---] engagements


"Phi-3 just released by Microsoft. Three small size models (3.8B 7B and 14B) trained on highly filtered and synthetic data. They report impressive performance since the 3.8B model (trained on 3T tokens) has MMLU of 69% matching Llama3 8B and the 7B Phi-3 model has 75% MMLU which I think makes it the best 7B model by far. The pre-training is done in two phases: in phase [--] its web data to teach general knowledge and phase [--] has heavily filtered web data and synthetic data created from GPT4. The number of training tokens is much smaller than Llama3 and I suspect the amazing performance comes from"  
[X Link](https://x.com/AlexGDimakis/status/1782642806927786377)  2024-04-23T05:28Z 13.1K followers, [----] engagements


"We are very excited that our first GH200 nodes have arrived in TACC for our GenAI center. Here is one. Fun facts: NVIDIA makes GH200 'superchips' (i.e. modules) a GH200 DGX box and a GH200 rack which are all different. As Dan Stanzione our TACC director kindly explained to me under the 15lb heat sink there is a GH200 superchip with 96GB memory. There is also 240GB of CPU RAM in a coherent address space so in theory should have as much memory as [--] A100x80gb. Also a 400GB/sec network interface is awesome. Excited to see what open models we can train on these"  
[X Link](https://x.com/AlexGDimakis/status/1783749606519517615)  2024-04-26T06:47Z 13.2K followers, 20.4K engagements


"This year I'm serving as the co-chair for ISIT Tutorials. The conference is happening in Athens Greece (Intercontinental) and the tutorials are on July 7th. Titles and presenters: Theory and Methods for Deep Generative Models Presenters: Yao Xie Taiji Suzuki and Xiuyuan Cheng Information-Theoretic Statistical and Algorithmic Foundations of RL Presenters: Yuejie Chi Yuxin Chen and Yuting Wei Language Model Inference: Theory and Algorithms Presenters: Ahmad Beirami and Ananda Theertha Suresh Graph Matching: Fundamental Limits and Efficient Algorithms Presenters: Hye Won Chung and Lele Wang"  
[X Link](https://x.com/AlexGDimakis/status/1796628540781154494)  2024-05-31T19:43Z 13.3K followers, 11.5K engagements


"This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to [----] elo. Is it possible that the model plays better than [----] elo (i.e. "transcends" the training data performance). It seems you get something from nothing and some information theory arguments that this should be impossible were discussed in conversations I had in the past. But this paper shows this can happen: training on [----] elo game transcripts and getting an LLM that plays at [----] Further the authors connect to a clean theoretical framework for why: it's ensembling"  
[X Link](https://x.com/anyuser/status/1803293833889042637)  2024-06-19T05:08Z 22.5K followers, 392.7K engagements


"@yevgets Any argument that says it's not surprising must also explain why it didn't happen at [----] elo training or why it doesn't happen at higher temperatures"  
[X Link](https://x.com/AlexGDimakis/status/1803514289959084305)  2024-06-19T19:44Z 13.8K followers, [----] engagements


"Our filtered dataset is only 1.4% of the original huge pool of 240T Common crawl scrape and it is very high quality"  
[X Link](https://x.com/AlexGDimakis/status/1816881315901505572)  2024-07-26T17:00Z 14K followers, [---] engagements


"More info here Congratulations to the whole DCLM team esp @lschmidt3 @Vaishaal @achalddave for their leadership. https://www.datacomp.ai/dclm/ https://www.datacomp.ai/dclm/"  
[X Link](https://x.com/AlexGDimakis/status/1816881318912803071)  2024-07-26T17:00Z 14K followers, [---] engagements


"A fantastic Economist article on Open-source AI by two pioneers: Martin Cassado and Ion Stoica. (plus a nice human-generated drawing). The main points I got from their Economist article: [--]. "Regulation hurts innovation": I agree and I am worried about Europe going in the wrong direction on this despite good intentions. [--]. "Open source makes systems safer": I agree and I also want to say more robust and more modular. [--]. "Open source drives innovation." I absolutely agree. One thing I want to add: Open-weights (i.e. Llama) is awesome but is not open-source. We should think of open weights as if"  
[X Link](https://x.com/AlexGDimakis/status/1817963738936611119)  2024-07-29T16:41Z 14K followers, 15.1K engagements


"Original post here: congratulations @martin_casado and Ion. https://x.com/martin_casado/status/1817947793677492318 Professor Ion Stoica and I wrote an article in the Economist arguing for the importance of Open Source in AI including the largest most powerful models. https://t.co/FsvJyWN30l https://x.com/martin_casado/status/1817947793677492318 Professor Ion Stoica and I wrote an article in the Economist arguing for the importance of Open Source in AI including the largest most powerful models. https://t.co/FsvJyWN30l"  
[X Link](https://x.com/AlexGDimakis/status/1817964119984849158)  2024-07-29T16:43Z 14K followers, [----] engagements


"@tidyanalysis Many companies have benefited from open source in the past. I think the main advantage is that you have the whole world developing or using your platform. Monetizing that is non-trivial but I think many have done it successfully (eg databricks on spark) and many have not"  
[X Link](https://x.com/AlexGDimakis/status/1818221224817803326)  2024-07-30T09:44Z 14.1K followers, [--] engagements


"This is the most scary Halloween costume I've seen πŸ‘» Time to start planning for Halloween. Ordering my new costume now https://t.co/bjRTUOOiCu Time to start planning for Halloween. Ordering my new costume now https://t.co/bjRTUOOiCu"  
[X Link](https://x.com/AlexGDimakis/status/1819289922282422760)  2024-08-02T08:31Z 14K followers, [----] engagements


"Try our grounded factuality checker here We also make it super easy to use by API (and free for now). https://playground.bespokelabs.ai/ https://playground.bespokelabs.ai/"  
[X Link](https://x.com/AlexGDimakis/status/1821965309185855992)  2024-08-09T17:42Z 15.2K followers, [----] engagements


"@ZachariahNKM Absolutely Send us an email at company@bespokelabs.ai for a free API key and to fact-check longer documents. We also are working on domain-specific factuality checkers"  
[X Link](https://x.com/AlexGDimakis/status/1822290511450763533)  2024-08-10T15:14Z 15.3K followers, [---] engagements


"Hey collective Twitter AI hivemind: which Supervised FineTuning library should we use for our research We are exploring libraries for SFT e.g. TRL Axolotl Torchtune or other options"  
[X Link](https://x.com/AlexGDimakis/status/1831406200103248219)  2024-09-04T18:57Z 15.2K followers, [----] engagements


"@bclavie ohh no thats an imam baildi (one of my favorite Greek (hmm) dishes)"  
[X Link](https://x.com/AlexGDimakis/status/1831503130493866428)  2024-09-05T01:22Z 15.2K followers, [----] engagements


"GPT is having a profound effect on how students write. Its verbose style full of cliches and 'fancy' out of place vocabulary is in every paper and draft I read. A few years back there were grammar errors and awkwardness -- but at least people had their own voice. Now scholarship is getting full of robotic triviality"  
[X Link](https://x.com/anyuser/status/1831833630022496515)  2024-09-05T23:15Z 22.5K followers, 951.3K engagements


"@nandofioretto Yes that's right. Structure and flow in writing help us organize our thought. Blindly using LLMs is an airbrush that makes it harder for people to see that they have muddled flow"  
[X Link](https://x.com/anyuser/status/1831895840799256669)  2024-09-06T03:22Z 22.5K followers, 42.9K engagements


"Thank you for your response Dimitris. I appreciate your take on the issue. It's true that a request for "a few typos" and fewer "fancy words" may help bring back a sense of authenticity to writing. Theres a delicate balance between polishing a draft and maintaining the writers original voice and sometimes that balance is lost when students rely too heavily on tools like GPT. I find that students are increasingly focused on perfecting their writing in a technical sense but often at the cost of depth originality and personal style. The quirks errors and occasional awkwardness that were once"  
[X Link](https://x.com/anyuser/status/1831955374758621686)  2024-09-06T07:19Z 22.5K followers, 132.2K engagements


"for getting a free api key for our models http://console.bespokelabs.ai http://console.bespokelabs.ai"  
[X Link](https://x.com/AlexGDimakis/status/1832531933747933390)  2024-09-07T21:30Z 15.2K followers, [---] engagements


"GANs are basically a training method. Basically you get gradients through another network (the discriminator ) solving a min-max problem as opposed to an optimization problem. You can train a GAN using generators that are unets or transformers or cnns. GANs where the best generative models until diffusions dethroned them circa [----]. They are also much harder to train"  
[X Link](https://x.com/AlexGDimakis/status/1838744932917612909)  2024-09-25T00:58Z 15.2K followers, [----] engagements


"To answer on architectures my understanding is that CNNs are the best and simplest architectures for vision unless you have a ton of data. Then ViTs start becoming better because they are more flexible and allow non-local features. Most of the innovation in my opinion didn't come from changing architecture but from where you get supervision ie self supervised methods discriminators and diffusion training methods (for different problems)"  
[X Link](https://x.com/AlexGDimakis/status/1838745731437596911)  2024-09-25T01:01Z 15.2K followers, [----] engagements


"This is a herculean effort of unifying the vast literature on diffusions for inverse problems (used for inpainting deblurring MRI etc) in a unified mathematical framework. Congratulations especially to the student coauthors for all the hard work. https://giannisdaras.github.io/publications/diffusion_survey.pdf Why are there so many different methods for using diffusion models for inverse problems πŸ€” And how do these methods relate to each other In this survey we review more than [--] different methods and we attempt to unify them into common mathematical formulations. https://t.co/B19YG31IdC"  
[X Link](https://x.com/AlexGDimakis/status/1839874260778840209)  2024-09-28T03:46Z 15.2K followers, [----] engagements


"@giannis_daras Current Phd level AGI test: Hey gpt here are [--] PDFs. Derive the score function approximation that each one is implicitly using"  
[X Link](https://x.com/AlexGDimakis/status/1839880861170139427)  2024-09-28T04:12Z 15.2K followers, [---] engagements


"I think keeping some trade secrets is a reasonable strategy in a competitive system. It's great that some communication is still open through blog posts but that does not pass the bar of peer reviewed research as Yann said. I still think that peer reviewed research is the most robust long term way to make scientific progress despite it's many limitations (which are mainly happening in AI because the field is growing ie more submissions than experts). A good question is when there is high employee churn it's impossible to keep trade secrets so maybe it's a better strategy to be more open to"  
[X Link](https://x.com/AlexGDimakis/status/1840454563830415822)  2024-09-29T18:12Z 15.2K followers, [----] engagements


"@QuanquanGu @polynoamial @ylecun @thomaspower @OpenAI Lol wrote similar points in parallel πŸ‘"  
[X Link](https://x.com/AlexGDimakis/status/1840486125355323669)  2024-09-29T20:17Z 15.2K followers, [---] engagements


"@RishiSonthalia @polynoamial @ylecun @thomaspower @OpenAI Lol sorry I believe the opposite. Wanted to say signal to bs ratio"  
[X Link](https://x.com/AlexGDimakis/status/1840515788995711340)  2024-09-29T22:15Z 15.2K followers, [---] engagements


"For the first (and probably last) time in my life I understand the technical details of both the physics and chemistry Nobel prizes. BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and"  
[X Link](https://x.com/anyuser/status/1843995475743228128)  2024-10-09T12:42Z 22.5K followers, 56.4K engagements


"@apostoliev Super cool what model did you use"  
[X Link](https://x.com/AlexGDimakis/status/1852846213525389614)  2024-11-02T22:52Z 15.2K followers, [--] engagements


"AI monoliths vs Unix Philosophy: The case for small specialized models. The current thinking in AI is that AGI is coming and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently agents are basically big system prompts on the same gigantic model. Through prompt engineering AI builders are trying to plan and execute complex multi-step processes. This is not working very well. This monolith view of AI is in sharp contrast to how we teach engineers to build systems. When multiple people have to build complex"  
[X Link](https://x.com/AlexGDimakis/status/1855302511412003105)  2024-11-09T17:32Z 15.3K followers, [----] engagements


"@ifeelbig I think progress towards AGI is amazing. I just dont think the best way to solve practical AI engineering problems is through prompts to a gigantic model. Its slow and wastes a lot of energy and compute cannot be competitive to small specialized models"  
[X Link](https://x.com/AlexGDimakis/status/1855309697802477867)  2024-11-09T18:01Z 15.3K followers, [---] engagements


"(2/n) Evalchemy πŸ§ͺ offers optimized Eval Performance: Many LM benchmarks are not optimized for performance and cost and can take dozens of hours to compute even if you have a small model. Evalchemy can run the full battery of popular benchmarks in about 4h for a Llama8B (more than 3x acceleration according to our benchmarks due to parallelism) and allows easy installation and a consistent platform to run benchmarks and keep track in a leaderboard. We also support adding your own custom benchmarks"  
[X Link](https://x.com/AlexGDimakis/status/1858545286744002805)  2024-11-18T16:18Z 15.4K followers, [---] engagements


"@spirosx I remember those days. It would be like turning down an OpenAI offer for a nonexistent startup πŸ˜…"  
[X Link](https://x.com/AlexGDimakis/status/1861843770196812277)  2024-11-27T18:45Z 15.4K followers, [---] engagements


"@spirosx Correct. Joining a drama-free OpenAI vs nonexistent startup in uiuc incubator"  
[X Link](https://x.com/AlexGDimakis/status/1861846485274591708)  2024-11-27T18:56Z 15.4K followers, [---] engagements


"There are two interesting things going on here: First there is no red in this picture. Our brains are filling the red color (you can check if you zoom in). The picture has only light blue black and white. Also Claude sees a gondola ski chair on a blue background. Source: This was posted on fb by Scott Aaronson from a Quantum Physics post"  
[X Link](https://x.com/AlexGDimakis/status/1865634281361678812)  2024-12-08T05:47Z 15.4K followers, [----] engagements


"Life update: I am excited to announce that I will be starting as a Professor in UC Berkeley in the EECS Department. I spend [--] wonderful years teaching in UT Austin and I am grateful to all my colleagues and students there and extremely proud of what we have achieved in AI in UT Austin and I plan to continue my numerous UT close collaborations. I will also continue as Chief Scientist in Bespoke Labs making it much easier now being in the Bay area. I received my Phd in [----] from @Berkeley_EECS and I am thrilled to be back. I am grateful for this new opportunity"  
[X Link](https://x.com/anyuser/status/1869124346264043827)  2024-12-17T20:55Z 22.5K followers, 110.9K engagements


"@jefrankle @databricks Congrats thats quite a large number πŸ€“"  
[X Link](https://x.com/AlexGDimakis/status/1869125842158387307)  2024-12-17T21:01Z 16.1K followers, [---] engagements


"This is an interesting definition of AGI: Youll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible. ARC-AGI is a very cool test and o3 is amazing but as this thread argues more evals are needed. People are panicking with O3 doing really well on ARC-AGI so wanted to share some perspectives. [--]. OpenAI trained on 75% of the training set of ARC-AGI. On X there has been a lot of discussion around this but wanted to share what i think. So training on train set is fine by ML People are panicking with O3 doing really well on"  
[X Link](https://x.com/AlexGDimakis/status/1871464903485321285)  2024-12-24T07:56Z 16.1K followers, [----] engagements


"I was very surprised that straightforward SFT on a few thousand examples produces a posttraining dataset that gives o1 level reasoning abilities. Congratulations to the team: they show there is absolutely no moat in reasoning and no RL is needed if you can get a few thousand CoTs 1/6 πŸš€ Introducing Sky-T1-32B-Preview our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks trained under $450 πŸ“ŠBlog: https://t.co/LtuTJeilmv πŸ‹β™€Model weights: https://t.co/Vn1dmtrHWo https://t.co/KLk6zZr9KA 1/6 πŸš€ Introducing Sky-T1-32B-Preview our fully"  
[X Link](https://x.com/AlexGDimakis/status/1879078546435674470)  2025-01-14T08:10Z 16.2K followers, [----] engagements


"Yes we can achieve it without distilling. Just have humans solve [-----] math and coding problems and write detailed solutions. Routinely done in every big university in one course with [----] students and [--] homework problems each. Frontier labs paid hundreds of millions for humans to label much more than that"  
[X Link](https://x.com/AlexGDimakis/status/1879274248147243278)  2025-01-14T21:07Z 16.2K followers, [---] engagements


"The definition of art is quite non-trivial I'm afraid but I enjoy your Shannon-theoretic approach. :) As I'm learning from Britannica Marcel Duchamp submitted a urinal to a public exhibition in NYC. Now "Through this act Duchamp put forth a new definition of what constitutes a work of art: he implied that it is enough for an artist to deem something art and put it in a publicly accepted venue." So if we consider art what is displayed in reputable museums (and hence deemed as art by the professional art curators) AI generated art is art. But maybe this is like the first photographs taken that"  
[X Link](https://x.com/AlexGDimakis/status/1881246634677571878)  2025-01-20T07:45Z 16.8K followers, [---] engagements


"@plasmatic99 @hardmaru Hmm I don't think this is the problem. I believe Llama even admitted to training on libgen in publicly released docs. https://www.rollingstone.com/culture/culture-news/ai-meta-pirated-library-zuckerberg-1235235394/ https://www.rollingstone.com/culture/culture-news/ai-meta-pirated-library-zuckerberg-1235235394/"  
[X Link](https://x.com/AlexGDimakis/status/1881511215224205602)  2025-01-21T01:16Z 17K followers, [---] engagements


"Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: [--]. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM no MCTS no fancy reward models. Basically checks if the answer is correct. πŸ˜… [--]. Small models can reason very very well with correct distillation post-training. They released a 1.5B model () that is better than Claude and Llama 405B in AIME24. Also their distilled 7B model seems better than o1 preview. πŸ€“ [--]. The datasets used are not released if I"  
[X Link](https://x.com/anyuser/status/1881511481164079507)  2025-01-21T01:17Z 22.5K followers, 184.1K engagements


"@Mag_Jembrih is it good in your tests"  
[X Link](https://x.com/AlexGDimakis/status/1881665517737697709)  2025-01-21T11:29Z 17.3K followers, [----] engagements


"here is our blog-post for the release of the reasoning model and dataset. https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness- https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-"  
[X Link](https://x.com/AlexGDimakis/status/1882136005261849011)  2025-01-22T18:39Z 16.8K followers, [--] engagements


"Here we mean simply what you said: create data with R1 and SFT a Qwen32. You're right there is also the correct form of distillation where you look at the logits of the teacher and use that in the loss but we didn't do that. To the best of my knowledge when people say they distill LLMs they mean the SFT simplification"  
[X Link](https://x.com/AlexGDimakis/status/1882171115373728063)  2025-01-22T20:58Z 18K followers, [---] engagements


"@jxmnop Its even simpler than ppo: ppo needs a reward model. This is basically try to solve the problem for correct solutions call them positive for wrong solutions call them negative and do roughly dpo. I think thats it but also weighted by style a bit"  
[X Link](https://x.com/AlexGDimakis/status/1882179999526158705)  2025-01-22T21:34Z 17.6K followers, 10.8K engagements


"@bookwormengr Yes indeed. Further our dataset is only 17k questions while DeepSeek distilled with 800k. Most importantly ours is open so anyone can improve it or distill their own models with it"  
[X Link](https://x.com/AlexGDimakis/status/1882225575722668230)  2025-01-23T00:35Z 17.6K followers, [---] engagements


"@mignano You're right in terms of value creation. But then after a startup finds a valuable thin wrapper they should probably start making it out of thicker paper or plastic to establish a moat right Unless there is another moat through business tactics or something else"  
[X Link](https://x.com/AlexGDimakis/status/1882581065602322746)  2025-01-24T00:07Z 17.3K followers, [--] engagements


"@victor207755822 Congratulations for your contribution. I think DeepSeek-R1 has earned its place in the AI history books"  
[X Link](https://x.com/AlexGDimakis/status/1882857668987310522)  2025-01-24T18:27Z 18K followers, [----] engagements


"LM Arena confirms that DeepSeek-R1 is very good. Breaking News: DeepSeek-R1 surges to the top-3 in Arena🐳 Now ranked #3 Overall matching the top reasoning model o1 while being 20x cheaper and open-weight Highlights: - #1 in technical domains: Hard Prompts Coding Math - Joint #1 under Style Control - MIT-licensed A https://t.co/gwpgD4hmYI Breaking News: DeepSeek-R1 surges to the top-3 in Arena🐳 Now ranked #3 Overall matching the top reasoning model o1 while being 20x cheaper and open-weight Highlights: - #1 in technical domains: Hard Prompts Coding Math - Joint #1 under Style Control -"  
[X Link](https://x.com/AlexGDimakis/status/1882858024353800701)  2025-01-24T18:28Z 18.1K followers, [----] engagements


"@percyliang @deepseek_ai We are working on fixing that and create the largest open reasoning dataset. More coming very soon πŸ˜‰"  
[X Link](https://x.com/anyuser/status/1883420222029476177)  2025-01-26T07:42Z 22.5K followers, 37.9K engagements


"Our reasoning dataset Bespoke-Stratos-17k is trending on Huggingface. I think its the best reasoning dataset available today. We are one of the top trending datasets on HuggingFace today https://t.co/1KX73898It We are one of the top trending datasets on HuggingFace today https://t.co/1KX73898It"  
[X Link](https://x.com/AlexGDimakis/status/1883948783760990586)  2025-01-27T18:42Z 18.1K followers, [----] engagements


"Had great fun in the Effortless podcast where we discussed how Post-training and data curation is becoming the new hot space with @amitp42 and @dheeraj https://youtu.be/RmHhe2KEIu0si=u65fWwiwQ-vWcAqs https://youtu.be/RmHhe2KEIu0si=u65fWwiwQ-vWcAqs"  
[X Link](https://x.com/AlexGDimakis/status/1884485685249466585)  2025-01-29T06:16Z 18.5K followers, [----] engagements


"Excited about the popularity of our reasoning datasets. Our small (Bespoke-Stratos-17k) and large (OpenThoughts-114k) datasets are in no2 and no4 trending datasets in Huggingface. Multiple folks reaching out that they are using them to train their own reasoning models. Train your own DeepSeek-R1 distilled variant at home"  
[X Link](https://x.com/AlexGDimakis/status/1884700622831968762)  2025-01-29T20:30Z 18.5K followers, [----] engagements


"Discovered a very interesting thing about DeepSeek-R1 and all reasoning models: The wrong answers are much longer while the correct answers are much shorter. Even on the same question when we re-run the model it sometimes produces a short (usually correct) answer or a wrong verbose one. Based on this I'd like to propose a simple idea called Laconic decoding: Run the model [--] times (in parallel) and pick the answer with the smallest number of tokens. Our preliminary results show that this decoding gives +6-7% on AIME24 with only a few parallel runs. I think this is better (and faster) than"  
[X Link](https://x.com/anyuser/status/1885447830120362099)  2025-01-31T21:59Z 22.5K followers, 222.8K engagements


"@NeginRaoof_ Btw we can call this shortest of k decoding as opposed to best of k consensus of k etc. but laconic has a connection to humans look it up"  
[X Link](https://x.com/AlexGDimakis/status/1885470866877895166)  2025-01-31T23:30Z 18.7K followers, [----] engagements


"@rudiranck We will systematically compare. My intuition is that when you do trial and error you dont need consensus. Youd be better off doing something reflection realized you rambled for [--] minutes or you got lucky and found the key to the answer"  
[X Link](https://x.com/AlexGDimakis/status/1885584597393826282)  2025-02-01T07:02Z 18.6K followers, [---] engagements


"@NoahB1904 Did you try with the 32B or 7B distilled DeepSeeks_R1s Thats what we're mostly worried about not the big-R1"  
[X Link](https://x.com/AlexGDimakis/status/1887956647852826959)  2025-02-07T20:08Z 18.7K followers, [--] engagements


"You can already post on arxiv and ignore the peer reviewed system. The expensive thing is peoples attention and the peer-reviewed system is actually a (noisy) equalizer. To elaborate: The reason the peer-reviewed system has worked as the only known stable system for the progress of science is because it acts as a filter for quality. When a paper is published at Neurips or Nature this is a signal that some people consider it good enough to pass this bar. If it wins a best paper award or oral even more so. Reviewers were forced to read it as a service mechanism that phd students (plus academics"  
[X Link](https://x.com/AlexGDimakis/status/1903350670125920563)  2025-03-22T07:38Z 19.1K followers, 27.7K engagements


"@ipeirotis @yoavgo Yes but you got a PhD tenure track job and tenure because the peer reviewing system (and nsf grant reviewing system ) helped first to establish your personal brand. Also the paper brings a different type of credibility compared to a non reviewed blog post"  
[X Link](https://x.com/AlexGDimakis/status/1903583955972157568)  2025-03-22T23:05Z 19K followers, [---] engagements


"@DimitrisPapail Is this some sort of alignment to protect from deep fakes or something"  
[X Link](https://x.com/AlexGDimakis/status/1905062575576055849)  2025-03-27T01:01Z 19K followers, [----] engagements


"We are excited to release the OpenThinker2 reasoning models and data. In summary: [--]. Openthinker32B Outperforms DeepSeekR1-32B in reasoning. [--]. Fully open source open weights and open data (1M carefully curated samples). [--]. Post-trained only with SFT. RL post-training will likely further improve performance. Read the whole story.πŸ‘‡ Turns out its possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data OpenThoughts2-1M curated by selecting quality instructions from diverse sources. 🧡 (1/n)"  
[X Link](https://x.com/anyuser/status/1907837879902224862)  2025-04-03T16:49Z 22.5K followers, 16.7K engagements


""RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time the model tries [--] times (the Group in GRPO) and a gradient step is performed to increase the reward which is a very simple verification of the correct answers repeated thousands of times on the same problem. The shocking finding is that the model does not overfit to this one question: RL on one example makes the model better in MATH500 and other benchmarks. (If instead"  
[X Link](https://x.com/anyuser/status/1921348214525219206)  2025-05-10T23:34Z 22.5K followers, 354.5K engagements


"The thing I have been trying to understand is: If you are in n dimensions there are 2n possible directions to explore to descent. But gradients give you an exponential boost there they give you a direction of descent. With text-based optimization if it's blind it will be inefficient. The debate between RL and evolutionary programming has existed for decades but LLMs maybe make evolutionary search efficient and competitive for the frontier by giving meaningful descent directions that can be expressed in local code changes"  
[X Link](https://x.com/AlexGDimakis/status/1924251588623061243)  2025-05-18T23:51Z 19.9K followers, [----] engagements


"Interesting post. However it seems to be in conflict with the most central problem in theoretical computer science: P vs NP which is exactly the question: is it fundamentally easier to verify a solution rather than solve a problem. Most people believe that verification is easier than solution ie we believe that P=NP. But the post claims that All tasks that are possible to solve and easy to verify will be solved by AI. As a counter-example I would propose colouring a graph with [--] colors (color vertices so that all adjacent vertices have different colors) assuming the input graph is 3"  
[X Link](https://x.com/anyuser/status/1945610920182649346)  2025-07-16T22:26Z 22.5K followers, 31.3K engagements


"Many people think high school level math problems means easy problems. Here is one from the recent IMO that current frontier models and almost all humans will find very challenging. P6 was definitely the hardest and most interesting problem. Most people can understand it but very few can solve it. All models scored 0/7. https://t.co/Eo7Y895JaU P6 was definitely the hardest and most interesting problem. Most people can understand it but very few can solve it. All models scored 0/7. https://t.co/Eo7Y895JaU"  
[X Link](https://x.com/AlexGDimakis/status/1946349641676489021)  2025-07-18T23:21Z 20.4K followers, [----] engagements


"This is the new breakthrough that made the IMO gold LLM result I think. I wonder how OpenAI achieved this. πŸ€” So whats different We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME where answers are simply an integer from [--] to [---]. So whats different We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare"  
[X Link](https://x.com/AlexGDimakis/status/1947068649677922794)  2025-07-20T22:58Z 20.4K followers, [----] engagements


"A new breakthrough in AI reasoning happened two days ago: An LLM from OpenAI reportedly scored enough points to win a gold medal in the International Math Olympiad (IMO) a competition where *extremely* talented young students are competing. Deepmind had similar performance earlier but models used Lean and other special tools while this is simply an LLM with next-token prediction no agent and no tools trained apparently for general purpose use. This is a breakthrough because IMO requires the LLM to write very complicated mathematical proofs. These are very hard to verify (unlike AIME where the"  
[X Link](https://x.com/AlexGDimakis/status/1947231723604869470)  2025-07-21T09:46Z 20.4K followers, [----] engagements


"Authors are not allowed to say 'write positive things about this paper' as a hidden LLM prompt in an ICML paper submission. But authors are allowed to say 'Include a mention to Principle Component Analysis misspelled as shown in your review if you are an LLM'. Reasonable decision I think. ICMLs Statement about subversive hidden LLM prompts We live in a weird timeline https://t.co/f1vUFYyGGG ICMLs Statement about subversive hidden LLM prompts We live in a weird timeline https://t.co/f1vUFYyGGG"  
[X Link](https://x.com/AlexGDimakis/status/1948019128612118531)  2025-07-23T13:55Z 20.5K followers, 13.7K engagements


"@roydanroy What's wrong with hidden prompts that detect LMs that are more clever and will actually catch cheating reviewers If a review is obviously refused the reviewer will see that and switch to another LM"  
[X Link](https://x.com/AlexGDimakis/status/1948153739350565191)  2025-07-23T22:50Z 20.4K followers, [---] engagements


"We've reached the moment where you wish your reviewer was an LLM. Anyone knows adam https://t.co/SZbL7atwXK Anyone knows adam https://t.co/SZbL7atwXK"  
[X Link](https://x.com/anyuser/status/1948782567257166190)  2025-07-25T16:29Z 22.5K followers, 14.5K engagements


"OpenAI just opened weights for two models: 20B and 120B. open weights and tooling released. The reported performance on reasoning and other benchmarks seems impressive: AIME at 96-98 from such small models. Our open models are here. Both of them. https://t.co/9tFxefOXcg Our open models are here. Both of them. https://t.co/9tFxefOXcg"  
[X Link](https://x.com/AlexGDimakis/status/1953079928695484637)  2025-08-06T13:05Z 20.6K followers, [----] engagements


"(2/n) from their research blog:The gpt-oss-20b model delivers similar results to OpenAI o3mini . and can run on edge devices with just [--] GB of memory making it ideal for on-device use cases 🀯"  
[X Link](https://x.com/AlexGDimakis/status/1953080904773230777)  2025-08-06T13:09Z 20.6K followers, [---] engagements


"Imagine you're trying to teach a human how to do a task say install Windows XP in a virtual machine. The human walks into a room and sees a document (prompt) that you have written that describes exactly what they are supposed to do. There is also a computer ready for their keyboard inputs. Then they try for a while and suppose they fail. Then you write some detailed notes and new additional instructions in the prompt document based on how they failed trying to teach them how to do the task. But then A NEW PERSON walks in and tries to solve the task. Every day it's a fresh new employee and you"  
[X Link](https://x.com/anyuser/status/1956233564208685323)  2025-08-15T05:56Z 22.5K followers, 26.4K engagements


"Its an interesting way to make environments. But one thing I don't understand: The reason we want environments is to be able to train agents. So we need something rendering the pixels and also a backend. I can believe that a big diffusion model could generate the pixels of an App but how would we have a back-end that would allow us to check if a task was successfully completed or not"  
[X Link](https://x.com/AlexGDimakis/status/1959399739423875547)  2025-08-23T23:37Z 20.8K followers, [---] engagements


"Very interesting work on training long-horizon web agents. More evidence that prompt driven graph-based agents are not going to take us very far we need SFT+RL. Super thrilled to WebExplorer which is a simple yet effective approach to train long-horizon web agents. Instead of depending heavily on rigid pre-defined graph structures WebExplorer utilizes the model-based exploration strategy to synthesize high-quality agentic data. Our 8B https://t.co/cQfPI2d30v Super thrilled to WebExplorer which is a simple yet effective approach to train long-horizon web agents. Instead of depending heavily on"  
[X Link](https://x.com/AlexGDimakis/status/1965553118936179081)  2025-09-09T23:09Z 20.9K followers, 16.5K engagements


"What are RL environments Are they just evals There is significant confusion in the community so here is my opinion: My answer is inspired by Terminal-bench an elegant framework for creating RL environments evaluating agents and even training agents. First an RL environment is simply a Docker container. It contains three things: [--]. A snapshot of the state of the world when a problem happened. [--]. A task description and [--]. A reward that verifies if the agent has solved the task. Can be using LLM as a judge or run tests. For example lets take the 'broken-python' environment in Terminal bench. The"  
[X Link](https://x.com/anyuser/status/1965947230696910935)  2025-09-11T01:15Z 22.5K followers, 34.6K engagements


"(2/2) The elegance of Terminal-bench is that it packages the whole state of the world in a Docker container and allows agents to try different things and check if they solved the problem. (The agent and the tools can live inside the docker container) The broken python environment and task is fully contained here In general I believe that terminal-bench is an extremely powerful framework for evaluating and training agents in many domains including devops software engineering scientific computing and many other domains basically everything that doesn't need agents controlling graphical user"  
[X Link](https://x.com/AlexGDimakis/status/1965947233226150391)  2025-09-11T01:15Z 20.9K followers, [----] engagements


"Very interesting piece of history I just learned from Ion Stoica in AI Native event: Databricks was founded because Hortonworks would not support the Spark open source project so some company needed to be created to support it"  
[X Link](https://x.com/AlexGDimakis/status/1859712895745130981)  2024-11-21T21:38Z 22.5K followers, [----] engagements


"Im excited to introduce Evalchemy πŸ§ͺ a unified platform for evaluating LLMs. If you want to evaluate an LLM you may want to run popular benchmarks on your model like MTBench WildBench RepoBench IFEval AlpacaEval etc as well as standard pre-training metrics like MMLU. This requires you to download and install more than [--] repos each with different dependencies and issues. This is as you might expect an actual nightmare. (1/n) https://github.com/mlfoundations/evalchemy https://github.com/mlfoundations/evalchemy"  
[X Link](https://x.com/anyuser/status/1858545284386803975)  2024-11-18T16:18Z 22.5K followers, 148K engagements


"The multiple answers mystery is the most surprising thing we stumbled on from OpenThoughts: Sampling multiple answers for the same question is better than having more questions each answered once. To explain: Say you are creating a dataset of questions and answers to SFT a reasoning llm. You can take [----] questions (eg from stackexchange) and answer them with deepseekR1. Or you can take [---] questions (from the same distribution) and answer each question *twice* independently with deepseekR1. Which one is a better dataset Surprisingly if you re-answer the same questions its a better dataset"  
[X Link](https://x.com/AlexGDimakis/status/1997753658357022856)  2025-12-07T19:42Z 22.5K followers, 28.5K engagements


"Live demo of llava on a MacBook in front of thousands at #NeurIPS2023 With [--] seconds left in your talk timeslot. That was brave and it worked πŸ‘"  
[X Link](https://x.com/AlexGDimakis/status/1735332419765445042)  2023-12-14T16:14Z 21.5K followers, [----] engagements


"The Berkeley Sky computing lab just trained Sky-T1-32B-Preview a GPT-o1 level reasoning model spending only $450 to create the instruction dataset. The data is 17K math and coding problems solved step by step. They created this dataset by prompting QwQ at $450 cost. Can it be done without another reasoning model to distill Teach a [----] student class and assign [--] homework problems. Side benefit: make $10M by charging $10K tuition"  
[X Link](https://x.com/AlexGDimakis/status/1879228531353518283)  2025-01-14T18:06Z 18.9K followers, 15.9K engagements


"DeepSeek-R1 is amazing but they did not release their reasoning dataset. We release a high-quality open reasoning dataset building on the Berkeley NovaSky Sky-T1 pipeline and R1. Using this we post-train a 32B model Bespoke-Stratos-32B that shows o1-Preview reasoning performance. Surprisingly we get good performance with only 17k questions-answers while DeepSeek distillation used 800k i.e. 47x more data. We open-source everything for the community to experiment with. Introducing Bespoke-Stratos-32B our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSkys Sky-T1 recipe. The model"  
[X Link](https://x.com/AlexGDimakis/status/1882134498512666640)  2025-01-22T18:33Z 18.8K followers, 139.3K engagements


"Ok the model was so popular and fun that our engineers put a playground for it talk to openthinker32b Remember this is a reasoning model super eager to please you and thinking very hard on everything. So if you say hi you will get a surprisingly long and thought-out answer :) http://playground.bespokelabs.ai http://playground.bespokelabs.ai Try OpenThinker-32B at https://t.co/IndvwLN9mK The reasoning traces are fun to read http://playground.bespokelabs.ai http://playground.bespokelabs.ai Try OpenThinker-32B at https://t.co/IndvwLN9mK The reasoning traces are fun to read"  
[X Link](https://x.com/AlexGDimakis/status/1890523807992991936)  2025-02-14T22:09Z 18.8K followers, [----] engagements


"1/ Introducing $BAI the official token of @bespokelabsai in collaboration with @berkeley_ai πŸ€– Built for AI-driven data curation model refinement and decentralized intelligence all on Solana. Contract: 4F93oBZXBRa1Gqon7s1DEY3x1cjVKXPoZfAL4ow4pump"  
[X Link](https://x.com/AlexGDimakis/status/1892210304819372354)  2025-02-19T13:51Z 18.9K followers, [---] engagements


"2/ Why $BAI πŸ€–πŸš€ The AI revolution depends on databut todays models are built on centralized outdated and biased datasets. $BAI changes that. πŸ”Ή Decentralized Data Curation Power AI with transparent high-quality datasets. πŸ”Ή Post-Training & Distillation Optimize models with community-driven refinement. πŸ”Ή Built on Solana Scalable low-cost and ready for the AI economy. AI needs better data better incentives and better governance. $BAI is the future. Join us. πŸš€"  
[X Link](https://x.com/AlexGDimakis/status/1892210351963267503)  2025-02-19T13:51Z 18.9K followers, [----] engagements


"@NovaSkyAI @anyscalecompute @databricks @LambdaAPI @BerkeleySky Congratulations on the great work"  
[X Link](https://x.com/AlexGDimakis/status/1893193787150635355)  2025-02-22T06:59Z 18.9K followers, [--] engagements


"Great question. We learn how to do data curation for post-training. Post-training is not about building another bigger general model but rather how to specialize a general model to do a specific job on your data. Here are some lessons we learned. (For more info see our paper which has all the details.) Paper: Blog: https://openthoughts.ai/blog/ot3 https://arxiv.org/abs/2506.04178 https://openthoughts.ai/blog/ot3 https://arxiv.org/abs/2506.04178"  
[X Link](https://x.com/AlexGDimakis/status/1930754908757852495)  2025-06-05T22:33Z 21.7K followers, [---] engagements


"Jackie Chan giving another interpretation on work-life balance. Jackie Chan said this scene terrified him more than almost anything else in his career. He spent days standing on a rooftop staring down [--] stories trying to convince himself to jump. Sliding face-first down a 45-degree glass wall with hidden cable. https://t.co/d59qPq9k9L Jackie Chan said this scene terrified him more than almost anything else in his career. He spent days standing on a rooftop staring down [--] stories trying to convince himself to jump. Sliding face-first down a 45-degree glass wall with hidden cable."  
[X Link](https://x.com/AlexGDimakis/status/1980197798076440996)  2025-10-20T09:02Z 21.3K followers, [----] engagements


"@kchonyc Are papers on LLMs in medicine supposed to generate new medical evidence I would expect they study how well LLMs answer questions based on existing medical evidence"  
[X Link](https://x.com/AlexGDimakis/status/1982838581128532116)  2025-10-27T15:55Z 21.3K followers, [----] engagements


"This is very cool work. The benefit of such compound architectures is that you can finetune only an orchestrator or advisor and still benefit from stronger models used as tool calls. I scaled coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench and it was fun βš‘πŸ€“ https://t.co/CeJO5pbPgk I scaled coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench and it was fun βš‘πŸ€“ https://t.co/CeJO5pbPgk"  
[X Link](https://x.com/AlexGDimakis/status/1985513094337196529)  2025-11-04T01:03Z 21.3K followers, [----] engagements


"Very interesting research. Writing detailed and personalized cover letters for job applications had value. Now that LLMs automate it there is no longer value to them since they do not signal candidate skill or effort anymore. There are many similar tasks that we think have value and LLMs will contribute to the economy by automating them but in reality it will only make them useless. Reminds me of some discussions about mining asteroids: they were saying this asteroid has [--] trillions worth of minerals so it may be worth a space mission. But in reality these minerals would be worth much less"  
[X Link](https://x.com/AlexGDimakis/status/1985939568659435822)  2025-11-05T05:17Z 21.3K followers, [----] engagements


"Just announced: Terminal-Bench [---] launching Tommorow. [--] new realistic tasks more than [---] hours of manual reviewing. Congratulations to the terminal-bench team"  
[X Link](https://x.com/AlexGDimakis/status/1986627963564269578)  2025-11-07T02:53Z 21.4K followers, [----] engagements


"Congratulations @Mike_A_Merrill @alexgshaw and the [---] contributors for standardizing what RL environments for CLI agents means for the open source community"  
[X Link](https://x.com/AlexGDimakis/status/1986628607150870598)  2025-11-07T02:55Z 21.3K followers, [---] engagements


"UT Austin is doubling its supercomputing cluster to more than [----] GPUs. This cluster has been a key for open source AI. Datacomp DCLM OpenThoughts and many other open source projects by researchers in Austin and many other universities and labs around the world critically rely on this open compute infrastructure. UT gets more computehttps://t.co/LZPDhJpAz9 UT gets more computehttps://t.co/LZPDhJpAz9"  
[X Link](https://x.com/AlexGDimakis/status/1988061932239384684)  2025-11-11T01:51Z 21.4K followers, 25.8K engagements


"Congratulations Tasso for joining databricks NYC Big personal update πŸ’₯ After founding & exiting two companies (a database pioneer & a CDP powerhouse) Im starting a new chapter: I've joined @databricks Im here to build and lead their brand-new engineering office right here in NYC πŸ—½ https://t.co/ZjyuMtL2ro Big personal update πŸ’₯ After founding & exiting two companies (a database pioneer & a CDP powerhouse) Im starting a new chapter: I've joined @databricks Im here to build and lead their brand-new engineering office right here in NYC πŸ—½ https://t.co/ZjyuMtL2ro"  
[X Link](https://x.com/AlexGDimakis/status/1988437158781481165)  2025-11-12T02:42Z 21.5K followers, [----] engagements


"ICLR reviews are out. ICLR reviews are out probably by paper id. Good luck arguing with the reviewers πŸ˜… ICLR reviews are out probably by paper id. Good luck arguing with the reviewers πŸ˜…"  
[X Link](https://x.com/AlexGDimakis/status/1988492719262728617)  2025-11-12T06:23Z 21.4K followers, [----] engagements


"I dont think this is true. The peer review system is a (noisy) way to have some people to be forced to read your work as reviewers. The vast majority of arxiv papers now I believe is read by zero people. If a paper gets into a top venue or gets spotlight oral etc its a way for authors to get visibility. Obviously its noisy but its better than no filtering"  
[X Link](https://x.com/AlexGDimakis/status/1988835449180418178)  2025-11-13T05:04Z 21.5K followers, [----] engagements


"COLM is great submit to COLM to forget about those 6622s look how happy they are submit to COLM look how happy they are submit to COLM"  
[X Link](https://x.com/AlexGDimakis/status/1988868937623167434)  2025-11-13T07:18Z 21.4K followers, 19.1K engagements


"I keep hearing that Excel spreadsheets and other apps will disappear: All knowledge work will be AI agents built on top of systems of record and the user will just ask the agents to do the work. But TIL that spreadsheets are older than any other form of written language. I.e. the earliest known writing in the world is basically spreadsheets with grain ratios tables with counts of workers and how much beer each received etc. So yeah spreadsheets and other applications with useful UIs are probably not going away"  
[X Link](https://x.com/AlexGDimakis/status/1995313106155864092)  2025-12-01T02:04Z 21.5K followers, [----] engagements


"Congratulations to Adam Klivans and all the co-authors for winning the FOCS [----] Test of Time Award Their paper was a learning theory breakthrough: It provided the first efficient algorithm for learning halfspaces when there is adversarial label noise under distributional assumptions. (Hardness arguments from cryptography suggest that learning halfspaces without distribution assumptions is impossible). From the FOCS citation: "The work contributed to a fundamental shift in the fields perspective leading to an outpouring of new positive results for learning geometric concepts in more"  
[X Link](https://x.com/AlexGDimakis/status/2000703010398519799)  2025-12-15T23:02Z 21.7K followers, 11.5K engagements


"My final exam is today in Berkeley. Pen and paper in person all the students try to solve challenging problems. No machines. This ancient method of evaluating students is going to survive in the AI era"  
[X Link](https://x.com/AlexGDimakis/status/2001814187841130578)  2025-12-19T00:37Z 22.2K followers, 107.7K engagements


"We are using and developing AI agents (and datasets) to help with teaching - and AI can make amazing personalized tutors. But the role of an in person test is to *evaluate understanding*. Beyond having a great ai tutor students who know they will be truly evaluated are motivated to put the effort to use learning tools (old and new) to actually understand the concepts. There is no royal road to learning and knowing a real in-person exam is at the end changes the way students approach the whole semester and put more effort I believe"  
[X Link](https://x.com/AlexGDimakis/status/2001885490820334040)  2025-12-19T05:21Z 21.9K followers, [----] engagements


"The study says simply that the very top at young age are not identical with the very top adults. (As one would expect since there are many many more non elite young candidates). Elite young performers are still [--] times more likely to be in the top adults compare to general population as the paper acknowledges in page 6-7 but this is buried in the technical analysis. Overall I found some parts of this paper to be misleading and not sufficiently emphasising odds ratio vs base rate"  
[X Link](https://x.com/AlexGDimakis/status/2002520800000422356)  2025-12-20T23:25Z 22.1K followers, [----] engagements


"A paper was recently published in Science on highest level of human performance across athletics science math and music. I think the paper makes some classical statistics mistakes that still fool many smart people. The paper "Recent discoveries on the acquisition of the highest levels of human performance" by Gullich et al. claims: "In summary when comparing performers across the highest levels of achievement the evidence suggests that eventual peak performance is negatively associated with early performance." The paper makes two mistakes. Base-rate fallacy and missing Berkson's paradox (aka"  
[X Link](https://x.com/AlexGDimakis/status/2002848594953732521)  2025-12-21T21:08Z 22.2K followers, 124.1K engagements


"Very interesting and impressive study. Identical twins ate the same calories for [--] months and there was still significant variability in how much weight they gained: +8kg on average but ranged from 4kg to 13kg. I initially thought this would violate the first law of thermodynamics but I guess human bodies introduce variability. [--]. And guess what It was bang on - [---] kg. But avearges miss the crucial aspect that there is often heterogeneity - in fact the lowest gainter was only [---] kg and the highest gaineer was [----] kg Not fair [--]. And guess what It was bang on - [---] kg. But avearges miss the"  
[X Link](https://x.com/AlexGDimakis/status/2007778801581797547)  2026-01-04T11:39Z 22.2K followers, [----] engagements


"@Kangwook_Lee in the game you can actually have the player write grant proposals and have LLM-as-a-judge reviewers kill the proposals and write comments. Then it becomes too good of a game you might as well do the real thing"  
[X Link](https://x.com/AlexGDimakis/status/2008329100692197658)  2026-01-06T00:05Z 22.2K followers, [---] engagements


"Parth and Alan presenting Advisor Models in the Berkeley Sky lab retreat. Advisor models are small models that are trained to create personalization or steering advice prompts that are fed to a large model like GPT. Its basically dynamic prompting done by a small LLM that can be trained or personalised. In one experiment the advisor learned which users like short movie reviews and who prefers detailed reviews purely by RL with a numerical reward. Then it adds this personalization information to the prompt of GPT5 that writes the movie reviews."  
[X Link](https://x.com/AlexGDimakis/status/2011967337596113177)  2026-01-16T01:02Z 22.4K followers, [----] engagements


"If you've lost track of startups coming out of UC Berkeley Sky Lab raising in the last [--] weeks: SGLang (RadixArk) raised at 400m valuation VLLM (Inferact) at 150m at 800m valuation LMArena raised 150m at 1.7B valuation. Not too bad for impact in January 2026"  
[X Link](https://x.com/AlexGDimakis/status/2014508959621959724)  2026-01-23T01:22Z 22.4K followers, 78.9K engagements


"Coding agents as a path to Continual Learning Continual learning is among the most important open problems in AI: the ability to personalize adapt and specialize while doing tasks. Right now the model weights are not updating and there is a lot of on-going work on how to use RL for continual learning. But there is another alternative lets call it 'Code is all you need' or 'CLI is all you need': Take a (fixed weight) coding agent and give it a terminal a file system and let it create files skills and scripts for continual learning. The file system can act as long-term memory with hierarchical"  
[X Link](https://x.com/AlexGDimakis/status/2017287141018243236)  2026-01-30T17:21Z 22.4K followers, 21.5K engagements


"Here is a very good reason why the NyquistShannon sampling theorem requires that your function is low-pass before you sub-sample to downscale. If you just sub-sample without smoothing a bad guy can place another image exactly on the pixels you sub-sample. Adversarial aliasing. image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images. https://t.co/PvidAaxJLS image-scaling attacks are wild small dots added to the image on the left turns it into"  
[X Link](https://x.com/anyuser/status/1456859486728212483)  2021-11-06T05:42Z 22.5K followers, [---] engagements


"@TheGregYang @HeinrichKuttler I love this platform for the mere intellectual depth of the ongoing discourse"  
[X Link](https://x.com/AlexGDimakis/status/2018872312234484171)  2026-02-04T02:20Z 22.5K followers, [---] engagements


"Great post on evaluating agents. If you give the agent a machine with strict memory limits (as specified in Terminal-Bench 2) you drop [--] percent or more. Daytona allows 3x more memory and that smooths things out. The environment is part of the benchmark and understanding these variations is key for scientific measurement and optimization. New on the Engineering Blog: Quantifying infrastructure noise in agentic coding evals. Infrastructure configuration can swing agentic coding benchmarks by several percentage pointssometimes more than the leaderboard gap between top models. Read more:"  
[X Link](https://x.com/AlexGDimakis/status/2019676628088164498)  2026-02-06T07:36Z 22.5K followers, [----] engagements


"GPT is having a profound effect on how students write. Its verbose style full of cliches and 'fancy' out of place vocabulary is in every paper and draft I read. A few years back there were grammar errors and awkwardness -- but at least people had their own voice. Now scholarship is getting full of robotic triviality"  
[X Link](https://x.com/anyuser/status/1831833630022496515)  2024-09-05T23:15Z 22.5K followers, 951.3K engagements


"Someone is trying to scam my PhD student. My student asks to verify their identity 1/2"  
[X Link](https://x.com/anyuser/status/1487251984482766850)  2022-01-29T02:31Z 22.5K followers, [----] engagements


"I was surprised by a talk Yejin Choi (an NLP expert) gave yesterday in Berkeley on some surprising weaknesses of GPT4: As many humans know 237*757=179409 but GPT4 said [------]. For the easy problem of multiplying two [--] digit numbers they measured GPT4 accuracy being only 59% accuracy on [--] digit number multiplication. Only 4% on [--] digit number multiplication and zero on 5x5. Adding scratchpad helped GPT4 but only to 92% accuracy on multiplying two [--] digit numbers. Even more surprisingly finetuning GPT3 on 1.8m examples of [--] digit multiplication still only gives [--] percent test accuracy (in"  
[X Link](https://x.com/anyuser/status/1691600985938858432)  2023-08-16T00:01Z 22.5K followers, 1.7M engagements


"This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to [----] elo. Is it possible that the model plays better than [----] elo (i.e. "transcends" the training data performance). It seems you get something from nothing and some information theory arguments that this should be impossible were discussed in conversations I had in the past. But this paper shows this can happen: training on [----] elo game transcripts and getting an LLM that plays at [----] Further the authors connect to a clean theoretical framework for why: it's ensembling"  
[X Link](https://x.com/anyuser/status/1803293833889042637)  2024-06-19T05:08Z 22.5K followers, 392.7K engagements


"Discovered a very interesting thing about DeepSeek-R1 and all reasoning models: The wrong answers are much longer while the correct answers are much shorter. Even on the same question when we re-run the model it sometimes produces a short (usually correct) answer or a wrong verbose one. Based on this I'd like to propose a simple idea called Laconic decoding: Run the model [--] times (in parallel) and pick the answer with the smallest number of tokens. Our preliminary results show that this decoding gives +6-7% on AIME24 with only a few parallel runs. I think this is better (and faster) than"  
[X Link](https://x.com/anyuser/status/1885447830120362099)  2025-01-31T21:59Z 22.5K followers, 222.8K engagements


"Human bilinguals are more robust to dementia and cognitive decline. In our recent NeurIPS paper we show that bilingual GPT models are also more robust to structural damage in their neuron weights. Further we develop a theory. (1/n)"  
[X Link](https://x.com/anyuser/status/1622006950950014981)  2023-02-04T22:59Z 22.5K followers, 312.6K engagements


"Thank you for your response Dimitris. I appreciate your take on the issue. It's true that a request for "a few typos" and fewer "fancy words" may help bring back a sense of authenticity to writing. Theres a delicate balance between polishing a draft and maintaining the writers original voice and sometimes that balance is lost when students rely too heavily on tools like GPT. I find that students are increasingly focused on perfecting their writing in a technical sense but often at the cost of depth originality and personal style. The quirks errors and occasional awkwardness that were once"  
[X Link](https://x.com/anyuser/status/1831955374758621686)  2024-09-06T07:19Z 22.5K followers, 132.2K engagements


"Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: [--]. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM no MCTS no fancy reward models. Basically checks if the answer is correct. πŸ˜… [--]. Small models can reason very very well with correct distillation post-training. They released a 1.5B model () that is better than Claude and Llama 405B in AIME24. Also their distilled 7B model seems better than o1 preview. πŸ€“ [--]. The datasets used are not released if I"  
[X Link](https://x.com/anyuser/status/1881511481164079507)  2025-01-21T01:17Z 22.5K followers, 184.1K engagements


""RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time the model tries [--] times (the Group in GRPO) and a gradient step is performed to increase the reward which is a very simple verification of the correct answers repeated thousands of times on the same problem. The shocking finding is that the model does not overfit to this one question: RL on one example makes the model better in MATH500 and other benchmarks. (If instead"  
[X Link](https://x.com/anyuser/status/1921348214525219206)  2025-05-10T23:34Z 22.5K followers, 354.5K engagements


"Life update: I am excited to announce that I will be starting as a Professor in UC Berkeley in the EECS Department. I spend [--] wonderful years teaching in UT Austin and I am grateful to all my colleagues and students there and extremely proud of what we have achieved in AI in UT Austin and I plan to continue my numerous UT close collaborations. I will also continue as Chief Scientist in Bespoke Labs making it much easier now being in the Bay area. I received my Phd in [----] from @Berkeley_EECS and I am thrilled to be back. I am grateful for this new opportunity"  
[X Link](https://x.com/anyuser/status/1869124346264043827)  2024-12-17T20:55Z 22.5K followers, 110.9K engagements


"2/ Scammer ends up improving our sample complexity bound for StyleGAN inverse problems. They teach them to do chaining arguments instead of just union bounds now jeez. @giannis_daras"  
[X Link](https://x.com/anyuser/status/1487251986382831617)  2022-01-29T02:31Z 22.5K followers, [----] engagements


"For the first (and probably last) time in my life I understand the technical details of both the physics and chemistry Nobel prizes. BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and"  
[X Link](https://x.com/anyuser/status/1843995475743228128)  2024-10-09T12:42Z 22.5K followers, 56.4K engagements


"BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction"  
[X Link](https://x.com/anyuser/status/1843951197960777760)  2024-10-09T09:46Z 1.3M followers, 9.1M engagements


"Doctor: We used a deep learning algorithm for your MRI reconstruction. Turns out one of your kidneys is a cat"  
[X Link](https://x.com/anyuser/status/1396898949068431377)  2021-05-24T18:40Z 22.5K followers, [----] engagements


"One huge advantage of deep learning (vs classical ML models) that is not often discussed is *modularity*: One can download pre-trained models glue them like Legos and fine tune them end-to-end because gradients flow through. (1/n)"  
[X Link](https://x.com/anyuser/status/1506485470255280129)  2022-03-23T04:18Z 22.5K followers, [----] engagements


"Based on recent papers (Gpt3 Palm dalle2 Gato Metaformer) I am forming the opinion that maybe 'Scale is all you need' possibly even for general intelligence (). Just convert everything to tokens and predict the next token. (1/n)"  
[X Link](https://x.com/anyuser/status/1526388274348150784)  2022-05-17T02:24Z 22.5K followers, [----] engagements


"The term Artificial Intelligence was coined by John McCarthy to avoid association with Cybernetics and specifically its pioneer Norbert Wiener who was already famous pain to work with and working on Cybernetics in MIT. Original quote from McCarthy's Stanford page: . (1/n)"  
[X Link](https://x.com/anyuser/status/1516451200408993795)  2022-04-19T16:18Z 22.5K followers, [---] engagements


"@FernleafFlynn @even_kei @IllithidHeretic Two major industries breaking ways for a paltry sum"  
[X Link](https://x.com/anyuser/status/1263544264375574528)  2020-05-21T18:56Z 22.5K followers, [---] engagements


"Here is a simple way to beat ChatGPT and any similar architecture with one Turing test question. ChatGPT GPT3 and all related Transformers have a finite maximum token sequence length usually 2k to 4k tokens. (1/n)"  
[X Link](https://x.com/anyuser/status/1628790477808967685)  2023-02-23T16:14Z 22.5K followers, 163.2K engagements


"Best to leave TF for later"  
[X Link](https://x.com/anyuser/status/1167897353866465283)  2019-08-31T20:30Z 22.5K followers, [---] engagements


"My thoughts on the now famous Google leak doc: [--]. Open source AI is winning. I agree and that is great for the world and for a competitive ecosystem. In LLMs we're not there but we just got OpenClip to beat openAI Clip and Stable diffusion is better than closed models. [--]. You don't need huge models high quality data is much more efficient and important. Alpacaing models behind APIs further reduces moats. [--]. You can start with a good foundation model and parameter efficient fine-tuning (PEFT) algorithms like Lora work super well in a day. Finally an opening for algorithmic innovations 4."  
[X Link](https://x.com/anyuser/status/1654286036015411205)  2023-05-05T00:45Z 22.5K followers, 189.1K engagements


"Probably the best 1h introduction to LLMs that I've seen. And after 20mins its not an introduction its getting into cutting edge research updates updated up to this month. I had not heard of the data exfiltration by prompt injection or the recent finetuning Poisoning attacks. https://www.youtube.com/watchv=zjkBMFhNj_g&t=2s https://www.youtube.com/watchv=zjkBMFhNj_g&t=2s"  
[X Link](https://x.com/anyuser/status/1727595762266026128)  2023-11-23T07:51Z 22.5K followers, 74.5K engagements


"Excited to be the director for the new Texas Center for Generative AI Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps"  
[X Link](https://x.com/anyuser/status/1750580887194943640)  2024-01-25T18:06Z 22.5K followers, 52.6K engagements


"Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://news.utexas.edu/2024/01/25/new-texas-center-will-create-generative-ai-computing-cluster-among-largest-of-its-kind/ https://news.utexas.edu/2024/01/25/new-texas-center-will-create-generative-ai-computing-cluster-among-largest-of-its-kind/"  
[X Link](https://x.com/anyuser/status/1750577250997596603)  2024-01-25T17:51Z [----] followers, 106K engagements


"A Thanksgiving story A few years back I used to play tennis in a ladder system which would match me up with various folks in my neighborhood. After Thanksgiving I had a tennis match with this guy: nice guy two kids a bit overweight in his 50ies I had never met him before. We start our match. During the match he says -Sorry lets stop for a bit I want to catch my breath. -Sure no problem. We start and [--] minutes after he says: -Sorry I ate too much at the Thanksgiving dinner and I have digestion problems. He was burping a bit and looked tired. He asks to reschedule the game I say sure sounds"  
[X Link](https://x.com/anyuser/status/1862560015263179256)  2024-11-29T18:11Z 22.5K followers, 37.5K engagements


"A small experiment: This Tweet has an even number of likes"  
[X Link](https://x.com/anyuser/status/1623087611802779649)  2023-02-07T22:33Z 22.5K followers, 45.5K engagements


"Ptolemy the king of Egypt wanted to learn geometry but found Euclid's book the Elements too difficult to study. So he asked Euclid to show him an easier way to master it. Euclid famously said "Sir there is no royal road to geometry." This is still true a few thousand years later in the days of Youtube and TikTok as Andrej nicely points out. # on shortification of "learning" There are a lot of videos on YouTube/TikTok etc. that give the appearance of education but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy"  
[X Link](https://x.com/anyuser/status/1756784225628401725)  2024-02-11T20:56Z 22.5K followers, 93.6K engagements


"# on shortification of "learning" There are a lot of videos on YouTube/TikTok etc. that give the appearance of education but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are learning (but actually they are just having fun). The people creating this content also enjoy it because fun has a much larger audience fame and revenue. But as far as learning goes this is a trap. This content is an epsilon away from watching the Bachelorette. It's like snacking on those "Garden Veggie Straws" which feel"  
[X Link](https://x.com/anyuser/status/1756380066580455557)  2024-02-10T18:10Z 1.8M followers, 2.2M engagements


"As Information theory was becoming a 'hot' scientific trend in the 50s Claude Shannon wrote a one-page paper advising hype *reduction*. That never happens anymore. Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in first class order. The subject of information theory has certainly been sold if not oversold." https://t.co/Jn0e72B5Bz Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in"  
[X Link](https://x.com/anyuser/status/1274776409370701825)  2020-06-21T18:49Z 22.5K followers, [---] engagements


"Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in first class order. The subject of information theory has certainly been sold if not oversold.""  
[X Link](https://x.com/anyuser/status/1265913781240115201)  2020-05-28T07:52Z [---] followers, [---] engagements


"I was informed that Alexander Vardy a giant in coding theory passed away. A tragic loss for his family UCSD and academia. Alex's many discoveries include the Polar decoding algorithm used in the 5G wireless standard (1/3)"  
[X Link](https://x.com/anyuser/status/1503807067391418373)  2022-03-15T18:55Z 22.5K followers, [---] engagements


"Here is a very good reason why the NyquistShannon sampling theorem requires that your function is low-pass before you sub-sample to downscale. If you just sub-sample without smoothing a bad guy can place another image exactly on the pixels you sub-sample. Adversarial aliasing. image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images. https://t.co/PvidAaxJLS image-scaling attacks are wild small dots added to the image on the left turns it into"  
[X Link](https://x.com/anyuser/status/1456859486728212483)  2021-11-06T05:42Z 22.5K followers, [---] engagements


"image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images"  
[X Link](https://x.com/anyuser/status/1456149826337263621)  2021-11-04T06:42Z [----] followers, [----] engagements


"If you are a #neurips2020 reviewer please read the authors rebuttal and at the very least update your review indicating that you read it and your updated thoughts. It takes [--] minutes and its a good step towards decency. Meta-reviewers please enforce this"  
[X Link](https://x.com/anyuser/status/1301052865268588545)  2020-09-02T07:02Z 22.5K followers, [---] engagements


"Honored to be selected as an IEEE Fellow for contributions to distributed coding and learning' Congratulations to the whole Fellows class of [----] https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows https://t.co/yPfwbxMVb9 https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows"  
[X Link](https://x.com/anyuser/status/1463330284183638019)  2021-11-24T02:15Z 22.5K followers, [---] engagements


"Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf"  
[X Link](https://x.com/anyuser/status/1463324472488902661)  2021-11-24T01:51Z [---] followers, [--] engagements


"What are RL environments Are they just evals There is significant confusion in the community so here is my opinion: My answer is inspired by Terminal-bench an elegant framework for creating RL environments evaluating agents and even training agents. First an RL environment is simply a Docker container. It contains three things: [--]. A snapshot of the state of the world when a problem happened. [--]. A task description and [--]. A reward that verifies if the agent has solved the task. Can be using LLM as a judge or run tests. For example lets take the 'broken-python' environment in Terminal bench. The"  
[X Link](https://x.com/anyuser/status/1965947230696910935)  2025-09-11T01:15Z 22.5K followers, 34.6K engagements


"@percyliang @deepseek_ai We are working on fixing that and create the largest open reasoning dataset. More coming very soon πŸ˜‰"  
[X Link](https://x.com/anyuser/status/1883420222029476177)  2025-01-26T07:42Z 22.5K followers, 37.9K engagements


"The Google Gemini paper was released today and has [---] authors. I was impressed but then found that a recent LHC physics paper with [----] authors. The first nine pages describe the research and the other [--] pages list the authors and their institutions. But that's not even the record. The most authors on a single peer-reviewed academic paper is [-----] and was achieved by the COVIDSurg and GlobalSurg Collaboratives at the University of Birmingham and the University of Edinburgh. All [---] Gemini coauthors are expected to quit Google and start [---] LLM startups next year"  
[X Link](https://x.com/anyuser/status/1737598802415018157)  2023-12-20T22:20Z 22.5K followers, 56.2K engagements


"Let the advisor show you how to write the rebuttal https://x.com/i/status/1294367648814424064/video/1 https://x.com/i/status/1294367648814424064/video/1"  
[X Link](https://x.com/anyuser/status/1295979503076999168)  2020-08-19T07:02Z 22.5K followers, [---] engagements


"New neural renderer by Nvidia. The model adds fingerprints smudges and dust and generates renders indistinguishable from real to me. Oh and its done at *real-time*. Can't wait to see games using this. (1/2)"  
[X Link](https://x.com/anyuser/status/1655056946150481922)  2023-05-07T03:48Z 22.5K followers, 29.9K engagements


"We're very excited that @UT Austin will lead an NSF national Institute on the Foundations of Machine Learning with @UW @WichitaState and @MSFTResearch Announcement: https://news.utexas.edu/2020/08/26/ut-austin-selected-as-home-of-national-ai-institute-focused-on-machine-learning/ https://news.utexas.edu/2020/08/26/ut-austin-selected-as-home-of-national-ai-institute-focused-on-machine-learning/"  
[X Link](https://x.com/anyuser/status/1298619401164533761)  2020-08-26T13:52Z 22.5K followers, [---] engagements


"@nandofioretto Yes that's right. Structure and flow in writing help us organize our thought. Blindly using LLMs is an airbrush that makes it harder for people to see that they have muddled flow"  
[X Link](https://x.com/anyuser/status/1831895840799256669)  2024-09-06T03:22Z 22.5K followers, 42.9K engagements


"I am excited to announce that our AI institute (Institute for Foundations of Machine Learning IFML) has been renewed. IFML was part of the first cohort of AI Institutes announced in [----]. Led by UT Austin the new award will build on the trajectory of the past five years and develop new foundational tools to advance generative AI. NSF IFML's work on diffusion models is a key technology behind major Google products powering widely used generative models such as Stable Diffusion [--] and Flux. In it's next phase NSF IFML will expand generative AI to new domains including protein engineering"  
[X Link](https://x.com/anyuser/status/1950249255127372000)  2025-07-29T17:37Z 22.5K followers, 26.7K engagements


"That's a bit of a simplification right πŸ˜… It's like showing a prototype of Pagerank and saying this is all the code needed to replicate Google search. You need to replicate V3 get a ton of extra data and do many things on top of GRPO to get to R1. It does replicate the core idea for reasoning RL however"  
[X Link](https://x.com/anyuser/status/1885117595738898629)  2025-01-31T00:07Z 22.5K followers, 86.9K engagements


"Who first generated text with statistical methods like GPT In [----] Claude Shannon wrote the landmark paper 'A Mathematical Theory of Communication'. There he defined and estimated the entropy of English by generating synthetic text: 'THE HEAD AND IN FRONTAL ATTACK ON (1/n)"  
[X Link](https://x.com/anyuser/status/1623113337574682626)  2023-02-08T00:15Z 22.5K followers, 42.4K engagements


"References: The Faith and Fate Paper is available here: Video of this great talk here: https://www.youtube.com/watchv=P7ZdUbSAujQ https://arxiv.org/pdf/2305.18654.pdf https://www.youtube.com/watchv=P7ZdUbSAujQ https://arxiv.org/pdf/2305.18654.pdf"  
[X Link](https://x.com/anyuser/status/1691601220039831553)  2023-08-16T00:02Z 22.5K followers, 57.4K engagements


"@raj_raj88 But even fine-tuning with 1.8m multiplication examples was not able to teach it to generalize to other (3 digit) multiplications. This indicates some fundamental architecture limitation"  
[X Link](https://x.com/anyuser/status/1691626794477076842)  2023-08-16T01:43Z 22.5K followers, 27K engagements


"Greece is quite the outlier here in the south on the number of metal bands per Capita. Any explanations Metal bands per [--] million people (Europe) https://t.co/OPEROKiBLo Metal bands per [--] million people (Europe) https://t.co/OPEROKiBLo"  
[X Link](https://x.com/anyuser/status/1513244007270334466)  2022-04-10T19:54Z 22.5K followers, [---] engagements


"Metal bands per [--] million people (Europe)"  
[X Link](https://x.com/anyuser/status/789135229399146496)  2016-10-20T16:04Z 594.6K followers, 24.3K engagements


"I was thrilled to learn about this best paper award announced today in COLT [----] the premier learning theory venue. The paper is "Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension" authored by students Gautam Chandrasekaran Konstantinos Stavropoulos IFML postdoc Vasilis Kontonis IFML director Adam Klivans and former UT CS PhD Raghu Meka. Smoothed analysis is an ingenious idea of going beyond worst case pioneered by my former USC colleague Shanghua Teng and Dan Spielman). This paper showed how to apply this framework for learning theory. Here is my basic understanding of"  
[X Link](https://x.com/anyuser/status/1808535472886468754)  2024-07-03T16:17Z 22.5K followers, 38.2K engagements


""Datacomp1B is the first public dataset that outperforms OpenAI" #NeurIPS2023"  
[X Link](https://x.com/anyuser/status/1735340429380370530)  2023-12-14T16:46Z 22.5K followers, 38.1K engagements


"My students after every joke I make in a Zoom lecture. (h/t: @OdedRechavi )"  
[X Link](https://x.com/anyuser/status/1339274191074373642)  2020-12-16T18:20Z 22.5K followers, [---] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing

@AlexGDimakis Avatar @AlexGDimakis Alex Dimakis

Alex Dimakis posts on X about model, paper, ai, in the the most. They currently have [------] followers and [---] posts still getting attention that total [---] engagements in the last [--] hours.

Engagements: [---] #

Engagements Line Chart

  • [--] Week [---] -96%
  • [--] Month [------] -46%
  • [--] Months [-------] +15%
  • [--] Year [-------] -74%

Mentions: [--] #

Mentions Line Chart

  • [--] Months [--] -62%
  • [--] Year [--] -56%

Followers: [------] #

Followers Line Chart

  • [--] Week [------] +0.12%
  • [--] Month [------] +1.30%
  • [--] Months [------] +8.80%
  • [--] Year [------] +20%

CreatorRank: undefined #

CreatorRank Line Chart

Social Influence

Social category influence technology brands stocks countries social networks travel destinations cryptocurrencies automotive brands products celebrities finance

Social topic influence model, paper, ai, in the, open ai, 6969, the first, if you, this is, theory

Top assets mentioned Alphabet Inc Class A (GOOGL) Microsoft Corp. (MSFT)

Top Social Posts

Top posts by engagements in the last [--] hours

"@pmddomingos My point is that a differentiable imagenet classifier can be finetuned to other problems or used as a feature extractor. Also discriminators are guiding training of generators. Concatenation of differentiable blocks is differentiable and widely used. An RF or XGB cannot do that"
X Link 2022-03-24T23:43Z 18.7K followers, [--] engagements

"Human bilinguals are more robust to dementia and cognitive decline. In our recent NeurIPS paper we show that bilingual GPT models are also more robust to structural damage in their neuron weights. Further we develop a theory. (1/n)"
X Link 2023-02-04T22:59Z 22.5K followers, 312.6K engagements

"I was surprised by a talk Yejin Choi (an NLP expert) gave yesterday in Berkeley on some surprising weaknesses of GPT4: As many humans know 237*757=179409 but GPT4 said [------]. For the easy problem of multiplying two [--] digit numbers they measured GPT4 accuracy being only 59% accuracy on [--] digit number multiplication. Only 4% on [--] digit number multiplication and zero on 5x5. Adding scratchpad helped GPT4 but only to 92% accuracy on multiplying two [--] digit numbers. Even more surprisingly finetuning GPT3 on 1.8m examples of [--] digit multiplication still only gives [--] percent test accuracy (in"
X Link 2023-08-16T00:01Z 22.5K followers, 1.7M engagements

"@boazbaraktcs @OpenAI @ilyasut @janleike Congratulations"
X Link 2023-08-31T02:35Z 15.2K followers, [----] engagements

"@roydanroy @boazbaraktcs @OpenAI @ilyasut @janleike its only true for large values of N"
X Link 2023-08-31T02:37Z 11.9K followers, [---] engagements

"Here are a few things I learned from the AI Institutes #AIHillDay showcasing in the Senate yesterday: [--]. Many on the Hill are talking about AI. Things are happening. [--]. It is not obvious to many in DC that universities play a key role in developing the research used in modern AI models. The visit mitigated that to some extent by showing fundamental research results from the [--] NSF institutes @NSF ranging from fundamentals in optimization generative AI trust in AI systems to applying AI in agriculture neuroscience next generation food systems edge devices and education. @AI_EDGE_INST@AI4OPT"
X Link 2023-09-21T01:10Z 12.2K followers, 21.2K engagements

"@madiator True story: We learned that in some states AI stands for Artificial Insemination - and that is perhaps the biggest AI problem this country will face"
X Link 2023-09-21T01:38Z 12.2K followers, [---] engagements

"@ryan_p_adams They mean Monte Carlo Monte Carlo. Like NY NY"
X Link 2023-09-22T02:49Z 12.2K followers, [----] engagements

"The specifications of building a web browser form the secret to the largest text dataset needed to train gpt5"
X Link 2023-09-24T01:55Z 12.7K followers, [----] engagements

"Excited to introduce open efficient customizable Pytorch code for training Large Language Models: #OpenLM 1B and 7B models released with scripts to make it easy reproduce and modify. We also kept intermediate checkpoints every 25B tokens. Contributors include @ssgrn @Mitchnw @Vaishaal @sy_gadre @achalddave @lschmidt3 @GeorgeSmyrnis1 and others. Thanks to @laion_ai @StabilityAI and the support of @NSF Institute @MLFoundations Everyone is welcome to join as a contributor and please let us know how OpenLM can help your research. OpenLM-1B and OpenLM-7B are some of the best models available and"
X Link 2023-09-26T18:30Z 11.9K followers, [----] engagements

"More research results (related to the "Reversal Curse") show that LLMs cannot (and should not) be used as databases. As shown in this interesting paper Inverse search fails. For example given training data 'John Von Neumann was born on December [--] 1903' the inverse search question is "Who was born on December [--] 1903'" The paper argues that transformers cannot perform inverse search (unless the knowledge was pretrained in reverse order in the training set) due to the left-to-right autoregressive training. This does not happen in the context (since everything can attend to everything there)."
X Link 2023-10-02T06:48Z 12.2K followers, 17.3K engagements

"@DimitrisPapail I thought the point of these recent papers is that causal pretraining causes fundamental limitations. Eg if a dataset contains "A= B" it is impossible to learn that "B=A" since the token for A cannot attend to the future B"
X Link 2023-10-02T14:49Z 13.1K followers, [---] engagements

"@DimitrisPapail @OpenAI Never seen this before"
X Link 2023-10-05T00:57Z 20.7K followers, [---] engagements

"@DimitrisPapail @OpenAI Lol"
X Link 2023-10-05T01:20Z 20.7K followers, [---] engagements

"VC: Pitch me. Pitch: We are thinking of building a startup that develops a web front-end and uses GPT API to create lyrics with rhyme and street vernacular to be performed over a backing beat or musical accompaniment. We call it GPT Wrapper VC: Get. out"
X Link 2023-10-29T17:35Z 11.9K followers, [----] engagements

"Cutting edge research on genAI has gone in a unicorn direction πŸ¦„"
X Link 2023-10-31T15:42Z 11.9K followers, [----] engagements

"Excited about our GenAI workshop this week"
X Link 2023-11-27T18:48Z 12.2K followers, [----] engagements

"We had a great panel on GenAI today"
X Link 2023-11-29T22:41Z 12.2K followers, [----] engagements

"Zaid Harchaoui presenting in our GenAI IFML workshop: can transformers finally learn multiplication or not"
X Link 2023-11-30T16:20Z 13.9K followers, [----] engagements

"Today in our GenAi IFML workshop Aditi Raghunathan telling us that transformers can learn different types of regression with finetuning but can catastrophically forget their previous prior"
X Link 2023-12-01T18:23Z 13.9K followers, [----] engagements

"@litu_rout_ Cool project congrats"
X Link 2023-12-05T16:44Z 12.2K followers, [---] engagements

"Some really cool work on inpainting and other inverse problems using a pre-trained Stable diffusion"
X Link 2023-12-05T20:00Z 12.2K followers, [----] engagements

""Datacomp1B is the first public dataset that outperforms OpenAI" #NeurIPS2023"
X Link 2023-12-14T16:46Z 22.5K followers, 38.1K engagements

"The Google Gemini paper was released today and has [---] authors. I was impressed but then found that a recent LHC physics paper with [----] authors. The first nine pages describe the research and the other [--] pages list the authors and their institutions. But that's not even the record. The most authors on a single peer-reviewed academic paper is [-----] and was achieved by the COVIDSurg and GlobalSurg Collaboratives at the University of Birmingham and the University of Edinburgh. All [---] Gemini coauthors are expected to quit Google and start [---] LLM startups next year"
X Link 2023-12-20T22:20Z 22.5K followers, 56.2K engagements

"Midjourney v6 is generating training data that seems very copyrighted. @Rahll It's clearly spitting out training data. Here's someone prompting 'Joaquin Phoenix Joker movie [----] screenshot from a movie movie scene'. https://t.co/haIEHzGDpB @Rahll It's clearly spitting out training data. Here's someone prompting 'Joaquin Phoenix Joker movie [----] screenshot from a movie movie scene'. https://t.co/haIEHzGDpB"
X Link 2023-12-24T04:04Z 12.7K followers, [----] engagements

"We just discovered that the inpainting model in Stable Diffusion is cheating. To clarify: Inpainting is a type of inverse problem where some missing data (pixels) must be filled in. In our testing some of the inpaintings from the SDXL inpainting model where a little 'too good': filling in details in the masked missing pixels they couldn't possibly know unless the model was cheating by observing masked pixels. So we created this test dog image with some Pink-Cyan boxes and then asked the model to inpaint it. We chose the masking region to fully contain the Pink and Cyan boxes so there is no"
X Link 2024-01-17T22:36Z 22.5K followers, 48.1K engagements

"Excited to be the director for the new Texas Center for Generative AI Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps"
X Link 2024-01-25T18:06Z 22.5K followers, 52.6K engagements

"Interested in going to Greece this summer ISIT the International symposium on Information Theory will be in Athens Greece on July 7th to 12th [----]. I am co-chairing the tutorials with Lalitha Sankar from ASU. Please submit your Tutorial proposals by March 15th on information processing information theory and related fields Submission deadline: March [--] 2024"
X Link 2024-02-09T05:04Z 12.8K followers, [----] engagements

"The #Sora model is indeed incredible 🀯 congratulations to the OpenAI team. It is common for people to think that all the amazing research breakthroughs in AI (like #Sora) are happening inside companies like OpenAI while universities are becoming irrelevant. I want to highlight that the two first authors in the Sora paper Tim Brooks and Bill Peebles received their PhDs from UC Berkeley in [----] and their dissertation research is closely related to this breakthrough. Of course the compute infra and talent in OpenAI is critical for breakthroughs. I just want to point out that the training of the"
X Link 2024-02-18T18:28Z 22.5K followers, 44.7K engagements

"This is a very bizarre phenomenon. Some coordinates in the token vectors in transformers take huge values (2k or 7k while most values are below 1). These seem to lie on fixed feature dimensions and disapear in the last layers. LLMs are great but their internals are less explored. I'm excited to share very interesting findings in paper Massive Activations in Large Language Models LLMs have very few internal activations with drastically outsized magnitudes e.g. 100000x larger than others. (1/n) https://t.co/DRAgEPRHgw LLMs are great but their internals are less explored. I'm excited to share"
X Link 2024-03-02T06:17Z 13K followers, 12.7K engagements

"Excited to give one of the keynotes in the Data Council event in Austin in two weeks. I plan to talk about GenAI and Datacomp: Creating the Largest Public Multimodal Dataset in Academia. One central problem is what role can universities have in the GenAI ecosystem -- I think that that one of the roles that the open source community and academia can play is in the creation curation and evaluation of datasets. There are four underlying trends I identify: [--]. In the past data cleaning jobs were boring and menial tasks done manually by inexperienced researchers. As the datasets get larger data"
X Link 2024-03-16T22:21Z 13K followers, [----] engagements

"The biological brain is definitely more energy efficient (I'm reading that the human brain is roughly operating at 20w while one a100 needs 250W). But if we consider that one analog multiplication is physically computed at each point a synapse meets another neuron then 100b neurons will perform more multiplications I think)"
X Link 2024-04-10T03:24Z 13.1K followers, [--] engagements

"Thank you thats when twitter is at its best getting non-trivial answers in [--] min So the question is the use of the generated content and if it is competing with the owner of the copyrighted content. Is this legal consensus in your opinion or this is debated in current trials"
X Link 2024-04-16T23:19Z 13.1K followers, [---] engagements

"Phi-3 just released by Microsoft. Three small size models (3.8B 7B and 14B) trained on highly filtered and synthetic data. They report impressive performance since the 3.8B model (trained on 3T tokens) has MMLU of 69% matching Llama3 8B and the 7B Phi-3 model has 75% MMLU which I think makes it the best 7B model by far. The pre-training is done in two phases: in phase [--] its web data to teach general knowledge and phase [--] has heavily filtered web data and synthetic data created from GPT4. The number of training tokens is much smaller than Llama3 and I suspect the amazing performance comes from"
X Link 2024-04-23T05:28Z 13.1K followers, [----] engagements

"We are very excited that our first GH200 nodes have arrived in TACC for our GenAI center. Here is one. Fun facts: NVIDIA makes GH200 'superchips' (i.e. modules) a GH200 DGX box and a GH200 rack which are all different. As Dan Stanzione our TACC director kindly explained to me under the 15lb heat sink there is a GH200 superchip with 96GB memory. There is also 240GB of CPU RAM in a coherent address space so in theory should have as much memory as [--] A100x80gb. Also a 400GB/sec network interface is awesome. Excited to see what open models we can train on these"
X Link 2024-04-26T06:47Z 13.2K followers, 20.4K engagements

"This year I'm serving as the co-chair for ISIT Tutorials. The conference is happening in Athens Greece (Intercontinental) and the tutorials are on July 7th. Titles and presenters: Theory and Methods for Deep Generative Models Presenters: Yao Xie Taiji Suzuki and Xiuyuan Cheng Information-Theoretic Statistical and Algorithmic Foundations of RL Presenters: Yuejie Chi Yuxin Chen and Yuting Wei Language Model Inference: Theory and Algorithms Presenters: Ahmad Beirami and Ananda Theertha Suresh Graph Matching: Fundamental Limits and Efficient Algorithms Presenters: Hye Won Chung and Lele Wang"
X Link 2024-05-31T19:43Z 13.3K followers, 11.5K engagements

"This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to [----] elo. Is it possible that the model plays better than [----] elo (i.e. "transcends" the training data performance). It seems you get something from nothing and some information theory arguments that this should be impossible were discussed in conversations I had in the past. But this paper shows this can happen: training on [----] elo game transcripts and getting an LLM that plays at [----] Further the authors connect to a clean theoretical framework for why: it's ensembling"
X Link 2024-06-19T05:08Z 22.5K followers, 392.7K engagements

"@yevgets Any argument that says it's not surprising must also explain why it didn't happen at [----] elo training or why it doesn't happen at higher temperatures"
X Link 2024-06-19T19:44Z 13.8K followers, [----] engagements

"Our filtered dataset is only 1.4% of the original huge pool of 240T Common crawl scrape and it is very high quality"
X Link 2024-07-26T17:00Z 14K followers, [---] engagements

"More info here Congratulations to the whole DCLM team esp @lschmidt3 @Vaishaal @achalddave for their leadership. https://www.datacomp.ai/dclm/ https://www.datacomp.ai/dclm/"
X Link 2024-07-26T17:00Z 14K followers, [---] engagements

"A fantastic Economist article on Open-source AI by two pioneers: Martin Cassado and Ion Stoica. (plus a nice human-generated drawing). The main points I got from their Economist article: [--]. "Regulation hurts innovation": I agree and I am worried about Europe going in the wrong direction on this despite good intentions. [--]. "Open source makes systems safer": I agree and I also want to say more robust and more modular. [--]. "Open source drives innovation." I absolutely agree. One thing I want to add: Open-weights (i.e. Llama) is awesome but is not open-source. We should think of open weights as if"
X Link 2024-07-29T16:41Z 14K followers, 15.1K engagements

"Original post here: congratulations @martin_casado and Ion. https://x.com/martin_casado/status/1817947793677492318 Professor Ion Stoica and I wrote an article in the Economist arguing for the importance of Open Source in AI including the largest most powerful models. https://t.co/FsvJyWN30l https://x.com/martin_casado/status/1817947793677492318 Professor Ion Stoica and I wrote an article in the Economist arguing for the importance of Open Source in AI including the largest most powerful models. https://t.co/FsvJyWN30l"
X Link 2024-07-29T16:43Z 14K followers, [----] engagements

"@tidyanalysis Many companies have benefited from open source in the past. I think the main advantage is that you have the whole world developing or using your platform. Monetizing that is non-trivial but I think many have done it successfully (eg databricks on spark) and many have not"
X Link 2024-07-30T09:44Z 14.1K followers, [--] engagements

"This is the most scary Halloween costume I've seen πŸ‘» Time to start planning for Halloween. Ordering my new costume now https://t.co/bjRTUOOiCu Time to start planning for Halloween. Ordering my new costume now https://t.co/bjRTUOOiCu"
X Link 2024-08-02T08:31Z 14K followers, [----] engagements

"Try our grounded factuality checker here We also make it super easy to use by API (and free for now). https://playground.bespokelabs.ai/ https://playground.bespokelabs.ai/"
X Link 2024-08-09T17:42Z 15.2K followers, [----] engagements

"@ZachariahNKM Absolutely Send us an email at company@bespokelabs.ai for a free API key and to fact-check longer documents. We also are working on domain-specific factuality checkers"
X Link 2024-08-10T15:14Z 15.3K followers, [---] engagements

"Hey collective Twitter AI hivemind: which Supervised FineTuning library should we use for our research We are exploring libraries for SFT e.g. TRL Axolotl Torchtune or other options"
X Link 2024-09-04T18:57Z 15.2K followers, [----] engagements

"@bclavie ohh no thats an imam baildi (one of my favorite Greek (hmm) dishes)"
X Link 2024-09-05T01:22Z 15.2K followers, [----] engagements

"GPT is having a profound effect on how students write. Its verbose style full of cliches and 'fancy' out of place vocabulary is in every paper and draft I read. A few years back there were grammar errors and awkwardness -- but at least people had their own voice. Now scholarship is getting full of robotic triviality"
X Link 2024-09-05T23:15Z 22.5K followers, 951.3K engagements

"@nandofioretto Yes that's right. Structure and flow in writing help us organize our thought. Blindly using LLMs is an airbrush that makes it harder for people to see that they have muddled flow"
X Link 2024-09-06T03:22Z 22.5K followers, 42.9K engagements

"Thank you for your response Dimitris. I appreciate your take on the issue. It's true that a request for "a few typos" and fewer "fancy words" may help bring back a sense of authenticity to writing. Theres a delicate balance between polishing a draft and maintaining the writers original voice and sometimes that balance is lost when students rely too heavily on tools like GPT. I find that students are increasingly focused on perfecting their writing in a technical sense but often at the cost of depth originality and personal style. The quirks errors and occasional awkwardness that were once"
X Link 2024-09-06T07:19Z 22.5K followers, 132.2K engagements

"for getting a free api key for our models http://console.bespokelabs.ai http://console.bespokelabs.ai"
X Link 2024-09-07T21:30Z 15.2K followers, [---] engagements

"GANs are basically a training method. Basically you get gradients through another network (the discriminator ) solving a min-max problem as opposed to an optimization problem. You can train a GAN using generators that are unets or transformers or cnns. GANs where the best generative models until diffusions dethroned them circa [----]. They are also much harder to train"
X Link 2024-09-25T00:58Z 15.2K followers, [----] engagements

"To answer on architectures my understanding is that CNNs are the best and simplest architectures for vision unless you have a ton of data. Then ViTs start becoming better because they are more flexible and allow non-local features. Most of the innovation in my opinion didn't come from changing architecture but from where you get supervision ie self supervised methods discriminators and diffusion training methods (for different problems)"
X Link 2024-09-25T01:01Z 15.2K followers, [----] engagements

"This is a herculean effort of unifying the vast literature on diffusions for inverse problems (used for inpainting deblurring MRI etc) in a unified mathematical framework. Congratulations especially to the student coauthors for all the hard work. https://giannisdaras.github.io/publications/diffusion_survey.pdf Why are there so many different methods for using diffusion models for inverse problems πŸ€” And how do these methods relate to each other In this survey we review more than [--] different methods and we attempt to unify them into common mathematical formulations. https://t.co/B19YG31IdC"
X Link 2024-09-28T03:46Z 15.2K followers, [----] engagements

"@giannis_daras Current Phd level AGI test: Hey gpt here are [--] PDFs. Derive the score function approximation that each one is implicitly using"
X Link 2024-09-28T04:12Z 15.2K followers, [---] engagements

"I think keeping some trade secrets is a reasonable strategy in a competitive system. It's great that some communication is still open through blog posts but that does not pass the bar of peer reviewed research as Yann said. I still think that peer reviewed research is the most robust long term way to make scientific progress despite it's many limitations (which are mainly happening in AI because the field is growing ie more submissions than experts). A good question is when there is high employee churn it's impossible to keep trade secrets so maybe it's a better strategy to be more open to"
X Link 2024-09-29T18:12Z 15.2K followers, [----] engagements

"@QuanquanGu @polynoamial @ylecun @thomaspower @OpenAI Lol wrote similar points in parallel πŸ‘"
X Link 2024-09-29T20:17Z 15.2K followers, [---] engagements

"@RishiSonthalia @polynoamial @ylecun @thomaspower @OpenAI Lol sorry I believe the opposite. Wanted to say signal to bs ratio"
X Link 2024-09-29T22:15Z 15.2K followers, [---] engagements

"For the first (and probably last) time in my life I understand the technical details of both the physics and chemistry Nobel prizes. BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and"
X Link 2024-10-09T12:42Z 22.5K followers, 56.4K engagements

"@apostoliev Super cool what model did you use"
X Link 2024-11-02T22:52Z 15.2K followers, [--] engagements

"AI monoliths vs Unix Philosophy: The case for small specialized models. The current thinking in AI is that AGI is coming and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently agents are basically big system prompts on the same gigantic model. Through prompt engineering AI builders are trying to plan and execute complex multi-step processes. This is not working very well. This monolith view of AI is in sharp contrast to how we teach engineers to build systems. When multiple people have to build complex"
X Link 2024-11-09T17:32Z 15.3K followers, [----] engagements

"@ifeelbig I think progress towards AGI is amazing. I just dont think the best way to solve practical AI engineering problems is through prompts to a gigantic model. Its slow and wastes a lot of energy and compute cannot be competitive to small specialized models"
X Link 2024-11-09T18:01Z 15.3K followers, [---] engagements

"(2/n) Evalchemy πŸ§ͺ offers optimized Eval Performance: Many LM benchmarks are not optimized for performance and cost and can take dozens of hours to compute even if you have a small model. Evalchemy can run the full battery of popular benchmarks in about 4h for a Llama8B (more than 3x acceleration according to our benchmarks due to parallelism) and allows easy installation and a consistent platform to run benchmarks and keep track in a leaderboard. We also support adding your own custom benchmarks"
X Link 2024-11-18T16:18Z 15.4K followers, [---] engagements

"@spirosx I remember those days. It would be like turning down an OpenAI offer for a nonexistent startup πŸ˜…"
X Link 2024-11-27T18:45Z 15.4K followers, [---] engagements

"@spirosx Correct. Joining a drama-free OpenAI vs nonexistent startup in uiuc incubator"
X Link 2024-11-27T18:56Z 15.4K followers, [---] engagements

"There are two interesting things going on here: First there is no red in this picture. Our brains are filling the red color (you can check if you zoom in). The picture has only light blue black and white. Also Claude sees a gondola ski chair on a blue background. Source: This was posted on fb by Scott Aaronson from a Quantum Physics post"
X Link 2024-12-08T05:47Z 15.4K followers, [----] engagements

"Life update: I am excited to announce that I will be starting as a Professor in UC Berkeley in the EECS Department. I spend [--] wonderful years teaching in UT Austin and I am grateful to all my colleagues and students there and extremely proud of what we have achieved in AI in UT Austin and I plan to continue my numerous UT close collaborations. I will also continue as Chief Scientist in Bespoke Labs making it much easier now being in the Bay area. I received my Phd in [----] from @Berkeley_EECS and I am thrilled to be back. I am grateful for this new opportunity"
X Link 2024-12-17T20:55Z 22.5K followers, 110.9K engagements

"@jefrankle @databricks Congrats thats quite a large number πŸ€“"
X Link 2024-12-17T21:01Z 16.1K followers, [---] engagements

"This is an interesting definition of AGI: Youll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible. ARC-AGI is a very cool test and o3 is amazing but as this thread argues more evals are needed. People are panicking with O3 doing really well on ARC-AGI so wanted to share some perspectives. [--]. OpenAI trained on 75% of the training set of ARC-AGI. On X there has been a lot of discussion around this but wanted to share what i think. So training on train set is fine by ML People are panicking with O3 doing really well on"
X Link 2024-12-24T07:56Z 16.1K followers, [----] engagements

"I was very surprised that straightforward SFT on a few thousand examples produces a posttraining dataset that gives o1 level reasoning abilities. Congratulations to the team: they show there is absolutely no moat in reasoning and no RL is needed if you can get a few thousand CoTs 1/6 πŸš€ Introducing Sky-T1-32B-Preview our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks trained under $450 πŸ“ŠBlog: https://t.co/LtuTJeilmv πŸ‹β™€Model weights: https://t.co/Vn1dmtrHWo https://t.co/KLk6zZr9KA 1/6 πŸš€ Introducing Sky-T1-32B-Preview our fully"
X Link 2025-01-14T08:10Z 16.2K followers, [----] engagements

"Yes we can achieve it without distilling. Just have humans solve [-----] math and coding problems and write detailed solutions. Routinely done in every big university in one course with [----] students and [--] homework problems each. Frontier labs paid hundreds of millions for humans to label much more than that"
X Link 2025-01-14T21:07Z 16.2K followers, [---] engagements

"The definition of art is quite non-trivial I'm afraid but I enjoy your Shannon-theoretic approach. :) As I'm learning from Britannica Marcel Duchamp submitted a urinal to a public exhibition in NYC. Now "Through this act Duchamp put forth a new definition of what constitutes a work of art: he implied that it is enough for an artist to deem something art and put it in a publicly accepted venue." So if we consider art what is displayed in reputable museums (and hence deemed as art by the professional art curators) AI generated art is art. But maybe this is like the first photographs taken that"
X Link 2025-01-20T07:45Z 16.8K followers, [---] engagements

"@plasmatic99 @hardmaru Hmm I don't think this is the problem. I believe Llama even admitted to training on libgen in publicly released docs. https://www.rollingstone.com/culture/culture-news/ai-meta-pirated-library-zuckerberg-1235235394/ https://www.rollingstone.com/culture/culture-news/ai-meta-pirated-library-zuckerberg-1235235394/"
X Link 2025-01-21T01:16Z 17K followers, [---] engagements

"Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: [--]. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM no MCTS no fancy reward models. Basically checks if the answer is correct. πŸ˜… [--]. Small models can reason very very well with correct distillation post-training. They released a 1.5B model () that is better than Claude and Llama 405B in AIME24. Also their distilled 7B model seems better than o1 preview. πŸ€“ [--]. The datasets used are not released if I"
X Link 2025-01-21T01:17Z 22.5K followers, 184.1K engagements

"@Mag_Jembrih is it good in your tests"
X Link 2025-01-21T11:29Z 17.3K followers, [----] engagements

"here is our blog-post for the release of the reasoning model and dataset. https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness- https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-"
X Link 2025-01-22T18:39Z 16.8K followers, [--] engagements

"Here we mean simply what you said: create data with R1 and SFT a Qwen32. You're right there is also the correct form of distillation where you look at the logits of the teacher and use that in the loss but we didn't do that. To the best of my knowledge when people say they distill LLMs they mean the SFT simplification"
X Link 2025-01-22T20:58Z 18K followers, [---] engagements

"@jxmnop Its even simpler than ppo: ppo needs a reward model. This is basically try to solve the problem for correct solutions call them positive for wrong solutions call them negative and do roughly dpo. I think thats it but also weighted by style a bit"
X Link 2025-01-22T21:34Z 17.6K followers, 10.8K engagements

"@bookwormengr Yes indeed. Further our dataset is only 17k questions while DeepSeek distilled with 800k. Most importantly ours is open so anyone can improve it or distill their own models with it"
X Link 2025-01-23T00:35Z 17.6K followers, [---] engagements

"@mignano You're right in terms of value creation. But then after a startup finds a valuable thin wrapper they should probably start making it out of thicker paper or plastic to establish a moat right Unless there is another moat through business tactics or something else"
X Link 2025-01-24T00:07Z 17.3K followers, [--] engagements

"@victor207755822 Congratulations for your contribution. I think DeepSeek-R1 has earned its place in the AI history books"
X Link 2025-01-24T18:27Z 18K followers, [----] engagements

"LM Arena confirms that DeepSeek-R1 is very good. Breaking News: DeepSeek-R1 surges to the top-3 in Arena🐳 Now ranked #3 Overall matching the top reasoning model o1 while being 20x cheaper and open-weight Highlights: - #1 in technical domains: Hard Prompts Coding Math - Joint #1 under Style Control - MIT-licensed A https://t.co/gwpgD4hmYI Breaking News: DeepSeek-R1 surges to the top-3 in Arena🐳 Now ranked #3 Overall matching the top reasoning model o1 while being 20x cheaper and open-weight Highlights: - #1 in technical domains: Hard Prompts Coding Math - Joint #1 under Style Control -"
X Link 2025-01-24T18:28Z 18.1K followers, [----] engagements

"@percyliang @deepseek_ai We are working on fixing that and create the largest open reasoning dataset. More coming very soon πŸ˜‰"
X Link 2025-01-26T07:42Z 22.5K followers, 37.9K engagements

"Our reasoning dataset Bespoke-Stratos-17k is trending on Huggingface. I think its the best reasoning dataset available today. We are one of the top trending datasets on HuggingFace today https://t.co/1KX73898It We are one of the top trending datasets on HuggingFace today https://t.co/1KX73898It"
X Link 2025-01-27T18:42Z 18.1K followers, [----] engagements

"Had great fun in the Effortless podcast where we discussed how Post-training and data curation is becoming the new hot space with @amitp42 and @dheeraj https://youtu.be/RmHhe2KEIu0si=u65fWwiwQ-vWcAqs https://youtu.be/RmHhe2KEIu0si=u65fWwiwQ-vWcAqs"
X Link 2025-01-29T06:16Z 18.5K followers, [----] engagements

"Excited about the popularity of our reasoning datasets. Our small (Bespoke-Stratos-17k) and large (OpenThoughts-114k) datasets are in no2 and no4 trending datasets in Huggingface. Multiple folks reaching out that they are using them to train their own reasoning models. Train your own DeepSeek-R1 distilled variant at home"
X Link 2025-01-29T20:30Z 18.5K followers, [----] engagements

"Discovered a very interesting thing about DeepSeek-R1 and all reasoning models: The wrong answers are much longer while the correct answers are much shorter. Even on the same question when we re-run the model it sometimes produces a short (usually correct) answer or a wrong verbose one. Based on this I'd like to propose a simple idea called Laconic decoding: Run the model [--] times (in parallel) and pick the answer with the smallest number of tokens. Our preliminary results show that this decoding gives +6-7% on AIME24 with only a few parallel runs. I think this is better (and faster) than"
X Link 2025-01-31T21:59Z 22.5K followers, 222.8K engagements

"@NeginRaoof_ Btw we can call this shortest of k decoding as opposed to best of k consensus of k etc. but laconic has a connection to humans look it up"
X Link 2025-01-31T23:30Z 18.7K followers, [----] engagements

"@rudiranck We will systematically compare. My intuition is that when you do trial and error you dont need consensus. Youd be better off doing something reflection realized you rambled for [--] minutes or you got lucky and found the key to the answer"
X Link 2025-02-01T07:02Z 18.6K followers, [---] engagements

"@NoahB1904 Did you try with the 32B or 7B distilled DeepSeeks_R1s Thats what we're mostly worried about not the big-R1"
X Link 2025-02-07T20:08Z 18.7K followers, [--] engagements

"You can already post on arxiv and ignore the peer reviewed system. The expensive thing is peoples attention and the peer-reviewed system is actually a (noisy) equalizer. To elaborate: The reason the peer-reviewed system has worked as the only known stable system for the progress of science is because it acts as a filter for quality. When a paper is published at Neurips or Nature this is a signal that some people consider it good enough to pass this bar. If it wins a best paper award or oral even more so. Reviewers were forced to read it as a service mechanism that phd students (plus academics"
X Link 2025-03-22T07:38Z 19.1K followers, 27.7K engagements

"@ipeirotis @yoavgo Yes but you got a PhD tenure track job and tenure because the peer reviewing system (and nsf grant reviewing system ) helped first to establish your personal brand. Also the paper brings a different type of credibility compared to a non reviewed blog post"
X Link 2025-03-22T23:05Z 19K followers, [---] engagements

"@DimitrisPapail Is this some sort of alignment to protect from deep fakes or something"
X Link 2025-03-27T01:01Z 19K followers, [----] engagements

"We are excited to release the OpenThinker2 reasoning models and data. In summary: [--]. Openthinker32B Outperforms DeepSeekR1-32B in reasoning. [--]. Fully open source open weights and open data (1M carefully curated samples). [--]. Post-trained only with SFT. RL post-training will likely further improve performance. Read the whole story.πŸ‘‡ Turns out its possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data OpenThoughts2-1M curated by selecting quality instructions from diverse sources. 🧡 (1/n)"
X Link 2025-04-03T16:49Z 22.5K followers, 16.7K engagements

""RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time the model tries [--] times (the Group in GRPO) and a gradient step is performed to increase the reward which is a very simple verification of the correct answers repeated thousands of times on the same problem. The shocking finding is that the model does not overfit to this one question: RL on one example makes the model better in MATH500 and other benchmarks. (If instead"
X Link 2025-05-10T23:34Z 22.5K followers, 354.5K engagements

"The thing I have been trying to understand is: If you are in n dimensions there are 2n possible directions to explore to descent. But gradients give you an exponential boost there they give you a direction of descent. With text-based optimization if it's blind it will be inefficient. The debate between RL and evolutionary programming has existed for decades but LLMs maybe make evolutionary search efficient and competitive for the frontier by giving meaningful descent directions that can be expressed in local code changes"
X Link 2025-05-18T23:51Z 19.9K followers, [----] engagements

"Interesting post. However it seems to be in conflict with the most central problem in theoretical computer science: P vs NP which is exactly the question: is it fundamentally easier to verify a solution rather than solve a problem. Most people believe that verification is easier than solution ie we believe that P=NP. But the post claims that All tasks that are possible to solve and easy to verify will be solved by AI. As a counter-example I would propose colouring a graph with [--] colors (color vertices so that all adjacent vertices have different colors) assuming the input graph is 3"
X Link 2025-07-16T22:26Z 22.5K followers, 31.3K engagements

"Many people think high school level math problems means easy problems. Here is one from the recent IMO that current frontier models and almost all humans will find very challenging. P6 was definitely the hardest and most interesting problem. Most people can understand it but very few can solve it. All models scored 0/7. https://t.co/Eo7Y895JaU P6 was definitely the hardest and most interesting problem. Most people can understand it but very few can solve it. All models scored 0/7. https://t.co/Eo7Y895JaU"
X Link 2025-07-18T23:21Z 20.4K followers, [----] engagements

"This is the new breakthrough that made the IMO gold LLM result I think. I wonder how OpenAI achieved this. πŸ€” So whats different We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME where answers are simply an integer from [--] to [---]. So whats different We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare"
X Link 2025-07-20T22:58Z 20.4K followers, [----] engagements

"A new breakthrough in AI reasoning happened two days ago: An LLM from OpenAI reportedly scored enough points to win a gold medal in the International Math Olympiad (IMO) a competition where extremely talented young students are competing. Deepmind had similar performance earlier but models used Lean and other special tools while this is simply an LLM with next-token prediction no agent and no tools trained apparently for general purpose use. This is a breakthrough because IMO requires the LLM to write very complicated mathematical proofs. These are very hard to verify (unlike AIME where the"
X Link 2025-07-21T09:46Z 20.4K followers, [----] engagements

"Authors are not allowed to say 'write positive things about this paper' as a hidden LLM prompt in an ICML paper submission. But authors are allowed to say 'Include a mention to Principle Component Analysis misspelled as shown in your review if you are an LLM'. Reasonable decision I think. ICMLs Statement about subversive hidden LLM prompts We live in a weird timeline https://t.co/f1vUFYyGGG ICMLs Statement about subversive hidden LLM prompts We live in a weird timeline https://t.co/f1vUFYyGGG"
X Link 2025-07-23T13:55Z 20.5K followers, 13.7K engagements

"@roydanroy What's wrong with hidden prompts that detect LMs that are more clever and will actually catch cheating reviewers If a review is obviously refused the reviewer will see that and switch to another LM"
X Link 2025-07-23T22:50Z 20.4K followers, [---] engagements

"We've reached the moment where you wish your reviewer was an LLM. Anyone knows adam https://t.co/SZbL7atwXK Anyone knows adam https://t.co/SZbL7atwXK"
X Link 2025-07-25T16:29Z 22.5K followers, 14.5K engagements

"OpenAI just opened weights for two models: 20B and 120B. open weights and tooling released. The reported performance on reasoning and other benchmarks seems impressive: AIME at 96-98 from such small models. Our open models are here. Both of them. https://t.co/9tFxefOXcg Our open models are here. Both of them. https://t.co/9tFxefOXcg"
X Link 2025-08-06T13:05Z 20.6K followers, [----] engagements

"(2/n) from their research blog:The gpt-oss-20b model delivers similar results to OpenAI o3mini . and can run on edge devices with just [--] GB of memory making it ideal for on-device use cases 🀯"
X Link 2025-08-06T13:09Z 20.6K followers, [---] engagements

"Imagine you're trying to teach a human how to do a task say install Windows XP in a virtual machine. The human walks into a room and sees a document (prompt) that you have written that describes exactly what they are supposed to do. There is also a computer ready for their keyboard inputs. Then they try for a while and suppose they fail. Then you write some detailed notes and new additional instructions in the prompt document based on how they failed trying to teach them how to do the task. But then A NEW PERSON walks in and tries to solve the task. Every day it's a fresh new employee and you"
X Link 2025-08-15T05:56Z 22.5K followers, 26.4K engagements

"Its an interesting way to make environments. But one thing I don't understand: The reason we want environments is to be able to train agents. So we need something rendering the pixels and also a backend. I can believe that a big diffusion model could generate the pixels of an App but how would we have a back-end that would allow us to check if a task was successfully completed or not"
X Link 2025-08-23T23:37Z 20.8K followers, [---] engagements

"Very interesting work on training long-horizon web agents. More evidence that prompt driven graph-based agents are not going to take us very far we need SFT+RL. Super thrilled to WebExplorer which is a simple yet effective approach to train long-horizon web agents. Instead of depending heavily on rigid pre-defined graph structures WebExplorer utilizes the model-based exploration strategy to synthesize high-quality agentic data. Our 8B https://t.co/cQfPI2d30v Super thrilled to WebExplorer which is a simple yet effective approach to train long-horizon web agents. Instead of depending heavily on"
X Link 2025-09-09T23:09Z 20.9K followers, 16.5K engagements

"What are RL environments Are they just evals There is significant confusion in the community so here is my opinion: My answer is inspired by Terminal-bench an elegant framework for creating RL environments evaluating agents and even training agents. First an RL environment is simply a Docker container. It contains three things: [--]. A snapshot of the state of the world when a problem happened. [--]. A task description and [--]. A reward that verifies if the agent has solved the task. Can be using LLM as a judge or run tests. For example lets take the 'broken-python' environment in Terminal bench. The"
X Link 2025-09-11T01:15Z 22.5K followers, 34.6K engagements

"(2/2) The elegance of Terminal-bench is that it packages the whole state of the world in a Docker container and allows agents to try different things and check if they solved the problem. (The agent and the tools can live inside the docker container) The broken python environment and task is fully contained here In general I believe that terminal-bench is an extremely powerful framework for evaluating and training agents in many domains including devops software engineering scientific computing and many other domains basically everything that doesn't need agents controlling graphical user"
X Link 2025-09-11T01:15Z 20.9K followers, [----] engagements

"Very interesting piece of history I just learned from Ion Stoica in AI Native event: Databricks was founded because Hortonworks would not support the Spark open source project so some company needed to be created to support it"
X Link 2024-11-21T21:38Z 22.5K followers, [----] engagements

"Im excited to introduce Evalchemy πŸ§ͺ a unified platform for evaluating LLMs. If you want to evaluate an LLM you may want to run popular benchmarks on your model like MTBench WildBench RepoBench IFEval AlpacaEval etc as well as standard pre-training metrics like MMLU. This requires you to download and install more than [--] repos each with different dependencies and issues. This is as you might expect an actual nightmare. (1/n) https://github.com/mlfoundations/evalchemy https://github.com/mlfoundations/evalchemy"
X Link 2024-11-18T16:18Z 22.5K followers, 148K engagements

"The multiple answers mystery is the most surprising thing we stumbled on from OpenThoughts: Sampling multiple answers for the same question is better than having more questions each answered once. To explain: Say you are creating a dataset of questions and answers to SFT a reasoning llm. You can take [----] questions (eg from stackexchange) and answer them with deepseekR1. Or you can take [---] questions (from the same distribution) and answer each question twice independently with deepseekR1. Which one is a better dataset Surprisingly if you re-answer the same questions its a better dataset"
X Link 2025-12-07T19:42Z 22.5K followers, 28.5K engagements

"Live demo of llava on a MacBook in front of thousands at #NeurIPS2023 With [--] seconds left in your talk timeslot. That was brave and it worked πŸ‘"
X Link 2023-12-14T16:14Z 21.5K followers, [----] engagements

"The Berkeley Sky computing lab just trained Sky-T1-32B-Preview a GPT-o1 level reasoning model spending only $450 to create the instruction dataset. The data is 17K math and coding problems solved step by step. They created this dataset by prompting QwQ at $450 cost. Can it be done without another reasoning model to distill Teach a [----] student class and assign [--] homework problems. Side benefit: make $10M by charging $10K tuition"
X Link 2025-01-14T18:06Z 18.9K followers, 15.9K engagements

"DeepSeek-R1 is amazing but they did not release their reasoning dataset. We release a high-quality open reasoning dataset building on the Berkeley NovaSky Sky-T1 pipeline and R1. Using this we post-train a 32B model Bespoke-Stratos-32B that shows o1-Preview reasoning performance. Surprisingly we get good performance with only 17k questions-answers while DeepSeek distillation used 800k i.e. 47x more data. We open-source everything for the community to experiment with. Introducing Bespoke-Stratos-32B our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSkys Sky-T1 recipe. The model"
X Link 2025-01-22T18:33Z 18.8K followers, 139.3K engagements

"Ok the model was so popular and fun that our engineers put a playground for it talk to openthinker32b Remember this is a reasoning model super eager to please you and thinking very hard on everything. So if you say hi you will get a surprisingly long and thought-out answer :) http://playground.bespokelabs.ai http://playground.bespokelabs.ai Try OpenThinker-32B at https://t.co/IndvwLN9mK The reasoning traces are fun to read http://playground.bespokelabs.ai http://playground.bespokelabs.ai Try OpenThinker-32B at https://t.co/IndvwLN9mK The reasoning traces are fun to read"
X Link 2025-02-14T22:09Z 18.8K followers, [----] engagements

"1/ Introducing $BAI the official token of @bespokelabsai in collaboration with @berkeley_ai πŸ€– Built for AI-driven data curation model refinement and decentralized intelligence all on Solana. Contract: 4F93oBZXBRa1Gqon7s1DEY3x1cjVKXPoZfAL4ow4pump"
X Link 2025-02-19T13:51Z 18.9K followers, [---] engagements

"2/ Why $BAI πŸ€–πŸš€ The AI revolution depends on databut todays models are built on centralized outdated and biased datasets. $BAI changes that. πŸ”Ή Decentralized Data Curation Power AI with transparent high-quality datasets. πŸ”Ή Post-Training & Distillation Optimize models with community-driven refinement. πŸ”Ή Built on Solana Scalable low-cost and ready for the AI economy. AI needs better data better incentives and better governance. $BAI is the future. Join us. πŸš€"
X Link 2025-02-19T13:51Z 18.9K followers, [----] engagements

"@NovaSkyAI @anyscalecompute @databricks @LambdaAPI @BerkeleySky Congratulations on the great work"
X Link 2025-02-22T06:59Z 18.9K followers, [--] engagements

"Great question. We learn how to do data curation for post-training. Post-training is not about building another bigger general model but rather how to specialize a general model to do a specific job on your data. Here are some lessons we learned. (For more info see our paper which has all the details.) Paper: Blog: https://openthoughts.ai/blog/ot3 https://arxiv.org/abs/2506.04178 https://openthoughts.ai/blog/ot3 https://arxiv.org/abs/2506.04178"
X Link 2025-06-05T22:33Z 21.7K followers, [---] engagements

"Jackie Chan giving another interpretation on work-life balance. Jackie Chan said this scene terrified him more than almost anything else in his career. He spent days standing on a rooftop staring down [--] stories trying to convince himself to jump. Sliding face-first down a 45-degree glass wall with hidden cable. https://t.co/d59qPq9k9L Jackie Chan said this scene terrified him more than almost anything else in his career. He spent days standing on a rooftop staring down [--] stories trying to convince himself to jump. Sliding face-first down a 45-degree glass wall with hidden cable."
X Link 2025-10-20T09:02Z 21.3K followers, [----] engagements

"@kchonyc Are papers on LLMs in medicine supposed to generate new medical evidence I would expect they study how well LLMs answer questions based on existing medical evidence"
X Link 2025-10-27T15:55Z 21.3K followers, [----] engagements

"This is very cool work. The benefit of such compound architectures is that you can finetune only an orchestrator or advisor and still benefit from stronger models used as tool calls. I scaled coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench and it was fun βš‘πŸ€“ https://t.co/CeJO5pbPgk I scaled coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench and it was fun βš‘πŸ€“ https://t.co/CeJO5pbPgk"
X Link 2025-11-04T01:03Z 21.3K followers, [----] engagements

"Very interesting research. Writing detailed and personalized cover letters for job applications had value. Now that LLMs automate it there is no longer value to them since they do not signal candidate skill or effort anymore. There are many similar tasks that we think have value and LLMs will contribute to the economy by automating them but in reality it will only make them useless. Reminds me of some discussions about mining asteroids: they were saying this asteroid has [--] trillions worth of minerals so it may be worth a space mission. But in reality these minerals would be worth much less"
X Link 2025-11-05T05:17Z 21.3K followers, [----] engagements

"Just announced: Terminal-Bench [---] launching Tommorow. [--] new realistic tasks more than [---] hours of manual reviewing. Congratulations to the terminal-bench team"
X Link 2025-11-07T02:53Z 21.4K followers, [----] engagements

"Congratulations @Mike_A_Merrill @alexgshaw and the [---] contributors for standardizing what RL environments for CLI agents means for the open source community"
X Link 2025-11-07T02:55Z 21.3K followers, [---] engagements

"UT Austin is doubling its supercomputing cluster to more than [----] GPUs. This cluster has been a key for open source AI. Datacomp DCLM OpenThoughts and many other open source projects by researchers in Austin and many other universities and labs around the world critically rely on this open compute infrastructure. UT gets more computehttps://t.co/LZPDhJpAz9 UT gets more computehttps://t.co/LZPDhJpAz9"
X Link 2025-11-11T01:51Z 21.4K followers, 25.8K engagements

"Congratulations Tasso for joining databricks NYC Big personal update πŸ’₯ After founding & exiting two companies (a database pioneer & a CDP powerhouse) Im starting a new chapter: I've joined @databricks Im here to build and lead their brand-new engineering office right here in NYC πŸ—½ https://t.co/ZjyuMtL2ro Big personal update πŸ’₯ After founding & exiting two companies (a database pioneer & a CDP powerhouse) Im starting a new chapter: I've joined @databricks Im here to build and lead their brand-new engineering office right here in NYC πŸ—½ https://t.co/ZjyuMtL2ro"
X Link 2025-11-12T02:42Z 21.5K followers, [----] engagements

"ICLR reviews are out. ICLR reviews are out probably by paper id. Good luck arguing with the reviewers πŸ˜… ICLR reviews are out probably by paper id. Good luck arguing with the reviewers πŸ˜…"
X Link 2025-11-12T06:23Z 21.4K followers, [----] engagements

"I dont think this is true. The peer review system is a (noisy) way to have some people to be forced to read your work as reviewers. The vast majority of arxiv papers now I believe is read by zero people. If a paper gets into a top venue or gets spotlight oral etc its a way for authors to get visibility. Obviously its noisy but its better than no filtering"
X Link 2025-11-13T05:04Z 21.5K followers, [----] engagements

"COLM is great submit to COLM to forget about those 6622s look how happy they are submit to COLM look how happy they are submit to COLM"
X Link 2025-11-13T07:18Z 21.4K followers, 19.1K engagements

"I keep hearing that Excel spreadsheets and other apps will disappear: All knowledge work will be AI agents built on top of systems of record and the user will just ask the agents to do the work. But TIL that spreadsheets are older than any other form of written language. I.e. the earliest known writing in the world is basically spreadsheets with grain ratios tables with counts of workers and how much beer each received etc. So yeah spreadsheets and other applications with useful UIs are probably not going away"
X Link 2025-12-01T02:04Z 21.5K followers, [----] engagements

"Congratulations to Adam Klivans and all the co-authors for winning the FOCS [----] Test of Time Award Their paper was a learning theory breakthrough: It provided the first efficient algorithm for learning halfspaces when there is adversarial label noise under distributional assumptions. (Hardness arguments from cryptography suggest that learning halfspaces without distribution assumptions is impossible). From the FOCS citation: "The work contributed to a fundamental shift in the fields perspective leading to an outpouring of new positive results for learning geometric concepts in more"
X Link 2025-12-15T23:02Z 21.7K followers, 11.5K engagements

"My final exam is today in Berkeley. Pen and paper in person all the students try to solve challenging problems. No machines. This ancient method of evaluating students is going to survive in the AI era"
X Link 2025-12-19T00:37Z 22.2K followers, 107.7K engagements

"We are using and developing AI agents (and datasets) to help with teaching - and AI can make amazing personalized tutors. But the role of an in person test is to evaluate understanding. Beyond having a great ai tutor students who know they will be truly evaluated are motivated to put the effort to use learning tools (old and new) to actually understand the concepts. There is no royal road to learning and knowing a real in-person exam is at the end changes the way students approach the whole semester and put more effort I believe"
X Link 2025-12-19T05:21Z 21.9K followers, [----] engagements

"The study says simply that the very top at young age are not identical with the very top adults. (As one would expect since there are many many more non elite young candidates). Elite young performers are still [--] times more likely to be in the top adults compare to general population as the paper acknowledges in page 6-7 but this is buried in the technical analysis. Overall I found some parts of this paper to be misleading and not sufficiently emphasising odds ratio vs base rate"
X Link 2025-12-20T23:25Z 22.1K followers, [----] engagements

"A paper was recently published in Science on highest level of human performance across athletics science math and music. I think the paper makes some classical statistics mistakes that still fool many smart people. The paper "Recent discoveries on the acquisition of the highest levels of human performance" by Gullich et al. claims: "In summary when comparing performers across the highest levels of achievement the evidence suggests that eventual peak performance is negatively associated with early performance." The paper makes two mistakes. Base-rate fallacy and missing Berkson's paradox (aka"
X Link 2025-12-21T21:08Z 22.2K followers, 124.1K engagements

"Very interesting and impressive study. Identical twins ate the same calories for [--] months and there was still significant variability in how much weight they gained: +8kg on average but ranged from 4kg to 13kg. I initially thought this would violate the first law of thermodynamics but I guess human bodies introduce variability. [--]. And guess what It was bang on - [---] kg. But avearges miss the crucial aspect that there is often heterogeneity - in fact the lowest gainter was only [---] kg and the highest gaineer was [----] kg Not fair [--]. And guess what It was bang on - [---] kg. But avearges miss the"
X Link 2026-01-04T11:39Z 22.2K followers, [----] engagements

"@Kangwook_Lee in the game you can actually have the player write grant proposals and have LLM-as-a-judge reviewers kill the proposals and write comments. Then it becomes too good of a game you might as well do the real thing"
X Link 2026-01-06T00:05Z 22.2K followers, [---] engagements

"Parth and Alan presenting Advisor Models in the Berkeley Sky lab retreat. Advisor models are small models that are trained to create personalization or steering advice prompts that are fed to a large model like GPT. Its basically dynamic prompting done by a small LLM that can be trained or personalised. In one experiment the advisor learned which users like short movie reviews and who prefers detailed reviews purely by RL with a numerical reward. Then it adds this personalization information to the prompt of GPT5 that writes the movie reviews."
X Link 2026-01-16T01:02Z 22.4K followers, [----] engagements

"If you've lost track of startups coming out of UC Berkeley Sky Lab raising in the last [--] weeks: SGLang (RadixArk) raised at 400m valuation VLLM (Inferact) at 150m at 800m valuation LMArena raised 150m at 1.7B valuation. Not too bad for impact in January 2026"
X Link 2026-01-23T01:22Z 22.4K followers, 78.9K engagements

"Coding agents as a path to Continual Learning Continual learning is among the most important open problems in AI: the ability to personalize adapt and specialize while doing tasks. Right now the model weights are not updating and there is a lot of on-going work on how to use RL for continual learning. But there is another alternative lets call it 'Code is all you need' or 'CLI is all you need': Take a (fixed weight) coding agent and give it a terminal a file system and let it create files skills and scripts for continual learning. The file system can act as long-term memory with hierarchical"
X Link 2026-01-30T17:21Z 22.4K followers, 21.5K engagements

"Here is a very good reason why the NyquistShannon sampling theorem requires that your function is low-pass before you sub-sample to downscale. If you just sub-sample without smoothing a bad guy can place another image exactly on the pixels you sub-sample. Adversarial aliasing. image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images. https://t.co/PvidAaxJLS image-scaling attacks are wild small dots added to the image on the left turns it into"
X Link 2021-11-06T05:42Z 22.5K followers, [---] engagements

"@TheGregYang @HeinrichKuttler I love this platform for the mere intellectual depth of the ongoing discourse"
X Link 2026-02-04T02:20Z 22.5K followers, [---] engagements

"Great post on evaluating agents. If you give the agent a machine with strict memory limits (as specified in Terminal-Bench 2) you drop [--] percent or more. Daytona allows 3x more memory and that smooths things out. The environment is part of the benchmark and understanding these variations is key for scientific measurement and optimization. New on the Engineering Blog: Quantifying infrastructure noise in agentic coding evals. Infrastructure configuration can swing agentic coding benchmarks by several percentage pointssometimes more than the leaderboard gap between top models. Read more:"
X Link 2026-02-06T07:36Z 22.5K followers, [----] engagements

"GPT is having a profound effect on how students write. Its verbose style full of cliches and 'fancy' out of place vocabulary is in every paper and draft I read. A few years back there were grammar errors and awkwardness -- but at least people had their own voice. Now scholarship is getting full of robotic triviality"
X Link 2024-09-05T23:15Z 22.5K followers, 951.3K engagements

"Someone is trying to scam my PhD student. My student asks to verify their identity 1/2"
X Link 2022-01-29T02:31Z 22.5K followers, [----] engagements

"I was surprised by a talk Yejin Choi (an NLP expert) gave yesterday in Berkeley on some surprising weaknesses of GPT4: As many humans know 237*757=179409 but GPT4 said [------]. For the easy problem of multiplying two [--] digit numbers they measured GPT4 accuracy being only 59% accuracy on [--] digit number multiplication. Only 4% on [--] digit number multiplication and zero on 5x5. Adding scratchpad helped GPT4 but only to 92% accuracy on multiplying two [--] digit numbers. Even more surprisingly finetuning GPT3 on 1.8m examples of [--] digit multiplication still only gives [--] percent test accuracy (in"
X Link 2023-08-16T00:01Z 22.5K followers, 1.7M engagements

"This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to [----] elo. Is it possible that the model plays better than [----] elo (i.e. "transcends" the training data performance). It seems you get something from nothing and some information theory arguments that this should be impossible were discussed in conversations I had in the past. But this paper shows this can happen: training on [----] elo game transcripts and getting an LLM that plays at [----] Further the authors connect to a clean theoretical framework for why: it's ensembling"
X Link 2024-06-19T05:08Z 22.5K followers, 392.7K engagements

"Discovered a very interesting thing about DeepSeek-R1 and all reasoning models: The wrong answers are much longer while the correct answers are much shorter. Even on the same question when we re-run the model it sometimes produces a short (usually correct) answer or a wrong verbose one. Based on this I'd like to propose a simple idea called Laconic decoding: Run the model [--] times (in parallel) and pick the answer with the smallest number of tokens. Our preliminary results show that this decoding gives +6-7% on AIME24 with only a few parallel runs. I think this is better (and faster) than"
X Link 2025-01-31T21:59Z 22.5K followers, 222.8K engagements

"Human bilinguals are more robust to dementia and cognitive decline. In our recent NeurIPS paper we show that bilingual GPT models are also more robust to structural damage in their neuron weights. Further we develop a theory. (1/n)"
X Link 2023-02-04T22:59Z 22.5K followers, 312.6K engagements

"Thank you for your response Dimitris. I appreciate your take on the issue. It's true that a request for "a few typos" and fewer "fancy words" may help bring back a sense of authenticity to writing. Theres a delicate balance between polishing a draft and maintaining the writers original voice and sometimes that balance is lost when students rely too heavily on tools like GPT. I find that students are increasingly focused on perfecting their writing in a technical sense but often at the cost of depth originality and personal style. The quirks errors and occasional awkwardness that were once"
X Link 2024-09-06T07:19Z 22.5K followers, 132.2K engagements

"Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: [--]. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM no MCTS no fancy reward models. Basically checks if the answer is correct. πŸ˜… [--]. Small models can reason very very well with correct distillation post-training. They released a 1.5B model () that is better than Claude and Llama 405B in AIME24. Also their distilled 7B model seems better than o1 preview. πŸ€“ [--]. The datasets used are not released if I"
X Link 2025-01-21T01:17Z 22.5K followers, 184.1K engagements

""RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time the model tries [--] times (the Group in GRPO) and a gradient step is performed to increase the reward which is a very simple verification of the correct answers repeated thousands of times on the same problem. The shocking finding is that the model does not overfit to this one question: RL on one example makes the model better in MATH500 and other benchmarks. (If instead"
X Link 2025-05-10T23:34Z 22.5K followers, 354.5K engagements

"Life update: I am excited to announce that I will be starting as a Professor in UC Berkeley in the EECS Department. I spend [--] wonderful years teaching in UT Austin and I am grateful to all my colleagues and students there and extremely proud of what we have achieved in AI in UT Austin and I plan to continue my numerous UT close collaborations. I will also continue as Chief Scientist in Bespoke Labs making it much easier now being in the Bay area. I received my Phd in [----] from @Berkeley_EECS and I am thrilled to be back. I am grateful for this new opportunity"
X Link 2024-12-17T20:55Z 22.5K followers, 110.9K engagements

"2/ Scammer ends up improving our sample complexity bound for StyleGAN inverse problems. They teach them to do chaining arguments instead of just union bounds now jeez. @giannis_daras"
X Link 2022-01-29T02:31Z 22.5K followers, [----] engagements

"For the first (and probably last) time in my life I understand the technical details of both the physics and chemistry Nobel prizes. BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and"
X Link 2024-10-09T12:42Z 22.5K followers, 56.4K engagements

"BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction"
X Link 2024-10-09T09:46Z 1.3M followers, 9.1M engagements

"Doctor: We used a deep learning algorithm for your MRI reconstruction. Turns out one of your kidneys is a cat"
X Link 2021-05-24T18:40Z 22.5K followers, [----] engagements

"One huge advantage of deep learning (vs classical ML models) that is not often discussed is modularity: One can download pre-trained models glue them like Legos and fine tune them end-to-end because gradients flow through. (1/n)"
X Link 2022-03-23T04:18Z 22.5K followers, [----] engagements

"Based on recent papers (Gpt3 Palm dalle2 Gato Metaformer) I am forming the opinion that maybe 'Scale is all you need' possibly even for general intelligence (). Just convert everything to tokens and predict the next token. (1/n)"
X Link 2022-05-17T02:24Z 22.5K followers, [----] engagements

"The term Artificial Intelligence was coined by John McCarthy to avoid association with Cybernetics and specifically its pioneer Norbert Wiener who was already famous pain to work with and working on Cybernetics in MIT. Original quote from McCarthy's Stanford page: . (1/n)"
X Link 2022-04-19T16:18Z 22.5K followers, [---] engagements

"@FernleafFlynn @even_kei @IllithidHeretic Two major industries breaking ways for a paltry sum"
X Link 2020-05-21T18:56Z 22.5K followers, [---] engagements

"Here is a simple way to beat ChatGPT and any similar architecture with one Turing test question. ChatGPT GPT3 and all related Transformers have a finite maximum token sequence length usually 2k to 4k tokens. (1/n)"
X Link 2023-02-23T16:14Z 22.5K followers, 163.2K engagements

"Best to leave TF for later"
X Link 2019-08-31T20:30Z 22.5K followers, [---] engagements

"My thoughts on the now famous Google leak doc: [--]. Open source AI is winning. I agree and that is great for the world and for a competitive ecosystem. In LLMs we're not there but we just got OpenClip to beat openAI Clip and Stable diffusion is better than closed models. [--]. You don't need huge models high quality data is much more efficient and important. Alpacaing models behind APIs further reduces moats. [--]. You can start with a good foundation model and parameter efficient fine-tuning (PEFT) algorithms like Lora work super well in a day. Finally an opening for algorithmic innovations 4."
X Link 2023-05-05T00:45Z 22.5K followers, 189.1K engagements

"Probably the best 1h introduction to LLMs that I've seen. And after 20mins its not an introduction its getting into cutting edge research updates updated up to this month. I had not heard of the data exfiltration by prompt injection or the recent finetuning Poisoning attacks. https://www.youtube.com/watchv=zjkBMFhNj_g&t=2s https://www.youtube.com/watchv=zjkBMFhNj_g&t=2s"
X Link 2023-11-23T07:51Z 22.5K followers, 74.5K engagements

"Excited to be the director for the new Texas Center for Generative AI Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://t.co/jTZd4uV0ps"
X Link 2024-01-25T18:06Z 22.5K followers, 52.6K engagements

"Please welcome the Center for Generative AI -- a World-Class AI Research Center with a Texas-Sized GPU Cluster. Led by @AlexGDimakis πŸ’« #YearofAI @TexasScience @UTAustin https://news.utexas.edu/2024/01/25/new-texas-center-will-create-generative-ai-computing-cluster-among-largest-of-its-kind/ https://news.utexas.edu/2024/01/25/new-texas-center-will-create-generative-ai-computing-cluster-among-largest-of-its-kind/"
X Link 2024-01-25T17:51Z [----] followers, 106K engagements

"A Thanksgiving story A few years back I used to play tennis in a ladder system which would match me up with various folks in my neighborhood. After Thanksgiving I had a tennis match with this guy: nice guy two kids a bit overweight in his 50ies I had never met him before. We start our match. During the match he says -Sorry lets stop for a bit I want to catch my breath. -Sure no problem. We start and [--] minutes after he says: -Sorry I ate too much at the Thanksgiving dinner and I have digestion problems. He was burping a bit and looked tired. He asks to reschedule the game I say sure sounds"
X Link 2024-11-29T18:11Z 22.5K followers, 37.5K engagements

"A small experiment: This Tweet has an even number of likes"
X Link 2023-02-07T22:33Z 22.5K followers, 45.5K engagements

"Ptolemy the king of Egypt wanted to learn geometry but found Euclid's book the Elements too difficult to study. So he asked Euclid to show him an easier way to master it. Euclid famously said "Sir there is no royal road to geometry." This is still true a few thousand years later in the days of Youtube and TikTok as Andrej nicely points out. # on shortification of "learning" There are a lot of videos on YouTube/TikTok etc. that give the appearance of education but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy"
X Link 2024-02-11T20:56Z 22.5K followers, 93.6K engagements

"# on shortification of "learning" There are a lot of videos on YouTube/TikTok etc. that give the appearance of education but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are learning (but actually they are just having fun). The people creating this content also enjoy it because fun has a much larger audience fame and revenue. But as far as learning goes this is a trap. This content is an epsilon away from watching the Bachelorette. It's like snacking on those "Garden Veggie Straws" which feel"
X Link 2024-02-10T18:10Z 1.8M followers, 2.2M engagements

"As Information theory was becoming a 'hot' scientific trend in the 50s Claude Shannon wrote a one-page paper advising hype reduction. That never happens anymore. Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in first class order. The subject of information theory has certainly been sold if not oversold." https://t.co/Jn0e72B5Bz Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in"
X Link 2020-06-21T18:49Z 22.5K followers, [---] engagements

"Claude Shannon's "The Bandwagon" (1956) is a timeless gem. Short one page advise and perspective on the status of the field. ". we must keep our own house in first class order. The subject of information theory has certainly been sold if not oversold.""
X Link 2020-05-28T07:52Z [---] followers, [---] engagements

"I was informed that Alexander Vardy a giant in coding theory passed away. A tragic loss for his family UCSD and academia. Alex's many discoveries include the Polar decoding algorithm used in the 5G wireless standard (1/3)"
X Link 2022-03-15T18:55Z 22.5K followers, [---] engagements

"Here is a very good reason why the NyquistShannon sampling theorem requires that your function is low-pass before you sub-sample to downscale. If you just sub-sample without smoothing a bad guy can place another image exactly on the pixels you sub-sample. Adversarial aliasing. image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images. https://t.co/PvidAaxJLS image-scaling attacks are wild small dots added to the image on the left turns it into"
X Link 2021-11-06T05:42Z 22.5K followers, [---] engagements

"image-scaling attacks are wild small dots added to the image on the left turns it into the image on the right when downscaled could make auditing ML systems very tricky if you only look at the original images"
X Link 2021-11-04T06:42Z [----] followers, [----] engagements

"If you are a #neurips2020 reviewer please read the authors rebuttal and at the very least update your review indicating that you read it and your updated thoughts. It takes [--] minutes and its a good step towards decency. Meta-reviewers please enforce this"
X Link 2020-09-02T07:02Z 22.5K followers, [---] engagements

"Honored to be selected as an IEEE Fellow for contributions to distributed coding and learning' Congratulations to the whole Fellows class of [----] https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows https://t.co/yPfwbxMVb9 https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows"
X Link 2021-11-24T02:15Z 22.5K followers, [---] engagements

"Congratulations to @utexasece's Seth Bank @AlexGDimakis and Sriram Vishwanath for being selected as @IEEEorg Fellows https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/fellows/2022-ieee-fellows-class.pdf"
X Link 2021-11-24T01:51Z [---] followers, [--] engagements

"What are RL environments Are they just evals There is significant confusion in the community so here is my opinion: My answer is inspired by Terminal-bench an elegant framework for creating RL environments evaluating agents and even training agents. First an RL environment is simply a Docker container. It contains three things: [--]. A snapshot of the state of the world when a problem happened. [--]. A task description and [--]. A reward that verifies if the agent has solved the task. Can be using LLM as a judge or run tests. For example lets take the 'broken-python' environment in Terminal bench. The"
X Link 2025-09-11T01:15Z 22.5K followers, 34.6K engagements

"@percyliang @deepseek_ai We are working on fixing that and create the largest open reasoning dataset. More coming very soon πŸ˜‰"
X Link 2025-01-26T07:42Z 22.5K followers, 37.9K engagements

"The Google Gemini paper was released today and has [---] authors. I was impressed but then found that a recent LHC physics paper with [----] authors. The first nine pages describe the research and the other [--] pages list the authors and their institutions. But that's not even the record. The most authors on a single peer-reviewed academic paper is [-----] and was achieved by the COVIDSurg and GlobalSurg Collaboratives at the University of Birmingham and the University of Edinburgh. All [---] Gemini coauthors are expected to quit Google and start [---] LLM startups next year"
X Link 2023-12-20T22:20Z 22.5K followers, 56.2K engagements

"Let the advisor show you how to write the rebuttal https://x.com/i/status/1294367648814424064/video/1 https://x.com/i/status/1294367648814424064/video/1"
X Link 2020-08-19T07:02Z 22.5K followers, [---] engagements

"New neural renderer by Nvidia. The model adds fingerprints smudges and dust and generates renders indistinguishable from real to me. Oh and its done at real-time. Can't wait to see games using this. (1/2)"
X Link 2023-05-07T03:48Z 22.5K followers, 29.9K engagements

"We're very excited that @UT Austin will lead an NSF national Institute on the Foundations of Machine Learning with @UW @WichitaState and @MSFTResearch Announcement: https://news.utexas.edu/2020/08/26/ut-austin-selected-as-home-of-national-ai-institute-focused-on-machine-learning/ https://news.utexas.edu/2020/08/26/ut-austin-selected-as-home-of-national-ai-institute-focused-on-machine-learning/"
X Link 2020-08-26T13:52Z 22.5K followers, [---] engagements

"@nandofioretto Yes that's right. Structure and flow in writing help us organize our thought. Blindly using LLMs is an airbrush that makes it harder for people to see that they have muddled flow"
X Link 2024-09-06T03:22Z 22.5K followers, 42.9K engagements

"I am excited to announce that our AI institute (Institute for Foundations of Machine Learning IFML) has been renewed. IFML was part of the first cohort of AI Institutes announced in [----]. Led by UT Austin the new award will build on the trajectory of the past five years and develop new foundational tools to advance generative AI. NSF IFML's work on diffusion models is a key technology behind major Google products powering widely used generative models such as Stable Diffusion [--] and Flux. In it's next phase NSF IFML will expand generative AI to new domains including protein engineering"
X Link 2025-07-29T17:37Z 22.5K followers, 26.7K engagements

"That's a bit of a simplification right πŸ˜… It's like showing a prototype of Pagerank and saying this is all the code needed to replicate Google search. You need to replicate V3 get a ton of extra data and do many things on top of GRPO to get to R1. It does replicate the core idea for reasoning RL however"
X Link 2025-01-31T00:07Z 22.5K followers, 86.9K engagements

"Who first generated text with statistical methods like GPT In [----] Claude Shannon wrote the landmark paper 'A Mathematical Theory of Communication'. There he defined and estimated the entropy of English by generating synthetic text: 'THE HEAD AND IN FRONTAL ATTACK ON (1/n)"
X Link 2023-02-08T00:15Z 22.5K followers, 42.4K engagements

"References: The Faith and Fate Paper is available here: Video of this great talk here: https://www.youtube.com/watchv=P7ZdUbSAujQ https://arxiv.org/pdf/2305.18654.pdf https://www.youtube.com/watchv=P7ZdUbSAujQ https://arxiv.org/pdf/2305.18654.pdf"
X Link 2023-08-16T00:02Z 22.5K followers, 57.4K engagements

"@raj_raj88 But even fine-tuning with 1.8m multiplication examples was not able to teach it to generalize to other (3 digit) multiplications. This indicates some fundamental architecture limitation"
X Link 2023-08-16T01:43Z 22.5K followers, 27K engagements

"Greece is quite the outlier here in the south on the number of metal bands per Capita. Any explanations Metal bands per [--] million people (Europe) https://t.co/OPEROKiBLo Metal bands per [--] million people (Europe) https://t.co/OPEROKiBLo"
X Link 2022-04-10T19:54Z 22.5K followers, [---] engagements

"Metal bands per [--] million people (Europe)"
X Link 2016-10-20T16:04Z 594.6K followers, 24.3K engagements

"I was thrilled to learn about this best paper award announced today in COLT [----] the premier learning theory venue. The paper is "Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension" authored by students Gautam Chandrasekaran Konstantinos Stavropoulos IFML postdoc Vasilis Kontonis IFML director Adam Klivans and former UT CS PhD Raghu Meka. Smoothed analysis is an ingenious idea of going beyond worst case pioneered by my former USC colleague Shanghua Teng and Dan Spielman). This paper showed how to apply this framework for learning theory. Here is my basic understanding of"
X Link 2024-07-03T16:17Z 22.5K followers, 38.2K engagements

""Datacomp1B is the first public dataset that outperforms OpenAI" #NeurIPS2023"
X Link 2023-12-14T16:46Z 22.5K followers, 38.1K engagements

"My students after every joke I make in a Zoom lecture. (h/t: @OdedRechavi )"
X Link 2020-12-16T18:20Z 22.5K followers, [---] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing

creator/x::AlexGDimakis
/creator/x::AlexGDimakis