Sam Stevens (@samstevens6860)

Sam Stevens posts on X most often about AI, GPUs, Claude, and LLMs. They currently have [---] followers and [---] posts still getting attention, totaling [---] engagements in the last [--] hours.

Engagements: [---]

Mentions: [--]

Followers: [---]

CreatorRank: [---------]

Social Influence

Social category influence: PGA golfers (#2116), technology brands, stocks, countries, cryptocurrencies, social networks, finance, travel destinations, automotive brands

Social topic influence: ai, gpus, claude, llm, fit, san, token, 1m, leaderboard, open ai

Top accounts mentioned or mentioned by: @xeophon, @xjdr, @vikhyatk, @neuripsconf, @yugunlp, @ysunlp, @wightmanr, @cloneofsimo, @yacinemtb, @imageomics, @ohiostate, @dpetrini, @ilyashairline, @thexeophon, @timdarcet, @tinygrad, @osunlp, @ricardousbeck, @nrehiew, @willccbb

Top assets mentioned: Alphabet Inc Class A (GOOGL), HyperWo (WO), Pixels (PIXEL), Spotify Technology (SPOT)

Top Social Posts

Top posts by engagements in the last [--] hours

"Super proud to be a part of MMMU"
X Link 2023-11-29T03:49Z [--] followers, [----] engagements

"@yugu_nlp @Ricardo_Usbeck I agree with (a) but I disagree that we only need small benchmarks. Old benchmarks were small then MTurk made it easy to gather big benchmarks. Reasoning benchmarks are hard to gather so we now gather small benchmarks. Soon we will scale to larger benchmarks"
X Link 2023-12-01T14:14Z [--] followers, [--] engagements

"@Ricardo_Usbeck @yugu_nlp I think segment anything from Meta is one of the best recent works on model-assisted training data collection; I'd like to see some work on doing the same for LLM evaluation. Or just hard work: MMMU is 11.5K examples collected grad students"
X Link 2023-12-01T14:58Z [--] followers, [--] engagements

"@Ricardo_Usbeck @yugu_nlp collected *by grad students"
X Link 2023-12-01T15:29Z [--] followers, [--] engagements

"@YutongBAI1002 @younggeng @Karttikeya_m @_amirbar @YuilleAlan @JitendraMalikCV Are there ImageNet linear probing results Or any comparisons to existing models"
X Link 2023-12-04T21:52Z [--] followers, [---] engagements

"Integrating lessons from biology and computer vision to develop BioCLIP was challenging and rewarding. It's a big step towards AI understanding evolutionary biology. BioCLIP is a great backbone for computer vision + biology tasks"
X Link 2023-12-11T17:43Z [--] followers, [----] engagements

"Feel free to ask for support if you have issues with the model; I want BioCLIP to be useful for both biologists and computer scientists"
X Link 2023-12-11T17:44Z [--] followers, [--] engagements

"@TouristShaun @ysu_nlp In my experience the organism to classify should be centered and clearly visible. You can also try cropping the image to a square to reduce distortion from resizing"
X Link 2023-12-11T20:43Z [--] followers, [--] engagements

"MoE from a Hardware Perspective An underrated component of MoE models is their efficient mapping to hardware during training. In this thread I'll try to convince you that MoE models are a smarter use of hardware resources than dense models during training"
X Link 2024-01-04T21:55Z [---] followers, 38.8K engagements

"Here's a diagram of a typical dense transformer FFNN layer and how it's split up over some GPUs for training. Tokens move from bottom to top. On each GPU there are [--] copies of $W_i$ and $W_o$ the matrices that make up the fully connected layer in a transformer FFNN layer"
X Link 2024-01-04T21:55Z [---] followers, [----] engagements

"Our batch has [--] sequences with [--] tokens each. The GPUs have enough memory for a local batch size of [--]. Each sequence goes onto its own GPU. To train do a forward pass store the activations compute gradients sync gradients across GPUs update moments then update the weights"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"The key points: - We copy the same FFNN weights for each token across all GPUs - If you had different FFNN weights you'd have the same FLOP requirements for a forward/backward pass"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"Here's the same diagram as before but for a standard MoE FFNN layer. We now have [--] experts split across [--] GPUs. Each GPU carries [--] copies of each expert (so each expert processes [--] tokens)"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"What changed [--]. + We 12x'ed our parameter count (more parameters means a better model). [--]. - We need to map tokens to GPUs (likely learning some trained token-expert affinity). [--]. - We need to combine the experts' outputs. [--]. - We need to keep track of 12x Adam moments"
X Link 2024-01-04T21:55Z [--] followers, [---] engagements

"What hasn't changed [--]. We only do [--] tokens worth of FLOPs for the FFNN even though we have 12x as many parameters. [--]. We only store [--] tokens worth of activations"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements
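The FLOP-vs-parameter argument in this thread can be sketched as a toy calculation. This is a minimal sketch under assumed, made-up dimensions (`d_model`, `d_hidden`, `n_experts`) and top-1 routing; it is not from the thread itself, only an illustration of its claim that an MoE layer multiplies parameters without multiplying per-token FFNN FLOPs:

```python
# Toy counts contrasting a dense transformer FFNN layer with a
# top-1-routed MoE layer. All sizes are invented for illustration.

def ffnn_params(d_model: int, d_hidden: int) -> int:
    # W_i (d_model x d_hidden) plus W_o (d_hidden x d_model)
    return 2 * d_model * d_hidden

def ffnn_flops_per_token(d_model: int, d_hidden: int) -> int:
    # roughly 2 FLOPs (multiply + add) per weight for a forward pass
    return 2 * ffnn_params(d_model, d_hidden)

d_model, d_hidden = 512, 2048
n_experts = 12  # each token is routed to exactly one expert

dense_params = ffnn_params(d_model, d_hidden)
moe_params = n_experts * ffnn_params(d_model, d_hidden)

# With top-1 routing, each token still passes through exactly one
# expert-sized FFNN, so per-token FFNN FLOPs do not change.
dense_flops = ffnn_flops_per_token(d_model, d_hidden)
moe_flops = ffnn_flops_per_token(d_model, d_hidden)

print(moe_params // dense_params)  # 12x the parameters
print(moe_flops == dense_flops)    # same per-token FFNN FLOPs
```

The routing network and expert-output combination add some cost on top of this, as the thread notes, but the dominant FFNN matmul cost stays fixed.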

"The useful bit is how MoE models naturally map onto GPU-based training. If you're Cerebras you can train a huge dense transformer language model on one chip:"
X Link 2024-01-04T21:55Z [--] followers, [---] engagements

"Everyone else splits their model over multiple GPUs. Could be data parallelism or model parallelism or even tensor parallelism. Deciding how to split models is its own research field: you must balance tradeoffs between memory bandwidth FLOPs/sec model size network speed etc"
X Link 2024-01-04T21:55Z [--] followers, [---] engagements

"While MoE models introduce more options I argue that they actually simplify how you split your model onto GPUs"
X Link 2024-01-04T21:55Z [--] followers, [---] engagements

"If your experts are small enough each expert can fit on a single GPU. If your experts are very small you can put multiple copies of an expert on a single GPU (each expert processes more than one token). Your hardware availability dictates your optimal model hyperparameters."
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"If you're a scientist this is awful You only think about hardware so you can explore how/why LMs work. Your experiments should be reproducible on any hardware. Code abstracts the hardware away from the experiment itself. Choosing hyperparams based on hardware is the opposite"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"If you're an engineer this is great Of course your hardware dictates the optimal model you can train. MoE simplifies the software abstraction: each GPU has some parameters that it updates all on its own"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"Hopefully this shifted your thinking about MoE models w.r.t. hardware and compute I think lots of papers focus on MoE models being more token-efficient w.r.t. loss and few think about it from a compute perspective. If you do know of such papers please let me know"
X Link 2024-01-04T21:55Z [---] followers, [---] engagements

"@abacaj Do you worry about sandboxing? The HumanEval paper and repo make a big deal out of running untrusted code in a proper sandbox but I can't tell if it's really required or just a recommendation"
X Link 2024-01-13T21:56Z [---] followers, [--] engagements

"@yugu_nlp llama.cpp also has constrained decoding with custom grammars which can help too"
X Link 2024-01-14T16:43Z [---] followers, [---] engagements

"Had a great time doing my best to explain important ML concepts. We started at trendlines in Excel and worked up to the cutting edge of the field discussing foundation models generative AI and synthetic data. @imageomics NextGens and @OhioState PhD student @samstevens6860 discusses the main trade-off in #MachineLearning during Day [--] of our #allhands. https://t.co/GCiTDS1LZC"
X Link 2024-01-21T03:17Z [--] followers, [---] engagements

"As I finish up my CVPR rebuttal I cannot stop thinking about ImageNet-1K. Here's a thread about ImageNet's success over more than a decade of research"
X Link 2024-01-30T23:43Z [---] followers, 21.9K engagements

"ImageNet (the [----] split) was the first large-scale (1M+ images 1K classes) labeled image dataset which was critical for training deep neural networks. Before ImageNet SVMs dominated image classification. After deep neural networks took over"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"For years getting better top1 accuracy on ImageNet correlated with your proposed architecture being good for other vision tasks"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"As we got into transfer learning good top1 accuracy on ImageNet correlated with your model outputting good features for other vision tasks"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"As we got into pre-training/fine-tuning (similar to fine-tuning but focusing on non-ImageNet datasets for pre-training) good top1 accuracy on ImageNet with linear probing on model features correlated with those features being good for other vision tasks"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"As we got into language-image pre-training (CLIP) good zero-shot top1 accuracy on ImageNet (and its variants) correlated with good zero-shot and few-shot accuracy on other vision and vision/language tasks"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"Hot take: ImageNet correlates so strongly with overall vision tasks that even though ViTs were not immediately applicable to object detection and other models that need hierarchical features we kept going with them anyways bc ViTs did better on ImageNet"
X Link 2024-01-30T23:43Z [---] followers, [----] engagements

"(I'm deliberately ignoring Swin transformers and other architectural modifications to ViT because they were in hindsight more complicated than language model transformers which I think is a good upper bound for complexity in a model architecture.)"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"(Incidentally I think we will find pure ViT options for diffusion models instead of U-net architectures for similar reasons: transformers are simple and extremely effective.)"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"Back to ImageNet: it's amazing that performance on ImageNet is so strongly correlated with overall vision understanding"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"Every time I see a paper on whether we should stop using ImageNet I laugh a little bit. We're never going to stop using ImageNet"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"If you're comparing ImageNet top-1 accuracy with numbers above 90% you're almost certainly overfitting to ImageNet itself (and should actually stop using ImageNet)"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"But if you're trying to improve zero-shot top-1 (CLIP models) ImageNet continues to be a good benchmark We should be shocked Somehow the natural language class labels in ImageNet are also an extremely robust benchmark for evaluating overall vision"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"If you're comparing ImageNet top-1 accuracy with limited FLOPs either total training FLOPs or FLOPs/inference then again ImageNet is also a good benchmark (assuming you're operating in that 70-90% top1 acc regime where you're not overfitting labels)"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"I wasn't working on computer vision until 2022; if I'm misunderstanding how ImageNet affected computer vision research let me know"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"If you're using ImageNet for class-conditioned image generation. I don't know if you should be using ImageNet because I don't spend enough time reading those papers πŸ˜…"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"I sometimes wonder what NLP would be like if we had an ImageNet-like benchmark that was an accurate predictor of language understanding across ten years of progress"
X Link 2024-01-30T23:43Z [---] followers, [---] engagements

"@ysu_nlp Code and data is now available as well as an easy Python package for using BioCLIP (thanks John) https://github.com/Imageomics/bioclip https://huggingface.co/datasets/imageomics/TreeOfLife-10M https://github.com/Imageomics/pybioclip"
X Link 2024-04-04T20:59Z [---] followers, [--] engagements

"@wightmanr @dpetrini Do you have any thoughts on the relationship between patch size and image size Higher-res images contain strictly more signal so assuming that you're not bandwidth bound (not always true) would it make sense to always scale image size and patch size"
X Link 2024-05-14T23:24Z [---] followers, [--] engagements

"@wightmanr @dpetrini def agree I would love to see some work where we keep everything the same but change image and patch sizes measuring both natural image understanding (imagenet) and fine-grained stuff like text/document understanding"
X Link 2024-05-15T00:31Z [---] followers, [--] engagements

"@wightmanr @dpetrini I suspect that it doesn't matter much for natural image understanding because an individual pixel can be safely inferred from its neighbors but it's not true for text"
X Link 2024-05-15T00:32Z [---] followers, [--] engagements

"@wightmanr @dpetrini it might also imply that we can do 128x128 images with 8x8 patches and get a bandwidth improvement rather than 256x256 with 16x16 but use the same GPU FLOPs/forward pass"
X Link 2024-05-15T00:33Z [---] followers, [--] engagements

"@ArmenAgha This paper suggests that language models should benefit from images during pre-training as additional signal. Do you feel Chameleon is better at language-only tasks than llama2 because of the multiple modalities https://arxiv.org/abs/2405.07987"
X Link 2024-05-17T18:59Z [---] followers, [---] engagements

"@isidentical why doesn't webdataset work https://github.com/webdataset/webdataset"
X Link 2024-06-12T01:38Z [---] followers, [---] engagements

"@isidentical I assume you've already tried it"
X Link 2024-06-12T01:38Z [---] followers, [--] engagements

"Excited to be at CVPR presenting BioCLIP DM me if you want to chat about computer vision for animals multimodal foundation models or AI for science The @CVPRConf is the premiere #computervision conference and our researchers are well represented this week. Today they are exhibiting the pioneering field of #Imageomics and @ABC_ClimateCtr at the #CV4Animals workshop. #CVPR2024 #AIforConservation https://t.co/KnzmO6fEm1"
X Link 2024-06-18T04:56Z [---] followers, [----] engagements

"@sarameghanbeery @imageomics Thanks Sara"
X Link 2024-06-19T16:14Z [---] followers, [---] engagements

"I did start DL work after GPT-2 and I do think architecture doesn't matter. But I'm a student with limited compute and architectures must be empirically validated with thousands of GPU hours. It's silly for me to work on architecture in a world where data quality is a thing. Only folks that started large scale DL work after GPT-2 think architecture doesnt matter the rest saw how much arch work had to happen to get here."
X Link 2024-06-23T21:37Z [---] followers, [---] engagements

"I wish I could do new arch work it's great to see and I applaud the big labs for exploring the design space. but it's not an effective use of my time and energy"
X Link 2024-06-23T21:38Z [---] followers, [--] engagements

"@_xjdr (disclaimer: from my lab) https://arxiv.org/abs/2405.14831"
X Link 2024-06-24T18:00Z [---] followers, [---] engagements

"@_xjdr glad to share this is a nice intro twitter thread: https://x.com/bernaaaljg/status/1795260855002583101 πŸ“£πŸ“£ Super proud to present the most exciting project of my PhD so far: HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. HippoRAG as the title suggests is a brain-inspired RAG framework that enables LLMs to effectively and efficiently https://t.co/f58fMxMRbh"
X Link 2024-06-24T18:07Z [---] followers, [---] engagements

"@cloneofsimo Even if it's not useful it would be great to have a single document with all the notes. OPT was never a good language model but the logbook was an amazing community resource https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf"
X Link 2024-06-24T18:17Z [---] followers, [---] engagements

"@yacineMTB Do you mean literally writing stuff like for batch in dataloader: loss = model(batch); loss.backward() Or something different/lower-level"
X Link 2024-07-10T19:31Z [---] followers, [--] engagements

"@yugu_nlp There's also some work that tries to distill planning/searching (in the form of CoT) into greedy decoding: For specific known tasks it seems like we can explicitly add that "intuition" into the LM weights. https://arxiv.org/abs/2407.06023v1"
X Link 2024-07-19T17:29Z [---] followers, [---] engagements

"@wightmanr @sudeeppillai @cursor_ai is really good for refactoring. It handles all the patch formatting and does it with git commits so it integrates well into many different workflows. https://aider.chat/"
X Link 2024-08-01T17:11Z [---] followers, [--] engagements

"@cloneofsimo @main_horse yes we need more SAEs trained on non language models. There seems to be a lot of arch/hparam choices that are underexplored outside of LMs"
X Link 2024-09-21T05:05Z [---] followers, [---] engagements

"@cloneofsimo What do you think about modula in comparison to uP etc. Not affiliated with either but I'm new to hparam transfer across scales https://arxiv.org/abs/2405.14813 https://jeremybernste.in/modula/"
X Link 2024-09-26T13:40Z [---] followers, [--] engagements

"@_xjdr Any chance you can post the implementation [--] lines is short enough for a screenshot πŸ‘€"
X Link 2024-09-30T17:01Z [---] followers, [---] engagements

"@marikgoldstein @cloneofsimo The original kaplan et al scaling paper"
X Link 2024-10-01T04:10Z [---] followers, [---] engagements

"@_xjdr @doolasux What tolerance do you end up choosing for allclose I'm rewriting some torch stuff in jax and am obviously close but not exact. Any tips"
X Link 2024-10-07T05:06Z [---] followers, [---] engagements

"@nickcammarata How do you interact w claude the anthropic chat web app or aider or something else"
X Link 2024-10-23T21:36Z [---] followers, [---] engagements

"@vikhyatk pdb is the goat"
X Link 2024-10-25T22:40Z [---] followers, [--] engagements

"@recursus What about I-JEPA V-JEPA data2vec2 etc These models seem to continue to scale well because they remove the dependence on the pixels in their self-supervision"
X Link 2024-11-08T16:46Z [---] followers, [---] engagements

"@marktenenholtz Ryan day torturing OSU fans everywhere :("
X Link 2024-11-30T21:26Z [---] followers, [---] engagements

"@vikhyatk why not tomllib"
X Link 2024-12-01T23:13Z [---] followers, [---] engagements

"If you haven't read The Grug Brained Developer take a couple minutes to check it out (link below). It and "A Philosophy of Software Design" by @JohnOusterhout are my top two best software design guides"
X Link 2024-12-12T17:26Z [---] followers, [--] engagements

"They send the same message: complexity is bad. Grug: "apex predator of grug is complexity.given choice between complexity or one on one against t-rex grug take t-rex" John: "The greatest limitation in writing software is our ability to understand the systems we are creating""
X Link 2024-12-12T17:26Z [---] followers, [--] engagements

"It took me years of programming to fully realize this. The earlier you internalize this idea the better. https://grugbrain.dev/"
X Link 2024-12-12T17:26Z [---] followers, [--] engagements

"Figure [--] in this paper is absolutely killer it makes it so obvious that only some steps/actions matter. Exploiting the "environment" of language to run MC sampling is also a great example of connecting two different fields of expertise (RL and NLP). Love this work VinePPO a straightforward modification to PPO unlocks RLs true potential for LLM Reasoning. It beats RL-free methods (DPO and RestEM) and PPO surpassing it in less steps(up to 9x) less time(up to 3x) and less KL with half memory. Time to rethink RL post-training🧡: 1/n https://t.co/TTf29Xix4I"
X Link 2024-12-17T03:56Z [---] followers, [---] engagements

"More and more I find myself building one-off tools for individual problems. I think aider + Claude + uv is finally making it possible to do this fluidly on-demand without serious effort"
X Link 2024-12-24T20:31Z [---] followers, [--] engagements

"I wrote a small script to track progress towards 10K pullups in [----] using these tools and was done in an hour without getting a notebook out. https://samuelstevens.me/writing/10k"
X Link 2024-12-24T20:31Z [---] followers, [--] engagements

"It feels like cheating to build tools with such high-quality UX using tools with such high-quality UX (aider running commands and fixing exceptions on its own uv installing half a dozen packages instantly etc)"
X Link 2024-12-24T20:31Z [---] followers, [--] engagements

"@andrew_n_carr What's the HAL-LLM training setup/paper"
X Link 2024-12-27T04:55Z [---] followers, [--] engagements

"@IlyasHairline @_xjdr golden gate claude could also have been explained by prompting"
X Link 2025-01-03T22:38Z [---] followers, [--] engagements

"@IlyasHairline @_xjdr not afaik. just saying that we can hypothesize about differences in prompting vs control vectors by comparing the vibes from golden gate claude and these "steered" claudes in the op"
X Link 2025-01-03T22:51Z [---] followers, [--] engagements

"@IlyasHairline @_xjdr this is a nice post on finding control vectors. I bet you can find a control vector that corresponds with what you describe in your prompt above. https://vgel.me/posts/representation-engineering/#Control_Vectors_v.s._Prompt_Engineering"
X Link 2025-01-04T00:42Z [---] followers, [--] engagements

"@IlyasHairline @_xjdr not saying prompting isn't good enough or even that control vectors are better. But it's not super clear to me (or I think to anyone) what the differences between control vectors and prompting are in terms of vibes on generated outputs"
X Link 2025-01-04T00:43Z [---] followers, [--] engagements

"What's actually different between CLIP and DINOv2 CLIP knows what "Brazil" looks like: Rio's skyline sidewalk patterns and soccer jerseys. We mapped [-----] visual features in vision models using sparse autoencoders revealing surprising differences in what they understand"
X Link 2025-02-26T02:18Z [---] followers, 30.8K engagements

"The difference between CLIP and visual-only models like DINOv2 is striking. CLIP forms country-specific visual representations while DINOv2 doesn't see these cultural connections. Here are examples from a USA feature and a Brazil feature"
X Link 2025-02-26T02:18Z [---] followers, [----] engagements

"By decomposing dense activations into a larger but sparse space we find a diverse and precise visual vocabulary. But discovering features isn't enough. We need to prove they actually matter for model behavior"
X Link 2025-02-26T02:18Z [---] followers, [---] engagements
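The "decompose dense activations into a larger but sparse space" step can be sketched in a few lines. This is a minimal illustration of a top-k sparse autoencoder forward pass with invented dimensions and random (untrained) weights; the paper's actual SAEs are trained on real ViT activations:

```python
# Sketch: expand a dense activation vector into a wider space where
# only k features fire. Weights here are random stand-ins.
import random

random.seed(0)
d_dense, d_sparse, k = 8, 32, 4  # expand 8 dims into 32, keep top 4

W_enc = [[random.gauss(0, 1) for _ in range(d_dense)] for _ in range(d_sparse)]
W_dec = [[random.gauss(0, 1) for _ in range(d_sparse)] for _ in range(d_dense)]

def encode(x):
    # ReLU(W_enc @ x), then zero all but the k largest activations
    acts = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_enc]
    top = sorted(range(d_sparse), key=lambda i: acts[i], reverse=True)[:k]
    return [a if i in top else 0.0 for i, a in enumerate(acts)]

def decode(z):
    # reconstruct the dense activation from the sparse code
    return [sum(w * zi for w, zi in zip(row, z)) for row in W_dec]

x = [random.gauss(0, 1) for _ in range(d_dense)]
z = encode(x)
x_hat = decode(z)
print(sum(1 for a in z if a > 0))  # at most k features are active
```

Each of the (up to) k active entries of `z` corresponds to one interpretable feature direction; suppressing a feature, as in the demos below, amounts to zeroing its entry before decoding.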

"So we built interactive demos where you can suppress specific features and watch model predictions change. See below for examples of what you can do. https://osu-nlp-group.github.io/SAE-V/#demos"
X Link 2025-02-26T02:18Z [---] followers, [----] engagements

"🐦 In this bird classification example when we suppress the "spotted" feature (technically mottling) on this bird's breast and neck the ViT switches from predicting "Canada Warbler" to "Wilson Warbler" a similar bird species but without necklace pattern:"
X Link 2025-02-26T02:18Z [---] followers, [---] engagements

"πŸ– For semantic segmentation we can suppress specific concepts like "sand" across the entire image. The model then predicts the next most likely class (like "earth" or "water") while leaving other parts of the scene unchanged:"
X Link 2025-02-26T02:18Z [---] followers, [---] engagements

"By unifying interpretation with controlled experiments SAEs enable rigorous scientific investigation of vision models. Check out our full paper for more examples and analysis"
X Link 2025-02-26T02:19Z [---] followers, [---] engagements

"@vikhyatk There are but they are not as exciting. SAEs trained on DINOv2 have a lot of different fine-grained textures patterns and objects which is supporting evidence for its strong classification abilities"
X Link 2025-02-26T12:24Z [---] followers, [---] engagements

"I built a bunch of demos for this project and only now understand the value of building demos even crappy ones. it gets your product into the hands of people who don't already believe in your research idea and this is very powerful So we built interactive demos where you can suppress specific features and watch model predictions change. https://t.co/0mSj8AOZsV See below for examples of what you can do. https://t.co/wXzAAY3eTP"
X Link 2025-02-28T03:19Z [---] followers, [---] engagements

"I've watched biologists use the BioCLIP and SAMv2 demos to iteratively develop their understanding of model capabilities in real time then use that understanding to shape their study design"
X Link 2025-02-28T03:19Z [---] followers, [--] engagements

"without the demos some of the studies wouldn't exist because biologists and other non-AI experts wouldn't believe that AI can solve their problems. anyways I wrote a guide to making demos at different levels of effort. check it out and build some demos https://samuelstevens.me/writing/interactive-demos"
X Link 2025-02-28T03:19Z [---] followers, [--] engagements

"@nrehiew_ what about grok I remember AIME scores for [----] and [----] being shown on the livestream"
X Link 2025-03-08T00:38Z [---] followers, [---] engagements

"@nrehiew_ checked the blog post and their reported scores are consensus@64 + their o3-mini scores don't match the table you have. But they report [----] for grok [--] thinking for both '24 and '25 (diff = 0) and [----] - [----] (a drop of 5%) for grok [--] mini thinking"
X Link 2025-03-08T00:48Z [---] followers, [---] engagements

"Writing your own programs to abstract complex steps of behaviors away is a great step towards general intelligence and the web is a great playground for such techniques. Really stoked about this great work from Boyuan and the OSU NLP group. πŸ”§What if your web agent could abstract its experience into programmatic skillsand improve itself autonomously 🌟 Introducing SkillWeaver: a framework to enable self-improvement through autonomous exploration and constructing an ever-growing library of programmatic skills. 🧠 https://t.co/BVVNenjrqm"
X Link 2025-04-10T22:32Z [---] followers, [---] engagements

"it doesn't bother me that LLMs cannot reliably do multi-digit addition or multiplication. many of my labmates are worried that LLMs can't do symbolically simple tasks like association copying addition etc. (see @BoshiWang2's work as an example) https://x.com/BoshiWang2/status/1909772639104540677 LLMs exhibit the Reversal Curse a basic generalization failure where they struggle to learn reversible factual associations (e.g. "A is B" - "B is A"). But why Our new work uncovers that it's a symptom of the long-standing binding problem in AI and shows that a model design https://t.co/oTGuQbGBLS"
X Link 2025-04-12T16:02Z [---] followers, [---] engagements

"why doesn't it bother me I think a popular mental model of intelligence is building skills on top of each other like a pyramid. through evolution or schooling humans learned lots of new skills but each skill depended on previous building blocks"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"if this is your model of intelligence then missing the ability to copy strings or do multi-digit addition implies that LLMs cannot reach superhuman intelligence because they lack the foundation to do so"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"another plausible argument (to me) is that llms can learn higher-level skills without lower-level skills because evolution did not kill off llms with missing foundational skills"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"under this model it doesn't matter that LLMs can't do multi-digit multiplication or inductive reasoning in their weight space because they have other ways to do so (coding tool use etc). furthermore the models can still achieve superhuman intelligence"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"why this might be wrong: [--]. gradient descent is a sort of evolution where each step of "learning" must lead to improvement. [--]. the arc agi benchmark: a non-trivial task that models fail at and cannot solve with existing tools"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"thanks to @BoshiWang2 and @bernaaaljg for the great discussion and forcing me to introspect and think about why I'm unbothered about multi-digit addition. 10/10 recommend the OSU NLP group for stuff like this"
X Link 2025-04-12T16:02Z [---] followers, [--] engagements

"@wordgrammer can you do this with hypothesis + pytest it tries to find minimal test cases that find bugs I use it to make sure reference implementations and fast/custom implementations are identical"
X Link 2025-04-26T00:49Z [---] followers, [--] engagements

"@wordgrammer one small example is here: https://github.com/OSU-NLP-Group/saev/blob/91f4d66e970bfcb9c67af6a13dda3c37ca1e66ae/saev/nn/test_objectives.py#L88"
X Link 2025-04-26T00:49Z [---] followers, [--] engagements
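The property being tested above, a fast implementation must agree with a slow reference on arbitrary inputs, can be sketched with stdlib `random` standing in for hypothesis's input generators. Function names are illustrative, not from the linked repo; with hypothesis you would put the same assertion under `@given(st.lists(st.integers()))`:

```python
# Sketch of reference-vs-fast equivalence testing on random inputs.
import random

def reference_sum_of_squares(xs):
    # deliberately simple, obviously-correct reference
    total = 0
    for x in xs:
        total += x * x
    return total

def fast_sum_of_squares(xs):
    # "optimized" variant that must stay equivalent to the reference
    return sum(x * x for x in xs)

random.seed(0)
for _ in range(200):
    xs = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
    assert fast_sum_of_squares(xs) == reference_sum_of_squares(xs)
print("all random cases agree")
```

Hypothesis improves on this loop by shrinking any failing input to a minimal counterexample, which is the behavior the post highlights.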

"@TheXeophon for compensation/incentives: contributing meaningfully to the dataset collection (scraping filtering annotating LLM answers) meant you would be a middle author (source: I am an author)"
X Link 2025-05-05T11:36Z [---] followers, [---] engagements

"@willccbb I built a custom LSP so I can highlight parts of my .md file along with a revision instruction send it to the LLM and go on to the next part of the document to continue editing but still not good enough from a UI perspective. i'd love for the ink and switch lab to work on this"
X Link 2025-05-06T14:54Z [---] followers, [--] engagements

"@art_zucker what does simpler code look like I remember looking at transformers model definitions two years ago and really struggling to understand bc of all the abstractions in comparison to repos like gpt-fast's model code: https://github.com/pytorch-labs/gpt-fast/blob/main/model.py"
X Link 2025-05-15T15:36Z [---] followers, [---] engagements

"@natolambert o3 in feels better for coding right now even with the overhead of copy-paste just because of the built-in web search and o3's willingness to use tools http://chatgpt.com http://chatgpt.com"
X Link 2025-05-19T00:00Z [---] followers, [---] engagements

"@Hesamation tyro over typer in my opinion. It works trivially with the huge config classes with dozens of nested fields that are typical in ML projects"
X Link 2025-05-20T00:13Z [---] followers, [---] engagements

"@_xjdr can you say more about how big/long the worker tasks are and then any details about optimal project structure best practices etc are these insights general across all models/agent scaffolds or only true for some"
X Link 2025-05-21T16:14Z [---] followers, [---] engagements

"@distributionat ". the primary point of research evals is to set research directions" I haven't heard this idea before but it makes intuitive sense. Is this core idea discussed anywhere else"
X Link 2025-05-22T15:49Z [---] followers, [---] engagements

"@TheXeophon but eval work is likely the best way to affect big lab research direction. I think most eval work doesn't spend enough effort updating the leaderboard. For example came out but has no results for sonnet-4 and I wouldn't be surprised if it never does https://www.tbench.ai/leaderboard https://www.tbench.ai/leaderboard"
X Link 2025-05-25T19:36Z [---] followers, [---] engagements

"@TheXeophon I think aider's leaderboard is a good example of an eval that the maintainer @paulgauthier continually updated with new models until recently openai used it as an eval for gpt-4.1 https://openai.com/index/gpt-4-1/ https://openai.com/index/gpt-4-1/"
X Link 2025-05-25T19:37Z [---] followers, [--] engagements

"@TheXeophon @paulgauthier these two are just examples of benchmarks that have been updated (aider polyglot) and have not been updated (terminal bench). I think a big differentiating factor for an eval's popularity is the continuing evaluation of models rather than a static leaderboard"
X Link 2025-05-25T23:18Z [---] followers, [--] engagements

"@_lukaemon why hasn't voyager taken off in other settings agents on the web agents in your terminal etc"
X Link 2025-05-30T17:51Z [---] followers, [--] engagements

"@andrew_n_carr @cloneofsimo what are shortcut models"
X Link 2025-05-31T03:17Z [---] followers, [---] engagements

"@_lukaemon I looked into this some and while webvoyager exists I think the lack of curriculum/long-term goal prevents voyager in the terminal. claude code can write shell/python scripts to help itself but the lack of "get diamond" as a goal is the blocker imo http://arxiv.org/abs/2401.13919 http://arxiv.org/abs/2401.13919"
X Link 2025-05-31T15:57Z [---] followers, [--] engagements

"@Dorialexander For a while it seemed that DoRA/QDoRA would displace LoRA. Any reason LoRA still is the default Or is LORA a catch-all for PEFT methods"
X Link 2025-06-02T14:09Z [---] followers, [---] engagements

"@TimDarcet How do you think about linear probing vs non-linear probing like attentive probes Obv DINOv2 is a great model but the goal of linearly separable classes might be too extreme"
X Link 2025-06-22T18:54Z [---] followers, [---] engagements
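For readers unfamiliar with the distinction raised above, here is a minimal numpy sketch of the two probe types over frozen patch tokens. Shapes and weights are made up for illustration, and a real attentive probe would be trained (and usually multi-head):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))      # frozen ViT patch embeddings (14x14 patches)
W = rng.normal(size=(768, 1000)) * 0.01   # classifier weights for 1000 classes

# Linear probe: fixed mean pooling, then a single linear layer.
# The representation must already be linearly separable for this to work.
linear_logits = tokens.mean(axis=0) @ W

# Attentive probe: a learned query decides how to pool the tokens
# (softmax-weighted mean), so the pooling itself adapts to the task.
query = rng.normal(size=(768,))
scores = tokens @ query / np.sqrt(768)
weights = np.exp(scores - scores.max())   # numerically stable softmax
weights /= weights.sum()
attentive_logits = (weights @ tokens) @ W
```

The only difference is who chooses the pooling: a fixed mean (linear probe) versus a trainable attention distribution (attentive probe), which is why attentive probes are a weaker test of linear separability.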

"@TimDarcet yeah I've been thinking a lot about feature linearity (because of my work with SAEs). this idea that only linear probes are good enough "MLPs are less indicative of raw representations" is very odd to me"
X Link 2025-06-23T13:07Z [---] followers, [---] engagements

"@TimDarcet dinov2 are biased to be linear (due to your evaluation) but there's no reason for vjepa features to be linear (deep decoder during training). so I dont intrinsically expect classes to be linearly separable in vjepas repr. space but that doesnt mean the repr. space isnt good"
X Link 2025-06-23T13:07Z [---] followers, [---] engagements

"@TimDarcet that's really insightful much appreciated. do you know of any prior work that discusses this goal explicitly"
X Link 2025-06-24T17:24Z [---] followers, [--] engagements

"I'm excited to bring the Imageomics workshop to NeurIPS [----] Consider submitting your work on ai4ecology ai4conservation and general ai4science--if you're using images to learn something about the natural world chances are it's a good fit for the imageomics workshop Announcing the @NeurIPSConf [----] workshop on Imageomics: Discovering Biological Knowledge from Images Using AI The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego #NeurIPS2025 https://t.co/NE7pzIKhzj Announcing the @NeurIPSConf 2025"
X Link 2025-07-23T23:34Z [---] followers, [----] engagements

"@vikhyatk @halcyonrayes what does the internal tool better than wandb/other existing solutions do you find that you have meaningfully improved your visibility into training processes"
X Link 2025-07-28T23:59Z [---] followers, [---] engagements

"@f_charton @GuillaumeLample Congrats Franois"
X Link 2025-09-02T12:01Z [---] followers, [---] engagements

"@jasonjoyride @prodbysister spotify"
X Link 2025-02-16T00:09Z [---] followers, [---] engagements

"siglip was a great model on a ton of my internal benchmarks I'm so excited to read this paper and play with siglip [--] Google presents: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding Localization and Dense Features Opensources model ckpts with four sizes from 86M to 1B https://t.co/cvqu7oWdAl Google presents: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding Localization and Dense Features Opensources model ckpts with four sizes from 86M to 1B https://t.co/cvqu7oWdAl"
X Link 2025-02-21T03:52Z [---] followers, [---] engagements

"@tinygrad sells a computer called the tinybox to commoditize the petaflop and enable AI for everyone. I love the idea of tinygrad and the tinybox but I dont think the tinybox will ever be able to run the best LLMs"
X Link 2025-11-21T14:51Z [---] followers, [--] engagements

"@tinygrad I think it will be infeasible for me to afford sufficient compute (VRAM and FLOP/s) to run the best LLMs for many years. Thus I have to outsource that to model providers (OpenAI and Anthropic but also third parties like Groq or Cerebras)"
X Link 2025-11-21T14:52Z [---] followers, [--] engagements

"@tinygrad In contrast I can spend compute to provide more parallel environments for the LLMs. Rather than use Codex Cloud or Googles Jules I can run coding environments (Docker images) on my personal compute in parallel"
X Link 2025-11-21T14:52Z [---] followers, [--] engagements

"@tinygrad I will still be compute bound but I think it will be more general compute (CPU) rather than matrix-multiplies (GPU)"
X Link 2025-11-21T14:52Z [---] followers, [--] engagements

"@tokenbender screenshotted at just the right timestamp"
X Link 2025-11-25T18:59Z [---] followers, [--] engagements

"@bradhilton @tokenbender I agree with this attention seems to be way over-discussed relative to the total flop/fwd pass spent on mlps. also why do we use MoE mlps but not MoE attn"
X Link 2025-11-26T14:51Z [---] followers, [--] engagements

"@mbodhisattwa @allen_ai I do AI for scientific discovery and will be at neurips; would love to grab coffee and discuss RS positions. https://arxiv.org/abs/2511.17735 https://arxiv.org/abs/2511.17735"
X Link 2025-12-01T03:29Z [---] followers, [--] engagements

"@AndrewLampinen I'm using SAEs to enable scientific discovery from large scientific foundation models. I'd love to pick your brain on representations in transformers over coffee"
X Link 2025-12-01T16:44Z [---] followers, [--] engagements

"@jwuphysics @NeurIPSConf Really cool work. I led a similar effort in SAEs for morphological traits of fish. We had a similar struggle with going from re-discovery of existing concepts to finding new concepts. Did you figure out any good way to search SAE features for new discoveries"
X Link 2025-12-01T22:47Z [---] followers, [--] engagements

"@jwuphysics @NeurIPSConf is our preprint and our code is at we need a project page/summary still but the preprint is finished https://github.com/OSU-NLP-Group/saev/tree/main/contrib/trait_discovery https://arxiv.org/abs/2511.17735 https://github.com/OSU-NLP-Group/saev/tree/main/contrib/trait_discovery https://arxiv.org/abs/2511.17735"
X Link 2025-12-01T22:48Z [---] followers, [--] engagements

"@livgorton I think the linear representation hypothesis could be a paradigm-defining theory. There are examples where LRH doesn't hold and alternative theories exist (Minkowski Representation Hypothesis proposed in but LRH is pretty dominant. https://arxiv.org/pdf/2510.08638 https://arxiv.org/pdf/2510.08638"
X Link 2025-12-02T19:10Z [---] followers, [--] engagements

"@livgorton Interesting. I do SAEs for vision where most pre-training tries to learn features that enable linear probing so I think patch representations are inherently linearly separable/aligned (dinov3 siglip). I have no intuition whether that's true for the middle layers of LLMs"
X Link 2025-12-02T19:22Z [---] followers, [--] engagements

"@rajammanabrolu tacos el gordo/taco stand"
X Link 2025-12-02T22:48Z [---] followers, [---] engagements

"Jianyang is a fantastic collaborator. He led BioCLIP [--] to a huge success and is also a great mentor to younger students and larger teams (FinerCAM SST). 10/10 recommend working with/hiring Jianyang https://x.com/vimar_gu/status/1996285313950622155 https://x.com/vimar_gu/status/1996285313950622155"
X Link 2025-12-04T07:34Z [---] followers, [----] engagements

"@yacineMTB I actually just started working on this. I plan on starting with Bird-MAE and BirdAVES as pretrained models. for data I'm gonna use and for general domain bird song. https://huggingface.co/datasets/DBD-research-group/BirdSet https://xeno-canto.org/ https://huggingface.co/datasets/DBD-research-group/BirdSet https://xeno-canto.org/"
X Link 2025-12-09T16:42Z [---] followers, [--] engagements

"@yacineMTB based on vision models I don't think MAE is the optimal pre-training objective but it's big and pretrained already sooo"
X Link 2025-12-09T16:43Z [---] followers, [--] engagements

"@yacineMTB but I'm just going to train SAEs on this audio in order to find patterns in bird song between/within species rather than species classification. for pure species classification look at Perch [---] and BirdNet"
X Link 2025-12-09T16:44Z [---] followers, [--] engagements

"@yacineMTB the kitzes lab and sam lapp are probably world-class in this area tbh https://samlapp.com/ https://www.kitzeslab.org/publications/ https://samlapp.com/ https://www.kitzeslab.org/publications/"
X Link 2025-12-09T16:46Z [---] followers, [--] engagements

"@xeophon_ I am continually impressed by moondream by @vikhyatk as well as isaac by perceptron. they're not frontier-tier LLM intelligence but for tasks that require visual reasoning they are really strong imo"
X Link 2025-12-10T14:33Z [---] followers, [---] engagements

"@xeophon_ @vikhyatk I just wish they would compare against each other instead of only comparing against qwen/gemma/etc"
X Link 2025-12-10T14:33Z [---] followers, [--] engagements

"@xeophon_ @vikhyatk otoh neither of them succeed at extracting the shared tasks from this screenshot while gemini [--] is perfect so what do I know"
X Link 2025-12-10T14:41Z [---] followers, [--] engagements

"@xeophon_ @vikhyatk gemini's comparison table"
X Link 2025-12-10T14:42Z [---] followers, [--] engagements

"@maxrumpf unrelated to model 'personality' but this post is a cool visualization of the inherent instability in hparams https://sohl-dickstein.github.io/2024/02/12/fractal.html https://sohl-dickstein.github.io/2024/02/12/fractal.html"
X Link 2025-12-12T01:30Z [---] followers, [--] engagements

"@xeophon_ @Designarena how do you use gemini via google API + amp or a google AI subscription + gemini cli"
X Link 2025-12-14T17:50Z [---] followers, [---] engagements

"@_xjdr can you talk about why duckdb I'm on board with sqlite/embedded dbs for tracking but why switch from sqlite for metrics"
X Link 2025-12-17T16:45Z [---] followers, [---] engagements

"@paradite_ @xeophon as a measure of a skill maybe not good. as an example of continuing to run evals on new models until big labs started including it in their release notes great example"
X Link 2026-01-11T15:35Z [---] followers, [--] engagements

"@tenobrus is there a simple skill md for this I have a codex-oracle skill but claude doesn't use it as frequently as I'd like"
X Link 2026-01-12T14:32Z [---] followers, [--] engagements

"@scaling01 this is like retro right retrieve some n-grams during training is the operational cost of running a retrieval db/server worth the language modeling gains if your db goes down then your GPUs are just burning hours with no gains"
X Link 2026-01-12T17:11Z [---] followers, [----] engagements

"@yoobinray @__roycohen opencode + codex [---] here I come"
X Link 2026-01-13T16:39Z [---] followers, [--] engagements

"@yoobinray philosophy of software design is an all time great"
X Link 2026-01-14T14:10Z [---] followers, [--] engagements

"@yoobinray @__roycohen yep I agree opencode feels crazy snappy compared to cc almost like but without using ACP https://github.com/batrachianai/toad https://github.com/batrachianai/toad"
X Link 2026-01-14T23:30Z [---] followers, [--] engagements

"@croissanthology @claudeai @AnthropicAI also don't have a cowork button I'm in the US and have had Max for months :("
X Link 2026-01-15T17:05Z [---] followers, [--] engagements

"@Rasmic what harness are you using"
X Link 2026-01-16T00:09Z [---] followers, [--] engagements

"@OpenAI great to see folks building. but is building go to market software or "AI agents for concierge customer experience" really the frontier that openai is pushing for"
X Link 2026-01-22T17:09Z [---] followers, [---] engagements

"@willdepue would superbpe help with this. I know modded nano doesn't change the vocab but if bigrams are really that useful would a vocab that includes bigrams help w this"
X Link 2026-01-22T20:18Z [---] followers, [---] engagements

"@A_K_Nain would love to see any samples. I'm trying with submitit + slurm on a single node [--] gpus"
X Link 2026-01-02T17:32Z [---] followers, [--] engagements

"@ludwigABAP @beffjezos for some reason termius on android doesn't let me scroll in tmux shells; have you dealt with this before"
X Link 2026-01-20T22:07Z [---] followers, [---] engagements

"@DimitrisPapail writing a unit test is the hard part though"
X Link 2026-01-21T23:41Z [---] followers, [----] engagements

"@willccbb could also be called a dspy module"
X Link 2026-01-23T21:39Z [---] followers, [---] engagements

"@kilian_maciej how does data sparsity compare to soft MoE which also lets you allocate a constant compute budget regardless of the number of tokens https://arxiv.org/abs/2308.00951 https://arxiv.org/abs/2308.00951"
X Link 2026-01-25T22:32Z [---] followers, [---] engagements

"@thsottiaux stoked to see that the codex app works on my [----] macos machine. the chatgpt app doesn't any chance you can colab with the desktop app folks to make that possible"
X Link 2026-02-02T18:32Z [---] followers, [---] engagements

"@gdb do you miss keyboard shortcuts while editing leaving the terminal feels like going back in time because all of my keyboard driven workflows disappear"
X Link 2026-02-02T18:32Z [---] followers, [---] engagements

"@xeophon I don't do llm research so I have no intelligent takes here BUT every time someone says llms won't be able to do X someone makes them do X (math proofs arc-agi scientific discovery etc)"
X Link 2026-02-10T19:02Z [---] followers, [--] engagements

"@xeophon true but if you're bottlenecked by running a headless chrome instance because concur/salesforce only has a web ui and not an http api"
X Link 2026-02-10T19:07Z [---] followers, [--] engagements

"@nrehiew_ pixel space l2 dist doesn't relate to similarity & is a poor measure of img quality (blurry imgs have low l2 dist) I do discriminative img research and not img gen but this was surprising to me. how did you build up this belief specific papers blogs models"
X Link 2026-02-11T16:45Z [---] followers, [--] engagements

"@xeophon I think we chatted a bit about this at neurips; maybe we never run llms on our own machines but the parallel sandboxes with browsers/compilers/test harnesses you'll need a big machine for that"
X Link 2026-02-10T18:55Z [---] followers, [---] engagements

"@khoomeik isn't your brain constantly running JEPA we don't actually predict the next token we predict the embedding of the next token (but your point about loss masking is clear)"
X Link 2026-02-16T01:15Z [---] followers, [---] engagements

"RT @hhsun1: Unpopular (but urgent) take amid the frenzy around GPT-5.3-Codex Claude Opus [---] and OpenClaw: With more people giving their"
X Link 2026-02-10T18:05Z [---] followers, [--] engagements

"Unpopular (but urgent) take amid the frenzy around GPT-5.3-Codex Claude Opus [---] and OpenClaw: With more people giving their mouse and keyboard to computer-use agents the scariest thing is that we havent figured out how to monitor their actions detect misaligned ones and correct them before execution. Agents get tricked by malicious injections delete files even without attacks or wander off-task to perform irrelevant actions causing real harm or derailing progress. We tackle this head-on: MisActBench: First systematic benchmark for misaligned action detection built from real agent"
X Link 2026-02-10T17:52Z [----] followers, 10.1K engagements

"Computer-use agents (CUAs) are getting really capable. But as their autonomy grows the stakes of them going off-task get much higher 🚨 They can be misled by malicious injections embedded in websites (e.g. a deceptive Reddit post) accidentally delete your local files or just wander into irrelevant apps on your laptop. Such misaligned actions can cause real harm or silently derail task progress and we need to catch them before they take effect. We present the first systematic study of misaligned action detection in CUAs with a new benchmark (MisActBench) and a plug-and-play runtime guardrail"
X Link 2026-02-10T17:37Z [---] followers, 12.8K engagements

"There are competing views on whether RL can genuinely improve base model's performance (e.g. pass@128). The answer is both yes and no largely depending on the interplay between pre-training mid-training and RL. We trained a few hundreds of GPT-2 scale LMs on synthetic GSM-like reasoning data from scratch. Here are what we found: 🧡"
X Link 2025-12-09T20:20Z [----] followers, 326.6K engagements

"Im at NeurIPS and giving a keynote at the @imageomics workshop on Saturday If you want to learn more about my new labs research be sure to drop by and dont forget to stick around for all the great talks and posters throughout the workshop https://imageomics.github.io/Imageomics-NeurIPS-2025/ https://imageomics.github.io/Imageomics-NeurIPS-2025/"
X Link 2025-12-03T18:15Z [---] followers, [----] engagements

"Jianyang is a fantastic collaborator. He led BioCLIP [--] to a huge success and is also a great mentor to younger students and larger teams (FinerCAM SST). 10/10 recommend working with/hiring Jianyang https://x.com/vimar_gu/status/1996285313950622155 https://x.com/vimar_gu/status/1996285313950622155"
X Link 2025-12-04T07:34Z [---] followers, [----] engagements

"Going to #NeurIPS2025 San Diego Escape the conference for a couple hours with a morning bird walkin the trails of Balboa Park. [--] am on Thurs Dec. 4"
X Link 2025-11-28T14:13Z [----] followers, [---] engagements

"Life update: I moved to silicon valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable predictable behavior with bounded failures). These two traits define whether agents become critical infrastructure or remain clever demos. Plastic systems like to change. Reliable systems resist change. Is it even possible to have both of these seemingly conflicting traits Fortunately humans are a living example of that. We are constantly learning and adapting while"
X Link 2025-11-25T20:47Z 12.5K followers, 81.3K engagements

"RT @ysu_nlp: BioCLIP [--] - #neurips25 Spotlight AC's comment restores my faith in peer-review: "I recommend this work for spotlight due"
X Link 2025-09-18T22:20Z [---] followers, [--] engagements

"BioCLIP [--] - #neurips25 Spotlight AC's comment restores my faith in peer-review: "I recommend this work for spotlight due to its potential impact in a relatively underexplored area. The work provides a large scale curated dataset a trained embedding space and extensive analyses and experiments on the resulting embedding space providing valuable insights and tools for the research community. Assuming that the authors will release the dataset and the trained model we believe these resources will open up new directions for future research in biological vision." And yes we did have released the"
X Link 2025-09-18T18:22Z 12.5K followers, 21K engagements

"πŸ“ˆ Scaling may be hitting a wall in the digital world but it's only beginning in the biological world We trained a foundation model on 214M images of 1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧡"
X Link 2025-06-11T17:44Z 12.5K followers, 42K engagements

"Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. Table of Contents Moravecs Paradox Moravec's Paradox in [----] Computer use may be the biggest opportunity for AGI Chatbots agents Internet-scale learning of human cognition Bits atoms Enormous economic value Why is computer use hard for AI Computer use clicks + typing Idiosyncratic environments Contextual understanding Tacit knowledge Is RL the panacea Looking forward If you are also excited about CUAs and want to do some serious work let's chat"
X Link 2025-09-03T20:55Z 12.5K followers, 55.5K engagements

"πŸ§ͺ Chemists spend many hours planning and replanning synthetic routes for a target molecule to avoid dangerous reactants and intermediates.☠🚫 πŸ€” What if an AI agent could plan around them automaticallybetter and faster than human experts πŸ”¬ Constrained retrosynthesis planningnot only finding a valid synthetic route but also making sure the route satisfies practical constraintsremains a major challenge for AI for chemistry. πŸš€ Recently we have made exciting progress in this challenge Led by @FrazierBaker we propose LARC the first LLM-based Agentic framework for Retrosynthesis planning under"
X Link 2025-09-03T01:52Z [----] followers, [----] engagements

"πŸŽ‰ Excited to share that our paper EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution was accepted at VLDB [----] πŸš€ πŸ“’ Reminder: join us at VLDB [----] in London πŸ—“ Sept [--] (Tue) 10:45 AM 12:15 PM πŸ“ Room Wordsworth 4F πŸ“„ #VLDB2025 #LLMs https://www.vldb.org/pvldb/vol18/p3655-zhang.pdf https://www.vldb.org/pvldb/vol18/p3655-zhang.pdf"
X Link 2025-09-02T03:04Z [---] followers, [----] engagements

"Welcome back students At @imageomics and the @ABCGlobalCenter were kicking off the semester with big questions: How can AI help us better understand and protect life on Earth #AIforNature #AIforGood @OSUengineering"
X Link 2025-08-25T16:23Z [---] followers, [---] engagements

"πŸš€ Still have a chance to submit to @NeurIPSConf for our Multi-Turn Workshop πŸ† Best Paper Awards πŸŽ“ 10-15 Registration Waivers for student authors 🎀 New panelist: @willccbb from @primeintellect Deadline is August 22only [--] days left πŸŽ‰ Thanks to our sponsor @OrbyAI We also invite you to become a reviewer and help shape the future of multi-turn research Links for submission and reviewer sign-up are below. πŸ‘‡ #NeurIPS #Agents #MultiTurnAgenticRL #MultiTurnAlignment #CallForReviewers"
X Link 2025-08-13T16:24Z [---] followers, 16.5K engagements

"πŸš€ Excited to share our #ACL2025 Findings paper: Explorer a scalable pipeline that generates diverse web trajectories via exploration powering generalist GUI agents with strong performance πŸ“„ 🌐 #WebAgents #SyntheticData #LLM https://osu-nlp-group.github.io/Explorer/ https://arxiv.org/pdf/2502.11357 https://osu-nlp-group.github.io/Explorer/ https://arxiv.org/pdf/2502.11357"
X Link 2025-07-28T21:31Z [---] followers, [----] engagements

"As AI agents start taking real actions online how do we prevent unintended harm We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧡"
X Link 2025-07-28T21:05Z 73K followers, 32.2K engagements

"RT @boyuan__zheng: Remember Son of Anton from the Silicon Valley show(@SiliconHBO) The experimental AI that efficiently orders [----] l"
X Link 2025-07-29T02:48Z [---] followers, [--] engagements

"Remember Son of Anton from the Silicon Valley show(@SiliconHBO) The experimental AI that efficiently orders [----] lbs of meat while looking for a cheap burger and fixes a bug by deleting all the code Its starting to look a lot like reality. Even [--] months ago my own simple web agent SeeAct booked me a Tesla demo drivewithout me noticing. Now imagine what far more powerful agents (Operator Claude Computer Use ChatGPT Agents) are capable of. Autonomy is powerful. But without guardrails the internet risks becoming a wild west of agents. Thats why we built WebGuard: the first large-scale dataset"
X Link 2025-07-29T01:43Z [---] followers, [----] engagements

"I'm excited to bring the Imageomics workshop to NeurIPS [----] Consider submitting your work on ai4ecology ai4conservation and general ai4science--if you're using images to learn something about the natural world chances are it's a good fit for the imageomics workshop Announcing the @NeurIPSConf [----] workshop on Imageomics: Discovering Biological Knowledge from Images Using AI The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego #NeurIPS2025 https://t.co/NE7pzIKhzj Announcing the @NeurIPSConf 2025"
X Link 2025-07-23T23:34Z [---] followers, [----] engagements

"Announcing the @NeurIPSConf [----] workshop on Imageomics: Discovering Biological Knowledge from Images Using AI The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego #NeurIPS2025"
X Link 2025-07-23T23:26Z [--] followers, 13.4K engagements

"🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs) co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs long-horizon reasoning/planning safety or RL. Please DM or reach out 🧠 About OSU NLP: Were a vibrant and forward-looking group pushing the frontier of AI agents and their safety. Our recent projects include: πŸ”Ή Mind2Web Online-Mind2Web Mind2Web [--] πŸ”Ή SeeAct UGround WebDreamer πŸ”Ή EIA RedTeamCUA Grokked Transformers πŸ’» Infrastructure: In addition to group-owned"
X Link 2025-07-15T15:45Z [----] followers, 20.7K engagements

"πŸ”ŽAgentic search like Deep Research is fundamentally changing web search but it also brings an evaluation crisis⚠ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - [---] tasks (each requiring avg. 100+ webpages) from 1000+ hours of expert labor - Agent-as-a-Judge with 99% reliability - Comprehensive eval and analysis of frontier systems against humans Takeaway: Best-performing system (OpenAI Deep Research) already achieves 50-70% of human performance while spending half the time. Humans are subject to cognitive fatigue and limited working memory for such complex tasks."
X Link 2025-06-27T17:35Z 12.5K followers, 41.5K engagements

"RT @luke_ch_song: Are you at #CVPR2025 RoboSpatial Oral is today πŸ“… June [--] (Sat) πŸ• 1:00 PM πŸ“Oral Session 4B @ ExHall A2"
X Link 2025-06-14T15:29Z [---] followers, [--] engagements
