[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@willccbb
"@afurgs at that point just hand over the hardware lol" @willccbb on X 2025-07-23 02:40:12 UTC 27.6K followers, XXX engagements
"@tw_killian the research findings yes what's not clear is how much curated human data they're using nowadays for training model-based verifiers to RL against if deep research + operator + agent + IMO needed tons of human annotations system is not AGI if e2e synthetic system is maybe AGI" @willccbb on X 2025-07-20 20:52:03 UTC 27.6K followers, 3667 engagements
"@korigero @menhguin sounds good doesnt work nobody is using pretrained LLMs for this kind of forecasting at scale theyre using much smaller models with LLM-inspired architectures trained exclusively on financial data" @willccbb on X 2025-07-21 21:11:30 UTC 27.6K followers, XXX engagements
"@TheAhmadOsman i dont need it to beat opus or kimi. i just want it to be like V3-0324 but smaller" @willccbb on X 2025-07-21 19:04:57 UTC 27.6K followers, XXX engagements
"i for one am very glad to have a true non-thinking qwen3 with absolutely bonkers benchmarks and none of that /no_think <think></think> business" @willccbb on X 2025-07-21 19:00:54 UTC 27.6K followers, 19.2K engagements
"are there any standard LLM evaluations that have a friendly programming interface and clean documentation for using them with API clients i do not want to run a custom cli tool i want to uv add and import and pass an OpenAI client" @willccbb on X 2025-07-15 16:53:02 UTC 27.5K followers, 14.3K engagements
"@TheodoreGalanos but actually the tool call responses are also simulated the tools are never real hallucination is a feature not a bug" @willccbb on X 2025-07-21 22:17:43 UTC 27.5K followers, XXX engagements
"@Miles_Brundage scans much more like the way youd hand-write a proof in a seated exam vs a take-home problem set" @willccbb on X 2025-07-22 10:08:04 UTC 27.6K followers, 3433 engagements
"cant stop thinking about this one insanely elegant seems insanely powerful" @willccbb on X 2025-07-17 20:17:01 UTC 27.6K followers, 99K engagements
"you know things are heating up when one of the hottest early-stage AI VC firms of all time Stanford University is losing top talent to a new upstart" @willccbb on X 2025-07-02 18:36:55 UTC 27.6K followers, 12.2K engagements
"if a model uses several sequential tool calls interleaved with chain-of-thought reasoning to answer a single question this is:" @willccbb on X 2025-07-22 14:31:23 UTC 27.6K followers, 12.7K engagements
"@wallwalker98 @algobaker my role was more core banking-related than quant lots of LLM automation" @willccbb on X 2025-07-22 17:17:18 UTC 27.6K followers, XX engagements
"most comprehensive papers on it are probably: - OpenThoughts: - OpenMathReasoning: -Phi-4-Reasoning: TLDR: definitely useful but with some caveats you get your model to "be a reasoner" if it isn't already via distillation implanting useful patterns into the distribution (similar to the theory that more CoT on the internet post-GPT4 unlocked R1's "natural" RL CoT scaling) OpenThoughts found that the smartest models aren't always the best teachers QwQ can be better than R1 many have found that training on Claude outputs doesn't actually work that well -- my view of this is that CoTs are a" @willccbb on X 2025-07-12 20:03:53 UTC 27.6K followers, 2263 engagements
"the models i've been using the most lately are Cursor Tab and willcb/Qwen3-1.7B-Wordle" @willccbb on X 2025-07-17 11:48:07 UTC 27.6K followers, 6986 engagements
"honestly i would much rather have an open-source 4.1-mini than an open-source o3-mini" @willccbb on X 2025-07-18 13:43:59 UTC 27.6K followers, 15.9K engagements
"@korigero @menhguin LLMs in finance are for parsing sentiment and ops automations. wrong tool for predicting numbers we have better ones" @willccbb on X 2025-07-21 21:12:24 UTC 27.5K followers, XX engagements
"@casper_hansen_ old banking habits die hard" @willccbb on X 2025-07-22 13:02:33 UTC 27.6K followers, 1033 engagements
"@SciumoInc @casper_hansen_ after the next stable verifiers release i am aiming to resolve the forking issue for the vast majority of use cases 🙏 @casper_hansen_ lets chat about features that youd want added to main to support this" @willccbb on X 2025-07-22 15:38:15 UTC 27.6K followers, XX engagements
"@casper_hansen_ research engineers in shambles" @willccbb on X 2025-07-22 14:41:23 UTC 27.6K followers, XXX engagements
"im much more inclined to say that the RL system inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it" @willccbb on X 2025-07-20 10:32:54 UTC 27.6K followers, 399.7K engagements
"strong LLM priors are a powerful basin of reasonableness. you can twist and contort this to sample filter evaluate synthesize; theres a gravity towards Good imbued from the whole of the internet you just have to set up the dominoes and let probability take the wheel" @willccbb on X 2025-07-21 21:03:39 UTC 27.6K followers, 4349 engagements
"if youre doing research in LLM reinforcement learning what are your pain points what are the things that you feel like should really be easier but they arent" @willccbb on X 2025-05-25 05:03:30 UTC 27.5K followers, 77.2K engagements
"@REALlTYMACHINE @goodside @Archonic2 RIP heidegger you would've loved LLM terminology discourse" @willccbb on X 2025-07-22 17:10:29 UTC 27.6K followers, XX engagements
"maybe the way to handle offloading tasks to semi-reliable agents is to just rebuild everything around git. email git. shopping git. paying taxes git. calendar believe it or not also git" @willccbb on X 2025-07-18 13:06:34 UTC 27.6K followers, 56.3K engagements
"@nrehiew_ its annoying to have to handle as a special case in harness code / with preformatted datasets sometimes you just want an LLM to be an LLM with no other fanciness" @willccbb on X 2025-07-21 19:39:18 UTC 27.6K followers, XXX engagements
"idk about that most of my friends know what it is" @willccbb on X 2025-06-21 19:59:34 UTC 27.6K followers, 369.5K engagements
"it wasn't even a tech bro flight. copenhagen to newark" @willccbb on X 2025-07-22 13:05:17 UTC 27.6K followers, 5534 engagements
"one of my favorite parts of working at prime intellect is getting to pick the silly names whenever someone launches a new instance" @willccbb on X 2025-07-23 02:36:23 UTC 27.6K followers, 4599 engagements
"synthetic RL environments all the way down. the future is here" @willccbb on X 2025-07-21 20:59:36 UTC 27.6K followers, 4433 engagements
"it's still crazy to me how much my life has totally changed in the past year. last summer i had just finished a CS theory phd converted from banking intern to banking full-timer and had just reached 1000 followers on here. yesterday i got recognized by someone on my flight" @willccbb on X 2025-07-22 13:02:03 UTC 27.6K followers, 59.6K engagements
"turning a big dial that says "AGI" on it and constantly looking back at the audience for approval like a contestant on the price is right" @willccbb on X 2025-07-20 09:54:10 UTC 27.6K followers, 5989 engagements
"one of these days i'm gonna start squashing commits but today is not that day" @willccbb on X 2025-07-22 22:52:58 UTC 27.6K followers, 6127 engagements
"i was hoping i could skip adding this one to my Qwen3 but with normal tokenizer chat templates HF collection but :(" @willccbb on X 2025-07-21 20:03:11 UTC 27.6K followers, 2344 engagements
"its a shame that were running out of internet data because everyone collectively stopped putting new content onto the internet" @willccbb on X 2025-07-22 21:57:30 UTC 27.6K followers, 12K engagements
"@daniel_mac8 getting self-verification + self-generation of learning curricula to work at scale would count as AGI in my book" @willccbb on X 2025-07-20 16:38:01 UTC 27.6K followers, 10.5K engagements
"@solarizid just seeing how small of a model i can get to hillclimb wordle with as few output toks as possible (more as a multi-turn infra test case than for science) this is with a SFT'd Qwen3-1.7B" @willccbb on X 2025-07-14 11:07:10 UTC 27.5K followers, XXX engagements
"@HamelHusain pretty solid collection of implementations yeah not exactly what i'm looking for but this + evalscope are decent enough starting points" @willccbb on X 2025-07-15 18:06:50 UTC 27.6K followers, XXX engagements
"do we think an ai system will be able to reliably do semi-complex personal taxes by april 2026" @willccbb on X 2025-06-29 00:24:06 UTC 27.6K followers, 37.8K engagements
"@FkoffatAOL ML research team @ morgan stanley haha" @willccbb on X 2025-07-22 15:31:33 UTC 27.6K followers, XXX engagements
"ChatGPT Agent is actually quite useful for booking flights" @willccbb on X 2025-07-22 19:05:42 UTC 27.6K followers, 5541 engagements
"thinking is a specific mode of LLM inference which should always be removed from context in future turns is a really weird abstraction we were collectively forced into by o1 not showing CoTs" @willccbb on X 2025-07-21 20:07:09 UTC 27.6K followers, 14.1K engagements
"after a few weeks of productively working from just a laptop screen starting to wonder if X monitors is a mistake" @willccbb on X 2025-07-22 12:53:40 UTC 27.6K followers, 29.5K engagements
"its kinda hilarious that the quora ceo has been on the board of openai since 2018 and they still missed this hard" @willccbb on X 2025-07-16 13:11:26 UTC 27.6K followers, 182.3K engagements
"if you're in the Zurich area and interested in coming to the @PrimeIntellect event in Berlin this week but struggling to figure out transportation hit up @Laz4rz carpool options are being explored" @willccbb on X 2025-07-15 13:31:12 UTC 27.4K followers, 16.7K engagements