[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@webagentlab
"WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks WebArXiv is a static benchmark designed to evaluate multimodal web agents on time-invariant tasks from the arXiv platform addressing issues of ground truth instability and proposing a dynamic reflection mechanism to enhance agent decision-making. Zihao Sun Meng Fang Ling Chen Australian Artificial Intelligence Institute; University of Liverpool" @webagentlab on X 2025-07-11 06:32:06 UTC XXX followers, XX engagements
"LineRetriever: Planning-Aware Observation Reduction for Web Agents LineRetriever is a novel method that enhances web agent performance by intelligently reducing observation size through planning-aware retrieval allowing for more efficient navigation while maintaining decision-making effectiveness within context limits. Imene Kerboua Sahar Omidi Shayegan @sahar_shayegan Megh Thakkar @Megh1211 Xing Han Lù @xhluca Massimo Caccia @MassCaccia Véronique Eglin Alexandre Aussem Jérémy Espinas Alexandre Lacoste @alex_lacoste_ INSA Lyon; Esker; ServiceNow Research; Mila AI Institute; McGill University;" @webagentlab on X 2025-07-04 02:26:26 UTC XXX followers, XX engagements
"Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties This paper presents a comprehensive evaluation of mobile GUI agents' vulnerabilities to misleading content attacks using a novel simulation framework called AgentHazard, revealing significant susceptibility and proposing strategies for enhancing their robustness. Guohong Liu Jialei Ye Jiacheng Liu Yuanchun Li Wei Liu Pengzhi Gao Jian Luan Yunxin Liu Institute for AI Industry Research; Tsinghua University; University of Electronic Science and Technology of China; Xiaomi AI Lab" @webagentlab on X 2025-07-11 06:31:59 UTC XXX followers, XX engagements
"GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents GUI-Actor is a novel coordinate-free approach for visual grounding in GUI agents that utilizes attention mechanisms to enhance spatial-semantic alignment and improve performance while requiring fewer parameters and training data compared to traditional coordinate-based methods. Qianhui Wu @5000hui Kanzhi Cheng @njucckevin Rui Yang Chaoyun Zhang @vyokky Jianwei Yang @jw2yang4ai Huiqiang Jiang Jian Mu Baolin Peng Bo Qiao Reuben Tan Si Qin Lars Liden Qingwei Lin Huan Zhang Tong Zhang Jianbing Zhang Dongmei Zhang Jianfeng Gao" @webagentlab on X 2025-06-06 07:36:34 UTC XXX followers, XXX engagements
"Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree The paper reveals how adversaries can exploit vulnerabilities in LLM-based web navigation agents through Indirect Prompt Injection attacks by embedding malicious triggers in HTML highlighting significant security risks and the urgent need for stronger defenses. Sam Johnson Viet Pham Thai Le Indiana University; University of Science" @webagentlab on X 2025-07-25 06:28:59 UTC XXX followers, XX engagements
"WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization WebShaper is a formalization-driven framework that enhances the synthesis of training data for information-seeking agents powered by Large Language Models addressing limitations of existing methods through structured task definitions and iterative task expansion resulting in superior performance on benchmarks like GAIA and WebWalkerQA. Zhengwei Tao Jialong Wu @jlwu55 Wenbiao Yin Junkai Zhang Baixuan Li Haiyang Shen Kuan Li @likuan1995 Liwen Zhang Xinyu Wang Yong Jiang Pengjun Xie Fei Huang Jingren Zhou Tongyi Lab;" @webagentlab on X 2025-07-25 06:28:56 UTC XXX followers, XX engagements
"GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across XX public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on XX benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks" @webagentlab on X 2025-07-04 02:26:25 UTC XXX followers, XX engagements
"WebGuard: Building a Generalizable Guardrail for Web Agents WebGuard is a novel dataset and framework designed to evaluate and enhance the safety of autonomous web agents by categorizing 4939 actions across various websites into risk levels, revealing significant limitations in current models' predictive capabilities and emphasizing the need for robust guardrails. Boyuan Zheng @boyuan__zheng Zeyi Liao @LiaoZeyi Scott Salisbury Zeyuan Liu Michael Lin Qinyuan Zheng Zifan Wang @_zifan_wang Xiang Deng Dawn Song @dawnsongtweets Huan Sun @hhsun1 Yu Su @ysu_nlp The Ohio State University; Scale AI;" @webagentlab on X 2025-07-25 06:28:58 UTC XXX followers, XX engagements
"WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis WebSynthesis is a novel framework that enhances the training of automated web agents by using a learned world model and Monte Carlo Tree Search to efficiently generate high-quality interaction trajectories offline outperforming traditional methods that rely on extensive real-world data. Yifei Gao Junhong Ye Jiaqi Wang Jitao Sang Beijing Jiaotong University" @webagentlab on X 2025-07-11 06:32:00 UTC XXX followers, XX engagements
"Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks Meta SecAlign is an open-source large language model designed to enhance security against prompt injection attacks by integrating robust model-level defenses while maintaining high utility thus fostering transparency and collaboration in AI security research. Sizhe Chen @_Sizhe_Chen_ Arman Zharmagambetov @ArmanZharmagam1 David Wagner Chuan Guo Meta FAIR; UC Berkeley" @webagentlab on X 2025-07-11 06:32:04 UTC XXX followers, XX engagements
"Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence The paper introduces EMBODIED WEB AGENTS, a new AI paradigm that integrates physical embodiment with web-scale knowledge access to enhance task performance across diverse domains like cooking, navigation, and shopping while highlighting the challenges of cross-domain reasoning faced by current AI models. Yining Hong @yining_hong Rui Sun @RuiSun94013021 Bingxuan Li Xingcheng Yao Maxine Wu Alexander Chien Da Yin @Wade_Yin9712 Ying Nian Wu Zhecan James Wang Kai-Wei Chang @kaiwei_chang University of California" @webagentlab on X 2025-06-20 06:42:29 UTC XXX followers, XXX engagements