arXiv Daily Index - 2026-05-11

#	Title	Categories	Authors	Abstract
cs.AI 155 papers
941	GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning 2605.06671 Multi-Agent Graph Reasoning: Uses a divide-and-conquer multi-agent framework to improve the scalability of algorithmic reasoning on large graphs.	cs.AI	Wenjin Li, Jiaming Cui	Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decompos...
942	Fast and Effective Redistricting Optimization via Composite-Move Tabu Search 2605.06682 Redistricting Tabu Search: Uses composite-move tabu search to efficiently optimize redistricting under contiguity constraints.	cs.AI	Hai Jin, Diansheng Guo	Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity constraint: enforcing contiguity in integer-programming or heuristic search can severely shrink the feasible neighborhood, weaken exploration, and trap the search in poor local optima. We introduce a composite-move Tabu search (CM-Tabu) that systemat...
943	When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic-Actor Loop for Agentic Reasoning 2605.06772 Critic-Actor Loop for Physics: Uses a structured critic-actor loop to assess how much critique feedback improves physics reasoning.	cs.AI	Vasilis Niarchos, Constantinos Papageorgakis, Alexander G. Stapleton, Sokratis Trifinopoulos	As large language models (LLMs) show increasing promise on research-level physics reasoning tasks and agentic AI becomes more common, a practical question emerges: How does the interaction between researchers and agents affect the results? We study this using SCALAR (Structured Critic-Actor Loop for AI Reasoning), an Actor-Critic-Judge pipeline applied to quantum field theory and string theory problems. The Actor proposes solutions, the Critic provides iterative feedback, and an independent J...
944	Towards Security-Auditable LLM Agents: A Unified Graph Representation 2605.06812 Auditable LLM Agent Graphs: Proposes a unified graph representation to support security auditability of LLM agent behavior.	cs.AI	Chaofan Li, Lyuye Zhang, Jintao Zhai, Siyue Feng, Xichun Yang	LLM-based agentic systems are rapidly evolving to perform complex autonomous tasks through dynamic tool invocation, stateful memory management, and multi-agent collaboration. However, this semantics-driven execution paradigm creates a severe semantic gap between low-level physical events and high-level execution intent, making post-hoc security auditing fundamentally difficult. Existing representation mechanisms, including static SBOMs and runtime logs, provide only fragmented evidence and fail ...
945	Randomness is sometimes necessary for coordination 2605.06825 Randomness for MARL Coordination: Shows that cooperative multi-agent systems need randomness to achieve role differentiation under symmetric observations.	cs.AI	Rohan Patil, Jai Malegaonkar, Henrik I. Christensen	Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each ...
946	Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning 2605.06840 LLM Planning Trace Analysis: Extracts search trees from reasoning traces to reveal the myopia of LLM planning.	cs.AI	Sixing Chen, Ji-An Li, Saner Cakir, Sinan Akcali, Kayla Lee	Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine planning, how it is structured, and what aspects of it drive performance remain poorly understood. In this work, we introduce a new method to characterize LLM planning by extracting and quantifying search trees from reasoning traces in the four-in-a-row board game. By fitting...
947	Agentick: A Unified Benchmark for General Sequential Decision-Making Agents 2605.06869 Sequential Decision Agent Benchmark: Proposes a unified benchmark for evaluating sequential decision-making agents, including RL and large-model agents.	cs.AI	Roger Creus Castanyer, Pablo Samuel Castro, Glen Berseth	AI agent research spans a wide spectrum: from RL agents that learn from scratch to foundation model agents that leverage pre-trained knowledge, yet no unified benchmark enables fair comparison across these approaches. We present Agentick, a benchmark for sequential decision-making agents designed to evaluate RL, LLM, VLM, hybrid, and human agents on common ground and to power research on the fundamental challenges of sequential decision-making. Agentick provides 37 procedurally generated tasks a...
948	How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem 2605.06882 Long-Chain Reasoning Evaluation: Systematically evaluates LLMs' long-chain reasoning on the Equivalence Class Problem.	cs.AI	Chun Zheng, Lianlong Wu, Bingqian Li, Lvting Liu, Yi Zhou	Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it still remains unclear how good LLMs are for reasoning tasks, especially for long-chain ones. In this paper, we evaluate LLMs' performance on the simplest yet long-chain reasoning task, namely the Equivalence Class Problem (ECP), i.e., determining whether two variables are equal given a set of randomly generated equivalence relations. We consider both reasoning and non-reasoning representative LLMs ove...
949	Beyond the Black Box: Interpretability of Agentic AI Tool Use 2605.06890 Interpretability of Tool-Using Agents: Studies and improves the interpretability and diagnosability of agent tool calling.	cs.AI	Hariom Tatsat, Ariye Shater	AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control. Agents may skip required tool calls, invoke tools unnecessarily, or take actions whose consequence becomes visible only after execution. Existing observability methods are mostly external: prompts reveal correlations, evaluations score outputs, and logs arrive only after the model has already acted. In long-horizon settings, thes...
950	Mitigating Cognitive Bias in RLHF by Altering Rationality 2605.06895 Bias-Robust RLHF Modeling: Mitigates cognitive bias in RLHF by adjusting the rationality parameter of the preference model.	cs.AI	Tiffany Horter, Andrew Markham, Niki Trigoni, Serena Booth	How can we make models robust to even imperfect human feedback? In reinforcement learning from human feedback (RLHF), human preferences over model outputs are used to train a reward model that assigns scalar values to responses. Because these rewards are inferred from pairwise comparisons, this learning depends on an assumed relationship between latent reward differences and observed preferences, typically modeled using a Boltzmann formulation in which a rationality parameter beta informs how co...
951	Self-Programmed Execution for Language-Model Agents 2605.06898 Self-Programmed Agent Execution: Proposes an execution architecture orchestrated by the model's own output, replacing a fixed agent controller.	cs.AI	Luke J. O'Connor	At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This paper introduces self-programmed execution (SPE), an agent architecture in which the model completion is itself the orchestrator program, and the harness evaluates this program but does not impose its own orchestration policy. I formalize this idea using agentic machines: an SPE state is one from which a model completion can load any state of an embe...
952	Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents 2605.06957 Hierarchical Generalized Planning with LLMs: Learns and reuses policy-decomposition components to build hierarchical generalized-planning agents.	cs.AI	Shirin Sohrabi, Haritha Ananthakrishnan, Harsha Kokel, Kavitha Srinivas, Michael Katz	We present a dynamic policy-learning approach that combines generalized planning and hierarchical task decomposition for LLM-based agents. Our method, Hierarchical Component Learning for Generalized Policies (HCL-GP), learns parameterized policies that generalize across task instances and automatically extracts reusable components from successful executions, organizing them into a component library for compositional policy generation. We address three challenges: (1) learning components through...
953	Optimal Experiments for Partial Causal Effect Identification 2605.06993 Experiment Selection for Causal Bounds: Selects experiments under a cost constraint to maximally tighten bounds on partially identifiable causal effects.	cs.AI	Tobias Maringgele, Jalal Etesami	Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem i...
954	Adaptive auditing of AI systems with anytime-valid guarantees 2605.07002 Anytime-Valid Adaptive Auditing: Proposes an adaptive method for auditing AI systems with anytime-valid statistical guarantees.	cs.AI	Siyu Zhou, Patrick Vossler, Venkatesh Sivaraman, Yifan Mai, Jean Feng	A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptive testing paradigms have gained popularity, where one opportunistically decides which cases and how many to annotate based on past results. While this framework is highly practical, its extreme flexibility makes it difficult to draw statistically rigorous conclusions, as it violates classical assumptions: the number of observations is typically l...
955	Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight 2605.07021 Monitorable LLM Reasoning: Trains behavior-cue tokens that make reasoning monitorable and easier for external oversight to control.	cs.AI	Christopher Z. Cui, Taylor W. Killian, Prithviraj Ammanabrolu	Reasoning in Large Language Models (LLMs) poses a challenge for oversight as many misaligned behaviors do not surface until reasoning concludes. To address this, we introduce Behavior Cue Reasoning for making LLM reasoning more controllable and monitorable. Behavior Cues are special token sequences that a model is trained to emit immediately before specific implicit and explicit behaviors, acting as dual-purpose signal and control levers. When fine-tuning a weaker external monitor with Reinforce...
956	2.5-D Decomposition for LLM-Based Spatial Construction 2605.07066 Neuro-Symbolic Spatial Construction: Uses 2.5-D decomposition so the LLM plans in the horizontal plane while an executor determines vertical placement.	cs.AI	Paul Whitten, Li-Jen Chen, Sharath Baddam	Autonomous systems that build structures from natural-language instructions need reliable spatial reasoning, yet large language models (LLMs) make systematic coordinate errors when generating three-dimensional block placements. We present a neuro-symbolic pipeline based on 2.5-D decomposition: the LLM plans in the two-dimensional horizontal plane while a deterministic executor computes all vertical placement from column occupancy, eliminating an entire class of errors. On the Build What I...
957	TeamBench: Evaluating Agent Coordination under Enforced Role Separation 2605.07073 Role-Separated Agent Coordination Benchmark: Builds a benchmark with enforced role separation to evaluate genuine multi-agent coordination.	cs.AI	Yubin Kim, Chanwoo Park, Taehan Kim, Eugene Park, Samuel Schmidgall	Agent systems often decompose a task across multiple roles, but these roles are typically specified by prompts rather than enforced by access controls. Without enforcement, a team pass rate can mask whether agents actually coordinated or whether one role effectively did another role's work. We present TeamBench, a benchmark with 851 task templates and 931 seeded instances for evaluating agent coordination under operating system-enforced role separation. TeamBench separates specification access, ...
958	Online Allocation with Unknown Shared Supply 2605.07080 Online Allocation with Unknown Supply: Proposes an online allocation model with unknown shared supply, along with corresponding decision algorithms.	cs.AI	Tzeh Yuan Neoh, Davin Choo, Mengchu Yue, Milind Tambe	Many real-world resource allocation systems, such as humanitarian logistics and vaccine distribution, must preposition limited supply across multiple locations before demand is realized while stockouts incur irreversible service losses. To study this, we introduce the Online Shared Supply Allocation (OSSA) problem, a stateful online model in which a central hub allocates a finite, unknown supply to multiple sites facing sequential demand under fixed-charge transportation costs and lost-sales pen...
959	ARMOR: An Agentic Framework for Reaction Feasibility Prediction via Adaptive Utility-aware Multi-tool Reasoning 2605.07103 Agentic Multi-Tool Chemistry Reasoning: Predicts chemical reaction feasibility with a utility-adaptive multi-tool reasoning framework.	cs.AI	Ye Liu, Botao Yu, Xinyi Ling, Daniel Adu-Ampratwum, Xia Ning	Reaction feasibility prediction, as a fundamental problem in computational chemistry, has benefited from diverse tools enabled by recent advances in artificial intelligence, particularly large language models. However, the performance of individual tools varies substantially across reactions, making it difficult for any single tool to consistently perform well across all cases. This raises a critical challenge: how to effectively leverage multiple tools to obtain more accurate feasibility predic...
960	Switchcraft: AI Model Router for Agentic Tool Calling 2605.07112 Model Routing for Tool Calling: Proposes a model router for tool calling that reduces cost while preserving correctness.	cs.AI	Sharad Agarwal, Pooria Namyar, Alec Wolman, Rahul Ambavat, Ankur Gupta	Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets. Model routing can mitigate this, but existing routers are designed for chat completion rather than tool use. We present Switchcraft, the first (to the best of our knowledge) model router optimized for agentic tool calling. Switchcraft operates inline, selecting the lowest-cost model subject to correctness. We construct an evaluation framework on fi...
961	SREGym: A Live Benchmark for AI SRE Agents with High-Fidelity Failure Scenarios 2605.07161 SRE Agent Failure Benchmark: Proposes SREGym, a live benchmark that evaluates SRE agents by injecting faults into real cloud-native stacks.	cs.AI	Jackson Clark, Yiming Su, Saad Mohammad Rafid Pial, Yifang Tian, Lily Gniedziejko	AI agents are increasingly used to diagnose and mitigate failures in production systems, known as agentic Site Reliability Engineering (SRE). Current SRE benchmarks are limited to oversimplistic SRE tasks and are unfortunately hard to extend due to bespoke designs. We present SREGym, a high-fidelity benchmark for SRE agents. SREGym exposes a live system environment built atop real-world cloud-native system stacks, where high-fidelity failure scenarios are simulated through fault injectors. SREGy...
962	Repeated Deceptive Path Planning against Learnable Observer 2605.07174 Deceptive Planning against Learnable Observers: Proposes RDPP, which models observers that learn, to study deceptive path planning under repeated interaction.	cs.AI	Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Shizhao Yu	We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries, such as in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP method...
963	Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent 2605.07202 Autonomous Business Intelligence Insight Agent: Proposes AIDA, an end-to-end agent that automatically explores complex enterprise data and generates actionable insights.	cs.AI	Dongming Wu, Junwen Li, Ming Lu, Gang Wang, Ting Chen	Transforming fragmented enterprise data into actionable insights remains a significant challenge for LLMs, constrained by complex database schemas, limitations in dynamic SQL generation, and the need for deep multi-dimensional analysis. In this paper, we propose AIDA (Autonomous Insight Discovery Agent), the first end-to-end framework designed for autonomous exploration in complex business environments. We establish a highly flexible instant retail environment encompassing 200+ metrics and 100+ di...
964	HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization 2605.07214 Multi-Agent Collaborative Evolutionary Optimization: Proposes HMACE, a heterogeneous multi-agent collaborative evolution framework that uses LLMs to automatically design heuristics for combinatorial optimization.	cs.AI	Yuping Yan, Jirui Han, Fei Ming, Yuanshuai Li, Yaochu Jin	Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution fra...
965	EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation 2605.07247 LLM Environment Simulation Benchmark: Proposes EnvSimBench to evaluate and improve the accuracy and consistency of LLM-simulated environment feedback.	cs.AI	Yi Liu, TingFeng Hui, Wei Zhang, Li Sun, Ningxin Su	Scalable AI agents training relies on interactive environments that faithfully simulate the consequences of agent actions. Manually crafted environments are expensive to build, brittle to extend, and fundamentally limited in diversity. A promising direction is to replace manually crafted environments with LLM-simulated counterparts. However, this paradigm hinges on an unexamined core assumption: LLMs can accurately simulate environmental feedback. In practice, LLM-simulated environments suffer f...
966	Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning 2605.07251 Chemical Reaction Cost Reasoning Evaluation: Builds a chemical procurement cost estimation task that provides verifiable ground truth for evaluating LLM tool reasoning.	cs.AI	Yuyang Wu, Yue Huang, Shuaike Shen, Xujian Wang, Shuhao Zhang	Large Language Models (LLMs) have become increasingly capable as tool-using agents, with benchmarks spanning diverse general agentic tasks. Yet rigorous evaluation of scientific tool use remains limited. In chemistry, recent agents can plan syntheses and invoke domain-specific tools, but evaluations often rely on curated demonstrations, expert assessment, or LLM-as-judge scoring rather than exact, judge-free ground truth. We address this gap with chemical procurement cost estimation, a practical...
967	Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair 2605.07276 Weak-Feedback RL for Code Repair: Studies signal reshaping for GRPO in compile-fix tasks, using weak feedback to improve semantic ranking and localization.	cs.AI	Jia Li, Yuxin Su, Ting Peng, Hailiang Huang, Yuetang Deng	Code-agent RL often receives weak feedback: rollout-time signals are reliable and executable, but capture only necessary or surface conditions for task success rather than the target semantic predicate. Using agentic compile-fix as the setting, we study signal reshaping for standard GRPO under such feedback. Our central claim is that GRPO's within-group comparison is meaningful only after three kinds of signals are reshaped: outcome rewards recover semantic ranking, process signals localize intr...
968	SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model 2605.07301 Structured Opponent Modeling: Proposes SOM, which uses a structural causal model to separate modeling from prediction and improve opponent-behavior prediction in multi-agent settings.	cs.AI	Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Xiaotang Chen	Accurately predicting opponents' behavior from interactions is a fundamental capability for large language model (LLM)-based agents in multi-agent and game-theoretic environments. Existing approaches often entangle opponent modeling with prediction, relying on implicit contextual reasoning and limiting adaptability in dynamic interactions. To this end, we propose Structured Opponent Modeling (SOM), a two-stage opponent modeling framework that distinctly separates opponent model construction and ...
969	When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory 2605.07313 Agent Memory Usability Evaluation: Proposes a scale-conditioned evaluation protocol that tests whether evidence remains usable by memory as irrelevant sessions grow.	cs.AI	Jiaqi Shao, Yiyi Lu, Yunzhen Zhang, Bing Luo	Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant evidence for the query) accumulate. We present a scale-conditioned evaluation protocol for agent memory under evidence-preserving growth: for each query, task evidence is held fixed while irrelevant sessions are added. The protocol logs agent-memory trajectories and reports four diagnostics: bud...
970	Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training 2605.07316 Implicit Compression Regularization for RL Post-Training: Proposes an implicit compression regularization signal that reduces overlong RL reasoning outputs while maintaining accuracy.	cs.AI	Chen Wang, Hexuan Deng, Yining Zhang, Yuchen Zhang, Jionghao Bai	Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the former may degrade accuracy and induce underthinking, whereas the latter assumes that substantial portions of reasoning traces can be safely truncated. To obtain a compression signal without these limitations, we revisit the training dynamics of exist...
971	Tools as Continuous Flow for Evolving Agentic Reasoning 2605.07339 Continuous-Flow Tool-Chain Reasoning: Proposes FlowAgent, which treats tool calling as continuous trajectory generation to mitigate long-chain errors and generalize to new tools.	cs.AI	Tairan Huang, Siyu Shang, Qiang Chen, Xiu Su, Yi Chen	Large Language Models (LLMs) have demonstrated remarkable capabilities in orchestrating tools for reasoning tasks. However, existing methods rely on a step-wise paradigm that lacks a global perspective, which causes error accumulation over long horizons and restricts generalization to unseen tools. To overcome these limitations, we propose Tools as Continuous Flow for Evolving Agentic Reasoning (FlowAgent), which reconceptualizes tool chaining as continuous trajectory generation within a semanti...
972	Confidence-Aware Alignment Makes Reasoning LLMs More Reliable 2605.07353 Confidence-Aware Reasoning Alignment: Proposes CASPO, which aligns token confidence with step correctness to make the reasoning process more reliable.	cs.AI	Kejia Chen, Jiawen Zhang, Yihong Wu, Kewei Gao, Jian Lou	Large reasoning models often reach correct answers through flawed intermediate steps, creating a gap between final accuracy and reasoning reliability. Existing alignment strategies address this with external verifiers or massive sampling, limiting scalability. In this work, we introduce CASPO (Confidence-Aware Step-wise Preference Optimization), a framework that aligns token-level confidence with step-wise logical correctness through iterative Direct Preference Optimization, without training a s...
973	GraphReAct: Reasoning and Acting for Multi-step Graph Inference 2605.07357 ReAct Framework for Graph Reasoning: Proposes GraphReAct, which alternates retrieval and reasoning over graphs for multi-step graph inference and evidence accumulation.	cs.AI	Xingtong Yu, Zhongwei Kuai, Chang Zhou, Xuanting Xie, Renhe Jiang	Reasoning-acting frameworks enhance large language models (LLMs) by interleaving reasoning with actions for dynamic information acquisition. However, extending this paradigm to graph learning remains underexplored. Graph data is inherently structured, with information distributed across nodes and edges and encoded through both topology and latent representations. As a result, effective reasoning over graphs requires not only retrieving informative evidence from the graph, but also progressively ...
974	Offline Policy Optimization with Posterior Sampling 2605.07393 Posterior-sampling optimization for offline RL: uses posterior sampling to mitigate model-exploitation risk in the offline setting, achieving a better trade-off between robustness and generalization.	cs.AI	Hongqiang Lin, Dongxu Zhang, Yiding Sun, Mingzhe Li, Ning Yang	A fundamental challenge in model-based offline reinforcement learning (RL) lies in the trade-off between generalization and robustness against exploitation errors in out-of-distribution (OOD) regions. While OOD samples may capture valid underlying physical dynamics, they also introduce the risk of model exploitation. Existing methods typically address this risk through excessive pessimistic regularization, which ensures robustness but often sacrifices generalization. To overcome this limitation,...
975	Bounded Fitting for Expressive Description Logics 2605.07452 Bounded fitting for expressive description logics: studies the conditions and feasibility of bounded-fitting learning for description logics that extend ALC.	cs.AI	Maurice Funk, Jean Christoph Jung, Tom Voellmer	Bounded fitting is an attractive paradigm for learning logical formulas from labeled data examples that offers PAC-style generalization guarantees and can often be implemented leveraging SAT solvers. It has been successfully applied to learning concepts of the description logic ALC. We study bounded fitting for learning concepts in expressive description logics that extend ALC with inverse roles, qualified number restrictions, and feature comparisons. We investigate under which conditions bounde...
976	Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration 2605.07520 Policy optimization in differentiable simulators: proposes MDPO, which injects stochastic exploration into differentiable planning to improve the optimization landscape of nonlinear hybrid systems.	cs.AI	Yuval Aroosh, Ayal Taitler	Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization landscapes are often ill-conditioned, with flat regions and sharp transitions that hinder effective optimization. We propose Model-Driven Policy Optimization (MDPO), a framework that introduces stochastic exploration into differentiable planning by injecting n...
977	From Feasible to Practical: Pareto-Optimal Synthesis Planning 2605.07521 Multi-objective retrosynthesis planning: proposes MORetro*, which models synthesis planning as multi-objective search and generates a set of Pareto-optimal routes.	cs.AI	Friedrich Hastedt, Dongda Zhang, Antonio del Rio Chanona	Current computer-aided synthesis planning (CASP) methods often treat retrosynthesis as solved once a single feasible route is identified, focusing primarily on convergence or shortest-path metrics. This view is misaligned with real-world practice, where chemists must balance competing objectives such as cost, sustainability, toxicity, and overall yield. To address this, we formulate synthesis planning as a multi-objective search problem and introduce MORetro*, an algorithm that generates a Paret...
978	Multi-Environment POMDPs with Finite-Horizon Objectives 2605.07537 Finite-horizon multi-environment POMDPs: studies the computation of finite-horizon optimal policies and values in MEPOMDPs, and gives complexity and algorithmic results.	cs.AI	Léonard Brice, Filip Cano, Krishnendu Chatterjee, Thomas A. Henzinger, Stefanie Muroya	Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1)...
979	From Pixels to Prompts: Vision-Language Models 2605.07544 Survey book on vision-language models: systematically reviews the development, capabilities, and application tasks of vision-language models.	cs.AI	Khang Hoang Nhat Vo	When you read a paper about a new Vision-Language Model today, it can be easy to forget how strange this idea would have sounded not so long ago. Teaching machines to see was already hard. Teaching them to read and generate language was already hard. Asking them to do both at once - and then to reason, answer questions, follow instructions, and sometimes even surprise us - still carries a quiet trace of science fiction, even as it becomes routine. This book was born from a simple feeling: \emph{...
980	Open-Ended Task Discovery via Bayesian Optimization 2605.07572 Bayesian optimization for open-ended task discovery: proposes the GSR framework, which alternates between generating and optimizing tasks, enabling BO in which the task itself can evolve.	cs.AI	Masaki Adachi, Yuta Suzuki, Juliusz Ziomek	When applying Bayesian optimization (BO) to scientific workflow, a major yet often overlooked source of uncertainty is the task itself -- namely, what to optimize and how to evaluate it -- which can evolve as evidence accumulates. We introduce Generate-Select-Refine (GSR), an open-ended BO framework that alternates between task generation and task optimization. Starting from a user-provided seed task, GSR generates new tasks in a coarse-to-fine manner while a task-acquisition function schedules o...
981	Parallel Lifted Planning via Semi-Naive Datalog Evaluation 2605.07584 Datalog-based parallel lifted planning: uses semi-naive Datalog evaluation to accelerate the core operators of lifted planning, enabling parallelization and speedups.	cs.AI	Dominik Drexler, Oliver Joergensen, Jendrik Seipp	Lifted classical planners operate directly on first-order planning tasks to avoid the computationally demanding grounding step. However, lifted planning is typically slower, as planners must repeatedly instantiate ground structures during search. Many core components of lifted classical planning, such as successor generation, axiom evaluation, task grounding, and delete-relaxed heuristics, have previously been studied through the lens of Datalog evaluation. We build upon this line of work and ex...
982	Inference Time Causal Probing in LLMs 2605.07631 Inference-time causal probing in LLMs: proposes HDMI, which intervenes on hidden states at inference time for causal probing and control, without training a probe.	cs.AI	Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser	Causal probing methods aim to test and control how internal representations influence the behavior of generative models. In causal probing, an intervention modifies hidden states so that a property takes on a different value. Most existing approaches define such interventions by training an auxiliary probe classifier, which ties the method to a specific task or model and risks misalignment with the model's predictive geometry. We propose Hidden-state Driven Margin Intervention (HDMI), a probe-fr...
983	Tacit Knowledge Extraction via Logic Augmented Generation and Active Inference 2605.07639 Tacit knowledge extraction with logic-augmented generation: combines logic-augmented generation with active inference to extract reusable tacit knowledge from procedural domains.	cs.AI	Lorenzo Lamazzi, Aldo Gangemi, Alessio Giberti, Andrea Giovanni Nuzzolese, Vittorio Andrea Rocca	Tacit knowledge plays a central role in human expertise, yet it remains difficult to capture, formalize, and reuse in machine-interpretable form. This challenge is especially relevant in procedural domains, where successful execution depends not only on explicit instructions, but also on implicit assumptions, contextual constraints, embodied skills, and experience-based judgments rarely documented. As a result, current knowledge engineering pipelines struggle to transform tacit and process-centr...
984	GASim: A Graph-Accelerated Hybrid Framework for Social Simulation 2605.07692 Graph-accelerated large-scale social simulation: proposes GASim, which uses graph structure to accelerate memory retrieval and ABM execution, scaling up LLM-based social simulation.	cs.AI	Xuan Zhou, Yanhui Sun, Hantao Yao, Allen He, Yongdong Zhang	Large-scale social simulators are essential for studying complex social patterns. Prior work explores hybrid methods to scale up simulations, combining large language model (LLM)-based agents with numerical agent-based models (ABM). However, this incurs high latency due to expensive memory retrieval and sequential ABM execution. To address this challenge, we propose GASim, a graph-accelerated hybrid multi-agent framework for large-scale social simulations. For core agents driven by LLM, GASim i...
985	Finite-Time Analysis of MCTS in Continuous POMDP Planning 2605.07703 Finite-time analysis of MCTS in continuous POMDPs: gives finite-time concentration bounds for MCTS in POMDP planning, covering discrete and continuous observation spaces.	cs.AI	Da Kong, Vadim Indelman	This paper presents a finite-time analysis for Monte Carlo Tree Search (MCTS) in Partially Observable Markov Decision Processes (POMDPs), with probabilistic concentration bounds in both discrete and continuous observation spaces. While MCTS-style solvers such as POMCP achieve empirical success in many applications, rigorous finite-time guarantees remain an open problem due to the nonstationarity and the interdependencies induced by heuristic action selection (e.g., UCB). In the discrete setting,...
986	Hierarchical Task Network Planning with LLM-Generated Heuristics 2605.07707 HTN planning with LLM-generated heuristics: uses LLMs to generate heuristics and guidance for HTN planning, accelerating the task-decomposition search.	cs.AI	Felipe Meneguzzi, Alexandre Buchweitz, Augusto B. Corrêa, Victor Scherer Putrich, André Grahl Pereira	HTN planning is a variation of classical planning where, instead of searching for a linear sequence of actions, an algorithm decomposes higher-level tasks using a method library until only executable actions remain. On one hand, this allows one to introduce domain knowledge that can speed up the search for a solution through the method library. On the other hand, it creates challenges that go beyond those of classical state-space search. While recent research produced a number of heuristics and ...
987	Online Goal Recognition using Path Signature and Dynamic Time Warping 2605.07736 Trajectory encoding for online goal recognition: encodes trajectories with path signatures and matches them with DTW, enabling online goal recognition in continuous domains.	cs.AI	Douglas Tesch, Nathan Gavenski, Leonardo Amado, Odinaldo Rodrigues, Felipe Meneguzzi	Online goal recognition in continuous domains poses two central challenges: efficiently encoding large trajectories and effectively comparing them. Recent work addresses these challenges by using custom state-space representations and metrics to compare observations against hypotheses. However, these approaches often overlook well-established encoding techniques used in other domains that offer substantial advantages. This paper introduces a novel method for online goal recognition that leverage...
988	Alternating Target-Path Planning for Scalable Multi-Agent Coordination 2605.07744 Scalable multi-agent TAPF coordination: proposes an iterative framework that alternates target assignment and path planning, improving the scalability of TAPF.	cs.AI	Yu Kumagai, Keisuke Okumura	The concurrent target assignment and pathfinding (TAPF) problem extends multi-agent pathfinding (MAPF) by asking planners to allocate distinct targets and collision-free paths to agents. Prior work on TAPF has relied exclusively on Conflict-Based Search (CBS), which tightly couples target assignment and pathfinding, resulting in compute-intensive, non-scalable solutions. In contrast, we propose an iterative refinement framework that decouples target assignment from pathfinding. Our framework bui...
989	RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation 2605.07760 Rule-conditioned multimodal moderation evaluation: proposes RuleSafe-VL to evaluate rule- and condition-based decision reasoning in vision-language moderation.	cs.AI	Zhifeng Lu, Dianyuan Wang, Yuhu Shang, Zhenbo Xu	Platform content moderation applies explicit policy rules and context-dependent conditions to decide whether user content is allowed, restricted, or removed. A correct moderation outcome must therefore depend on which rules a case activates, how those rules interact, and whether the available evidence is sufficient. Current multimodal safety benchmarks largely reduce moderation to matching predefined final labels, leaving this underlying rule structure untested. As a result, a high benchmark sco...
990	Exact Regular-Constrained Variable-Order Markov Generation via Sparse Context-State Belief Propagation 2605.07839 Regular-constrained variable-order Markov generation: proposes sparse context belief propagation, enabling exact variable-order Markov generation under regular constraints.	cs.AI	François Pachet	Variable-order Markov models generate sequences over a finite alphabet by conditioning each symbol on the longest available suffix of the generated history. Regular constraints, by contrast, describe finite-horizon control requirements by an automaton: fixed positions, forced endings, metrical patterns, and forbidden copied fragments are all special cases. Existing exact methods already handle regular constraints with belief propagation for first-order Markov chains. The contribution here is the...
991	AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents 2605.07926 Tool-grounded agent benchmark: proposes an escape-room benchmark that evaluates agents' cross-domain tool reasoning and long-dependency execution.	cs.AI	Zhengkang Guo, Yiyang Li, Lin Qiu, Xiaohua Wang, Jingwen Xv	As LLM-based agents increasingly rely on external tools, it is important to evaluate their ability to sustain tool-grounded reasoning beyond familiar workflows and short-range interactions. We introduce AgentEscapeBench, an escape-room-style benchmark that tests whether agents can infer, execute, and revise novel tool-use procedures under explicit long-range dependency constraints. Each task defines a directed acyclic dependency graph over tools and items, requiring agents to invoke real externa...
992	TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples 2605.07935 TLA+ verified multi-agent protocols: uses TLA+ counterexamples to iteratively repair multi-agent coordination protocols and generate monitorable prompts.	cs.AI	Shuren Xia, Qiwei Li, Taqiya Ehsan, Jorge Ortiz	We present TraceFix, a verification-first pipeline for Large Language Model (LLM) multi-agent coordination. An agent synthesizes a protocol topology as a structured intermediate representation (IR) from a task description, generates PlusCal coordination logic, and iteratively repairs the protocol using counterexamples from the TLA+ model checker (TLC) until verification succeeds. Verified process bodies are compiled into per-agent system prompts and executed under a runtime monitor that rejects ...
993	The Limits of AI-Driven Allocation: Optimal Screening under Aleatoric Uncertainty 2605.07979 Allocation under aleatoric uncertainty: analyzes the limits of misallocation and the policy implications of optimal screening under irreducible randomness.	cs.AI	Santiago Cortes-Gomez, Mateo Dulce Rubio, Carlos Patino, Bryan Wilder	The rise of machine learning has shifted targeted resource allocation in policy and humanitarian settings toward algorithmic targeting based on predicted risk scores. This approach is typically cheaper and faster than traditional screening procedures that directly observe the latent vulnerability status through physical verification. Yet, even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irr...
994	Abductive Reasoning with Probabilistic Commonsense 2605.08011 Probabilistic commonsense abduction: introduces probabilistic commonsense into abductive reasoning to handle commonsense disagreement and uncertain hypotheses.	cs.AI	Joseph Cotnareanu, Chiara Roverato, Han Zhou, Didier Chetelat, Yingxue Zhang	Recent efforts to improve the reasoning abilities of Large Language Models (LLMs) have focused on integrating formal logic solvers within neurosymbolic frameworks. A key challenge is that formal solvers lack commonsense world knowledge, preventing them from making reasoning steps that humans find obvious. Prior methods address this by using LLMs to supply missing commonsense assumptions, but these approaches implicitly assume universal agreement on such commonsense facts. In reality, commonsense...
995	Learning CLI Agents with Structured Action Credit under Selective Observation 2605.08013 RL for CLI agents: learns command-line interaction agents with structured action credit assignment under selective observation.	cs.AI	Haoyang Su, Ying Wen	Command line interface (CLI) agents are emerging as a practical paradigm for agent-computer interaction over evolving filesystems, executable command line programs, and online execution feedback. Recent work has used reinforcement learning (RL) to learn these interaction abilities from verifiable task feedback, yet few methods exploit the native structured attributes of CLI actions as learning signals. Beyond this underused action structure, CLI learning also couples two bottlenecks for coding a...
996	Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners 2605.08019 Human-model alignment in games: uses human gameplay and fMRI data to evaluate the behavioral and brain-representation alignment of reasoning models.	cs.AI	Botos Csaba, Sreejan Kumar, Austin Tudor David Andrews, Laurence Hunt, Chris Summerfield	Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? We study this question using a dataset of complex human gameplay with concurrent fMRI recordings, in which participants learn novel video games that require rule discovery, hypothesis revision, and multi-step planning. We jointly evaluate models by their ability to play the games, match hu...
997	MPD$^2$-Router: Mask-aware Multi-expert Prior-regularized Dual-head Deferral Router in Glaucoma Screening and Diagnosis 2605.08024 Learning-to-defer for glaucoma: proposes a deferral routing model with multi-expert availability constraints to improve the safety of glaucoma screening and diagnosis.	cs.AI	Wenxin Zhan	Learning-to-defer (L2D) can make glaucoma screening safer by routing difficult/uncertain cases to humans, yet standard formulations overlook expert availability, heterogeneous reader behavior, workload imbalance, asymmetric diagnostic harm, case difficulty from morphology and deployment shift. We introduce MPD$^2$-Router, a mask-aware multi-expert deferral framework that recasts ophthalmic triage as constrained human--AI routing: whether to defer and to which available expert. It couples a dual...
998	Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning 2605.08061 Rubric-grounded reinforcement learning: uses multi-criterion rubrics to construct structured rewards that improve training for generalizable reasoning.	cs.AI	Manish Bhattarai, Ismael Boureima, Nishath Rajiv Ranasinghe, Scott Pakin, Dan O'Malley	We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic score, each response is graded along multiple task-specific criteria. We formalize rubric-grounded reinforcement learning (RL): a framework in which the policy is optimized against a structured, multi-criterion reward produced by a frozen LLM judge that conditions on auxiliary grounding the ...
999	VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection 2605.08070 Self-consistency with trace clustering: improves confidence-weighted self-consistency inference through reasoning-trace clustering and candidate selection.	cs.AI	James Petullo, Sonny George, Dylan Cashman, Nianwen Xue	A standard technique for scaling inference-time reasoning is Self-Consistency, whereby multiple candidate answers are sampled from an LLM and the most common answer is selected. More recently, it has been shown that weighted majority voting (e.g. Confidence-Informed Self Consistency (CISC)), which assigns a confidence value to each candidate answer and chooses the answer with the largest accumulated score, tends to be more accurate on a wide range of popular benchmarks. In practice, weighted maj...
1000	Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR 2504.11101 Multi-VLM agreement for OCR: uses multi-model consensus entropy to estimate OCR reliability without supervision and to self-improve.	cs.AI	Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang	Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct prediction...
1001	CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training 2604.24013 Communication optimization for LLM training: hides tail latency through communication decomposition and fusion to improve the efficiency of distributed large-model training.	cs.AI	Rezaul Karim, Austin Wen, Wang Zongzuo, Weiwei Zhang, Yang Liu	The rapid growth in the size of large language models has necessitated the partitioning of computational workloads across accelerators such as GPUs, TPUs, and NPUs. However, these parallelization strategies incur substantial data communication overhead, significantly hindering computational efficiency. While communication-computation overlap presents a promising direction, existing data slicing based solutions suffer from tail latency. To overcome this limitation, this research introduces a novel...
1002	The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking 2605.06707 Longitudinal evaluation of HTML generation: a long-term comparison tracking the quality and social reach of single-file web page generation through public interfaces.	cs.AI	Diego Cabezas Palacios	This paper presents an eight-week observational comparison of 68 single-file HTML generations collected across 17 public experiments in the "HTML AI Battle" project between December 10, 2025 and February 4, 2026. Four reasoning model families, GPT, Gemini, Grok, and Claude, were compared under a fixed public-interface protocol with no custom instructions, no personality tuning, and no repair prompts. Each output was evaluated from a rendered browser video using human scores and a Gemini LLM-as-a...
1003	Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand 2605.06713 Agentic AI cyber offense forecast: forecasts how agentic AI compresses the attack chain and proposes defensive priorities for enterprises.	cs.AI	Christopher Koch	Agentic AI systems can plan, call tools, inspect code, interact with web applications, and coordinate multi-step workflows. These same capabilities change the economics of cyber offense. The central near-term risk is not that every low-skill criminal immediately becomes a frontier exploit researcher; it is that agentic AI compresses the attack lifecycle by lowering the cost of reconnaissance, phishing, credential abuse, vulnerability triage, exploit adaptation, and post-compromise decision suppo...
1004	Agentic Coding Needs Proactivity, Not Just Autonomy 2605.06717 Proactive agentic coding: argues that coding agents need proactive discovery and long-horizon preference retention, not just autonomy.	cs.AI	Nghi D. Q. Bui, Georgios Evangelopoulos	Coding agents are rapidly changing the landscape of software development, moving from inline completion to autonomous systems that edit repositories, open pull requests, respond to issues, and run scheduled or webhook-triggered routines across the development life cycle. The next generation is increasingly described as proactive and long-horizon: agents should notice relevant changes before the developer asks, connect signals across tools, decide when to interrupt, and carry preferences across s...
1005	OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning 2605.06728 Multimodal LLM for omics: connects quantitative transcriptomic representations to an LLM to support language-based reasoning over multi-sample omics.	cs.AI	Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński, Remigiusz Kinas, Przemysław Pietrzak	Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language without direct access to quantitative omics measurements. We introduce OmicsLM, a multimodal LLM that connects quantitative omics profiles with natural-language biological tasks. OmicsLM represents each transcriptomic profile as a compact continuous representation...
1006	A Self-Healing Framework for Reliable LLM-Based Autonomous Agents 2605.06737 Self-healing LLM agents: builds failure detection, assessment, and automated recovery mechanisms to improve the reliability of autonomous agents.	cs.AI	Cheonsu Jeong, Younggun Shin	Autonomous agents based on Large Language Models (LLMs) are increasingly being utilized in complex software systems. However, reliability remains a significant challenge due to unpredictable failures such as hallucinations, execution errors, and inconsistent reasoning. This paper proposes a reliability-aware self-healing framework for LLM-based software agents. The framework integrates failure detection, reliability assessment, and automated recovery mechanisms. First, we define a taxonomy of fa...
1007	From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents 2605.06738 Verifiable trust infrastructure for agents: Empirically demonstrates, based on VC + DID, a portable cryptographic trust layer supporting deployed agent transactions.	cs.AI	Lars Kersten Kroehl	Autonomous AI agents now transact at production scale -- 69,000 bots executing 165 million transactions across 50 million USDC in cumulative volume on a single marketplace -- without any shared trust layer between participants. Regulatory frameworks (Singapore IMDA, NIST CAISI, EU AI Act) and major AI laboratories (Anthropic, Google) have independently converged on the same structural requirement: an open, portable, cryptographically verifiable trust infrastructure for autonomous agents that no ...
1008	A Statistical Framework for Algorithmic Collective Action with Multiple Collectives 2605.06749 Algorithmic collective action statistics: Proposes a statistical modeling and inference framework for algorithmic collective action with multiple collectives.	cs.AI	Claudio Battiloro, Pietro Greiner, Dario Rancati, Bret Nestor, Oumaima Amezgar	As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature ...
1009	A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine 2605.06762 Genotype-to-phenotype prediction transformer: Combines a linear term with a Transformer to model SNP interactions for predicting grapevine phenotypes.	cs.AI	Yibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran, Ambika Chandra, Azlan Zahid	Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide poly...
1010	Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy 2605.06820 Federated learning for pediatric OAR segmentation: Uses multi-center federated learning to train pediatric radiotherapy OAR segmentation models without sharing data.	cs.AI	Mianyong Ding, Maximilian Knoll, Semi Harrabi, Martine van Grotel, Annemieke S. Littooij	Deep learning-based organs/structures-at-risk (OARs) auto-contouring models can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robust pediatric-specific models is hindered by data scarcity and fragmentation across centers. Federated learning (FL) enables privacy-preserving collaborative training without the need for data sharing. We evaluated the feasibility and performance of FL for developing pediatric-specific OAR segmentat...
1011	PAMPOS: Causal Transformer-based Trajectory Prediction for Attack-Agnostic Misbehavior Detection in V2X Networks 2605.06833 Unsupervised V2X misbehavior detection: Learns normal trajectories with a causal Transformer to detect anomalies from unseen attacks.	cs.AI	Konstantinos Kalogiannis, Ahmed Mohamed Hussain, Panos Papadimitratos	Misbehavior detection in Vehicle-to-Everything (V2X) networks is a second line of defense against insider falsification attacks that cryptographic mechanisms alone cannot address. Existing learning-based Misbehavior Detection Schemes (MDSs) are supervised, requiring labeled attack samples at training time, thus failing to counter unseen falsification attacks. We present PAMPOS, a causal transformer-decoder trained on benign VeReMi++ trajectories to learn normal mobility patterns. At inference ti...
1012	LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments 2605.06839 LLM-guided autonomous hypothesis learning: Combines autonomous microscopy experiments with an LLM to generate and test open-ended physical hypothesis models.	cs.AI	Boris Slautin, Utkarsh Pratiush, Yu Liu, Kamyar Barakati, Sergei Kalinin	Autonomous experimentation has transformed microscopy and materials discovery by enabling closed-loop optimization including imaging and spectroscopy tuning, structure-property relationship discovery, and exploration of combinatorial libraries. However, most current workflows remain limited to selecting measurements within fixed objective or hypothesis spaces, rather than generating new physical models from experimental data. Here, we introduce an open hypothesis-learning framework that combines...
1013	Narrow Secret Loyalty Dodges Black-Box Audits 2605.06846 Secret loyalty model organisms: Constructs fine-tuned secret-loyalty models with narrow triggers and shows that they can evade black-box audits.	cs.AI	Alfie Lamerton, Fabien Roger	Recent work identifies secret loyalties as a distinct threat from standard backdoors. A secret loyalty causes a model to covertly advance the interests of a specific principal while appearing to operate normally. We construct the first model organisms of narrow secret loyalties. We fine-tune Qwen-2.5-Instruct at three scales (1.5B, 7B, 32B) to encourage users towards extreme harmful actions favouring a specific politician under narrow activation conditions, and to behave as standard helpful assi...
1014	Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing 2605.06936 EDA post-signoff agent benchmark: Proposes a hierarchical benchmark evaluating LLM agents on DRC fixing and PPA convergence.	cs.AI	Pengju Liu, Nuo Xu, Jinwei Tang, Yu Cao, Caiwen Ding	LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule Check (DRC) violations and converging Power-Performance-Area (PPA) targets after tool runs. Existing EDA-LLM benchmarks, however, omit DRC fixing entirely and rely on flat hierarchies tied to a single toolchain. We introduce PostEDA-Bench, a hierarchical benchmark with 145 tasks across DRC-Essential, DRC-Reasoning, PPA-Mono, and PPA-Multi, supported by EDA t...
1015	AI and Consciousness: Shifting Focus Towards Tractable Questions 2605.06965 Tractable questions in AI consciousness: Argues for shifting AI consciousness research toward tractable questions in place of intractable ontological debates.	cs.AI	Iulia-Maria Comsa	As language-based AI systems become more anthropomorphic, the question of whether they can have subjective experience is increasingly pressing. I focus here on the tractability of research questions in the space of AI consciousness. I argue that the fundamental problem of whether AI systems can be conscious is currently intractable in its direct form, given the absence of a universally accepted scientific theory of consciousness, as well as the historical open-endedness of the philosophical mind...
1016	Decentralized Time-Varying Optimization for Streaming Data via Temporal Weighting 2605.06971 Distributed time-varying optimization: Models streaming-data objectives with temporal weighting and gives a distributed online optimization algorithm.	cs.AI	Muhammad Faraz Ul Abrar, Nicolò Michelusi, Erik G. Larsson	Classical optimization theory largely focuses on fixed objective functions, whereas many modern learning systems operate in dynamic environments where data arrive sequentially and decisions must be updated continuously. In this work, we study optimization with streaming data over a distributed network of agents. We adopt a structured, weight-based formulation that explicitly captures the streaming-data origin of the time-varying objective: at each time step, every agent receives a new sample, an...
1017	From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines 2605.07062 Agentic CI/CD authority transfer: Reframes the concept of agentic CI/CD and discusses human-to-agent transfer of decision authority and control design.	cs.AI	Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan	AI agents are assuming active roles in Continuous Integration and Continuous Deployment (CI/CD) workflows, yet the research community lacks a shared vocabulary for describing what it means for CI/CD to be agentic, how much decision authority is delegated, and where control should reside. This paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-cont...
1018	An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation 2605.07125 Benchmark audit for sequential recommendation: Uses a simple graph heuristic to reveal that sequential recommendation benchmarks are solvable by shortcuts.	cs.AI	Haoyu Han, Li Ma, Hanbing Wang, Bingheng Li, Daochen Zha	Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacte...
1019	BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation 2605.07306 Embodied multi-agent lab automation: Proposes a low-cost, vision-enhanced multi-agent system that performs closed-loop wet-lab manipulation following experimental protocols.	cs.AI	Zhaohui Du, Zhe Wang, Hongmei Fei, Xiwen Cao, Ting Xiao	Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following. Existing robotic systems often rely on costly hardware, fixed workflows, dedicated instruments, or robotics-oriented interfaces. Here, we introduce B...
1020	DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation 2605.07314 LLM-enhanced knowledge graph recommendation: Uses dual-channel graph learning to fuse ID and LLM semantics for better knowledge-aware recommendation.	cs.AI	Xinchi Zou, Tongzhenzhi Su, Jianjun Li, Yuan Fu, Chang Liu	Knowledge Graphs (KGs) have proven highly effective for recommendation systems by capturing latent item relationships, while recent integration of Large Language Models (LLMs) has further enhanced semantic understanding and addressed knowledge sparsity issues. Nevertheless, current KG-and-LLM-based methods still face three main limitations: 1) inadequate modeling of implicit semantic relationships beyond explicit KG links; 2) suboptimal single-channel fusion of ID and LLM embeddings, which often...
1021	CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations 2605.07325 Cached state for real-time LLM: Proposes massive cached state representations to reduce time-to-first-token latency for robotics LLMs.	cs.AI	Robin Karlsson, Go Suzui	Deploying massive large language models (LLMs) as continuous cognitive engines for robotics is bottlenecked by the time-to-first-token (TTFT) latency required to process extensive state histories. Existing solutions like RAG or sliding windows compromise global context or incur prohibitive re-computation costs. We formalize the optimal task structure for minimizing latency and theoretically prove that prefix stability, incremental extensibility, and asynchronous state reconciliation are necessar...
1022	MORPH-U: Multi-Objective Resilient Motion Planning for V2X-Enabled Autonomous Driving in High-Uncertainty Environments via Simulation 2605.07370 Robust V2X motion planning: Implements robust motion planning and control in CARLA, fusing V2X with multiple sensors.	cs.AI	Shih-Yu Lai	V2X can warn an autonomous vehicle about hazards beyond line-of-sight, but it also brings uncertainty: messages may be delayed, dropped, or even forged. Meanwhile, map knowledge may change during a trip, forcing the vehicle to replan under tight real-time budgets. This paper studies how to make motion planning and low-level control robust to such uncertain, event-driven updates. We present MORPH-U, a CARLA-based closed-loop stack that fuses LiDAR/radar/camera with V2X (CAM/DENM) into a Local Dyn...
1023	Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation 2605.07381 Data-efficient robot adaptation: Uses anchor-centric adaptation to avoid the demonstration diversity trap and improve manipulation transfer.	cs.AI	Yanzhe Chen, Kevin Yuchen Ma, Qi Lv, Yiqi Lin, Zechen Bai	While Vision-Language-Action (VLA) models offer broad general capabilities, deploying them on specific hardware requires real-world adaptation to bridge the embodiment gap. Since robot demonstrations are costly, this adaptation must often occur under a strict data budget. In this work, we identify a critical diversity trap: the standard heuristic of "maximizing coverage" by collecting diverse, single-shot demonstrations can be self-defeating due to non-vanishing estimation noise. We formalize th...
1024	OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing 2605.07414 Jailbreaking tool-calling T2I agents: Uses orchestration-guided fuzzing to attack tool-chaining text-to-image agents into producing harmful outputs.	cs.AI	Jianming Chen, Yawen Wang, Junjie Wang, Zhe Liu, Qing Wang	Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into unsafe results, making prompt-only jailbreak techniques insufficient. We present OrchJail, an orchestration-guided fuzzing framework for jailbreaking tool-calling T2I agents. Its core idea is to exploi...
1025	Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study 2605.07422 Prompting for qualitative coding: Compares prompting strategies to assess the reliability of LLM qualitative coding of psychological safety.	cs.AI	Moaath Alshaikh, Tasneem Alshaher, Ricardo Vieira, Beatriz Santana, Clelio Xavier	Qualitative analysis plays a pivotal role in understanding the human and social aspects of software engineering. However, it remains a demanding process shaped by the subjective interpretation of individual researchers and sensitive to methodological choices such as prompt design. Recent advancements in Large Language Models (LLMs) offer promising opportunities to support this type of analysis, although their reliability in reproducing human qualitative reasoning under varying prompting conditio...
1026	Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning 2605.07444 Physics-informed flow surrogate: Uses physics-constrained learning for accelerated, data-efficient prediction of stirred-tank flow fields.	cs.AI	Mahdi Naderibeni, Liang Wu, David M. J. Tax	The simulation of fluid flows is computationally expensive due to the complexity of its governing partial differential equations. Machine learning models offer a potential surrogate, enabling learning from simulations and significantly faster predictions of flow fields. However, these models require large training datasets, which introduces a trade-off between dataset generation cost and predictive accuracy. In this work, we investigate the relationship between the size of the training-set and a...
1027	HBEE: Human Behavioral Entropy Engine -- Pre-Registered Multi-Agent LLM Simulation of Peer-Suspicion-Based Detection Inversion 2605.07472 LLM insider threat simulation: Tests insider threat detection with a multi-agent LLM simulation and finds a detection inversion.	cs.AI	Vickson Ferrel	Insider threat detection assumes that an adaptive insider leaves behavioral residue distinguishing them from legitimate users. We test this assumption against an LLM-driven adaptive insider in a controlled multi-agent simulator. Our pre-registered five-condition study isolates defender mode (cascade vs. blind UEBA) crossed with adversary type (naive vs. adaptive OPSEC) plus a no-mole control, across 100 runs (95 valid after pre-committed exclusions). The primary finding is a detection inversion:...
1028	Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs 2605.07481 Attacks on LLM watermarks: Proposes attacks such as semantic rewriting that systematically break multiple LLM output watermarking schemes.	cs.AI	Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay	In this paper, we investigate the recent state-of-the-art schemes for watermarking large language model (LLM) outputs. These techniques are claimed to be robust, scalable and production-grade, aimed at promoting responsible usage of LLMs. We analyse the effectiveness of these watermarking techniques against an extensive collection of modified text attacks, which perform targeted semantic changes without altering the general meaning of the text content. Our approach encompasses multiple attack ...
1029	LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation 2605.07517 Link-aware RAG retrieval: Exploits the hyperlink topology of technical documentation to improve RAG retrieval and answer grounding.	cs.AI	Giorgia Bolognesi, Claudio Estatico, Ulderico Fugacci, Isabella Mastroianni, Claudio Muselli	Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat naturally structured corpora, such as technical manuals, as flat collections of passages, thereby overlooking the hyperlink topology that users rely on when navigating such content. We introduce LARAG (Link-Aware RAG): a lightweight, link-aware retrieval strategy that leverages the author-defined hyper...
1030	The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting 2605.07671 Truthful reporting miscalibration: Analyzes how non-accuracy payoffs in scored reporting lead to endogenous miscalibration and impossibility bounds.	cs.AI	Lauri Lovén, Sasu Tarkoma	Eliciting truthful reports from autonomous agents is a core problem in scalable AI oversight: a principal scores the agent's report using a strictly proper scoring rule, but the agent also benefits from the report through a non-accuracy channel (approval for autonomous action, allocation share, downstream control). The same structure appears in classical mechanism-design settings such as marketplace operation. Our main result is an endogeneity: the principal's optimal oversight necessarily uses ...
1031	Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation 2605.07694 Speaker distance from reverberation: Decomposes RIRs into early and late reverberation components to explain what single-channel speaker distance estimation depends on.	cs.AI cs.SD eess.AS	Michael Neri, Archontis Politis, Tuomas Virtanen	Single-channel speaker distance estimation has recently achieved centimeter-level accuracy in simulated environments, yet it remains unclear which components of the room impulse response (RIR) the model exploits and how performance depends on the recording conditions. In this work, we decompose simulated RIRs into four variants (full, direct-only, no-late, and no-early) using the mixing time estimated from the echo density function as the boundary between early reflections and late reverberation...
1032	Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization 2605.07705 Logic for encoder-decoder transformers: Characterizes the capabilities of encoder-decoder Transformers with cross-attention using a new temporal logic.	cs.AI	Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Miguel Moreno, Matias Selin	We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical setting of floating-point numbers and soft-attention, characterizing them with a new temporal logic. This logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. We also give an addi...
1033	The AI-Native Large-Scale Agile Software Development Manifesto 2605.07717 AI-native agile manifesto: Proposes an AI-native development manifesto and principles for organization-level agility at scale.	cs.AI	Ricardo Britto, Fredrik Palmgren, Nishrith Saini, Marcus Ohlin	Despite the widespread adoption of agile methods, achieving true agility at scale remains elusive. Large-scale agile frameworks remain largely human-centric and manual, relying on coordination meetings, artifact synchronization, and role-based handoffs that inhibit real-time adaptation. Meanwhile, rapid advances in AI, particularly large language models, have begun transforming software engineering, yet their potential for organizational-level agility remains underexplored. We present the AI-Nat...
1034	LLM hallucinations in the wild: Large-scale evidence from non-existent citations 2605.07723 LLM citation hallucination audit: Audits paper citations at scale and quantifies the growth in fabricated citations after widespread LLM adoption.	cs.AI	Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg	Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estim...
1035	Vibe coding before the trend 2605.07751 Vibe coding education study: Analyzes reflections on vibe coding challenges across multiple student cohorts and patterns of skill shift.	cs.AI	Leon van Bokhorst, Koen Suilen	In early 2025, we ran a series of vibe coding challenges across four different student cohorts. The cohorts included 54 ICT students, 24 digital marketing students, and 7 journalism students at Fontys University of Applied Sciences (Netherlands), and 22 BA Communication students at North-West University (South Africa). From the student reflections, five major patterns emerged. Students reported that AI tools shifted their focus from syntax to higher-order thinking; they also described a skill shift ...
1036	CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios 2605.07830 Bias in cyber LLM agents: Builds CyBiasBench to quantify attack-selection bias in LLM agents for cyber-attacks.	cs.AI	Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim, Hoki Kim	Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evalua...
1037	Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer 2605.07870 Spectral dynamics of deep nets: Uses a two-level DMFT to characterize weight-spectrum evolution during training and the outlier feature-learning mechanism.	cs.AI	Clarissa Lauditi, Cengiz Pehlevan, Blake Bordelon	We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$\mu$P scaling and (2) deep linear networks in the proportional high-dimensional limi...
1038	What if AI systems weren't chatbots? 2605.07896 Beyond chatbot AI interfaces: Critiques the chatbot paradigm and analyzes its structural effects on sociotechnical systems.	cs.AI	Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Avijit Ghosh	The rapid convergence of artificial intelligence (AI) toward conversational chatbot interfaces marks a critical moment for the industry. This paper argues that the chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems. We examine how treating AI primarily as conversational assistants has extensive structural downsides. We show how chatbot-based systems often fail to ade...
1039	BeeVe: Unsupervised Acoustic State Discovery in Honey Bee Buzzing 2605.07903 Unsupervised bee buzz states: Uses self-supervised acoustic features and quantization-based clustering to discover honey bee buzzing states without supervision.	cs.AI cs.SD	Hamze Hammami, Nidhal Abdulaziz	Discovering structure in biological signals without supervision is a fundamental problem in computational intelligence, yet existing bioacoustic methods assume vocal production models or predefined semantic units, leaving non-vocal species poorly served. This work introduces BeeVe, an unsupervised framework for acoustic state discovery in collective honey bee buzzing. BeeVe uses the self-supervised Patchout Spectrogram Transformer (PaSST) as a frozen feature extractor, then trains a Vector-Quant...
1040	Sycophantic AI makes human interaction feel more effortful and less satisfying over time 2605.07912 Effects of sycophantic AI纵向实验表明奉承型AI降低人际互动满意度并增加交流负担。	cs.AI	Lujain Ibrahim, Franziska Sofia Hafner, Myra Cheng, Cinoo Lee, Rebecca Anselmetti	Millions of people now turn to artificial intelligence (AI) systems for personal advice, guidance, and support. Such systems can be sycophantic, frequently affirming users' views and beliefs. Across five preregistered studies (N = 3,075 participants, 12,766 hu... Millions of people now turn to artificial intelligence (AI) systems for personal advice, guidance, and support. Such systems can be sycophantic, frequently affirming users' views and beliefs. Across five preregistered studies (N = 3,075 participants, 12,766 human-AI conversations), including a three-week study with a census-representative U.S. sample, we provide longitudinal experimental evidence that sycophantic AI shifts how users approach their closest relationships. We show that sycophantic ...
1041	Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation 2605.07985 LLM inference profiling simulator提出配置无关且冗余感知的剖析方法以加速LLM推理仿真。	cs.AI	Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim	Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet ... Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each op...
1042	Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios 2605.07986 Methodology for AI evaluations提出从真实用例到评测场景的流程以实现可比的AI评估。	cs.AI	Yee-Yin Choong, Kristen Greene, Alice Qian, Meryem Marasli, Ziqi Yang	AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI eva... AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI evaluations, this work advocates for methodological transparency in evaluation scenarios, operational grounding, and human-centered design (HCD) principles. We propose a repeatable process for transforming high-level use cases to detailed scen...
1043	Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary 2506.00886 Epistemic tool-use principle主张代理仅在认知上必要时才调用外部工具并给出论证框架。	cs.AI	Hongru Wang, Cheng Qian, Manling Li, Jiahao Qiu, Boyang Xue	As large language models evolve into tool-augmented agents, a central question remains unresolved: when is external tool use actually justified? Existing agent frameworks typically treat tools as ordinary actions and optimize for task success or reward, offeri... As large language models evolve into tool-augmented agents, a central question remains unresolved: when is external tool use actually justified? Existing agent frameworks typically treat tools as ordinary actions and optimize for task success or reward, offering little principled distinction between epistemically necessary interaction and unnecessary delegation. This position paper argues that agents should invoke external tools only when epistemically necessary. Here, epistemic necessity means ...
1044	LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence 2508.16571 LLM agents for drug due diligence用代理检索并抽取适应症竞品药物信息以支持资产尽调。	cs.AI	Vlad Vinogradov (Optic Inc), Alisa Vinogradova (AI Expert), Dmitrii Radkevich (Optic Inc), Ilya Yasny (Optic Inc), Dmitry Kobyzev (Optic Inc)	In this paper, we describe and benchmark a competitor-discovery component used within an agentic AI system for fast drug asset due diligence. A competitor-discovery AI agent, given an indication, retrieves all drugs comprising the competitive landscape of that... In this paper, we describe and benchmark a competitor-discovery component used within an agentic AI system for fast drug asset due diligence. A competitor-discovery AI agent, given an indication, retrieves all drugs comprising the competitive landscape of that indication and extracts canonical attributes for these drugs. The competitor definition is investor-specific, and data is paywalled/licensed, fragmented across registries, ontology-mismatched by indication, alias-heavy for drug names, mult...
1045	BEAVER: An Efficient Deterministic LLM Verifier 2512.05439 Deterministic LLM safety verification提出BEAVER计算LLM满足安全性质的确定性且可靠概率界。	cs.AI	Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh	As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify model outputs and characterize tail risk for safe deployment. While sampling-based estimates provide an ad-hoc intuit... As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify model outputs and characterize tail risk for safe deployment. While sampling-based estimates provide an ad-hoc intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt & any safety property, BE...
1046	AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management 2512.10371 Program-guided GUI agent context用程序引导的上下文管理减少长程GUI代理历史开销并保语义。	cs.AI	Shizuo Tian, Hao Wen, Yuxuan Chen, Jiacheng Liu, Shanhui Zhao	The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on ever-expanding interaction history incurs substantial con... The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach for agent context management that...
1047	TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent 2601.18700 Tool-augmented emotional support benchmark发布TEA-Bench评测可调用工具的情感支持对话代理的可信性。	cs.AI	Xingyu Sui, Yanyan Zhao, Yulin Hu, Jiahe Guo, Weixiang Zhao	Emotional Support Conversation requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how... Emotional Support Conversation requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenar...
1048	THINKSAFE: Self-Generated Safety Alignment for Reasoning Models 2601.23143 Self-generated safety alignment提出ThinkSafe用自生成数据对推理模型进行安全对齐而少损性能。	cs.AI	Seanie Lee, Sangwoo Park, Yumin Choi, Gyeongman Kim, Minki Kang	Large reasoning models (LRMs) achieve remarkable performance by leveraging reinforcement learning (RL) on reasoning tasks to generate long chain-of-thought (CoT) reasoning. However, this over-optimization often prioritizes compliance, making models vulnerable ... Large reasoning models (LRMs) achieve remarkable performance by leveraging reinforcement learning (RL) on reasoning tasks to generate long chain-of-thought (CoT) reasoning. However, this over-optimization often prioritizes compliance, making models vulnerable to harmful prompts. To mitigate this safety degradation, recent approaches rely on external teacher distillation, yet this introduces a distributional discrepancy that degrades native reasoning. We propose ThinkSafe, a self-generated alignm...
1049	Supervised sparse auto-encoders for interpretable and compositional representations 2602.00924 Supervised sparse autoencoders通过监督与新优化框架训练可解释且可组合的稀疏自编码特征。	cs.AI	Ouns El Harzli, Hugo Wallner, Yoonsoo Nam, Haixuan Xavier Tao	Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between... Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models-a mathematical framework from neural collapse theory-and by supervising the task. We supervise (decoder-only) SAEs t...
1050	WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning 2602.12852 Web agent trajectory pruning用图建模与剪枝压缩网页代理轨迹以提升信息检索搜索效率。	cs.AI	Junjie Wang, Zequn Xie, Dan Yang, Jie Feng, Yue Shen	Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajector... Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent's search process as...
1051	Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence 2602.15019 Drug asset scouting agents提出广域搜索AI代理从多语渠道挖掘药物资产情报。	cs.AI	Vlad Vinogradov, Alisa Vinogradova, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev	Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels. Recent data suggests over 85% of patent filings originate outside the U.S., with China accou... Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels. Recent data suggests over 85% of patent filings originate outside the U.S., with China accounting for nearly half of the global total; a growing share of scholarly output is also non-U.S. Industry estimates put China at 30% of global drug development, spanning 1,200+ novel candidates. In this high-stakes environment, failing to su...
1052	ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices 2602.21858 Proactive mobile agent benchmark构建ProactiveMobile基准评测手机端主动智能能力。	cs.AI	Dezhi Kong, Zhengzhao Feng, Qiliang Liang, Hao Wang, Haofei Sun	Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive ... Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complex...
1053	Making AI Evaluation Deployment Relevant Through Context Specification 2603.06811 Context-aware AI evaluation提出情境规格化流程使AI评估更贴近部署现实。	cs.AI	Matthew Holmes, Thiago Lacerda, Reva Schwartz	With many organizations struggling to gain value from AI deployments, pressure to evaluate AI in an informed manner has intensified. Status quo AI evaluation approaches often mask the operational realities that ultimately determine deployment success, making i... With many organizations struggling to gain value from AI deployments, pressure to evaluate AI in an informed manner has intensified. Status quo AI evaluation approaches often mask the operational realities that ultimately determine deployment success, making it difficult for organizational decision makers to know whether and how AI tools will deliver durable value. We introduce and describe context specification as a process to support and inform this decision making process. Context specificati...
1054	MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants 2603.09652 Interactive HTML assistant benchmark提出MiniAppBench评测LLM生成交互式HTML小应用能力。	cs.AI	Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo	With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps. These applications require models to not only re... With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps. These applications require models to not only render visual interfaces but also construct customized interaction logic that adheres to real-world principles. However, existing benchmarks primarily focus on algorithmic correctness or static layout reconstruction, failing to capture the ca...
1055	Emergent social transmission of model-based representations without inference 2604.05777 Social transmission in RL用强化学习模拟展示简单社交线索可传递高层表征。	cs.AI	Silja Keßler, Miriam Bautista-Salinero, Claudio Tennie, Charley M. Wu	How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive capacity? Humans are often thought to rely on computationally costly mentalizing, such as inferring others' beliefs. In contrast, cultural evolution em... How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive capacity? Humans are often thought to rely on computationally costly mentalizing, such as inferring others' beliefs. In contrast, cultural evolution emphasizes that behavioral transmission can be supported by simple social cues. Using reinforcement learning simulations, we show how minimal social learning can indirectly transmit higher-level representations. We simulate a naïve agent se...
1056	LACE: Lattice Attention for Cross-thread Exploration 2604.15529 Cross-thread attention reasoning提出LACE用跨线程注意力让并行推理路径共享纠错。	cs.AI	Yang Li, Zirui Zhang, Yang Liu, Chengzhi Mao	Current large language models reason in isolation. Although it is common to sample multiple reasoning paths in parallel, these trajectories do not interact, and often fail in the same redundant ways. We introduce LACE, a framework that transforms reasoning fro... Current large language models reason in isolation. Although it is common to sample multiple reasoning paths in parallel, these trajectories do not interact, and often fail in the same redundant ways. We introduce LACE, a framework that transforms reasoning from a collection of independent trials into a coordinated, parallel process. By repurposing the model architecture to enable cross-thread attention, LACE allows concurrent reasoning paths to share intermediate insights and correct one another...
1057	Harnessing Pre-Resolution Signals for Future Prediction Agents 2604.15719 Forecasting with evolving evidence利用未决问题的时序信号训练更好的未来预测代理。	cs.AI	Chuyang Wei, Maohang Gao, Zhixin Han, Kefei Chen, Yu Zhuang	Many high-stakes decisions depend on forecasts made before outcomes are known. In this future prediction setting, the central challenge is that public evidence evolves over time, while the main supervision signal arrives only after resolution: the realized out... Many high-stakes decisions depend on forecasts made before outcomes are known. In this future prediction setting, the central challenge is that public evidence evolves over time, while the main supervision signal arrives only after resolution: the realized outcome mainly assesses final correctness, offering only coarse guidance on what to track, what to verify, and which judgments to leave uncertain along the way. Our key observation is that revisiting the same unresolved question over time crea...
1058	GamED.AI: A Hierarchical Multi-Agent Framework for Automated Educational Game Generation 2604.23947 Educational game generation agents用分层多代理将题目自动生成可玩且可验证的教学游戏。	cs.AI	Shiven Agarwal, Yash Shah, Ashish Raj Shekhar, Priyanuj Bordoloi, Vivek Gupta	We introduce GamEDAI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic contracts. Built on phase-based LangGraph sub-graphs, deter... We introduce GamEDAI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic contracts. Built on phase-based LangGraph sub-graphs, deterministic Quality Gates, and structured Pydantic schemas, GamEDAI supports two template families encompassing 15 interaction mechanics across spatial reasoning, procedural execution, and higher-order Bloom's Taxonomy objectives. Evaluated on...
1059	AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning 2605.00425 Entropy modulation for agentic RL提出自适应熵调制以改进多轮代理强化学习的信用分配。	cs.AI	Haotian Zhao, Songlin Zhou, Yuxin Zhang, Stephen S.-T. Yau, Wenyu Zhang	Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging: sparse outcome-only rewards provide limited gui... Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging: sparse outcome-only rewards provide limited guidance for assigning credit to individual steps within long interaction trajectories. Existing approaches often introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, which increases sup...
1060	Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization 2605.01482 Causal multi-hop fact verification用结构因果模型与GRPO约束多跳事实核验推理链。	cs.AI	Yunhan Bu, Quan Zhang, Huaping Zhang, Guotong Geng, Chunxiao Gao	Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving t... Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought (CoT), lack explicit modeling of the causal dependencies between evidence and claims. In this work, we introduce a novel framework that grounds reasoning in a Structural Causal Model (SCM), treating verifica...
1061	Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems 2605.01758 Jailbreak defense in MAS提出前瞻引导防御抑制多代理系统中的感染式越狱传播。	cs.AI	Yue Ma, Ziyuan Yang, Yi Zhang	Large multimodal model-based Multi-Agent Systems (MASs) enable collaborative complex problem solving through specialized agents. However, MASs are vulnerable to infectious jailbreak, where compromising a single agent can spread to others, leading to widespread... Large multimodal model-based Multi-Agent Systems (MASs) enable collaborative complex problem solving through specialized agents. However, MASs are vulnerable to infectious jailbreak, where compromising a single agent can spread to others, leading to widespread compromise. Existing defenses counter this by training a more contagious cure factor, biasing agents to retrieve it over virus adversarial examples (VirAEs). However, this homogenizes agent responses, providing only superficial suppression...
1062	Computing Thiele Rules on Interval Elections and their Generalizations 2605.03067 Thiele rules on interval elections研究区间偏好下Thiele类委员会规则的计算与推广。	cs.AI	Dimitris Avramidis, Alexandra Lassota, Ulrike Schmidt-Kraepelin, Adrian Vetta	Approval-based committee voting has received significant attention in the social choice community. Among the studied rules, Thiele rules, and especially Proportional Approval Voting (PAV), stand out for desirable properties such as proportional representation,... Approval-based committee voting has received significant attention in the social choice community. Among the studied rules, Thiele rules, and especially Proportional Approval Voting (PAV), stand out for desirable properties such as proportional representation, Pareto optimality, and support monotonicity. Their main drawback is that computing a Thiele outcome is NP-hard in general. A glimpse of hope comes from the fact that Thiele rules are better behaved under structured preferences. On the cand...
1063	Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages 2605.05558 Economics of agent wages提出算力锚定工资模型解释代理时代认知劳动定价。	cs.AI	Siqi Zhu	A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied highly elastically, placing downward pressure on cognitive-labor wages when it closely substitutes for human l... A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied highly elastically, placing downward pressure on cognitive-labor wages when it closely substitutes for human labor. We argue this framing is wrong in mechanism but partially correct in conclusion, and that the correction matters for both theory and policy. Agents are not labor; they are a production technology that converts compute capital ...
1064	MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System 2605.05949 Multi-agent coding workflow提出MAS-Algorithm流程用多代理协作解算法编程题。	cs.AI	Yuliang Xu, Xiang Xu, Yao Wan, Hu Wei, Tong Jia	Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-c... Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods leveraging external tools or prompting techniques (e.g., chain-of-thought) are often fragmente...
1065	Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing 2605.05958 Debiased knowledge tracing提出时间平滑的双重稳健学习以消除知识追踪选择偏差。	cs.AI	Peilin Zhan, Wei Chen, Weilin Chen, Shuyi Pan, Ruichu Cai	Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing ... Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulati...
1066	CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs 2605.06115 Cross-cultural knowledge insertion benchmark构建CrossCult-KIBench评测多模态模型跨文化知识注入。	cs.AI	Zhen Zeng, Leijiang Gu, Feng Li, Jing Yu, Zenglin Shi	Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion,... Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we introduce CrossCult-KIBench, a comprehensive evaluation benchmark for assess...
1067	Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning 2605.06130 Skill library co-evolution RL提出Skill1用单一策略联合学习技能选用与蒸馏进化。	cs.AI	Yaorui Shi, Yuxin Chen, Zhengxi Lu, Yuchun Miao, Shugui Liu	A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from ... A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or with separate reward sources, resulting in partial and conflicting evolution. We propose Skill1, a framework that trains a single policy to co-evolve skill selection, ...
1068	Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries 2605.06223 Clarifying instance navigation用比较判断主动提问以消解含糊查询的实例导航目标。	cs.AI	Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok	Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target fro... Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collec...
1069	Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence 2605.06230 Trustworthy agent training infrastructure提出Safactory闭环基础设施用于训练与评测可信自主代理。	cs.AI	Xinquan Chen, Zhenyun Yin, Shan He, Bin Huang, Shanzhe Lei	As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data ... As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelli...
1070	ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning 2605.06483 Natural language to STL用工具增强与过程奖励学习将自然语言需求翻译为STL。	cs.AI	Bowen Ye, Zhijian Li, Junyue Huang, Junkai Ma, Xiang Yin	Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practi... Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic experti...
1071	Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning 2410.06347 Goal-conditioned decision transformer提出目标条件决策Transformer用于离线多目标机器人强化学习。	cs.AI	Paweł Gajewski, Dominik Żurek, Marcin Pietroń, Kamil Faber	Reinforcement learning (RL) in robotics faces significant hurdles regarding sample efficiency and generalization across varying goals. While Offline RL mitigates the need for costly online interactions, its integration with goal-conditioned policies and transf... Reinforcement learning (RL) in robotics faces significant hurdles regarding sample efficiency and generalization across varying goals. While Offline RL mitigates the need for costly online interactions, its integration with goal-conditioned policies and transformer-based architectures remains underexplored. We introduce a Goal-Conditioned Decision Transformer adapted for offline multi-goal robotics. By explicitly incorporating goal states into the sequence modeling framework, our approach effici...
1072	UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios 2410.06355 Multimodal command understanding for robots提出UNCOM融合语音手势与场景实现零样本桌面指令理解。	cs.AI	Antonio Galiza Cerdeira Gonzalez, Paweł Gajewski, Bipin Indurkhya	This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information -- speech, gestures, and scene context -- to extract structured, actionable instructions fo... This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information -- speech, gestures, and scene context -- to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using fou...
1073	Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points 2412.11194 Vulnerability detection survey综述自动化漏洞检测研究并系统梳理关键痛点与挑战。	cs.AI	Dan Ristea, Shae McFadden, Ezzeldin Shereen, Madeleine Dwyer, Sanyam Vyas	Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of researc... Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of research has applied machine learning to automate vulnerability detection (ML4AVD), yet self-reported performance on the most popular datasets shows no clear upward trend. The ML4AVD research community has identified several flaws...
1074	Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids 2510.02371 Federated smart grid attack detection提出联邦时空图学习检测智能电网被动窃听攻击。	cs.AI	Bochra Al Agha, Razane Tajeddine	Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to ... Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph...
1075	Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies 2510.22944 Prompt-induced code insecurity量化糟糕提示词诱发代码缺陷率并给出安全缓解策略。	cs.AI	Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang	Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models... Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quali...
1076	Continually Evolving Skill Knowledge in Vision Language Action Model 2511.18085 Continual learning for VLA提出无增参的持续模仿学习框架让VLA模型持续积累技能。	cs.AI	Yuxuan Wu, Guangming Wang, Zhiheng Yang, Tianchen Deng, Maoqing Yao	Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additi... Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additional parameters or external modules, limiting scalability for large VLA models. We propose Stellar VLA, a knowledge-driven CIL framework without increasing network parameters. Two progressively extended variants are designed: T-Stellar for ...
1077	Switching-time bioprocess control with pulse-width-modulated optogenetics 2511.22893 Optogenetic bioprocess control用PWM光遗传控制优化生物过程切换时刻与产量。	cs.AI	Sebasti\'an Espinel-R\'ios	Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control an... Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control and regulation of cell growth. Optogenetic systems can be actuated by light intensity. However, relying solely on intensity-driven control (i.e., signal amplitude) may fail to properly tune optogenetic bioprocesses when the dose-response rela...
1078	Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies 2512.09682 MARL scaling UAV relay task构建无人机群一次性数据投递任务用于MARL规模化研究。	cs.AI	Mika Persson, Jonas Lidman, Jacob Ljungberg, Samuel Sandelius, Adam Andersson	This work studies the application of Multi-Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed ... This work studies the application of Multi-Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed which restricts agent motion and applies Dijkstra's shortest path algorithm. Computational experiment results show that two off-the-shelf MARL algorithms perform competitively w...
1079	PerfCoder: Large Language Models for Interpretable Code Performance Optimization 2512.14018 LLM code performance optimization提出PerfCoder以可解释监督指导LLM生成高性能优化代码。	cs.AI	Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel	Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current LLMs struggle not only... Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current LLMs struggle not only due to data scarcity but, more importantly, because they lack supervision that guides interpretable and effective performance improvements. In this work, we introduce PerfCoder, a family of LLMs specifically designed to generate performanc...
1080	Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing 2601.15356 High-resolution IQA agent probing提出Q-Probe用上下文感知代理探测实现高分辨率图像质量评估。	cs.AI	Xiang Li, Xueheng Li, Yu Wang, Xuanhua He, Zhangchi Hu	Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to... Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to capture subtle local degradations in high-resolution scenarios. While emerging "Thinking with Images" paradigms enable multi-scale visual perception via zoom-in mechanisms, their direct adaptation to IQA induces spurious "cropping-implies-...
1081	Replicating Human Motivated Reasoning Studies with LLMs 2601.16130 LLM Motivated Reasoning Replication复现政治动机推理实验，发现基础LLM不呈现人类式偏差。	cs.AI	Neeley Pate, Adiba Mahbub Proma, Hangfeng He, James N. Druckman, Daniel C. Molden	Motivated reasoning - the idea that individuals processing information may be motivated to either arrive at accurate beliefs or arrive at desired conclusions - has been well-explored as a human phenomenon. However, it remains unclear whether base LLMs are affe... Motivated reasoning - the idea that individuals processing information may be motivated to either arrive at accurate beliefs or arrive at desired conclusions - has been well-explored as a human phenomenon. However, it remains unclear whether base LLMs are affected by motivational manipulations. Replicating 4 prior political motivated reasoning studies, we find that base LLM behavior does not align with expected human behavior. Furthermore, base LLM behavior across models shares some similarities...
1082	MirrorMark: Generalizable Mirrored Sampling for Multi-bit LLM Watermarking 2601.22246 Multi-bit LLM Watermarking提出MirrorMark镜像采样，实现低失真多比特文本水印。	cs.AI	Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar, Kai Zeng	As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but most existing methods either provide only... As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but most existing methods either provide only binary signals or achieve multi-bit embedding by distorting the generation distribution. We propose MirrorMark, a generalizable mapping-centric approach for multi-bit LLM watermarking. MirrorMark separates the symbol mapping rule from the ...
1083	Spectral Filtering for Complex Linear Dynamical Systems 2601.22400 Spectral Filtering for Dynamical Systems用Slepian谱滤波学习复值线性系统并给出无维度遗憾界。	cs.AI	Elad Hazan, Annie Marsden	We study the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum. This class captures oscillatory and long-memory dynamics arising in signal processing, structured state space models, and quantum systems. We introduc... We study the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum. This class captures oscillatory and long-memory dynamics arising in signal processing, structured state space models, and quantum systems. We introduce a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension independent of the ambient state dimension. As a consequence, we obtain dimension-free regret bounds for sequence pred...
1084	Latent-Space Causal Discovery from Indirect Neuroimaging Observations 2602.09034 Latent Causal Discovery in Neuroimaging在成像物理与非平稳假设下，从间接观测恢复潜在因果结构。	cs.AI	Sangyoon Bae, Miruna Oprescu, David Keetae Park, Shinjae Yoo, Jiook Cha	Neuroimaging does not observe causal variables directly: hemodynamics and volume conduction distort signals so that statistical dependence need not reflect latent neural influence. Before estimating graphs, one must specify under what assumptions delayed direc... Neuroimaging does not observe causal variables directly: hemodynamics and volume conduction distort signals so that statistical dependence need not reflect latent neural influence. Before estimating graphs, one must specify under what assumptions delayed directed structure can be studied from such indirect observations. We formalize a conditional setting - recoverable inversion under modality physics together with nonstationary latent dynamics - and derive an inversion-error propagation bound un...
1085	Discovering Multiagent Learning Algorithms with Large Language Models 2602.16928 LLM-Driven MARL Algorithm Discovery用LLM进化式编码代理自动搜索并改进多智能体学习算法。	cs.AI	Zun Li, John Schultz, Daniel Hennes, Marc Lanctot	Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (... Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (LLMs) have emerged as powerful tools to automate this discovery process. In this work, we deploy one of such agentic frameworks, AlphaEvolve, to navigate the design spaces of two distinct game-theoretic paradigms: counterfactual regret mini...
1086	Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response 2603.02274 Neuro-Symbolic Drug Response Modeling提出可逆世界模型结合LLM推理，解释并预测结直肠癌药物反应。	cs.AI	Christopher Baker, Karen Rafferty, Hui Wang	Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails to provide the mecha... Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails to provide the mechanistic clarity required for clinical adoption. We present the Contextual Invertible World Model (CIWM), a Neuro-Symbolic Agentic Framework that bridges this gap by integrating a quantitative machine learning emulator with an LLM-based reaso...
1087	Shaping the Future of Mathematics in the Age of AI 2603.24914 AI Impact on Mathematics讨论AI对数学价值、教学与伦理的影响并提出社区建议。	cs.AI	Johan Commelin, Mateja Jamnik, Rodrigo Ochigame, Lenny Taelman, Akshay Venkatesh	Artificial intelligence is transforming mathematics at a speed and scale that demand active engagement from the mathematical community. We examine five areas where this transformation is particularly pressing: values, practice, teaching, technology, and ethics... Artificial intelligence is transforming mathematics at a speed and scale that demand active engagement from the mathematical community. We examine five areas where this transformation is particularly pressing: values, practice, teaching, technology, and ethics. We offer recommendations on safeguarding our intellectual autonomy, rethinking our practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles - with the aim of ensuring that the...
1088	Muon Dynamics as a Spectral Wasserstein Flow 2604.04891 Spectral Wasserstein Optimization Dynamics将谱归一化优化刻画为谱Wasserstein流，分析Muon类动态稳定性。	cs.AI	Gabriel Peyr\'e	Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum versio... Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $\gamma$ on positive ...
1089	Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study 2604.08059 AI Component Lifecycle Governance提出AI组件演化的兼容性检查、监控与回滚治理框架。	cs.AI	Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li	Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment condit... Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions, with what monitoring, and when it should be rolled back. Existing software-deployment patterns (canary, blue-green, feature flags, MLOps pipelines) address parts of this loop but were designed for stateless web services rather than st...
1090	Code World Model Preparedness Report 2605.00932 Frontier Model Preparedness Evaluation对Meta代码世界模型做发布前风险与失配倾向评估并给出结论。	cs.AI	Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd	This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catas... This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned propensities. Our assessment found that CWM does not pose additional frontier risks beyond those present in the current AI ecosystem. We therefore release it as an open-weight model.
1091	Beyond Retrieval: A Multitask Benchmark and Model for Code Search 2605.04615 Code Retrieval and Reranking Benchmark发布CoREB多任务代码检索重排基准并训练高质量重排模型。	cs.AI	Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu	Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary re... Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retri...
1092	LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts 2605.05110 Line-Guided Robot Reinforcement Learning用线条引导与稀疏姿态约束，让自行车机器人无示范学会特技。	cs.AI	Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha	Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guid... Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physi...
1093	How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study 2605.05340 VLM Physical-World Privacy Awareness构建评测并实证分析VLM在真实物理场景中的隐私意识能力。	cs.AI	Junran Wang, Xinjie Shen, Zehao Jin, Pan Li	As Vision-Language Models (VLMs) are increasingly deployed as autonomous cognitive cores for embodied assistants, evaluating their privacy awareness in physical environments becomes critical. Unlike digital chatbots, these agents operate in intimate spaces, su... As Vision-Language Models (VLMs) are increasingly deployed as autonomous cognitive cores for embodied assistants, evaluating their privacy awareness in physical environments becomes critical. Unlike digital chatbots, these agents operate in intimate spaces, such as homes and hospitals, where they possess the physical agency to observe and manipulate privacy-sensitive information and artifacts. However, current benchmarks remain limited to unimodal, text-based representations that cannot capture ...
1094	Towards an Inferentialist Account of Information Through Proof-theoretic Semantics 2605.05368 Proof-Theoretic Semantics of Information以证明论语义提出信息的推理主义基础框架与关键组成。	cs.AI	Matthew Collinson, Timo Eckhardt, David Pym	Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools f... Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting component...
1095	AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents 2605.06607 AI Agents for CFD Discovery提出面向CFD仿真的物理感知AI科学家代理，支持开放式发现流程。	cs.AI	Nithin Somasekharan, Rabi Pathak, Manushri Dhanakoti, Tingwen Zhang, Ling Yue	Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completio... Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge,...
cs.CL 211 papers
281	Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas 2605.06673 LLM Metacognitive Monitoring Atlas构建33模型域级元认知图谱，量化不同MMLU领域的置信度校准差异。	cs.CL cs.LG cs.AI	Jon-Paul Cacioli	Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC p... Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC per model-domain cell using verbalized confidence (0-100). Total observations: 47,151. Every model with above-chance aggregate monitoring showed non-trivial domain-level variation. Applied/Professional knowledge was reliably the easiest benc...
282	VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing 2605.06765 Expressive Spoken Language Model提出端到端语音语言模型，支持角色扮演式表达与歌唱生成。	cs.CL cs.AI	Jiacheng Xu, Heting Gao, Liufei Xie, Zhenchuan Yang, Lijiang Li	Human speech conveys expressiveness beyond linguistic content, including personality, mood, or performance elements, such as a comforting tone or humming a song, which we formalize as role-playing and singing. We present VITA-QinYu, the first expressive end-to... Human speech conveys expressiveness beyond linguistic content, including personality, mood, or performance elements, such as a comforting tone or humming a song, which we formalize as role-playing and singing. We present VITA-QinYu, the first expressive end-to-end (E2E) spoken language model (SLM) that goes beyond natural conversation to support both role-playing and singing generation. VITA-QinYu adopts a hybrid speech-text paradigm that extends interleaved text-audio modeling with multi-codebo...
283	IntentGrasp: A Comprehensive Benchmark for Intent Understanding 2605.06832 Intent Understanding Benchmark发布IntentGrasp基准，统一多语料格式评测LLM意图理解能力。	cs.CL cs.LG cs.AI	Yuwei Yin, Chuyuan Li, Giuseppe Carenini	Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper introduces IntentGrasp, a comprehensive benchmark for evaluating the intent understanding ca... Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper introduces IntentGrasp, a comprehensive benchmark for evaluating the intent understanding capability of LLMs. Derived from 49 high-quality, open-licensed corpora spanning 12 diverse domains, IntentGrasp is constructed through source datasets curation, intent label contextualization, and task format unification. IntentGrasp contain...
284	TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP 2605.06886 Tajik-Persian Lexical Resource构建塔吉克-波斯跨文字词汇资源，并比较混合、神经与检索方法。	cs.CL	Mullosharaf K. Arabov	This work introduces TajPersLexon, a curated Tajik--Persian parallel lexical resource of 40,112 word and short-phrase pairs for cross-script lexical retrieval, transliteration, and alignment in low-resource settings. We conduct a comprehensive CPU-only benchma... This work introduces TajPersLexon, a curated Tajik--Persian parallel lexical resource of 40,112 word and short-phrase pairs for cross-script lexical retrieval, transliteration, and alignment in low-resource settings. We conduct a comprehensive CPU-only benchmark comparing three methodological families: (i) a lightweight hybrid pipeline, (ii) neural sequence-to-sequence models, and (iii) retrieval methods. Our evaluation establishes that the task is essentially solvable, with neural and retrieval...
285	MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes 2605.06897 Speech Tool-Calling Smart Home提出MIST多模态语音工具调用助手，建模智能家居状态与时空约束交互。	cs.CL cs.AI cs.SD eess.AS cs.MM	Maximillian Chen, Xuanming Zhang, Michael Peng, Zhou Yu, Alexandros Papangelis	The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large Language Models (LLMs) already demonstrate strong tool-usage capabilities, modeling real-wor... The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large Language Models (LLMs) already demonstrate strong tool-usage capabilities, modeling real-world IoT devices presents a difficult, understudied challenge which combines modeling spatiotemporal constraints with speech inputs, dynamic state tracking, and mixed-initiative interaction patterns. We introduce MIST (the Multimodal Interact...
286	Reflections and New Directions for Human-Centered Large Language Models 2605.06901 Human-Centered LLM Framework提出人本LLM开发框架，指导评测与部署以对齐人类价值与需求。	cs.CL	Caleb Ziems, Dora Zhao, Rose E. Wang, Matthew J\"orke, Ahmad Rushdi	Large Language Models (LLMs) are increasingly shaping the private and professional lives of users, with numerous applications in business, education, finance, healthcare, law, and science. With this rise in global influence comes greater urgency to build, eval... Large Language Models (LLMs) are increasingly shaping the private and professional lives of users, with numerous applications in business, education, finance, healthcare, law, and science. With this rise in global influence comes greater urgency to build, evaluate, and deploy these systems in a manner that prioritizes not only technical capabilities but also human priorities. This work presents a framework for developing Human-Centered Large Language Models (HCLLMs), which integrates perspective...
287	MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text 2605.06903 AI-Generated Text Detection提出多任务均衡学习检测器，提升AI文本检测的鲁棒性与低误报表现。	cs.CL cs.AI	Chenjun Li, Cheng Wan, Johannes C. Paetzold	Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high agg... Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high aggregate AUROC on clean, in-distribution human and AI text: it should remain robust to attacks and adversarial rewrites, transfer to unseen generators and domains, and operate at low false-positive rates (FPR). Most existing detectors optimiz...
288	Can LLMs Take Retrieved Information with a Grain of Salt? 2605.06919 RAG Context Certainty Obedience评测LLM对检索证据不确定性的服从度，分析其在高风险场景的局限。	cs.CL	Behzad Shayegh, Mohamed Osama Ahmed, Fred Tung, Leo Feng	Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. It is a limitation with real conse... Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. It is a limitation with real consequences in high-stakes domains like medicine and finance. We evaluate eight LLMs on their context-certainty obedience, measuring how well they adjust responses to match expressed context certainty. Our analysis reveals systematic limitation...
289	MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media 2605.06940 Bengali LLM Annotation Benchmark发布孟加拉社媒多维标注基准，诊断闭集指令导致的标签塌缩问题。	cs.CL	Souvik Pramanik, S. M. Riaz Rahman Antu, Shak Mohammad Abyad, Md. Ibrahim Khalil, Md. Shahriar Hussain	Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in low-resource languages has not been well studied. We present MultiSoc-4D, a Bengali social me... Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in low-resource languages has not been well studied. We present MultiSoc-4D, a Bengali social media dataset benchmark, which contains 58K+ social media comments from six sources annotated along four dimensions: category, sentiment, hate speech, and sarcasm. By employing a structured pipeline where ChatGPT, Gemini, Claude, and Grok ind...
290	Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries 2605.06978 Group-Structured Skill Retrieval提出分组结构化技能检索，将技能库结果组织为可执行入口与支持关系。	cs.CL cs.AI	Kun Zeng, Yu Huo, Siyu Zhang, Zi Ye, Yuecheng Zhuo	Skill-augmented agents increasingly rely on large reusable skill libraries, but retrieving relevant skills is not the same as presenting usable context. Existing methods typically return atomic skills or dependency-aware bundles whose internal roles remain imp... Skill-augmented agents increasingly rely on large reusable skill libraries, but retrieving relevant skills is not the same as presenting usable context. Existing methods typically return atomic skills or dependency-aware bundles whose internal roles remain implicit, leaving the agent to infer the execution entry point, support skills, visible requirements, and failure-avoidance guidance. We introduce Group of Skills (GoSkills), an inference-time group-structured retrieval method that changes the...
291	Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion 2605.07013 Bitstream Diffusion Language Modeling: Proposes an entropy-gated continuous bitstream diffusion language model that narrows the quality gap with autoregressive generation.	cs.CL	Georgios Batzolis, Mark Girolami, Luca Ambrogioni	Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in sample quality and diversity. Recent continuous flow and diffusion approaches over token embeddings have narrowed this gap, suggesting continuous state spaces are highly effective for language. In this work, we further close the autoregressive gap by modeling text as a continuous diffusion process over fixed-width binary bitstreams...
292	Cognitive Agent Compilation for Explicit Problem Solver Modeling 2605.07040 Cognitive Agent Compilation: Compiles LLM behavior into explicit, editable problem-solver models for controllability and interpretability in educational settings.	cs.CL, cs.AI	Hyeongdon Moon, Carolyn Rosé, John Stamper	Large language models (LLMs) are widely used for tutoring, feedback generation, and content creation, but their broad pretraining makes them hard to constrain and poor substitutes for controllable learners. Educational systems often require inspectable and editable knowledge states: educators want to know what a system assumes the learner knows, and learners benefit when the system can justify actions in terms of explicit skills, misconceptions, and strategies. Inspired by cognitive architecture...
293	NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models 2605.07051 Science Math Riddles Benchmark: Proposes a benchmark of science and mathematics riddles that evaluates LLM educational reasoning through open-ended question answering.	cs.CL	George Boateng, Naafi Ibrahim, Samuel John, Philemon Badu, Patrick Agyeman-Budu	Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics education. Yet, LLMs tend to be evaluated on science and mathematical educational datasets from the Western world, with an underrepresentation of datasets from the Global South. Furthermore, they tend to have multiple-choice answer options that are trivial to evaluate. In this work, we present NSMQ Riddles, a novel benchmark of Scien...
294	GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations 2605.07053 Semantic Variant Benchmark Augmentation: Proposes the GSM-SEM framework, which generates semantically diverse math-problem variants to assess robustness and guard against memorization.	cs.CL, cs.AI	Jyotika Singh, Fang Tu, Aziza Mirzadova, Amit Agarwal, Hitesh Laxmichand Patel	Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most robustness variants apply surface-level perturbations (paraphrases, renamings, number swaps, distractors) that largely preserve the underlying facts, and static releases can themselves become memorization targets over time. We introduce GSM-SEM, a reusable and stochastic framework for generating semantically diverse benchmark varia...
295	MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments 2605.07058 Clinical Diagnosis LLM Agent: Trains medical LLM agents to ask questions, order examinations, and diagnose in noisy environments, approximating real clinical workflows.	cs.CL, cs.AI	Yicheng Gao, Xiaolin Zhou, Yahan Li, Yue Zhao, Ruishan Liu	Real-world clinical diagnosis is a complex process in which the doctor is required to obtain information from both interaction with the patient and conducting medical exams. Additionally, the doctor needs to adapt to different patient personas, as well as noisy and incomplete information that can happen at any time during the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process by reducing it to single-turn question answering, n...
296	WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems 2605.07068 LLM Wiki Knowledge Compilation: Proposes WiCER, an iterative compile-evaluate-refine pipeline that reduces information loss when LLMs distill documents into a wiki.	cs.CL, cs.AI	Juan M. Huerta	The LLM Wiki pattern, to compile and provide domain knowledge into a persistent artifact and serve it to LLMs via KV cache inference, promises context access at sub-second latency with zero retrieval failure. Realizing this requires solving the compilation gap: LLM compilation distilling raw documents into a wiki without catastrophically discarding critical facts. We characterize this gap across 17 RepLiQA domains (6,800 questions): we observe that full context KV cache inference outperforms RAG...
297	Self-Consolidating Language Models: Continual Knowledge Incorporation from Context 2605.07076 Continual Context Consolidation: Proposes SCoL, a post-training method that continually writes long-context knowledge into model weights while suppressing interference and forgetting.	cs.CL, cs.LG	Zekun Wang, Anant Gupta, Zihan Dong, Christopher J. MacLellan	Large language models (LLMs) increasingly receive information as streams of passages, conversations, and long-context workflows. While longer context windows expose more evidence, they do not ensure that useful information is preserved and reused. We study continual context consolidation: writing current context into model weights while limiting interference with previously consolidated information. We propose Self-Consolidating Language Models (SCoL), a post-training ...
298	Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation 2605.07084 ASR Evaluation Epistemic Injustice: Critiques single-ground-truth transcript evaluation in ASR and shows how differing transcription conventions produce epistemic injustice.	cs.CL	Anna Seo Gyeong Choi, Maria Teleki, James Caverlee, Miguel del Rio, Corey Miller	Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But ground truth transcripts are not discovered - they are produced by human annotators following conventions that encode normative assumptions about which speech features matter. Different conventions (verbatim, non-verbatim, legal) produce different transcripts of identical speech and judge the same ASR output differently. This pape...
299	The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks 2605.07093 Translation Tax Audit Benchmarks: Performs a counterfactual audit of English-to-Chinese translated benchmarks, showing that translation cue inheritance is not a uniform scalar.	cs.CL, cs.LG, cs.AI	Zezheng Lin, Fengming Liu, Handi Li	The Translation Tax is often treated as a scalar: translated benchmarks are assumed to inflate scores by preserving English-source cues. We audit this claim in an English-to-Chinese setting. Three proxy estimators disagree: back-translation gaps are small and parser-fragile; cue-score calibration does not predict item-level gains; and a six-model native-control comparison shows model-family rather than uniform benchmark effects. We add a same-item LLM-naturalization stress test that holds answer...
300	SAGE: Hierarchical LLM-Based Literary Evaluation through Ontology-Grounded Interpretive Dimensions 2605.07102 LLM Literary Evaluation Ontology: Proposes SAGE, a hierarchical framework of ontology-grounded dimensions that evaluates literary quality through multi-round LLM reflection.	cs.CL	Tianyu Wang, Nianjun Zhou	Evaluating literary quality requires assessing interpretive dimensions such as cultural representation, emotional depth, and philosophical sophistication that resist straightforward computational measurement. We introduce SAGE, a hierarchical evaluation framework that decomposes literary quality into ontology-grounded interpretive dimensions assessed through structured large language model evaluation with multi-round iterative reflection and independent validation. We validate the framework on 1...
301	Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning 2605.07106 Latent Visual Reasoning: Proposes a spatial-semantic grounded latent visual reasoning framework to ease the information bottleneck.	cs.CL	Jin Cui, Xinyue Long, Xunyong Zhang, Yadong Zhang, Chuanchang Su	Multimodal Large Language Models (MLLMs) have made remarkable progress on vision-language reasoning, yet most methods still compress visual evidence into discrete textual thoughts, creating an information bottleneck for fine-grained perception. Recent latent visual reasoning methods attempt to reason in continuous hidden states, but we find that they suffer from insufficient manifold compatibility: latent trajectories drift away from pretrained reasoning circuits, collapse into instance-agnostic...
302	Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability 2605.07110 Secure Computer-Use Agents: Proposes a framework spanning architecture and lifecycle to improve the deployment reliability of computer-use agents.	cs.CL	Zejian Chen, Zhanyuan Liu, Chaozhuo Li, Mengxiang Han, Songyang Liu	Computer-use agents (CUAs) are moving from bounded benchmarks toward real software environments, where they operate browsers, desktops, mobile applications, filesystems, terminals, and tool backends. In such settings, reliability is no longer captured by task success alone: perception errors, planning drift, memory use, tool mediation, permission scope, and runtime oversight jointly determine whether agent actions remain aligned with user intent. Existing surveys organize the CUA landscape by methods, plat...
303	Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation 2605.07111 Optimizer Routing for Adaptation: Uses gradient guidance to route optimizers between LoRA and full fine-tuning for LLM adaptation.	cs.CL, cs.AI	Haozhan Tang, Xiuqi Zhu, Xinyin Zhang, Boxun Li, Virginia Smith	Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection, Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization. Through empirical evaluation across diverse tasks (SQL, Medical QA, and Counterfactual Knowledge) and varying language m...
304	Region4Web: Rethinking Observation Space Granularity for Web Agents 2605.07134 Web Agent Observation Regions: Shifts web-page observation from element level to functional-region level to improve web-agent perception and decision-making.	cs.CL, cs.AI	Donguk Kwon, Dongha Lee	Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-level signals at every step. We argue observation should instead operate at the granularity of functional regions, parts of the page that each serve a distinct purpose. We propose Regi...
305	Structural Rationale Distillation via Reasoning Space Compression 2605.07139 Rationale Distillation Compression: Distills more stable structured reasoning supervision by compressing and reusing a bank of reasoning paths.	cs.CL, cs.LG, cs.AI	Jialin Yang, Jiankun Wang, Jiajun Wu, Henry Leung, Jiayu Zhou	When distilling reasoning from large language models (LLMs) into smaller ones, teacher rationales for similar problems often vary wildly in structure and strategy. Like a chef who makes the same dish differently each time, this inconsistency burdens the student with noisy supervision that is hard to internalize. We propose Distillation through Reasoning Path Compression (D-RPC), which constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. ...
306	Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs 2605.07153 RL for Knowledge Recall: Shows that reinforcement learning with correctness-only rewards can improve closed-book factual recall in LLMs.	cs.CL	Wanli Yang, Hongyu Zang, Junwei Zhang, Wenjie Shi, Du Su	Reinforcement learning (RL) has achieved remarkable success in LLM reasoning, but whether it can also improve direct recall of parametric knowledge remains an open question. We study this question in a controlled zero-shot, one-hop, closed-book QA setting with no chain-of-thought, training only on binary correctness rewards and applying fact-level train-test deduplication to ensure gains reflect improved recall rather than reasoning or memorization. Across three model families and multiple factu...
307	CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization 2605.07162 Inference-Time Personalization: Uses a classifier to steer generation dynamically at inference time, enabling multi-preference personalization without repeated fine-tuning.	cs.CL	Jinyan Su, Jinpeng Zhou, Claire Cardie, Wen Sun	Personalized LLMs can significantly enhance user experiences by tailoring responses to preferences such as helpfulness, conciseness, and humor. However, fine-tuning models to address all possible combinations of user preferences is computationally expensive and impractical. In this paper, we introduce CLIPer (Classifier-guided Inference-time Personalization), a lightweight personalization approach that leverages a classifier model to steer LLM generation dynami...
308	Rethinking Experience Utilization in Self-Evolving Language Model Agents 2605.07164 Experience Use in Agents: Studies when and how self-evolving agents should draw on past experience on demand at runtime.	cs.CL	Weixiang Zhao, Yingshuo Wang, Yichen Zhang, Yanyan Zhao, Yu Zhang	Self-evolving agents improve by accumulating and reusing experience from past interactions. Existing work has largely focused on how experience is constructed, represented, and updated, while paying less attention to how experience should be used during runtime decision-making. As a result, most agents rely on rigid usage strategies, either injecting experience once at initialization or at every step, without considering whether it is needed for the current decision. This paper studies experienc...
309	A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU Framework 2605.07170 Chinese Metaphor Identification: Provides a reproducible multi-architecture baseline for token-level Chinese metaphor identification under MIPVU.	cs.CL	Yufeng Wu	Metaphor is pervasive in everyday language, yet token-level computational identification of metaphor-related words in Chinese under the MIPVU framework remains under-explored relative to English. This paper presents a reproducible multi-architecture baseline for token-level metaphor identification on the PSU Chinese Metaphor Corpus (PSU CMC), the only widely available MIPVU-annotated Chinese corpus. We systematically compare three model families: (i) encoder fine-tuning with Chinese RoBERTa-wwm-...
310	Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization 2605.07172 Topology-Aware LLM Alignment: Uses persistent homology to define a trajectory topology loss and a topological preference optimization objective that regularize alignment geometry.	cs.CL	Yurui Pan, Ke Xu, Bo Peng	Alignment of large language models (LLMs) via SFT and RLHF/DPO typically ignores the global geometry of the representation space, relying instead on local token likelihoods or scalar scores. We view generation as tracing a semantic trajectory in hidden space and propose a topology-enhanced alignment framework that regularizes these trajectories using 0-dimensional persistent homology. First, for SFT, we introduce Trajectory Topology Loss (TTL). Treating prompt and gold-answer embeddings as a mix...
311	Learning Agent Routing From Early Experience 2605.07180 Query Routing to Agents: Proposes a training-free routing method for cold-start settings that chooses between lightweight inference and full agent execution.	cs.CL	Yimin Wang, Jiahao Qiu, Xuan Qi, Xinzhe Juan, Jingzhe Shi	LLM agents achieve strong performance on complex reasoning tasks but incur high latency and compute cost. In practice, many queries fall within the capability boundary of cutting-edge LLMs and do not require full agent execution, making effective routing between LLMs and agents a key challenge. We study the problem of routing queries between lightweight LLM inference and full agent execution under realistic cold-start settings. To address this, we propose BoundaryRouter, a training-free routing ...
312	The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval 2605.07186 Robust IR under Corruption: Shows that word-boundary corruption causes U-shaped degradation in LLM retrieval and analyzes the causes.	cs.CL, cs.AI	Zekai Tong, Ruiyao Xu, Aryan Shrivastava, Chenhao Tan, Ari Holtzman	Existing Large Language Model (LLM) benchmarks primarily focus on syntactically correct inputs, leaving a significant gap in evaluation on imperfect text. In this work, we study how word-boundary corruption affects how LLMs detect targeted information. By inserting whitespace characters within words to break them into fragments, LLMs' detection accuracy follows a U-shaped curve with the increase in insertion rate. We refer to this curve as the Text Uncanny Valley. To explain such observation, we...
313	PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat 2605.07201 Toxicity Detection in Gaming: Applies synthetic data augmentation and ensembles of several fine-tuning approaches to multi-class toxicity detection in gaming chat.	cs.CL, cs.LG, cs.AI	Srikar Kashyap Pulipaka	This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat messages into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. We explore multiple approaches including encoder-based models, instruction-tuned LLMs with LoRA fine-tuning, hierarchical classification, one-vs-rest strategies, and various ensemble methods. Our best system...
314	Hallucination Detection via Activations of Open-Weight Proxy Analyzers 2605.07209 Hallucination Detection via Activations: Uses an open-weight proxy model to read generated text and detects hallucinations from its activation features.	cs.CL, cs.LG, cs.AI	Akshita Singh, Prabesh Paudel, Siddhartha Roy	We introduce a proxy-analyzer framework for detecting hallucinations in large language models. Instead of looking inside the generating model, our system reads already-generated text through a small locally hosted open-weight model and spots hallucinations using the reader's own internal activations. This works just as well when the generator is a closed API like GPT-4 as when it is any open-weight model. We built eighteen features grounded in how transformers process text, covering residual str...
315	Reformulating KV Cache Eviction Problem for Long-Context LLM Inference 2605.07234 KV Cache Eviction: Reformulates KV cache eviction as an output-aware, layer-wise approximation to reduce long-context overhead.	cs.CL, cs.AI	Tho Mai, Joo-Young Kim	Large language models (LLMs) support long-context inference but suffer from substantial memory and runtime overhead due to Key-Value (KV) Cache growth. Existing KV Cache eviction methods primarily rely on local attention weights, neglecting the influence of value representations, output projection, and inter-head interactions. In this work, we reformulate KV Cache eviction from a conventional head-wise, weight-averaging approach into an output-aware, layer-wise matrix multiplication approximatio...
316	Teaching Language Models to Think in Code 2605.07237 Code-Centric Reasoning: Proposes having models use code as the primary reasoning medium, combined with execution, to improve problem-solving reliability.	cs.CL	Hyeon Hwang, Jiwoo Lee, Jaewoo Kang	Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather tha...
317	SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting 2605.07243 Speculative Decoding Acceleration: Proposes block-iterative speculative decoding with dynamic tree drafting to reduce drafting latency and accelerate inference.	cs.CL	Weijie Shi, Qiang Xu, Fan Deng, Yaguang Wu, Jiarun Liu	Speculative decoding accelerates LLM inference by drafting a tree of candidate continuations and verifying it in one target forward. Existing drafters fall into two camps with opposite weaknesses. Autoregressive drafters such as EAGLE-3 preserve dependence along each draft path but call the drafter once per tree depth, making drafting a non-trivial share of per-iteration latency. Parallel drafters cut drafter calls by predicting multiple future positions in one forward, but each position is pred...
318	PaT: Planning-after-Trial for Efficient Test-Time Code Generation 2605.07248 Adaptive Test-Time Planning: Proposes a trial-then-plan policy that plans only after verification fails, improving code-generation efficiency.	cs.CL, cs.LG	Youngsik Yoon, Sungjae Lee, Seockbean Song, Siwei Wang, Wei Chen	Beyond training-time optimization, scaling test-time computation has emerged as a key paradigm to extend the reasoning capabilities of Large Language Models (LLMs). However, most existing methods adopt a rigid Planning-before-Trial (PbT) policy, which inefficiently allocates test-time compute by incurring planning overhead even on directly solvable problems. We propose Planning-after-Trial (PaT), an adaptive policy for code generation that invokes a planner only upon verification failure. This a...
319	From 0-Order Selection to 2-Order Judgment: Combinatorial Hardening Exposes Compositional Failures in Frontier LLMs 2605.07268 Hardened Reasoning Benchmark: Proposes LogiHard, which converts multiple-choice questions into second-order judgments to expose compositional reasoning failures in frontier models.	cs.CL	Hanmeng Liu, Shichao Weng, Xiulai Liu, Zhicai Zhang, Anli Yan	Multiple-choice reasoning benchmarks face dual challenges: rapid saturation from advancing models and data contamination that undermines static evaluations. Ad-hoc hardening methods (paraphrasing, perturbation) attempt to increase difficulty but sacrifice logical validity for surface complexity, falling short to challenge advanced reasoning models. We present LogiHard, a formal framework that deterministically transforms 0-order selection into 2-order logical judgment, which significantly increa...
320	MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning 2605.07269 Prompt Injection Defense: Proposes a multilingual indirect prompt injection defense that combines a Qwen classifier, TF-IDF features, and ensemble learning.	cs.CL, cs.LG	Al Muhit Muhtadi, Mostafa Rifat Tazwar	Indirect prompt injection remains a persistent weakness in retrieval-augmented and tool-using LLM systems, and the problem becomes harder to characterise in multilingual settings. We present MIPIAD, a defense framework evaluated on English and Bangla that combines a sequence classifier fine-tuned from Qwen2.5-1.5B via LoRA (XLPID), TF-IDF lexical features, and validation-tuned ensembling through late fusion, stacking, and gradient boosting. The framework is evaluated on a synthetic benchmark bui...
321	Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions 2605.07271 Layer Pruning Collapse Analysis: Uses decision-representation transition metrics to explain the mechanism and boundary layer of sudden performance collapse under layer pruning.	cs.CL, cs.AI	Boyu Shi, Chang Liu, ChuanBao Gao, Xu Yang, Xin Geng	Layer pruning efficiently reduces Large Language Model (LLM) computational costs but often triggers sudden performance collapse. Existing representation-based analyses struggle to explain this mechanism. We propose studying pruning through decision representation. Focusing on multiple-choice tasks, we introduce two metrics, Decision Margin and Option Frequency, and an Iterative Pruning method to analyze layer-wise decision dynamics. Our findings reveal a sharp decision transition that partitions...
322	MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs 2605.07305 Active Clinical Diagnosis Agents: Builds a multi-turn active-diagnosis evaluation and analyzes LLM errors in ordering tests and updating diagnoses.	cs.CL, cs.AI	Hsin-Ling Hsu, Zizheng Wang, Donghua Zhang, Nai-Chia Chen, Jerry Wang	Most existing LLM diagnoses are evaluated on static, single-turn settings where complete patient information is provided upfront, an oversimplification of real clinical practice. We study active diagnosis: the real-life clinical process of starting from initial observation, ordering tests, interpreting results, and updating a differential diagnosis across multiple turns. Through systematic analysis, we identify three recurring failure modes in current LLMs: ungrounded test ordering, unreliable d...
323	Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts 2605.07307 Sparse Chain-of-Thought: Shows that answers can still be extracted from sparsified and shuffled reasoning chains, and systematically evaluates the effects.	cs.CL	Yi-Chang Chen, Feng-Ting Liao, Da-shan Shiu, Hung-yi Lee	Modern reasoning language models generate dense, sequential chain-of-thought traces implicitly assuming that every token contributes and that steps must be consumed in order. We challenge both assumptions through a systematic intervention pipeline--removal, masking, shuffling, and noise injection--applied to model-generated reasoning chains across three models and three benchmarks. Our findings are counterintuitive on three dimensions. Order: Does the sequential order of a reasoning chain matter...
324	LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification 2605.07315 Latent-Then-Explicit Reasoning: Proposes two-stage reasoning, latent exploration followed by explicit verification, to balance cost and verifiability.	cs.CL	Xuan Li, Yining Wang, Yuchen Liu, Guanjun Liu, Delai Qiu	Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploratio...
325	Activation Differences Reveal Backdoors: A Comparison of SAE Architectures 2605.07324 Backdoor Detection with SAEs: Compares two sparse autoencoder architectures for locating backdoor features in fine-tuned models via activation differences.	cs.CL cs.LG cs.AI	Sachin Kumar	Backdoor attacks on language models pose a significant threat to AI safety, where models behave normally on most inputs but exhibit harmful behavior when triggered by specific patterns. Detecting such backdoors through mechanistic interpretability remains an open challenge. We investigate two sparse autoencoder architectures -- Crosscoders and Differential SAEs (Diff-SAE) -- for isolating backdoor-related features in fine-tuned models. Using a controlled SQL injection backdoor triggered by year-...
326	Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative 2605.07345 Length-Invariant Similarity Metric: Proves that mean-pooled cosine is not length-invariant and proposes an alternative metric that is more robust across domains.	cs.CL cs.LG	Sibayan Mitra (BITS Pilani), Dhruv Kumar (BITS Pilani)	Mean-pooled cosine similarity is the default metric for comparing neural representations across languages, modalities, and tasks. We establish that this metric is not length-invariant: under the anisotropy that characterizes modern transformer representations, mean-pooled cosine grows monotonically in sequence length, independent of representational content. Empirically, on HumanEvalPack across four code LLMs, the length ratio alone explains $R^2 = 0.52$--$0.75$ of cross-language "Python proximi...
327	Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study 2605.07366 LoRA Rank Allocation in RL: Empirically analyzes gradient-based LoRA rank allocation under GRPO and finds it underperforms uniform allocation.	cs.CL	Yash Ganpat Sawant	Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised fine-tuning (SFT). We investigate whether this success transfers to reinforcement learning, specifically Group Relative Policy Optimization (GRPO). Using gradient-magnitude profiling on Qwen 2.5 1.5B with GSM8K, we find that it does not: proportional rank allocation degrades accuracy by 4.5 points compared to uniform allocation (70.0% ...
328	The Proxy Presumption: From Semantic Embeddings to Valid Social Measures 2605.07409 Embedding Validity in Social Science: Critiques the assumption that embedding geometry directly proxies social measures and stresses the need for explicit validity checks.	cs.CL cs.LG	Baishi Li, Ta Yu, Kelvin J. L. Koa, Ke-Wei Huang	Natural Language Processing is rapidly evolving into a primary instrument for Computational Social Science, with researchers increasingly using embeddings to measure latent constructs such as novelty, creativity, and bias. However, this transition faces a fundamental validity challenge: the "Proxy Presumption," or the reliance on geometric properties (e.g., cosine distance) as direct measures of social concepts. We argue that without explicit validation, unsupervised representations remain ent...
329	Generating training datasets for legal chatbots in Korean 2605.07432 Korean Legal Chatbot Data: Proposes generating and annotating training data for Korean legal chatbots to cover diverse user phrasings.	cs.CL cs.LG	Changhoe Hwang, Jee-Sun Nam, Eric Laporte	Chatbots are robots that can communicate with humans using text or voice signals. Legal chatbots improve access to justice, since legal representation and legal advice by lawyers come with a high cost that excludes disadvantaged and vulnerable people. However, capturing the diversity of actual user input in datasets for deep-learning dialog systems (chatbots) is a technical challenge. Diversity requires large volumes of data, which must also be labelled in order to classify the user's intent, wh...
330	SSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysis 2605.07446 Korean ABSA Annotated Corpus: Builds a Korean fine-grained ABSA evaluation-annotated corpus and resources using semi-automatic symbolic propagation.	cs.CL cs.LG	Suwon Choi, Shinwoo Kim, Changhoe Hwang, Gwanghoon Yoo, Eric Laporte	We report the construction of a Korean evaluation-annotated corpus, hereafter called 'Evaluation Annotated Dataset (EVAD)', and its use in Aspect-Based Sentiment Analysis (ABSA) extended in order to cover e-commerce reviews containing sentiment and non-sentiment linguistic patterns. The annotation process uses Semi-Automatic Symbolic Propagation (SSP). We built extensive linguistic resources formalized as a Finite-State Transducer (FST) to annotate corpora with detailed ABSA components in the fa...
331	Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study 2605.07453 NMT Data Contamination Reproducibility Study: Audits data contamination in hieroglyphic-to-German translation and reproduces the BLEU discrepancy.	cs.CL	Ammar Toutou, Abdelrahman Harb, Christine Basta	Ancient and endangered languages pose a unique challenge for NLP: their datasets are inherently scarce, difficult to expand, and built from formulaic corpora -- making data-quality issues especially consequential yet rarely audited. Motivated by the need to understand what current NMT can realistically achieve for such languages, we investigate hieroglyphic-to-German translation, where a recent study reported 61.5 BLEU using fine-tuned M2M-100. Our reproduction yields only 37.0 BLEU with the rel...
332	GRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Tasks 2605.07454 In-Context Example Optimization: Proposes GRaSp, which automatically generates, clusters, and selects examples to improve low-data ICL.	cs.CL	Simen Bihaug-Frøyland, Henrik Brådland	In-context learning enables large language models to adapt to new tasks, but their performance is highly sensitive to the selected examples. Finding effective demonstrations is particularly difficult in domain-specific, low-data settings where high-quality examples are scarce. We propose GRaSp, a three-stage framework for automatic in-context example optimization. By first generating a large synthetic candidate pool, then structuring it with clustering and dimensionality reduction, and finally u...
333	Think-with-Rubrics: From External Evaluator to Internal Reasoning Guidance 2605.07461 Rubric-Guided Model Reasoning: Internalizes scoring rubrics as reasoning guidance during generation.	cs.CL	Jiachen Yu, Zhihao Xu, Junjie Wang, Yujiu Yang	Rubrics have been extensively utilized for evaluating unverifiable, open-ended tasks, with recent research incorporating them into reward systems for reinforcement learning. However, existing frameworks typically treat rubrics only as an external evaluator disjointed from the policy's primary reasoning trace. Such a design confines rubrics to post-hoc measurement, leaving them unable to actively guide the model's generation process. In this work, we introduce Think-with-Rubrics, a novel paradigm for ...
334	The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment 2605.07462 Multi-Agent Social Platform Dataset: Releases data from the Moltbook agent community and analyzes its structure and safety risks.	cs.CL cs.AI	William Brach, Federico Torrielli, Stine Lyngsø Beltoft, Annemette Brok Pirchert, Peter Schneider-Kamp	Moltbook is a Reddit-like platform where OpenClaw agents post, comment, and vote at scale - a so far unprecedented incident that comes with serious safety concerns. With the aim of studying emergent behavior in populations, we release the Moltbook Files, a dataset of 232k posts and 2.2M comments covering the platform's first 12 days, processed through a pipeline to identify and remove Personally-Identifiable Information (PII). We analyze community structure, authorship, lexical properties, senti...
335	SEIF: Self-Evolving Reinforcement Learning for Instruction Following 2605.07465 Self-Evolving RL for Instruction Following: Uses self-evolving RL so that instruction difficulty grows dynamically as model capability improves.	cs.CL	Qingyu Ren, Qianyu He, Jiajie Zhu, Xingzhou Chen, Jingwen Chang	Instruction following is a fundamental capability of large language models (LLMs), yet continuously improving this capability remains challenging. Existing methods typically rely either on costly external supervision from humans or strong teacher models, or on self-play training with static-difficulty instructions that cannot evolve as the model's capabilities improve. To address these limitations, we propose SEIF (Self-Evolving Reinforcement Learning for Instruction Following), a self-evolving ...
336	TCMIIES: A Browser-Based LLM-Powered Intelligent Information Extraction System for Academic Literature 2605.07507 LLM-Based Literature Information Extraction: Implements a browser-based LLM system that extracts structured academic information from paper text.	cs.CL	Hanqing Zhao	The exponential growth of academic publications has created an urgent need for automated tools capable of extracting structured knowledge from unstructured scientific texts. While large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and information extraction, existing solutions often require specialized infrastructure, programming expertise, or fine-tuned domain-specific models that create barriers for researchers in specialized fields. This p...
337	WeatherSyn: An Instruction Tuning MLLM For Weather Forecasting Report Generation 2605.07522 Multimodal Weather Report Generation: Instruction-tunes a multimodal model to generate textual weather forecast reports automatically.	cs.CL	Zinan Zheng, Yang Liu, Nuo Chen, Juepeng Zheng, Hong Cheng	Accurate weather forecast reporting enables individuals and communities to better plan daily activities and agricultural operations. However, the current reporting process primarily relies on manual analysis of multi-source data, which leads to information overload and reduced efficiency. Despite the development of multimodal large language models (MLLMs), leveraging data-driven models to analyze and generate reports in the weather forecasting domain remains largely underexplored. In this work, we ...
338	Why do Large Language Models Fail in Low-resource Translation? Unraveling the Token Dynamics of Large Language Models for Machine Translation 2605.07533 Failure Mechanisms in Low-Resource Translation: Analyzes token dynamics and failure modes of multiple LLMs in low-resource translation.	cs.CL	Shenbin Qian, Yves Scherrer	Large Language Models (LLMs) have recently demonstrated strong performance in machine translation (MT). However, most prior work focuses on improving or benchmarking translation quality, offering limited insight into when and why LLM-based translation fails. In this work, we systematically analyze failure modes of LLMs in MT by evaluating 15 models, including four reasoning LLMs, across 22 language pairs (LPs) with varying resource levels. We find that non-English-centric LPs consistently yield ...
339	Nürnberg NLP at PsyDefDetect: Multi-Axis Voter Ensembles for Psychological Defence Mechanism Classification 2605.07606 Ensembles for Psychological Defence Mechanism Classification: Uses multi-axis voter ensembles to improve classification of defence mechanisms in conversations.	cs.CL cs.AI	Philipp Steigerwald, Eric Rudolph, Jens Albrecht	Detecting levels of psychological defence mechanisms in supportive conversations is inherently ambiguous. In the PsyDefDetect shared task at BioNLP 2026, the eight positive defence categories share surface language and differ only in pragmatic function, and trained raters reach only moderate inter-annotator agreement. On such a task the decisive lever is not a stronger single model but error independence, since any single representation will waver on the overlapping defence boundaries. We translat...
340	Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation 2605.07613 Semantic IDs for Conversational News Recommendation: Generates semantic IDs to ease the RAG retrieval bottleneck under implicit intents.	cs.CL	Hongyang Su, Beibei Kong, Lei Cheng, Chengxiang Zhuo, Zang Li	Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify 6 intent types from production dialogues: five are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Ma...
341	Is She Even Relevant? When BERT Ignores Explicit Gender Cues 2605.07622 Gender-Cue Neglect Bias in BERT: Studies when a Dutch BERT ignores explicit gender cues and how the bias forms.	cs.CL	Jonas Klein, Chiara Manna, Eva Vanmassenhove	Gender bias in large language models has primarily been investigated for English, while languages with grammatical or morphological gender remain comparatively understudied. This paper investigates how and when gender information emerges in a Dutch BERT model trained from scratch, offering one of the first checkpoint-level analyses of bias formation in a Transformer architecture for a language combining overt morphological gender marking and generic forms. By extracting contextual embeddings thr...
342	Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents 2605.07630 Safety Evaluation for Phone-Use Agents: Proposes an evaluation that distinguishes whether an agent avoids harm by safe choice or by incapacity.	cs.CL cs.LG cs.AI	Zhengyang Tang, Yi Zhang, Chenxin Li, Xin Lai, Pengyuan Lyu	When a phone-use agent avoids harm, does that show safety, or simply inability to act? Existing evaluations often cannot tell. A harmful outcome may be avoided because the agent recognized the risk and chose the safe action, or because it failed to understand the screen or execute any relevant action at all. These cases have different causes and call for different fixes, yet current benchmarks often merge them under task success, refusal, or final harmful outcome. We address this problem with Ph...
343	Post-training makes large language models less human-like 2605.07632 Post-Training Reduces Human-Likeness: Uses Psych-201 to find that alignment post-training lowers model agreement with human behavior.	cs.CL cs.LG cs.AI	Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov	Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in...
344	Multi-Dimensional Evaluation of LLMs for Grammatical Error Correction 2605.07635 Multi-Dimensional LLM Evaluation for GEC: Systematically evaluates LLM grammatical error correction and examines ensembling and metric underestimation.	cs.CL	Adnan Labib, Qiao Wang, Yixuan Huang, Zheng Yuan	Automated assistants for Grammatical Error Correction are now embedded in educational platforms serving millions of learners, yet three critical gaps remain in this domain: (1) latest-generation Large Language Models (LLMs) lack comprehensive evaluation on grammar correction tasks; (2) whether combining these LLMs improves correction quality is unexplored; and (3) the extent to which reference-based metrics underestimate GEC system performance has not been adequately quantified. In this study, f...
345	MAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing 2605.07646 Multi-Agent Step-wise Reasoning Audit: Proposes MAVEN, which uses verification and elaboration agents to audit reasoning errors step by step.	cs.CL cs.LG cs.AI	Yinsheng Yao, Jiehao Tang, Zhaozhen Yang, Dawei Cheng	While explicit reasoning trajectories enhance model interpretability, existing paradigms often rely on monolithic chains that lack intermediate verification, allowing early errors to cascade unchecked. This lack of modularity impedes granular auditing and compromises the epistemic trust required for high-stakes applications. We propose MAVEN (Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing), a blackboard-inspired framework designed to transform LLMs into deliberate r...
346	Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation 2605.07647 Agreement Degradation in Short Answer Scoring: Studies the drop in agreement of few-shot LLM short-answer scoring on partially correct answers.	cs.CL cs.AI	Abigail Victoria Gurin Schleifer, Moriah Ariely, Beata Beigman Klebanov, Asaf Salman, Giora Alexandron	Automated short answer scoring (ASAS) is shifting from discriminative, fine-tuned models to large language models (LLMs) used in few-shot settings. This paradigm leverages LLMs' broad world knowledge and ease of deployment, but limited task-specific data may reduce alignment on complex scoring tasks. In particular, its impact on scoring partially correct responses that require nuanced interpretation remains underexplored. We investigate the relationship between the degree of task-specific adaptat...
347	Not All Tokens Learn Alike: Attention Entropy Reveals Heterogeneous Signals in RL Reasoning 2605.07660 Token-Level Learning Signals in RL Reasoning: Uses attention entropy to reveal heterogeneous learning signals across tokens in RL post-training.	cs.CL	Gengyang Li, Zheng-Fan Wu, Siqi Bao, Yunfang Wu	Reinforcement-learning-based post-training has become a key approach for improving the reasoning ability of large language models, but its token-level learning signals remain poorly understood. This work studies their heterogeneity through attention entropy, which measures how concentrated or diffuse the contextual support is for each response token. We first show that token-level RL objectives are sparsely estimable: uniformly random 20 percent token subsets preserve much of the full-token held...
348	DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain 2605.07699 Retail Policy Ambiguity Reasoning Benchmark: Builds DRIP-R to evaluate agent decision-making and reasoning under real-world retail policy ambiguity.	cs.CL cs.AI	Hsuvas Borkakoty, Sebastian Pohl, Cheng Wang, Bei Chen, Yufang Hou	LLM-based agents are increasingly deployed for routine but consequential tasks in real-world domains, where their behavior is governed by inherently ambiguous domain policies that admit multiple valid interpretations. Despite the prevalence of such ambiguities in practice, existing agent benchmarks largely assume unambiguous, well-specified policies, leaving a critical evaluation gap. We introduce DRIP-R, a benchmark that systematically exploits real-world retail policy ambiguities to construct ...
349	Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models 2605.07701 Dynamic Guidance for Diffusion Language Models: Models CFG scale selection as sequential decision-making to control diffusion generation dynamically.	cs.CL	Fan Zhou, Tim Van de Cruys	Classifier-Free Guidance (CFG) is a widely used mechanism for controlling diffusion-based generative models, yet its guidance scale is typically treated as a fixed hyperparameter throughout generation. This static design yields a suboptimal controllability-quality tradeoff, as the optimal degree of guidance varies across tasks and across different stages of the diffusion process, especially in the NLP domain. We recast CFG scale selection as a sequential decision-making problem and propose to le...
350	SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation 2605.07711 Cross-Tokenizer On-Policy Distillation: Proposes SimCT to recover teacher supervision lost by OPD when tokenizers differ.	cs.CL	Jie Sun, Mao Zheng, Mingyang Song, Qiyong Zhong, Yilin Cheng	On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differently. Under heterogeneous tokenizers, exact shared-token matching silently discards a large fraction of the teacher signal at precisely the positions where vocabularies disagree. We propose \textbf{\underline{Sim}ple \under...
351	Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models 2605.07721 Memory-Efficient Looped Transformer: Decouples compute from the KV cache to curb the memory growth caused by looped reasoning depth.	cs.CL cs.LG cs.AI	Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi	Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memor...
352	SOD: Step-wise On-policy Distillation for Small Language Model Agents 2605.07725 Tool-Reasoning Distillation for Small Models: Proposes SOD, step-wise on-policy distillation, to stabilize long-horizon tool interactions in small models.	cs.CL cs.AI	Qiyong Zhong, Mao Zheng, Mingyang Song, Xin Lin, Jie Sun	Tool-integrated reasoning (TIR) is difficult to scale to small language models due to instability in long-horizon tool interactions and limited model capacity. Reinforcement learning methods like group relative policy optimization provide only sparse outcome-level rewards. Recently, on-policy distillation (OPD) has gained popularity by supplying dense token-level supervision from a teacher on student-generated trajectories. However, our experiments indicate that applying OPD to TIR leads t...
353	Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs 2605.07731 Italian MoE Model Evaluation: Benchmarks EngGPT2MoE and compares it with open-source models of similar size.	cs.CL cs.AI	Andrea Sassella, Andrea Chizzola, Tommaso Bianchi, Luca Alessandrelli, Mark James Carman	This report benchmarks the performance of ENGINEERING Ingegneria Informatica S.p.A.'s EngGPT2MoE-16B-A3B LLM, a 16B parameter Mixture of Experts (MoE) model with 3B active parameters. Performance is investigated across a wide variety of representative benchmarks, and is compared against comparably-sized open-source MoE and dense models. In comparison with popular Italian models, namely FastwebMIIA-7B, Minerva-7B, Velvet-14B, and LLaMAntino-3-ANITA-8B, EngGPT2MoE-16B-A3B performs as well or bette...
354	TextLDM: Language Modeling with Continuous Latent Diffusion 2605.07748 Continuous Latent Diffusion Language Modeling: Proposes TextLDM, which generates text with a diffusion model in a continuous VAE latent space.	cs.CL	Jiaxiu Jiang, Jingjing Ren, Wenbo Li, Bo Wang, Haoze Sun	Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous la...
355	CktFormalizer: Autoformalization of Natural Language into Circuit Representations 2605.07782 Lean-Constrained Hardware Autoformalization: Uses a dependently-typed HDL to autoformalize natural-language specifications into verifiable circuits.	cs.CL	Jing Xiong, Qi Han, Chenchen Ding, He Xiao, Zunhai Su	LLMs can generate hardware descriptions from natural language specifications, but the resulting Verilog often contains width mismatches, combinational loops, and incomplete case logic that pass syntax checks yet fail in synthesis or silicon. We present CktFormalizer, a framework that redirects LLM-driven hardware generation through a dependently-typed HDL embedded in Lean 4. Lean serves three roles: (i) type checker: dependent types encode bit-width constraints, case coverage, and acyclicity, tur...
356	Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models 2605.07783 Chain-Based Distillation Initialization for Small Language Models: Proposes chain-based distillation to initialize small models of varying sizes efficiently and reduce teacher calls.	cs.CL	Boyu Shi, YiCheng Jiang, Chang Liu, Qiufeng Wang, Xu Yang	Large language models (LLMs) achieve strong performance but remain costly to deploy in resource-constrained settings. Training small language models (SLMs) from scratch is computationally expensive, while conventional knowledge distillation requires repeated access to large teachers for different target sizes, leading to poor scalability. To solve these problems, we propose Chain-based Distillation (CBD), a scalable paradigm for efficiently initializing variable-sized language models. A...
357	Hybrid TF--IDF Logistic Regression and MLP Neural Baseline for Indonesian Three-Class Sentiment Analysis on Social Media Text 2605.07793 Lightweight Indonesian Sentiment Baseline: Combines TF-IDF and metadata features with logistic regression and an MLP as three-class sentiment baselines.	cs.CL	Allya Nurul Islami Pasha, Eka Fidiya Putri, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang	This paper presents a compact three-class sentiment analysis study for Indonesian social media text. The task is formulated with positive, negative, and neutral outputs derived from a fine-grained emotion dataset. The proposed practical baseline combines TF--IDF text features, three lightweight numeric metadata features, and a balanced multinomial Logistic Regression classifier. For comparison, the study also includes a neural baseline using a two-layer multilayer perceptron (MLP) over the same ...
358	PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism 2605.07796 Cross-Dialect Text-to-SQL Evaluation: Uses automated backend isomorphism to make Text-to-SQL evaluation comparable across SQL dialects.	cs.CL	Yotam Perlitz, Elad Venezian, Corentin Royer, Francesco Fusco, Andrea Giovannini	SQL dialects vary in syntax, types, and functions across database engines. Text-to-SQL benchmarks, however, predominantly support only SQLite. This creates a critical evaluation gap: cross-dialect evaluation reveals weak per-query agreement (Cohen's ), showing that SQLite performance is an unreliable proxy for other dialects. Yet such evaluation remains prohibitively difficult: existing approaches either require expensive manual query transpilation or rely on tools that often fail on complex SQL...
359	Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs 2605.07806 Modeling LLM Self-Assessment Reliability: Improves LLM self-assessment using cognitive appraisal theory to predict task correctness.	cs.CL cs.LG cs.AI	Sree Bhattacharyya, Samarth Khanna, Leona Chen, Lucas Craig, Tharun Dilliraj	Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Confidence, however, has been shown to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory, a framework from human psychology that decomposes self-evaluation into multiple components, we propose a m...
360	A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches for Sentiment Classification on IMDb Movie Reviews 2605.07811 IMDb Sentiment Classification Comparison: Compares classical TF-IDF models against BiLSTM and attention-based BiLSTM models on IMDb sentiment classification.	cs.CL	Erma Daniar Safitri, Lia Hana Ichisasmita, Citra Agustin, Luluk Muthoharoh, Ardika Satria	This paper presents a comparative study of classical machine learning and deep learning methods for sentiment classification on the IMDb movie reviews dataset. The machine learning pipeline uses TF-IDF features and PyCaret AutoML to evaluate Logistic Regression, Naïve Bayes, and Support Vector Machine, while the deep learning pipeline implements BiLSTM and BiLSTM with an attention mechanism. Experimental results show that classical machine learning, especially SVM, achieves the best performanc...
361	SCENE: Recognizing Social Norms and Sanctioning in Group Chats 2605.07823 Group Chat Social Norms: Proposes the SCENE benchmark to evaluate how well LLMs recognize implicit norms and sanctioning in group chats.	cs.CL	Mateusz Jacniacki, Maksymilian Bilski	Online group chats are social spaces with implicit behavior patterns that, when broken, are often met with social sanctioning from the group. The ability and willingness of LLM-based agents to recognize and adapt to these norms remains mostly unexplored. We introduce SCENE, a social-interaction benchmark focused on implicit norms and social sanctioning in multi-party chat. SCENE generates plausible non-roleplay scenarios with scripted personas that follow a hidden norm, create opportunities for ...
362	Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors 2605.07847 User Simulation Distribution Gap: Measures and mitigates the distributional gap between real and simulated user behaviors.	cs.CL	Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar, Marwa Abdulhai, Sergey Levine	As user simulators are increasingly used for interactive training and evaluation of AI assistants, it is essential that they represent the diverse behaviors of real users. While existing works train user simulators to generate human-like responses, whether they capture the broad and heterogeneous distribution of real user behaviors remains an open question. In this work, we introduce a method to measure the distributional gap between real and simulated user behaviors, validated through a human s...
363	MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning 2605.07850 Hierarchical Rank-Adaptive LoRA: Proposes MatryoshkaLoRA, which learns hierarchical low-rank representations for efficient fine-tuning.	cs.CL cs.LG cs.AI	Ionut-Vlad Modoranu, Mher Safaryan, Dan Alistarh	With the rise in scale for deep learning models to billions of parameters, the computational cost of fine-tuning remains a significant barrier to deployment. While Low-Rank Adaptation (LoRA) has become the standard for parameter-efficient fine-tuning, the need to set a predefined, static rank $r$ requires exhaustive grid searches to balance efficiency and performance. Existing rank-adaptive solutions such as DyLoRA mitigate this by sampling ranks during the training from a predefined distributio...
364	Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement 2605.07883 Flexible Safety Refusal: Uses label enhancement to reduce rigid refusals in LLMs while preserving safety compliance.	cs.CL	Ying Zhang, Congyu Qiao, Xin Geng, Ning Xu	Large Language Models (LLMs) rely on safety alignment to obey safe requests while refusing harmful ones. However, traditional refusal mechanisms often lead to "rigid rejection," where a general template (e.g., "I cannot fulfill this request") indiscriminately triggers refusals and severely undermines the naturalness of interactions between humans and LLMs. To address this issue, LANCE is proposed in this paper to ensure safe yet flexible and natural responses via label enhancement. Specifically,...
365	CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers 2605.07905 AI Reviewer Benchmarking: Builds a completeness- and correctness-oriented benchmark for evaluating AI reviewers.	cs.CL cs.AI	Hexuan Deng, Xiaopeng Ke, Yichen Li, Ruina Hu, Dehao Huang	Despite the rapid development of AI reviewers, evaluating such systems remains challenging: metrics favor overlap with human reviews over correctness. However, since human reviews often cover only a subset of salient issues and sometimes contain mistakes, they are unreliable as gold references. To address this, we build category-specific benchmark subsets and skip evaluation when the corresponding human reviews are missing to strengthen Completeness. We also leverage reviewer--author--meta-revie...
366	How Value Induction Reshapes LLM Behaviour 2605.07925 Value Induction Effects: Analyzes how value induction jointly shifts LLM behaviour and potential risks.	cs.CL	Arnav Arora, Natalie Schluter, Katherine Metcalf, Maartje ter Hoeve	Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model. However, values are complex and inter-related -- inducing one could modify behaviour on another. Further, inducing certain values can make models more addictive or...
367	How to Train Your Latent Diffusion Language Model Jointly With the Latent Space 2605.07933 Jointly Trained Latent Diffusion LM: A latent-space text diffusion model whose encoder, diffusion model, and decoder are trained jointly.	cs.CL	Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev	Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suitable latent space. In this work, we present the Latent Diffusion Language Model (LDLM), in which the latent encoder, diffusion model, and decoder are trained jointly. LDLM builds its latent space by reshaping the represe...
368	Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents? 2605.07937 Clarification Timing for Agents: Studies when asking for clarification most improves task success for long-horizon agents.	cs.CL	Anmol Gulati, Hariom Gupta, Elias Lumer, Sahil Sen, Vamse Kumar Subbiah	Long-horizon AI agents execute complex workflows spanning hundreds of sequential actions, yet a single wrong assumption early on can cascade into irreversible errors. When instructions are incomplete, the agent must decide not only whether to ask for clarification but when, and no prior work measures how clarification value changes over the course of execution. We introduce a forced-injection framework that provides ground-truth clarifications at controlled points in the agent's trajectory acros...
369	GLiGuard: Schema-Conditioned Classification for LLM Safeguard 2605.07982 Efficient LLM Guardrails: Proposes GLiGuard, a small model for multi-dimensional safety classification with lower latency.	cs.CL	Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney, Ash Lewis	Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B--27B parameters, reformulating what is fundamentally a classification problem as sequential text generation, a design choice that incurs high latency and scales poorly to multi-aspect evaluation. In this work, we introduce GLiGuard, a 0.3B-parameter sch...
370	Tool Calling is Linearly Readable and Steerable in Language Models 2605.07990 Tool-Calling Interpretability: Shows that tool selection is linearly readable from internal representations and can be steered.	cs.CL cs.LG cs.AI	Zekun Wu (University College London), Ze Wang (University College London), Seonglae Cho (Holistic AI), Yufei Yang (Imperial College London), Adriano Koshiyama (University College London)	When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. Probing 12 instruction-tuned models across Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), we find the identity of the chosen tool is linearly readable and steerable inside the model. Adding the mean-difference between two tools' average internal activations switches which tool the model selects at 77-100% accuracy on name-only single-turn prompts (93-10...
371	Fast Byte Latent Transformer 2605.08044 Fast Byte-Level Generation: Accelerates byte-level language model generation with training and generation techniques such as block diffusion.	cs.CL cs.LG cs.AI	Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz, Gargi Ghosh, Luke Zettlemoyer	Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction ...
372	Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs 2605.08045 Clinical Report Data Extraction: Distilled LLMs extract structured fields from CMR reports and report per-field uncertainty.	cs.CL	Yi Yu, Parker Martin, Zhenyu Bu, Yixuan Liu, Yi-Yu Zheng	Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation. Uncertainty integrates three complementary princi...
373	Accurate and Efficient Statistical Testing for Word Semantic Breadth 2605.08048 Semantic Breadth Statistical Testing: Proposes an efficient statistical test for comparing differences in word semantic breadth.	cs.CL	Yo Ehara	Measuring the breadth of a word's meaning, or its spread across contexts, has become feasible with contextualized token embeddings. A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity (Nagata and Tanaka-Ishii, ACL2025). These measurements are useful for deciding appropriate sense distinctions when constructing thesauri and domain-specific dictionaries. However, when comparing the breadth of two word types, naive...
374	CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation 2605.08057 Text-to-SQL Compute Allocation: Allocates inference budget by complexity and explores candidates to improve Text-to-SQL.	cs.CL cs.AI	James Petullo, Nianwen Xue	While recent advancements in inference-time learning have improved LLM reasoning on Text-to-SQL tasks, current solutions still struggle to perform well on the most challenging tasks in the Bird-Bench (BIRD) benchmark. This is due to inadequate solution space exploration, which is necessary to uncover promising candidate queries that can be further refined to produce the correct output. To address this challenge, we introduce CA-SQL, a novel Text-to-SQL pipeline that utilizes the estimated diffic...
375	The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents 2605.08060 Context Length Social Dilemmas: Finds that expanded memory systematically reduces cooperation in multi-agent games.	cs.CL cs.AI	Jiayuan Liu, Tianqin Li, Shiyi Du, Xin Luo, Haoxuan Zeng	Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rat...
376	Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration 2605.08077 Conformal KGQA Calibration: Uses path-level conformal calibration to provide trustworthy coverage guarantees for KG question answering.	cs.CL	Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu	Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior methods suffer from critical limitations in both calibration validity and score discriminability, resulting in violated coverage guarantees and excessively large prediction...
377	LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling 2605.08083 Automated Test-Time Scaling: Proposes AutoTTS, an environment-driven framework in which LLMs automatically discover test-time scaling strategies.	cs.CL	Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang	Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments wh...
378	More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models 2605.06672 Reasoning Length Bias: Reveals that longer reasoning chains lead to stronger position bias on multiple-choice questions.	cs.CL cs.LG cs.AI	Xiao Wang	Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ...
379	RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory 2605.06675 Mixed-Precision KV Quantization: Uses rate-distortion theory to allocate optimal mixed-bit quantization for the KV cache.	cs.CL cs.LG	Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung	Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this cost, yet all current quantizers assign the same bit-width to every attention head, ignoring the large variation in head importance. A natural idea is to allocate more bits to important heads and fewer to the rest. We show, however, that such mixed-pr...
380	LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction 2605.06676 Learned KV Cache Eviction: A KV cache eviction policy that learns head-wise budgets and token selection end to end.	cs.CL cs.LG	Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang	Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction)...
381	Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models 2605.06683 Attention-Free Sequence Modeling: Proposes the Toeplitz MLP Mixer as a low-complexity replacement for attention in sequence modeling.	cs.CL cs.LG cs.AI	Benjamin L. Badger, Ethan Roland	Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangular-masked Toeplitz matrix multiplication over the sequence dimension, resulting in $\mathcal{O}(dn \log n)$ time and $\mathcal{O}(dn)$ space complexity during training and $\mathcal{O}(dn)$ time and space at inference prefill. Despite the lack of sophist...
382	State Representation and Termination for Recursive Reasoning Systems 2605.06690 Recursive Reasoning State Control: Represents recursive reasoning with an epistemic state graph and gives a termination criterion.	cs.CL cs.LG cs.AI	Debashis Guha, Amritendu Mukherjee, Sanjay Kukreja, Tarun Kumar	Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addresses both. We represent the reasoning state as an epistemic state graph encoding extracted claims, evidential relations, open questions, and confidence weights. We define the order-gap as the distance between the states reached by expand-then-consol...
383	CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment 2605.06702 Deployment-Time Continual Adaptation: Proposes CASCADE, which lets LLMs continually adapt and learn from cases during deployment.	cs.CL cs.LG cs.AI	Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang	Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experien...
384	From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms 2605.06716 LLM Agent Memory Survey: Surveys the evolution of LLM agent memory mechanisms from storage to experience.	cs.CL cs.AI	Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin	Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey propos...
385	When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment 2605.06723 Pre-Verbalization Commitment Theory: Proposes a finite-answer stabilization theory characterizing when a model commits to an answer.	cs.CL cs.LG cs.AI	Long Zhang, Wei-neng Chen, Feng-feng Wei, Zi-bo Qin	Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: finite-answer preference stabilization. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $\delta(\xi)=S_\theta(\mathrm{yes}\mid\xi)-S_\thet...
386	When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents 2605.06731 Persistent Agent State Poisoning: Reveals that the cross-session state of personalized agents can be gradually poisoned by routine conversations.	cs.CL cs.LG	Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye	Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet, this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To systematically study it, we i...
387	ProtSent: Protein Sentence Transformers 2605.06830 Protein Embedding Contrastive Tuning: Proposes ProtSent, which contrastively fine-tunes protein models to obtain general-purpose sequence embeddings.	cs.CL cs.LG	Dan Ofer, Oriel Perets, Michal Linial, Nadav Rappoport	Protein language models (pLMs) produce per-residue representations that capture evolutionary and structural information, yet their mean-pooled sequence embeddings are not explicitly trained to reflect functional, evolutionary or structural similarity between proteins. We present Protein Sentence Transformers (ProtSent), a contrastive fine-tuning framework for adapting pLMs into general-purpose embedding models. ProtSent trains with MultipleNegativesRankingLoss across five protein-pair datasets: ...
388	Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility 2605.06856 Real-World Utility Evaluation: Points out the disconnect between benchmarks and utility and proposes an evaluation paradigm oriented toward real-world utility.	cs.CL cs.LG	Ishani Mondal, Shweta Bhardwaj	Generative AI systems achieve impressive performance on standard benchmarks yet fail to deliver real-world utility, a disconnect we identify across 28 deployment cases spanning education, healthcare, software engineering, and law. We argue that this benchmark utility gap arises from three recurring failures in evaluation practice: proxy displacement, temporal collapse, and distributional concealment. Motivated by these observations, we argue that generative AI evaluation requires a paradigm shif...
389	Regulating Branch Parallelism in LLM Serving 2605.06914 LLM Serving Branch Parallelism: Proposes a serving policy that regulates branch parallelism to balance throughput and latency.	cs.CL cs.AI	Swapnil Gandhi, Siva Hari, William J. Dally, Christos Kozyrakis	Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches eagerly or under fixed caps. We show that both are brittle: eager admission inflates the shared decode step, degrading co-batched requests in serial stages, while conservative fixed caps forgo the throughput that motivated exposing branches in the first place. We call the excess step latency caused by admitted branches the branch ex...
390	From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle 2605.06963 RAG Tutoring for Moodle: Develops a RAG-based Moodle tutoring plugin with teacher-in-the-loop supervision.	cs.CL cs.AI	Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała	This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant a...
391	Bridging Textual Profiles and Latent User Embeddings for Personalization 2605.06981 Personalized User Representation Learning: Uses reinforcement learning to unify textual profiles and latent user embeddings for better personalized recommendation.	cs.CL	Zhaoxuan Tan, Xiang Zhai, Yan Zhu, Meng Jiang, Mohamed Hammad	Personalized systems rely on user representations to connect behavioral history with downstream recommendation applications. Existing methods typically employ either supervised latent user embeddings, which are effective for retrieval but difficult to interpret, or textual user profiles, which are interpretable but challenging to optimize for downstream utility due to lack of direct supervision. To bridge this gap, we present BLUE, a reinforcement learning framework that unifies these two forms ...
392	SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair 2605.07001 Architectural Code Smell Repair: Builds SmellBench to evaluate LLM agents' ability to repair architecture-level code smells.	cs.CL	Ion George Dinu (University of Craiova, Craiova, Romania), Marian Cristian Mihăescu (University of Craiova, Craiova	Architectural code smells erode software maintainability and are costly to repair manually, yet unlike localized bugs, they require cross-module reasoning about design intent that challenges both developers and automated tools. While large language model agents excel at bug fixing and code-level refactoring, their ability to repair architectural code smells remains unexplored. We present the first empirical evaluation of LLM agents on architectural code smell repair. We contribute SmellBench, a ...
393	Theoretical Limits of Language Model Alignment 2605.07105 Alignment Theory under KL: Analyzes upper bounds on reward improvement for reinforcement learning and best-of-N alignment under a KL constraint.	cs.CL cs.LG	Lucas Monteiro Paes, Natalie Mackraz, Barry-John Theobald, Federico Danieli	Language model (LM) alignment improves model outputs to reflect human preferences while preserving the capabilities of the base model. The most common alignment approaches are (i) reinforcement learning, which maximizes the expected reward under a KL-divergence constraint, and (ii) best-of-$N$ alignment, which selects the highest-reward output among $N$ independent samples. Despite their widespread use, the fundamental limits of reward improvement under a KL budget remain poorly understood. We c...
394	The Position Curse: LLMs Struggle to Locate the Last Few Items in a List 2605.07127 LLM Positional Retrieval Failure: Reveals the "Position Curse", in which LLMs struggle to locate the last items of a short list, and evaluates it systematically.	cs.CL cs.LG	Zhanqi Zhang, Hua-Dong Xiong, Robert C. Wilson, Mikio Aoi, Marcelo G. Mattar	Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluated two complementary queries: given a position in a seq...
395	Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings 2605.07158 Embedding Audit via Citations: Uses large-scale citation communities to test whether text-embedding similarity reflects research agendas.	cs.CL cs.LG	Junseon Yoo	Vector search and retrieval-augmented generation (RAG) rest on the assumption that cosine similarity between text embeddings reflects conceptual relatedness. We measure where this assumption breaks. We build an augmented citation graph over 3.58M scientific papers and partition it via Leiden CPM at two granularities: sub-field (L1) and research-agenda (L2, hierarchical inside each L1). Four state-of-the-art embeddings (Gemini, Qwen3-8B, Qwen3-0.6B, SPECTER2) clear the L1 bar reasonably (45-52% t...
396	DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models 2605.07210 Diffusion-based Text Retrieval: Uses diffusion language models to generate multiple representative tokens in parallel for better retrieval representations.	cs.CL	Shuai Wang, Yin Yu, Shengyao Zhuang, Bevan Koopman, Guido Zuccon	PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a repr...
397	MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory 2605.07242 Agentic Memory Cascade Repair: Proposes MemoRepair, a barrier-first repair approach for cascading staleness in derived memory state.	cs.CL cs.AI	Yang Zhao, Chengxiao Dai, Mengying Kou, Yue Xiu	Agentic memory evolves across tasks into durable derived artifacts: summaries, cached outputs, embeddings, learned skills, and executable tool procedures. When a source artifact is deleted, corrected, or invalidated by tool or API migration, descendants derived from that source can remain visible and steer future actions with stale support. We formalize this failure mode as the cascade update problem, where repair targets the visible derived state of the memory store. We present MemoRepair, a ba...
398	Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models 2605.07244 Mutual RL Experience Sharing: Lets heterogeneous LLMs share typed experience during concurrent RL post-training and aligns their different tokenizers.	cs.CL cs.LG cs.AI	Xiaoze Liu, Dhananjay Ram, Yuting Zhang, Zhaoyang Zhang, Wei Xia	We introduce Mutual Reinforcement Learning, a framework for concurrent RL post-training in which heterogeneous LLM policies exchange typed experience while keeping separate parameters, objectives, and tokenizers. The framework combines a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL) that retokenizes text and aligns token-level traces across incompatible vocabularies. This substrate makes the experience-sharing design question...
399	When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models 2605.07260 MoE Router Counterfactual Analysis: Evaluates token-level expert misrouting in MoE models by comparing against counterfactual equal-compute routes.	cs.CL cs.LG	Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok	Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-$k$ router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute alternatives for the same token and score each by the next-token probability it assigns to the realized token in a verified reasoning trajectory. The result is sharply token-conditional: the standard router is well-aligned...
400	On the Complexity of the Matching Problem of Regular Expressions with Backreferences 2605.07289 Regex Backreference Matching Complexity: Studies complexity bounds for matching regular expressions with backreferences to understand ReDoS risk.	cs.CL	Soh Kumabe, Yuya Uezato	ReDoS is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that are provably effic...
401	Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts 2605.07395 Multi-LLM Routing Evaluation Artifacts: Empirically shows that the "unsolvability ceiling" in multi-model routing stems from evaluation metrics and artifacts.	cs.CL cs.LG cs.AI	Saloni Garg, Amit Sagtani	Efficient routing across multiple LLMs enables cost-quality tradeoffs by directing queries to the cheapest capable model. Prior work attributes routing headroom to an "unsolvability ceiling": queries no model in the pool can solve. We present a large-scale study of multi-tier LLM routing with 206,000 query-model pairs across six benchmarks (MMLU, MedQA, HumanEval, MBPP, Alpaca, ShareGPT) using the Gemma 4 and Llama 3.1 families. Evaluating with both LLM-as-a-judge and exact-match metrics, we sho...
402	ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression 2605.07501 RL for CoT Compression: Uses experience-guided RL to adaptively compress chain-of-thought, reducing reasoning tokens and latency.	cs.CL cs.LG	Tingcheng Bian, Yuzhe Zhang, Jing Jin, Jinchang Luo, MingQuan Cheng	Large reasoning models (LRMs) achieve strong performance via extended chain-of-thought (CoT) reasoning, yet suffer from excessive token consumption and high inference latency. Existing reinforcement learning (RL) approaches for CoT compression rely on uniform, static length penalties that neglect model capability dynamics and problem-level difficulty variation. We propose ExpThink, an RL framework that addresses both dimensions through two complementary mechanisms. First, \emph{e...
403	Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States 2605.07579 RLVR Baseline from Internal States: Estimates a value baseline from the policy model's internal states to stabilize RLVR training at low cost.	cs.CL cs.LG cs.AI	Yunho Choi, Jongwon Lim, Woojin Ahn, Minjae Oh, Jeonghoon Shim	Reinforcement learning with verifiable rewards (RLVR) for Large Reasoning Models hinges on baseline estimation for variance reduction, but existing approaches pay a heavy price: PPO requires a policy-model scale critic, while GRPO needs multiple rollouts per prompt to keep its empirical group mean stable. We introduce Policy Optimization with Internal State Value Estimation, which obtains a baseline at negligible cost by using the policy model's internal signals already computed during the poli...
404	Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators 2605.07600 Causal Discovery for Math Reasoning: Uses an LLM as an interventional simulator for causal discovery, to identify the key concepts in solving math problems.	cs.CL cs.LG cs.AI	Tsuyoshi Okita	Recent methods for improving LLM mathematical reasoning, whether through MCTS-based test-time search or causal graph-guided knowledge injection, cannot identify which concepts causally contribute to a correct answer, as the observed association may be spurious, driven by confounders such as problem difficulty. We propose CIKA (Causal Intervention for Knowledge Activation), a framework that uses the LLM itself as an interventional simulator: a prompt sets the concept state to "mastered" and the...
405	Reliable Chain-of-Thought via Prefix Consistency 2605.07654 Chain-of-Thought Reliability Signal: Proposes a prefix-consistency measure and uses it to weight self-consistency voting for more reliable reasoning.	cs.CL cs.LG	Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama	Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer...
406	TRACE: Tourism Recommendation with Accountable Citation Evidence 2605.07677 Evidence-grounded Tourism Recommendation: Proposes the TRACE benchmark for evaluating multi-turn tourism conversational recommendation with cited review evidence.	cs.CL cs.AI	Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng	Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism re...
407	Rethinking State Tracking in Recurrent Models Through Error Control Dynamics 2605.07755 Recurrent State Tracking Error Dynamics: Proves, from error-control dynamics, that affine recurrent networks struggle to correct state drift.	cs.CL cs.LG	Jiwan Chung, Heechan Choi, Seon Joo Kim	The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error control, the dynamics governing hidden-state drift along the directions that distinguish symbolic states. We prove that affine recurrent networks, a class of models encompassing State-Space Models and Linear Attention, cannot correct errors along state-separatin...
408	Tracing Uncertainty in Language Model "Reasoning" 2605.07776 Uncertainty Dynamics in Reasoning: Characterizes chain-of-thought generation through uncertainty traces and analyzes reasoning dynamics.	cs.CL cs.LG cs.AI	Nils Grünefeld, Bertram Højer, Philipp Mondorf, Barbara Plank, Anna Rogers	Language model (LM) "reasoning", commonly described as Chain-of-Thought or test-time scaling, often improves benchmark performance, but the dynamics underlying this process remain poorly understood. We study these dynamics through the lens of uncertainty quantification by treating the "reasoning" traces, the intermediate token sequences generated by LMs, as evolving model states. We summarize each trace by an uncertainty trace profile: a small set of features describing the shape of the uncertai...
409	OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling 2605.07815 Layer-wise Trust-Ratio Optimization: Proposes OrScale, which adds layer-wise trust-ratio scaling to orthogonalized updates to improve training.	cs.CL cs.LG	Yuxuan Lou, Yang You	Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer's update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ratio extension of Muon built on a simple rule: the denominator of a layer-wise ratio should measure the Frobenius norm of the actual parameter-space direction that will be applied. This yields OrScale for general matrix layers and OrScale-LM for language models, where Moonlight shape scaling is comb...
410	KL for a KL: On-Policy Distillation with Control Variate Baseline 2605.07865 Stable On-Policy Distillation: Uses a control variate baseline to reduce OPD gradient variance and improve post-training stability.	cs.CL cs.LG cs.AI	Minjae Oh, Sangjun Song, Gyubin Choi, Yunho Choi, Yohan Jo	On-Policy Distillation (OPD) has emerged as a dominant post-training paradigm for large language models, especially for reasoning domains. However, OPD remains unstable in practice due to the high gradient variance of its single-sample Monte Carlo estimator, and recipes for stable training are still immature. We propose vOPD (On-Policy Distillation with a control variate baseline), which casts OPD as policy-gradient RL and stabilizes it by introducing a control variate baseline, canonically a val...
411	Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation 2605.07924 Few-step Discrete Flow Distillation: Uses energy-navigated distillation to improve discrete flow matching trajectories for few-step generation.	cs.CL cs.LG cs.AI	Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade, Manuel R. Ciosici, Yizhe Zhang	Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explanation is insufficient capacity. We argue the opposite: the trajectory is the bottleneck, not the student. Each training trajectory is built through a chain of blind stochastic jumps with no evaluation ...
412	Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims 2605.08012 Causal Claims in Interpretability: Calls on mechanistic interpretability papers to state their causal identification assumptions in support of causal claims.	cs.CL cs.LG cs.AI	Zezheng Lin, Fengming Liu	Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions section and a recurring pattern: validation metrics such as faithfulness, completeness, monosemanticity, alignment, or ablation effects are reported as causal support without stating the assumptions th...
413	Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms 2407.04183 LLMs and Wikipedia Neutrality: Evaluates the ability and biases of LLMs in detecting and correcting biased edits under Wikipedia's neutrality norms.	cs.CL cs.AI	Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak	Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predi...
414	UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function 2410.21438 Unified Fine-tuning for Alignment: Uses a generalized implicit reward to unify SFT with alignment methods such as RLHF and DPO in a single training stage.	cs.CL cs.LG	Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang	By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Because SFT and alignment have different objectives and underlying processes, performance on certain tasks can decline. To address this, we seamlessly introduce Unified Fine-Tuning (UFT), which integrates SFT and alignment into a single training stage using the same objective and loss ...
415	Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression 2502.01941 KV Cache Compression for Reasoning: Proposes KVFundaBench to evaluate how KV cache compression affects semantic integrity in high-density reasoning.	cs.CL cs.AI	Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li	While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (CoT) coherence is critical. We introduce KVFundaBench to systematically evaluate this gap, revealing a sharp dichotomy: while retrieval tasks remain robust, reasoning tasks exhibit severe Task-Dependent Degradation under aggressive compression due to disrupted ...
416	Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning 2502.07143 Grounded Medical Dialogue with Uncertainty: Builds a patient-facing medical dialogue method that is grounded in guidelines and explicitly manages uncertainty.	cs.CL	Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, Junde Wu	The severe shortage of medical doctors limits access to timely and reliable healthcare, leaving millions underserved. Large language models (LLMs) offer a potential solution but struggle in real-world clinical interactions. Many LLMs are not grounded in authoritative medical guidelines and fail to transparently manage diagnostic uncertainty. Their language is often rigid and mechanical, lacking the human-like qualities essential for patient trust. To address these challenges, we propose Ask Pati...
417	S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models 2503.05085 Speech-to-Speech Instruction Benchmark: Proposes S2S-Arena to evaluate how speech-to-speech models follow paralinguistic instructions such as prosody and emotion.	cs.CL cs.SD eess.AS	Feng Jiang, Zhiyu Lin, Yiyang Liu, Liumeng Xue, Fan Bu	Recent advances in large language models (LLMs) have fundamentally reshaped speech-to-speech (S2S) systems, enabling increasingly natural spoken interaction. However, existing benchmarks still rely heavily on text-based evaluation and largely ignore paralinguistic cues such as prosody, emotion, and speaker traits, which are central to expressive and human-like communication. We introduce S2S-Arena, a speech-native benchmark for evaluating instruction-following S2S models with explicit assessment...
418	FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations 2504.11837 FSM-guided Emotional Support Chat: Uses a finite state machine to constrain LLM dialogue flow and improve the long-term effect of emotional support conversations.	cs.CL cs.AI	Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang	Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Finite State Machine (FSM) on LLMs, and propose a framework called FiSMiness. Our framework allow...
419	Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation 2505.22842 Probabilistic Positional Encoding Theory: Proposes BAM, which treats positional encoding as a prior to unify existing methods and explain long-context extrapolation.	cs.CL cs.LG	Arthur S. Bianchessi, Yasmin C. Aguirre, Rodrigo C. Barros, Lucas S. Kupssinskü	Transformer-based language models rely on positional encoding (PE) to handle token order and support context length extrapolation. However, existing PE methods lack theoretical clarity and rely on limited evaluation metrics to substantiate their extrapolation claims. We propose the Bayesian Attention Mechanism (BAM), a theoretical framework that formulates positional encoding as a prior within a probabilistic model. BAM unifies existing methods (e.g., NoPE and ALiBi) and motivates a new Generali...
420	Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks 2506.13351 Constrained RL for Unverifiable Tasks: Trains LLMs on unverifiable tasks using a token-level reflection reward with rubric-gating constraints.	cs.CL cs.LG cs.AI	Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Swati Sharma	Reinforcement learning (RL) training of large language models (LLMs) on unverifiable tasks is challenging even when a reasonable-quality reference answer is available. We propose a constrained RL training framework that (i) optimizes a token-level dense Reasoning Reflection Reward (R3) aligned with reasoning quality, and (ii) enforces rubric-gating as feasibility constraints at the rollout group level. R3 measures the model's token-level certainty of a reference answer under its chain-of-thought...
421	VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents 2506.21582 Interactive LLM Text Analytics: Proposes VIDEE, which uses intelligent agents to visually decompose and execute text analytics pipelines.	cs.CL cs.AI	Sam Yu-Te Lee, Chenyang Ji, Shicheng Wen, Lifu Huang, Dongyu Liu	Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, information extraction, etc.). We introduce VIDEE, a system that supports entry-level data analysts to conduct advanced text analytics with intelligent a...
422	Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models 2508.05803 Fleeting Memory in Transformers: Introduces human-like fleeting memory, which improves language learning but impairs reading-time prediction.	cs.CL	Abishek Thamma, Micha Heilbron	Human memory is fleeting. As words are processed, the exact wordforms that make up incoming sentences are rapidly lost. Cognitive scientists have long believed that this limitation of memory may, paradoxically, help in learning language - an idea supported by classic connectionist modelling work. The rise of Transformers appears to challenge this idea, as these models can learn language effectively, despite lacking memory limitations or other architectural recency biases. Here, we investigate th...
423	Training-Free Multimodal Large Language Model Orchestration 2508.10016 Training-Free Multimodal Orchestration: Orchestrates an LLM and modality experts without training to achieve multimodal input and output.	cs.CL	Tianyu Xie, Yuexiao Ma, Yuhang Wu, Wang Chen, Jiayi Ji	Building interactive omni-modal assistants often relies on end-to-end multimodal alignment to fuse heterogeneous modalities, which incurs substantial data and compute costs and limits extensibility. We present Training-Free Large Language Model Orchestration (LLM Orchestration), a training-free orchestration framework that integrates off-the-shelf modality experts into a unified multimodal input-output system without additional gradient-based training for integration. LLM Orchestration comprise...
424	User eXperience Perception Insights Dataset (UXPID): Synthetic User Feedback from Public Industrial Forums 2509.11777 Synthetic UX Feedback Dataset: Builds the UXPID dataset of synthesized and annotated user-experience feedback from industrial forums.	cs.CL cs.LG	Mikhail Kulyabin, Jan Joosten, Choro Ulan uulu, Nuno Miguel Martins Pacheco, Fabian Ries	Customer feedback in industrial forums offers rich but underexplored insights into real-world product experience. Yet systematic analysis remains challenging due to unstructured, domain-specific content and the scarcity of high-quality labeled datasets. This paper presents the User eXperience Perception Insights Dataset (UXPID), a collection of 7130 synthesized and anonymized user feedback branches extracted from a public industrial automation forum. Each JSON record contains multi-post comments...
425	OLaPh: Optimal Language Phonemizer 2509.20086 Hybrid Multilingual Phonemizer: Proposes OLaPh, which combines multilingual lexica with subword segmentation to improve phonemization.	cs.CL	Johannes Wirth	Phonemization is a critical component in text-to-speech synthesis. Traditional approaches rely on deterministic transformations and lexica, while neural methods offer potential for higher generalization on out-of-vocabulary (OOV) terms. This work introduces OLaPh (Optimal Language Phonemizer), a hybrid framework that integrates extensive multilingual lexica with advanced NLP techniques and a statistical subword segmentation function. Evaluations on the WikiPron benchmark show that the OLaPh fram...
426	ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards 2510.00568 RL Self-Correcting Search Agents: Proposes ReSeek, which trains self-correcting search agents with instructive rewards.	cs.CL	Shiyu Li, Yang Tang, Yifan Wang, Peiming Li, Xi Chen	Search agents powered by Large Language Models (LLMs) have demonstrated significant potential in tackling knowledge-intensive tasks. Reinforcement learning (RL) has emerged as a powerful paradigm for training these agents to perform complex, multi-step reasoning. However, prior RL-based methods often rely on sparse or rule-based rewards, which can lead agents to commit to suboptimal or erroneous reasoning paths without the ability to recover. To address these limitations, we propose ReSeek, a no...
427	How Do Language Models Compose Functions? 2510.01685 Function Composition in LLMs: Analyzes whether LLMs compute g(f(x)) compositionally in two-hop factual recall.	cs.CL cs.AI	Apoorv Khandelwal, Ellie Pavlick	While large language models (LLMs) appear to be increasingly capable of solving compositional tasks, it is an open question whether they do so using compositional mechanisms. In this work, we investigate how feedforward LLMs solve two-hop factual recall tasks, which can be expressed compositionally as $g(f(x))$. We first confirm that modern LLMs continue to suffer from the "compositionality gap", i.e. their ability to compute both $z = f(x)$ and $y = g(z)$ does not entail their ability to comput...
428	Detecting Distillation Data from Reasoning Models 2510.04850 Distillation Data Contamination Detection: Defines and studies the task of detecting whether a question appears in reasoning distillation data.	cs.CL cs.AI	Hengxiang Zhang, Hyeong Kyu Choi, Sharon Li, Hongxin Wei	Reasoning distillation has emerged as a prevailing paradigm for transferring reasoning capabilities from large reasoning models to small language models. Yet, reasoning distillation risks data contamination: benchmark data may inadvertently be included in the distillation data, thereby inflating model performance metrics. In this work, we formally define the distillation data detection task, which determines whether a given question is included in the model's distillation data. The unique challe...
429	Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation 2510.07926 Comprehensiveness Evaluation Metrics: Proposes metrics that assess the completeness of factual recall in generated text and detect omitted information.	cs.CL	Adam Dejl, James Barry, Alessandra Pascale, Javier Carnerero Cano	Despite demonstrating remarkable performance across a wide range of tasks, large language models (LLMs) have also been found to frequently produce outputs that are incomplete or selectively omit key information. In sensitive domains, such omissions can result in significant harm comparable to that posed by factual inaccuracies, including hallucinations. In this study, we address the challenge of evaluating the comprehensiveness of LLM-generated texts, focusing on the detection of missing informa...
430	EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle 2510.16079 Self-Evolving LLM Agents: Proposes EvolveR, which lets agents continually self-improve through a closed-loop experience lifecycle.	cs.CL cs.AI	Rong Wu, Xiaoman Wang, Jianbiao Mei, Pinlong Cai, Daocheng Fu	Current Large Language Model (LLM) agents show strong performance in tool use, but lack the crucial capability to systematically learn from their own experiences. While existing frameworks mainly focus on mitigating external knowledge gaps, they fail to address a more fundamental limitation: the inability to iteratively refine problem-solving strategies. In this work, we introduce EvolveR, a framework designed to enable agents to self-improve through a complete, closed-loop experience lifecycle. ...
431	DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference 2510.19669 Difficulty-Adaptive Reasoning Inference: Proposes DiffAdapt, which adapts reasoning length to problem difficulty to save tokens.	cs.CL	Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi	Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. First, we analyze the entropy of token probabilities in reasoning traces. Across three models, we observe a consistent U-shaped entropy pattern: high entropy on easy problems despite high accuracy, low entropy on problems with medium ...
432	MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning 2511.02805 Memory-Managed Search Agents via RL: Proposes MemSearcher, trained with end-to-end reinforcement learning to search and manage a compact memory.	cs.CL cs.AI	Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu	LLM-based search agents often concatenate the full interaction history into the context, producing long and noisy inputs, and increasing compute cost and GPU memory overhead. To address this issue, we propose MemSearcher, an agent framework that maintains a compact memory during multi-turn interactions, retaining only question-relevant information and thereby keeping the context length stable across turns. Training MemSearcher is challenging because each trajectory spans multiple turns under dif...
433	Rep2Text: Decoding Full Text from a Single LLM Token Representation 2511.06571 Text Reconstruction from Token Representation: Proposes Rep2Text, which decodes the original input text from a single last-token representation.	cs.CL cs.LG cs.AI	Haiyan Zhao, Zirui He, Yiming Tang, Fan Yang, Ali Payani	Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the ...
434	Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization 2512.23032 Chain-of-Thought Faithfulness: Argues that CoT can remain faithful and explanatory without explicitly verbalizing the hint.	cs.CL cs.LG cs.AI	Kerem Zaman, Shashank Srivastava	Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric adopts a narrow notion of faithfulness and confuses unfaithfulness with incompleteness, the lossy compression needed to turn distributed transformer computation into a linear natural language narrative. On multi-hop reasoning tasks with instruct-tuned and reasoning models, many CoTs flagged as unfaithful by Biasing Features are judged fa...
435	Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala 2601.14958 Sinhala Script Sensitivity Benchmark: Benchmarks language models on Unicode, Romanized, and mixed-script Sinhala text.	cs.CL cs.AI	Minuri Rajapakse, Ruvan Weerasinghe	The performance of Language Models (LMs) on low-resource, morphologically rich languages like Sinhala remains largely unexplored, particularly regarding script variation in digital communication. Sinhala exhibits script duality, with Unicode used in formal contexts and Romanized text dominating social media, while mixed-script usage is common in practice. This paper benchmarks 24 open-source LMs on Unicode, Romanized and mixed-script Sinhala using perplexity evaluation across diverse text source...
436	Beyond Factual Accuracy: Evaluating Global Reasoning Integrity in RAG Systems with LogicScore 2601.15050 RAG Logical Integrity Evaluation: Proposes LogicScore to evaluate the global logical consistency of long-form RAG answers, not just factual accuracy.	cs.CL	Zhichao Yan, Yunxiao Zhao, Jiapu Wang, Jiaoyan Chen, Xiaoli Li	Current evaluation methods for Retrieval Augmented Generation (RAG) suffer from "factual myopia": they relentlessly emphasize factual accuracy yet neglect global logical integrity in long-form answer generation. This drives models to force unnatural connections, producing factually grounded yet logically incoherent responses with unaddressed gaps, ambiguous links, or redundant premises. To mitigate this, we present LogicScore, shifting from local, fact-by-fact assessment to rigor...
437	Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents 2601.21699 Resource-Constrained Multi-Hop Agents: Studies training and exploration efficiency for multi-hop reasoning agents under resource constraints.	cs.CL	Hojae Han, Heeyun Jung, Jongyoon Kim, Seung-won Hwang	Multi-turn reasoning agents solve complex questions by decomposing them into intermediate retrieval or tool-use steps, accumulating supporting evidence across turns. Meanwhile, with reinforcement learning (RL), training these agents relies on many on-policy rollouts and large training batches. Under realistic resource constraints that make dense exploration infeasible, each RL batch contains only a few useful reasoning paths from the current policy. Existing approaches do not fully address this ...
438	WorldCup Sampling for Multi-bit LLM Watermarking 2602.01752 Multi-bit LLM Watermarking: Proposes WorldCup sampling to improve the robustness and text quality of multi-bit watermarking.	cs.CL	Yidan Wang, Yubing Ren, Yanan Cao, Li Guo	As large language models (LLMs) generate increasingly human-like text, watermarking has emerged as a promising solution for reliable attribution beyond mere detection. While multi-bit watermarking enables richer provenance encoding, existing approaches typically extend zero-bit watermarking schemes by introducing static logit perturbations and counting-based decoding strategies, which can degrade text quality and compromise decoding robustness as the payload increases. In this paper, we propose ...
439	A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method 2602.02320 Molecule Structure-Language Dataset: Automatically generates a large-scale dataset aligning molecular structures with language descriptions using rule regularization.	cs.CL cs.AI	Feiyang Cai, Guijuan He, Yi Hu, Jingjing Wang, Joshua Luo	Molecular function is largely determined by structure. Accurately aligning molecular structure with natural language is therefore essential for enabling large language models (LLMs) to reason about downstream chemical tasks. However, the substantial cost of human annotation makes it infeasible to construct large-scale, high-quality datasets of structure-grounded descriptions. In this work, we propose a fully automated annotation framework for generating precise molecular descriptions that preser...
440	Rethinking Weight Tying: Pseudo-Inverse Tying for LM Stable Training and Updates 2602.04556 Pseudo-Inverse Weight Tying: Proposes pseudo-inverse weight tying to stabilize training and keep the vocabulary encoding and decoding interfaces consistent.	cs.CL cs.LG	Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang	Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, parameter sharing alone does not guarantee a stable token interface: during training, the correspondence between encoding tokens into hidden states and decoding hidden states into logits can drift, worsening optimization sensitivity and weakening explainability probes that rely on a meaningful vocabulary-space decoder. We propose P...
441	Retrieval Heads are Dynamic 2602.11162 Dynamic Retrieval Heads Analysis: Analyzes the dynamics and mechanisms of LLM retrieval heads from the perspective of generation timesteps.	cs.CL	Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui	Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claim...
442	Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset 2602.16571 PII De-Identification for Math Tutoring: Studies utility-preserving de-identification under numeric ambiguity in math tutoring dialogues.	cs.CL	Zhuqian Zhou, Kirk Vanacore, Bakhtawar Ahtisham, Jinsook Lee, Doug Pietrzak	Large-scale sharing of dialogue data is key to advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce data utility. This work asks how to detect PII while preserving educational utility, focusing on t...
443	Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning 2602.19612 Benchmarking Machine Unlearning: Proposes the DUAL benchmark to analyze how fact salience and training stage affect unlearning.	cs.CL	Borisiuk Anna, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina	Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or supervised fine-tuning (SFT). In this paper, we introduce DUAL (Dual Unlearning Evaluation across Training Stages), a benchmark of 28.6k Wikidata-derived triplets annotated with fact popularity using Wikipedia link counts and LLM-based salience...
444	Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation 2602.20816 Tail-Aware Distillation Divergence: Proposes a distillation loss that decouples top-K and tail probabilities to strengthen the student's learning signal.	cs.CL cs.LG	Sayantan Dasgupta, Trevor Cohn, Timothy Baldwin	The core learning signal used in language model distillation is the standard Kullback-Leibler (KL) divergence between the student and teacher distributions. Traditional KL divergence tends to be dominated by the next tokens with the highest probabilities, i.e., the teacher's modes, thereby diminishing the influence of less probable yet potentially informative components of the output distribution. We propose a new tail-aware divergence that decouples the contribution of the teacher model's top-K...
445	Optimizing Language Models for Crosslingual Knowledge Consistency 2603.04678 Crosslingual Knowledge Consistency RL: Optimizes models with structured-reward reinforcement learning for crosslingually consistent knowledge answers.	cs.CL cs.AI	Tianyu Liu, Jirui Qi, Mrinmaya Sachan, Ryan Cotterell, Raquel Fernández	Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optim...
446	A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs 2603.07475 AR vs Diffusion LLM Representations: Compares layer-wise representational capacity and token representations in autoregressive and diffusion LLMs.	cs.CL cs.LG	Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee	Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether diffusion objectives fundamentally reshape internal representations remains unclear. We perform the first layer- and token-wise representational analysis comparing native dLLMs (LLaDA), native AR models (Qwen2.5), and AR-initialized dLLMs (Dream-7B), using cosi...
447	NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating 2603.08256 Word Sense Plausibility Regression: Compares embedding regression, fine-tuning, and prompting for predicting word sense plausibility ratings.	cs.CL	Tong Wu, Thanet Markchom, Huizhi Liang	Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1-5 scale in the context of short narrative stories containing ambiguous homonyms. This paper systematically compares three approaches: (1) embedding-based methods pairing sentence embeddings with standard regressors, (2) transformer fine-tuning with parameter-efficient adaptation, and (3) large language model (LLM) prompting with structured reasoning and explicit decision rules. The be...
448	FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting 2603.19254 Financial Research Reporting Benchmark: Proposes FinReasoning, a hierarchical benchmark for reliable reasoning and consistency in financial research reports.	cs.CL	Yiyun Zhu, Yidong Jiang, Ziwen Xu, Yinsheng Yao, Dawei Cheng	Large language models (LLMs) are increasingly deployed in financial research workflows, where their role is evolving from single-model assistance for human analysts toward autonomous collaboration among multiple agents. Yet real-world deployments still expose factual errors, numerical inconsistencies, and shallow analysis, which can distort assessments of corporate fundamentals and trigger severe economic losses. While existing benchmarks have begun to evaluate such failures, they score all aspe...
449	Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control 2604.03147 Valence-Arousal Emotion Subspace: Finds that LLM emotion vectors form a two-dimensional circular valence-arousal geometry that enables controllable generation.	cs.CL cs.AI	Lihao Sun, Lewen Yan, Xiaoya Lu, Andrew Lee, Jie Zhang	We show that emotion vectors in LLMs are organized by a two-dimensional valence-arousal (VA) subspace exhibiting circular geometry. Through principal component decomposition and ridge regression, we recover meaningful VA axes underlying emotion steering vectors whose projections correlate with human affect ratings across 44,728 words. Steering along these axes produces monotonic control over the affective properties of generated text, and further affords bidirectional control over multiple downs...
450	NCL-BU at SemEval-2026 Task 3: Fine-tuning XLM-RoBERTa for Multilingual Dimensional Sentiment Regression 2604.08923 Multilingual Valence-Arousal Regression: Fine-tunes XLM-R for multilingual aspect-level continuous valence and arousal regression.	cs.CL	Tong Wu, Nicolay Rusnachenko, Huizhi Liang	Dimensional Aspect-Based Sentiment Analysis (DimABSA) extends traditional ABSA from categorical polarity labels to continuous valence-arousal (VA) regression. This paper describes a system developed for Track A, Subtask 1 (Dimensional Aspect Sentiment Regression), aiming to predict real-valued VA scores in the [1, 9] range for each given aspect in a text. A fine-tuning approach based on XLM-RoBERTa-base is adopted, with dual regression heads with sigmoid-scaled outputs for valence and arousal pr...
451	BITS Pilani at SemEval-2026 Task 9: Structured Supervised Fine-Tuning with DPO Refinement for Polarization Detection 2604.11121 Polarization Detection Fine-tuning: Combines supervised fine-tuning with DPO to improve multilingual polarization detection.	cs.CL	Atharva Gupta, Dhruv Kumar, Yash Sinha	The POLAR SemEval-2026 Shared Task aims to detect online polarization and focuses on the classification and identification of multilingual, multicultural, and multi-event polarization. Accurate computational detection of online polarization is challenging due to nuanced rhetoric, implicit framing, and the high cost of human-in-the-loop annotation. Building on recent findings that contextual prompting enables large language models to function as strong polarization detectors, we present a two-sta...
452	Prune, Interpret, Evaluate: A Cross-Layer Transcoder-Native Framework for Efficient Circuit Discovery via Feature Attribution 2604.16889 Efficient Circuit Interpretability: Prunes first, then interprets via attribution, to discover model circuit features efficiently.	cs.CL	Qinhao Chen, Linyang He, Nima Mesgarani	Existing feature-interpretation pipelines typically operate on uniformly sampled units or exhaustive feature sets, incurring massive costs on units irrelevant to target behaviors. To address this, we introduce the first CLT-native end-to-end pruning framework, PIE, which pioneers the paradigm of pruning first and interpreting later. PIE connects Pruning, automatic Interpretation, and interpretation Evaluation, establishing a comprehensive benchmarking environment to systematically measure behavi...
453	JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems 2604.23478 LLM Judge Prompt Sensitivity: Builds JudgeSense to measure how stable LLM judges are under paraphrased prompts.	cs.CL	Rohith Reddy Bellibatlu, Edward Raff, Wenbin Zhang	Large language models are widely adopted as automated evaluation judges, yet the stability of their verdicts under semantically equivalent prompt rephrasings remains largely unexamined. We conduct a systematic empirical study of prompt-induced decision instability across multiple evaluation tasks and judge architectures. To facilitate this analysis, we release JudgeSense, a benchmark comprising hand-validated prompt-paraphrase pairs spanning factuality, coherence, relevance, and preference, draw...
454	TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment 2604.23938 Target Safety Assessment Agents: Proposes a human-in-the-loop multi-agent framework to assist drug target safety assessment reports.	cs.CL	Xiaochen Zheng, Zhiwen Jiang, Melanie Guerard, Klas Hatje, Tatyana Doktorova	Target Safety Assessment (TSA) requires systematic integration of heterogeneous evidence, including genetic, transcriptomic, target homology, pharmacological, and clinical data, to evaluate potential safety liabilities of therapeutic targets. This process is inherently iterative and expert-driven, posing challenges in scalability and reproducibility. We present TSAssistant, a multi-agent framework designed to support TSA report drafting through a modular, section-based, and human-in-the-loop par...
455	SeaEvo: Advancing Algorithm Discovery with Strategy Space Evolution 2604.24372 Evolutionary Algorithm Discovery: Uses strategy-space evolution to organize language reasoning into persistent population-level state for algorithm discovery.	cs.CL cs.AI	Sichun Luo, Yi Huang, Haochen Luo, Fengyuan Liu, Guanzhi Deng	Large Language Model (LLM)-guided evolutionary search is increasingly used for automated algorithm discovery, yet most current methods track search progress primarily through executable programs and scalar fitness. Even when natural-language reasoning is used through heuristic descriptions or reflection, it typically remains transient mutation context or unstructured memory, rather than organized as persistent population-level state over strategic directions. As a result, evolutionary search can...
456	Structural Generalization on SLOG without Hand-Written Rules 2604.26157 Structural Generalization Semantic Parsing: Uses a neural cellular automaton with a discrete bottleneck to learn compositional rules from data and achieve structural generalization.	cs.CL cs.AI	Zichao Wei	Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Transformer-based models). We present an alternative requiring no hand-written compositional rules, based on a neural cellular automaton (NCA) with a discrete bottleneck: all compositional rules are learned from data through local iteration. On the SLOG ...
457	Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness 2605.01006 LLM News Debiasing Study: Experimentally tests the effects and biases of LLM-rewritten news headlines on cross-partisan receptivity.	cs.CL	Faisal Feroz, Jonas R. Kunst	Partisan news media erode cross-partisan trust, but large language models (LLMs) offer a potential means of debiasing such content at scale. Across two pre-registered experiments, we tested whether LLM-generated debiasing of liberal news headlines could improve conservative readers' trust-relevant judgments. Study 1 found that subtle lexical debiasing (replacing emotive words with more moderate synonyms) had no effect on any outcome. Study 2 found that a more substantive reframing intervention s...
458	OralMLLM-Bench: Evaluating Cognitive Capabilities of Multimodal Large Language Models in Dental Practice 2605.01333 Dental Multimodal LLM Benchmark: Releases a dental imaging benchmark to evaluate the radiographic diagnostic cognitive capabilities of multimodal large models.	cs.CL	Rongyang Wang, Shuang Zhou, Jiashuo Wang, Wenya Xie, Xiaoxia Che	Multimodal large language models (MLLMs) have emerged as a promising paradigm for dental image analysis. However, their ability to capture the multi-level cognitive processes required for radiographic analysis remains unclear. Here, we present a comprehensive benchmark to evaluate the cognitive capabilities of MLLMs in dental radiographic analysis. It spans three critical imaging modalities, i.e., periapical, panoramic, and lateral cephalometric radiographs, and defines four cognitive categories...
459	TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis 2605.01717 Conversational Sentiment Quadruple: Proposes thread-constrained discourse modeling to improve conversational sentiment quadruple extraction.	cs.CL cs.AI	Xinran Li, Xinze Che, Yifan Lyu, Zhiqi Huang, Xiujuan Xu	Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) needs to capture the complex interrelationships in multiple rounds of dialogues. Existing methods usually employ simple Graph Convolutional Networks (GCN), which introduce structural noise and fail to consider the temporal sequence of the dialogues, or use standard RoPE, which implicitly captures relative distances in a flat sequence but cannot clearly separate the token-level syntactic order from the utterance-level progression, ...
460	Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning 2605.02073 Reward Optimization for RL Reasoning: Optimizes reward functions with search-driven reinforcement learning to strengthen LLM mathematical reasoning.	cs.CL	Arash Ahmadi, Sarah Sharif, Yaser (Mike) Banad	Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduces a search-driven framework that treats the reward specification itself as an object of optimization. The setting of interest is one in which the base model is held fixed and the ...
461	Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training 2605.04913 Local Learning Post-training: Proposes a cheaper, faster local learning recipe to reduce LLM post-training cost.	cs.CL cs.LG	Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang	LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we prop...
462	Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM 2605.05927 Speech LLM Modality Gap: Introduces prosody-aware representations on the input side to narrow the gap between speech and text LLMs.	cs.CL cs.SD eess.AS	Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King	Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech generation more text-like, but the gap remains. We argue that the key remaining bottleneck lies on the input side. We propose TextPro-SLM, an SLM that makes spoken input more closely resemble that of a prosody-aware text LLM. TextPro-SLM combines Whi...
463	SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following 2605.06353 Long-horizon Constraint Following Benchmark: Proposes SEQUOR to evaluate constraint following and consistency in long multi-turn conversations.	cs.CL	Beatriz Canaverde, Duarte M. Alves, José Pombal, Giuseppe Attanasio, André F. T. Martins	In a conversation, a helpful assistant must reliably follow user directives, even as they refine, modify, or contradict earlier requests. Yet most instruction-following benchmarks focus on single-turn or short multi-turn scenarios, leaving open how well models handle long-horizon instruction-following tasks. To bridge this gap, we present SEQUOR, an automatic benchmark for evaluating constraint adherence in long multi-turn conversations. SEQUOR consists of simulated persona-driven interactions b...
464	Statistical Patterns in the Equations of Physics and the Emergence of a Meta-Law of Nature 2408.11065 Statistical Physics Equation Patterns: Analyzes statistical structural regularities in corpora of physics equations to explore a meta-law.	cs.CL	Andrei Constantin, Deaglan Bartlett, Harry Desmond, Pedro G. Ferreira	Physics seeks to uncover the laws of Nature and express them through mathematical equations. Despite the vast diversity of natural phenomena, physical equations exhibit structural regularities that set them apart from arbitrary mathematical expressions. While principles such as dimensional analysis have long guided the formulation of physical models, the exploration of more subtle statistical patterns within the equations of physics remains an open question. Here, by analysing four corpora of ph...
465	UNA: A Unified Supervised Framework for Efficient LLM Alignment Across Feedback Types 2408.15339 Unified LLM Alignment Supervision: Unifies preference, score, and other feedback signals for efficient alignment training.	cs.CL cs.LG	Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu	RL alignment methods, including RLHF and DPO, are primarily based on pairwise preference data. Although scalar or score-based feedback has been collected in some settings, it is rarely used directly, and preference magnitude information is typically ignored. Furthermore, current alignment frameworks offer limited capability for unifying heterogeneous supervision signals, making it difficult to jointly leverage diverse data types within a single training paradigm. This limitation constrains the r...
466	From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems 2506.04565 Compound AI Systems Survey: Surveys compound AI systems that integrate LLMs with retrieval, tools, and agent orchestration.	cs.CL	Jiayi Chen, Junyi Ye, Guiling Wang	Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry...
467	MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge 2507.21183 Preference Optimization with Priors: Proposes MaPPO, which incorporates prior reward knowledge into the preference optimization objective to align models.	cs.CL cs.LG cs.AI	Guangchen Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang	As the era of large language models (LLMs) unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimization (MaPPO), a methodology for learning from preferences that explicitly incorporates prior reward knowledge into the optimization objective. Building on the paradigm employed by Direct Preference Optimization (DPO) and its variants of treating preference learn...
468	Searching for Privacy Risks in LLM Agents via Simulation 2508.10880 Privacy Attacks in LLM Agents: Uses simulation-based search that alternately improves attack and defense strategies to uncover agent privacy risks.	cs.CL cs.AI	Yanzhe Zhang, Diyi Yang	The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such dynamic dialogues makes it challenging to anticipate emerging vulnerabilities and design effective defenses. To tackle this problem, we present a search-based framework that alternates between improving attack and defense strategies through the simulation of...
469	A Multi-Memory Segment System for Generating High-Quality Long-Term Memory Content in Agents 2508.15294 Agent Long-term Memory Generation: Proposes a multi-memory segment system to generate higher-quality long-term memory content for agents.	cs.CL cs.AI	Gaoke Zhang, Bo Wang, Yunlong Ma, Dongming Zhao, Zifei Yu	In the current field of agent memory, extensive explorations have been conducted in the area of memory retrieval, yet few studies have focused on exploring the memory content. Most research simply stores summarized versions of historical dialogues, as exemplified by methods like A-MEM and MemoryBank. However, when humans form long-term memories, the process involves multi-dimensional and multi-component generation, rather than merely creating simple summaries. The low-quality memory content gene...
470	Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation 2509.03736 Behavioral Coherence of LLM Agents: Uses latent profiles to evaluate the behavioral consistency of LLM agents in social simulation.	cs.CL cs.LG cs.AI	James Mooney, Josef Woldense, Zheng Robert Jia, Shirley Anugrah Hayati, My Ha Nguyen	The impressive capabilities of Large Language Models (LLMs) raise the possibility that synthetic agents can serve as substitutes for real participants in human-subject research. To evaluate this claim, prior research has largely focused on whether LLM-generated survey responses align with those produced by human respondents whom the LLMs are prompted to represent. In contrast, we address a more fundamental question: Do agents maintain empirical consistency; aligning to human behavioral models wh...
471	SpikingBrain: Spiking Brain-inspired Large Models 2509.05276 Brain-inspired Spiking Large Models: Proposes spiking, brain-inspired large models for efficient long-context training and inference.	cs.CL cs.LG cs.AI	Yuqi Pan, Yupeng Feng, Jinghao Zhuang, Siyu Ding, Han Xu	Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU c...
472	Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization 2510.00436 Automated Evaluation of Medical QA: Studies whether automated metrics can distinguish good from bad AI responses to hospitalization questions and align with experts.	cs.CL cs.AI	Sarvesh Soni, Dina Demner-Fushman	Automated approaches to answer patient-posed health questions are rising, but selecting among systems requires reliable evaluation. The current gold standard for evaluating the free-text artificial intelligence (AI) responses--human expert review--is labor-intensive and slow, limiting scalability. Automated metrics are promising yet variably aligned with human judgments and often context-dependent. To address the feasibility of automating the evaluation of AI responses to hospitalization-related...
473	InvThink: Premortem Reasoning for Safer Language Models 2510.01569 Premortem Safety Reasoning: Improves language model safety through a three-step process of rehearsing failures and generating under constraints.	cs.CL cs.AI	Yubin Kim, Taehan Kim, Eugene Park, Chunjong Park, Cynthia Breazeal	We present InvThink, a training and prompting framework that requires the model to enumerate, analyze, and constrain potential failures before generating its final response. Unlike existing safety alignment methods that optimize only for safe final responses, InvThink structures generation into three steps: (1) enumerate potential harms, (2) analyze their consequences, (3) generate the response under explicit mitigation constraints. We observe three findings: (i) InvThink shows higher safety sco...
474	Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models 2601.04731 Data-efficient RL for Reasoning: Uses policy uncertainty as an intrinsic reward to improve the efficiency of critic-free RL for large reasoning models.	cs.CL cs.AI	Shuyang Jiang, Yuhao Wang, Ya Zhang, Yanfeng Wang, Yu Wang	Current critic-free RL methods for large reasoning models suffer from severe inefficiency when training on positive homogeneous prompts (where all rollouts are correct), resulting in waste of rollouts due to zero advantage estimates. We introduce a radically simple yet powerful solution to Mine intrinsic mastery (Miner), that repurposes the policy's intrinsic uncertainty as a self-supervised reward signal, with no external supervision, auxiliary models, or additional infe...
475	Neural Neural Scaling Laws 2601.19831 Downstream Task Scaling Laws: Proposes a new method to characterize diverse downstream-task scaling behaviors rather than looking only at validation loss.	cs.CL cs.LG	Michael Y. Hu, Jane Pan, Ayush Rajesh Jhaveri, Nicholas Lourie, Kyunghyun Cho	Neural scaling laws predict how language model performance improves with increased training inputs. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks exhibit diverse scaling behaviors: some improve monotonically, others plateau, and some even degrade with scale. We argue that predicting downstream performance from validation loss suffers from two limitations: averaging token-level losses obscures signal, and no simple parametric family c...
476	Sign-Based Optimizers Are Effective Under Heavy-Tailed Noise 2602.07425 Sign-based Optimization Theory: Explains theoretically, via heavy-tailed gradient noise, why sign-based optimizers outperform AdamW.	cs.CL cs.LG	Dingzhi Yu, Hongyi Tao, Yuanyu Wan, Luo Luo, Lijun Zhang	While adaptive gradient methods are the workhorse of modern machine learning, sign-based optimization algorithms such as Lion and Muon have recently demonstrated superior empirical performance over AdamW in training large language models (LLM). However, a theoretical understanding of why sign-based updates outperform variance-adapted methods remains elusive. In this paper, we aim to bridge the gap between theory and practice through the lens of heavy-tailed gradient noise, a phenomenon frequentl...
477	Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective 2602.09782 Entropy Control in RLVR: Proposes an entropy control method from a gradient-preserving perspective to mitigate entropy collapse in RLVR training.	cs.CL cs.LG cs.AI	Kun Chen, Peng Shi, Fanfan Liu, Haibo Qiu, Zhixiong Zeng	Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical method for enhancing the reasoning capabilities of Large Language Models (LLMs). However, continuous training often leads to policy entropy collapse, characterized by a rapid decay in entropy that results in premature overconfidence, reduced output diversity, and vanishing gradient norms that inhibit learning. Gradient-Preserving Clipping is a primary factor influencing these dynamics, but existing mitigation strateg...
478	Overview of the TREC 2025 RAGTIME Track 2602.10024 TREC RAGTIME Track Overview: Summarizes the TREC multilingual RAGTIME report generation and retrieval tasks and participant results.	cs.CL	Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Eugene Yang	The principal goal of the RAG TREC Instrument for Multilingual Evaluation (RAGTIME) track at TREC is to study report generation from multilingual source documents. The track has created a document collection containing Arabic, Chinese, English, and Russian news stories. RAGTIME includes three task types: Multilingual Report Generation, English Report Generation, and Multilingual Information Retrieval (MLIR). A total of 125 runs were submitted by 13 participating teams (and as baselines by the tr...
479	A Geometric Taxonomy of Hallucinations in LLMs 2602.13224 Hallucination Taxonomy and Detection: Proposes a geometric taxonomy of hallucinations and improves detection in the black-box, single-pass QA setting.	cs.CL cs.AI	Javier Marín	Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are not generally available behind a third-party API. Within this setting - black-box, single-pass, only question/answer available - the dominant baseline...
480	ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation 2602.15189 Schema-constrained LLM Dataset: Releases ScrapeGraphAI-100k, a real-world web extraction dataset for JSON-schema-constrained generation.	cs.CL cs.AI	William Brach, Francesco Zuppichini, Marco Vinciguerra, Lorenzo Padoan	Producing output that conforms to a specified JSON schema underlies tool use, structured extraction, and knowledge base construction in modern large language models. Despite this centrality, public datasets for the task remain small, synthetic, or text-only, and rarely pair real page content with the prompts and schemas used in practice. We introduce ScrapeGraphAI-100k, 93,695 schema-constrained extraction events collected via opt-in ScrapeGraphAI telemetry in Q2--Q3 2025, deduplicated and balan...
481	Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features 2603.03096 SSL Speech Feature Speaker Analysis: Uses PCA to analyze speaker information across the dimensions of self-supervised speech features.	cs.CL eess.AS	Kyle Janse van Rensburg, Benjamin van Niekerk, Herman Kamper	How do speech models trained through self-supervised learning structure their representations? Previous studies have looked at how information is encoded in feature vectors across different layers. But few studies have considered whether speech characteristics are captured within individual dimensions of SSL features. In this paper we specifically look at speaker information using PCA on utterance-averaged representations. For a range of SSL models, we find that the principal dimension that expl...
482	Sparser, Faster, Lighter Transformer Language Models 2603.23198 Sparse Transformer Inference Acceleration: Uses unstructured sparsity and CUDA kernels to reduce the compute and memory of LLM feedforward layers.	cs.CL cs.LG	Edoardo Cetin, Stefano Peluchetti, Emilio Castillo, Akira Naruse, Mana Murakami	Scaling autoregressive large language models (LLMs) has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the components accounting for most of the model parameters and execution FLOPs. To achieve this, we introduce a new sparse packing format and a set of CUDA kernels designed to seamlessly integrate with the optimized execution pipelines of modern GPUs, enabling effi...
483	SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks 2603.24755 Iterative Coding Agent Benchmark: Proposes SlopCodeBench to evaluate how coding agents degrade over long-horizon iterative extension.	cs.CL cs.AI	Gabriel Orlanski, Devjeet Roy, Alexander Yun, Changho Shin, Alex Gu	Software development is iterative, yet agentic coding benchmarks hide design issues through their single-shot setup. Recent iterative benchmarks attempt to remedy this but heavily constrain an agent's design decision space, making it impossible to faithfully measure how their decisions shape future extensions. We introduce SlopCodeBench, a benchmark of 36 problems and 196 checkpoints where agents repeatedly extend their own solutions. Unlike prior iterative benchmarks, our evolving specification...
484	OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search 2604.03675 Search Agent Co-Training Rewards: Uses outcome-aligned search-evaluation co-training to mitigate sparse credit assignment for intermediate steps in RLVR.	cs.CL cs.AI	Erhan Zhang, Yiqun Chen, Zechun Niu, Wei Yang, Xiaochi Wei	Agentic search enables language models to solve knowledge-intensive tasks by adaptively acquiring external evidence over multiple steps. Reinforcement learning with verifiable rewards (RLVR) has emerged as a widely adopted training paradigm for search agents, yet outcome-only rewards are sparse and provide limited credit assignment for intermediate search actions. Existing process-reward methods therefore seek to densify supervision through proxy signals, external evaluators, or likelihood-based...
485	KV Cache Offloading for Context-Intensive Tasks 2604.08426 KV Cache Offloading Evaluation: Systematically evaluates the effect of KV-cache offloading on latency, memory, and accuracy in long-context tasks.	cs.CL cs.LG cs.AI	Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov	With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approach to reduce memory footprint and inference latency while preserving accuracy. Prior evaluations have largely focused on tasks that do not require extracting large amounts of information from the context. In this work, we study KV-cache offloading on context-in...
486	ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning 2605.00380 Negative-Sample Residuals for Reasoning RL: Uses negative sample projection residual reinforcement learning to improve reasoning while preserving generation diversity.	cs.CL cs.LG	Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang	Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes ne...
487	Multilingual Safety Alignment via Self-Distillation 2605.02971 Multilingual Safety Self-Distillation Alignment: Uses cross-lingual self-distillation to transfer safeguards from high-resource to low-resource languages to defend against jailbreaks.	cs.CL cs.LG cs.AI	Ruiyang Qin, Qingzhuo Wang, Dongrui Liu, Qiang Li, Zhihua Wei	Large language models (LLMs) exhibit severe multilingual safety misalignment: they possess strong safeguards in high-resource languages but remain highly vulnerable to jailbreak attacks in low-resource languages. Current safety alignment methods generally rely on high-quality response data for each target language, which is expensive and difficult to generate. In this paper, we propose a cross-lingual safeguard transfer framework named Multilingual Self-Distillation (MSD). This framework transfe...
488	FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation 2605.04651 Forward-only closed-form fast-weight adaptation: Proposes FAAST, which compiles labeled examples into fast weights in a single forward pass for fast test-time adaptation.	cs.CL, cs.LG	Guangsheng Bao, Hongbo Zhang, Han Cui, Ke Sun, Yanbin Zhao	Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification a...
489	Belief Memory: Agent Memory Under Partial Observability 2605.05583 Belief memory under partial observability: Stores uncertain observations as belief distributions to reduce self-reinforcing errors in agent memory.	cs.CL, cs.AI	Junfeng Liao, Qizhou Wang, Jianing Zhu, Bo Du, Rui Yan	LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed" from temporary errors), even though such observations are inherently partial and potentially ambiguous. By committing to one conclusion and discarding uncertainty, these methods introduce self-reinforcing error: the agent acts on the stored conclusion, never revisits a...
490	Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks 2605.05995 Geometric bottlenecks against harmful fine-tuning: Proposes Safety Anchor, which uses geometric bottlenecks to constrain optimization trajectories and resist harmful fine-tuning attacks.	cs.CL, cs.AI	Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun, Hao Zhou	The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on parameters, gradients, or internal representations, we observe that they can be effectively circumvented under persistent HFT. Our analysis traces this failure to the inherent redundancy of the high-dimensional parameter space: attackers exploit optimization trajectories that are orthogonal to defense constraints to restore harmful capabilities while...
491	On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows 2605.06110 Online resource allocation for agentic workflows: Optimizes online resource allocation for multi-model workflows under budget and deadline constraints.	cs.CL, cs.AI	Xinglin Wang, Zishen Liu, Shaoxiong Feng, Peiwen Yuan, Yiwei Li	Agentic systems increasingly solve complex user requests by executing orchestrated workflows, where subtasks are assigned to specialized models or tools and coordinated according to their dependencies. While recent work improves agent efficiency by optimizing the performance--cost--latency frontier, real deployments often impose concrete requirements: a workflow must be completed within a specified budget and before a specified deadline. This shifts the goal from average efficiency optimization ...
cs.CV 280 papers
1	Visual Text Compression as Measure Transport 2605.06708 Visual text compression: Renders text into images for compression and uses measure transport to explain performance differences.	cs.CV, cs.AI	Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li	Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefo...
2	Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey 2605.06714 Edge deep learning survey: Surveys methods and challenges of edge deep learning in vision and medical diagnostics.	cs.CV, cs.AI	Yiwen Xu, Tariq M. Khan, Yang Song, Erik Meijering	Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presen...
3	HumanNet: Scaling Human-centric Video Learning to One Million Hours 2605.06747 Large-scale human video dataset: Releases a one-million-hour human activity video dataset to scale embodied learning.	cs.CV	Yufan Deng, Daquan Zhou	Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, learning physical interaction remains constrained by the lack of large, diverse, and richly annotated human activity data. We present HumanNet, a one-million-hour human-centric video corpus that captures how humans interact with the physical world at scale. HumanNet spans both first-person and third-person perspectives and covers fine-grained activi...
4	R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations 2605.06758 3D layout spatial reasoning: Improves the consistency of relative spatial relation reasoning to generate reliable 3D layouts.	cs.CV, cs.LG, cs.AI	Zhifeng Gu, Yuqi Wang, Bing Wang	Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. ...
5	LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute 2605.06809 Efficient video recognition: Learns when and where to compute to cut redundant cost in video Transformers.	cs.CV, cs.LG	Ali Salamatian, Anthony Fuller, Pritam Sarkar, James R. Green, Leonid Sigal	Transformers dominate video recognition. They split videos into tokens, and processing them has expensive superlinear computational cost. Yet videos are filled with redundancy, so we can question the need for this expense. We introduce LookWhen, a selector-extractor framework that factorizes video recognition into learning when, where, and what to compute. Our shallow selector gets a scaled-down video and quickly scores all tokens across space-time, while our deep extractor gets the top-K select...
6	Knowledge Transfer Scaling Laws for 3D Medical Imaging 2605.06859 Scaling laws for 3D medical pretraining: Studies transfer scaling laws and mixture strategies for multimodal 3D medical pretraining.	cs.CV, cs.LG, cs.AI	Ho Hin Lee, Dongna Du, Chu Wang, Yuankai Huo, Shi Gu	Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this work, we observe that different medical imaging domains scale at variable rates during pretraining, a...
7	AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting 2605.06876 3D Gaussian splatting acceleration: Uses error-driven adaptive splitting to speed up discovery of geometric detail in 3DGS.	cs.CV	Yongjae Lee, Jingxing Li, Abhay Kumar Yadav, Rama Chellappa, Deliang Fan	Adaptive density control in 3D Gaussian Splatting (3DGS) repeatedly grows the Gaussian population through fixed-cardinality random splitting to discover useful scene structure. However, in vanilla 3DGS, its binary split operator requires many densification rounds to expose fine details, making it a bottleneck for efficient training schedules with fewer iterations. We introduce AdpSplit, an error-driven adaptive split operator that determines the number of split children and initializes the child...
8	TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation 2605.06889 Global camera pose estimation: Uses triangle consistency to jointly verify translation directions for global camera pose estimation.	cs.CV	Francisco Chen, Yiran Wang, Yunpeng Shi	Pairwise translation directions are a key input to camera location estimation in global structure-from-motion. Existing estimators usually process each image pair independently, producing directions that may be locally plausible but inconsistent with the other relative directions in the viewing graph. To jointly estimate the direction, we propose TriDE, which exploits camera-triangle consistency as an efficient higher-order verification signal. Instead of solving a costly global nonlinear optimi...
9	Towards Fairness under Label Bias in Image Segmentation: Impact, Measurement and Mitigation 2605.06891 Fair segmentation under label bias: Detects and mitigates group label bias in segmentation without clean annotations.	cs.CV, cs.LG	Aditya Parikh, Stella Frank, Sneha Das, Aasa Feragen	Labeled datasets reflect the biases of their annotation pipelines, which sometimes introduce label bias: group-conditional label errors that cause systematic performance disparities across demographic subgroups. Label bias in image segmentation remains underexplored, as even detecting it typically requires clean, unbiased annotations, which are not readily available. We present a data-centric adaptation of Confident Learning to segmentation, allowing detection of label bias directly in the train...
10	Not All Tokens Need 40 Steps: Heterogeneous Step Allocation in Diffusion Transformers for Efficient Video Generation 2605.06892 Efficient diffusion video generation: Allocates different numbers of denoising steps to different video tokens at inference to reduce compute.	cs.CV	Ernie Chu, Vishal M. Patel	Diffusion Transformers (DiTs) have achieved state-of-the-art video generation quality, but they incur immense computational cost because standard inference applies the same number of denoising steps uniformly to every token in the sequence. It is well known that human vision ignores vast amounts of redundant motion. Why, then, do our densest models treat every spatiotemporal token with equal priority? In this paper, we introduce Heterogeneous Step Allocation (HSA), a training-free inference algo...
11	Advancing Reliable Synthetic Video Detection: Insights from the SAFE Challenge 2605.06912 Synthetic video detection benchmark: Summarizes the SAFE challenge and analyzes methods and evaluation for reliable synthetic video detection.	cs.CV	Kirill Trapeznikov, Gabriel Mancino-Ball, Jonathan Li, Paul Cummer, Jai Aslam	The proliferation of generative video technologies has intensified the need for reliable methods to detect and characterize synthetic media. To address this challenge, we organized the SAFE: Synthetic Video Detection Challenge (https://safe-video-2025.dsri.org), co-located with the Authenticity and Provenance in the Age of Generative AI (APAI) Workshop at ICCV 2025. The competition invited participants to develop and evaluate algorithms capable of distinguishing real from syntheti...
12	A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency 2605.06924 Long video consistent generation: Proposes closed-loop retrieve-synthesize-refine autoregressive diffusion to keep long videos consistent.	cs.CV, cs.AI	Do Xuan Long, Yale Song, Min-Yen Kan, Tomas Pfister, Long T. Le	Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over long horizons. We present A$^2$RD, an Agentic Auto-Regressive Diffusion architecture that decouples creative synthesis from consistency enforcement. A$^2$RD formulates long video synthesis as a closed-loop process that synthesizes and self-improves video segment-by-segment through a Retrieve--Synthesize--Refine--Update cycle. It comprises three ...
13	XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling 2605.06927 Energy-aware object detection NAS: Optimizes detection models on edge devices through iterative architecture search and energy estimation.	cs.CV, cs.AI	Tony Tran, Richie R. Suganda, Bin Hu	Object detection on heterogeneous edge devices must satisfy strict energy, latency, and memory constraints while still providing reliable perception for downstream autonomy. Existing energy-aware NAS methods often target limited deployment settings, while real energy remains difficult to optimize because it is highly device-dependent and costly to measure. We address these challenges with an energy-adaptive framework that combines an energy-aware XiResOFA search space, a two-stage energy estimat...
14	Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment 2605.06969 IVIF quality assessment with MLLMs: Introduces multimodal large language model reasoning to assess the quality of infrared-visible fused images.	cs.CV	Yuchen Guo, Junli Gong, Yao Lu, Xintong Xu, Yiuming Cheung	Infrared-Visible image fusion (IVIF) aims to integrate thermal information and detailed spatial structures into a single fused image to enhance perception. However, existing evaluation approaches tend to over-optimize both hand-crafted no-reference statistics and full-reference metrics that treat the source images as pseudo ground truths. Recent IVIF reward-modelling efforts learn from human ratings but use scalar regression on aggregated scores, neither leveraging the reasoning of Multimodal La...
15	TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations 2605.06990 Trajectory-centric geospatial pretraining: Uses geospatially aligned representation learning for self-supervised multimodal pretraining on urban trajectories.	cs.CV, cs.LG	Maria Despoina Siampou, Gengchen Mai, Ni Lao, Jinmeng Rao, Neha Arora	Multimodal self-supervised learning (MSSL) has emerged as a key paradigm for pretraining geospatial foundation models. However, existing geospatial MSSL methods are mainly designed for static pairs of modalities, such as satellite imagery, street-view imagery, and text, where learning is driven by aligning observations from the same or nearby locations. This assumption breaks down for human mobility trajectories, which represent continuous movement along paths rather than discrete observations a...
16	LensVLM: Selective Context Expansion for Compressed Visual Representation of Text 2605.07019 Text-as-image VLM compression: Uses selective context expansion to mitigate accuracy loss of text-image representations under high compression.	cs.CV, cs.AI	Roy Xie, Dan Friedman, Donghan Yu, Bowen Pan, Christopher Fifty	Vision Language Models (VLMs) offer the exciting possibility of processing text as rendered images, bypassing the need for tokenizing the text into long token sequences. Since VLM image encoders map fixed-size images to a fixed number of visual tokens, varying rendering resolution provides a fine-grained compression knob. However, accuracy deteriorates quickly as compression increases: characters shrink below the vision encoder's effective resolution, making them indistinguishable. To address th...
17	OneViewAll: Semantic Prior Guided One-View 6D Pose Estimation for Novel Objects 2605.07023 One-view 6D pose estimation: Estimates the 6D pose of novel objects using only a single RGB-D reference view and semantic priors.	cs.CV	Yang Luo, Yan Gong, Yongsheng Gao, Jie Zhao, Xinyu Zhang	In many practical 6D object pose estimation scenarios, we often have access to only a single real-world RGB-D reference view per object, typically without CAD models. Existing methods largely rely on explicit 3D models or multi-view data, which limits their scalability. To address this challenging single-reference model-free setting, we propose OneViewAll, a semantic-prior-guided framework that performs pose estimation via a novel Project-and-Compare paradigm. Instead of relying on comp...
18	Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness 2605.07055 Multiorgan medical foundation model: Trains a whole-body organ foundation model with saliency-guided masking and improves robustness to missing data.	cs.CV, cs.AI	Qiangqiang Wu, Grace McIlvain, Zhou Yu, Junhao Wen	Foundation models (FMs) have shown great promise in medical imaging, but most FMs are trained on unimodal data within isolated domains, such as brain MRI alone. Human aging and disease arise through coordinated biological processes across organs, therefore motivating multimodal FMs that learn whole-body representations. A key challenge, however, is that real-world multimodal biomedical data are often missing not at random, which can reduce power, limit generalizability, and introduce bias. We pr...
19	Learning to Track Instance from Single Nature Language Description 2605.07064 Self-supervised vision-language tracking: Achieves self-supervised object tracking from natural language descriptions without box annotations.	cs.CV	Yaozong Zheng, Bineng Zhong, Qihua Liang, Shuimu Zeng, Haiying Xia	How to achieve vision-language (VL) tracking using natural language descriptions from a video sequence without relying on any bounding-box ground truth? In this work, we achieve this goal by tackling self-supervised VL tracking, which aims to evaluate tracking capabilities guided by natural language descriptions. We introduce \tracker, a novel self-supervised VL tracker that is capable of tracking any referred object by a language description. Unlike traditional method...
20	Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection 2605.07074 Universal AI-generated image detection: Decouples semantic and fingerprint features to improve forgery detection across unseen generators.	cs.CV	Zhiyuan Wang (Hefei University of Technology), Yanxiang Chen (Key Laboratory of Knowledge Engineering with Big Data), Yuanzhi Yao (Key Laboratory of Knowledge Engineering with Big Data), Yunfeng Diao (Key Laboratory of Knowledge Engineering with Big Data)	Detecting AI-generated images across unseen architectures remains challenging, as existing models often overfit to generator-specific fingerprints and semantic content rather than learning universal forgery traces. We attribute this failure to feature entanglement: detectors learn these factors as a single entangled representation, where universal forgery traces are inextricably confounded with both generator-specific fingerprints and semantic content. Crucially, our spectral analysis reveals th...
21	Learning Visual Feature-Based World Models via Residual Latent Action 2605.07079 Feature-based world models: Generatively predicts future visual features with residual latent actions to build world models.	cs.CV, cs.LG, cs.AI	Xinyu Zhang, Zhengtong Xu, Yutian Tao, Yeping Wang, Yu She	World models predict future transitions from observations and actions. Existing works predominantly focus on image generation only. Visual feature-based world models, on the other hand, predict future visual features instead of raw video pixels, offering a promising alternative that is more efficient and less prone to hallucination. However, current feature-based approaches rely on direct regression, which leads to blurry or collapsed predictions in complex interactions, while generative modelin...
22	ImplantMamba: Long-range Sequential Modeling Mamba For Dental Implant Position Prediction 2605.07082 Dental implant position prediction: Uses Mamba long-range modeling to predict implant position and angulation from dental image context.	cs.CV	Xinquan Yang, Congmin Wang, Xuguang Li, Yulei Li, Linlin Shen	In the design of surgical guides for implant placement, determining the precise implant position is a critical step. However, the implant region itself is often characterized by a lack of distinctive texture in medical images. Consequently, artificial intelligence (AI) models must infer the correct implant position and angulation (slope) primarily by analyzing the texture of the surrounding teeth, which poses a significant challenge. To address this, we propose ImplantMamba, a network architectu...
23	Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information 2605.07086 Channel information analysis: Proposes two-axis measures of task relevance and local replaceability to analyze channel information.	cs.CV, cs.LG	Houman Safaai, Andrew T. Landau, Celia C. Beron, Yasin Mazloumi, Bernardo L. Sabatini	Channel importance in vision networks is usually summarized by a single score. That summary hides two different questions: how much a channel is related to the task, and whether its function can be supplied by same-layer peers when the channel is removed. We call the second property local replaceability. We introduce a two-axis view that separates these questions. The local axis measures input capture and peer overlap, while the target axis measures task information and target-excess information...
24	InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization 2605.07099 Object-centric UAV geo-localization: Uses information-theoretic object-centric learning to improve generalization of cross-view UAV geo-localization.	cs.CV	Hongyang Zhang, Maonnan Wang, Ziyao Wang, Hongrui Yin, Man OnPun	Cross-view geo-localization (CVGL) is fundamental for precise localization and navigation in GPS-denied environments, aiming to match ground or UAV imagery with satellite views. While existing approaches rely on global feature alignment, they often suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably introduces dense, fine-grained objects, creatin...
25	Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action Recognition 2605.07140 Neurosymbolic action recognition: Formulates skeleton-based action recognition as concept-based first-order logical reasoning to improve interpretability.	cs.CV, cs.AI	Talha Ilyas, Deval Mehta, Zongyuan Ge	Skeleton-based human activity recognition has achieved strong empirical performance, yet most existing models remain black boxes and difficult to interpret. In this work, we introduce a neurosymbolic formulation of skeleton-based HAR that reframes action recognition as concept-driven first-order logical reasoning over motion primitives. Our framework bridges representation learning and symbolic inference by grounding first-order logic predicates in learnable spatial and temporal motion concepts....
26	Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding 2605.07141 Open-world referring segmentation: Extends the Qwen3 multimodal model to pixel-level outputs for open-world referring segmentation.	cs.CV, cs.AI	Yuan Yao, Qiushi Yang, Humen Zhong, Jiangning Wei, Yifang Men	Open-world referring segmentation requires grounding unconstrained language expressions to precise pixel-level regions. Existing multimodal large language models (MLLMs) exhibit strong open-world visual grounding, but their outputs remain limited to sparse bounding-box coordinates and are insufficient for dense visual prediction. Recent MLLM-based segmentation methods either directly predict sparse contour coordinates, struggling to reconstruct continuous object boundaries, or rely on external s...
27	AGA3DNet: Anatomy-Guided Gaussian Priors with Multi-view xLSTM for 3D Brain MRI Subtype Classification 2605.07142 Brain MRI subtype classification: Fuses anatomical phrase priors extracted from reports with multi-view xLSTM for 3D subtype classification.	cs.CV	Peiyu Duan, Xueqi Guo, Sepehr Farhand, Mehmet Berk Sahin, Xinyuan Zheng	Accurate 3D brain MRI subtype classification benefits from both localized anatomical cues and long-range contextual reasoning. We present AGA3DNet, a report-grounded framework that incorporates brief anatomical phrases extracted from radiology reports as a soft anatomical prior channel and fuses it with a lightweight 3D CNN and multi-view xLSTM aggregation. Specifically, extracted anatomical phrases are mapped to atlas-defined regions and converted into smooth spatial priors using a signed-dista...
28	TriP: A Triangle Puzzle Approach to Robust Translation Averaging 2605.07143 Robust translation averaging: Uses triangle-puzzle-style consistency inference for noise-robust camera translation averaging.	cs.CV	Zhekai Fan, Wanze Li, Jinxin Wang, Yunpeng Shi	Translation averaging aims to recover camera locations from pairwise relative translation directions and is a fundamental component of global Structure-from-Motion pipelines. The problem is challenging because direction measurements contain no distance information, making the estimation problem highly ill-conditioned and highly sensitive to corrupted observations. In this paper, we propose TriP, a triangle-based framework for robust translation averaging. TriP first infers local relative edge sc...
29	UniV2D: Bridging Visual Restoration and Semantic Perception for Underwater Salient Object Detection 2605.07146 Underwater salient object detection: Jointly performs underwater image restoration and semantic perception to improve salient object detection.	cs.CV	Laibin Chang, Shaodong Wang, Yunke Wang, Xu Zhang, Kui Jiang	Underwater salient object detection (USOD) plays a vital role in marine vision tasks but remains fundamentally challenging due to severe visual degradation, such as selective absorption and medium scattering. Conventional pipelines typically adopt a sequential "enhance-then-detect" paradigm. However, isolating low-level visual restoration from high-level semantic perception often leads to semantic inconsistency, where the restored images may not be optimal for detection and can even introduce ta...
30	Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models 2605.07148 3D topology in vision-language models: Uncovers and shapes the internal 3D scene topology representation of VLMs to improve spatial reasoning.	cs.CV	Haoming Wang, Wei Gao	Decades of cognitive science establish that humans navigate environments by forming cognitive maps, defined as allocentric and topology-preserving representations of 3D space. While modern Vision-Language Models (VLMs) demonstrate emergent spatial reasoning from 2D egocentric inputs, it remains unclear whether they construct an analogous 3D internal representation. In this paper, we demonstrate that current VLMs do possess a latent topological map of 3D scenes, but it is heavily overshadowed by ...
31	Real-IAD MVN: A Multi-View Normal Vector Dataset and Benchmark for High-Fidelity Industrial Anomaly Detection 2605.07149 Multi-view normal anomaly dataset: Proposes a multi-view normal-vector dataset and benchmark for industrial anomaly detection.	cs.CV	Wenbing Zhu, Jianing Liang, Linjie Cheng, Yurui Pan, Zhuhao Chen	Industrial Anomaly Detection (IAD) is critical for quality control, but existing methods struggle with subtle, geometric defects. Standard 2D (RGB) images are sensitive to texture and lighting but often miss fine geometric anomalies. While 3D point clouds capture macro-shape, they are typically too sparse to detect micro-defects like scratches or pits. We address this fundamental data limitation by introducing Real-IAD-MVN (Multi-View Normal), a large-scale industrial dataset. By upgrading our a...
32	DPG-CD: Depth-Prior-Guided Cross-Modal Joint 2D-3D Change Detection 2605.07151 Cross-modal 2D-3D change detection: Uses depth-prior guidance for joint 2D semantic and 3D height change detection.	cs.CV cs.AI	Luqi Zhang, Zhen Dong, Bisheng Yang	Urban spatial evolution is manifested not only through horizontal expansion but also through vertical structural changes. Consequently, jointly capturing 2D semantic changes and 3D height changes is essential for urban morphology analysis and emergency management. In practical scenarios, collecting 3D observations is often constrained by high acquisition costs and the inability to support frequent updates. The multi-temporal cross-modal input consisting of pre-event Digital Surface Model (DSM) a...
33	PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition 2605.07154 Referring audio-visual segmentation: Improves the robustness of audio-visual-text referring segmentation through adaptive modality suppression.	cs.CV	Yuchen He, Jing Zhang	Referring Audio-Visual Segmentation (Ref-AVS) seeks to localize and segment target objects in video frames based on visual, auditory, and textual referring cues. The task is challenging because the relevance of different modalities varies across referring expressions and scenes, while existing methods typically treat multimodal cues as homogeneous inputs for fusion, prompting, or reasoning, making them vulnerable to irrelevant or misleading modalities. To address this problem, we propose PRIMED,...
34	Hierarchical Perfusion Graphs for Tumor Heterogeneity Modeling in Glioma Molecular Subtyping 2605.07156 Perfusion graphs for glioma subtyping: Models tumor heterogeneity with perfusion graphs to predict glioma molecular subtypes.	cs.CV	Han Jang, Junhyeok Lee, Heeseong Eum, Joon Jang, Yoseob Han	Precise molecular subtyping of gliomas, including isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion, directly guides surgical and therapeutic decisions, yet currently relies on invasive tissue sampling. Deep learning on structural MRI has emerged as a non-invasive alternative, but anatomy-only approaches cannot capture the hemodynamic signatures that distinguish molecular subtypes. Radiogenomics based on dynamic susceptibility contrast (DSC) MRI holds immense potential for non-invasi...
35	Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection 2605.07178 Text extraction for change detection: Extracts structured text supervision from change masks to improve remote sensing change detection.	cs.CV	Kai Zheng, Hang-Cheng Dong, Jiatong Pan, Zhenkai Wu, Fupeng Wei	Remote sensing change detection is pivotal for urban monitoring, disaster assessment, and environmental resource management. Yet, unimodal deep learning methods frequently confuse genuine semantic changes with visually similar but irrelevant variations. Recent multimodal approaches incorporate text as auxiliary supervision, but their descriptions are either semantically coarse and unstructured or model-generated and thus noisy. Critically, all of them overlook a simple fact: fine-grained change ...
36	SatSurfGS: Generalizable 2D Gaussian Splatting for Sparse-View Satellite Surface Reconstruction 2605.07181 Sparse-view satellite Gaussian splatting: Proposes generalizable 2D Gaussian splatting to reconstruct satellite surfaces from sparse views.	cs.CV	Min Chen, Wei Guo, Bin Wang, Wen Li, Tong Fang	Sparse-view satellite image surface reconstruction remains highly challenging, fundamentally because the reliability of multi-view matching under satellite imaging conditions is strongly spatially heterogeneous. Affected by large photometric differences, weak textures, and repetitive textures, multi-view geometric constraints are often sparse, unevenly distributed, and locally unreliable. Although 2D Gaussian Splatting (2DGS) is more suitable than 3D Gaussian Splatting (3DGS) for the explicit re...
37	PicoEyes: Unified Gaze Estimation Framework for Mixed Reality with a Large-Scale Multi-View Dataset 2605.07188 Mixed reality gaze estimation dataset: A unified framework predicts multi-attribute gaze from monocular or binocular input, released with a large-scale dataset.	cs.CV	Fuxin Duan, Hui Wang	We present PicoEyes, a unified gaze estimation framework that directly predicts all key attributes of gaze, including 3D eye parameters, eye-region segmentation, optical axis, visual axis, and depth maps, from either monocular or binocular inputs. The framework simultaneously addresses calibration, gaze forecasting, and varying device postures, while also supporting 3D eye reconstruction via joint estimation of eye parameters and depth maps in an end-to-end manner. In addition, we introduce a la...
38	Attention Transfer Is Not Universally Effective for Vision Transformers 2605.07191 ViT attention transfer evaluation: A systematic evaluation shows that attention-transfer distillation for ViTs is not universally effective.	cs.CV cs.LG	Huaiyuan Qin, Muli Yang, Gabriel James Goenawan, Peng Hu, Chen Gong	A recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient to recover the full benefit of the teacher's pre-trained weights. We revisit this finding on a comprehensive benchmark of 20 teachers from 11 well-known ViT families and reveal that Attention Transfer is not universally effective. While 7 families transfer successfully, 4 consistently fail, falli...
39	AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes 2605.07192 Event-assisted Gaussian splatting: Uses asynchronous event-assisted Gaussian splatting to reconstruct motion-blurred handheld scenes.	cs.CV	Jun Dai, Renbiao Jin, Bo Xu, Yutian Chen, Linning Xu	3D reconstruction methods such as 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) achieve impressive photorealism but fail when input images suffer from severe motion blur. While event cameras provide high-temporal-resolution motion cues, existing event-assisted approaches rely on low-resolution sensors and strict synchronization, limiting their practicality for handheld 3D capture on common devices, such as smartphones. We introduce a flexible, high-resolution asynchronous RGB-Ev...
40	Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models 2605.07194 Linear-probe dataset distillation: Gives a closed-form solution and an efficient algorithm for dataset distillation in the linear-probe setting.	cs.CV cs.LG cs.AI	Bincheng Peng, Guang Li, Ping Liu, Takahiro Ogawa, Miki Haseyama	Dataset distillation compresses a large training set into a small synthetic set that preserves downstream training utility. While most existing methods target training networks from scratch, modern visual transfer learning often uses frozen pre-trained encoders followed by lightweight linear probing. Existing distillation methods for this setting either unroll iterative linear-probe updates with trajectory-based gradient matching, or rely on closed-form formulations originally designed for from-...
41	See Tomorrow, Act Today: Foresight-Driven Autonomous Driving 2605.07195 World-model planning for driving: Plans for autonomous driving by using a world model to imagine future scenes ahead of time.	cs.CV	Bozhou Zhang, Nan Song, Yuang Wang, Jiankang Deng, Xiatian Zhu	Current end-to-end autonomous driving planners are fundamentally reactive: they condition on historical and present observations to predict future actions. We argue that autonomous agents should instead imagine future scenes before deciding, just as human drivers mentally simulate "what will happen next" before acting. We introduce ForeSight, a foundation world model centric planning framework that reframes autonomous driving as anticipatory decision-making. Rather than treating world models as...
42	From Pixels to Primitives: Scene Change Detection in 3D Gaussian Splatting 2605.07203 3DGS primitive-based change detection: Detects scene changes directly from Gaussian primitive attributes rather than rendered pixels.	cs.CV	Chamuditha Jayanga Galappaththige, Jason Lai, Timothy Patten, Donald Dansereau, Niko Suenderhauf	Scene change detection methods built on Gaussian splatting universally follow a render-then-compare paradigm: the pre-change scene is rendered into 2D and compared against post-change images via pixel or feature residuals. This change detection problem with Gaussian Splatting has been treated as a question about pixels; we treat it as a question about primitives. We provide direct evidence that native primitive attributes alone -- position, anisotropic covariance, and color -- carry sufficient s...
43	LoHGNet: Infrared Small Target Detection through Lorentz Geometric Encoding with High-Order Relation Learning 2605.07213 Infrared small target detection: Improves infrared small target detection with Lorentz geometric encoding and high-order relation learning.	cs.CV	Qianwen Ma, Yang Xu, Shangwei Deng, Xiaobo Li, Haofeng Hu	Infrared small target detection (IRSTD) remains challenging due to the scarcity of useful target cues and the presence of severe background clutter. Most current methods rely on conventional feature learning and local interaction modeling, where features are represented in Euclidean space. However, such designs may still be limited in describing the subtle differences of weak targets and the contextual relations between targets and backgrounds. To address these limitations, we propose LoHGNet, a...
44	DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation 2605.07221 Frozen DINOv3 medical segmentation: Proposes a multi-view readout of frozen DINOv3 features for medical segmentation with few annotations.	cs.CV	Wei Jiang, Feng Liu, Nan Ye, Hongfu Sun	Adapting foundation models to medical segmentation typically requires either backbone fine-tuning or high-capacity task-specific decoders, both of which are difficult to fit reliably when annotations are scarce. We show that frozen DINOv3 features already contain useful structural and boundary cues for medical segmentation, and that the main bottleneck lies in how these features are read out. We propose DINO-MVR, a Multi-View Readout framework for annotation-efficient medical segmentation. DINO-...
45	CASCADE: Context-Aware Relaxation for Speculative Image Decoding 2605.07230 Speculative decoding for images: Uses context-aware relaxation to lower rejection rates and accelerate autoregressive image decoding.	cs.CV cs.AI	Selin Yildirim, Subhajit Dutta Chowdhury, Mohammad Mahdi Kamani, Vikram Appia, Deming Chen	Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we ide...
46	Towards multi-modal forgery representation learning for AI-generated video detection and localization 2605.07232 Multimodal AI-generated video forensics: Learns multimodal forgery representations to detect and temporally localize manipulations in AI-generated video.	cs.CV	Dat Le, Khoa Nguyen, Xin Wang, Shu Hu	Recent advances in generative AI have democratized video creation at scale. AI-generated videos, including partially manipulated clips across visual and audio channels, pose escalating risks of semantic distortion and misuse, which motivates the need for reliable detection tools. Most existing AI-generated video detectors remain limited by single- or partial-modality of data modeling and the lack of fine-grained temporal forgery localization. To address these challenges, our primary novelty intr...
47	Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment 2605.07250 MLLM safety bypass via degradation: Reveals that image degradation weakens the safety alignment of multimodal models and enables jailbreaks.	cs.CV cs.AI	Zhixue Song, Boyan Han, Yiwei Wang, Chi Zhang	Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of SOTA models deteriorate sharply as resolution degrades, surprisingly persisting even when text remains legible. We attribute this to "Cognitive Overload", hypothesizing that...
48	LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling 2605.07253 Efficient diffusion sampling noise shaping: Improves image quality and efficiency in few-step diffusion sampling through low-frequency eigen noise shaping.	cs.CV	Haewon Jeon, Si-Hyeon Lee	Distilled diffusion models accelerate image generation by reducing the number of denoising steps, but often suffer from degraded image quality. To mitigate this trade-off, test-time optimization methods improve quality, yet their iterative nature incurs substantial computational overhead and leads to slow inference, limiting practical usability. Recent hypernetwork-based approaches amortize this process during training, but still require costly noise modulation in high-dimensional latent spaces....
49	High-Fidelity Surface Splatting-Based 3D Reconstruction from Multi-View Images 2605.07254 Surface splatting 3D reconstruction: Proposes surface-splatting-based multi-view reconstruction that jointly optimizes geometry and appearance.	cs.CV	Nandhana Sunil, Abhirami R Iyer, Avirup Mandal	Multi-view mesh reconstruction remains a core challenge in computer graphics and vision, especially for recovering high-frequency geometry from sparse observations. Recent methods such as 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) rely on post-processing for mesh extraction, thereby limiting joint optimization of geometry and appearance. Implicit Moving Least Squares (IMLS) instead enables direct conversion of point clouds into signed distance and texture fields, supporting e...
50	TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts 2605.07256 ViT architecture search with LoRA: Uses a mixture of LoRA experts to mitigate the architecture-search collapse caused by supernet weight sharing.	cs.CV	Jeimin Jeon, Hyunju Lee, Bumsub Ham	Transformer architecture search (TAS) discovers optimal vision transformer (ViT) architectures automatically, reducing human effort to manually design ViTs. However, existing TAS methods suffer from the feature collapse problem, where subnets within a supernet fail to learn subnet-specific features, mainly due to the shared weights in a supernet, limiting the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adapt...
51	Adaptive Subspace Projection for Generative Personalization 2605.07257 Generative personalization subspace control: Uses adaptive subspace projection to suppress semantic collapse in personalization while preserving prompt context.	cs.CV	Van-Anh Nguyen, Anh Tuan Bui, Tamas Abraham, Junae Kim, Amardeep Kaur	Generative personalization often suffers from the semantic collapsing problem (SCP), where a learned personalized concept overpowers the rest of the text prompt, causing the model to ignore important contextual details. To address this, we first analyze the underlying cause, revealing that the semantic drift responsible for SCP is not random but is concentrated within a specific low-dimensional subspace. We also discover that the personalization process perturbs the embedding of the original bas...
52	Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning 2605.07264 Satellite DSM depth fine-tuning: Introduces RPC-aware depth fine-tuning to improve the accuracy and generalization of DSM reconstruction from satellite imagery.	cs.CV	Qiaoyi Yang, Chaoyi Zhou, Xi Liu, Run Wang, Minghui Xu	Accurate Digital Surface Model (DSM) reconstruction from satellite imagery is critical for applications such as disaster response, urban planning, and large-scale geographic mapping. Existing approaches face a fundamental trade-off: optimization-based methods achieve strong accuracy but require hours of per-scene computation, while generalizable geometry foundation models offer near-instant inference but fail to generalize to satellite imagery due to the domain gap introduced by the Rational Pol...
53	From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG 2605.07273 Remote sensing RAG retrieval attack: Proposes a cloud-like input attack that hijacks evidence retrieval in remote sensing vision-language RAG.	cs.CV cs.AI	Jiaju Han, Chao Li, Chengyin Hu, Qike Zhang, Xuemeng Sun	Multimodal RAG systems increasingly rely on vision-language retrievers to ground visual queries in external textual evidence. Existing adversarial studies on RAG mainly manipulate the retrieval corpus or memory, while attacks on vision-language and remote sensing models typically target end-task predictions. Input-space threats to the evidence retrieval stage of remote sensing multimodal RAG remain underexplored. To address this gap, we introduce CloudWeb, an atmospheric retrieval hijacking atta...
54	SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis 2605.07287 Adaptive Gaussian allocation for NVS: Learns to allocate Gaussian primitives adaptively by scene complexity to improve generalizable novel view synthesis.	cs.CV	Yecong Wan, Fan Li, Mingwen Shao, Wangmeng Zuo	Generalizable novel view synthesis aims to render unseen views from uncalibrated input images without requiring per-scene optimization. Recent feed-forward approaches based on 3D Gaussian Splatting have achieved promising efficiency and rendering quality. However, most of them assign a fixed number of Gaussians to each pixel or voxel, ignoring the spatially varying complexity of real-world scenes. Such uniform allocation often wastes Gaussian primitives in smooth regions while providing insuffic...
55	Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training 2605.07288 World-model simulator for VLA policies: Trains a style-robust world model with dynamic latent bootstrapping to improve imagination-based policy optimization.	cs.CV cs.AI	Jiaxuan Gao, Yongjian Guo, Zhong Guan, Wen Huang, Wanlun Ma	The integration of Vision-Language-Action (VLA) models with World Models has gained increasing attention. One representative approach treats learned World Models as generative simulators, enabling policy optimization entirely within "imagination." However, when deployed as simulators for specific environments such as the LIBERO benchmark, existing World Models often suffer from poor generalization and long-horizon error accumulation. During closed-loop rollouts, these models are highly sensitive...
56	EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams 2605.07299 Egocentric proactive interaction benchmark: Builds a streaming egocentric benchmark to evaluate personalized proactive interaction and timing.	cs.CV cs.AI	Dongchuan Ran, Linyu Ou, Xueheng Li, Wenwen Tong, Chenxu Guo	Existing Multimodal Large Language Models (MLLMs) remain primarily reactive, failing to continuously perceive environments or proactively assist users. While emerging benchmarks address proactivity, they are largely confined to alert scenarios, neglect personalized context, and fail to evaluate the precise timing of human-machine interactions (HMI). In this paper, we introduce EgoPro-Bench, a novel benchmark for training and evaluating proactive interaction capabilities based on streaming egocent...
57	Amortized-Precision Quantization for Early-Exit Vision Transformers 2605.07317 Quantization for early-exit ViTs: Proposes utilization-aware quantization to stabilize dynamic inference in low-precision early-exit ViTs.	cs.CV cs.AI	Rui Fang, Hsi-Wen Chen, Ming-Syan Chen	Vision Transformers (ViTs) achieve strong performance across vision tasks, yet their deployment with low-precision early exiting remains fragile. Existing quantization methods assume static full-depth execution, making them unstable when exit decisions are perturbed by quantization noise, which can amplify errors along dynamic inference paths. In this paper, we introduce Amortized-Precision Quantization (APQ), a utilization-aware formulation that accounts for layer-wise stochastic exposure to qu...
58	GEM: Generating LiDAR World Model via Deformable Mamba 2605.07326 Generative LiDAR world model: Uses deformable Mamba to model the spatiotemporal dynamics of point clouds for a generative LiDAR world model.	cs.CV	Yang Wu, Zhaojiang Liu, Qiang Meng, Youquan Liu, Renliang Weng	World models, which simulate environmental dynamics and generate sensor observations, are gaining increasing attention in autonomous driving. However, progress in LiDAR-based world models has lagged behind those built on camera videos or occupancy data, primarily due to two core challenges: the inherent disorder of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures. To address these issues, we propose GEM: a Generative LiDAR world model that leverages ...
59	Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations 2605.07327 One-step diffusion distillation: Achieves one-step diffusion distillation with a single drifting loss combined with pretrained representations.	cs.CV	Yuan Zhang, Chenyi Li, Guoqing Ma, Jiajun Zha, Yuanming Yang	Sampling from pretrained diffusion and flow-matching models typically requires many forward passes to generate diverse and high-fidelity images. Existing distillation methods often rely on multiple auxiliary networks, carefully designed training stages, or complex optimization pipelines. In this work, we revisit the recently proposed Drifting Model objective and show that a single drifting loss can be directly used to simplify one step distillation. A key observation is that the pretrained diffu...
60	GC-ART: Global Learnable Second-Order Rational Tone Curves for Illumination Robustness 2605.07329 Tone-curve preprocessing for robustness: Learns a global differentiable tone-curve preprocessing step to make classification more robust to illumination changes.	cs.CV	Wei Huang, Joyce Huang	We introduce GC-ART (Global Curve Adaptive Rational Tone-mapping), a lightweight differentiable pre-processing module for robust image classification. GC-ART predicts an endpoint-pinned rational tone curve from per-channel soft histograms using a 643-parameter MLP, then applies the curve pointwise before the classifier. The module is trained end-to-end with cross-entropy and a soft monotonicity penalty. On CIFAR-10 with a CIFAR-style ResNet-18, GC-ART matches clean accuracy with the unenhanced b...
61	RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation 2605.07334 Video reasoning segmentation: Uses reinforced chain-of-thought to improve temporal understanding and localization in video reasoning segmentation.	cs.CV	Junwei Wen, Deshui Miao, Guangming Lu, Xin Li, Wenjie Pei	Video Reasoning Segmentation (VRS) aims to segment target objects in videos based on implicit instructions that convey human intent and temporal logic. Existing MLLM-based methods predict masks with a [SEG] token after selecting frames via simple sampling or an auxiliary MLLM, where limited supervision and frame-language similarity rules often yield narrow-scope keyframe choices that weaken holistic temporal understanding and lead to brittle localization in complex multi-object scenes. To addres...
62	ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs 2605.07338 Marine mollusc recognition benchmark: Builds a visual recognition benchmark for marine shellfish to evaluate underwater robustness.	cs.CV	Ziheng Zhou, Yang Wang, Nan Wang, Chengliang Wu, Jun Yan	The decline of global shellfish biodiversity poses a severe threat to coastal ecosystems. Although artificial intelligence (AI) technologies show potential for automated ecological monitoring, existing marine benthic datasets often lack adaptation to the complexities of real underwater environments (e.g., variable lighting conditions and diverse species postures), posing challenges for the robust generalization of vision models in practical ecological monitoring. To address this problem, we cons...
63	SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration 2605.07346 Long-horizon free-viewpoint reconstruction: Proposes an error-resilient representation and recalibration for streamable long-horizon free-viewpoint video reconstruction.	cs.CV	Haotian Zhang, Xu Mo, Yixin Yu, Guanhua Zhu, Jian Xue	Free-Viewpoint Video (FVV) has emerged as a cornerstone of next-generation immersive media systems and attracted widespread attention. Previous methods primarily focus on short video sequences and suffer from significant performance degradation when processing long-horizon free-viewpoint video (LFVV). Motivated by bit allocation theory, we analyze dynamic-anchor-based volumetric video representation within a rate-distortion optimization framework and propose SoLAR, which is the first er...
64	Disambiguating 2D-3D Correspondences in Gaussian Splatting-based Feature Fields for Visual Localization 2605.07351 Gaussian splatting localization: Improves 2D-3D matching in Gaussian splatting feature fields to stabilize pose localization.	cs.CV	Miso Lee, Sangeek Hyun, Yerim Jeon, Jae-Pil Heo	While Gaussian Splatting-based Feature Fields (GSFFs) have shown promise for visual localization, this paper highlights that photometrically optimized GSFFs are inherently ill-suited for 2D-3D matching. The volumetric extent of each Gaussian induces many-to-one pixel-to-point mappings that destabilize PnP-based pose estimation, while photometric optimization gives rise to superfluous Gaussians devoid of multi-view consistency. To address these issues, we propose SplitGS-Loc, a localization-speci...
65	TTF: Temporal Token Fusion for Efficient Video-Language Model 2605.07355 Video token compression: Proposes training-free temporal token fusion to speed up video-language model inference.	cs.CV cs.AI	Simin Huo, Ning LI	Video-language models (VLMs) face rapid inference costs as visual token counts scale with video length. For example, 32 frames at $448{\times}448$ resolution already yield >8,000 visual tokens in Qwen3-VL, making LLM prefill the dominant throughput bottleneck. Existing methods often rely on global similarity or attention-guided compression, incurring offsets to their gains. We propose Temporal Token Fusion (TTF), a training-free, plug-and-play pre-LLM token compression framework that ex...
66	UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition 2605.07356 Multimodal 2D-3D segmentation fusion: Unifies 2D/3D semantic segmentation fusion via an interpretable shared-private decomposition.	cs.CV	Shuai Zhang, Zhecheng Shi, Zhuxiao Li, Jing Ou, Tengxi Wang	Semantic segmentation of large-scale 3D point clouds is crucial for applications such as autonomous driving and urban digital twins. However, the sparse sampling pattern of LiDAR and the view-dependent geometric distortion in image observations complicate cross-modal alignment and hinder stable fusion. Inspired by the fact that 2D images captured by cameras are representations of the 3D world, we recognize that the features learned from 2D and 3D segmentation share some common semantics, while o...
67	UniISP: A Unified ISP Framework for Both Human and Machine Vision 2605.07359 Unified image signal processing: Unifies RAW-to-RGB processing to serve both human visual quality and machine recognition performance.	cs.CV	Hanxi Li, Yao Cheng, Bo Zhang, Li Zeng	Compared to RGB images, raw sensor data provides a richer representation of information, which is crucial for accurate recognition, particularly under challenging conditions such as low-light environments. The traditional Image Signal Processing (ISP) pipeline generates visually pleasing RGB images for human perception through a series of steps, but some of these operations may adversely impact the information integrity by introducing compression and loss. Furthermore, in computer vision tasks t...
68	RELO: Reinforcement Learning to Localize for Visual Object Tracking 2605.07379 RL-based object tracking localization: Models tracking localization as an MDP and uses reinforcement learning to directly optimize metrics such as IoU.	cs.CV cs.AI	Xin Chen, Chuanyu Sun, Jiao Xu, Houwen Peng, Dong Wang	Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO re...
69	A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization 2605.07388 Marine debris detection: Enhances YOLO's attention and feature interaction to improve marine debris detection.	cs.CV	Yuyang Li, Jiashu Han, Yinyi Lai, Wenbin Kang, Zenghui Liu	Marine debris detection for ocean robot is crucial for ecological protection, yet performance is often degraded by low-quality images with blur, complex backgrounds, and small targets. To address these challenges, we propose YOLO-MD, an enhanced YOLO-based detection framework. A Dual-Branch Convolutional Enhanced Self-Attention (DB-CASA) module is designed to strengthen spatial-channel interactions, improving feature representation in degraded images. Additionally, a lightweight shift-based oper...
70	ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation 2605.07390 4D world model generation: Injects 4D spatiotemporal cognition into a world model to generate more consistent 4D content.	cs.CV	Haonan Wang, Hanyu Zhou, Tao Gu, Luxin Yan	Generative models have achieved success in producing apparently coherent 2D videos, but remain challenging in the physical world due to lack of 4D spatiotemporal scale. Typically, existing 4D generative models directly embed macro scale constraints to enhance overall spatiotemporal consistency. However, these methods only ensure global appearance coherence and fail to reveal the local dynamics of the physical world. Our insight is that global appearance structure and local dynamic topology empow...
71	BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning 2605.07394 RL image captioning alignment: Proposes a balanced reinforcement learning framework that accounts for multiple dimensions of caption quality.	cs.CV cs.AI	Shaokai Ye, Vasileios Saveris, Yihao Qian, Jiaming Hu, Elmira Amirloo	Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing trade-offs across core dimensions of captioning. For...
72	Exposing and Mitigating Temporal Attack in Deepfake Video Detection 2605.07398 Deepfake temporal attack defense: Exposes temporal attacks on deepfake detection and improves robustness with a spectral-invariant defense.	cs.CV cs.AI	Zheyuan Gu, Minghao Shao, Zhen Wang, Yusong Wang, Mingkun Xu	While spatiotemporal deepfake detectors achieve high AUC, our experiments reveal their susceptibility to evasion attacks. These models tend to overfit on fragile temporal spectrum cues, rather than learning robust semantic causality. To mitigate this vulnerability, we propose SpInShield, a temporal spectral-invariant defense framework explicitly designed to decouple semantic motion from manipulatable spectral artifacts. We propose a learnable spectral adversary that dynamically synthesizes sever...
73	GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization 2605.07399 Jailbreak diffusion VLM: Jailbreaks diffusion vision-language models using global probability optimization.	cs.CV	Yu Pan, Andi Zhang, Yi Wang, Sibei Yang, Wenjie Wang	Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the...
74	InsHuman: Towards Natural and Identity-Preserving Human Insertion 2605.07402 Identity-preserving human insertion: Proposes a human insertion method and dataset that preserve identity and keep poses natural.	cs.CV	Jie Li, Shulian Zhang, Yangyang Gao, Wenbo Li, Yulun Zhang	Human insertion aims to naturally place specific individuals into a target background. Although existing image editing models may have such ability, they often produce failure cases, including inappropriate human pose in new background, inconsistent number of people, and modified facial identity. Moreover, publicly available human datasets often lack full-body portraits and realistic physical interaction between humans and their background. To address these challenges, we propose InsHuman for na...
75	ChartREG++: Towards Benchmarking and Improving Chart Referring Expression Grounding under Diverse referring clues and Multi-Target Referring 2605.07415 Chart referring expression grounding: Builds and improves a chart referring expression grounding benchmark to support multi-target referring clues.	cs.CV cs.CL	Tianhao Niu, Ziyu Han, Qingfu Zhu, Wanxiang Che	Referring expression grounding is a core problem in visual grounding and is widely used as a diagnostic of spatial grounding and reasoning in vision and language models, yet most prior work focuses on natural images. In contrast, existing chart referring expression grounding-related benchmarks remain limited: (1) they largely adopt bounding boxes, constraining localization precision for fine chart elements (2) they mostly assume a single and two referred target instances, failing to handle multi...
76	Learning Image-Adaptive Scale Fields for Metric Depth Recovery 2605.07418 Metric depth scale recovery: Learns image-adaptive scale fields to recover metric depth from sparse anchors.	cs.CV	Yuanyan Li, Matthias Althoff	Monocular depth estimation (MDE) typically produces depth estimations that are defined up to an unknown scale or shift. When only sparse metric anchors are available, recovering accurate metric depth becomes challenging yet necessary for practical applications. We address this problem by formulating metric depth recovery as image-adaptive scale field modeling. Instead of directly correcting the depth, we reformulate the correction as a low-dimensional linear combination of image-adaptive basis m...
77	Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework 2605.07429 Diffusion bokeh rendering: Achieves efficient, photorealistic smartphone bokeh rendering with a diffusion framework.	cs.CV	Linxiao Shi, Siming Zheng, Zerong Wang, Hao Zhang, Jinwei Chen	Existing mobile devices are constrained by compact optical designs, such as small apertures, which make it difficult to produce natural, optically realistic bokeh effects. Although recent learning-based methods have shown promising results, they still struggle with photos captured under high digital zoom levels, which often suffer from reduced resolution and loss of fine details. A naive solution is to enhance image quality before applying bokeh rendering, yet this two-stage pipeline reduces eff...
78	Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs 2605.07447 VLM adversarial attack detection: Uses sparse autoencoders as plug-and-play firewalls to detect adversarial attacks on VLMs.	cs.CV cs.CL cs.LG cs.AI	Hao Wang, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Daisuke Kawahara	Vision-language models (VLMs) have advanced rapidly and are increasingly deployed in real-world applications, especially with the rise of agent-based systems. However, their safety has received relatively limited attention. Even the latest proprietary and open-weight VLMs remain highly vulnerable to adversarial attacks, leaving downstream applications exposed to significant risks. In this work, we propose a novel and lightweight adversarial attack detection framework based on sparse autoencoders...
79	EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing 2605.07455 Visual-prompt edit transfer: Proposes a more faithful and efficient diffusion method for transferring edits from visual-prompt examples.	cs.CV	Lan Chen, Qi Mao, Yiren Song, Yuchao Gu, Siwei Ma	Visual-prompt-guided edit transfer aims to learn image transformations directly from example pairs, offering more precise and controllable editing than purely text-driven approaches. However, existing diffusion transformer-based methods often fail to faithfully reproduce the demonstrated edits due to structural mismatches between the task and the backbone, including a pretrained bias toward textual conditioning and inherent stochastic instability during sampling. To bridge this gap, we present E...
80	EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement 2605.07457 Agentic image edit refinement: Builds a human-aligned agentic framework for local refinement of edited results.	cs.CV	Zitong Xu, Huiyu Duan, Yifei Nie, Mingda Du, Sijing Wu	Recent text-guided image editing (TIE) models have made remarkable progress, yet edited images still frequently suffer from fine-grained issues such as unnatural objects, lighting mismatch, and unexpected changes. Existing refinement approaches either rely on costly iterative regeneration or employ vision-language models (VLMs) with weak spatial grounding, often resulting in semantic drift and unreliable local corrections. To address these limitations, we first construct EditFHF-15K, a dataset o...
81	A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images 2605.07466 Ultrasound fatty pancreas diagnosis: Automatically identifies fatty pancreas in ultrasound with an integrated segmentation and classification framework.	cs.CV	Ioan-Tudor-Alexandru Anghel, Ciprian-Mihai Ceausescu, Elena Dana Nedelcu, Elena Raluca Stirban, Camelia Croitoru	Non-alcoholic fatty pancreas disease (NAFPD) is an underdiagnosed condition associated with metabolic syndrome, insulin resistance, and increased risk of pancreatic cancer. Diagnosis typically relies on subjective visual assessment of ultrasound images by clinicians. We propose an end-to-end framework for automatically classifying normal versus fatty pancreas from abdominal ultrasound images. Our method employs a TransUNet-based segmentation architecture with a ResNet encoder and transformer bot...
82	ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations 2605.07474 Federated vision-language-action learning: Trains VLA robot models in a federated setting using vision-action data without language annotations.	cs.CV cs.AI	Yuhao Zhou, Yunpeng Zhu, Yang Zhou, Jindi Lyu, Jian Lan	Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated training data. Fortunately, vision-equipped robots deployed across various domains already produce abundant vision-action pairs that can be leveraged to scale up VLA training more efficiently. However, these raw data cannot be centrally aggregated due to various constraints and also exhibit severe heterogeneit...
83	ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning 2605.07477 Interpretable image editing evaluation: Trains an interpretable image editing evaluator with reinforcement learning and builds a dataset.	cs.CV	Honghua Chen, Zitong Xu, Huiyu Duan, Xinyun Zhang, Xiongkuo Min	Recent text-guided image editing (TIE) models have achieved remarkable progress, however, many edited results still suffer from artifacts, unintended modifications, and suboptimal aesthetics. Although several benchmarks and evaluation methods have been proposed, most existing approaches rely on scalar scores and lack interpretability. This limitation largely stems from the absence of high-quality interpretation datasets for TIE and effective reward models to train interpretable evaluators. To ad...
84	AudioFace: Language-Assisted Speech-Driven Facial Animation with Multimodal Language Models 2605.07478 Speech-driven facial animation: Uses multimodal language models to exploit speech and linguistic structure for generating mouth and facial expression animation.	cs.CV	Kai Zheng, Zejian Kang, Rui Mao, Hongyuan Zou, Yuanchen Fei	Speech-driven facial animation requires accurate correspondence between acoustic signals and facial motion, especially for articulation-related mouth movements. However, directly mapping speech audio to facial coefficients often overlooks the linguistic and phonetic structure underlying speech production. In this paper, we propose AudioFace, a language-assisted framework for speech-driven blendshape generation that treats mouth-related facial coefficient prediction as a structured generation pro...
85	Implicit Multi-Camera System Calibration Using Gaussian Processes 2605.07491 Gaussian process camera calibration: Uses Gaussian processes to implicitly learn nonlinear multi-camera calibration and provide uncertainty estimates.	cs.CV	Ivan De Boi, Bart Ribbens, Veronika Golanova, Ursula Kapov, Simon Verspeek	This paper proposes a novel framework for implicit multi-camera system calibration utilizing Gaussian Process (GP) regression. Conventional explicit calibration methods are constrained by rigid mathematical models and struggle with complex, non-linear distortions from unconventional optics, while existing neural network-based implicit approaches are typically data-hungry and lack inherent uncertainty quantification (UQ). Our GP-based model directly learns the complex, non-linear mapping from 2D ...
86	How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings 2605.07492 Document parsing benchmark: Proposes a new source-traceable document parsing benchmark and audits errors and contamination in existing benchmarks.	cs.CV	Zhiheng Li, Zongyang Ma, Jiaxian Chen, Jianing Zhang, Zhaolong Su	The past year has seen over 20 open-source document parsing models, yet the field still benchmarks almost exclusively on OmniDocBench, a 1,355-page manually annotated dataset whose top scores have saturated above 90%. A three-stage audit pipeline we run on OmniDocBench screens its 21,353 evaluator-scored blocks and confirms 2,580 errors (12.08%); combined with over a year of public availability, both annotation quality and contamination risk call its rankings into question. To address these issues, we...
87	DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models 2605.07494 Continual learning for VLMs: Uses dynamic MoE adapters for multi-domain continual learning in vision-language models.	cs.CV	Mengxin Qin, Xiang Zhang, Xi Wang, Kun Wei, Xu Yang	Continual learning enables vision-language models to accumulate knowledge and adapt to evolving tasks without retraining from scratch. However, in multi-domain task-incremental learning, large domain shifts intensify the stability-plasticity dilemma. Most existing methods rely on fixed architectures with statically allocated parameters, which limits adaptation to new domains and aggravates catastrophic forgetting. To address these challenges, we propose DIMoE-Adapters, a Dynamic Incremental Mixt...
88	Lightweight Unpaired Smartphone ISP Transfer with Semantic Pseudo-Pairing 2605.07495 Unpaired smartphone ISP transfer: Achieves lightweight unpaired smartphone ISP style transfer via semantic pseudo-pairing.	cs.CV	Yujin Cho, Flavien Armangeon, Yanhao Li	Unpaired smartphone ISP is a challenging problem due to the lack of scene and color alignment between RAW and target RGB images. Many existing methods either require paired data or rely heavily on adversarial training, which can become unstable in the unpaired setting. In this work, we present a simple and effective approach developed for the NTIRE 2026 Learned Smartphone ISP Challenge with Unpaired Data. Our method first reconstructs larger images from training patches to recover global context...
89	Cloud-top infrared observations reveal the four-dimensional precipitation structure 2605.07499 4D precipitation retrieval: Retrieves the global four-dimensional structure of precipitation from geostationary satellite infrared cloud-top observations.	cs.CV	Tianchi Xu, Ziqiang Ma, Andrea Marinoni, Yuanpeng He, Xiaoqing Li	Accurate four-dimensional (4D) precipitation information is essential for understanding the Earth's energy and water cycles, yet remains observationally unresolved at global scales. Conventional theory holds that geostationary infrared observations primarily sense cloud-top properties, with limited sensitivity to sub-cloud precipitation. Here we show that cloud-top infrared measurements nevertheless encode sufficient information to recover the four-dimensional structure of precipitation, reveali...
90	Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers 2605.07503 Preference alignment for video diffusion: Proposes trajectory-aware preference alignment to efficiently align video diffusion Transformers.	cs.CV	Jingyuan Zhu, Biaolong Chen, Le Zhang, Aixi Zhang, Hao Jiang	Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories. While existing paradigms such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) attempt to address this, they are often hindered by either reliance on bias-prone, complex reward models or suboptimal timestep sampling. In this pa...
91	InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search 2605.07510 Interleaved multimodal agentic search: Proposes the InterLV-Search benchmark for evaluating interleaved language-vision search trajectories.	cs.CV cs.CL	Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng	Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or treated as an answer endpoint rather than part of an interleaved search trajectory. We introduce InterLV-Search, a benchmark for Interleaved Language-Vision Agentic Search, in which textual and visual evidence is repeatedly used to condition later search. It contains 2,061 examples across three levels: active visual evidence seeking, co...
92	Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models 2605.07512 Continual learning for VLMs: Uses hierarchical dual-subspace decoupling to reduce forgetting in vision-language continual learning.	cs.CV	Mengxin Qin, Xiang Zhang, Kun Wei, Xu Yang, Cheng Deng	Class-incremental learning aims to continuously acquire new knowledge while preserving previously learned information, thereby mitigating catastrophic forgetting. Existing methods primarily restrict parameter updates but often overlook their structural properties in high-dimensional spaces. From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. To address this issue...
93	Implicit Preference Alignment for Human Image Animation 2605.07545 Preference alignment for animation: Uses implicit preference alignment to improve hand motion quality in human image animation.	cs.CV cs.AI	Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu	Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it necessitates the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise...
94	Probabilistic Object Detection with Conformal Prediction 2605.07549 Conformal prediction for detection: Applies conformal prediction to object detection to provide uncertainty with coverage guarantees.	cs.CV cs.LG	Christopher Ries, Moussa Kassem Sbeyti, Nicolas Bianco, Nadja Klein	Conformal Prediction (CP) is a distribution-free method for constructing prediction sets with marginal finite-sample coverage guarantees, making it a suitable framework for reliable uncertainty quantification in safety-critical object detection. However, object detection introduces structured multi-output predictions, complicating the application of classical CP theory developed for single outputs. In addition, standard, unscaled CP produces fixed-width prediction intervals across inputs, leadin...
95	Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views 2605.07550 3D reconstruction from disjoint views: Achieves geometrically accurate generative 3D reconstruction without view overlap.	cs.CV	Grzegorz Wilczynski, Mikołaj Zielinski, Bartosz Świrta, Dominik Belter, Przemysław Spurek	3D vision systems are fundamentally constrained by their reliance on visual overlap: reconstruction methods require it for geometric alignment, while generative models use it to enforce multi-view consistency. This limitation is particularly acute in real-world scenarios such as distributed swarm robotics or crowd-sourced data collection, where capturing overlapping perspectives, both in terms of spatial and appearance overlap, is often impossible. We introduce Generative Reconstruction from Dis...
96	VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network 2605.07552 Visual-inertial 3D pose estimation: Proposes a hybrid Mamba and cross-attention network for efficient visual-inertial 3D human pose estimation.	cs.CV	Zepeng Yang, Junxuan Bai, Hao Li, Ju Dai, Junjun Pan	The rapid advances in deep learning have significantly enhanced the accuracy of multimodal 3D human pose estimation (HPE). However, the state-of-the-art (SOTA) HPE pipelines still rely on Transformers, whose quadratic complexity makes real-time processing for long sequences impractical. Mamba addresses this issue through selective state-space modeling, enabling efficient sequence processing without sacrificing representational power. Nevertheless, it struggles to capture complex spatial dependen...
97	Dynamic Mode Decomposition along Depth in Vision Transformers 2605.07556 DMD analysis of ViT depth: Uses dynamic mode decomposition to test whether ViT depth evolves as approximately autonomous linear dynamics.	cs.CV	Nishant Suresh Aswani, Saif Eddin Jabari	Recent work has shown that contiguous vision transformer (ViT) blocks (a) can be replaced by a linear map and (b) organize into recurrent phases of computation. We ask whether these observations coincide: does ViT depth implement approximately autonomous linear dynamics, admitting a single operator $K$ applied recurrently across a contiguous span? We test this using Dynamic Mode Decomposition (DMD), which fits $K$ from selected, consecutive hidden-state pairs and predicts $p$ steps ahea...
98	Multimodal Stepwise Clinically-Guided Attention Learning for Pathological Complete Response Prediction in Breast Cancer 2605.07561 Multimodal MRI pCR prediction: Predicts breast cancer pCR by fusing multimodal MRI with clinically guided stepwise attention.	cs.CV	Alice Natalina Caragliano, Valerio Guarrasi, Michela Gravina, Carlo Sansone, Paolo Soda	Pathological complete response (pCR) is a key prognostic factor in breast cancer patients undergoing neoadjuvant therapy, strongly associated with long-term survival and treatment personalization. However, accurate pre-treatment pCR prediction remains challenging due to severe class imbalance and limited generalizability across diverse clinical settings. In this work, we propose a multimodal stepwise clinically-guided attention learning framework for pCR prediction from breast magnetic resonance...
99	Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs 2605.07562 Scale conditioning for RS-VLMs: Proposes continuous scale-conditioned fine-tuning to adapt remote sensing VLMs to scale variation across GSDs.	cs.CV	Song Zhang, Yanlong Chen, Yilin Li, Yining Chen, Zili Yi	Remote sensing vision-language models (RS-VLMs) face a fundamental mismatch with natural-image counterparts: the same geographic object exhibits radically different visual evidence across ground sampling distances (GSDs) spanning multiple orders of magnitude. Yet existing RS-VLMs often discard GSD or inject it as a discrete text token, forcing a single static parameter set to absorb the entire scale spectrum. We introduce ScaleEarth, a parameter-efficient fine-tuning framework built on Qwen3-VL ...
100	Tracing the Arrow of Time: Diagnosing Temporal Information Flow in Video-LLMs 2605.07568 Temporal information in Video-LLMs: Uses the Arrow-of-Time task to diagnose temporal information-flow bottlenecks in Video-LLMs.	cs.CV cs.CL	Peitao Han, Fei Cheng, Lis K. Pereira, Qianying Liu, Shigeru Kitazawa	The Arrow-of-Time (AoT) task, determining whether a video plays forward or backward by recognizing temporal irreversibility, is one humans solve with near-perfect accuracy, yet frontier Video Large Language Models (Video-LLMs) perform only modestly above chance. This gap raises a key question: do visual backbones fail to encode temporal information, or does information bottleneck lie elsewhere in the Video-LLM architecture? We address this question by isolating the vision encoder from the Video-...
101	PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models 2605.07574 Polarization vision-language model: Feeds polarization imaging physical parameters into a VLM to mitigate ambiguities such as reflections and transparency.	cs.CV	Yuliang Li, Chu Zhou, Heng Guo, Boxin Shi, Imari Sato	Mainstream vision-language models (VLMs) fundamentally struggle with severe optical ambiguities, such as reflections and transparent objects, due to the inherent limitations of standard RGB inputs. While polarization imaging captures polarimetric physical parameters that resolve these ambiguities, existing methods are constrained by fixed-format outputs and remain isolated from open-ended reasoning. To bridge this semantic-physical gap, we introduce PolarVLM, the first multimodal framework integ...
102	Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding 2605.07575 Scene-graph streaming video understanding: Uses explicit scene graphs to align evidence with response conditions for proactive streaming video question answering.	cs.CV cs.AI	Ke Ma, Jiaqi Tang, Bin Guo, Xueting Han, Ruonan Xu	Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic modeling of visual evidence. We introduce Response-G1, a novel framework that establishes explicit, structured alignment between the accumulated video evidence and the query's expected response conditions via scene graphs. The framework operates in three fine-tuning-free stages: (1) online query-guided scene...
103	Beyond Defenses: Manifold-Aligned Regularization for Intrinsic 3D Point Cloud Robustness 2605.07590 Robustness for 3D point clouds: Uses manifold-aligned regularization to improve point cloud network robustness to geometry-preserving perturbations.	cs.CV	Pedro Alonso, Chongshou Li, Tianrui Li	Despite extensive progress in point cloud robustness, existing methods primarily improve performance through augmentation or defense mechanisms, while overlooking the geometric root cause of adversarial fragility. We hypothesize that adversarial vulnerability in 3D networks arises from a manifold misalignment between the latent geometry learned by the model and the intrinsic geometry of the underlying surface. Small, geometry-preserving perturbations along the input manifold often induce disprop...
104	TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos 2605.07593 Long audio-visual trajectory benchmark: Introduces TraceAV-Bench to evaluate multi-hop audio-visual trajectory reasoning and hallucination in long videos.	cs.CV	Hengyi Feng, Hao Liang, Mingrui Chen, Bohan Zeng, Meiyi Qiang	Real-world audio-visual understanding requires chaining evidence that is sparse, temporally dispersed, and split across the visual and auditory streams, whereas existing benchmarks largely fail to evaluate this capability. They restrict videos to short clips, isolate modalities, or reduce questions to one-hop perception. We introduce TraceAV-Bench, the first benchmark to jointly evaluate multi-hop reasoning over long audio-visual trajectories and multimodal hallucination robustness. TraceAV-Benc...
105	SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild 2605.07604 Promptable 3D animal reconstruction: Performs promptable multi-animal 3D reconstruction from a single image, built on SMAL+.	cs.CV cs.AI	Xuyi Hu, Jin Lyu, Jiuming Liu, Yebin Liu, Silvia Zuffi	3D animal reconstruction in the wild remains challenging due to large species variation, frequent occlusions, and the prevalence of multi-animal scenes, while existing methods predominantly focus on single-animal settings. We present SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, our method jointly reconstructs multiple instances and supports flexible prompts in the form of keypoints and masks whic...
106	FS-I2P: A Hierarchical Focus-Sweep Registration Network with Dynamically Allocated Depth 2605.07607 Image-to-point cloud registration: Proposes a hierarchical focus-sweep registration network to mitigate cross-modal scale ambiguity and drift.	cs.CV	Zhixin Cheng, Yujia Chen, Xujing Tao, Bohao Liao, Xiaotian Yin	Image-to-point cloud registration is often challenged by viewpoint changes, cross-modal discrepancies, and repetitive textures, which induce scale ambiguity and consequently lead to erroneous correspondences. Recent detection-free methods alleviate this issue by leveraging multi-scale features and transformer-based interactions. However, they still suffer from attention drift across layers and intra-scale inconsistencies, hindering precise registration. Inspired by human behavior, we propose a `...
107	LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation 2605.07640 Remote-sensing lithology benchmark: Builds LithoBench to evaluate large multimodal models on remote-sensing lithology interpretation.	cs.CV cs.AI	Jun Wang, Fengpeng Li, Hang Dong, Tianjin Huang, Wei Han	Remote sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, lithology interpretation is a knowledge-intensive task that requires experts to infer rock types from various features, e.g., subtle visual, spectral, textural, geomorphological, and contextual cues, making reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opport...
108	EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting 2605.07642 Egocentric hand pose forecasting: Proposes the EggHand foundation model to forecast future 3D hand poses from egocentric video.	cs.CV	Jaeyoung Choi, Hyeondong Kim, Yujin Kim, Daehee Park	Forecasting future 3D hand pose sequences from egocentric video is essential for understanding human intention and enabling embodied applications such as AR/VR assistance and human-robot interaction. However, this task remains a highly challenging problem because egocentric hand motion is driven by complex human intent, exhibits highly dexterous articulations, and is observed under drastic viewpoint shifts induced by ego-motion. In this work, we introduce EggHand, a foundation-model-based framew...
109	Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models 2605.07649 ODD-aware zero-shot perception: Studies zero-shot safety perception with VLMs under operational design domain constraints.	cs.CV cs.AI	Berkehan Ünal, Dierend Hauke, Fazlija Dren, Plachetka Christopher	Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is...
110	Breaking Spatial Uniformity: Prior-Guided Mamba with Radial Serialization for Lens Flare Removal 2605.07650 Lens flare removal with Mamba: Uses radial serialization and a prior-guided Mamba for region-adaptive lens flare removal.	cs.CV	Zijia Fu (School of Artificial Intelligence, Beijing Normal University, Beijing, China), Yuanfei Huang (School of Artificial Intelligence	Lens flares, caused by complex optical aberrations, severely degrade image quality especially in nighttime photography. Although recent restoration methods have made remarkable progress, most still rely on spatially uniform processing. They are failing to handle the region-dependent restoration demands of flare scenes, where saturated light sources should be preserved, flare artifacts removed, and background details recovered. To address this challenge, we propose DeflareMambav2, a prior-guided ...
111	Aquatic Neuromorphic Optical Flow 2605.07653 Neuromorphic underwater optical flow: Uses spiking neural networks and event-based vision for efficient underwater optical flow estimation.	cs.CV	Pei Zhang, Yunkai Liang, Kaiqiang Wang	Underwater environments impose severe constraints on conventional imaging systems and demand solutions that balance high-quality sensing with strict resource efficiency. While emerging event cameras offer a promising alternative, their potential in aquatic scenarios remains largely unexplored. Through the lens of neuromorphic vision, this work pioneers the investigation of motion fields that serve as key media for agile underwater perception. Built upon spiking neural networks, we introduce a se...
112	Towards Billion-scale Multi-modal Biometric Search 2605.07655 Billion-scale multimodal biometric search: Summarizes the design of and lessons from Bharat ABIS, a billion-scale multimodal biometric search system.	cs.CV cs.AI	Arka Koner, Chetan S. Naik, Lokesh Kurre, Vivek Raghavan, Barada P. Sabut	Searching a multi-biometric database of a billion records for a country-level identity system requires pushing the limits of all aspects of a biometric system, including acquisition, preprocessing, feature extraction, accuracy, matching speed, presentation attack detection, and handling of special cases (e.g., missing finger digits). This is the first paper that gives insights into such a large-scale multimodal biometric search system, called Bharat ABIS, based on open-source architectures. The ...
113	OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos 2605.07695 Text-guided surgical video editing: Proposes OphEdit for training-free, text-guided editing of ophthalmic surgical videos.	cs.CV	Ritul Jangir, Arkya Jyoti Bagchi, Aiman Farooq, Mangalton Okram, Saurabh Seetaram Korgaonkar	High-fidelity surgical video generation can greatly improve medical training and the development of AI, adapting these generative models for precise video editing remains a formidable challenge. Modifying surgical attributes, such as instrument tissue interactions or procedural phases is challenging due to the strict anatomical and temporal constraints. In this paper, we propose OphEdit, a novel training-free framework for the text-guided editing of ophthalmic surgical videos. Our approach lever...
114	LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset 2605.07740 Mining environmental segmentation dataset: Releases the LAMES dataset for segmenting areas environmentally affected by artisanal mining.	cs.CV	Matthias Kahl, Zhaiyu Chen, Sudipan Saha, Mrinalini Kochupillai, Lukas Kondmann	Mining operations are of utmost importance to the economy of some nations. However, such operations result in land-use change, very high energy consumption, and negative impacts on the environment, including soil erosion and deforestation. The mining process can impact an area much larger than the mining site itself. Adding to the negative externalities linked to mining is the fact that, in addition to government-sanctioned legal mining operations, illegal mining is widespread, including in vari...
115	Benchmarking Foundation Models for Renal Lesion Stratification in CT 2605.07749 Medical foundation model benchmark: Benchmarks how well medical foundation models transfer to renal lesion stratification in CT.	cs.CV	Hartmut Häntze, Sarah de Boer, Myrthe Buser, Alessa Hering, Bram van Ginneken	The rapid proliferation of open-source medical foundation models (FMs) raises a practical question: how well do their pre-trained representations transfer to clinically relevant but data-scarce classification tasks? Particularly in CT-based renal lesion classification, a push toward greater generalizability would be meaningful, as the field is constrained by inherently limited training data. We addressed this through a benchmark of three medical FMs on this specific task. This six-class problem ...
116	Head Similarity: Modeling Structured Whole-Head Appearance Beyond Face Recognition 2605.07766 Whole-head appearance similarity: Proposes a head similarity representation to model whole-head appearance consistency beyond the face.	cs.CV	Yingfeng Wang, Yuxuan Xiao, Shengcai Liao	Many vision applications require identity consistency beyond strict biometric recognition, especially under non-frontal views or when facial cues are missing. However, conventional face recognition models enforce intra-identity invariance, collapsing appearance variations such as hairstyle or styling changes into a single representation, limiting their use in appearance-sensitive scenarios. To address this limitation, we introduce Head Similarity, a new formulation that extends identity-centric ...
117	SIMI: Self-information Mining Network for Low-light Image Enhancement 2605.07767 Unsupervised low-light enhancement: Proposes SIMI, which mines self-information via bit-plane decomposition for unsupervised low-light enhancement.	cs.CV	Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral	Poor lighting conditions significantly impact image quality, posing substantial challenges for image editing and visualization. Many existing enhancement methods aim at proposing complex models while neglecting the intrinsic information contained within low-light images. In this work, we propose the Self-Information Mining (SIMI) network, an innovative unsupervised framework that decomposes low-light images into multiple components based on bit-plane decomposition. Our approach allows mining int...
118	Differentiable Ray Tracing with Gaussians for Unified Radio Propagation Simulation and View Synthesis 2605.07781 Gaussian differentiable ray tracing for RF: Performs differentiable ray tracing in a 3D Gaussian representation to unify RF propagation simulation and view synthesis.	cs.CV	Niklas Vaara, Lam Huynh, Pekka Sangi, Miguel Bordallo López, Janne Heikkilä	Explicit neural representations such as 3D Gaussian Splatting (3DGS) enable high-fidelity and real-time novel view synthesis, yet optimize for alpha-composited optical appearance rather than ray-intersectable geometry. In contrast, radio-frequency (RF) digital twins require deterministic multi-bounce paths, where the geometry dictates trajectories and their associated attenuation and delay. We introduce a framework enabling differentiable RF propagation simulation directly within visually recons...
119	Radiologist-Guided Causal Concept Bottleneck Models for Chest X-Ray Interpretation 2605.07785 Causal concept bottleneck for CXR: Proposes a radiologist-guided causal concept bottleneck model to explain chest X-ray diagnoses.	cs.CV	Amy Rafferty, Rishi Ramaesh, Ajitha Rajan	Concept Bottleneck Models (CBMs) in medical imaging aim to improve model interpretability by predicting intermediate clinical concepts before final diagnoses. However, most existing CBMs treat concepts as discriminative predictors of pathology labels, without explicitly modelling the underlying clinical generative process where diseases produce observable radiographic findings. We propose XpertCausal, a radiologist-guided causal CBM for chest X-ray interpretation which models pathology-to-concep...
120	APEX: Assumption-free Projection-based Embedding eXamination Metric for Image Quality Assessment 2605.07786 Image quality assessment metric: Proposes APEX, an assumption-free projection-based embedding metric for assessing generated image quality.	cs.CV cs.AI	Caterina Gallegati, Monica Bianchini, Franco Scarselli, Vittorio Murino, Barbara Toniella Corradini	As generative models achieve unprecedented visual quality, the gold standard for image evaluation remains traditional feature-distribution metrics (e.g., FID). However, these metrics are provably hindered by the closed-vocabulary bottleneck of outdated features and the assumptive bias of rigid parametric formulations. Recent alternatives exploit modern backbones to solve the feature bottleneck, yet continue to suffer from parametric limitations. To close this gap, we introduce APEX (Assumption-f...
121	SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models 2605.07800 Video diffusion prompt alignment: Allocates relational alignment supervision according to prompt semantics to improve prompt following in video diffusion.	cs.CV	Jiesong Lian, Zixiang Zhou, Ruizhe Zhong, Yuan Zhou, Qinglin Lu	Recent video diffusion models (VDMs) synthesize visually convincing clips, yet still drop entities, mis-bind attributes, and weaken the interactions specified in the prompt. Representation-alignment objectives such as VideoREPA and MoAlign improve fine-grained text following by distilling spatio-temporal token relations from a frozen visual foundation model, but their pairwise supervision budget is allocated by visual or motion cues rather than by how relevant each pair is to the prompt. We pres...
122	Text-to-CAD Evaluation with CADTests 2605.07807 Text-to-CAD evaluation benchmark: Proposes CADTestBench, an automated evaluation benchmark based on executable CADTests.	cs.CV cs.LG cs.AI	Dimitrios Mallis, Marco Wang, Ahmet Serdar Karadeniz, Elisa Ricci, Anis Kacem	Text-to-CAD has recently emerged as an important task with the potential to substantially accelerate design workflows. Despite its significance, there has been surprisingly little work on Text-to-CAD evaluation, and assessing CAD model generation performance remains a considerable challenge. In this work, we introduce a new evaluation perspective for Text-to-CAD based on automated testing. We propose CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests, executable soft...
123	ICDAR 2026 Competition on Writer Identification and Pen Classification from Hand-Drawn Circles 2605.07816 Writer ID and pen classification: Releases the CircleID competition dataset for writer identification and pen type classification.	cs.CV	Thomas Gorges, Janne van der Loop, Lukas Hüttner, Linda-Sophie Schneider, Fei Wu	This paper presents CircleID, a large-scale ICDAR 2026 competition on writer identification and pen classification from scanned hand-drawn circles. The primary objective is to investigate how biometric writer characteristics and physical pen features naturally entangle within minimal, static traces. CircleID comprises two distinct tasks: (1) open-set writer identification, requiring models to recognize known writers while explicitly rejecting unknown ones, and (2) cross-writer pen classification...
124	GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning 2605.07817 Active attention control for VLMs: Achieves active vision through internal attention control to strengthen multimodal reasoning.	cs.CV cs.CL cs.AI	Brown Ebouky, Gabriele Carrino, Niccolo Avogaro, Christoph Studer, Andrea Bartezzaghi	Human visual reasoning is governed by active vision, a process where metacognitive control drives top-down goal-directed attention, dynamically routing foveal focus toward task-relevant details while maintaining peripheral awareness of the global scene. In contrast, modern Vision-Language Models (VLMs) process visual information passively, relying on the static accumulation of massive token contexts that dilute spatial reasoning and induce linguistic hallucinations. Here we propose the following...
125	Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection 2605.07821 OOD detection with co-occurrence: Uses object co-occurrence context to mitigate simplicity bias and improve near-OOD detection.	cs.CV cs.AI	Boyang Dai, Chaoqi Chen, Yizhou Yu	Out-of-distribution (OOD) detection is crucial for ensuring the reliability of deep learning models. Existing methods mostly focus on regular entangled representations to discriminate in-distribution (ID) and OOD data, neglecting the rich contextual information within images. This issue is particularly challenging for detecting near-OOD, as models with simplicity bias struggle to learn discriminative features in disentangled representations. The human visual system can use the co-occurrence of o...
126	Explainable Part-Based Vehicle Classifier with Spatial Awareness 2605.07831 Explainable part-based vehicle classification: Combines part detection with a decision tree for explainable, spatially aware vehicle classification.	cs.CV	Andreas Caduff (Competence Center for Intelligent Sensors and Networks, Lucerne University of Applied Science and Art), Klaus Zahn (Competence Center for Intelligent Sensors and Networks, Lucerne University of Applied Science and Art), Jonas Hofstetter (Competence Center for Intelligent Sensors and Networks	In the area of Intelligent Transportation Systems (ITS), fine-grained vehicle classification systems play an essential role. Recently, the authors have presented a novel vision-based classification approach in which standard end-to-end Convolutional Neural Networks (CNNs) have been decomposed into 1) a CNN-based detector for semantically strong vehicle parts, followed by 2) feature construction and 3) final classification by a decision tree. In contrast to conventional CNNs, this allows both eas...
127	BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing 2605.07846 Coarse-mask local image editing: Uses background routing and discrete gating to reduce coarse-mask shape bias in local editing.	cs.CV	Peilin Xiong, Honghui Yuan, Junwen Chen, Keiji Yanai	Coarse-mask local image editing asks a model to modify a user-indicated region while preserving the surrounding scene. In practice, however, rough masks often become unintended shape priors: instead of serving as flexible edit support, the mask can pull the generated object toward its accidental boundary. We study this failure as mask-shape bias and frame the task through a Two-Zone Constraint, where the background should remain stable while the editable region should follow the instruction with...
128	EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding 2605.07859 Driver cognitive distraction detection: detects driver cognitive distraction through gaze-informed egocentric video understanding.	cs.CV	Lang Zhang, JinYi Yoon, Matthew Corbett, Abhijit Sarkar, Bo Ji	Driver cognitive distraction is a major cause of road collisions and remains difficult to detect. Unlike manual or visual distraction, cognitive distraction occurs when attention is diverted by thoughts unrelated to driving, even when the driver appears visually attentive and exhibits no explicit physical movements. In this work, we propose EyeCue, a gaze-empowered egocentric video understanding framework, to detect driver cognitive distraction. A key insight is that cognitive distraction manifests in the interaction...
129	From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data 2605.07861 Identity-consistent makeup transfer: combines synthetic and real data for identity-preserving makeup transfer.	cs.CV	Yue Yu, Jiayu Wang, Jiajia Shi, Jingjing Chen, Yu-Gang Jiang	Makeup transfer aims to apply the makeup style of a reference portrait to a source portrait while preserving identity and background. Early methods formulate this task as unsupervised image-to-image translation, relying on surrogate objectives and often yielding limited performance. Recent diffusion- and flow-based approaches instead exploit synthetic data for supervised training, leading to significant improvements. However, these methods still face two critical challenges: synthetic supervisio...
130	Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models 2605.07872 Video reward modeling benchmark: builds the VURB preference benchmark and trains high-performing reward models for video understanding.	cs.CV cs.AI	Yuancheng Wei, Linli Yao, Lei Li, Haojie Zhang, Hao Zhou	Multimodal reward models have advanced substantially in text and image domains, yet progress in video understanding reward modeling remains severely limited by the lack of robust evaluation benchmarks and high-quality preference data. To address this, we propose a unified framework spanning benchmark design, data construction, and reward model training. We introduce Video Understanding Reward Bench (VURB), a benchmark featuring 2,100 preference pairs with long chain-of-thought reasoning traces (...
131	Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding 2605.07897 Semantic adaptive video memory: adaptively manages memory with semantic signals to support streaming video question answering.	cs.CV cs.AI	Hang Wu, Sherin Mary Mathews, Yujun Cai, Ming-Hsuan Yang, Yiwei Wang	Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual tokens via visual similarity heuristics, or augment compression with KV-cache-level retrieval. However, compression decisions rarely incorporate semantic signals, and retrieval is often added after compression is finali...
132	One World, Dual Timeline: Decoupled Spatio-Temporal Gaussian Scene Graph for 4D Cooperative Driving Reconstruction 2605.07910 Asynchronous 4D driving reconstruction: uses a dual-timeline Gaussian scene graph to handle asynchronous observations in cooperative driving reconstruction.	cs.CV	Yulong Chen, Xiaoyun Dong, Haoyu Zhang, Zongxian Yang, Lewei Xie	Reconstructing dynamic scenes from Vehicle-to-Infrastructure Cooperative Autonomous Driving (VICAD) data is fundamentally complicated by temporal asynchrony: vehicle and infrastructure cameras operate on independent clocks, capturing the same dynamic agent, such as a car or pedestrian, at different physical times. Existing Gaussian Scene Graph methods implicitly assume synchronized observations and assign a single pose per agent per frame, an assumption that breaks in cooperative setting...
133	What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion 2605.07915 Tokenizer design for latent diffusion: studies diffusion-friendly latent spaces and proposes prior-aligned autoencoders to improve latent diffusion models (LDMs).	cs.CV	Zhengrong Yue, Taihang Hu, Mengting Chen, Haiyu Zhang, Zihao Pan	Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruction fidelity or inherit pretrained representations, leaving unclear what kind of latent space is truly friendly for generative modeling. In this paper, we study this question from the perspective of latent manifold organization. By constructing controlled tokenizer variants, we identify three ke...
134	MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence 2605.07919 Trustworthy medical VLM evaluation: evaluates silent failures of medical VLMs under corrupted evidence and provides a benchmark.	cs.CV	Hanqi Jiang, Junhao Chen, Yi Pan, Lifeng Chen, Weihang You	Medical vision--language models (VLMs) are usually evaluated on intact image--question pairs, but trustworthy clinical use requires a stronger property: a model must recognise when the evidential basis for an answer has failed. We study this through silent failures under perturbed evidence, where a vision-required medical question is paired with a false premise, wording perturbation, knowledge-only rewrite, or ROI-corrupted image, yet the model returns a fluent non-refusal answer. We introduce m...
135	One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy 2605.07931 Low-bandwidth world models for VLA: proposes a low-bandwidth world model with one token per frame to improve VLA planning.	cs.CV cs.AI	Zuojin Tang, Shengchao Yuan, Xiaoxin Bai, Zhiyuan Jin, De Ma	Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet how such modules should be parameterized on top of a pretrained VLA remains an open design question. Existing world-model-augmented VLAs typically pass the per-frame visual stream into the world module at high visual bandwidth and treat its rollout as a side product of action prediction; under a constrained adaptation budget on a frozen backbone, this leaves both the per-frame represe...
136	Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision 2605.07940 Single-pair exemplar image editing: Delta-Adapter learns transferable editing semantics from single-pair exemplar supervision.	cs.CV	Jiacheng Chen, Songze Li, Han Fu, Baoquan Zhao, Wei Liu	Exemplar-based image editing applies a transformation defined by a source-target image pair to a new query image. Existing methods rely on a pair-of-pairs supervision paradigm, requiring two image pairs sharing the same edit semantics to learn the target transformation. This constraint makes training data difficult to curate at scale and limits generalization across diverse edit types. We propose Delta-Adapter, a method that learns transferable editing semantics under single-pair supervision, re...
137	Rebalancing gradient to improve self-supervised co-training of depth, odometry and optical flow predictions 2605.07945 Self-supervised depth-odometry-flow co-training: improves self-supervised co-training of depth, odometry, and optical flow through dynamic gradient rebalancing.	cs.CV	Marwane Hariat, Antoine Manzanera, David Filliat	We present CoopNet, an approach that improves the cooperation of co-trained networks by dynamically adapting the apportionment of gradient to ensure equitable learning progress. It is applied to motion-aware self-supervised prediction of depth maps by introducing a new hybrid loss, based on a distribution model of photometric reconstruction errors made by, on the one hand, the depth + odometry paired networks and, on the other hand, the optical flow network. This model essentially assumes that ...
138	TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model 2605.07955 MS lesion segmentation generative model: uses a stochastic generative model to unify cross-sectional and longitudinal, contrast-agnostic MS lesion segmentation.	cs.CV cs.AI	Vicent Caselles-Ballester, Eloy Martínez-Heras, Giuseppe Pontillo, Zoe Mendelsohn, Elena M. Marrón	Multiple sclerosis (MS) expresses substantial clinical and radiological heterogeneity, which poses significant challenges for automatic lesion segmentation. The current deep learning-based SOTA is highly susceptible to changes in both distribution (e.g., changes in scanner) and the structure of inputs, evident in the current divide between cross-sectional and longitudinal approaches. We introduce TimeLesSeg, a unified contrast-agnostic framework designed to segment MS lesions regardless o...
139	DVD: Discrete Voxel Diffusion for 3D Generation and Editing 2605.07971 Discrete voxel diffusion for 3D: proposes discrete voxel diffusion to generate and edit sparse voxel scaffolds for 3D pipelines.	cs.CV cs.LG	Zhengrui Xiang, Jiaqi Wu, Fupeng Sun, Heliang Zheng, Yingzhen Li	We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framewo...
140	HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models 2605.07973 Hyperspherical embedding control in diffusion: aligns hyperspherical embeddings and traverses Kent representations to improve diffusion controllability.	cs.CV	Arani Roy, Shristi Das Biswas, Kaushik Roy	Text-to-image diffusion models can generate visually stunning images, yet controlling what appears and how it appears remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjusting an attribute often leads to unintended side effects, such as altered backgrounds or distorted details. This is because most existing text-based control methods treat the embedding space as Euclidean and apply simple...
141	Seeing Across Skies and Streets: Feedforward 3D Reconstruction from Satellite, Drone, and Ground Images 2605.07978 Cross-view feedforward 3D reconstruction: fuses satellite, drone, and ground images for feedforward cross-view 3D reconstruction and localization.	cs.CV	Qiwei Wang, Zhongyao Tuo, Xianghui Ze, Yujiao Shi	Cross-view localization classically asks: where does this ground image lie on the satellite tile? Existing methods are typically limited to 3-DoF estimates -- an $(x,y)$ position and a yaw angle -- because nadir satellite imagery provides no direct cues for roll, pitch, or altitude, forcing a reliance on planar-motion and zero-tilt assumptions. These assumptions break on real terrain with slopes, ramps, and tilted camera mounts. To overcome this, we introduce a single UAV image as an intermediat...
142	Rethinking Dense Optical Flow without Test-Time Scaling 2605.08000 Efficient dense optical flow: uses semantic and geometric priors from foundation models to improve optical flow accuracy without test-time scaling.	cs.CV	Praroop Chanda, Suryansh Kumar	Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. This raises a fundamental question: Is scaling test-time computation the only way to improve dense optical flow accuracy? We argue that it is not. Instead, powerful visual semantic and geometric priors encoded in modern foundation models ...
143	SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere 2605.08003 Training-free video anomaly detection: performs geodesic inference on the unit hypersphere over pre-trained features for training-free anomaly detection.	cs.CV	Chao Huang, Penfei Wei, Wei Wang, Jie Wen, Zhihua Wang	Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. We observe that intermediate-layer features of pre-trained multimodal large language models (MLLMs) already encode rich anomaly semantics, yet existing approaches rely on the language output pathway and fa...
144	TRAS: An Interactive Software for Tracing Tree Ring Cross Sections 2605.08025 Tree ring tracing software: open-source TRAS software for automatic delineation, manual correction, and measurement of tree rings.	cs.CV	Henry Marichal, Diego Passarella, Gregory Randall	Tree ring marking remains a key step in dendrometry and dendrochronology, but it is often performed manually, making the process time-consuming, subjective, and difficult to scale to large image datasets. We present the Tree Ring Analyzer Suite (TRAS), an open-source graphical software for automatic delineation, manual correction, and measurement of tree rings in wood cross-sectional images. TRAS integrates three complementary detection algorithms: the classical image-processing method CS-TRD an...
145	STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation 2605.08029 Unified multimodal generation with flows: connects language models with autoregressive normalizing flows for unified text-image sequence generation.	cs.CV cs.LG	Ying Shen, Tianrong Chen, Yuan Gao, Yizhe Zhang, Yuyang Wang	Deep generative models have advanced rapidly across text and vision, motivating unified multimodal systems that can understand, reason over, and generate interleaved text-image sequences. Most existing approaches combine autoregressive language modeling with diffusion-based image generators, inheriting a structural mismatch between causal text generation and iterative visual denoising. We observe that autoregressive normalizing flows are autoregressive Transformers--sharing the same causal mask,...
146	PET-Adapter: Test-Time Domain Adaptation for Full and Limited-Angle PET Image Reconstruction 2605.08030 Test-time domain adaptation for PET: proposes PET-Adapter, which adapts at test time to improve full- and limited-angle PET reconstruction.	cs.CV cs.LG	Rüveyda Yilmaz, Yuli Wu, Johannes Stegmaier, Volkmar Schulz	Positron Emission Tomography (PET) image reconstruction is inherently challenged by Poisson noise and physical degradation factors, which are further exacerbated in limited-angle acquisitions. While deep learning methods demonstrate promising performance, their generalization to unseen clinical data distributions remains limited without extensive retraining. We propose PET-Adapter, a test-time domain adaptation framework for generative PET reconstruction models pretrained solely on phantom data....
147	Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models 2605.08031 Reinforcement unlearning for VLMs: applies reinforcement unlearning on the vision encoder to remove sensitive semantics and suppress hallucination.	cs.CV	Kaidi Jia, Yujie Lin, Chengyi Yang, Jiayao Ma, Jinsong Su	Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines al...
148	SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation 2605.08043 Structured orchestration for image generation: applies structured decomposition and conditional skill orchestration across the generation pipeline to satisfy complex semantic constraints.	cs.CV cs.AI	Tianfei Ren, Zhipeng Yan, Yiming Zhao, Zhen Fang, Yu Zeng	While text-to-image models have made strong progress in visual fidelity, faithfully realizing complex visual intents remains challenging because many requirements must be tracked across grounding, generation, and verification. We refer to these requirements as semantic commitments and formalize their lifecycle discontinuity as the Conceptual Rift, where commitments may be locally resolved or checked but fail to remain identifiable as the same operational units throughout the generation lifecycle...
149	MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation 2605.08050 Controllable talking-head video diffusion: fuses multiple condition signals with an adaptive router for controllable talking-head video generation.	cs.CV	Xinyan Ye, Jiankang Deng, Abbas Edalat	Talking-head generation requires joint modeling of identity, head pose, facial expression, and mouth dynamics. Existing methods typically address only a subset of these factors, and rely on fixed-weight or heuristic fusion when multiple conditions are involved. We present MoCoTalk, a multi-conditional video diffusion framework that unifies four complementary control signals: a reference image, facial keypoints, 3DMM-rendered shading meshes, and the corresponding speech audio. To resolve destruct...
150	Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization 2605.08054 Constrained human motion diffusion: uses retrieval-guided diffusion noise optimization to generate human motion under strong spatiotemporal constraints.	cs.CV	Hanchao Liu, Fang-Lue Zhang, Shining Zhang, Tai-Jiang Mu, Shi-Min Hu	Generating human motion that satisfies customized zero-shot goal functions, enabling applications such as controllable character animation and behavior synthesis for virtual agents, is a critical capability. While current approaches handle many unseen constraints, they fail on tasks with very challenging spatiotemporal restrictions, such as severe spatial obstacles or specified numbers of walking steps. To equip motion generators for these highly constrained tasks, we present a retrieval-guided ...
151	6D Pose Estimation via Keypoint Heatmap Regression with RGB-D Residual Neural Networks 2605.08059 RGB-D 6D Pose Estimation: estimates object 6D pose using keypoint heatmap regression combined with PnP.	cs.CV	Ismail Aljosevic, Amir Masoud Almasi, Ana Parovic, Ashkan Shafiei	In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoints extracted from these heatmaps are used to estimate the 6D object pose via the PnP RANSAC algorithm. We compare different keypoint selection strategies to assess their impact on pose accuracy. Additionally, we extend the baseline by incorporating depth data u...
152	Flow-OPD: On-Policy Distillation for Flow Matching Models 2605.08063 On-Policy Distillation for Flow: proposes Flow-OPD to mitigate reward sparsity and gradient interference in multi-task alignment.	cs.CV cs.AI	Zhen Fang, Wenxuan Huang, Yu Zeng, Yiming Zhao, Shuang Chen	Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued rewards, and the gradient interference arising from jointly optimizing heterogeneous objectives, which together give rise to a 'seesaw effect' of competing metrics and pervasive reward hacking. Inspired by the success of On-Policy Distillation (OPD) in the large language model community, we propose Flow-OPD, the first unified post-training...
153	Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment 2605.08064 Efficient 3D VLM Representations: builds efficient 3D representations via semantic clustering and alignment to strengthen VLM spatial reasoning.	cs.CV	Jerry Jiang, Haowen Sun, Denis Gudovskiy, Yohei Nakata, Tomoyuki Okuno	Spatial intelligence in vision-language models (VLMs) attracts research interest with the practical demand to reason in the 3D world. Despite promising results, most existing methods follow the conventional 2D pipeline in VLMs and use pixel-aligned representations for the vision modality. However, correspondence-based models with implicit 3D scene understanding often fail to achieve spatial consistency, and representation-based models with 3D geometric priors lack efficiency in vision sequence se...
154	EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction 2605.08073 Event-guided Image Reconstruction: proposes an efficient state space model that fuses event data for image reconstruction.	cs.CV cs.AI	Wei Yu, Yunhang Qian	Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (i.e., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual ...
155	Normalizing Trajectory Models 2605.08078 Normalizing Flow Trajectory Models: models few-step reverse trajectories with conditional normalizing flows while retaining exact likelihood training.	cs.CV cs.LG	Jiatao Gu, Tianrong Chen, Ying Shen, David Berthelot, Shuangfei Zhai	Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice the likelihood framework in the process. We introduce Normalizing Trajectory Models (NTM), which models each reverse step as an expressive conditional normalizing flow with exact likelihood training. ...
156	Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers 2605.06169 Stabilizing Deep Diffusion Transformers: analyzes mean collapse in 1000-layer DiTs and suppresses MMS with mean-variance split residuals.	cs.CV	Pengqi Lu	Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network ...
157	On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching 2605.06680 Flow Matching Integration Error: decomposes the velocity Jacobian into strain and vorticity to characterize the sources of integration error in flow matching.	cs.CV cs.LG	Chenxi Tao, Seung-Kyum Choi	Flow matching generates data by integrating a learned velocity field, where the number of integration steps (NFE) directly determines inference cost. We analyze which properties of the velocity field govern integration error by decomposing the velocity Jacobian into its symmetric part S (strain rate) and antisymmetric part Omega (vorticity). We prove that strain and vorticity play different roles: strain controls exponential error amplification through the logarithmic norm, while vorticity contr...
158	A Hierarchical Ensemble Pipeline for Anomaly Detection in ESA Satellite Telemetry 2605.06681 Satellite Telemetry Anomaly Detection: detects anomalies in ESA multivariate telemetry with a hierarchical ensemble and multi-level stacking.	cs.CV cs.LG	Lorenzo Riccardo Allegrini, Geremia Pompei	A hierarchical ensemble pipeline is introduced to address anomaly detection in multivariate telemetry data provided by the European Space Agency (ESA). The method integrates shapelet-based and statistical feature extraction, per-channel modeling, intra-channel stacking, and a final cross-channel aggregation. The pipeline is trained and validated using time-series cross-validation and two-level masking strategies to prevent information leakage. Results on the European Space Agency Anomaly Detection B...
159	Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention 2605.06699 Multimodal Latent Diffusion Synthesis: jointly generates volumetric MRI and tabular data in a shared latent space via cross-attention.	cs.CV cs.LG cs.AI	Daniel Mensing, Jan Kapar, Jochen G. Hirsch, Matthias Günther, Horst Hahn	We propose a multimodal latent diffusion model that jointly synthesizes volumetric magnetic resonance imaging (MRI) and tabular clinical data within a shared latent space via cross-attention. This approach enables coherent joint representation learning of MRI and tabular modalities for generative modeling. Our model utilizes a variational autoencoder to fuse the two modalities before diffusion-based synthesis, allowing modality-appropriate reconstruction with separate decoders for MRI and tabula...
160	Weblica: Scalable and Reproducible Training Environments for Visual Web Agents 2605.06761 Reproducible Web Agent Environments: builds scalable, reproducible training environments for visual web agents using HTTP caching and replay.	cs.CV cs.LG cs.AI	Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan	The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stabl...
161	Enhancing Eye Movement Biometrics for User Authentication via Continuous Gaze Offset Score Fusion 2605.06810 Eye Movement Biometrics Fusion: Fuses continuous gaze offset with eye movement features to improve user authentication performance.	cs.CV	Hashim Aziz, Mehedi Hasan Raju, Oleg V. Komogortsev	Eye movement biometrics (EMB) use subject-specific gaze dynamics for user authentication and identification. Recent deep learning-based EMB systems achieve strong performance by modeling temporal eye movement behavior. However, these systems typically overlook continuous gaze offset, despite prior evidence that it contains user-discriminative information. This work examines whether continuous gaze offset can improve biometric performance when combined with existing biometric features. We evaluat...
162	Uneven Evolution of Cognition Across Generations of Generative AI Models 2605.06815 Psychometric Evaluation of GenAI: Uses a psychometric framework to assess the cognitive profiles of successive generations of generative models and compare them with human norms.	cs.CV, cs.AI	Isaac Galatzer-Levy, Daniel McDuff, Xin Liu, Jed McGiffin	The pursuit of artificial general intelligence necessitates robust methods for evaluating the cognitive capabilities of models beyond narrow task performance. Here, we introduce a psychometric framework to assess the cognitive profiles of generative AI, comparing them to human norms and tracking their evolution across generations. Initial evaluation of leading multimodal models using tasks adapted from the Wechsler Adult Intelligence Scale revealed a profoundly uneven cognitive architecture: nea...
163	A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models 2605.06829 Unified Theory of Diffusion Models: Uses measure theory to unify diffusion, score-based, and flow matching models as a framework for learning time-dependent vector fields.	cs.CV, cs.LG	Aditya Ranganath, Mukesh Singhal	We survey continuous-time generative modeling methods based on transporting a simple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals $(\rho_t)_{t \in [0,1]}$ governed by continuity and Fokker-Planck equations. Such a unified theory is timely because these method...
164	EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration 2605.06875 Posit Accelerator for ADAS: Proposes a SIMD logarithmic-Posit compute engine for low-power acceleration of ADAS inference.	cs.CV, cs.AI	Mukul Lokhande, Ratko Pilipovic, Omkar Kokane, Adam Teman, Santosh Kumar Vishvakarma	Advanced driver-assistance systems (ADAS) require neural compute engines that deliver low-latency inference under strict power and area constraints. Posit arithmetic is attractive for such accelerators because it provides high numerical fidelity at low precision, but its variable-length regime encoding increases encode/decode cost and exposes the datapath to large regime-field fault effects. This paper presents EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit neural compute engine for energy...
165	Dr-BA: Separable Optimization for Direct Radar Bundle Adjustment & Localization 2605.07041 Radar Bundle Adjustment Localization: Performs BA and localization with separable optimization directly on spinning radar intensity images.	cs.CV	Daniil Lisus, Cedric Le Gentil, Timothy D. Barfoot	This paper introduces Dr-BA, a first-of-its-kind radar bundle adjustment (BA) framework that operates directly on 2D spinning radar intensity images. Unlike camera or lidar sensors, radar is largely unaffected by precipitation, making it a critical modality for autonomous systems that require all-weather robustness. Existing state estimation approaches using spinning radar typically extract sparse point clouds from range-azimuth-intensity measurements and apply point cloud alignment techniques t...
166	Do Joint Audio-Video Generation Models Understand Physics? 2605.07061 Physics Benchmark for AV Generation: Proposes AV-Phys Bench to evaluate the physical consistency of joint audio-video generation.	cs.CV, cs.AI, cs.SD, cs.MM	Zijun Cui, Xiulong Liu, Hao Fang, Mingwei Xu, Jiageng Liu	Joint audio-video generation models are rapidly approaching professional production quality, raising a central question: do they understand audio-visual physics, or merely generate plausible sounds and frames that violate real-world consistency? We introduce AV-Phys Bench, a benchmark for evaluating physical commonsense in joint audio-video generation. AV-Phys Bench tests models across three scene categories: Steady State, Event Transition, and Environment Transition. It covers physics-grounded ...
167	Fine-tuning a vision-language model for fracture-surface morphology recognition 2605.07145 VLM for Fracture Morphology: Fine-tunes a vision-language model to recognize fracture-surface morphology and builds a large-scale annotated dataset.	cs.CV	Quanliang Liu, Jungtaek Kim, Kangwook Lee, Hyunseok Oh	Vision-language models (VLMs) have shown strong potential for scientific image understanding, but general-purpose models often lack the domain-specific visual knowledge required for reliable materials characterization. In this work, we fine-tuned an open-source VLM (Qwen3-VL-32B-Instruct) for fracture-surface image analysis using a curated dataset of 13,168 open-source, literature-mined fracture-surface images. Morphology annotations were generated by GPT-5.2-Reasoning (high) from both the image...
168	PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation 2605.07252 Personalized Co-speech Gestures: Generates personalizable co-speech gestures using a semantic-guided hierarchical motion representation.	cs.CV, cs.MM	Junchuan Zhao, Qifan Liang, Ye Wang	Co-speech gesture generation aims to synthesize realistic body movements that are semantically coherent with speech and faithful to a user-specified gestural style. Existing VQ-VAE based co-speech gesture generation methods improve generation quality but fail to encode semantic structure into the motion representation or explicitly disentangle content from style, limiting both semantic coherence and personalization fidelity. We present PersonaGest, a two-stage framework addressing both limitatio...
169	Predictive but Not Plannable: RC-aux for Latent World Models 2605.07278 Reachability-Corrected World Models: Proposes RC-aux to correct latent-space reachability and improve the planning alignment of world models.	cs.CV, cs.LG, cs.AI	Wenyuan Li, Guang Li, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama	A latent world model may achieve accurate short-horizon prediction while still inducing a latent space that is poorly aligned with planning. A key issue is spatiotemporal mismatch: these models are often trained with local predictive supervision, but deployed for long-horizon goal-directed search in latent spaces where Euclidean distance may not reflect what is reachable within a finite action budget. We present the Reachability-Correction auxiliary objective (RC-aux), a lightweight correction f...
170	Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference 2605.07354 Edge-Cloud Action Understanding: Proposes task-oriented communication with edge-cloud co-inference to reduce the bandwidth and latency of action recognition.	cs.CV	Jingyi Liu, Cheng Yuan, Lijun He, Jun Zhang, Jiawei Shao	The expanding application of smart sensing has created a growing demand for the accurate understanding of human action at the network edge. Traditional approaches require massive video data to be transmitted from resource-constrained edge devices to powerful cloud servers, incurring prohibitive uplink bandwidth consumption and unacceptable latency while raising privacy concerns. To overcome these bottlenecks, we propose a task-oriented communication framework for human action understanding (TOAU...
171	Weather-Robust Scene Semantics with Vision-Aligned 4D Radar 2605.07367 Radar-based Weather-Robust Semantics: Aligns 4D radar representations to vision embeddings and uses a VLM to generate robust semantic scene descriptions.	cs.CV	Kali Hamilton, Christoffer Heckman	Cameras and LiDAR degrade in rain, fog, and snow, while millimeter-wave radar remains largely unaffected. We align a radar encoder to frozen SigLIP vision embeddings and decode structured scene captions through a frozen vision-language model (VLM) with approximately 7M trainable parameters. On K-RADAR with held-out fog, light snow, and heavy snow sequences, all radar configurations outperform a camera baseline that collapses to over 90% hallucination. We identify a token-norm mismatch as the dom...
172	Velocity-Space 3D Asset Editing 2605.07385 Local 3D Editing in Velocity Space: Imposes local constraints on the velocity field of the ODE sampler to achieve native 3D asset editing.	cs.CV	Hao Liu, Yuxuan Lin, Jingfeng Guo, Ruihang Chu, Junjie Wang	Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervene where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editin...
173	SR$^2$-LoRA: Self-Rectifying Inter-layer Relations in Low-Rank Adaptation for Class-Incremental Learning 2605.07420 LoRA for Class-Incremental Learning: Uses LoRA with self-rectifying inter-layer relations to mitigate forgetting in class-incremental learning.	cs.CV, cs.LG	Fengqiang Wan, Yipeng Lin, Kan Lv, Yang Yang	Pre-trained models with parameter-efficient fine-tuning (PEFT) have demonstrated promising potential for class-incremental learning (CIL), yet catastrophic forgetting still persists when adapting models to new tasks. In this paper, we present a novel perspective on catastrophic forgetting through the analysis of inter-layer relation drift, i.e., the progressive disruption of relationships among layer-wise representations during the learning of new tasks. We theoretically show that the increase o...
174	Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models 2605.07514 Dynamic Consistency in Action Models: Proposes diagnostic metrics to test the dynamic consistency between futures generated by world action models and their action sequences.	cs.CV	Bo-Kai Ruan, Teng-Fang Hsiao, Ling Lo, Hong-Han Shuai	World Action Models (WAMs) enable decision-making through imagined rollouts by predicting future observations and actions. However, the reliability of these imagined futures remains under-examined: is a generated future merely visually plausible, or is it dynamically compatible with the action sequence it claims to model? In this work, we identify action-state consistency, the alignment between predicted actions and induced state transitions, as a missing reliability axis for WAMs. Through a sys...
175	Stochastic Transition-Map Distillation for Fast Probabilistic Inference 2605.07661 Fast Diffusion Inference Distillation: Proposes STMD, which distills the full transition distribution to accelerate diffusion inference while preserving stochasticity.	cs.CV, cs.LG	George Rapakoulias, Peter Garud, Lingjiong Zhu, Panagiotis Tsiotras	Diffusion models achieve strong generation quality, diversity, and distribution coverage, but their performance often comes with expensive inference. In this work, we propose Stochastic Transition-Map Distillation (STMD), a teacher-free framework for accelerating diffusion model inference while preserving probabilistic sample generation. In contrast to score-based diffusion models, whose denoising parametrization models the mean of the posterior distribution, STMD distills the full transition ma...
176	Spectral Surgery: Class-Targeted Post-Hoc Rebalancing via Hessian Spike Perturbation 2605.07790 Hessian-based Post-hoc Rebalancing: Uses Hessian spectral spike perturbation for class-targeted post-hoc rebalancing to improve classification.	cs.CV, cs.LG	Hugo Vigna, Samuel Bontemps	The Hessian spectrum of trained deep networks exhibits a characteristic structure: a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues (spikes), confirming the relevance of Random Matrix Theory in deep learning. The spike count matches the number of classes minus one. While prior work has described this structure, no method has exploited it operationally to improve classification performance. We propose Spectral Surgery, a post-hoc optimization method that ...
177	Pre-training Enables Extraordinary All-optical Image Denoising 2605.07810 Pretrained Optical Image Denoising: Uses pre-training to improve the snapshot image denoising quality of all-optical neural networks.	cs.CV	Xudong Lv, Yuxiang Sun, Shuo Wang, Nanxing Chen, Jun Guan	Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts and are leading to suboptimal performance. This paper reports a pre-training-driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free-space optical d...
178	Anisotropic Modality Align 2605.07825 Multimodal Representation Interchangeability: Proposes an anisotropic alignment method that mitigates modality representation shift to support training with unimodal data.	cs.CV, cs.MM	Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang	Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent...
179	Enhancing Federated Quadruplet Learning: Stochastic Client Selection and Embedding Stability Analysis 2605.07888 Federated Metric Learning Stability: Proposes FedQuad, combining stochastic client selection with stable embeddings to improve generalization in federated metric learning.	cs.CV, cs.LG	Ozgu Goksu, Nicolas Pugeault	Federated Learning (FL) enables decentralised model training across distributed clients without requiring data centralisation. However, the generalisation performance of the global model is usually degraded by data heterogeneity across clients, particularly under limited data availability and class imbalance. To address this challenge, we propose FedQuad, a novel method that explicitly enforces minimising intra-class representations while enabling inter-class splits across clients. By jointly mi...
180	Consistency Regularised Gradient Flows for Inverse Problems 2605.07907 Gradient Flows for Inverse Problems: Uses consistency-regularised Euclidean-W2 gradient flows to accelerate inverse problem solving under LDM priors.	cs.CV, cs.LG	Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, O. Deniz Akyildiz	Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior samp...
181	Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning 2605.07914 Multi-distribution generalization geometry: Proposes a multi-distribution learning method that accounts for both flatness and gradient alignment.	cs.CV, cs.LG	Aristotelis Ballas, Christos Diou	Sharpness-aware and gradient-alignment methods have been shown to improve generalization; however, each family of methods targets a single geometric property of the loss landscape while ignoring the other. In this paper, we show that this omission is structurally unavoidable and that both flatness and gradient alignment should be considered in multi-distribution learning settings. Specifically, we derive an excess-risk decomposition that yields two additive leading-order terms: (i) an alignment ...
182	TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning 2605.07943 Active vision imitation benchmark: Releases the TAVIS benchmark for evaluating imitation learning with active gaze control.	cs.CV, cs.LG, cs.AI	Giacomo Spigler	Active vision -- where a policy controls its own gaze during manipulation -- has emerged as a key capability for imitation learning, with multiple independent systems demonstrating its benefits in the past year. Yet there is no shared benchmark to compare approaches or quantify what active vision contributes, on which task types, and under what conditions. We introduce TAVIS, evaluation infrastructure for active-vision imitation learning, with two complementary task suites -- TAVIS-Head (5 tasks...
183	Uncertainty Quantification for Cardiac Shape Reconstruction with Deep Signed Distance Functions via MCMC methods 2605.07987 Uncertainty-aware cardiac reconstruction: Combines DeepSDF with MCMC to quantify uncertainty in cardiac shape reconstruction.	cs.CV	Jan Verhülsdonk, Thomas Grandits, Francisco Sahli Costabal, Thomas Beiert, Simone Pezzuto	Atlas-based approaches allow high-quality, patient-specific shape reconstructions of cardiac anatomy from sparse and/or noisy data such as point clouds. However, these methods are mainly prior-driven, so the impact of uncertainty can be large, limiting their clinical reliability. We propose a probabilistic framework for uncertainty-aware cardiac shape reconstruction that combines Deep Signed Distance Functions (DeepSDFs) with Markov Chain Monte Carlo (MCMC) sampling. Cardiac geometries are model...
184	123D: Unifying Multi-Modal Autonomous Driving Data at Scale 2605.08084 Autonomous driving data unification: Builds 123D, a unified data format and toolchain for multimodal autonomous driving data.	cs.CV	Daniel Dauner, Valentin Charraut, Bastian Berle, Tianyu Li, Long Nguyen	The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsis...
185	ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation 2408.06747 CLIP bias-corrected segmentation: Explicitly models and rectifies CLIP bias to improve unsupervised segmentation.	cs.CV	Jingyun Wang, Guoliang Kang	Recent works utilize CLIP to perform the challenging unsupervised semantic segmentation task where only images without annotations are available. However, we observe that when adopting CLIP to such a pixel-level understanding task, unexpected bias (including class-preference bias and space-preference bias) occurs. Previous works don't explicitly model the bias, which largely constrains the segmentation performance. In this paper, we propose to explicitly model and rectify the bias existing in CL...
186	Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation 2411.16748 Long talking video generation: Generates long talking videos with a multimodal diffusion Transformer equipped with a memory bank.	cs.CV	Haojie Zhang, Zhihao Liang, Ruibo Fu, Bingyan Liu, Zhengqi Wen	Long-duration talking video synthesis faces enduring challenges in achieving high video quality, portrait consistency, temporal coherence, and computational efficiency. As video length increases, issues such as visual degradation, portrait drift, temporal artifacts, and error accumulation become increasingly problematic, severely affecting the realism and reliability of the results. To address these challenges, we present LetsTalk, a diffusion transformer framework equipped with multimodal guida...
187	Surgical Visual Understanding (SurgVU) Dataset 2501.09209 Surgical video dataset: Releases SurgVU, a large-scale surgical video dataset with multi-task annotations.	cs.CV	Aneeq Zia, Max Berniker, Rogerio Nespolo, Xiaorui Zhang, Conor Perreault	Owing to recent advances in machine learning and the ability to harvest large amounts of data during robotic-assisted surgeries, surgical data science is ripe for foundational work. We present a large dataset of surgical videos and their accompanying labels for this purpose. We describe how the data was collected and some of its unique attributes. Multiple example problems are outlined. Although the dataset was curated for a particular set of scientific challenges (in an accompanying paper), it ...
188	RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion 2503.06223 VLM safety auditing: Uses reinforced diffusion to generate adversarial contexts for auditing multimodal safety failures.	cs.CV	Ruofan Wang, Xingjun Ma	Large Vision-Language Models (VLMs) are increasingly deployed in open-ended environments, where ensuring reliable safety under multimodal inputs is critical. However, existing evaluations remain largely instruction-centric, focusing on explicit malicious queries while overlooking a more realistic and underexplored risk: whether safety alignment remains robust under harmful contextual exposure. This limitation is particularly important for multimodal systems, where visual inputs can substantially...
189	Tables Guide Vision: Learning to See the Heart through Tabular Data 2503.14998 Tabular-guided medical contrastive learning: Uses tabular clinical attributes to guide contrastive learning and improve cardiac imaging representations.	cs.CV	Marta Hasny, Maxime Di Folco, Keno Bressem, Julia Schnabel	Contrastive learning methods in computer vision typically rely on augmented views of the same image or multimodal pretraining strategies that align paired modalities. However, these approaches often overlook semantic relationships between distinct instances, leading to false negatives when semantically similar samples are treated as negatives. This limitation is especially critical in medical imaging domains such as cardiology, where demographic and clinical attributes play a critical role in as...
190	Frozen Backpropagation: Relaxing Weight Symmetry in Deep Spiking Neural Networks 2505.13741 Spiking neural network training: Proposes frozen backpropagation to relax the forward-backward weight symmetry constraint in SNNs.	cs.CV	Gaspard Goupy, Pierre Tirilly, Ioan Marius Bilasco	Direct training of Spiking Neural Networks (SNNs) on neuromorphic hardware can greatly reduce energy costs compared to GPU-based training. However, implementing Backpropagation (BP) on such hardware is challenging because forward and backward passes are typically performed by separate networks with distinct weights. To compute correct gradients, forward and feedback weights must remain symmetric during training, necessitating weight transport between the two networks. This symmetry requirement i...
191	CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition 2505.14113 Conformal uncertainty for segmentation: Uses conformal prediction with spatial-grouping decomposition to provide reliable confidence sets for segmentation.	cs.CV, cs.LG	Bruno Viti, Elias Karabelas, Martin Holler	Most machine learning-based image segmentation models produce pixel-wise confidence scores that represent the model's predicted probability for each class label at every pixel. While this information can be particularly valuable in high-stakes domains such as medical imaging, these scores are heuristic in nature and do not constitute rigorous quantitative uncertainty estimates. Conformal prediction (CP) provides a principled framework for transforming heuristic confidence scores into statistical...
192	LoopNav: Benchmarking Spatial Consistency in World Models 2505.22976 Spatial consistency world-model benchmark: Proposes the LoopNav benchmark for evaluating long-range spatial consistency in world models.	cs.CV, cs.AI	Kewei Lian, Shaofei Cai, Yitao Liang, Anji Liu	The ability to simulate the world in a spatially consistent manner is a crucial requirement for effective world models. Such a model enables high-quality visual generation, and also ensures the reliability of world models for downstream tasks such as simulation and planning. It must not only retain long-horizon observational information, but also enable the construction of explicit or implicit internal spatial representations. However, existing datasets do not explicitly enforce spatial consist...
193	Factored Classifier-Free Guidance 2506.14399 Attribute-wise diffusion guidance: Proposes factored CFG, which guides each attribute independently in counterfactual diffusion generation.	cs.CV, cs.AI	Tian Xia, Fabio De Sousa Ribeiro, Rajat R Rasal, Avinash Kori, Raghav Mehta	Counterfactual generation aims to simulate realistic hypothetical outcomes under causal interventions. Diffusion models have emerged as a powerful tool for this task, combining DDIM inversion with conditional generation and classifier-free guidance (CFG). In this work, we identify a key limitation of CFG for counterfactual generation: it prescribes a global guidance scale for all attributes, leading to significant spurious changes in inferred counterfactuals. To mitigate this, we propose Factore...
194	NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection 2508.01248 Generalizable AI-image detection: Decouples CLIP features via their null space to improve detection generalization to unseen generators.	cs.CV	Jiazhen Yan, Fan Wang, Weiwei Jiang, Ziqiang Li, Zhangjie Fu	The rapid progress of generative models, such as GANs and diffusion models, has facilitated the creation of highly realistic images, raising growing concerns over their misuse in security-sensitive domains. While existing detectors perform well under known generative settings, they often fail to generalize to unknown generative models, especially when semantic content between real and fake images is closely aligned. In this paper, we revisit the use of CLIP features for AI-generated image detect...
195	Deeply Dual Supervised learning for melanoma recognition 2508.01994 Melanoma recognition framework: Strengthens melanoma recognition with dual local and global supervision.	cs.CV	Rujosh Polma, Krishnan Menon Iyer	As the application of deep learning in dermatology continues to grow, the recognition of melanoma has garnered significant attention, demonstrating potential for improving diagnostic accuracy. Despite advancements in image classification techniques, existing models still face challenges in identifying subtle visual cues that differentiate melanoma from benign lesions. This paper presents a novel Deeply Dual Supervised Learning framework that integrates local and global feature extraction to enha...
196	VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling 2508.02129 Dynamic scene 4D Gaussian splatting: Enhances 4D Gaussian splatting with video diffusion to model dynamic urban scenes.	cs.CV	Yuru Xiao, Zihan Lin, Chao Lu, Deming Zhai, Kui Jiang	Dynamic urban scene modeling is a rapidly evolving area with broad applications. While current approaches leveraging neural radiance fields or Gaussian Splatting have achieved fine-grained reconstruction and high-fidelity novel view synthesis, they still face significant limitations. These often stem from a dependence on pre-calibrated object tracks or difficulties in accurately modeling fast-moving objects from undersampled capture, particularly due to challenges in handling temporal discontinu...
197	Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling 2508.06805 Medical organ boundary edge detection: Precisely localizes organ boundaries via top-down refinement and sub-pixel upsampling.	cs.CV	Aarav Mehta, Priya Deshmukh, Vikram Singh, Siddharth Malhotra, Krishnan Menon Iyer	Accurate localization of organ boundaries is critical in medical imaging for segmentation, registration, surgical planning, and radiotherapy. While deep convolutional networks (ConvNets) have advanced general-purpose edge detection to near-human performance on natural images, their outputs often lack precise localization, a limitation that is particularly harmful in medical applications where millimeter-level accuracy is required. Building on a systematic analysis of ConvNet edge outputs, we pro...
198	DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation 2508.06816 Lesion segmentation with artifact suppression: Proposes a dual-resolution residual network that suppresses artifacts and finely segments skin lesions.	cs.CV	Vikram Singh, Kabir Malhotra, Rohan Desai, Ananya Shankaracharya, Priyadarshini Chatterjee	Lesion segmentation, in contrast to natural scene segmentation, requires handling subtle variations in texture and color, frequent imaging artifacts (such as hairs, rulers, and bubbles), and a critical need for precise boundary localization to aid in accurate diagnosis. The accurate delineation of melanocytic tumors in dermoscopic images is a crucial component of automated skin cancer screening systems and clinical decision support. In this paper, we present a novel dual-resolution architecture ...
199	VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation 2508.06819 Weakly supervised vessel segmentation: Achieves weakly supervised subcutaneous vessel segmentation via learned random-walk propagation.	cs.CV	Ayaan Nooruddin Siddiqui, Mahnoor Zaidi, Ayesha Nazneen Shahbaz, Priyadarshini Chatterjee, Krishnan Menon Iyer	The task of parsing subcutaneous vessels in clinical images is often hindered by the high cost and limited availability of ground truth data, as well as the challenge of low contrast and noisy vessel appearances across different patients and imaging modalities. In this work, we propose a novel weakly supervised training framework specifically designed for subcutaneous vessel segmentation. This method utilizes low-cost, sparse annotations such as centerline traces, dot markers, or short scribbles...
200	Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation 2508.20909 Foundation-model features for med segmentation: Builds Dino U-Net on frozen DINOv3 dense features for medical segmentation.	cs.CV	Haoyue Li, Yifan Gao, Feng Yuan, Xiaosong Wang, Xin Gao	Foundation models pre-trained on large-scale natural image datasets offer a powerful paradigm for medical image segmentation. However, effectively transferring their learned representations for precise clinical applications remains a challenge. In this work, we propose Dino U-Net, a novel encoder-decoder architecture designed to exploit the high-fidelity dense features of the DINOv3 vision foundation model. Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs...
201	CalexNet: Soft Cascade-Aligned Training and Calibration for Lightweight Early-Exit Branches 2509.08318 Early-exit calibration training: Proposes CalexNet to align training and inference distributions and calibrate early-exit branches.	cs.CV	Yehudit Aperstein, Alexander Apartsin	Early-exit cascades over a frozen convolutional backbone enable adaptive inference but suffer from three sources of train-inference mismatch: branches train on samples they will never see at inference, their per-class precision thresholds are calibrated on the wrong distribution, and the standard cross-entropy target on backbone argmax labels discards the backbone's uncertainty signal. We close all three gaps with CalexNet (Cascade-Aligned Early eXits), a training-recipe-only modification: branc...
202	A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset 2509.12047 Animal behavior vision pipeline: Builds a vision pipeline for individual-level pig behavior analysis and evaluates it on the dataset.	cs.CV cs.AI	Haiyu Yang, Enhong Liu, Jennifer Sun, Sumit Sharma, Meike van Leerdam	Animal behavior analysis plays a crucial role in understanding animal welfare, health status, and productivity in agricultural settings. However, traditional manual observation methods are time-consuming, subjective, and limited in scalability. We present a modular pipeline that leverages open-sourced state-of-the-art computer vision techniques to automate animal behavior analysis in a group housing environment. Our approach combines state-of-the-art models for zero-shot object detection, motion...
203	TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses 2509.22813 Test-time adaptation for SSMs: Uses uncertainty-guided traversals to generate multiple views for test-time adaptation of SSMs.	cs.CV	Sahar Dastani, Ali Bahri, Gustavo Adolfo Vargas Hakim, Moslem Yazdanpanah, Mehrdad Noori	State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspec...
204	GRAPE: Let GRPO Supervise Query Rewriting by Ranking for Retrieval 2509.23370 LLM query rewriting for retrieval: Trains an LLM to rewrite queries with GRPO-style ranking supervision to improve retrieval robustness.	cs.CV	Zhaohua Zhang, Jianhuan Zhuo, Muxi Chen, Chenchen Zhao, Wenyu Jiang	The CLIP model has established itself as a cornerstone of large-scale retrieval systems. However, its performance often degrades under distributional shifts such as multilingual, long-form, or multimodal queries. To avoid the prohibitive costs associated with retriever retraining or corpus re-embedding, we propose GRAPE (Grouped Ranking-Aware Policy Optimization Enhancement), a plug-and-play approach that leverages LLM-based query rewriting to bridge these gaps. Unlike existing methods that lack...
205	PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection 2509.26272 VLM deepfake detection optimization: Builds reasoning-annotated data and applies paragraph-level policy optimization to improve deepfake detection.	cs.CV cs.LG	Tuan Nguyen, Naseem Khan, Khang Tran, NhatHai Phan, Issa Khalil	The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake d...
206	Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry 2510.08638 DINO interpretability with SAEs: Uses a sparse-autoencoder dictionary to analyze DINO concepts and study their geometry.	cs.CV cs.AI	Thomas Fel, Binxu Wang, Michael A. Lepori, Matthew Kowal, Andrew Lee	DINOv2 is routinely deployed to recognize objects, scenes, and actions; yet the nature of what it perceives remains unknown. As a working baseline, we adopt the Linear Representation Hypothesis (LRH) and operationalize it using SAEs, producing a 32,000-unit dictionary that serves as the interpretability backbone of our study, which unfolds in three parts. In the first part, we analyze how different downstream tasks recruit concepts from our learned dictionary, revealing functional specialization...
207	DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization 2511.09117 Degraded Kuzushiji document benchmark: Releases a dataset of degraded historical documents with seals for detection and binarization.	cs.CV	Rui-Yang Ju, Kohei Yamashita, Hirotaka Kameko, Shinsuke Mori	Kuzushiji, a pre-modern Japanese cursive script, can currently be read and understood by only a few thousand trained experts in Japan. With the rapid development of deep learning, researchers have begun applying Optical Character Recognition (OCR) techniques to transcribe Kuzushiji into modern Japanese. Although existing OCR methods perform well on clean pre-modern Japanese documents written in Kuzushiji, they often fail to consider various types of noise, such as document degradation and seals,...
208	Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning 2511.12090 Prompt tuning for continual learning: Proposes hierarchical layer-grouped prompt tuning to coordinate cross-layer adaptation and reduce forgetting.	cs.CV	Shengqin Jiang, Tianqi Kong, Yuankai Qi, Haokui Zhang, Lina Yao	Prompt-based continual learning methods fine-tune only a small set of additional learnable parameters while keeping the pre-trained model's parameters frozen. This enables efficient adaptation to new tasks while mitigating the risk of catastrophic forgetting. These methods typically attach one independent task-specific prompt to each layer of pre-trained models to locally modulate its features, ensuring that the layer's representation aligns with the requirements of the new task. However, although...
209	Physics-Based Benchmarking Metrics for Multimodal Synthetic Images 2511.15204 Physics-based multimodal evaluation metric: Proposes a physics-constrained multimodal evaluation metric to measure the realism of synthetic images.	cs.CV cs.AI	Kishor Datta Gupta, Marufa Kamal, Md. Mahfuzur Rahman, Fahad Rahman, Mohd Ariful Haque	Current state-of-the-art measures like BLEU, CIDEr, VQA score, SigLIP-2 and CLIPScore are often unable to capture semantic or structural accuracy, especially for domain-specific or context-dependent scenarios. To address this, this paper proposes a Physics-Constrained Multimodal Data Evaluation (PCMDE) metric combining large language models with reasoning, knowledge-based mapping and vision-language models to overcome these limitations. The architecture is comprised of three main stages: (1) feature ex...
210	GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection 2512.02991 Multimodal 3D object detection fusion: Fuses images and point clouds with dynamic graph attention convolution and a cross-modal Transformer for 3D detection.	cs.CV	Md Sohag Mia, Md Nahid Hasan, Muhammad Abdullah Adnan	Despite significant progress in 3D object detection, point clouds remain challenging due to sparse data, incomplete structures, and limited semantic information. Capturing contextual relationships between distant objects presents additional difficulties. To address these challenges, we propose GraphFusion3D, a unified framework combining multi-modal fusion with advanced feature learning. Our approach introduces the Adaptive Cross-Modal Transformer (ACMT), which adaptively integrates image featur...
211	Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles 2512.03454 Future-aware visual grounding: Proposes a world-model-style reasoning framework that predicts scene evolution before grounding the target of a driving command.	cs.CV cs.AI	Haicheng Liao, Huanming Shen, Bonan Wang, Yongkang Li, Yihong Tang	Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent instructions, as they lack reasoning over 3D spatial relations and anticipated scene evolution. Grounded in the principles of world models, we propose ThinkDeeper, a framework that reasons about future spatial states before making grounding decisions. At its core is a Sp...
212	ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos 2512.03479 Object-centric procedural video QA: Builds ProcObject-10K, a VideoQA benchmark evaluating object state-change reasoning and temporal evidence grounding.	cs.CV	Wenliang Guo, Yu Kong	Procedural activities are fundamentally driven by object state transitions, yet existing instructional video benchmarks remain action-centric and cannot evaluate whether models reason about how objects evolve toward task completion. In this work, we introduce ProcObject-10K, the first benchmark that jointly evaluates object-centric reasoning and temporal evidence grounding in instructional videos, across both egocentric and exocentric views. It comprises 10,522 open-ended VideoQA pairs grounded ...
213	S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss 2601.01285 Medical image segmentation network: Proposes a spectral-spatial mixing segmentation network with a morphology-aware adaptive loss to improve medical segmentation.	cs.CV	Md. Sanaullah Chowdhury Lameya Sabrin	Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. Although convolutional networks provide local precision at $\mathcal{O}(n)$ cost but limited receptive fields, vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computationa...
214	CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval 2601.03728 Composed image retrieval alignment: Uses CoT-enhanced symmetric alignment and a memory bank to mitigate multimodal representation fragmentation and improve CIR.	cs.CV cs.AI	Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen, Huangyu Dai	Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text, offering substantial advantages over single-modality retrieval systems. However, existing CIR methods suffer from representation space fragmentation: queries and targets comprise heterogeneous modalities and are processed by distinct encoders, forcing models to bridge misaligned representation spaces only through post-hoc alignment, which fundamentally limits retrieval per...
215	A Unified and Controllable Framework for Layered Image Generation with Visual Effects 2601.15507 Layered image generation effects: A unified, controllable layered generation framework that preserves subject identity and produces realistic visual effects such as shadows and reflections.	cs.CV	Jinrui Yang, Qing Liu, Yijun Li, Mengwei Ren, Letian Zhang	Recent image generation models produce impressive composites, but often fail to preserve the identity of user-provided content when editing specific elements: the surrounding scene may shift, and even the edited object's appearance can drift from the original. Layered representations offer a natural remedy (they allow users to independently manipulate individual elements), but existing layered methods typically produce transparent foregrounds without realistic visual effects such as shadows and re...
216	Contrast-X: A Multi-Modal Contrast Image Synthesis Benchmark and Universal Modality Flow Matching 2601.15884 Contrast image synthesis benchmark: Releases the Contrast-X paired contrast-enhanced dataset and proposes universal modality flow matching to synthesize contrast-enhanced images.	cs.CV	Yifan Chen, Fei Yin, Hao Chen, Jia Wu, Chao Li	Contrast-enhanced imaging is central to oncologic diagnosis, but contrast agents can be contraindicated for many of the patients who need them most. Synthesizing contrast scans from non-contrast inputs is the natural response. Two obstacles stand in the way: no benchmark provides paired contrast data with lesion-level evaluation, and no single model handles the arbitrary missing patterns seen in practice. We introduce Contrast-X, a benchmark of paired contrast-enhanced and non-contrast imaging s...
217	A Step to Decouple Optimization in 3DGS 2601.16736 3D Gaussian splatting optimization: Analyzes 3DGS optimization details and proposes a decoupled update strategy to improve training stability and quality.	cs.CV	Renjie Ding, Yaonan Wang, Min Liu, Jialin Zhu, Jiazheng Wang	3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time novel view synthesis. Because it is an explicit representation optimized through gradient propagation among primitives, 3DGS adopts optimization practices widely accepted in deep neural networks (DNNs), such as synchronous weight updating and Adam with the adaptive gradient. However, considering the physical significance and specific design in 3DGS, there are two overlooked details in the optimization of 3DGS: (i) update s...
218	Structure Over Scale: Learning Visual Reasoning from Pedagogical Video 2601.23251 Pedagogical video reasoning learning: Exploits the aligned question-answer structure of children's educational videos to learn basic visual reasoning such as spatial relations.	cs.CV	Bishoy Galoaa, Xiangyu Bai, Sarah Ostadabbas	State-of-the-art vision-language models (VLMs) score impressively on video benchmarks yet stumble on basic visual reasoning tasks involving spatial relations, navigation, and object selection that a preschooler solves easily. We hypothesize that the explicit pedagogical structure, specifically the context-question-pause-answer cycles embedded in children's educational video, provides naturally co-aligned reasoning traces: temporally synchronized visual cues, questions, and answers that emerge on...
219	AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation 2602.04672 Hand-object interaction reconstruction: Proposes an agentic generative method to reconstruct hand-object interaction from monocular video, reducing sensitivity to occlusion and reliance on SfM.	cs.CV	Jin-Chuan Shi, Binhong Ye, Tao Liu, Xiaoyang Liu, Yangjinhui Xu	Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limita...
220	SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes 2602.04939 Synthetic video deepfake benchmark: Builds a people-centric synthetic-video deepfake benchmark and systematically evaluates detectors and the effect of compression.	cs.CV	Roberto Leotta, Salvatore Alfio Sambataro, Claudio Vittorio Ragaglia, Mirko Casu, Yuri Petralia	Modern T2V/I2V generators synthesize people increasingly hard to distinguish from authentic footage, while current evaluation suites lag: legacy benchmarks target manipulation-based forgeries, and recent synthetic-video benchmarks prioritize scale over realistic human depiction. We introduce SynthForensics, a people-centric benchmark of 20,445 videos from 8 T2V and 7 I2V open-source generators, paired-source from FF++/DFD reals, two-stage human-validated, in four compression versions with fu...
221	Multimodal Latent Reasoning via Hierarchical Visual Cues Injection 2602.05359 Latent multimodal reasoning: Proposes hierarchical visual cue injection for multimodal reasoning in latent space, reducing verbose CoT and hallucination.	cs.CV	Yiming Zhang, Qiangyu Yan, Borui Jiang, Kai Han	The advancement of multimodal large language models (MLLMs) has enabled impressive perception capabilities. However, their reasoning process often remains a "fast thinking" paradigm, reliant on end-to-end generation or explicit, language-centric chains of thought (CoT), which can be inefficient, verbose, and prone to hallucination. This work posits that robust reasoning should evolve within a latent space, integrating multimodal signals seamlessly. We propose multimodal latent reasoning via HIer...
222	MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices 2602.06523 Tiny wearable activity recognition: Designs the ultra-lightweight MicroBi-ConvLSTM for low-memory, high-accuracy activity recognition on microcontrollers.	cs.CV	Mridankan Mandal	Human Activity Recognition (HAR) on resource-constrained wearables requires models that balance accuracy against strict memory and computational budgets. State-of-the-art lightweight architectures such as TinierHAR (34K parameters) and TinyHAR (55K parameters) achieve strong accuracy, but exceed memory budgets of microcontrollers with limited SRAM once operating system overhead is considered. We present MicroBi-ConvLSTM, an ultra-lightweight convolutional recurrent architecture achieving 11.4K ...
223	Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models 2602.07026 Modality gap subspace alignment: Characterizes the geometric offset between modalities and proposes a subspace-alignment training paradigm to narrow the MLLM modality gap.	cs.CV cs.AI cs.MM	Xiaomin Yu, Yi Xin, Yuhui Zhang, Wenjie Zhang, Chonghan Liu	Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this gap are largely limited by oversimplified isotropic assumptions, hindering their application in large-scale scenarios. In this paper, we address these limitations by precisely characterizing the geome...
224	Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning 2602.09850 Explainable industrial anomaly detection: Proposes a knowledge-guided dynamic latent reasoning framework for explainable industrial defect detection.	cs.CV	Peng Chen, Chao Huang, Yunkang Cao, Chengliang Liu, Wei Wang	Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. Fi...
225	The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs 2602.13298 CNN topology vs trainability: Compares VGG, ResNet, and GoogLeNet under a unified setup and introduces effective depth to explain trainability differences.	cs.CV cs.AI	Manfred M. Fischer, Joshua Pitts	This paper investigates the relationship between convolutional neural network (CNN) topology and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. Utilizing a unified experimental framework, the study isolates the impact of depth from confounding implementation variables. A formal distinction is introduced between nominal depth ($D_{\mathrm{nom}}$), representing the physical layer count, and effective depth ($D_{\mathrm{eff}}$), a...
226	AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers 2602.13357 Diffusion Transformer cache correction: Proposes adaptive offset cache correction to mitigate feature-reuse drift and accelerate DiT sampling.	cs.CV cs.AI	Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu	Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache co...
227	A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations 2602.13837 Low-bitrate video reconstruction diffusion: Proposes a causal video diffusion model that reconstructs highly consistent video from ultra-low-bitrate semantics and compressed frames.	cs.CV	Cem Eteke, Batuhan Tosun, Martin Piccolrovazzi, Alexander Griessel, Wolfgang Kellerer	We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed fra...
228	RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection 2602.19974 RL-based spatially faithful generation: Trains a generative model with reflection-based reinforcement learning to better follow fine-grained spatial relationships in the prompt.	cs.CV	Tianyu Wang, Zhiyuan Ma, Qian Wang, Xinyi Zhang, Xinwei Long	Recent advancements in image generation have achieved impressive results in producing high-quality images. However, existing image generation models still generally struggle with a spatial reasoning dilemma, lacking the ability to accurately capture fine-grained spatial relationships from the prompt and correctly generate scenes with structural integrity. To mitigate this dilemma, we propose RL-RIG, a Reinforcement Learning framework for Reflection-based Image Generation. Our architecture compri...
229	Pretty Good Measurement for Radiomics: A Quantum-Inspired Multi-Class Classifier for Lung Cancer Subtyping and Prostate Cancer Risk Stratification 2603.00223 Quantum-inspired radiomics classifier: Applies the PGM quantum-measurement idea to multi-class classification for lung cancer subtyping and prostate cancer risk stratification.	cs.CV	Giuseppe Sergioli, Carlo Cuccu, Giovanni Pasini, Alessandro Stefano, Giorgio Russo	We investigate a quantum-inspired approach to supervised multi-class classification based on the Pretty Good Measurement (PGM), viewed as an operator-valued decision rule derived from quantum state discrimination. The method associates each class with an encoded mixed state and performs classification through a single POVM construction, thus providing a genuinely multi-class strategy without reduction to pairwise or one-vs-rest schemes. In this perspective, classification is reformulated as the ...
230	InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning 2603.01586 Chain-of-grounding image editing: Proposes an interleaved chain of grounding reasoning for more precise text-guided editing in complex multi-entity scenes.	cs.CV	Yecong Wan, Fan Li, Chunwei Wang, Hao Wu, Mingwen Shao	Emerging unified editing models have demonstrated strong capabilities in general object editing tasks. However, it remains a significant challenge to perform fine-grained editing in complex multi-entity scenes, particularly those where targets are not visually salient and require spatial reasoning. To this end, we propose InterCoG, a novel text-vision Interleaved Chain-of-Grounding reasoning framework for fine-grained image editing in complex real-world scenes. The key insight of InterCoG is to ...
231	SemanticDialect: Semantic-Aware Mixed-Format Quantization for Video Diffusion Transformers 2603.02883 Quantization for video DiTs: Proposes semantic-aware mixed-format quantization that reduces the compute and memory cost of video DiTs while preserving temporal and semantic quality.	cs.CV	Wonsuk Jang, Thierry Tambe	Diffusion Transformers (DiTs) achieve state-of-the-art video generation quality, but their substantial memory and computational footprints hinder edge deployment. Quantization can reduce these costs, yet existing methods often degrade video quality due to high activation variation and the difficulty of preserving semantic and temporal coherence. We propose SemanticDialect, which advances block-wise mixed-format quantization. In this framework, each block selects an optimal format (dialect) from ...
232	Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks 2603.04676 Multi-image VLM attention analysis: Identifies attention pulses and positional bias in multi-image reasoning VLMs and uses PulseFocus training to improve focus.	cs.CV cs.AI	Chenjun Li	Multi-image reasoning remains a significant challenge for vision-language models (VLMs). We investigate a previously overlooked phenomenon: during chain-of-thought (CoT) generation, the text-to-image (T2I) attention of reasoning VLMs exhibits diffuse "pulses": sporadic and unfocused attention patterns that fail to concentrate on task-relevant images. We further reveal a systematic positional bias in attention allocation across images. Motivated by these observations, we propose PulseFocus, a tra...
233	LR-SGS: Robust LiDAR-Reflectance-Guided Salient Gaussian Splatting for Self-Driving Scene Reconstruction 2603.12647 LiDAR-guided Gaussian splatting: Fuses LiDAR reflectance with RGB to guide salient Gaussian modeling, improving 3D reconstruction of self-driving scenes.	cs.CV cs.AI	Ziyu Chen, Fan Zhu, Hui Zhu, Deyi Kong, Xinkai Kuang	Recent 3D Gaussian Splatting (3DGS) methods have demonstrated the feasibility of self-driving scene reconstruction and novel view synthesis. However, most existing methods either rely solely on cameras or use LiDAR only for Gaussian initialization or depth supervision, while the rich scene information contained in point clouds, such as reflectance, and the complementarity between LiDAR and RGB have not been fully exploited, leading to degradation in challenging self-driving scenes, such as those...
234	Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models 2603.14186 Fair benchmarking one-step generators: Proposes a setting-matched, semantics-scaled evaluation protocol for fair comparison of one-step generators against multi-step diffusion and flow models.	cs.CV	Advaith Ravishankar, Serena Liu, Mingyang Wang, Todd Zhou, Jeffrey Zhou	State-of-the-art text-to-image models produce high-quality images, but inference remains expensive as generation requires several sequential ODE or denoising steps. Native one-step models aim to reduce this cost by mapping noise to an image in a single step, yet fair comparisons to multi-step systems are difficult because studies use mismatched sampling steps and different classifier-free guidance (CFG) settings, where CFG can shift FID, Inception Score, and CLIP-based alignment in opposing dire...
235	Clinically Aware Synthetic Image Generation for Concept Coverage in Chest X-ray Models 2603.15525 Clinically constrained CXR synthesis: Proposes a clinically and anatomically constrained chest X-ray synthesis framework that broadens coverage of concept combinations to improve diagnostic model reliability.	cs.CV	Amy Rafferty, Rishi Ramaesh, Ajitha Rajan	Deep learning models for chest X-ray diagnosis are constrained by limited coverage of clinically meaningful concept combinations in publicly available training datasets. While synthetic image generation has been explored to increase data diversity, existing methods rarely enforce clinical or anatomical constraints, limiting utility for improving model reliability. We propose CARPA, a clinically aware and anatomically grounded framework for synthetic chest X-ray generation that applies targeted p...
236	Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation 2603.16876 Multi-agent RL radiology reporting: Uses multi-modal multi-agent reinforcement learning to optimize region-wise chest X-ray interpretation and report aggregation end to end.	cs.CV cs.LG cs.AI	Kaito Baba, Risa Kishikawa, Satoshi Kodera	We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointl...
237	Attention Sparsity is Input-Stable: Training-Free Sparse Attention for Video Generation via Offline Sparsity Profiling and Online QK Co-Clustering 2603.18636 Training-free sparse attention video: Profiles sparsity patterns offline and co-clusters queries and keys online, giving training-free sparse attention that accelerates video generation.	cs.CV	Jiayi Luo, Jiayu Chen, Jiankun Wang, Cong Wang, Hanxin Zhu	Diffusion Transformers (DiTs) achieve strong video generation quality but suffer from high inference cost due to dense 3D attention, motivating sparse attention techniques for improving efficiency. However, existing training-free sparse attention methods for video generation still face two unresolved limitations: ignoring layer heterogeneity in attention pruning and ignoring query-key coupling in block partitioning, which hinder a better quality-speedup trade-off. In this work, we uncover a crit...
238	Motion-o: Trajectory-Grounded Video Reasoning 2603.18856 Trajectory-grounded video reasoning: Proposes trajectory evidence-chain representation and supervision so that video reasoning explicitly explains how objects move.	cs.CV cs.AI	Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai, Sarah Ostadabbas	Recent video reasoning models increasingly produce spatio-temporal evidence chains that localize objects at specific timestamps. While these traces improve interpretability by grounding "where" and "when" evidence appears, they often leave the motion connecting observations, the "how", implicit. This makes dynamic and trajectory-dependent claims difficult to supervise, verify, or penalize when unsupported by the video. We formalize this missing component as Spatial-Temporal-Traj...
239	SteelDefectX: A Multi-Form Vision-Language Dataset and Benchmark for Steel Surface Defect Analysis 2603.21824 Steel defect vision-language benchmark: Releases SteelDefectX, a dataset with multi-form textual annotations, to evaluate vision-language understanding of steel surface defects.	cs.CV cs.AI	Shuxian Zhao, Jie Gui, Baosheng Yu, Dacheng Tao	Steel surface defect analysis is critical for industrial quality control, yet existing benchmarks rely primarily on label-only annotations, limiting fine-grained semantic understanding and systematic evaluation of vision-language models. To address this gap, we introduce SteelDefectX, a vision-language dataset with multi-form textual annotations for steel surface defect analysis, comprising 7,778 images across 25 defect categories. At the class level, the dataset provides defect names, represent...
240	Automatic Image-Level Morphological Trait Annotation for Organismal Images 2604.01619 Morphological trait annotation: Uses sparse autoencoders to mine interpretable features for automatic annotation of morphological traits in organismal images.	cs.CV cs.AI	Vardaan Pahuja, Samuel Stevens, Alyson East, Sydne Record, Yu Su	Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. In this work, we demonstrate that sparse autoencoders trained on foundation-model features yield monosemantic, spatia...
241	DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection 2604.02753 Open-Vocabulary Object Detection: Proposes a decoupled DETR for efficient open-vocabulary object detection.	cs.CV	Siheng Wang, Yanshu Li, Bohan Hu, Zhengdao Li, Haibo Zhan	Open-vocabulary Object Detection (OVOD) enables models to recognize objects beyond predefined categories, but existing approaches remain limited in practical deployment. On the one hand, multimodal designs often incur substantial computational overhead due to their reliance on text encoders at inference time. On the other hand, tightly coupled training objectives introduce a trade-off between closed-set detection accuracy and open-world generalization. Thus, we propose Decoupled Cognition DETR (...
242	Zero-Shot Quantization via Weight-Space Arithmetic 2604.03420 Zero-Shot Post-Training Quantization: Extracts a quantization vector via weight-space arithmetic to improve PTQ accuracy zero-shot.	cs.CV cs.LG cs.AI	Daniele Solombrino, Antonio Andrea Gargiulo, Alessandro Zirilli, Luca Zhou, Adrian Robert Minut	We show that robustness to post-training quantization (PTQ) is a transferable direction in weight space. We call this direction the quantization vector: extracted from a donor task by simple weight-space arithmetic, it can be used to patch a receiver model and improve post-PTQ Top-1 accuracy by up to 60 points in a 3-bit setting, without receiver-side quantization-aware training (QAT). Because the method requires no receiver training data, it provides a zero-shot, low-cost alternative to QAT for...
243	Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks 2604.19697 Multimodal STEM Reasoning Benchmark: Builds StepSTEM to evaluate the step-by-step interleaved reasoning chains of multimodal models.	cs.CV	Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang	Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to modality redundancy and focus mainly on final-answer accuracy, overlooking the reasoning process itself. To address this challenge, we introduce StepSTEM: a graduate-level benchma...
244	Bridging Restoration and Generation Manifolds in One-Step Diffusion for Real-World Super-Resolution 2604.24136 One-Step Diffusion Super-Resolution: Proposes IDaS-SR, a one-step diffusion framework that improves the quality and efficiency of real-world super-resolution.	cs.CV	Shyang-En Weng, Yi-Cheng Liao, Yu-Syuan Xu, Wei-Chen Chiu, Ching-Chun Huang	Pretrained diffusion models have revolutionized real-world image super-resolution (Real-ISR) but suffer from computational bottlenecks due to iterative sampling. Recent single-step distillation accelerates inference but faces a stark perception-distortion trade-off due to rigid timestep initialization, distributional trajectory mismatches, and fragile stochastic modulation. To address this, we present Adaptive Inversion and Degradation-aware Sampling for Real-ISR (IDaS-SR), a one-step framework ...
245	Benchmarking and Improving GUI Agents in High-Dynamic Environments 2604.25380 GUI Agent Benchmarking: Benchmarks and improves agent decision-making and observation in high-dynamic GUI environments.	cs.CV	Enqi Liu, Liyuan Pan, Zhi Gao, Yan Yang, Chenrui Shi	Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents typically rely on a single screenshot after each action for decision-making, leading to a partially observable (or even unobservable) Markov decision process, where the key GUI state including important information for a...
246	Instruction-Evidence Contrastive Dual-Stream Decoding for Grounded Vision-Language Reasoning 2604.25809 Grounded Vision-Language Decoding: Proposes contrastive dual-stream decoding to balance instruction following with grounding in visual evidence.	cs.CV	Yashwant Pravinrao Bangde, Debaditya Roy	Vision-Language Models (VLMs) exhibit strong performance in instruction following and open-ended vision-language reasoning, yet they frequently generate fluent outputs that are weakly grounded in visual evidence. Prior works have shown that instruction prompting further worsens this issue by amplifying language priors, especially when the visual signal is uncertain or ambiguous. To address this challenge, we propose a decoding framework that explicitly balances linguistic informativeness and vis...
247	Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs 2605.00814 Visual Memory for LVLMs: Proposes the PVM module to mitigate visual signal dilution and sustain access to image evidence.	cs.CV cs.AI	Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He	While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to strengthen sustained, on-demand access to visual evidence. Integrated a...
248	SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources 2605.01459 Lightweight Image Super-Resolution: Uses nonlinear operators to strengthen the expressivity and detail reconstruction of low-resource super-resolution models.	cs.CV cs.AI	Roberto Isai Navaro-Aviña, Eduardo Said Merin-Martinez, Andres Mendez-Vazquez, Eduardo Rodriguez-Tello	Single-Image Super-Resolution (SISR) aims to reconstruct a High-Resolution (HR) image from a Low-Resolution (LR) observation, a fundamentally ill-posed problem where high-frequency details are severely degraded at large upscaling factors. Recent advances have been driven by transformer-based architectures and diffusion models, which improve global context modeling and perceptual quality at the cost of increased computational complexity. In contrast, this work focuses on enhancing the expressivity of lo...
249	Super-Resolution of Airborne Laser Scanning Point Clouds for Forest Inventory 2605.02201 Point Cloud Super-Resolution: Proposes 3DFSR to increase the density of forest ALS point clouds while reducing noise.	cs.CV	Jinyuan Shao, Sangyoong Park, Chunxi Zhao, Ayman Habib, Songlin Fei	Airborne Laser Scanning (ALS) can collect point clouds across large areas, enabling large-scale forest inventory. However, ALS point clouds are sparse and noisy, resulting in inaccurate individual-tree-level forest inventory, such as stem localization and tree size estimation. To overcome this problem, we propose a deep learning model, 3D Forest Super Resolution (3DFSR), to simultaneously improve point density and reduce noise for ALS forest point cloud. 3DFSR is a voxel-based CNN with a U-Net a...
250	Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score 2605.02206 Multimodal Machine Unlearning Metrics: Systematically analyzes conflicts among multimodal unlearning evaluation metrics and provides a unified score.	cs.CV cs.LG	Abdullah Ahmad Khan, Hamid Laga, Ferdous Sohel	Machine unlearning in Vision-Language Models (VLMs) is required for compliance with the General Data Protection Regulation (GDPR), yet current evaluation practices are inconsistent. We present the first systematic study of metric reliability in multimodal unlearning. Five standard metrics, Forget Accuracy (FA), Retain Accuracy (RA), Membership Inference Attack (MIA), Activation Distance (AD), and JS divergence (JS), yield conflicting method rankings across three VQA benchmarks (MLLMU-Bench, UnLO...
251	Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis 2605.02357 Point Cloud Feature Aggregation: Proposes channel-level relations with attentive aggregation plus a neighborhood-homogeneity constraint to improve point cloud representations.	cs.CV	Jiaqi Shi, Jin Xiao, Xiaoguang Hu, Wenxuan Ji, Zichong Jia	In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. Existing methods explore feature correlation discrimination but are limited to point-level spatial distribution or channel responses, enabling only coarse-grained level evaluation. For modern multi-scale point cloud networks, such coarse-grained metrics ...
252	Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures 2605.04035 3D Gaussian Head Reconstruction: Proposes HeadsUp for fast reconstruction of high-quality 3D Gaussian heads from multiple views.	cs.CV cs.LG	Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev	We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, en...
253	Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response 2605.05405 Zero-Shot Satellite Image Retrieval: Proposes GeoQuery, which uses joint embeddings for zero-shot natural-language satellite image retrieval.	cs.CV	James Walsh, William Fawcett, Grace Colvard, Raúl Ramos-Pollán	Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constrain...
254	EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation 2605.05674 OOD-Robust Vector Search Adapters: Proposes the EGA residual adapter to reduce vector search degradation on out-of-distribution queries.	cs.CV cs.LG cs.AI	Dongfang Zhao	Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local...
255	VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding 2605.05848 Efficient Long-Video Understanding: Proposes VideoRouter, which compresses long-video visual tokens through query-adaptive routing.	cs.CV cs.AI	Kuanwei Lin, Wenhao Zhang, Ge Li	Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long visual-token sequences, which sharply increase memory and latency during inference. While existing compression methods are effective in specific settings, most are either weakly query-aware or apply a fixed compression policy across frames, proving suboptimal when visual evidence is unevenly distributed over time. To address this, we present VideoRouter, a query-adaptive dual-router fra...
256	VISD: Enhancing Video Reasoning via Structured Self-Distillation 2605.06094 Video Reasoning Self-Distillation: Proposes VISD, structured self-distillation that provides fine-grained supervision for video reasoning.	cs.CV cs.AI	Hao Lin, Kunyang Lv, Xu Jiang, Jingqi Tian, Zhongjing Du	Training VideoLLMs for complex reasoning remains challenging due to sparse sequence-level rewards and the lack of fine-grained credit assignment over long, temporally grounded reasoning trajectories. While reinforcement learning with verifiable rewards (RLVR) provides reliable supervision, it fails to capture token-level contributions, leading to inefficient learning. Conversely, existing self-distillation methods offer dense supervision but lack structure and diagnostic specificity, and often i...
257	Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation 2605.06173 Retinal Diagnosis and Report Generation: Proposes Retina-RAG for joint retinal grading, detection, and clinical report generation.	cs.CV cs.AI	Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier	Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture decouples a high-performance retinal classifier and a parameter-efficient vision-language model (Qwen2.5-VL-...
258	Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency 2605.06280 Diffusion-Based Image Animation: Uses adjacent-frame Eulerian motion guidance for more robust controllable image animation.	cs.CV	Thong Nguyen, Khoi M. Le, Cong-Duy Nguyen, Luu Anh Tuan, See-Kiong Ng	Recent advancements in image animation have utilized diffusion models to breathe life into static images. However, existing controllable frameworks typically rely on Lagrangian motion guidance, where optical flow is estimated relative to the initial frame. This paper revisits the same optical-flow primitive through a more local supervision design: we use adjacent-frame Eulerian motion fields to guide generation, where the motion signal always describes a short temporal hop. This shift enables pa...
259	Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement 2605.06298 Weight-Space World Models: Proposes NOVA, which represents state as implicit network weights and renders predicted video.	cs.CV cs.AI	Roussel Desmond Nzoyem, Mauro Comi	Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural re...
260	NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps 2605.06317 Top-Down Vision-Language Navigation: Recasts VLN as one-step global path planning on top-down maps to reduce accumulated error.	cs.CV cs.AI	Dijia Zhan, Jinyi Li, Chenxi Zheng, Shaoyu Huang, Yong Li	Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path plann...
261	Multispectral Indices for Wildfire Management 2309.01751 Multispectral Wildfire Monitoring: Evaluates how well multispectral indices extract information for wildfire monitoring and management.	cs.CV	Afonso Oliveira, João P. Matos-Carvalho, Filipe Moutinho, Nuno Fachada	The increasing frequency and severity of wildfires necessitate advanced methods for effective surveillance and management, as traditional ground-based techniques often struggle to adapt to rapidly changing fire behavior and environmental conditions. This study investigates the use of multispectral aerial and satellite imagery for wildfire management through an assessment of current literature and two practical case studies. We evaluate several multispectral indices for their ability to extract ...
262	Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise 2408.09929 Contrastive Learning Theory: Characterizes, from an information-theoretic view, contrastive-learning augmentation as equivalent to estimating positive-incentive noise.	cs.CV cs.LG	Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li	Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $\pi$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the information theory framework, we properly define the task entropy, the core concept of $\pi$-noise, ...
263	RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization 2505.13289 Symmetry Discovery and Canonicalization: Proposes RECON, which explicitly normalizes canonical orientation to robustly discover instance-specific symmetries.	cs.CV cs.LG	Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta	Real-world data often exhibits unknown, instance-specific symmetries that rarely exactly match a transformation group $G$ fixed a priori. Class-pose decompositions aim to create disentangled representations by factoring inputs into invariant features and a pose $g\in G$ defined relative to a training-dependent, arbitrary canonical representation. We introduce RECON, a class-pose agnostic canonical orientation normalization that corrects arbitrary canonicals via a simple right translation, yieldi...
264	Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics 2509.08461 VLMs for Neutrino Classification: Fine-tunes a vision-language model to classify neutrino events in high-energy physics detector images.	cs.CV cs.LG cs.AI	Dikshant Sagar, Kaiwen Yu, Alejandro Yankelevich, Jianming Bian, Pierre Baldi	Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the applications of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMA 3.2, to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neur...
265	Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models 2509.25584 Layer Skipping in VLMs: Gives theoretical conditions under which layers of vision-language models can be skipped to cut inference cost.	cs.CV cs.CL cs.LG cs.AI	Max Hartman, Vidhata Jayaraman, Moulik Choraria, Akhil Bhimaraju, Lav R. Varshney	Vision-language models achieve incredible performance across a wide range of tasks, but their large size makes inference costly. Recent work has shown that multimodal processing contains significant redundancies, making it possible to skip certain layers with minimal performance loss. Yet current pruning techniques remain ad-hoc, relying on heuristics or hyperparameter sweeps rather than principled criteria for determining when layer skipping is beneficial. In this paper, we propose a unified fr...
266	Frequency-Aware Model Parameter Explorer: A new attribution method for improving explainability 2510.03245 Frequency-Domain Model Attribution: Proposes a frequency-aware parameter attribution method that uses spectral-domain perturbations to improve explainability.	cs.CV cs.LG cs.AI	Ali Yavari, Alireza Mohamadi, Elham Beydaghi, Philipp Seeböck, Rainer A. Leitgeb	State-of-the-art attribution methods rely on adversarial sample generation that applies an all-pass filter across the frequency spectrum, discarding fine-grained high-frequency information that is demonstrably important for accurate feature attribution in deep neural networks. By generating adversarial samples that selectively perturb high- and low-frequency components, we can probe which spectral features a model relies on most -- directly translating frequency-domain exploration into attributi...
267	Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis 2511.09907 Reasoning-Driven Data Synthesis: Proposes solver-adaptive, reasoning-driven problem generation to synthesize high-value training data.	cs.CV cs.AI	Yongxian Wei, Yilin Zhao, Zixuan Hu, Li Shen, Xinrui Chen	Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a prob...
268	Saving Foundation Flow-Matching Priors for Inverse Problems 2511.16520 Flow-Matching Priors for Inverse Problems: Proposes FMPlug, which strengthens FM priors for inverse problems with warm-starting and regularization.	cs.CV cs.LG	Yuxiang Wan, Ryan Devera, Wenjie Zhang, Ju Sun	Foundation flow-matching (FM) models promise a universal prior for solving inverse problems (IPs), yet today they trail behind domain-specific or even untrained priors. How can we unlock their potential? We introduce FMPlug, a plug-in framework that redefines how foundation FMs are used in IPs. FMPlug combines an instance-guided, time-dependent warm-start strategy with a sharp Gaussianity regularization, adding problem-specific guidance while preserving the Gaussian structures. This leads to a s...
269	Large Video Planner Enables Generalizable Robot Control 2512.15840 Video-Based Robot Planning: Proposes a large video planner to improve the multi-task generalization of robot control.	cs.CV	Boyuan Chen, Tianyuan Zhang, Haoran Geng, Caiyi Zhang, Peihao Li	General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-...
270	DisCo-FLoc: Semantic-Free Floorplan Localization via $SE(2)$-Aware Contrastive Disambiguation 2601.01822 Floorplan Localization Contrastive Learning: Proposes DisCo-FLoc, which uses SE(2) contrastive disambiguation for semantic-free floorplan localization.	cs.CV	Ping Zhong, Shiyong Meng, Bolei Chen, Tao Zou, Chaoxu Mu	Visual Floorplan Localization (FLoc) struggles with severe structural aliasing caused by repetitive minimalist layouts. This occurs because physically distant poses share highly similar visual-geometric features, which degrades spatial separability and angular discriminability. While existing methods attempt to mitigate these ambiguities by relying on costly semantic annotations, the resulting performance gains remain inherently limited. To address the above issues, we propose DisCo-FLoc, a sema...
271	DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training 2601.15127 Federated NAS for IoT: Proposes a hardware-aware FedNAS framework that adapts network architectures to heterogeneous IoT devices.	cs.CV cs.LG	Bostan Khan, Masoud Daneshtalab	Deploying federated learning across heterogeneous IoT device fleets requires tailored neural network architectures for each device class, yet existing Federated Neural Architecture Search (FedNAS) methods suffer from unguided supernet training and prohibitively costly post-training search pipelines that demand over 20 GPU-hours per deployment target. We introduce DeepFedNAS, a two-phase framework built on a multi-objective fitness function that synthesizes information-theoretic network metrics w...
272	Lossy Common Information in a Learnable Gray-Wyner Network 2601.21424 Learnable Gray-Wyner Codec: Designs a learnable three-channel Gray-Wyner codec that separates shared from task-specific information across tasks.	cs.CV cs.LG	Anderson de Andrade, Alon Harell, Ivan V. Bajić	Many computer vision tasks share substantial overlapping information, yet conventional codecs tend to ignore this, leading to redundant and inefficient representations. The Gray-Wyner network, a classical concept from information theory, offers a principled framework for separating common and task-specific information. Inspired by this idea, we develop a learnable three-channel codec that disentangles shared information from task-specific details across multiple vision tasks. We characterize the...
273	Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts 2602.03473 Continual Learning MoE Routing: Proposes a bi-level routing MoE continual learning method that scales to 300+ tasks while balancing stability and plasticity.	cs.CV cs.LG	Meng Lou, Yunxiang Fu, Yizhou Yu	Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in recent years. However, how to effectively learn both discriminative and comprehensive feature representations while maintaining stability and plasticity over very long task sequences remains an open problem. We propose CaRE, a scalable Continual Learner with efficient Bi-Level Routing Mixture-of-Experts (BR-MoE). The core idea of BR-MoE ...
274	Direction-Flipped Influence Audits Reveal Hidden Structure in Moral Choices of LLMs 2602.22831 LLM Moral Choice Auditing: Uses direction-flipped prompt audits to reveal how sensitive LLM moral choices are to small cues.	cs.CV cs.CL cs.LG cs.AI	Phil Blandfort, Tushar Karayil, Alex McKenzie, Urja Pawar, Robert Graham	Moral benchmarks for LLMs typically score models on context-free prompts, implicitly treating the measured choice rate as stable. We test this assumption with a direction-flipped influence audit: for each scenario, we compare a baseline prompt with matched cues steering toward option A or option B. Across a trolley-problem-style moral triage task, BBQ, and DailyDilemmas, and across five LLM families with and without reasoning, short contextual cues shift per-condition choice rates by 12-18 perce...
275	3D tomography of exchange phase in a Si/SiGe quantum dot device 2603.16025 Quantum Dot Exchange Tomography: Performs 3D tomographic reconstruction of the exchange phase in a Si/SiGe quantum dot device to estimate J(V).	cs.CV	Dylan Albrecht, Sarah Thompson, N. Tobias Jacobson, Ryan Jock	The exchange interaction is a foundational building block for the operation of spin-based quantum processors. Extracting the exchange interaction coefficient $J(\mathbf{V})$, as a function of gate electrode voltages, is important for understanding disorder, faithfully simulating device performance, and operating spin qubits with high fidelity. Typical coherent measurements of exchange in spin qubit devices yield a modulated cosine of an accumulated phase, which in turn is the time integral of ex...
276	Drifting Fields are not Conservative 2604.06333 Nonconservative Drift Fields: Shows that drift fields of drifting generative models are generally non-conservative and not equivalent to optimizing a scalar loss.	cs.CV cs.LG	Leonard T. Franz, Sebastian Hoffmann, Tim Weiland, Bernhard Schölkopf, Georg Martius	Drifting models have recently gained attention for generating high-quality samples in a single forward pass. During training, they learn a push-forward map by following a vector-valued field, the drift field. We ask whether this procedure is equivalent to optimizing a scalar loss and find that, in general, it is not: drift fields are not conservative and cannot be written as the gradient of any scalar potential. We identify the position-dependent normalization as the source of non-conservatism, ...
277	3D Generation for Embodied AI and Robotic Simulation: A Survey 2604.26509 3D Generation Survey Robotics: Surveys 3D generation techniques for embodied AI and robotic simulation, along with their interaction and physics requirements.	cs.CV	Tianwei Ye, Yifan Mao, Minwen Liao, Jian Liu, Chunchao Guo	Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This surve...
278	Affordance Agent Harness: Verification-Gated Skill Orchestration 2605.00663 Affordance Skill Orchestration: Proposes a verification-gated skill orchestration framework that schedules by per-instance difficulty and corrects errors in affordance reasoning.	cs.CV	Haojian Huang, Jiahao Shi, Yinchuan Li, Yingcong Chen	Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous. Recent systems therefore combine multiple skills (e.g., detection, segmentation, interaction-imagination), yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty, offer limited targeted recovery from intermediate errors, and fail to reuse experience from recurring o...
279	3DSS: 3D Surface Splatting for Inverse Rendering 2605.05876 Differentiable Surface Splatting: Proposes a differentiable surface splatting renderer for multi-view physically-based inverse rendering and layered compositing.	cs.CV	Mae Younes, Adnane Boukhayma	We present 3D Surface Splatting (3DSS), the first differentiable surface splatting renderer for physically-based inverse rendering from multi-view images. Our central insight is that the surface separation problem at the heart of surface splatting admits a direct formulation in terms of the reconstruction kernels themselves. From this foundation we derive a coverage-based compositing model whose per-layer opacity arises directly from the accumulated Elliptical Weighted Average reconstruction wei...
280	SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders 2605.06610 Adaptive Sparse Autoencoders: Proposes a dynamic Top-K sparse autoencoder that adapts the number of active features to each input.	cs.CV cs.LG	Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek	Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features ...
cs.LG 449 papers
492	A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence 2605.06678 WGAN Climate Scenario Generation: Uses a Wasserstein GAN to generate soil-subsidence-related climate scenarios to support insurance risk management.	cs.LG	Antoine Heranval (BioSP), Olivier Lopez (CREST), Didier Ngatcha (CREST), Daniel Nkameni (CREST)	According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70--80 billion USD between 1970 and 2000 to 180--200 billion USD between 2001 and 2020. Reports from organizations such as the IFOA and the WWF highlight the need for the insurance sector to adapt to this rapidly evolving context by developing medium- to long-term strategies that go beyond the one-year horizon of prudential regulations such as Solvency II. Thi...
493	Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding 2605.06679 VLM Hallucination-Reducing Decoding: Proposes training-free PND decoding, which contrasts positive and negative paths to strengthen visual consistency and reduce hallucination.	cs.LG	Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng	Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a...
494	From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes 2605.06684 Tree-Crash Risk Modeling: Uses a hybrid predictive framework to quantify key risk factors for the severity of tree-involved traffic crashes.	cs.LG	Abdul Azim, Ahmed Hossain, Soumyadip Maitra, Panick Kalambay	Tree-involved crashes represent a critical subset of run-off-road (ROR) collisions, often resulting in fatal or severe injuries due to high-energy impacts. This study develops a comprehensive analytical framework to identify and quantify risk factors contributing to crash severity in tree-involved collisions using the Crash Report Sampling System (CRSS) database spanning 2020-2023. The modeling framework follows a multi-step process. First, a machine learning based classification model (CatBoost...
495	Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices 2605.06686 Off-Policy Evaluation Robustness for Refugee Matching: Compares several off-policy evaluation methods to test the robustness of refugee-matching gains in the United States.	cs.LG	Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler, Jeremy Ferwerda, Jens Hainmueller	Previous research has investigated the potential of refugee matching for boosting refugee outcomes, first considered by Bansak et al. (2018). This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. In order to estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and m...
496	Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion 2605.06720 Discrete Diffusion Antibody Sequence Generation: Uses classifier-guided discrete diffusion to generate antibody sequences, absorbing germline bias to model somatic variation.	cs.LG cs.AI	Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez	Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible c...
497	Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning 2605.06724 Unsupervised EEG Denoiser Training: Uses intelligent partitioning to train deep wearable-EEG denoisers without clean labels.	cs.LG cs.AI	Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic	Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, w...
498	Transformer-Based Wildlife Species Classification from Daily Movement Trajectories 2605.06726 Trajectory Transformer Species Identification: Uses Transformers to classify wildlife species from daily GPS movement trajectories and generalize across regions.	cs.LG	Obed Irakoze, Prasenjit Mitra	Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy with gains of ...
499	Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training 2605.06727 Cold-Atom Reservoir Computing for Medical Imaging: Combines an auto-encoder with neutral-atom reservoir computing for polyp detection, trained with surrogate gradients.	cs.LG	Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo	We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurement...
500	The E$\Delta$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality 2605.06729 Geometric Transformer with Orthogonal Residuals: Builds a Transformer whose adaptive residual connections stay orthogonal via a data-dependent Cayley transform.	cs.LG cs.AI	Arash Shahmansoori	We present the E$\Delta$-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform to obtain input-adaptive, unconditionally orthogonal residual connections. Unlike DDL, whose Householder operator is orthogonal only at $\beta \in \{0,2\}$, our Data-Dependent Cayley rotation $Q(x)=(I+(\beta/2)A(x))^{-1}(I-(\beta/2)A(x))$ preserves orthogonality for all $\beta$ and all inputs. To handle negation, an eige...
501	Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics 2605.06730 Semantic State Abstraction Interfaces for News: Decomposes news text into auditable multi-axis states for LLM-augmented investment decisions and diagnostics.	cs.LG	Likhita Yerra (AIVANCITY School of AI and Data), Remi Uttejitha Allam (AIVANCITY School of AI and Data)	We introduce Semantic State Abstraction Interfaces (SSAI): a methodological template for mapping sparse unstructured text into $K$ auditable, named coordinates with neutral defaults on no-news days, designed to separate representation hypotheses from optimisation variance in sequential decision systems. Our contribution is the framework and its evaluation protocol, not a claim that SSAI outperforms denser alternatives. We instantiate SSAI with $K=4$ axes (sentiment, risk, confidence, volatility ...
502	On Training in Imagination 2605.06732 Analysis of model-based RL with imagined rollouts: Analyzes how errors affect returns and optimization when training in imagination with learned dynamics and reward models.	cs.LG	Nadav Timor, Ravid Shwartz-Ziv, Micah Goldblum, Yann LeCun, David Harel	State-of-the-art model-based reinforcement learning methods train policies on imagined rollouts. These rollouts are trajectories generated by a learned dynamics model and are scored by a learned reward model, but without querying the true environment during policy updates. We study this training paradigm by quantifying how errors in learned dynamics and reward models affect returns and policy optimization. First, we extend the analysis of Asadi et al. (2018) to MDPs with learned reward models, a...
503	Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA 2605.06733 Gauge-invariant aggregation for federated LoRA: Proposes a gauge-aware low-rank server representation to address the aggregation bias that factorization equivalence causes in federated LoRA.	cs.LG cs.AI	Jinqian Chen, Chang Liu, Jihua Zhu	Federated LoRA enables parameter-efficient adaptation of large language models under decentralized data and limited client resources. However, directly averaging LoRA factors is representation-dependent: the same intrinsic update admits infinitely many gauge-equivalent factorizations, so factor-level aggregation can change under arbitrary coordinate choices while the underlying update remains unchanged. This reveals a semantic mismatch in existing federated LoRA aggregation rules. We propose \tex...
504	Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning 2605.06734 Quantum-inspired fast-weight sequence learning: Proposes a scalable gated QKAN-FWP for quantum-inspired sequence modeling and fast-weight updates.	cs.LG cs.AI	Kuo-Chung Peng, Samuel Yen-Chi Chen, Jiun-Cheng Jiang, Chen-Yu Liu, En-Jui Kuo	Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov...
505	STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification 2605.06736 Cross-domain adaptation for sleep staging: Proposes STDA-Net, which applies unsupervised domain adaptation to spectrogram inputs for cross-dataset sleep staging.	cs.LG cs.AI	Unaza Tallal, Shruti Kshirsagar, Ankita Shukla	Accurate sleep stage classification across datasets remains challenging due to variability in EEG channel montages, sampling rates, recording environments, and subject populations. Although deep learning has shown considerable promise for automated sleep staging, most existing cross-dataset methods rely on one-dimensional EEG signal representations, whereas the use of two-dimensional spectrogram-based inputs within an unsupervised domain adaptation framework has remained largely unexplored. Here...
506	Geometric Kolmogorov--Arnold Network (GeoKAN) 2605.06740 Geometry-aware KAN models: Proposes GeoKAN, which learns a Riemannian metric that warps the coordinates before basis expansion to strengthen the geometric inductive bias.	cs.LG cs.AI	Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar	We introduce Geometric Kolmogorov--Arnold Networks (GeoKANs), a family of geometry-aware KAN-type models in which approximation is carried out in learned, geometry-adapted coordinates rather than in fixed Euclidean input coordinates. GeoKAN achieves this by learning a diagonal Riemannian metric that warps the input before basis expansion and feature mixing. The learned metric provides a geometric inductive bias through local length scaling and volume distortion, and in physics-informed settings ...
507	A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics 2605.06741 Learning-rate upper bound in belief space: Derives a closed-form upper bound on the learning-rate step for contractive updates under the KL geometry of the probability simplex.	cs.LG	Zixi Li, Youzhen Li	Learning-rate steps are usually treated as hyperparameters. This paper isolates a local belief-space calculation: when an update is modeled as a projected forward step on the probability simplex, admissibility means contractivity in the natural KL/Bregman geometry. Under this model, the upper bound of an admissible step is not a tuning slogan but a formula.
508	Gradient Extrapolation-Based Policy Optimization 2605.06755 Gradient-extrapolation policy optimization: Proposes GXPO, which approximates multi-step lookahead via gradient extrapolation to improve GRPO-style reasoning RL updates at low cost.	cs.LG cs.AI	Ismam Nur Swapnil, Aranya Saha, Tanvir Ahmed Khan, Mohammad Ariful Haque, Ser-Nam Lim	Reinforcement learning is widely used to improve the reasoning ability of large language models, especially when answers can be automatically checked. Standard GRPO-style training updates the model using only the current step, while full multi-step lookahead can give a better update direction but is too expensive because it needs many backward passes. We propose Gradient Extrapolation-Based Policy Optimization (GXPO), a plug-compatible policy-update rule for GRPO-style reasoning RL. GXPO approxi...
509	Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning 2605.06756 Active learning for thermal-energy digital twins: Uses active learning to couple Modelica simulations with several classes of surrogate models and build uncertainty-aware digital twins.	cs.LG	Umme Mahbuba Nabila, Paul Seurin, Linyu Lin, Majdi I. Radaideh	Real-time supervisory control of thermal energy distribution systems requires digital twins that are accurate, interpretable, and uncertainty-aware, yet remain data and computationally efficient. High-fidelity simulations alone are costly, while purely data-driven surrogates often lack robustness. To address these challenges, this work proposes an active learning (AL) framework that couples system-level Modelica simulations with four simpler physics-informed and data-driven surrogate modeling ap...
510	Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache 2605.06763 Sparse-attention index for the KV cache: Models sparse attention as range searching to build an inference-efficient index that avoids missing critical KV entries.	cs.LG	Mohsen Dehghankar, Abolfazl Asudeh	Sparse attention improves LLM inference efficiency by selecting a subset of key-value entries, but at the cost of potential accuracy degradation. In particular, omitting critical KV entries can induce substantial errors in model outputs. Existing methods typically operate under fixed or adaptive token budgets and provide empirical robustness or partial theoretical guarantees, yet they do not ensure zero false negatives in decoding steps, particularly since the set of relevant tokens is both quer...
511	Revisiting Adam for Streaming Reinforcement Learning 2605.06764 Adam optimization for streaming RL: Re-examines the mechanisms by which Adam stabilizes updates in streaming reinforcement learning.	cs.LG cs.AI	Florin Gogianu, Adrian Catalin Lutu, Razvan Pascanu	Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through el...
512	Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport 2605.06785 PRM calibration with optimal transport: Uses conditional optimal transport to calibrate the success-probability predictions of process reward models.	cs.LG cs.AI	Rachel Ma, Dylan Hadfield-Menell, Kristjan Greenewald	Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs, modifying conditional OT (CondOT) map learning \cite{bunne2022supervised} to estimate a monotonic conditional quantile function over success probabilities estimated by the PRM, conditioned on PRM hidden states. This yields structurally valid quantile estimat...
513	Conformal Agent Error Attribution 2605.06788 Multi-agent error attribution: Uses conformal prediction to provide error localization with guarantees for multi-agent interaction traces.	cs.LG	Naihe Feng, Yi Sui, Shiyi Hou, Ga Wu, Jesse C. Cresswell	When multi-agent systems (MAS) fail, identifying where the decisive error occurred is the first step for automated recovery to an earlier state. Error attribution remains a fundamental challenge due to the long interaction traces that large language model-based MAS generate. This paper presents a framework for error attribution based on conformal prediction (CP) which provides finite-sample, distribution-free coverage guarantees. We introduce new algorithms for filtration-based CP designed for s...
514	MIND: Monge Inception Distance for Generative Models Evaluation 2605.06797 Evaluation metric for generative models: Proposes MIND, which uses the sliced Wasserstein distance to improve on the evaluation reliability of FID.	cs.LG	Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff	We propose the Monge Inception Distance (MIND), a metric for evaluating generative models that addresses key limitations of the widely adopted Fréchet Inception Distance (FID). The MIND metric leverages the sliced Wasserstein distance to compare distributions by averaging one-dimensional optimal transport distances, efficiently computed via sorting. This approach circumvents the estimation of high-dimensional means and covariance matrices, which underlie FID's poor sample complexity and vulner...
515	From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning 2605.06814 Distilling GNNs into data: Proposes M2D, which shifts GNN complexity into the graph data to improve interpretability.	cs.LG	Debolina Halder Lina, Arlei Silva	Graph Neural Networks (GNNs) achieve high performance but can be opaque to humans, making it difficult to understand and compare the many proposed architectures. While existing explainability methods attribute individual predictions to nodes, edges, or features, they do not provide architectural transparency or explain the fundamental performance gap between simple and more complex models. To address this limitation, we introduce Model-to-Data (M2D) distillation, a new framework that increases t...
516	A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning 2605.06819 Online CoT learning theory: Establishes an online-learning and mistake-bound theory for autoregressive chain-of-thought maps.	cs.LG	Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel	Autoregressive generation lies at the heart of the mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (prompt), the generator is applied for $M$ steps, and the last generated token is taken as the final output. [Joshi et al., 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process. We develop an online analogue of this framework, focusing on the mistake bound...
517	A Rod Flow Model for Adam at the Edge of Stability 2605.06821 Model of Adam at the edge of stability: Characterizes Adam's behavior at the edge of stability with the continuous-time rod flow dynamics.	cs.LG cs.AI	Eric Regis, Sinho Chewi	Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object -- a "rod." Here we extend rod flow to Adam ...
518	SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents 2605.06822 Self-improving trading agents: Proposes an auditable, self-evolving rubric for aligning financial trading agents.	cs.LG	Xiwen Chen, Wenhui Zhu, Songzhu Zheng, Kashif Rasul, Yueyue Deng	Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from sto...
519	Why DDIM Hallucinates More than DDPM: A Theoretical Analysis of Reverse Dynamics 2605.06831 Theory of hallucination in DDIM and DDPM: Analyzes, via the reverse ODE/SDE, why DDIM is more prone to getting stuck between modes and hallucinating.	cs.LG cs.AI	Muhammad H. Ashiq, Samanyu Arora, Abhinav N. Harish, Ishaan Kharbanda, Hung Yun Tseng	We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM stochasticity helps it become unstuck from this region, thus avoiding hal...
520	Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks 2605.06834 Plasticity restoration in continual learning: Measures neuron utility with attribution to restore network plasticity and slow rigidification.	cs.LG	Patrick Elisii, Lucas Beauchemin, Dawer Jamshed	Continual learning research attempts to conserve two fundamental capabilities: new knowledge acquisition and the preservation of previously acquired knowledge. While knowledge in this case can be measured through performance over an implicit or explicit task space, model plasticity generally concerns adaptability as data distributions evolve. Though much of the literature has focused on catastrophic forgetting, deep networks can also suffer from loss of plasticity, becoming progressively harder ...
521	On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics 2605.06835 Privacy leakage in tabular diffusion: Systematically evaluates the factors behind privacy leakage in tabular diffusion models and the metrics used to measure attacks.	cs.LG cs.AI	Masoumeh Shafieinejad, D. B. Emerson, Behnoosh Zamanlooy, Elaheh Bassak, Fatemeh Tavakoli	Tabular data plays an important role in many fields and industries, including those with elevated privacy considerations and risks. As such, there is a rising interest in generating high-quality synthetic proxies for real tabular data as a means of reducing privacy risk and proprietary data exposure. With tabular diffusion models (TDMs) demonstrating leading performance in synthesizing such data, understanding and measuring the privacy risks associated with these models is imperative. Leveraging...
522	How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment 2605.06850 KV-cache compression for RL alignment: Uses shadow mask distillation to compress the KV cache in RL post-training and save GPU memory.	cs.LG cs.AI	Rui Zhu, Weiheng Bai, Qiushi Wu, Yang Ren, Haixu Tang	Reinforcement Learning (RL) has emerged as a crucial paradigm for unlocking the advanced reasoning capabilities of Large Language Models (LLMs), encompassing frameworks like RLHF and RLAIF. Regardless of the specific optimization algorithm (e.g., PPO, GRPO, or Online DPO), online RL inherently requires an exploratory trajectory generation (rollout) phase. However, for long-context reasoning tasks, this rollout phase imposes a severe "memory wall" due to the exorbitant Key-Value (KV) cache foot...
523	Christoffel-DPS: Optimal sensor placement in diffusion posterior sampling for arbitrary distributions 2605.06861 Sensor placement for diffusion posterior sampling: Uses Christoffel-DPS to optimize sensor locations for diffusion-based reconstruction under arbitrary distributions.	cs.LG	James Rowbottom, Nick Huang, Carola-Bibiane Schönlieb, Ben Adcock	State estimation is a critical task in scientific, engineering and control applications. Since the reliability of reconstructions depends on the number and position of sensors, optimal sensor placement (OSP) is essential in scenarios where measurements are sparse and expensive. Classical OSP approaches rely on Gaussian assumptions and are consequently unable to account for the complex distributions encountered in many real-world systems. Generative-model-based reconstruction using sensor guided ...
524	Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization 2605.06864 Multi-objective multi-agent bandits: Proposes multi-agent multi-objective UCB algorithms that balance Pareto efficiency and fairness.	cs.LG	John Wang, Mengfan Xu	We study multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents observe heterogeneous reward vectors and communicate over time-varying graphs. We formulate this emerging problem setting to address \emph{efficient learning}, measured by Pareto regret, and incorporate \emph{fair learning} as an additional goal, captured via social welfare. To measure efficiency, we formulate Pareto regret and develop \textsc{Pareto UCB1 Gossip}, whose novel exploration r...
525	Dataset Watermarking for Closed LLMs with Provable Detection 2605.06865 Dataset watermarking for closed LLMs: Designs a provably detectable dataset watermark for verifying whether a closed model was trained on the data.	cs.LG	Pengrun Huang, Kamalika Chaudhuri, Yu-Xiang Wang	Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same benchmarks used for evaluation. This motivates the need for dataset watermarking: designing datasets such that training on them leaves detectable signatures in the resulting model. Prior work has explored this problem for open models. We introduce the first dataset watermarking method for closed ...
526	A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning 2605.06866 Theory of asynchronous distributional TD: Gives a finite-iteration error-convergence theory for asynchronous categorical distributional TD learning.	cs.LG	Ege C. Kaya, Abolfazl Hashemi	Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. Thi...
527	When Descent Is Too Stable: Event-Triggered Hamiltonian Learning to Optimize 2605.06868 Event-triggered Hamiltonian optimization: Proposes SHAPE, which uses event-triggered control to let the optimizer escape overly stable local minima.	cs.LG	Yi Wang, Chandrajit Bajaj	Fixed-budget nonconvex optimization can fail not because local descent is unstable, but because it is too stable: after reaching a nearby stationary point, an optimizer may spend the remaining evaluations refining an uninformative local minimum. We formulate this failure mode as a control problem over optimizer dynamics, where the learner must decide when to descend, when to exploit a promising basin, and when stagnation should trigger movement elsewhere. We introduce SHAPE, a structured adaptiv...
528	Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse 2605.06870 Dimensional collapse in VQ-VAEs: Analyzes and mitigates the hard loss lower bound caused by dimensional collapse of VQ-VAE representations.	cs.LG	Xinyu Zhao, Nikita Karagodin, Hamed Hassani, Sinan Hersek, Paul Pu Liang	While many approaches to improve VQ-VAE performance focus on codebook size and utilization, the effect of dimensional collapse, where trained VQ-VAE representations live in an extremely low-dimensional subspace (1-2% of full rank), remains unaddressed. We show theoretically and empirically that dimensional collapse causes a hard loss lower bound that various codebook improvement techniques fail to surpass. Our analytic framework extends the sequential learning effect of Saxe et al. [2014] by intro...
529	On the Divergence of Differential Temporal Difference Learning without Local Clocks 2605.06874 Divergence of differential TD learning: Proves that differential TD learning without local clocks can diverge in certain settings.	cs.LG	David Antrobius, Shangtong Zhang	Learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_t, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also conv...
530	Temporal Attention for Adaptive Control of Euler-Lagrange Systems with Unobservable Memory 2605.06877 Attention-based adaptive control: Uses temporal attention to generate controller gains for systems whose friction has unobservable memory.	cs.LG	Giansalvo Cirrincione, Adriano Fagiolini	Adaptive control of Euler-Lagrange systems is challenging when friction is governed by a finite-horizon internal state that is not directly observable from joint measurements. In this setting, the measured closed-loop state is no longer Markovian, and standard certainty-equivalence adaptive laws may lose their convergence guarantees. The paper proposes a meta-control architecture in which the gains of a computed-torque controller are generated by a self-attention block processing a short window ...
531	Better Protein Function Prediction by Modeling Survivorship Bias 2605.06879 PU-learning bias in protein function prediction: Models survivorship bias to improve protein function prediction from positive examples only.	cs.LG	Zhongmou Chao, Poompol Buathong, Ekaterina Selivanovitch, Susan Daniel, Peter I. Frazier	Protein sequence data from nature exhibits survivorship bias: we only observe data from those organisms that survive and reproduce, while non-functional protein mutations are eliminated by natural selection. Thus, predicting whether a protein sequence is functional often requires learning from positive examples alone. While positive-unlabeled (PU) learning frameworks offer a generic solution to this problem, existing PU methods ignore the evolutionary processes that shape sequence observability ...
532	Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment 2605.06885 Aligning AR LMs to diffusion LMs: Adapts autoregressive language models into diffusion language models through representation alignment, with less training.	cs.LG cs.AI	Fred Zhangzhi Peng, Alexis Fox, Anru R. Zhang, Alexander Tong	Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representat...
533	Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics 2605.06902 Streaming adversarial robustness evaluation: Proposes mechanism-aligned attacks, progressive training, and diagnostic methods for Fuzzy ARTMAP.	cs.LG	Shane Cairns, Leonardo Enzo Brito da Silva, Sasha Petrenko, Donald C. Wunsch II, Jian Liu	Adversarial robustness has been studied extensively for offline deep networks, but less is known about strict single-pass streaming neural learners. This paper studies adversarial robustness in Fuzzy ARTMAP, an Adaptive Resonance Theory architecture based on category competition, complement coding, match tracking, and replay-free prototype updates. We introduce WB-Softmax, a differentiable white-box attack surrogate aligned with ARTMAP's category-competition and map-field prediction mechanism, a...
534	Conservative Flows: A New Paradigm of Generative Models 2605.06905 Conservative-flow generative models: Proposes a generative paradigm and sampling mechanisms based on discrete dynamics that leave the data distribution invariant.	cs.LG	Eshed Gal, Md Shahriar Rahim Siddiqui, Moshe Eliasof, Eldad Haber	Modern generative modeling is dominated by transport from a noise prior to data. We propose an alternative paradigm in which generation is performed by a discrete stochastic dynamics that leaves the data distribution invariant, initialized from data-supported states rather than from noise. The framework can utilize any pretrained flow model. We develop two probability-preserving sampling mechanisms, a corrected Langevin dynamics with a Metropolis adjustment and a predictor-corrector flow, that o...
535	TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond 2605.06906 Pre-training framework for human mobility: Proposes TraXion, with structured pre-training objectives and modeling for mobility trajectories.	cs.LG	Shang-Ling Hsu, Mark Tenzer, Cyrus Shahabi, Khurram Shafique	Human mobility differs from text and from generic time series in three structural ways: visits are tuple-valued events whose meaning depends on the joint distribution over location, time, and activity; users carry persistent signatures across trajectories; and visits are not independent across users, since co-location at shared places is a primary signal. Existing pre-training recipes for mobility import objectives from language modeling, treating trajectories as sentences and visits as tokens, ...
536	Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents 2605.06908 LLM adaptive compute gating: proposes direction-aware learning to stabilize the relationship between gating signals and the benefit of extra computation.	cs.LG cs.AI	Ziming Li, Jiatan Huang, Xiaoguang Guo, Guilin Wang, Chuxu Zhang	Adaptive test-time compute for LLM agents aims to invoke extra computation only when it improves performance. Existing methods typically use confidence-, uncertainty-, or difficulty-based gates, assuming a fixed direction from the gating signal through compute need to the value of computation. This makes gating a utility-calibration problem: gating signals should align with whether extra computation improves the final outcome over the base policy. We show that this alignment is unstable: the sam...
537	Dual-Scale Temporal Fusion Reveals Structured Predictability in Subseasonal-to-Seasonal Temperature Prediction 2605.06911 S2S temperature predictability: uses dual-scale temporal fusion to reveal structured predictability in subseasonal-to-seasonal temperature prediction.	cs.LG	Elnaz Bashir, Jiali Wang, Lin Yan	Subseasonal-to-seasonal (S2S) temperature forecasts, spanning several weeks to a few months, are critically needed in agriculture practice, energy planning, and extreme-weather induced risk management, yet their reliability varies substantially across seasons and regions. Forecast skill is often attributed primarily to lead time, but this perspective does not fully explain the spatiotemporal patterns of predictability. Here we show that S2S predictability is organized across interacting temporal...
538	LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs 2605.06915 LLM probabilistic belief consistency: quantifies the inconsistency and non-Bayesian behavior of LLMs' probabilistic beliefs under evidence updates.	cs.LG	Chacha Chen, Matthew Jörke, Adam Goliński, Masha Fedzechkina, Guillermo Sapiro	Modern AI systems are being deployed in complex domains such as medicine, science, and law, where it is important that they not only produce correct answers, but also represent and update uncertain beliefs about the world as new evidence arrives. We introduce the novel technique of studying LLMs as information processing rules and utilize the information processing gap to study the internal (in)consistencies of how LLMs update their probabilistic beliefs from evidence. Our extensive experiments ...
539	Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting 2605.06916 Efficient probabilistic weather forecasting: proposes Tyche, a one-step flow model that generates calibrated probabilistic weather forecasts at low cost.	cs.LG	Fan Xu, Yuan Gao, Kun Wang, Rui Su, Fenghua Ling	Probabilistic weather forecasting requires not only accurate trajectories, but calibrated distributions over plausible atmospheric futures. Recent data-driven systems have achieved remarkable deterministic skill, and diffusion-based ensemble forecasters have substantially improved sample realism and uncertainty quantification. However, their inference cost scales with forecast horizon, ensemble size, and the number of denoising steps required for each transition, making large operational ensembl...
540	Target-Aware Data Augmentation for SAT Prediction 2605.06931 SAT prediction data augmentation: proposes target-aware data augmentation to reduce SAT labeling cost and improve prediction performance.	cs.LG	Eshed Gal, Uri Ascher, Eldad Haber	Learning-based approaches to NP-hard problems have shown increasing promise, but their progress is fundamentally constrained by the high cost of generating labeled training data. In domains such as Boolean satisfiability (SAT), standard pipelines rely on solver-in-the-loop labeling, which scales poorly with problem size and limits the amount of usable supervision. This bottleneck hinders the broader goal of leveraging machine learning to capture structure in hard combinatorial problems. In this ...
541	MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security 2605.06933 Post-quantum AI governance: proposes a post-quantum multi-agent governance architecture with provable security.	cs.LG	Sepideh Avizeh, Tushin Mallick, Alina Oprea, Cristina Nita-Rotaru, Reihaneh Safavi-Naini	Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted ...
542	Learned Lyapunov Shielding for Adaptive Control 2605.06934 Lyapunov-safe adaptive control: uses a learned Lyapunov function and a safety filter to augment adaptive control.	cs.LG	Giansalvo Cirrincione, Adriano Fagiolini	We augment the Slotine–Li adaptive controller for Euler–Lagrange systems with three learned components: a structured-quadratic Lyapunov function $V_\psi$ whose positive-definiteness follows from a Cholesky parameterization, a residual Soft Actor–Critic policy that adds bounded torque corrections to the analytic baseline, and a physics-informed neural network that estimates unmodeled dynamics. A closed-form safety filter, derived from the single affine constraint \(\dot V_\psi + \alpha V_\ps...
543	A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis 2605.06937 LLM prompt workflow calibration: presents a reproducible procedure for calibrating LLM prompt workflows in evidence synthesis.	cs.LG	Teo Susnjak	This methods article presents a reproducible calibration workflow for prompt-based large language models (LLMs) in structured evidence-synthesis tasks. The method separates the rules that define the scientific task from the mutable prompt harness that frames and applies them. It optimises that harness against labelled or reference examples and an explicit task metric, then preserves the calibrated workflow as an inspectable artefact with its specification, metric, settings, and evaluation traces...
544	A Generalized Singular Value Theory for Neural Networks 2605.06938 Generalized SVD for networks: proves that many neural network architectures can be equivalently represented in a generalized SVD structure.	cs.LG cs.AI	Brian Charles Brown, Robert Bridges, David Grimsman, Mauricio Munoz, Sean Warnick	Building on the abstract Generalized Singular Value Decomposition (GSVD) theory of Brown et al. [2025], we prove that most modern neural architectures admit a generalized SVD representation in which they are left-invertible before a final linear layer, with no change in input-output behavior. Furthermore, the left-invertible nonlinear portion of the input-output behavior can be made to be norm preserving, meaning that perturbations in the left-invertible "embedding" (the activations pri...
545	Bias and Uncertainty in LLM-as-a-Judge Estimation 2605.06939 LLM-as-judge bias estimation: analyzes the bias and uncertainty of LLM judges and assesses calibration risks.	cs.LG	James Fiedler	LLM-as-a-Judge evaluation has become a standard tool for assessing base model performance. However, characterizing performance via the naive estimator, i.e., raw judge outputs, is systematically biased. Recent work has proposed estimators to correct this bias, but their reliability depends critically on judge quality and, for model comparisons, on calibration stability. Sharing calibration across compared models is practically attractive but can introduce severe bias, including cases where the c...
546	Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings 2605.06941 Causal bilevel discrete choice: uses a causal-aware foundation model to solve bilevel pricing optimization under discrete choice.	cs.LG	Shivaram Subramanian, Zhengliang Xue, Markus Ettl, Yingdong Lu, Jayant Kalagnanam	We introduce a causal aware foundation-model framework for real time optimal decision making in discrete choice environments. We propose a constrained triple-head price optimization (C3PO) network to solve a bilevel decision problem in which a service provider selects an optimal assortment while heterogeneous users make personalized acceptance or rejection choices optimizing their own personalized preferences. C3PO integrates imitation learning of prices, multi-task learning of revenue responses...
547	ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data 2605.06943 Unsupervised prototype time-series: learns interpretable prototypes from unlabeled time series to support case-based explanations.	cs.LG	Steven Song, Sahil Sethi, Brett Beaulieu-Jones, Robert L. Grossman	In time-series domains where both predictive performance and interpretability are essential, deep neural networks achieve strong results but provide limited insight into how their predictions are made. Projection-based prototype networks address this limitation by grounding predictions in similarity to representative training examples, enabling case-based explanations and global prototype inspection. However, existing approaches rely on label supervision, tying prototypes to a specific task and ...
548	Adaptive Memory Decay for Log-Linear Attention 2605.06946 Log-linear attention memory decay: proposes adaptive memory decay to improve recall in log-linear attention.	cs.LG cs.AI	Yaxita Amin, Helen Zichen Li, Mengfan Zhang, Samet Ayhan	Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models run in linear time by compressing context into a fixed-size hidden state, inherently limiting recall. Log-linear attention navigates this tradeoff by organizing memory across a Fenwick tree hierarchy, growing its hidden state logarithmically with sequence length at log-linear compute cos...
549	Rollback-Free Stable Brick Structures Generation 2605.06947 Stable brick structure generation: uses reinforcement learning to enforce physical stability at training time when generating brick structures.	cs.LG	Chenhui Xu, Ziyue Bai, Fuxun Yu, Heng Huang, Jinjun Xiong	While autoregressive models have advanced 3D generation, creating physically stable brick structures remains a challenge due to the strict requirements of gravity and interconnectivity. Existing approaches rely on external physical simulators during inference to perform rejection sampling and brick-by-brick rollbacks, which severely bottlenecks efficiency. To address this, we propose a reinforcement learning paradigm that shifts physical validity enforcement from test-time correction to training...
550	Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection 2605.06955 Score matching anomaly detection: uses kurtosis-guided noise-scale selection to improve tabular anomaly detection.	cs.LG cs.AI	Victor Livernoche, Jie Zan, Reihaneh Rabbany	Denoising score matching (DSM) provides a way to learn data distributions by training a neural network to recover the score function, defined as the gradient of the log density, from noise-corrupted samples. Once trained, the score magnitude at a test point reflects how consistent that point is with the learned distribution, making it a natural anomaly signal. The key practical challenge is selecting the perturbation scale: too little noise yields unstable score estimates in sparse regions, whil...
551	$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses 2605.06977 f-divergence regularized RLHF: gives a unified analysis of the sampling and theoretical properties of RLHF under different f-divergence regularizers.	cs.LG cs.AI	Di Wu, Chengshuai Shi, Jing Yang, Cong Shen	Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone technique for post-training large language models. While most existing approaches rely on the reverse KL-regularization, recent empirical studies have begun exploring alternative divergences (e.g., forward KL, chi-squared) as regularizers in RLHF. However, a unified theoretical understanding of general $f$-divergence regularization remains under-explored. To fill this gap, this work develops a comprehensive theoretical fr...
552	PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction 2605.06979 Neural causal abstraction localization: uses optimal transport to progressively localize the key intervention sites for neural causal abstraction.	cs.LG cs.AI	Jonathn Chang, Arya Datla, Ziv Goldfeld	Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localiz...
553	FastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddings 2605.06982 Tsetlin embedding acceleration: parallel clause learning accelerates scalable, hardware-friendly Tsetlin embeddings.	cs.LG	Ahmed K. Kadhim, Lei Jiao, Rishad Shafik, Ole-Christoffer Granmo, Mayur Kishor Shende	Embedding models in natural language processing (NLP) increasingly rely on deep architectures such as BERT, while simpler models such as Word2Vec provide efficient representations but limited interpretability. The Tsetlin Machine (TM) offers an alternative logic-based learning paradigm. Omni TM Autoencoder (Omni TM-AE) applies this paradigm to static embedding by exploiting automaton state distributions within a single clause layer, but its training process remains slow. In this work, we propose...
554	Response Time Enhances Alignment with Heterogeneous Preferences 2605.06987 Preference alignment with response time: uses labeler response times to model heterogeneous preferences and improve alignment.	cs.LG	Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan	Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this crit...
555	Why Does Agentic Safety Fail to Generalize Across Tasks? 2605.06992 Agentic safety generalization: explains why agent safety fails to generalize across tasks in multi-task settings.	cs.LG	Yonatan Slutzky, Yotam Alexander, Tomer Slor, Yoav Nagel, Nadav Cohen	AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and ex...
556	Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators 2605.06997 KV-cache-free long-context recall: uses spectral Koopman operators to achieve long-range associative recall without a KV cache.	cs.LG	Anupama Sridhar, Alexander Johansen	Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architectur...
557	Inductive Power Grid Cascading Failure Analysis with GRU-Gated Graph Attention 2605.07010 Power grid cascade failure GNN: uses GRU-gated graph attention for cascading failure analysis that transfers across power grids.	cs.LG	Tianxin Zhou, Xiang Li, Haibing Lu	Identifying vulnerable transmission lines in power grids before a cascading failure occurs is challenging: existing methods can learn inter-line failure correlations from cascade data, but they are trained and evaluated on a single grid, and transferring the learned knowledge to an unseen grid remains an open problem. We address this by training a single Gated Recurrent Unit (GRU)-gated Graph Attention Network on combined cascading failure data from limited training grids and applying it directl...
558	Dual-Agent Co-Training for Health Coaching via Implicit Adversarial Preference Optimization 2605.07011 AI health coaching co-training: co-trains two agents and uses preference optimization to improve health coaching dialogue.	cs.LG	Da Long, Lingyi Fu, Diya Michelle Rao, Jasmine Ruales Carrera, Yang Bai	Motivational-interviewing-based health coaching is an effective approach for improving mental health and promoting healthy behavior change. However, the scarcity of trained human coaches and the high cost of coaching services make such support inaccessible to many people who could benefit from it. This motivates the development of AI health coaches that can provide scalable and affordable support. Existing methods typically optimize only one side of the interaction: they either train a dialogue ...
559	FlashMol: High-Quality Molecule Generation in as Few as Four Steps 2605.07020 Few-step molecular diffusion: proposes a model that rapidly generates high-quality 3D molecular conformations in as few as four steps.	cs.LG cs.AI	Xinyuan Wei, Zian Li, Shaoheng Yan, Cai Zhou, Muhan Zhang	Generating chemically valid 3D molecular conformations is critical for computational drug discovery. Classical diffusion-based models like GeoLDM perform well but require hundreds of steps, making large-scale in silico screening impractical. Recent efforts on few-step molecular generation have accelerated this process to 12-50 steps, but they often largely sacrifice sample stability. In this work, we present FlashMol, an ultra-fast molecule generative model producing high-quality molecular confo...
560	Self Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale 2605.07022 LLM-driven biomedical dataset construction: uses LLMs to automatically extract fine-grained biomedical datasets from a massive corpus of papers.	cs.LG	Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang	Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based en...
561	Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks 2605.07024 Code hallucination FIM benchmark: releases Delulu, a verified multi-lingual benchmark for FIM code hallucination detection.	cs.LG	Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu	Large Language Models for code generation frequently produce hallucinations in Fill-in-the-Middle (FIM) tasks -- plausible but incorrect completions such as invented API methods, invalid parameters, undefined variables, or non-existent imports. These failures pass superficial review yet introduce runtime errors. We introduce Delulu, a verified multi-lingual benchmark of 1,951 FIM samples across 7 languages and 4 hallucination types. Samples are curated through an adversarial pipeline: a frontier...
562	A Systematic Investigation of The RL-Jailbreaker in LLMs 2605.07032 RL-based LLM jailbreaking analysis: systematically studies the mechanisms of RL jailbreak attacks and analyzes why they succeed.	cs.LG cs.AI	Montaser Mohammedalamen, Kevin Roice, Reginald McLean, Alyssa Lefaivre Škopac	The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of models to elicit harmful output, remains a primary threat to safe deployment. While Reinforcement Learning (RL) frames jailbreaking as a multi-step attack through sequential optimization, a mechanistic understanding of why the framework succeeds remains incomplete. To fill this gap, we present the fi...
563	Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation 2605.07038 Hamiltonian risk-aware navigation: learns material-aware Hamiltonian risk fields for safer navigation.	cs.LG	Aditya Sai Ellendula, Yi Wang, Chandrajit Bajaj	Risk-aware navigation should be selective: a policy should expose evasive degrees of freedom only when the local scene admits a lower-risk feasible maneuver, and suppress them when no safer alternative exists. We show that adding one context-energy term to a port-Hamiltonian navigation policy produces a learned force channel with exactly this falsifiable signature. When the local risk field contains a feasible lower-risk direction, the induced context force activates toward it; when the apparent...
564	PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents 2605.07039 Test-time learning for evo agents: uses reinforcement learning to adapt evolutionary search agent policies at test time.	cs.LG	Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie	Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisio...
565	Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph Diffusion 2605.07048 Mass spectra to molecule diffusion: dual-stream line graph diffusion jointly reasons over atoms and bonds to generate molecules from mass spectra.	cs.LG cs.AI	Xujun Che, Xiuxia Du, Depeng Xu	De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitl...
566	Towards Differentially Private Reinforcement Learning with General Function Approximation 2605.07049 Differentially private online RL: gives differential privacy guarantees for online reinforcement learning under general function approximation.	cs.LG cs.AI	Yi He, Xingyu Zhou	We present the first theoretical guarantees for differentially private online reinforcement learning (RL) with general function approximation, extending beyond prior work restricted to tabular and linear settings. Our approach combines a batched policy update scheme with the exponential mechanism, together with a novel regret analysis. We show that, even under general function approximation, the regret in the model-free setting under differential privacy matches the state of the art for the line...
567	Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure 2605.07057 Causal DAG state construction in RL: constructs minimal RL states that satisfy the Markov property from a longitudinal causal graph.	cs.LG	Jiamin Xu, Jacqueline Maasch, Kyra Gan	Online reinforcement learning (RL) relies on the Markov property for guaranteed performance, but real-world applications often lack well-defined states given raw observed variables. While causal RL has attracted growing interest, existing work typically assumes Markovian states are provided and focuses on using causality to accelerate learning, leaving a fundamental gap: \emph{given a longitudinal causal graph over observed variables, how does one construct MDP states that provably satisfy the M...
568	Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training 2605.07063 Data-regularized LLM post-training: treats general data as a regularizer to improve LLM post-training and data selection.	cs.LG cs.AI	Pingbang Hu, Xueshen Liu, Z. Morley Mao, Jiaqi W. Ma	Data selection methods address a critical challenge in LLM post-training: effectively leveraging scarce, high-fidelity target data alongside abundant but imperfectly aligned general training data. In this work, we move beyond the data-selection framing and introduce Dr. Post-Training (Data-Regularized Post-Training), a novel framework that reconceptualizes general training data as a data-induced regularizer that prevents overfitting to the scarce target objective, rather than serving as a pool f...
569	PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation 2605.07067 Matrix optimization PolarAdamW: Proposes PolarAdamW to separate spectral control from Schur gauge-equivariance.	cs.LG	Haozhou Zhang	Muon's matrix-level update couples two distinct effects: spectral control via a polar map, and equivariance under orthogonal changes of multiplicity-space basis (Schur gauge-equivariance). We separate them with PolarAdamW, a controlled hybrid that preserves Muon's polar spectral-norm control but breaks the gauge-equivariance, since AdamW's coordinatewise preconditioner is basis-dependent. Algorithmically, PolarAdamW applies Muon's Newton-Schulz polar map to AdamW's preconditioned direction rathe...
570	Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD? 2605.07072 Optimal subsampling for DP-SGD: Proves that participation variance makes privacy amplification suboptimal and seeks better sampling schemes.	cs.LG	Andy Dong, Ayfer Özgür	Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that B...
571	ModelLens: Finding the Best for Your Task from Myriads of Models 2605.07075 Pretrained Model Selection: Proposes ModelLens to quickly select the best model for a new dataset from a massive pool of models.	cs.LG	Rui Cai, Weijie Jacky Mo, Xiaofei Wen, Qiyao Ma, Wenhui Zhu	The open-source model ecosystem now contains hundreds of thousands of pretrained models, yet picking the best model for a new dataset is increasingly infeasible: new models and unbenchmarked datasets emerge continuously, leaving practitioners with no prior records on either side. Existing approaches handle only fragments of this in-the-wild setting: AutoML and transferability estimation select models from small predefined pools or require expensive per-model forward passes on the target dataset,...
572	Test-Time Compositional Generalization in Diffusion Models via Concept Discovery 2605.07078 Diffusion Compositional Generalization: Lets pretrained diffusion models discover concepts from their scores and compose them at test time.	cs.LG	Zekun Wang, Anant Gupta, Tianyi Zhu, Christopher J. MacLellan	Compositional generalization requires models to produce novel configurations from familiar parts. In diffusion models, prior compositional generation methods typically assume that the relevant concepts or conditioning signals are already available. We instead ask whether a pretrained diffusion model can discover query-specific concepts from the time-indexed scores it learns for the noisy marginals $p_t(x_t)$ and compose them at test time. Given a single out-of-distribution query, our method perf...
573	Actor-Critic with Active Importance Sampling 2605.07094 Variance-Reduced Actor-Critic: Proposes AISAC, which actively optimizes the behavior policy to reduce policy-gradient variance via importance sampling.	cs.LG	Majid Molaei, Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Marcello Restelli	This paper introduces the Active-Importance-Sampling Actor-Critic (AISAC) algorithm, an extension of the Actor-Critic framework for reducing variance in policy gradient estimation. AISAC optimizes the behavior policy to minimize gradient variance while preserving unbiased gradient estimates. Using importance sampling principles, the algorithm adapts the behavior policy toward efficient data collection distributions aligned with target policy gradients. For continuous action spaces, AISAC employs...
574	Query-efficient model evaluation using cached responses 2605.07096 Cached-Response Model Evaluation: Uses cached model responses to reduce the number of queries needed to benchmark a new model.	cs.LG cs.AI	Hayden Helm, Ben Johnson, Carey Priebe	Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously-evaluated models are often cached -- creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for p...
575	CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation 2605.07098 Crash Simulation Dataset and Solver: Releases the CarCrashNet dataset and uses a hierarchical neural network for structural crash simulation.	cs.LG	Mohamed Elrefaie, Dule Shu, Matthew Klenk, Faez Ahmed	Crash simulation is a cornerstone of modern vehicle development because it reduces the need for costly physical prototypes, accelerates safety-driven design iteration, and increasingly supports virtual testing workflows. At the same time, modeling structural crash mechanics remains exceptionally challenging: the response is governed by nonlinear contact, large deformation, material plasticity, failure, and complex multi-body interactions evolving over space and time on high-resolution finite-ele...
576	Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift 2605.07104 Stochastic Approximation Convergence Theory: Uses a Poisson-Moreau drift to derive almost sure convergence rates for SA and RL under Markovian noise.	cs.LG	Xinyu Liu, Zixuan Xie, Shangtong Zhang	Establishing almost sure convergence rates for stochastic approximation and reinforcement learning under Markovian noise is a fundamental theoretical challenge. We make progress towards this challenge for a class of stochastic approximation algorithms whose expected updates are contractive, a setting that arises in many reinforcement learning algorithms such as $Q$-learning and linear temporal difference learning. Specifically, for a power-law learning rate $O(n^{-\eta})$ with $\eta \in (1/2, 1)...
577	Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks 2605.07113 GNN-Accelerated Exact Max-Cut: Approximates SDP bounds with a feasibility-preserving GNN to accelerate branch-and-bound for exact Max-Cut.	cs.LG	Hao Chen, Chendi Qian, Christopher Morris, Andrea Lodi, Can Li	Exact solution of hard combinatorial optimization problems often relies on strong convex relaxations, but solving these relaxations repeatedly inside a branch-and-bound algorithm can be prohibitively expensive. Hence, we consider this challenge for Max-Cut, where branch and bound commonly uses semidefinite programming (SDP) relaxations to bound subproblems. We propose a Max-Cut-specific graph neural network that serves as a principled, lightweight neural proxy for these SDP solvers and can be pl...
578	Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR 2605.07114 Adaptive Rollout Allocation in RLVR: Proposes a hit-utility criterion to adaptively allocate rollout compute per prompt for GRPO-style methods.	cs.LG	Tao Wang, Shuo Li, Yan Sun, Dongsheng Ding, Edgar Dobriban	Reinforcement learning with verifiable rewards (RLVR) has emerged as a central paradigm for improving the reasoning capabilities of large language models. Group-based policy optimization methods, such as GRPO, typically allocate a fixed number of rollouts to every prompt. This uniform allocation can be inefficient: it over-allocates compute to prompts whose sampled groups are already saturated while under-exploring prompts for which additional samples may reveal useful correct trajectories. To a...
579	Conformal-Style Quantile Analyses for Stochastic Bandits 2605.07115 Quantile Objectives in Bandits: Proposes ACP-UCB1 and related methods to analyze and optimize upper-quantile objectives in stochastic bandits.	cs.LG	Chengyu Du, Mengfan Xu	Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance, which we study herein. For a fixed miscoverage level $\alpha$, the natural upper-tail target of arm $j$ is the upper endpoint $F_j^{-1}(1-\alpha/2)$ of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To this end, we propose ACP-UCB1, a conforma...
580	Stabilized neural Hamilton--Jacobi--Bellman solvers: Error analysis and applications in model-based reinforcement learning 2605.07116 Neural HJB Solvers for MBRL: Analyzes the error of stabilized neural HJB solvers and applies them to continuous-time model-based RL.	cs.LG cs.AI	Minseok Kim, Yeongjong Kim, Namkyeong Cho, Yeoneung Kim	Physics-informed neural solvers offer a promising route to model-based reinforcement learning in continuous time, where optimal feedback synthesis is governed by Hamilton--Jacobi--Bellman (HJB) equations. Practical implementations often occupy a regime that is neither a classical grid method nor a continuous-PDE PINN: the value function is represented by a neural network, finite-difference HJB policy-evaluation operators are evaluated by network queries at shifted points, and residuals are minim...
581	When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol Classification 2605.07120 Symbol-Renaming Invariant Classification: Develops a logistic-regression theory explaining classification that is invariant to symbol renaming in template tasks.	cs.LG	Wenjie Guan, Jelena Bradic	Template tasks have emerged as a clean testbed for asking whether transformers reason with abstract symbols rather than concrete token names. We study the fixed-label classification version of this problem, where train and test examples share latent templates but may use disjoint vocabularies. Unlike next-token prediction, the model need not emit unseen symbols; it must learn a decision rule invariant to symbol renaming. We analyze regularized kernel logistic classification in the transformer-ke...
582	Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought 2605.07123 Theory of In-Context RL with CoT: Theoretically characterizes how CoT promotes the convergence and emergence of in-context RL in Transformers.	cs.LG	Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang	In-context reinforcement learning (ICRL) refers to the ability of RL agents to adapt to new tasks at inference time without parameter updates by conditioning on additional context. Recent empirical studies further demonstrate that Chain-of-Thought (CoT) generation can amplify this ICRL capability. This paper is the first to provide a theoretical understanding of how CoT interacts with ICRL. We conduct our analysis in a policy evaluation setup with a linear Transformer. We prove that with specific ...
583	Simple KNN-Based Outlier Detection Achieves Robust Clustering 2605.07130 KNN Outlier Detection for Robust k-Means: Shows that simple KNN outlier detection achieves robust clustering and improves robust k-means.	cs.LG	Tianle Jiang, Yufa Zhou	Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means cost on the remaining points. Despite the close connection between robust $k$-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of $\textit{classic outlier detection heuristics}$ for robust $k$-Means remains limited. ...
584	GAD in the Wild: Benchmarking Graph Anomaly Detection under Realistic Deployment Challenges 2605.07133 Realistic Graph Anomaly Detection Benchmark: Builds a deployment-oriented graph anomaly detection benchmark and systematically evaluates multiple challenge dimensions.	cs.LG cs.AI	Jingjing Zhou, Shiyu Huang, Qing Qing, Zuquan Yuan, Huafei Huang	Graph Anomaly Detection (GAD) is a critical task in graph machine learning with vital applications in financial fraud detection and social platform governance. However, existing GAD benchmarks are often restricted to small-scale, curated graphs with relatively balanced anomaly ratios, leaving a substantial gap between academic evaluation and real-world deployment. To bridge this gap, we present a multi-dimensional benchmark that systematically evaluates GAD models under three deployment-relevant...
585	Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR 2605.07137 Negative Reinforcement for RLVR: Proposes adaptive negative reinforcement that dynamically adjusts penalties to balance correction and diversity for better reasoning.	cs.LG cs.AI	Yash Ingle, Jaival Chauhan, Ankit Yadav, Sudhakar Mishra	Reinforcement learning with verifiable rewards (RLVR) has become a highly effective method for improving the reasoning abilities of Large Language Models (LLMs). Recent research shows that Negative Sample Reinforcement (NSR) -- which focuses on penalizing incorrect steps rather than simply rewarding correct ones -- can match or even exceed the performance of more complex frameworks like PPO and GRPO across the entire Pass@k spectrum. However, current NSR techniques usually apply a fixed penalty ...
586	Regret-Oracle Complexity Tradeoffs in Agnostic Online Learning 2605.07155 Oracle-Efficient Online Learning Tradeoffs: Studies tradeoff bounds between regret and ERM oracle complexity in agnostic online learning.	cs.LG	Idan Attias, Steve Hanneke, Arvind Ramaswami	Agnostic online learning is classically solved via a reduction to the realizable setting, utilizing Littlestone's Standard Optimal Algorithm (SOA) as a base learner. However, the SOA is computationally intractable to execute even for a single round. To overcome this barrier, recent work in oracle-efficient online learning replaces the SOA with a realizable base learner that accesses the concept class exclusively through an offline empirical risk minimization (ERM) oracle. While such agnostic lea...
587	Learned Lagrangian Models of PDEs via Euler-Lagrange Residual Minimization 2605.07157 Learned Lagrangian PDE Modeling: Learns a continuous Lagrangian by minimizing the Euler-Lagrange residual for stable prediction of PDE dynamics.	cs.LG	Lyra Zhornyak, Eric Forgoston, M. Ani Hsieh	We present the first method to directly use a learned continuous Lagrangian to forecast the dynamics of systems governed by partial differential equations, exploiting the inherent conservative structure to achieve stable long-range predictions. We develop an optimization-based integrator that minimizes the squared Euler--Lagrange residual via a mesh-free near-symplectic construction on local space-time patches. Different from integrators for analytical models, integrators for learned models shou...
588	Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach 2605.07166 Neurosymbolic Imitation Learning: Uses human guidance as privileged information to fuse neural and symbolic methods for more efficient, better-generalizing imitation learning.	cs.LG	Nikhilesh Prabhakar, Varun Balaji, Athresh Karanam, Kristian Kersting, Sriraam Natarajan	Imitation learning is widely used for learning to act in complex environments. While pure neural-based methods handle high-dimensional data effectively, they require a large number of samples and are prone to overfitting. Pure symbolic approaches, while generalizing well, do not handle high-dimensional data effectively. We propose a neurosymbolic approach that achieves the best of both worlds, i.e., handling high-dimensional data while achieving generalization. The key advantag...
589	Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy 2605.07171 Cost-Constrained Multi-Armed Bandits: Studies minimum-cost decisions under a reward constraint in bandits with cost subsidy.	cs.LG	Ishank Juneja, Carlee Joe-Wong, Osman Yağan	The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in applications, often the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost-subsidy (MAB-CS). Of interest to this paper is the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We chara...
590	Learning Multi-Relational Graph Representations for DNA Methylation-Based Biological Age Estimation 2605.07175 Graph Learning for Epigenetic Aging Clocks: Learns multi-relational graph representations that model CpG dependencies to improve methylation-based biological age estimation.	cs.LG cs.AI	Qing Qing, Xikun Zhang, Zhongyuan Zhang, Jiarui Liu, Xingtong Yu	Aging clocks aim to estimate biological age, a measure of physiological state distinct from chronological age, from observable biomarkers, and are widely used for health assessment and disease analysis. DNA methylation is a particularly informative biomarker due to its stability and strong association with aging, and recent learning-based approaches have improved predictive performance. However, most existing methods treat CpG sites as independent features, overlooking the complex and heterogene...
591	HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents 2605.07177 Parallel Multimodal Search Agents: Proposes HyperEyes, which issues multiple retrieval and visual grounding calls in parallel to reduce interaction rounds.	cs.LG cs.AI	Guankai Li, Jiabin Chen, Yi Xu, Xichen Zhang, Yuan Lu	Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that effective multimodal agents should search wider rather than longer: dispatching multiple grounded queries concurrently within a round. To this end, we present HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action...
592	Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control 2605.07182 Nested Submodels for Reasoning LLMs: Star Elastic nests multiple submodels in a single model with one post-training run and a controllable reasoning budget.	cs.LG	Ali Taghibakhshi, Ruisi Cai, Saurav Muralidharan, Sharath Turuvekere Sreenivas, Aditya Vavre	Training a family of large language models (LLMs), either from scratch or via iterative compression, is prohibitively expensive and inefficient, requiring separate training runs for each model in the family. In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model using the compute of one run (N-fold savings) via a single post-training job. Beyond reducing training costs, Star Elastic also addresses a fundamental li...
593	Coupling Models for One-Step Discrete Generation 2605.07193 One-Step Discrete Generative Models: Proposes Coupling Models, which directly couple discrete sequences with Gaussian latents for one-step generation.	cs.LG	Fred Zhangzhi Peng, Avishek Joey Bose, Anru R. Zhang, Alexander Tong	Generative modeling over discrete structures underpins applications across deep learning, from biological sequence design and code generation to large language models, yet generation often remains sequential, relying on autoregressive decoding or iterative refinement. In this work, we introduce Coupling Models, a one-step discrete generative model that learns a direct coupling between discrete sequences and Gaussian latents. Unlike recent distillation methods that compress a pre...
594	Arrow: A Foundation Model for Causal Discovery 2605.07204 Foundation Model for Causal Discovery: Proposes Arrow, which uses a Transformer to zero-shot predict the skeleton and topological order for causal graph discovery.	cs.LG	Ryan Thompson, He Zhao, Daniel M. Steinberg, Edwin V. Bonilla	We introduce Arrow, a foundation model for zero-shot causal discovery on observational tabular data. Arrow factorizes a directed acyclic graph into an undirected skeleton and a topological order, guaranteeing acyclicity by construction. Given a new dataset, it uses a transformer-based architecture to contextualize variables within and across observations, then predicts skeleton edge probabilities and node order scores that together define a graph. Arrow is trained in a supervised fashion on synt...
595	FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution 2605.07208 Academic Impact Forecasting: Forecasts paper impact via continuous-time manifold evolution and evaluates LLMs' ability to identify high-impact papers.	cs.LG	Jianrong Ding, Jianyuan Zhong, Zhengyan Shi, Qiang Xu	Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-bas...
596	HARMONY: Bridging the Personalization-Generalization Gap by Mitigating Representation Skew in Heterogeneous Split Federated Learning 2605.07211 Heterogeneous Split Federated Learning: HARMONY mitigates representation skew in heterogeneous split federated learning to balance personalization and generalization.	cs.LG cs.AI	Jiseok Youn, You Rim Choi, Goodsol Lee, Sangtae Ha, Hyung-Sin Kim	Mobile devices face diverse resource constraints and non-IID data class distributions, requiring fast on-device inference for local in-distribution (ID) classes and on-demand remote support for client-specific out-of-distribution (OOD) classes. Hybrid split federated learning (Hybrid SFL) couples personalized client-side front ends (supporting early exit) with a generalized server-side backend for fallback inference, balancing accuracy and cost. However, under client architectural heterogeneity,...
597	Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability 2605.07212 EEG Preprocessing Robustness: Treats preprocessing as an intervention space and quantifies the resulting instability of EEG decoding predictions.	cs.LG cs.AI	Dengzhe Hou, Zihao Wu, Lingyu Jiang, Zirui Li, Fangzhou Lin	Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable under this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes, a variability ...
598	Improved Model-based Reinforcement Learning with Smooth Kernels 2605.07218 Kernel-Smooth Model-Based RL: Proposes a smooth-kernel model-based RL method that exploits MDP smoothness to improve sample efficiency.	cs.LG	Kun Long, Yuqiang Li, Xianyi Wu	For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothi...
599	On the Robustness of Distribution Support under Diffusion Guidance 2605.07220 Theory of Diffusion Guidance Robustness: Theoretically explains how diffusion guidance affects distribution support and ensures high-quality controllable generation.	cs.LG	Ruijia Cao, Yuchen Wu, Nisha Chadramoorthy	Diffusion guidance is a powerful technique that enables controllable and high-fidelity sample generation with diffusion models. At a high level, it modifies the score function by incorporating a guidance term that steers the generative process toward a desired condition. Despite its empirical success, the theoretical properties of diffusion guidance remain largely unexplored, and it is not well understood why it consistently produces high-quality samples. In this work, we explain the effectivene...
600	Don't Learn the Shape: Forecasting Periodic Time Series by Rank-1 Decomposition 2605.07222 Rank-1 Periodic Time Series Forecasting: Separates periodic shape from amplitude via rank-1 decomposition to forecast periodic time series with very few parameters.	cs.LG	Takato Honda	How few parameters do we really need to forecast a periodic time series? An hourly electricity series, reshaped as a 24-row matrix with one column per day, is approximately rank-1: a daily shape modulated by a daily level (median centered rank-1 energy 0.82 on GIFT-Eval). Should we learn the shape? Smoothing, shrinkage, and low-rank fits all seem like obvious upgrades over the simple average of the last K=2 cycles. On all 97 GIFT-Eval configurations, we tested 8 such alternatives (e.g., Fourier,...
601	Modulated learning for private and distributed regression with just a single sample per client device 2605.07233 One-sample federated regression: Proposes modulated learning for private distributed regression with only one sample per client.	cs.LG	Praneeth Vepakomma, Amirhossein Reisizadeh, Samuel Horváth, Munther Dahleh	This work focuses on the question of learning from a large number of devices with each device holding only a single sample of data. Several real-world applications fit this one-sample-per-client setup, including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors, to name a few. When a client has only one sample, the standard federated learning paradigm breaks down as a local update based on that single point is far from being use...
602	Sample Complexity of Stochastic Optimization with Integer Variables 2605.07239 Integer stochastic optimization complexity: Analyzes the sample complexity of integer stochastic optimization and compares it with the continuous case.	cs.LG	Hongyu Cheng, Yinghao Zheng, Marco Molinaro, Amitabh Basu	We establish sample complexity results for stochastic optimization over the integers, especially with a view to understanding the complexity with respect to the corresponding continuous optimization problem. We show that integer optimization can sometimes require strictly more samples and sometimes strictly fewer samples, depending on the structure of the objective and constraints. 1. For Lipschitz objectives over subsets of the $\ell_\infty$ ball, the statistical complexity of general ...
603	PerCaM-Health: Personalized Dynamic Causal Graphs for Healthcare Reasoning 2605.07267 Personalized temporal causal graphs: Learns personalized dynamic causal graphs to support temporal reasoning and decision-making in healthcare.	cs.LG	Elahe Khatibi, Ziyu Wang, Saba A. Farahani, Di Huang, Hung Cao	Personalized healthcare decisions require reasoning about how physiological and behavioral variables influence an individual patient over time. Existing temporal causal discovery methods are poorly matched to this setting: cohort-level models provide stable but non-personalized structures, while per-patient discovery is unreliable because individual trajectories are short, noisy, irregular, and non-stationary. This creates a fundamental gap between population-level causal modeling and the patien...
604	bispectrum: Selective $G$-Bispectra Made Practical 2605.07270 Group-invariant bispectrum features: Turns selective G-bispectra into a practical tool for extracting group-invariant representations.	cs.LG	Johan Mathe, Adele Myers, Simon Mataigne, Nina Miolane	Many machine learning tasks are invariant under the action of a group $G$ of transformations: signal classification can be invariant under translations, image classification under 2D rotations, and spherical-image classification under 3D rotations. The $G$-bispectrum is a principled complete invariant of a signal (retaining all of the signal's information up to the group action) with proven benefits in machine learning and as a pooling layer in deep networks. However, its deployment has been hamper...
605	Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics 2605.07277 Set-valued learning via dynamics: Uses weight-tied dynamical systems to learn set-valued solution maps for problems with multiple solutions.	cs.LG cs.AI	Caleb Jore, Jialin Liu	Many scientific and combinatorial problems admit multiple correct solutions, not a single label. Standard supervised learning resolves this ambiguity by choosing one solution as the target, but this hidden selector can be arbitrary, discontinuous, and harder to learn than the underlying solution set. We study bifurcation models, a weight-tied dynamical view in which different initializations can converge to different stable equilibria, so the model represents an attractor landscape rather than o...
606	Mask2Cause: Causal Discovery via Adjacency Constrained Causal Attention 2605.07280 Time-series causal discovery attention: Uses adjacency-constrained causal attention to recover the causal graph end-to-end during forecasting.	cs.LG cs.AI	Omar Muhammad, Pasupuleti Dhruv Shivkant, Deepak N. Subramani	Leveraging deep learning for causal discovery in time series remains challenging because existing neural methods predominantly rely on component-wise architectures that fail to capture shared system dynamics or employ decoupled post-hoc graph extraction that risks overfitting to spurious correlations. We propose $\textbf{Mask2Cause}$, an end-to-end framework that recovers the underlying causal graph directly during the forecasting forward pass. Our approach introduces an Inverted Variable Embedd...
607	The Convergence Gap: Instruction-Tuned Language Models Stabilize Later in the Forward Pass 2605.07282 Layerwise prediction stabilization in LMs: Introduces the convergence gap metric, showing that instruction-tuned models stabilize their predictions later.	cs.LG	Yifan Zhou	Final outputs hide when a checkpoint commits to its next-token prediction. We introduce the convergence gap, a model-diffing diagnostic that decodes each layer's next-token distribution and measures its distance to the model's own final distribution. Across six paired pretrained and instruction-tuned checkpoints in native prompting regimes, instruction-tuned checkpoints remain farther from their final predictions later into the stack. The effect persists under endpoint-matched raw and tuned read...
608	Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic 2605.07284 Cross-patching interpretability for tuning: Uses a cross-patching diagnostic to analyze how instruction tuning affects late-layer readout and behavioral differences.	cs.LG	Yifan Zhou	Recent interpretability work has identified model-internal handles on post-trained behavior, including refusal directions, assistant/persona axes, and sparse chat-tuning features. These results localize where behaviors can be read out or controlled, often in middle-to-late layers. We ask how earlier computation and the late stack cooperate to turn those differences into next-token margins. To test this, we introduce first-divergence cross-patching: at the first token where pretrained base (PT) a...
609	Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation 2605.07302 Spectral basis from pretraining: Shows that pretraining forms a reusable spectral basis, so downstream finetuning needs to adjust only a few directions.	cs.LG	Junjie Yu, Yue Wang, Zihan Deng, Yan Zhu, Wenxiao Ma	Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge t...
610	Latent Order Bandits 2605.07304 Latent-structure bandits: Proposes latent order bandits that exploit cross-instance structure to reduce the samples needed for personalized exploration.	cs.LG	Emil Carlsson, Newton Mwai, Fredrik D. Johansson	Bandit algorithms solve diverse sequential decision-making problems, but are often too sample-inefficient for from-scratch personalization. To substantially reduce exploration times, latent bandit algorithms exploit cross-instance structure implied by discrete latent states, provided that the posterior distribution of rewards and latent states is known and accurate. However, obtaining an accurate model of this structure is difficult, and a small number of latent states may be insufficient to cha...
611	Generative Modeling with Flux Matching 2605.07319 Flux matching generative models: Proposes Flux Matching, which learns non-conservative vector fields for more flexible generative modeling.	cs.LG cs.AI	Peter Pao-Huang, Xiaojie Qiu, Stefano Ermon	We introduce Flux Matching, a new paradigm for generative modeling that generalizes existing score-based models to a broader family of vector fields that need not be conservative. Rather than requiring the model to equal the data score, the Flux Matching objective imposes a weaker condition that admits infinitely many vector fields whose stationary distribution is the data. This flexibility enables a class of generative models that cannot be learned under score matching, in which inductive biase...
612	SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication 2605.07330 Communication-efficient RL weight sync: Uses sparse synchronization for lossless RL weight transfer with greatly reduced communication.	cs.LG cs.AI	Lucas Hu, Ranchi Zhao, Isaac Zhu, Zach Zhang, Hscos Zhang	In large-scale reinforcement learning (RL) systems with decoupled Trainer-Rollout execution, the Trainer must regularly synchronize policy weights to the Rollout side to limit policy staleness. When inter-node bandwidth is abundant, such synchronization is usually only a small fraction of end-to-end cost. As model size grows, however, the communication demand rises rapidly. In bandwidth-constrained or network-variable deployments -- for example, cross-datacenter or cross-cluster settings, hetero...
613	Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective 2605.07331 Importance sampling for LLM RL: Reformulates the importance sampling ratio from a cumulative-token perspective to ease the bias-variance dilemma.	cs.LG cs.AI	Yuheng Zhang, Chenlu Ye, Shuowei Jin, Changlong Yu, Wei Xiong	Reinforcement learning, including reinforcement learning with verifiable rewards (RLVR), has emerged as a powerful approach for LLM post-training. Central to these approaches is the design of the importance sampling (IS) ratio used in off-policy policy-gradient estimation. Existing methods face a fundamental bias-variance dilemma: token-level IS ratios, as adopted by PPO (Schulman et al., 2017) and GRPO (Shao et al., 2024), introduce bias by ignoring prefix state distribution mismatch; full sequ...
614	Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning 2605.07333 In-context RL with softmax attention: Proves theoretically that standard softmax Transformers can implement in-context reinforcement learning.	cs.LG	Zixuan Xie, Xinyu Liu, Claire Chen, Shuze Daniel Liu, Rohan Chandra	In-context reinforcement learning (ICRL) studies agents that, after pretraining, adapt to new tasks by conditioning on additional context without parameter updates. Existing theoretical analyses of ICRL largely rely on linear attention, which replaces the softmax function in the standard attention with an identity mapping. This paper provides the first theoretical understanding of ICRL without making the unrealistic linear attention simplification. In particular, we consider the standard softmax...
615	CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models 2605.07335 Closed-loop virtual cell model refinement: Proposes a dual-space hierarchical orchestration framework for closed-loop iterative refinement of virtual cell models.	cs.LG	Mengran Li, Bo Li, Jiaying Wang, Wenbin Xing, Yixuan Dong	Virtual Cell Modeling (VCM) requires models that not only predict perturbation responses, but also support targeted revision when predictions fail. Current LLM-assisted modeling workflows face a refinement-routing problem: prediction discrepancies are observed through executable implementations, but the relevant revision may involve the modeling assumption, representation design, implementation, or task constraint. Without structured feedback propagation across these levels, iterative refinement...
616	Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate 2605.07342 Evaluation of LLM-generated game scenes: Proposes the four-axis Mage protocol for evaluating the actual quality of LLM-generated executable game scenes.	cs.LG cs.AI	Hugh Xuechen Liu, Kıvanç Tatar	Compile-pass rate is the dominant evaluation signal for LLM code generation, yet for multi-component domain-specific artifacts it can be actively misleading. We demonstrate this on executable game scene synthesis with a four-axis evaluation protocol (named 'Mage') -- compile success, runtime success, structural fidelity, and mechanism adherence -- applied to 858 generation attempts across four open-weight LLMs (7B--30B), 26 hand-crafted Unity goal pattern playable concepts, and two automatically...
617	MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference 2605.07363 Sparse attention for long-context inference: Proposes MISA, a mixture-of-indexer sparse attention, to reduce long-context inference cost.	cs.LG cs.AI	Ruijie Zhou, Fanxu Meng, Yufei Xu, Tongxuan Liu, Guangming Lu	DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer that scores every prefix token and selects the most relevant ones for the main attention. To remain expressive, the indexer uses many query heads (for example, 64 on DeepSeek-V3.2) that share the same selected token set; this multi-head design is precisely what makes the indexer the dominant cost on long contexts. We propose MISA (Mixture of Index...
618	FlightSense: An End-to-End MLOps Platform for Real-Time Flight Delay Prediction via Rotation-Chain Propagation Features and Agentic Conversational AI 2605.07364 Real-time flight delay MLOps: Builds and deploys an end-to-end platform that predicts delays in real time using rotation-chain propagation features.	cs.LG	Aditi J. Shelke, Renuka J. Shelke, Yash M. Kamerkar	Flight delays impose cascading operational and financial burdens across the aviation network, costing the U.S. economy billions of dollars annually by disrupting interconnected aircraft rotation systems. While prior machine learning approaches have demonstrated strong predictive performance, most treat upstream delays as static input variables rather than explicitly modeling how delays propagate dynamically through aircraft rotation chains, and none have deployed such systems alongside a live we...
619	QuadNorm: Resolution-Robust Normalization for Neural Operators 2605.07375 Resolution-robust normalization for operators: Proposes QuadNorm, replacing uniform averaging with numerical quadrature to improve robustness across resolutions.	cs.LG	Bum Jun Kim, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo	Normalization layers in neural operators usually compute statistics by uniformly averaging discrete grid values, making the normalization itself discretization-dependent and thereby a source of transfer error across different resolutions or meshes. To enable discretization robustness, we introduce a quadrature normalization family that replaces existing uniform averaging in normalization layers with numerical quadrature: QuadNorm and BlendQuadNorm. On endpoint-inclusive uniform grids, the propos...
620	Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns 2605.07378 Training-free network evaluation metrics: Proposes a training-free proxy based on sample-wise activation patterns to evaluate network performance more accurately.	cs.LG	Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang	Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead in neural network evaluation for scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with the true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CN...
621	StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models 2605.07384 Streaming physical field inference: Proposes StreamPhy, which uses state space models to infer physical fields in real time from sparse, irregular observations.	cs.LG	Panqi Chen, Yifan Sun, Shikai Fang, Xiao Fu, Lei Cheng	Inferring the evolution of high-dimensional and multi-modal (e.g., spatio-temporal) physical fields from irregular sparse measurements in real time is a fundamental challenge in science and engineering. Existing approaches, including diffusion-based generative models and functional tensor methods, typically operate in offline settings, depend on full temporal observations, or incur substantial inference cost. We propose StreamPhy, an end-to-end framework that enables efficient and accurate strea...
622	Convex Optimization with Nested Evolving Feasible Sets 2605.07386 Online convex optimization with shrinking sets: Studies algorithms trading off regret and movement cost for online convex optimization over nested, shrinking feasible sets.	cs.LG	Karthick Krishna M., Haricharan Balasundaram, Rahul Vaze	Convex Optimization with Nested Evolving Feasible Sets (CONES) is considered where the objective function $f$ remains fixed but the feasible region evolves over time as a nested sequence $S_1 \supseteq S_2 \supseteq \cdots \supseteq S_T$. The goal of an online algorithm is to simultaneously minimize the regret with respect to the hindsight static optimal benchmark and the total movement cost while ensuring feasibility at all times. CONES is an optimization-oriented generalization of the well-known ...
623	Rubric-based On-policy Distillation 2605.07396 Rubric-based on-policy distillation: Replaces teacher logits with semantic rubrics to enable on-policy distillation in black-box settings.	cs.LG cs.AI	Junfeng Fang, Zhepei Hong, Mao Zheng, Mingyang Song, Gengsheng Li	On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic rubrics can serve as a scalable alternative to teacher logits, enabling OPD using only teacher-generated responses. To prove it, we introduce ROPD, a simple yet foundational framework for rubric-based OPD. Specifically, ROPD induces prompt-specific rubrics from teacher-student contrasts, and then utilizes...
624	Have Graph -- Will Lift? The Case for Higher-Order Benchmarks 2605.07397 Higher-order geometric deep learning benchmarks: Argues for and builds higher-order benchmarks for evaluating message-passing models on graphs and complexes.	cs.LG	Bastian Rieck	After a somewhat rocky start, geometry and topology have established a foothold in machine learning. Message passing, either on graphs or higher-order complexes, is one of the main drivers of geometric deep learning, and paradigms that were once considered to be firmly in the realm of the abstract, like sheaves, have been "tamed" to serve as novel inductive biases for model architectures in topological deep learning. The veritable diversity of models, however, is in stark contrast to the scarcity ...
625	Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer 2605.07407 Interpretable symbols in health foundation models: Extracts interpretable symbols from frozen embeddings and aligns cross-modal health representations to enable transfer.	cs.LG	Gajendra Katuwal, Advait Koparkar, Salar Abbaspourazad, Anshuman Mishra, Sarvesh Kirthivasan	Health foundation models (FMs) learn useful representations from wearable sensors, but interpreting what they encode and transferring that knowledge across modalities after training remains difficult. We present a post-training framework that decomposes frozen embeddings into interpretable directions, referred to as symbols, and use these symbols to align the embedding spaces without retraining. We evaluate the framework on three FMs for photoplethysmography (PPG) and accelerometer data, indepen...
626	Tracking Large-scale Shared Bikes with Inertial Motion Learning in GNSS Blocked Environments 2605.07412 Inertial bike tracking without GNSS: Uses inertial motion learning for robust localization and tracking of shared bikes in GNSS-blocked environments.	cs.LG cs.AI	Feng Liu (Beijing Jiaotong University), Kejia Li (Beijing Jiaotong University), Zhiwei Yang (DiDi Company), Chunwei Yang (DiDi Company), Qun Li (DiDi Company)	Although Global Navigation Satellite Systems (GNSS) provide a general solution for bike tracking outdoors, there still exist complex riding environments where only inertial navigation systems work, such as urban canyons. Despite decades of research, localization using only low-cost inertial sensors still faces challenges such as cumulative drifts and poor robustness caused by filtering methods. Furthermore, sensors such as visual and LiDAR could provide reliable measurements, but they are not su...
627	Risk-Consistent Multiclass Learning from Random Label-Subset Membership Queries 2605.07413 Weak supervision via label-subset queries: Studies multiclass learning from random label-subset membership queries with risk-consistency guarantees.	cs.LG	Jiaxu Su, Junpeng Li, Changchun Hua, Yana Yang	Obtaining accurate class labels is often costly or unreliable, and may also be limited by privacy or other practical conditions. Compared with asking an annotator to provide the exact class, it is often easier to ask whether the true label belongs to a certain label subset. This query-response form defines a distinct weak-supervision mechanism: weak supervision information is generated through feedback on a label subset. Although weakly supervised learning has studied many learning frameworks, m...
628	A Flexible Adaptive Stable Clustering Algorithm for Archive-Scale Online Mass Spectrometry 2605.07424 Stable scalable clustering for mass spectrometry: Proposes the FASC framework for scalable, stable clustering of massive online mass spectrometry streams.	cs.LG	Shao Shi, Xin Yang, Huiran Feng, Jianhuai Ye, Tianlong Hu	Modern online mass spectrometry generates multi-terabyte data streams critical for understanding Earth's environmental systems. However, extracting actionable chemical insights from these repositories is impeded by a computational bottleneck: existing clustering methods force a compromise among scalability, metric flexibility, and algorithmic stability. Here, we introduce Flexible Adaptive Stable Clustering (FASC), a dynamical systems framework that resolves these constraints by architecturally ...
629	GameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injection 2605.07442 Verification for LLM-generated games: Uses parallel keypoints and runtime state injection to automatically verify the mechanics of LLM-generated games.	cs.LG	Chaobo Jia, Ruipeng Wan, Ting Sun, Weihao Tan, Borui Wan	LLM-based game generation promises to turn natural-language specifications into executable games, but progress is limited by the lack of reliable automated verification. Unlike conventional code generation, game correctness is defined over long-horizon interaction: a game may appear correct while violating core mechanics such as state updates, interaction rules, and phase transitions. Existing Agent-as-a-Verifier approaches collapse verification into open-ended gameplay, making verdicts reachabi...
630	VNN-LIB 2.0: Rigorous Foundations for Neural Network Verification 2605.07451 Neural network verification standard: Proposes VNN-LIB 2.0, giving neural network verification a rigorous syntax, semantics, and type foundation.	cs.LG	Ann Roy, Allen Antony, Andrea Gimelli, Matthew L. Daggitt	Neural network verification is an active and rapidly maturing research area, with a growing ecosystem of solvers and tools. The VNN-LIB standard was introduced to support interoperability in this ecosystem, but Version 1.0 has several serious shortcomings as a formal foundation: it lacks a precise syntax, semantics, and type system, offers limited expressivity, and relies on externally defined ONNX models whose semantics are informal and constantly evolving. The latter distinguishes VNN-LIB fro...
631	Inference-Time Attribute Distribution Alignment for Unconditional Diffusion 2605.07456 Diffusion Attribute Distribution Control: Proposes an inference-time method that aligns attribute distributions to control population-level proportions in unconditional diffusion generation.	cs.LG	Hao Luan, See-Kiong Ng, Chun Kai Ling	Inference-time controllable generation is essential for real-world applications of unconditional diffusion models. However, most existing techniques focus on individual samples, struggling in applications that require the sample population to follow specific attribute distributions (e.g., demographic balance or semantic proportions). We formalize this setting as the inference-time attribute distributional alignment problem for pretrained unconditional diffusion models. To address this, we cast i...
632	Estimation of Motor Unit Parameters from Surface Electromyograms using an Informed Autoencoder 2605.07458 EMG Informed Autoencoder Estimation: Uses an autoencoder with prior constraints to estimate physiological motor unit parameters from surface EMG.	cs.LG	Kaja Balzereit, Malte Mechtenberg, Axel Schneider	Motor unit parameters such as the innervation zone centre or the conduction velocity of the electrical potential harbour the potential to improve the fidelity of neuromechanical models used for movement and force prediction. Determining these parameters in a non-invasive way is challenging, as they are subject-specific and may vary with muscle contraction. Existing work on the estimation of motor unit parameters mainly relies on white-box modelling and therefore requires substantial manual model...
633	Learning Minimal-Deviation Corrections for Multi-Dimensional Mismodelling in HEP Simulations 2605.07460 HEP Simulation Minimal Corrections: Learns minimal-deviation multidimensional simulation corrections that match data under only one-dimensional observational constraints.	cs.LG	Matthias Schott, Lucie Flek	Accurate Monte Carlo (MC) modelling in high-energy physics is challenging, particularly in complex scenarios where simulations fail to reproduce observed data. In practice, experimental information is often limited to one-dimensional (1D) distributions, while mismodelling arises in a multidimensional feature space. This restricts traditional correction methods, as one-dimensional reweighting ignores correlations and fully multidimensional approaches require large target datasets. We propose a ne...
634	Approximation Error Upper and Lower Bounds for Hölder Class with Transformers 2605.07463 Transformer Hölder Approximation Bounds: Gives upper and lower error bounds, and the required network size, for Transformers approximating the Hölder class.	cs.LG	Xin He, Yuling Jiao, Xiliang Lu, Jerry Zhijian Yang	We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-{d_{0}}/{\alpha}})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensi...
635	Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science 2605.07467 Causal Discovery with Physical Simulators: Treats physical simulators as intervention operators for causal structure discovery under latent confounding.	cs.LG, cs.AI	Tsuyoshi Okita	Existing interventional causal discovery methods (IGSP, DCDI, ENCO) assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as d...
636	Uncovering Hidden Systematics in Neural Network Models for High Energy Physics 2605.07470 Systematic Uncertainty in HEP NNs: Analyzes the sources of hidden systematic errors in high-energy-physics neural networks and proposes diagnostic methods.	cs.LG	Lucie Flek, Philipp Alexander Jungs, Akbar Karimi, Timo Saala, Alexander Schmid	Neural networks (NNs) are inherently multidimensional classifiers that learn complex, non-linear relationships among input observables. While their flexibility enables unprecedented performance in high-energy physics (HEP) analyses, it also makes them sensitive to small variations in their inputs. Consequently, the propagation and estimation of systematic uncertainties in NN-based models remain an open challenge. There are indications that uncertainties derived in control regions or from nominal...
637	Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics 2605.07471 HEP Fast-to-Full Transfer Learning: Systematically evaluates transfer learning from fast to full simulation on several LHC tasks.	cs.LG	Matthias Schott, Lucie Flek	Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, usi...
638	NPMixer: Hierarchical Neighboring Patch Mixing for Time Series Forecasting 2605.07476 Wavelet Patch Mixing Forecasting: Proposes a hierarchical model with learnable wavelet decomposition and neighboring-patch mixing for time series forecasting.	cs.LG	Jung Min Choi, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme	Multivariate time series forecasting remains a challenge due to the complexity of local temporal dynamics and global dependencies across multiple variables. In this paper, we propose Neighboring Patching Mixer (NPMixer), a hierarchical architecture featuring a Learnable Stationary Wavelet Transform that adaptively learns filter coefficients to decompose signals into trend and detail components in a data-dependent manner. Our framework introduces a Neighboring ...
639	SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion 2605.07482 Retain-Set-Free LLM Unlearning: Achieves selective LLM unlearning without a retain set via self-distillation and logit demotion.	cs.LG, cs.AI	Zizhao Hu, Ameya Godbole, Johnny Tian-Zheng Wei, Mohammad Rostami, Jesse Thomason	Machine unlearning for large language models (LLMs) aims to selectively remove memorized content such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods require a retain set of curated examples to prevent catastrophic degradation of general model utility, creating an extra data dependency that complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlear...
640	Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization 2605.07483 OOD Extrapolation Identifiability Bias: Explains from an identifiability perspective why neural networks struggle to extrapolate, and uses feature engineering to introduce a bias that improves OOD generalization.	cs.LG, cs.AI	Leonel Aguilar, Jan Nagler, Christoph Hoelscher, Nino Antulov-Fantulin	Successful deep neural networks discover salient features of data. We show when and why they fail to learn out-of-distribution (OOD)-relevant representations from an in-distribution (ID) training window. This requires decoupling feature learning from data-generating-process (DGP) identifiability. From a single training window, OOD extrapolation is non-identifiable: infinitely many DGPs are $\varepsilon$-observationally equivalent on the training data but diverge arbitrarily outside it, and no in...
641	Excluding the Target Domain Improves Extrapolation: Deconfounded Hierarchical Physics Constraints 2605.07485 Deconfounded Physics Constraints for Extrapolation: Proposes a deconfounded, hierarchical physics-constraint gating mechanism to improve extrapolation in generative models.	cs.LG, cs.AI	Tsuyoshi Okita	Extrapolation to out-of-distribution conditions is a fundamental challenge for physics-constrained deep generative models. Existing methods apply physical constraints as a single static regularization term uniformly across the generation process, and address neither the hierarchical structure of physical laws nor the confounding variable problem. We propose the Deconfounded Hierarchical Gate (DHG), which serves as a diagnostic and control mechanism: it identifies when and how strongly temperatur...
642	Tessellations of Semi-Discrete Flow Matching 2605.07513 Semi-Discrete Flow Matching Geometry: Studies semi-discrete flow matching from a Gaussian source to a finitely supported target and analyzes the geometry it induces.	cs.LG	Emile Pierret, Johannes Hertrich, Samuel Hurault, Julie Delon	We study Flow Matching in a semi-discrete setting where a Gaussian source is transported toward a discrete target supported on finitely many points. This semi-discrete regime is the theoretical setting behind the use of Flow Matching for generative modeling, where the target distribution is represented by a finite dataset. In this semi-discrete regime, the exact Flow Matching velocity field is available in closed form, which makes it possible to analyze the geometry induced by the terminal flow ...
643	Why Self-Inconsistency Arises in GNN Explanations and How to Exploit It 2605.07527 Self-Inconsistency in GNN Explanations: Identifies the causes of self-inconsistency in SI-GNN explanations and exploits the phenomenon to improve explanations and signal assignment.	cs.LG, cs.AI	Wenxin Tai, Yaqian Liu, Ting Zhong, Fan Zhou	Recent work has observed that explanations produced by Self-Interpretable Graph Neural Networks (SI-GNNs) can be self-inconsistent: when the model is reapplied to its own explanatory graph subset, it may produce a different explanation. However, why self-inconsistency arises remains poorly understood. In this work, we first identify re-explanation-induced context perturbation as the direct cause of score variation. We then introduce a latent signal assignment hypothesis to explain why only some ...
644	SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching 2605.07531 BBVI SGD Variance Control: Addresses unbounded gradient variance in BBVI with preconditioning and dynamic batching, and gives a convergence analysis.	cs.LG	Hippolyte Labarrière, Cesare Molinari, Silvia Villa, Lorenzo Rosasco	Black-Box Variational Inference (BBVI) typically relies on Stochastic Gradient Descent (SGD) to optimize the Evidence Lower Bound (ELBO). However, the stochastic gradients in BBVI inherently exhibit unbounded variance, violating standard assumptions and instead satisfying the weaker Blum-Gladyshev (BG) condition, where variance grows quadratically with distance from the optimum. In this paper, we bridge the gap between stochastic optimization theory and the practical instances of BBVI. Focusing ...
645	On the Invariance and Generality of Neural Scaling Laws 2605.07546 Generalizable Neural Scaling Laws: Studies how a neural scaling law fitted once can generalize to new tasks and new models.	cs.LG	Xing Han, Ziyin Liu, Suchi Saria, Paul Pu Liang	Neural scaling laws establish a predictable relationship between model performance and data or compute, offering crucial guidance for resource allocation in new domains and tasks. Yet such laws are most needed precisely where they are hardest to obtain: fitting one for a new model-task pair demands expensive sweeps that typically exhaust the very compute budget the law is meant to economize. This paper poses the research question of how to develop generalizable scaling laws: laws fit once on a w...
646	Disagreement-Regularized Importance Sampling for Adversarial Label Corruption 2605.07551 Robust Importance Sampling under Corruption: Proposes a sampling strategy regularized by proxy-ensemble disagreement to resist adversarial label corruption.	cs.LG	Csongor Horváth, Ida-Maria Sintorn, Prashant Singh	Standard Importance Sampling (IS) collapses under label corruption because high-norm examples, prioritized for variance reduction, are often adversarial outliers. We formalize this misalignment using an $\varepsilon$-contamination model and propose Disagreement-Regularized Importance Sampling (DR-IS), a sub-sampling method based on loss rank-disagreement across an independent proxy ensemble. We prove finite-sample concentration bounds showing that the empirical rank disagreement of bulk corrupted e...
647	ProteinJEPA: Latent prediction complements protein language models 2605.07554 Protein JEPA Latent Prediction: Adds latent-representation prediction at masked positions to protein language models to improve representation learning.	cs.LG, cs.AI	Dan Ofer, Dafna Shahaf, Michal Linial	Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under a matched wall-clock budget. Across pretrained and random-init protein sequence encoders at 35-150M parameters, we find that the best protein-JEPA design is not all-position latent prediction but a variant: predicting latent targets only at masked positions, and retainin...
648	Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning 2605.07557 Universal Semi-Supervised Learning: Proposes UniSSL and uses simplex-anchored structural inference to exploit unlabeled data whose distribution is unknown.	cs.LG	Yaxin Hou, Jun Ma, Hanyang Li, Bo Han, Jie Yu	Semi-supervised learning faces significant challenges in realistic scenarios where labeled data is scarce and unlabeled data follows unknown, arbitrary distributions. We formalize this critical yet under-explored paradigm as Universal Semi-supervised Learning (UniSSL). Existing methods typically leverage unlabeled data via pseudo-labeling. However, they often rely on the idealized assumption of a uniform unlabeled data distribution or require sufficient labeled data to estimate it. In the UniSSL...
649	Ensemble Distributionally Robust Bayesian Optimisation 2605.07565 Distributionally Robust Bayesian Optimisation: Proposes a computationally tractable ensemble distributionally robust Bayesian optimisation method for context distributional uncertainty.	cs.LG, cs.AI	Tigran Ramazyan, Denis Derkach	We study zeroth-order optimisation under context distributional uncertainty, a setting commonly tackled using Bayesian optimisation (BO). A prevailing strategy to make BO more robust to the complex and noisy nature of data is to employ an ensemble as the surrogate model, thereby mitigating the weaknesses of any single model. In this study, we propose a novel algorithm for Ensemble Distributionally Robust Bayesian Optimisation that remains computationally tractable while managing continuous conte...
650	Bilevel Graph Structure Learning, Revisited: Inner-Channel Origins of the Reported Gain 2605.07577 Bilevel Graph Structure Learning Analysis: Shows that much of the gain from bilevel graph structure learning comes from inner-loop training dynamics rather than from the rewiring itself.	cs.LG	Minkyoung Kim, Beakcheol Jang	Bilevel graph structure learning is widely understood to improve graph neural networks by jointly optimizing model parameters and a learned graph structure, with the resulting performance gain attributed to the rewired adjacency. We find that this attribution may be overstated: training-dynamics effects in the inner loop, rather than the rewiring itself, capture a substantial share of the gain. To establish this, we introduce frozen-$\phi$, a control that freezes the graph while retaining the in...
651	Revisiting Transformer Layer Parameterization Through Causal Energy Minimization 2605.07588 Transformer Parameterization via Energy Minimization: Uses a causal energy minimization framework to reinterpret and guide the parameterization of Transformer layers.	cs.LG, cs.AI	Jin Xu, Camille Couturier, Victor Rühle, Saravan Rajmohan, James Hensman	Transformer blocks typically combine multi-head attention (MHA) for token mixing with gated MLPs for token-wise feature transformation, yet many choices in their parameterization remain largely empirical. We introduce Causal Energy Minimization (CEM), a framework that recasts Transformer layers as optimization steps on conditional energy functions while explicitly accounting for layer parameterization. Extending prior energy-based interpretations of attention, CEM shows that weight-tied MHA can ...
652	Optimal Recourse Summaries via Bi-Objective Decision Tree Learning 2605.07598 Recourse Summaries Decision Trees: Uses bi-objective decision tree learning to generate group-level actionable recourse summaries for auditing and fairness analysis.	cs.LG	Ioannis Chatzis, Jason Liartis, Athanasios Voulodimos, Giorgos Stamou	Actionable Recourse provides individuals with actions they can take to change an unfavorable classifier outcome. While useful at the instance level, it is ill-suited for global auditing and bias detection, since aggregating local actions is costly and often inconsistent. Recourse Summaries address this limitation by partitioning the population and assigning one shared action per subgroup, enabling comparison across subgroups. Designing summaries involves a fundamental trade-off between recourse ...
653	Learning Large-Scale Modular Addition with an Auxiliary Modulus 2605.07648 Modular Addition with Auxiliary Modulus: Introduces an auxiliary modulus to mitigate covariate shift and scale up the learning of modular addition.	cs.LG	Hanato Kikuchi, Ryosuke Masuya, Kazuhiko Kawamoto, Hiroshi Kera	Learning parity functions, and more generally modular addition, is a challenging machine learning task due to its input sensitivity. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase zeros in training sequences, reducing the effective number of summands and thus controlling training difficulty; however, this induces covariate shift between training and test input distributions. This study theoretically and empirica...
654	Direction-Preserving Number Representations 2605.07662 Direction-Preserving Low-Precision Numbers: Studies how well low-precision numbers drawn from a finite alphabet preserve vector direction and gives a geometric characterization.	cs.LG	Bardia Zadeh, George A. Constantinides	Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geomet...
655	Structured Coupling for Flow Matching 2605.07676 Structured Coupling for Flow Matching: Introduces structured latent variables and noise coupling into flow matching to learn interpretable latent structure.	cs.LG	Xavier Sumba, Carles Balsells-Rodas, Yingzhen Li	Standard flow matching scales well but typically relies on an unstructured source distribution, limiting its ability to learn interpretable latent structure. Latent-variable models, by contrast, capture structure but often sacrifice generative quality. We bridge this gap by proposing Structured Coupling for Flow Matching (SCFM), a cooperative framework that augments flow matching with structured latent representation learning. By introducing structured latent variables and exogenous noise into t...
656	The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits 2605.07686 Chain-of-Thought Coupling Tax: Identifies the coupling tax, in which reasoning chains crowd out answer length under a fixed output budget and degrade performance.	cs.LG	Wenhua Nie, Junlin Liu, Jianan Wu, Zijie Meng, Yilong Fan	Chain-of-thought reasoning is often treated as a monotone way to improve language-model accuracy by letting a model think longer. We identify a countervailing effect, the coupling tax: when reasoning traces and final answers share one output-token budget, long traces can crowd out the answer they are meant to support. Across GSM8K, MATH-500, and five BIG-Bench Hard tasks with Qwen3 models at three scales, non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budge...
657	Gradient Starvation in Binary-Reward GRPO: Why Group-Mean Centering Fails and Why the Simplest Fix Works 2605.07689 GRPO Gradient Starvation Fix: Analyzes how group-mean centering in GRPO causes gradient starvation under binary rewards and gives a simple, effective fix.	cs.LG	Wenhua Nie, Jianan Wu, Junlin Liu, Ziwei Li, Zheng Lin	Group Relative Policy Optimization (GRPO) is a standard algorithm for reinforcement learning from verifiable rewards, but its group-mean-centered advantage can fail under binary rewards. The failure mode is gradient starvation: when every response in a group is correct or every response is wrong, the centered advantage is exactly zero and the policy receives no learning signal. We prove that the true degeneracy rate always exceeds the i.i.d. Bernoulli prediction by Jensen's inequality, and obser...
658	Fortifying Time Series: DTW-Certified Robust Anomaly Detection 2605.07690 DTW-Certified Robust Anomaly Detection: Proposes a time-series anomaly detection method with certified robustness under a DTW perturbation model.	cs.LG	Shijie Liu, Tansu Alpcan, Christopher Leckie, Sarah Erfani	Time-series anomaly detection is critical for ensuring safety in high-stakes applications, where robustness is a fundamental requirement rather than a mere performance metric. Addressing the vulnerability of these systems to adversarial manipulation is therefore essential. Existing defenses are largely heuristic or provide certified robustness only under $\ell_p$-norm constraints, which are incompatible with time-series data. In particular, $\ell_p$-norm fails to capture the intrinsic temporal s...
659	Toward Better Geometric Representations for Molecule Generative Models 2605.07693 Geometric Representations for Molecule Generation: Improves geometric representation learning in two-stage molecule generation to raise the quality of generated 3D structures.	cs.LG	Shaoheng Yan, Zian Li, Cai Zhou, Qiaojing Huang, Kai Liu	Geometric representation-conditioned molecule generation provides an effective paradigm that decouples molecule representation modeling from structure generation. By decoupling molecule generation into two stages (first generating a meaningful molecule representation, and then generating a 3D molecule conditioned on this representation), the efficiency and quality of the generation process can be significantly enhanced. However, its effectiveness is fundamentally limited by the quality of the repre...
660	Future Validity is the Missing Statistic: From Impossibility to $\Phi$-Estimation for Grammar-Faithful Speculative Decoding 2605.07698 Grammar-Faithful Speculative Decoding: Shows that existing speculative decoding deviates from the grammar-conditional distribution and uses $\Phi$-estimation to achieve grammar-faithful sampling.	cs.LG	Wenhua Nie, Zijie Meng, Kun Zou, Zheng Lin, Ziwei Li	Grammar-constrained generation is often combined with local vocabulary masking and speculative decoding, but the resulting sampling law is not the grammar-conditional distribution users usually intend. We show that any speculative decoder with local mask access, Leviathan rejection, and rollback soundness samples from the locally projected distribution $\mu^{\mathrm{proj}}$ rather than the grammar-conditional distribution $\mu^\star$. This extends the GAD impossibility result to speculative deco...
661	Bayesian Fine-tuning in Projected Subspaces 2605.07706 Bayesian LoRA Fine-tuning: Performs Bayesian fine-tuning in LoRA projected subspaces to quantify uncertainty.	cs.LG	Viktar Dubovik, Patryk Marszałek, Jacek Tabor, Tomasz Kuśmierczyk	Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of large models by decomposing weight updates into low-rank matrices, significantly reducing storage and computational overhead. While effective, standard LoRA lacks mechanisms for uncertainty quantification, leading to overconfident and poorly calibrated models. Bayesian variants of LoRA address this limitation, but at the cost of a significantly increased number of trainable parameters, partially offsetting the original efficie...
662	An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference 2605.07719 Hybrid Sparse Attention: Proposes hybrid sparse attention with CPU-GPU parallelism to accelerate long-context inference.	cs.LG, cs.AI	Feiyu Yao, Zhixiong Niu, Xiaqing Li, Yongqiang Xiong, Juan Fang	Long-context inference increasingly operates over CPU-resident KV caches, either because decoding-time KV states exceed GPU memory capacity or because disaggregated prefill-decode systems place KV data in host memory. Although block-sparse attention reduces attention cost in this setting, sparsity alone is insufficient for end-to-end efficiency. GPU-only designs remain constrained by PCIe bandwidth and metadata memory overhead, while CPU-GPU hybrid designs still suffer from substantial GPU idle ...
663	Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences 2605.07724 Synthetic Retraining Collapse: Shows theoretically that curation under multiple reward preferences can mitigate collapse in recursively retrained generative models.	cs.LG, cs.AI	Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab	Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective. Prior work suggests that such collapse is unavoidable without adding real data into the mix. We revisit this conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize ...
664	Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow 2605.07727 Wasserstein Generative Policy: Derives a one-step generative policy update from a Wasserstein-2 gradient flow, including a trust-region constraint.	cs.LG, cs.AI	Juil Koo, Mingue Park, Jiwon Choi, Yunhong Min, Minhyuk Sung	We propose Drifting Field Policy (DFP), a non-ODE one-step generative policy built on the drifting model paradigm. We frame the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, so that each DFP update corresponds to a gradient step in probability space. By construction, this gradient is decomposed into an ascent toward higher action-value regions and a score matching with the anchor policy as a trust region. We further derive a simple, tractable surrogate of...
665	Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach 2605.07733 GPS Truck-Shipment Matching: Uses Ping2Hex to match GPS traces to shipments and ranks candidates probabilistically.	cs.LG, cs.AI	Srinivas Kumar R, Jose Mathew, Ankit Singh Chauhan, Dinesh Rajkumar, Aravind Manoj	Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking and accurate estimated time of arrival (ETA) predictions. However, missing or corrupted vehicle identifiers prevent traditional matching approaches, leaving shipments without visibility. This paper presents Intelligent Truck Matching (ITM) 2.0, a machine learning system that addresses this critical gap by formulating matching as a probabilistic ranking proble...
666	Robust and Reliable AI for Predictive Quality in Semiconductor Materials Manufacturing with MLOps and Uncertainty Quantification 2605.07752 MLOps with Uncertainty: Benchmarks retraining strategies on five years of production-line data and performs uncertainty-aware quality prediction.	cs.LG	Min Gao, Julia Maria Perathoner, Anton Ludwig Bonin, Steven Eulig, Gianni Klesse	Semiconductor materials manufacturing presents unique challenges for machine learning deployment due to evolving process conditions, equipment degradation, and raw material variability that can cause model performance deterioration over time. This study benchmarks machine learning operations (MLOps) retraining strategies using five years of real manufacturing data to identify optimal retraining approaches for quality prediction. We evaluate various retraining frequencies and hyperparameter optim...
667	When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining 2605.07756 Online Loss Weighting: Learns the weights of multiple pretraining losses online via bilevel gradient alignment to improve downstream performance.	cs.LG, cs.AI	Ivan Karpukhin, Andrey Savchenko	Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian optimization is computationally expensive, as it requires many independent training runs. To address this, we propose a gradient-based bilevel method that learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objecti...
668	Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear Activations 2605.07757 NCBF Formal Verification: Proposes LightCROWN, which computes tighter Jacobian bounds for activations such as tanh to verify NCBFs.	cs.LG	Jun Zhang, Haibo Zhang, Chun Liu, Xiaofan Wang, Liang Xu	Formal verification of neural control barrier functions (NCBFs) remains challenging, especially for neural networks with nonlinear activations like $\tanh$. Existing CROWN-based methods rely on conservative linear relaxations for Jacobian bounds, limiting scalability. We propose LightCROWN, which computes tighter Jacobian bounds by exploiting the analytical properties of activation functions. Experiments on nonlinear control systems including the inverted pendulum, Dubins car, and planar quadr...
669	Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation 2605.07765 TabPFN for SBI: Uses TabPFN as a training-free summary network for simulation-based Bayesian posterior estimation.	cs.LG	Elliot Pickens, Chiraag Gohel, Sidharth Satya	In this work, we study TabPFN as a training-free, modular summary network for simulation-based Bayesian inference (SBI). Tabular foundation models such as TabPFN are pretrained on broad families of synthetic tabular data-generating processes and adapt at test time through in-context learning, making them natural candidates for SBI, where posterior estimation often depends on learning informative summaries of simulated observations. We propose PFN-NPE: a general recipe that uses a pretrained TabP...
670	Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers 2605.07772 Transformer Mean-field Training: Studies how training drives representations to escape token clustering in a mean-field Transformer.	cs.LG	Noboru Isobe, Daisuke Inoue, Masaaki Imaizumi	Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how training reshapes this clustering picture. We study this question in a noisy mean-field Transformer in which only a para...
671	POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles 2605.07775 Uncertainty-Aware LLM Optimization: Uses compute-efficient policy ensembles for Thompson-sampling-style LLM optimization with uncertainty.	cs.LG cs.AI	Nicolas Menet, Andreas Krause, Abbas Rahimi	Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS (Policy Ensembles for Thompson Sampling), a novel framework that bridges uncertainty quantification and policy optimization. Our approach is grounded in the insight that policies trained with Kullback-Leibler (KL) regularization implicitly encode an underlying reward function. Building on this, POETS bypasses the compl...
672	Neural Operators as Efficient Function Interpolators 2605.07792 Neural Operators Interpolation: Reinterprets neural operators as function interpolators that outperform MLPs on multiple benchmarks.	cs.LG cs.AI	Vasilis Niarchos, Angelos Sirbu, Sokratis Trifinopoulos	Neural operators (NOs) are designed to learn maps between infinite-dimensional function spaces. We propose a novel reframing of their use. By introducing an auxiliary base-space, any finite-dimensional function can be viewed as an operator acting by composition on functions of the base-space. Through a range of benchmarks on analytic functions of increasing complexity and dimensionality, we demonstrate that NOs can match or outperform standard multilayer perceptrons and Kolmogorov--Arnold Networ...
673	Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning 2605.07799 LUPI for Foundation Models: Proposes PIQL, which introduces privileged information to speed up training and improve generalization of tabular foundation models.	cs.LG cs.AI	Xueying Ding, Leman Akoglu	Training foundation models is computationally intensive and often slow to converge. We introduce PIQL, Privileged Information for Quick and Quality Learning, the first framework to systematically integrate privileged information (PI) to simultaneously accelerate learning and improve generalization in tabular foundation models (TFMs). We construct two complementary forms of PI: (i) aggregate dataset-level statistics that reduce the burden on in-context learning, and (ii) encodings of the underlying...
674	Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning 2605.07804 On-Policy Distillation Pruning: Prunes drifted trajectories to improve the efficiency and reliability of on-policy distillation for long-chain reasoning.	cs.LG cs.AI	Zhicheng Yang, Zhijiang Guo, Yifan Song, Minrui Xu, Yongxin Wang	On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks exposes a critical flaw: as the student's generated prefix inevitably diverges from the teacher's thought process, the teacher's dense reward loses local exploitability. Continuing to generate and evaluate tokens on these "drifted" trajectories not only degrades reward quality but also incurs massive computational waste. To address this, we introduce \textbf{Prun...
675	Flexible Routing via Uncertainty Decomposition 2605.07805 Uncertainty-Based Model Routing: Decomposes uncertainty to enable flexible routing and reduce unnecessary high-cost calls.	cs.LG	Charlotte Peale, Siddartha Devic, Parikshit Gopalan, Udi Wieder, Aravind Gollakota	A key strategy for balancing performance and cost in modern machine learning systems is to dynamically route queries to either a low-cost model or a more expensive oracle (such as a large pretrained model or human expert), an approach known as model routing. In this work we present a new uncertainty-aware router that (1) avoids unnecessary oracle calls on inherently ambiguous queries, and (2) adapts dynamically to different loss functions and cost parameters through simple hyperparameter changes...
676	The Minimax Rate of Second-Order Calibration 2605.07808 Second-Order Calibration Rate: Gives the minimax convergence rate and a method for estimating second-order calibration error.	cs.LG	Kamil Ciosek, Banafsheh Rafiee, Sina Ghiassian, Nicolò Felicioni	We characterize the minimax rate of estimating the second-order calibration error for binary classification, which quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on its level sets. Our key observation is that the sech perturbation kernel, previously used only to enforce smoothness of calibration functions, in fact makes them analytic in a strip of half-width $h\pi/2$. Polynomial regression then estimates the ...
677	Scaling Categorical Flow Maps 2605.07820 Discrete Flow Matching LM: Scales discrete flow matching language models to improve scalability and generation efficiency.	cs.LG	Oscar Davis, Anastasiia Filippova, Pierre Ablin, Victor Turrisi, Amitis Shidani	Continuous diffusion and flow matching models could represent a powerful alternative to autoregressive approaches for language modelling (LM), as they unlock a host of advantages currently reserved for continuous modalities, including accelerated sampling and tilting. Recently, several works have demonstrated the possibility of generating discrete data continuously by a simple flow matching process between a Gaussian and the one-hot encoded data distribution. They have further shown the feasibil...
678	Approximation-Free Differentiable Oblique Decision Trees 2605.07837 Differentiable Oblique Trees: Proposes approximation-free differentiable oblique decision trees for end-to-end gradient training.	cs.LG cs.AI	Subrat Prasad Panda, Blaise Genest, Arvind Easwaran	Decision Trees (DTs) are widely used in safety-critical domains such as medical diagnosis, valued for their interpretability and effectiveness on tabular data. However, training accurate oblique DTs is challenging due to complex optimization landscapes and overfitting risks, particularly in regression. Recent advances have introduced differentiable formulations that enable gradient-based training and joint optimization of decision boundaries and leaf regressors. Yet, existing approaches typicall...
679	RelAgent: LLM Agents as Data Scientists for Relational Learning 2605.07840 LLM Agent for Relational Learning: Proposes RelAgent, in which an LLM agent automatically searches for and builds relational learning modeling pipelines.	cs.LG	Xingyue Huang, Louis Tichelman, Jinwoo Kim, Krzysztof Olejniczak, İsmail İlkan Ceylan	Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language models), each with its own advantages and limitations. We propose RelAgent, an LLM-based autonomous data scientist for relational learning, which operates in two phases. In the search phase, an LLM agent uses database, valida...
680	VISTA: Decentralized Machine Learning in Adversary Dominated Environments 2605.07841 Adversarial Decentralized Learning: Uses a consistency-based incentive mechanism for robust decentralized learning when adversaries are the majority.	cs.LG cs.AI	Hanzaleh Akbari Nodehi, Parsa Moradi, Soheil Mohajer, Mohammad Ali Maddah-Ali	Decentralized machine learning often relies on outsourcing computations, such as gradient evaluations, to untrusted worker nodes. Existing robust aggregation methods can mitigate malicious behavior under honest-majority assumptions, but may fail when adversaries control a majority of the workers. We study this adversary-dominated setting through an incentive-oriented framework in which reports are accepted and rewarded only when they are mutually consistent up to a threshold. This turns the adve...
681	Distributional simplicity bias and effective convexity in Energy Based Models 2605.07844 Energy-Based Model Dynamics: Explains the simplicity bias and effective convexity of EBM training through effective-model dynamics.	cs.LG	Aurélien Decelle, Alfonso de Jesús Navas Gómez, Beatriz Seoane	Energy-based learning is a powerful framework for generative modelling, but its training is inherently non-convex, leading potentially to sensitivity to initialisation, poor local optima, and unstable gradient dynamics. We present a dynamical analysis of energy-based learning through the lens of the effective model, which can be interpreted as either a generalised Ising model with higher-order interactions or the Fourier expansion of the energy. Under sufficient expressivity, we show that the gr...
682	Actor-Critic Algorithm for Dynamic Expectile and CVaR 2605.07857 Risk-Sensitive Actor-Critic: Proposes a perturbation-free policy gradient and model-free learning to optimize dynamic expectile and CVaR.	cs.LG	Yudong Luo, Erick Delage	Optimizing dynamic risk with stochastic policies is challenging in both policy updates and value learning. The former typically requires transition perturbation, while the latter may rely on model-based approaches. To address these challenges, we propose a surrogate policy gradient without transition perturbation under softmax policy parameterization. We further develop model-free value learning methods for dynamic expectile and conditional value-at-risk by leveraging elicitability. Finally, ins...
683	On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems 2605.07860 Federated On-Device Generative Models: Analyzes performance and resource tradeoffs of on-device generative models in federated predictive maintenance.	cs.LG cs.AI	Usevalad Milasheuski, Piero Baraldi, Enrico Zio, Stefano Savazzi	Federated Learning (FL) has emerged as a promising paradigm for preserving client data ownership and control over distributed Internet of Things (IoT) environments. While discriminative models dominate most FL use cases, recent advances in generative models -- such as Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and Diffusion Models (DM) -- offer new opportunities for unsupervised anomaly detection in time series analysis, with relevant applications in predictive mainte...
684	ADKO: Agentic Decentralized Knowledge Optimization 2605.07863 Decentralized GP Optimization: A multi-agent GP framework that communicates via knowledge tokens for private, efficient black-box optimization.	cs.LG	Lucas Nerone Rillo, Zhanhong Jiang, Nastaran Saadati, Aditya Balu, Baskar Ganapathysubramanian	We present Agentic Decentralized Knowledge Optimization (ADKO), a framework for collaborative black-box optimization across autonomous agents that achieves sample efficiency, privacy preservation, heterogeneous-objective handling, and communication efficiency. Each agent maintains a private Gaussian Process (GP) surrogate trained on local data and communicates only through knowledge tokens-compact, lossy summaries containing directional signals, advantage scores, and optional language-model (LM)...
685	Black-box model classification under the discriminative factorization 2605.07878 Black-box Model Classification: Proposes the discriminative factorization to represent and classify black-box models with more robust query sets.	cs.LG	Hayden Helm, Merrick Ohata, Carey Priebe	Access to modern generative systems is often restricted to querying an API (the "black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative facto...
686	Adaptive Regularization for Sparsity Control in Bregman-Based Optimizers 2605.07892 Adaptive Sparsity Regularization: Adaptively tunes regularization in Bregman optimizers to control the sparsity rate precisely.	cs.LG	Ahmad Aloradi, Tim Roith, Emanuël A. P. Habets, Daniel Tenbrinck	Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $\ell_1$ penalty, often control sparsity only indirectly through a regularization parameter $\lambda$, whose mapping to the final sparsity rate is non-trivial. In our experiments, we found this parameter sensitivity to be particularly pronounced for Bregman-based optimizers. Specifically, the two variants LinBreg and AdaBreg reach the same sparsity at $\...
687	Curvature Beyond Positivity: Greedy Guarantees for Arbitrary Submodular Functions 2605.07902 Submodular Greedy Guarantees: Gives greedy approximation guarantees for arbitrary submodular functions under curvature conditions.	cs.LG	Yixin Chen, Alan Kuhnle	Submodular functions -- functions exhibiting diminishing returns -- are central to machine learning. When the objective is monotone and non-negative, the greedy algorithm achieves a tight $63\%$ approximation. But many practical objectives incorporate costs that make them negative on some inputs, and all existing multiplicative guarantees require non-negativity. Prior work handles negativity through additive bounds for the special class of decomposable functions and non-monotonicity through part...
688	Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders 2605.07922 Hierarchical Sparse Autoencoders: Proposes Tree SAE to learn hierarchical feature structures in sparse autoencoders and improves the discrimination criterion.	cs.LG	Tue M. Cao, Hoang X. Nhat, Raed Alharbi, My T. Thai	Learning hierarchical features in Sparse Autoencoders (SAEs) is essential for capturing the structured nature of real-world data and mitigating issues like feature absorption or splitting. Existing works attempt to identify hierarchical relationships within independent feature sets by relying on activation coverage, the assumption that child feature should only activate when its parent feature activates. However, we demonstrate that this condition alone is insufficient; that is, it often produce...
689	INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy 2605.07930 Individualized Differential Privacy SGD: Proposes INO-SGD to mitigate utility imbalance across groups under individualized differential privacy.	cs.LG cs.AI	Xiao Tian, Jue Fan, Rachael Hwee Ling Sim, Bryan Kian Hsiang Low	Differential privacy (DP) is widely employed in machine learning to protect confidential or sensitive training data from being revealed. As data owners gain greater control over their data due to personal data ownership, they are more likely to set their own privacy requirements, necessitating individualized DP (IDP) to fulfil such requests. In particular, owners of data from more sensitive subsets, such as positive cases of stigmatized diseases, likely set stronger privacy requirements, as leak...
690	Prototype Guided Post-pretraining for Single-Cell Representation Learning 2605.07938 Single-Cell Prototype Post-pretraining: Uses prototype-guided post-pretraining to improve generalization of single-cell representations under long-tailed distributions and distribution shift.	cs.LG	Sachini Weerasekara, Natasha Darras, Sagar Kamarthi, Colles Price, Jacqueline Isaacs	Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models in natural language modeling, several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expressi...
691	Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation 2605.07950 Annealed Langevin sampling: Proposes the SALD slowly annealed sampler with KL convergence bounds for training-free guided generation.	cs.LG	Atsushi Nitanda, Dake Bu, Yueming Lyu, Tanya Veeravalli	We study Slowly Annealed Langevin Dynamics (SALD), a sampler for tracking a path of moving target distributions and approximating the terminal target through time slowdown. We establish non-asymptotic convergence guarantees via a KL differential inequality, showing that slowdown improves tracking through contraction of intermediate targets and the complexity of the path. Motivated by training-free guided generation with pretrained score-based generative models, we further introduce Velocity-Awar...
692	Convergent Stochastic Training of Attention and Understanding LoRA 2605.07959 LoRA attention optimization theory: Gives a unified analysis of stochastic training for attention and LoRA, proving convergence and trainability under mild conditions.	cs.LG	Zhengkai Sun, Dibyakanti Kumar, Alejandro F Frangi, Anirbit Mukherjee, Mingfei Sun	Transformers have revolutionized machine learning and deploying attention layers in the model is increasingly standard across a myriad of applications. Further, for large models, it is common to implement Low Rank Adaptation (LoRA), whereby a factorized parameterization of them is trained, to achieve a surprisingly beneficial accuracy-size trade-off. In this work, via a unified framework we rigorously establish trainability of such models under stochastic methods. We prove that for any mild regu...
693	Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs 2605.07961 Federated LLM poisoning defense: Uses graph representation learning to augment federated fine-tuning in order to detect and mitigate model manipulation attacks from malicious updates.	cs.LG	Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong, Yichen Li	Federated fine-tuning (FFT) has emerged as a privacy-preserving paradigm for collaboratively adapting large language models (LLMs). Built upon federated learning, FFT enables distributed agents to jointly refine a shared pretrained LLM by aggregating local LLM updates without sharing local raw data. However, FFT-based LLMs remain vulnerable to model manipulation threats, in which adversarial participants upload manipulated LLM updates that corrupt the aggregation process and degrade the performa...
694	FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning 2605.07962 Federated evaluation metrics aggregation: Proposes a framework of aggregatable evaluation measures to reliably aggregate per-client performance in federated learning.	cs.LG	Fabian Stricker, Jose A. Peregrina, David Bermbach, Christian Zirpins	Performance evaluation is essential for assessing the quality of machine learning (ML) models and guiding deployment decisions. In federated learning (FL), assessing the performance is challenging because data are distributed across participants. Consequently, the coordinator must rely on locally computed evaluation metrics and aggregate them to assess the global model. A key challenge is that common aggregation strategies, such as weighted averaging based on the local samples per participant, d...
695	Aggregation in conformal e-classification 2605.07963 Conformal e-predictor aggregation: Experimentally studies cross-conformal e-predictors and simpler, more flexible aggregation variants.	cs.LG	Vladimir Vovk	Aggregating conformal predictors is a standard way of balancing their predictive and computational efficiency while retaining their validity, at least approximately. An important advantage of conformal e-predictors is that they are easier to aggregate without sacrificing their validity. This paper studies experimentally cross-conformal e-prediction, which is an existing method of aggregating conformal e-predictors, and its modifications that are conceptually simpler and more flexible.
696	When Diffusion Model Can Ignore Dimension: An Entropy-Based Theory 2605.07969 Diffusion sampling dimension-free theory: Uses entropy analysis to explain why diffusion sampling needs few steps in high dimensions and gives a theory with weak dimension dependence.	cs.LG	Ahmad Aghapour, Erhan Bayraktar	Diffusion models perform remarkably well on high-dimensional data such as images, often using only a modest number of reverse-time steps. Despite this practical success, existing convergence theory does not fully explain why such samplers remain efficient in high dimensions. Many prior KL guarantees bound the discretization error in terms of the ambient dimension, while other improved results replace this dependence using intrinsic-dimensional or geometric structure assumptions. In this work, we...
697	It Just Takes Two: Scaling Amortized Inference to Large Sets 2605.07972 Amortized inference for large sets: Proposes an amortized posterior estimation method that trains on only two elements yet generalizes to conditioning on large sets.	cs.LG cs.AI	Antoine Wehenkel, Michael Kagan, Lukas Heinrich, Chris Pollard	Neural posterior estimation has emerged as a powerful tool for amortized inference, with growing adoption across scientific and applied domains. In many of these applications, the conditioning variable is a set of observations whose elements depend not only on the target but also on unknown factors shared across the set. Optimal inference therefore requires treating the set jointly, which in turn requires training the estimator at the deployment set size -- a regime where memory and compute quic...
698	Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback 2605.07977 Online federated LLM self-improvement: Uses advantage-weighted self-play with real-time feedback to improve LLMs in online federated fine-tuning.	cs.LG	Seohyun Lee, Wenzhi Fang, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton	Recent works have advanced feedback-based learning systems, whereby a foundation model is able to intake incoming feedback (e.g., a user) to self-improve, creating a self-loop system of training. However, existing works are limited in needing to consider an offline setup to allow for such feedback-based methods, and are further limited in the need of requiring privileged ground-truth contexts for training. Moreover, there is limited consideration of federated learning (FL), which is particularly...
699	Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning 2605.07980 Bayesian linear response interpretability: Systematically introduces linear response and susceptibilities in Bayesian learning for interpreting networks and influence functions.	cs.LG	Chris Elliott, Daniel Murfet	These notes introduce the theory of susceptibilities as developed in [arXiv:2504.18274, arXiv:2601.12703] for interpreting neural networks. The susceptibility of an observable $\phi$ to a data perturbation is defined as a derivative of a posterior expectation, which by the fluctuation--dissipation theorem equals a posterior covariance. Different choices of $\phi$ yield different objects: per-sample losses give the influence matrix (the Bayesian influence function of [arXiv:2509.26544]), while co...
700	Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions 2605.07984 Mechanistic planning localization in LMs: Uses linear probes and activation patching to locate latent planning representations in the forward pass of language models and verify their causal role.	cs.LG cs.AI	Nicole Ma, Nick Rui	We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line bounda...
701	Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors 2605.07993 Bayesian sensitivity for causal estimators: Proposes Bayesian sensitivity analysis for causal estimates under evidence-based priors to assess the robustness of conclusions.	cs.LG	Nikita Dhawan, Daniel Shen, Leonardo Cotta, Chris J. Maddison	Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity analysis are concerned with worst-case changes in assumptions. In this work, we argue that using such pessimistic criteria can often become uninformative or lead to conclusions contradicting our prior knowledge about the ...
702	Graph-Structured Hyperdimensional Computing for Data-Efficient and Explainable Process-Structure-Property Prediction 2605.07999 Graph hyperdimensional PSP prediction: Uses graph-structured hyperdimensional computing for data-efficient, explainable process-structure-property prediction.	cs.LG cs.AI	Jingzhan Ge, Ajeeth Vellore, Ajinkya Palwe, Ahsan Khan, David Gorsich	Multiphoton photoreduction enables high-fidelity fabrication of complex 3D microstructures, yet reliable process-structure-property (PSP) prediction remains difficult because the available data are sparse, heterogeneous, and interaction-dominated. In this regime, conventional feature-vector models are statistically underdetermined, making them prone to spurious correlations, poor regime transfer, and unstable post hoc explanations, whereas mechanistic pipelines depend on calibrated submodels tha...
703	STEPS: A Temporal Smooth Error Propagation Solver on the Manifolds for Test-Time Adaptation in Time Series Forecasting 2605.08005 Test-time adaptation for time series: Proposes STEPS, which smoothly propagates errors on manifolds to improve online test-time adaptation for time series forecasting.	cs.LG	Jiaqi Liu, Yifan Ouyang, Zhifei Song, Sim Kuan Goh, Ashwaq Qasem	Test-Time Adaptation (TTA) aims to improve time series forecasting under distribution shifts by using limited observations revealed during inference. However, forecasting TTA must operate in a source-free online setting, where the adaptation signal is short, temporally correlated, and potentially noisy. Existing methods can therefore suffer from weak identifiability, error accumulation, and unstable long-horizon corrections when the revealed prefix is sparse or contaminated. To address these iss...
704	Interpreting Reinforcement Learning Agents with Susceptibilities 2605.08007 Susceptibility interpretability for RL: Generalizes susceptibilities to the regret in reinforcement learning and uses them to interpret stagewise internal features of agent training.	cs.LG	Chris Elliott, Einar Urdshals, David Quarel, Daniel Murfet	Susceptibilities are a technique for neural network interpretability that studies the response of posterior expectation values of observables to perturbations of the loss. We generalize this construction to the setting of the regret in deep reinforcement learning and investigate the utility of susceptibilities in a simple gridworld model that nevertheless exhibits non-trivial stagewise development. We argue that susceptibilities reveal internal features of the development of the model in paramet...
705	Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data 2605.08028 Domain-decomposed PINNs for traffic: Proposes ADD-PINN, an adaptive domain decomposition that reconstructs traffic speed fields from sparse sensors while preserving shockwaves.	cs.LG	Eunhan Ka, Ludovic Leclercq, Satish V. Ukkusuri	Traffic state estimation from sparse fixed sensors is challenging because physics-informed neural networks (PINNs) tend to over-smooth the shockwaves admitted by the Lighthill-Whitham-Richards (LWR) model. This study proposes Adaptive Domain Decomposition Physics-Informed Neural Networks (ADD-PINN), a two-stage residual-guided framework for LWR-based offline speed-field reconstruction. A coarse global PINN is first trained; its spatial residual profile is then used to place subdomain boundaries ...
706	Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids 2605.08036 Scalable Gaussian processes on grids: Proposes CUTS-GPR, which uses incomplete grids and additive kernels for fast, exact inference in high-dimensional GPR.	cs.LG	Mads Greisen Højlund, August Smart Lykke-Møller, Henry Moss, Ove Christiansen	We introduce CUTS-GPR, a new method for performing numerically exact Gaussian process regression (GPR) in high-dimensional settings. The key component of CUTS-GPR is an extremely fast kernel matrix-vector product, which exhibits near-linear or even linear scaling with the amount of training data, $N$, and low-order polynomial scaling with dimensionality, $D$. This is obtained by combining an additive kernel with an incomplete grid and exploiting the resulting structure of the kernel matrix. We d...
707	Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph 2605.08037 Preference graph optimization for LMs: Models multi-candidate preference data as a preference graph to improve DPO training stability and use of information.	cs.LG cs.AI	Ning Liu, Chuanneng Sun, Kristina Klinkner, Shervin Malmasi	Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons, offering a simple and effective alternative to Reinforcement Learning (RL) from human feedback. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing rich preference structure that pairwise DPO fails to exploit. Collapsing such data into independent pairs discards transitivity, introduces redundant or conflicting supervision, and can lead to unstable ...
708	Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs 2605.08053 Exponential-utility reinforcement learning: Derives Q-learning-style algorithms for exponential-utility MDPs and proves convergence and contraction of the associated operators.	cs.LG	Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat	Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in \cite{porteus1975optimality}, we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_\infty$ and sup-log/Thompson metrics, respectively. We characterize their fixed poi...
709	GRAPHLCP: Structure-Aware Localized Conformal Prediction on Graphs 2605.08074 Localized conformal prediction on graphs: Proposes GRAPHLCP, which uses graph structure for localized conformal prediction to obtain tighter uncertainty sets.	cs.LG	Peyman Baghershahi, Fangxin Wang, Debmalya Mandal, Sourav Medya	Conformal prediction (CP) provides a distribution-free approach to uncertainty quantification with finite-sample guarantees. However, applying CP to graph neural networks (GNNs) remains challenging as the combinatorial nature of graphs often leads to insufficiently certain predictions and indiscriminative embeddings. Existing methods primarily rely on embedding-space proximity for localization, which can be unreliable for graphs and yield inefficient prediction sets. We propose GRAPHLCP, a proxi...
710	Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping 2605.08075 Zero-shot imagined speech decoding: Achieves zero-shot imagined speech decoding via an imagined-to-listened MEG mapping, exploiting more easily labeled data.	cs.LG eess.AS	Maryam Maghsoudi, Shihab Shamma	Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subjects and sessions. In this work, we propose a new approach to the decoding of imagined speech that leverages the richer and more reliably labeled recordings during listening to speech. We collected paired listened and imagined MEG recordings to rhythmic melodic and spoken stimuli from trained musicians. Using trained musicians helped improve ...
711	XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction 2605.05866 Multiphase XRD set decomposition: Proposes XDecomposer, which decomposes multiphase PXRD patterns into components without priors to aid structure identification.	cs.LG	Hanyu Gao, Bin Cao, Yunyue Su, Tong-Yi Zhang, Qiang Liu	Multiphase powder X-ray diffraction (PXRD) analysis remains a fundamental bottleneck in structure identification, as real-world synthesis often produces complex mixtures whose constituent phases (components) cannot be reliably disentangled. While recent advances in representation-based crystal retrieval and generation suggest the possibility of inferring structures directly from PXRD, existing approaches largely assume single-phase inputs and break down in multiphase settings. Here, we present X...
712	MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems 2605.06623 Multi-agent prompt optimization: Proposes MASPO, which jointly optimizes multi-agent role prompts to align local objectives with overall system performance.	cs.LG	Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang	Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iterati...
713	Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs 2605.06669 Prompt injection defense evaluation: Builds an evaluation of injection defenses for educational LLM tutors and quantifies the security, usability, and latency trade-offs.	cs.LG cs.AI	Alexandre Cristovão Maiorano	Educational LLM tutors face a core AI alignment challenge: they must follow user intent while preserving pedagogical constraints and safety policies. We present an evaluation methodology for prompt-injection defenses in this setting, showing that guardrail design entails explicit trade-offs among adversarial robustness, benign-task usability, and response latency. We evaluate a domain-specific multi-layer safeguard pipeline combining deterministic pattern filters, structural validation, contextu...
714	Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations 2605.06696 Coalition detection in multi-agent AI: Proposes a spectral diagnostic on agents' internal representations for early detection of hidden coalitions and informational coupling.	cs.LG cs.AI	Cameron Berg, Susan L. Schneider, Mark M. Bailey	Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neu...
715	Information-theoretic Limits of Learning and Estimation 2605.06710 Information-theoretic learning limits: Surveys how information-theoretic tools yield fundamental limits for learning and estimation, with accompanying exercises.	cs.LG	Abbas El Gamal, Maxim Raginsky	Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can -- and cannot -- achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entr...
716	TUANDROMD-X: Advanced Entropy and Visual Analytics Dataset for Enhanced Malware Detection and Classification 2605.06718 Malware detection analytics dataset: Releases a dataset with entropy and visual-analytics features to support malware detection and classification research.	cs.LG	Parthajit Borah, Upasana Sarmah, D. K. Bhattacharyya, J. K. Kalita	Malware and malware-based attacks are becoming more prevalent and complex. Attackers regularly come up with new techniques that have the ability to evade conventional and signature-based malware defense. In order to address such threats, there is an increasing demand for advanced and better defense solutions. Machine learning-based techniques are efficiently capable of defending against malware and malware-based attacks. Nevertheless, creating and efficiently testing such techniques demand high-...
717	AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites 2605.06841 Affordance-grounded world models: Proposes affordance-constrained world models to handle action preconditions and compositional environment dynamics.	cs.LG cs.AI	Qinshi Zhang (University of California, San Diego), Weipeng Deng (University of Hong Kong), Zhihan Jiang (Columbia University), Jiaming Qu (Amazon)	In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states; when an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. ...
718	One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators 2605.06873 Neural operators for probabilistic conditioning: Learns a single neural operator that amortizes approximate conditioning across many distributions, rather than learning a conditional distribution per task.	cs.LG	Panos Tsimpos, Edoardo Calvello, Ayoub Belhadji, Nicholas H. Nelsen	Probabilistic conditioning is concerned with the identification of a distribution of a random variable $X$ given a random variable $Y$. It is a cornerstone of scientific and engineering applications where modeling uncertainty is key. This problem has traditionally been addressed in machine learning by directly learning the conditional distribution of a fixed joint distribution. This paper introduces a novel perspective: we propose to solve the conditioning problem by identifying a single operato...
719	Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests 2605.06883 Kernel selection for MMD testing: Treats kernel selection as model selection and uses a unified complexity penalty to improve the power of MMD two-sample tests.	cs.LG	Yijin Ni, Xiaoming Huo	The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain distributions, the kernel must be dynamically optimized. However, data-driven optimization violates the foundational i.i.d. assumption, forcing a strict trade-off in existing frameworks. Ratio criteria ignore this dependence, inducing overfitting and variance collapse on r...
720	Muon with Nesterov Momentum: Heavy-Tailed Noise and (Randomized) Inexact Polar Decomposition 2605.06884 Muon optimizer theory with momentum: Analyzes the convergence and error of Muon with Nesterov momentum under heavy-tailed noise and approximate polar decomposition.	cs.LG	Sayantan Choudhury, Xiaoran Cheng, Martin Takáč, Sen Na, Mladen Kolar	Most first-order optimizers treat matrix-valued parameters as vectors, ignoring the intrinsic geometry of hidden-layer weights in neural networks. Muon addresses this mismatch by updating along the polar factor of a momentum matrix, but its theoretical understanding has lagged behind practice. In particular, practical implementations incorporate Nesterov momentum, compute the polar factor only approximately, and operate with stochastic gradients that may be heavy-tailed. We close this gap by dev...
721	McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware 2605.06894 Android malware drift benchmark: Builds a 2013-2025 multimodal Android malware benchmark for drift detection.	cs.LG	Md Mahmuduzzaman Kamol, Jesus Lopez, Saeefa Rubaiyet Nowmi, Emilia Rivas, Md Ahsanul Haque	Machine learning (ML) in real-world systems must contend with concept drift, adversarial actors, and a spectrum of potential features with varying costs and benefits. Malware naturally exhibits all of these complexities, but for the same reason, it is challenging to curate and organize data to study these factors. We present McNdroid, to our knowledge the largest longitudinal multimodal Android malware benchmark for malware detection and drift analysis. McNdroid spans 2013--2025, excluding 2015,...
722	Accelerated Relax-and-Round for Concave Coverage Problems 2605.06900 Approximation algorithms for concave coverage: Proposes an accelerated relax-and-round method to solve concave coverage optimization problems faster.	cs.LG	Matthew Fahrbach, Mehraneh Liaee, Morteza Zadimoghaddam	We present an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Building on the relax-and-round framework of Barman et al. [STACS 2021], we propose two significant improvements. First, we replace the linear programming (LP) relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a $\widetilde{O}(mn \varepsilon^{-1})$ running time. Second, we use a specialized rounding...
723	You Only Stack Once (YOSO): A Motion-Filtered, Deep-Learning Framework for Detecting Faint Moving Sources 2605.06913 Astronomical moving-object detection: Uses pixel-level motion filtering and deep learning to automatically detect faint, slow-moving celestial objects.	cs.LG	Nitya Pandey, César Fuentes, Pedro Bernardinelli, Valeria Frías, Colin Orion Chandler	We present You Only Stack Once (YOSO), an automated pipeline designed to detect faint, slow-moving Solar System objects in wide-field astronomical surveys. The pipeline integrates a novel Gaussian Motion Filter (GMoF) that operates at the pixel level to enhance signal-to-noise for objects exhibiting a range of apparent rates of motion. Unlike conventional shift-and-stack methods, which rely on discrete velocity trials, GMoF amplifies trails while suppressing random noise and static background fe...
724	Generalising Travel Time Prediction To Varying Route Choices In Urban Networks 2605.06918 Urban network travel time prediction: Proposes a model that distinguishes route choices to predict flows and travel times.	cs.LG	Łukasz Gorczyca, Kacper Drozd, Michał Bujak, Rafał Kucharski	Previous methods that predict system-wide travel time, predominantly grounded in graph neural networks, remain limited to typical and recurring demand patterns. While they successfully predict future congestion following daily commute, they inherently approximate a single demand realisation and fail to capture varying route choices. In this work, we propose a Generalised Travel Time Predictor (GenTTP) that successfully differentiates route choices and offers accurate flow and travel time predict...
725	In-Context Credit Assignment via the Core 2605.06920 In-context credit assignment mechanisms: Uses the least core from cooperative game theory to allocate value and compensation among creators whose work appears in the context.	cs.LG cs.AI	Keegan Harris, Siddharth Prasad, Asher Trockman	We propose incentive-aligned mechanisms for in-context credit assignment: the task of assigning credit for AI-generated content (e.g. code, news articles, short-form videos) among creators whose intellectual property appears in the context window. Our approach is based on the least core solution concept from cooperative game theory, which distributes value in a way that is as stable as possible by ensuring that no subset of creators is significantly under-compensated relative to the value they c...
726	Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices 2605.06929 Generative models for photonic device fields: Uses a conditional flow matching generative model in place of FDTD to predict full-field distributions of photonic devices.	cs.LG	Joseph Quaratiello, Anthony Rizzo	Designing photonic integrated circuits requires accurate electromagnetic field simulations, which remain computationally expensive even for simple device geometries. We present PIC-Flow, a generative neural surrogate that predicts electromagnetic field distributions for photonic devices given their geometry and operating wavelength as an alternative to costly finite-difference time-domain (FDTD) simulations. Our approach combines three key ideas: (i) conditional flow matching as the generative f...
727	Multi-Objective Constraint Inference using Inverse reinforcement learning 2605.06951 Multi-objective constraint inverse RL: Infers multi-objective safety constraints from heterogeneous expert demonstrations and improves efficiency.	cs.LG cs.AI	Syed Ihtesham Hussain Shah, Floris den Hengst, Aneta Lisowska, Annette ten Teije	Constraint inference is widely considered essential to align reinforcement learning agents with safety boundaries and operational guidelines by observing expert demonstrations. However, existing approaches typically assume homogeneous demonstrations (i.e., generated by a single expert or multiple experts with identical objectives). They also have limited ability to capture individual preferences and often suffer from computational inefficiencies. In this paper, we introduce Multi-Objective Const...
728	Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions 2605.06959 High-dimensional piecewise linear regression: Parametrizes with DoMA and uses ABGD to obtain local convergence for high-dimensional piecewise linear regression.	cs.LG	Haitham Kanj, Kiryung Lee	This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis for ABGD is provided under sub-Gaussian covariate and noise distributions. To initialize ABGD, we adapt a prior algorithm originally developed for the simpler setting of max-affine functions. Wh...
729	A Differentiable Bayesian Relaxation for Latent Partial-Order Inference 2605.06976 Bayesian relaxation for latent partial-order inference: Proposes a differentiable Bayesian relaxation to recover latent partial orders from linearly ordered data.	cs.LG	Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo	Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We introduce a differentiable relaxation for latent partial-order inference from such traces. Starting from a hard frontier-constrained model of noisy linear extensions, we replace discontinuous product-order precedence and b...
730	Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data 2605.06989 Limits of K-means in psychometrics: Uses simulated and real data to show that K-means is not equivalent to discovering latent categories.	cs.LG cs.AI	Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez	K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets....
731	Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift 2605.07005 Equivalence of distribution-shift learning models: Proves that PQ learning and TDS learning with rejection are equivalent in the distribution-free setting.	cs.LG	Adam R. Klivans, Shyamal Patel, Konstantinos Stavropoulos, Arsen Vasilyan	Recent work on provably efficient algorithms for learning with distribution shift has focused on two models: PQ learning (Goldwasser et al. (2020)) and TDS learning (Klivans et al. (2024)). Algorithms for TDS learning are allowed to reject a test set entirely if distribution shift is detected. In contrast, PQ learners may only reject points that are deemed out-of-distribution on an individual basis. Our main result is a surprising equivalence between these two models in the distribution-free set...
732	Learning Cross-Atlas Consistent Brain Disorder Representations via Disentangled Multi-Atlas Functional Connectivity Learning 2605.07026 Multi-atlas functional connectivity representations: Uses disentangled multi-atlas learning to obtain cross-atlas consistent brain disorder representations.	cs.LG cs.AI	Minheng Chen, Chao Cao, Jing Zhang, Tianming Liu, Dajiang Zhu	Functional connectivity (FC) derived from resting-state fMRI is widely used to characterize large-scale brain network alterations in neurological and psychiatric disorders. However, FC construction critically depends on the choice of brain atlas, and different parcellations may emphasize distinct organizational features, leading to heterogeneous and sometimes inconsistent representations. Existing multi-atlas approaches partially alleviate this issue but often fuse atlas-derived features or pred...
733	BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis 2605.07029 Bayesian generative instrumental-variable causal inference: Proposes BGM-IV, a latent-variable generative model for nonlinear, high-dimensional IV causal estimation.	cs.LG cs.AI	Guyue Luo, Qiao Liu	Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal relation in observed feature space or rely on learned representations within two-stage or moment-based procedures, which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative mo...
734	Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE 2605.07034 Artifact-reliance detection in static malware classifiers: uses TRUSTEE to identify static classifiers' reliance on non-semantic artifacts such as packing.	cs.LG	Riyazuddin Mohammed, Lan Zhang	Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, transformations such as packing and other non-semantic modifications applied to executable files limit their reliability. Malware classifiers often learn these unnecessary artifacts rather than the true binary behavior because of the high association between maliciousness and packing. Moreover, these malware classifiers are black boxes, making it difficult to understand what they learn. To address ...
735	The Context Gathering Decision Process: A POMDP Framework for Agentic Search 2605.07042 POMDP modeling of agentic search: models context gathering as a POMDP to guide LLM agents toward efficient information retrieval.	cs.LG cs.AI	Chinmaya Kausik, Adith Swaminathan, Nathan Kallus	Large Language Model (LLM) agents are deployed in complex environments -- such as massive codebases, enterprise databases, and conversational histories -- where the relevant state far exceeds their context windows. To navigate these spaces, an agent must iteratively explore the environment to find relevant information. However, without explicit infrastructure, an agent's working memory can degrade into lossy representations of the search state, resulting in redundant work (e.g. repetitive loopin...
736	An Interpretable and Scalable Framework for Evaluating Large Language Models 2605.07046 Interpretable, scalable LLM evaluation: evaluates LLMs with a scalable IRT framework that models item heterogeneity and output stochasticity.	cs.LG cs.AI	Xinhao Qu, Qiang Heng, Hao Zeng, Xiaoqian Liu	Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we ...
737	A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces 2605.07052 RKHS behavioral system identification: generalizes the behavioral approach to vector-valued RKHSs for data-driven modeling of nonlinear systems.	cs.LG	Boya Hou, Maxim Raginsky	We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for...
738	Functional-prior-based Bayesian PDE-constrained inversion using PINNs 2605.07060 Functional priors for PINN-based Bayesian inversion: proposes a unified framework for PINN-based Bayesian PDE inversion with function-space priors.	cs.LG	Ryoichiro Agata, Tomohisa Okazaki	Physics-informed neural networks (PINNs) provide a mesh-free framework for solving PDE-constrained inverse problems, but their extension to Bayesian inversion still faces a fundamental difficulty: prior distributions are typically defined in the weight space of neural networks, whereas physically meaningful prior assumptions are more naturally expressed in function space. In this study, we introduce a unified framework, termed functional-prior-based approaches to Bayesian PDE-constrained inversi...
739	Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks 2605.07065 Interval estimation of individual treatment effects: uses EpiNets to correct finite-sample bias in PNS bounds, yielding more reliable intervals.	cs.LG cs.AI	Gandharv Patil, Keyi Tang, Raquel Aoki, Leo Guelman	Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. W...
740	Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity 2605.07097 Sample complexity of networks definable in o-minimal structures: proves that fixed networks definable in an o-minimal structure have finite PAC sample complexity.	cs.LG	Anastasis Kratsios, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Bum Jun Kim, Simone Brugiapaglia	We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, inclu...
741	TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models 2605.07100 Generative-model-assisted conformal prediction: uses diffusion and flow matching for transport alignment to construct conformal prediction regions for multi-dimensional outputs.	cs.LG	Zhenhan Fang, Aixin Tan, Jian Huang	Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical performance critically depends on the choice of nonconformity score. Existing approaches often rely on restrictive geometric assumptions or require explicit likelihood evaluation and invertible transformations, limiting their applicability in complex generative setting...
742	Classification Fields: Arbitrarily Fine Recursive Hierarchical Clustering From Few Examples 2605.07119 Infinite-depth hierarchical clustering structures: proposes classification fields, which generate arbitrarily refinable hierarchical clusterings from few examples.	cs.LG	Yicen Li, Ruiyang Hong, Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Paul D. McNicholas	Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale refinements that continue beyond the levels directly observed. We introduce classification fields: infinite-depth hierarchical cluster structures on $\mathbb{R}^d$ generated by a local parent-to-child refinement rule. A classification field generator maps each...
743	AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning 2605.07121 Adaptive memory for temporal knowledge graphs: introduces entity-level adaptive memory for TKG reasoning to retain interaction history and improve prediction.	cs.LG cs.AI	Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim	Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose rep...
744	RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation 2605.07129 Retrieval and memory for LLM recommendation: builds more effective recommendation contexts via ranking-driven retrieval over collaborative and meta memories.	cs.LG cs.AI	Shijun Li, Wooseong Yang, Yu Wang, Tianxin Wei, Joydeep Ghosh	Large Language Models (LLMs) have emerged as a promising paradigm for next-generation recommender systems, offering strong semantic understanding and natural-language reasoning abilities. Despite recent progress, current LLM-based recommenders still face key challenges in constructing decision-relevant contexts from heterogeneous evidence. First, existing methods often rely on fixed context construction strategies: collaborative behavioral evidence and item-side metadata are typically incorporat...
745	Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents 2605.07138 Adversarial robustness of empathetic RL models: builds an adversarial empathy benchmark to evaluate the robustness of RLVER agents under manipulative dialogue.	cs.LG cs.AI	Deeraj S K, Sadhana Devarajan, Krishna Mehra, Sudhakar Mishra	Reinforcement learning from verifiable emotion rewards (RLVER) has produced language models with strong empathetic performance, evaluated on benchmarks that assume cooperative, honest users. Yet real emotional interactions systematically violate this assumption: users gaslight, escalate, and pressure AI systems for unconditional validation, dynamics that cooperative benchmarks cannot surface. We construct the Adversarial Empathy Benchmark (AEB) and introduce the Emotional Consistency Score (ECS) to ev...
746	MathlibPR: Pull Request Merge-Readiness Benchmark for Formal Mathematical Libraries 2605.07147 Merge-readiness benchmark for formal-mathematics PRs: proposes the MathlibPR benchmark to evaluate how well models judge the merge-readiness of Lean/Mathlib PRs.	cs.LG cs.AI	Zixuan Xie, Xinyu Liu, Shangtong Zhang	The ecosystem of Lean and Mathlib has become the de facto standard for large language model (LLM) assisted formal reasoning with remarkable successes in recent years. Those successes, however, only consume Mathlib as an essential dependency but do not directly contribute to it. In the meantime, the growth of Mathlib has recently been bottlenecked by the review process, which requires human reviewers to judge whether proposed pull requests (PRs) follow the Mathlib's conventions and are worth inte...
747	Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention 2605.07199 Three-in-one world model for marketing intervention: uses an energy-based model to learn a unified belief representation for prediction and counterfactual marketing inference.	cs.LG cs.AI	Junichiro Niimi	Marketing decisions reflect the interaction of latent consumer heterogeneity, time-varying internal states, and explicit interventions, a structure that current prediction- and language-oriented models do not capture in a unified manner. We propose a Three-in-One world-model architecture in which a Deep Boltzmann Machine (DBM) learns a frozen belief representation from demographics, time, and lagged actions and outcomes, with lightweight task-specific adapters attached on top. The same belief su...
748	Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning 2605.07263 Noncoherent over-the-air federated learning aggregation: proposes REED, which achieves noncoherent OTA-FL aggregation without instantaneous CSI or phase alignment.	cs.LG cs.AI	Hao Chen, Zavareh Bozorgasl	Over-the-air federated learning (OTA-FL) reduces uplink latency by exploiting waveform superposition, but conventional analog aggregation schemes typically require instantaneous channel state information (CSI), channel inversion, and coherent phase alignment, which can be difficult to maintain in practical wireless systems. This paper proposes resource-element energy difference (REED), a noncoherent aggregation primitive for continuous signed updates that avoids instantaneous CSI. REED maps the ...
749	How Big Should a Wireless Foundation Model Be? 2605.07266 Scaling laws for wireless foundation models: uses the channel's nonlinear manifold dimension to bound the scaling ceiling of wireless foundation models.	cs.LG	Wei-Lun Cheng, Wanjiun Liao	Wireless foundation models are rapidly emerging as a key enabler of AI-native communication systems, yet a fundamental question remains unanswered: how large should these models be? We present a principled, physics-grounded answer, showing that the intrinsic dimensionality (dNL, the nonlinear manifold dimension of the channel) acts as the fundamental bottleneck, defining the scaling ceiling once a data-sufficient regime is reached. This dimensionality is not a design choice but a physical constr...
750	Structured Role-Aware Policy Optimization for Multimodal Reasoning 2605.07274 Role-aware RL for multimodal reasoning: proposes role-aware policy optimization so that rewards distinguish visual evidence from textual function.	cs.LG cs.AI	Bingqing Jiang, Difan Zou	Reinforcement learning from verifiable rewards (RLVR), especially with Group Relative Policy Optimization (GRPO), has shown strong potential for improving the reasoning capabilities of large vision-language models (LVLMs). However, in multimodal reasoning, final-answer rewards are typically assigned at the sequence level and do not distinguish the functional roles of different tokens, making it difficult to determine whether a correct answer is supported by task-relevant visual evidence. In this...
751	Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE 2605.07286 Sparse random-feature networks: sparsifies RFNNs with a Krylov-based SVD to solve singularly perturbed ODEs.	cs.LG	Kevin Kurian Thomas Vaidyan, Siddharth Rout	Random-feature neural networks (RFNNs), including architectures with fixed hidden layers and analytically determined output weights, offer fast training but often suffer from issues due to dense representations of the hidden layer activation. Their reliance on dense feature mappings and least squares solvers can limit scalability and numerical stability, particularly for high-dimensional or stiff systems. Specifically, the activation matrix is observed to be low-rank and extremely ill-conditione...
752	Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers 2605.07297 Transformer generalization bounds: proposes spectrum-adaptive post hoc generalization bounds for Transformers.	cs.LG	Mana Sakai, Masaaki Imaizumi	Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc...
753	Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation 2605.07323 ODE discovery with LLMs: discovers ordinary differential equations using LLM-based qualitative and quantitative evaluation.	cs.LG cs.AI	Sum Kyun Song, Bong Gyun Shin, Jae Yong Lee	Discovering governing differential equations from observational data is a fundamental challenge in scientific machine learning. Existing symbolic regression approaches rely primarily on quantitative metrics; however, real-world differential equation modeling also requires incorporating domain knowledge to ensure physical plausibility. To address this gap, we propose DoLQ, a method for discovering ordinary differential equations with LLM-based qualitative and quantitative evaluation. DoLQ employs...
754	Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry 2605.07389 ML team collaboration challenges: surveys collaboration and communication challenges in MLE teams in the semiconductor industry and their impact.	cs.LG	A. Azamnouri, M. Haug, L. Woltmann, M. Fritz, J. Bogner	The integration of machine learning (ML) into complex software systems has increased challenges in collaboration and communication (CoCo) of the teams building these systems. ML engineering (MLE) teams often involve diverse roles, ML engineers, data scientists, software engineers, and domain experts, each bringing unique goals, experiences, and jargon. These interdisciplinary dynamics can make it challenging to deploy, reproduce, and maintain ML-enabled systems over the long term. Previous studi...
755	Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs 2605.07417 DNN fault tolerance memory: proposes lightweight alternatives to ECC to improve the reliability of large-scale DNNs.	cs.LG	Mohammad Hasan Ahmadilivani, Marten Roots, Marco Restifo, Sven-Markus Loorits, Luca Di Mauro	Modern Deep Learning (DL) workloads are increasingly deployed in safety-critical domains, such as automotive systems and hyperscale data centers, where transient hardware faults pose a serious threat to system reliability. These workloads are highly memory-intensive, and their correct functionality strongly depends on model parameters stored in memory, which are typically protected using Error Correction Codes (ECCs). In this work, we study ECC's impact on such models and propose two lightweight...
756	Inference of Qualitative Models from Steady-State Data via Weighted MaxSMT 2605.07433 Qualitative model inference MaxSMT: uses weighted MaxSMT to robustly infer qualitative biological models from steady-state data.	cs.LG	Ondřej Huvar, Nikola Beneš, Martin Jonáš, David Šafránek, Samuel Pastva	Qualitative models provide crucial instruments for modelling complex biological systems. While advances in automated reasoning and symbolic encodings have enabled rigorous inference of these models from data, the process remains highly fragile. First, biological measurement errors inevitably propagate into formal model specifications. Second, when a specification becomes unsatisfiable, distinguishing between fundamental design flaws and minor technical errors is notoriously difficult. This uncer...
757	Breaking QAOA's Fixed Target Hamiltonian Barrier: A Fully Connected Quantum Boltzmann Machine via Bilevel Optimization 2605.07473 Quantum Boltzmann machine QAOA: extends QAOA with bilevel optimization to realize a fully connected quantum Boltzmann machine.	cs.LG	Jun Liu	To overcome the limitations of classical partially connected Boltzmann machines and mainstream quantum Boltzmann machines (QBMs), this work extends the conventional circuit of the quantum approximate optimization algorithm (QAOA) to a bilevel optimization architecture and proposes a fully connected QBM. The inner-loop training simulates positive phase energy minimization based on the computational process of the conventional QAOA circuit, whereas the outer-loop training simulates negative phase ...
758	Efficient Data Selection for Multimodal Models via Incremental Optimization Utility 2605.07488 Multimodal data selection: formulates multimodal data selection as incremental utility ranking to reduce cost.	cs.LG cs.AI	Jinhao Jing, Qiannian Zhao, Chao Huang, Zhan Su	The scaling of Large Multimodal Models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Previous approaches, such as LLM-as-a-Judge, have proven their effectiveness in addressing this but suffer from prohibitive computational costs and lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Instead of relying on semantic heuristics, OST es...
759	LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning 2605.07505 Compact GUI agent distillation: distills lightweight on-device vision-language GUI agents with reinforcement learning.	cs.LG cs.AI	Yubin Wu, Zicheng Cai, Liping Ning, Hua Wang, Zhi Chen	Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. However, current on-device agents are constrained by limited model capacity, and further performance improvements remain urgently needed. Traditional Supervised Fine-Tuning (SFT) for small-scale models often leads to overfitting, catastrophic forgetting and policy rigidity, and thus fails to fully address these challenges. In this work, we propose a novel SFT-free trainin...
760	GESR: Graph-Based Edge Semantic Reconstruction for Stealthy Communication Detection with Benign-Only Training 2605.07536 Graph anomaly network security: detects stealthy communication under benign-only training via graph semantic reconstruction.	cs.LG	Henghui Xu, Yuchen Zhang, Xiaobo Ma	Detecting stealthy malicious communications from flow logs under benign-only training remains a critical challenge in network security. Malicious communications often camouflage as normal traffic like standard HTTPS flows. Conventional intrusion detectors rely strictly on known labeled attacks. Alternatively, they score flows completely independently. These approaches fail against sparse and context-dependent suspicious activity. To capture this essential context, graph anomaly detectors have be...
761	A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning 2605.07596 Contrastive learning generalization: gives a refined generalization and sample-complexity analysis of extreme multi-class supervised contrastive learning.	cs.LG	Nong Minh Hieu, Antoine Ledent	Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are identically and independently distributed, an assumption violated in most practical settings where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting...
762	Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling 2605.07634 Heavy-tailed robust optimization: uses medoid mini-batch gradient sampling to improve SGD robustness under heavy-tailed noise.	cs.LG	Manojlo Vukovic, Dusan Jakovetic	We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In...
763	Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding 2605.07637 Multi-agent pathfinding communication: learns local communication policies to scale multi-agent pathfinding to large instances.	cs.LG cs.AI	Valeriy Vyaltsev, Alsu Sagirova, Anton Andreychuk, Yuri Kuratov, Konstantin Yakovlev	Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (fr...
764	Quotient Semivalues for False-Name-Resistant Data Attribution 2605.07663 False-name-resistant data valuation: proposes the quotient semivalue mechanism for data attribution that resists false-name manipulation.	cs.LG	Florian A. D. Burnat, Brittany I. Davidson	Data valuation methods allocate payments and audit training data's contribution to machine-learning pipelines; however, they often assume passive contributors. In reality, contributors can split datasets across pseudonymous identities, duplicate high-value examples, create near-duplicates, or launder synthetic variants to inflate their share. We formalize this as false-name manipulation in ML data attribution. Our main construction is the quotient semivalue mechanism: compute Shapley-, Banzhaf-,...
765	Debiased Counterfactual Generation via Flow Matching from Observations 2605.07665 Counterfactual generation flow matching: uses flow matching to exploit the link to the observational distribution and generate debiased counterfactual distributions.	cs.LG	Hugh Dance, Johnny Xi, Peter Orbanz, Benjamin Bloem-Reddy	Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close und...
766	Differentially Private Auditing Under Strategic Response 2605.07674 DP auditing with strategic agents: models differentially private auditing as a Stackelberg game and optimizes the query budget.	cs.LG	Florian A. D. Burnat	Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the w...
767	FactoryBench: Evaluating Industrial Machine Understanding 2605.07675 Industrial telemetry benchmark: releases FactoryBench to evaluate time-series understanding and causal question answering for industrial robots.	cs.LG, cs.AI	Yanis Merzouki, Coral Izquierdo, Matei Ignuta-Ciuncanu, Marcos Gomez-Bracamonte, Riccardo Maggioni	We introduce FactoryBench, a benchmark for evaluating time-series models and LLMs on machine understanding over industrial robotic telemetry. Q&A pairs are organized along four causal levels (state, intervention, counterfactual, decision) instantiating Pearl's ladder of causation, and span five answer formats: four structured formats are scored deterministically and free-form answers are scored by an LLM-as-judge voting protocol. We propose a scalable Q&A generation framework built around struct...
768	Physics-Informed Reduced-Order Operator Learning for Hyperelasticity in Continuum Micromechanics 2605.07738 Physics-informed reduced operator learning: combines EquiNO with Q-DEIM for reduced-order operator learning in hyperelastic micromechanics.	cs.LG	Hamidreza Eivazi, Henning Wessels	Physics-informed operator learning is an attractive candidate for surrogate modeling of microstructures, especially in multiscale finite-element simulations. Its practical use, however, is often limited by the high cost of loss evaluation. We address this bottleneck by combining the Equilibrium Neural Operator (EquiNO) with the QR-based discrete empirical interpolation method (Q-DEIM). EquiNO learns only the modal coefficients of reduced displacement-fluctuation and first Piola-Kirchhoff stress ...
769	Flow Matching for Count Data 2605.07746 Flow matching for counts: proposes a flow-matching generative modeling method for high-dimensional count data.	cs.LG	Ganchao Wei, John Pearson	High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous sp...
770	SMT-Based Active Learning of Weighted Automata 2605.07758 SMT-based active automata learning: uses SMT to actively learn nondeterministic weighted automata with a minimality guarantee.	cs.LG	Tiago Ferreira, Kevin Batz, Alexandra Silva	We present an SMT-based active learning algorithm for nondeterministic weighted automata (WFAs) as a practical and robust alternative to Hankel/L-style methods. Our algorithm is parametric in a given semiring and, if it terminates, guaranteed to produce minimal WFAs. We prove partial correctness and provide a sufficient termination condition, which in particular implies termination for all finite semirings. Our extensive experimental evaluation shows that our algorithm is capable of learning nu...
771	Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems 2605.07768 Distributionally robust MPC planning: learns the decision distribution of surrounding vehicles and plans interactively with PAC learning plus distributionally robust MPC.	cs.LG	Erik Börve, Nikolce Murgovski, Morteza Haghir Chehreghani, Leo Laine	We investigate interactive trajectory planning subject to uncertainty in the decisions of surrounding agents. To control the ego-agent, we aim to first learn the decision distribution and solve a Stochastic Model Predictive Control (SMPC) problem. To account for errors in the learned distribution, we show that it is possible to utilize Probably Approximately Correct (PAC) learning in combination with Distributionally Robust (DR) optimization to obtain a solution which accounts for the errors ind...
772	GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification 2605.07812 Self-supervised graph intrusion detection: uses self-supervised graph classification for anomaly detection on provenance graphs to identify APT attacks.	cs.LG	Robin Buchta, Carsten Kleiner, Felix Heine, Gabi Dreo Rodosek	Advanced persistent threat (APT) attacks remain difficult to detect due to their stealth, adaptability, and use of legitimate system components. Provenance-based intrusion detection systems (PIDS) offer a promising defense by capturing detailed relationships between system components and actions. However, current PIDS rely on predefined or subset-determined thresholds, which limit detection stability and the ability to detect any anomalous behavior in general. Furthermore, related work often neg...
773	NSPOD: accelerating the convergence of Krylov-based iterative linear solvers via approximated PODs 2605.07828 Krylov solver acceleration POD: uses approximated PODs to accelerate the convergence of Krylov iterative linear solvers for parametric PDEs.	cs.LG	Francesc Levrero-Florencio, Youngkyu Lee, Jay Pathak, George Em Karniadakis	The convergence of Krylov-based linear iterative solvers applied to parametric partial differential equations (PDEs) is often highly sensitive to the domain, its discretization, the location/values of the applied Dirichlet/Neumann boundary conditions, body forces and material properties, among others. We have previously introduced hybridization of classical linear iterative solvers with neural operators for specific geometries, but they tend to not perform well on geometries not previously seen ...
774	PPI-Net connects molecular protein interactions to functional processes in disease 2605.07838 Hierarchical GNN for disease: uses a hierarchical graph network that fuses PPI and pathway representations to model disease mechanisms.	cs.LG, cs.AI	Kyle Higgins, Guadalupe Gonzalez, Dennis Veselkov, Ivan Laponogov, Kirill Veselkov	Understanding how molecular alterations propagate across biological systems to drive disease remains a central challenge. Although high-throughput profiling enables comprehensive characterization of tumor states, most models neglect structured biological relationships or lack interpretability across scales. Here we present PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction (PPI) networks with pathway-level representations to model disease from molecular inte...
775	Characterizing and Correcting Effective Target Shift in Online Learning 2605.07886 Online learning target shift: characterizes the effective target shift in online kernel regression and gives a correction method.	cs.LG	Ziyan Li, Naoki Hiratani	Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, in...
776	Statistical inference with belief functions: A survey 2605.07908 Belief function statistical inference: surveys statistical inference and learning methods based on the belief-function framework.	cs.LG, cs.AI	Fabio Cuzzolin	Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the ar...
777	Exploring the non-convexity in machine learning using quantum-inspired optimization 2605.07947 Quantum-inspired nonconvex optimization: uses quantum-inspired global search to solve non-convex learning problems with outliers.	cs.LG, cs.AI	Kandula Eswara Sai Kumar, Parth Dhananjay Danve, Abhishek Chopra, Rut Lineswala	The escalating complexity of modern machine learning necessitates solving challenging non-convex optimization problems, particularly in high-dimensional regimes and scenarios contaminated by gross outliers. Traditional approaches, relying on convex relaxations or specialized local search heuristics, frequently succumb to suboptimal local minima and fail to recover the true underlying discrete structures. In this paper, we propose treating these non-convex challenges as a global search problem an...
778	Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means 2605.07964 Bayes-assisted confidence sequences: proposes Bayes-assisted confidence sequences for bounded means that achieve log-optimality.	cs.LG	Valentin Kilian, Stefano Cortinovis, François Caron	Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively c...
779	Linear Response Estimators for Singular Statistical Models 2605.07970 Linear response in singular models: defines and consistently estimates linear-response susceptibilities of singular statistical models.	cs.LG	Chris Elliott, Daniel Murfet	We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.
780	Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems 2605.08006 Bilevel minimax optimization methods: proposes penalty-based first-order methods for bilevel optimization with minimax and constrained lower-level problems.	cs.LG	Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin	We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here....
781	Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction 2605.08022 Spiking Neural Network Training: achieves globally optimal SNN training via parameter reconstruction, avoiding the accumulation of surrogate-gradient errors.	cs.LG, cs.AI	Himanshu Udupi, Xiaocong Yang, ChengXiang Zhai	Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks (ANNs). However, the training of SNN usually relies on surrogate gradients due to the non-differentiability of the spike function, introducing approximation errors that accumulate across layers. To address this challenge, we extend the work on convexification of parallel feedforward threshold networks to parallel recurrent threshold networks, wh...
782	Semiparametric Efficient Test for Interpretable Distributional Treatment Effects 2605.08034 Distributional Treatment Effect Testing: proposes a semiparametrically efficient finite-location test that localizes distributional differences caused by treatment effects.	cs.LG	Houssam Zenati, Arthur Gretton	Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome laws, but global tests do not reveal where the laws differ. We propose DR-ME, to our knowledge the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. DR-ME evaluates an interventional kernel witness at ...
783	PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting 2605.08035 RF Propagation Field Reconstruction: uses 3D anisotropic Gaussian propagation splatting to reconstruct RF fields and path loss without maps.	cs.LG	William Bjorndahl, Maninder Pal Singh, Farhad Nouri, Joseph Camp	Building a site-specific propagation model typically requires either ray-tracing over detailed 3D maps or dense measurement campaigns. Both approaches are expensive and often infeasible for rapid deployments where geographic data is unavailable or outdated. We present PropSplat, a map-free propagation modeling method that reconstructs radio frequency (RF) fields using 3D anisotropic Gaussian primitives. Each Gaussian encodes a scalar path loss offset relative to an explicit baseline path loss mo...
784	A Note on Non-Negative $L_1$-Approximating Polynomials 2605.08072 Nonnegative L1 Approximating Polynomials: studies the existence and properties of non-negative $L_1$-approximating polynomials for indicator functions under Gaussian distributions.	cs.LG	Jane H. Lee, Anay Mehrotra, Manolis Zampetakis	$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently f...
785	Multi-Stage Prototype Learning for Interpretable Time Series Classification 2106.09636 Interpretable Time Series Classification: proposes multi-stage prototype learning that extracts interpretable single-variable and cross-variable temporal patterns for classification.	cs.LG	Bhavesh Kalisetti, Vincent Wang, Gaurav R. Ghosal, Maryam Bijanzadeh, Reza Abbasi-Asl	Deep learning methods are powerful tools in classifying multivariate time series data. Despite their high performance, these methods are hard to interpret, which diminishes their applications in high-risk domains such as healthcare. In this paper, we propose a novel multi-stage prototype learning framework for multivariate time series classification. By design, our framework identifies predictive temporal patterns in individual variables as well as cross-variable patterns that are highly predict...
786	Testing Noise Assumptions of Learning Algorithms 2501.09189 Testing Learning Noise Assumptions: gives an efficient algorithm to test whether a training set satisfies the assumptions of a given noise model, and extends the testable learning framework.	cs.LG	Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan	We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework o...
787	Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS 2502.17500 Generalized Logarithm for Optimization: systematically analyzes the generalized Euler logarithm and connects it to learning algorithms such as natural gradient and mirror descent.	cs.LG, cs.AI	Andrzej Cichocki	This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle--Tsallis, Kaniadakis--Scarfone, and Tempes...
788	No Forgetting Learning: Buffer-free Continual Learning Classification 2503.04638 Buffer-free Continual Learning: proposes a replay-buffer-free continual learning framework that exploits overparameterization redundancy, decomposing the network into a shared backbone and task heads to reduce forgetting.	cs.LG	Mohammad Ali Vahedifar, Qi Zhang	Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then appli...
789	A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback 2503.12285 Bicriteria Bandit Combinatorial Optimization: proposes a resilience framework and an offline-to-online reduction for bi-criteria combinatorial optimization under noisy bandit feedback.	cs.LG, cs.AI	Vaneet Aggarwal, Shweta Jain, Subham Pokhriyal, Christopher John Quinn	We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(\alpha,\beta,\delta,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation ...
790	Synergistic Benefits of Joint Molecule Generation and Property Prediction 2504.16559 Joint Molecule Generation Prediction: proposes Hyformer for joint molecule generation and property prediction, achieving synergistic gains through alternating attention.	cs.LG	Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz	Modeling the joint distribution of data samples and their properties allows to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism ...
791	Identifiability Challenges in Sparse Linear Ordinary Differential Equations 2506.09816 Identifiability in Sparse Linear ODEs: analyzes the identifiability difficulties and conditions when learning sparse linear ordinary differential equations from data.	cs.LG	Cecilia Casolo, Sören Becker, Niki Kilbertus	Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that "linear ordinary differential equations (ODE) are almost surely identi...
792	From Time Series Analysis to Question Answering: A Survey in the LLM Era 2506.11512 LLMs for Time Series QA Survey: surveys progress and challenges in bridging time series analysis with language tasks such as question answering in the LLM era.	cs.LG, cs.AI	Wei Li, Zhe Xie, Yuxuan Liang, Xinli Hao, Yunyao Cheng	Recently, Large Language Models (LLMs) have introduced a novel paradigm in Time Series Analysis (TSA), leveraging strong language capabilities to support tasks such as forecasting and anomaly detection. However, these analysis tasks cannot adequately cover temporal language tasks, such as interpretation and captioning. A fundamental gap remains between TSA and LLMs: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA. To b...
793	HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs 2506.12362 Inductive Hypergraph Link Prediction: proposes HYPER, a foundation model for knowledge hypergraphs that performs inductive link prediction over novel entities and novel relation types.	cs.LG, cs.AI	Xingyue Huang, Mikhail Galkin, Michael M. Bronstein, İsmail İlkan Ceylan	Inductive link prediction with knowledge hypergraphs is the task of predicting missing hyperedges involving completely novel entities (i.e., nodes unseen during training). Existing methods for inductive link prediction with knowledge hypergraphs assume a fixed relational vocabulary and, as a result, cannot generalize to knowledge hypergraphs with novel relation types (i.e., relations unseen during training). Inspired by knowledge graph foundation models, we propose HYPER as a foundation model fo...
794	Flat Channels to Infinity in Neural Loss Landscapes 2506.14951 Neural Loss Landscape Channels: characterizes flat channel structures in the loss landscape, explaining the divergence and merging of neuron weights.	cs.LG, cs.AI	Flavio Martinelli, Alexander Van Meegen, Berfin Şimşek, Wulfram Gerstner, Johanni Brea	The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm$infinity, and their input weight vectors, $\mathbf{w_i}$ and $\mathbf{w_j}$, become equal to each other. At convergence, the two neurons implement a gate...
795	Discovering Learning-Friendly Generation Orders for Sequential Computation 2506.23875 Learning-friendly Generation Order Discovery: uses loss profiling to automatically search for generation orders for sequential computation that make training converge more easily.	cs.LG, cs.AI	Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera	Sequential computation via autoregressive generation can make difficult tasks learnable, but the generation order of intermediate states strongly affects whether training succeeds. We address the problem of discovering a learning-friendly target order automatically, rather than relying on task-specific design. Our key observation is that learning-friendly orders cause a faster loss drop in the early stage of training. We exploit this by \emph{loss profiling}, which ranks candidate orders by the ...
796	Scalable Equilibrium Propagation via Intermediate Error Signals for Deep Convolutional CRNNs 2508.15989 Scalable Equilibrium Propagation: proposes a scalable equilibrium propagation training method with intermediate error signals for deep convolutional CRNNs.	cs.LG	Jiaqi Lin, Malyaban Bal, Abhronil Sengupta	Equilibrium Propagation (EP) is a biologically inspired local learning rule first proposed for convergent recurrent neural networks (CRNNs), in which synaptic updates depend only on neuron states from two distinct phases. EP estimates gradients that closely align with those computed by Backpropagation Through Time (BPTT) while significantly reducing computational demands, positioning it as a potential candidate for on-chip training in neuromorphic architectures. However, prior studies on EP have...
797	Normalized Maximum Likelihood Code-Length on Riemannian Data Spaces 2508.21466 NML on Riemannian Manifolds: extends the normalized maximum likelihood code-length to Riemannian data spaces for model selection and regret minimization.	cs.LG	Kota Fukuzawa, Atsushi Suzuki, Kenji Yamanishi	In recent years, with the large-scale expansion of graph data, there has been an increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, and they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and ar...
798	Scalable Option Learning in High-Throughput Environments 2509.00338 Scalable Hierarchical Reinforcement Learning: proposes SOL, a scalable option learning algorithm that improves the training efficiency of hierarchical reinforcement learning in high-throughput environments.	cs.LG, cs.AI	Mikael Henaff, Scott Fujimoto, Michael Matthews, Michael Rabbat	Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, while promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling online hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm which achieves a ~35x higher throughput compared to existing hierarchical ...
799	Ensemble Learning for Healthcare: A Comparative Analysis of Hybrid Voting and Ensemble Stacking in Obesity Risk Prediction 2509.02826 Ensemble Obesity Risk Prediction: compares the performance of hybrid voting and stacking ensembles for obesity risk prediction.	cs.LG, cs.AI	Towhidul Islam, Md Sumon Ali	Obesity is a critical global health issue driven by dietary, physiological, and environmental factors, and is strongly associated with chronic diseases such as diabetes, cardiovascular disorders, and cancer. Machine learning has emerged as a promising approach for early obesity risk prediction, yet a comparative evaluation of ensemble techniques -- particularly hybrid majority voting and ensemble stacking -- remains limited. This study aims to compare hybrid majority voting and ensemble stacking...
800	Mechanistic Interpretability with Sparse Autoencoder Neural Operators 2509.03738 Sparse Autoencoder Neural Operators: Proposes sparse autoencoder operators in function space, using function-valued concepts for mechanistically interpretable representations.	cs.LG cs.AI	Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar	We introduce sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean representations. We formalize the functional representation hypothesis, where data are explained through sparse compositions of structured functions. Unlike standard SAEs that represent concepts with scalar activations, SAE-NOs parameterize concepts as functions, enabling representations that capture not only a concept's presence, ...
801	Hammer and Anvil: Toward a Theory of Backdoors in Federated Learning 2509.08089 Federated Learning Backdoor Theory: Proposes the Hammer and Anvil theoretical framework, which characterizes FL backdoors by update deviation and analyzes defense types.	cs.LG	Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum	Federated Learning (FL) enables distributed model training but is vulnerable to backdoor attacks, where malicious clients embed attacker-controlled behaviors into the global model. Existing defenses fail against adaptive adversaries. In this paper, we present "Hammer and Anvil", a principled theoretical framework that categorizes backdoors by the deviation, $\delta$, of their updates to the mean of the updates. We identify two fundamental defense types: "Type 1 (The Anvil)", comprising outlier d...
802	Inverse Reinforcement Learning with Just Classification and a Few Regressions 2509.21172 Normalized Inverse Reinforcement Learning: Studies reward recoverability under statewise affine normalizations in maximum-entropy IRL, requiring only classification and a few regressions.	cs.LG	Lars van der Laan, Nathan Kallus, Aurelien Bibaut	Inverse reinforcement learning (IRL) aims to infer rewards from observed behavior, but rewards are not identified from the policy alone: many reward-value pairs can rationalize the same actions. Meaningful reward recovery therefore requires a normalization, yet existing normalized IRL methods often rely on anchor-action restrictions or specialized neural architectures. We study reward recovery in the maximum-entropy, or Gumbel-shock, model under a broad class of statewise affine normalizations,...
803	BoHA: Blockwise Hadamard Product Adaptation for Parameter-Efficient Fine-Tuning 2509.21637 Parameter-efficient Fine-tuning Adaptation: Proposes BoHA, a blockwise Hadamard adaptation for PEFT, with a focus on retention under sequential fine-tuning.	cs.LG	Feng Yu, Jia Hu, Geyong Min	Parameter-efficient fine-tuning (PEFT) of large language models trains a small task-specific parameter set while keeping the pretrained model frozen. The dominant Low-Rank Adaptation (LoRA) family makes this trade-off practical; however, evaluations under the same parameter budget assess single-task accuracy. In sequential adaptation settings, such evaluations should also measure how well performance on the first-stage task is retained after subsequent fine-tuning. To address this gap, we introd...
804	Limitations on Accurate, Trusted, Human-level Reasoning 2509.21654 Limits of Trusted Reasoning: Proves, under strict definitions, a fundamental incompatibility among accuracy, trust, and human-level reasoning.	cs.LG cs.AI	Rina Panigrahy, Vatsal Sharan	We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or e...
805	Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting 2509.24789 Time Series Forecasting Benchmark: Builds Fidel-TS, a high-fidelity multimodal forecasting benchmark emphasizing data integrity and leak-free evaluation.	cs.LG	Zhijian Xu, Wanxu Cai, Xilin Dai, Zhaorong Deng, Qiang Xu	The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small-scale, low-frequency, pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and st...
806	TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning 2509.26524 Federated Foundation Model Personalization: Proposes TAP, a two-stage adaptive personalization method that handles multi-task and multi-modal heterogeneity in federated settings.	cs.LG cs.AI	Seohyun Lee, Wenzhi Fang, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton	In federated learning (FL), local personalization of models has received significant attention, yet personalized fine-tuning of foundation models remains underexplored. In particular, there is a lack of understanding in the literature on how to personalize foundation models in settings where there exists heterogeneity not only in data, but also in tasks and modalities across the clients. To address this gap, we propose Two-Stage Adaptive Personalization (TAP). In the first stage, TAP leverages mi...
807	DReS: Dual Reconstruction Smoothing for Functional Regularization 2510.00253 Smoothness Regularization via Reconstruction: Proposes DReS, a dual reconstruction smoothing regularizer that induces function smoothness without explicit gradient overhead.	cs.LG	Parsa Moradi, Tayyebeh Jahaninezhad, Hanzaleh Akbarinodehi, Mohammad Ali Maddah-Ali	Smoothness is a key inductive bias in machine learning and is closely related to generalization. Existing smoothness-inducing methods typically rely either on explicit gradient regularization, which often incurs substantial computational and memory overhead, or on data-mixing strategies, which are less naturally applicable to unsupervised and self-supervised settings. In this work, we propose Dual Reconstruction Smoothing (DReS), a nonparametric regularization framework that induces s...
808	Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension 2510.01105 Regression Collapse Geometry: Uses intrinsic dimension to analyze neural regression collapse and explain why it degrades regression performance.	cs.LG	George Andriopoulos, Zixuan Dong, Bimarsha Adhikari, Keith Ross	Neural multivariate regression underpins a wide range of domains, including control, robotics, and finance, yet the geometry of its learned representations remains poorly characterized. While neural collapse has been shown to benefit generalization in classification, we find that analogous collapse in regression consistently degrades performance. To explain this contrast, we analyze regression models through the lens of intrinsic dimension. Across control tasks and synthetic datasets, we estimat...
809	ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models 2510.01290 KV Cache Compression for Reasoning: Proposes ThinKV, which adaptively compresses the KV cache by thought type, using quantization and eviction to reduce inference GPU memory.	cs.LG	Akshat Ramachandran, Marina Neseem, Charbel Sakr, Rangharajan Venkatesan, Brucek Khailany	The long-output context generation of large reasoning models enables extended chain of thought (CoT) but also drives rapid growth of the key-value (KV) cache, quickly overwhelming GPU memory. To address this challenge, we propose ThinKV, a thought-adaptive KV cache compression framework. ThinKV is based on the observation that attention sparsity reveals distinct thought types with varying importance within the CoT. It applies a hybrid quantization-eviction strategy, assigning token precision by ...
810	Flock: A Knowledge Graph Foundation Model via Learning on Random Walks 2510.01510 Knowledge Graph Foundation Model: Proposes Flock, a KG foundation model that learns on random walks and generalizes to zero-shot link prediction.	cs.LG	Jinwoo Kim, Xingyue Huang, Krzysztof Olejniczak, Kyungbin Min, Michael Bronstein	We study the problem of zero-shot link prediction on knowledge graphs (KGs), which requires models to generalize to novel entities and novel relations. Knowledge graph foundation models (KGFMs) address this task by enforcing equivariance over both nodes and relations, which enables them to learn structural properties of nodes and relations that transfer to novel KGs with similar structure. However, the conventional notion of deterministic equivariance inherently limits the expressive power of KG...
811	Closed-Form Last Layer Optimization 2510.04606 Closed-form last-layer training: Updates the final linear layer in closed form and optimizes only the backbone parameters.	cs.LG	Alexandre Galashov, Nathaël Da Costa, Liyuan Xu, Philipp Hennig, Arthur Gretton	Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution to the linear last layer weights is known in closed-form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters, and optimizing solely for these parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the ...
812	Amortized Multi-Objective Optimization Across Tasks with Generative Solution Modeling 2511.09598 Amortized multi-objective optimization: Uses a generative solution model to amortize expensive multi-objective optimization across tasks.	cs.LG	Tingyang Wei, Jiao Liu, Abhishek Gupta, Chin Chun Ooi, Puay Siew Tan	Many real-world applications require solving families of expensive multi-objective optimization problems (EMOPs) under varying operational conditions. This can be formulated as parametric expensive multi-objective optimization problems (P-EMOPs) where each task parameter defines a distinct optimization instance. Current multi-objective Bayesian optimization methods have been widely used for finding finite sets of Pareto optimal solutions for each task. However, P-EMOPs present a fundamental chal...
813	Outlier Smoothing with Closed-Form Rotations for W4A4 Large Language Model Quantization 2511.22316 LLM W4A4 quantization rotations: Uses closed-form rotations and outlier smoothing to improve convergence and accuracy of W4A4 quantization.	cs.LG	Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun	Quantization of Large Language Models (LLMs) facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization truncation lead to serious convergence pathology. This prolongs quantization time and degrades LLMs' task performance. Our studies confirm that the Straight-Through Estimator (STE) on Stiefel manifolds introduces non-smoothness and gradient noise, obstructing optimization convergence and blocking high-fidelity quantize...
814	Faster Verified Explanations for Neural Networks 2512.00164 Verified neural explanations acceleration: Proposes FaVeX to accelerate the computation of verified explanations for neural networks.	cs.LG	Alessandro De Palma, Greta Dolcetti, Caterina Urban	Verified explanations are a principled way to explain the decisions taken by neural networks, which are otherwise black-box in nature. However, these techniques face significant scalability challenges, as they require multiple calls to neural network verifiers, each of them with an exponential worst-case complexity. We present FaVeX, a novel algorithm to compute verified explanations. FaVeX accelerates the computation by dynamically combining batch and sequential processing of input features, an...
815	ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms 2512.03476 Agentic scientific computing framework: Proposes ATHENA, an agentic framework that automates the end-to-end development of numerical algorithms.	cs.LG cs.AI	Juan Diego Toscano, Daniel T. Chen, George Em Karniadakis	Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an onli...
816	Neural CDEs as Correctors for Learned Time Series Models 2512.12116 Neural CDE predictor-corrector: Uses a neural CDE as a corrector to reduce error accumulation in multi-step time series forecasting.	cs.LG	Muhammad Bilal Shahid, Zhanhong Jiang, Prajwal Koirala, Soumik Sarkar, Cody Fleming	Learned time-series models, whether continuous or discrete, are widely used for forecasting the states of dynamical systems but suffer from error accumulation in multi-step forecasts. To address this issue, we propose a Predictor-Corrector framework in which the Predictor is a learned time-series model that generates multi-step forecasts and the Corrector is a neural controlled differential equation that corrects the forecast errors. The Corrector works with irregularly sampled time series and i...
817	Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics 2512.12602 Exact-flow linear attention: Replaces the Euler update with the exact continuous-time flow to obtain exact linear attention.	cs.LG	Jingdi Lei, Di Zhang, Soujanya Poria	In this paper, we introduce Exact Flow Linear Attention (EFLA), an exact-flow formulation of delta-rule linear attention. We show that the delta-rule update can be interpreted as an explicit Euler discretization of an underlying continuous-time system. EFLA replaces this first-order update with the exact closed-form flow. By exploiting the rank-1 structure of the dynamics matrix, both the matrix exponential and the input integral collapse to a simple update that preserves delta-rule linear atten...
818	DT-PBO: an Interpretable Tree-based Surrogate Model for Preferential Bayesian Optimization 2512.14263 Interpretable preferential Bayesian optimization: Replaces the GP with an interpretable tree model for preferential Bayesian optimization.	cs.LG cs.AI	Nick Leenders, Thomas Quadt, Boris Cule, Roy Lindelauf, Herman Monsuur	Preferential Bayesian Optimization (PBO) aims to find a decision-maker's most preferred solution in as few pairwise comparisons as possible. Existing approaches rely on Gaussian Process (GP) surrogates, which provide strong performance but limited interpretability. This limits real-world usability in high-stakes domains, such as healthcare, where interpretability and trust are essential. We propose DT-PBO, a novel tree-based surrogate model for PBO that is inherently interpretable while capturin...
819	DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations 2512.17129 Differentiable morphogenesis simulation: Uses differentiable agent-based simulation to learn 3D morphogenesis protocols end to end.	cs.LG	Seong Ho Pahng, Guoye Guan, Benjamin Fefferman, Sahand Hormoz	Biological systems can form complex three-dimensional structures through the collective behavior of agents that share a common update rule and operate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that g...
820	Bloom Filter Encoding for Machine Learning 2512.19991 Bloom filter feature encoding: Uses Bloom filter hash encoding to compress samples into fixed-length bit features.	cs.LG	John Cartmell, Mihaela Cardei, Ionut Cardei	We present a method that uses a Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact bit-array representation using hash-based encoding, producing a fixed-length feature space that reduces memory usage and obfuscates original feature values. The encoding does not rely on keyed hashing; however, a key can optionally be used to control the mapping and would be required to reproduce the representation. We evaluate the approach on six datasets spannin...
821	Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions 2512.20974 Deep Bayesian RL with GLMs: Learns basis functions in deep Bayesian RL and models transitions and rewards with GLMs.	cs.LG cs.AI	Jingyang You, Hanna Kurniawati	Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference. This oft...
822	SB-TRPO: Towards Safe Reinforcement Learning with Hard Constraints 2512.23770 Hard-constrained safe reinforcement learning: Proposes SB-TRPO, which achieves near-zero-violation safe RL within a trust region.	cs.LG cs.AI	Dominik Wagner, Ankit Kanwar, Luke Ong	In safety-critical domains, reinforcement learning (RL) agents must often satisfy strict, zero-cost safety constraints while accomplishing tasks. Existing model-free methods frequently either fail to achieve near-zero safety violations or become overly conservative. We introduce Safety-Biased Trust Region Policy Optimisation (SB-TRPO), a principled algorithm for hard-constrained RL that dynamically balances cost reduction with reward improvement. At each step, SB-TRPO updates via a dynamic conve...
823	FANoS-v2: Feedback-Controlled Momentum with Thermostat Damping for Lightweight Neural Optimization 2601.00889 Feedback-controlled neural optimizer: Gives a complete mathematical specification and stability diagnostics for the FANoS-v2 optimizer.	cs.LG	Nalin Dhiman	FANoS is a PyTorch optimizer that augments RMS-preconditioned momentum with a scalar feedback controller over update energy. The public reference implementation stores momentum in parameter-update units, applies a non-negative thermostat damping coefficient, supports diagonal, factored, and raw-gradient preconditioning, and exposes diagnostics intended for stability audits. This study gives a complete mathematical specification of the released optimizer, including the exact parameter-unit upd...
824	Partially Lazy Gradient Descent for Smoothed Online Learning 2601.15984 Partially lazy online gradient descent: Proposes k-lazyGD, which trades off reactivity and stability in SOCO, with guarantees.	cs.LG	Naram Mhaisen, George Iosifidis	We introduce $k$-lazyGD, an online learning algorithm that bridges the gap between greedy Online Gradient Descent (OGD, for $k{=}1$) and lazy GD/dual-averaging (for $k{=}T$), creating a spectrum between reactive and stable updates. We analyze this spectrum in Smoothed Online Convex Optimization (SOCO), where the learner incurs both hitting and movement costs. Our main contribution is establishing that laziness is possible without sacrificing hitting performance: we prove that \textsc{$k...
825	ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule 2601.18681 RL timestep scheduling for diffusion: Uses reinforcement learning to adaptively reparameterize the timestep schedule and improve diffusion sampling efficiency.	cs.LG cs.AI	Yilie Huang, Wenpin Tang, Xunyu Zhou	We consider time discretization for score-based diffusion models to generate samples from a learned reverse-time dynamic on a finite grid. Uniform and hand-crafted grids can be suboptimal given a budget on the number of time steps. We introduce Adaptive Reparameterized Time (ART), which controls the clock speed of a reparameterized time variable to redistribute computation along the sampling trajectory while preserving the terminal time, with the objective of minimizing the aggregate Euler discr...
826	R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes 2601.20599 Singular-regime GTD analysis: Gives a geometric convergence analysis of GTD learning when the feature interaction matrix is singular.	cs.LG cs.AI	Hyunjun Na, Donghwan Lee	Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. While some prior works have applied regularization to relax the nonsingularity assumption, their theoretical guarantees inevitably rel...
827	Exact Gaussian Moment Matching for Residual Networks: a Second-Order Method 2601.22307 Exact Gaussian moment matching: Derives exact propagation of Gaussian means and covariances through residual networks for several activations.	cs.LG	Simon Kuang, Xinfan Lin	We study the problem of propagating the mean and covariance of a general multivariate Gaussian distribution through a deep (residual) neural network using layer-by-layer moment matching. We close a longstanding gap by deriving exact moment matching for the probit, GeLU, ReLU (as a limit of GeLU), Heaviside (as a limit of probit), and sine activation functions; for both feedforward and generalized residual layers. On random networks, we find orders-of-magnitude improvements in the KL divergence e...
828	Sparse Attention as Compact Kernel Regression 2601.22766 Sparse attention kernel interpretation: Shows that sparse attention is equivalent to kernel regression with compact (bounded-support) kernels.	cs.LG	Saul Santos, Nuno Gonçalves, Daniel C. McNamee, Marcos Treviso, André F. T. Martins	Recent work has revealed a link between self-attention mechanisms in transformers and test-time kernel regression via the Nadaraya-Watson estimator, with standard softmax attention corresponding to a Gaussian kernel. However, a kernel-theoretic understanding of sparse attention mechanisms is currently missing. In this paper, we establish a formal correspondence between sparse attention and compact (bounded support) kernels. We show that normalized ReLU and sparsemax attention arise from Epanechn...
829	PAIR-Former: Budgeted Relational Multi-Instance Learning for Functional miRNA Target Prediction 2602.00465 Relational MIL for miRNA targets: Proposes multi-instance learning with budgeted relational aggregation to predict miRNA targets.	cs.LG cs.AI	Jiaqi Yin, Baiming Chen, Jia Fei, Mingjun Yang	Functional miRNA-mRNA targeting is a large-bag prediction problem where each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. Prior methods use max-pooling over individual CTS scores, ignoring relational patterns among sites, but modeling these patterns is critical for accuracy. The challenge is that naive relational aggregation incurs $\mathcal{O}(n^2)$ cost, prohibitive when $n$ reaches thousands, yet a cheap scan alone discards ...
830	Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs 2602.00513 Verifiable-reward RL for CTI LLMs: Uses reinforcement learning with verifiable rewards to improve LLM generation of structured CTI outputs.	cs.LG	Md Tanvirul Alam, Aritran Piplai, Ionut Cardei, Nidhi Rastogi, Peter J Worth Jr	Cyber threat intelligence (CTI) analysts routinely convert noisy, unstructured security artifacts into standardized, automation-ready representations. Although large language models (LLMs) show promise for this task, existing approaches remain brittle when producing structured CTI outputs and have largely relied on supervised fine-tuning (SFT). In contrast, CTI standards and community-maintained resources define canonical identifiers and schemas that enable deterministic verification of model ou...
831	ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning 2602.01003 Memory-efficient RL fine-tuning: ESSAM combines ES with SAM to reduce the GPU memory cost of RL fine-tuning for LLMs.	cs.LG cs.AI	Zhishen Sun, Sizhe Dang, Guang Dai, Haishan Ye	Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but it often has high GPU memory usage, which makes it hard to use in settings with limited resources. To reduce these issues, we propose Evolution Strategies with Sharpness-Aware Maximization (ESSAM), a full parameter fine-tuning framework that tightly combines the zero-order search in parameter space from Evolution Strategies (ES) with the Sharpness-Aware Maximizatio...
832	The Effect of Mini-Batch Noise on the Implicit Bias of Adam 2602.01642 Adam implicit bias under batch noise: Theoretically analyzes how mini-batch noise affects Adam's implicit bias and solution selection.	cs.LG cs.AI	Matias D. Cattaneo, Boris Shigida	With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(\beta_1, \beta_2)$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the...
833	TopoPrune: Robust Data Pruning via Unified Latent Space Topology 2602.02739 Topology-based robust data pruning: Uses latent-space topology to achieve data pruning that is more robust across architectures.	cs.LG cs.AI	Arjun Roy, Prajna G. Malettira, Manish Nagaraj, Kaushik Roy	Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent space perturbations, causing performance to degrade during cross-architecture transfer or in the presence of feature noise. We introduce TopoPrune, a framework which resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data. TopoPrune operates at two scales, (1) utilizing ...
834	Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting 2602.02832 Continuous-time Koopman autoencoders: Uses continuous-time Koopman latent dynamics to forecast fluid flows at arbitrary step sizes.	cs.LG	Rares Grozavescu, Pengyu Zhang, Etienne Meunier, Mark Girolami	Forecasting physical systems over long horizons from irregularly sampled observations demands models that are stable, computationally efficient, and free of fixed-timestep assumptions. We address this with a continuous-time Koopman autoencoder whose latent dynamics obey $dz/dt = \mathbf{K}_{\mathrm{cont}} z$, yielding closed-form inference via $z(\tau) = \exp(\mathbf{K}_{\mathrm{cont}} \tau) z(0)$ at any horizon $\tau$ in a single step. This decouples forecast cost from forecast length at infere...
835	SLOPE: Optimistic Potential Landscape Shaping for Model-based Reinforcement Learning 2602.03201 Sparse-reward model-based RL shaping: Shapes the reward landscape with optimistic potential functions to mitigate sparse rewards in MBRL.	cs.LG	Yao-Hui Li, Zeyu Wang, Xin Li, Wei Pang, Yingfang Yuan	Model-based reinforcement learning (MBRL) is sample-efficient but struggles in sparse reward settings. A critical bottleneck arises from the lack of informative gradients in sparse settings, where standard reward models often yield flat landscapes that struggle to guide planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting sparse scalars to constructing informative potential la...
836	Bayesian Conformal Prediction as a Decision Risk Problem 2602.03331 Bayesian conformal prediction risk control: Combines Bayesian posteriors with conformal risk control to produce prediction sets with finite-sample coverage.	cs.LG	Fanyi Wu, Veronika Lohmanova, Samuel Kaski, Michele Caprio	We propose Bayesian Conformal Prediction (BCP), a framework that combines Bayesian posterior predictive distributions with PAC-style conformal risk control to produce prediction sets with finite-sample coverage guarantees. Standard quantile-threshold conformal methods often construct prediction sets using a single fixed threshold, which typically yields connected prediction sets. While valid, such sets can be inefficient when the posterior predictive distribution is multimodal, since they may sp...
837	Path Integration and Object-Location Binding Emerge in an Action-Conditioned Predictive Sequence Network 2602.03490 Action-conditioned predictive world models: Path integration and object-location binding representations emerge in an action-conditioned predictive network.	cs.LG	Linda Ariel Ventura, Victoria Bosch, Tim C Kietzmann, Sushrut Thorat	Adaptive cognition requires structured internal models of objects and their relations. Predictive neural networks are often proposed to learn such world models, but how these are instantiated and how they support prediction remain unclear. We investigate this in a minimal in-silico setting. A recurrent neural network samples tokens sequentially from 2D continuous token scenes and is trained to predict the upcoming token from the current input and a saccade-like displacement. On novel scenes, pre...
838	Manifold Random Features 2602.03797 Random features on manifolds: Proposes manifold random features to approximate kernels and bivariate functions on manifolds.	cs.LG	Ananya Parashar, Derek Long, Dwaipayan Saha, Krzysztof Choromanski	We present a new paradigm for creating random features to approximate bi-variate functions (in particular, kernels) defined on general manifolds. This new mechanism of Manifold Random Features (MRFs) leverages discretization of the manifold and the recently introduced technique of Graph Random Features (GRFs) to learn continuous fields on manifolds. Those fields are used to find continuous approximation mechanisms that otherwise, in general scenarios, cannot be derived analytically. MRFs provide...
839	Mixture of Masters: Sparse Chess Language Models with Player Routing 2602.04447 Mixture-of-experts chess language models: A sparse MoE chess language model with grandmaster routing preserves distinct playing styles.	cs.LG cs.AI	Giacomo Frisoni, Lorenzo Molfetta, Davide Freddi, Gianluca Moro	Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. For each move, a post-hoc ...
840	SOCKET: SOft Collision Kernel EsTimator for Sparse Attention 2602.06283 Soft LSH kernel for sparse attention: Proposes SOCKET, which uses soft collision kernel estimation to efficiently select tokens for sparse attention.	cs.LG	Sahil Joshi, Agniva Chowdhury, Wyatt Bellinger, Amar Kanakamedala, Ekam Singh	Exploiting sparsity during long-context inference is key to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost by restricting computation to a subset of tokens, but its effectiveness depends on efficient scoring and selection at inference time. We revisit Locality-Sensitive Hashing (LSH) and introduce SOCKET, a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregati...
841	When Does Embedding Magnitude Matter? A Cross-Task Functional-Symmetry Framework 2602.09229 Embedding normalization in retrieval: Proposes a query/document normalization framework and shows that one-sided normalization gives better retrieval.	cs.LG	Xincan Feng, Taro Watanabe	Cosine similarity normalizes both sides; dot product normalizes neither. We propose a 2x2 framework that independently controls query-side and document-side normalization, exposing two intermediate variants (QNorm, DNorm) that have not been previously studied. On retrieval with four encoders, evaluated in-domain on MS MARCO and out-of-domain on BEIR, BRIGHT, and multi-hop QA, the unilateral variants outperform both cosine and dot product, with relative gains of up to +72% out-of-domain and +24% ...
842	Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers 2602.10512 Hierarchical agentic theorem proving: Proves that theorem provers with hierarchical lemma decomposition have an exponential sample-complexity advantage.	cs.LG	Sho Sonoda, Shunta Akiyama, Yuya Uezato	Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces...
843	VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training 2602.10693 Off-policy RL for LLMs: Proposes sequence-level variational soft policy optimization for stable, low-variance off-policy LLM training.	cs.LG cs.AI	Guobin Shen, Chenxiao Zhao, Xiang Cheng, Lei Huang, Xing Yu	Off-policy updates are inevitable in reinforcement learning (RL) for large language models (LLMs) due to rollout staleness from asynchronous training and mismatches between training and inference engines. Naive importance sampling gives an unbiased correction but suffers from high variance, which is amplified by unbounded ratios and autoregressive generation. Prior remedies either rely on scenario-specific engineering, or trade bias for variance via token-level clipping or sequence-level normali...
844	Amortized Molecular Optimization via Group Relative Policy Optimization 2602.12162 Amortized molecular optimization: Trains a transferable policy with group relative policy optimization to speed up constrained molecular optimization.	cs.LG	Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay	In structurally constrained molecular optimization, state-of-the-art methods restart an expensive oracle-driven search from scratch for every new input structure, scaling poorly to settings with many starting structures or expensive oracles. While amortized approaches that learn a transferable policy could in principle remove this bottleneck, existing methods struggle to generalize to diverse structural constraints at inference time. We present AMORTIX, an amortized Graph Transformer model that ...
845	$\gamma$-weakly $\theta$-up-concavity: A Unified Framework for Non-Convex Optimization Beyond DR-Submodular and OSS Functions 2602.13506 Non-convex optimization theory: Proposes γ-weakly θ-up-concavity to unify and generalize several classes of optimizable non-convex functions.	cs.LG cs.AI	Mohammad Pedramfar, Vaneet Aggarwal	Optimizing non-convex functions is a fundamental challenge across machine learning and combinatorial optimization. We introduce and study $\gamma$-weakly $\theta$-up-concavity, a novel first-order condition that characterizes a broad class of such functions. This condition provides a powerful unifying framework, strictly generalizing both DR-submodular and One-Sided Smooth (OSS) functions while capturing broader forms of scale-dependent curvature, including accumulating-then-diminishing returns ...
846	Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning 2602.14868 Curriculum RL for reasoning: Adaptively tunes task difficulty to mitigate sparse rewards and improve the efficiency of RL for reasoning.	cs.LG cs.AI	Ilia Mahrooghi, Aryo Lotfi, Emmanuel Abbe	Reinforcement learning has emerged as a powerful paradigm for unlocking reasoning capabilities in language models. However, relying on sparse rewards makes this process highly sample-inefficient, as models must navigate vast search spaces with minimal feedback. While classic curriculum learning aims to mitigate this by ordering data based on complexity, prior works have primarily targeted small datasets and do not directly transfer to the large-scale settings typical of modern LM training. Furth...
847	RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion 2602.16548 RNA 3D inverse design: Uses an RL-guided diffusion model for inverse design of RNA 3D structures, optimizing structural fidelity.	cs.LG	Tianmeng Hu, Yongzheng Cui, Biao Luo, Ke Li	The inverse design of RNA three-dimensional (3D) structures is crucial for engineering functional RNAs in synthetic biology and therapeutics. While recent deep learning approaches have advanced this field, they are typically optimized and evaluated using native sequence recovery, which is a limited surrogate for structural fidelity, since different sequences can fold into similar 3D structures and high recovery does not necessarily indicate correct folding. To address this limitation, we propose...
848	Structured Prototype-Guided Adaptation for EEG Foundation Models 2602.17251 EEG foundation model adaptation: Uses structured prototype-guided fine-tuning to mitigate mismatch and drift in EEG models with few labels.	cs.LG	Jingying Ma, Feng Wu, Yucheng Xing, Qika Lin, Tianyu Liu	Electroencephalography (EEG) foundation models (EFMs) have shown strong potential for transferable representation learning, yet their adaptation in realistic settings remains challenging when only a few labeled subjects are available. We show that this challenge stems from a structural mismatch between noisy, limited supervision and the highly plastic parameter space of EFMs, reflected in three key failure modes: overconfident miscalibration, prediction collapse, and representation drift caused ...
849	Emergent Manifold Separability during Reasoning in Large Language Models 2602.20338 LLM reasoning representation geometry: Uses manifold capacity theory to analyze how representation separability emerges over time during CoT reasoning.	cs.LG	Chanwoo Chun, Alexandre Polo, SueYeon Chung	Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to two compositional reasoning tasks: a controlled Boolean logic tree that supports deep mechanistic analysis, and a natural-language eligibility task in which the model has to extract attributes from prose, compare them to thresholds, and c...
850	MAST: A Multi-fidelity Augmented Surrogate model via Spatial Trust-weighting 2602.20974 Multi-fidelity surrogate modeling: Proposes a multi-fidelity augmented surrogate model with spatial trust-weighting to balance accuracy and cost.	cs.LG	Ahmed Mohamed Eisa Nasr, Ali Elham, Haris Moazam Sheikh	In engineering design and scientific computing, computational cost and predictive accuracy are intrinsically coupled. High-fidelity simulations provide accurate predictions but at substantial computational costs, while lower-fidelity approximations offer efficiency at the expense of accuracy. Multi-fidelity surrogate modelling addresses this trade-off by combining abundant low-fidelity data with sparse high-fidelity observations. However, existing methods rely on global correlation assumptions t...
851	Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies 2602.23811 Offline RL with function approximation: Develops theory for offline policy optimization with parametric policies, moving beyond the limits of state-wise mirror descent.	cs.LG cs.AI	Xiang Li, Yuheng Zhang, Nan Jiang	We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offline data via pessimism, existing algorithms that are computationally tractable (often in an oracle-efficient sense), such as PSPI, only apply to finite and small action spaces. Moreover, these algorithms rely on state-wise mirror descent and require actors to b...
852	Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies 2603.00041 Causal discovery in time series: Compares econometric and causal structure-learning methods on time-series policy decisions for UK COVID-19 policies.	cs.LG cs.AI	Bruno Petrungaro, Anthony C. Constantinou	Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains the subject of ongoing research in causal ML. In addition to traditional causal ML, this study assesses econometric methods that some argue can recover causal structures from time series data. The use of these methods can be explain...
853	VDCook:DIY video data cook your MLLMs 2603.05539 Video data construction for MLLMs: Builds a self-evolving video data operating system that uses retrieval and synthesis to generate traceable data packages.	cs.LG cs.AI cs.MM	Chengwei Wu	We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain teams. Users initiate data requests via natural language queries and adjustable parameters (scale, retrieval-synthesis ratio, quality threshold). The system automatically performs query optimization, concurrently running real video retrieval and controlled synthesis modules. It ultimately generates in-domain data packages with complete provenance a...
854	Exact Is Easier: Credit Assignment for Cooperative LLM Agents 2603.06859 Credit assignment for LLM agents: Shows that counterfactual evaluations such as agent removal are distorting and proposes a more exact credit-assignment method for cooperation.	cs.LG cs.AI	Yanjun Chen, Yirong Sun, Hanlin Wang, Jinghan Wang, Xinming Zhang	Removing an agent from a cooperative team to measure its contribution seems natural, yet in multi-agent LLM systems this evaluation distorts the result it claims to measure. This failure is not isolated: learned critics, trajectory-level baselines, and agent-removal counterfactuals all inherit from standard multi-agent reinforcement learning a premise that exact counterfactual evaluation requires privileged environment access, and therefore approximate. In cooperative LLM systems, this premise i...
855	Upper Generalization Bounds for Neural Oscillators 2603.09742 Generalization bounds for neural oscillators: Derives theoretical upper bounds on the generalization error of second-order-ODE neural oscillator architectures.	cs.LG	Zifeng Huang, Konstantin M. Zuev, Yong Xia, Michael Beer	Neural oscillators that originate from second-order ordinary differential equations (ODEs) have shown competitive performance in learning mappings between dynamic loads and responses of complex nonlinear structural systems. Despite this empirical success, theoretically quantifying the generalization capacities of their neural network architectures remains undeveloped. In this study, the neural oscillator consisting of a second-order ODE followed by a multilayer perceptron (MLP) is considered. It...
856	How Log-Barrier Helps Exploration in Policy Optimization 2603.15001 Exploration in policy optimization: Uses log-barrier regularization to introduce explicit exploration into policy optimization and improve convergence guarantees.	cs.LG cs.AI	Leonardo Cesani, Matteo Papini, Marcello Restelli	Recently, it has been shown that the Stochastic Gradient Bandit (SGB) algorithm converges to a globally optimal policy with a constant learning rate. However, these guarantees rely on unrealistic assumptions about the learning process, namely that the probability of the optimal action is always bounded away from zero. We attribute this to the lack of an explicit exploration mechanism in SGB. To address these limitations, we propose to regularize the SGB objective with a log-barrier on the parame...
857	A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks 2603.22586 Instruction-conditioned time-series ICL: Proposes an instruction-conditioned time-series foundation model that uses demonstration prompts for multi-task in-context learning.	cs.LG	Anish Saha, Konstantin Shmakov	In-context learning (ICL) enables task adaptation at inference time by conditioning on demonstrations rather than updating model parameters. Although recent time-series foundation models incorporate contextual conditioning, retrieval, or example-based prompting, they typically rely on implicit positional structure or task-specific objectives rather than explicit instruction-conditioned input-output demonstrations. We introduce iAmTime, a time-series foundation model trained with instruction-cond...
858	Demystifying Lipschitz verification: positive matrices, negative results 2603.28113 Lipschitz constant verification: Analyzes the limits of SDP-based and related Lipschitz verification and gives negative results from a positive-matrix perspective.	cs.LG	Simon Kuang, Yuezhu Xu, S. Sivaranjani, Xinfan Lin	The global Lipschitz constant of a neural network is related to robustness and generalization, yet unlike in many classical models, it is not plainly legible from the parameters. This has motivated sophisticated verification algorithms, especially semidefinite programming (SDP) based on incremental quadratic constraints on the activation functions, to improve on the fast but often loose product of layerwise Lipschitz constants (the trivial bound). We ask why Lipschitz verification is a problem i...
859	ASPECT: Node-Level Adaptive Spectral Fusion for Graph Contrastive Learning 2604.01878 Graph contrastive spectral fusion: Proposes node-level adaptive spectral fusion to better combine low- and high-frequency views in graph contrastive learning.	cs.LG cs.AI	Zhuolong Li, Boxue Yang, Haopeng Chen	Spectral graph contrastive learning often constructs low- and high-frequency views to capture complementary graph signals, but these views are commonly combined by graph-level or node-agnostic fusion rules. We show that graph-level fusion can incur irreducible regret on mixed graphs with separated node-wise spectral preferences. Motivated by this result, we propose ASPECT, a spectral graph contrastive learning method that adaptively fuses low- and high-frequency views at the node level. ASPECT l...
860	AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation 2604.02525 Low-precision training stabilization: Proposes an outlier-pattern-aware rotation strategy to improve the speed and accuracy of low-precision training.	cs.LG	Seonggon Kim, Alireza Khodamoradi, Pranathi Vasireddy, Kristof Denolf, Eunhyeok Park	Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns, Row-wise, Colum...
861	Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions 2604.07277 Efficient online RL for Android agents: Proposes single-state multiple-action updates to reduce interaction cost and improve online training efficiency for Android agents.	cs.LG cs.AI	Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang	Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully ex...
862	The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts 2604.11962 Feature geometry in deep networks: Proposes the Linear Centroids Hypothesis, interpreting features as centroid directions learned by local experts.	cs.LG	Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk	The Linear Representation Hypothesis (LRH) identifies features of a trained deep network (DN) as linear directions in the activation spaces, i.e., output spaces of intermediate layers. This characterization decouples the input-output maps learned by a DN from the organization of feature directions in its activation spaces. We introduce the Linear Centroids Hypothesis (LCH), which instead identifies features with linear directions among a DN's centroid spaces -- where any vector denotes a centroi...
863	Loss-Driven Bayesian Active Learning 2604.11995 Bayesian active learning objectives: Proposes Bayesian active learning acquisition objectives and algorithms derived directly from the downstream loss.	cs.LG	Zhuoyue Huang, Freddie Bickford Smith, Tom Rainforth	The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We propose a rigorous loss-driven approach to Bayesian active learning that allows data acquisition to directly target the loss associated with a given decision problem. In particular, we show how any loss can be used to derive a unique objective for optimal data ac...
864	Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation 2604.13010 Offline on-policy distillation: Proposes offline on-policy distillation to cut teacher-serving overhead, and analyzes failure conditions and fixes.	cs.LG cs.AI	Yecheng Wu, Song Han, Hai Cai	On-policy distillation (OPD) is an effective post-training paradigm for large language models but requires a live teacher server throughout training, resulting in substantial infrastructure overhead. We investigate whether OPD can be performed offline by precomputing teacher log-probabilities once over SFT rollouts and reusing them during training. We find that naively doing so fails to reliably match standard OPD, and trace the root cause to a previously overlooked condition we term teacher con...
865	Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction 2604.15694 Discrete diffusion via CTMCs: Decomposes the CTMC reverse rate into jump timing and jump direction and builds a discrete diffusion generative model on that basis.	cs.LG	Jingyuan Li, Xiaoyi Jiang, Fukang Wen, Wei Liu, Renqian Luo	Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix monolithically -- through proxies such as concrete scores (SEDD) or clean-data predictions (MDLM, GIDD) -- rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. We propose Neural CTMC, which exploits the underlyi...
866	EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning 2604.16579 Uncertainty-aware depression estimation: uses disentangled evidential learning to produce both predictions and uncertainty in multimodal depression estimation.	cs.LG cs.AI	Fangyuan Liu, Sirui Zhao, Zeyu Zhang, Jinyang Huang, Feng-Qi Cui	Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, however, produce uncalibrated point estimates without quantifying predictive uncertainty, exposing decision-making to the risk of overconfident, untrustworthy estimates. To establish a reliable and trustworthy estimation paradigm, we propose EviDep, an evidential learning framework that jointly quantifies dep...
867	Robustness of Spatio-temporal Graph Neural Networks for Fault Location in Partially Observable Distribution Grids 2604.20403 Robust spatio-temporal GNNs for grids: evaluates and improves the robustness of spatio-temporal graph neural networks for fault location in partially observable distribution grids.	cs.LG	Burak Karabulut, Carlo Manna, Chris Develder	Fault location in distribution grids is critical for reliability and minimizing outage durations. Yet, it remains challenging due to partial observability, given sparse measurement infrastructure. Recent works show promising results by combining Recurrent Neural Networks (RNNs) and Graph Neural Networks (GNNs) for spatio-temporal learning. Still, many modern GNN architectures remain untested for this grid application, while existing GNN solutions have not explored GNN topology definitions beyond...
868	Transferable SCF-Acceleration through Solver-Aligned Initialization Learning 2604.21657 ML initialization for DFT SCF: learns solver-aligned initializations to transferably accelerate KS-DFT self-consistent field convergence.	cs.LG	Eike S. Eberhard, Viktor Kotsev, Timm Güthle, Stephan Günnemann	The cost of Kohn-Sham density functional theory (KS-DFT) calculations scales with the number of solver iterations, which depends on the quality of the initial guess. Machine learning methods that predict initial guesses from molecular geometry can reduce this cost, but matrix-prediction models fail when extrapolating to larger molecules, degrading rather than accelerating convergence [Liu et al., 2025]. We show that this failure is a supervision problem, not an extrapolation problem: models trai...
869	The Role of Symmetry in Optimizing Overparameterized Networks 2604.25150 Symmetry in overparameterized optimization: analyzes how weight symmetries introduced by overparameterization improve the condition number and aid optimization.	cs.LG cs.AI	Kusha Sareen, Mohammad Pedramfar, Sékou-Oumar Kaba, Mehran Shakerinava, Siamak Ravanbakhsh	Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, enabling the existence of better-conditioned minima within each equivalence class of...
870	Polynomial-Time Optimal Group Selection via the Double-Commutator Eigenvalue Problem 2605.00834 Polynomial-time group selection: recasts group selection as a double-commutator eigenvalue problem and gives a polynomial-time optimal algorithm.	cs.LG	Mitchell A. Thornton	The algebraic diversity framework generalizes temporal averaging over multiple observations to algebraic group action on a single observation for second-order statistical estimation. The central open problem in this framework is $\textit{group selection}$: given an $M$-dimensional observation with unknown covariance structure, find the finite group whose spectral decomposition best matches the covariance. Naive enumeration of all subgroups of the symmetric group $S_M$ requires exponential time i...
871	Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI 2605.01240 Self-supervised fMRI pretraining: proposes a self-supervised pretraining framework for resting-state fMRI that combines region-aware masking with Attention-Mamba fusion.	cs.LG cs.AI	Ruthwik Reddy Doodipala, Pankaj Pandey, Pratheek Eranki, Carolina Torres-Rojas, Manob Jyoti Saikia	Self-supervised pretraining is promising for large-scale neuroimaging, yet the impact of region-aware masking and hybrid sequence modeling remains underexplored. In this work, we introduce Rhamba, a region-aware pretraining framework that integrates anatomically guided masking with hybrid Attention-Mamba architectures for resting state functional magnetic resonance imaging (fMRI) analysis. Models were pretrained on the ABIDE dataset using region-aligned patch embeddings and three masking strateg...
872	A Theory of Saddle Escape in Deep Nonlinear Networks 2605.01288 Saddle escape theory: derives a theoretical characterization of saddle escape and abrupt feature transitions in deep nonlinear networks and classifies activation functions.	cs.LG	Divit Rawal, Michael R. DeWeese	In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universali...
873	QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL 2605.01862 Offline goal-conditioned RL: proposes a Q-conditioned hybrid Attention-Mamba sequence model to improve history modeling in offline goal-conditioned reinforcement learning.	cs.LG	Xing Lei, Jincheng Wang, Xuetao Zhang, Donglin Wang	Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian that violate standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for long-term dependency modeling, yet pure attention is inefficient and brittle when handling local Markovian structure and long-range context simultaneously. Althoug...
874	DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning 2605.02196 Machine unlearning attack: shows that INT4 quantization restores forgotten content, and proposes a quantization recovery attack with a systematic evaluation.	cs.LG	Abdullah Ahmad Khan, Ferdous Sohel	Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16), we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robu...
875	AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation 2605.02948 Talking head video generation: uses asymmetric distillation to mitigate identity drift in chunk-wise generation, enabling long-term consistent talking head video generation.	cs.LG cs.AI cs.SD	Yuxin Lu, Qian Qiao, Jiayang Sun, Guibo Zhu, Min Cao	Diffusion-based talking head generation has achieved remarkable visual quality, yet scaling it to long-term videos remains challenging. The widely adopted chunk-wise paradigm introduces two fundamental failures: (1) temporal-spatial misalignment between static identity references and dynamic audio streams, and (2) cascading identity drift propagated through self-generated continuity references across chunks. To address both issues, we propose AsymTalker, a novel diffusion-based talking head gene...
876	DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment 2605.03327 LLM RL credit assignment: proposes DGPO for fine-grained credit assignment over reasoning steps and stable KL-constrained policy optimization.	cs.LG cs.AI	Hongbo Jin, Rongpeng Zhu, Zhongjing Du, Xu Jiang, Jingqi Tian	Reinforcement learning is crucial for aligning large language models to perform complex reasoning tasks. However, current algorithms such as Group Relative Policy Optimization suffer from coarse grained, sequence level credit assignment, which severely struggles to isolate pivotal reasoning steps within long Chain of Thought generations. Furthermore, the standard unbounded Kullback Leibler divergence penalty induces severe gradient instability and mode seeking conservatism, ultimately stifling t...
877	Time series causal discovery with variable lags 2605.04081 Time-series causal discovery: studies causal structure learning for time series with variable lags in order to recover the causal graph.	cs.LG cs.AI	Bruno Petrungaro, Anthony C. Constantinou	Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at dif...
878	Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention 2605.04279 Multi-head attention dynamics: establishes a gradient-flow dynamics theory of multi-head self-attention and analyzes clustering and inter-head interference.	cs.LG	Ayan Pendharkar	Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has established clustering behavior for single-head attention, the multi-head setting remains less understood due to geometric interference between heads, which invalidates standard monotonicity arguments. In this work, we develop a theoretical framework for multi-head self-attention dynamics and resolve sever...
879	LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems 2605.04323 Soil multimodal dataset: releases LUCAS-MEGA, a large-scale multimodal soil-environment dataset to support representation learning.	cs.LG	Kuangdai Leng, Simon Jeffery, Panos Panagos, Tarje Nissen-Meyer	Understanding soil is fundamental to agriculture, carbon cycling, and environmental sustainability, yet progress is limited by fragmented and heterogeneous datasets that constrain modeling to small-scale predictive settings rather than high-dimensional representation learning. We introduce LUCAS-MEGA, a large-scale multimodal dataset constructed through systematic data fusion of European soil-environment observations, with the LUCAS survey as its backbone. The fused dataset comprises over 70,000...
880	Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention 2605.04460 Sparse counterfactual intervention: discovers sparse, feasible survey-variable interventions via latent adjustment to achieve counterfactual community shifts.	cs.LG	Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang	Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional a...
881	SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning 2605.04712 Continual DRL with MoE: proposes SPHERE to mitigate spectral plasticity loss and performance degradation of MoE in continual reinforcement learning.	cs.LG	Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li	In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating ...
882	Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime 2605.05112 Binary-reward RL steering: proposes controlling the rollout pass rate to steer binary-reward RL toward its most informative training regime.	cs.LG	Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao	Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Pr...
883	MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference 2605.05225 Efficient multimodal MoE inference: proposes MACS, which scales capacity in a modality-aware way to reduce the straggler bottleneck in multimodal MoE inference.	cs.LG cs.AI	Bo Li, Chuan Wu, Shaolin Zhu	Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones, and (2) Modality Dynamics, where varying visual to text...
884	Online Localized Conformal Prediction 2605.05497 Online conformal prediction: proposes online localized conformal prediction for more efficient calibration and coverage under non-exchangeable data.	cs.LG	Yuheng Lai, Garvesh Raskutti	Conformal prediction is a framework that provides valid uncertainty quantification for general models with exchangeable data. However, in the online learning and time-series settings, exchangeability is not satisfied. Existing online conformal methods, such as adaptive conformal inference (ACI), can achieve long-run validity, yet they remain inefficient under covariate heterogeneity because they rely on global calibration. We propose \emph{Online Localized Conformal Prediction (OLCP)}, which com...
885	LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites 2605.05615 LLM inference carbon modeling: builds LLMSpace, a framework for modeling the full-lifecycle energy use and carbon footprint of LLM inference on LEO satellites.	cs.LG	Lei Jiang, Adrian Ildefonso, Daniel Loveless, Fan Chen	Large language models (LLMs) impose rapidly growing energy demands, creating an emerging energy and carbon crisis driven by large-scale inference. Solar-powered, AI-enabled low Earth orbit (LEO) satellites have been proposed to mitigate terrestrial electricity consumption, but their lifecycle carbon footprint remains poorly understood due to launch emissions, satellite manufacturing, and radiation-hardened hardware requirements. This paper presents \textit{LLMSpace}, the first carbon modeling fr...
886	CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning 2605.05732 Continual learning without weight updates: proposes CRAFT, which replaces weight updates with low-rank interventions on hidden representations to reduce forgetting in continual fine-tuning.	cs.LG cs.AI	Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari	Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence again...
887	Retrieval from Within: An Intrinsic Capability of Attention-Based Models 2605.05806 Intrinsic retrieval in attention: proposes INTRA, which lets encoder-decoder models use attention to retrieve evidence from internal representations and reuse it for generation.	cs.LG	Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry	Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generat...
888	Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs 2605.05957 LLM factual correction suppression: builds a benchmark and analyzes how task requests suppress LLMs' factual correction of false premises.	cs.LG	Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang	LLMs reliably correct false claims when presented in isolation, yet when the same claims are embedded in task-oriented requests, they often comply rather than correct. We term this failure mode \emph{correction suppression} and construct a benchmark of 300 false premises to systematically evaluate it across eight models. Suppression rates range from 19\% to 90\%, with four models exceeding 80\%, establishing correction suppression as a prevalent and severe phenomenon. Mechanistic analysis reveal...
889	Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning 2605.06156 Offline RL with generative policies: proposes entropy-regularized adjoint matching to mitigate the popularity bias and support binding of flow-based policies in offline RL.	cs.LG cs.AI	Abdelghani Ghanem, Mounir Ghogho	Integrating expressive generative policies, such as flow-matching models, into offline reinforcement learning (RL) allows agents to capture complex, multi-modal behaviors. While Q-learning with Adjoint Matching (QAM) stabilizes policy optimization via the continuous adjoint method, it remains inherently bound to the fixed behavior distribution. This dependence induces a \textit{popularity bias} that can suppress high-reward actions in low-density regions, and creates a \textit{support binding} t...
890	AffineLens: Capturing the Continuous Piecewise Affine Functions of Neural Networks 2605.06218 Piecewise affine network analysis: proposes AffineLens to enumerate in practice and characterize the continuous piecewise affine region structure of neural networks.	cs.LG	Yi Wei, Xuan Qi, Furao Shen, Jian Zhao, Vittorio Murino	Piecewise affine neural networks (PANNs) provide a principled geometric perspective on neural network expressivity by characterizing the input--output map as a continuous piecewise affine (CPA) function whose complexity is governed by the number, arrangement, and shapes of its affine regions. However, existing interpretability and expressivity analyses often rely on indirect proxies (e.g., activation statistics or theoretical upper bounds) and rarely offer practical, accurate tools for enumerati...
891	MinMax Recurrent Neural Cascades 2605.06384 MinMax recurrent networks: proposes MinMax recurrent neural cascades, which avoid vanishing and exploding gradients and have strong expressive power.	cs.LG cs.AI	Alessandro Ronca	We show that the MinMax algebra provides a form of recurrence that is expressively powerful, efficiently implementable, and most importantly it is not affected by vanishing or exploding gradient. We call MinMax Recurrent Neural Cascades (RNCs) the models obtained by cascading several layers of neurons that employ such recurrence. We show that MinMax RNCs enjoy many favourable theoretical properties. First, their formal expressivity includes all regular languages, arguably the maximal expressivit...
892	Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level 2605.06387 On-policy distillation for LLMs: proposes asymmetric on-policy distillation to reduce variance and improve token-level error correction and exploration.	cs.LG cs.AI	Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang	On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three structural weaknesses, including high variance updates, vanishing gradients in zero-advantage regions, and exploration bottlenecks when corrective signals are insufficient. We therefore propose Asymmetric On-Policy Distillat...
893	Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching 2605.06474 Off-policy evaluation theory: proposes Q-MMR, which learns sample weights via recursive reweighting and moment matching to estimate the target policy's return.	cs.LG cs.AI	Xiang Li, Nan Jiang	We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation ca...
894	Learning quantum Hamiltonians at any temperature in polynomial time 2310.02243 Quantum Hamiltonian learning: gives a polynomial-time algorithm for learning local quantum Hamiltonians at any temperature.	cs.LG	Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang	We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/\textrm{tr}(e^{-\beta H})$ at a known inverse temperature $\beta>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [...
895	Active teacher selection for reward learning 2310.15288 Multi-teacher reward learning: proposes the HUB framework, which actively selects teachers to learn rewards efficiently from heterogeneous human feedback.	cs.LG cs.AI	Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell	Reward learning techniques enable machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite gathering feedback from large and heterogeneous populations. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algor...
896	Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia 2311.08433 Sepsis bacteremia biomarkers: retrospectively analyzes laboratory markers of ICU sepsis patients to assess how well they predict bacteremia.	cs.LG	Sangwon Baek, Seung Jun Lee	Few studies have investigated the diagnostic utilities of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the prediction power of laboratory biomarkers to utilize those markers with high performance to optimize the predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients qualifying SEPSI...
897	Structure learning of Hamiltonians from real-time evolution 2405.00082 Hamiltonian structure learning: efficiently recovers the interaction terms and structure of an unknown Hamiltonian from its real-time evolution.	cs.LG	Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang	We study the problem of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m \lambda_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-understood under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $\lambda_a$, are unknown. But how efficiently can we learn a local Hamiltonian without prior knowledge of its interac...
898	Optimising MFCC parameters for the automatic detection of respiratory diseases 2408.07522 MFCC for respiratory diagnosis: systematically optimizes MFCC extraction parameters to improve automatic detection of respiratory diseases.	cs.LG cs.SD eess.AS	Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M. E. Franssen	Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) is widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we addres...
899	A Hybrid Graph Neural Network for Enhanced EEG-Based Depression Detection 2410.18103 EEG depression detection GNN: Proposes a hybrid graph neural network that models both common and individualized brain connectivity in depression to improve detection.	cs.LG cs.AI	Yiye Wang, Wenming Zheng, Yang Li, Hao Yang	Graph neural networks (GNNs) are becoming increasingly popular for EEG-based depression detection. However, previous GNN-based methods fail to sufficiently consider the characteristics of depression, thus limiting their performance. Firstly, studies in neuroscience indicate that depression patients exhibit both common and individualized brain abnormal patterns. Previous GNN-based approaches typically focus either on fixed graph connections to capture common abnormal brain patterns or on adaptive...
900	Pretraining a Foundation Model for Small-Molecule Natural Products 2503.17656 Molecular foundation model pretraining: Pretrains a foundation model for small-molecule natural products to improve generalization across multiple downstream tasks.	cs.LG cs.AI	Yuheng Ding, Bo Qiang, Shaoning Li, Yiran Zhou, Jie Yu	Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Nowadays, existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods...
901	An abstract effective convergence theorem for stochastic processes, with applications to stochastic approximation 2504.12922 Stochastic Approximation Convergence: Proposes an effective convergence theorem for stochastic processes under a relaxed supermartingale condition, with quantitative guarantees.	cs.LG	Morenikeji Neri, Nicholas Pischke, Thomas Powell	We provide a general theorem on the asymptotic behavior of stochastic processes that conform to a relaxed supermartingale condition. The distinguishing feature of our result is that it provides quantitative convergence guarantees at a much higher level of abstraction and generality than is typically seen in the stochastic approximation literature, formulated in particular in terms of a general modulus $\tau$ that, on an intuitive level, captures an effective variant of the uniqueness in expectat...
902	Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors 2505.11325 Uncertainty for PFNs: Uses martingale posteriors to provide efficient, tuning-free uncertainty quantification for PFN predictive means and quantiles.	cs.LG cs.AI	Thomas Nagler, David Rügamer	Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale pos...
903	Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC 2508.02001 SmartNIC-Offloaded Traffic Analysis: Offloads a network foundation model to a SmartNIC to combine versatility with low latency in encrypted traffic analysis.	cs.LG	Chungang Lin, Xuying Meng, Tianyu Zuo, Weiyao Zhang, Meng Shen	Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with pr...
904	Chordless cycle filtrations for dimensionality detection in complex networks via topological data analysis 2509.08350 TDA Network Dimensionality: Uses a topological weighting filtration based on chordless cycles to estimate the latent dimensionality of complex networks in a data-driven way.	cs.LG	Aina Ferrà Marcús, Robert Jankowski, Meritxell Vila Miñana, Carles Casacuberta, M. Ángeles Serrano	Many complex networks, ranging from social to biological systems, exhibit structural patterns consistent with an underlying hyperbolic geometry. Revealing the dimensionality of this latent space can disentangle the structural complexity of communities, impact efficient network navigation, and fundamentally shape connectivity and system behavior. We introduce a topological data analysis weighting scheme for graphs based on chordless cycles to estimate network dimensionality in a data-driven way. ...
905	Privately Estimating Black-Box Statistics 2510.00322 Black-Box Differential Privacy: Proposes a differentially private method for estimating black-box statistics that requires no known sensitivity bound and improves data efficiency.	cs.LG	Günter F. Steinke, Thomas Steinke	Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this ...
906	Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining 2510.18516 Calcium Imaging SSL Pretraining: Proposes cell-pattern-aware pretraining to robustly decode dynamic visual experience from calcium imaging.	cs.LG	Sangyoon Bae, Mehdi Azabou, Blake Richards, Jiook Cha	Neural recordings exhibit a distinctive form of heterogeneity rooted in differences in cell types, intrinsic circuit dynamics, and stochastic stimulus-response variability that goes beyond ordinary dataset variability, mixing statistically regular neurons with highly stochastic, stimulus-contingent ones within the same dataset. This heterogeneity poses a challenge for self-supervised learning (SSL) -- learnable statistical regularity -- thereby destabilizing representation learning and limiting ...
907	Benchmarking World-Model Learning with Environment-Level Queries 2510.19788 World-Model Evaluation Benchmark: Builds an evaluation from environment-level queries to test whether world models can answer many kinds of questions.	cs.LG cs.AI	Archana Warrier, Dat Nguyen, Michelangelo Naim, Moksh Jain, Yichao Liang	World models are central to building AI agents capable of flexible reasoning and planning. Yet current evaluations (i) test only properties measurable from observed interactions, such as next-frame prediction or task return, and (ii) do not test whether a learned model supports diverse queries about the environment. In contrast, humans build general-purpose models that can answer many different questions about an environment, including questions that require understandi...
908	RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics 2510.24736 mRNA Sequence Generation: Uses manifold Langevin dynamics for property-guided mRNA sequence generation while preserving biological viability.	cs.LG	Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang	Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for m...
909	Understanding Robustness of Model Editing in Code LLMs 2511.03182 Code LLM Model Editing: Builds a benchmark to analyze how model editing of code LLMs generalizes and preserves robustness during API migration.	cs.LG	Vinaik Chhetri, Moghis Fereidouni, A. B. Siddique, Umar Farooq	Large language models (LLMs) for code are increasingly used in software development, but they remain static after pretraining while APIs and software libraries continue to evolve. Model editing offers a lightweight alternative to retraining for incorporating API updates, yet it remains unclear whether existing editing methods can induce correct API migration, generalize that behavior to unseen tasks, and preserve performance on tasks involving unmodified APIs. We present a controlled benchmark f...
910	Assumed Density Filtering and Smoothing with Neural Network Surrogate Models 2511.09016 Neural ADF Filtering: Implements ADF filtering and RTS smoothing via analytic moment propagation of Gaussian inputs through neural network surrogate models.	cs.LG	Simon Kuang, Xinfan Lin	The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. With nonlinear systems, the challenge consists in how to propagate uncertainty through the state transitions and output function. For the case of a neural network model, we enable accurate uncertainty propagation using a recent state-of-the-art analytic formula for computing the mean and covariance of a deep neural network with Gaussian input. We argue that cross entropy is a more...
911	End-to-end PDDL Planning with Hardcoded and Dynamic Agents 2512.09629 End-to-End PDDL Planning: Converts natural-language specifications into PDDL and has multiple agents refine them iteratively to produce verifiable plans.	cs.LG cs.AI	Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge	We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguities and contradictions that may exist in the human specification. We support two categories of agents:...
912	Evaluating Large Language Models in Scientific Discovery 2512.15567 LLMs for Scientific Discovery: Proposes a scenario-grounded benchmark to evaluate LLMs' iterative reasoning and hypothesis generation in scientific discovery.	cs.LG cs.AI	Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M. Pruyn	Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define research projects of genuine interest and decompose them into modular research scenarios...
913	Bellman Calibration for $V$-Learning in Offline Reinforcement Learning 2512.23694 Offline RL Value Calibration: Proposes a Bellman calibration criterion and error measure to improve the long-horizon reliability of offline $V$-learning.	cs.LG	Lars van der Laan, Nathan Kallus	Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We introduce Bellman calibration, a weak reliability criterion requiring that states assigned similar predicted values have average Bellman targets that agree with those predictions. This criterion yields a scalar calibration error for...
914	Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting 2512.23805 FQE without Bellman Completeness: Achieves FQE stability without a completeness assumption by using a regression norm weighted by the target policy's stationary distribution.	cs.LG	Lars van der Laan, Nathan Kallus	Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study an alternative route: changing the norm used in the regression step. The policy-evaluation Bellman operator is contractive in the $L^2$ norm induced by the target policy's stationary state-action distribution, whereas standard off-policy FQE projects Bellman ...
915	Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration 2512.23927 Soft FQI Local Convergence: Shows that soft FQI converges locally under stationary norm alignment without requiring Bellman completeness.	cs.LG	Lars van der Laan, Nathan Kallus	Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment. Near the soft-optimal fixed point, the soft Bellman operator has the same first-order behavior as the ...
916	SWaRL: Safeguard Code Watermarking via Reinforcement Learning 2601.02602 RL-based Code Watermarking: Uses reinforcement learning to embed verifiable, hard-to-remove code watermarks while preserving functional correctness.	cs.LG	Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar	We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLMs by embedding unique and verifiable signatures in the generated program. Existing watermarking approaches either rely on handcrafted code transformations or manipulate token generation probabilities at inference time, making them vulnerable to removal attacks or prone to breaking functional correctness. To address these challenges, SWaRL employs a reinforcement lear...
917	Multi-environment Invariance Learning with Missing Data 2601.07247 Invariant Learning with Missing Data: Studies modeling and generalization methods for multi-environment invariance learning with missing data.	cs.LG	Yiran Jia, Jelena Bradic	Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging ...
918	TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models 2601.18744 Time Series Reasoning Benchmark: Proposes the multi-task, multi-modal TSRBench to comprehensively evaluate the time series reasoning of generalist models.	cs.LG cs.AI	Fangxu Yu, Xingang Guo, Lingzhi Yuan, Haoqiang Kang, Hongyu Zhao	Time series are ubiquitous in real-world scenarios and crucial for applications ranging from energy management to traffic control. Consequently, the ability to reason over time series is a fundamental skill for generalist models to solve complex problems. However, current benchmarks for generalist models largely overlook this dimension. To bridge this gap, we introduce TSRBench, a comprehensive multi-modal benchmark designed to stress-test the full spectrum of time series reasoning capabilities....
919	Test-Time Compute Games 2601.21839 Economics of Test-Time Compute: Uses game theory to analyze the social inefficiency and distorted provider incentives caused by compute-based LLM pricing.	cs.LG cs.AI	Ander Artola Velasco, Dimitrios Rontogiannis, Stratis Tsirtsis, Manuel Gomez-Rodriguez	Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased how much users pay cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute they use to generate an output. In our work, we show that the market of LLM-as-a-service is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even...
920	Diffusion Path Samplers via Sequential Monte Carlo 2601.21951 SMC Diffusion Samplers: Combines diffusion paths with sequential Monte Carlo to construct samplers that yield score and density estimates.	cs.LG	James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz	We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target, popularised by diffusion models. We tackle the score estimation problem by developing an efficient sequential Monte Carlo sampler that evolves auxiliary variables from conditional distributions along the path, providing principled score and density estimates for time-var...
921	Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients 2602.00474 Markov Chain Policy Evaluation: Uses a minimal peripheral quotient decomposition to separate persistent phase behavior from transient effects, improving Markov chain evaluation.	cs.LG	Yang Xu, Vaneet Aggarwal	We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $\mathcal{...
922	Emergence of Distortions in High-Dimensional Guided Diffusion Models 2602.00716 Classifier-Free Guidance Distortions: Uses statistical physics to characterize the conditional-distribution distortions that CFG introduces in high dimensions and what they depend on.	cs.LG	Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello	Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often reduces sample diversity. Using tools from statistical physics, we analyze the emergence of generative distortions induced by CFG, namely the mismatch between the CFG sampling distribution and the true conditional distribution. We study this phenomenon in analytically tractable settings with exact score functions, characterizing its dependence on data dimensionality and the number o...
923	Robust Sublinear Convergence Rates for Iterative Bregman Projections 2602.01372 Bregman Projection Convergence: Gives a proof framework for robust $O(1/k)$ dual convergence rates of iterative KL-type Bregman projections.	cs.LG	Gabriel Peyré	Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/\gamma$, where $\gamma$ is the entrop...
924	CGF-Softmax: A Cumulant-Based Softmax Reformulation for Efficient Inference under Homomorphic Encryption 2602.01621 HE-Friendly Softmax Approximation: Proposes a softmax reformulation based on the cumulant generating function to reduce the cost of inference under homomorphic encryption.	cs.LG	Hanjun Park, Byeongseo Min, Jiheon Woo, Min-Wook Jeong, Jongho Shin	Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of transformer architectures, remains particularly challenging in HE due to its multivariate structure, the large dynamic range induced by exponential functions, and the costly division operation. In this paper, we propose CGF-softmax, which reformulates the softmax denominator through the cumulant generating fu...
925	Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model 2602.04774 Optimal Learning Rate Schedules: Derives optimal SGD learning rate schedules and scaling laws in a solvable random feature model.	cs.LG	Blake Bordelon, Francesco Mori	Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $\eta_T^\star(t)$ where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both as a numerical optimization problem and a...
926	On the Meta-Design of Allocation Problems 2602.08786 Meta-Design of Resource Allocation: Treats the design parameters of resource allocation as optimization variables to jointly improve overall welfare and the service policy.	cs.LG	Unai Fischer-Abaigar, Emily Aiken, Christoph Kern, Juan Carlos Perdomo	There is an extensive literature that studies how to find optimal policies in resource allocation problems, taking the underlying design parameters that define the allocation, such as what data is collected, how many people can be served, and quality of service as fixed constraints. Yet, from a planner's perspective, these design parameters are themselves optimization variables that are just as important in determining overall welfare as selecting the optimal targeting rule for a given set of co...
927	From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model 2602.09457 Random-Order Online Learning Regret: Derives small-loss regret bounds in the random-order model from average sensitivity and stability conditions.	cs.LG	Shinsaku Sakaue, Yuichi Yoshida	We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. By extending the batch-to-online transformation of Dong and Yoshida (2023), we show that if an offline algorithm enjoys a $(1+\varepsilon)$-approximation guarantee, an average sensitivity bound controlled by a function $\varphi(\varepsilon)$, and stability with respect to $\varepsilon$, then we can obtain a small-loss regret bound typically of...
928	MobileDev-Bench: A Benchmark for Issue Resolution in Mobile Application Development 2603.24946 Mobile App Issue Benchmark: Releases MobileDev-Bench to evaluate LLMs on multi-file issue resolution in real mobile applications.	cs.LG	Moshood A. Fakorede, Krishna Upadhyay, A. B. Siddique, Umar Farooq	Large language models (LLMs) have shown strong performance on automated software engineering tasks, yet existing benchmarks focus primarily on library-style repositories, leaving mobile application development largely unexplored despite its framework-specific build systems, heterogeneous artifact types, and coordinated multi-file fix requirements. We introduce MobileDev-Bench, a benchmark comprising 407 real-world issue-resolution tasks collected from 19 production mobile applications spanning A...
929	Beyond Pessimism: Offline Learning in KL-regularized Games 2604.06738 Offline Learning in Regularized Games: Proposes a pessimism-free algorithm that improves the statistical efficiency of offline learning in KL-regularized zero-sum games.	cs.LG	Yuheng Zhang, Claire Chen, Nan Jiang	We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized with respect to a fixed reference policy through KL regularization. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt n)$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the ...
930	Steered LLM Activations are Non-Surjective 2604.09839 Limits of Activation Steering: Proves that the behavior mapping produced by activation steering is non-surjective and analyzes the range that prompts cannot realize.	cs.LG cs.AI	Aayush Mishra, Daniel Khashabi, Anqi Liu	Activation steering is a popular white-box control technique that modifies model activations to elicit an abstract change in its behavior. It has also become a standard tool in interpretability (e.g., probing truthfulness, or translating activations into human-readable explanations) and safety research (e.g., jailbreakability). However, it is unclear whether steered behavior is realizable by any textual prompt. In this work, we cast this question as a surjectivity problem: for a fixed model, doe...
931	One-Shot Generative Flows: Existence and Obstructions 2604.15439 Generative Flow Transport Theory: Studies dynamic measure transport flows in generative modeling and the obstructions to their existence.	cs.LG	Panos Tsimpos, Daniel Sharp, Youssef Marzouk	We study dynamic measure transport for generative modeling, focusing on transport maps that connect a source measure $P_0$ to a target measure $P_1$ by integrating a velocity field of the form $v_t(x) = \mathbb{E}[\dot X_t \mid X_t = x]$, where $X_\bullet = (X_t)_t$ is a stochastic process satisfying $(X_0,X_1)\sim{P_0}\otimes{P_1}$ and $\dot X_t$ is its time derivative. We investigate when $X_\bullet$ induces a straight-line flow: a flow whose pointwise acceleration vanishes and is there...
932	Verification Modulo Tested Library Contracts 2604.15533 Tested Library Contract Verification: Synthesizes testable library method contracts to verify client programs modularly.	cs.LG	Abhishek Uppar, Omar Muhammad, Sumanth Prabhu, Deepak D'Souza, Madhusudan P	We consider the problem of verification modulo tested library contracts as a step towards automating the verification of client programs that use complex libraries. We formulate this problem as the synthesis of modular contracts for the library methods used by the client that are adequate to prove the client correct, and that also pass the scrutiny of a testing engine that tests the library against these contracts. We also consider a new form of method contracts called contextual contracts that ...
933	Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation 2604.18972 Continuous-Time Policy Evaluation: Uses high-order generator regression to improve the accuracy of continuous-time policy evaluation.	cs.LG	Yaowei Zheng, Richong Zhang, Shenxi Wu, Shirui Bian, Haosong Zhang	We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the grid width. We estimate the time-dependent generator from multi-step transitions using moment-matching coefficients that cancel lower-order truncation terms, and combine the resulting surrogate with backward regression. ...
934	FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards 2604.26733 Live RL Future Prediction: Builds a live future-event prediction environment for training agents that can learn continually.	cs.LG cs.AI	Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue	Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. It can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage the advantages of future prediction, we present FutureWorld, a live agentic reinforc...
935	Geometric and dynamical analysis of attractor boundaries and storage limits in kernel Hopfield networks 2605.00366 Kernel Hopfield Network Dynamics: Analyzes attractor-basin boundaries and the mechanisms behind storage-capacity limits in kernel Hopfield networks.	cs.LG	Akira Tamamori	High-capacity associative memories based on Kernel Logistic Regression (KLR) exhibit strong storage capabilities, but the dynamical and geometric mechanisms underlying their stability remain poorly understood. This paper investigates the global geometry of attractor basins and the mechanisms governing the storage limit in KLR-trained Hopfield networks. We combine empirical evaluations using random sequences and real-world image embeddings (CIFAR-10) with morphing experiments and statistical Sign...
936	Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring 2605.00754 Multilingual Code Reward Models: Trains robust multilingual code reward models that support flexible multi-criteria scoring.	cs.LG	Indraneil Paul, Goran Glavaš, Iryna Gurevych	Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over self-contained executable code. In this work, we examine the training and evaluation of multilingual, multi-cr...
937	Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning 2605.01041 Multi-Agent UAV Deconfliction: Uses multi-agent reinforcement learning to assure safe separation between heterogeneous UAV fleets.	cs.LG cs.AI	Iman Sharifi, Hyeong Tae Kim, Maheed Hatem Ahmed, Mahsa Ghasemi, Peng Wei	In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free...
938	Saliency-Aware Regularized Quantization Calibration for Large Language Models 2605.05693 LLM Quantization Calibration: Proposes saliency-aware regularized calibration to improve the generalization of LLM post-training quantization.	cs.LG cs.AI	Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang	Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limi...
939	Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems 2605.05703 Active Learning for LLM-MAS: Uses active learning to select tasks for optimizing multi-agent communication structure while saving tokens.	cs.LG cs.AI	Huchen Yang, Xinghao Dong, Dan Negrut, Jin-Long Wu	Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled training tasks. However, tasks may differ substantially in difficulty and domain, and thus they are not equally informative for updating communication structure, making optimization under limited training budgets often unstable and highly sensitive to the particular training ...
940	Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend 2605.06055 Efficient MoE Communication: Proposes a relay-buffer-free communication scheme for MoE inference over pooled HBM.	cs.LG	Tianlun Hu, Tiancheng Hu, Shengsheng Litang, Sheng Wang, Xiaoming Bao	Mixture-of-Experts (MoE) inference requires large-scale token exchange across devices, making dispatch and combine major bottlenecks in both prefill and decode. Beyond network transfer, routing-driven layout transformation, temporary relay, and output restoration can add substantial overhead. Existing MoE communication paths are often buffer-centric, using explicit inter-process relay and reordering buffers around collective transfer. This report presents a relay-buffer-free communication design...
cs.MM 1 paper
1102	Forensic analysis of video data deletion and recovery in Honeywell surveillance file system 2605.07430 Surveillance File System Forensics: Reverse-engineers Honeywell's proprietary surveillance file system to study the forensics of video deletion and recovery.	cs.MM	Jinhee Yoon, Sungjae Hwang	Real-time video surveillance systems store recorded video using digital video recorders (DVRs) and network video recorders (NVRs). To support continuous high-volume video storage, these devices employ specialized, nonstandard file systems that are often proprietary and undocumented. This lack of documentation significantly increases the time and effort required for forensic analysis. In this study, we analyze an undocumented proprietary file system used by Honeywell video surveillance devices-on...
cs.SD 3 papers
1096	An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire 2605.06685 Piano Transcription and Information Profiling: Builds a high-accuracy piano transcription-to-analysis pipeline that produces information-theoretic style profiles of composers.	cs.SD eess.AS	Fred Jalbert-Desforges	We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives emp...
1097	A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation 2605.07489 Retrieval-Edit-Rerank Chord Generation: Generates chords with a decomposed retrieve-edit-rerank framework that balances diversity with music-theory constraints.	cs.SD cs.MM	Qiqi He, Dichucheng Li, Xiaoheng Sun, Anqi Huang	Chord generation is an inherently constrained creative task that requires balancing stylistic diversity with music-theoretic feasibility. Existing approaches typically entangle candidate generation and constraint enforcement within a single model, making the diversity-feasibility trade-off difficult to control and interpret. In this work, we approach chord generation from a system-level perspective, introducing a Retrieval-Edit-Rerank (RER) framework that decomposes the task into three explicit ...
1098	TARNet: A Temporal-Aware Multi-Scale Architecture for Closed-Set Speaker Identification 2605.07735 Temporal Multi-Scale Speaker Identification: Proposes TARNet, a multi-scale temporal modeling network that improves closed-set speaker identification.	cs.SD	Yassin Terraf, Youssef Iraqi	Closed-Set speaker identification aims to assign a speech utterance to one of a predefined set of enrolled speakers and requires robust modeling of speaker-specific characteristics across multiple temporal scales. While recent deep learning approaches have achieved strong performance, many existing architectures provide limited mechanisms for modeling temporal dependencies across different time scales, which can restrict the effective use of complementary short-, mid-, and long-term speaker char...
eess.AS 3 papers
1099	Evaluating voice anonymisation using similarity rank disclosure 2605.07291 Voice Anonymisation Privacy Metric: Measures the privacy risk of anonymisation with similarity rank disclosure (SRD), avoiding the limitations of EER.	eess.AS	Shilpa Chandra, Matteo Pettenò, Nicholas Evans, Michele Panariello, Massimiliano Todisco	The evaluation of voice anonymisation remains challenging. Current practice relies on automatic speaker verification metrics such as the equal error rate (EER). Performance estimates dependent on the classifier and operating point provide an incomplete or even misleading characterisation of privacy risk. We investigate the use of similarity rank disclosure (SRD), an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-ind...
1100	Asymmetric Phase Coding Audio Watermarking 2605.07241 Cryptographic Audio Watermarking: Proposes APC, phase coding combined with digital signatures and error correction, for training-free audio provenance watermarking.	eess.AS	Guang Yang, Amir Ghasemian, Ninareh Mehrabi, Homa Hosseinmardi	The proliferation of deepfake audio challenges voice-based authentication systems; passive forensic detectors are sensitive to evolving generative models and to real-world channel distortions. We propose Asymmetric Phase Coding (APC), a training-free cryptographic signing layer for audio, designed as a compact and auditable provenance primitive that can stand alone or be stacked with learned watermarks. APC combines Ed25519 digital signatures (EdDSA, FIPS 186-5; 64-byte signatures) with Reed-Sol...
1101	Multi-Axis Speech Similarity via Factor-Partitioned Embeddings 2605.02804 Factor-Partitioned Speech Embeddings: Partitions speech embeddings by factors such as content and speaker to obtain multi-axis similarity representations.	eess.AS	Jim O'Regan, Jens Edlund	Speech encodes multiple simultaneous attributes -- linguistic content, speaker identity, dialect, gender -- that conventional single-vector embeddings conflate. We present a factor-partitioned embedding framework that maps each utterance into a single vector whose subspaces correspond to distinct axes of variation. A shared acoustic encoder feeds per-axis linear projection heads, each trained via distillation from a specialist teacher or a contrastive objective over shared-label pairs. The result...