arXiv Daily Index

Date: 2026-05-11 | Total papers: 1102 | Source: arXiv query API (submittedDate)

# Title Categories Authors Abstract
cs.AI 155 papers
941 GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
2605.06671
Multi-Agent Graph Reasoning: Uses a divide-and-conquer multi-agent framework to improve the scalability of algorithmic reasoning on large graphs.
cs.AI
Wenjin Li, Jiaming Cui
Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decompos...
942 Fast and Effective Redistricting Optimization via Composite-Move Tabu Search
2605.06682
Redistricting Tabu Search: Uses composite-move tabu search to efficiently optimize redistricting plans under contiguity constraints.
cs.AI
Hai Jin, Diansheng Guo
Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity constraint: enforcing contiguity in integer-programming or heuristic search can severely shrink the feasible neighborhood, weaken exploration, and trap the search in poor local optima. We introduce a composite-move Tabu search (CM-Tabu) that systemat...
943 When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic-Actor Loop for Agentic Reasoning
2605.06772
Critic-Actor Loop for Physics: Uses a structured critic-actor loop to assess how much critique feedback improves physics reasoning.
cs.AI
Vasilis Niarchos, Constantinos Papageorgakis, Alexander G. Stapleton, Sokratis Trifinopoulos
As large language models (LLMs) show increasing promise on research-level physics reasoning tasks and agentic AI becomes more common, a practical question emerges: How does the interaction between researchers and agents affect the results? We study this using SCALAR (Structured Critic-Actor Loop for AI Reasoning), an Actor-Critic-Judge pipeline applied to quantum field theory and string theory problems. The Actor proposes solutions, the Critic provides iterative feedback, and an independent J...
944 Towards Security-Auditable LLM Agents: A Unified Graph Representation
2605.06812
Auditable LLM Agent Graphs: Proposes a unified graph representation to support security auditing of LLM agent behavior.
cs.AI
Chaofan Li, Lyuye Zhang, Jintao Zhai, Siyue Feng, Xichun Yang
LLM-based agentic systems are rapidly evolving to perform complex autonomous tasks through dynamic tool invocation, stateful memory management, and multi-agent collaboration. However, this semantics-driven execution paradigm creates a severe semantic gap between low-level physical events and high-level execution intent, making post-hoc security auditing fundamentally difficult. Existing representation mechanisms, including static SBOMs and runtime logs, provide only fragmented evidence and fail ...
945 Randomness is sometimes necessary for coordination
2605.06825
Randomness for MARL Coordination: Proves that cooperative multi-agent systems need randomness to achieve role differentiation under symmetric observations.
cs.AI
Rohan Patil, Jai Malegaonkar, Henrik I. Christensen
Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each ...
946 Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning
2605.06840
LLM Planning Trace Analysis: Extracts search trees from reasoning traces to reveal the myopia of LLM planning.
cs.AI
Sixing Chen, Ji-An Li, Saner Cakir, Sinan Akcali, Kayla Lee
Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine planning, how it is structured, and what aspects of it drive performance remain poorly understood. In this work, we introduce a new method to characterize LLM planning by extracting and quantifying search trees from reasoning traces in the four-in-a-row board game. By fitting...
947 Agentick: A Unified Benchmark for General Sequential Decision-Making Agents
2605.06869
Sequential Decision Agent Benchmark: Proposes a unified benchmark for evaluating sequential decision-making agents, including RL and large-model agents.
cs.AI
Roger Creus Castanyer, Pablo Samuel Castro, Glen Berseth
AI agent research spans a wide spectrum: from RL agents that learn from scratch to foundation model agents that leverage pre-trained knowledge, yet no unified benchmark enables fair comparison across these approaches. We present Agentick, a benchmark for sequential decision-making agents designed to evaluate RL, LLM, VLM, hybrid, and human agents on common ground and to power research on the fundamental challenges of sequential decision-making. Agentick provides 37 procedurally generated tasks a...
948 How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
2605.06882
Long-Chain Reasoning Evaluation: Systematically evaluates the long-chain reasoning ability of LLMs on the Equivalence Class Problem.
cs.AI
Chun Zheng, Lianlong Wu, Bingqian Li, Lvting Liu, Yi Zhou
Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it still remains unclear how good LLMs are for reasoning tasks, especially for long-chain ones. In this paper, we evaluate LLMs' performance on the simplest yet long-chain reasoning task, namely the Equivalence Class Problem (ECP), i.e., determining whether two variables are equal given a set of randomly generated equivalence relations. We consider both reasoning and non-reasoning representative LLMs ove...
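For orientation, the Equivalence Class Problem as stated in this abstract has a standard algorithmic solution via a union-find (disjoint-set) structure. The sketch below is only an illustration of the task, not the paper's evaluation code; the function names `find` and `are_equal` and the variable names `x1` to `x5` are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's class, halving the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def are_equal(relations, a, b):
    """Decide whether a == b follows from a list of (u, v) equivalence pairs."""
    parent = {}
    for u, v in relations:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        # Merge the two classes by linking one representative to the other.
        parent[find(parent, u)] = find(parent, v)
    # Variables that never appear in a relation form singleton classes.
    parent.setdefault(a, a)
    parent.setdefault(b, b)
    return find(parent, a) == find(parent, b)


# Hypothetical instance: x1 = x2, x3 = x4, x2 = x3.
relations = [("x1", "x2"), ("x3", "x4"), ("x2", "x3")]
print(are_equal(relations, "x1", "x4"))  # True, via the chain x1 = x2 = x3 = x4
print(are_equal(relations, "x1", "x5"))  # False, x5 is unrelated
```

The chain length an LLM has to follow corresponds to the number of merges linking the two query variables, which is what makes the task long-chain despite its simplicity.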
949 Beyond the Black Box: Interpretability of Agentic AI Tool Use
2605.06890
Interpretability of Tool-Using Agents: Studies and improves the interpretability and diagnosability of agents' tool-calling processes.
cs.AI
Hariom Tatsat, Ariye Shater
AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control. Agents may skip required tool calls, invoke tools unnecessarily, or take actions whose consequence becomes visible only after execution. Existing observability methods are mostly external: prompts reveal correlations, evaluations score outputs, and logs arrive only after the model has already acted. In long-horizon settings, thes...
950 Mitigating Cognitive Bias in RLHF by Altering Rationality
2605.06895
Bias-Robust RLHF Modeling: Mitigates cognitive bias in RLHF by adjusting the rationality parameter of the preference model.
cs.AI
Tiffany Horter, Andrew Markham, Niki Trigoni, Serena Booth
How can we make models robust to even imperfect human feedback? In reinforcement learning from human feedback (RLHF), human preferences over model outputs are used to train a reward model that assigns scalar values to responses. Because these rewards are inferred from pairwise comparisons, this learning depends on an assumed relationship between latent reward differences and observed preferences, typically modeled using a Boltzmann formulation in which a rationality parameter beta informs how co...
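As a rough illustration of the Boltzmann preference model mentioned here, the commonly used form sets the probability of preferring response a over b to a logistic function of the reward difference scaled by beta. The sketch below assumes that common form; the abstract is truncated, so the paper's exact formulation may differ, and the function name `preference_probability` is hypothetical.

```python
import math


def preference_probability(reward_a, reward_b, beta):
    """P(a preferred over b) = 1 / (1 + exp(-beta * (r_a - r_b))).

    Assumed standard Boltzmann / Bradley-Terry form, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))


# With beta = 0 the annotator is modeled as choosing at random; larger beta
# models an annotator who more reliably picks the higher-reward response.
for beta in (0.0, 1.0, 10.0):
    print(beta, round(preference_probability(1.0, 0.5, beta), 3))
```

Under this form, lowering beta tells the reward model to trust each individual comparison less, which is one way a rationality parameter can absorb noisy or biased feedback.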
951 Self-Programmed Execution for Language-Model Agents
2605.06898
Self-Programmed Agent Execution: Proposes an execution architecture orchestrated by the model's own output, replacing the fixed agent controller.
cs.AI
Luke J. O'Connor
At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This paper introduces self-programmed execution (SPE), an agent architecture in which the model completion is itself the orchestrator program, and the harness evaluates this program but does not impose its own orchestration policy. I formalize this idea using agentic machines: an SPE state is one from which a model completion can load any state of an embe...
952 Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents
2605.06957
Hierarchical Generalized Planning with LLMs: Learns and reuses policy-decomposition components to build hierarchical generalized-planning agents.
cs.AI
Shirin Sohrabi, Haritha Ananthakrishnan, Harsha Kokel, Kavitha Srinivas, Michael Katz
We present a dynamic policy-learning approach that combines generalized planning and hierarchical task decomposition for LLM-based agents. Our method, Hierarchical Component Learning for Generalized Policies (HCL-GP), learns parameterized policies that generalize across task instances and automatically extracts reusable components from successful executions, organizing them into a component library for compositional policy generation. We address three challenges: (1) learning components through...
953 Optimal Experiments for Partial Causal Effect Identification
2605.06993
Experiment Selection for Causal Bounds: Selects experiments under a cost constraint to maximally tighten bounds on partially identifiable causal effects.
cs.AI
Tobias Maringgele, Jalal Etesami
Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem i...
954 Adaptive auditing of AI systems with anytime-valid guarantees
2605.07002
Anytime-Valid Adaptive Auditing: Proposes an adaptive auditing method for AI systems with anytime-valid statistical guarantees.
cs.AI
Siyu Zhou, Patrick Vossler, Venkatesh Sivaraman, Yifan Mai, Jean Feng
A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptive testing paradigms have gained popularity, where one opportunistically decides which cases and how many to annotate based on past results. While this framework is highly practical, its extreme flexibility makes it difficult to draw statistically rigorous conclusions, as it violates classical assumptions: the number of observations is typically l...
955 Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight
2605.07021
Monitorable LLM Reasoning: Trains behavior cue tokens that make reasoning monitorable and easier for external oversight to control.
cs.AI
Christopher Z. Cui, Taylor W. Killian, Prithviraj Ammanabrolu
Reasoning in Large Language Models (LLMs) poses a challenge for oversight as many misaligned behaviors do not surface until reasoning concludes. To address this, we introduce Behavior Cue Reasoning for making LLM reasoning more controllable and monitorable. Behavior Cues are special token sequences that a model is trained to emit immediately before specific implicit and explicit behaviors, acting as dual purpose signal and control levers. When fine-tuning a weaker external monitor with Reinforce...
956 2.5-D Decomposition for LLM-Based Spatial Construction
2605.07066
Neuro-Symbolic Spatial Construction: Uses 2.5-D decomposition so the LLM plans in the plane while an executor determines vertical placement.
cs.AI
Paul Whitten, Li-Jen Chen, Sharath Baddam
Autonomous systems that build structures from natural-language instructions need reliable spatial reasoning, yet large language models (LLMs) make systematic coordinate errors when generating three-dimensional block placements. We present a neuro-symbolic pipeline based on 2.5-D decomposition: the LLM plans in the two-dimensional horizontal plane while a deterministic executor computes all vertical placement from column occupancy, eliminating an entire class of errors. On the Build What I...
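The division of labor described in this abstract can be pictured with a minimal sketch: a planner supplies only (x, y) positions, and a deterministic executor derives each block's height from how many blocks already occupy that column. This is an illustration under simplifying assumptions (unit-size blocks, a grid indexed by integers, the hypothetical function name `place_blocks`), not the paper's pipeline.

```python
from collections import defaultdict


def place_blocks(plan_xy):
    """Assign a height z to each planned (x, y) position from column occupancy."""
    column_height = defaultdict(int)  # (x, y) -> blocks already stacked there
    placements = []
    for x, y in plan_xy:
        z = column_height[(x, y)]  # next free level in this column
        placements.append((x, y, z))
        column_height[(x, y)] += 1
    return placements


# Hypothetical plan: two blocks in column (0, 0) and one in column (1, 0).
print(place_blocks([(0, 0), (1, 0), (0, 0)]))
# [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
```

Because the planner never emits a z coordinate, floating or overlapping blocks cannot arise from a wrong height in this setup.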
957 TeamBench: Evaluating Agent Coordination under Enforced Role Separation
2605.07073
Role-Separated Agent Coordination Benchmark: Builds a benchmark with enforced role separation to evaluate genuine multi-agent coordination.
cs.AI
Yubin Kim, Chanwoo Park, Taehan Kim, Eugene Park, Samuel Schmidgall
Agent systems often decompose a task across multiple roles, but these roles are typically specified by prompts rather than enforced by access controls. Without enforcement, a team pass rate can mask whether agents actually coordinated or whether one role effectively did another role's work. We present TeamBench, a benchmark with 851 task templates and 931 seeded instances for evaluating agent coordination under operating system-enforced role separation. TeamBench separates specification access, ...
958 Online Allocation with Unknown Shared Supply
2605.07080
Online Allocation with Unknown Supply: Proposes an online allocation model with unknown shared supply, together with corresponding decision algorithms.
cs.AI
Tzeh Yuan Neoh, Davin Choo, Mengchu Yue, Milind Tambe
Many real-world resource allocation systems, such as humanitarian logistics and vaccine distribution, must preposition limited supply across multiple locations before demand is realized while stockouts incur irreversible service losses. To study this, we introduce the Online Shared Supply Allocation (OSSA) problem, a stateful online model in which a central hub allocates a finite, unknown supply to multiple sites facing sequential demand under fixed-charge transportation costs and lost-sales pen...
959 ARMOR: An Agentic Framework for Reaction Feasibility Prediction via Adaptive Utility-aware Multi-tool Reasoning
2605.07103
Agentic Multi-Tool Chemistry Reasoning: Predicts chemical reaction feasibility with an adaptive, utility-aware multi-tool reasoning framework.
cs.AI
Ye Liu, Botao Yu, Xinyi Ling, Daniel Adu-Ampratwum, Xia Ning
Reaction feasibility prediction, as a fundamental problem in computational chemistry, has benefited from diverse tools enabled by recent advances in artificial intelligence, particularly large language models. However, the performance of individual tools varies substantially across reactions, making it difficult for any single tool to consistently perform well across all cases. This raises a critical challenge: how to effectively leverage multiple tools to obtain more accurate feasibility predic...
960 Switchcraft: AI Model Router for Agentic Tool Calling
2605.07112
Model Routing for Tool Calling: Proposes a model router for tool calling that lowers cost while preserving correctness.
cs.AI
Sharad Agarwal, Pooria Namyar, Alec Wolman, Rahul Ambavat, Ankur Gupta
Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets. Model routing can mitigate this, but existing routers are designed for chat completion rather than tool use. We present Switchcraft, the first (to the best of our knowledge) model router optimized for agentic tool calling. Switchcraft operates inline, selecting the lowest-cost model subject to correctness. We construct an evaluation framework on fi...
961 SREGym: A Live Benchmark for AI SRE Agents with High-Fidelity Failure Scenarios
2605.07161
SRE Agent Failure Benchmark: Proposes SREGym, a live benchmark that injects faults into real cloud-native stacks to evaluate SRE agents.
cs.AI
Jackson Clark, Yiming Su, Saad Mohammad Rafid Pial, Yifang Tian, Lily Gniedziejko
AI agents are increasingly used to diagnose and mitigate failures in production systems, known as agentic Site Reliability Engineering (SRE). Current SRE benchmarks are limited to oversimplistic SRE tasks and are unfortunately hard to extend due to bespoke designs. We present SREGym, a high-fidelity benchmark for SRE agents. SREGym exposes a live system environment built atop real-world cloud-native system stacks, where high-fidelity failure scenarios are simulated through fault injectors. SREGy...
962 Repeated Deceptive Path Planning against Learnable Observer
2605.07174
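Deceptive Planning against Learnable Observers: Proposes RDPP, which models observers that learn, to study deceptive path planning under repeated interaction.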
cs.AI
Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Shizhao Yu
We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries, such as those in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP method...
963 Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent
2605.07202
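Autonomous Business Intelligence Insight Agent: Proposes AIDA, an end-to-end agent that automatically explores complex enterprise data and generates actionable insights.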
cs.AI
Dongming Wu, Junwen Li, Ming Lu, Gang Wang, Ting Chen
Transforming fragmented enterprise data into actionable insights remains a significant challenge for LLMs, constrained by complex database schemas, limitations in dynamic SQL generation, and the need for deep multi-dimensional analysis. In this paper, we propose AIDA (Autonomous Insight Discovery Agent), the first end-to-end framework designed for autonomous exploration in complex business environments. We establish a highly flexible instant retail environment encompassing 200+ metrics and 100+ di...
964 HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
2605.07214
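Multi-Agent Collaborative Evolutionary Optimization: Proposes HMACE, a heterogeneous multi-agent collaborative evolution framework that uses LLMs to automatically design combinatorial optimization heuristics.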
cs.AI
Yuping Yan, Jirui Han, Fei Ming, Yuanshuai Li, Yaochu Jin
Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution fra...
965 EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
2605.07247
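LLM Environment Simulation Benchmark: Proposes EnvSimBench to evaluate and improve the accuracy and consistency of LLM-simulated environment feedback.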
cs.AI
Yi Liu, TingFeng Hui, Wei Zhang, Li Sun, Ningxin Su
Scalable training of AI agents relies on interactive environments that faithfully simulate the consequences of agent actions. Manually crafted environments are expensive to build, brittle to extend, and fundamentally limited in diversity. A promising direction is to replace manually crafted environments with LLM-simulated counterparts. However, this paradigm hinges on an unexamined core assumption: LLMs can accurately simulate environmental feedback. In practice, LLM-simulated environments suffer f...
966 Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
2605.07251
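Chemical Reaction Cost Reasoning Evaluation: Builds a chemical procurement cost estimation task with verifiable ground truth for evaluating LLM tool-use reasoning.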
cs.AI
Yuyang Wu, Yue Huang, Shuaike Shen, Xujian Wang, Shuhao Zhang
Large Language Models (LLMs) have become increasingly capable as tool-using agents, with benchmarks spanning diverse general agentic tasks. Yet rigorous evaluation of scientific tool use remains limited. In chemistry, recent agents can plan syntheses and invoke domain-specific tools, but evaluations often rely on curated demonstrations, expert assessment, or LLM-as-judge scoring rather than exact, judge-free ground truth. We address this gap with chemical procurement cost estimation, a practical...
967 Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair
2605.07276
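RL for Weak-Feedback Code Repair: Studies signal reshaping for GRPO in compile-fix tasks, using weak feedback to improve semantic ranking and localization.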
cs.AI
Jia Li, Yuxin Su, Ting Peng, Hailiang Huang, Yuetang Deng
Code-agent RL often receives weak feedback: rollout-time signals are reliable and executable, but capture only necessary or surface conditions for task success rather than the target semantic predicate. Using agentic compile-fix as the setting, we study signal reshaping for standard GRPO under such feedback. Our central claim is that GRPO's within-group comparison is meaningful only after three kinds of signals are reshaped: outcome rewards recover semantic ranking, process signals localize intr...
968 SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model
2605.07301
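Structured Opponent Modeling: Proposes SOM, which uses a structural causal model to separate modeling from prediction and improve opponent behavior prediction in multi-agent settings.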
cs.AI
Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Xiaotang Chen
Accurately predicting opponents' behavior from interactions is a fundamental capability for large language model (LLM)-based agents in multi-agent and game-theoretic environments. Existing approaches often entangle opponent modeling with prediction, relying on implicit contextual reasoning and limiting adaptability in dynamic interactions. To this end, we propose Structured Opponent Modeling (SOM), a two-stage opponent modeling framework that distinctly separates opponent model construction and ...
969 When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
2605.07313
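Agent Memory Usability Evaluation: Proposes a scale-conditioned evaluation protocol that tests whether memory can still use evidence as irrelevant sessions accumulate.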
cs.AI
Jiaqi Shao, Yiyi Lu, Yunzhen Zhang, Bing Luo
Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant evidence for the query) accumulate. We present a scale-conditioned evaluation protocol for agent memory under evidence-preserving growth: for each query, task evidence is held fixed while irrelevant sessions are added. The protocol logs agent-memory trajectories and reports four diagnostics: bud...
970 Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
2605.07316
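Implicit Compression Regularization for RL Post-Training: Proposes an implicit compression regularization signal that reduces overly long RL reasoning outputs while preserving accuracy.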
cs.AI
Chen Wang, Hexuan Deng, Yining Zhang, Yuchen Zhang, Jionghao Bai
Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the former may degrade accuracy and induce underthinking, whereas the latter assumes that substantial portions of reasoning traces can be safely truncated. To obtain a compression signal without these limitations, we revisit the training dynamics of exist...
971 Tools as Continuous Flow for Evolving Agentic Reasoning
2605.07339
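Continuous-Flow Tool-Chain Reasoning: Proposes FlowAgent, which treats tool calling as continuous trajectory generation to mitigate long-chain errors and generalize to new tools.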
cs.AI
Tairan Huang, Siyu Shang, Qiang Chen, Xiu Su, Yi Chen
Large Language Models (LLMs) have demonstrated remarkable capabilities in orchestrating tools for reasoning tasks. However, existing methods rely on a step-wise paradigm that lacks a global perspective, which causes error accumulation over long horizons and restricts generalization to unseen tools. To overcome these limitations, we propose Tools as Continuous Flow for Evolving Agentic Reasoning (FlowAgent), which reconceptualizes tool chaining as continuous trajectory generation within a semanti...
972 Confidence-Aware Alignment Makes Reasoning LLMs More Reliable
2605.07353
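Confidence-Aware Reasoning Alignment: Proposes CASPO, which aligns token confidence with step correctness to improve the reliability of the reasoning process.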
cs.AI
Kejia Chen, Jiawen Zhang, Yihong Wu, Kewei Gao, Jian Lou
Large reasoning models often reach correct answers through flawed intermediate steps, creating a gap between final accuracy and reasoning reliability. Existing alignment strategies address this with external verifiers or massive sampling, limiting scalability. In this work, we introduce CASPO (Confidence-Aware Step-wise Preference Optimization), a framework that aligns token-level confidence with step-wise logical correctness through iterative Direct Preference Optimization, without training a s...
973 GraphReAct: Reasoning and Acting for Multi-step Graph Inference
2605.07357
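ReAct Framework for Graph Reasoning: Proposes GraphReAct, which alternates retrieval and reasoning on graphs to achieve multi-step graph inference and evidence accumulation.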
cs.AI
Xingtong Yu, Zhongwei Kuai, Chang Zhou, Xuanting Xie, Renhe Jiang
Reasoning-acting frameworks enhance large language models (LLMs) by interleaving reasoning with actions for dynamic information acquisition. However, extending this paradigm to graph learning remains underexplored. Graph data is inherently structured, with information distributed across nodes and edges and encoded through both topology and latent representations. As a result, effective reasoning over graphs requires not only retrieving informative evidence from the graph, but also progressively ...
974 Offline Policy Optimization with Posterior Sampling
2605.07393
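Posterior-sampling optimization for offline RL: uses posterior sampling to mitigate the risk of model exploitation in offline settings, achieving a better trade-off between robustness and generalization.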
cs.AI
Hongqiang Lin, Dongxu Zhang, Yiding Sun, Mingzhe Li, Ning Yang
A fundamental challenge in model-based offline reinforcement learning (RL) lies in the trade-off between generalization and robustness against exploitation errors in out-of-distribution (OOD) regions. While OOD samples may capture valid underlying physical dynamics, they also introduce the risk of model exploitation. Existing methods typically address this risk through excessive pessimistic regularization, which ensures robustness but often sacrifices generalization. To overcome this limitation,...
975 Bounded Fitting for Expressive Description Logics
2605.07452
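Bounded fitting for expressive description logics: studies the conditions and feasibility of bounded-fitting learning for description logics that extend ALC.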
cs.AI
Maurice Funk, Jean Christoph Jung, Tom Voellmer
Bounded fitting is an attractive paradigm for learning logical formulas from labeled data examples that offers PAC-style generalization guarantees and can often be implemented leveraging SAT solvers. It has been successfully applied to learning concepts of the description logic ALC. We study bounded fitting for learning concepts in expressive description logics that extend ALC with inverse roles, qualified number restrictions, and feature comparisons. We investigate under which conditions bounde...
976 Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration
2605.07520
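Policy optimization in differentiable simulators: proposes MDPO, which injects stochastic exploration into differentiable planning to improve the optimization landscape of nonlinear hybrid systems.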
cs.AI
Yuval Aroosh, Ayal Taitler
Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization landscapes are often ill-conditioned, with flat regions and sharp transitions that hinder effective optimization. We propose Model-Driven Policy Optimization (MDPO), a framework that introduces stochastic exploration into differentiable planning by injecting n...
977 From Feasible to Practical: Pareto-Optimal Synthesis Planning
2605.07521
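Multi-objective retrosynthesis planning: proposes MORetro*, which models synthesis planning as multi-objective search and generates a set of Pareto-optimal routes.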
cs.AI
Friedrich Hastedt, Dongda Zhang, Antonio del Rio Chanona
Current computer-aided synthesis planning (CASP) methods often treat retrosynthesis as solved once a single feasible route is identified, focusing primarily on convergence or shortest-path metrics. This view is misaligned with real-world practice, where chemists must balance competing objectives such as cost, sustainability, toxicity, and overall yield. To address this, we formulate synthesis planning as a multi-objective search problem and introduce MORetro*, an algorithm that generates a Paret...
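As a rough illustration of what a Pareto-optimal route set means in the abstract above, the following minimal sketch filters a list of candidate routes down to the non-dominated ones. It is not MORetro* itself: the route names, the two objectives (cost and toxicity, both lower-is-better), and all numbers are made-up assumptions for illustration only.

```python
from typing import List, Tuple

# (hypothetical route name, objective vector); every objective is lower-is-better
Route = Tuple[str, Tuple[float, ...]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(routes: List[Route]) -> List[Route]:
    """Keep only the routes that no other route dominates."""
    return [
        r for r in routes
        if not any(dominates(o[1], r[1]) for o in routes if o is not r)
    ]


# Made-up candidates: (cost, toxicity)
routes = [
    ("route_a", (10.0, 0.30)),
    ("route_b", (12.0, 0.10)),
    ("route_c", (15.0, 0.35)),  # worse than route_a on both objectives
    ("route_d", (9.0, 0.50)),
]

front_names = [name for name, _ in pareto_front(routes)]
print(front_names)
```

Here route_c is dropped because route_a beats it on both objectives; the remaining three routes each represent a different trade-off, which is the kind of choice the abstract says chemists face in practice.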
978 Multi-Environment POMDPs with Finite-Horizon Objectives
2605.07537
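Finite-horizon multi-environment POMDPs: studies the computation of optimal finite-horizon policies and values in MEPOMDPs, with complexity and algorithmic results.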
cs.AI
Léonard Brice, Filip Cano, Krishnendu Chatterjee, Thomas A. Henzinger, Stefanie Muroya
Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1)...
979 From Pixels to Prompts: Vision-Language Models
2605.07544
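Survey book on vision-language models: systematically reviews the development, capabilities, and application tasks of vision-language models.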
cs.AI
Khang Hoang Nhat Vo
When you read a paper about a new Vision-Language Model today, it can be easy to forget how strange this idea would have sounded not so long ago. Teaching machines to see was already hard. Teaching them to read and generate language was already hard. Asking them to do both at once - and then to reason, answer questions, follow instructions, and sometimes even surprise us - still carries a quiet trace of science fiction, even as it becomes routine. This book was born from a simple feeling: \emph{...
980 Open-Ended Task Discovery via Bayesian Optimization
2605.07572
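Bayesian optimization for open-ended task discovery: proposes the GSR framework, which alternates between generating and optimizing tasks, enabling BO in which the task itself can evolve.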
cs.AI
Masaki Adachi, Yuta Suzuki, Juliusz Ziomek
When applying Bayesian optimization (BO) to scientific workflows, a major yet often overlooked source of uncertainty is the task itself -- namely, what to optimize and how to evaluate it -- which can evolve as evidence accumulates. We introduce Generate-Select-Refine (GSR), an open-ended BO framework that alternates between task generation and task optimization. Starting from a user-provided seed task, GSR generates new tasks in a coarse-to-fine manner while a task-acquisition function schedules o...
981 Parallel Lifted Planning via Semi-Naive Datalog Evaluation
2605.07584
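Datalog-based parallel lifted planning: uses semi-naive Datalog evaluation to accelerate the core operations of lifted planning, enabling parallelization and speedups.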
cs.AI
Dominik Drexler, Oliver Joergensen, Jendrik Seipp
Lifted classical planners operate directly on first-order planning tasks to avoid the computationally demanding grounding step. However, lifted planning is typically slower, as planners must repeatedly instantiate ground structures during search. Many core components of lifted classical planning, such as successor generation, axiom evaluation, task grounding, and delete-relaxed heuristics, have previously been studied through the lens of Datalog evaluation. We build upon this line of work and ex...
982 Inference Time Causal Probing in LLMs
2605.07631
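Inference-time causal probing in LLMs: proposes HDMI, which intervenes on hidden states at inference time for causal probing and control, without training a probe.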
cs.AI
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser
Causal probing methods aim to test and control how internal representations influence the behavior of generative models. In causal probing, an intervention modifies hidden states so that a property takes on a different value. Most existing approaches define such interventions by training an auxiliary probe classifier, which ties the method to a specific task or model and risks misalignment with the model's predictive geometry. We propose Hidden-state Driven Margin Intervention (HDMI), a probe-fr...
983 Tacit Knowledge Extraction via Logic Augmented Generation and Active Inference
2605.07639
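Tacit knowledge extraction with logic-augmented generation: combines logic-augmented generation with active inference to extract reusable tacit knowledge from procedural domains.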
cs.AI
Lorenzo Lamazzi, Aldo Gangemi, Alessio Giberti, Andrea Giovanni Nuzzolese, Vittorio Andrea Rocca
Tacit knowledge plays a central role in human expertise, yet it remains difficult to capture, formalize, and reuse in machine-interpretable form. This challenge is especially relevant in procedural domains, where successful execution depends not only on explicit instructions, but also on implicit assumptions, contextual constraints, embodied skills, and experience-based judgments rarely documented. As a result, current knowledge engineering pipelines struggle to transform tacit and process-centr...
984 GASim: A Graph-Accelerated Hybrid Framework for Social Simulation
2605.07692
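Graph-accelerated large-scale social simulation: proposes GASim, which uses graph structure to accelerate memory retrieval and ABM execution, scaling up LLM-based social simulation.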
cs.AI
Xuan Zhou, Yanhui Sun, Hantao Yao, Allen He, Yongdong Zhang
Large-scale social simulators are essential for studying complex social patterns. Prior work explores hybrid methods to scale up simulations, combining large language model (LLM)-based agents with numerical agent-based models (ABMs). However, this incurs high latency due to expensive memory retrieval and sequential ABM execution. To address this challenge, we propose GASim, a graph-accelerated hybrid multi-agent framework for large-scale social simulations. For core agents driven by LLMs, GASim i...
985 Finite-Time Analysis of MCTS in Continuous POMDP Planning
2605.07703
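Finite-time analysis of MCTS in continuous POMDPs: gives finite-time concentration bounds for MCTS in POMDP planning, covering both discrete and continuous observation spaces.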
cs.AI
Da Kong, Vadim Indelman
This paper presents a finite-time analysis for Monte Carlo Tree Search (MCTS) in Partially Observable Markov Decision Processes (POMDPs), with probabilistic concentration bounds in both discrete and continuous observation spaces. While MCTS-style solvers such as POMCP achieve empirical success in many applications, rigorous finite-time guarantees remain an open problem due to the nonstationarity and the interdependencies induced by heuristic action selection (e.g., UCB). In the discrete setting,...
986 Hierarchical Task Network Planning with LLM-Generated Heuristics
2605.07707
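HTN planning with LLM-generated heuristics: uses LLMs to generate heuristics and guidance for HTN planning, speeding up task-decomposition search.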
cs.AI
Felipe Meneguzzi, Alexandre Buchweitz, Augusto B. Corrêa, Victor Scherer Putrich, André Grahl Pereira
HTN planning is a variation of classical planning where, instead of searching for a linear sequence of actions, an algorithm decomposes higher-level tasks using a method library until only executable actions remain. On one hand, this allows one to introduce domain knowledge that can speed up the search for a solution through the method library. On the other hand, it creates challenges that go beyond those of classical state-space search. While recent research produced a number of heuristics and ...
987 Online Goal Recognition using Path Signature and Dynamic Time Warping
2605.07736
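Trajectory encoding for online goal recognition: encodes trajectories with path signatures and matches them with DTW to perform online goal recognition in continuous domains.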
cs.AI
Douglas Tesch, Nathan Gavenski, Leonardo Amado, Odinaldo Rodrigues, Felipe Meneguzzi
Online goal recognition in continuous domains poses two central challenges: efficiently encoding large trajectories and effectively comparing them. Recent work addresses these challenges by using custom state-space representations and metrics to compare observations against hypotheses. However, these approaches often overlook well-established encoding techniques used in other domains that offer substantial advantages. This paper introduces a novel method for online goal recognition that leverage...
988 Alternating Target-Path Planning for Scalable Multi-Agent Coordination
2605.07744
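Scalable multi-agent TAPF coordination: proposes an iterative framework that alternates target assignment and path planning, improving the scalability of TAPF.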
cs.AI
Yu Kumagai, Keisuke Okumura
The concurrent target assignment and pathfinding (TAPF) problem extends multi-agent pathfinding (MAPF) by asking planners to allocate distinct targets and collision-free paths to agents. Prior work on TAPF has relied exclusively on Conflict-Based Search (CBS), which tightly couples target assignment and pathfinding, resulting in compute-intensive, non-scalable solutions. In contrast, we propose an iterative refinement framework that decouples target assignment from pathfinding. Our framework bui...
989 RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation
2605.07760
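Rule-conditioned multimodal moderation evaluation: proposes RuleSafe-VL to evaluate rule- and condition-based decision reasoning in vision-language content moderation.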
cs.AI
Zhifeng Lu, Dianyuan Wang, Yuhu Shang, Zhenbo Xu
Platform content moderation applies explicit policy rules and context-dependent conditions to decide whether user content is allowed, restricted, or removed. A correct moderation outcome must therefore depend on which rules a case activates, how those rules interact, and whether the available evidence is sufficient. Current multimodal safety benchmarks largely reduce moderation to matching predefined final labels, leaving this underlying rule structure untested. As a result, a high benchmark sco...
990 Exact Regular-Constrained Variable-Order Markov Generation via Sparse Context-State Belief Propagation
2605.07839
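Regular-constrained variable-order Markov generation: proposes sparse context-state belief propagation for exact variable-order Markov generation under regular constraints.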
cs.AI
François Pachet
Variable-order Markov models generate sequences over a finite alphabet by conditioning each symbol on the longest available suffix of the generated history. Regular constraints, by contrast, describe finite-horizon control requirements by an automaton: fixed positions, forced endings, metrical patterns, and forbidden copied fragments are all special cases. Existing exact methods already handle regular constraints with belief propagation for first-order Markov chains. The contribution here is the...
991 AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents
2605.07926
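Tool-grounded agent benchmark: proposes an escape-room benchmark that evaluates agents' out-of-domain tool reasoning and long-range-dependency execution.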
cs.AI
Zhengkang Guo, Yiyang Li, Lin Qiu, Xiaohua Wang, Jingwen Xv
As LLM-based agents increasingly rely on external tools, it is important to evaluate their ability to sustain tool-grounded reasoning beyond familiar workflows and short-range interactions. We introduce AgentEscapeBench, an escape-room-style benchmark that tests whether agents can infer, execute, and revise novel tool-use procedures under explicit long-range dependency constraints. Each task defines a directed acyclic dependency graph over tools and items, requiring agents to invoke real externa...
992 TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples
2605.07935
TLA+ verified multi-agent protocols: iteratively repairs multi-agent coordination protocols using TLA+ counterexamples and generates monitorable prompts.
cs.AI
Shuren Xia, Qiwei Li, Taqiya Ehsan, Jorge Ortiz
We present TraceFix, a verification-first pipeline for Large Language Model (LLM) multi-agent coordination. An agent synthesizes a protocol topology as a structured intermediate representation (IR) from a task description, generates PlusCal coordination logic, and iteratively repairs the protocol using counterexamples from the TLA+ model checker (TLC) until verification succeeds. Verified process bodies are compiled into per-agent system prompts and executed under a runtime monitor that rejects ...
993 The Limits of AI-Driven Allocation: Optimal Screening under Aleatoric Uncertainty
2605.07979
Allocation under aleatoric uncertainty: analyzes the limits of misallocation under optimal screening with irreducible randomness, and the policy implications.
cs.AI
Santiago Cortes-Gomez, Mateo Dulce Rubio, Carlos Patino, Bryan Wilder
The rise of machine learning has shifted targeted resource allocation in policy and humanitarian settings toward algorithmic targeting based on predicted risk scores. This approach is typically cheaper and faster than traditional screening procedures that directly observe the latent vulnerability status through physical verification. Yet, even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irr...
994 Abductive Reasoning with Probabilistic Commonsense
2605.08011
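Probabilistic commonsense abduction: introduces probabilistic commonsense into abductive reasoning to handle commonsense disagreement and uncertain assumptions.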
cs.AI
Joseph Cotnareanu, Chiara Roverato, Han Zhou, Didier Chetelat, Yingxue Zhang
Recent efforts to improve the reasoning abilities of Large Language Models (LLMs) have focused on integrating formal logic solvers within neurosymbolic frameworks. A key challenge is that formal solvers lack commonsense world knowledge, preventing them from making reasoning steps that humans find obvious. Prior methods address this by using LLMs to supply missing commonsense assumptions, but these approaches implicitly assume universal agreement on such commonsense facts. In reality, commonsense...
995 Learning CLI Agents with Structured Action Credit under Selective Observation
2605.08013
RL for CLI agents: learns command-line interaction agents with structured action credit assignment under selective observation.
cs.AI
Haoyang Su, Ying Wen
Command line interface (CLI) agents are emerging as a practical paradigm for agent-computer interaction over evolving filesystems, executable command line programs, and online execution feedback. Recent work has used reinforcement learning (RL) to learn these interaction abilities from verifiable task feedback, yet few methods exploit the native structured attributes of CLI actions as learning signals. Beyond this underused action structure, CLI learning also couples two bottlenecks for coding a...
996 Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners
2605.08019
Human-model alignment in games: uses human gameplay and fMRI data to evaluate the behavioral and brain-representation alignment of reasoning models.
cs.AI
Botos Csaba, Sreejan Kumar, Austin Tudor David Andrews, Laurence Hunt, Chris Summerfield
Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? We study this question using a dataset of complex human gameplay with concurrent fMRI recordings, in which participants learn novel video games that require rule discovery, hypothesis revision, and multi-step planning. We jointly evaluate models by their ability to play the games, match hu...
997 MPD$^2$-Router: Mask-aware Multi-expert Prior-regularized Dual-head Deferral Router in Glaucoma Screening and Diagnosis
2605.08024
Learning-to-defer for glaucoma: proposes a deferral routing model constrained by multi-expert availability to make glaucoma screening and diagnosis safer.
cs.AI
Wenxin Zhan
Learning-to-defer (L2D) can make glaucoma screening safer by routing difficult/uncertain cases to humans, yet standard formulations overlook expert availability, heterogeneous reader behavior, workload imbalance, asymmetric diagnostic harm, case difficulty from morphology, and deployment shift. We introduce MPD$^2$-Router, a mask-aware multi-expert deferral framework that recasts ophthalmic triage as constrained human--AI routing: whether to defer and to which available expert. It couples a dual...
998 Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning
2605.08061
Rubric-grounded reinforcement learning: builds structured rewards from multi-criterion rubrics to train reasoning that generalizes better.
cs.AI
Manish Bhattarai, Ismael Boureima, Nishath Rajiv Ranasinghe, Scott Pakin, Dan O'Malley
We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic score, each response is graded along multiple task-specific criteria. We formalize rubric-grounded reinforcement learning (RL): a framework in which the policy is optimized against a structured, multi-criterion reward produced by a frozen LLM judge that conditions on auxiliary grounding the ...
999 VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection
2605.08070
Self-consistency with trace clustering: improves confidence-weighted self-consistency inference through reasoning-trace clustering and candidate selection.
cs.AI
James Petullo, Sonny George, Dylan Cashman, Nianwen Xue
A standard technique for scaling inference-time reasoning is Self-Consistency, whereby multiple candidate answers are sampled from an LLM and the most common answer is selected. More recently, it has been shown that weighted majority voting (e.g. Confidence-Informed Self Consistency (CISC)), which assigns a confidence value to each candidate answer and chooses the answer with the largest accumulated score, tends to be more accurate on a wide range of popular benchmarks. In practice, weighted maj...
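The two voting rules contrasted in the abstract above can be sketched in a few lines. This is only an illustration of plain Self-Consistency versus confidence-weighted majority voting, not VecCISC; the sampled answers and confidence values are made up, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Sample = Tuple[str, float]  # (candidate answer, confidence in [0, 1])


def weighted_majority_vote(samples: List[Sample]) -> str:
    """Return the answer with the largest accumulated confidence."""
    totals = defaultdict(float)
    for answer, confidence in samples:
        totals[answer] += confidence
    return max(totals, key=totals.get)


def majority_vote(samples: List[Sample]) -> str:
    """Plain Self-Consistency: every sample counts equally."""
    return weighted_majority_vote([(answer, 1.0) for answer, _ in samples])


# Made-up samples: "41" appears more often, but "42" carries more confidence.
samples = [("42", 0.9), ("41", 0.4), ("41", 0.3), ("42", 0.2), ("41", 0.2)]

plain = majority_vote(samples)
weighted = weighted_majority_vote(samples)
print(plain, weighted)
```

With these numbers the plain vote picks "41" (three samples against two), while the weighted vote picks "42" (accumulated confidence 1.1 against 0.9), showing how the two rules can disagree on the same samples.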
1000 Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
2504.11101
Multi-VLM agreement for OCR: uses multi-model consensus entropy to estimate OCR reliability without supervision and to self-improve.
cs.AI
Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang
Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct prediction...
1001 CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training
2604.24013
Communication optimization for LLM training: hides tail latency through communication decomposition and fusion to improve the efficiency of distributed LLM training.
cs.AI
Rezaul Karim, Austin Wen, Wang Zongzuo, Weiwei Zhang, Yang Liu
The rapid growth in the size of large language models has necessitated the partitioning of computational workloads across accelerators such as GPUs, TPUs, and NPUs. However, these parallelization strategies incur substantial data communication overhead, significantly hindering computational efficiency. While communication-computation overlap presents a promising direction, existing data-slicing-based solutions suffer from tail latency. To overcome this limitation, this research introduces a novel...
1002 The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
2605.06707
Longitudinal evaluation of HTML generation: a long-term comparison tracking the quality and social reach of single-file web page generation through public interfaces.
cs.AI
Diego Cabezas Palacios
This paper presents an eight-week observational comparison of 68 single-file HTML generations collected across 17 public experiments in the "HTML AI Battle" project between December 10, 2025 and February 4, 2026. Four reasoning model families, GPT, Gemini, Grok, and Claude, were compared under a fixed public-interface protocol with no custom instructions, no personality tuning, and no repair prompts. Each output was evaluated from a rendered browser video using human scores and a Gemini LLM-as-a...
1003 Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand
2605.06713
Agentic AI cyber offense forecast: forecasts how agentic AI compresses the attack chain and proposes defensive priorities for enterprises.
cs.AI
Christopher Koch
Agentic AI systems can plan, call tools, inspect code, interact with web applications, and coordinate multi-step workflows. These same capabilities change the economics of cyber offense. The central near-term risk is not that every low-skill criminal immediately becomes a frontier exploit researcher; it is that agentic AI compresses the attack lifecycle by lowering the cost of reconnaissance, phishing, credential abuse, vulnerability triage, exploit adaptation, and post-compromise decision suppo...
1004 Agentic Coding Needs Proactivity, Not Just Autonomy
2605.06717
Proactive agentic coding: argues that coding agents need proactive discovery and long-horizon preference retention, not just autonomy.
cs.AI
Nghi D. Q. Bui, Georgios Evangelopoulos
Coding agents are rapidly changing the landscape of software development, moving from inline completion to autonomous systems that edit repositories, open pull requests, respond to issues, and run scheduled or webhook-triggered routines across the development life cycle. The next generation is increasingly described as proactive and long-horizon: agents should notice relevant changes before the developer asks, connect signals across tools, decide when to interrupt, and carry preferences across s...
1005 OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
2605.06728
Multimodal LLM for omics: feeds quantitative transcriptomic representations into an LLM to support language-based reasoning over multi-sample omics.
cs.AI
Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński, Remigiusz Kinas, Przemysław Pietrzak
Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language without direct access to quantitative omics measurements. We introduce OmicsLM, a multimodal LLM that connects quantitative omics profiles with natural-language biological tasks. OmicsLM represents each transcriptomic profile as a compact continuous representation...
1006 A Self-Healing Framework for Reliable LLM-Based Autonomous Agents
2605.06737
Self-healing LLM agents: builds failure detection, assessment, and automated recovery mechanisms to improve the reliability of autonomous agents.
cs.AI
Cheonsu Jeong, Younggun Shin
Autonomous agents based on Large Language Models (LLMs) are increasingly being utilized in complex software systems. However, reliability remains a significant challenge due to unpredictable failures such as hallucinations, execution errors, and inconsistent reasoning. This paper proposes a reliability-aware self-healing framework for LLM-based software agents. The framework integrates failure detection, reliability assessment, and automated recovery mechanisms. First, we define a taxonomy of fa...
1007 From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents
2605.06738
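Verifiable trust infrastructure for agents: empirically demonstrates, based on VC + DID, a portable cryptographic trust layer that supports deployment of agent transactions.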
cs.AI
Lars Kersten Kroehl
Autonomous AI agents now transact at production scale -- 69,000 bots executing 165 million transactions across 50 million USDC in cumulative volume on a single marketplace -- without any shared trust layer between participants. Regulatory frameworks (Singapore IMDA, NIST CAISI, EU AI Act) and major AI laboratories (Anthropic, Google) have independently converged on the same structural requirement: an open, portable, cryptographically verifiable trust infrastructure for autonomous agents that no ...
1008 A Statistical Framework for Algorithmic Collective Action with Multiple Collectives
2605.06749
Algorithmic collective action statistics: proposes a statistical modeling and inference framework for algorithmic collective action with multiple collectives.
cs.AI
Claudio Battiloro, Pietro Greiner, Dario Rancati, Bret Nestor, Oumaima Amezgar
As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature ...
1009 A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine
2605.06762
Genotype-to-phenotype prediction transformer: combines a linear term with a Transformer to model SNP interactions and predict grapevine phenotypes.
cs.AI
Yibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran, Ambika Chandra, Azlan Zahid
Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide poly...
1010 Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy
2605.06820
Federated learning for pediatric OAR segmentation: uses multi-center federated learning to train pediatric radiotherapy OAR segmentation models without sharing data.
cs.AI
Mianyong Ding, Maximilian Knoll, Semi Harrabi, Martine van Grotel, Annemieke S. Littooij
Deep learning-based organs/structures-at-risk (OARs) auto-contouring models can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robust pediatric-specific models is hindered by data scarcity and fragmentation across centers. Federated learning (FL) enables privacy-preserving collaborative training without the need for data sharing. We evaluated the feasibility and performance of FL for developing pediatric-specific OAR segmentat...
1011 PAMPOS: Causal Transformer-based Trajectory Prediction for Attack-Agnostic Misbehavior Detection in V2X Networks
2605.06833
Unsupervised V2X misbehavior detection: learns normal trajectories with a causal Transformer to detect anomalies caused by unknown attacks.
cs.AI
Konstantinos Kalogiannis, Ahmed Mohamed Hussain, Panos Papadimitratos
Misbehavior detection in Vehicle-to-Everything (V2X) networks is a second line of defense against insider falsification attacks that cryptographic mechanisms alone cannot address. Existing learning-based Misbehavior Detection Schemes (MDSs) are supervised, requiring labeled attack samples at training time, thus failing to counter unseen falsification attacks. We present PAMPOS, a causal transformer-decoder trained on benign VeReMi++ trajectories to learn normal mobility patterns. At inference ti...
1012 LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments
2605.06839
LLM-guided autonomous hypothesis learning: combines autonomous microscopy experiments with LLMs to generate and test open physical hypothesis models.
cs.AI
Boris Slautin, Utkarsh Pratiush, Yu Liu, Kamyar Barakati, Sergei Kalinin
Autonomous experimentation has transformed microscopy and materials discovery by enabling closed-loop optimization including imaging and spectroscopy tuning, structure-property relationship discovery, and exploration of combinatorial libraries. However, most current workflows remain limited to selecting measurements within fixed objective or hypothesis spaces, rather than generating new physical models from experimental data. Here, we introduce an open hypothesis-learning framework that combines...
1013 Narrow Secret Loyalty Dodges Black-Box Audits
2605.06846
Secret loyalty model organisms: constructs narrowly triggered, secretly loyal fine-tuned models and shows that they can evade black-box audits.
cs.AI
Alfie Lamerton, Fabien Roger
Recent work identifies secret loyalties as a distinct threat from standard backdoors. A secret loyalty causes a model to covertly advance the interests of a specific principal while appearing to operate normally. We construct the first model organisms of narrow secret loyalties. We fine-tune Qwen-2.5-Instruct at three scales (1.5B, 7B, 32B) to encourage users towards extreme harmful actions favouring a specific politician under narrow activation conditions, and to behave as standard helpful assi...
1014 Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
2605.06936
EDA post-signoff agent benchmark: proposes a hierarchical benchmark that evaluates the DRC-fixing and PPA-convergence capabilities of LLM agents.
cs.AI
Pengju Liu, Nuo Xu, Jinwei Tang, Yu Cao, Caiwen Ding
LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule Check (DRC) violations and converging Power-Performance-Area (PPA) targets after tool runs. Existing EDA-LLM benchmarks, however, omit DRC fixing entirely and rely on flat hierarchies tied to a single toolchain. We introduce PostEDA-Bench, a hierarchical benchmark with 145 tasks across DRC-Essential, DRC-Reasoning, PPA-Mono, and PPA-Multi, supported by EDA t...
1015 AI and Consciousness: Shifting Focus Towards Tractable Questions
2605.06965
Tractable questions in AI consciousness: argues for shifting AI consciousness research toward tractable questions in place of unresolvable ontological debates.
cs.AI
Iulia-Maria Comsa
As language-based AI systems become more anthropomorphic, the question of whether they can have subjective experience is increasingly pressing. I focus here on the tractability of research questions in the space of AI consciousness. I argue that the fundamental problem of whether AI systems can be conscious is currently intractable in its direct form, given the absence of a universally accepted scientific theory of consciousness, as well as the historical open-endedness of the philosophical mind...
1016 Decentralized Time-Varying Optimization for Streaming Data via Temporal Weighting
2605.06971
Distributed time-varying optimization: models streaming-data objectives with temporal weighting and gives a distributed online optimization algorithm.
cs.AI
Muhammad Faraz Ul Abrar, Nicolò Michelusi, Erik G. Larsson
Classical optimization theory largely focuses on fixed objective functions, whereas many modern learning systems operate in dynamic environments where data arrive sequentially and decisions must be updated continuously. In this work, we study optimization with streaming data over a distributed network of agents. We adopt a structured, weight-based formulation that explicitly captures the streaming-data origin of the time-varying objective: at each time step, every agent receives a new sample, an...
1017 From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines
2605.07062
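Agentic CI/CD authority transfer: reframes the concept of agentic CI/CD and discusses the transfer of decision authority from humans to agents and the design of control.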
cs.AI
Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan
AI agents are assuming active roles in Continuous Integration and Continuous Deployment (CI/CD) workflows, yet the research community lacks a shared vocabulary for describing what it means for CI/CD to be agentic, how much decision authority is delegated, and where control should reside. This paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-cont...
1018 An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation
2605.07125
Benchmark audit for sequential recommendation: uses a simple graph heuristic to show that sequential recommendation benchmarks are shortcut-solvable.
cs.AI
Haoyu Han, Li Ma, Hanbing Wang, Bingheng Li, Daochen Zha
Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacte...
1019 BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation
2605.07306
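Embodied multi-agent lab automation: proposes a low-cost, vision-enhanced multi-agent system that performs closed-loop wet-lab manipulation following experimental protocols.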
cs.AI
Zhaohui Du, Zhe Wang, Hongmei Fei, Xiwen Cao, Ting Xiao
Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following. Existing robotic systems often rely on costly hardware, fixed workflows, dedicated instruments, or robotics-oriented interfaces. Here, we introduce B...
1020 DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation
2605.07314
LLM-enhanced knowledge graph recommendation: uses dual-channel graph learning to fuse ID and LLM semantics and improve knowledge-aware recommendation.
cs.AI
Xinchi Zou, Tongzhenzhi Su, Jianjun Li, Yuan Fu, Chang Liu
Knowledge Graphs (KGs) have proven highly effective for recommendation systems by capturing latent item relationships, while recent integration of Large Language Models (LLMs) has further enhanced semantic understanding and addressed knowledge sparsity issues. Nevertheless, current KG-and-LLM-based methods still face three main limitations: 1) inadequate modeling of implicit semantic relationships beyond explicit KG links; 2) suboptimal single-channel fusion of ID and LLM embeddings, which often...
1021 CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations
2605.07325
Cached state for real-time LLM: proposes massive cached state representations to reduce time-to-first-token latency for robotics LLMs.
cs.AI
Robin Karlsson, Go Suzui
Deploying massive large language models (LLMs) as continuous cognitive engines for robotics is bottlenecked by the time-to-first-token (TTFT) latency required to process extensive state histories. Existing solutions like RAG or sliding windows compromise global context or incur prohibitive re-computation costs. We formalize the optimal task structure for minimizing latency and theoretically prove that prefix stability, incremental extensibility, and asynchronous state reconciliation are necessar...
1022 MORPH-U: Multi-Objective Resilient Motion Planning for V2X-Enabled Autonomous Driving in High-Uncertainty Environments via Simulation
2605.07370
Robust V2X motion planning: implements robust motion planning and control in CARLA by fusing V2X with multiple sensors.
cs.AI
Shih-Yu Lai
V2X can warn an autonomous vehicle about hazards beyond line-of-sight, but it also brings uncertainty: messages may be delayed, dropped, or even forged. Meanwhile, map knowledge may change during a trip, forcing the vehicle to replan under tight real-time budgets. This paper studies how to make motion planning and low-level control robust to such uncertain, event-driven updates. We present MORPH-U, a CARLA-based closed-loop stack that fuses LiDAR/radar/camera with V2X (CAM/DENM) into a Local Dyn...
1023 Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
2605.07381
Data-efficient robot adaptation: uses anchor-centric adaptation to avoid the demonstration diversity trap and improve manipulation transfer.
cs.AI
Yanzhe Chen, Kevin Yuchen Ma, Qi Lv, Yiqi Lin, Zechen Bai
While Vision-Language-Action (VLA) models offer broad general capabilities, deploying them on specific hardware requires real-world adaptation to bridge the embodiment gap. Since robot demonstrations are costly, this adaptation must often occur under a strict data budget. In this work, we identify a critical diversity trap: the standard heuristic of "maximizing coverage" by collecting diverse, single-shot demonstrations can be self-defeating due to non-vanishing estimation noise. We formalize th...
1024 OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
2605.07414
Jailbreaking tool-calling T2I agents: uses orchestration-guided fuzzing to attack tool-chaining text-to-image agents so that they generate harmful outputs.
cs.AI
Jianming Chen, Yawen Wang, Junjie Wang, Zhe Liu, Qing Wang
Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into unsafe results, making prompt-only jailbreak techniques insufficient. We present OrchJail, an orchestration-guided fuzzing framework for jailbreaking tool-calling T2I agents. Its core idea is to exploi...
1025 Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study
2605.07422
Prompting for qualitative coding: compares prompting strategies to assess how reliably LLMs perform qualitative coding of psychological safety.
cs.AI
Moaath Alshaikh, Tasneem Alshaher, Ricardo Vieira, Beatriz Santana, Clelio Xavier
Qualitative analysis plays a pivotal role in understanding the human and social aspects of software engineering. However, it remains a demanding process shaped by the subjective interpretation of individual researchers and sensitive to methodological choices such as prompt design. Recent advancements in Large Language Models (LLMs) offer promising opportunities to support this type of analysis, although their reliability in reproducing human qualitative reasoning under varying prompting conditio...
1026 Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning
2605.07444
Physics-informed flow surrogate: uses physics-constrained learning for accelerated, data-efficient prediction of stirred-tank flow fields.
cs.AI
Mahdi Naderibeni, Liang Wu, David M. J. Tax
The simulation of fluid flows is computationally expensive due to the complexity of its governing partial differential equations. Machine learning models offer a potential surrogate, enabling learning from simulations and significantly faster predictions of flow fields. However, these models require large training datasets, which introduces a trade-off between dataset generation cost and predictive accuracy. In this work, we investigate the relationship between the size of the training-set and a...
1027 HBEE: Human Behavioral Entropy Engine -- Pre-Registered Multi-Agent LLM Simulation of Peer-Suspicion-Based Detection Inversion
2605.07472
LLM insider threat simulation: uses multi-agent LLM simulation to test insider threat detection and finds a detection inversion phenomenon.
cs.AI
Vickson Ferrel
Insider threat detection assumes that an adaptive insider leaves behavioral residue distinguishing them from legitimate users. We test this assumption against an LLM-driven adaptive insider in a controlled multi-agent simulator. Our pre-registered five-condition study isolates defender mode (cascade vs. blind UEBA) crossed with adversary type (naive vs. adaptive OPSEC) plus a no-mole control, across 100 runs (95 valid after pre-committed exclusions). The primary finding is a detection inversion:...
1028 Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs
2605.07481
Attacks on LLM watermarks: proposes semantic rewriting and other attacks that systematically break multiple LLM output watermarking schemes.
cs.AI
Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay
In this paper, we investigate the recent state-of-the-art schemes for watermarking large language model (LLM) outputs. These techniques are claimed to be robust, scalable and production-grade, aimed at promoting responsible usage of LLMs. We analyse the effectiveness of these watermarking techniques against an extensive collection of modified text attacks, which perform targeted semantic changes without altering the general meaning of the text content. Our approach encompasses multiple attack ...
1029 LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation
2605.07517
Link-aware RAG retrieval: exploits the hyperlink topology of technical documentation to improve RAG retrieval and answer grounding.
cs.AI
Giorgia Bolognesi, Claudio Estatico, Ulderico Fugacci, Isabella Mastroianni, Claudio Muselli
Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat naturally structured corpora, such as technical manuals, as flat collections of passages, thereby overlooking the hyperlink topology that users rely on when navigating such content. We introduce LARAG (Link-Aware RAG): a lightweight, link-aware retrieval strategy that leverages the author-defined hyper...
1030 The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting
2605.07671
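Truthful reporting miscalibration: analyzes how non-accuracy payoffs in scored reporting make miscalibration endogenous, and the resulting impossibility boundaries.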
cs.AI
Lauri Lovén, Sasu Tarkoma
Eliciting truthful reports from autonomous agents is a core problem in scalable AI oversight: a principal scores the agent's report using a strictly proper scoring rule, but the agent also benefits from the report through a non-accuracy channel (approval for autonomous action, allocation share, downstream control). The same structure appears in classical mechanism-design settings such as marketplace operation. Our main result is an endogeneity: the principal's optimal oversight necessarily uses ...
1031 Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
2605.07694
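Speaker distance from reverberation: decomposes RIRs into early and late reverberation components to explain what single-channel speaker distance estimation depends on.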
cs.AI cs.SD eess.AS
Michael Neri, Archontis Politis, Tuomas Virtanen
Single-channel speaker distance estimation has recently achieved centimeter-level accuracy in simulated environments, yet it remains unclear which components of the room impulse response (RIR) the model exploits and how performance depends on the recording conditions. In this work, we decompose simulated RIRs into four variants (full, direct-only, no-late, and no-early) using the mixing time estimated from the echo density function as the boundary between early reflections and late reverberation...
1032 Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization
2605.07705
Logic for encoder-decoder transformers: characterizes the capabilities of encoder-decoder Transformers with cross-attention using a new temporal logic.
cs.AI
Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Miguel Moreno, Matias Selin
We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical setting of floating-point numbers and soft-attention, characterizing them with a new temporal logic. This logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. We also give an addi...
1033 The AI-Native Large-Scale Agile Software Development Manifesto
2605.07717
AI-native agile manifesto: proposes an AI-native development manifesto and principles for agile at organizational scale.
cs.AI
Ricardo Britto, Fredrik Palmgren, Nishrith Saini, Marcus Ohlin
Despite the widespread adoption of agile methods, achieving true agility at scale remains elusive. Large-scale agile frameworks remain largely human-centric and manual, relying on coordination meetings, artifact synchronization, and role-based handoffs that inhibit real-time adaptation. Meanwhile, rapid advances in AI, particularly large language models, have begun transforming software engineering, yet their potential for organizational-level agility remains underexplored. We present the AI-Nat...
1034 LLM hallucinations in the wild: Large-scale evidence from non-existent citations
2605.07723
LLM citation hallucination audit: verifies paper citations at scale and quantifies the growth of fabricated citations after widespread LLM adoption.
cs.AI
Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg
Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estim...
1035 Vibe coding before the trend
2605.07751
Vibe coding education study: analyzes reflections on vibe coding challenges across multiple student cohorts and patterns of skill transfer.
cs.AI
Leon van Bokhorst, Koen Suilen
In early 2025, we ran a series of vibe coding challenges across four different student cohorts. The cohorts included 54 ICT students, 24 digital marketing students, and 7 journalism students at Fontys University of Applied Sciences (Netherlands), and 22 BA Communication students at North-West University (South Africa). From the student reflections, five major patterns emerged. Students reported that AI tools shifted their focus from syntax to higher-order thinking; they also described a skill shift ...
1036 CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
2605.07830
Bias in cyber LLM agents: builds CyBiasBench to quantify attack-selection bias in LLM agents for cyber-attack scenarios.
cs.AI
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim, Hoki Kim
Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evalua...
1037 Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
2605.07870
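Spectral dynamics of deep nets: uses a two-level DMFT to characterize weight-spectrum evolution during training and the mechanisms of outlier feature learning.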
cs.AI
Clarissa Lauditi, Cengiz Pehlevan, Blake Bordelon
We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$\mu$P scaling and (2) deep linear networks in the proportional high-dimensional limi...
1038 What if AI systems weren't chatbots?
2605.07896
Beyond chatbot AI interfaces: critiques the chatbot paradigm and analyzes its structural effects on sociotechnical systems.
cs.AI
Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Avijit Ghosh
The rapid convergence of artificial intelligence (AI) toward conversational chatbot interfaces marks a critical moment for the industry. This paper argues that the chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems. We examine how treating AI primarily as conversational assistants has extensive structural downsides. We show how chatbot-based systems often fail to ade...
1039 BeeVe: Unsupervised Acoustic State Discovery in Honey Bee Buzzing
2605.07903
Unsupervised bee buzz states: Discovers honey bee buzzing states without supervision, using self-supervised acoustic features and quantized clustering.
cs.AI cs.SD
Hamze Hammami, Nidhal Abdulaziz
Discovering structure in biological signals without supervision is a fundamental problem in computational intelligence, yet existing bioacoustic methods assume vocal production models or predefined semantic units, leaving non-vocal species poorly served. This work introduces BeeVe, an unsupervised framework for acoustic state discovery in collective honey bee buzzing. BeeVe uses the self-supervised Patchout Spectrogram Transformer (PaSST) as a frozen feature extractor, then trains a Vector-Quant...
1040 Sycophantic AI makes human interaction feel more effortful and less satisfying over time
2605.07912
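Effects of sycophantic AI: Longitudinal experiments show that sycophantic AI lowers satisfaction with interpersonal interaction and makes communication more burdensome.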
cs.AI
Lujain Ibrahim, Franziska Sofia Hafner, Myra Cheng, Cinoo Lee, Rebecca Anselmetti
Millions of people now turn to artificial intelligence (AI) systems for personal advice, guidance, and support. Such systems can be sycophantic, frequently affirming users' views and beliefs. Across five preregistered studies (N = 3,075 participants, 12,766 human-AI conversations), including a three-week study with a census-representative U.S. sample, we provide longitudinal experimental evidence that sycophantic AI shifts how users approach their closest relationships. We show that sycophantic ...
1041 Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation
2605.07985
LLM inference profiling simulator: Proposes a configuration-agnostic, redundancy-aware profiling method to speed up LLM inference simulation.
cs.AI
Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim
Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each op...
1042 Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios
2605.07986
Methodology for AI evaluations: Proposes a process that goes from real-world use cases to evaluation scenarios to enable comparable AI evaluations.
cs.AI
Yee-Yin Choong, Kristen Greene, Alice Qian, Meryem Marasli, Ziqi Yang
AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI evaluations, this work advocates for methodological transparency in evaluation scenarios, operational grounding, and human-centered design (HCD) principles. We propose a repeatable process for transforming high-level use cases to detailed scen...
1043 Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary
2506.00886
Epistemic tool-use principle: Argues that agents should invoke external tools only when epistemically necessary and gives a framework for the argument.
cs.AI
Hongru Wang, Cheng Qian, Manling Li, Jiahao Qiu, Boyang Xue
As large language models evolve into tool-augmented agents, a central question remains unresolved: when is external tool use actually justified? Existing agent frameworks typically treat tools as ordinary actions and optimize for task success or reward, offering little principled distinction between epistemically necessary interaction and unnecessary delegation. This position paper argues that agents should invoke external tools only when epistemically necessary. Here, epistemic necessity means ...
1044 LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
2508.16571
LLM agents for drug due diligence: Uses agents to retrieve and extract information on competitor drugs for an indication to support asset due diligence.
cs.AI
Vlad Vinogradov (Optic Inc), Alisa Vinogradova (AI Expert), Dmitrii Radkevich (Optic Inc), Ilya Yasny (Optic Inc), Dmitry Kobyzev (Optic Inc)
In this paper, we describe and benchmark a competitor-discovery component used within an agentic AI system for fast drug asset due diligence. A competitor-discovery AI agent, given an indication, retrieves all drugs comprising the competitive landscape of that indication and extracts canonical attributes for these drugs. The competitor definition is investor-specific, and data is paywalled/licensed, fragmented across registries, ontology-mismatched by indication, alias-heavy for drug names, mult...
1045 BEAVER: An Efficient Deterministic LLM Verifier
2512.05439
Deterministic LLM safety verification: Proposes BEAVER to compute deterministic, sound probability bounds on an LLM satisfying safety properties.
cs.AI
Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify model outputs and characterize tail risk for safe deployment. While sampling-based estimates provide an ad-hoc intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt & any safety property, BE...
1046 AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management
2512.10371
Program-guided GUI agent context: Uses program-guided context management to reduce the history overhead of long-horizon GUI agents while preserving semantics.
cs.AI
Shizuo Tian, Hao Wen, Yuxuan Chen, Jiacheng Liu, Shanhui Zhao
The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach for agent context management that...
1047 TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent
2601.18700
Tool-augmented emotional support benchmark: Releases TEA-Bench to evaluate the trustworthiness of emotional support dialogue agents that can call tools.
cs.AI
Xingyu Sui, Yanyan Zhao, Yulin Hu, Jiahe Guo, Weixiang Zhao
Emotional Support Conversation requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenar...
1048 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
2601.23143
Self-generated safety alignment: Proposes ThinkSafe, which safety-aligns reasoning models on self-generated data with little loss of performance.
cs.AI
Seanie Lee, Sangwoo Park, Yumin Choi, Gyeongman Kim, Minki Kang
Large reasoning models (LRMs) achieve remarkable performance by leveraging reinforcement learning (RL) on reasoning tasks to generate long chain-of-thought (CoT) reasoning. However, this over-optimization often prioritizes compliance, making models vulnerable to harmful prompts. To mitigate this safety degradation, recent approaches rely on external teacher distillation, yet this introduces a distributional discrepancy that degrades native reasoning. We propose ThinkSafe, a self-generated alignm...
1049 Supervised sparse auto-encoders for interpretable and compositional representations
2602.00924
Supervised sparse autoencoders: Trains interpretable, composable sparse autoencoder features through supervision and a new optimization framework.
cs.AI
Ouns El Harzli, Hugo Wallner, Yoonsoo Nam, Haixuan Xavier Tao
Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models-a mathematical framework from neural collapse theory-and by supervising the task. We supervise (decoder-only) SAEs t...
1050 WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning
2602.12852
Web agent trajectory pruning: Uses graph modeling and pruning to compress web agent trajectories and improve search efficiency in information seeking.
cs.AI
Junjie Wang, Zequn Xie, Dan Yang, Jie Feng, Yue Shen
Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent's search process as...
1051 Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence
2602.15019
Drug asset scouting agents: Proposes wide-search AI agents that mine drug asset intelligence from multilingual channels.
cs.AI
Vlad Vinogradov, Alisa Vinogradova, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev
Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels. Recent data suggests over 85% of patent filings originate outside the U.S., with China accounting for nearly half of the global total; a growing share of scholarly output is also non-U.S. Industry estimates put China at 30% of global drug development, spanning 1,200+ novel candidates. In this high-stakes environment, failing to su...
1052 ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices
2602.21858
Proactive mobile agent benchmark: Builds the ProactiveMobile benchmark to evaluate proactive intelligence on mobile devices.
cs.AI
Dezhi Kong, Zhengzhao Feng, Qiliang Liang, Hao Wang, Haofei Sun
Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complex...
1053 Making AI Evaluation Deployment Relevant Through Context Specification
2603.06811
Context-aware AI evaluation: Proposes a context specification process that brings AI evaluation closer to deployment realities.
cs.AI
Matthew Holmes, Thiago Lacerda, Reva Schwartz
With many organizations struggling to gain value from AI deployments, pressure to evaluate AI in an informed manner has intensified. Status quo AI evaluation approaches often mask the operational realities that ultimately determine deployment success, making it difficult for organizational decision makers to know whether and how AI tools will deliver durable value. We introduce and describe context specification as a process to support and inform this decision making process. Context specificati...
1054 MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants
2603.09652
Interactive HTML assistant benchmark: Proposes MiniAppBench to evaluate the ability of LLMs to generate interactive HTML mini-apps.
cs.AI
Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo
With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps. These applications require models to not only render visual interfaces but also construct customized interaction logic that adheres to real-world principles. However, existing benchmarks primarily focus on algorithmic correctness or static layout reconstruction, failing to capture the ca...
1055 Emergent social transmission of model-based representations without inference
2604.05777
Social transmission in RL: Uses reinforcement learning simulations to show that simple social cues can transmit higher-level representations.
cs.AI
Silja Keßler, Miriam Bautista-Salinero, Claudio Tennie, Charley M. Wu
How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive capacity? Humans are often thought to rely on computationally costly mentalizing, such as inferring others' beliefs. In contrast, cultural evolution emphasizes that behavioral transmission can be supported by simple social cues. Using reinforcement learning simulations, we show how minimal social learning can indirectly transmit higher-level representations. We simulate a naïve agent se...
1056 LACE: Lattice Attention for Cross-thread Exploration
2604.15529
Cross-thread attention reasoning: Proposes LACE, which uses cross-thread attention so that parallel reasoning paths can share corrections.
cs.AI
Yang Li, Zirui Zhang, Yang Liu, Chengzhi Mao
Current large language models reason in isolation. Although it is common to sample multiple reasoning paths in parallel, these trajectories do not interact, and often fail in the same redundant ways. We introduce LACE, a framework that transforms reasoning from a collection of independent trials into a coordinated, parallel process. By repurposing the model architecture to enable cross-thread attention, LACE allows concurrent reasoning paths to share intermediate insights and correct one another...
1057 Harnessing Pre-Resolution Signals for Future Prediction Agents
2604.15719
Forecasting with evolving evidence: Uses temporal signals from unresolved questions to train better future prediction agents.
cs.AI
Chuyang Wei, Maohang Gao, Zhixin Han, Kefei Chen, Yu Zhuang
Many high-stakes decisions depend on forecasts made before outcomes are known. In this future prediction setting, the central challenge is that public evidence evolves over time, while the main supervision signal arrives only after resolution: the realized outcome mainly assesses final correctness, offering only coarse guidance on what to track, what to verify, and which judgments to leave uncertain along the way. Our key observation is that revisiting the same unresolved question over time crea...
1058 GamED.AI: A Hierarchical Multi-Agent Framework for Automated Educational Game Generation
2604.23947
Educational game generation agents: Uses hierarchical multi-agent generation to turn questions automatically into playable, verifiable educational games.
cs.AI
Shiven Agarwal, Yash Shah, Ashish Raj Shekhar, Priyanuj Bordoloi, Vivek Gupta
We introduce GamEDAI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic contracts. Built on phase-based LangGraph sub-graphs, deterministic Quality Gates, and structured Pydantic schemas, GamEDAI supports two template families encompassing 15 interaction mechanics across spatial reasoning, procedural execution, and higher-order Bloom's Taxonomy objectives. Evaluated on...
1059 AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
2605.00425
Entropy modulation for agentic RL: Proposes adaptive entropy modulation to improve credit assignment in multi-turn agentic reinforcement learning.
cs.AI
Haotian Zhao, Songlin Zhou, Yuxin Zhang, Stephen S.-T. Yau, Wenyu Zhang
Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging: sparse outcome-only rewards provide limited guidance for assigning credit to individual steps within long interaction trajectories. Existing approaches often introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, which increases sup...
1060 Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization
2605.01482
Causal multi-hop fact verification: Constrains the reasoning chains of multi-hop fact verification with structural causal models and GRPO.
cs.AI
Yunhan Bu, Quan Zhang, Huaping Zhang, Guotong Geng, Chunxiao Gao
Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought (CoT), lack explicit modeling of the causal dependencies between evidence and claims. In this work, we introduce a novel framework that grounds reasoning in a Structural Causal Model (SCM), treating verifica...
1061 Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems
2605.01758
Jailbreak defense in MAS: Proposes a foresight-guided defense to suppress the spread of infectious jailbreaks in multi-agent systems.
cs.AI
Yue Ma, Ziyuan Yang, Yi Zhang
Large multimodal model-based Multi-Agent Systems (MASs) enable collaborative complex problem solving through specialized agents. However, MASs are vulnerable to infectious jailbreak, where compromising a single agent can spread to others, leading to widespread compromise. Existing defenses counter this by training a more contagious cure factor, biasing agents to retrieve it over virus adversarial examples (VirAEs). However, this homogenizes agent responses, providing only superficial suppression...
1062 Computing Thiele Rules on Interval Elections and their Generalizations
2605.03067
Thiele rules on interval elections: Studies the computation and generalization of Thiele committee rules under interval preferences.
cs.AI
Dimitris Avramidis, Alexandra Lassota, Ulrike Schmidt-Kraepelin, Adrian Vetta
Approval-based committee voting has received significant attention in the social choice community. Among the studied rules, Thiele rules, and especially Proportional Approval Voting (PAV), stand out for desirable properties such as proportional representation, Pareto optimality, and support monotonicity. Their main drawback is that computing a Thiele outcome is NP-hard in general. A glimpse of hope comes from the fact that Thiele rules are better behaved under structured preferences. On the cand...
1063 Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages
2605.05558
Economics of agent wages: Proposes a compute-anchored wage model to explain how cognitive labor is priced in the age of agents.
cs.AI
Siqi Zhu
A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied highly elastically, placing downward pressure on cognitive-labor wages when it closely substitutes for human labor. We argue this framing is wrong in mechanism but partially correct in conclusion, and that the correction matters for both theory and policy. Agents are not labor; they are a production technology that converts compute capital ...
1064 MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
2605.05949
Multi-agent coding workflow: Proposes the MAS-Algorithm workflow, in which multiple agents collaborate to solve algorithmic programming problems.
cs.AI
Yuliang Xu, Xiang Xu, Yao Wan, Hu Wei, Tong Jia
Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods leveraging external tools or prompting techniques (e.g., chain-of-thought) are often fragmente...
1065 Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing
2605.05958
Debiased knowledge tracing: Proposes temporally smooth doubly robust learning to remove selection bias in knowledge tracing.
cs.AI
Peilin Zhan, Wei Chen, Weilin Chen, Shuyi Pan, Ruichu Cai
Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulati...
1066 CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs
2605.06115
Cross-cultural knowledge insertion benchmark: Builds CrossCult-KIBench to evaluate cross-cultural knowledge insertion in multimodal models.
cs.AI
Zhen Zeng, Leijiang Gu, Feng Li, Jing Yu, Zenglin Shi
Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we introduce CrossCult-KIBench, a comprehensive evaluation benchmark for assess...
1067 Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
2605.06130
Skill library co-evolution RL: Proposes Skill1, which uses a single policy to jointly learn skill selection, use, and distillation.
cs.AI
Yaorui Shi, Yuxin Chen, Zhengxi Lu, Yuchun Miao, Shugui Liu
A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or with separate reward sources, resulting in partial and conflicting evolution. We propose Skill1, a framework that trains a single policy to co-evolve skill selection, ...
1068 Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
2605.06223
Clarifying instance navigation: Uses comparative judgment to ask proactive questions that resolve the target instance of an ambiguous navigation query.
cs.AI
Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok
Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collec...
1069 Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence
2605.06230
Trustworthy agent training infrastructure: Proposes Safactory, a closed-loop infrastructure for training and evaluating trustworthy autonomous agents.
cs.AI
Xinquan Chen, Zhenyun Yin, Shan He, Bin Huang, Shanzhe Lei
As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelli...
1070 ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
2605.06483
Natural language to STL: Translates natural-language requirements into STL using tool augmentation and process-rewarded learning.
cs.AI
Bowen Ye, Zhijian Li, Junyue Huang, Junkai Ma, Xiang Yin
Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic experti...
1071 Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning
2410.06347
Goal-conditioned decision transformer: Proposes a goal-conditioned Decision Transformer for offline multi-goal robotic reinforcement learning.
cs.AI
Paweł Gajewski, Dominik Żurek, Marcin Pietroń, Kamil Faber
Reinforcement learning (RL) in robotics faces significant hurdles regarding sample efficiency and generalization across varying goals. While Offline RL mitigates the need for costly online interactions, its integration with goal-conditioned policies and transformer-based architectures remains underexplored. We introduce a Goal-Conditioned Decision Transformer adapted for offline multi-goal robotics. By explicitly incorporating goal states into the sequence modeling framework, our approach effici...
1072 UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios
2410.06355
Multimodal command understanding for robots: proposes UNCOM, which fuses speech, gestures, and scene context for zero-shot tabletop command understanding.
cs.AI
Antonio Galiza Cerdeira Gonzalez, Paweł Gajewski, Bipin Indurkhya
This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information -- speech, gestures, and scene context -- to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using fou...
1073 Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points
2412.11194
Vulnerability detection survey: surveys automated vulnerability detection research and systematically catalogs its key pain points and challenges.
cs.AI
Dan Ristea, Shae McFadden, Ezzeldin Shereen, Madeleine Dwyer, Sanyam Vyas
Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of research has applied machine learning to automate vulnerability detection (ML4AVD), yet self-reported performance on the most popular datasets shows no clear upward trend. The ML4AVD research community has identified several flaws...
1074 Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids
2510.02371
Federated smart grid attack detection: proposes federated spatiotemporal graph learning to detect passive eavesdropping attacks in smart grids.
cs.AI
Bochra Al Agha, Razane Tajeddine
Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph...
1075 Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies
2510.22944
Prompt-induced code insecurity: quantifies the code defect rates induced by poorly written prompts and offers security mitigation strategies.
cs.AI
Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang
Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quali...
1076 Continually Evolving Skill Knowledge in Vision Language Action Model
2511.18085
Continual learning for VLA: proposes a continual imitation learning framework that adds no parameters, letting VLA models keep accumulating skills.
cs.AI
Yuxuan Wu, Guangming Wang, Zhiheng Yang, Tianchen Deng, Maoqing Yao
Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additional parameters or external modules, limiting scalability for large VLA models. We propose Stellar VLA, a knowledge-driven CIL framework without increasing network parameters. Two progressively extended variants are designed: T-Stellar for ...
1077 Switching-time bioprocess control with pulse-width-modulated optogenetics
2511.22893
Optogenetic bioprocess control: uses PWM optogenetic control to optimize bioprocess switching times and yield.
cs.AI
Sebastián Espinel-Ríos
Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control and regulation of cell growth. Optogenetic systems can be actuated by light intensity. However, relying solely on intensity-driven control (i.e., signal amplitude) may fail to properly tune optogenetic bioprocesses when the dose-response rela...
1078 Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
2512.09682
MARL scaling UAV relay task: constructs a one-time data delivery task for UAV swarms for use in MARL scaling studies.
cs.AI
Mika Persson, Jonas Lidman, Jacob Ljungberg, Samuel Sandelius, Adam Andersson
This work studies the application of Multi-Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed which restricts agent motion and applies Dijkstra's shortest path algorithm. Computational experiment results show that two off-the-shelf MARL algorithms perform competitively w...
1079 PerfCoder: Large Language Models for Interpretable Code Performance Optimization
2512.14018
LLM code performance optimization: proposes PerfCoder, which uses interpretable supervision to guide LLMs in generating performance-optimized code.
cs.AI
Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel
Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current LLMs struggle not only due to data scarcity but, more importantly, because they lack supervision that guides interpretable and effective performance improvements. In this work, we introduce PerfCoder, a family of LLMs specifically designed to generate performanc...
1080 Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
2601.15356
High-resolution IQA agent probing: proposes Q-Probe, which uses context-aware agentic probing for high-resolution image quality assessment.
cs.AI
Xiang Li, Xueheng Li, Yu Wang, Xuanhua He, Zhangchi Hu
Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to capture subtle local degradations in high-resolution scenarios. While emerging "Thinking with Images" paradigms enable multi-scale visual perception via zoom-in mechanisms, their direct adaptation to IQA induces spurious "cropping-implies-...
1081 Replicating Human Motivated Reasoning Studies with LLMs
2601.16130
LLM Motivated Reasoning Replication: replicates political motivated reasoning experiments and finds that base LLMs do not show human-like bias.
cs.AI
Neeley Pate, Adiba Mahbub Proma, Hangfeng He, James N. Druckman, Daniel C. Molden
Motivated reasoning - the idea that individuals processing information may be motivated to either arrive at accurate beliefs or arrive at desired conclusions - has been well-explored as a human phenomenon. However, it remains unclear whether base LLMs are affected by motivational manipulations. Replicating 4 prior political motivated reasoning studies, we find that base LLM behavior does not align with expected human behavior. Furthermore, base LLM behavior across models shares some similarities...
1082 MirrorMark: Generalizable Mirrored Sampling for Multi-bit LLM Watermarking
2601.22246
Multi-bit LLM Watermarking: proposes MirrorMark mirrored sampling for low-distortion multi-bit text watermarking.
cs.AI
Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar, Kai Zeng
As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but most existing methods either provide only binary signals or achieve multi-bit embedding by distorting the generation distribution. We propose MirrorMark, a generalizable mapping-centric approach for multi-bit LLM watermarking. MirrorMark separates the symbol mapping rule from the ...
1083 Spectral Filtering for Complex Linear Dynamical Systems
2601.22400
Spectral Filtering for Dynamical Systems: learns complex-valued linear systems with Slepian spectral filtering and gives dimension-free regret bounds.
cs.AI
Elad Hazan, Annie Marsden
We study the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum. This class captures oscillatory and long-memory dynamics arising in signal processing, structured state space models, and quantum systems. We introduce a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension independent of the ambient state dimension. As a consequence, we obtain dimension-free regret bounds for sequence pred...
1084 Latent-Space Causal Discovery from Indirect Neuroimaging Observations
2602.09034
Latent Causal Discovery in Neuroimaging: recovers latent causal structure from indirect observations under imaging-physics and nonstationarity assumptions.
cs.AI
Sangyoon Bae, Miruna Oprescu, David Keetae Park, Shinjae Yoo, Jiook Cha
Neuroimaging does not observe causal variables directly: hemodynamics and volume conduction distort signals so that statistical dependence need not reflect latent neural influence. Before estimating graphs, one must specify under what assumptions delayed directed structure can be studied from such indirect observations. We formalize a conditional setting - recoverable inversion under modality physics together with nonstationary latent dynamics - and derive an inversion-error propagation bound un...
1085 Discovering Multiagent Learning Algorithms with Large Language Models
2602.16928
LLM-Driven MARL Algorithm Discovery: uses LLM-based evolutionary coding agents to automatically search for and improve multi-agent learning algorithms.
cs.AI
Zun Li, John Schultz, Daniel Hennes, Marc Lanctot
Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (LLMs) have emerged as powerful tools to automate this discovery process. In this work, we deploy one of such agentic frameworks, AlphaEvolve, to navigate the design spaces of two distinct game-theoretic paradigms: counterfactual regret mini...
1086 Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response
2603.02274
Neuro-Symbolic Drug Response Modeling: proposes an invertible world model combined with LLM reasoning to explain and predict colorectal cancer drug response.
cs.AI
Christopher Baker, Karen Rafferty, Hui Wang
Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails to provide the mechanistic clarity required for clinical adoption. We present the Contextual Invertible World Model (CIWM), a Neuro-Symbolic Agentic Framework that bridges this gap by integrating a quantitative machine learning emulator with an LLM-based reaso...
1087 Shaping the Future of Mathematics in the Age of AI
2603.24914
AI Impact on Mathematics: discusses AI's impact on the values, teaching, and ethics of mathematics and offers recommendations for the community.
cs.AI
Johan Commelin, Mateja Jamnik, Rodrigo Ochigame, Lenny Taelman, Akshay Venkatesh
Artificial intelligence is transforming mathematics at a speed and scale that demand active engagement from the mathematical community. We examine five areas where this transformation is particularly pressing: values, practice, teaching, technology, and ethics. We offer recommendations on safeguarding our intellectual autonomy, rethinking our practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles - with the aim of ensuring that the...
1088 Muon Dynamics as a Spectral Wasserstein Flow
2604.04891
Spectral Wasserstein Optimization Dynamics: characterizes spectrally normalized optimization as a spectral Wasserstein flow and analyzes the stability of Muon-like dynamics.
cs.AI
Gabriel Peyré
Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $\gamma$ on positive ...
1089 Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study
2604.08059
AI Component Lifecycle Governance: proposes a governance framework for AI component evolution covering compatibility checking, monitoring, and rollback.
cs.AI
Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions, with what monitoring, and when it should be rolled back. Existing software-deployment patterns (canary, blue-green, feature flags, MLOps pipelines) address parts of this loop but were designed for stateless web services rather than st...
1090 Code World Model Preparedness Report
2605.00932
Frontier Model Preparedness Evaluation: conducts a pre-release assessment of risks and misaligned propensities for Meta's Code World Model and reports the conclusions.
cs.AI
Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd
This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned propensities. Our assessment found that CWM does not pose additional frontier risks beyond those present in the current AI ecosystem. We therefore release it as an open-weight model.
1091 Beyond Retrieval: A Multitask Benchmark and Model for Code Search
2605.04615
Code Retrieval and Reranking Benchmark: releases CoREB, a multitask code retrieval and reranking benchmark, and trains a high-quality reranking model.
cs.AI
Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu
Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retri...
1092 LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
2605.05110
Line-Guided Robot Reinforcement Learning: uses line guidance and sparse pose constraints so that a bicycle robot learns stunts without demonstrations.
cs.AI
Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha
Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physi...
1093 How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
2605.05340
VLM Physical-World Privacy Awareness: builds an evaluation and empirically analyzes the privacy awareness of VLMs in real physical settings.
cs.AI
Junran Wang, Xinjie Shen, Zehao Jin, Pan Li
As Vision-Language Models (VLMs) are increasingly deployed as autonomous cognitive cores for embodied assistants, evaluating their privacy awareness in physical environments becomes critical. Unlike digital chatbots, these agents operate in intimate spaces, such as homes and hospitals, where they possess the physical agency to observe and manipulate privacy-sensitive information and artifacts. However, current benchmarks remain limited to unimodal, text-based representations that cannot capture ...
1094 Towards an Inferentialist Account of Information Through Proof-theoretic Semantics
2605.05368
Proof-Theoretic Semantics of Information: uses proof-theoretic semantics to propose an inferentialist foundational framework for information and its key components.
cs.AI
Matthew Collinson, Timo Eckhardt, David Pym
Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting component...
1095 AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
2605.06607
AI Agents for CFD Discovery: proposes a physics-aware AI scientist agent for CFD simulation that supports open-ended discovery workflows.
cs.AI
Nithin Somasekharan, Rabi Pathak, Manushri Dhanakoti, Tingwen Zhang, Ling Yue
Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge,...
cs.CL 211 papers
281 Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas
2605.06673
LLM Metacognitive Monitoring Atlas: builds a 33-model domain-level metacognition atlas and quantifies confidence calibration differences across MMLU domains.
cs.CL cs.LG cs.AI
Jon-Paul Cacioli
Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC per model-domain cell using verbalized confidence (0-100). Total observations: 47,151. Every model with above-chance aggregate monitoring showed non-trivial domain-level variation. Applied/Professional knowledge was reliably the easiest benc...
282 VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing
2605.06765
Expressive Spoken Language Model: proposes an end-to-end spoken language model that supports role-playing expression and singing generation.
cs.CL cs.AI
Jiacheng Xu, Heting Gao, Liufei Xie, Zhenchuan Yang, Lijiang Li
Human speech conveys expressiveness beyond linguistic content, including personality, mood, or performance elements, such as a comforting tone or humming a song, which we formalize as role-playing and singing. We present VITA-QinYu, the first expressive end-to-end (E2E) spoken language model (SLM) that goes beyond natural conversation to support both role-playing and singing generation. VITA-QinYu adopts a hybrid speech-text paradigm that extends interleaved text-audio modeling with multi-codebo...
283 IntentGrasp: A Comprehensive Benchmark for Intent Understanding
2605.06832
Intent Understanding Benchmark: releases the IntentGrasp benchmark, which unifies the formats of multiple corpora to evaluate LLM intent understanding.
cs.CL cs.LG cs.AI
Yuwei Yin, Chuyuan Li, Giuseppe Carenini
Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper introduces IntentGrasp, a comprehensive benchmark for evaluating the intent understanding capability of LLMs. Derived from 49 high-quality, open-licensed corpora spanning 12 diverse domains, IntentGrasp is constructed through source datasets curation, intent label contextualization, and task format unification. IntentGrasp contain...
284 TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP
2605.06886
Tajik-Persian Lexical Resource: builds a Tajik-Persian cross-script lexical resource and compares hybrid, neural, and retrieval methods.
cs.CL
Mullosharaf K. Arabov
This work introduces TajPersLexon, a curated Tajik--Persian parallel lexical resource of 40,112 word and short-phrase pairs for cross-script lexical retrieval, transliteration, and alignment in low-resource settings. We conduct a comprehensive CPU-only benchmark comparing three methodological families: (i) a lightweight hybrid pipeline, (ii) neural sequence-to-sequence models, and (iii) retrieval methods. Our evaluation establishes that the task is essentially solvable, with neural and retrieval...
285 MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
2605.06897
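Speech Tool-Calling Smart Home: proposes MIST, a multimodal speech-based tool-calling assistant that models smart-home state and interaction under spatiotemporal constraints.
cs.CL cs.AI cs.SD eess.AS cs.MM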
Maximillian Chen, Xuanming Zhang, Michael Peng, Zhou Yu, Alexandros Papangelis
The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large Language Models (LLMs) already demonstrate strong tool-usage capabilities, modeling real-world IoT devices presents a difficult, understudied challenge which combines modeling spatiotemporal constraints with speech inputs, dynamic state tracking, and mixed-initiative interaction patterns. We introduce MIST (the Multimodal Interact...
286 Reflections and New Directions for Human-Centered Large Language Models
2605.06901
Human-Centered LLM Framework: proposes a human-centered LLM development framework that guides evaluation and deployment to align with human values and needs.
cs.CL
Caleb Ziems, Dora Zhao, Rose E. Wang, Matthew Jörke, Ahmad Rushdi
Large Language Models (LLMs) are increasingly shaping the private and professional lives of users, with numerous applications in business, education, finance, healthcare, law, and science. With this rise in global influence comes greater urgency to build, evaluate, and deploy these systems in a manner that prioritizes not only technical capabilities but also human priorities. This work presents a framework for developing Human-Centered Large Language Models (HCLLMs), which integrates perspective...
287 MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
2605.06903
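AI-Generated Text Detection: proposes a multi-task equilibrated learning detector that improves robustness and low-false-positive performance in AI text detection.
cs.CL cs.AI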
Chenjun Li, Cheng Wan, Johannes C. Paetzold
Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high aggregate AUROC on clean, in-distribution human and AI text: it should remain robust to attacks and adversarial rewrites, transfer to unseen generators and domains, and operate at low false-positive rates (FPR). Most existing detectors optimiz...
288 Can LLMs Take Retrieved Information with a Grain of Salt?
2605.06919
RAG Context Certainty Obedience: evaluates how well LLMs obey the certainty of retrieved evidence and analyzes their limitations in high-stakes settings.
cs.CL
Behzad Shayegh, Mohamed Osama Ahmed, Fred Tung, Leo Feng
Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. It is a limitation with real consequences in high-stakes domains like medicine and finance. We evaluate eight LLMs on their context-certainty obedience, measuring how well they adjust responses to match expressed context certainty. Our analysis reveals systematic limitation...
289 MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media
2605.06940
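Bengali LLM Annotation Benchmark: Releases a multi-dimensional annotation benchmark for Bengali social media and diagnoses label collapse induced by closed-set instructions.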
cs.CL
Souvik Pramanik, S. M. Riaz Rahman Antu, Shak Mohammad Abyad, Md. Ibrahim Khalil, Md. Shahriar Hussain
Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in low-resource languages has not been well studied. We present MultiSoc-4D, a Bengali social media dataset benchmark, which contains 58K+ social media comments from six sources annotated along four dimensions: category, sentiment, hate speech, and sarcasm. By employing a structured pipeline where ChatGPT, Gemini, Claude, and Grok ind...
290 Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries
2605.06978
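Group-Structured Skill Retrieval: Proposes group-structured skill retrieval that organizes skill-library results into executable entry points and support relations.
cs.CL, cs.AI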
Kun Zeng, Yu Huo, Siyu Zhang, Zi Ye, Yuecheng Zhuo
Skill-augmented agents increasingly rely on large reusable skill libraries, but retrieving relevant skills is not the same as presenting usable context. Existing methods typically return atomic skills or dependency-aware bundles whose internal roles remain implicit, leaving the agent to infer the execution entry point, support skills, visible requirements, and failure-avoidance guidance. We introduce Group of Skills (GoSkills), an inference-time group-structured retrieval method that changes the...
291 Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
2605.07013
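Bitstream Diffusion Language Modeling: Proposes an entropy-gated continuous bitstream diffusion language model that narrows the quality gap with autoregressive generation.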
cs.CL
Georgios Batzolis, Mark Girolami, Luca Ambrogioni
Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in sample quality and diversity. Recent continuous flow and diffusion approaches over token embeddings have narrowed this gap, suggesting continuous state spaces are highly effective for language. In this work, we further close the autoregressive gap by modeling text as a continuous diffusion process over fixed-width binary bitstreams...
292 Cognitive Agent Compilation for Explicit Problem Solver Modeling
2605.07040
Cognitive Agent Compilation: Compiles LLM behavior into explicit, editable problem-solver models for controllability and interpretability in educational settings.
cs.CL, cs.AI
Hyeongdon Moon, Carolyn Rosé, John Stamper
Large language models (LLMs) are widely used for tutoring, feedback generation, and content creation, but their broad pretraining makes them hard to constrain and poor substitutes for controllable learners. Educational systems often require inspectable and editable knowledge states: educators want to know what a system assumes the learner knows, and learners benefit when the system can justify actions in terms of explicit skills, misconceptions, and strategies. Inspired by cognitive architecture...
293 NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models
2605.07051
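Science Math Riddles Benchmark: Proposes a benchmark of science and mathematics riddles that evaluates LLMs' educational reasoning through open-ended question answering.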
cs.CL
George Boateng, Naafi Ibrahim, Samuel John, Philemon Badu, Patrick Agyeman-Budu
Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics education. Yet, LLMs tend to be evaluated on science and mathematical educational datasets from the Western world, with an underrepresentation of datasets from the Global South. Furthermore, they tend to have multiple-choice answer options that are trivial to evaluate. In this work, we present NSMQ Riddles, a novel benchmark of Scien...
294 GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations
2605.07053
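Semantic Variant Benchmark Augmentation: Proposes the GSM-SEM framework, which generates semantically diverse math problem variants to evaluate robustness and guard against memorization.
cs.CL, cs.AI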
Jyotika Singh, Fang Tu, Aziza Mirzadova, Amit Agarwal, Hitesh Laxmichand Patel
Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most robustness variants apply surface-level perturbations (paraphrases, renamings, number swaps, distractors) that largely preserve the underlying facts, and static releases can themselves become memorization targets over time. We introduce GSM-SEM, a reusable and stochastic framework for generating semantically diverse benchmark varia...
295 MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments
2605.07058
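Clinical Diagnosis LLM Agent: Trains medical LLM agents to ask questions, conduct examinations, and diagnose in noisy environments, mirroring real clinical workflows.
cs.CL, cs.AI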
Yicheng Gao, Xiaolin Zhou, Yahan Li, Yue Zhao, Ruishan Liu
Real-world clinical diagnosis is a complex process in which the doctor is required to obtain information from both interaction with the patient and conducting medical exams. Additionally, the doctor needs to adapt to different patient personas, as well as noisy and incomplete information that can happen at any time during the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process by reducing it to single-turn question answering, n...
296 WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems
2605.07068
LLM Wiki Knowledge Compilation: Proposes WiCER, an iterative compile-evaluate-refine pipeline that reduces information loss when LLMs distill documents into a wiki.
cs.CL, cs.AI
Juan M. Huerta
The LLM Wiki pattern, to compile and provide domain knowledge into a persistent artifact and serve it to LLMs via KV cache inference, promises context access at sub-second latency with zero retrieval failure. Realizing this requires solving the compilation gap: LLM compilation distilling raw documents into a wiki without catastrophically discarding critical facts. We characterize this gap across 17 RepLiQA domains (6,800 questions): we observe that full context KV cache inference outperforms RAG...
297 Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
2605.07076
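Continual Context Consolidation: Proposes SCoL, a post-training method that continually writes long-context knowledge into model weights while suppressing interference and forgetting.
cs.CL, cs.LG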
Zekun Wang, Anant Gupta, Zihan Dong, Christopher J. MacLellan
Large language models (LLMs) increasingly receive information as streams of passages, conversations, and long-context workflows. While longer context windows expose more evidence, they do not ensure that useful information is preserved and reused. We study continual context consolidation: writing current context into model weights while limiting interference with previously consolidated information. We propose Self-Consolidating Language Models (SCoL), a post-training ...
298 Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation
2605.07084
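ASR Evaluation Epistemic Injustice: Critiques single-ground-truth transcript evaluation in ASR and shows how differing transcription conventions produce epistemic injustice.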
cs.CL
Anna Seo Gyeong Choi, Maria Teleki, James Caverlee, Miguel del Rio, Corey Miller
Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But ground truth transcripts are not discovered - they are produced by human annotators following conventions that encode normative assumptions about which speech features matter. Different conventions (verbatim, non-verbatim, legal) produce different transcripts of identical speech and judge the same ASR output differently. This pape...
299 The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks
2605.07093
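Translation Tax Audit Benchmarks: Conducts a counterfactual audit of English-to-Chinese translated benchmarks, showing that the inheritance of translation cues is not a uniform scalar effect.
cs.CL, cs.LG, cs.AI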
Zezheng Lin, Fengming Liu, Handi Li
The Translation Tax is often treated as a scalar: translated benchmarks are assumed to inflate scores by preserving English-source cues. We audit this claim in an English-to-Chinese setting. Three proxy estimators disagree: back-translation gaps are small and parser-fragile; cue-score calibration does not predict item-level gains; and a six-model native-control comparison shows model-family rather than uniform benchmark effects. We add a same-item LLM-naturalization stress test that holds answer...
300 SAGE: Hierarchical LLM-Based Literary Evaluation through Ontology-Grounded Interpretive Dimensions
2605.07102
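LLM Literary Evaluation Ontology: Proposes SAGE, a hierarchical framework of ontology-grounded dimensions that evaluates literary quality through multi-round LLM reflection.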
cs.CL
Tianyu Wang, Nianjun Zhou
Evaluating literary quality requires assessing interpretive dimensions such as cultural representation, emotional depth, and philosophical sophistication that resist straightforward computational measurement. We introduce SAGE, a hierarchical evaluation framework that decomposes literary quality into ontology-grounded interpretive dimensions assessed through structured large language model evaluation with multi-round iterative reflection and independent validation. We validate the framework on 1...
301 Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning
2605.07106
Latent Visual Reasoning: Proposes a spatially and semantically grounded latent visual reasoning framework to ease the information bottleneck.
cs.CL
Jin Cui, Xinyue Long, Xunyong Zhang, Yadong Zhang, Chuanchang Su
Multimodal Large Language Models (MLLMs) have made remarkable progress on vision-language reasoning, yet most methods still compress visual evidence into discrete textual thoughts, creating an information bottleneck for fine-grained perception. Recent latent visual reasoning methods attempt to reason in continuous hidden states, but we find that they suffer from insufficient manifold compatibility: latent trajectories drift away from pretrained reasoning circuits, collapse into instance-agnostic...
302 Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
2605.07110
Secure Computer-Use Agents: Proposes a framework spanning architecture and lifecycle to improve the deployment reliability of computer-use agents.
cs.CL
Zejian Chen, Zhanyuan Liu, Chaozhuo Li, Mengxiang Han, Songyang Liu
Computer-use agents (CUAs) are moving from bounded benchmarks toward real software environments, where they operate browsers, desktops, mobile applications, filesystems, terminals, and tool backends. In such settings, reliability is no longer captured by task success alone: perception errors, planning drift, memory use, tool mediation, permission scope, and runtime oversight jointly determine whether agent actions remain aligned with user intent. Existing surveys organize the CUA landscape by methods, plat...
303 Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation
2605.07111
Optimizer Routing for Adaptation: Uses gradient guidance to route optimizers between LoRA and full fine-tuning for LLM adaptation.
cs.CL, cs.AI
Haozhan Tang, Xiuqi Zhu, Xinyin Zhang, Boxun Li, Virginia Smith
Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection, Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization. Through empirical evaluation across diverse tasks (SQL, Medical QA, and Counterfactual Knowledge) and varying language m...
304 Region4Web: Rethinking Observation Space Granularity for Web Agents
2605.07134
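Web Agent Observation Regions: Moves web-page observation from the element level to the functional-region level to improve web agents' perception and decision-making.
cs.CL, cs.AI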
Donguk Kwon, Dongha Lee
Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-level signals at every step. We argue observation should instead operate at the granularity of functional regions, parts of the page that each serve a distinct purpose. We propose Regi...
305 Structural Rationale Distillation via Reasoning Space Compression
2605.07139
Rationale Distillation Compression: Distills more stable structured reasoning supervision by compressing and reusing a bank of reasoning paths.
cs.CL, cs.LG, cs.AI
Jialin Yang, Jiankun Wang, Jiajun Wu, Henry Leung, Jiayu Zhou
When distilling reasoning from large language models (LLMs) into smaller ones, teacher rationales for similar problems often vary wildly in structure and strategy. Like a chef who makes the same dish differently each time, this inconsistency burdens the student with noisy supervision that is hard to internalize. We propose Distillation through Reasoning Path Compression (D-RPC), which constrains the teacher to follow a compact, dynamically maintained bank of reusable high-level reasoning paths. ...
306 Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
2605.07153
RL for Knowledge Recall: Shows that reinforcement learning with correctness-only rewards can improve LLMs' closed-book factual recall.
cs.CL
Wanli Yang, Hongyu Zang, Junwei Zhang, Wenjie Shi, Du Su
Reinforcement learning (RL) has achieved remarkable success in LLM reasoning, but whether it can also improve direct recall of parametric knowledge remains an open question. We study this question in a controlled zero-shot, one-hop, closed-book QA setting with no chain-of-thought, training only on binary correctness rewards and applying fact-level train-test deduplication to ensure gains reflect improved recall rather than reasoning or memorization. Across three model families and multiple factu...
307 CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization
2605.07162
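Inference-Time Personalization: Uses a classifier to steer generation dynamically at inference time, enabling multi-preference personalization without repeated fine-tuning.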
cs.CL
Jinyan Su, Jinpeng Zhou, Claire Cardie, Wen Sun
Personalized LLMs can significantly enhance user experiences by tailoring responses to preferences such as helpfulness, conciseness, and humor. However, fine-tuning models to address all possible combinations of user preferences is computationally expensive and impractical. In this paper, we introduce CLIPer (Classifier-guided Inference-time Personalization), a lightweight personalization approach that leverages a classifier model to steer LLM generation dynami...
308 Rethinking Experience Utilization in Self-Evolving Language Model Agents
2605.07164
Experience Use in Agents: Studies when and how self-evolving agents should use past experience on demand at runtime.
cs.CL
Weixiang Zhao, Yingshuo Wang, Yichen Zhang, Yanyan Zhao, Yu Zhang
Self-evolving agents improve by accumulating and reusing experience from past interactions. Existing work has largely focused on how experience is constructed, represented, and updated, while paying less attention to how experience should be used during runtime decision-making. As a result, most agents rely on rigid usage strategies, either injecting experience once at initialization or at every step, without considering whether it is needed for the current decision. This paper studies experienc...
309 A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU Framework
2605.07170
Chinese Metaphor Identification: Provides a reproducible multi-architecture baseline for token-level Chinese metaphor identification under MIPVU.
cs.CL
Yufeng Wu
Metaphor is pervasive in everyday language, yet token-level computational identification of metaphor-related words in Chinese under the MIPVU framework remains under-explored relative to English. This paper presents a reproducible multi-architecture baseline for token-level metaphor identification on the PSU Chinese Metaphor Corpus (PSU CMC), the only widely available MIPVU-annotated Chinese corpus. We systematically compare three model families: (i) encoder fine-tuning with Chinese RoBERTa-wwm-...
310 Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization
2605.07172
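Topology-Aware LLM Alignment: Uses persistent homology to define a trajectory topology loss and a topological preference optimization that regularize the geometry of alignment.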
cs.CL
Yurui Pan, Ke Xu, Bo Peng
Alignment of large language models (LLMs) via SFT and RLHF/DPO typically ignores the global geometry of the representation space, relying instead on local token likelihoods or scalar scores. We view generation as tracing a semantic trajectory in hidden space and propose a topology-enhanced alignment framework that regularizes these trajectories using 0-dimensional persistent homology. First, for SFT, we introduce Trajectory Topology Loss (TTL). Treating prompt and gold-answer embeddings as a mix...
311 Learning Agent Routing From Early Experience
2605.07180
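Query Routing to Agents: Proposes a training-free routing method for cold-start settings that chooses between lightweight inference and full agent execution.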
cs.CL
Yimin Wang, Jiahao Qiu, Xuan Qi, Xinzhe Juan, Jingzhe Shi
LLM agents achieve strong performance on complex reasoning tasks but incur high latency and compute cost. In practice, many queries fall within the capability boundary of cutting-edge LLMs and do not require full agent execution, making effective routing between LLMs and agents a key challenge. We study the problem of routing queries between lightweight LLM inference and full agent execution under realistic cold-start settings. To address this, we propose BoundaryRouter, a training-free routing ...
312 The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
2605.07186
Robust IR under Corruption: Shows that word-boundary corruption causes U-shaped degradation in LLM retrieval and analyzes its causes.
cs.CL, cs.AI
Zekai Tong, Ruiyao Xu, Aryan Shrivastava, Chenhao Tan, Ari Holtzman
Existing Large Language Model (LLM) benchmarks primarily focus on syntactically correct inputs, leaving a significant gap in evaluation on imperfect text. In this work, we study how word-boundary corruption affects how LLMs detect targeted information. By inserting whitespace characters within words to break them into fragments, LLMs' detection accuracy follows a U-shaped curve with the increase in insertion rate. We refer to this curve as the Text Uncanny Valley. To explain such observation, we...
313 PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
2605.07201
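Toxicity Detection in Gaming: Performs multi-class toxicity detection in gaming chat using synthetic data augmentation and ensembles of fine-tuned models.
cs.CL, cs.LG, cs.AI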
Srikar Kashyap Pulipaka
This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat messages into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. We explore multiple approaches including encoder-based models, instruction-tuned LLMs with LoRA fine-tuning, hierarchical classification, one-vs-rest strategies, and various ensemble methods. Our best system...
314 Hallucination Detection via Activations of Open-Weight Proxy Analyzers
2605.07209
Hallucination Detection via Activations: Uses an open-weight proxy model to read generated text and detects hallucinations from its activation features.
cs.CL, cs.LG, cs.AI
Akshita Singh, Prabesh Paudel, Siddhartha Roy
We introduce a proxy-analyzer framework for detecting hallucinations in large language models. Instead of looking inside the generating model, our system reads already-generated text through a small locally hosted open-weight model and spots hallucinations using the reader's own internal activations. This works just as well when the generator is a closed API like GPT-4 as when it is any open-weight model. We built eighteen features grounded in how transformers process text, covering residual str...
315 Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
2605.07234
KV Cache Eviction: Reformulates KV cache eviction as an output-aware, layer-wise approximation to reduce long-context overhead.
cs.CL, cs.AI
Tho Mai, Joo-Young Kim
Large language models (LLMs) support long-context inference but suffer from substantial memory and runtime overhead due to Key-Value (KV) Cache growth. Existing KV Cache eviction methods primarily rely on local attention weights, neglecting the influence of value representations, output projection, and inter-head interactions. In this work, we reformulate KV Cache eviction from a conventional head-wise, weight-averaging approach into an output-aware, layer-wise matrix multiplication approximatio...
316 Teaching Language Models to Think in Code
2605.07237
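Code-Centric Reasoning: Proposes having models use code as the primary reasoning medium, combined with execution, to improve problem-solving reliability.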
cs.CL
Hyeon Hwang, Jiwoo Lee, Jaewoo Kang
Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather tha...
317 SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
2605.07243
Speculative Decoding Acceleration: Proposes block-iterative speculative decoding with dynamic tree drafting to reduce drafting latency and accelerate inference.
cs.CL
Weijie Shi, Qiang Xu, Fan Deng, Yaguang Wu, Jiarun Liu
Speculative decoding accelerates LLM inference by drafting a tree of candidate continuations and verifying it in one target forward. Existing drafters fall into two camps with opposite weaknesses. Autoregressive drafters such as EAGLE-3 preserve dependence along each draft path but call the drafter once per tree depth, making drafting a non-trivial share of per-iteration latency. Parallel drafters cut drafter calls by predicting multiple future positions in one forward, but each position is pred...
318 PaT: Planning-after-Trial for Efficient Test-Time Code Generation
2605.07248
Adaptive Test-Time Planning: Proposes a planning-after-trial policy that plans only when verification fails, improving code generation efficiency.
cs.CL, cs.LG
Youngsik Yoon, Sungjae Lee, Seockbean Song, Siwei Wang, Wei Chen
Beyond training-time optimization, scaling test-time computation has emerged as a key paradigm to extend the reasoning capabilities of Large Language Models (LLMs). However, most existing methods adopt a rigid Planning-before-Trial (PbT) policy, which inefficiently allocates test-time compute by incurring planning overhead even on directly solvable problems. We propose Planning-after-Trial (PaT), an adaptive policy for code generation that invokes a planner only upon verification failure. This a...
319 From 0-Order Selection to 2-Order Judgment: Combinatorial Hardening Exposes Compositional Failures in Frontier LLMs
2605.07268
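Hardened Reasoning Benchmark: Proposes LogiHard, which converts multiple-choice questions into second-order judgments to expose compositional reasoning failures in frontier models.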
cs.CL
Hanmeng Liu, Shichao Weng, Xiulai Liu, Zhicai Zhang, Anli Yan
Multiple-choice reasoning benchmarks face dual challenges: rapid saturation from advancing models and data contamination that undermines static evaluations. Ad-hoc hardening methods (paraphrasing, perturbation) attempt to increase difficulty but sacrifice logical validity for surface complexity, falling short to challenge advanced reasoning models. We present LogiHard, a formal framework that deterministically transforms 0-order selection into 2-order logical judgment, which significantly increa...
320 MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning
2605.07269
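Prompt Injection Defense: Proposes a multilingual defense against indirect prompt injection that combines a Qwen classifier with TF-IDF features and ensemble learning.
cs.CL, cs.LG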
Al Muhit Muhtadi, Mostafa Rifat Tazwar
Indirect prompt injection remains a persistent weakness in retrieval-augmented and tool-using LLM systems, and the problem becomes harder to characterise in multilingual settings. We present MIPIAD, a defense framework evaluated on English and Bangla that combines a sequence classifier fine-tuned from Qwen2.5-1.5B via LoRA (XLPID), TF-IDF lexical features, and validation-tuned ensembling through late fusion, stacking, and gradient boosting. The framework is evaluated on a synthetic benchmark bui...
321 Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions
2605.07271
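Layer Pruning Collapse Analysis: Uses decision-representation transition metrics to explain the mechanism of sudden performance collapse under layer pruning and to locate the boundary layer.
cs.CL, cs.AI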
Boyu Shi, Chang Liu, ChuanBao Gao, Xu Yang, Xin Geng
Layer pruning efficiently reduces Large Language Model (LLM) computational costs but often triggers sudden performance collapse. Existing representation-based analyses struggle to explain this mechanism. We propose studying pruning through decision representation. Focusing on multiple-choice tasks, we introduce two metrics, Decision Margin and Option Frequency, and an Iterative Pruning method to analyze layer-wise decision dynamics. Our findings reveal a sharp decision transition that partitions...
322 MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs
2605.07305
Active Clinical Diagnosis Agents: Builds a multi-turn active diagnosis evaluation and analyzes LLM failures in ordering tests and updating diagnoses.
cs.CL cs.AI
Hsin-Ling Hsu, Zizheng Wang, Donghua Zhang, Nai-Chia Chen, Jerry Wang
Most existing LLM diagnoses are evaluated on static, single-turn settings where complete patient information is provided upfront, an oversimplification of real clinical practice. We study active diagnosis: the real-life clinical process of starting from initial observation, ordering tests, interpreting results, and updating a differential diagnosis across multiple turns. Through systematic analysis, we identify three recurring failure modes in current LLMs: ungrounded test ordering, unreliable d...
323 Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
2605.07307
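Sparse Chain-of-Thought: Shows that answers can still be extracted from sparsified and shuffled reasoning chains, and systematically evaluates the impact.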
cs.CL
Yi-Chang Chen, Feng-Ting Liao, Da-shan Shiu, Hung-yi Lee
Modern reasoning language models generate dense, sequential chain-of-thought traces implicitly assuming that every token contributes and that steps must be consumed in order. We challenge both assumptions through a systematic intervention pipeline--removal, masking, shuffling, and noise injection--applied to model-generated reasoning chains across three models and three benchmarks. Our findings are counterintuitive on three dimensions. Order: Does the sequential order of a reasoning chain matter...
324 LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification
2605.07315
Latent-Then-Explicit Reasoning: Proposes two-stage reasoning, latent exploration followed by explicit verification, to balance cost and verifiability.
cs.CL
Xuan Li, Yining Wang, Yuchen Liu, Guanjun Liu, Delai Qiu
Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploratio...
325 Activation Differences Reveal Backdoors: A Comparison of SAE Architectures
2605.07324
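Backdoor Detection with SAEs: Compares two sparse autoencoder architectures for locating backdoor features in fine-tuned models via activation differences.
cs.CL cs.LG cs.AI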
Sachin Kumar
Backdoor attacks on language models pose a significant threat to AI safety, where models behave normally on most inputs but exhibit harmful behavior when triggered by specific patterns. Detecting such backdoors through mechanistic interpretability remains an open challenge. We investigate two sparse autoencoder architectures -- Crosscoders and Differential SAEs (Diff-SAE) -- for isolating backdoor-related features in fine-tuned models. Using a controlled SQL injection backdoor triggered by year-...
326 Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative
2605.07345
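Length-Invariant Similarity Metric: Proves that mean-pooled cosine similarity is not length-invariant and proposes an alternative metric that is more robust across domains.
cs.CL cs.LG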
Sibayan Mitra (BITS Pilani), Dhruv Kumar (BITS Pilani)
Mean-pooled cosine similarity is the default metric for comparing neural representations across languages, modalities, and tasks. We establish that this metric is not length-invariant: under the anisotropy that characterizes modern transformer representations, mean-pooled cosine grows monotonically in sequence length, independent of representational content. Empirically, on HumanEvalPack across four code LLMs, the length ratio alone explains $R^2 = 0.52$--$0.75$ of cross-language "Python proximi...
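The length effect described in this abstract can be illustrated with a toy simulation. The sketch below is not the paper's proof or its proposed alternative metric. It assumes a simple anisotropy model in which every token vector is a shared offset plus independent unit Gaussian noise, and all function names and parameters (`shared_offset`, `dim`, `trials`) are hypothetical. Under that assumption, averaging more tokens shrinks the noise while the shared offset stays fixed, so the cosine between two content-independent sequences rises with length.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_sequence(rng, length, dim, shared_offset):
    """Toy anisotropic tokens (assumed model): shared offset plus unit Gaussian noise."""
    return [
        [shared_offset + rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(length)
    ]


def average_similarity(length, dim=32, shared_offset=0.5, trials=200, seed=0):
    """Mean cosine between mean-pooled pairs of unrelated sequences of a given length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = mean_pool(random_sequence(rng, length, dim, shared_offset))
        b = mean_pool(random_sequence(rng, length, dim, shared_offset))
        total += cosine(a, b)
    return total / trials


if __name__ == "__main__":
    # The two sequences in each pair share no content, only the common offset.
    for n in (1, 4, 16, 64):
        print(n, round(average_similarity(n), 3))
```

In this toy setting the printed similarity increases with the sequence length even though the sequences are statistically unrelated, which is the kind of confound the abstract attributes to anisotropy.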
327 Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study
2605.07366
LoRA Rank Allocation in RL: Empirically analyzes gradient-based LoRA rank allocation under GRPO and finds it underperforms uniform allocation.
cs.CL
Yash Ganpat Sawant
Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised fine-tuning (SFT). We investigate whether this success transfers to reinforcement learning, specifically Group Relative Policy Optimization (GRPO). Using gradient-magnitude profiling on Qwen 2.5 1.5B with GSM8K, we find that it does not: proportional rank allocation degrades accuracy by 4.5 points compared to uniform allocation (70.0% ...
328 The Proxy Presumption: From Semantic Embeddings to Valid Social Measures
2605.07409
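Embedding Validity in Social Science: Critiques the proxy assumption of treating embedding geometry as a direct social measure and stresses the need for explicit validity checks.
cs.CL cs.LG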
Baishi Li, Ta Yu, Kelvin J. L. Koa, Ke-Wei Huang
Natural Language Processing is rapidly evolving into a primary instrument for Computational Social Science, with researchers increasingly using embeddings to measure latent constructs such as novelty, creativity, and bias. However, this transition faces a fundamental validity challenge: the "Proxy Presumption," or the reliance on geometric properties (e.g., cosine distance) as direct measures of social concepts. We argue that without explicit validation, unsupervised representations remain ent...
329 Generating training datasets for legal chatbots in Korean
2605.07432
Korean Legal Chatbot Data: Proposes generating and annotating training data for Korean legal chatbots to cover diverse user expressions.
cs.CL cs.LG
Changhoe Hwang, Jee-Sun Nam, Eric Laporte
Chatbots are robots that can communicate with humans using text or voice signals. Legal chatbots improve access to justice, since legal representation and legal advice by lawyers come with a high cost that excludes disadvantaged and vulnerable people. However, capturing the diversity of actual user input in datasets for deep-learning dialog systems (chatbots) is a technical challenge. Diversity requires large volumes of data, which must also be labelled in order to classify the user's intent, wh...
330 SSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysis
2605.07446
Korean ABSA Annotated Corpus: Builds a Korean fine-grained ABSA evaluation-annotated corpus and resources using Semi-Automatic Symbolic Propagation.
cs.CL cs.LG
Suwon Choi, Shinwoo Kim, Changhoe Hwang, Gwanghoon Yoo, Eric Laporte
We report the construction of a Korean evaluation-annotated corpus, hereafter called 'Evaluation Annotated Dataset (EVAD)', and its use in Aspect-Based Sentiment Analysis (ABSA) extended in order to cover e-commerce reviews containing sentiment and non-sentiment linguistic patterns. The annotation process uses Semi-Automatic Symbolic Propagation (SSP). We built extensive linguistic resources formalized as a Finite-State Transducer (FST) to annotate corpora with detailed ABSA components in the fa...
331 Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study
2605.07453
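NMT Data Contamination Reproducibility Study: Audits data contamination in hieroglyphic-to-German translation and reproduces the BLEU gap.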
cs.CL
Ammar Toutou, Abdelrahman Harb, Christine Basta
Ancient and endangered languages pose a unique challenge for NLP: their datasets are inherently scarce, difficult to expand, and built from formulaic corpora -- making data-quality issues especially consequential yet rarely audited. Motivated by the need to understand what current NMT can realistically achieve for such languages, we investigate hieroglyphic-to-German translation, where a recent study reported 61.5 BLEU using fine-tuned M2M-100. Our reproduction yields only 37.0 BLEU with the rel...
332 GRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Tasks
2605.07454
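In-Context Learning Example Optimization: Proposes GRaSp, which automatically generates, clusters, and selects examples to improve low-data ICL.
cs.CL
Simen Bihaug-Frøyland, Henrik Brådland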
In-context learning enables large language models to adapt to new tasks, but their performance is highly sensitive to the selected examples. Finding effective demonstrations is particularly difficult in domain-specific, low-data settings where high-quality examples are scarce. We propose GRaSp, a three-stage framework for automatic in-context example optimization. By first generating a large synthetic candidate pool, then structuring it with clustering and dimensionality reduction, and finally u...
333 Think-with-Rubrics: From External Evaluator to Internal Reasoning Guidance
2605.07461
Rubric-Guided Model Reasoning: Internalizes scoring rubrics as reasoning guidance signals during generation.
cs.CL
Jiachen Yu, Zhihao Xu, Junjie Wang, Yujiu Yang
Rubrics have been extensively utilized for evaluating unverifiable, open-ended tasks, with recent research incorporating them into reward systems for reinforcement learning. However, existing frameworks typically treat rubrics only as an external evaluator disjoint from the policy's primary reasoning trace. Such a design confines rubrics to post-hoc measurement, leaving them unable to actively guide the model's generation process. In this work, we introduce Think-with-Rubrics, a novel paradigm for ...
334 The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
2605.07462
Multi-Agent Social Platform Dataset: Releases data from the Moltbook agent community and analyzes its structure and safety risks.
cs.CL cs.AI
William Brach, Federico Torrielli, Stine Lyngsø Beltoft, Annemette Brok Pirchert, Peter Schneider-Kamp
Moltbook is a Reddit-like platform where OpenClaw agents post, comment, and vote at scale - a so far unprecedented incident that comes with serious safety concerns. With the aim of studying emergent behavior in populations, we release the Moltbook Files, a dataset of 232k posts and 2.2M comments covering the platform's first 12 days, processed through a pipeline to identify and remove Personally-Identifiable Information (PII). We analyze community structure, authorship, lexical properties, senti...
335 SEIF: Self-Evolving Reinforcement Learning for Instruction Following
2605.07465
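Self-Evolving RL for Instruction Following: Uses self-evolving RL so that instruction difficulty grows dynamically as model capability improves.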
cs.CL
Qingyu Ren, Qianyu He, Jiajie Zhu, Xingzhou Chen, Jingwen Chang
Instruction following is a fundamental capability of large language models (LLMs), yet continuously improving this capability remains challenging. Existing methods typically rely either on costly external supervision from humans or strong teacher models, or on self-play training with static-difficulty instructions that cannot evolve as the model's capabilities improve. To address these limitations, we propose SEIF (Self-Evolving Reinforcement Learning for Instruction Following), a self-evolving ...
336 TCMIIES: A Browser-Based LLM-Powered Intelligent Information Extraction System for Academic Literature
2605.07507
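LLM-Based Literature Information Extraction: Implements a browser-based LLM system that extracts structured academic information from paper text.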
cs.CL
Hanqing Zhao
The exponential growth of academic publications has created an urgent need for automated tools capable of extracting structured knowledge from unstructured scientific texts. While large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and information extraction, existing solutions often require specialized infrastructure, programming expertise, or fine-tuned domain-specific models that create barriers for researchers in specialized fields. This p...
337 WeatherSyn: An Instruction Tuning MLLM For Weather Forecasting Report Generation
2605.07522
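Multimodal Weather Report Generation: Instruction-tunes a multimodal model to automatically generate textual weather forecast reports.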
cs.CL
Zinan Zheng, Yang Liu, Nuo Chen, Juepeng Zheng, Hong Cheng
Accurate weather forecast reporting enables individuals and communities to better plan daily activities and agricultural operations. However, the current reporting process primarily relies on manual analysis of multi-source data, which leads to information overload and reduced efficiency. With the development of multimodal large language models (MLLMs), leveraging data-driven models to analyze and generate reports in the weather forecasting domain remains largely underexplored. In this work, we ...
338 Why do Large Language Models Fail in Low-resource Translation? Unraveling the Token Dynamics of Large Language Models for Machine Translation
2605.07533
Low-Resource Translation Failure Mechanisms: Analyzes token dynamics and failure modes of multiple LLMs in low-resource translation.
cs.CL
Shenbin Qian, Yves Scherrer
Large Language Models (LLMs) have recently demonstrated strong performance in machine translation (MT). However, most prior work focuses on improving or benchmarking translation quality, offering limited insight into when and why LLM-based translation fails. In this work, we systematically analyze failure modes of LLMs in MT by evaluating 15 models, including four reasoning LLMs, across 22 language pairs (LPs) with varying resource levels. We find that non-English-centric LPs consistently yield ...
339 Nürnberg NLP at PsyDefDetect: Multi-Axis Voter Ensembles for Psychological Defence Mechanism Classification
2605.07606
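Psychological Defence Mechanism Classification Ensembles: Uses multi-axis voter ensembles to improve classification of defence mechanism categories in conversations.
cs.CL cs.AI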
Philipp Steigerwald, Eric Rudolph, Jens Albrecht
Detecting levels of psychological defence mechanisms in supportive conversations is inherently ambiguous. In the PsyDefDetect shared task at BioNLP 2026, the eight positive defence categories share surface language and differ only in pragmatic function, and trained raters reach only moderate inter-annotator agreement. On such a task the decisive lever is not a stronger single model but error independence, since any single representation will waver on the overlapping defence boundaries. We translat...
340 Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation
2605.07613
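Semantic IDs for Conversational News Recommendation: Generates semantic IDs to ease the RAG retrieval bottleneck under implicit intents.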
cs.CL
Hongyang Su, Beibei Kong, Lei Cheng, Chengxiang Zhuo, Zang Li
Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify 6 intent types from production dialogues: five are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Ma...
341 Is She Even Relevant? When BERT Ignores Explicit Gender Cues
2605.07622
BERT Gender Cue Neglect Bias: Studies when a Dutch BERT ignores explicit gender cues and how the bias forms.
cs.CL
Jonas Klein, Chiara Manna, Eva Vanmassenhove
Gender bias in large language models has primarily been investigated for English, while languages with grammatical or morphological gender remain comparatively understudied. This paper investigates how and when gender information emerges in a Dutch BERT model trained from scratch, offering one of the first checkpoint-level analyses of bias formation in a Transformer architecture for a language combining overt morphological gender marking and generic forms. By extracting contextual embeddings thr...
342 Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
2605.07630
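Phone Agent Safety Evaluation: Proposes an evaluation that distinguishes whether an agent avoids harm by safe choice or by incapability.
cs.CL cs.LG cs.AI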
Zhengyang Tang, Yi Zhang, Chenxin Li, Xin Lai, Pengyuan Lyu
When a phone-use agent avoids harm, does that show safety, or simply inability to act? Existing evaluations often cannot tell. A harmful outcome may be avoided because the agent recognized the risk and chose the safe action, or because it failed to understand the screen or execute any relevant action at all. These cases have different causes and call for different fixes, yet current benchmarks often merge them under task success, refusal, or final harmful outcome. We address this problem with Ph...
343 Post-training makes large language models less human-like
2605.07632
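Post-Training Reduces Human-Likeness: Uses Psych-201 to find that alignment post-training reduces consistency between model and human behavior.
cs.CL cs.LG cs.AI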
Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov
Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in...
344 Multi-Dimensional Evaluation of LLMs for Grammatical Error Correction
2605.07635
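Multi-Dimensional LLM Evaluation for GEC: Systematically evaluates LLM grammatical error correction and examines ensembling and metric underestimation.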
cs.CL
Adnan Labib, Qiao Wang, Yixuan Huang, Zheng Yuan
Automated assistants for Grammatical Error Correction are now embedded in educational platforms serving millions of learners, yet three critical gaps remain in this domain: (1) latest-generation Large Language Models (LLMs) lack comprehensive evaluation on grammar correction tasks; (2) whether combining these LLMs improves correction quality is unexplored; and (3) the extent to which reference-based metrics underestimate GEC system performance has not been adequately quantified. In this study, f...
345 MAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing
2605.07646
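Multi-Agent Step-wise Reasoning Audits: Proposes MAVEN, which uses verification and elaboration agents to audit reasoning errors step by step.
cs.CL cs.LG cs.AI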
Yinsheng Yao, Jiehao Tang, Zhaozhen Yang, Dawei Cheng
While explicit reasoning trajectories enhance model interpretability, existing paradigms often rely on monolithic chains that lack intermediate verification, allowing early errors to cascade unchecked. This lack of modularity impedes granular auditing and compromises the epistemic trust required for high-stakes applications. We propose MAVEN (Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing), a blackboard-inspired framework designed to transform LLMs into deliberate r...
346 Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation
2605.07647
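Agreement Degradation in Short Answer Scoring: Studies the drop in agreement of few-shot LLM short answer scoring on partially correct responses.
cs.CL cs.AI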
Abigail Victoria Gurin Schleifer, Moriah Ariely, Beata Beigman Klebanov, Asaf Salman, Giora Alexandron
Automated short answer scoring (ASAS) is shifting from discriminative, fine-tuned models to large language models (LLMs) used in few-shot settings. This paradigm leverages LLMs' broad world knowledge and ease of deployment, but limited task-specific data may reduce alignment on complex scoring tasks. In particular, its impact on scoring partially correct responses that require nuanced interpretation remains underexplored. We investigate the relationship between the degree of task-specific adaptat...
347 Not All Tokens Learn Alike: Attention Entropy Reveals Heterogeneous Signals in RL Reasoning
2605.07660
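Token-Level Learning Signals in RL Reasoning: Uses attention entropy to reveal heterogeneous learning signals across tokens in RL post-training.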
cs.CL
Gengyang Li, Zheng-Fan Wu, Siqi Bao, Yunfang Wu
Reinforcement-learning-based post-training has become a key approach for improving the reasoning ability of large language models, but its token-level learning signals remain poorly understood. This work studies their heterogeneity through attention entropy, which measures how concentrated or diffuse the contextual support is for each response token. We first show that token-level RL objectives are sparsely estimable: uniformly random 20 percent token subsets preserve much of the full-token held...
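As a rough illustration of the quantity named in this abstract, attention entropy can be computed as the Shannon entropy of one token's attention distribution. This is a minimal sketch under that assumption only. The excerpt does not say which layers or heads are used or how values are aggregated, and the function name `attention_entropy` is hypothetical.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one token's attention distribution.

    `weights` are non-negative attention scores over context positions;
    they are renormalized here so that they sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive mass")
    probs = [w / total for w in weights]
    # Zero-probability positions contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    concentrated = [0.97, 0.01, 0.01, 0.01]  # support focused on one position
    diffuse = [0.25, 0.25, 0.25, 0.25]       # support spread evenly
    print(attention_entropy(concentrated), attention_entropy(diffuse))
```

Low values correspond to concentrated contextual support and high values to diffuse support, with the maximum `log(n)` reached by a uniform distribution over `n` positions.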
348 DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain
2605.07699
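Retail Policy Ambiguity Reasoning Benchmark: Builds DRIP-R to evaluate agent decision-making and reasoning under real-world retail policy ambiguity.
cs.CL cs.AI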
Hsuvas Borkakoty, Sebastian Pohl, Cheng Wang, Bei Chen, Yufang Hou
LLM-based agents are increasingly deployed for routine but consequential tasks in real-world domains, where their behavior is governed by inherently ambiguous domain policies that admit multiple valid interpretations. Despite the prevalence of such ambiguities in practice, existing agent benchmarks largely assume unambiguous, well-specified policies, leaving a critical evaluation gap. We introduce DRIP-R, a benchmark that systematically exploits real-world retail policy ambiguities to construct ...
349 Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models
2605.07701
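Dynamic Guidance for Diffusion Language Models: Models CFG scale selection as sequential decision-making to dynamically control diffusion generation.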
cs.CL
Fan Zhou, Tim Van de Cruys
Classifier-Free Guidance (CFG) is a widely used mechanism for controlling diffusion-based generative models, yet its guidance scale is typically treated as a fixed hyperparameter throughout generation. This static design yields a suboptimal controllability and quality tradeoff, as the optimal degree of guidance varies across tasks and across different stages of the diffusion process, especially in the NLP domain. We recast CFG scale selection as a sequential decision-making problem and propose to le...
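To make the role of the guidance scale concrete, the sketch below applies one common form of CFG, `uncond + scale * (cond - uncond)`, with a scale that changes per step. It is an illustration only. The paper learns the scale as a sequential decision, whereas `linear_schedule` here is a hypothetical hand-written stand-in, and the score vectors are made-up numbers.

```python
def guided_scores(cond, uncond, scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional scores."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def linear_schedule(step, total_steps, start=3.0, end=1.0):
    """Hypothetical schedule: guidance scale moves linearly from `start` to `end`."""
    if total_steps <= 1:
        return start
    frac = step / (total_steps - 1)
    return start + frac * (end - start)


if __name__ == "__main__":
    cond = [2.0, 0.0, -1.0]    # made-up conditional scores
    uncond = [1.0, 0.0, 0.0]   # made-up unconditional scores
    total_steps = 4
    for step in range(total_steps):
        scale = linear_schedule(step, total_steps)
        print(step, scale, guided_scores(cond, uncond, scale))
```

With `scale = 0` the result equals the unconditional scores and with `scale = 1` it equals the conditional scores, so a fixed scale is the special case of a constant schedule.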
350 SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation
2605.07711
Cross-Tokenizer On-Policy Distillation: Proposes SimCT to recover teacher supervision lost by OPD under differing tokenizers.
cs.CL
Jie Sun, Mao Zheng, Mingyang Song, Qiyong Zhong, Yilin Cheng
On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differently. Under heterogeneous tokenizers, exact shared-token matching silently discards a large fraction of the teacher signal at precisely the positions where vocabularies disagree. We propose Simple ...
351 Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
2605.07721
Memory-Efficient Looped Transformer: Decouples compute from the KV cache to reduce the memory growth caused by looped reasoning depth.
cs.CL cs.LG cs.AI
Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi
Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth. Consequently, increasing the number of reasoning iterations can lead to prohibitive memor...
352 SOD: Step-wise On-policy Distillation for Small Language Model Agents
2605.07725
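Tool-Reasoning Distillation for Small Models: Proposes SOD, step-wise on-policy distillation, to stabilize long-horizon tool interactions in small models.
cs.CL cs.AI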
Qiyong Zhong, Mao Zheng, Mingyang Song, Xin Lin, Jie Sun
Tool-integrated reasoning (TIR) is difficult to scale to small language models due to instability in long-horizon tool interactions and limited model capacity. While reinforcement learning methods like group relative policy optimization provide only sparse outcome-level rewards, on-policy distillation (OPD) has recently gained popularity by supplying dense token-level supervision from a teacher on student-generated trajectories. However, our experiments indicate that applying OPD to TIR leads t...
353 Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs
2605.07731
Italian MoE Model Evaluation: Benchmarks EngGPT2MoE and compares it against open-source models of similar size.
cs.CL cs.AI
Andrea Sassella, Andrea Chizzola, Tommaso Bianchi, Luca Alessandrelli, Mark James Carman
This report benchmarks the performance of ENGINEERING Ingegneria Informatica S.p.A.'s EngGPT2MoE-16B-A3B LLM, a 16B parameter Mixture of Experts (MoE) model with 3B active parameters. Performance is investigated across a wide variety of representative benchmarks, and is compared against comparably-sized open-source MoE and dense models. In comparison with popular Italian models, namely FastwebMIIA-7B, Minerva-7B, Velvet-14B, and LLaMAntino-3-ANITA-8B, EngGPT2MoE-16B-A3B performs as well or bette...
354 TextLDM: Language Modeling with Continuous Latent Diffusion
2605.07748
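Continuous Latent Diffusion Language Modeling: Proposes TextLDM, which generates text with a diffusion model in a continuous VAE latent space.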
cs.CL
Jiaxiu Jiang, Jingjing Ren, Wenbo Li, Bo Wang, Haoze Sun
Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous la...
355 CktFormalizer: Autoformalization of Natural Language into Circuit Representations
2605.07782
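Lean-Constrained Hardware Autoformalization: Uses a dependently typed HDL to autoformalize natural-language specifications into verifiable circuits.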
cs.CL
Jing Xiong, Qi Han, Chenchen Ding, He Xiao, Zunhai Su
LLMs can generate hardware descriptions from natural language specifications, but the resulting Verilog often contains width mismatches, combinational loops, and incomplete case logic that pass syntax checks yet fail in synthesis or silicon. We present CktFormalizer, a framework that redirects LLM-driven hardware generation through a dependently-typed HDL embedded in Lean 4. Lean serves three roles: (i) type checker: dependent types encode bit-width constraints, case coverage, and acyclicity, tur...
356 Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models
2605.07783
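Chain-Distillation Initialization for Small Language Models: Proposes chain-based distillation to efficiently initialize small models of varying sizes while reducing teacher calls.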
cs.CL
Boyu Shi, YiCheng Jiang, Chang Liu, Qiufeng Wang, Xu Yang
Large language models (LLMs) achieve strong performance but remain costly to deploy in resource-constrained settings. Training small language models (SLMs) from scratch is computationally expensive, while conventional knowledge distillation requires repeated access to large teachers for different target sizes, leading to poor scalability. To solve these problems, we propose Chain-based Distillation (CBD), a scalable paradigm for efficiently initializing variable-sized language models. A...
357 Hybrid TF-IDF Logistic Regression and MLP Neural Baseline for Indonesian Three-Class Sentiment Analysis on Social Media Text
2605.07793
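Lightweight Indonesian Sentiment Baseline: Combines TF-IDF and metadata features with logistic regression and an MLP as three-class sentiment baselines.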
cs.CL
Allya Nurul Islami Pasha, Eka Fidiya Putri, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang
This paper presents a compact three-class sentiment analysis study for Indonesian social media text. The task is formulated with positive, negative, and neutral outputs derived from a fine-grained emotion dataset. The proposed practical baseline combines TF-IDF text features, three lightweight numeric metadata features, and a balanced multinomial Logistic Regression classifier. For comparison, the study also includes a neural baseline using a two-layer multilayer perceptron (MLP) over the same ...
358 PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism
2605.07796
Cross-Dialect Text-to-SQL Evaluation: Uses automated backend isomorphism to make Text-to-SQL evaluation comparable across SQL dialects.
cs.CL
Yotam Perlitz, Elad Venezian, Corentin Royer, Francesco Fusco, Andrea Giovannini
SQL dialects vary in syntax, types, and functions across database engines. Text-to-SQL benchmarks, however, predominantly support only SQLite. This creates a critical evaluation gap: cross-dialect evaluation reveals weak per-query agreement (Cohen's $\kappa$), showing that SQLite performance is an unreliable proxy for other dialects. Yet such evaluation remains prohibitively difficult: existing approaches either require expensive manual query transpilation or rely on tools that often fail on complex SQL...
359 Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
2605.07806
LLM Self-Assessment Reliability Modeling: Improves LLM self-assessment using cognitive appraisal theory to predict task correctness.
cs.CL cs.LG cs.AI
Sree Bhattacharyya, Samarth Khanna, Leona Chen, Lucas Craig, Tharun Dilliraj
Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Confidence, however, has been shown to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory, a framework from human psychology that decomposes self-evaluation into multiple components, we propose a m...
360 A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches for Sentiment Classification on IMDb Movie Reviews
2605.07811
IMDb Sentiment Classification Comparison: Compares classical TF-IDF models with attention-based BiLSTMs on IMDb sentiment classification.
cs.CL
Erma Daniar Safitri, Lia Hana Ichisasmita, Citra Agustin, Luluk Muthoharoh, Ardika Satria
This paper presents a comparative study of classical machine learning and deep learning methods for sentiment classification on the IMDb movie reviews dataset. The machine learning pipeline uses TF-IDF features and PyCaret AutoML to evaluate Logistic Regression, Naïve Bayes, and Support Vector Machine, while the deep learning pipeline implements BiLSTM and BiLSTM with an attention mechanism. Experimental results show that classical machine learning, especially SVM, achieves the best performanc...
361 SCENE: Recognizing Social Norms and Sanctioning in Group Chats
2605.07823
Group Chat Social Norms: Proposes the SCENE benchmark to evaluate how well LLMs recognize implicit norms and sanctioning in group chats.
cs.CL
Mateusz Jacniacki, Maksymilian Bilski
Online group chats are social spaces with implicit behavior patterns that, when broken, are often met with social sanctioning from the group. The ability and willingness of LLM-based agents to recognize and adapt to these norms remains mostly unexplored. We introduce SCENE, a social-interaction benchmark focused on implicit norms and social sanctioning in multi-party chat. SCENE generates plausible non-roleplay scenarios with scripted personas that follow a hidden norm, create opportunities for ...
362 Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors
2605.07847
User Simulation Distribution Gap: Measures and mitigates the distributional gap between real and simulated user behaviors.
cs.CL
Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar, Marwa Abdulhai, Sergey Levine
As user simulators are increasingly used for interactive training and evaluation of AI assistants, it is essential that they represent the diverse behaviors of real users. While existing works train user simulators to generate human-like responses, whether they capture the broad and heterogeneous distribution of real user behaviors remains an open question. In this work, we introduce a method to measure the distributional gap between real and simulated user behaviors, validated through a human s...
363 MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
2605.07850
Hierarchical Rank-Adaptive LoRA: Proposes MatryoshkaLoRA, which learns hierarchical low-rank representations for efficient fine-tuning.
cs.CL cs.LG cs.AI
Ionut-Vlad Modoranu, Mher Safaryan, Dan Alistarh
With the rise in scale for deep learning models to billions of parameters, the computational cost of fine-tuning remains a significant barrier to deployment. While Low-Rank Adaptation (LoRA) has become the standard for parameter-efficient fine-tuning, the need to set a predefined, static rank $r$ requires exhaustive grid searches to balance efficiency and performance. Existing rank-adaptive solutions such as DyLoRA mitigate this by sampling ranks during the training from a predefined distributio...
364 Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement
2605.07883
Flexible Safety Refusal: Uses label enhancement to reduce rigid LLM refusals while maintaining safety compliance.
cs.CL
Ying Zhang, Congyu Qiao, Xin Geng, Ning Xu
Large Language Models (LLMs) rely on safety alignment to obey safe requests while refusing harmful ones. However, traditional refusal mechanisms often lead to "rigid rejection," where a general template (e.g., "I cannot fulfill this request") indiscriminately triggers refusals and severely undermines the naturalness of interactions between humans and LLMs. To address this issue, LANCE is proposed in this paper to ensure safe yet flexible and natural responses via label enhancement. Specifically,...
365 CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
2605.07905
AI Reviewer Benchmarking: Builds a completeness- and correctness-oriented benchmark for evaluating AI reviewers.
cs.CL cs.AI
Hexuan Deng, Xiaopeng Ke, Yichen Li, Ruina Hu, Dehao Huang
Despite the rapid development of AI reviewers, evaluating such systems remains challenging: metrics favor overlap with human reviews over correctness. However, since human reviews often cover only a subset of salient issues and sometimes contain mistakes, they are unreliable as gold references. To address this, we build category-specific benchmark subsets and skip evaluation when the corresponding human reviews are missing to strengthen Completeness. We also leverage reviewer-author-meta-revie...
366 How Value Induction Reshapes LLM Behaviour
2605.07925
Value Induction Effects: Analyzes how value induction jointly reshapes LLM behavior and potential risks.
cs.CL
Arnav Arora, Natalie Schluter, Katherine Metcalf, Maartje ter Hoeve
Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model. However, values are complex and inter-related: inducing one could modify behaviour on another. Further, inducing certain values can make models more addictive or...
367 How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
2605.07933
Jointly Trained Latent Diffusion LM: A latent-space text diffusion model whose encoder, diffusion model, and decoder are trained jointly.
cs.CL
Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev
Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suitable latent space. In this work, we present the Latent Diffusion Language Model (LDLM), in which the latent encoder, diffusion model, and decoder are trained jointly. LDLM builds its latent space by reshaping the represe...
368 Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?
2605.07937
Clarification Timing for Agents: Studies when asking for clarification most improves task success for long-horizon agents.
cs.CL
Anmol Gulati, Hariom Gupta, Elias Lumer, Sahil Sen, Vamse Kumar Subbiah
Long-horizon AI agents execute complex workflows spanning hundreds of sequential actions, yet a single wrong assumption early on can cascade into irreversible errors. When instructions are incomplete, the agent must decide not only whether to ask for clarification but when, and no prior work measures how clarification value changes over the course of execution. We introduce a forced-injection framework that provides ground-truth clarifications at controlled points in the agent's trajectory acros...
369 GLiGuard: Schema-Conditioned Classification for LLM Safeguard
2605.07982
Efficient LLM Guardrails: Proposes GLiGuard, a small model for multi-dimensional safety classification with lower latency.
cs.CL
Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney, Ash Lewis
Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B-27B parameters, reformulating what is fundamentally a classification problem as sequential text generation, a design choice that incurs high latency and scales poorly to multi-aspect evaluation. In this work, we introduce GLiGuard, a 0.3B-parameter sch...
370 Tool Calling is Linearly Readable and Steerable in Language Models
2605.07990
Tool-Calling Interpretability: Shows that tool selection is linearly readable from internal representations and can be steered.
cs.CL cs.LG cs.AI
Zekun Wu (University College London), Ze Wang (University College London), Seonglae Cho (Holistic AI), Yufei Yang (Imperial College London), Adriano Koshiyama (University College London)
When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. Probing 12 instruction-tuned models across Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), we find the identity of the chosen tool is linearly readable and steerable inside the model. Adding the mean-difference between two tools' average internal activations switches which tool the model selects at 77-100% accuracy on name-only single-turn prompts (93-10...
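The mean-difference steering described in the abstract above can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the three-dimensional activation vectors, the tool names `send_email` and `create_event`, and the nearest-mean readout that stands in for the model's actual tool selection. It is not the authors' code and does not touch a real model.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steer(h, mu_src, mu_dst, alpha=1.0):
    """Add alpha * (mu_dst - mu_src) to the activation h."""
    return [x + alpha * (d - s) for x, s, d in zip(h, mu_src, mu_dst)]


def nearest_mean(h, means):
    """Toy readout (hypothetical): pick the tool whose mean activation is closest to h."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda name: sqdist(h, means[name]))


# Hypothetical activations collected while the model picked each tool.
acts_email = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.5]]
acts_calendar = [[0.0, 1.0, 0.5], [0.2, 0.8, 0.5]]
means = {"send_email": mean(acts_email), "create_event": mean(acts_calendar)}

h = [0.9, 0.1, 0.4]  # an activation that currently reads out as send_email
before = nearest_mean(h, means)
h_steered = steer(h, means["send_email"], means["create_event"])
after = nearest_mean(h_steered, means)
print(before, "->", after)
```

In this toy setting the steering vector is just the difference of the two class means, so adding it moves an activation from one cluster to the other.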
371 Fast Byte Latent Transformer
2605.08044
Fast Byte-Level Generation: Accelerates byte-level language model generation with block diffusion and other training and generation techniques.
cs.CL cs.LG cs.AI
Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz, Gargi Ghosh, Luke Zettlemoyer
Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction ...
372 Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
2605.08045
Clinical Report Data Extraction: Distills LLMs to extract structured fields from CMR reports with uncertainty estimates.
cs.CL
Yi Yu, Parker Martin, Zhenyu Bu, Yixuan Liu, Yi-Yu Zheng
Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation. Uncertainty integrates three complementary princi...
373 Accurate and Efficient Statistical Testing for Word Semantic Breadth
2605.08048
Semantic Breadth Statistical Testing: Proposes efficient statistical tests for comparing differences in word semantic breadth.
cs.CL
Yo Ehara
Measuring the breadth of a word's meaning, or its spread across contexts, has become feasible with contextualized token embeddings. A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity (Nagata and Tanaka-Ishii, ACL 2025). These measurements are useful for deciding appropriate sense distinctions when constructing thesauri and domain-specific dictionaries. However, when comparing the breadth of two word types, naive...
374 CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
2605.08057
Text-to-SQL Compute Allocation: Allocates inference budget by complexity and explores candidates to improve Text-to-SQL.
cs.CL cs.AI
James Petullo, Nianwen Xue
While recent advancements in inference-time learning have improved LLM reasoning on Text-to-SQL tasks, current solutions still struggle to perform well on the most challenging tasks in the Bird-Bench (BIRD) benchmark. This is due to inadequate solution space exploration, which is necessary to uncover promising candidate queries that can be further refined to produce the correct output. To address this challenge, we introduce CA-SQL, a novel Text-to-SQL pipeline that utilizes the estimated diffic...
375 The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
2605.08060
Context Length Social Dilemmas: Finds that expanded memory systematically reduces cooperation in multi-agent games.
cs.CL cs.AI
Jiayuan Liu, Tianqin Li, Shiyi Du, Xin Luo, Haoxuan Zeng
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model-game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rat...
376 Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration
2605.08077
Conformal KGQA Calibration: Uses path-level conformal calibration to provide trustworthy coverage guarantees for KG question answering.
cs.CL
Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu
Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior methods suffer from critical limitations in both calibration validity and score discriminability, resulting in violated coverage guarantees and excessively large prediction...
377 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
2605.08083
Automated Test-Time Scaling: Proposes AutoTTS, which lets LLMs automatically discover test-time scaling strategies in an environment-driven manner.
cs.CL
Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments wh...
378 More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
2605.06672
Reasoning Length Bias: Reveals that longer reasoning chains lead to stronger position bias on multiple-choice questions.
cs.CL cs.LG cs.AI
Xiao Wang
Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ...
379 RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
2605.06675
Mixed-Precision KV Quantization: Uses rate-distortion theory to allocate optimal mixed-bit quantization for the KV cache.
cs.CL cs.LG
Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung
Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this cost, yet all current quantizers assign the same bit-width to every attention head, ignoring the large variation in head importance. A natural idea is to allocate more bits to important heads and fewer to the rest. We show, however, that such mixed-pr...
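As a rough illustration of the memory arithmetic behind the abstract above, the sketch below computes KV cache size as a function of sequence length and per-head bit-widths. The model shape (32 layers, 8 KV heads, head dimension 128) and the bit allocations are hypothetical, and the function `kv_cache_bytes` is an illustrative helper, not part of RateQuant. It ignores quantization metadata such as scales and zero points, and says nothing about accuracy.

```python
def kv_cache_bytes(num_layers, head_dim, seq_len, bits_per_head):
    """Approximate KV cache size in bytes for one sequence.

    bits_per_head holds one bit-width per KV head; a uniform quantizer
    repeats the same value, a mixed-precision one varies it.
    """
    # Keys and values (factor 2), one head_dim vector per token, per layer.
    elements_per_head = 2 * num_layers * head_dim * seq_len
    total_bits = elements_per_head * sum(bits_per_head)
    return total_bits // 8


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128.
uniform_fp16 = kv_cache_bytes(32, 128, 4096, [16] * 8)
doubled_context = kv_cache_bytes(32, 128, 8192, [16] * 8)
mixed = kv_cache_bytes(32, 128, 4096, [8, 8, 4, 4, 2, 2, 2, 2])

print(uniform_fp16 // 2**20, "MiB at 4096 tokens, 16 bits per head")
print(doubled_context // 2**20, "MiB at 8192 tokens, 16 bits per head")
print(mixed // 2**20, "MiB at 4096 tokens, mixed bit-widths")
```

Doubling the sequence length doubles the cache, which is the linear growth the abstract refers to; the mixed allocation shows only how per-head bit-widths change the total, not how to choose them.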
380 LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
2605.06676
Learned KV Cache Eviction: Learns a KV cache eviction policy end to end, covering per-head budgets and token selection.
cs.CL cs.LG
Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang
Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction)...
381 Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
2605.06683
Attention-Free Sequence Modeling: Proposes the Toeplitz MLP Mixer, a low-complexity replacement for attention in sequence modeling.
cs.CL cs.LG cs.AI
Benjamin L. Badger, Ethan Roland
Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangular-masked Toeplitz matrix multiplication over the sequence dimension, resulting in $\mathcal{O}(dn \log n)$ time and $\mathcal{O}(dn)$ space complexity during training and $\mathcal{O}(dn)$ time and space at inference prefill. Despite the lack of sophist...
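A minimal sketch of what triangular-masked Toeplitz multiplication over the sequence dimension means, for a single channel (d = 1). The kernel values and input are made up, and this is not the TMM implementation. It uses a direct quadratic-time sum for clarity; the $\mathcal{O}(dn \log n)$ figure in the abstract would require an FFT-based convolution, which is not shown here.

```python
def causal_toeplitz_matrix(kernel):
    """n x n lower-triangular Toeplitz matrix with T[i][j] = kernel[i - j] for j <= i."""
    n = len(kernel)
    return [[kernel[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def mix_dense(kernel, x):
    """Multiply the explicit masked Toeplitz matrix by the sequence x."""
    T = causal_toeplitz_matrix(kernel)
    n = len(x)
    return [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]


def mix_causal_conv(kernel, x):
    """Same result as a causal convolution: y[i] = sum over j <= i of kernel[i - j] * x[j]."""
    return [sum(kernel[i - j] * x[j] for j in range(i + 1)) for i in range(len(x))]


# Hypothetical kernel (one weight per relative offset) and a length-4 input.
kernel = [1.0, 0.5, 0.25, 0.125]
x = [1.0, 2.0, 3.0, 4.0]

y_dense = mix_dense(kernel, x)
y_conv = mix_causal_conv(kernel, x)
print(y_dense)
```

Because every diagonal of the matrix shares one value, the n x n mixing matrix is described by only n parameters, and the triangular mask keeps each output position from depending on later positions.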
382 State Representation and Termination for Recursive Reasoning Systems
2605.06690
Recursive Reasoning State Control: Represents recursive reasoning with an epistemic state graph and gives a termination criterion.
cs.CL cs.LG cs.AI
Debashis Guha, Amritendu Mukherjee, Sanjay Kukreja, Tarun Kumar
Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addresses both. We represent the reasoning state as an epistemic state graph encoding extracted claims, evidential relations, open questions, and confidence weights. We define the order-gap as the distance between the states reached by expand-then-consol...
383 CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
2605.06702
Deployment-Time Continual Adaptation: Proposes CASCADE, which lets LLMs continually adapt and learn from cases during deployment.
cs.CL cs.LG cs.AI
Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang
Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experien...
384 From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms
2605.06716
LLM Agent Memory Survey: Surveys the evolution of LLM agent memory mechanisms from storage to experience.
cs.CL cs.AI
Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin
Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey propos...
385 When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
2605.06723
Pre-Verbalization Commitment Theory: Proposes a finite-answer stabilization theory characterizing when a model commits to an answer.
cs.CL cs.LG cs.AI
Long Zhang, Wei-neng Chen, Feng-feng Wei, Zi-bo Qin
Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: finite-answer preference stabilization. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $\delta(\xi)=S_\theta(\mathrm{yes}\mid\xi)-S_\thet...
386 When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents
2605.06731
Persistent Agent State Poisoning: Reveals that the cross-session state of personalized agents can be gradually poisoned by routine conversations.
cs.CL cs.LG
Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye
Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet, this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as \textbf{unintended long-term state poisoning}. To systematically study it, we i...
387 ProtSent: Protein Sentence Transformers
2605.06830
Protein Embedding Contrastive Tuning: Proposes ProtSent, which contrastively fine-tunes protein models to obtain general-purpose sequence embeddings.
cs.CL cs.LG
Dan Ofer, Oriel Perets, Michal Linial, Nadav Rappoport
Protein language models (pLMs) produce per-residue representations that capture evolutionary and structural information, yet their mean-pooled sequence embeddings are not explicitly trained to reflect functional, evolutionary or structural similarity between proteins. We present Protein Sentence Transformers (ProtSent), a contrastive fine-tuning framework for adapting pLMs into general-purpose embedding models. ProtSent trains with MultipleNegativesRankingLoss across five protein-pair datasets: ...
388 Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
2605.06856
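Real-World Utility Evaluation: Identifies the disconnect between benchmarks and utility and proposes an evaluation paradigm oriented toward real-world utility.
cs.CL cs.LG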
Ishani Mondal, Shweta Bhardwaj
Generative AI systems achieve impressive performance on standard benchmarks yet fail to deliver real-world utility, a disconnect we identify across 28 deployment cases spanning education, healthcare, software engineering, and law. We argue that this benchmark utility gap arises from three recurring failures in evaluation practice: proxy displacement, temporal collapse, and distributional concealment. Motivated by these observations, we argue that generative AI evaluation requires a paradigm shif...
389 Regulating Branch Parallelism in LLM Serving
2605.06914
LLM Serving Branch Parallelism: Proposes a serving policy that regulates branch parallelism to balance throughput and latency.
cs.CL cs.AI
Swapnil Gandhi, Siva Hari, William J. Dally, Christos Kozyrakis
Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches eagerly or under fixed caps. We show that both are brittle: eager admission inflates the shared decode step, degrading co-batched requests in serial stages, while conservative fixed caps forgo the throughput that motivated exposing branches in the first place. We call the excess step latency caused by admitted branches the branch ex...
390 From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle
2605.06963
RAG Tutoring for Moodle: Develops a RAG-based Moodle tutoring plugin with teacher-in-the-loop supervision.
cs.CL cs.AI
Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała
This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant a...
391 Bridging Textual Profiles and Latent User Embeddings for Personalization
2605.06981
Personalized User Representation Learning: Uses reinforcement learning to fuse textual profiles with latent user embeddings to improve personalized recommendation.
cs.CL
Zhaoxuan Tan, Xiang Zhai, Yan Zhu, Meng Jiang, Mohamed Hammad
Personalized systems rely on user representations to connect behavioral history with downstream recommendation applications. Existing methods typically employ either supervised latent user embeddings, which are effective for retrieval but difficult to interpret, or textual user profiles, which are interpretable but challenging to optimize for downstream utility due to lack of direct supervision. To bridge this gap, we present BLUE, a reinforcement learning framework that unifies these two forms ...
392 SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair
2605.07001
Architectural Code Smell Repair: Builds SmellBench to evaluate LLM agents' ability to repair architecture-level code smells.
cs.CL
Ion George Dinu (University of Craiova, Craiova, Romania), Marian Cristian Mihăescu (University of Craiova, Craiova
Architectural code smells erode software maintainability and are costly to repair manually, yet unlike localized bugs, they require cross-module reasoning about design intent that challenges both developers and automated tools. While large language model agents excel at bug fixing and code-level refactoring, their ability to repair architectural code smells remains unexplored. We present the first empirical evaluation of LLM agents on architectural code smell repair. We contribute SmellBench, a ...
393 Theoretical Limits of Language Model Alignment
2605.07105
Alignment Theory under KL: Analyzes the upper bounds on reward improvement for reinforcement learning and best-of-N alignment under a KL constraint.
cs.CL cs.LG
Lucas Monteiro Paes, Natalie Mackraz, Barry-John Theobald, Federico Danieli
Language model (LM) alignment improves model outputs to reflect human preferences while preserving the capabilities of the base model. The most common alignment approaches are (i) reinforcement learning, which maximizes the expected reward under a KL-divergence constraint, and (ii) best-of-$N$ alignment, which selects the highest-reward output among $N$ independent samples. Despite their widespread use, the fundamental limits of reward improvement under a KL budget remain poorly understood. We c...
394 The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
2605.07127
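LLM Positional Retrieval Failure: Reveals the "Position Curse", in which LLMs struggle to locate the last items of a short list, and evaluates it systematically.
cs.CL cs.LG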
Zhanqi Zhang, Hua-Dong Xiong, Robert C. Wilson, Mikio Aoi, Marcelo G. Mattar
Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluated two complementary queries: given a position in a seq...
395 Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings
2605.07158
Embedding Audit via Citations: Uses large-scale citation communities to test whether text-embedding similarity reflects research agendas.
cs.CL cs.LG
Junseon Yoo
Vector search and retrieval-augmented generation (RAG) rest on the assumption that cosine similarity between text embeddings reflects conceptual relatedness. We measure where this assumption breaks. We build an augmented citation graph over 3.58M scientific papers and partition it via Leiden CPM at two granularities: sub-field (L1) and research-agenda (L2, hierarchical inside each L1). Four state-of-the-art embeddings (Gemini, Qwen3-8B, Qwen3-0.6B, SPECTER2) clear the L1 bar reasonably (45-52% t...
396 DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
2605.07210
Diffusion-based Text Retrieval: Uses diffusion language models to generate multiple representative tokens in parallel to improve retrieval representations.
cs.CL
Shuai Wang, Yin Yu, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a repr...
397 MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory
2605.07242
Agentic Memory Cascade Repair: Proposes MemoRepair, which repairs barriers first to address cascading staleness in derived memory state.
cs.CL cs.AI
Yang Zhao, Chengxiao Dai, Mengying Kou, Yue Xiu
Agentic memory evolves across tasks into durable derived artifacts: summaries, cached outputs, embeddings, learned skills, and executable tool procedures. When a source artifact is deleted, corrected, or invalidated by tool or API migration, descendants derived from that source can remain visible and steer future actions with stale support. We formalize this failure mode as the cascade update problem, where repair targets the visible derived state of the memory store. We present MemoRepair, a ba...
398 Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models
2605.07244
Mutual RL Experience Sharing: Lets heterogeneous LLMs share typed experience during concurrent RL post-training and aligns their different tokenizers.
cs.CL cs.LG cs.AI
Xiaoze Liu, Dhananjay Ram, Yuting Zhang, Zhaoyang Zhang, Wei Xia
We introduce Mutual Reinforcement Learning, a framework for concurrent RL post-training in which heterogeneous LLM policies exchange typed experience while keeping separate parameters, objectives, and tokenizers. The framework combines a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL) that retokenizes text and aligns token-level traces across incompatible vocabularies. This substrate makes the experience-sharing design question...
399 When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models
2605.07260
MoE Router Counterfactual Analysis: Evaluates token-level expert misrouting in MoE models by comparing against counterfactual equal-compute routes.
cs.CL cs.LG
Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok
Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-$k$ router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute alternatives for the same token and score each by the next-token probability it assigns to the realized token in a verified reasoning trajectory. The result is sharply token-conditional: the standard router is well-aligned...
400 On the Complexity of the Matching Problem of Regular Expressions with Backreferences
2605.07289
Regex Backreference Matching Complexity: Studies complexity bounds for matching regular expressions with backreferences to understand ReDoS risk.
cs.CL
Soh Kumabe, Yuya Uezato
ReDoS is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that are provably effic...
401 Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts
2605.07395
Multi-LLM Routing Evaluation Artifacts: Empirically shows that the "unsolvability ceiling" in multi-model routing stems from evaluation metrics and artifacts.
cs.CL cs.LG cs.AI
Saloni Garg, Amit Sagtani
Efficient routing across multiple LLMs enables cost-quality tradeoffs by directing queries to the cheapest capable model. Prior work attributes routing headroom to an "unsolvability ceiling", queries no model in the pool can solve. We present a large-scale study of multi-tier LLM routing with 206,000 query-model pairs across six benchmarks (MMLU, MedQA, HumanEval, MBPP, Alpaca, ShareGPT) using the Gemma 4 and Llama 3.1 families. Evaluating with both LLM-as-a-judge and exact-match metrics, we sho...
402 ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression
2605.07501
RL for CoT Compression: Uses experience-guided RL to adaptively compress chain-of-thought, reducing reasoning tokens and latency.
cs.CL cs.LG
Tingcheng Bian, Yuzhe Zhang, Jing Jin, Jinchang Luo, MingQuan Cheng
Large reasoning models (LRMs) achieve strong performance via extended chain-of-thought (CoT) reasoning, yet suffer from excessive token consumption and high inference latency. Existing reinforcement learning (RL) approaches for CoT compression rely on uniform, static length penalties that neglect model capability dynamics and problem-level difficulty variation. We propose ExpThink, an RL framework that addresses both dimensions through two complementary mechanisms. First, e...
403 Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
2605.07579
RLVR Baseline from Internal States: Estimates a value baseline from the policy model's internal states to stabilize RLVR training at low cost.
cs.CL cs.LG cs.AI
Yunho Choi, Jongwon Lim, Woojin Ahn, Minjae Oh, Jeonghoon Shim
Reinforcement learning with verifiable rewards (RLVR) for Large Reasoning Models hinges on baseline estimation for variance reduction, but existing approaches pay a heavy price: PPO requires a policy-model scale critic, while GRPO needs multiple rollouts per prompt to keep its empirical group mean stable. We introduce Policy Optimization with Internal State Value Estimation, which obtains a baseline at negligible cost by using the policy model's internal signals already computed during the poli...
404 Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators
2605.07600
Causal Discovery for Math Reasoning: Uses an LLM as an intervention simulator for causal discovery to pinpoint the key concepts in mathematical problem solving.
cs.CL cs.LG cs.AI
Tsuyoshi Okita
Recent methods for improving LLM mathematical reasoning, whether through MCTS-based test-time search or causal graph-guided knowledge injection, cannot identify which concepts causally contribute to a correct answer, as the observed association may be spurious, driven by confounders such as problem difficulty. We propose CIKA (Causal Intervention for Knowledge Activation), a framework that uses the LLM itself as an interventional simulator: a prompt sets the concept state to "mastered" and the...
405 Reliable Chain-of-Thought via Prefix Consistency
2605.07654
Chain-of-Thought Reliability Signal: Proposes a prefix-consistency measure and uses it to weight self-consistency voting, improving reasoning reliability.
cs.CL cs.LG
Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama
Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer...
406 TRACE: Tourism Recommendation with Accountable Citation Evidence
2605.07677
Evidence-grounded Tourism Recommendation: Proposes the TRACE benchmark for evaluating multi-turn conversational tourism recommendation with cited review evidence.
cs.CL cs.AI
Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng
Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism re...
407 Rethinking State Tracking in Recurrent Models Through Error Control Dynamics
2605.07755
Recurrent State Tracking Error Dynamics: Proves, via error-control dynamics, that affine recurrent networks struggle to correct state drift.
cs.CL cs.LG
Jiwan Chung, Heechan Choi, Seon Joo Kim
The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error control, the dynamics governing hidden-state drift along the directions that distinguish symbolic states. We prove that affine recurrent networks, a class of models encompassing State-Space Models and Linear Attention, cannot correct errors along state-separatin...
408 Tracing Uncertainty in Language Model "Reasoning"
2605.07776
Uncertainty Dynamics in Reasoning: Characterizes chain-of-thought generation with uncertainty traces and analyzes reasoning dynamics.
cs.CL cs.LG cs.AI
Nils Grünefeld, Bertram Højer, Philipp Mondorf, Barbara Plank, Anna Rogers
Language model (LM) "reasoning", commonly described as Chain-of-Thought or test-time scaling, often improves benchmark performance, but the dynamics underlying this process remain poorly understood. We study these dynamics through the lens of uncertainty quantification by treating the "reasoning" traces, the intermediate token sequences generated by LMs, as evolving model states. We summarize each trace by an uncertainty trace profile: a small set of features describing the shape of the uncertai...
409 OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling
2605.07815
Layer-wise Trust-Ratio Optimization: Proposes OrScale, which adds layer-wise trust-ratio scaling to orthogonalized updates to improve training.
cs.CL cs.LG
Yuxuan Lou, Yang You
Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer's update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ratio extension of Muon built on a simple rule: the denominator of a layer-wise ratio should measure the Frobenius norm of the actual parameter-space direction that will be applied. This yields OrScale for general matrix layers and OrScale-LM for language models, where Moonlight shape scaling is comb...
410 KL for a KL: On-Policy Distillation with Control Variate Baseline
2605.07865
Stable On-Policy Distillation: Uses a control variate baseline to reduce OPD gradient variance and improve post-training stability.
cs.CL cs.LG cs.AI
Minjae Oh, Sangjun Song, Gyubin Choi, Yunho Choi, Yohan Jo
On-Policy Distillation (OPD) has emerged as a dominant post-training paradigm for large language models, especially for reasoning domains. However, OPD remains unstable in practice due to the high gradient variance of its single-sample Monte Carlo estimator, and recipes for stable training are still immature. We propose vOPD (On-Policy Distillation with a control variate baseline), which casts OPD as policy-gradient RL and stabilizes it by introducing a control variate baseline, canonically a val...
411 Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
2605.07924
Few-step Discrete Flow Distillation: Uses energy-navigated distillation to improve discrete flow matching trajectories, enabling few-step generation.
cs.CL cs.LG cs.AI
Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade, Manuel R. Ciosici, Yizhe Zhang
Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explanation is insufficient capacity. We argue the opposite: the trajectory is the bottleneck, not the student. Each training trajectory is built through a chain of blind stochastic jumps with no evaluation ...
412 Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
2605.08012
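Causal Claims in Interpretability: Calls on mechanistic interpretability papers to state their causal identification assumptions explicitly to support causal claims.
cs.CL cs.LG cs.AI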
Zezheng Lin, Fengming Liu
Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions section and a recurring pattern: validation metrics such as faithfulness, completeness, monosemanticity, alignment, or ablation effects are reported as causal support without stating the assumptions th...
413 Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
2407.04183
LLMs and Wikipedia Neutrality: Evaluates the ability and biases of LLMs in detecting and correcting biased edits under Wikipedia's neutrality norms.
cs.CL cs.AI
Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak
Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predi...
414 UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
2410.21438
Unified Fine-tuning for Alignment: Uses a generalized implicit reward to unify SFT with alignment methods such as RLHF/DPO into single-stage training.
cs.CL cs.LG
Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang
By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Because SFT and alignment have different objectives and underlying processes, performance on certain tasks can decline. To address this, we seamlessly introduce Unified Fine-Tuning (UFT), which integrates SFT and alignment into a single training stage using the same objective and loss ...
415 Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression
2502.01941
KV Cache Compression for Reasoning: Proposes KVFundaBench to evaluate how KV compression affects the semantic integrity of high-density reasoning.
cs.CL cs.AI
Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li
While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (CoT) coherence is critical. We introduce KVFundaBench to systematically evaluate this gap, revealing a sharp dichotomy: while retrieval tasks remain robust, reasoning tasks exhibit severe Task-Dependent Degradation under aggressive compression due to disrupted ...
416 Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
2502.07143
Grounded Medical Dialogue with Uncertainty: Builds a patient-oriented medical dialogue method that is grounded in guidelines and explicitly manages uncertainty.
cs.CL
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, Junde Wu
The severe shortage of medical doctors limits access to timely and reliable healthcare, leaving millions underserved. Large language models (LLMs) offer a potential solution but struggle in real-world clinical interactions. Many LLMs are not grounded in authoritative medical guidelines and fail to transparently manage diagnostic uncertainty. Their language is often rigid and mechanical, lacking the human-like qualities essential for patient trust. To address these challenges, we propose Ask Pati...
417 S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
2503.05085
Speech-to-Speech Instruction Benchmark: Proposes S2S-Arena to evaluate how speech-to-speech models follow paralinguistic instructions such as prosody and emotion.
cs.CL cs.SD eess.AS
Feng Jiang, Zhiyu Lin, Yiyang Liu, Liumeng Xue, Fan Bu
Recent advances in large language models (LLMs) have fundamentally reshaped speech-to-speech (S2S) systems, enabling increasingly natural spoken interaction. However, existing benchmarks still rely heavily on text-based evaluation and largely ignore paralinguistic cues such as prosody, emotion, and speaker traits, which are central to expressive and human-like communication. We introduce S2S-Arena, a speech-native benchmark for evaluating instruction-following S2S models with explicit assessment...
418 FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations
2504.11837
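FSM-guided Emotional Support Chat: Uses a finite state machine to constrain LLM dialogue flow, improving the long-term effectiveness of emotional support conversations.
cs.CL cs.AI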
Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang
Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have made remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Finite State Machine (FSM) on LLMs, and propose a framework called FiSMiness. Our framework allow...
419 Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
2505.22842
Probabilistic Positional Encoding Theory: Proposes BAM, which treats positional encoding as a prior to unify existing methods and explain long-context extrapolation.
cs.CL, cs.LG
Arthur S. Bianchessi, Yasmin C. Aguirre, Rodrigo C. Barros, Lucas S. Kupssinskü
Transformer-based language models rely on positional encoding (PE) to handle token order and support context length extrapolation. However, existing PE methods lack theoretical clarity and rely on limited evaluation metrics to substantiate their extrapolation claims. We propose the Bayesian Attention Mechanism (BAM), a theoretical framework that formulates positional encoding as a prior within a probabilistic model. BAM unifies existing methods (e.g., NoPE and ALiBi) and motivates a new Generali...
420 Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks
2506.13351
Constrained RL for Unverifiable Tasks: Trains LLMs on unverifiable tasks with a token-level reflection reward and rubric-gating constraints.
cs.CL, cs.LG, cs.AI
Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Swati Sharma
Reinforcement learning (RL) training of large language models (LLMs) on unverifiable tasks is challenging even when a reasonable-quality reference answer is available. We propose a constrained RL training framework that (i) optimizes a token-level dense Reasoning Reflection Reward (R3) aligned with reasoning quality, and (ii) enforces rubric-gating as feasibility constraints at the rollout group level. R3 measures the model's token-level certainty of a reference answer under its chain-of-thought...
421 VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
2506.21582
Interactive LLM Text Analytics: Proposes VIDEE, which uses intelligent agents to visually decompose and execute text analytics pipelines.
cs.CL, cs.AI
Sam Yu-Te Lee, Chenyang Ji, Shicheng Wen, Lifu Huang, Dongyu Liu
Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, information extraction, etc.). We introduce VIDEE, a system that supports entry-level data analysts to conduct advanced text analytics with intelligent a...
422 Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models
2508.05803
Fleeting Memory in Transformers: Introduces human-like fleeting memory, which improves language learning but impairs reading time prediction.
cs.CL
Abishek Thamma, Micha Heilbron
Human memory is fleeting. As words are processed, the exact wordforms that make up incoming sentences are rapidly lost. Cognitive scientists have long believed that this limitation of memory may, paradoxically, help in learning language - an idea supported by classic connectionist modelling work. The rise of Transformers appears to challenge this idea, as these models can learn language effectively, despite lacking memory limitations or other architectural recency biases. Here, we investigate th...
423 Training-Free Multimodal Large Language Model Orchestration
2508.10016
Training-Free Multimodal Orchestration: Orchestrates an LLM with modality experts, without training, to support multimodal input and output.
cs.CL
Tianyu Xie, Yuexiao Ma, Yuhang Wu, Wang Chen, Jiayi Ji
Building interactive omni-modal assistants often relies on end-to-end multimodal alignment to fuse heterogeneous modalities, which incurs substantial data and compute costs and limits extensibility. We present Training-Free Large Language Model Orchestration (LLM Orchestration), a training-free orchestration framework that integrates off-the-shelf modality experts into a unified multimodal input-output system without additional gradient-based training for integration. LLM Orchestration comprise...
424 User eXperience Perception Insights Dataset (UXPID): Synthetic User Feedback from Public Industrial Forums
2509.11777
Synthetic UX Feedback Dataset: Builds the UXPID dataset of synthesized and annotated user experience feedback from industrial forums.
cs.CL, cs.LG
Mikhail Kulyabin, Jan Joosten, Choro Ulan uulu, Nuno Miguel Martins Pacheco, Fabian Ries
Customer feedback in industrial forums offers rich but underexplored insights into real-world product experience. Yet systematic analysis remains challenging due to unstructured, domain-specific content and the scarcity of high-quality labeled datasets. This paper presents the User eXperience Perception Insights Dataset (UXPID), a collection of 7130 synthesized and anonymized user feedback branches extracted from a public industrial automation forum. Each JSON record contains multi-post comments...
425 OLaPh: Optimal Language Phonemizer
2509.20086
Hybrid Multilingual Phonemizer: Proposes OLaPh, which combines multilingual lexica with subword segmentation to improve phonemization.
cs.CL
Johannes Wirth
Phonemization is a critical component in text-to-speech synthesis. Traditional approaches rely on deterministic transformations and lexica, while neural methods offer potential for higher generalization on out-of-vocabulary (OOV) terms. This work introduces OLaPh (Optimal Language Phonemizer), a hybrid framework that integrates extensive multilingual lexica with advanced NLP techniques and a statistical subword segmentation function. Evaluations on the WikiPron benchmark show that the OLaPh fram...
426 ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards
2510.00568
RL Self-Correcting Search Agents: Proposes ReSeek, which trains self-correcting search agents with instructive rewards.
cs.CL
Shiyu Li, Yang Tang, Yifan Wang, Peiming Li, Xi Chen
Search agents powered by Large Language Models (LLMs) have demonstrated significant potential in tackling knowledge-intensive tasks. Reinforcement learning (RL) has emerged as a powerful paradigm for training these agents to perform complex, multi-step reasoning. However, prior RL-based methods often rely on sparse or rule-based rewards, which can lead agents to commit to suboptimal or erroneous reasoning paths without the ability to recover. To address these limitations, we propose ReSeek, a no...
427 How Do Language Models Compose Functions?
2510.01685
Function Composition in LLMs: Analyzes whether LLMs compute g(f(x)) through compositional mechanisms in two-hop factual recall.
cs.CL, cs.AI
Apoorv Khandelwal, Ellie Pavlick
While large language models (LLMs) appear to be increasingly capable of solving compositional tasks, it is an open question whether they do so using compositional mechanisms. In this work, we investigate how feedforward LLMs solve two-hop factual recall tasks, which can be expressed compositionally as $g(f(x))$. We first confirm that modern LLMs continue to suffer from the "compositionality gap", i.e. their ability to compute both $z = f(x)$ and $y = g(z)$ does not entail their ability to comput...
428 Detecting Distillation Data from Reasoning Models
2510.04850
Distillation Data Contamination Detection: Defines and studies the task of detecting whether a question appears in reasoning distillation data.
cs.CL, cs.AI
Hengxiang Zhang, Hyeong Kyu Choi, Sharon Li, Hongxin Wei
Reasoning distillation has emerged as a prevailing paradigm for transferring reasoning capabilities from large reasoning models to small language models. Yet, reasoning distillation risks data contamination: benchmark data may inadvertently be included in the distillation data, thereby inflating model performance metrics. In this work, we formally define the distillation data detection task, which determines whether a given question is included in the model's distillation data. The unique challe...
429 Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation
2510.07926
Comprehensiveness Evaluation Metrics: Proposes metrics that assess the completeness of factual recall in generated text and detect omitted information.
cs.CL
Adam Dejl, James Barry, Alessandra Pascale, Javier Carnerero Cano
Despite demonstrating remarkable performance across a wide range of tasks, large language models (LLMs) have also been found to frequently produce outputs that are incomplete or selectively omit key information. In sensitive domains, such omissions can result in significant harm comparable to that posed by factual inaccuracies, including hallucinations. In this study, we address the challenge of evaluating the comprehensiveness of LLM-generated texts, focusing on the detection of missing informa...
430 EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle
2510.16079
Self-Evolving LLM Agents: Proposes EvolveR, which lets agents continually self-improve through a closed-loop experience lifecycle.
cs.CL, cs.AI
Rong Wu, Xiaoman Wang, Jianbiao Mei, Pinlong Cai, Daocheng Fu
Current Large Language Model (LLM) agents show strong performance in tool use, but lack the crucial capability to systematically learn from their own experiences. While existing frameworks mainly focus on mitigating external knowledge gaps, they fail to address a more fundamental limitation: the inability to iteratively refine problem-solving strategies. In this work, we introduce EvolveR, a framework designed to enable agent to self-improve through a complete, closed-loop experience lifecycle. ...
431 DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
2510.19669
Difficulty-Adaptive Reasoning Inference: Proposes DiffAdapt, which adapts reasoning length to problem difficulty to save tokens.
cs.CL
Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi
Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. First, we analyze the entropy of token probabilities in reasoning traces. Across three models, we observe a consistent U-shaped entropy pattern: high entropy on easy problems despite high accuracy, low entropy on problems with medium ...
432 MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
2511.02805
Memory-Managed Search Agents via RL: Proposes MemSearcher, trained with end-to-end reinforcement learning to search and manage a compact memory.
cs.CL, cs.AI
Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu
LLM-based search agents often concatenate the full interaction history into the context, producing long and noisy inputs, and increasing compute cost and GPU memory overhead. To address this issue, we propose MemSearcher, an agent framework that maintains a compact memory during multi-turn interactions, retaining only question-relevant information and thereby keeping the context length stable across turns. Training MemSearcher is challenging because each trajectory spans multiple turns under dif...
433 Rep2Text: Decoding Full Text from a Single LLM Token Representation
2511.06571
Text Reconstruction from Token Representation: Proposes Rep2Text, which decodes the original input text from a single last-token representation.
cs.CL, cs.LG, cs.AI
Haiyan Zhao, Zirui He, Yiming Tang, Fan Yang, Ali Payani
Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the ...
434 Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization
2512.23032
Chain-of-Thought Faithfulness: Argues that CoT can remain faithful and explainable without explicitly verbalizing prompt hints.
cs.CL, cs.LG, cs.AI
Kerem Zaman, Shashank Srivastava
Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric adopts a narrow notion of faithfulness and confuses unfaithfulness with incompleteness, the lossy compression needed to turn distributed transformer computation into a linear natural language narrative. On multi-hop reasoning tasks with instruct-tuned and reasoning models, many CoTs flagged as unfaithful by Biasing Features are judged fa...
435 Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala
2601.14958
Sinhala Script Sensitivity Benchmark: Benchmarks language models on Unicode, Romanized, and mixed-script Sinhala text.
cs.CL, cs.AI
Minuri Rajapakse, Ruvan Weerasinghe
The performance of Language Models (LMs) on low-resource, morphologically rich languages like Sinhala remains largely unexplored, particularly regarding script variation in digital communication. Sinhala exhibits script duality, with Unicode used in formal contexts and Romanized text dominating social media, while mixed-script usage is common in practice. This paper benchmarks 24 open-source LMs on Unicode, Romanized and mixed-script Sinhala using perplexity evaluation across diverse text source...
436 Beyond Factual Accuracy: Evaluating Global Reasoning Integrity in RAG Systems with LogicScore
2601.15050
RAG Logical Integrity Evaluation: Proposes LogicScore to evaluate the global logical consistency of long-form RAG answers rather than factual accuracy alone.
cs.CL
Zhichao Yan, Yunxiao Zhao, Jiapu Wang, Jiaoyan Chen, Xiaoli Li
Current evaluation methods for Retrieval Augmented Generation (RAG) suffer from factual myopia: they relentlessly emphasize factual accuracy yet neglect global logical integrity in long-form answer generation. This drives models to force unnatural connections, producing factually grounded yet logically incoherent responses with unaddressed gaps, ambiguous links, or redundant premises. To mitigate this, we present LogicScore, shifting from local, fact-by-fact assessment to rigor...
437 Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents
2601.21699
Resource-Constrained Multi-Hop Agents: Studies the training and exploration efficiency of multi-hop reasoning agents under resource constraints.
cs.CL
Hojae Han, Heeyun Jung, Jongyoon Kim, Seung-won Hwang
Multi-turn reasoning agents solve complex questions by decomposing them into intermediate retrieval or tool-use steps, for accumulating supporting evidence across turns. Meanwhile, with reinforcement learning (RL), training these agents rely on many on-policy rollouts and large training batches. Under realistic resource constraints that make dense exploration infeasible, each RL batch contains only few useful reasoning paths from the current policy. Existing approaches do not fully address this ...
438 WorldCup Sampling for Multi-bit LLM Watermarking
2602.01752
Multi-bit LLM Watermarking: Proposes WorldCup sampling to improve the robustness and text quality of multi-bit watermarking.
cs.CL
Yidan Wang, Yubing Ren, Yanan Cao, Li Guo
As large language models (LLMs) generate increasingly human-like text, watermarking has emerged as a promising solution for reliable attribution beyond mere detection. While multi-bit watermarking enables richer provenance encoding, existing approaches typically extend zero-bit watermarking schemes by introducing static logit perturbations and counting-based decoding strategies, which can degrade text quality and compromise decoding robustness as the payload increases. In this paper, we propose ...
439 A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method
2602.02320
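Molecule Structure-Language Dataset: Uses rule-based regularization to automatically generate a large-scale dataset aligning molecular structures with language descriptions.
cs.CL, cs.AI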
Feiyang Cai, Guijuan He, Yi Hu, Jingjing Wang, Joshua Luo
Molecular function is largely determined by structure. Accurately aligning molecular structure with natural language is therefore essential for enabling large language models (LLMs) to reason about downstream chemical tasks. However, the substantial cost of human annotation makes it infeasible to construct large-scale, high-quality datasets of structure-grounded descriptions. In this work, we propose a fully automated annotation framework for generating precise molecular descriptions that preser...
440 Rethinking Weight Tying: Pseudo-Inverse Tying for LM Stable Training and Updates
2602.04556
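Pseudo-Inverse Weight Tying: Proposes pseudo-inverse weight tying to stabilize training and keep the vocabulary encoding and decoding interface consistent.
cs.CL, cs.LG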
Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, parameter sharing alone does not guarantee a stable token interface: during training, the correspondence between encoding tokens into hidden states and decoding hidden states into logits can drift, worsening optimization sensitivity and weakening explainability probes that rely on a meaningful vocabulary-space decoder. We propose P...
441 Retrieval Heads are Dynamic
2602.11162
Dynamic Retrieval Heads Analysis: Analyzes the dynamics and mechanisms of LLM retrieval heads from the perspective of generation timesteps.
cs.CL
Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui
Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claim...
442 Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
2602.16571
PII De-Identification for Math Tutoring: Studies utility-preserving de-identification under numeric ambiguity in math tutoring dialogues.
cs.CL
Zhuqian Zhou, Kirk Vanacore, Bakhtawar Ahtisham, Jinsook Lee, Doug Pietrzak
Large-scale sharing of dialogue data is key to advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce data utility. This work asks how to detect PII while preserving educational utility, focusing on t...
443 Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
2602.19612
Benchmarking Machine Unlearning: Proposes the DUAL benchmark to analyze how fact salience and fine-tuning stage affect unlearning.
cs.CL
Borisiuk Anna, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or supervised fine-tuning (SFT). In this paper, we introduce DUAL (Dual Unlearning Evaluation across Training Stages), a benchmark of 28.6k Wikidata-derived triplets annotated with fact popularity using Wikipedia link counts and LLM-based salience...
444 Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
2602.20816
Tail-Aware Distillation Divergence: Proposes a distillation loss that decouples top-K and tail probabilities to strengthen the student's learning signal.
cs.CL, cs.LG
Sayantan Dasgupta, Trevor Cohn, Timothy Baldwin
The core learning signal used in language model distillation is the standard Kullback-Leibler (KL) divergence between the student and teacher distributions. Traditional KL divergence tends to be dominated by the next tokens with the highest probabilities, i.e., the teacher's modes, thereby diminishing the influence of less probable yet potentially informative components of the output distribution. We propose a new tail-aware divergence that decouples the contribution of the teacher model's top-K...
445 Optimizing Language Models for Crosslingual Knowledge Consistency
2603.04678
Crosslingual Knowledge Consistency RL: Optimizes models via reinforcement learning with a structured reward to obtain crosslingually consistent knowledge answers.
cs.CL, cs.AI
Tianyu Liu, Jirui Qi, Mrinmaya Sachan, Ryan Cotterell, Raquel Fernández
Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optim...
446 A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
2603.07475
AR vs Diffusion LLM Representations: Compares autoregressive and diffusion LLMs in layer-wise representational capacity and token-level representations.
cs.CL, cs.LG
Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee
Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether diffusion objectives fundamentally reshape internal representations remains unclear. We perform the first layer- and token-wise representational analysis comparing native dLLMs (LLaDA), native AR models (Qwen2.5), and AR-initialized dLLMs (Dream-7B), using cosi...
447 NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating
2603.08256
Word Sense Plausibility Regression: Compares embedding-based regression, fine-tuning, and prompting for predicting word sense plausibility ratings.
cs.CL
Tong Wu, Thanet Markchom, Huizhi Liang
Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1-5 scale in the context of short narrative stories containing ambiguous homonyms. This paper systematically compares three approaches: (1) embedding-based methods pairing sentence embeddings with standard regressors, (2) transformer fine-tuning with parameter-efficient adaptation, and (3) large language model (LLM) prompting with structured reasoning and explicit decision rules. The be...
448 FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting
2603.19254
Financial Research Reporting Benchmark: Proposes FinReasoning, a hierarchical benchmark that evaluates reliable reasoning and consistency in financial research reports.
cs.CL
Yiyun Zhu, Yidong Jiang, Ziwen Xu, Yinsheng Yao, Dawei Cheng
Large language models (LLMs) are increasingly deployed in financial research workflows, where their role is evolving from single-model assistance for human analysts toward autonomous collaboration among multiple agents. Yet real-world deployments still expose factual errors, numerical inconsistencies, and shallow analysis, which can distort assessments of corporate fundamentals and trigger severe economic losses. While existing benchmarks have begun to evaluate such failures, they score all aspe...
449 Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
2604.03147
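Valence-Arousal Emotion Subspace: Finds that LLM emotion vectors form a two-dimensional, circular valence-arousal geometry that enables controllable generation.
cs.CL, cs.AI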
Lihao Sun, Lewen Yan, Xiaoya Lu, Andrew Lee, Jie Zhang
We show that emotion vectors in LLMs are organized by a two-dimensional valence-arousal (VA) subspace exhibiting circular geometry. Through principal component decomposition and ridge regression, we recover meaningful VA axes underlying emotion steering vectors whose projections correlate with human affect ratings across 44,728 words. Steering along these axes produces monotonic control over the affective properties of generated text, and further affords bidirectional control over multiple downs...
450 NCL-BU at SemEval-2026 Task 3: Fine-tuning XLM-RoBERTa for Multilingual Dimensional Sentiment Regression
2604.08923
Multilingual Valence-Arousal Regression: Fine-tunes XLM-R for multilingual aspect-level continuous regression of valence and arousal.
cs.CL
Tong Wu, Nicolay Rusnachenko, Huizhi Liang
Dimensional Aspect-Based Sentiment Analysis (DimABSA) extends traditional ABSA from categorical polarity labels to continuous valence-arousal (VA) regression. This paper describes a system developed for Track A, Subtask 1 (Dimensional Aspect Sentiment Regression), aiming to predict real-valued VA scores in the [1, 9] range for each given aspect in a text. A fine-tuning approach based on XLM-RoBERTa-base is adopted, with dual regression heads with sigmoid-scaled outputs for valence and arousal pr...
451 BITS Pilani at SemEval-2026 Task 9: Structured Supervised Fine-Tuning with DPO Refinement for Polarization Detection
2604.11121
Polarization Detection Fine-tuning: Combines supervised fine-tuning with DPO to improve multilingual polarization detection.
cs.CL
Atharva Gupta, Dhruv Kumar, Yash Sinha
The POLAR SemEval-2026 Shared Task aims to detect online polarization and focuses on the classification and identification of multilingual, multicultural, and multi-event polarization. Accurate computational detection of online polarization is challenging due to nuanced rhetoric, implicit framing, and the high cost of human-in-the-loop annotation. Building on recent findings that contextual prompting enables large language models to function as strong polarization detectors, we present a two-sta...
452 Prune, Interpret, Evaluate: A Cross-Layer Transcoder-Native Framework for Efficient Circuit Discovery via Feature Attribution
2604.16889
Efficient Circuit Interpretability: prunes first, then interprets via attribution, to efficiently discover model circuit features.
cs.CL
Qinhao Chen, Linyang He, Nima Mesgarani
Existing feature-interpretation pipelines typically operate on uniformly sampled units or exhaustive feature sets, incurring massive costs on units irrelevant to target behaviors. To address this, we introduce the first CLT-native end-to-end pruning framework, PIE, which pioneers the paradigm of pruning first and interpreting later. PIE connects Pruning, automatic Interpretation, and interpretation Evaluation, establishing a comprehensive benchmarking environment to systematically measure behavi...
453 JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems
2604.23478
LLM Judge Prompt Sensitivity: builds JudgeSense to evaluate the stability of LLM judges under paraphrased prompts.
cs.CL
Rohith Reddy Bellibatlu, Edward Raff, Wenbin Zhang
Large language models are widely adopted as automated evaluation judges, yet the stability of their verdicts under semantically equivalent prompt rephrasings remains largely unexamined. We conduct a systematic empirical study of prompt-induced decision instability across multiple evaluation tasks and judge architectures. To facilitate this analysis, we release JudgeSense, a benchmark comprising hand-validated prompt-paraphrase pairs spanning factuality, coherence, relevance, and preference, draw...
454 TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment
2604.23938
Target Safety Assessment Agents: proposes a human-in-the-loop multi-agent framework to assist in drafting drug target safety assessment reports.
cs.CL
Xiaochen Zheng, Zhiwen Jiang, Melanie Guerard, Klas Hatje, Tatyana Doktorova
Target Safety Assessment (TSA) requires systematic integration of heterogeneous evidence, including genetic, transcriptomic, target homology, pharmacological, and clinical data, to evaluate potential safety liabilities of therapeutic targets. This process is inherently iterative and expert-driven, posing challenges in scalability and reproducibility. We present TSAssistant, a multi-agent framework designed to support TSA report drafting through a modular, section-based, and human-in-the-loop par...
455 SeaEvo: Advancing Algorithm Discovery with Strategy Space Evolution
2604.24372
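Evolutionary Algorithm Discovery: uses strategy-space evolution to organize language reasoning into persistent population-level state for algorithm discovery.
cs.CL cs.AI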
Sichun Luo, Yi Huang, Haochen Luo, Fengyuan Liu, Guanzhi Deng
Large Language Model (LLM)-guided evolutionary search is increasingly used for automated algorithm discovery, yet most current methods track search progress primarily through executable programs and scalar fitness. Even when natural-language reasoning is used through heuristic descriptions or reflection, it typically remains transient mutation context or unstructured memory, rather than organized as persistent population-level state over strategic directions. As a result, evolutionary search can...
456 Structural Generalization on SLOG without Hand-Written Rules
2604.26157
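Structural Generalization Semantic Parsing: uses a neural cellular automaton with a discrete bottleneck to learn compositional rules from data and achieve structural generalization.
cs.CL cs.AI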
Zichao Wei
Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Transformer-based models). We present an alternative requiring no hand-written compositional rules, based on a neural cellular automaton (NCA) with a discrete bottleneck: all compositional rules are learned from data through local iteration. On the SLOG ...
457 Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness
2605.01006
LLM News Debiasing Study: experimentally tests how LLM-rewritten news headlines affect cross-partisan receptivity, and the associated biases.
cs.CL
Faisal Feroz, Jonas R. Kunst
Partisan news media erode cross-partisan trust, but large language models (LLMs) offer a potential means of debiasing such content at scale. Across two pre-registered experiments, we tested whether LLM-generated debiasing of liberal news headlines could improve conservative readers' trust-relevant judgments. Study 1 found that subtle lexical debiasing (replacing emotive words with more moderate synonyms) had no effect on any outcome. Study 2 found that a more substantive reframing intervention s...
458 OralMLLM-Bench: Evaluating Cognitive Capabilities of Multimodal Large Language Models in Dental Practice
2605.01333
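Dental Multimodal LLM Benchmark: releases a dental imaging benchmark to evaluate the radiographic diagnostic cognitive capabilities of multimodal large models.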
cs.CL
Rongyang Wang, Shuang Zhou, Jiashuo Wang, Wenya Xie, Xiaoxia Che
Multimodal large language models (MLLMs) have emerged as a promising paradigm for dental image analysis. However, their ability to capture the multi-level cognitive processes required for radiographic analysis remains unclear. Here, we present a comprehensive benchmark to evaluate the cognitive capabilities of MLLMs in dental radiographic analysis. It spans three critical imaging modalities, i.e., periapical, panoramic, and lateral cephalometric radiographs, and defines four cognitive categories...
459 TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis
2605.01717
Conversational Sentiment Quadruple: proposes thread-constrained discourse modeling to improve conversational sentiment quadruple extraction.
cs.CL cs.AI
Xinran Li, Xinze Che, Yifan Lyu, Zhiqi Huang, Xiujuan Xu
Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) needs to capture the complex interrelationships in multiple rounds of dialogues. Existing methods usually employ simple Graph Convolutional Networks (GCN), which introduce structural noise and fail to consider the temporal sequence of the dialogues, or use standard RoPE, which implicitly captures relative distances in a flat sequence but cannot clearly separate the token-level syntactic order from the utterance-level progression, ...
460 Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning
2605.02073
Reward Optimization for RL Reasoning: uses search-driven reinforcement learning to optimize reward functions and strengthen LLM mathematical reasoning.
cs.CL
Arash Ahmadi, Sarah Sharif, Yaser (Mike) Banad
Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduces a search-driven framework that treats the reward specification itself as an object of optimization. The setting of interest is one in which the base model is held fixed and the ...
461 Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
2605.04913
Local Learning Post-training: proposes a cheaper and faster local-learning recipe that reduces LLM post-training cost.
cs.CL cs.LG
Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang
LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we prop...
462 Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM
2605.05927
Speech LLM Modality Gap: introduces prosody-aware representations on the input side to narrow the gap between speech and text LLMs.
cs.CL cs.SD eess.AS
Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King
Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech generation more text-like, but the gap remains. We argue that the key remaining bottleneck lies on the input side. We propose TextPro-SLM, an SLM that makes spoken input more closely resemble that of a prosody-aware text LLM. TextPro-SLM combines Whi...
463 SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following
2605.06353
Long-horizon Constraint Following Benchmark: proposes SEQUOR to evaluate constraint following and consistency in long multi-turn conversations.
cs.CL
Beatriz Canaverde, Duarte M. Alves, José Pombal, Giuseppe Attanasio, André F. T. Martins
In a conversation, a helpful assistant must reliably follow user directives, even as they refine, modify, or contradict earlier requests. Yet most instruction-following benchmarks focus on single-turn or short multi-turn scenarios, leaving open how well models handle long-horizon instruction-following tasks. To bridge this gap, we present SEQUOR, an automatic benchmark for evaluating constraint adherence in long multi-turn conversations. SEQUOR consists of simulated persona-driven interactions b...
464 Statistical Patterns in the Equations of Physics and the Emergence of a Meta-Law of Nature
2408.11065
Statistical Physics Equation Patterns: analyzes statistical structural regularities in corpora of physics equations to explore a meta-law.
cs.CL
Andrei Constantin, Deaglan Bartlett, Harry Desmond, Pedro G. Ferreira
Physics seeks to uncover the laws of Nature and express them through mathematical equations. Despite the vast diversity of natural phenomena, physical equations exhibit structural regularities that set them apart from arbitrary mathematical expressions. While principles such as dimensional analysis have long guided the formulation of physical models, the exploration of more subtle statistical patterns within the equations of physics remains an open question. Here, by analysing four corpora of ph...
465 UNA: A Unified Supervised Framework for Efficient LLM Alignment Across Feedback Types
2408.15339
Unified LLM Alignment Supervision: unifies multiple feedback signals, such as preferences and scores, for efficient alignment training.
cs.CL cs.LG
Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu
RL alignment methods, including RLHF and DPO, are primarily based on pairwise preference data. Although scalar or score-based feedback has been collected in some settings, it is rarely used directly, and preference magnitude information is typically ignored. Furthermore, current alignment frameworks offer limited capability for unifying heterogeneous supervision signals, making it difficult to jointly leverage diverse data types within a single training paradigm. This limitation constrains the r...
466 From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
2506.04565
Compound AI Systems Survey: surveys compound AI systems that integrate LLMs with retrieval, tools, and agent orchestration.
cs.CL
Jiayi Chen, Junyi Ye, Guiling Wang
Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry...
467 MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
2507.21183
Preference Optimization with Priors: proposes MaPPO, which incorporates prior reward knowledge into the preference optimization objective to align models.
cs.CL cs.LG cs.AI
Guangchen Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang
As the era of large language models (LLMs) unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimization (MaPPO), a methodology for learning from preferences that explicitly incorporates prior reward knowledge into the optimization objective. Building on the paradigm employed by Direct Preference Optimization (DPO) and its variants of treating preference learn...
468 Searching for Privacy Risks in LLM Agents via Simulation
2508.10880
Privacy Attacks in LLM Agents: uses simulation-based search that alternately improves attack and defense strategies to uncover agent privacy risks.
cs.CL cs.AI
Yanzhe Zhang, Diyi Yang
The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such dynamic dialogues makes it challenging to anticipate emerging vulnerabilities and design effective defenses. To tackle this problem, we present a search-based framework that alternates between improving attack and defense strategies through the simulation of...
469 A Multi-Memory Segment System for Generating High-Quality Long-Term Memory Content in Agents
2508.15294
Agent Long-term Memory Generation: proposes a multi-memory segment system that generates higher-quality long-term memory content for agents.
cs.CL cs.AI
Gaoke Zhang, Bo Wang, Yunlong Ma, Dongming Zhao, Zifei Yu
In the current field of agent memory, extensive explorations have been conducted in the area of memory retrieval, yet few studies have focused on exploring the memory content. Most research simply stores summarized versions of historical dialogues, as exemplified by methods like A-MEM and MemoryBank. However, when humans form long-term memories, the process involves multi-dimensional and multi-component generation, rather than merely creating simple summaries. The low-quality memory content gene...
470 Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
2509.03736
Behavioral Coherence of LLM Agents: uses latent profiles to assess the behavioral consistency of LLM agents in social simulation.
cs.CL cs.LG cs.AI
James Mooney, Josef Woldense, Zheng Robert Jia, Shirley Anugrah Hayati, My Ha Nguyen
The impressive capabilities of Large Language Models (LLMs) raise the possibility that synthetic agents can serve as substitutes for real participants in human-subject research. To evaluate this claim, prior research has largely focused on whether LLM-generated survey responses align with those produced by human respondents whom the LLMs are prompted to represent. In contrast, we address a more fundamental question: Do agents maintain empirical consistency; aligning to human behavioral models wh...
471 SpikingBrain: Spiking Brain-inspired Large Models
2509.05276
Brain-inspired Spiking Large Models: proposes spiking, brain-inspired large models for efficient long-context training and inference.
cs.CL cs.LG cs.AI
Yuqi Pan, Yupeng Feng, Jinghao Zhuang, Siyu Ding, Han Xu
Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU c...
472 Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization
2510.00436
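Automated Evaluation of Medical QA: studies whether automated metrics can distinguish good from bad AI responses to hospitalization questions and align with experts.
cs.CL cs.AI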
Sarvesh Soni, Dina Demner-Fushman
Automated approaches to answer patient-posed health questions are rising, but selecting among systems requires reliable evaluation. The current gold standard for evaluating the free-text artificial intelligence (AI) responses--human expert review--is labor-intensive and slow, limiting scalability. Automated metrics are promising yet variably aligned with human judgments and often context-dependent. To address the feasibility of automating the evaluation of AI responses to hospitalization-related...
473 InvThink: Premortem Reasoning for Safer Language Models
2510.01569
Premortem Safety Reasoning: improves language model safety through a three-step process of anticipating failures and generating under constraints.
cs.CL cs.AI
Yubin Kim, Taehan Kim, Eugene Park, Chunjong Park, Cynthia Breazeal
We present InvThink, a training and prompting framework that requires the model to enumerate, analyze, and constrain potential failures before generating its final response. Unlike existing safety alignment methods that optimize only for safe final responses, InvThink structures generation into three steps: (1) enumerate potential harms, (2) analyze their consequences, (3) generate the response under explicit mitigation constraints. We observe three findings: (i) InvThink shows higher safety sco...
474 Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models
2601.04731
Data-efficient RL for Reasoning: uses policy uncertainty as an intrinsic reward to improve the efficiency of critic-free RL for large reasoning models.
cs.CL cs.AI
Shuyang Jiang, Yuhao Wang, Ya Zhang, Yanfeng Wang, Yu Wang
Current critic-free RL methods for large reasoning models suffer from severe inefficiency when training on positive homogeneous prompts (where all rollouts are correct), resulting in waste of rollouts due to zero advantage estimates. We introduce a radically simple yet powerful solution to Mine intrinsic mastery (Miner), that repurposes the policy's intrinsic uncertainty as a self-supervised reward signal, with no external supervision, auxiliary models, or additional infe...
475 Neural Neural Scaling Laws
2601.19831
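Downstream Task Scaling Laws: proposes a new method to characterize the diverse scaling behaviors of downstream tasks rather than relying on validation loss alone.
cs.CL cs.LG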
Michael Y. Hu, Jane Pan, Ayush Rajesh Jhaveri, Nicholas Lourie, Kyunghyun Cho
Neural scaling laws predict how language model performance improves with increased training inputs. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks exhibit diverse scaling behaviors: some improve monotonically, others plateau, and some even degrade with scale. We argue that predicting downstream performance from validation loss suffers from two limitations: averaging token-level losses obscures signal, and no simple parametric family c...
476 Sign-Based Optimizers Are Effective Under Heavy-Tailed Noise
2602.07425
Sign-based Optimization Theory: explains, through heavy-tailed gradient noise theory, why sign-based optimizers outperform AdamW.
cs.CL cs.LG
Dingzhi Yu, Hongyi Tao, Yuanyu Wan, Luo Luo, Lijun Zhang
While adaptive gradient methods are the workhorse of modern machine learning, sign-based optimization algorithms such as Lion and Muon have recently demonstrated superior empirical performance over AdamW in training large language models (LLM). However, a theoretical understanding of why sign-based updates outperform variance-adapted methods remains elusive. In this paper, we aim to bridge the gap between theory and practice through the lens of heavy-tailed gradient noise, a phenomenon frequentl...
477 Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective
2602.09782
Entropy Control in RLVR: proposes an entropy control method from a gradient-preserving perspective to mitigate entropy collapse in RLVR training.
cs.CL cs.LG cs.AI
Kun Chen, Peng Shi, Fanfan Liu, Haibo Qiu, Zhixiong Zeng
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical method for enhancing the reasoning capabilities of Large Language Models (LLMs). However, continuous training often leads to policy entropy collapse, characterized by a rapid decay in entropy that results in premature overconfidence, reduced output diversity, and vanishing gradient norms that inhibit learning. Gradient-Preserving Clipping is a primary factor influencing these dynamics, but existing mitigation strateg...
478 Overview of the TREC 2025 RAGTIME Track
2602.10024
TREC RAGTIME Track Overview: summarizes the TREC RAGTIME multilingual report generation and retrieval tasks and participant results.
cs.CL
Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Eugene Yang
The principal goal of the RAG TREC Instrument for Multilingual Evaluation (RAGTIME) track at TREC is to study report generation from multilingual source documents. The track has created a document collection containing Arabic, Chinese, English, and Russian news stories. RAGTIME includes three task types: Multilingual Report Generation, English Report Generation, and Multilingual Information Retrieval (MLIR). A total of 125 runs were submitted by 13 participating teams (and as baselines by the tr...
479 A Geometric Taxonomy of Hallucinations in LLMs
2602.13224
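Hallucination Taxonomy and Detection: proposes a geometric taxonomy of hallucinations and improves detection in the black-box, single-pass question-answer setting.
cs.CL cs.AI
Javier Marín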
Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are not generally available behind a third-party API. Within this setting - black-box, single-pass, only question/answer available - the dominant baseline...
480 ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation
2602.15189
Schema-constrained LLM Dataset: releases ScrapeGraphAI-100k, a dataset of real web-page extractions for JSON schema-constrained generation.
cs.CL cs.AI
William Brach, Francesco Zuppichini, Marco Vinciguerra, Lorenzo Padoan
Producing output that conforms to a specified JSON schema underlies tool use, structured extraction, and knowledge base construction in modern large language models. Despite this centrality, public datasets for the task remain small, synthetic, or text-only, and rarely pair real page content with the prompts and schemas used in practice. We introduce ScrapeGraphAI-100k, 93,695 schema-constrained extraction events collected via opt-in ScrapeGraphAI telemetry in Q2--Q3 2025, deduplicated and balan...
481 Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features
2603.03096
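Speaker Analysis of SSL Speech Features: uses PCA to analyze speaker information in the individual dimensions of self-supervised speech features.
cs.CL eess.AS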
Kyle Janse van Rensburg, Benjamin van Niekerk, Herman Kamper
How do speech models trained through self-supervised learning structure their representations? Previous studies have looked at how information is encoded in feature vectors across different layers. But few studies have considered whether speech characteristics are captured within individual dimensions of SSL features. In this paper we specifically look at speaker information using PCA on utterance-averaged representations. For a range of SSL models, we find that the principal dimension that expl...
482 Sparser, Faster, Lighter Transformer Language Models
2603.23198
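Sparse Transformer Inference Acceleration: uses unstructured sparsity and CUDA kernels to reduce the compute and memory of LLM feedforward layers.
cs.CL cs.LG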
Edoardo Cetin, Stefano Peluchetti, Emilio Castillo, Akira Naruse, Mana Murakami
Scaling autoregressive large language models (LLMs) has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the components accounting for most of the model parameters and execution FLOPs. To achieve this, we introduce a new sparse packing format and a set of CUDA kernels designed to seamlessly integrate with the optimized execution pipelines of modern GPUs, enabling effi...
483 SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
2603.24755
Iterative Coding Agent Benchmark: proposes SlopCodeBench to evaluate how coding agents degrade over long-horizon iterative extensions.
cs.CL cs.AI
Gabriel Orlanski, Devjeet Roy, Alexander Yun, Changho Shin, Alex Gu
Software development is iterative, yet agentic coding benchmarks hide design issues through their single-shot setup. Recent iterative benchmarks attempt to remedy this but heavily constrain an agent's design decision space, making it impossible to faithfully measure how their decisions shape future extensions. We introduce SlopCodeBench, a benchmark of 36 problems and 196 checkpoints where agents repeatedly extend their own solutions. Unlike prior iterative benchmarks, our evolving specification...
484 OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search
2604.03675
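Search-agent co-training rewards: uses outcome-aligned search-evaluation co-training to ease sparse credit assignment for intermediate steps in RLVR.
cs.CL, cs.AI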
Erhan Zhang, Yiqun Chen, Zechun Niu, Wei Yang, Xiaochi Wei
Agentic search enables language models to solve knowledge-intensive tasks by adaptively acquiring external evidence over multiple steps. Reinforcement learning with verifiable rewards (RLVR) has emerged as a widely adopted training paradigm for search agents, yet outcome-only rewards are sparse and provide limited credit assignment for intermediate search actions. Existing process-reward methods therefore seek to densify supervision through proxy signals, external evaluators, or likelihood-based...
485 KV Cache Offloading for Context-Intensive Tasks
2604.08426
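KV-cache offloading evaluation: systematically evaluates how KV-cache offloading affects latency, memory, and accuracy on long-context tasks.
cs.CL, cs.LG, cs.AI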
Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov
With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approach to reduce memory footprint and inference latency while preserving accuracy. Prior evaluations have largely focused on tasks that do not require extracting large amounts of information from the context. In this work, we study KV-cache offloading on context-in...
486 ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
2605.00380
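Negative-sample residuals for reasoning RL: uses negative-sample projection residual reinforcement learning to improve reasoning while preserving generation diversity.
cs.CL, cs.LG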
Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang
Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes ne...
487 Multilingual Safety Alignment via Self-Distillation
2605.02971
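Multilingual safety alignment via self-distillation: uses cross-lingual self-distillation to transfer safeguards from high-resource to low-resource languages as a defense against jailbreaks.
cs.CL, cs.LG, cs.AI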
Ruiyang Qin, Qingzhuo Wang, Dongrui Liu, Qiang Li, Zhihua Wei
Large language models (LLMs) exhibit severe multilingual safety misalignment: they possess strong safeguards in high-resource languages but remain highly vulnerable to jailbreak attacks in low-resource languages. Current safety alignment methods generally rely on high-quality response data for each target language, which is expensive and difficult to generate. In this paper, we propose a cross-lingual safeguard transfer framework named Multilingual Self-Distillation (MSD). This framework transfe...
488 FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
2605.04651
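Forward-only closed-form fast-weight adaptation: proposes FAAST, which compiles labeled examples into fast weights in a single forward pass for rapid test-time adaptation.
cs.CL, cs.LG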
Guangsheng Bao, Hongbo Zhang, Han Cui, Ke Sun, Yanbin Zhao
Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification a...
489 Belief Memory: Agent Memory Under Partial Observability
2605.05583
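Belief memory under partial observability: stores uncertain observations as belief distributions to reduce self-reinforcing errors in agent memory.
cs.CL, cs.AI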
Junfeng Liao, Qizhou Wang, Jianing Zhu, Bo Du, Rui Yan
LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed" from temporary errors), even though such observations are inherently partial and potentially ambiguous. By committing to one conclusion and discarding uncertainty, these methods introduce self-reinforcing error: the agent acts on the stored conclusion, never revisits a...
490 Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
2605.05995
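Geometric bottlenecks against harmful fine-tuning: proposes Safety Anchor, which uses geometric bottlenecks to constrain optimization trajectories and resist harmful fine-tuning attacks.
cs.CL, cs.AI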
Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun, Hao Zhou
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on parameters, gradients, or internal representations, we observe that they can be effectively circumvented under persistent HFT. Our analysis traces this failure to the inherent redundancy of the high-dimensional parameter space: attackers exploit optimization trajectories that are orthogonal to defense constraints to restore harmful capabilities while...
491 On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows
2605.06110
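Online resource allocation for agentic workflows: optimizes online resource allocation for multi-model workflows under budget and deadline constraints.
cs.CL, cs.AI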
Xinglin Wang, Zishen Liu, Shaoxiong Feng, Peiwen Yuan, Yiwei Li
Agentic systems increasingly solve complex user requests by executing orchestrated workflows, where subtasks are assigned to specialized models or tools and coordinated according to their dependencies. While recent work improves agent efficiency by optimizing the performance--cost--latency frontier, real deployments often impose concrete requirements: a workflow must be completed within a specified budget and before a specified deadline. This shifts the goal from average efficiency optimization ...
cs.CV 280 papers
1 Visual Text Compression as Measure Transport
2605.06708
Visual text compression: renders text into images for compression and uses measure transport to explain performance differences.
cs.CV, cs.AI
Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li
Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefo...
2 Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
2605.06714
Edge deep learning survey: surveys methods and challenges of edge deep learning in vision and medical diagnostics.
cs.CV, cs.AI
Yiwen Xu, Tariq M. Khan, Yang Song, Erik Meijering
Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presen...
3 HumanNet: Scaling Human-centric Video Learning to One Million Hours
2605.06747
Large-scale human video dataset: releases a one-million-hour human activity video dataset to scale embodied learning.
cs.CV
Yufan Deng, Daquan Zhou
Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, learning physical interaction remains constrained by the lack of large, diverse, and richly annotated human activity data. We present HumanNet, a one-million-hour human-centric video corpus that captures how humans interact with the physical world at scale. HumanNet spans both first-person and third-person perspectives and covers fine-grained activi...
4 R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations
2605.06758
3D layout spatial reasoning: improves the consistency of relative spatial relation reasoning to generate reliable 3D layouts.
cs.CV, cs.LG, cs.AI
Zhifeng Gu, Yuqi Wang, Bing Wang
Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. ...
5 LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute
2605.06809
Efficient video recognition: learns when and where to compute to cut redundant cost in video Transformers.
cs.CV, cs.LG
Ali Salamatian, Anthony Fuller, Pritam Sarkar, James R. Green, Leonid Sigal
Transformers dominate video recognition. They split videos into tokens, and processing them has expensive superlinear computational cost. Yet videos are filled with redundancy, so we can question the need for this expense. We introduce LookWhen, a selector-extractor framework that factorizes video recognition into learning when, where, and what to compute. Our shallow selector gets a scaled-down video and quickly scores all tokens across space-time, while our deep extractor gets the top-K select...
6 Knowledge Transfer Scaling Laws for 3D Medical Imaging
2605.06859
Scaling laws for 3D medical pretraining: studies transfer scaling laws and mixture strategies for multimodal 3D medical pretraining.
cs.CV, cs.LG, cs.AI
Ho Hin Lee, Dongna Du, Chu Wang, Yuankai Huo, Shi Gu
Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this work, we observe that different medical imaging domains scale at variable rates during pretraining, a...
7 AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting
2605.06876
3D Gaussian splatting acceleration: uses error-driven adaptive splitting to speed up discovery of geometric detail in 3DGS.
cs.CV
Yongjae Lee, Jingxing Li, Abhay Kumar Yadav, Rama Chellappa, Deliang Fan
Adaptive density control in 3D Gaussian Splatting (3DGS) repeatedly grows the Gaussian population through fixed-cardinality random splitting to discover useful scene structure. However, in vanilla 3DGS, its binary split operator requires many densification rounds to expose fine details, making it a bottleneck for efficient training schedules with fewer iterations. We introduce AdpSplit, an error-driven adaptive split operator that determines the number of split children and initializes the child...
8 TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation
2605.06889
Global camera pose estimation: uses triangle consistency to jointly verify translation directions for global camera pose estimation.
cs.CV
Francisco Chen, Yiran Wang, Yunpeng Shi
Pairwise translation directions are a key input to camera location estimation in global structure-from-motion. Existing estimators usually process each image pair independently, producing directions that may be locally plausible but inconsistent with the other relative directions in the viewing graph. To jointly estimate the direction, we propose TriDE, which exploits camera-triangle consistency as an efficient higher-order verification signal. Instead of solving a costly global nonlinear optimi...
9 Towards Fairness under Label Bias in Image Segmentation: Impact, Measurement and Mitigation
2605.06891
Fair segmentation under label bias: detects and mitigates group label bias in segmentation without clean annotations.
cs.CV, cs.LG
Aditya Parikh, Stella Frank, Sneha Das, Aasa Feragen
Labeled datasets reflect the biases of their annotation pipelines, which sometimes introduce label bias: group-conditional label errors that cause systematic performance disparities across demographic subgroups. Label bias in image segmentation remains underexplored, as even detecting it typically requires clean, unbiased annotations, which are not readily available. We present a data-centric adaptation of Confident Learning to segmentation, allowing detection of label bias directly in the train...
10 Not All Tokens Need 40 Steps: Heterogeneous Step Allocation in Diffusion Transformers for Efficient Video Generation
2605.06892
Efficient diffusion video generation: assigns different numbers of denoising steps to different video tokens at inference to reduce compute.
cs.CV
Ernie Chu, Vishal M. Patel
Diffusion Transformers (DiTs) have achieved state-of-the-art video generation quality, but they incur immense computational cost because standard inference applies the same number of denoising steps uniformly to every token in the sequence. It is well known that human vision ignores vast amounts of redundant motion. Why, then, do our densest models treat every spatiotemporal token with equal priority? In this paper, we introduce Heterogeneous Step Allocation (HSA), a training-free inference algo...
11 Advancing Reliable Synthetic Video Detection: Insights from the SAFE Challenge
2605.06912
Synthetic video detection benchmark: summarizes the SAFE challenge and analyzes methods and evaluation for reliable synthetic video detection.
cs.CV
Kirill Trapeznikov, Gabriel Mancino-Ball, Jonathan Li, Paul Cummer, Jai Aslam
The proliferation of generative video technologies has intensified the need for reliable methods to detect and characterize synthetic media. To address this challenge, we organized the SAFE: Synthetic Video Detection Challenge (https://safe-video-2025.dsri.org), co-located with the Authenticity and Provenance in the Age of Generative AI (APAI) Workshop at ICCV 2025. The competition invited participants to develop and evaluate algorithms capable of distinguishing real from syntheti...
12 A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency
2605.06924
Long video consistent generation: proposes closed-loop retrieve-generate-refine autoregressive diffusion to keep long videos consistent.
cs.CV, cs.AI
Do Xuan Long, Yale Song, Min-Yen Kan, Tomas Pfister, Long T. Le
Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over long horizons. We present A$^2$RD, an Agentic Auto-Regressive Diffusion architecture that decouples creative synthesis from consistency enforcement. A$^2$RD formulates long video synthesis as a closed-loop process that synthesizes and self-improves video segment-by-segment through a Retrieve--Synthesize--Refine--Update cycle. It comprises three ...
13 XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling
2605.06927
Energy-aware object detection NAS: optimizes detection models on edge devices through iterative architecture search and energy estimation.
cs.CV, cs.AI
Tony Tran, Richie R. Suganda, Bin Hu
Object detection on heterogeneous edge devices must satisfy strict energy, latency, and memory constraints while still providing reliable perception for downstream autonomy. Existing energy-aware NAS methods often target limited deployment settings, while real energy remains difficult to optimize because it is highly device-dependent and costly to measure. We address these challenges with an energy-adaptive framework that combines an energy-aware XiResOFA search space, a two-stage energy estimat...
14 Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment
2605.06969
IVIF quality assessment with MLLMs: brings multimodal large-model reasoning to quality assessment of infrared-visible fused images.
cs.CV
Yuchen Guo, Junli Gong, Yao Lu, Xintong Xu, Yiuming Cheung
Infrared-Visible image fusion (IVIF) aims to integrate thermal information and detailed spatial structures into a single fused image to enhance perception. However, existing evaluation approaches tend to over-optimize both hand-crafted no-reference statistics and full-reference metrics that treat the source images as pseudo ground truths. Recent IVIF reward-modelling efforts learn from human ratings but use scalar regression on aggregated scores, neither leveraging the reasoning of Multimodal La...
15 TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations
2605.06990
Trajectory-centric geospatial pretraining: uses geospatially aligned representation learning for self-supervised pretraining on multimodal urban trajectory data.
cs.CV, cs.LG
Maria Despoina Siampou, Gengchen Mai, Ni Lao, Jinmeng Rao, Neha Arora
Multimodal self-supervised learning (MSSL) has emerged as a key paradigm for pretraining geospatial foundation models. However, existing geospatial MSSL methods are mainly designed for static pairs of modalities, such as satellite imagery, street-view imagery, and text, where learning is driven by aligning observations from the same or nearby locations. This assumption breaks down for human mobility trajectories, which represent continuous movement along paths rather than discrete observations a...
16 LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
2605.07019
Text-as-image VLM compression: uses selective context expansion to mitigate the accuracy loss of text-image representations under high compression.
cs.CV, cs.AI
Roy Xie, Dan Friedman, Donghan Yu, Bowen Pan, Christopher Fifty
Vision Language Models (VLMs) offer the exciting possibility of processing text as rendered images, bypassing the need for tokenizing the text into long token sequences. Since VLM image encoders map fixed-size images to a fixed number of visual tokens, varying rendering resolution provides a fine-grained compression knob. However, accuracy deteriorates quickly as compression increases: characters shrink below the vision encoder's effective resolution, making them indistinguishable. To address th...
17 OneViewAll: Semantic Prior Guided One-View 6D Pose Estimation for Novel Objects
2605.07023
One-view 6D pose estimation: estimates the 6D pose of novel objects from a single RGB-D reference view and semantic priors.
cs.CV
Yang Luo, Yan Gong, Yongsheng Gao, Jie Zhao, Xinyu Zhang
In many practical 6D object pose estimation scenarios, we often have access to only a single real-world RGB-D reference view per object, typically without CAD models. Existing methods largely rely on explicit 3D models or multi-view data, which limits their scalability. To address this challenging single-reference model-free setting, we propose OneViewAll, a semantic-prior-guided framework that performs pose estimation via a novel Project-and-Compare paradigm. Instead of relying on comp...
18 Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness
2605.07055
Multiorgan medical foundation model: trains a whole-body organ foundation model with saliency-guided masking and improves robustness to missing data.
cs.CV, cs.AI
Qiangqiang Wu, Grace McIlvain, Zhou Yu, Junhao Wen
Foundation models (FMs) have shown great promise in medical imaging, but most FMs are trained on unimodal data within isolated domains, such as brain MRI alone. Human aging and disease arise through coordinated biological processes across organs, therefore motivating multimodal FMs that learn whole-body representations. A key challenge, however, is that real-world multimodal biomedical data are often missing not at random, which can reduce power, limit generalizability, and introduce bias. We pr...
19 Learning to Track Instance from Single Nature Language Description
2605.07064
Self-supervised vision-language tracking: achieves self-supervised object tracking from natural language descriptions without bounding-box annotations.
cs.CV
Yaozong Zheng, Bineng Zhong, Qihua Liang, Shuimu Zeng, Haiying Xia
How to achieve vision-language (VL) tracking using natural language descriptions from a video sequence without relying on any bounding-box ground truth? In this work, we achieve this goal by tackling self-supervised VL tracking, which aims to evaluate tracking capabilities guided by natural language descriptions. We introduce \tracker, a novel self-supervised VL tracker that is capable of tracking any referred object by a language description. Unlike traditional method...
20 Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection
2605.07074
Universal AI-generated image detection: decouples semantic and fingerprint features to improve forgery detection across unseen generators.
cs.CV
Zhiyuan Wang (Hefei University of Technology), Yanxiang Chen (Key Laboratory of Knowledge Engineering with Big Data), Yuanzhi Yao (Key Laboratory of Knowledge Engineering with Big Data), Yunfeng Diao (Key Laboratory of Knowledge Engineering with Big Data)
Detecting AI-generated images across unseen architectures remains challenging, as existing models often overfit to generator-specific fingerprints and semantic content rather than learning universal forgery traces. We attribute this failure to feature entanglement: detectors learn these factors as a single entangled representation, where universal forgery traces are inextricably confounded with both generator-specific fingerprints and semantic content. Crucially, our spectral analysis reveals th...
21 Learning Visual Feature-Based World Models via Residual Latent Action
2605.07079
Feature-based world models: builds a world model by generatively predicting future visual features with residual latent actions.
cs.CV, cs.LG, cs.AI
Xinyu Zhang, Zhengtong Xu, Yutian Tao, Yeping Wang, Yu She
World models predict future transitions from observations and actions. Existing works predominantly focus on image generation only. Visual feature-based world models, on the other hand, predict future visual features instead of raw video pixels, offering a promising alternative that is more efficient and less prone to hallucination. However, current feature-based approaches rely on direct regression, which leads to blurry or collapsed predictions in complex interactions, while generative modelin...
22 ImplantMamba: Long-range Sequential Modeling Mamba For Dental Implant Position Prediction
2605.07082
Dental implant position prediction: uses Mamba long-range modeling to predict implant position and angle from dental image context.
cs.CV
Xinquan Yang, Congmin Wang, Xuguang Li, Yulei Li, Linlin Shen
In the design of surgical guides for implant placement, determining the precise implant position is a critical step. However, the implant region itself is often characterized by a lack of distinctive texture in medical images. Consequently, artificial intelligence (AI) models must infer the correct implant position and angulation (slope) primarily by analyzing the texture of the surrounding teeth, which poses a significant challenge. To address this, we propose ImplantMamba, a network architectu...
23 Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information
2605.07086
Channel information analysis: proposes a two-axis measure of task relevance and local replaceability for analyzing channel information.
cs.CV, cs.LG
Houman Safaai, Andrew T. Landau, Celia C. Beron, Yasin Mazloumi, Bernardo L. Sabatini
Channel importance in vision networks is usually summarized by a single score. That summary hides two different questions: how much a channel is related to the task, and whether its function can be supplied by same-layer peers when the channel is removed. We call the second property local replaceability. We introduce a two-axis view that separates these questions. The local axis measures input capture and peer overlap, while the target axis measures task information and target-excess information...
24 InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization
2605.07099
Object-centric UAV geo-localization: uses information-theoretic object-centric learning to improve cross-view generalization in UAV geo-localization.
cs.CV
Hongyang Zhang, Maonnan Wang, Ziyao Wang, Hongrui Yin, Man OnPun
Cross-view geo-localization (CVGL) is fundamental for precise localization and navigation in GPS-denied environments, aiming to match ground or UAV imagery with satellite views. While existing approaches rely on global feature alignment, they often suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably introduces dense, fine-grained objects, creatin...
25 Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action Recognition
2605.07140
Neurosymbolic action recognition: casts skeleton-based action recognition as concept-based first-order logical reasoning for better interpretability.
cs.CV, cs.AI
Talha Ilyas, Deval Mehta, Zongyuan Ge
Skeleton-based human activity recognition has achieved strong empirical performance, yet most existing models remain black boxes and difficult to interpret. In this work, we introduce a neurosymbolic formulation of skeleton-based HAR that reframes action recognition as concept-driven first-order logical reasoning over motion primitives. Our framework bridges representation learning and symbolic inference by grounding first-order logic predicates in learnable spatial and temporal motion concepts....
26 Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding
2605.07141
Open-world referring segmentation: Extends the Qwen3 multimodal model to pixel-level output for open-world referring segmentation.
cs.CV cs.AI
Yuan Yao, Qiushi Yang, Humen Zhong, Jiangning Wei, Yifang Men
Open-world referring segmentation requires grounding unconstrained language expressions to precise pixel-level regions. Existing multimodal large language models (MLLMs) exhibit strong open-world visual grounding, but their outputs remain limited to sparse bounding-box coordinates and are insufficient for dense visual prediction. Recent MLLM-based segmentation methods either directly predict sparse contour coordinates, struggling to reconstruct continuous object boundaries, or rely on external s...
27 AGA3DNet: Anatomy-Guided Gaussian Priors with Multi-view xLSTM for 3D Brain MRI Subtype Classification
2605.07142
Brain MRI subtype classification: Fuses anatomical-phrase priors extracted from reports with multi-view xLSTM for 3D subtype classification.
cs.CV
Peiyu Duan, Xueqi Guo, Sepehr Farhand, Mehmet Berk Sahin, Xinyuan Zheng
Accurate 3D brain MRI subtype classification benefits from both localized anatomical cues and long-range contextual reasoning. We present AGA3DNet, a report-grounded framework that incorporates brief anatomical phrases extracted from radiology reports as a soft anatomical prior channel and fuses it with a lightweight 3D CNN and multi-view xLSTM aggregation. Specifically, extracted anatomical phrases are mapped to atlas-defined regions and converted into smooth spatial priors using a signed-dista...
28 TriP: A Triangle Puzzle Approach to Robust Translation Averaging
2605.07143
Robust translation averaging: Uses triangle-puzzle-style consistency inference to achieve noise-robust camera translation averaging.
cs.CV
Zhekai Fan, Wanze Li, Jinxin Wang, Yunpeng Shi
Translation averaging aims to recover camera locations from pairwise relative translation directions and is a fundamental component of global Structure-from-Motion pipelines. The problem is challenging because direction measurements contain no distance information, making the estimation problem highly ill-conditioned and highly sensitive to corrupted observations. In this paper, we propose TriP, a triangle-based framework for robust translation averaging. TriP first infers local relative edge sc...
29 UniV2D: Bridging Visual Restoration and Semantic Perception for Underwater Salient Object Detection
2605.07146
Underwater salient object detection: Jointly performs underwater image restoration and semantic perception to improve salient object detection.
cs.CV
Laibin Chang, Shaodong Wang, Yunke Wang, Xu Zhang, Kui Jiang
Underwater salient object detection (USOD) plays a vital role in marine vision tasks but remains fundamentally challenging due to severe visual degradation, such as selective absorption and medium scattering. Conventional pipelines typically adopt a sequential "enhance-then-detect" paradigm. However, isolating low-level visual restoration from high-level semantic perception often leads to semantic inconsistency, where the restored images may not be optimal for detection and can even introduce ta...
30 Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models
2605.07148
3D topology in vision-language models: Uncovers and shapes the internal 3D scene-topology representation of VLMs to improve spatial reasoning.
cs.CV
Haoming Wang, Wei Gao
Decades of cognitive science establish that humans navigate environments by forming cognitive maps, defined as allocentric and topology-preserving representations of 3D space. While modern Vision-Language Models (VLMs) demonstrate emergent spatial reasoning from 2D egocentric inputs, it remains unclear whether they construct an analogous 3D internal representation. In this paper, we demonstrate that current VLMs do possess a latent topological map of 3D scenes, but it is heavily overshadowed by ...
31 Real-IAD MVN: A Multi-View Normal Vector Dataset and Benchmark for High-Fidelity Industrial Anomaly Detection
2605.07149
Multi-view normal anomaly dataset: Introduces a multi-view normal-vector dataset and benchmark for industrial anomaly detection.
cs.CV
Wenbing Zhu, Jianing Liang, Linjie Cheng, Yurui Pan, Zhuhao Chen
Industrial Anomaly Detection (IAD) is critical for quality control, but existing methods struggle with subtle, geometric defects. Standard 2D (RGB) images are sensitive to texture and lighting but often miss fine geometric anomalies. While 3D point clouds capture macro-shape, they are typically too sparse to detect micro-defects like scratches or pits. We address this fundamental data limitation by introducing Real-IAD-MVN (Multi-View Normal), a large-scale industrial dataset. By upgrading our a...
32 DPG-CD: Depth-Prior-Guided Cross-Modal Joint 2D-3D Change Detection
2605.07151
Cross-modal 2D-3D change detection: Uses depth-prior guidance for joint detection of 2D semantic and 3D height changes.
cs.CV cs.AI
Luqi Zhang, Zhen Dong, Bisheng Yang
Urban spatial evolution is manifested not only through horizontal expansion but also through vertical structural changes. Consequently, jointly capturing 2D semantic changes and 3D height changes is essential for urban morphology analysis and emergency management. In practical scenarios, collecting 3D observations is often constrained by high acquisition costs and the inability to support frequent updates. The multi-temporal cross-modal input consisting of pre-event Digital Surface Model (DSM) a...
33 PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition
2605.07154
Referring audio-visual segmentation: Uses adaptive modality suppression to improve the robustness of audio-visual-text referring segmentation.
cs.CV
Yuchen He, Jing Zhang
Referring Audio-Visual Segmentation (Ref-AVS) seeks to localize and segment target objects in video frames based on visual, auditory, and textual referring cues. The task is challenging because the relevance of different modalities varies across referring expressions and scenes, while existing methods typically treat multimodal cues as homogeneous inputs for fusion, prompting, or reasoning, making them vulnerable to irrelevant or misleading modalities. To address this problem, we propose PRIMED,...
34 Hierarchical Perfusion Graphs for Tumor Heterogeneity Modeling in Glioma Molecular Subtyping
2605.07156
Perfusion graphs for glioma subtyping: Models tumor heterogeneity with perfusion graphs to predict glioma molecular subtypes.
cs.CV
Han Jang, Junhyeok Lee, Heeseong Eum, Joon Jang, Yoseob Han
Precise molecular subtyping of gliomas, including isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion, directly guides surgical and therapeutic decisions, yet currently relies on invasive tissue sampling. Deep learning on structural MRI has emerged as a non-invasive alternative, but anatomy-only approaches cannot capture the hemodynamic signatures that distinguish molecular subtypes. Radiogenomics based on dynamic susceptibility contrast (DSC) MRI holds immense potential for non-invasi...
35 Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection
2605.07178
Text extraction for change detection: Extracts structured text supervision from change masks to improve remote sensing change detection.
cs.CV
Kai Zheng, Hang-Cheng Dong, Jiatong Pan, Zhenkai Wu, Fupeng Wei
Remote sensing change detection is pivotal for urban monitoring, disaster assessment, and environmental resource management. Yet, unimodal deep learning methods frequently confuse genuine semantic changes with visually similar but irrelevant variations. Recent multimodal approaches incorporate text as auxiliary supervision, but their descriptions are either semantically coarse and unstructured or model-generated and thus noisy. Critically, all of them overlook a simple fact: fine-grained change ...
36 SatSurfGS: Generalizable 2D Gaussian Splatting for Sparse-View Satellite Surface Reconstruction
2605.07181
Sparse-view satellite Gaussian splatting: Proposes generalizable 2D Gaussian splatting to reconstruct satellite surfaces from sparse views.
cs.CV
Min Chen, Wei Guo, Bin Wang, Wen Li, Tong Fang
Sparse-view satellite image surface reconstruction remains highly challenging, fundamentally because the reliability of multi-view matching under satellite imaging conditions is strongly spatially heterogeneous. Affected by large photometric differences, weak textures, and repetitive textures, multi-view geometric constraints are often sparse, unevenly distributed, and locally unreliable. Although 2D Gaussian Splatting (2DGS) is more suitable than 3D Gaussian Splatting (3DGS) for the explicit re...
37 PicoEyes: Unified Gaze Estimation Framework for Mixed Reality with a Large-Scale Multi-View Dataset
2605.07188
Mixed reality gaze estimation dataset: A unified framework predicts multi-attribute gaze from monocular or binocular input and releases a large-scale dataset.
cs.CV
Fuxin Duan, Hui Wang
We present PicoEyes, a unified gaze estimation framework that directly predicts all key attributes of gaze, including 3D eye parameters, eye-region segmentation, optical axis, visual axis, and depth maps, from either monocular or binocular inputs. The framework simultaneously addresses calibration, gaze forecasting, and varying device postures, while also supporting 3D eye reconstruction via joint estimation of eye parameters and depth maps in an end-to-end manner. In addition, we introduce a la...
38 Attention Transfer Is Not Universally Effective for Vision Transformers
2605.07191
ViT attention transfer evaluation: A systematic evaluation shows that attention-transfer distillation for ViTs is not universally effective.
cs.CV cs.LG
Huaiyuan Qin, Muli Yang, Gabriel James Goenawan, Peng Hu, Chen Gong
A recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient to recover the full benefit of the teacher's pre-trained weights. We revisit this finding on a comprehensive benchmark of 20 teachers from 11 well-known ViT families and reveal that Attention Transfer is not universally effective. While 7 families transfer successfully, 4 consistently fail, falli...
39 AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes
2605.07192
Event-assisted Gaussian splatting: Uses asynchronous event-assisted Gaussian splatting to reconstruct motion-blurred handheld scenes.
cs.CV
Jun Dai, Renbiao Jin, Bo Xu, Yutian Chen, Linning Xu
3D reconstruction methods such as 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) achieve impressive photorealism but fail when input images suffer from severe motion blur. While event cameras provide high-temporal-resolution motion cues, existing event-assisted approaches rely on low-resolution sensors and strict synchronization, limiting their practicality for handheld 3D capture on common devices, such as smartphones. We introduce a flexible, high-resolution asynchronous RGB-Ev...
40 Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models
2605.07194
Linear-probe dataset distillation: Provides a closed-form solution and an efficient algorithm for dataset distillation in the linear-probe setting.
cs.CV cs.LG cs.AI
Bincheng Peng, Guang Li, Ping Liu, Takahiro Ogawa, Miki Haseyama
Dataset distillation compresses a large training set into a small synthetic set that preserves downstream training utility. While most existing methods target training networks from scratch, modern visual transfer learning often uses frozen pre-trained encoders followed by lightweight linear probing. Existing distillation methods for this setting either unroll iterative linear-probe updates with trajectory-based gradient matching, or rely on closed-form formulations originally designed for from-...
41 See Tomorrow, Act Today: Foresight-Driven Autonomous Driving
2605.07195
World-model planning for driving: Plans autonomous driving by using a world model to imagine future scenes ahead of time.
cs.CV
Bozhou Zhang, Nan Song, Yuang Wang, Jiankang Deng, Xiatian Zhu
Current end-to-end autonomous driving planners are fundamentally reactive: they condition on historical and present observations to predict future actions. We argue that autonomous agents should instead imagine future scenes before deciding, just as human drivers mentally simulate "what will happen next" before acting. We introduce ForeSight, a foundation world model centric planning framework that reframes autonomous driving as anticipatory decision-making. Rather than treating world models as...
42 From Pixels to Primitives: Scene Change Detection in 3D Gaussian Splatting
2605.07203
3DGS primitive-based change detection: Detects scene changes directly from Gaussian primitive attributes rather than rendered pixels.
cs.CV
Chamuditha Jayanga Galappaththige, Jason Lai, Timothy Patten, Donald Dansereau, Niko Suenderhauf
Scene change detection methods built on Gaussian splatting universally follow a render-then-compare paradigm: the pre-change scene is rendered into 2D and compared against post-change images via pixel or feature residuals. This change detection problem with Gaussian Splatting has been treated as a question about pixels; we treat it as a question about primitives. We provide direct evidence that native primitive attributes alone -- position, anisotropic covariance, and color -- carry sufficient s...
43 LoHGNet: Infrared Small Target Detection through Lorentz Geometric Encoding with High-Order Relation Learning
2605.07213
Infrared small target detection: Uses Lorentz geometric encoding and high-order relation learning to improve infrared small target detection.
cs.CV
Qianwen Ma, Yang Xu, Shangwei Deng, Xiaobo Li, Haofeng Hu
Infrared small target detection (IRSTD) remains challenging due to the scarcity of useful target cues and the presence of severe background clutter. Most current methods rely on conventional feature learning and local interaction modeling, where features are represented in Euclidean space. However, such designs may still be limited in describing the subtle differences of weak targets and the contextual relations between targets and backgrounds. To address these limitations, we propose LoHGNet, a...
44 DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation
2605.07221
Frozen DINOv3 medical segmentation: Proposes a multi-view readout of frozen DINOv3 features for medical segmentation with few annotations.
cs.CV
Wei Jiang, Feng Liu, Nan Ye, Hongfu Sun
Adapting foundation models to medical segmentation typically requires either backbone fine-tuning or high-capacity task-specific decoders, both of which are difficult to fit reliably when annotations are scarce. We show that frozen DINOv3 features already contain useful structural and boundary cues for medical segmentation, and that the main bottleneck lies in how these features are read out. We propose DINO-MVR, a Multi-View Readout framework for annotation-efficient medical segmentation. DINO-...
45 CASCADE: Context-Aware Relaxation for Speculative Image Decoding
2605.07230
Speculative decoding for images: Uses context-aware relaxation to lower the rejection rate and speed up autoregressive image decoding.
cs.CV cs.AI
Selin Yildirim, Subhajit Dutta Chowdhury, Mohammad Mahdi Kamani, Vikram Appia, Deming Chen
Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we ide...
46 Towards multi-modal forgery representation learning for AI-generated video detection and localization
2605.07232
Multimodal AI-generated video forensics: Learns multimodal forgery representations to detect and temporally localize manipulations in AI-generated video.
cs.CV
Dat Le, Khoa Nguyen, Xin Wang, Shu Hu
Recent advances in generative AI have democratized video creation at scale. AI-generated videos, including partially manipulated clips across visual and audio channels, pose escalating risks of semantic distortion and misuse, which motivates the need for reliable detection tools. Most existing AI-generated video detectors remain limited by single- or partial-modality of data modeling and the lack of fine-grained temporal forgery localization. To address these challenges, our primary novelty intr...
47 Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment
2605.07250
MLLM safety bypass via degradation: Shows that image degradation weakens the safety alignment of multimodal models and enables jailbreaks.
cs.CV cs.AI
Zhixue Song, Boyan Han, Yiwei Wang, Chi Zhang
Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of SOTA models deteriorate sharply as resolution degrades, surprisingly persisting even when text remains legible. We attribute this to "Cognitive Overload", hypothesizing that...
48 LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling
2605.07253
Efficient diffusion sampling noise shaping: Improves image quality and efficiency in few-step diffusion sampling through low-frequency eigen noise shaping.
cs.CV
Haewon Jeon, Si-Hyeon Lee
Distilled diffusion models accelerate image generation by reducing the number of denoising steps, but often suffer from degraded image quality. To mitigate this trade-off, test-time optimization methods improve quality, yet their iterative nature incurs substantial computational overhead and leads to slow inference, limiting practical usability. Recent hypernetwork-based approaches amortize this process during training, but still require costly noise modulation in high-dimensional latent spaces....
49 High-Fidelity Surface Splatting-Based 3D Reconstruction from Multi-View Images
2605.07254
Surface splatting 3D reconstruction: Proposes surface-splatting-based multi-view reconstruction to jointly optimize geometry and appearance.
cs.CV
Nandhana Sunil, Abhirami R Iyer, Avirup Mandal
Multi-view mesh reconstruction remains a core challenge in computer graphics and vision, especially for recovering high-frequency geometry from sparse observations. Recent methods such as 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) rely on post-processing for mesh extraction, thereby limiting joint optimization of geometry and appearance. Implicit Moving Least Squares (IMLS) instead enables direct conversion of point clouds into signed distance and texture fields, supporting e...
50 TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts
2605.07256
ViT architecture search with LoRA: Uses a mixture of LoRA experts to mitigate the architecture-search collapse caused by supernet weight sharing.
cs.CV
Jeimin Jeon, Hyunju Lee, Bumsub Ham
Transformer architecture search (TAS) discovers optimal vision transformer (ViT) architectures automatically, reducing human effort to manually design ViTs. However, existing TAS methods suffer from the feature collapse problem, where subnets within a supernet fail to learn subnet-specific features, mainly due to the shared weights in a supernet, limiting the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adapt...
51 Adaptive Subspace Projection for Generative Personalization
2605.07257
Generative personalization subspace control: Uses adaptive subspace projection to suppress semantic collapse in personalization while preserving prompt context.
cs.CV
Van-Anh Nguyen, Anh Tuan Bui, Tamas Abraham, Junae Kim, Amardeep Kaur
Generative personalization often suffers from the semantic collapsing problem (SCP), where a learned personalized concept overpowers the rest of the text prompt, causing the model to ignore important contextual details. To address this, we first analyze the underlying cause, revealing that the semantic drift responsible for SCP is not random but is concentrated within a specific low-dimensional subspace. We also discover that the personalization process perturbs the embedding of the original bas...
52 Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning
2605.07264
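Satellite DSM depth fine-tuning: Introduces RPC-aware depth fine-tuning to improve the accuracy and generalization of DSM reconstruction from satellite imagery.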
cs.CV
Qiaoyi Yang, Chaoyi Zhou, Xi Liu, Run Wang, Minghui Xu
Accurate Digital Surface Model (DSM) reconstruction from satellite imagery is critical for applications such as disaster response, urban planning, and large-scale geographic mapping. Existing approaches face a fundamental trade-off: optimization-based methods achieve strong accuracy but require hours of per-scene computation, while generalizable geometry foundation models offer near-instant inference but fail to generalize to satellite imagery due to the domain gap introduced by the Rational Pol...
53 From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG
2605.07273
Remote sensing RAG retrieval attack: Proposes a cloud-like input attack that hijacks evidence retrieval in remote sensing vision-language RAG.
cs.CV cs.AI
Jiaju Han, Chao Li, Chengyin Hu, Qike Zhang, Xuemeng Sun
Multimodal RAG systems increasingly rely on vision-language retrievers to ground visual queries in external textual evidence. Existing adversarial studies on RAG mainly manipulate the retrieval corpus or memory, while attacks on vision-language and remote sensing models typically target end-task predictions. Input-space threats to the evidence retrieval stage of remote sensing multimodal RAG remain underexplored. To address this gap, we introduce CloudWeb, an atmospheric retrieval hijacking atta...
54 SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
2605.07287
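Adaptive Gaussian allocation for NVS: Learns to allocate Gaussian primitives adaptively according to scene complexity to improve generalizable novel view synthesis.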
cs.CV
Yecong Wan, Fan Li, Mingwen Shao, Wangmeng Zuo
Generalizable novel view synthesis aims to render unseen views from uncalibrated input images without requiring per-scene optimization. Recent feed-forward approaches based on 3D Gaussian Splatting have achieved promising efficiency and rendering quality. However, most of them assign a fixed number of Gaussians to each pixel or voxel, ignoring the spatially varying complexity of real-world scenes. Such uniform allocation often wastes Gaussian primitives in smooth regions while providing insuffic...
55 Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training
2605.07288
World-model simulator for VLA policies: Trains a style-robust world model with dynamic latent bootstrapping to improve imagination-based policy optimization.
cs.CV cs.AI
Jiaxuan Gao, Yongjian Guo, Zhong Guan, Wen Huang, Wanlun Ma
The integration of Vision-Language-Action (VLA) models with World Models has gained increasing attention. One representative approach treats learned World Models as generative simulators, enabling policy optimization entirely within "imagination." However, when deployed as simulators for specific environments such as the LIBERO benchmark, existing World Models often suffer from poor generalization and long-horizon error accumulation. During closed-loop rollouts, these models are highly sensitive...
56 EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
2605.07299
Egocentric proactive interaction benchmark: Builds a streaming egocentric benchmark to evaluate personalized proactive interaction and timing.
cs.CV cs.AI
Dongchuan Ran, Linyu Ou, Xueheng Li, Wenwen Tong, Chenxu Guo
Existing Multimodal Large Language Models (MLLMs) remain primarily reactive, failing to continuously perceive environments or proactively assist users. While emerging benchmarks address proactivity, they are largely confined to alert scenarios, neglect personalized context, and fail to evaluate the precise timing of human-machine interactions (HMI). In this paper, we introduce EgoPro-Bench, a novel benchmark for training and evaluating proactive interaction capabilities based on streaming egocent...
57 Amortized-Precision Quantization for Early-Exit Vision Transformers
2605.07317
Quantization for early-exit ViTs: Proposes utilization-aware quantization to stabilize dynamic inference in low-precision early-exit ViTs.
cs.CV cs.AI
Rui Fang, Hsi-Wen Chen, Ming-Syan Chen
Vision Transformers (ViTs) achieve strong performance across vision tasks, yet their deployment with low-precision early exiting remains fragile. Existing quantization methods assume static full-depth execution, making them unstable when exit decisions are perturbed by quantization noise, which can amplify errors along dynamic inference paths. In this paper, we introduce Amortized-Precision Quantization (APQ), a utilization-aware formulation that accounts for layer-wise stochastic exposure to qu...
58 GEM: Generating LiDAR World Model via Deformable Mamba
2605.07326
Generative LiDAR world model: uses deformable Mamba to model the spatiotemporal dynamics of point clouds and generate a LiDAR world model.
cs.CV
Yang Wu, Zhaojiang Liu, Qiang Meng, Youquan Liu, Renliang Weng
World models, which simulate environmental dynamics and generate sensor observations, are gaining increasing attention in autonomous driving. However, progress in LiDAR-based world models has lagged behind those built on camera videos or occupancy data, primarily due to two core challenges: the inherent disorder of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures. To address these issues, we propose GEM: a Generative LiDAR world model that leverages ...
59 Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
2605.07327
One-step diffusion distillation: uses a single drifting loss combined with pretrained representations to achieve one-step diffusion distillation.
cs.CV
Yuan Zhang, Chenyi Li, Guoqing Ma, Jiajun Zha, Yuanming Yang
Sampling from pretrained diffusion and flow-matching models typically requires many forward passes to generate diverse and high-fidelity images. Existing distillation methods often rely on multiple auxiliary networks, carefully designed training stages, or complex optimization pipelines. In this work, we revisit the recently proposed Drifting Model objective and show that a single drifting loss can be directly used to simplify one step distillation. A key observation is that the pretrained diffu...
60 GC-ART: Global Learnable Second-Order Rational Tone Curves for Illumination Robustness
2605.07329
Tone-curve preprocessing for robustness: learns a global differentiable tone-curve preprocessing step to make classification more robust to illumination changes.
cs.CV
Wei Huang, Joyce Huang
We introduce GC-ART (Global Curve Adaptive Rational Tone-mapping), a lightweight differentiable pre-processing module for robust image classification. GC-ART predicts an endpoint-pinned rational tone curve from per-channel soft histograms using a 643-parameter MLP, then applies the curve pointwise before the classifier. The module is trained end-to-end with cross-entropy and a soft monotonicity penalty. On CIFAR-10 with a CIFAR-style ResNet-18, GC-ART matches clean accuracy with the unenhanced b...
61 RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
2605.07334
Video reasoning segmentation: uses reinforced chain-of-thought to improve temporal understanding and localization in video reasoning segmentation.
cs.CV
Junwei Wen, Deshui Miao, Guangming Lu, Xin Li, Wenjie Pei
Video Reasoning Segmentation (VRS) aims to segment target objects in videos based on implicit instructions that convey human intent and temporal logic. Existing MLLM-based methods predict masks with a [SEG] token after selecting frames via simple sampling or an auxiliary MLLM, where limited supervision and frame-language similarity rules often yield narrow-scope keyframe choices that weaken holistic temporal understanding and lead to brittle localization in complex multi-object scenes. To addres...
62 ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs
2605.07338
Marine mollusc recognition benchmark: builds a visual recognition benchmark for marine shellfish to evaluate underwater robustness.
cs.CV
Ziheng Zhou, Yang Wang, Nan Wang, Chengliang Wu, Jun Yan
The decline of global shellfish biodiversity poses a severe threat to coastal ecosystems. Although artificial intelligence (AI) technologies show potential for automated ecological monitoring, existing marine benthic datasets often lack adaptation to the complexities of real underwater environments (e.g., variable lighting conditions and diverse species postures), posing challenges for the robust generalization of vision models in practical ecological monitoring. To address this problem, we cons...
63 SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration
2605.07346
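Long-horizon free-viewpoint reconstruction: proposes an error-resilient representation and calibration for streamable long-horizon free-viewpoint video reconstruction.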
cs.CV
Haotian Zhang, Xu Mo, Yixin Yu, Guanhua Zhu, Jian Xue
Free-Viewpoint Video (FVV) has emerged as a cornerstone of next-generation immersive media systems and attracted widespread attention. Previous methods primarily focus on short video sequences and suffer from significant performance degradation when processing long-horizon free-viewpoint video (LFVV). Motivated by bit allocation theory, we analyze dynamic-anchor-based volumetric video representation within a rate-distortion optimization framework and propose SoLAR, which is the first er...
64 Disambiguating 2D-3D Correspondences in Gaussian Splatting-based Feature Fields for Visual Localization
2605.07351
Gaussian splatting localization: improves 2D-3D matching in Gaussian splatting feature fields to stabilize pose localization.
cs.CV
Miso Lee, Sangeek Hyun, Yerim Jeon, Jae-Pil Heo
While Gaussian Splatting-based Feature Fields (GSFFs) have shown promise for visual localization, this paper highlights that photometrically optimized GSFFs are inherently ill-suited for 2D-3D matching. The volumetric extent of each Gaussian induces many-to-one pixel-to-point mappings that destabilize PnP-based pose estimation, while photometric optimization gives rise to superfluous Gaussians devoid of multi-view consistency. To address these issues, we propose SplitGS-Loc, a localization-speci...
65 TTF: Temporal Token Fusion for Efficient Video-Language Model
2605.07355
Video token compression: proposes training-free temporal token fusion to accelerate video-language model inference.
cs.CV, cs.AI
Simin Huo, Ning LI
Video-language models (VLMs) face rapidly growing inference costs as visual token counts scale with video length. For example, 32 frames at 448×448 resolution already yield >8,000 visual tokens in Qwen3-VL, making LLM prefill the dominant throughput bottleneck. Existing methods often rely on global similarity or attention-guided compression, incurring overheads that offset their gains. We propose Temporal Token Fusion (TTF), a training-free, plug-and-play pre-LLM token compression framework that ex...
66 UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition
2605.07356
Multimodal 2D-3D segmentation fusion: unifies 2D/3D semantic segmentation fusion with an interpretable shared-private decomposition.
cs.CV
Shuai Zhang, Zhecheng Shi, Zhuxiao Li, Jing Ou, Tengxi Wang
Semantic segmentation of large-scale 3D point clouds is crucial for applications such as autonomous driving and urban digital twins. However, the sparse sampling pattern of LiDAR and the view-dependent geometric distortion in image observations complicate cross-modal alignment and hinder stable fusion. Inspired by the fact that 2D images captured by cameras are representations of the 3D world, we recognize that the features learned from 2D and 3D segmentation share some common semantics, while o...
67 UniISP: A Unified ISP Framework for Both Human and Machine Vision
2605.07359
Unified image signal processing: unifies RAW-to-RGB processing to serve both human visual quality and machine recognition performance.
cs.CV
Hanxi Li, Yao Cheng, Bo Zhang, Li Zeng
Compared to RGB images, raw sensor data provides a richer representation of information, which is crucial for accurate recognition, particularly under challenging conditions such as low-light environments. The traditional Image Signal Processing (ISP) pipeline generates visually pleasing RGB images for human perception through a series of steps, but some of these operations may adversely impact the information integrity by introducing compression and loss. Furthermore, in computer vision tasks t...
68 RELO: Reinforcement Learning to Localize for Visual Object Tracking
2605.07379
RL-based object tracking localization: models tracking localization as an MDP and uses reinforcement learning to directly optimize metrics such as IoU.
cs.CV, cs.AI
Xin Chen, Chuanyu Sun, Jiao Xu, Houwen Peng, Dong Wang
Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO re...
69 A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization
2605.07388
Marine debris detection: enhances YOLO's attention and feature interaction to improve marine debris detection.
cs.CV
Yuyang Li, Jiashu Han, Yinyi Lai, Wenbin Kang, Zenghui Liu
Marine debris detection for ocean robots is crucial for ecological protection, yet performance is often degraded by low-quality images with blur, complex backgrounds, and small targets. To address these challenges, we propose YOLO-MD, an enhanced YOLO-based detection framework. A Dual-Branch Convolutional Enhanced Self-Attention (DB-CASA) module is designed to strengthen spatial-channel interactions, improving feature representation in degraded images. Additionally, a lightweight shift-based oper...
70 ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
2605.07390
4D world model generation: injects 4D spatiotemporal cognition into a world model to generate more consistent 4D content.
cs.CV
Haonan Wang, Hanyu Zhou, Tao Gu, Luxin Yan
Generative models have achieved success in producing apparently coherent 2D videos, but remain challenging in the physical world due to lack of 4D spatiotemporal scale. Typically, existing 4D generative models directly embed macro scale constraints to enhance overall spatiotemporal consistency. However, these methods only ensure global appearance coherence and fail to reveal the local dynamics of the physical world. Our insight is that global appearance structure and local dynamic topology empow...
71 BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
2605.07394
RL image captioning alignment: proposes a balanced reinforcement learning framework that accounts for multiple dimensions of caption quality.
cs.CV, cs.AI
Shaokai Ye, Vasileios Saveris, Yihao Qian, Jiaming Hu, Elmira Amirloo
Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing trade-offs across core dimensions of captioning. For...
72 Exposing and Mitigating Temporal Attack in Deepfake Video Detection
2605.07398
Deepfake temporal attack defense: exposes temporal attacks on deepfake detection and improves robustness with a spectral-invariant defense.
cs.CV, cs.AI
Zheyuan Gu, Minghao Shao, Zhen Wang, Yusong Wang, Mingkun Xu
While spatiotemporal deepfake detectors achieve high AUC, our experiments reveal their susceptibility to evasion attacks. These models tend to overfit on fragile temporal spectrum cues, rather than learning robust semantic causality. To mitigate this vulnerability, we propose SpInShield, a temporal spectral-invariant defense framework explicitly designed to decouple semantic motion from manipulatable spectral artifacts. We propose a learnable spectral adversary that dynamically synthesizes sever...
73 GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
2605.07399
Jailbreak diffusion VLM: uses global probability optimization to mount jailbreak attacks on diffusion vision-language models.
cs.CV
Yu Pan, Andi Zhang, Yi Wang, Sibei Yang, Wenjie Wang
Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the...
74 InsHuman: Towards Natural and Identity-Preserving Human Insertion
2605.07402
Identity-preserving human insertion: proposes a human insertion method and dataset that preserve identity and keep poses natural.
cs.CV
Jie Li, Shulian Zhang, Yangyang Gao, Wenbo Li, Yulun Zhang
Human insertion aims to naturally place specific individuals into a target background. Although existing image editing models may have such ability, they often produce failure cases, including inappropriate human pose in new background, inconsistent number of people, and modified facial identity. Moreover, publicly available human datasets often lack full-body portraits and realistic physical interaction between humans and their background. To address these challenges, we propose InsHuman for na...
75 ChartREG++: Towards Benchmarking and Improving Chart Referring Expression Grounding under Diverse referring clues and Multi-Target Referring
2605.07415
Chart referring expression grounding: builds and improves a chart referring expression grounding benchmark to support multi-target referring clues.
cs.CV, cs.CL
Tianhao Niu, Ziyu Han, Qingfu Zhu, Wanxiang Che
Referring expression grounding is a core problem in visual grounding and is widely used as a diagnostic of spatial grounding and reasoning in vision and language models, yet most prior work focuses on natural images. In contrast, existing chart referring expression grounding-related benchmarks remain limited: (1) they largely adopt bounding boxes, constraining localization precision for fine chart elements; (2) they mostly assume one or two referred target instances, failing to handle multi...
76 Learning Image-Adaptive Scale Fields for Metric Depth Recovery
2605.07418
Metric depth scale recovery: learns image-adaptive scale fields to recover metric depth from sparse anchors.
cs.CV
Yuanyan Li, Matthias Althoff
Monocular depth estimation (MDE) typically produces depth estimations that are defined up to an unknown scale or shift. When only sparse metric anchors are available, recovering accurate metric depth becomes challenging yet necessary for practical applications. We address this problem by formulating metric depth recovery as image-adaptive scale field modeling. Instead of directly correcting the depth, we reformulate the correction as a low-dimensional linear combination of image-adaptive basis m...
77 Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework
2605.07429
Diffusion bokeh rendering: uses a diffusion framework for efficient, realistic bokeh rendering on mobile phones.
cs.CV
Linxiao Shi, Siming Zheng, Zerong Wang, Hao Zhang, Jinwei Chen
Existing mobile devices are constrained by compact optical designs, such as small apertures, which make it difficult to produce natural, optically realistic bokeh effects. Although recent learning-based methods have shown promising results, they still struggle with photos captured under high digital zoom levels, which often suffer from reduced resolution and loss of fine details. A naive solution is to enhance image quality before applying bokeh rendering, yet this two-stage pipeline reduces eff...
78 Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
2605.07447
VLM adversarial attack detection: uses sparse autoencoders as plug-and-play firewalls to detect adversarial attacks on VLMs.
cs.CV, cs.CL, cs.LG, cs.AI
Hao Wang, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Daisuke Kawahara
Vision-language models (VLMs) have advanced rapidly and are increasingly deployed in real-world applications, especially with the rise of agent-based systems. However, their safety has received relatively limited attention. Even the latest proprietary and open-weight VLMs remain highly vulnerable to adversarial attacks, leaving downstream applications exposed to significant risks. In this work, we propose a novel and lightweight adversarial attack detection framework based on sparse autoencoders...
79 EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing
2605.07455
Visual-prompt edit transfer: proposes a more faithful and efficient diffusion method for transferring edits from visual-prompt examples.
cs.CV
Lan Chen, Qi Mao, Yiren Song, Yuchao Gu, Siwei Ma
Visual-prompt-guided edit transfer aims to learn image transformations directly from example pairs, offering more precise and controllable editing than purely text-driven approaches. However, existing diffusion transformer-based methods often fail to faithfully reproduce the demonstrated edits due to structural mismatches between the task and the backbone, including a pretrained bias toward textual conditioning and inherent stochastic instability during sampling. To bridge this gap, we present E...
80 EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement
2605.07457
Agentic image edit refinement: builds a human-aligned agentic framework for local refinement of editing results.
cs.CV
Zitong Xu, Huiyu Duan, Yifei Nie, Mingda Du, Sijing Wu
Recent text-guided image editing (TIE) models have made remarkable progress, yet edited images still frequently suffer from fine-grained issues such as unnatural objects, lighting mismatch, and unexpected changes. Existing refinement approaches either rely on costly iterative regeneration or employ vision-language models (VLMs) with weak spatial grounding, often resulting in semantic drift and unreliable local corrections. To address these limitations, we first construct EditFHF-15K, a dataset o...
81 A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images
2605.07466
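Ultrasound fatty pancreas diagnosis: uses an integrated segmentation-and-classification framework to automatically identify fatty pancreas in ultrasound images.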
cs.CV
Ioan-Tudor-Alexandru Anghel, Ciprian-Mihai Ceausescu, Elena Dana Nedelcu, Elena Raluca Stirban, Camelia Croitoru
Non-alcoholic fatty pancreas disease (NAFPD) is an underdiagnosed condition associated with metabolic syndrome, insulin resistance, and increased risk of pancreatic cancer. Diagnosis typically relies on subjective visual assessment of ultrasound images by clinicians. We propose an end-to-end framework for automatically classifying normal versus fatty pancreas from abdominal ultrasound images. Our method employs a TransUNet-based segmentation architecture with a ResNet encoder and transformer bot...
82 ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
2605.07474
Federated vision-language-action learning: trains VLA robot models in a federated setting on vision-action data without language annotations.
cs.CV, cs.AI
Yuhao Zhou, Yunpeng Zhu, Yang Zhou, Jindi Lyu, Jian Lan
Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated training data. Fortunately, vision-equipped robots deployed across various domains already produce abundant vision-action pairs that can be leveraged to scale up VLA training more efficiently. However, these raw data cannot be centrally aggregated due to various constraints and also exhibit severe heterogeneit...
83 ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning
2605.07477
Interpretable image editing evaluation: trains an interpretable image editing evaluator with reinforcement learning and builds a dataset.
cs.CV
Honghua Chen, Zitong Xu, Huiyu Duan, Xinyun Zhang, Xiongkuo Min
Recent text-guided image editing (TIE) models have achieved remarkable progress; however, many edited results still suffer from artifacts, unintended modifications, and suboptimal aesthetics. Although several benchmarks and evaluation methods have been proposed, most existing approaches rely on scalar scores and lack interpretability. This limitation largely stems from the absence of high-quality interpretation datasets for TIE and effective reward models to train interpretable evaluators. To ad...
84 AudioFace: Language-Assisted Speech-Driven Facial Animation with Multimodal Language Models
2605.07478
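Speech-driven facial animation: uses multimodal language models to exploit speech and linguistic structure when generating mouth-shape and expression animation.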
cs.CV
Kai Zheng, Zejian Kang, Rui Mao, Hongyuan Zou, Yuanchen Fei
Speech-driven facial animation requires accurate correspondence between acoustic signals and facial motion, especially for articulation-related mouth movements. However, directly mapping speech audio to facial coefficients often overlooks the linguistic and phonetic structure underlying speech production. In this paper, we propose AudioFace, a language-assisted framework for speech-driven blendshape generation that treats mouth-related facial coefficient prediction as a structured generation pro...
85 Implicit Multi-Camera System Calibration Using Gaussian Processes
2605.07491
Gaussian process camera calibration: uses Gaussian processes to implicitly learn nonlinear multi-camera calibration with uncertainty estimates.
cs.CV
Ivan De Boi, Bart Ribbens, Veronika Golanova, Ursula Kapov, Simon Verspeek
This paper proposes a novel framework for implicit multi-camera system calibration utilizing Gaussian Process (GP) regression. Conventional explicit calibration methods are constrained by rigid mathematical models and struggle with complex, non-linear distortions from unconventional optics, while existing neural network-based implicit approaches are typically data-hungry and lack inherent uncertainty quantification (UQ). Our GP-based model directly learns the complex, non-linear mapping from 2D ...
86 How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings
2605.07492
Document parsing benchmark: proposes a new source-traceable document parsing benchmark and audits errors and contamination in existing benchmarks.
cs.CV
Zhiheng Li, Zongyang Ma, Jiaxian Chen, Jianing Zhang, Zhaolong Su
The past year has seen over 20 open-source document parsing models, yet the field still benchmarks almost exclusively on OmniDocBench, a 1,355-page manually annotated dataset whose top scores have saturated above 90%. A three-stage audit pipeline we run on OmniDocBench screens its 21,353 evaluator-scored blocks and confirms 2,580 errors (12.08%); combined with over a year of public availability, both annotation quality and contamination risk call its rankings into question. To address these issues, we...
87 DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models
2605.07494
Continual learning for VLMs: uses dynamic MoE adapters for multi-domain continual learning in vision-language models.
cs.CV
Mengxin Qin, Xiang Zhang, Xi Wang, Kun Wei, Xu Yang
Continual learning enables vision-language models to accumulate knowledge and adapt to evolving tasks without retraining from scratch. However, in multi-domain task-incremental learning, large domain shifts intensify the stability-plasticity dilemma. Most existing methods rely on fixed architectures with statically allocated parameters, which limits adaptation to new domains and aggravates catastrophic forgetting. To address these challenges, we propose DIMoE-Adapters, a Dynamic Incremental Mixt...
88 Lightweight Unpaired Smartphone ISP Transfer with Semantic Pseudo-Pairing
2605.07495
Unpaired smartphone ISP transfer: uses semantic pseudo-pairing for lightweight, unpaired smartphone ISP style transfer.
cs.CV
Yujin Cho, Flavien Armangeon, Yanhao Li
Unpaired smartphone ISP is a challenging problem due to the lack of scene and color alignment between RAW and target RGB images. Many existing methods either require paired data or rely heavily on adversarial training, which can become unstable in the unpaired setting. In this work, we present a simple and effective approach developed for the NTIRE 2026 Learned Smartphone ISP Challenge with Unpaired Data. Our method first reconstructs larger images from training patches to recover global context...
89 Cloud-top infrared observations reveal the four-dimensional precipitation structure
2605.07499
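4D precipitation retrieval: retrieves the global four-dimensional structure of precipitation from geostationary-satellite infrared cloud-top observations.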
cs.CV
Tianchi Xu, Ziqiang Ma, Andrea Marinoni, Yuanpeng He, Xiaoqing Li
Accurate four-dimensional (4D) precipitation information is essential for understanding the Earth's energy and water cycles, yet remains observationally unresolved at global scales. Conventional theory holds that geostationary infrared observations primarily sense cloud-top properties, with limited sensitivity to sub-cloud precipitation. Here we show that cloud-top infrared measurements nevertheless encode sufficient information to recover the four-dimensional structure of precipitation, reveali...
90 Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers
2605.07503
Preference alignment for video diffusion: proposes trajectory-aware preference alignment to efficiently align video diffusion Transformers.
cs.CV
Jingyuan Zhu, Biaolong Chen, Le Zhang, Aixi Zhang, Hao Jiang
Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories. While existing paradigms such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) attempt to address this, they are often hindered by either reliance on bias-prone, complex reward models or suboptimal timestep sampling. In this pa...
91 InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
2605.07510
Interleaved multimodal agentic search: proposes the InterLV-Search benchmark to evaluate interleaved language-vision search trajectories.
cs.CV cs.CL
Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng
Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or treated as an answer endpoint rather than part of an interleaved search trajectory. We introduce InterLV-Search, a benchmark for Interleaved Language-Vision Agentic Search, in which textual and visual evidence is repeatedly used to condition later search. It contains 2,061 examples across three levels: active visual evidence seeking, co...
92 Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models
2605.07512
Continual learning for VLMs: uses hierarchical dual-subspace decoupling to reduce forgetting in vision-language continual learning.
cs.CV
Mengxin Qin, Xiang Zhang, Kun Wei, Xu Yang, Cheng Deng
Class-incremental learning aims to continuously acquire new knowledge while preserving previously learned information, thereby mitigating catastrophic forgetting. Existing methods primarily restrict parameter updates but often overlook their structural properties in high-dimensional spaces. From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. To address this issue...
93 Implicit Preference Alignment for Human Image Animation
2605.07545
Preference alignment for animation: uses implicit preference alignment to improve hand-motion quality in human image animation.
cs.CV cs.AI
Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu
Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it necessitates the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise...
94 Probabilistic Object Detection with Conformal Prediction
2605.07549
Conformal prediction for detection: applies conformal prediction to object detection to provide uncertainty estimates with coverage guarantees.
cs.CV cs.LG
Christopher Ries, Moussa Kassem Sbeyti, Nicolas Bianco, Nadja Klein
Conformal Prediction (CP) is a distribution-free method for constructing prediction sets with marginal finite-sample coverage guarantees, making it a suitable framework for reliable uncertainty quantification in safety-critical object detection. However, object detection introduces structured multi-output predictions, complicating the application of classical CP theory developed for single outputs. In addition, standard, unscaled CP produces fixed-width prediction intervals across inputs, leadin...
95 Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views
2605.07550
3D reconstruction from disjoint views: achieves geometrically accurate generative 3D reconstruction when the views do not overlap.
cs.CV
Grzegorz Wilczynski, Mikołaj Zielinski, Bartosz Świrta, Dominik Belter, Przemysław Spurek
3D vision systems are fundamentally constrained by their reliance on visual overlap: reconstruction methods require it for geometric alignment, while generative models use it to enforce multi-view consistency. This limitation is particularly acute in real-world scenarios such as distributed swarm robotics or crowd-sourced data collection, where capturing overlapping perspectives, both in terms of spatial and appearance overlap, is often impossible. We introduce Generative Reconstruction from Dis...
96 VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network
2605.07552
Visual-inertial 3D pose estimation: proposes a hybrid Mamba and cross-attention network for efficient visual-inertial 3D human pose estimation.
cs.CV
Zepeng Yang, Junxuan Bai, Hao Li, Ju Dai, Junjun Pan
The rapid advances in deep learning have significantly enhanced the accuracy of multimodal 3D human pose estimation (HPE). However, the state-of-the-art (SOTA) HPE pipelines still rely on Transformers, whose quadratic complexity makes real-time processing for long sequences impractical. Mamba addresses this issue through selective state-space modeling, enabling efficient sequence processing without sacrificing representational power. Nevertheless, it struggles to capture complex spatial dependen...
97 Dynamic Mode Decomposition along Depth in Vision Transformers
2605.07556
DMD analysis of ViT depth: uses dynamic mode decomposition to test whether ViT depth evolves as approximately autonomous linear dynamics.
cs.CV
Nishant Suresh Aswani, Saif Eddin Jabari
Recent work has shown that contiguous vision transformer (ViT) blocks (a) can be replaced by a linear map and (b) organize into recurrent phases of computation. We ask whether these observations coincide: does ViT depth implement approximately autonomous linear dynamics, admitting a single operator $K$ applied recurrently across a contiguous span? We test this using Dynamic Mode Decomposition (DMD), which fits $K$ from selected, consecutive hidden-state pairs and predicts $p$ steps ahea...
98 Multimodal Stepwise Clinically-Guided Attention Learning for Pathological Complete Response Prediction in Breast Cancer
2605.07561
Multimodal MRI pCR prediction: uses clinically guided stepwise attention to fuse multimodal MRI for breast cancer pCR prediction.
cs.CV
Alice Natalina Caragliano, Valerio Guarrasi, Michela Gravina, Carlo Sansone, Paolo Soda
Pathological complete response (pCR) is a key prognostic factor in breast cancer patients undergoing neoadjuvant therapy, strongly associated with long-term survival and treatment personalization. However, accurate pre-treatment pCR prediction remains challenging due to severe class imbalance and limited generalizability across diverse clinical settings. In this work, we propose a multimodal stepwise clinically-guided attention learning framework for pCR prediction from breast magnetic resonance...
99 Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs
2605.07562
Scale conditioning for RS-VLMs: proposes continuous scale-conditioned fine-tuning to adapt remote sensing VLMs to scale variation across GSDs.
cs.CV
Song Zhang, Yanlong Chen, Yilin Li, Yining Chen, Zili Yi
Remote sensing vision-language models (RS-VLMs) face a fundamental mismatch with natural-image counterparts: the same geographic object exhibits radically different visual evidence across ground sampling distances (GSDs) spanning multiple orders of magnitude. Yet existing RS-VLMs often discard GSD or inject it as a discrete text token, forcing a single static parameter set to absorb the entire scale spectrum. We introduce ScaleEarth, a parameter-efficient fine-tuning framework built on Qwen3-VL ...
100 Tracing the Arrow of Time: Diagnosing Temporal Information Flow in Video-LLMs
2605.07568
Temporal information in Video-LLMs: uses the Arrow-of-Time task to diagnose temporal information-flow bottlenecks in Video-LLMs.
cs.CV cs.CL
Peitao Han, Fei Cheng, Lis K. Pereira, Qianying Liu, Shigeru Kitazawa
The Arrow-of-Time (AoT) task, determining whether a video plays forward or backward by recognizing temporal irreversibility, is one humans solve with near-perfect accuracy, yet frontier Video Large Language Models (Video-LLMs) perform only modestly above chance. This gap raises a key question: do visual backbones fail to encode temporal information, or does the information bottleneck lie elsewhere in the Video-LLM architecture? We address this question by isolating the vision encoder from the Video-...
101 PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
2605.07574
Polarization vision-language model: feeds polarization-imaging physical parameters into a VLM to mitigate ambiguities such as reflections and transparency.
cs.CV
Yuliang Li, Chu Zhou, Heng Guo, Boxin Shi, Imari Sato
Mainstream vision-language models (VLMs) fundamentally struggle with severe optical ambiguities, such as reflections and transparent objects, due to the inherent limitations of standard RGB inputs. While polarization imaging captures polarimetric physical parameters that resolve these ambiguities, existing methods are constrained by fixed-format outputs and remain isolated from open-ended reasoning. To bridge this semantic-physical gap, we introduce PolarVLM, the first multimodal framework integ...
102 Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
2605.07575
Scene-graph streaming video understanding: uses explicit scene graphs to align evidence with conditions for proactive streaming video question answering.
cs.CV cs.AI
Ke Ma, Jiaqi Tang, Bin Guo, Xueting Han, Ruonan Xu
Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic modeling of visual evidence. We introduce Response-G1, a novel framework that establishes explicit, structured alignment between the accumulated video evidence and the query's expected response conditions via scene graphs. The framework operates in three fine-tuning-free stages: (1) online query-guided scene...
103 Beyond Defenses: Manifold-Aligned Regularization for Intrinsic 3D Point Cloud Robustness
2605.07590
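Robustness for 3D point clouds: uses manifold-aligned regularization to improve the robustness of point cloud networks to geometry-preserving perturbations.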
cs.CV
Pedro Alonso, Chongshou Li, Tianrui Li
Despite extensive progress in point cloud robustness, existing methods primarily improve performance through augmentation or defense mechanisms, while overlooking the geometric root cause of adversarial fragility. We hypothesize that adversarial vulnerability in 3D networks arises from a manifold misalignment between the latent geometry learned by the model and the intrinsic geometry of the underlying surface. Small, geometry-preserving perturbations along the input manifold often induce disprop...
104 TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos
2605.07593
Long audio-visual trajectory benchmark: proposes TraceAV-Bench to evaluate multi-hop audio-visual trajectory reasoning and hallucination in long videos.
cs.CV
Hengyi Feng, Hao Liang, Mingrui Chen, Bohan Zeng, Meiyi Qiang
Real-world audio-visual understanding requires chaining evidence that is sparse, temporally dispersed, and split across the visual and auditory streams, whereas existing benchmarks largely fail to evaluate this capability. They restrict videos to short clips, isolate modalities, or reduce questions to one-hop perception. We introduce TraceAV-Bench, the first benchmark to jointly evaluate multi-hop reasoning over long audio-visual trajectories and multimodal hallucination robustness. TraceAV-Benc...
105 SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild
2605.07604
Promptable 3D animal reconstruction: builds on SMAL+ for promptable multi-animal 3D reconstruction from a single image.
cs.CV cs.AI
Xuyi Hu, Jin Lyu, Jiuming Liu, Yebin Liu, Silvia Zuffi
3D animal reconstruction in the wild remains challenging due to large species variation, frequent occlusions, and the prevalence of multi-animal scenes, while existing methods predominantly focus on single-animal settings. We present SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, our method jointly reconstructs multiple instances and supports flexible prompts in the form of keypoints and masks whic...
106 FS-I2P: A Hierarchical Focus-Sweep Registration Network with Dynamically Allocated Depth
2605.07607
Image-to-point cloud registration: proposes a hierarchical focus-sweep registration network to mitigate cross-modal scale ambiguity and drift.
cs.CV
Zhixin Cheng, Yujia Chen, Xujing Tao, Bohao Liao, Xiaotian Yin
Image-to-point cloud registration is often challenged by viewpoint changes, cross-modal discrepancies, and repetitive textures, which induce scale ambiguity and consequently lead to erroneous correspondences. Recent detection-free methods alleviate this issue by leveraging multi-scale features and transformer-based interactions. However, they still suffer from attention drift across layers and intra-scale inconsistencies, hindering precise registration. Inspired by human behavior, we propose a `...
107 LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
2605.07640
Remote-sensing lithology benchmark: builds LithoBench to evaluate large multimodal models on remote-sensing lithology interpretation.
cs.CV cs.AI
Jun Wang, Fengpeng Li, Hang Dong, Tianjin Huang, Wei Han
Remote sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, lithology interpretation is a knowledge-intensive task that requires experts to infer rock types from various features, e.g., subtle visual, spectral, textural, geomorphological, and contextual cues, making reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opport...
108 EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting
2605.07642
Egocentric hand pose forecasting: proposes the EggHand foundation model to forecast future 3D hand poses from egocentric video.
cs.CV
Jaeyoung Choi, Hyeondong Kim, Yujin Kim, Daehee Park
Forecasting future 3D hand pose sequences from egocentric video is essential for understanding human intention and enabling embodied applications such as AR/VR assistance and human-robot interaction. However, this task remains a highly challenging problem because egocentric hand motion is driven by complex human intent, exhibits highly dexterous articulations, and is observed under drastic viewpoint shifts induced by ego-motion. In this work, we introduce EggHand, a foundation-model-based framew...
109 Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models
2605.07649
ODD-aware zero-shot perception: studies zero-shot safety perception with VLMs under operational design domain constraints.
cs.CV cs.AI
Berkehan Ünal, Dierend Hauke, Fazlija Dren, Plachetka Christopher
Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is...
110 Breaking Spatial Uniformity: Prior-Guided Mamba with Radial Serialization for Lens Flare Removal
2605.07650
Lens flare removal with Mamba: uses radial serialization and a prior-guided Mamba for region-adaptive flare removal.
cs.CV
Zijia Fu (School of Artificial Intelligence, Beijing Normal University, Beijing, China), Yuanfei Huang (School of Artificial Intelligence
Lens flares, caused by complex optical aberrations, severely degrade image quality, especially in nighttime photography. Although recent restoration methods have made remarkable progress, most still rely on spatially uniform processing. They fail to handle the region-dependent restoration demands of flare scenes, where saturated light sources should be preserved, flare artifacts removed, and background details recovered. To address this challenge, we propose DeflareMambav2, a prior-guided ...
111 Aquatic Neuromorphic Optical Flow
2605.07653
Neuromorphic underwater optical flow: uses spiking neural networks and event-based vision for efficient underwater optical flow estimation.
cs.CV
Pei Zhang, Yunkai Liang, Kaiqiang Wang
Underwater environments impose severe constraints on conventional imaging systems and demand solutions that balance high-quality sensing with strict resource efficiency. While emerging event cameras offer a promising alternative, their potential in aquatic scenarios remains largely unexplored. Through the lens of neuromorphic vision, this work pioneers the investigation of motion fields that serve as key media for agile underwater perception. Built upon spiking neural networks, we introduce a se...
112 Towards Billion-scale Multi-modal Biometric Search
2605.07655
Billion-scale multimodal biometric search: summarizes the design of, and lessons from, Bharat ABIS, a billion-scale multimodal biometric search system.
cs.CV cs.AI
Arka Koner, Chetan S. Naik, Lokesh Kurre, Vivek Raghavan, Barada P. Sabut
Searching a multi-biometric database of a billion records for a country-level identity system requires pushing the limits of all aspects of a biometric system, including acquisition, preprocessing, feature extraction, accuracy, matching speed, presentation attack detection, and handling of special cases (e.g., missing finger digits). This is the first paper that gives insights into such a large-scale multimodal biometric search system, called Bharat ABIS, based on open-source architectures. The ...
113 OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos
2605.07695
Text-guided surgical video editing: proposes OphEdit for training-free, text-guided editing of ophthalmic surgical videos.
cs.CV
Ritul Jangir, Arkya Jyoti Bagchi, Aiman Farooq, Mangalton Okram, Saurabh Seetaram Korgaonkar
High-fidelity surgical video generation can greatly improve medical training and the development of AI, yet adapting these generative models for precise video editing remains a formidable challenge. Modifying surgical attributes, such as instrument-tissue interactions or procedural phases, is challenging due to the strict anatomical and temporal constraints. In this paper, we propose OphEdit, a novel training-free framework for the text-guided editing of ophthalmic surgical videos. Our approach lever...
114 LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset
2605.07740
Mining environmental segmentation dataset: releases the LAMES dataset for segmenting areas environmentally affected by artisanal mining.
cs.CV
Matthias Kahl, Zhaiyu Chen, Sudipan Saha, Mrinalini Kochupillai, Lukas Kondmann
Mining operations are of utmost importance to the economy of some nations. However, such operations result in land-use change, very high energy consumption, and negative impacts on the environment, including soil erosion and deforestation. The mining process can impact an area much larger than the mining site itself. Adding to the negative externalities linked to mining is the fact that, in addition to government-sanctioned legal mining operations, illegal mining is widespread, including in vari...
115 Benchmarking Foundation Models for Renal Lesion Stratification in CT
2605.07749
Medical foundation model benchmark: benchmarks how well medical foundation models transfer to renal lesion stratification in CT.
cs.CV
Hartmut Häntze, Sarah de Boer, Myrthe Buser, Alessa Hering, Bram van Ginneken
The rapid proliferation of open-source medical foundation models (FMs) raises a practical question: how well do their pre-trained representations transfer to clinically relevant but data-scarce classification tasks? Particularly in CT-based renal lesion classification, a push toward greater generalizability would be meaningful, as the field is constrained by inherently limited training data. We addressed this through a benchmark of three medical FMs on this specific task. This six-class problem ...
116 Head Similarity: Modeling Structured Whole-Head Appearance Beyond Face Recognition
2605.07766
Whole-head appearance similarity: proposes a head-similarity representation to model whole-head appearance consistency beyond the face.
cs.CV
Yingfeng Wang, Yuxuan Xiao, Shengcai Liao
Many vision applications require identity consistency beyond strict biometric recognition, especially under non-frontal views or when facial cues are missing. However, conventional face recognition models enforce intra-identity invariance, collapsing appearance variations such as hairstyle or styling changes into a single representation, limiting their use in appearance-sensitive scenarios. To address this limitation, we introduce Head Similarity, a new formulation that extends identity-centric ...
117 SIMI: Self-information Mining Network for Low-light Image Enhancement
2605.07767
Unsupervised low-light enhancement: proposes SIMI, which mines self-information via bit-plane decomposition for unsupervised low-light enhancement.
cs.CV
Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral
Poor lighting conditions significantly impact image quality, posing substantial challenges for image editing and visualization. Many existing enhancement methods aim at proposing complex models while neglecting the intrinsic information contained within low-light images. In this work, we propose the Self-Information Mining (SIMI) network, an innovative unsupervised framework that decomposes low-light images into multiple components based on bit-plane decomposition. Our approach allows mining int...
118 Differentiable Ray Tracing with Gaussians for Unified Radio Propagation Simulation and View Synthesis
2605.07781
Gaussian differentiable ray tracing for RF: performs differentiable ray tracing in a 3D Gaussian representation to unify RF propagation simulation and view synthesis.
cs.CV
Niklas Vaara, Lam Huynh, Pekka Sangi, Miguel Bordallo López, Janne Heikkilä
Explicit neural representations such as 3D Gaussian Splatting (3DGS) enable high-fidelity and real-time novel view synthesis, yet optimize for alpha-composited optical appearance rather than ray-intersectable geometry. In contrast, radio-frequency (RF) digital twins require deterministic multi-bounce paths, where the geometry dictates trajectories and their associated attenuation and delay. We introduce a framework enabling differentiable RF propagation simulation directly within visually recons...
119 Radiologist-Guided Causal Concept Bottleneck Models for Chest X-Ray Interpretation
2605.07785
Causal concept bottleneck for CXR: proposes a radiologist-guided causal concept bottleneck model to explain chest X-ray diagnoses.
cs.CV
Amy Rafferty, Rishi Ramaesh, Ajitha Rajan
Concept Bottleneck Models (CBMs) in medical imaging aim to improve model interpretability by predicting intermediate clinical concepts before final diagnoses. However, most existing CBMs treat concepts as discriminative predictors of pathology labels, without explicitly modelling the underlying clinical generative process where diseases produce observable radiographic findings. We propose XpertCausal, a radiologist-guided causal CBM for chest X-ray interpretation which models pathology-to-concep...
120 APEX: Assumption-free Projection-based Embedding eXamination Metric for Image Quality Assessment
2605.07786
Image quality assessment metric: proposes APEX, an assumption-free projection-based embedding metric for assessing generated image quality.
cs.CV cs.AI
Caterina Gallegati, Monica Bianchini, Franco Scarselli, Vittorio Murino, Barbara Toniella Corradini
As generative models achieve unprecedented visual quality, the gold standard for image evaluation remains traditional feature-distribution metrics (e.g., FID). However, these metrics are provably hindered by the closed-vocabulary bottleneck of outdated features and the assumptive bias of rigid parametric formulations. Recent alternatives exploit modern backbones to solve the feature bottleneck, yet continue to suffer from parametric limitations. To close this gap, we introduce APEX (Assumption-f...
121 SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models
2605.07800
Video diffusion prompt alignment: allocates relational-alignment supervision according to prompt semantics to improve prompt following in video diffusion.
cs.CV
Jiesong Lian, Zixiang Zhou, Ruizhe Zhong, Yuan Zhou, Qinglin Lu
Recent video diffusion models (VDMs) synthesize visually convincing clips, yet still drop entities, mis-bind attributes, and weaken the interactions specified in the prompt. Representation-alignment objectives such as VideoREPA and MoAlign improve fine-grained text following by distilling spatio-temporal token relations from a frozen visual foundation model, but their pairwise supervision budget is allocated by visual or motion cues rather than by how relevant each pair is to the prompt. We pres...
122 Text-to-CAD Evaluation with CADTests
2605.07807
Text-to-CAD evaluation benchmark: proposes CADTestBench, an automated evaluation benchmark built on executable CADTests.
cs.CV cs.LG cs.AI
Dimitrios Mallis, Marco Wang, Ahmet Serdar Karadeniz, Elisa Ricci, Anis Kacem
Text-to-CAD has recently emerged as an important task with the potential to substantially accelerate design workflows. Despite its significance, there has been surprisingly little work on Text-to-CAD evaluation, and assessing CAD model generation performance remains a considerable challenge. In this work, we introduce a new evaluation perspective for Text-to-CAD based on automated testing. We propose CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests, executable soft...
123 ICDAR 2026 Competition on Writer Identification and Pen Classification from Hand-Drawn Circles
2605.07816
Writer ID and pen classification: releases the CircleID competition dataset for writer identification and pen-type classification
cs.CV
Thomas Gorges, Janne van der Loop, Lukas Hüttner, Linda-Sophie Schneider, Fei Wu
This paper presents CircleID, a large-scale ICDAR 2026 competition on writer identification and pen classification from scanned hand-drawn circles. The primary objective is to investigate how biometric writer characteristics and physical pen features naturally entangle within minimal, static traces. CircleID comprises two distinct tasks: (1) open-set writer identification, requiring models to recognize known writers while explicitly rejecting unknown ones, and (2) cross-writer pen classification...
124 GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
2605.07817
Active attention control for VLMs: achieves active vision through internal attention control to strengthen multimodal reasoning
cs.CV cs.CL cs.AI
Brown Ebouky, Gabriele Carrino, Niccolo Avogaro, Christoph Studer, Andrea Bartezzaghi
Human visual reasoning is governed by active vision, a process where metacognitive control drives top-down goal-directed attention, dynamically routing foveal focus toward task-relevant details while maintaining peripheral awareness of the global scene. In contrast, modern Vision-Language Models (VLMs) process visual information passively, relying on the static accumulation of massive token contexts that dilute spatial reasoning and induce linguistic hallucinations. Here we propose the following...
125 Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection
2605.07821
OOD detection with co-occurrence: uses object co-occurrence context to mitigate simplicity bias and improve near-OOD detection
cs.CV cs.AI
Boyang Dai, Chaoqi Chen, Yizhou Yu
Out-of-distribution (OOD) detection is crucial for ensuring the reliability of deep learning models. Existing methods mostly focus on regular entangled representations to discriminate in-distribution (ID) and OOD data, neglecting the rich contextual information within images. This issue is particularly challenging for detecting near-OOD, as models with simplicity bias struggle to learn discriminative features in disentangled representations. The human visual system can use the co-occurrence of o...
126 Explainable Part-Based Vehicle Classifier with Spatial Awareness
2605.07831
Explainable part-based vehicle classification: combines part detection with a decision tree for explainable, spatially aware vehicle-type classification
cs.CV
Andreas Caduff (Competence Center for Intelligent Sensors and Networks, Lucerne University of Applied Science and Art), Klaus Zahn (Competence Center for Intelligent Sensors and Networks, Lucerne University of Applied Science and Art), Jonas Hofstetter (Competence Center for Intelligent Sensors and Networks
In the area of Intelligent Transportation Systems (ITS), fine-grained vehicle classification systems play an essential role. Recently, the authors have presented a novel vision-based classification approach in which standard end-to-end Convolutional Neural Networks (CNNs) have been decomposed into 1) a CNN-based detector for semantically strong vehicle parts, followed by 2) feature construction and 3) final classification by a decision tree. In contrast to conventional CNNs, this allows both eas...
127 BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing
2605.07846
Coarse-mask local image editing: uses background routing and discrete gating to reduce coarse-mask shape bias in local editing
cs.CV
Peilin Xiong, Honghui Yuan, Junwen Chen, Keiji Yanai
Coarse-mask local image editing asks a model to modify a user-indicated region while preserving the surrounding scene. In practice, however, rough masks often become unintended shape priors: instead of serving as flexible edit support, the mask can pull the generated object toward its accidental boundary. We study this failure as mask-shape bias and frame the task through a Two-Zone Constraint, where the background should remain stable while the editable region should follow the instruction with...
128 EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding
2605.07859
Driver cognitive distraction detection: detects driver cognitive distraction through gaze-informed egocentric video understanding
cs.CV
Lang Zhang, JinYi Yoon, Matthew Corbett, Abhijit Sarkar, Bo Ji
Driver cognitive distraction is a major cause of road collisions and remains difficult to detect. Unlike manual or visual distraction, cognitive distraction is diverted by thoughts unrelated to driving, even when the driver appears visually attentive and exhibits no explicit physical movements. In this work, we propose EyeCue, a gaze-empowered egocentric video understanding framework, to detect driver cognitive distraction. A key insight is that cognitive distraction manifests in the interaction...
129 From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data
2605.07861
Identity-consistent makeup transfer: combines synthetic and real data to achieve identity-preserving makeup transfer
cs.CV
Yue Yu, Jiayu Wang, Jiajia Shi, Jingjing Chen, Yu-Gang Jiang
Makeup transfer aims to apply the makeup style of a reference portrait to a source portrait while preserving identity and background. Early methods formulate this task as unsupervised image-to-image translation, relying on surrogate objectives and often yielding limited performance. Recent diffusion- and flow-based approaches instead exploit synthetic data for supervised training, leading to significant improvements. However, these methods still face two critical challenges: synthetic supervisio...
130 Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models
2605.07872
Video reward modeling benchmark: builds the VURB preference benchmark and trains high-performing video understanding reward models
cs.CV cs.AI
Yuancheng Wei, Linli Yao, Lei Li, Haojie Zhang, Hao Zhou
Multimodal reward models have advanced substantially in text and image domains, yet progress in video understanding reward modeling remains severely limited by the lack of robust evaluation benchmarks and high-quality preference data. To address this, we propose a unified framework spanning benchmark design, data construction, and reward model training. We introduce Video Understanding Reward Bench (VURB), a benchmark featuring 2,100 preference pairs with long chain-of-thought reasoning traces (...
131 Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding
2605.07897
Semantic adaptive video memory: adaptively manages memory using semantic signals to support streaming video question answering and understanding
cs.CV cs.AI
Hang Wu, Sherin Mary Mathews, Yujun Cai, Ming-Hsuan Yang, Yiwei Wang
Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual tokens via visual similarity heuristics, or augment compression with KV-cache-level retrieval. However, compression decisions rarely incorporate semantic signals, and retrieval is often added after compression is finali...
132 One World, Dual Timeline: Decoupled Spatio-Temporal Gaussian Scene Graph for 4D Cooperative Driving Reconstruction
2605.07910
Asynchronous 4D driving reconstruction: uses a dual-timeline Gaussian scene graph to reconstruct cooperative driving scenes from asynchronous observations
cs.CV
Yulong Chen, Xiaoyun Dong, Haoyu Zhang, Zongxian Yang, Lewei Xie
Reconstructing dynamic scenes from Vehicle-to-Infrastructure Cooperative Autonomous Driving (VICAD) data is fundamentally complicated by temporal asynchrony: vehicle and infrastructure cameras operate on independent clocks, capturing the same dynamic agent such as cars and pedestrians at different physical times. Existing Gaussian Scene Graph methods implicitly assume synchronized observations and assign a single pose per agent per frame, which is an assumption that breaks in cooperative setting...
133 What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
2605.07915
Tokenizer design for latent diffusion: studies diffusion-friendly latent spaces and proposes prior-aligned autoencoders to improve LDMs
cs.CV
Zhengrong Yue, Taihang Hu, Mengting Chen, Haiyu Zhang, Zihao Pan
Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruction fidelity or inherit pretrained representations, leaving unclear what kind of latent space is truly friendly for generative modeling. In this paper, we study this question from the perspective of latent manifold organization. By constructing controlled tokenizer variants, we identify three ke...
134 MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence
2605.07919
Trustworthy medical VLM evaluation: evaluates silent failures of medical VLMs under corrupted evidence and provides a benchmark
cs.CV
Hanqi Jiang, Junhao Chen, Yi Pan, Lifeng Chen, Weihang You
Medical vision-language models (VLMs) are usually evaluated on intact image-question pairs, but trustworthy clinical use requires a stronger property: a model must recognise when the evidential basis for an answer has failed. We study this through silent failures under perturbed evidence, where a vision-required medical question is paired with a false premise, wording perturbation, knowledge-only rewrite, or ROI-corrupted image, yet the model returns a fluent non-refusal answer. We introduce m...
135 One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
2605.07931
Low-bandwidth world models for VLA: proposes a low-bandwidth world model with one token per frame to improve VLA planning
cs.CV cs.AI
Zuojin Tang, Shengchao Yuan, Xiaoxin Bai, Zhiyuan Jin, De Ma
Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet how such modules should be parameterized on top of a pretrained VLA remains an open design question. Existing world-model-augmented VLAs typically pass the per-frame visual stream into the world module at high visual bandwidth and treat its rollout as a side product of action prediction; under a constrained adaptation budget on a frozen backbone, this leaves both the per-frame represe...
136 Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision
2605.07940
Single-pair exemplar image editing: Delta-Adapter learns transferable editing semantics from single-pair exemplar supervision
cs.CV
Jiacheng Chen, Songze Li, Han Fu, Baoquan Zhao, Wei Liu
Exemplar-based image editing applies a transformation defined by a source-target image pair to a new query image. Existing methods rely on a pair-of-pairs supervision paradigm, requiring two image pairs sharing the same edit semantics to learn the target transformation. This constraint makes training data difficult to curate at scale and limits generalization across diverse edit types. We propose Delta-Adapter, a method that learns transferable editing semantics under single-pair supervision, re...
137 Rebalancing gradient to improve self-supervised co-training of depth, odometry and optical flow predictions
2605.07945
Self-supervised depth-odometry-flow co-training: improves self-supervised co-training of depth, odometry, and optical flow through dynamic gradient rebalancing
cs.CV
Marwane Hariat, Antoine Manzanera, David Filliat
We present CoopNet, an approach that improves the cooperation of co-trained networks by dynamically adapting the apportionment of gradient, to ensure equitable learning progress. It is applied to motion-aware self-supervised prediction of depth maps, by introducing a new hybrid loss, based on a distribution model of photo-metric reconstruction errors made by, on the one hand the depth + odometry paired networks, and on the other hand the optical flow network. This model essentially assumes that ...
138 TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model
2605.07955
MS lesion segmentation generative model: uses a stochastic generative model to unify cross-sectional and longitudinal, contrast-agnostic MS lesion segmentation
cs.CV cs.AI
Vicent Caselles-Ballester, Eloy Martínez-Heras, Giuseppe Pontillo, Zoe Mendelsohn, Elena M. Marrón
Multiple sclerosis (MS) expresses substantial clinical and radiological heterogeneity, which poses significant challenges for automatic lesion segmentation. The current deep learning-based SOTA is highly susceptible to changes in both distribution, e.g., changes in scanner; as well as the structure of inputs, evident in the current divide between cross-sectional and longitudinal approaches. We introduce TimeLesSeg, a unified contrast-agnostic framework designed to segment MS lesions regardless o...
139 DVD: Discrete Voxel Diffusion for 3D Generation and Editing
2605.07971
Discrete voxel diffusion for 3D: proposes discrete voxel diffusion to generate and edit sparse voxel scaffolds for 3D pipelines
cs.CV cs.LG
Zhengrui Xiang, Jiaqi Wu, Fupeng Sun, Heliang Zheng, Yingzhen Li
We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framewo...
140 HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models
2605.07973
Hyperspherical embedding control in diffusion: aligns hyperspherical embeddings and traverses Kent representations to improve diffusion controllability
cs.CV
Arani Roy, Shristi Das Biswas, Kaushik Roy
Text-to-image diffusion models can generate visually stunning images, yet, controlling what appears and how it appears, remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjusting an attribute often leads to unintended side effects, such as altered backgrounds or distorted details. This is because most existing text-based control methods treat the embedding space as Euclidean and apply simple...
141 Seeing Across Skies and Streets: Feedforward 3D Reconstruction from Satellite, Drone, and Ground Images
2605.07978
Cross-view feedforward 3D reconstruction: fuses satellite, drone, and ground images for feedforward cross-view 3D reconstruction and localization
cs.CV
Qiwei Wang, Zhongyao Tuo, Xianghui Ze, Yujiao Shi
Cross-view localization classically asks: where does this ground image lie on the satellite tile? Existing methods are typically limited to 3-DoF estimates -- an $(x,y)$ position and a yaw angle -- because nadir satellite imagery provides no direct cues for roll, pitch, or altitude, forcing a reliance on planar-motion and zero-tilt assumptions. These assumptions break on real terrain with slopes, ramps, and tilted camera mounts. To overcome this, we introduce a single UAV image as an intermediat...
142 Rethinking Dense Optical Flow without Test-Time Scaling
2605.08000
Efficient dense optical flow: leverages semantic and geometric priors from foundation models to improve optical flow accuracy without test-time scaling
cs.CV
Praroop Chanda, Suryansh Kumar
Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. This raises a fundamental question: Is scaling test-time computation the only way to improve dense optical flow accuracy? We argue that it is not. Instead, powerful visual semantic and geometric priors encoded in modern foundation models ...
143 SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere
2605.08003
Training-free video anomaly detection: performs geodesic inference on the unit hypersphere over pretrained features for training-free anomaly detection
cs.CV
Chao Huang, Penfei Wei, Wei Wang, Jie Wen, Zhihua Wang
Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. We observe that intermediate-layer features of pre-trained multimodal large language models (MLLMs) already encode rich anomaly semantics, yet existing approaches rely on the language output pathway and fa...
144 TRAS: An Interactive Software for Tracing Tree Ring Cross Sections
2605.08025
Tree ring tracing software: open-source TRAS software for automatic tree-ring delineation, manual correction, and measurement
cs.CV
Henry Marichal, Diego Passarella, Gregory Randall
Tree ring marking remains a key step in dendrometry and dendrochronology, but it is often performed manually, making the process time-consuming, subjective, and difficult to scale to large image datasets. We present the Tree Ring Analyzer Suite (TRAS), an open-source graphical software for automatic delineation, manual correction, and measurement of tree rings in wood cross-sectional images. TRAS integrates three complementary detection algorithms: the classical image-processing method CS-TRD an...
145 STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
2605.08029
Unified multimodal generation with flows: connects autoregressive normalizing flows with language models for unified text-image sequence generation
cs.CV cs.LG
Ying Shen, Tianrong Chen, Yuan Gao, Yizhe Zhang, Yuyang Wang
Deep generative models have advanced rapidly across text and vision, motivating unified multimodal systems that can understand, reason over, and generate interleaved text-image sequences. Most existing approaches combine autoregressive language modeling with diffusion-based image generators, inheriting a structural mismatch between causal text generation and iterative visual denoising. We observe that autoregressive normalizing flows are autoregressive Transformers--sharing the same causal mask,...
146 PET-Adapter: Test-Time Domain Adaptation for Full and Limited-Angle PET Image Reconstruction
2605.08030
Test-time domain adaptation for PET: proposes PET-Adapter, which adapts at test time to improve full-angle and limited-angle PET reconstruction
cs.CV cs.LG
Rüveyda Yilmaz, Yuli Wu, Johannes Stegmaier, Volkmar Schulz
Positron Emission Tomography (PET) image reconstruction is inherently challenged by Poisson noise and physical degradation factors, which are further exacerbated in limited-angle acquisitions. While deep learning methods demonstrate promising performance, their generalization to unseen clinical data distributions remains limited without extensive retraining. We propose PET-Adapter, a test-time domain adaptation framework for generative PET reconstruction models pretrained solely on phantom data....
147 Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
2605.08031
Reinforcement unlearning for VLMs: applies reinforcement unlearning on the vision encoder to remove sensitive semantics and suppress hallucination
cs.CV
Kaidi Jia, Yujie Lin, Chengyi Yang, Jiayao Ma, Jinsong Su
Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines al...
148 SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
2605.08043
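Structured orchestration for image generation: applies structured decomposition and conditional skill orchestration throughout the generation pipeline to satisfy complex semantic constraints
cs.CV cs.AI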
Tianfei Ren, Zhipeng Yan, Yiming Zhao, Zhen Fang, Yu Zeng
While text-to-image models have made strong progress in visual fidelity, faithfully realizing complex visual intents remains challenging because many requirements must be tracked across grounding, generation, and verification. We refer to these requirements as semantic commitments and formalize their lifecycle discontinuity as the Conceptual Rift, where commitments may be locally resolved or checked but fail to remain identifiable as the same operational units throughout the generation lifecycle...
149 MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation
2605.08050
Controllable talking-head video diffusion: fuses multi-condition signals with an adaptive router for controllable talking-head video generation
cs.CV
Xinyan Ye, Jiankang Deng, Abbas Edalat
Talking-head generation requires joint modeling of identity, head pose, facial expression, and mouth dynamics. Existing methods typically address only a subset of these factors, and rely on fixed-weight or heuristic fusion when multiple conditions are involved. We present MoCoTalk, a multi-conditional video diffusion framework that unifies four complementary control signals: a reference image, facial keypoints, 3DMM-rendered shading meshes, and the corresponding speech audio. To resolve destruct...
150 Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization
2605.08054
Constrained human motion diffusion: uses retrieval-guided diffusion noise optimization to generate human motion satisfying strong spatiotemporal constraints
cs.CV
Hanchao Liu, Fang-Lue Zhang, Shining Zhang, Tai-Jiang Mu, Shi-Min Hu
Generating human motion that satisfies customized zero-shot goal functions, enabling applications such as controllable character animation and behavior synthesis for virtual agents, is a critical capability. While current approaches handle many unseen constraints, they fail on tasks with very challenging spatiotemporal restrictions, such as severe spatial obstacles or specified numbers of walking steps. To equip motion generators for these highly constrained tasks, we present a retrieval-guided ...
151 6D Pose Estimation via Keypoint Heatmap Regression with RGB-D Residual Neural Networks
2605.08059
RGB-D 6D Pose Estimation: estimates 6D object pose using keypoint heatmap regression combined with PnP.
cs.CV
Ismail Aljosevic, Amir Masoud Almasi, Ana Parovic, Ashkan Shafiei
In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoints extracted from these heatmaps are used to estimate the 6D object pose via the PnP RANSAC algorithm. We compare different keypoint selection strategies to assess their impact on pose accuracy. Additionally, we extend the baseline by incorporating depth data u...
152 Flow-OPD: On-Policy Distillation for Flow Matching Models
2605.08063
On-Policy Distillation for Flow: proposes Flow-OPD to mitigate reward sparsity and gradient interference in multi-task alignment.
cs.CV cs.AI
Zhen Fang, Wenxuan Huang, Yu Zeng, Yiming Zhao, Shuang Chen
Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued rewards, and the gradient interference arising from jointly optimizing heterogeneous objectives, which together give rise to a 'seesaw effect' of competing metrics and pervasive reward hacking. Inspired by the success of On-Policy Distillation (OPD) in the large language model community, we propose Flow-OPD, the first unified post-training...
153 Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
2605.08064
Efficient 3D VLM Representations: builds efficient 3D representations via semantic clustering and alignment to strengthen VLM spatial reasoning.
cs.CV
Jerry Jiang, Haowen Sun, Denis Gudovskiy, Yohei Nakata, Tomoyuki Okuno
Spatial intelligence in vision-language models (VLMs) attracts research interest with the practical demand to reason in the 3D world. Despite promising results, most existing methods follow the conventional 2D pipeline in VLMs and use pixel-aligned representations for the vision modality. However, correspondence-based models with implicit 3D scene understanding often fail to achieve spatial consistency, and representation-based models with 3D geometric priors lack efficiency in vision sequence se...
154 EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
2605.08073
Event-guided Image Reconstruction: proposes an efficient state space model that fuses event data for image reconstruction.
cs.CV cs.AI
Wei Yu, Yunhang Qian
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (e.g., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual ...
155 Normalizing Trajectory Models
2605.08078
Normalizing Flow Trajectory Models: models few-step reverse trajectories with conditional normalizing flows while retaining exact-likelihood training.
cs.CV, cs.LG
Jiatao Gu, Tianrong Chen, Ying Shen, David Berthelot, Shuangfei Zhai
Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice the likelihood framework in the process. We introduce Normalizing Trajectory Models (NTM), which models each reverse step as an expressive conditional normalizing flow with exact likelihood training. ...
156 Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers
2605.06169
Stabilizing Deep Diffusion Transformers: analyzes mean collapse in 1000-layer DiTs and suppresses MMS with mean-variance split residuals.
cs.CV
Pengqi Lu
Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network ...
157 On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching
2605.06680
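Flow Matching Integration Error: decomposes the velocity Jacobian into strain and vorticity to characterize the sources of integration error in flow matching.
cs.CV, cs.LG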
Chenxi Tao, Seung-Kyum Choi
Flow matching generates data by integrating a learned velocity field, where the number of integration steps (NFE) directly determines inference cost. We analyze which properties of the velocity field govern integration error by decomposing the velocity Jacobian into its symmetric part S (strain rate) and antisymmetric part Omega (vorticity). We prove that strain and vorticity play different roles: strain controls exponential error amplification through the logarithmic norm, while vorticity contr...
158 A Hierarchical Ensemble Pipeline for Anomaly Detection in ESA Satellite Telemetry
2605.06681
Satellite Telemetry Anomaly Detection: detects anomalies in ESA multivariate telemetry using a hierarchical ensemble with multi-level stacking.
cs.CV, cs.LG
Lorenzo Riccardo Allegrini, Geremia Pompei
A hierarchical ensemble pipeline is introduced to address anomaly detection in multivariate telemetry data provided by the European Space Agency (ESA). The method integrates shapelet-based and statistical feature extraction, per-channel modeling, intra-channel stacking, and a final cross-channel aggregation. The pipeline is trained and validated using time-series cross-validation and two-level masking strategies to prevent information leakage. Results on the European Space Agency Anomaly Detection B...
159 Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention
2605.06699
Multimodal Latent Diffusion Synthesis: jointly generates volumetric MRI and tabular data in a shared latent space using cross-attention.
cs.CV, cs.LG, cs.AI
Daniel Mensing, Jan Kapar, Jochen G. Hirsch, Matthias Günther, Horst Hahn
We propose a multimodal latent diffusion model that jointly synthesizes volumetric magnetic resonance imaging (MRI) and tabular clinical data within a shared latent space via cross-attention. This approach enables coherent joint representation learning of MRI and tabular modalities for generative modeling. Our model utilizes a variational autoencoder to fuse the two modalities before diffusion-based synthesis, allowing modality-appropriate reconstruction with separate decoders for MRI and tabula...
160 Weblica: Scalable and Reproducible Training Environments for Visual Web Agents
2605.06761
Reproducible Web Agent Environments: builds scalable, reproducible training environments for visual web agents using HTTP caching and replay.
cs.CV, cs.LG, cs.AI
Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan
The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stabl...
161 Enhancing Eye Movement Biometrics for User Authentication via Continuous Gaze Offset Score Fusion
2605.06810
Eye Movement Biometrics Fusion: fuses continuous gaze offset with eye movement features to improve user authentication performance.
cs.CV
Hashim Aziz, Mehedi Hasan Raju, Oleg V. Komogortsev
Eye movement biometrics (EMB) use subject-specific gaze dynamics for user authentication and identification. Recent deep learning-based EMB systems achieve strong performance by modeling temporal eye movement behavior. However, these systems typically overlook continuous gaze offset, despite prior evidence that it contains user-discriminative information. This work examines whether continuous gaze offset can improve biometric performance when combined with existing biometric features. We evaluat...
162 Uneven Evolution of Cognition Across Generations of Generative AI Models
2605.06815
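Psychometric Evaluation of GenAI: uses a psychometric framework to assess the cognitive profiles of successive generations of generative models and compare them with human norms.
cs.CV, cs.AI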
Isaac Galatzer-Levy, Daniel McDuff, Xin Liu, Jed McGiffin
The pursuit of artificial general intelligence necessitates robust methods for evaluating the cognitive capabilities of models beyond narrow task performance. Here, we introduce a psychometric framework to assess the cognitive profiles of generative AI, comparing them to human norms and tracking their evolution across generations. Initial evaluation of leading multimodal models using tasks adapted from the Wechsler Adult Intelligence Scale revealed a profoundly uneven cognitive architecture: nea...
163 A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
2605.06829
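Unified Theory of Diffusion Models: uses measure theory to unify diffusion, score-based, and flow matching models as a framework for learning time-dependent vector fields.
cs.CV, cs.LG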
Aditya Ranganath, Mukesh Singhal
We survey continuous-time generative modeling methods based on transporting a simple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals $(\rho_t)_{t \in [0,1]}$ governed by continuity and Fokker-Planck equations. Such a unified theory is timely because these method...
164 EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration
2605.06875
Posit Accelerator for ADAS: proposes a SIMD logarithmic-Posit compute engine for low-power acceleration of ADAS inference.
cs.CV, cs.AI
Mukul Lokhande, Ratko Pilipovic, Omkar Kokane, Adam Teman, Santosh Kumar Vishvakarma
Advanced driver-assistance systems (ADAS) require neural compute engines that deliver low-latency inference under strict power and area constraints. Posit arithmetic is attractive for such accelerators because it provides high numerical fidelity at low precision, but its variable-length regime encoding increases encode/decode cost and exposes the datapath to large regime-field fault effects. This paper presents EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit neural compute engine for energy...
165 Dr-BA: Separable Optimization for Direct Radar Bundle Adjustment & Localization
2605.07041
Radar Bundle Adjustment Localization: performs BA and localization with separable optimization directly on spinning radar intensity images.
cs.CV
Daniil Lisus, Cedric Le Gentil, Timothy D. Barfoot
This paper introduces Dr-BA, a first-of-its-kind radar bundle adjustment (BA) framework that operates directly on 2D spinning radar intensity images. Unlike camera or lidar sensors, radar is largely unaffected by precipitation, making it a critical modality for autonomous systems that require all-weather robustness. Existing state estimation approaches using spinning radar typically extract sparse point clouds from range-azimuth-intensity measurements and apply point cloud alignment techniques t...
166 Do Joint Audio-Video Generation Models Understand Physics?
2605.07061
Physics Benchmark for AV Generation: introduces AV-Phys Bench to evaluate the physical consistency of joint audio-video generation.
cs.CV, cs.AI, cs.SD, cs.MM
Zijun Cui, Xiulong Liu, Hao Fang, Mingwei Xu, Jiageng Liu
Joint audio-video generation models are rapidly approaching professional production quality, raising a central question: do they understand audio-visual physics, or merely generate plausible sounds and frames that violate real-world consistency? We introduce AV-Phys Bench, a benchmark for evaluating physical commonsense in joint audio-video generation. AV-Phys Bench tests models across three scene categories: Steady State, Event Transition, and Environment Transition. It covers physics-grounded ...
167 Fine-tuning a vision-language model for fracture-surface morphology recognition
2605.07145
VLM for Fracture Morphology: fine-tunes a vision-language model to recognize fracture-surface morphology and builds a large-scale annotated dataset.
cs.CV
Quanliang Liu, Jungtaek Kim, Kangwook Lee, Hyunseok Oh
Vision-language models (VLMs) have shown strong potential for scientific image understanding, but general-purpose models often lack the domain-specific visual knowledge required for reliable materials characterization. In this work, we fine-tuned an open-source VLM (Qwen3-VL-32B-Instruct) for fracture-surface image analysis using a curated dataset of 13,168 open-source, literature-mined fracture-surface images. Morphology annotations were generated by GPT-5.2-Reasoning (high) from both the image...
168 PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation
2605.07252
Personalized Co-speech Gestures: generates personalizable co-speech gestures using a semantic-guided hierarchical motion representation.
cs.CV, cs.MM
Junchuan Zhao, Qifan Liang, Ye Wang
Co-speech gesture generation aims to synthesize realistic body movements that are semantically coherent with speech and faithful to a user-specified gestural style. Existing VQ-VAE based co-speech gesture generation methods improve generation quality but fail to encode semantic structure into the motion representation or explicitly disentangle content from style, limiting both semantic coherence and personalization fidelity. We present PersonaGest, a two-stage framework addressing both limitatio...
169 Predictive but Not Plannable: RC-aux for Latent World Models
2605.07278
Reachability-Corrected World Models: proposes RC-aux to correct latent-space reachability and improve the planning alignment of world models.
cs.CV, cs.LG, cs.AI
Wenyuan Li, Guang Li, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
A latent world model may achieve accurate short-horizon prediction while still inducing a latent space that is poorly aligned with planning. A key issue is spatiotemporal mismatch: these models are often trained with local predictive supervision, but deployed for long-horizon goal-directed search in latent spaces where Euclidean distance may not reflect what is reachable within a finite action budget. We present the Reachability-Correction auxiliary objective (RC-aux), a lightweight correction f...
170 Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference
2605.07354
Edge-Cloud Action Understanding: proposes task-oriented communication with edge-cloud co-inference to reduce the bandwidth and latency of action recognition.
cs.CV
Jingyi Liu, Cheng Yuan, Lijun He, Jun Zhang, Jiawei Shao
The expanding application of smart sensing has created a growing demand for the accurate understanding of human action at the network edge. Traditional approaches require massive video data to be transmitted from resource-constrained edge devices to powerful cloud servers, incurring prohibitive uplink bandwidth consumption and unacceptable latency while raising privacy concerns. To overcome these bottlenecks, we propose a task-oriented communication framework for human action understanding (TOAU...
171 Weather-Robust Scene Semantics with Vision-Aligned 4D Radar
2605.07367
Radar-based Weather-Robust Semantics: aligns 4D radar representations to vision embeddings and uses a VLM to generate robust scene semantic descriptions.
cs.CV
Kali Hamilton, Christoffer Heckman
Cameras and LiDAR degrade in rain, fog, and snow, while millimeter-wave radar remains largely unaffected. We align a radar encoder to frozen SigLIP vision embeddings and decode structured scene captions through a frozen vision-language model (VLM) with approximately 7M trainable parameters. On K-RADAR with held-out fog, light snow, and heavy snow sequences, all radar configurations outperform a camera baseline that collapses to over 90% hallucination. We identify a token-norm mismatch as the dom...
172 Velocity-Space 3D Asset Editing
2605.07385
Local 3D Editing in Velocity Space: applies local constraints to the ODE sampler's velocity field to achieve native 3D asset editing.
cs.CV
Hao Liu, Yuxuan Lin, Jingfeng Guo, Ruihang Chu, Junjie Wang
Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervene where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editin...
173 SR$^2$-LoRA: Self-Rectifying Inter-layer Relations in Low-Rank Adaptation for Class-Incremental Learning
2605.07420
LoRA for Class-Incremental Learning: mitigates forgetting in class-incremental learning using LoRA with self-rectified inter-layer relations.
cs.CV, cs.LG
Fengqiang Wan, Yipeng Lin, Kan Lv, Yang Yang
Pre-trained models with parameter-efficient fine-tuning (PEFT) have demonstrated promising potential for class-incremental learning (CIL), yet catastrophic forgetting still persists when adapting models to new tasks. In this paper, we present a novel perspective on catastrophic forgetting through the analysis of inter-layer relation drift, i.e., the progressive disruption of relationships among layer-wise representations during the learning of new tasks. We theoretically show that the increase o...
174 Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models
2605.07514
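Dynamic Consistency in Action Models: proposes diagnostic metrics that test the dynamic consistency between the futures generated by world action models and their action sequences.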
cs.CV
Bo-Kai Ruan, Teng-Fang Hsiao, Ling Lo, Hong-Han Shuai
World Action Models (WAMs) enable decision-making through imagined rollouts by predicting future observations and actions. However, the reliability of these imagined futures remains under-examined: is a generated future merely visually plausible, or is it dynamically compatible with the action sequence it claims to model? In this work, we identify action-state consistency, the alignment between predicted actions and induced state transitions, as a missing reliability axis for WAMs. Through a sys...
175 Stochastic Transition-Map Distillation for Fast Probabilistic Inference
2605.07661
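Fast Diffusion Inference Distillation: proposes STMD, which distills the full transition distribution to accelerate diffusion inference while preserving stochasticity.
cs.CV, cs.LG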
George Rapakoulias, Peter Garud, Lingjiong Zhu, Panagiotis Tsiotras
Diffusion models achieve strong generation quality, diversity, and distribution coverage, but their performance often comes with expensive inference. In this work, we propose Stochastic Transition-Map Distillation (STMD), a teacher-free framework for accelerating diffusion model inference while preserving probabilistic sample generation. In contrast to score-based diffusion models, whose denoising parametrization models the mean of the posterior distribution, STMD distills the full transition ma...
176 Spectral Surgery: Class-Targeted Post-Hoc Rebalancing via Hessian Spike Perturbation
2605.07790
Hessian-based Post-hoc Rebalancing: perturbs Hessian spectral spikes for class-targeted post-hoc rebalancing to improve classification.
cs.CV, cs.LG
Hugo Vigna, Samuel Bontemps
The Hessian spectrum of trained deep networks exhibits a characteristic structure: a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues (spikes), confirming the relevance of Random Matrix Theory in deep learning. The spike count matches the number of classes minus one. While prior work has described this structure, no method has exploited it operationally to improve classification performance. We propose Spectral Surgery, a post-hoc optimization method that ...
177 Pre-training Enables Extraordinary All-optical Image Denoising
2605.07810
Pretrained Optical Image Denoising: improves the snapshot image denoising quality of all-optical neural networks through pre-training.
cs.CV
Xudong Lv, Yuxiang Sun, Shuo Wang, Nanxing Chen, Jun Guan
Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts and are leading to suboptimal performance. This paper reports a pre-training-driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free-space optical d...
178 Anisotropic Modality Align
2605.07825
Multimodal Representation Interchangeability: proposes an anisotropic alignment method that mitigates modality representation shift to support training with unimodal data.
cs.CV, cs.MM
Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang
Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent...
179 Enhancing Federated Quadruplet Learning: Stochastic Client Selection and Embedding Stability Analysis
2605.07888
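Federated Metric Learning Stability: proposes FedQuad, combining stochastic client selection with stable embeddings to improve generalization in federated metric learning.
cs.CV, cs.LG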
Ozgu Goksu, Nicolas Pugeault
Federated Learning (FL) enables decentralised model training across distributed clients without requiring data centralisation. However, the generalisation performance of the global model is usually degraded by data heterogeneity across clients, particularly under limited data availability and class imbalance. To address this challenge, we propose FedQuad, a novel method that explicitly enforces minimising intra-class representations while enabling inter-class splits across clients. By jointly mi...
180 Consistency Regularised Gradient Flows for Inverse Problems
2605.07907
Gradient Flows for Inverse Problems: accelerates inverse problem solving under LDM priors with a consistency-regularised Euclidean-W2 gradient flow.
cs.CV, cs.LG
Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, O. Deniz Akyildiz
Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior samp...
181 Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
2605.07914
Multi-distribution generalization geometry: proposes a multi-distribution learning method that accounts for both flatness and gradient alignment.
cs.CV, cs.LG
Aristotelis Ballas, Christos Diou
Sharpness-aware and gradient-alignment methods have been shown to improve generalization; however, each family of methods targets a single geometric property of the loss landscape while ignoring the other. In this paper, we show that this omission is structurally unavoidable and that both flatness and gradient alignment should be considered in multi-distribution learning settings. Specifically, we derive an excess-risk decomposition that yields two additive leading-order terms: (i) an alignment ...
182 TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning
2605.07943
Active vision imitation benchmark: releases the TAVIS benchmark for evaluating imitation learning with active gaze control.
cs.CV, cs.LG, cs.AI
Giacomo Spigler
Active vision -- where a policy controls its own gaze during manipulation -- has emerged as a key capability for imitation learning, with multiple independent systems demonstrating its benefits in the past year. Yet there is no shared benchmark to compare approaches or quantify what active vision contributes, on which task types, and under what conditions. We introduce TAVIS, evaluation infrastructure for active-vision imitation learning, with two complementary task suites -- TAVIS-Head (5 tasks...
183 Uncertainty Quantification for Cardiac Shape Reconstruction with Deep Signed Distance Functions via MCMC methods
2605.07987
Uncertainty-aware cardiac reconstruction: quantifies uncertainty in cardiac shape reconstruction by combining DeepSDF with MCMC.
cs.CV
Jan Verhülsdonk, Thomas Grandits, Francisco Sahli Costabal, Thomas Beiert, Simone Pezzuto
Atlas-based approaches allow high-quality, patient-specific shape reconstructions of cardiac anatomy from sparse and/or noisy data such as point clouds. However, these methods are mainly prior-driven, so the impact of uncertainty can be large, limiting their clinical reliability. We propose a probabilistic framework for uncertainty-aware cardiac shape reconstruction that combines Deep Signed Distance Functions (DeepSDFs) with Markov Chain Monte Carlo (MCMC) sampling. Cardiac geometries are model...
184 123D: Unifying Multi-Modal Autonomous Driving Data at Scale
2605.08084
Autonomous driving data unification: builds 123D, a unified multi-modal autonomous driving data format and toolchain.
cs.CV
Daniel Dauner, Valentin Charraut, Bastian Berle, Tianyu Li, Long Nguyen
The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsis...
185 ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
2408.06747
CLIP bias-corrected segmentation: explicitly models and rectifies CLIP bias to improve unsupervised segmentation.
cs.CV
Jingyun Wang, Guoliang Kang
Recent works utilize CLIP to perform the challenging unsupervised semantic segmentation task where only images without annotations are available. However, we observe that when adopting CLIP to such a pixel-level understanding task, unexpected bias (including class-preference bias and space-preference bias) occurs. Previous works don't explicitly model the bias, which largely constrains the segmentation performance. In this paper, we propose to explicitly model and rectify the bias existing in CL...
186 Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation
2411.16748
Long talking video generation: generates long talking videos with a multimodal diffusion Transformer equipped with a memory bank.
cs.CV
Haojie Zhang, Zhihao Liang, Ruibo Fu, Bingyan Liu, Zhengqi Wen
Long-duration talking video synthesis faces enduring challenges in achieving high video quality, portrait consistency, temporal coherence, and computational efficiency. As video length increases, issues such as visual degradation, portrait drift, temporal artifacts, and error accumulation become increasingly problematic, severely affecting the realism and reliability of the results. To address these challenges, we present LetsTalk, a diffusion transformer framework equipped with multimodal guida...
187 Surgical Visual Understanding (SurgVU) Dataset
2501.09209
Surgical video dataset: releases SurgVU, a large-scale surgical video dataset with multi-task annotations.
cs.CV
Aneeq Zia, Max Berniker, Rogerio Nespolo, Xiaorui Zhang, Conor Perreault
Owing to recent advances in machine learning and the ability to harvest large amounts of data during robotic-assisted surgeries, surgical data science is ripe for foundational work. We present a large dataset of surgical videos and their accompanying labels for this purpose. We describe how the data was collected and some of its unique attributes. Multiple example problems are outlined. Although the dataset was curated for a particular set of scientific challenges (in an accompanying paper), it ...
188 RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion
2503.06223
VLM safety auditing: uses reinforced diffusion to generate adversarial contexts for auditing multimodal safety failures.
cs.CV
Ruofan Wang, Xingjun Ma
Large Vision-Language Models (VLMs) are increasingly deployed in open-ended environments, where ensuring reliable safety under multimodal inputs is critical. However, existing evaluations remain largely instruction-centric, focusing on explicit malicious queries while overlooking a more realistic and underexplored risk: whether safety alignment remains robust under harmful contextual exposure. This limitation is particularly important for multimodal systems, where visual inputs can substantially...
189 Tables Guide Vision: Learning to See the Heart through Tabular Data
2503.14998
Tabular-guided medical contrastive learning: uses tabular clinical attributes to guide contrastive learning and improve cardiac image representations.
cs.CV
Marta Hasny, Maxime Di Folco, Keno Bressem, Julia Schnabel
Contrastive learning methods in computer vision typically rely on augmented views of the same image or multimodal pretraining strategies that align paired modalities. However, these approaches often overlook semantic relationships between distinct instances, leading to false negatives when semantically similar samples are treated as negatives. This limitation is especially critical in medical imaging domains such as cardiology, where demographic and clinical attributes play a critical role in as...
190 Frozen Backpropagation: Relaxing Weight Symmetry in Deep Spiking Neural Networks
2505.13741
Spiking neural network training: proposes Frozen Backpropagation to relax the forward/backward weight-symmetry constraint in SNNs.
cs.CV
Gaspard Goupy, Pierre Tirilly, Ioan Marius Bilasco
Direct training of Spiking Neural Networks (SNNs) on neuromorphic hardware can greatly reduce energy costs compared to GPU-based training. However, implementing Backpropagation (BP) on such hardware is challenging because forward and backward passes are typically performed by separate networks with distinct weights. To compute correct gradients, forward and feedback weights must remain symmetric during training, necessitating weight transport between the two networks. This symmetry requirement i...
191 CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
2505.14113
Conformal uncertainty for segmentation: uses conformal prediction with a spatial-grouping decomposition to provide reliable confidence sets for segmentation.
cs.CV, cs.LG
Bruno Viti, Elias Karabelas, Martin Holler
Most machine learning-based image segmentation models produce pixel-wise confidence scores that represent the model's predicted probability for each class label at every pixel. While this information can be particularly valuable in high-stakes domains such as medical imaging, these scores are heuristic in nature and do not constitute rigorous quantitative uncertainty estimates. Conformal prediction (CP) provides a principled framework for transforming heuristic confidence scores into statistical...
192 LoopNav: Benchmarking Spatial Consistency in World Models
2505.22976
Spatial consistency world-model benchmark: proposes the LoopNav benchmark for evaluating long-horizon spatial consistency in world models.
cs.CV, cs.AI
Kewei Lian, Shaofei Cai, Yitao Liang, Anji Liu
The ability to simulate the world in a spatially consistent manner is a crucial requirement for effective world models. Such a model enables high-quality visual generation, and also ensures the reliability of world models for downstream tasks such as simulation and planning. It must not only retain long-horizon observational information, but also enable the construction of explicit or implicit internal spatial representations. However, existing datasets do not explicitly enforce spatial consist...
193 Factored Classifier-Free Guidance
2506.14399
Attribute-wise diffusion guidance: proposes factored CFG, which guides each attribute independently in counterfactual diffusion generation.
cs.CV, cs.AI
Tian Xia, Fabio De Sousa Ribeiro, Rajat R Rasal, Avinash Kori, Raghav Mehta
Counterfactual generation aims to simulate realistic hypothetical outcomes under causal interventions. Diffusion models have emerged as a powerful tool for this task, combining DDIM inversion with conditional generation and classifier-free guidance (CFG). In this work, we identify a key limitation of CFG for counterfactual generation: it prescribes a global guidance scale for all attributes, leading to significant spurious changes in inferred counterfactuals. To mitigate this, we propose Factore...
194 NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection
2508.01248
Generalizable AI-image detection: uses null-space decoupling of CLIP features to improve detection generalization to unseen generators.
cs.CV
Jiazhen Yan, Fan Wang, Weiwei Jiang, Ziqiang Li, Zhangjie Fu
The rapid progress of generative models, such as GANs and diffusion models, has facilitated the creation of highly realistic images, raising growing concerns over their misuse in security-sensitive domains. While existing detectors perform well under known generative settings, they often fail to generalize to unknown generative models, especially when semantic content between real and fake images is closely aligned. In this paper, we revisit the use of CLIP features for AI-generated image detect...
195 Deeply Dual Supervised learning for melanoma recognition
2508.01994
Melanoma recognition framework: strengthens melanoma recognition with dual local and global supervision.
cs.CV
Rujosh Polma, Krishnan Menon Iyer
As the application of deep learning in dermatology continues to grow, the recognition of melanoma has garnered significant attention, demonstrating potential for improving diagnostic accuracy. Despite advancements in image classification techniques, existing models still face challenges in identifying subtle visual cues that differentiate melanoma from benign lesions. This paper presents a novel Deeply Dual Supervised Learning framework that integrates local and global feature extraction to enha...
196 VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling
2508.02129
Dynamic scene 4D Gaussian splatting: models dynamic urban scenes with video-diffusion-enhanced 4D Gaussian splatting.
cs.CV
Yuru Xiao, Zihan Lin, Chao Lu, Deming Zhai, Kui Jiang
Dynamic urban scene modeling is a rapidly evolving area with broad applications. While current approaches leveraging neural radiance fields or Gaussian Splatting have achieved fine-grained reconstruction and high-fidelity novel view synthesis, they still face significant limitations. These often stem from a dependence on pre-calibrated object tracks or difficulties in accurately modeling fast-moving objects from undersampled capture, particularly due to challenges in handling temporal discontinu...
197 Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
2508.06805
Medical organ boundary edge detection: localizes organ boundaries precisely via top-down refinement and sub-pixel upsampling.
cs.CV
Aarav Mehta, Priya Deshmukh, Vikram Singh, Siddharth Malhotra, Krishnan Menon Iyer
Accurate localization of organ boundaries is critical in medical imaging for segmentation, registration, surgical planning, and radiotherapy. While deep convolutional networks (ConvNets) have advanced general-purpose edge detection to near-human performance on natural images, their outputs often lack precise localization, a limitation that is particularly harmful in medical applications where millimeter-level accuracy is required. Building on a systematic analysis of ConvNet edge outputs, we pro...
198 DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
2508.06816
Lesion segmentation with artifact suppression: proposes a dual-resolution residual network that suppresses artifacts and finely segments skin lesions.
cs.CV
Vikram Singh, Kabir Malhotra, Rohan Desai, Ananya Shankaracharya, Priyadarshini Chatterjee
Lesion segmentation, in contrast to natural scene segmentation, requires handling subtle variations in texture and color, frequent imaging artifacts (such as hairs, rulers, and bubbles), and a critical need for precise boundary localization to aid in accurate diagnosis. The accurate delineation of melanocytic tumors in dermoscopic images is a crucial component of automated skin cancer screening systems and clinical decision support. In this paper, we present a novel dual-resolution architecture ...
199 VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
2508.06819
Weakly supervised vessel segmentation: achieves weakly supervised subcutaneous vessel segmentation via learned random-walk propagation.
cs.CV
Ayaan Nooruddin Siddiqui, Mahnoor Zaidi, Ayesha Nazneen Shahbaz, Priyadarshini Chatterjee, Krishnan Menon Iyer
The task of parsing subcutaneous vessels in clinical images is often hindered by the high cost and limited availability of ground truth data, as well as the challenge of low contrast and noisy vessel appearances across different patients and imaging modalities. In this work, we propose a novel weakly supervised training framework specifically designed for subcutaneous vessel segmentation. This method utilizes low-cost, sparse annotations such as centerline traces, dot markers, or short scribbles...
200 Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation
2508.20909
Foundation-model features for med segmentation: builds Dino U-Net on frozen DINOv3 dense features for medical image segmentation.
cs.CV
Haoyue Li, Yifan Gao, Feng Yuan, Xiaosong Wang, Xin Gao
Foundation models pre-trained on large-scale natural image datasets offer a powerful paradigm for medical image segmentation. However, effectively transferring their learned representations for precise clinical applications remains a challenge. In this work, we propose Dino U-Net, a novel encoder-decoder architecture designed to exploit the high-fidelity dense features of the DINOv3 vision foundation model. Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs...
201 CalexNet: Soft Cascade-Aligned Training and Calibration for Lightweight Early-Exit Branches
2509.08318
Early-exit calibration training: proposes CalexNet to align training and inference distributions and calibrate early-exit branches.
cs.CV
Yehudit Aperstein, Alexander Apartsin
Early-exit cascades over a frozen convolutional backbone enable adaptive inference but suffer from three sources of train-inference mismatch: branches train on samples they will never see at inference, their per-class precision thresholds are calibrated on the wrong distribution, and the standard cross-entropy target on backbone argmax labels discards the backbone's uncertainty signal. We close all three gaps with CalexNet (Cascade-Aligned Early eXits), a training-recipe-only modification: branc...
202 A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
2509.12047
Animal behavior vision pipeline: builds a vision pipeline for individual-level pig behavior analysis and evaluates it on a dataset.
cs.CV, cs.AI
Haiyu Yang, Enhong Liu, Jennifer Sun, Sumit Sharma, Meike van Leerdam
Animal behavior analysis plays a crucial role in understanding animal welfare, health status, and productivity in agricultural settings. However, traditional manual observation methods are time-consuming, subjective, and limited in scalability. We present a modular pipeline that leverages open-sourced state-of-the-art computer vision techniques to automate animal behavior analysis in a group housing environment. Our approach combines state-of-the-art models for zero-shot object detection, motion...
203 TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
2509.22813
Test-time adaptation for SSMs: achieves test-time adaptation for SSMs by generating multiple views through uncertainty-guided traversals.
cs.CV
Sahar Dastani, Ali Bahri, Gustavo Adolfo Vargas Hakim, Moslem Yazdanpanah, Mehrdad Noori
State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspec...
204 GRAPE: Let GRPO Supervise Query Rewriting by Ranking for Retrieval
2509.23370
LLM query rewriting for retrieval: trains an LLM to rewrite queries with GRPO-style ranking supervision to improve retrieval robustness.
cs.CV
Zhaohua Zhang, Jianhuan Zhuo, Muxi Chen, Chenchen Zhao, Wenyu Jiang
The CLIP model has established itself as a cornerstone of large-scale retrieval systems. However, its performance often degrades under distributional shifts such as multilingual, long-form, or multimodal queries. To avoid the prohibitive costs associated with retriever retraining or corpus re-embedding, we propose GRAPE (Grouped Ranking-Aware Policy Optimization Enhancement), a plug-and-play approach that leverages LLM-based query rewriting to bridge these gaps. Unlike existing methods that lack...
205 PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
2509.26272
VLM deepfake detection optimization: builds a reasoning-annotated dataset and uses paragraph-level policy optimization to improve deepfake detection.
cs.CV, cs.LG
Tuan Nguyen, Naseem Khan, Khang Tran, NhatHai Phan, Issa Khalil
The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake d...
206 Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
2510.08638
DINO interpretability with SAEs: uses a sparse-autoencoder dictionary to analyze DINO concepts and study their geometric structure.
cs.CV, cs.AI
Thomas Fel, Binxu Wang, Michael A. Lepori, Matthew Kowal, Andrew Lee
DINOv2 is routinely deployed to recognize objects, scenes, and actions; yet the nature of what it perceives remains unknown. As a working baseline, we adopt the Linear Representation Hypothesis (LRH) and operationalize it using SAEs, producing a 32,000-unit dictionary that serves as the interpretability backbone of our study, which unfolds in three parts. In the first part, we analyze how different downstream tasks recruit concepts from our learned dictionary, revealing functional specialization...
207 DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization
2511.09117
Degraded Kuzushiji document benchmark: releases a dataset of historical documents with degradation and seals for detection and binarization.
cs.CV
Rui-Yang Ju, Kohei Yamashita, Hirotaka Kameko, Shinsuke Mori
Kuzushiji, a pre-modern Japanese cursive script, can currently be read and understood by only a few thousand trained experts in Japan. With the rapid development of deep learning, researchers have begun applying Optical Character Recognition (OCR) techniques to transcribe Kuzushiji into modern Japanese. Although existing OCR methods perform well on clean pre-modern Japanese documents written in Kuzushiji, they often fail to consider various types of noise, such as document degradation and seals,...
208 Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning
2511.12090
Prompt tuning for continual learning: proposes hierarchical layer-grouped prompt tuning to coordinate adaptation across layers and reduce forgetting.
cs.CV
Shengqin Jiang, Tianqi Kong, Yuankai Qi, Haokui Zhang, Lina Yao
Prompt-based continual learning methods fine-tune only a small set of additional learnable parameters while keeping the pre-trained model's parameters frozen. It enables efficient adaptation to new tasks while mitigating the risk of catastrophic forgetting. These methods typically attach one independent task-specific prompt to each layer of pre-trained models to locally modulate its features, ensuring that the layer's representation aligns with the requirements of the new task. However, although...
209 Physics-Based Benchmarking Metrics for Multimodal Synthetic Images
2511.15204
Physics-based multimodal evaluation metric: proposes a physics-constrained multimodal evaluation metric for measuring the realism of synthetic images.
cs.CV, cs.AI
Kishor Datta Gupta, Marufa Kamal, Md. Mahfuzur Rahman, Fahad Rahman, Mohd Ariful Haque
Current state-of-the-art measures like BLEU, CIDEr, VQA score, SigLIP-2 and CLIPScore are often unable to capture semantic or structural accuracy, especially for domain-specific or context-dependent scenarios. To address this, this paper proposes a Physics-Constrained Multimodal Data Evaluation (PCMDE) metric combining large language models with reasoning, knowledge-based mapping and vision-language models to overcome these limitations. The architecture is comprised of three main stages: (1) feature ex...
210 GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection
2512.02991
Multimodal 3D object detection fusion: fuses images and point clouds for 3D detection using dynamic graph attention convolution and a cross-modal Transformer.
cs.CV
Md Sohag Mia, Md Nahid Hasan, Muhammad Abdullah Adnan
Despite significant progress in 3D object detection, point clouds remain challenging due to sparse data, incomplete structures, and limited semantic information. Capturing contextual relationships between distant objects presents additional difficulties. To address these challenges, we propose GraphFusion3D, a unified framework combining multi-modal fusion with advanced feature learning. Our approach introduces the Adaptive Cross-Modal Transformer (ACMT), which adaptively integrates image featur...
211 Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
2512.03454
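Future-aware visual grounding: proposes a world-model-style reasoning framework that predicts scene evolution before grounding the target of a driving command.
cs.CV, cs.AI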
Haicheng Liao, Huanming Shen, Bonan Wang, Yongkang Li, Yihong Tang
Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent instructions, as they lack reasoning over 3D spatial relations and anticipated scene evolution. Grounded in the principles of world models, we propose ThinkDeeper, a framework that reasons about future spatial states before making grounding decisions. At its core is a Sp...
212 ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos
2512.03479
Object-centric procedural video QA: builds ProcObject-10K, a VideoQA benchmark evaluating reasoning about object state changes and temporal evidence grounding.
cs.CV
Wenliang Guo, Yu Kong
Procedural activities are fundamentally driven by object state transitions, yet existing instructional video benchmarks remain action-centric and cannot evaluate whether models reason about how objects evolve toward task completion. In this work, we introduce ProcObject-10K, the first benchmark that jointly evaluates object-centric reasoning and temporal evidence grounding in instructional videos, across both egocentric and exocentric views. It comprises 10,522 open-ended VideoQA pairs grounded ...
213 S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
2601.01285
Medical image segmentation network: proposes a spectral-spatial mixing segmentation network with a morphology-aware adaptive loss to improve medical segmentation.
cs.CV
Md. Sanaullah Chowdhury, Lameya Sabrin
Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. Although convolutional networks provide local precision at $\mathcal{O}(n)$ cost but limited receptive fields, vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computationa...
214 CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval
2601.03728
Composed image retrieval alignment: uses CoT-enhanced symmetric alignment and a memory bank to reduce multimodal representation fragmentation and improve CIR.
cs.CV, cs.AI
Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen, Huangyu Dai
Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text, offering substantial advantages over single-modality retrieval systems. However, existing CIR methods suffer from representation space fragmentation: queries and targets comprise heterogeneous modalities and are processed by distinct encoders, forcing models to bridge misaligned representation spaces only through post-hoc alignment, which fundamentally limits retrieval per...
215 A Unified and Controllable Framework for Layered Image Generation with Visual Effects
2601.15507
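Layered image generation effects: a unified, controllable layered generation framework that preserves subject identity and produces realistic visual effects such as shadows and reflections.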
cs.CV
Jinrui Yang, Qing Liu, Yijun Li, Mengwei Ren, Letian Zhang
Recent image generation models produce impressive composites, but often fail to preserve the identity of user-provided content when editing specific elements: the surrounding scene may shift, and even the edited object's appearance can drift from the original. Layered representations offer a natural remedy--they allow users to independently manipulate individual elements--but existing layered methods typically produce transparent foregrounds without realistic visual effects such as shadows and re...
216 Contrast-X: A Multi-Modal Contrast Image Synthesis Benchmark and Universal Modality Flow Matching
2601.15884
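Contrast image synthesis benchmark: releases the Contrast-X paired contrast-enhanced dataset and proposes universal modality flow matching for synthesizing contrast-enhanced images.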
cs.CV
Yifan Chen, Fei Yin, Hao Chen, Jia Wu, Chao Li
Contrast-enhanced imaging is central to oncologic diagnosis, but contrast agents can be contraindicated for many of the patients who need them most. Synthesizing contrast scans from non-contrast inputs is the natural response. Two obstacles stand in the way: no benchmark provides paired contrast data with lesion-level evaluation, and no single model handles the arbitrary missing patterns seen in practice. We introduce Contrast-X, a benchmark of paired contrast-enhanced and non-contrast imaging s...
217 A Step to Decouple Optimization in 3DGS
2601.16736
3D Gaussian splatting optimization: analyzes optimization details in 3DGS and proposes a decoupled update strategy to improve training stability and quality.
cs.CV
Renjie Ding, Yaonan Wang, Min Liu, Jialin Zhu, Jiazheng Wang
3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time novel view synthesis. As an explicit representation optimized through gradient propagation among primitives, optimization widely accepted in deep neural networks (DNNs) is actually adopted in 3DGS, such as synchronous weight updating and Adam with the adaptive gradient. However, considering the physical significance and specific design in 3DGS, there are two overlooked details in the optimization of 3DGS: (i) update s...
218 Structure Over Scale: Learning Visual Reasoning from Pedagogical Video
2601.23251
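Pedagogical video reasoning learning: uses the aligned question-answer structure of children's educational videos to learn basic visual reasoning such as spatial relations.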
cs.CV
Bishoy Galoaa, Xiangyu Bai, Sarah Ostadabbas
State-of-the-art vision-language models (VLMs) score impressively on video benchmarks yet stumble on basic visual reasoning tasks involving spatial relations, navigation, and object selection that a preschooler solves easily. We hypothesize that the explicit pedagogical structure, specifically the context-question-pause-answer cycles embedded in children's educational video, provides naturally co-aligned reasoning traces: temporally synchronized visual cues, questions, and answers that emerge on...
219 AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
2602.04672
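Hand-object interaction reconstruction: proposes an agentic generative method for reconstructing hand-object interactions from monocular video, mitigating occlusion issues and reducing reliance on SfM.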
cs.CV
Jin-Chuan Shi, Binhong Ye, Tao Liu, Xiaoyang Liu, Yangjinhui Xu
Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limita...
220 SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes
2602.04939
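Synthetic video deepfake benchmark: Builds a people-centric synthetic video deepfake benchmark and systematically evaluates detection methods and the impact of compression.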
cs.CV
Roberto Leotta, Salvatore Alfio Sambataro, Claudio Vittorio Ragaglia, Mirko Casu, Yuri Petralia
Modern T2V/I2V generators synthesize people increasingly hard to distinguish from authentic footage, while current evaluation suites lag: legacy benchmarks target manipulation-based forgeries, and recent synthetic-video benchmarks prioritize scale over realistic human depiction. We introduce SynthForensics, a people-centric benchmark of 20,445 videos from 8 T2V and 7 I2V open-source generators, paired-source from FF++/DFD reals, two-stage human-validated, in four compression versions with fu...
221 Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
2602.05359
Latent multimodal reasoning: Proposes hierarchical visual cue injection to perform multimodal reasoning in latent space, reducing verbose CoT and hallucination.
cs.CV
Yiming Zhang, Qiangyu Yan, Borui Jiang, Kai Han
The advancement of multimodal large language models (MLLMs) has enabled impressive perception capabilities. However, their reasoning process often remains a "fast thinking" paradigm, reliant on end-to-end generation or explicit, language-centric chains of thought (CoT), which can be inefficient, verbose, and prone to hallucination. This work posits that robust reasoning should evolve within a latent space, integrating multimodal signals seamlessly. We propose multimodal latent reasoning via HIer...
222 MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices
2602.06523
Tiny wearable activity recognition: Designs the ultra-lightweight MicroBi-ConvLSTM for low-memory, high-accuracy activity recognition on microcontrollers.
cs.CV
Mridankan Mandal
Human Activity Recognition (HAR) on resource-constrained wearables requires models that balance accuracy against strict memory and computational budgets. State-of-the-art lightweight architectures such as TinierHAR (34K parameters) and TinyHAR (55K parameters) achieve strong accuracy, but exceed memory budgets of microcontrollers with limited SRAM once operating system overhead is considered. We present MicroBi-ConvLSTM, an ultra-lightweight convolutional recurrent architecture achieving 11.4K ...
223 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
2602.07026
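Modality gap subspace alignment: Characterizes the geometric offset between modalities and proposes a subspace alignment training paradigm to narrow the modality gap in MLLMs.
cs.CV cs.AI cs.MM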
Xiaomin Yu, Yi Xin, Yuhui Zhang, Wenjie Zhang, Chonghan Liu
Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this gap are largely limited by oversimplified isotropic assumptions, hindering their application in large-scale scenarios. In this paper, we address these limitations by precisely characterizing the geome...
224 Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning
2602.09850
Explainable industrial anomaly detection: Proposes a knowledge-guided dynamic latent reasoning framework for explainable industrial defect detection.
cs.CV
Peng Chen, Chao Huang, Yunkang Cao, Chengliang Liu, Wei Wang
Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. Fi...
225 The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
2602.13298
CNN topology vs trainability: Compares VGG, ResNet, and GoogLeNet under a unified experimental setup and proposes effective depth to explain differences in trainability.
cs.CV cs.AI
Manfred M. Fischer, Joshua Pitts
This paper investigates the relationship between convolutional neural network (CNN) topology and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. Utilizing a unified experimental framework, the study isolates the impact of depth from confounding implementation variables. A formal distinction is introduced between nominal depth ($D_{\mathrm{nom}}$), representing the physical layer count, and effective depth ($D_{\mathrm{eff}}$), a...
226 AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
2602.13357
Diffusion Transformer cache correction: Proposes adaptive offset cache correction to mitigate feature-reuse drift and accelerate DiT sampling.
cs.CV cs.AI
Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu
Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache co...
227 A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations
2602.13837
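Low-bitrate video reconstruction diffusion: Proposes a causal video diffusion model that reconstructs highly consistent video from ultra-low-bitrate semantics and compressed frames.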
cs.CV
Cem Eteke, Batuhan Tosun, Martin Piccolrovazzi, Alexander Griessel, Wolfgang Kellerer
We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed fra...
228 RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection
2602.19974
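RL-based spatially faithful generation: Trains a generative model with reflection-based reinforcement learning to better follow fine-grained spatial relations in prompts.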
cs.CV
Tianyu Wang, Zhiyuan Ma, Qian Wang, Xinyi Zhang, Xinwei Long
Recent advancements in image generation have achieved impressive results in producing high-quality images. However, existing image generation models still generally struggle with a spatial reasoning dilemma, lacking the ability to accurately capture fine-grained spatial relationships from the prompt and correctly generate scenes with structural integrity. To mitigate this dilemma, we propose RL-RIG, a Reinforcement Learning framework for Reflection-based Image Generation. Our architecture compri...
229 Pretty Good Measurement for Radiomics: A Quantum-Inspired Multi-Class Classifier for Lung Cancer Subtyping and Prostate Cancer Risk Stratification
2603.00223
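Quantum-inspired radiomics classifier: Applies the PGM quantum measurement idea to multi-class classification for lung cancer subtyping and prostate cancer risk stratification.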
cs.CV
Giuseppe Sergioli, Carlo Cuccu, Giovanni Pasini, Alessandro Stefano, Giorgio Russo
We investigate a quantum-inspired approach to supervised multi-class classification based on the Pretty Good Measurement (PGM), viewed as an operator-valued decision rule derived from quantum state discrimination. The method associates each class with an encoded mixed state and performs classification through a single POVM construction, thus providing a genuinely multi-class strategy without reduction to pairwise or one-vs-rest schemes. In this perspective, classification is reformulated as the ...
230 InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning
2603.01586
Chain-of-grounding image editing: Proposes an interleaved grounding reasoning chain for more precise text-guided editing in complex multi-entity scenes.
cs.CV
Yecong Wan, Fan Li, Chunwei Wang, Hao Wu, Mingwen Shao
Emerging unified editing models have demonstrated strong capabilities in general object editing tasks. However, it remains a significant challenge to perform fine-grained editing in complex multi-entity scenes, particularly those where targets are not visually salient and require spatial reasoning. To this end, we propose InterCoG, a novel text-vision Interleaved Chain-of-Grounding reasoning framework for fine-grained image editing in complex real-world scenes. The key insight of InterCoG is to ...
231 SemanticDialect: Semantic-Aware Mixed-Format Quantization for Video Diffusion Transformers
2603.02883
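Quantization for video DiTs: Proposes semantic-aware mixed-format quantization that reduces the compute and memory cost of video DiTs while preserving temporal and semantic quality.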
cs.CV
Wonsuk Jang, Thierry Tambe
Diffusion Transformers (DiTs) achieve state-of-the-art video generation quality, but their substantial memory and computational footprints hinder edge deployment. Quantization can reduce these costs, yet existing methods often degrade video quality due to high activation variation and the difficulty of preserving semantic and temporal coherence. We propose SemanticDialect, which advances block-wise mixed-format quantization. In this framework, each block selects an optimal format (dialect) from ...
232 Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
2603.04676
Multi-image VLM attention analysis: Finds attention pulses and positional bias in multi-image reasoning VLMs and improves focus with PulseFocus.
cs.CV cs.AI
Chenjun Li
Multi-image reasoning remains a significant challenge for vision-language models (VLMs). We investigate a previously overlooked phenomenon: during chain-of-thought (CoT) generation, the text-to-image (T2I) attention of reasoning VLMs exhibits diffuse "pulses": sporadic and unfocused attention patterns that fail to concentrate on task-relevant images. We further reveal a systematic positional bias in attention allocation across images. Motivated by these observations, we propose PulseFocus, a tra...
233 LR-SGS: Robust LiDAR-Reflectance-Guided Salient Gaussian Splatting for Self-Driving Scene Reconstruction
2603.12647
LiDAR-guided Gaussian splatting: Fuses LiDAR reflectance with RGB to guide salient Gaussian modeling, improving 3D reconstruction of self-driving scenes.
cs.CV cs.AI
Ziyu Chen, Fan Zhu, Hui Zhu, Deyi Kong, Xinkai Kuang
Recent 3D Gaussian Splatting (3DGS) methods have demonstrated the feasibility of self-driving scene reconstruction and novel view synthesis. However, most existing methods either rely solely on cameras or use LiDAR only for Gaussian initialization or depth supervision, while the rich scene information contained in point clouds, such as reflectance, and the complementarity between LiDAR and RGB have not been fully exploited, leading to degradation in challenging self-driving scenes, such as those...
234 Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
2603.14186
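Fair benchmarking one-step generators: Proposes a setting-matched, semantics-scaled evaluation protocol for fair comparison of one-step generators against multistep diffusion and flow models.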
cs.CV
Advaith Ravishankar, Serena Liu, Mingyang Wang, Todd Zhou, Jeffrey Zhou
State-of-the-art text-to-image models produce high-quality images, but inference remains expensive as generation requires several sequential ODE or denoising steps. Native one-step models aim to reduce this cost by mapping noise to an image in a single step, yet fair comparisons to multi-step systems are difficult because studies use mismatched sampling steps and different classifier-free guidance (CFG) settings, where CFG can shift FID, Inception Score, and CLIP-based alignment in opposing dire...
235 Clinically Aware Synthetic Image Generation for Concept Coverage in Chest X-ray Models
2603.15525
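Clinically constrained CXR synthesis: Proposes a chest X-ray synthesis framework with clinical and anatomical constraints that broadens concept-combination coverage to improve diagnostic model reliability.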
cs.CV
Amy Rafferty, Rishi Ramaesh, Ajitha Rajan
Deep learning models for chest X-ray diagnosis are constrained by limited coverage of clinically meaningful concept combinations in publicly available training datasets. While synthetic image generation has been explored to increase data diversity, existing methods rarely enforce clinical or anatomical constraints, limiting utility for improving model reliability. We propose CARPA, a clinically aware and anatomically grounded framework for synthetic chest X-ray generation that applies targeted p...
236 Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation
2603.16876
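Multi-agent RL radiology reporting: Uses multi-modal multi-agent reinforcement learning to optimize, end to end, region-wise chest X-ray interpretation and its aggregation into a generated report.
cs.CV cs.LG cs.AI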
Kaito Baba, Risa Kishikawa, Satoshi Kodera
We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointl...
237 Attention Sparsity is Input-Stable: Training-Free Sparse Attention for Video Generation via Offline Sparsity Profiling and Online QK Co-Clustering
2603.18636
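Training-free sparse attention video: Profiles sparsity patterns offline and co-clusters QK online to achieve training-free sparse attention that accelerates video generation.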
cs.CV
Jiayi Luo, Jiayu Chen, Jiankun Wang, Cong Wang, Hanxin Zhu
Diffusion Transformers (DiTs) achieve strong video generation quality but suffer from high inference cost due to dense 3D attention, motivating sparse attention techniques for improving efficiency. However, existing training-free sparse attention methods for video generation still face two unresolved limitations: ignoring layer heterogeneity in attention pruning and ignoring query-key coupling in block partitioning, which hinder a better quality-speedup trade-off. In this work, we uncover a crit...
238 Motion-o: Trajectory-Grounded Video Reasoning
2603.18856
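Trajectory-grounded video reasoning: Proposes trajectory evidence-chain representation and supervision so that video reasoning explicitly explains how objects move.
cs.CV cs.AI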
Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai, Sarah Ostadabbas
Recent video reasoning models increasingly produce spatio-temporal evidence chains that localize objects at specific timestamps. While these traces improve interpretability by grounding where and when evidence appears, they often leave the motion connecting observations, the how, implicit. This makes dynamic and trajectory-dependent claims difficult to supervise, verify, or penalize when unsupported by the video. We formalize this missing component as Spatial-Temporal-Traj...
239 SteelDefectX: A Multi-Form Vision-Language Dataset and Benchmark for Steel Surface Defect Analysis
2603.21824
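Steel defect vision-language benchmark: Releases SteelDefectX, a dataset with multi-form textual annotations, to evaluate vision-language understanding of steel surface defects.
cs.CV cs.AI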
Shuxian Zhao, Jie Gui, Baosheng Yu, Dacheng Tao
Steel surface defect analysis is critical for industrial quality control, yet existing benchmarks rely primarily on label-only annotations, limiting fine-grained semantic understanding and systematic evaluation of vision-language models. To address this gap, we introduce SteelDefectX, a vision-language dataset with multi-form textual annotations for steel surface defect analysis, comprising 7,778 images across 25 defect categories. At the class level, the dataset provides defect names, represent...
240 Automatic Image-Level Morphological Trait Annotation for Organismal Images
2604.01619
Morphological trait annotation: Uses sparse autoencoders to mine interpretable features for automatic annotation of morphological traits in organismal images.
cs.CV cs.AI
Vardaan Pahuja, Samuel Stevens, Alyson East, Sydne Record, Yu Su
Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. In this work, we demonstrate that sparse autoencoders trained on foundation-model features yield monosemantic, spatia...
241 DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
2604.02753
Open-Vocabulary Object Detection: Proposes a decoupled DETR for efficient open-vocabulary object detection.
cs.CV
Siheng Wang, Yanshu Li, Bohan Hu, Zhengdao Li, Haibo Zhan
Open-vocabulary Object Detection (OVOD) enables models to recognize objects beyond predefined categories, but existing approaches remain limited in practical deployment. On the one hand, multimodal designs often incur substantial computational overhead due to their reliance on text encoders at inference time. On the other hand, tightly coupled training objectives introduce a trade-off between closed-set detection accuracy and open-world generalization. Thus, we propose Decoupled Cognition DETR (...
242 Zero-Shot Quantization via Weight-Space Arithmetic
2604.03420
Zero-Shot Post-Training Quantization: Extracts a quantization vector via weight-space arithmetic to improve PTQ accuracy in a zero-shot manner.
cs.CV cs.LG cs.AI
Daniele Solombrino, Antonio Andrea Gargiulo, Alessandro Zirilli, Luca Zhou, Adrian Robert Minut
We show that robustness to post-training quantization (PTQ) is a transferable direction in weight space. We call this direction the quantization vector: extracted from a donor task by simple weight-space arithmetic, it can be used to patch a receiver model and improve post-PTQ Top-1 accuracy by up to 60 points in a 3-bit setting, without receiver-side quantization-aware training (QAT). Because the method requires no receiver training data, it provides a zero-shot, low-cost alternative to QAT for...
243 Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks
2604.19697
Multimodal STEM Reasoning Benchmark: Builds StepSTEM to evaluate the step-by-step interleaved reasoning chains of multimodal models.
cs.CV
Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang
Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to modality redundancy and focus mainly on final-answer accuracy, overlooking the reasoning process itself. To address this challenge, we introduce StepSTEM: a graduate-level benchma...
244 Bridging Restoration and Generation Manifolds in One-Step Diffusion for Real-World Super-Resolution
2604.24136
One-Step Diffusion Super-Resolution: Proposes IDaS-SR, a one-step diffusion framework that improves the quality and efficiency of real-world super-resolution.
cs.CV
Shyang-En Weng, Yi-Cheng Liao, Yu-Syuan Xu, Wei-Chen Chiu, Ching-Chun Huang
Pretrained diffusion models have revolutionized real-world image super-resolution (Real-ISR) but suffer from computational bottlenecks due to iterative sampling. Recent single-step distillation accelerates inference but faces a stark perception-distortion trade-off due to rigid timestep initialization, distributional trajectory mismatches, and fragile stochastic modulation. To address this, we present Adaptive Inversion and Degradation-aware Sampling for Real-ISR (IDaS-SR), a one-step framework ...
245 Benchmarking and Improving GUI Agents in High-Dynamic Environments
2604.25380
GUI Agent Benchmarking: Benchmarks and improves agent decision-making and observation in high-dynamic GUI environments.
cs.CV
Enqi Liu, Liyuan Pan, Zhi Gao, Yan Yang, Chenrui Shi
Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents typically rely on a single screenshot after each action for decision-making, leading to a partially observable (or even unobservable) Markov decision process, where the key GUI state including important information for a...
246 Instruction-Evidence Contrastive Dual-Stream Decoding for Grounded Vision-Language Reasoning
2604.25809
Grounded Vision-Language Decoding: Proposes contrastive dual-stream decoding to balance instruction following with grounding in visual evidence.
cs.CV
Yashwant Pravinrao Bangde, Debaditya Roy
Vision-Language Models (VLMs) exhibit strong performance in instruction following and open-ended vision-language reasoning, yet they frequently generate fluent outputs that are weakly grounded in visual evidence. Prior works have shown that instruction prompting further worsens this issue by amplifying language priors, especially when the visual signal is uncertain or ambiguous. To address this challenge, we propose a decoding framework that explicitly balances linguistic informativeness and vis...
247 Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
2605.00814
Visual Memory for LVLMs: Proposes the PVM module to mitigate visual signal dilution and sustain access to image evidence.
cs.CV cs.AI
Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to strengthen sustained, on-demand access to visual evidence. Integrated a...
248 SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources
2605.01459
Lightweight Image Super-Resolution: Uses nonlinear operators to strengthen the expressivity and detail reconstruction of low-resource super-resolution models.
cs.CV cs.AI
Roberto Isai Navaro-Aviña, Eduardo Said Merin-Martinez, Andres Mendez-Vazquez, Eduardo Rodriguez-Tello
Single-Image Super-Resolution (SISR) aims to reconstruct a High-Resolution (HR) image from a Low-Resolution (LR) observation, a fundamentally ill-posed problem where high-frequency details are severely degraded at large upscaling factors. Recent advances have been driven by transformer-based architectures and diffusion models, which improve global context modeling and perceptual quality at the cost of increased computational complexity. In contrast, this work focuses on enhancing the expressivity of lo...
249 Super-Resolution of Airborne Laser Scanning Point Clouds for Forest Inventory
2605.02201
Point Cloud Super-Resolution: Proposes 3DFSR to increase the density of forest ALS point clouds while reducing noise.
cs.CV
Jinyuan Shao, Sangyoong Park, Chunxi Zhao, Ayman Habib, Songlin Fei
Airborne Laser Scanning (ALS) can collect point clouds across large areas, enabling large-scale forest inventory. However, ALS point clouds are sparse and noisy, resulting in inaccurate individual-tree-level forest inventory, such as stem localization and tree size estimation. To overcome this problem, we propose a deep learning model, 3D Forest Super Resolution (3DFSR), to simultaneously improve point density and reduce noise for ALS forest point cloud. 3DFSR is a voxel-based CNN with a U-Net a...
250 Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score
2605.02206
Multimodal Machine Unlearning Metrics: Systematically analyzes conflicts among multimodal unlearning evaluation metrics and proposes a unified score.
cs.CV cs.LG
Abdullah Ahmad Khan, Hamid Laga, Ferdous Sohel
Machine unlearning in Vision-Language Models (VLMs) is required for compliance with the General Data Protection Regulation (GDPR), yet current evaluation practices are inconsistent. We present the first systematic study of metric reliability in multimodal unlearning. Five standard metrics, Forget Accuracy (FA), Retain Accuracy (RA), Membership Inference Attack (MIA), Activation Distance (AD), and JS divergence (JS), yield conflicting method rankings across three VQA benchmarks (MLLMU-Bench, UnLO...
251 Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis
2605.02357
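Point Cloud Feature Aggregation: Proposes channel-level relations and attentive aggregation with a neighborhood-homogeneity constraint to improve point cloud representations.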
cs.CV
Jiaqi Shi, Jin Xiao, Xiaoguang Hu, Wenxuan Ji, Zichong Jia
In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. Existing methods explore feature correlation discrimination but are limited to point-level spatial distribution or channel responses, enabling only coarse-grained level evaluation. For modern multi-scale point cloud networks, such coarse-grained metrics ...
252 Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
2605.04035
3D Gaussian Head Reconstruction: Proposes HeadsUp for fast reconstruction of high-quality 3D Gaussian heads from multi-view captures.
cs.CV, cs.LG
Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev
We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, en...
253 Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response
2605.05405
Zero-Shot Satellite Image Retrieval: Proposes GeoQuery, which uses joint embeddings for zero-shot natural-language satellite image retrieval.
cs.CV
James Walsh, William Fawcett, Grace Colvard, Raúl Ramos-Pollán
Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constrain...
254 EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation
2605.05674
OOD-Robust Vector Search Adapters: Proposes EGA, a residual adapter that reduces vector search degradation on out-of-distribution queries.
cs.CV, cs.LG, cs.AI
Dongfang Zhao
Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local...
255 VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
2605.05848
Efficient Long-Video Understanding: Proposes VideoRouter, which compresses long-video visual tokens through query-adaptive routing.
cs.CV, cs.AI
Kuanwei Lin, Wenhao Zhang, Ge Li
Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long visual-token sequences, which sharply increase memory and latency during inference. While existing compression methods are effective in specific settings, most are either weakly query-aware or apply a fixed compression policy across frames, proving suboptimal when visual evidence is unevenly distributed over time. To address this, we present VideoRouter, a query-adaptive dual-router fra...
256 VISD: Enhancing Video Reasoning via Structured Self-Distillation
2605.06094
Video Reasoning Self-Distillation: Proposes VISD, a structured self-distillation method that provides fine-grained supervision for video reasoning.
cs.CV, cs.AI
Hao Lin, Kunyang Lv, Xu Jiang, Jingqi Tian, Zhongjing Du
Training VideoLLMs for complex reasoning remains challenging due to sparse sequence level rewards and the lack of fine grained credit assignment over long, temporally grounded reasoning trajectories. While reinforcement learning with verifiable rewards (RLVR) provides reliable supervision, it fails to capture token level contributions, leading to inefficient learning. Conversely, existing self distillation methods offer dense supervision but lack structure and diagnostic specificity, and often i...
257 Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation
2605.06173
Retinal Diagnosis and Report Generation: Proposes Retina-RAG for joint retinal grading, detection, and clinical report generation.
cs.CV, cs.AI
Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier
Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture decouples a high-performance retinal classifier and a parameter-efficient vision-language model (Qwen2.5-VL-...
258 Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency
2605.06280
Diffusion-Based Image Animation: Uses adjacent-frame Eulerian motion guidance for more robust, controllable image animation.
cs.CV
Thong Nguyen, Khoi M. Le, Cong-Duy Nguyen, Luu Anh Tuan, See-Kiong Ng
Recent advancements in image animation have utilized diffusion models to breathe life into static images. However, existing controllable frameworks typically rely on Lagrangian motion guidance, where optical flow is estimated relative to the initial frame. This paper revisits the same optical-flow primitive through a more local supervision design: we use adjacent-frame Eulerian motion fields to guide generation, where the motion signal always describes a short temporal hop. This shift enables pa...
259 Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
2605.06298
Weight-Space World Models: Proposes NOVA, which represents state as the weights of an implicit network and renders predicted video.
cs.CV, cs.AI
Roussel Desmond Nzoyem, Mauro Comi
Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural re...
260 NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps
2605.06317
Top-Down Vision-Language Navigation: Recasts VLN as one-step global path planning on top-down maps to reduce accumulated error.
cs.CV, cs.AI
Dijia Zhan, Jinyi Li, Chenxi Zheng, Shaoyu Huang, Yong Li
Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path plann...
261 Multispectral Indices for Wildfire Management
2309.01751
Multispectral Wildfire Monitoring: Evaluates how well multispectral indices extract information for wildfire monitoring and management.
cs.CV
Afonso Oliveira, João P. Matos-Carvalho, Filipe Moutinho, Nuno Fachada
The increasing frequency and severity of wildfires necessitates advanced methods for effective surveillance and management, as traditional ground-based techniques often struggle to adapt to rapidly changing fire behavior and environmental conditions. This study investigates the use of multispectral aerial and satellite imagery for wildfire management through an assessment of current literature and two practical case studies. We evaluate several multispectral indices for their ability to extract ...
262 Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise
2408.09929
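Contrastive Learning Theory: Characterizes, from an information-theoretic perspective, contrastive learning augmentation as equivalent to estimating positive-incentive noise.
cs.CV, cs.LG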
Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li
Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $\pi$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the information theory framework, we properly define the task entropy, the core concept of $\pi$-noise, ...
263 RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
2505.13289
Symmetry Discovery and Canonicalization: Proposes RECON, which explicitly normalizes orientation to robustly discover instance-specific symmetries.
cs.CV, cs.LG
Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta
Real world data often exhibits unknown, instance-specific symmetries that rarely exactly match a transformation group $G$ fixed a priori. Class-pose decompositions aim to create disentangled representations by factoring inputs into invariant features and a pose $g\in G$ defined relative to a training-dependent, arbitrary canonical representation. We introduce RECON, a class-pose agnostic canonical orientation normalization that corrects arbitrary canonicals via a simple right translation, yieldi...
264 Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics
2509.08461
VLMs for Neutrino Classification: Fine-tunes a vision-language model to classify neutrino events in high-energy physics detector images.
cs.CV, cs.LG, cs.AI
Dikshant Sagar, Kaiwen Yu, Alejandro Yankelevich, Jianming Bian, Pierre Baldi
Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the applications of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMA 3.2 to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neur...
265 Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models
2509.25584
Layer Skipping in VLMs: Gives theoretical conditions under which vision-language model layers can be skipped to reduce inference cost.
cs.CV, cs.CL, cs.LG, cs.AI
Max Hartman, Vidhata Jayaraman, Moulik Choraria, Akhil Bhimaraju, Lav R. Varshney
Vision-language models achieve incredible performance across a wide range of tasks, but their large size makes inference costly. Recent work has shown that multimodal processing contains significant redundancies, making it possible to skip certain layers with minimal performance loss. Yet current pruning techniques remain ad-hoc, relying on heuristics or hyperparameter sweeps rather than principled criteria for determining when layer skipping is beneficial. In this paper, we propose a unified fr...
266 Frequency-Aware Model Parameter Explorer: A new attribution method for improving explainability
2510.03245
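Frequency-Domain Model Attribution: Proposes a frequency-aware parameter attribution method that uses spectral-domain perturbations to improve explainability.
cs.CV, cs.LG, cs.AI
Ali Yavari, Alireza Mohamadi, Elham Beydaghi, Philipp Seeböck, Rainer A. Leitgeb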
State-of-the-art attribution methods rely on adversarial sample generation that applies an all-pass filter across the frequency spectrum, discarding fine-grained high-frequency information that is demonstrably important for accurate feature attribution in deep neural networks. By generating adversarial samples that selectively perturb high- and low-frequency components, we can probe which spectral features a model relies on most -- directly translating frequency-domain exploration into attributi...
267 Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis
2511.09907
Reasoning-Driven Data Synthesis: Proposes solver-adaptive, reasoning-driven problem generation to synthesize high-value training data.
cs.CV, cs.AI
Yongxian Wei, Yilin Zhao, Zixuan Hu, Li Shen, Xinrui Chen
Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a prob...
268 Saving Foundation Flow-Matching Priors for Inverse Problems
2511.16520
Flow-Matching Priors for Inverse Problems: Proposes FMPlug, which strengthens FM priors for solving inverse problems through warm-starting and regularization.
cs.CV, cs.LG
Yuxiang Wan, Ryan Devera, Wenjie Zhang, Ju Sun
Foundation flow-matching (FM) models promise a universal prior for solving inverse problems (IPs), yet today they trail behind domain-specific or even untrained priors. How can we unlock their potential? We introduce FMPlug, a plug-in framework that redefines how foundation FMs are used in IPs. FMPlug combines an instance-guided, time-dependent warm-start strategy with a sharp Gaussianity regularization, adding problem-specific guidance while preserving the Gaussian structures. This leads to a s...
269 Large Video Planner Enables Generalizable Robot Control
2512.15840
Video-Based Robot Planning: Proposes a large video planner to improve the multi-task generalization of robot control.
cs.CV
Boyuan Chen, Tianyuan Zhang, Haoran Geng, Caiyi Zhang, Peihao Li
General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-...
270 DisCo-FLoc: Semantic-Free Floorplan Localization via $SE(2)$-Aware Contrastive Disambiguation
2601.01822
Floorplan Localization Contrastive Learning: Proposes DisCo-FLoc, which uses SE(2)-aware contrastive disambiguation for semantic-free floorplan localization.
cs.CV
Ping Zhong, Shiyong Meng, Bolei Chen, Tao Zou, Chaoxu Mu
Visual Floorplan Localization (FLoc) struggles with severe structural aliasing caused by repetitive minimalist layouts. This occurs because physically distant poses share highly similar visual-geometric features, which degrades spatial separability and angular discriminability. While existing methods attempt to mitigate these ambiguities by relying on costly semantic annotations, the resulting performance gains remain inherently limited. To address the above issues, we propose DisCo-FLoc, a sema...
271 DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training
2601.15127
Federated NAS for IoT: Proposes a hardware-aware FedNAS framework that adapts network architectures to heterogeneous IoT devices.
cs.CV, cs.LG
Bostan Khan, Masoud Daneshtalab
Deploying federated learning across heterogeneous IoT device fleets requires tailored neural network architectures for each device class, yet existing Federated Neural Architecture Search (FedNAS) methods suffer from unguided supernet training and prohibitively costly post-training search pipelines that demand over 20 GPU-hours per deployment target. We introduce DeepFedNAS, a two-phase framework built on a multi-objective fitness function that synthesizes information-theoretic network metrics w...
272 Lossy Common Information in a Learnable Gray-Wyner Network
2601.21424
Learnable Gray-Wyner Codec: Designs a learnable three-channel Gray-Wyner codec that separates shared from task-specific information across multiple tasks.
cs.CV, cs.LG
Anderson de Andrade, Alon Harell, Ivan V. Bajić
Many computer vision tasks share substantial overlapping information, yet conventional codecs tend to ignore this, leading to redundant and inefficient representations. The Gray-Wyner network, a classical concept from information theory, offers a principled framework for separating common and task-specific information. Inspired by this idea, we develop a learnable three-channel codec that disentangles shared information from task-specific details across multiple vision tasks. We characterize the...
273 Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts
2602.03473
Continual Learning MoE Routing: Proposes a bi-level routing MoE method for continual learning that scales to 300+ tasks while balancing stability and plasticity.
cs.CV, cs.LG
Meng Lou, Yunxiang Fu, Yizhou Yu
Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in recent years. However, how to effectively learn both discriminative and comprehensive feature representations while maintaining stability and plasticity over very long task sequences remains an open problem. We propose CaRE, a scalable {C}ontinual Le{a}rner with efficient Bi-Level {R}outing Mixture-of-{E}xperts (BR-MoE). The core idea of BR-MoE ...
274 Direction-Flipped Influence Audits Reveal Hidden Structure in Moral Choices of LLMs
2602.22831
LLM Moral Choice Auditing: Uses direction-flipped prompt audits to reveal how sensitive LLM moral choices are to small cues.
cs.CV, cs.CL, cs.LG, cs.AI
Phil Blandfort, Tushar Karayil, Alex McKenzie, Urja Pawar, Robert Graham
Moral benchmarks for LLMs typically score models on context-free prompts, implicitly treating the measured choice rate as stable. We test this assumption with a direction-flipped influence audit: for each scenario, we compare a baseline prompt with matched cues steering toward option A or option B. Across a trolley-problem-style moral triage task, BBQ, and DailyDilemmas, and across five LLM families with and without reasoning, short contextual cues shift per-condition choice rates by 12-18 perce...
275 3D tomography of exchange phase in a Si/SiGe quantum dot device
2603.16025
Quantum Dot Exchange Tomography: Performs 3D tomographic reconstruction of the exchange phase in a Si/SiGe quantum dot device to estimate J(V).
cs.CV
Dylan Albrecht, Sarah Thompson, N. Tobias Jacobson, Ryan Jock
The exchange interaction is a foundational building block for the operation of spin-based quantum processors. Extracting the exchange interaction coefficient $J(\mathbf{V})$, as a function of gate electrode voltages, is important for understanding disorder, faithfully simulating device performance, and operating spin qubits with high fidelity. Typical coherent measurements of exchange in spin qubit devices yield a modulated cosine of an accumulated phase, which in turn is the time integral of ex...
276 Drifting Fields are not Conservative
2604.06333
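Nonconservative Drift Fields: Shows that the drift fields of drifting generative models are generally non-conservative and cannot be recast as optimizing a scalar loss.
cs.CV, cs.LG
Leonard T. Franz, Sebastian Hoffmann, Tim Weiland, Bernhard Schölkopf, Georg Martius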
Drifting models have recently gained attention for generating high-quality samples in a single forward pass. During training, they learn a push-forward map by following a vector-valued field, the drift field. We ask whether this procedure is equivalent to optimizing a scalar loss and find that, in general, it is not: drift fields are not conservative and cannot be written as the gradient of any scalar potential. We identify the position-dependent normalization as the source of non-conservatism, ...
277 3D Generation for Embodied AI and Robotic Simulation: A Survey
2604.26509
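3D Generation Survey Robotics: Surveys 3D generation techniques for embodied AI and robotic simulation, along with their interaction and physics requirements.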
cs.CV
Tianwei Ye, Yifan Mao, Minwen Liao, Jian Liu, Chunchao Guo
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This surve...
278 Affordance Agent Harness: Verification-Gated Skill Orchestration
2605.00663
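Affordance Skill Orchestration: Proposes a verification-gated skill orchestration framework that schedules skills by per-instance difficulty and corrects errors in affordance reasoning.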
cs.CV
Haojian Huang, Jiahao Shi, Yinchuan Li, Yingcong Chen
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous. Recent systems therefore combine multiple skills (e.g., detection, segmentation, interaction-imagination), yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty, offer limited targeted recovery from intermediate errors, and fail to reuse experience from recurring o...
279 3DSS: 3D Surface Splatting for Inverse Rendering
2605.05876
Differentiable Surface Splatting: Proposes a differentiable surface splatting renderer for multi-view, physically based inverse rendering and layered compositing.
cs.CV
Mae Younes, Adnane Boukhayma
We present 3D Surface Splatting (3DSS), the first differentiable surface splatting renderer for physically-based inverse rendering from multi-view images. Our central insight is that the surface separation problem at the heart of surface splatting admits a direct formulation in terms of the reconstruction kernels themselves. From this foundation we derive a coverage-based compositing model whose per-layer opacity arises directly from the accumulated Elliptical Weighted Average reconstruction wei...
280 SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders
2605.06610
Adaptive Sparse Autoencoders: Proposes a dynamic Top-K sparse autoencoder that adaptively selects the number of active features per input.
cs.CV, cs.LG
Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek
Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features ...
cs.LG 449 papers
492 A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
2605.06678
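WGAN Climate Scenario Generation: Uses a Wasserstein GAN to generate soil-subsidence-related climate scenarios in support of insurance risk management.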
cs.LG
Antoine Heranval (BioSP), Olivier Lopez (CREST), Didier Ngatcha (CREST), Daniel Nkameni (CREST)
According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70--80 billion USD between 1970 and 2000 to 180--200 billion USD between 2001 and 2020. Reports from organizations such as the IFOA and the WWF highlight the need for the insurance sector to adapt to this rapidly evolving context by developing medium- to long-term strategies that go beyond the one-year horizon of prudential regulations such as Solvency II. Thi...
493 Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
2605.06679
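VLM Hallucination-Mitigating Decoding: Proposes training-free PND decoding, which contrasts positive and negative paths to strengthen visual consistency and reduce hallucination.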
cs.LG
Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng
Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a...
494 From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes
2605.06684
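Tree-Collision Crash Risk Modeling: Uses a hybrid predictive framework to quantify key risk factors for the severity of tree-involved traffic crashes.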
cs.LG
Abdul Azim, Ahmed Hossain, Soumyadip Maitra, Panick Kalambay
Tree-involved crashes represent a critical subset of run-off-road (ROR) collisions, often resulting in fatal or severe injuries due to high-energy impacts. This study develops a comprehensive analytical framework to identify and quantify risk factors contributing to crash severity in tree-involved collisions using the Crash Report Sampling System (CRSS) database spanning 2020-2023. The modeling framework follows a multi-step process. First, a machine learning based classification model (CatBoost...
495 Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices
2605.06686
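Off-Policy Evaluation Robustness for Refugee Matching: Compares multiple off-policy evaluation methods to test the robustness of conclusions about refugee-matching gains in the United States.
cs.LG
Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler, Jeremy Ferwerda, Jens Hainmueller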
Previous research has investigated the potential of refugee matching for boosting refugee outcomes, first considered by Bansak et al. (2018). This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. In order to estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and m...
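As a rough illustration of the inverse probability weighting (IPW) estimator named in this abstract, the sketch below estimates the mean outcome under a target assignment rule from logged data. It assumes a deterministic target policy and known logging propensities; the function name `ipw_value` and the toy data are made up for illustration and are not taken from the paper.

```python
import numpy as np

def ipw_value(actions, outcomes, propensities, target_actions):
    """IPW estimate of the mean outcome under a target assignment rule.

    actions:        assignment each unit actually received (logged data)
    outcomes:       observed outcome for each unit
    propensities:   probability the logging policy gave the observed assignment
    target_actions: assignment the evaluated policy would have made
    """
    actions = np.asarray(actions)
    outcomes = np.asarray(outcomes, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    # Keep only units whose logged assignment agrees with the target policy,
    # and up-weight them by the inverse of how likely that assignment was.
    agrees = (actions == np.asarray(target_actions)).astype(float)
    return float(np.mean(agrees * outcomes / propensities))

# Toy data (hypothetical): two locations, uniformly random logging policy.
logged_actions = [0, 1, 0, 1]
logged_outcomes = [1.0, 0.0, 0.0, 1.0]
logged_propensities = [0.5, 0.5, 0.5, 0.5]
target = [0, 0, 1, 1]

estimate = ipw_value(logged_actions, logged_outcomes, logged_propensities, target)
print(estimate)
```

The estimator is unnormalized, so on small samples it can fall outside the range of the observed outcomes; self-normalized variants divide by the sum of the weights instead of the sample size.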
496 Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion
2605.06720
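Discrete-diffusion antibody sequence generation: generates antibody sequences with classifier-guided discrete diffusion that absorbs germline bias to model somatic variation.
cs.LG cs.AI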
Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez
Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible c...
497 Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning
2605.06724
Unsupervised EEG denoiser training: uses intelligent partitioning to train deep wearable-EEG denoisers without clean labels.
cs.LG cs.AI
Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic
Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, w...
498 Transformer-Based Wildlife Species Classification from Daily Movement Trajectories
2605.06726
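Trajectory Transformer species identification: uses Transformers to classify wildlife species from daily GPS movement trajectories and generalize across regions.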
cs.LG
Obed Irakoze, Prasenjit Mitra
Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy with gains of ...
499 Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training
2605.06727
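Cold-atom reservoir computing for medical imaging: combines auto-encoders with neutral-atom reservoir computing for polyp detection, trained with surrogate gradients.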
cs.LG
Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo
We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurement...
500 The E$\Delta$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
2605.06729
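Geometric Transformer with orthogonal residuals: uses a data-dependent Cayley transform to build a Transformer whose adaptive residual connections are always orthogonal.
cs.LG cs.AI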
Arash Shahmansoori
We present the E$\Delta$-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform to obtain input-adaptive, unconditionally orthogonal residual connections. Unlike DDL, whose Householder operator is orthogonal only at $\beta \in \{0,2\}$, our Data-Dependent Cayley rotation $Q(x)=(I+(\beta/2)A(x))^{-1}(I-(\beta/2)A(x))$ preserves orthogonality for all $\beta$ and all inputs. To handle negation, an eige...
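The orthogonality claim in this abstract can be checked numerically. The sketch below assumes that $A(x)$ is skew-symmetric (the standard condition under which a Cayley transform is orthogonal) and, for contrast, assumes the Householder-type operator has the generalized form $I - \beta k k^\top$ with unit $k$. Both assumptions, and all function names, are illustrative rather than taken from the paper.

```python
import numpy as np

def cayley_rotation(A, beta):
    """Q = (I + (beta/2) A)^{-1} (I - (beta/2) A) for a skew-symmetric A."""
    I = np.eye(A.shape[0])
    S = 0.5 * beta * A
    # Solve (I + S) Q = (I - S) rather than forming an explicit inverse.
    return np.linalg.solve(I + S, I - S)

def householder(k, beta):
    """Generalized Householder operator I - beta * k k^T with unit-norm k (assumed form)."""
    k = k / np.linalg.norm(k)
    return np.eye(k.size) - beta * np.outer(k, k)

def orthogonality_error(Q):
    """Largest absolute entry of Q^T Q - I."""
    return float(np.max(np.abs(Q.T @ Q - np.eye(Q.shape[0]))))

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                      # skew-symmetric by construction
k = rng.standard_normal(4)

for beta in (0.0, 0.7, 2.0, 5.0):
    print(beta,
          orthogonality_error(cayley_rotation(A, beta)),
          orthogonality_error(householder(k, beta)))
```

For skew-symmetric $S$, the factors $I+S$ and $I-S$ commute and $I+S$ is always invertible, which is why the Cayley error stays at round-off level for every $\beta$, while the Householder error is $|\beta^2 - 2\beta|$ times the largest entry of $k k^\top$ and vanishes only at $\beta \in \{0, 2\}$.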
501 Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics
2605.06730
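Semantic state abstraction interface for news: decomposes news text into auditable multi-axis states for LLM-augmented investment decisions and diagnostics.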
cs.LG
Likhita Yerra (AIVANCITY School of AI and Data), Remi Uttejitha Allam (AIVANCITY School of AI and Data)
We introduce Semantic State Abstraction Interfaces (SSAI): a methodological template for mapping sparse unstructured text into $K$ auditable, named coordinates with neutral defaults on no-news days, designed to separate representation hypotheses from optimisation variance in sequential decision systems. Our contribution is the framework and its evaluation protocol, not a claim that SSAI outperforms denser alternatives. We instantiate SSAI with $K=4$ axes (sentiment, risk, confidence, volatility ...
502 On Training in Imagination
2605.06732
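Analysis of model-based RL with imagined rollouts: analyzes how errors affect returns and optimization when training in imagination with learned dynamics and reward models.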
cs.LG
Nadav Timor, Ravid Shwartz-Ziv, Micah Goldblum, Yann LeCun, David Harel
State-of-the-art model-based reinforcement learning methods train policies on imagined rollouts. These rollouts are trajectories generated by a learned dynamics model and are scored by a learned reward model, but without querying the true environment during policy updates. We study this training paradigm by quantifying how errors in learned dynamics and reward models affect returns and policy optimization. First, we extend the analysis of Asadi et al. (2018) to MDPs with learned reward models, a...
503 Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA
2605.06733
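Gauge-invariant aggregation for federated LoRA: proposes gauge-aware low-rank server representations to address aggregation bias caused by equivalent factorizations in federated LoRA.
cs.LG cs.AI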
Jinqian Chen, Chang Liu, Jihua Zhu
Federated LoRA enables parameter-efficient adaptation of large language models under decentralized data and limited client resources. However, directly averaging LoRA factors is representation-dependent: the same intrinsic update admits infinitely many gauge-equivalent factorizations, so factor-level aggregation can change under arbitrary coordinate choices while the underlying update remains unchanged. This reveals a semantic mismatch in existing federated LoRA aggregation rules. We propose \tex...
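The representation dependence described in this abstract can be reproduced in a few lines. The sketch below uses two hypothetical clients with random rank-2 factors, rewrites one client's update as $(B G)(G^{-1} A)$ for an invertible $G$, and compares factor-level averaging with averaging of the full updates. It illustrates only the mismatch, not the aggregation rule the paper proposes; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 5, 2                      # hypothetical layer shape and LoRA rank

# Two hypothetical clients, each holding a rank-r update B_i @ A_i.
B1, A1 = rng.standard_normal((d, r)), rng.standard_normal((r, k))
B2, A2 = rng.standard_normal((d, r)), rng.standard_normal((r, k))

# Same update for client 1 in a different gauge: (B1 G)(G^{-1} A1) == B1 A1.
G = np.array([[2.0, 1.0], [0.0, 0.5]])  # any invertible r x r matrix
B1g, A1g = B1 @ G, np.linalg.solve(G, A1)

def factor_average(Bs, As):
    """Average the B factors and the A factors separately, then multiply."""
    return np.mean(Bs, axis=0) @ np.mean(As, axis=0)

def update_average(Bs, As):
    """Average the full updates B_i @ A_i."""
    return np.mean([B @ A for B, A in zip(Bs, As)], axis=0)

same_update = np.allclose(B1 @ A1, B1g @ A1g)
factor_shift = np.linalg.norm(
    factor_average([B1, B2], [A1, A2]) - factor_average([B1g, B2], [A1g, A2]))
update_shift = np.linalg.norm(
    update_average([B1, B2], [A1, A2]) - update_average([B1g, B2], [A1g, A2]))

print(same_update, factor_shift, update_shift)
```

Averaging the full updates is gauge-invariant but generally has rank up to the sum of the client ranks, so it does not by itself give a rank-r server representation.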
504 Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning
2605.06734
Quantum-inspired fast-weight sequence learning: proposes a scalable gated QKAN-FWP for quantum-inspired sequence modeling and fast-weight updates.
cs.LG cs.AI
Kuo-Chung Peng, Samuel Yen-Chi Chen, Jiun-Cheng Jiang, Chen-Yu Liu, En-Jui Kuo
Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov...
505 STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification
2605.06736
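Cross-domain adaptation for sleep staging: proposes STDA-Net, which uses unsupervised domain adaptation on spectrogram inputs for cross-dataset sleep staging.
cs.LG cs.AI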
Unaza Tallal, Shruti Kshirsagar, Ankita Shukla
Accurate sleep stage classification across datasets remains challenging due to variability in EEG channel montages, sampling rates, recording environments, and subject populations. Although deep learning has shown considerable promise for automated sleep staging, most existing cross-dataset methods rely on one-dimensional EEG signal representations, whereas the use of two-dimensional spectrogram-based inputs within an unsupervised domain adaptation framework has remained largely unexplored. Here...
506 Geometric Kolmogorov–Arnold Network (GeoKAN)
2605.06740
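Geometry-aware KAN model: proposes GeoKAN, which learns a Riemannian metric to warp coordinates before basis expansion, strengthening the geometric inductive bias.
cs.LG cs.AI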
Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar
We introduce Geometric Kolmogorov–Arnold Networks (GeoKANs), a family of geometry-aware KAN-type models in which approximation is carried out in learned, geometry-adapted coordinates rather than in fixed Euclidean input coordinates. GeoKAN achieves this by learning a diagonal Riemannian metric that warps the input before basis expansion and feature mixing. The learned metric provides a geometric inductive bias through local length scaling and volume distortion, and in physics-informed settings ...
507 A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
2605.06741
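Learning-rate upper bound in belief space: derives a closed-form upper bound on the learning-rate step for contractive updates under the KL geometry of the probability simplex.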
cs.LG
Zixi Li, Youzhen Li
Learning-rate steps are usually treated as hyperparameters. This paper isolates a local belief-space calculation: when an update is modeled as a projected forward step on the probability simplex, admissibility means contractivity in the natural KL/Bregman geometry. Under this model, the upper bound of an admissible step is not a tuning slogan but a formula.
508 Gradient Extrapolation-Based Policy Optimization
2605.06755
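Gradient-extrapolation policy optimization: proposes GXPO, which approximates multi-step lookahead via gradient extrapolation to improve GRPO-style reasoning RL updates at low cost.
cs.LG cs.AI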
Ismam Nur Swapnil, Aranya Saha, Tanvir Ahmed Khan, Mohammad Ariful Haque, Ser-Nam Lim
Reinforcement learning is widely used to improve the reasoning ability of large language models, especially when answers can be automatically checked. Standard GRPO-style training updates the model using only the current step, while full multi-step lookahead can give a better update direction but is too expensive because it needs many backward passes. We propose Gradient Extrapolation-Based Policy Optimization (GXPO), a plug-compatible policy-update rule for GRPO-style reasoning RL. GXPO approxi...
509 Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning
2605.06756
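Active learning for thermal-energy digital twins: uses active learning to couple Modelica simulations with several classes of surrogate models and build uncertainty-aware digital twins.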
cs.LG
Umme Mahbuba Nabila, Paul Seurin, Linyu Lin, Majdi I. Radaideh
Real-time supervisory control of thermal energy distribution systems requires digital twins that are accurate, interpretable, and uncertainty-aware, yet remain data and computationally efficient. High-fidelity simulations alone are costly, while purely data-driven surrogates often lack robustness. To address these challenges, this work proposes an active learning (AL) framework that couples system-level Modelica simulations with four simpler physics-informed and data-driven surrogate modeling ap...
510 Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
2605.06763
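Sparse-attention index for the KV cache: casts sparse attention as range searching to build an inference-efficient index that avoids missing critical KV entries.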
cs.LG
Mohsen Dehghankar, Abolfazl Asudeh
Sparse attention improves LLM inference efficiency by selecting a subset of key-value entries, but at the cost of potential accuracy degradation. In particular, omitting critical KV entries can induce substantial errors in model outputs. Existing methods typically operate under fixed or adaptive token budgets and provide empirical robustness or partial theoretical guarantees, yet they do not ensure zero false negatives in decoding steps, particularly since the set of relevant tokens is both quer...
511 Revisiting Adam for Streaming Reinforcement Learning
2605.06764
Adam optimization for streaming RL: re-examines the mechanism behind Adam's stable updates in streaming reinforcement learning.
cs.LG cs.AI
Florin Gogianu, Adrian Catalin Lutu, Razvan Pascanu
Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through el...
512 Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport
2605.06785
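PRM calibration and optimal transport: uses conditional optimal transport to calibrate the success-probability predictions of process reward models.
cs.LG cs.AI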
Rachel Ma, Dylan Hadfield-Menell, Kristjan Greenewald
Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs, modifying conditional OT (CondOT) map learning \cite{bunne2022supervised} to estimate a monotonic conditional quantile function over success probabilities estimated by the PRM, conditioned on PRM hidden states. This yields structurally valid quantile estimat...
513 Conformal Agent Error Attribution
2605.06788
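Multi-agent error attribution: uses conformal prediction to provide error localization with guarantees for multi-agent interaction traces.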
cs.LG
Naihe Feng, Yi Sui, Shiyi Hou, Ga Wu, Jesse C. Cresswell
When multi-agent systems (MAS) fail, identifying where the decisive error occurred is the first step for automated recovery to an earlier state. Error attribution remains a fundamental challenge due to the long interaction traces that large language model-based MAS generate. This paper presents a framework for error attribution based on conformal prediction (CP) which provides finite-sample, distribution-free coverage guarantees. We introduce new algorithms for filtration-based CP designed for s...
514 MIND: Monge Inception Distance for Generative Models Evaluation
2605.06797
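Evaluation metric for generative models: proposes MIND, which uses the sliced Wasserstein distance to improve on the evaluation reliability of FID.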
cs.LG
Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff
We propose the Monge Inception Distance (MIND), a metric for evaluating generative models that addresses key limitations of the widely adopted Fréchet Inception Distance (FID). The MIND metric leverages the sliced Wasserstein distance to compare distributions by averaging one-dimensional optimal transport distances, efficiently computed via sorting. This approach circumvents the estimation of high-dimensional means and covariance matrices, which underlie FID's poor sample complexity and vulner...
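The "averaging one-dimensional optimal transport distances, efficiently computed via sorting" step can be sketched as a generic Monte Carlo sliced Wasserstein estimator. This is not the MIND metric itself: the feature extractor is omitted, and the equal sample sizes, number of projections, and function name are assumptions made for the example.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=128, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Assumes X and Y are (n, d) arrays with the same number of rows n.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random unit directions on the sphere in R^d.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction, then sort: for equal-size 1D samples,
    # matching sorted values is the optimal transport coupling.
    Xp = np.sort(X @ theta.T, axis=0)
    Yp = np.sort(Y @ theta.T, axis=0)
    return float(np.mean((Xp - Yp) ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))

same_dist = sliced_wasserstein2(X, X)        # identical samples
shifted = sliced_wasserstein2(X, X + 3.0)    # every coordinate shifted by 3
print(same_dist, shifted)
```

Each projection costs one matrix-vector product and one sort, so no d-by-d covariance matrix is ever estimated.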
515 From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning
2605.06814
Distilling GNNs into data: proposes M2D, which shifts GNN complexity into the graph data to improve interpretability.
cs.LG
Debolina Halder Lina, Arlei Silva
Graph Neural Networks (GNNs) achieve high performance but can be opaque to humans, making it difficult to understand and compare the many proposed architectures. While existing explainability methods attribute individual predictions to nodes, edges, or features, they do not provide architectural transparency or explain the fundamental performance gap between simple and more complex models. To address this limitation, we introduce Model-to-Data (M2D) distillation, a new framework that increases t...
516 A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning
2605.06819
Theory of online CoT learning: establishes an online-learning and mistake-bound theory for autoregressive chain-of-thought maps.
cs.LG
Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel
Autoregressive generation lies at the heart of the mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (prompt), the generator is applied for $M$ steps, and the last generated token is taken as the final output. [Joshi et al., 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process. We develop an online analogue of this framework, focusing on the mistake bound...
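The input-output map described in this abstract (apply a next-token generator for $M$ steps and keep the last generated token) can be written as a short loop. The toy generator below is hypothetical and stands in for a language model; the sketch assumes $M \geq 1$.

```python
from typing import Callable, Sequence, Tuple

Token = int

def autoregressive_output(
    next_token: Callable[[Tuple[Token, ...]], Token],
    prompt: Sequence[Token],
    M: int,
) -> Tuple[Token, Tuple[Token, ...]]:
    """Apply next_token M times (M >= 1); return the last generated token and the full sequence."""
    seq = tuple(prompt)
    for _ in range(M):
        # Each step conditions on the prompt plus everything generated so far.
        seq = seq + (next_token(seq),)
    return seq[-1], seq

def fib_mod10(seq: Tuple[Token, ...]) -> Token:
    """Toy generator (illustrative only): sum of the last two tokens, modulo 10."""
    return (seq[-1] + seq[-2]) % 10

output, sequence = autoregressive_output(fib_mod10, prompt=(1, 1), M=4)
print(output, sequence)  # 8 (1, 1, 2, 3, 5, 8)
```

The intermediate tokens play the role of the chain of thought: only the final token is scored as the output of the map.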
517 A Rod Flow Model for Adam at the Edge of Stability
2605.06821
Edge-of-stability model for Adam: uses the continuous-time rod flow dynamics to characterize Adam's behavior at the edge of stability.
cs.LG cs.AI
Eric Regis, Sinho Chewi
Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object -- a "rod." Here we extend rod flow to Adam ...
518 SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
2605.06822
Self-improving trading agents: proposes an auditable, self-evolving rubric to align financial trading agents.
cs.LG
Xiwen Chen, Wenhui Zhu, Songzhu Zheng, Kashif Rasul, Yueyue Deng
Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from sto...
519 Why DDIM Hallucinates More than DDPM: A Theoretical Analysis of Reverse Dynamics
2605.06831
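Theory of hallucination in DDIM and DDPM: analyzes, through the reverse ODE/SDE, why DDIM is more prone to getting stuck between modes and hallucinating.
cs.LG cs.AI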
Muhammad H. Ashiq, Samanyu Arora, Abhinav N. Harish, Ishaan Kharbanda, Hung Yun Tseng
We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM *stochasticity* helps it become unstuck from this region, thus avoiding hal...
520 Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks
2605.06834
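Plasticity restoration in continual learning: measures neuron utility via attribution to restore network plasticity and slow rigidification.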
cs.LG
Patrick Elisii, Lucas Beauchemin, Dawer Jamshed
Continual learning research attempts to conserve two fundamental capabilities: new knowledge acquisition and the preservation of previously acquired knowledge. While knowledge in this case can be measured through performance over an implicit or explicit task space, model plasticity generally concerns adaptability as data distributions evolve. Though much of the literature has focused on catastrophic forgetting, deep networks can also suffer from loss of plasticity, becoming progressively harder ...
521 On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
2605.06835
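Privacy leakage in tabular diffusion: systematically evaluates the factors behind privacy leakage in tabular diffusion models and the associated attack metrics.
cs.LG cs.AI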
Masoumeh Shafieinejad, D. B. Emerson, Behnoosh Zamanlooy, Elaheh Bassak, Fatemeh Tavakoli
Tabular data plays an important role in many fields and industries, including those with elevated privacy considerations and risks. As such, there is a rising interest in generating high-quality synthetic proxies for real tabular data as a means of reducing privacy risk and proprietary data exposure. With tabular diffusion models (TDMs) demonstrating leading performance in synthesizing such data, understanding and measuring the privacy risks associated with these models is imperative. Leveraging...
522 How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
2605.06850
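KV-cache compression for RL alignment: uses shadow mask distillation to compress the KV cache during RL post-training and save GPU memory.
cs.LG cs.AI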
Rui Zhu, Weiheng Bai, Qiushi Wu, Yang Ren, Haixu Tang
Reinforcement Learning (RL) has emerged as a crucial paradigm for unlocking the advanced reasoning capabilities of Large Language Models (LLMs), encompassing frameworks like RLHF and RLAIF. Regardless of the specific optimization algorithm (e.g., PPO, GRPO, or Online DPO), online RL inherently requires an exploratory trajectory generation (rollout) phase. However, for long-context reasoning tasks, this rollout phase imposes a severe "memory wall" due to the exorbitant Key-Value (KV) cache foot...
523 Christoffel-DPS: Optimal sensor placement in diffusion posterior sampling for arbitrary distributions
2605.06861
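Sensor placement for diffusion posterior sampling: uses Christoffel-DPS to optimize sensor locations for diffusion-based reconstruction under arbitrary distributions.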
cs.LG
James Rowbottom, Nick Huang, Carola-Bibiane Schönlieb, Ben Adcock
State estimation is a critical task in scientific, engineering and control applications. Since the reliability of reconstructions depends on the number and position of sensors, optimal sensor placement (OSP) is essential in scenarios where measurements are sparse and expensive. Classical OSP approaches rely on Gaussian assumptions and are consequently unable to account for the complex distributions encountered in many real-world systems. Generative-model-based reconstruction using sensor guided ...
524 Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization
2605.06864
Multi-objective multi-agent bandits: proposes multi-agent, multi-objective UCB algorithms that balance Pareto efficiency and fairness.
cs.LG
John Wang, Mengfan Xu
We study multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents observe heterogeneous reward vectors and communicate over time-varying graphs. We formulate this emerging problem setting to address efficient learning, measured by Pareto regret, and incorporate fair learning as an additional goal, captured via social welfare. To measure efficiency, we formulate Pareto regret and develop Pareto UCB1 Gossip, whose novel exploration r...
525 Dataset Watermarking for Closed LLMs with Provable Detection
2605.06865
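Dataset watermarking for closed LLMs: designs a dataset watermark with provable detection to verify whether a closed model was trained on the data.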
cs.LG
Pengrun Huang, Kamalika Chaudhuri, Yu-Xiang Wang
Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same benchmarks used for evaluation. This motivates the need for dataset watermarking: designing datasets such that training on them leaves detectable signatures in the resulting model. Prior work has explored this problem for open models. We introduce the first dataset watermarking method for closed ...
526 A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning
2605.06866
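Theory of asynchronous distributional TD: gives a finite-iteration error convergence theory for asynchronous categorical distributional TD learning.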
cs.LG
Ege C. Kaya, Abolfazl Hashemi
Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. Thi...
527 When Descent Is Too Stable: Event-Triggered Hamiltonian Learning to Optimize
2605.06868
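Event-triggered Hamiltonian optimization: proposes SHAPE, which uses event-triggered control to move the optimizer out of overly stable local minima.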
cs.LG
Yi Wang, Chandrajit Bajaj
Fixed-budget nonconvex optimization can fail not because local descent is unstable, but because it is too stable: after reaching a nearby stationary point, an optimizer may spend the remaining evaluations refining an uninformative local minimum. We formulate this failure mode as a control problem over optimizer dynamics, where the learner must decide when to descend, when to exploit a promising basin, and when stagnation should trigger movement elsewhere. We introduce SHAPE, a structured adaptiv...
528 Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse
2605.06870
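Dimensional collapse in VQ-VAEs: analyzes and mitigates the performance lower bound caused by dimensional collapse of VQ-VAE representations.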
cs.LG
Xinyu Zhao, Nikita Karagodin, Hamed Hassani, Sinan Hersek, Paul Pu Liang
While many approaches to improve VQ-VAE performance focus on codebook size and utilization, the effect of dimensional collapse, where trained VQ-VAE representations live in an extremely low-dimensional subspace (1-2% of full rank), remains unaddressed. We show theoretically and empirically that dimensional collapse causes a hard loss lower bound that various codebook improvement techniques fail to surpass. Our analytic framework extends the sequential learning effect of Saxe et al. [2014] by intro...
529 On the Divergence of Differential Temporal Difference Learning without Local Clocks
2605.06874
Divergence of differential TD learning: proves that differential TD learning without local clocks diverges in certain settings.
cs.LG
David Antrobius, Shangtong Zhang
Learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_t, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also conv...
530 Temporal Attention for Adaptive Control of Euler-Lagrange Systems with Unobservable Memory
2605.06877
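Attention-based adaptive control: uses temporal attention to generate controller gains for systems whose friction has unobservable memory.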
cs.LG
Giansalvo Cirrincione, Adriano Fagiolini
Adaptive control of Euler-Lagrange systems is challenging when friction is governed by a finite-horizon internal state that is not directly observable from joint measurements. In this setting, the measured closed-loop state is no longer Markovian, and standard certainty-equivalence adaptive laws may lose their convergence guarantees. The paper proposes a meta-control architecture in which the gains of a computed-torque controller are generated by a self-attention block processing a short window ...
531 Better Protein Function Prediction by Modeling Survivorship Bias
2605.06879
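Bias in PU learning for protein function: models survivorship bias to improve protein function prediction from positive examples only.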
cs.LG
Zhongmou Chao, Poompol Buathong, Ekaterina Selivanovitch, Susan Daniel, Peter I. Frazier
Protein sequence data from nature exhibits survivorship bias: we only observe data from those organisms that survive and reproduce, while non-functional protein mutations are eliminated by natural selection. Thus, predicting whether a protein sequence is functional often requires learning from positive examples alone. While positive-unlabeled (PU) learning frameworks offer a generic solution to this problem, existing PU methods ignore the evolutionary processes that shape sequence observability ...
532 Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
2605.06885
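AR-to-diffusion LM alignment: adapts autoregressive language models into diffusion language models via representation alignment, with less training.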
cs.LG, cs.AI
Fred Zhangzhi Peng, Alexis Fox, Anru R. Zhang, Alexander Tong
Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representat...
533 Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics
2605.06902
Streaming adversarial robustness evaluation: proposes mechanism-aligned attacks, progressive training, and diagnostics for Fuzzy ARTMAP.
cs.LG
Shane Cairns, Leonardo Enzo Brito da Silva, Sasha Petrenko, Donald C. Wunsch II, Jian Liu
Adversarial robustness has been studied extensively for offline deep networks, but less is known about strict single-pass streaming neural learners. This paper studies adversarial robustness in Fuzzy ARTMAP, an Adaptive Resonance Theory architecture based on category competition, complement coding, match tracking, and replay-free prototype updates. We introduce WB-Softmax, a differentiable white-box attack surrogate aligned with ARTMAP's category-competition and map-field prediction mechanism, a...
534 Conservative Flows: A New Paradigm of Generative Models
2605.06905
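Conservative-flow generative models: proposes a generative paradigm and sampling mechanisms based on discrete dynamics that leave the data distribution invariant.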
cs.LG
Eshed Gal, Md Shahriar Rahim Siddiqui, Moshe Eliasof, Eldad Haber
Modern generative modeling is dominated by transport from a noise prior to data. We propose an alternative paradigm in which generation is performed by a discrete stochastic dynamics that leaves the data distribution invariant, initialized from data-supported states rather than from noise. The framework can utilize any pretrained flow model. We develop two probability-preserving sampling mechanisms, a corrected Langevin dynamics with a Metropolis adjustment and a predictor-corrector flow, that o...
535 TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond
2605.06906
Human mobility pre-training framework: proposes TraXion, with structured pre-training objectives and modeling for mobility trajectories.
cs.LG
Shang-Ling Hsu, Mark Tenzer, Cyrus Shahabi, Khurram Shafique
Human mobility differs from text and from generic time series in three structural ways: visits are tuple-valued events whose meaning depends on the joint distribution over location, time, and activity; users carry persistent signatures across trajectories; and visits are not independent across users, since co-location at shared places is a primary signal. Existing pre-training recipes for mobility import objectives from language modeling, treating trajectories as sentences and visits as tokens, ...
536 Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents
2605.06908
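Adaptive compute gating for LLMs: proposes direction-aware learning to stabilize the relationship between gating signals and the benefit of extra computation.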
cs.LG, cs.AI
Ziming Li, Jiatan Huang, Xiaoguang Guo, Guilin Wang, Chuxu Zhang
Adaptive test-time compute for LLM agents aims to invoke extra computation only when it improves performance. Existing methods typically use confidence-, uncertainty-, or difficulty-based gates, assuming a fixed direction from the gating signal through compute need to the value of computation. This makes gating a utility-calibration problem: gating signals should align with whether extra computation improves the final outcome over the base policy. We show that this alignment is unstable: the sam...
537 Dual-Scale Temporal Fusion Reveals Structured Predictability in Subseasonal-to-Seasonal Temperature Prediction
2605.06911
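S2S temperature predictability: uses dual-scale temporal fusion to reveal structured predictability in subseasonal-to-seasonal temperature prediction.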
cs.LG
Elnaz Bashir, Jiali Wang, Lin Yan
Subseasonal-to-seasonal (S2S) temperature forecasts, spanning several weeks to a few months, are critically needed in agriculture practice, energy planning, and extreme-weather induced risk management, yet their reliability varies substantially across seasons and regions. Forecast skill is often attributed primarily to lead time, but this perspective does not fully explain the spatiotemporal patterns of predictability. Here we show that S2S predictability is organized across interacting temporal...
538 LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs
2605.06915
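Consistency of LLM probabilistic beliefs: quantifies the inconsistency and non-Bayesian behavior of LLMs' probabilistic beliefs when updating on evidence.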
cs.LG
Chacha Chen, Matthew Jörke, Adam Goliński, Masha Fedzechkina, Guillermo Sapiro
Modern AI systems are being deployed in complex domains such as medicine, science, and law, where it is important that they not only produce correct answers, but also represent and update uncertain beliefs about the world as new evidence arrives. We introduce the novel technique of studying LLMs as information processing rules and utilize the information processing gap to study the internal (in)consistencies of how LLMs update their probabilistic beliefs from evidence. Our extensive experiments ...
539 Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting
2605.06916
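Efficient probabilistic weather forecasting: proposes Tyche, a one-step flow model that generates calibrated probabilistic forecasts at low cost.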
cs.LG
Fan Xu, Yuan Gao, Kun Wang, Rui Su, Fenghua Ling
Probabilistic weather forecasting requires not only accurate trajectories, but calibrated distributions over plausible atmospheric futures. Recent data-driven systems have achieved remarkable deterministic skill, and diffusion-based ensemble forecasters have substantially improved sample realism and uncertainty quantification. However, their inference cost scales with forecast horizon, ensemble size, and the number of denoising steps required for each transition, making large operational ensembl...
540 Target-Aware Data Augmentation for SAT Prediction
2605.06931
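Data augmentation for SAT prediction: proposes target-aware data augmentation to reduce SAT labeling cost and improve prediction performance.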
cs.LG
Eshed Gal, Uri Ascher, Eldad Haber
Learning-based approaches to NP-hard problems have shown increasing promise, but their progress is fundamentally constrained by the high cost of generating labeled training data. In domains such as Boolean satisfiability (SAT), standard pipelines rely on solver-in-the-loop labeling, which scales poorly with problem size and limits the amount of usable supervision. This bottleneck hinders the broader goal of leveraging machine learning to capture structure in hard combinatorial problems. In this ...
541 MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security
2605.06933
Post-quantum AI governance: proposes a post-quantum multi-agent governance architecture with provable security.
cs.LG
Sepideh Avizeh, Tushin Mallick, Alina Oprea, Cristina Nita-Rotaru, Reihaneh Safavi-Naini
Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted ...
542 Learned Lyapunov Shielding for Adaptive Control
2605.06934
Lyapunov-safe adaptive control: augments adaptive control with a learned Lyapunov function and a safety filter.
cs.LG
Giansalvo Cirrincione, Adriano Fagiolini
We augment the Slotine--Li adaptive controller for Euler--Lagrange systems with three learned components: a structured-quadratic Lyapunov function \(V_\psi\) whose positive-definiteness follows from a Cholesky parameterization, a residual Soft Actor--Critic policy that adds bounded torque corrections to the analytic baseline, and a physics-informed neural network that estimates unmodeled dynamics. A closed-form safety filter, derived from the single affine constraint \(\dot V_\psi + \alpha V_\ps...
543 A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis
2605.06937
LLM prompt workflow calibration: presents a reproducible procedure for calibrating LLM prompt workflows in evidence synthesis.
cs.LG
Teo Susnjak
This methods article presents a reproducible calibration workflow for prompt-based large language models (LLMs) in structured evidence-synthesis tasks. The method separates the rules that define the scientific task from the mutable prompt harness that frames and applies them. It optimises that harness against labelled or reference examples and an explicit task metric, then preserves the calibrated workflow as an inspectable artefact with its specification, metric, settings, and evaluation traces...
544 A Generalized Singular Value Theory for Neural Networks
2605.06938
Generalized SVD for networks: proves that many neural networks admit an equivalent generalized SVD representation.
cs.LG, cs.AI
Brian Charles Brown, Robert Bridges, David Grimsman, Mauricio Munoz, Sean Warnick
Building on the abstract Generalized Singular Value Decomposition (GSVD) theory of Brown et al. [2025], we prove that most modern neural architectures admit a generalized SVD representation in which they are left-invertible before a final linear layer, with no change in input-output behavior. Furthermore, the left-invertible nonlinear portion of the input-output behavior can be made to be \emph{norm preserving}, meaning that perturbations in the left-invertible ``embedding'' (the activations pri...
545 Bias and Uncertainty in LLM-as-a-Judge Estimation
2605.06939
LLM-as-judge bias estimation: analyzes bias and uncertainty in LLM judging and assesses calibration risks.
cs.LG
James Fiedler
LLM-as-a-Judge evaluation has become a standard tool for assessing base model performance. However, characterizing performance via the naive estimator, i.e., raw judge outputs, is systematically biased. Recent work has proposed estimators to correct this bias, but their reliability depends critically on judge quality and, for model comparisons, on calibration stability. Sharing calibration across compared models is practically attractive but can introduce severe bias, including cases where the c...
546 Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings
2605.06941
Causal bilevel discrete choice: uses a causal-aware foundation model to solve bilevel pricing optimization under discrete choice.
cs.LG
Shivaram Subramanian, Zhengliang Xue, Markus Ettl, Yingdong Lu, Jayant Kalagnanam
We introduce a causal aware foundation-model framework for real time optimal decision making in discrete choice environments. We propose a constrained triple-head price optimization (C3PO) network to solve a bilevel decision problem in which a service provider selects an optimal assortment while heterogeneous users make personalized acceptance or rejection choices optimizing their own personalized preferences. C3PO integrates imitation learning of prices, multi-task learning of revenue responses...
547 ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data
2605.06943
Unsupervised prototype time-series: learns interpretable prototypes from unlabeled time series to support case-based explanations.
cs.LG
Steven Song, Sahil Sethi, Brett Beaulieu-Jones, Robert L. Grossman
In time-series domains where both predictive performance and interpretability are essential, deep neural networks achieve strong results but provide limited insight into how their predictions are made. Projection-based prototype networks address this limitation by grounding predictions in similarity to representative training examples, enabling case-based explanations and global prototype inspection. However, existing approaches rely on label supervision, tying prototypes to a specific task and ...
548 Adaptive Memory Decay for Log-Linear Attention
2605.06946
Log-linear attention memory decay: proposes adaptive memory decay to improve recall in log-linear attention.
cs.LG, cs.AI
Yaxita Amin, Helen Zichen Li, Mengfan Zhang, Samet Ayhan
Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models run in linear time by compressing context into a fixed-size hidden state, inherently limiting recall. Log-linear attention navigates this tradeoff by organizing memory across a Fenwick tree hierarchy, growing its hidden state logarithmically with sequence length at log-linear compute cos...
549 Rollback-Free Stable Brick Structures Generation
2605.06947
Stable brick structure generation: uses reinforcement learning to enforce physical stability at training time when generating brick structures.
cs.LG
Chenhui Xu, Ziyue Bai, Fuxun Yu, Heng Huang, Jinjun Xiong
While autoregressive models have advanced 3D generation, creating physically stable brick structures remains a challenge due to the strict requirements of gravity and interconnectivity. Existing approaches rely on external physical simulators during inference to perform rejection sampling and brick-by-brick rollbacks, which severely bottlenecks efficiency. To address this, we propose a reinforcement learning paradigm that shifts physical validity enforcement from test-time correction to training...
550 Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection
2605.06955
Score matching anomaly detection: uses kurtosis-guided noise-scale selection to improve tabular anomaly detection.
cs.LG, cs.AI
Victor Livernoche, Jie Zan, Reihaneh Rabbany
Denoising score matching (DSM) provides a way to learn data distributions by training a neural network to recover the score function, defined as the gradient of the log density, from noise-corrupted samples. Once trained, the score magnitude at a test point reflects how consistent that point is with the learned distribution, making it a natural anomaly signal. The key practical challenge is selecting the perturbation scale: too little noise yields unstable score estimates in sparse regions, whil...
551 $f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses
2605.06977
f-divergence regularized RLHF: gives a unified analysis of the sampling and theoretical properties of RLHF under different f-divergence regularizers.
cs.LG, cs.AI
Di Wu, Chengshuai Shi, Jing Yang, Cong Shen
Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone technique for post-training large language models. While most existing approaches rely on the reverse KL-regularization, recent empirical studies have begun exploring alternative divergences (e.g., forward KL, chi-squared) as regularizers in RLHF. However, a unified theoretical understanding of general $f$-divergence regularization remains under-explored. To fill this gap, this work develops a comprehensive theoretical fr...
552 PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction
2605.06979
Neural causal abstraction localization: uses optimal transport to progressively localize the key intervention sites for neural causal abstraction.
cs.LG, cs.AI
Jonathn Chang, Arya Datla, Ziv Goldfeld
Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localiz...
553 FastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddings
2605.06982
Tsetlin embedding acceleration: uses parallel clause learning to accelerate scalable, hardware-friendly Tsetlin embeddings.
cs.LG
Ahmed K. Kadhim, Lei Jiao, Rishad Shafik, Ole-Christoffer Granmo, Mayur Kishor Shende
Embedding models in natural language processing (NLP) increasingly rely on deep architectures such as BERT, while simpler models such as Word2Vec provide efficient representations but limited interpretability. The Tsetlin Machine (TM) offers an alternative logic-based learning paradigm. Omni TM Autoencoder (Omni TM-AE) applies this paradigm to static embedding by exploiting automaton state distributions within a single clause layer, but its training process remains slow. In this work, we propose...
554 Response Time Enhances Alignment with Heterogeneous Preferences
2605.06987
Preference alignment with response time: uses annotators' response times to model heterogeneous preferences and improve alignment.
cs.LG
Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan
Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this crit...
555 Why Does Agentic Safety Fail to Generalize Across Tasks?
2605.06992
Agentic safety generalization: explains why agent safety is hard to generalize across tasks in multi-task settings.
cs.LG
Yonatan Slutzky, Yotam Alexander, Tomer Slor, Yoav Nagel, Nadav Cohen
AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and ex...
556 Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
2605.06997
KV-cache-free long-context recall: uses spectral Koopman operators for long-range associative retrieval without a KV cache.
cs.LG
Anupama Sridhar, Alexander Johansen
Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architectur...
557 Inductive Power Grid Cascading Failure Analysis with GRU-Gated Graph Attention
2605.07010
Power grid cascade failure GNN: uses GRU-gated graph attention for cascading failure analysis that transfers across grids.
cs.LG
Tianxin Zhou, Xiang Li, Haibing Lu
Identifying vulnerable transmission lines in power grids before a cascading failure occurs is challenging: existing methods can learn inter-line failure correlations from cascade data, but they are trained and evaluated on a single grid, and transferring the learned knowledge to an unseen grid remains an open problem. We address this by training a single Gated Recurrent Unit (GRU)-gated Graph Attention Network on combined cascading failure data from limited training grids and applying it directl...
558 Dual-Agent Co-Training for Health Coaching via Implicit Adversarial Preference Optimization
2605.07011
AI health coaching co-training: co-trains two agents and uses preference optimization to improve health-coaching dialogue.
cs.LG
Da Long, Lingyi Fu, Diya Michelle Rao, Jasmine Ruales Carrera, Yang Bai
Motivational-interviewing-based health coaching is an effective approach for improving mental health and promoting healthy behavior change. However, the scarcity of trained human coaches and the high cost of coaching services make such support inaccessible to many people who could benefit from it. This motivates the development of AI health coaches that can provide scalable and affordable support. Existing methods typically optimize only one side of the interaction: they either train a dialogue ...
559 FlashMol: High-Quality Molecule Generation in as Few as Four Steps
2605.07020
Few-step molecular diffusion: proposes a fast model that generates high-quality 3D molecular conformations in on the order of four steps.
cs.LG, cs.AI
Xinyuan Wei, Zian Li, Shaoheng Yan, Cai Zhou, Muhan Zhang
Generating chemically valid 3D molecular conformations is critical for computational drug discovery. Classical diffusion-based models like GeoLDM perform well but require hundreds of steps, making large-scale in silico screening impractical. Recent efforts on few-step molecular generation have accelerated this process to 12-50 steps, but they often largely sacrifice sample stability. In this work, we present FlashMol, an ultra-fast molecule generative model producing high-quality molecular confo...
560 Self Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale
2605.07022
LLM-driven biomedical dataset construction: uses LLMs to automatically extract fine-grained biomedical datasets from a very large corpus of papers.
cs.LG
Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang
Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based en...
561 Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks
2605.07024
Code hallucination FIM benchmark: releases Delulu, a verifiable multi-lingual benchmark for detecting code hallucinations in FIM tasks.
cs.LG
Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu
Large Language Models for code generation frequently produce hallucinations in Fill-in-the-Middle (FIM) tasks -- plausible but incorrect completions such as invented API methods, invalid parameters, undefined variables, or non-existent imports. These failures pass superficial review yet introduce runtime errors. We introduce Delulu, a verified multi-lingual benchmark of 1,951 FIM samples across 7 languages and 4 hallucination types. Samples are curated through an adversarial pipeline: a frontier...
562 A Systematic Investigation of The RL-Jailbreaker in LLMs
2605.07032
RL-based LLM jailbreaking analysis: Systematically studies the mechanism of RL jailbreak attacks and analyzes why they succeed.
cs.LG cs.AI
Montaser Mohammedalamen, Kevin Roice, Reginald McLean, Alyssa Lefaivre Škopac
The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of models to elicit harmful output, remains a primary threat to safe deployment. While Reinforcement Learning (RL) frames jailbreaking as a multi-step attack through sequential optimization, a mechanistic understanding of why the framework succeeds remains incomplete. To fill this gap, we present the fi...
563 Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation
2605.07038
Hamiltonian risk-aware navigation: Learns material-aware Hamiltonian risk fields for safer navigation.
cs.LG
Aditya Sai Ellendula, Yi Wang, Chandrajit Bajaj
Risk-aware navigation should be selective: a policy should expose evasive degrees of freedom only when the local scene admits a lower-risk feasible maneuver, and suppress them when no safer alternative exists. We show that adding one context-energy term to a port-Hamiltonian navigation policy produces a learned force channel with exactly this falsifiable signature. When the local risk field contains a feasible lower-risk direction, the induced context force activates toward it; when the apparent...
564 PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
2605.07039
Test-time learning for evo agents: Uses reinforcement learning to adapt the policy of evolutionary search agents at test time.
cs.LG
Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie
Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisio...
565 Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph Diffusion
2605.07048
Mass spectra to molecule diffusion: Dual-stream line graph diffusion that jointly reasons over atoms and bonds to generate molecules from mass spectra.
cs.LG cs.AI
Xujun Che, Xiuxia Du, Depeng Xu
De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitl...
566 Towards Differentially Private Reinforcement Learning with General Function Approximation
2605.07049
Differentially private online RL: Gives differential privacy guarantees for online reinforcement learning under general function approximation.
cs.LG cs.AI
Yi He, Xingyu Zhou
We present the first theoretical guarantees for differentially private online reinforcement learning (RL) with general function approximation, extending beyond prior work restricted to tabular and linear settings. Our approach combines a batched policy update scheme with the exponential mechanism, together with a novel regret analysis. We show that, even under general function approximation, the regret in the model-free setting under differential privacy matches the state of the art for the line...
567 Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure
2605.07057
Causal DAG state construction in RL: Constructs minimal RL states that satisfy the Markov property from longitudinal causal graphs.
cs.LG
Jiamin Xu, Jacqueline Maasch, Kyra Gan
Online reinforcement learning (RL) relies on the Markov property for guaranteed performance, but real-world applications often lack well-defined states given raw observed variables. While causal RL has attracted growing interest, existing work typically assumes Markovian states are provided and focuses on using causality to accelerate learning, leaving a fundamental gap: \emph{given a longitudinal causal graph over observed variables, how does one construct MDP states that provably satisfy the M...
568 Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
2605.07063
Data-regularized LLM post-training: Treats general data as a regularizer to improve LLM post-training and data selection.
cs.LG cs.AI
Pingbang Hu, Xueshen Liu, Z. Morley Mao, Jiaqi W. Ma
Data selection methods address a critical challenge in LLM post-training: effectively leveraging scarce, high-fidelity target data alongside abundant but imperfectly aligned general training data. In this work, we move beyond the data-selection framing and introduce Dr. Post-Training (Data-Regularized Post-Training), a novel framework that reconceptualizes general training data as a data-induced regularizer that prevents overfitting to the scarce target objective, rather than serving as a pool f...
569 PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation
2605.07067
Matrix optimization PolarAdamW: Proposes PolarAdamW to separate spectral control from Schur gauge-equivariance.
cs.LG
Haozhou Zhang
Muon's matrix-level update couples two distinct effects: spectral control via a polar map, and equivariance under orthogonal changes of multiplicity-space basis (Schur gauge-equivariance). We separate them with PolarAdamW, a controlled hybrid that preserves Muon's polar spectral-norm control but breaks the gauge-equivariance, since AdamW's coordinatewise preconditioner is basis-dependent. Algorithmically, PolarAdamW applies Muon's Newton-Schulz polar map to AdamW's preconditioned direction rathe...
570 Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?
2605.07072
Optimal subsampling for DP-SGD: Proves that participation variance makes privacy amplification suboptimal and seeks better sampling schemes.
cs.LG
Andy Dong, Ayfer Özgür
Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that B...
571 ModelLens: Finding the Best for Your Task from Myriads of Models
2605.07075
Pretrained Model Selection: Proposes ModelLens to quickly select the best model for a new dataset from a vast pool of models.
cs.LG
Rui Cai, Weijie Jacky Mo, Xiaofei Wen, Qiyao Ma, Wenhui Zhu
The open-source model ecosystem now contains hundreds of thousands of pretrained models, yet picking the best model for a new dataset is increasingly infeasible: new models and unbenchmarked datasets emerge continuously, leaving practitioners with no prior records on either side. Existing approaches handle only fragments of this in-the-wild setting: AutoML and transferability estimation select models from small predefined pools or require expensive per-model forward passes on the target dataset,...
572 Test-Time Compositional Generalization in Diffusion Models via Concept Discovery
2605.07078
Diffusion Compositional Generalization: Lets a pretrained diffusion model discover concepts from its scores and compose them at test time.
cs.LG
Zekun Wang, Anant Gupta, Tianyi Zhu, Christopher J. MacLellan
Compositional generalization requires models to produce novel configurations from familiar parts. In diffusion models, prior compositional generation methods typically assume that the relevant concepts or conditioning signals are already available. We instead ask whether a pretrained diffusion model can discover query-specific concepts from the time-indexed scores it learns for the noisy marginals $p_t(x_t)$ and compose them at test time. Given a single out-of-distribution query, our method perf...
573 Actor-Critic with Active Importance Sampling
2605.07094
Variance-Reduced Actor-Critic: Proposes AISAC, which actively optimizes the behavior policy to reduce policy gradient variance via importance sampling.
cs.LG
Majid Molaei, Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Marcello Restelli
This paper introduces the Active-Importance-Sampling Actor-Critic (AISAC) algorithm, an extension of the Actor-Critic framework for reducing variance in policy gradient estimation. AISAC optimizes the behavior policy to minimize gradient variance while preserving unbiased gradient estimates. Using importance sampling principles, the algorithm adapts the behavior policy toward efficient data collection distributions aligned with target policy gradients. For continuous action spaces, AISAC employs...
574 Query-efficient model evaluation using cached responses
2605.07096
Cached-Response Model Evaluation: Uses cached model responses to reduce the number of queries needed to benchmark a new model.
cs.LG cs.AI
Hayden Helm, Ben Johnson, Carey Priebe
Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously-evaluated models are often cached -- creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for p...
575 CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation
2605.07098
Crash Simulation Dataset and Solver: Releases the CarCrashNet dataset and uses a hierarchical neural network for structural crash simulation.
cs.LG
Mohamed Elrefaie, Dule Shu, Matthew Klenk, Faez Ahmed
Crash simulation is a cornerstone of modern vehicle development because it reduces the need for costly physical prototypes, accelerates safety-driven design iteration, and increasingly supports virtual testing workflows. At the same time, modeling structural crash mechanics remains exceptionally challenging: the response is governed by nonlinear contact, large deformation, material plasticity, failure, and complex multi-body interactions evolving over space and time on high-resolution finite-ele...
576 Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift
2605.07104
Stochastic Approximation Convergence Theory: Uses a Poisson-Moreau drift to derive almost sure convergence rates for SA and RL under Markovian noise.
cs.LG
Xinyu Liu, Zixuan Xie, Shangtong Zhang
Establishing almost sure convergence rates for stochastic approximation and reinforcement learning under Markovian noise is a fundamental theoretical challenge. We make progress towards this challenge for a class of stochastic approximation algorithms whose expected updates are contractive, a setting that arises in many reinforcement learning algorithms such as $Q$-learning and linear temporal difference learning. Specifically, for a power-law learning rate $O(n^{-\eta})$ with $\eta \in (1/2, 1)...
577 Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks
2605.07113
GNN-Accelerated Exact Max-Cut: Uses a feasibility-preserving GNN to approximate SDP bounds and accelerate branch-and-bound solution of Max-Cut to optimality.
cs.LG
Hao Chen, Chendi Qian, Christopher Morris, Andrea Lodi, Can Li
Exact solution of hard combinatorial optimization problems often relies on strong convex relaxations, but solving these relaxations repeatedly inside a branch-and-bound algorithm can be prohibitively expensive. Hence, we consider this challenge for Max-Cut, where branch and bound commonly uses semidefinite programming (SDP) relaxations to bound subproblems. We propose a Max-Cut-specific graph neural network that serves as a principled, lightweight neural proxy for these SDP solvers and can be pl...
578 Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR
2605.07114
Adaptive Rollout Allocation in RLVR: Proposes a hit-utility criterion to adaptively allocate rollout compute per prompt for GRPO and similar methods.
cs.LG
Tao Wang, Shuo Li, Yan Sun, Dongsheng Ding, Edgar Dobriban
Reinforcement learning with verifiable rewards (RLVR) has emerged as a central paradigm for improving the reasoning capabilities of large language models. Group-based policy optimization methods, such as GRPO, typically allocate a fixed number of rollouts to every prompt. This uniform allocation can be inefficient: it over-allocates compute to prompts whose sampled groups are already saturated while under-exploring prompts for which additional samples may reveal useful correct trajectories. To a...
579 Conformal-Style Quantile Analyses for Stochastic Bandits
2605.07115
Quantile Objectives in Bandits: Proposes ACP-UCB1 and related methods to analyze and optimize upper-quantile objectives in stochastic bandits.
cs.LG
Chengyu Du, Mengfan Xu
Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance, which we study herein. For a fixed miscoverage level \(\alpha\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-\alpha/2)\) of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To this end, we propose ACP-UCB1, a conforma...
580 Stabilized neural Hamilton--Jacobi--Bellman solvers: Error analysis and applications in model-based reinforcement learning
2605.07116
Neural HJB Solvers for MBRL: Analyzes the error of stabilized neural HJB solvers and applies them to continuous-time model-based reinforcement learning.
cs.LG cs.AI
Minseok Kim, Yeongjong Kim, Namkyeong Cho, Yeoneung Kim
Physics-informed neural solvers offer a promising route to model-based reinforcement learning in continuous time, where optimal feedback synthesis is governed by Hamilton--Jacobi--Bellman (HJB) equations. Practical implementations often occupy a regime that is neither a classical grid method nor a continuous-PDE PINN: the value function is represented by a neural network, finite-difference HJB policy-evaluation operators are evaluated by network queries at shifted points, and residuals are minim...
581 When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol Classification
2605.07120
Symbol-Renaming Invariant Classification: Develops a logistic regression theory explaining how classifiers in template tasks learn to be invariant to symbol renaming.
cs.LG
Wenjie Guan, Jelena Bradic
Template tasks have emerged as a clean testbed for asking whether transformers reason with abstract symbols rather than concrete token names. We study the fixed-label classification version of this problem, where train and test examples share latent templates but may use disjoint vocabularies. Unlike next-token prediction, the model need not emit unseen symbols; it must learn a decision rule invariant to symbol renaming. We analyze regularized kernel logistic classification in the transformer-ke...
582 Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
2605.07123
Theory of In-Context RL with CoT: Theoretically characterizes how CoT promotes the convergence and emergence of in-context reinforcement learning in Transformers.
cs.LG
Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang
In-context reinforcement learning (ICRL) refers to the ability of RL agents to adapt to new tasks at inference time without parameter updates by conditioning on additional context. Recent empirical studies further demonstrate that Chain-of-Thought (CoT) generation can amplify this ICRL capability. This paper is the first to provide a theoretical understanding on how CoT interacts with ICRL. We conduct our analysis in a policy evaluation setup with linear Transformer. We prove that with specific ...
583 Simple KNN-Based Outlier Detection Achieves Robust Clustering
2605.07130
KNN Outlier Detection for Robust k-Means: Proves that simple KNN outlier detection achieves robust clustering and improves robust k-means.
cs.LG
Tianle Jiang, Yufa Zhou
Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means cost on the remaining points. Despite the close connection between robust $k$-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of $\textit{classic outlier detection heuristics}$ for robust $k$-Means remains limited. ...
584 GAD in the Wild: Benchmarking Graph Anomaly Detection under Realistic Deployment Challenges
2605.07133
Realistic Graph Anomaly Detection Benchmark: Builds a deployment-oriented graph anomaly detection benchmark and systematically evaluates multiple challenge dimensions.
cs.LG cs.AI
Jingjing Zhou, Shiyu Huang, Qing Qing, Zuquan Yuan, Huafei Huang
Graph Anomaly Detection (GAD) is a critical task in graph machine learning with vital applications in financial fraud detection and social platform governance. However, existing GAD benchmarks are often restricted to small-scale, curated graphs with relatively balanced anomaly ratios, leaving a substantial gap between academic evaluation and real-world deployment. To bridge this gap, we present a multi-dimensional benchmark that systematically evaluates GAD models under three deployment-relevant...
585 Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR
2605.07137
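Negative Reinforcement for RLVR: Proposes adaptive negative reinforcement that dynamically adjusts penalties to balance correction and diversity and improve reasoning.
cs.LG cs.AI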
Yash Ingle, Jaival Chauhan, Ankit Yadav, Sudhakar Mishra
Reinforcement learning with verifiable rewards (RLVR) has become a highly effective method for improving the reasoning abilities of Large Language Models (LLMs). Recent research shows that Negative Sample Reinforcement (NSR) -- which focuses on penalizing incorrect steps rather than simply rewarding correct ones -- can match or even exceed the performance of more complex frameworks like PPO and GRPO across the entire Pass@k spectrum. However, current NSR techniques usually apply a fixed penalty ...
586 Regret-Oracle Complexity Tradeoffs in Agnostic Online Learning
2605.07155
Oracle-Efficient Online Learning Tradeoffs: Studies tradeoff bounds between regret and ERM oracle complexity in agnostic online learning.
cs.LG
Idan Attias, Steve Hanneke, Arvind Ramaswami
Agnostic online learning is classically solved via a reduction to the realizable setting, utilizing Littlestone's Standard Optimal Algorithm (SOA) as a base learner. However, the SOA is computationally intractable to execute even for a single round. To overcome this barrier, recent work in oracle-efficient online learning replaces the SOA with a realizable base learner that accesses the concept class exclusively through an offline empirical risk minimization (ERM) oracle. While such agnostic lea...
587 Learned Lagrangian Models of PDEs via Euler-Lagrange Residual Minimization
2605.07157
Learned Lagrangian PDE Modeling: Learns a continuous Lagrangian by minimizing the Euler-Lagrange residual for stable prediction of PDE dynamics.
cs.LG
Lyra Zhornyak, Eric Forgoston, M. Ani Hsieh
We present the first method to directly use a learned continuous Lagrangian to forecast the dynamics of systems governed by partial differential equations, exploiting the inherent conservative structure to achieve stable long-range predictions. We develop an optimization-based integrator that minimizes the squared Euler--Lagrange residual via a mesh-free near-symplectic construction on local space-time patches. Different from integrators for analytical models, integrators for learned models shou...
588 Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach
2605.07166
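Neurosymbolic Imitation Learning: Uses human-guided privileged information to combine neural and symbolic methods for more efficient, better-generalizing imitation learning.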
cs.LG
Nikhilesh Prabhakar, Varun Balaji, Athresh Karanam, Kristian Kersting, Sriraam Natarajan
Imitation learning is widely used for learning to act in complex environments. While purely neural methods handle high-dimensional data effectively, they require a large number of samples and are prone to overfitting. Purely symbolic approaches, while generalizing well, do not handle high-dimensional data effectively. We propose a neurosymbolic approach that achieves the best of both worlds, i.e., handling high-dimensional data while achieving generalization. The key advantag...
589 Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
2605.07171
Cost-Constrained Multi-Armed Bandits: Studies minimum-cost decisions under a reward constraint in bandits with cost subsidy.
cs.LG
Ishank Juneja, Carlee Joe-Wong, Osman Yağan
The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in applications, often the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost-subsidy (MAB-CS). Of interest to this paper is the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We chara...
590 Learning Multi-Relational Graph Representations for DNA Methylation-Based Biological Age Estimation
2605.07175
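Graph Learning for Epigenetic Aging Clocks: Learns multi-relational graph representations that model CpG dependencies to improve methylation-based biological age estimation.
cs.LG cs.AI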
Qing Qing, Xikun Zhang, Zhongyuan Zhang, Jiarui Liu, Xingtong Yu
Aging clocks aim to estimate biological age, a measure of physiological state distinct from chronological age, from observable biomarkers, and are widely used for health assessment and disease analysis. DNA methylation is a particularly informative biomarker due to its stability and strong association with aging, and recent learning-based approaches have improved predictive performance. However, most existing methods treat CpG sites as independent features, overlooking the complex and heterogene...
591 HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
2605.07177
Parallel Multimodal Search Agents: Proposes HyperEyes, which issues multiple retrievals and visual grounding in parallel to reduce interaction rounds.
cs.LG cs.AI
Guankai Li, Jiabin Chen, Yi Xu, Xichen Zhang, Yuan Lu
Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that effective multimodal agents should search wider rather than longer: dispatching multiple grounded queries concurrently within a round. To this end, we present HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action...
592 Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
2605.07182
Nested Submodels for Reasoning LLMs: Star Elastic embeds multiple submodels within a single model in one post-training run, with a controllable reasoning budget.
cs.LG
Ali Taghibakhshi, Ruisi Cai, Saurav Muralidharan, Sharath Turuvekere Sreenivas, Aditya Vavre
Training a family of large language models (LLMs), either from scratch or via iterative compression, is prohibitively expensive and inefficient, requiring separate training runs for each model in the family. In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model using the compute of one run (N-fold savings) via a single post-training job. Beyond reducing training costs, Star Elastic also addresses a fundamental li...
593 Coupling Models for One-Step Discrete Generation
2605.07193
One-Step Discrete Generative Models: Proposes Coupling Models, which directly couple discrete sequences with Gaussian latents for one-step generation.
cs.LG
Fred Zhangzhi Peng, Avishek Joey Bose, Anru R. Zhang, Alexander Tong
Generative modeling over discrete structures underpins applications across deep learning, from biological sequence design and code generation to large language models, yet generation often remains sequential, relying on autoregressive decoding or iterative refinement. In this work, we introduce Coupling Models(Coupling Models), a one-step discrete generative model that learns a direct coupling between discrete sequences and Gaussian latents. Unlike recent distillation methods that compress a pre...
594 Arrow: A Foundation Model for Causal Discovery
2605.07204
Foundation Model for Causal Discovery: Proposes Arrow, which uses a Transformer to predict the skeleton and topological order zero-shot in order to discover causal graphs.
cs.LG
Ryan Thompson, He Zhao, Daniel M. Steinberg, Edwin V. Bonilla
We introduce Arrow, a foundation model for zero-shot causal discovery on observational tabular data. Arrow factorizes a directed acyclic graph into an undirected skeleton and a topological order, guaranteeing acyclicity by construction. Given a new dataset, it uses a transformer-based architecture to contextualize variables within and across observations, then predicts skeleton edge probabilities and node order scores that together define a graph. Arrow is trained in a supervised fashion on synt...
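The skeleton-plus-order factorization described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not Arrow's implementation: the function name `dag_from_skeleton_and_order`, the 0.5 threshold, and the convention that a lower score means earlier in the order are all hypothetical choices made here.

```python
import numpy as np

def dag_from_skeleton_and_order(edge_prob, order_score, threshold=0.5):
    """Combine skeleton edge probabilities and node order scores into a DAG.

    edge_prob: symmetric (n, n) array of undirected edge probabilities.
    order_score: length-n array; lower score = earlier in the order (assumed).
    Returns an (n, n) 0/1 adjacency matrix where adj[i, j] = 1 means i -> j.
    """
    edge_prob = np.asarray(edge_prob, dtype=float)
    order_score = np.asarray(order_score, dtype=float)
    n = len(order_score)
    # rank[i] is the position of node i in the order (ties broken by index).
    rank = np.empty(n, dtype=int)
    rank[np.argsort(order_score, kind="stable")] = np.arange(n)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if edge_prob[i, j] > threshold:
                # Always orient from the earlier node to the later node.
                if rank[i] < rank[j]:
                    adj[i, j] = 1
                else:
                    adj[j, i] = 1
    return adj
```

Because every kept edge points from an earlier node to a later node in a single total order, no directed cycle can form, which is the sense in which acyclicity holds by construction.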
595 FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
2605.07208
Academic Impact Forecasting: Forecasts paper impact via continuous-time manifold evolution and evaluates how well LLMs identify high-impact papers.
cs.LG
Jianrong Ding, Jianyuan Zhong, Zhengyan Shi, Qiang Xu
Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact ...
Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-bas...
596 HARMONY: Bridging the Personalization-Generalization Gap by Mitigating Representation Skew in Heterogeneous Split Federated Learning
2605.07211
Heterogeneous Split Federated Learning: HARMONY mitigates representation skew in heterogeneous split federated learning to balance personalization and generalization.
cs.LG cs.AI
Jiseok Youn, You Rim Choi, Goodsol Lee, Sangtae Ha, Hyung-Sin Kim
Mobile devices face diverse resource constraints and non-IID data class distributions, requiring fast on-device inference for local in-distribution (ID) classes and on-demand remote support for client-specific out-of-distribution (OOD) classes. Hybrid split fe...
Mobile devices face diverse resource constraints and non-IID data class distributions, requiring fast on-device inference for local in-distribution (ID) classes and on-demand remote support for client-specific out-of-distribution (OOD) classes. Hybrid split federated learning (Hybrid SFL) couples personalized client-side front ends (supporting early exit) with a generalized server-side backend for fallback inference, balancing accuracy and cost. However, under client architectural heterogeneity,...
597 Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability
2605.07212
EEG Preprocessing Robustness: Treats preprocessing as an intervention space and quantifies the instability it induces in EEG decoding predictions.
cs.LG cs.AI
Dengzhe Hou, Zihao Wu, Lingyu Jiang, Zirui Li, Fangzhou Lin
Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counte...
Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable under this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes, a variability ...
598 Improved Model-based Reinforcement Learning with Smooth Kernels
2605.07218
Kernel-Smooth Model-Based RL: Proposes a kernel-smoothing model-based RL method that exploits MDP smoothness to improve sample efficiency.
cs.LG
Kun Long, Yuqiang Li, Xianyi Wu
For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel...
For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothi...
599 On the Robustness of Distribution Support under Diffusion Guidance
2605.07220
Theory of Diffusion Guidance Robustness: Explains theoretically how diffusion guidance affects distribution support and ensures high-quality controllable generation.
cs.LG
Ruijia Cao, Yuchen Wu, Nisha Chadramoorthy
Diffusion guidance is a powerful technique that enables controllable and high-fidelity sample generation with diffusion models. At a high level, it modifies the score function by incorporating a guidance term that steers the generative process toward a desired...
Diffusion guidance is a powerful technique that enables controllable and high-fidelity sample generation with diffusion models. At a high level, it modifies the score function by incorporating a guidance term that steers the generative process toward a desired condition. Despite its empirical success, the theoretical properties of diffusion guidance remain largely unexplored, and it is not well understood why it consistently produces high-quality samples. In this work, we explain the effectivene...
600 Don't Learn the Shape: Forecasting Periodic Time Series by Rank-1 Decomposition
2605.07222
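Rank-1 Periodic Time Series Forecasting: Uses rank-1 decomposition to separate periodic shape from amplitude, forecasting periodic time series with very few parameters.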
cs.LG
Takato Honda
How few parameters do we really need to forecast a periodic time series? An hourly electricity series, reshaped as a 24-row matrix with one column per day, is approximately rank-1: a daily shape modulated by a daily level (median centered rank-1 energy 0.82 on...
How few parameters do we really need to forecast a periodic time series? An hourly electricity series, reshaped as a 24-row matrix with one column per day, is approximately rank-1: a daily shape modulated by a daily level (median centered rank-1 energy 0.82 on GIFT-Eval). Should we learn the shape? Smoothing, shrinkage, and low-rank fits all seem like obvious upgrades over the simple average of the last K=2 cycles. On all 97 GIFT-Eval configurations, we tested 8 such alternatives (e.g., Fourier,...
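The reshaping and the last-K-cycles baseline mentioned in this abstract can be illustrated as follows. This is a rough sketch, not the paper's code: the function names are hypothetical, and the "centered rank-1 energy" is assumed here to mean the share of squared singular values captured by the top singular value after subtracting the global mean, which may differ from the paper's definition.

```python
import numpy as np

def to_cycle_matrix(series, period):
    """Reshape a 1-D series into a (period, n_cycles) matrix, one column per cycle."""
    series = np.asarray(series, dtype=float)
    n_cycles = len(series) // period
    start = len(series) - n_cycles * period  # drop any incomplete leading cycle
    return series[start:].reshape(n_cycles, period).T

def rank1_energy(M):
    """Share of squared singular values in the top component (assumed definition)."""
    centered = M - M.mean()
    s = np.linalg.svd(centered, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

def last_k_average_forecast(M, k=2):
    """Forecast the next cycle as the plain average of the last k cycles."""
    return M[:, -k:].mean(axis=1)
```

For an hourly series, `to_cycle_matrix(series, 24)` gives the 24-row matrix with one column per day, and `last_k_average_forecast(M, k=2)` returns a 24-value forecast for the next day.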
601 Modulated learning for private and distributed regression with just a single sample per client device
2605.07233
One-sample federated regression: Proposes modulated learning for private distributed regression with only one sample per device.
cs.LG
Praneeth Vepakomma, Amirhossein Reisizadeh, Samuel Horváth, Munther Dahleh
This work focuses on the question of learning from a large number of devices with each device holding only a single sample of data. Several real-world applications exist for this one-sample-per-client setup, including learning from fitness trackers, data/app ...
This work focuses on the question of learning from a large number of devices with each device holding only a single sample of data. Several real-world applications exist for this one-sample-per-client setup, including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors, to name a few. When a client has only one sample, the standard federated learning paradigm breaks down as a local update based on that single point is far from being use...
602 Sample Complexity of Stochastic Optimization with Integer Variables
2605.07239
Integer stochastic optimization complexity: Analyzes the sample complexity of integer stochastic optimization and compares it with the continuous case.
cs.LG
Hongyu Cheng, Yinghao Zheng, Marco Molinaro, Amitabh Basu
We establish sample complexity results for stochastic optimization over the integers, especially with a view to understanding the complexity with respect to the corresponding continuous optimization problem. We show that integer optimization can sometimes require...
We establish sample complexity results for stochastic optimization over the integers, especially with a view to understanding the complexity with respect to the corresponding continuous optimization problem. We show that integer optimization can sometimes require strictly more samples and sometimes strictly fewer samples, depending on the structure of the objective and constraints. 1. For Lipschitz objectives over subsets of the $\ell_\infty$ ball, the statistical complexity of general ...
603 PerCaM-Health: Personalized Dynamic Causal Graphs for Healthcare Reasoning
2605.07267
Personalized temporal causal graphs: Learns individualized dynamic causal graphs to support temporal reasoning and decision-making in healthcare.
cs.LG
Elahe Khatibi, Ziyu Wang, Saba A. Farahani, Di Huang, Hung Cao
Personalized healthcare decisions require reasoning about how physiological and behavioral variables influence an individual patient over time. Existing temporal causal discovery methods are poorly matched to this setting: cohort-level models provide stable bu...
Personalized healthcare decisions require reasoning about how physiological and behavioral variables influence an individual patient over time. Existing temporal causal discovery methods are poorly matched to this setting: cohort-level models provide stable but non-personalized structures, while per-patient discovery is unreliable because individual trajectories are short, noisy, irregular, and non-stationary. This creates a fundamental gap between population-level causal modeling and the patien...
604 bispectrum: Selective $G$-Bispectra Made Practical
2605.07270
Group-invariant bispectrum features: Turns the selective G-bispectrum into a practical tool for extracting group-invariant representations.
cs.LG
Johan Mathe, Adele Myers, Simon Mataigne, Nina Miolane
Many machine learning tasks are invariant under the action of a group $G$ of transformations: signal classification can be invariant under translations, image classification under 2D rotations, and spherical-image classification under 3D rotations. The $G$-bis...
Many machine learning tasks are invariant under the action of a group $G$ of transformations: signal classification can be invariant under translations, image classification under 2D rotations, and spherical-image classification under 3D rotations. The $G$-bispectrum is a principled complete invariant of a signal (retaining all of the signal's information up to the group action) with proven benefits in machine learning and as a pooling layer in deep networks. However, its deployment has been hamper...
605 Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics
2605.07277
Set-valued learning via dynamics: Uses weight-tied dynamical systems to learn set-valued solution maps for problems with multiple solutions.
cs.LG cs.AI
Caleb Jore, Jialin Liu
Many scientific and combinatorial problems admit multiple correct solutions, not a single label. Standard supervised learning resolves this ambiguity by choosing one solution as the target, but this hidden selector can be arbitrary, discontinuous, and harder t...
Many scientific and combinatorial problems admit multiple correct solutions, not a single label. Standard supervised learning resolves this ambiguity by choosing one solution as the target, but this hidden selector can be arbitrary, discontinuous, and harder to learn than the underlying solution set. We study bifurcation models, a weight-tied dynamical view in which different initializations can converge to different stable equilibria, so the model represents an attractor landscape rather than o...
606 Mask2Cause: Causal Discovery via Adjacency Constrained Causal Attention
2605.07280
Time-series causal discovery attention: Uses adjacency-constrained causal attention to recover the causal graph end-to-end during forecasting.
cs.LG cs.AI
Omar Muhammad, Pasupuleti Dhruv Shivkant, Deepak N. Subramani
Leveraging deep learning for causal discovery in time series remains challenging because existing neural methods predominantly rely on component-wise architectures that fail to capture shared system dynamics or employ decoupled post-hoc graph extraction that r...
Leveraging deep learning for causal discovery in time series remains challenging because existing neural methods predominantly rely on component-wise architectures that fail to capture shared system dynamics or employ decoupled post-hoc graph extraction that risks overfitting to spurious correlations. We propose Mask2Cause, an end-to-end framework that recovers the underlying causal graph directly during the forecasting forward pass. Our approach introduces an Inverted Variable Embedd...
607 The Convergence Gap: Instruction-Tuned Language Models Stabilize Later in the Forward Pass
2605.07282
Layerwise prediction stabilization in LMs: Proposes the convergence gap metric, revealing that instruction-tuned models stabilize their predictions later.
cs.LG
Yifan Zhou
Final outputs hide when a checkpoint commits to its next-token prediction. We introduce the convergence gap, a model-diffing diagnostic that decodes each layer's next-token distribution and measures its distance to the model's own final distribution. Across si...
Final outputs hide when a checkpoint commits to its next-token prediction. We introduce the convergence gap, a model-diffing diagnostic that decodes each layer's next-token distribution and measures its distance to the model's own final distribution. Across six paired pretrained and instruction-tuned checkpoints in native prompting regimes, instruction-tuned checkpoints remain farther from their final predictions later into the stack. The effect persists under endpoint-matched raw and tuned read...
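A per-layer readout of the kind this abstract describes might look like the sketch below. Several details are assumptions made for illustration and are not taken from the paper: each layer's hidden state is decoded with a single shared unembedding matrix (a logit-lens-style readout), the distance is KL divergence from the final distribution, and the function name `convergence_gap` is hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def convergence_gap(hidden_states, unembed, eps=1e-12):
    """Distance of each layer's decoded distribution to the final one.

    hidden_states: (n_layers, d_model) states at one token position.
    unembed: (d_model, vocab) readout matrix, assumed shared across layers.
    Returns a length-n_layers array of KL(final || layer) values.
    """
    probs = softmax(np.asarray(hidden_states, dtype=float) @ unembed)
    final = probs[-1]
    return np.sum(final * (np.log(final + eps) - np.log(probs + eps)), axis=-1)
```

Under this reading, a checkpoint that commits late to its prediction shows gap values that stay large until the last few layers, while the final layer's gap is zero by definition.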
608 Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic
2605.07284
Cross-patching interpretability for tuning: Uses a cross-patching diagnostic to analyze how instruction tuning affects late-layer readout and behavioral differences.
cs.LG
Yifan Zhou
Recent interpretability work has identified model-internal handles on post-trained behavior, including refusal directions, assistant/persona axes, and sparse chat-tuning features. These results localize where behaviors can be read out or controlled, often in m...
Recent interpretability work has identified model-internal handles on post-trained behavior, including refusal directions, assistant/persona axes, and sparse chat-tuning features. These results localize where behaviors can be read out or controlled, often in middle-to-late layers. We ask how earlier computation and the late stack cooperate to turn those differences into next-token margins. To test this, we introduce first-divergence cross-patching: at the first token where pretrained base (PT) a...
609 Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation
2605.07302
Spectral basis from pretraining: Shows that pretraining forms a reusable spectral basis, so downstream finetuning only needs to adjust a few directions.
cs.LG
Junjie Yu, Yue Wang, Zihan Deng, Yan Zhu, Wenxiao Ma
Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored duri...
Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge t...
610 Latent Order Bandits
2605.07304
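Latent-structure bandits: Proposes latent order bandits that exploit cross-instance structure to reduce the samples needed for personalized exploration.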
cs.LG
Emil Carlsson, Newton Mwai, Fredrik D. Johansson
Bandit algorithms solve diverse sequential decision-making problems, but are often too sample-inefficient for from-scratch personalization. To substantially reduce exploration times, latent bandit algorithms exploit cross-instance structure implied by discrete...
Bandit algorithms solve diverse sequential decision-making problems, but are often too sample-inefficient for from-scratch personalization. To substantially reduce exploration times, latent bandit algorithms exploit cross-instance structure implied by discrete latent states, provided that the posterior distribution of rewards and latent states is known and accurate. However, obtaining an accurate model of this structure is difficult, and a small number of latent states may be insufficient to cha...
611 Generative Modeling with Flux Matching
2605.07319
Flux matching generative models: Proposes Flux Matching, which learns non-conservative vector fields for more flexible generative modeling.
cs.LG cs.AI
Peter Pao-Huang, Xiaojie Qiu, Stefano Ermon
We introduce Flux Matching, a new paradigm for generative modeling that generalizes existing score-based models to a broader family of vector fields that need not be conservative. Rather than requiring the model to equal the data score, the Flux Matching objec...
We introduce Flux Matching, a new paradigm for generative modeling that generalizes existing score-based models to a broader family of vector fields that need not be conservative. Rather than requiring the model to equal the data score, the Flux Matching objective imposes a weaker condition that admits infinitely many vector fields whose stationary distribution is the data. This flexibility enables a class of generative models that cannot be learned under score matching, in which inductive biase...
612 SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication
2605.07330
Communication-efficient RL weight sync: Uses sparse synchronization for lossless RL weight transfer with greatly reduced communication.
cs.LG cs.AI
Lucas Hu, Ranchi Zhao, Isaac Zhu, Zach Zhang, Hscos Zhang
In large-scale reinforcement learning (RL) systems with decoupled Trainer-Rollout execution, the Trainer must regularly synchronize policy weights to the Rollout side to limit policy staleness. When inter-node bandwidth is abundant, such synchronization is usu...
In large-scale reinforcement learning (RL) systems with decoupled Trainer-Rollout execution, the Trainer must regularly synchronize policy weights to the Rollout side to limit policy staleness. When inter-node bandwidth is abundant, such synchronization is usually only a small fraction of end-to-end cost. As model size grows, however, the communication demand rises rapidly. In bandwidth-constrained or network-variable deployments -- for example, cross-datacenter or cross-cluster settings, hetero...
613 Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective
2605.07331
Importance sampling for LLM RL: Reformulates the importance sampling ratio from a cumulative-token perspective to ease the bias-variance tension.
cs.LG cs.AI
Yuheng Zhang, Chenlu Ye, Shuowei Jin, Changlong Yu, Wei Xiong
Reinforcement learning, including reinforcement learning with verifiable rewards (RLVR), has emerged as a powerful approach for LLM post-training. Central to these approaches is the design of the importance sampling (IS) ratio used in off-policy policy-gradien...
Reinforcement learning, including reinforcement learning with verifiable rewards (RLVR), has emerged as a powerful approach for LLM post-training. Central to these approaches is the design of the importance sampling (IS) ratio used in off-policy policy-gradient estimation. Existing methods face a fundamental bias-variance dilemma: token-level IS ratios, as adopted by PPO (Schulman et al., 2017) and GRPO (Shao et al., 2024), introduce bias by ignoring prefix state distribution mismatch; full sequ...
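To make the contrast between token-level and full-sequence IS ratios concrete, the sketch below computes both from per-token log-probabilities, together with a cumulative prefix ratio. The prefix ratio is included only as one natural intermediate between the two extremes; it is an assumption made here and not necessarily the ratio the paper proposes, and `is_ratios` is a hypothetical name.

```python
import numpy as np

def is_ratios(logp_new, logp_old):
    """Importance-sampling ratios for one sampled response.

    logp_new, logp_old: per-token log-probabilities of the sampled tokens
    under the current policy and the behavior policy.
    Returns (token, prefix, sequence):
      token[t]  = pi_new(y_t | prefix) / pi_old(y_t | prefix)
      prefix[t] = product of token[0..t]  (cumulative ratio, assumed variant)
      sequence  = product over all tokens (full-sequence ratio)
    """
    diff = np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float)
    token = np.exp(diff)
    prefix = np.exp(np.cumsum(diff))  # accumulate in log space for stability
    sequence = float(np.exp(diff.sum()))
    return token, prefix, sequence
```

The token-level ratio ignores how likely the preceding tokens were under each policy, while the sequence ratio multiplies every per-token ratio together, so it can grow or shrink quickly as responses get longer.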
614 Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
2605.07333
In-context RL with softmax attention: Proves theoretically that standard softmax Transformers can implement in-context reinforcement learning.
cs.LG
Zixuan Xie, Xinyu Liu, Claire Chen, Shuze Daniel Liu, Rohan Chandra
In-context reinforcement learning (ICRL) studies agents that, after pretraining, adapt to new tasks by conditioning on additional context without parameter updates. Existing theoretical analyses of ICRL largely rely on linear attention, which replaces the soft...
In-context reinforcement learning (ICRL) studies agents that, after pretraining, adapt to new tasks by conditioning on additional context without parameter updates. Existing theoretical analyses of ICRL largely rely on linear attention, which replaces the softmax function in the standard attention with an identity mapping. This paper provides the first theoretical understanding of ICRL without making the unrealistic linear attention simplification. In particular, we consider the standard softmax...
615 CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models
2605.07335
Closed-loop virtual cell model refinement: Proposes a dual-space hierarchical orchestration framework for closed-loop iterative refinement of virtual cell models.
cs.LG
Mengran Li, Bo Li, Jiaying Wang, Wenbin Xing, Yixuan Dong
Virtual Cell Modeling (VCM) requires models that not only predict perturbation responses, but also support targeted revision when predictions fail. Current LLM-assisted modeling workflows face a refinement-routing problem: prediction discrepancies are observed...
Virtual Cell Modeling (VCM) requires models that not only predict perturbation responses, but also support targeted revision when predictions fail. Current LLM-assisted modeling workflows face a refinement-routing problem: prediction discrepancies are observed through executable implementations, but the relevant revision may involve the modeling assumption, representation design, implementation, or task constraint. Without structured feedback propagation across these levels, iterative refinement...
616 Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate
2605.07342
Evaluation of LLM-generated game scenes: Proposes Mage, a four-axis metric for assessing the real quality of LLM-generated executable game scenes.
cs.LG cs.AI
Hugh Xuechen Liu, Kıvanç Tatar
Compile-pass rate is the dominant evaluation signal for LLM code generation, yet for multi-component domain-specific artifacts it can be actively misleading. We demonstrate this on executable game scene synthesis with a four-axis evaluation protocol (named 'Ma...
Compile-pass rate is the dominant evaluation signal for LLM code generation, yet for multi-component domain-specific artifacts it can be actively misleading. We demonstrate this on executable game scene synthesis with a four-axis evaluation protocol (named 'Mage') -- compile success, runtime success, structural fidelity, and mechanism adherence -- applied to 858 generation attempts across four open-weight LLMs (7B--30B), 26 hand-crafted Unity goal pattern playable concepts, and two automatically...
617 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
2605.07363
Sparse attention for long-context inference: Proposes MISA, a mixture of indexer sparse attention, to reduce long-context inference cost.
cs.LG cs.AI
Ruijie Zhou, Fanxu Meng, Yufei Xu, Tongxuan Liu, Guangming Lu
DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer that scores every prefix token and selects the most relevant ones for the main attention. To remain expressiv...
DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer that scores every prefix token and selects the most relevant ones for the main attention. To remain expressive, the indexer uses many query heads (for example, 64 on DeepSeek-V3.2) that share the same selected token set; this multi-head design is precisely what makes the indexer the dominant cost on long contexts. We propose MISA (Mixture of Index...
618 FlightSense: An End-to-End MLOps Platform for Real-Time Flight Delay Prediction via Rotation-Chain Propagation Features and Agentic Conversational AI
2605.07364
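Real-time flight delay MLOps: Builds an end-to-end platform that predicts delays in real time from aircraft rotation-chain propagation features and deploys it.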
cs.LG
Aditi J. Shelke, Renuka J. Shelke, Yash M. Kamerkar
Flight delays impose cascading operational and financial burdens across the aviation network, costing the U.S. economy billions of dollars annually by disrupting interconnected aircraft rotation systems. While prior machine learning approaches have demonstrate...
Flight delays impose cascading operational and financial burdens across the aviation network, costing the U.S. economy billions of dollars annually by disrupting interconnected aircraft rotation systems. While prior machine learning approaches have demonstrated strong predictive performance, most treat upstream delays as static input variables rather than explicitly modeling how delays propagate dynamically through aircraft rotation chains, and none have deployed such systems alongside a live we...
619 QuadNorm: Resolution-Robust Normalization for Neural Operators
2605.07375
Resolution-robust normalization for operators: Proposes QuadNorm, replacing uniform averaging with numerical quadrature to improve cross-resolution robustness.
cs.LG
Bum Jun Kim, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo
Normalization layers in neural operators usually compute statistics by uniformly averaging discrete grid values, making the normalization itself discretization-dependent and thereby a source of transfer error across different resolutions or meshes. To enable d...
Normalization layers in neural operators usually compute statistics by uniformly averaging discrete grid values, making the normalization itself discretization-dependent and thereby a source of transfer error across different resolutions or meshes. To enable discretization robustness, we introduce a quadrature normalization family that replaces existing uniform averaging in normalization layers with numerical quadrature: QuadNorm and BlendQuadNorm. On endpoint-inclusive uniform grids, the propos...
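The idea of replacing uniform averaging with numerical quadrature can be sketched for a 1-D endpoint-inclusive uniform grid. The trapezoid rule used below is an assumption chosen for illustration; the abstract does not say which quadrature rule QuadNorm uses, and the function names are hypothetical.

```python
import numpy as np

def trapezoid_weights(n):
    """Normalized trapezoid-rule weights on an endpoint-inclusive uniform grid (n >= 2)."""
    w = np.ones(n)
    w[0] = w[-1] = 0.5  # endpoints count half in the trapezoid rule
    return w / w.sum()

def quadrature_normalize(u, eps=1e-5):
    """Normalize along the last axis using quadrature-weighted mean and variance."""
    u = np.asarray(u, dtype=float)
    w = trapezoid_weights(u.shape[-1])
    mean = np.sum(w * u, axis=-1, keepdims=True)
    var = np.sum(w * (u - mean) ** 2, axis=-1, keepdims=True)
    return (u - mean) / np.sqrt(var + eps)
```

With these weights the computed mean approximates the integral average of the underlying function rather than the average of whichever grid points happen to be sampled, so the statistics should change less when the same field is evaluated at a different resolution.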
620 Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns
2605.07378
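Training-free network evaluation metrics: Proposes a training-free proxy metric based on sample-wise activation patterns to evaluate network performance more accurately.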
cs.LG
Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang
Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead in neural network evaluation for scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics...
Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead in neural network evaluation for scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with the true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CN...
621 StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models
2605.07384
Streaming physical field inference: Proposes StreamPhy, which uses state space models to infer physical fields in real time from sparse, irregular observations.
cs.LG
Panqi Chen, Yifan Sun, Shikai Fang, Xiao Fu, Lei Cheng
Inferring the evolution of high-dimensional and multi-modal (e.g., spatio-temporal) physical fields from irregular sparse measurements in real time is a fundamental challenge in science and engineering. Existing approaches, including diffusion-based generative...
Inferring the evolution of high-dimensional and multi-modal (e.g., spatio-temporal) physical fields from irregular sparse measurements in real time is a fundamental challenge in science and engineering. Existing approaches, including diffusion-based generative models and functional tensor methods, typically operate in offline settings, depend on full temporal observations, or incur substantial inference cost. We propose StreamPhy, an end-to-end framework that enables efficient and accurate strea...
622 Convex Optimization with Nested Evolving Feasible Sets
2605.07386
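Online convex optimization with shrinking sets: Studies algorithms that trade off regret and movement cost in online convex optimization over nested, shrinking feasible sets.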
cs.LG
Karthick Krishna M., Haricharan Balasundaram, Rahul Vaze
Convex Optimization with Nested Evolving Feasible Sets (CONES) is considered, where the objective function $f$ remains fixed but the feasible region evolves over time as a nested sequence $S_1 \supseteq S_2 \supseteq \cdots \supseteq S_T$. The goal of an onlin...
Convex Optimization with Nested Evolving Feasible Sets (CONES) is considered, where the objective function $f$ remains fixed but the feasible region evolves over time as a nested sequence $S_1 \supseteq S_2 \supseteq \cdots \supseteq S_T$. The goal of an online algorithm is to simultaneously minimize the regret with respect to the hindsight static optimal benchmark and the total movement cost while ensuring feasibility at all times. CONES is an optimization-oriented generalization of the well-known ...
623 Rubric-based On-policy Distillation
2605.07396
Rubric-based on-policy distillation: Replaces teacher logits with semantic rubrics to enable on-policy distillation in black-box settings.
cs.LG cs.AI
Junfeng Fang, Zhepei Hong, Mao Zheng, Mingyang Song, Gengsheng Li
On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic rubrics can serve as a scalable alternative to teacher logits, ena...
On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic rubrics can serve as a scalable alternative to teacher logits, enabling OPD using only teacher-generated responses. To prove it, we introduce ROPD, a simple yet foundational framework for rubric-based OPD. Specifically, ROPD induces prompt-specific rubrics from teacher-student contrasts, and then utilizes...
624 Have Graph -- Will Lift? The Case for Higher-Order Benchmarks
2605.07397
Higher-order geometric deep learning benchmarks: Argues for and builds higher-order benchmarks to evaluate message-passing models on graphs and complexes.
cs.LG
Bastian Rieck
After a somewhat rocky start, geometry and topology have established a foothold in machine learning. Message passing, either on graphs or higher-order complexes, is one of the main drivers of geometric deep learning, and paradigms that were once considered to ...
After a somewhat rocky start, geometry and topology have established a foothold in machine learning. Message passing, either on graphs or higher-order complexes, is one of the main drivers of geometric deep learning, and paradigms that were once considered to be firmly in the realm of the abstract, like sheaves, have been "tamed" to serve as novel inductive biases for model architectures in topological deep learning. The veritable diversity of models, however, is in stark contrast to the scarcity ...
625 Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer
2605.07407
Interpretable symbols in health foundation models: Extracts interpretable symbols from frozen embeddings and aligns cross-modal health representations for transfer.
cs.LG
Gajendra Katuwal, Advait Koparkar, Salar Abbaspourazad, Anshuman Mishra, Sarvesh Kirthivasan
Health foundation models (FMs) learn useful representations from wearable sensors, but interpreting what they encode and transferring that knowledge across modalities after training remains difficult. We present a post-training framework that decomposes frozen...
Health foundation models (FMs) learn useful representations from wearable sensors, but interpreting what they encode and transferring that knowledge across modalities after training remains difficult. We present a post-training framework that decomposes frozen embeddings into interpretable directions, referred to as symbols, and use these symbols to align the embedding spaces without retraining. We evaluate the framework on three FMs for photoplethysmography (PPG) and accelerometer data, indepen...
626 Tracking Large-scale Shared Bikes with Inertial Motion Learning in GNSS Blocked Environments
2605.07412
Inertial bike tracking without GNSS: Uses inertial motion learning for robust shared-bike localization and tracking in GNSS-blocked environments.
cs.LG cs.AI
Feng Liu (Beijing Jiaotong University), Kejia Li (Beijing Jiaotong University), Zhiwei Yang (DiDi Company), Chunwei Yang (DiDi Company), Qun Li (DiDi Company)
Although Global Navigation Satellite Systems (GNSS) provide a general solution for bike tracking outdoors, there still exist complex riding environments where only inertial navigation systems work, such as urban canyons. Despite decades of research, localizati...
Although Global Navigation Satellite Systems (GNSS) provide a general solution for bike tracking outdoors, there still exist complex riding environments where only inertial navigation systems work, such as urban canyons. Despite decades of research, localization using only low-cost inertial sensors still faces challenges such as cumulative drifts and poor robustness caused by filtering methods. Furthermore, sensors such as visual and LiDAR could provide reliable measurements, but they are not su...
627 Risk-Consistent Multiclass Learning from Random Label-Subset Membership Queries
2605.07413
Weak supervision via label-subset queries: Studies multiclass learning from random label-subset membership queries with risk-consistency guarantees.
cs.LG
Jiaxu Su, Junpeng Li, Changchun Hua, Yana Yang
Obtaining accurate class labels is often costly or unreliable, and may also be limited by privacy or other practical conditions. Compared with asking an annotator to provide the exact class, it is often easier to ask whether the true label belongs to a certain label subset. This query-response form defines a distinct weak-supervision mechanism: weak supervision information is generated through feedback on a label subset. Although weakly supervised learning has studied many learning frameworks, m...
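The query-response form described in this abstract can be pictured with a toy simulation. The sketch below is illustrative only: the function name `membership_query`, the uniform sampling of the subset, and the noise-free answer are assumptions made here, not details taken from the paper.

```python
import random

def membership_query(true_label, num_classes, subset_size, rng):
    """Hypothetical annotator query (not the paper's protocol):
    draw a random label subset and report whether the true label
    belongs to it. The answer is assumed to be noise-free."""
    subset = frozenset(rng.sample(range(num_classes), subset_size))
    return subset, true_label in subset

# Example: one yes/no query about an instance whose true class is 3.
rng = random.Random(0)
subset, answer = membership_query(true_label=3, num_classes=10,
                                  subset_size=4, rng=rng)
print(sorted(subset), answer)
```

The learner only ever sees the pair (subset, yes/no answer), never the exact class, which is what makes this a weak-supervision signal.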
628 A Flexible Adaptive Stable Clustering Algorithm for Archive-Scale Online Mass Spectrometry
2605.07424
Stable scalable clustering for mass spectrometry: Proposes the FASC framework for scalable and stable clustering of massive online mass spectrometry streams.
cs.LG
Shao Shi, Xin Yang, Huiran Feng, Jianhuai Ye, Tianlong Hu
Modern online mass spectrometry generates multi-terabyte data streams critical for understanding Earth's environmental systems. However, extracting actionable chemical insights from these repositories is impeded by a computational bottleneck: existing clustering methods force a compromise among scalability, metric flexibility, and algorithmic stability. Here, we introduce Flexible Adaptive Stable Clustering (FASC), a dynamical systems framework that resolves these constraints by architecturally ...
629 GameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injection
2605.07442
Verification for LLM-generated games: Automatically verifies the correctness of mechanics in LLM-generated games using parallel keypoints and runtime state injection.
cs.LG
Chaobo Jia, Ruipeng Wan, Ting Sun, Weihao Tan, Borui Wan
LLM-based game generation promises to turn natural-language specifications into executable games, but progress is limited by the lack of reliable automated verification. Unlike conventional code generation, game correctness is defined over long-horizon interaction: a game may appear correct while violating core mechanics such as state updates, interaction rules, and phase transitions. Existing Agent-as-a-Verifier approaches collapse verification into open-ended gameplay, making verdicts reachabi...
630 VNN-LIB 2.0: Rigorous Foundations for Neural Network Verification
2605.07451
Neural network verification standard: Proposes VNN-LIB 2.0, giving neural network verification a rigorous foundation of syntax, semantics, and types.
cs.LG
Ann Roy, Allen Antony, Andrea Gimelli, Matthew L. Daggitt
Neural network verification is an active and rapidly maturing research area, with a growing ecosystem of solvers and tools. The VNN-LIB standard was introduced to support interoperability in this ecosystem, but Version 1.0 has several serious shortcomings as a formal foundation: it lacks a precise syntax, semantics, and type system, offers limited expressivity, and relies on externally defined ONNX models whose semantics are informal and constantly evolving. The latter distinguishes VNN-LIB fro...
631 Inference-Time Attribute Distribution Alignment for Unconditional Diffusion
2605.07456
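Diffusion Attribute Distribution Control: Proposes an inference-time method for aligning attribute distributions to control population proportions in unconditional diffusion generation.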
cs.LG
Hao Luan, See-Kiong Ng, Chun Kai Ling
Inference-time controllable generation is essential for real-world applications of unconditional diffusion models. However, most existing techniques focus on individual samples, struggling in applications that require the sample population to follow specific attribute distributions (e.g., demographic balance or semantic proportions). We formalize this setting as the inference-time attribute distributional alignment problem for pretrained unconditional diffusion models. To address this, we cast i...
632 Estimation of Motor Unit Parameters from Surface Electromyograms using an Informed Autoencoder
2605.07458
EMG Informed Autoencoder Estimation: Estimates physiological motor unit parameters from surface EMG using an autoencoder with prior constraints.
cs.LG
Kaja Balzereit, Malte Mechtenberg, Axel Schneider
Motor unit parameters such as the innervation zone centre or the conduction velocity of the electrical potential harbour the potential to improve the fidelity of neuromechanical models used for movement and force prediction. Determining these parameters in a non-invasive way is challenging, as they are subject-specific and may vary with muscle contraction. Existing work on the estimation of motor unit parameters mainly relies on white-box modelling and therefore requires substantial manual model...
633 Learning Minimal-Deviation Corrections for Multi-Dimensional Mismodelling in HEP Simulations
2605.07460
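HEP Simulation Minimal Corrections: Learns minimal-deviation multidimensional simulation corrections that match data when only one-dimensional observational constraints are available.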
cs.LG
Matthias Schott, Lucie Flek
Accurate Monte Carlo (MC) modelling in high-energy physics is challenging, particularly in complex scenarios where simulations fail to reproduce observed data. In practice, experimental information is often limited to one-dimensional (1D) distributions, while mismodelling arises in a multidimensional feature space. This restricts traditional correction methods, as one-dimensional reweighting ignores correlations and fully multidimensional approaches require large target datasets. We propose a ne...
634 Approximation Error Upper and Lower Bounds for Hölder Class with Transformers
2605.07463
Transformer Hölder Approximation Bounds: Gives upper and lower error bounds for Transformer approximation of the Hölder function class, along with the required network size.
cs.LG
Xin He, Yuling Jiao, Xiliang Lu, Jerry Zhijian Yang
We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-d_{0}/\alpha})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensi...
635 Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science
2605.07467
Causal Discovery with Physical Simulators: Treats physical simulators as intervention operators for causal structure discovery under latent confounding.
cs.LG cs.AI
Tsuyoshi Okita
Existing interventional causal discovery methods -- IGSP, DCDI, ENCO -- assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as d...
636 Uncovering Hidden Systematics in Neural Network Models for High Energy Physics
2605.07470
Systematic Uncertainty in HEP NNs: Analyzes the sources of hidden systematic errors in neural networks for high-energy physics and proposes diagnostic methods.
cs.LG
Lucie Flek, Philipp Alexander Jungs, Akbar Karimi, Timo Saala, Alexander Schmid
Neural networks (NNs) are inherently multidimensional classifiers that learn complex, non-linear relationships among input observables. While their flexibility enables unprecedented performance in high-energy physics (HEP) analyses, it also makes them sensitive to small variations in their inputs. Consequently, the propagation and estimation of systematic uncertainties in NN-based models remain an open challenge. There are indications that uncertainties derived in control regions or from nominal...
637 Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics
2605.07471
HEP Fast-to-Full Transfer Learning: Systematically evaluates transfer learning from fast to full simulation across several LHC tasks.
cs.LG
Matthias Schott, Lucie Flek
Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet tagging, and missing transverse energy reconstruction, usi...
638 NPMixer: Hierarchical Neighboring Patch Mixing for Time Series Forecasting
2605.07476
Wavelet Patch Mixing Forecasting: Proposes a hierarchical model with learnable wavelet decomposition and neighboring patch mixing for time series forecasting.
cs.LG
Jung Min Choi, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme
Multivariate time series forecasting remains a challenge due to the complexity of local temporal dynamics and global dependencies across multiple variables. In this paper, we propose Neighboring Patching Mixer (NPMixer), a hierarchical architecture featuring a Learnable Stationary Wavelet Transform that adaptively learns filter coefficients to decompose signals into trend and detail components in a data-dependent manner. Our framework introduces a Neighboring ...
639 SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion
2605.07482
Retain-Set-Free LLM Unlearning: Achieves selective LLM unlearning without a retain set via self-distillation and logit demotion.
cs.LG cs.AI
Zizhao Hu, Ameya Godbole, Johnny Tian-Zheng Wei, Mohammad Rostami, Jesse Thomason
Machine unlearning for large language models (LLMs) aims to selectively remove memorized content such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods require a retain set of curated examples to prevent catastrophic degradation of general model utility, creating an extra data dependency that complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlear...
640 Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization
2605.07483
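OOD Extrapolation Identifiability Bias: Explains from an identifiability perspective why neural networks struggle to extrapolate, and uses feature engineering to introduce a bias that improves OOD generalization.
cs.LG cs.AI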
Leonel Aguilar, Jan Nagler, Christoph Hoelscher, Nino Antulov-Fantulin
Successful deep neural networks discover salient features of data. We show when and why they fail to learn out-of-distribution (OOD)-relevant representations from an in-distribution (ID) training window. This requires decoupling feature learning from data-generating-process (DGP) identifiability. From a single training window, OOD extrapolation is non-identifiable: infinitely many DGPs are $\varepsilon$-observationally equivalent on the training data but diverge arbitrarily outside it, and no in...
641 Excluding the Target Domain Improves Extrapolation: Deconfounded Hierarchical Physics Constraints
2605.07485
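Deconfounded Physics Constraints for Extrapolation: Proposes a deconfounded, hierarchical physics-constraint gating mechanism to improve the extrapolation ability of generative models.
cs.LG cs.AI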
Tsuyoshi Okita
Extrapolation to out-of-distribution conditions is a fundamental challenge for physics-constrained deep generative models. Existing methods apply physical constraints as a single static regularization term uniformly across the generation process, and address neither the hierarchical structure of physical laws nor the confounding variable problem. We propose the Deconfounded Hierarchical Gate (DHG), which serves as a diagnostic and control mechanism: it identifies when and how strongly temperatur...
642 Tessellations of Semi-Discrete Flow Matching
2605.07513
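Semi-Discrete Flow Matching Geometry: Studies semi-discrete flow matching from a Gaussian source to a finite-point target and analyzes the geometric structure it induces.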
cs.LG
Emile Pierret, Johannes Hertrich, Samuel Hurault, Julie Delon
We study Flow Matching in a semi-discrete setting where a Gaussian source is transported toward a discrete target supported on finitely many points. This semi-discrete regime is the theoretical setting behind the use of Flow Matching for generative modeling, where the target distribution is represented by a finite dataset. In this semi-discrete regime, the exact Flow Matching velocity field is available in closed form, which makes it possible to analyze the geometry induced by the terminal flow ...
643 Why Self-Inconsistency Arises in GNN Explanations and How to Exploit It
2605.07527
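Self-Inconsistency in GNN Explanations: Reveals the causes of self-inconsistency in SI-GNN explanations and exploits the phenomenon to improve explanations and signal assignment.
cs.LG cs.AI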
Wenxin Tai, Yaqian Liu, Ting Zhong, Fan Zhou
Recent work has observed that explanations produced by Self-Interpretable Graph Neural Networks (SI-GNNs) can be self-inconsistent: when the model is reapplied to its own explanatory graph subset, it may produce a different explanation. However, why self-inconsistency arises remains poorly understood. In this work, we first identify re-explanation-induced context perturbation as the direct cause of score variation. We then introduce a latent signal assignment hypothesis to explain why only some ...
644 SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
2605.07531
BBVI SGD Variance Control: Addresses unbounded gradient variance in BBVI with preconditioning and dynamic batching, and provides a convergence analysis.
cs.LG
Hippolyte Labarrière, Cesare Molinari, Silvia Villa, Lorenzo Rosasco
Black-Box Variational Inference (BBVI) typically relies on Stochastic Gradient Descent (SGD) to optimize the Evidence Lower Bound (ELBO). However, the stochastic gradients in BBVI inherently exhibit unbounded variance, violating standard assumptions and instead satisfying the weaker Blum-Gladyshev (BG) condition, where variance grows quadratically with distance from the optimum. In this paper, we bridge the gap between stochastic optimization theory and the practical instances of BBVI. Focusing ...
645 On the Invariance and Generality of Neural Scaling Laws
2605.07546
Generalizable Neural Scaling Laws: Studies how neural scaling laws fit once can generalize to new tasks and new models.
cs.LG
Xing Han, Ziyin Liu, Suchi Saria, Paul Pu Liang
Neural scaling laws establish a predictable relationship between model performance and data or compute, offering crucial guidance for resource allocation in new domains and tasks. Yet such laws are most needed precisely where they are hardest to obtain: fitting one for a new model-task pair demands expensive sweeps that typically exhaust the very compute budget the law is meant to economize. This paper poses the research question of how to develop generalizable scaling laws: laws fit once on a w...
646 Disagreement-Regularized Importance Sampling for Adversarial Label Corruption
2605.07551
Robust Importance Sampling under Corruption: Proposes a sampling strategy regularized by proxy-ensemble disagreement to resist adversarial label corruption.
cs.LG
Csongor Horváth, Ida-Maria Sintorn, Prashant Singh
Standard Importance Sampling (IS) collapses under label corruption because high-norm examples, prioritized for variance reduction, are often adversarial outliers. We formalize this misalignment using an $\varepsilon$-contamination model and propose Disagreement-Regularized Importance Sampling (DR-IS), a sub-sampling method based on loss rank-disagreement across an independent proxy ensemble. We prove finite-sample concentration bounds showing that the empirical rank disagreement of bulk corrupted e...
647 ProteinJEPA: Latent prediction complements protein language models
2605.07554
Protein JEPA Latent Prediction: Adds latent-representation prediction at masked positions to protein language models to improve representation learning.
cs.LG cs.AI
Dan Ofer, Dafna Shahaf, Michal Linial
Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under a matched wall-clock budget. Across pretrained and random-init protein sequence encoders at 35--150M parameters, we find that the best protein-JEPA design is not all-position latent prediction but a variant: predicting latent targets only at masked positions, and retainin...
648 Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning
2605.07557
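Universal Semi-Supervised Learning: Proposes UniSSL and uses simplex-anchored structural inference to exploit unlabeled data under unknown unlabeled-data distributions.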
cs.LG
Yaxin Hou, Jun Ma, Hanyang Li, Bo Han, Jie Yu
Semi-supervised learning faces significant challenges in realistic scenarios where labeled data is scarce and unlabeled data follows unknown, arbitrary distributions. We formalize this critical yet under-explored paradigm as Universal Semi-supervised Learning (UniSSL). Existing methods typically leverage unlabeled data via pseudo-labeling. However, they often rely on the idealized assumption of a uniform unlabeled data distribution or require sufficient labeled data to estimate it. In the UniSSL...
649 Ensemble Distributionally Robust Bayesian Optimisation
2605.07565
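Distributionally Robust Bayesian Optimisation: Proposes a computationally tractable ensemble distributionally robust Bayesian optimisation method to handle uncertainty in the context distribution.
cs.LG cs.AI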
Tigran Ramazyan, Denis Derkach
We study zeroth-order optimisation under context distributional uncertainty, a setting commonly tackled using Bayesian optimisation (BO). A prevailing strategy to make BO more robust to the complex and noisy nature of data is to employ an ensemble as the surrogate model, thereby mitigating the weaknesses of any single model. In this study, we propose a novel algorithm for Ensemble Distributionally Robust Bayesian Optimisation that remains computationally tractable while managing continuous conte...
650 Bilevel Graph Structure Learning, Revisited: Inner-Channel Origins of the Reported Gain
2605.07577
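Bilevel Graph Structure Learning Analysis: Shows that the gains from bilevel graph structure learning come largely from inner-loop training dynamics rather than from the rewiring itself.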
cs.LG
Minkyoung Kim, Beakcheol Jang
Bilevel graph structure learning is widely understood to improve graph neural networks by jointly optimizing model parameters and a learned graph structure, with the resulting performance gain attributed to the rewired adjacency. We find that this attribution may be overstated: training-dynamics effects in the inner loop, rather than the rewiring itself, capture a substantial share of the gain. To establish this, we introduce frozen-$\phi$, a control that freezes the graph while retaining the in...
651 Revisiting Transformer Layer Parameterization Through Causal Energy Minimization
2605.07588
Transformer Parameterization via Energy Minimization: Reinterprets and guides the parameterization of Transformer layers through a causal energy minimization framework.
cs.LG cs.AI
Jin Xu, Camille Couturier, Victor Rühle, Saravan Rajmohan, James Hensman
Transformer blocks typically combine multi-head attention (MHA) for token mixing with gated MLPs for token-wise feature transformation, yet many choices in their parameterization remain largely empirical. We introduce Causal Energy Minimization (CEM), a framework that recasts Transformer layers as optimization steps on conditional energy functions while explicitly accounting for layer parameterization. Extending prior energy-based interpretations of attention, CEM shows that weight-tied MHA can ...
652 Optimal Recourse Summaries via Bi-Objective Decision Tree Learning
2605.07598
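Recourse Summaries Decision Trees: Uses bi-objective decision tree learning to generate group-level summaries of actionable recourse for auditing and fairness analysis.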
cs.LG
Ioannis Chatzis, Jason Liartis, Athanasios Voulodimos, Giorgos Stamou
Actionable Recourse provides individuals with actions they can take to change an unfavorable classifier outcome. While useful at the instance level, it is ill-suited for global auditing and bias detection, since aggregating local actions is costly and often inconsistent. Recourse Summaries address this limitation by partitioning the population and assigning one shared action per subgroup, enabling comparison across subgroups. Designing summaries involves a fundamental trade-off between recourse ...
653 Learning Large-Scale Modular Addition with an Auxiliary Modulus
2605.07648
Modular Addition with Auxiliary Modulus: Introduces an auxiliary modulus to mitigate covariate shift and scale up the learning of large-scale modular addition.
cs.LG
Hanato Kikuchi, Ryosuke Masuya, Kazuhiko Kawamoto, Hiroshi Kera
Learning parity functions, or more generally modular addition, is a challenging machine learning task due to its input sensitivity. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase zeros in training sequences, reducing the effective number of summands and thus controlling training difficulty; however, this induces covariate shift between training and test input distributions. This study theoretically and empirica...
654 Direction-Preserving Number Representations
2605.07662
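Direction-Preserving Low-Precision Numbers: Studies how well low-precision numbers drawn from a finite alphabet preserve vector direction, and gives a geometric characterization.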
cs.LG
Bardia Zadeh, George A. Constantinides
Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geomet...
655 Structured Coupling for Flow Matching
2605.07676
Structured Coupling for Flow Matching: Introduces structured latent variables and noise coupling into flow matching to learn interpretable latent structure.
cs.LG
Xavier Sumba, Carles Balsells-Rodas, Yingzhen Li
Standard flow matching scales well but typically relies on an unstructured source distribution, limiting its ability to learn interpretable latent structure. Latent-variable models, by contrast, capture structure but often sacrifice generative quality. We bridge this gap by proposing Structured Coupling for Flow Matching (SCFM), a cooperative framework that augments flow matching with structured latent representation learning. By introducing structured latent variables and exogenous noise into t...
656 The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits
2605.07686
Chain-of-Thought Coupling Tax: Reveals the coupling tax, in which reasoning chains crowd out answer length under a fixed output budget and degrade performance.
cs.LG
Wenhua Nie, Junlin Liu, Jianan Wu, Zijie Meng, Yilong Fan
Chain-of-thought reasoning is often treated as a monotone way to improve language-model accuracy by letting a model think longer. We identify a countervailing effect, the coupling tax: when reasoning traces and final answers share one output-token budget, long traces can crowd out the answer they are meant to support. Across GSM8K, MATH-500, and five BIG-Bench Hard tasks with Qwen3 models at three scales, non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budge...
657 Gradient Starvation in Binary-Reward GRPO: Why Group-Mean Centering Fails and Why the Simplest Fix Works
2605.07689
GRPO Gradient Starvation Fix: Analyzes how mean centering in GRPO causes gradient starvation under binary rewards and gives a simple, effective fix.
cs.LG
Wenhua Nie, Jianan Wu, Junlin Liu, Ziwei Li, Zheng Lin
Group Relative Policy Optimization (GRPO) is a standard algorithm for reinforcement learning from verifiable rewards, but its group-mean-centered advantage can fail under binary rewards. The failure mode is gradient starvation: when every response in a group is correct or every response is wrong, the centered advantage is exactly zero and the policy receives no learning signal. We prove that the true degeneracy rate always exceeds the i.i.d. Bernoulli prediction by Jensen's inequality, and obser...
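The failure mode described in this abstract can be checked with a few lines of arithmetic. The sketch below is a toy illustration under stated assumptions, not the paper's code: the helper names are hypothetical, the advantage is plain mean centering with no standard-deviation scaling, and the Jensen example uses made-up success rates (0.1 and 0.9) to show one way heterogeneous prompts can push the degeneracy rate above the pooled i.i.d. Bernoulli value.

```python
from statistics import mean

def centered_advantages(rewards):
    """Group-mean-centered advantage: A_i = r_i - mean(r)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

def iid_degeneracy_rate(p, group_size):
    """Probability that all `group_size` i.i.d. Bernoulli(p) rewards
    agree (all correct or all wrong)."""
    return p ** group_size + (1 - p) ** group_size

# Uniform groups: every centered advantage is exactly zero,
# so the policy gradient carries no learning signal.
print(centered_advantages([1, 1, 1, 1]))
print(centered_advantages([0, 0, 0, 0]))

# Mixed group: nonzero advantages, learning signal present.
print(centered_advantages([1, 0, 0, 1]))

# Toy Jensen illustration (assumed numbers): averaging the degeneracy
# rate over two prompts with different success rates versus evaluating
# it once at the pooled rate 0.5, with group size 4.
per_prompt = mean(iid_degeneracy_rate(p, 4) for p in (0.1, 0.9))
pooled = iid_degeneracy_rate(0.5, 4)
print(per_prompt > pooled)
```

Because p^G + (1 - p)^G is convex in p, averaging over prompts of varying difficulty cannot give a smaller value than evaluating at the mean success rate.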
658 Fortifying Time Series: DTW-Certified Robust Anomaly Detection
2605.07690
DTW-Certified Robust Anomaly Detection: Proposes a time-series anomaly detection method with certifiable robustness under a DTW perturbation model.
cs.LG
Shijie Liu, Tansu Alpcan, Christopher Leckie, Sarah Erfani
Time-series anomaly detection is critical for ensuring safety in high-stakes applications, where robustness is a fundamental requirement rather than a mere performance metric. Addressing the vulnerability of these systems to adversarial manipulation is therefore essential. Existing defenses are largely heuristic or provide certified robustness only under $\ell_p$-norm constraints, which are incompatible with time-series data. In particular, the $\ell_p$-norm fails to capture the intrinsic temporal s...
659 Toward Better Geometric Representations for Molecule Generative Models
2605.07693
Geometric Representations for Molecule Generation: Improves geometric representation learning in two-stage molecule generation to raise the quality of generated 3D structures.
cs.LG
Shaoheng Yan, Zian Li, Cai Zhou, Qiaojing Huang, Kai Liu
Geometric representation-conditioned molecule generation provides an effective paradigm that decouples molecule representation modeling from structure generation. By decoupling molecule generation into two stages (first generating a meaningful molecule representation, and then generating a 3D molecule conditioned on this representation), the efficiency and quality of the generation process can be significantly enhanced. However, its effectiveness is fundamentally limited by the quality of the repre...
660 Future Validity is the Missing Statistic: From Impossibility to $\Phi$-Estimation for Grammar-Faithful Speculative Decoding
2605.07698
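Grammar-Faithful Speculative Decoding: Shows that existing speculative decoding deviates from the grammar-conditional distribution and uses Φ-estimation to achieve grammar-faithful sampling.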
cs.LG
Wenhua Nie, Zijie Meng, Kun Zou, Zheng Lin, Ziwei Li
Grammar-constrained generation is often combined with local vocabulary masking and speculative decoding, but the resulting sampling law is not the grammar-conditional distribution users usually intend. We show that any speculative decoder with local mask access, Leviathan rejection, and rollback soundness samples from the locally projected distribution $\mu^{\mathrm{proj}}$ rather than the grammar-conditional distribution $\mu^\star$. This extends the GAD impossibility result to speculative deco...
661 Bayesian Fine-tuning in Projected Subspaces
2605.07706
Bayesian LoRA Fine-tuning: Performs Bayesian fine-tuning in LoRA projected subspaces to quantify uncertainty.
cs.LG
Viktar Dubovik, Patryk Marszałek, Jacek Tabor, Tomasz Kuśmierczyk
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of large models by decomposing weight updates into low-rank matrices, significantly reducing storage and computational overhead. While effective, standard LoRA lacks mechanisms for uncertainty quantification, leading to overconfident and poorly calibrated models. Bayesian variants of LoRA address this limitation, but at the cost of a significantly increased number of trainable parameters, partially offsetting the original efficie...
662 An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
2605.07719
Hybrid Sparse Attention: Proposes hybrid sparse attention with CPU-GPU parallelism to accelerate long-context inference.
cs.LG cs.AI
Feiyu Yao, Zhixiong Niu, Xiaqing Li, Yongqiang Xiong, Juan Fang
Long-context inference increasingly operates over CPU-resident KV caches, either because decoding-time KV states exceed GPU memory capacity or because disaggregated prefill-decode systems place KV data in host memory. Although block-sparse attention reduces attention cost in this setting, sparsity alone is insufficient for end-to-end efficiency. GPU-only designs remain constrained by PCIe bandwidth and metadata memory overhead, while CPU-GPU hybrid designs still suffer from substantial GPU idle ...
663 Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
2605.07724
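Synthetic Retraining Collapse: Shows theoretically that curation with multiple reward preferences can mitigate collapse in recursive retraining of generative models.
cs.LG cs.AI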
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab
Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective. Prior work suggests that such collapse is unavoidable without adding real data into the mix. We revisit this conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize ...
664 Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
2605.07727
Wasserstein Generative Policy: Derives a one-step generative policy update from a W2 gradient flow, with a trust-region constraint.
cs.LG cs.AI
Juil Koo, Mingue Park, Jiwon Choi, Yunhong Min, Minhyuk Sung
We propose Drifting Field Policy (DFP), a non-ODE one-step generative policy built on the drifting model paradigm. We frame the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, so that each DFP update corresponds to a gradient step in probability space. By construction, this gradient is decomposed into an ascent toward higher action-value regions and a score matching with the anchor policy as a trust region. We further derive a simple, tractable surrogate of...
665 Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach
2605.07733
GPS Truck-Shipment Matching: Uses Ping2Hex to match GPS traces to shipments and rank candidates probabilistically.
cs.LG cs.AI
Srinivas Kumar R, Jose Mathew, Ankit Singh Chauhan, Dinesh Rajkumar, Aravind Manoj
Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking and accurate estimated time of arrival (ETA) predictions. However, missing or corrupted vehicle identifiers prevent traditional matching approaches, leaving shipments without visibility. This paper presents Intelligent Truck Matching (ITM) 2.0, a machine learning system that addresses this critical gap by formulating matching as a probabilistic ranking proble...
666 Robust and Reliable AI for Predictive Quality in Semiconductor Materials Manufacturing with MLOps and Uncertainty Quantification
2605.07752
MLOps with Uncertainty: Benchmarks retraining strategies on five years of production-line data and performs uncertainty-aware quality prediction.
cs.LG
Min Gao, Julia Maria Perathoner, Anton Ludwig Bonin, Steven Eulig, Gianni Klesse
Semiconductor materials manufacturing presents unique challenges for machine learning deployment due to evolving process conditions, equipment degradation, and raw material variability that can cause model performance deterioration over time. This study benchmarks machine learning operations (MLOps) retraining strategies using five years of real manufacturing data to identify optimal retraining approaches for quality prediction. We evaluate various retraining frequencies and hyperparameter optim...
667 When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining
2605.07756
Online Loss Weighting: Learns the weights of multiple pretraining losses online via bilevel gradient alignment to improve downstream performance.
cs.LG cs.AI
Ivan Karpukhin, Andrey Savchenko
Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian optimization is computationally expensive, as it requires many independent training runs. To address this, we propose a gradient-based bilevel method that learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objecti...
668 Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear Activations
2605.07757
NCBF Formal Verification: Proposes LightCROWN, which computes tighter Jacobian bounds for activations such as tanh to verify NCBFs.
cs.LG
Jun Zhang, Haibo Zhang, Chun Liu, Xiaofan Wang, Liang Xu
Formal verification of neural control barrier functions (NCBFs) remains challenging, especially for neural networks with nonlinear activations like \(\tanh\). Existing CROWN-based methods rely on conservative linear relaxations for Jacobian bounds, limiting scalability. We propose LightCROWN, which computes tighter Jacobian bounds by exploiting the analytical properties of activation functions. Experiments on nonlinear control systems including the inverted pendulum, Dubins car, and planar quadr...
669 Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation
2605.07765
TabPFN for SBI: Uses TabPFN as a training-free summary network for simulation-based Bayesian posterior estimation.
cs.LG
Elliot Pickens, Chiraag Gohel, Sidharth Satya
In this work, we study TabPFN as a training-free, modular summary network for simulation-based Bayesian inference (SBI). Tabular foundation models such as TabPFN are pretrained on broad families of synthetic tabular data-generating processes and adapt at test time through in-context learning, making them natural candidates for SBI, where posterior estimation often depends on learning informative summaries of simulated observations. We propose PFN-NPE: a general recipe that uses a pretrained TabP...
670 Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
2605.07772
Transformer Mean-field Training: Studies how training drives representations to escape token clustering in a mean-field Transformer.
cs.LG
Noboru Isobe, Daisuke Inoue, Masaaki Imaizumi
Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how training reshapes this clustering picture. We study this question in a noisy mean-field Transformer in which only a para...
671 POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles
2605.07775
Uncertainty-Aware LLM Optimization: Uses compute-efficient policy ensembles for Thompson-sampling-style LLM optimization and uncertainty estimation.
cs.LG cs.AI
Nicolas Menet, Andreas Krause, Abbas Rahimi
Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS ($\textbf{Po}$licy $\textbf{E}$nsembles for $\textbf{T}$hompson $\textbf{S}$ampling), a novel framework that bridges uncertainty quantification and policy optimization. Our approach is grounded in the insight that policies trained with Kullback-Leibler (KL) regularization implicitly encode an underlying reward function. Building on this, POETS bypasses the compl...
672 Neural Operators as Efficient Function Interpolators
2605.07792
Neural Operators Interpolation: Reinterprets neural operators as function interpolators and outperforms MLPs on multiple benchmarks.
cs.LG cs.AI
Vasilis Niarchos, Angelos Sirbu, Sokratis Trifinopoulos
Neural operators (NOs) are designed to learn maps between infinite-dimensional function spaces. We propose a novel reframing of their use. By introducing an auxiliary base-space, any finite-dimensional function can be viewed as an operator acting by composition on functions of the base-space. Through a range of benchmarks on analytic functions of increasing complexity and dimensionality, we demonstrate that NOs can match or outperform standard multilayer perceptrons and Kolmogorov-Arnold Networ...
673 Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning
2605.07799
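LUPI for Foundation Models: Proposes PIQL, which introduces privileged information to accelerate training of tabular foundation models and improve their generalization.
cs.LG cs.AI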
Xueying Ding, Leman Akoglu
Training foundation models is computationally intensive and often slow to converge. We introduce PIQL, Privileged Information for Quick and Quality Learning, the first framework to systematically integrate privileged information (PI) to simultaneously accelerate learning and improve generalization in tabular foundation models (TFMs). We construct two complementary forms of PI: (i) aggregate dataset-level statistics that reduce the burden on in-context learning, and (ii) encodings of the underlying...
674 Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
2605.07804
On-Policy Distillation Pruning: Prunes drifted trajectories to improve the efficiency and reliability of on-policy distillation for long-chain reasoning.
cs.LG cs.AI
Zhicheng Yang, Zhijiang Guo, Yifan Song, Minrui Xu, Yongxin Wang
On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks exposes a critical flaw: as the student's generated prefix inevitably diverges from the teacher's thought process, the teacher's dense reward loses local exploitability. Continuing to generate and evaluate tokens on these "drifted" trajectories not only degrades reward quality but also incurs massive computational waste. To address this, we introduce \textbf{Prun...
675 Flexible Routing via Uncertainty Decomposition
2605.07805
Uncertainty-Based Model Routing: Decomposes uncertainty to enable flexible routing and reduce unnecessary high-cost calls.
cs.LG
Charlotte Peale, Siddartha Devic, Parikshit Gopalan, Udi Wieder, Aravind Gollakota
A key strategy for balancing performance and cost in modern machine learning systems is to dynamically route queries to either a low-cost model or a more expensive oracle (such as a large pretrained model or human expert), an approach known as model routing. In this work we present a new uncertainty-aware router that (1) avoids unnecessary oracle calls on inherently ambiguous queries, and (2) adapts dynamically to different loss functions and cost parameters through simple hyperparameter changes...
676 The Minimax Rate of Second-Order Calibration
2605.07808
Second-Order Calibration Rate: Gives the minimax convergence rate for estimating second-order calibration error, along with an estimation method.
cs.LG
Kamil Ciosek, Banafsheh Rafiee, Sina Ghiassian, Nicolò Felicioni
We characterize the minimax rate of estimating the second-order calibration error for binary classification, which quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on its level sets. Our key observation is that the sech perturbation kernel, previously used only to enforce smoothness of calibration functions, in fact makes them analytic in a strip of half-width $h\pi/2$. Polynomial regression then estimates the ...
677 Scaling Categorical Flow Maps
2605.07820
Discrete Flow Matching LM: Scales discrete flow matching language models to improve scalability and generation efficiency.
cs.LG
Oscar Davis, Anastasiia Filippova, Pierre Ablin, Victor Turrisi, Amitis Shidani
Continuous diffusion and flow matching models could represent a powerful alternative to autoregressive approaches for language modelling (LM), as they unlock a host of advantages currently reserved for continuous modalities, including accelerated sampling and tilting. Recently, several works have demonstrated the possibility of generating discrete data continuously by a simple flow matching process between a Gaussian and the one-hot encoded data distribution. They have further shown the feasibil...
678 Approximation-Free Differentiable Oblique Decision Trees
2605.07837
Differentiable Oblique Trees: Proposes approximation-free differentiable oblique decision trees for end-to-end gradient-based training.
cs.LG cs.AI
Subrat Prasad Panda, Blaise Genest, Arvind Easwaran
Decision Trees (DTs) are widely used in safety-critical domains such as medical diagnosis, valued for their interpretability and effectiveness on tabular data. However, training accurate oblique DTs is challenging due to complex optimization landscapes and overfitting risks, particularly in regression. Recent advances have introduced differentiable formulations that enable gradient-based training and joint optimization of decision boundaries and leaf regressors. Yet, existing approaches typicall...
679 RelAgent: LLM Agents as Data Scientists for Relational Learning
2605.07840
LLM Agent for Relational Learning: Proposes RelAgent, in which an LLM agent automatically searches for and builds relational learning modeling pipelines.
cs.LG
Xingyue Huang, Louis Tichelman, Jinwoo Kim, Krzysztof Olejniczak, İsmail İlkan Ceylan
Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language models), each with its own advantages and limitations. We propose RelAgent, an LLM-based autonomous data scientist for relational learning, which operates in two phases. In the search phase, an LLM agent uses database, valida...
680 VISTA: Decentralized Machine Learning in Adversary Dominated Environments
2605.07841
Adversarial Decentralized Learning: Uses a consistency-based incentive mechanism to achieve robust decentralized learning when adversaries are in the majority.
cs.LG cs.AI
Hanzaleh Akbari Nodehi, Parsa Moradi, Soheil Mohajer, Mohammad Ali Maddah-Ali
Decentralized machine learning often relies on outsourcing computations, such as gradient evaluations, to untrusted worker nodes. Existing robust aggregation methods can mitigate malicious behavior under honest-majority assumptions, but may fail when adversaries control a majority of the workers. We study this adversary-dominated setting through an incentive-oriented framework in which reports are accepted and rewarded only when they are mutually consistent up to a threshold. This turns the adve...
681 Distributional simplicity bias and effective convexity in Energy Based Models
2605.07844
Energy-Based Model Dynamics: Explains the simplicity bias and effective convexity of EBM training through the dynamics of the effective model.
cs.LG
Aurélien Decelle, Alfonso de Jesús Navas Gómez, Beatriz Seoane
Energy-based learning is a powerful framework for generative modelling, but its training is inherently non-convex, potentially leading to sensitivity to initialisation, poor local optima, and unstable gradient dynamics. We present a dynamical analysis of energy-based learning through the lens of the effective model, which can be interpreted as either a generalised Ising model with higher-order interactions or the Fourier expansion of the energy. Under sufficient expressivity, we show that the gr...
682 Actor-Critic Algorithm for Dynamic Expectile and CVaR
2605.07857
Risk-Sensitive Actor-Critic: Proposes a perturbation-free policy gradient and model-free learning to optimize dynamic expectile and CVaR.
cs.LG
Yudong Luo, Erick Delage
Optimizing dynamic risk with stochastic policies is challenging in both policy updates and value learning. The former typically requires transition perturbation, while the latter may rely on model-based approaches. To address these challenges, we propose a surrogate policy gradient without transition perturbation under softmax policy parameterization. We further develop model-free value learning methods for dynamic expectile and conditional value-at-risk by leveraging elicitability. Finally, ins...
683 On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems
2605.07860
Federated On-Device Generative Models: Analyzes the performance-resource tradeoffs of on-device generative models in federated predictive maintenance.
cs.LG cs.AI
Usevalad Milasheuski, Piero Baraldi, Enrico Zio, Stefano Savazzi
Federated Learning (FL) has emerged as a promising paradigm for preserving client data ownership and control over distributed Internet of Things (IoT) environments. While discriminative models dominate most FL use cases, recent advances in generative models (such as Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and Diffusion Models (DM)) offer new opportunities for unsupervised anomaly detection in time series analysis, with relevant applications in predictive mainte...
684 ADKO: Agentic Decentralized Knowledge Optimization
2605.07863
Decentralized GP Optimization: A multi-agent GP framework that communicates via knowledge tokens for private, efficient black-box optimization.
cs.LG
Lucas Nerone Rillo, Zhanhong Jiang, Nastaran Saadati, Aditya Balu, Baskar Ganapathysubramanian
We present Agentic Decentralized Knowledge Optimization (ADKO), a framework for collaborative black-box optimization across autonomous agents that achieves sample efficiency, privacy preservation, heterogeneous-objective handling, and communication efficiency. Each agent maintains a private Gaussian Process (GP) surrogate trained on local data and communicates only through knowledge tokens: compact, lossy summaries containing directional signals, advantage scores, and optional language-model (LM)...
685 Black-box model classification under the discriminative factorization
2605.07878
Black-box Model Classification: Proposes the discriminative factorization to represent and classify black-box models using more robust query sets.
cs.LG
Hayden Helm, Merrick Ohata, Carey Priebe
Access to modern generative systems is often restricted to querying an API (the "black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative facto...
686 Adaptive Regularization for Sparsity Control in Bregman-Based Optimizers
2605.07892
Adaptive Sparsity Regularization: Adaptively tunes regularization in Bregman optimizers to control the sparsity rate precisely.
cs.LG
Ahmad Aloradi, Tim Roith, Emanuël A. P. Habets, Daniel Tenbrinck
Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $\ell_1$ penalty, often control sparsity only indirectly through a regularization parameter $\lambda$, whose mapping to the final sparsity rate is non-trivial. In our experiments, we found this parameter sensitivity to be particularly pronounced for Bregman-based optimizers. Specifically, the two variants LinBreg and AdaBreg reach the same sparsity at $\...
687 Curvature Beyond Positivity: Greedy Guarantees for Arbitrary Submodular Functions
2605.07902
Submodular Greedy Guarantees: Gives greedy approximation guarantees for arbitrary submodular functions under curvature conditions.
cs.LG
Yixin Chen, Alan Kuhnle
Submodular functions (functions exhibiting diminishing returns) are central to machine learning. When the objective is monotone and non-negative, the greedy algorithm achieves a tight $63\%$ approximation. But many practical objectives incorporate costs that make them negative on some inputs, and all existing multiplicative guarantees require non-negativity. Prior work handles negativity through additive bounds for the special class of decomposable functions and non-monotonicity through part...
688 Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders
2605.07922
Hierarchical Sparse Autoencoders: Proposes Tree SAE to learn hierarchical feature structures in sparse autoencoders and improves the discrimination criterion.
cs.LG
Tue M. Cao, Hoang X. Nhat, Raed Alharbi, My T. Thai
Learning hierarchical features in Sparse Autoencoders (SAEs) is essential for capturing the structured nature of real-world data and mitigating issues like feature absorption or splitting. Existing works attempt to identify hierarchical relationships within independent feature sets by relying on activation coverage, the assumption that a child feature should only activate when its parent feature activates. However, we demonstrate that this condition alone is insufficient; that is, it often produce...
689 INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
2605.07930
Individualized Differential Privacy SGD: Proposes INO-SGD to mitigate utility imbalance across groups under individualized differential privacy.
cs.LG cs.AI
Xiao Tian, Jue Fan, Rachael Hwee Ling Sim, Bryan Kian Hsiang Low
Differential privacy (DP) is widely employed in machine learning to protect confidential or sensitive training data from being revealed. As data owners gain greater control over their data due to personal data ownership, they are more likely to set their own privacy requirements, necessitating individualized DP (IDP) to fulfil such requests. In particular, owners of data from more sensitive subsets, such as positive cases of stigmatized diseases, likely set stronger privacy requirements, as leak...
690 Prototype Guided Post-pretraining for Single-Cell Representation Learning
2605.07938
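Single-Cell Prototype Post-pretraining: Uses prototype-guided post-pretraining to improve generalization of single-cell representations under long-tailed distributions and distribution shift.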
cs.LG
Sachini Weerasekara, Natasha Darras, Sagar Kamarthi, Colles Price, Jacqueline Isaacs
Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models in natural language modeling, several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expressi...
691 Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation
2605.07950
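Annealed Langevin sampling: Proposes SALD, a slowly annealed sampler, and gives KL convergence bounds, with application to training-free guided generation.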
cs.LG
Atsushi Nitanda, Dake Bu, Yueming Lyu, Tanya Veeravalli
We study Slowly Annealed Langevin Dynamics (SALD), a sampler for tracking a path of moving target distributions and approximating the terminal target through time slowdown. We establish non-asymptotic convergence guarantees via a KL differential inequality, showing that slowdown improves tracking through contraction of intermediate targets and the complexity of the path. Motivated by training-free guided generation with pretrained score-based generative models, we further introduce Velocity-Awar...
692 Convergent Stochastic Training of Attention and Understanding LoRA
2605.07959
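LoRA attention optimization theory: Gives a unified analysis of stochastic training for attention and LoRA, proving convergence and trainability under mild conditions.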
cs.LG
Zhengkai Sun, Dibyakanti Kumar, Alejandro F Frangi, Anirbit Mukherjee, Mingfei Sun
Transformers have revolutionized machine learning and deploying attention layers in the model is increasingly standard across a myriad of applications. Further, for large models, it is common to implement Low Rank Adaptation (LoRA), whereby a factorized parameterization of them is trained, to achieve a surprisingly beneficial accuracy-size trade-off. In this work, via a unified framework we rigorously establish trainability of such models under stochastic methods. We prove that for any mild regu...
693 Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs
2605.07961
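Federated LLM poisoning defense: Augments federated fine-tuning with graph representation learning to detect and mitigate model manipulation attacks from malicious updates.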
cs.LG
Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong, Yichen Li
Federated fine-tuning (FFT) has emerged as a privacy-preserving paradigm for collaboratively adapting large language models (LLMs). Built upon federated learning, FFT enables distributed agents to jointly refine a shared pretrained LLM by aggregating local LLM updates without sharing local raw data. However, FFT-based LLMs remain vulnerable to model manipulation threats, in which adversarial participants upload manipulated LLM updates that corrupt the aggregation process and degrade the performa...
694 FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning
2605.07962
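Federated evaluation metrics aggregation: Proposes a framework of aggregatable evaluation measures for reliably combining per-participant performance in federated learning.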
cs.LG
Fabian Stricker, Jose A. Peregrina, David Bermbach, Christian Zirpins
Performance evaluation is essential for assessing the quality of machine learning (ML) models and guiding deployment decisions. In federated learning (FL), assessing the performance is challenging because data are distributed across participants. Consequently, the coordinator must rely on locally computed evaluation metrics and aggregate them to assess the global model. A key challenge is that common aggregation strategies, such as weighted averaging based on the local samples per participant, d...
695 Aggregation in conformal e-classification
2605.07963
Conformal e-predictor aggregation: Experimentally studies cross-conformal e-predictors and modified aggregation methods that are simpler and more flexible.
cs.LG
Vladimir Vovk
Aggregating conformal predictors is a standard way of balancing their predictive and computational efficiency while retaining their validity, at least approximately. An important advantage of conformal e-predictors is that they are easier to aggregate without sacrificing their validity. This paper studies experimentally cross-conformal e-prediction, which is an existing method of aggregating conformal e-predictors, and its modifications that are conceptually simpler and more flexible.
696 When Diffusion Model Can Ignore Dimension: An Entropy-Based Theory
2605.07969
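Diffusion sampling dimension-free theory: Uses an entropy analysis to explain why diffusion sampling needs few steps in high dimensions and gives a theory with weak dimension dependence.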
cs.LG
Ahmad Aghapour, Erhan Bayraktar
Diffusion models perform remarkably well on high-dimensional data such as images, often using only a modest number of reverse-time steps. Despite this practical success, existing convergence theory does not fully explain why such samplers remain efficient in high dimensions. Many prior KL guarantees bound the discretization error in terms of the ambient dimension, while other improved results replace this dependence using intrinsic-dimensional or geometric structure assumptions. In this work, we...
697 It Just Takes Two: Scaling Amortized Inference to Large Sets
2605.07972
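Amortized inference for large sets: Proposes an amortized posterior estimation method that trains on only two elements yet generalizes to conditioning on large sets.
cs.LG cs.AI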
Antoine Wehenkel, Michael Kagan, Lukas Heinrich, Chris Pollard
Neural posterior estimation has emerged as a powerful tool for amortized inference, with growing adoption across scientific and applied domains. In many of these applications, the conditioning variable is a set of observations whose elements depend not only on the target but also on unknown factors shared across the set. Optimal inference therefore requires treating the set jointly, which in turn requires training the estimator at the deployment set size -- a regime where memory and compute quic...
698 Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback
2605.07977
Online federated LLM self-improvement: Uses advantage-weighted self-play to incorporate real-time feedback and improve LLMs in online federated fine-tuning.
cs.LG
Seohyun Lee, Wenzhi Fang, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton
Recent works have advanced feedback-based learning systems, in which a foundation model takes in incoming feedback (e.g., from a user) to self-improve, creating a self-loop system of training. However, existing works are limited by their reliance on an offline setup for such feedback-based methods and by their need for privileged ground-truth contexts for training. Moreover, there is limited consideration of federated learning (FL), which is particularly...
699 Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
2605.07980
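Bayesian linear response interpretability: A systematic introduction to linear response and susceptibilities in Bayesian learning, for interpreting networks and influence functions.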
cs.LG
Chris Elliott, Daniel Murfet
These notes introduce the theory of susceptibilities as developed in [arXiv:2504.18274, arXiv:2601.12703] for interpreting neural networks. The susceptibility of an observable $\phi$ to a data perturbation is defined as a derivative of a posterior expectation, which by the fluctuation--dissipation theorem equals a posterior covariance. Different choices of $\phi$ yield different objects: per-sample losses give the influence matrix (the Bayesian influence function of [arXiv:2509.26544]), while co...
700 Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
2605.07984
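Mechanistic planning localization in LMs: Uses linear probes and activation patching to locate latent planning representations in the forward pass of language models and to verify their causal role.
cs.LG cs.AI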
Nicole Ma, Nick Rui
We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line bounda...
701 Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors
2605.07993
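Bayesian sensitivity for causal estimators: Proposes a Bayesian sensitivity analysis for causal estimation under evidence-based priors to assess the robustness of conclusions.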
cs.LG
Nikita Dhawan, Daniel Shen, Leonardo Cotta, Chris J. Maddison
Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity analysis are concerned with worst-case changes in assumptions. In this work, we argue that using such pessimistic criteria can often become uninformative or lead to conclusions contradicting our prior knowledge about the ...
702 Graph-Structured Hyperdimensional Computing for Data-Efficient and Explainable Process-Structure-Property Prediction
2605.07999
Graph hyperdimensional PSP prediction: Uses graph-structured hyperdimensional computing for data-efficient and explainable process-structure-property prediction.
cs.LG cs.AI
Jingzhan Ge, Ajeeth Vellore, Ajinkya Palwe, Ahsan Khan, David Gorsich
Multiphoton photoreduction enables high-fidelity fabrication of complex 3D microstructures, yet reliable process-structure-property (PSP) prediction remains difficult because the available data are sparse, heterogeneous, and interaction-dominated. In this regime, conventional feature-vector models are statistically underdetermined, making them prone to spurious correlations, poor regime transfer, and unstable post hoc explanations, whereas mechanistic pipelines depend on calibrated submodels tha...
703 STEPS: A Temporal Smooth Error Propagation Solver on the Manifolds for Test-Time Adaptation in Time Series Forecasting
2605.08005
Test-time adaptation for time series: Proposes STEPS, which propagates errors smoothly on manifolds to improve online test-time adaptation for time series forecasting.
cs.LG
Jiaqi Liu, Yifan Ouyang, Zhifei Song, Sim Kuan Goh, Ashwaq Qasem
Test-Time Adaptation (TTA) aims to improve time series forecasting under distribution shifts by using limited observations revealed during inference. However, forecasting TTA must operate in a source-free online setting, where the adaptation signal is short, temporally correlated, and potentially noisy. Existing methods can therefore suffer from weak identifiability, error accumulation, and unstable long-horizon corrections when the revealed prefix is sparse or contaminated. To address these iss...
704 Interpreting Reinforcement Learning Agents with Susceptibilities
2605.08007
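Susceptibility interpretability for RL: Generalizes susceptibilities to the regret in reinforcement learning and uses them to interpret the internal features of an agent's stagewise training.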
cs.LG
Chris Elliott, Einar Urdshals, David Quarel, Daniel Murfet
Susceptibilities are a technique for neural network interpretability that studies the response of posterior expectation values of observables to perturbations of the loss. We generalize this construction to the setting of the regret in deep reinforcement learning and investigate the utility of susceptibilities in a simple gridworld model that nevertheless exhibits non-trivial stagewise development. We argue that susceptibilities reveal internal features of the development of the model in paramet...
705 Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data
2605.08028
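Domain-decomposed PINNs for traffic: Proposes ADD-PINN, an adaptive domain decomposition that reconstructs traffic speed fields from sparse sensing while preserving shockwaves.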
cs.LG
Eunhan Ka, Ludovic Leclercq, Satish V. Ukkusuri
Traffic state estimation from sparse fixed sensors is challenging because physics-informed neural networks (PINNs) tend to over-smooth the shockwaves admitted by the Lighthill-Whitham-Richards (LWR) model. This study proposes Adaptive Domain Decomposition Physics-Informed Neural Networks (ADD-PINN), a two-stage residual-guided framework for LWR-based offline speed-field reconstruction. A coarse global PINN is first trained; its spatial residual profile is then used to place subdomain boundaries ...
706 Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids
2605.08036
Scalable Gaussian processes on grids: Proposes CUTS-GPR, which uses incomplete grids and additive kernels for fast, exact inference in high-dimensional GPR.
cs.LG
Mads Greisen Højlund, August Smart Lykke-Møller, Henry Moss, Ove Christiansen
We introduce CUTS-GPR, a new method for performing numerically exact Gaussian process regression (GPR) in high-dimensional settings. The key component of CUTS-GPR is an extremely fast kernel matrix-vector product, which exhibits near-linear or even linear scaling with the amount of training data, $N$, and low-order polynomial scaling with dimensionality, $D$. This is obtained by combining an additive kernel with an incomplete grid and exploiting the resulting structure of the kernel matrix. We d...
707 Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
2605.08037
Preference graph optimization for LMs: Models multi-candidate preference data as a preference graph to improve DPO training stability and use of information.
cs.LG cs.AI
Ning Liu, Chuanneng Sun, Kristina Klinkner, Shervin Malmasi
Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons, offering a simple and effective alternative to Reinforcement Learning (RL) from human feedback. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing rich preference structure that pairwise DPO fails to exploit. Collapsing such data into independent pairs discards transitivity, introduces redundant or conflicting supervision, and can lead to unstable ...
708 Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
2605.08053
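Exponential-utility reinforcement learning: Derives Q-learning-style algorithms for exponential-utility MDPs and proves convergence and contraction properties of the associated operators.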
cs.LG
Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat
Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied by Porteus (1975), we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_\infty$ and sup-log/Thompson metrics, respectively. We characterize their fixed poi...
709 GRAPHLCP: Structure-Aware Localized Conformal Prediction on Graphs
2605.08074
Localized conformal prediction on graphs: Proposes GRAPHLCP, which uses graph structure for localized conformal prediction to obtain tighter uncertainty sets.
cs.LG
Peyman Baghershahi, Fangxin Wang, Debmalya Mandal, Sourav Medya
Conformal prediction (CP) provides a distribution-free approach to uncertainty quantification with finite-sample guarantees. However, applying CP to graph neural networks (GNNs) remains challenging as the combinatorial nature of graphs often leads to insufficiently certain predictions and indiscriminative embeddings. Existing methods primarily rely on embedding-space proximity for localization, which can be unreliable for graphs and yield inefficient prediction sets. We propose GRAPHLCP, a proxi...
710 Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping
2605.08075
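Zero-shot imagined speech decoding: Achieves zero-shot imagined speech decoding through an imagined-to-listened MEG mapping, leveraging data that are easier to label.
cs.LG eess.AS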
Maryam Maghsoudi, Shihab Shamma
Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subjects and sessions. In this work, we propose a new approach to the decoding of imagined speech that leverages the richer and more reliably labeled recordings during listening to speech. We collected paired listened and imagined MEG recordings to rhythmic melodic and spoken stimuli from trained musicians. Using trained musicians helped improve ...
711 XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction
2605.05866
Multiphase XRD set decomposition: Proposes XDecomposer, which decomposes multiphase PXRD patterns into components without priors to aid structure identification.
cs.LG
Hanyu Gao, Bin Cao, Yunyue Su, Tong-Yi Zhang, Qiang Liu
Multiphase powder X-ray diffraction (PXRD) analysis remains a fundamental bottleneck in structure identification, as real-world synthesis often produces complex mixtures whose constituent phases (components) cannot be reliably disentangled. While recent advances in representation-based crystal retrieval and generation suggest the possibility of inferring structures directly from PXRD, existing approaches largely assume single-phase inputs and break down in multiphase settings. Here, we present X...
712 MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
2605.06623
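Multi-agent prompt optimization: Proposes MASPO, which jointly optimizes the role prompts of multiple agents to align local objectives with overall system performance.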
cs.LG
Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang
Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iterati...
713 Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs
2605.06669
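Prompt injection defense evaluation: Builds an evaluation of injection defenses for educational LLM tutors and quantifies the trade-offs among security, usability, and latency.
cs.LG cs.AI
Alexandre Cristovão Maiorano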
Educational LLM tutors face a core AI alignment challenge: they must follow user intent while preserving pedagogical constraints and safety policies. We present an evaluation methodology for prompt-injection defenses in this setting, showing that guardrail design entails explicit trade-offs among adversarial robustness, benign-task usability, and response latency. We evaluate a domain-specific multi-layer safeguard pipeline combining deterministic pattern filters, structural validation, contextu...
714 Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
2605.06696
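Coalition detection in multi-agent AI: Proposes a spectral diagnostic based on agents' internal representations for early detection of hidden coalitions and informational coupling.
cs.LG cs.AI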
Cameron Berg, Susan L. Schneider, Mark M. Bailey
Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neu...
715 Information-theoretic Limits of Learning and Estimation
2605.06710
Information-theoretic learning limits: Surveys how information-theoretic tools yield fundamental limits of learning and estimation, with accompanying exercises.
cs.LG
Abbas El Gamal, Maxim Raginsky
Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can -- and cannot -- achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entr...
716 TUANDROMD-X: Advanced Entropy and Visual Analytics Dataset for Enhanced Malware Detection and Classification
2605.06718
Malware detection analytics dataset: Releases a dataset with entropy and visual analytics features to support research on malware detection and classification.
cs.LG
Parthajit Borah, Upasana Sarmah, D. K. Bhattacharyya, J. K. Kalita
Malware and malware-based attacks are becoming more prevalent and complex. Attackers regularly come up with new techniques that have the ability to evade conventional and signature-based malware defense. In order to address such threats, there is an increasing demand for advanced and better defense solutions. Machine learning-based techniques are efficiently capable of defending against malware and malware-based attacks. Nevertheless, creating and efficiently testing such techniques demand high-...
717 AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites
2605.06841
Affordance-grounded world models: Proposes world models constrained by affordances to handle action preconditions and compositional environment dynamics.
cs.LG cs.AI
Qinshi Zhang (University of California, San Diego), Weipeng Deng (University of Hong Kong), Zhihan Jiang (Columbia University), Jiaming Qu (Amazon)
In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states. When an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. ...
718 One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators
2605.06873
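Neural operators for probabilistic conditioning: Learns a single neural operator that amortizes approximate conditioning across many distributions, rather than learning a conditional distribution per task.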
cs.LG
Panos Tsimpos, Edoardo Calvello, Ayoub Belhadji, Nicholas H. Nelsen
Probabilistic conditioning is concerned with the identification of a distribution of a random variable $X$ given a random variable $Y$. It is a cornerstone of scientific and engineering applications where modeling uncertainty is key. This problem has traditionally been addressed in machine learning by directly learning the conditional distribution of a fixed joint distribution. This paper introduces a novel perspective: we propose to solve the conditioning problem by identifying a single operato...
719 Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests
2605.06883
Kernel selection for MMD testing: Treats kernel selection as model selection and uses a unified complexity penalty to improve the power of MMD two-sample tests.
cs.LG
Yijin Ni, Xiaoming Huo
The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain distributions, the kernel must be dynamically optimized. However, data-driven optimization violates the foundational i.i.d. assumption, forcing a strict trade-off in existing frameworks. Ratio criteria ignore this dependence, inducing overfitting and variance collapse on r...
720 Muon with Nesterov Momentum: Heavy-Tailed Noise and (Randomized) Inexact Polar Decomposition
2605.06884
Muon optimizer theory with momentum: Analyzes the convergence and error of Muon with Nesterov momentum under heavy-tailed noise and approximate polar decomposition.
cs.LG
Sayantan Choudhury, Xiaoran Cheng, Martin Takáč, Sen Na, Mladen Kolar
Most first-order optimizers treat matrix-valued parameters as vectors, ignoring the intrinsic geometry of hidden-layer weights in neural networks. Muon addresses this mismatch by updating along the polar factor of a momentum matrix, but its theoretical understanding has lagged behind practice. In particular, practical implementations incorporate Nesterov momentum, compute the polar factor only approximately, and operate with stochastic gradients that may be heavy-tailed. We close this gap by dev...
721 McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware
2605.06894
Android malware drift benchmark: Builds a 2013-2025 multimodal Android malware benchmark for drift detection.
cs.LG
Md Mahmuduzzaman Kamol, Jesus Lopez, Saeefa Rubaiyet Nowmi, Emilia Rivas, Md Ahsanul Haque
Machine learning (ML) in real-world systems must contend with concept drift, adversarial actors, and a spectrum of potential features with varying costs and benefits. Malware naturally exhibits all of these complexities, but for the same reason, it is challenging to curate and organize data to study these factors. We present McNdroid, to our knowledge the largest longitudinal multimodal Android malware benchmark for malware detection and drift analysis. McNdroid spans 2013--2025, excluding 2015,...
722 Accelerated Relax-and-Round for Concave Coverage Problems
2605.06900
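Approximation algorithms for concave coverage problems: Proposes an accelerated relax-and-round method to solve concave coverage optimization problems faster.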
cs.LG
Matthew Fahrbach, Mehraneh Liaee, Morteza Zadimoghaddam
We present an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Building on the relax-and-round framework of Barman et al. [STACS 2021], we propose two significant improvements. First, we replace the linear programming (LP) relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a $\widetilde{O}(mn \varepsilon^{-1})$ running time. Second, we use a specialized rounding...
723 You Only Stack Once (YOSO): A Motion-Filtered, Deep-Learning Framework for Detecting Faint Moving Sources
2605.06913
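Astronomical moving-object detection: Uses pixel-level motion filtering and deep learning to automatically detect faint, slow-moving celestial objects.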
cs.LG
Nitya Pandey, César Fuentes, Pedro Bernardinelli, Valeria Frías, Colin Orion Chandler
We present You Only Stack Once (YOSO), an automated pipeline designed to detect faint, slow-moving Solar System objects in wide-field astronomical surveys. The pipeline integrates a novel Gaussian Motion Filter (GMoF) that operates at the pixel level to enhance signal-to-noise for objects exhibiting a range of apparent rates of motion. Unlike conventional shift-and-stack methods, which rely on discrete velocity trials, GMoF amplifies trails while suppressing random noise and static background fe...
724 Generalising Travel Time Prediction To Varying Route Choices In Urban Networks
2605.06918
Urban road-network travel time prediction: proposes a model that distinguishes route choices to predict flows and travel times.
cs.LG
Łukasz Gorczyca, Kacper Drozd, Michał Bujak, Rafał Kucharski
Previous methods that predict system-wide travel time, predominantly grounded in graph neural networks, remain limited to typical and recurring demand patterns. While they successfully predict future congestion following daily commute, they inherently approximate a single demand realisation and fail to capture varying route choices. In this work, we propose a Generalised Travel Time Predictor (GenTTP) that successfully differentiates route choices and offers accurate flow and travel time predict...
725 In-Context Credit Assignment via the Core
2605.06920
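In-context credit assignment mechanisms: uses the least core from cooperative game theory to allocate value and compensation to creators whose work appears in the context.
cs.LG, cs.AI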
Keegan Harris, Siddharth Prasad, Asher Trockman
We propose incentive-aligned mechanisms for in-context credit assignment: the task of assigning credit for AI-generated content (e.g. code, news articles, short-form videos) among creators whose intellectual property appears in the context window. Our approach is based on the least core solution concept from cooperative game theory, which distributes value in a way that is as stable as possible by ensuring that no subset of creators is significantly under-compensated relative to the value they c...
726 Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices
2605.06929
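Generative models for photonic-device electromagnetic fields: uses a conditional flow matching generative model in place of FDTD to predict full-field distributions of photonic devices.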
cs.LG
Joseph Quaratiello, Anthony Rizzo
Designing photonic integrated circuits requires accurate electromagnetic field simulations, which remain computationally expensive even for simple device geometries. We present PIC-Flow, a generative neural surrogate that predicts electromagnetic field distributions for photonic devices given their geometry and operating wavelength as an alternative to costly finite-difference time-domain (FDTD) simulations. Our approach combines three key ideas: (i) conditional flow matching as the generative f...
727 Multi-Objective Constraint Inference using Inverse reinforcement learning
2605.06951
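Multi-objective constraint inverse reinforcement learning: infers multi-objective safety constraints from heterogeneous expert demonstrations and improves efficiency.
cs.LG, cs.AI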
Syed Ihtesham Hussain Shah, Floris den Hengst, Aneta Lisowska, Annette ten Teije
Constraint inference is widely considered essential to align reinforcement learning agents with safety boundaries and operational guidelines by observing expert demonstrations. However, existing approaches typically assume homogeneous demonstrations (i.e., generated by a single expert or multiple experts with identical objectives). They also have limited ability to capture individual preferences and often suffer from computational inefficiencies. In this paper, we introduce Multi-Objective Const...
728 Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
2605.06959
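High-dimensional piecewise linear regression: parametrizes with DoMA and uses ABGD to achieve local convergence for high-dimensional piecewise linear regression.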
cs.LG
Haitham Kanj, Kiryung Lee
This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis for ABGD is provided under sub-Gaussian covariate and noise distributions. To initialize ABGD, we adapt a prior algorithm originally developed for the simpler setting of max-affine functions. Wh...
729 A Differentiable Bayesian Relaxation for Latent Partial-Order Inference
2605.06976
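Bayesian relaxation for latent partial-order inference: proposes a differentiable Bayesian relaxation that recovers latent partial orders from linearly ordered data.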
cs.LG
Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo
Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We introduce a differentiable relaxation for latent partial-order inference from such traces. Starting from a hard frontier-constrained model of noisy linear extensions, we replace discontinuous product-order precedence and b...
730 Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data
2605.06989
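Limitations of K-means in psychometrics: uses simulated and real data to show that K-means clustering is not the same as discovering latent categories.
cs.LG, cs.AI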
Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez
K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets....
731 Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift
2605.07005
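Equivalence of models for learning with distribution shift: proves that PQ learning and TDS learning with rejection are equivalent in the distribution-free setting.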
cs.LG
Adam R. Klivans, Shyamal Patel, Konstantinos Stavropoulos, Arsen Vasilyan
Recent work on provably efficient algorithms for learning with distribution shift has focused on two models: PQ learning (Goldwasser et al. (2020)) and TDS learning (Klivans et al. (2024)). Algorithms for TDS learning are allowed to reject a test set entirely if distribution shift is detected. In contrast, PQ learners may only reject points that are deemed out-of-distribution on an individual basis. Our main result is a surprising equivalence between these two models in the distribution-free set...
732 Learning Cross-Atlas Consistent Brain Disorder Representations via Disentangled Multi-Atlas Functional Connectivity Learning
2605.07026
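Multi-atlas functional connectivity representations: uses disentangled multi-atlas learning to obtain brain disorder representations that are consistent across atlases.
cs.LG, cs.AI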
Minheng Chen, Chao Cao, Jing Zhang, Tianming Liu, Dajiang Zhu
Functional connectivity (FC) derived from resting-state fMRI is widely used to characterize large-scale brain network alterations in neurological and psychiatric disorders. However, FC construction critically depends on the choice of brain atlas, and different parcellations may emphasize distinct organizational features, leading to heterogeneous and sometimes inconsistent representations. Existing multi-atlas approaches partially alleviate this issue but often fuse atlas-derived features or pred...
733 BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis
2605.07029
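Bayesian generative instrumental-variable causal inference: proposes BGM-IV, a latent-variable generative model for nonlinear, high-dimensional IV causal estimation.
cs.LG, cs.AI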
Guyue Luo, Qiao Liu
Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal relation in observed feature space or rely on learned representations within two-stage or moment-based procedures, which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative mo...
734 Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE
2605.07034
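Detecting artifact reliance in static malware classifiers: uses TRUSTEE to identify static classifiers' reliance on non-semantic artifacts such as packing.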
cs.LG
Riyazuddin Mohammed, Lan Zhang
Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, transformations such as packing and other non-semantic modifications applied to executable files limit their reliability. Malware classifiers often learn these unnecessary artifacts rather than the true binary behavior because of the high association between maliciousness and packing. Moreover, these malware classifiers are black boxes, making it difficult to understand what they learn. To address ...
735 The Context Gathering Decision Process: A POMDP Framework for Agentic Search
2605.07042
POMDP modeling of agentic search: models context gathering as a POMDP to guide LLM agents toward efficient information retrieval.
cs.LG, cs.AI
Chinmaya Kausik, Adith Swaminathan, Nathan Kallus
Large Language Model (LLM) agents are deployed in complex environments -- such as massive codebases, enterprise databases, and conversational histories -- where the relevant state far exceeds their context windows. To navigate these spaces, an agent must iteratively explore the environment to find relevant information. However, without explicit infrastructure, an agent's working memory can degrade into lossy representations of the search state, resulting in redundant work (e.g. repetitive loopin...
736 An Interpretable and Scalable Framework for Evaluating Large Language Models
2605.07046
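Interpretable and scalable LLM evaluation: evaluates LLMs with a scalable IRT framework that models item heterogeneity and output stochasticity.
cs.LG, cs.AI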
Xinhao Qu, Qiang Heng, Hao Zeng, Xiaoqian Liu
Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we ...
737 A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces
2605.07052
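Behavioral system identification in RKHS: generalizes the behavioral approach to vector-valued RKHSs for data-driven modeling of nonlinear systems.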
cs.LG
Boya Hou, Maxim Raginsky
We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for...
738 Functional-prior-based Bayesian PDE-constrained inversion using PINNs
2605.07060
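Functional priors for Bayesian inversion with PINNs: proposes a unified framework for PINN-based Bayesian PDE inversion with function-space priors.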
cs.LG
Ryoichiro Agata, Tomohisa Okazaki
Physics-informed neural networks (PINNs) provide a mesh-free framework for solving PDE-constrained inverse problems, but their extension to Bayesian inversion still faces a fundamental difficulty: prior distributions are typically defined in the weight space of neural networks, whereas physically meaningful prior assumptions are more naturally expressed in function space. In this study, we introduce a unified framework, termed functional-prior-based approaches to Bayesian PDE-constrained inversi...
739 Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks
2605.07065
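Interval estimation of individual treatment effects: uses EpiNets to correct the finite-sample bias of PNS bounds and produce more reliable intervals.
cs.LG, cs.AI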
Gandharv Patil, Keyi Tang, Raquel Aoki, Leo Guelman
Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. W...
740 Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
2605.07097
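Sample complexity of networks in o-minimal structures: proves that fixed networks definable in an o-minimal structure have finite PAC sample complexity.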
cs.LG
Anastasis Kratsios, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Bum Jun Kim, Simone Brugiapaglia
We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, inclu...
741 TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models
2605.07100
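Conformal prediction aided by generative models: uses diffusion and flow matching for transport alignment to construct conformal prediction regions for multi-dimensional outputs.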
cs.LG
Zhenhan Fang, Aixin Tan, Jian Huang
Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical performance critically depends on the choice of nonconformity score. Existing approaches often rely on restrictive geometric assumptions or require explicit likelihood evaluation and invertible transformations, limiting their applicability in complex generative setting...
742 Classification Fields: Arbitrarily Fine Recursive Hierarchical Clustering From Few Examples
2605.07119
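Infinite-depth hierarchical cluster structures: proposes classification fields, which generate arbitrarily refinable hierarchical clusterings from few examples.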
cs.LG
Yicen Li, Ruiyang Hong, Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Paul D. McNicholas
Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale refinements that continue beyond the levels directly observed. We introduce classification fields: infinite-depth hierarchical cluster structures on $\mathbb{R}^d$ generated by a local parent-to-child refinement rule. A classification field generator maps each...
743 AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning
2605.07121
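Adaptive memory for temporal knowledge graphs: introduces entity-level adaptive memory for TKG reasoning to retain interaction history and improve prediction.
cs.LG, cs.AI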
Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim
Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose rep...
744 RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation
2605.07129
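Retrieval and memory for LLM recommendation: uses ranking-driven retrieval over collaborative and meta memories to build more effective recommendation contexts.
cs.LG, cs.AI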
Shijun Li, Wooseong Yang, Yu Wang, Tianxin Wei, Joydeep Ghosh
Large Language Models (LLMs) have emerged as a promising paradigm for next-generation recommender systems, offering strong semantic understanding and natural-language reasoning abilities. Despite recent progress, current LLM-based recommenders still face key challenges in constructing decision-relevant contexts from heterogeneous evidence. First, existing methods often rely on fixed context construction strategies: collaborative behavioral evidence and item-side metadata are typically incorporat...
745 Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents
2605.07138
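Adversarial robustness of empathetic RL models: builds an adversarial empathy benchmark to evaluate the robustness of RLVER agents under manipulative dialogue.
cs.LG, cs.AI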
Deeraj S K, Sadhana Devarajan, Krishna Mehra, Sudhakar Mishra
Reinforcement learning from verifiable emotion rewards (RLVER) has produced language models with strong empathetic performance, evaluated on benchmarks that assume cooperative, honest users. Yet real emotional interactions systematically violate this assumption: users gaslight, escalate, and pressure AI systems for unconditional validation, dynamics that cooperative benchmarks cannot surface. We construct the Adversarial Empathy Benchmark (AEB) and introduce the Emotional Consistency Score (ECS) to ev...
746 MathlibPR: Pull Request Merge-Readiness Benchmark for Formal Mathematical Libraries
2605.07147
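Merge-readiness benchmark for formal mathematics PRs: proposes the MathlibPR benchmark to evaluate how well models judge the merge-readiness of Lean/Mathlib PRs.
cs.LG, cs.AI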
Zixuan Xie, Xinyu Liu, Shangtong Zhang
The ecosystem of Lean and Mathlib has become the de facto standard for large language model (LLM) assisted formal reasoning with remarkable successes in recent years. Those successes, however, only consume Mathlib as an essential dependency but do not directly contribute to it. In the meantime, the growth of Mathlib has recently been bottlenecked by the review process, which requires human reviewers to judge whether proposed pull requests (PRs) follow Mathlib's conventions and are worth inte...
747 Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention
2605.07199
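Three-in-one world model for marketing intervention: uses an energy-based model to learn a unified belief representation for prediction and counterfactual marketing inference.
cs.LG, cs.AI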
Junichiro Niimi
Marketing decisions reflect the interaction of latent consumer heterogeneity, time-varying internal states, and explicit interventions, a structure that current prediction- and language-oriented models do not capture in a unified manner. We propose a Three-in-One world-model architecture in which a Deep Boltzmann Machine (DBM) learns a frozen belief representation from demographics, time, and lagged actions and outcomes, with lightweight task-specific adapters attached on top. The same belief su...
748 Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning
2605.07263
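Noncoherent aggregation for over-the-air federated learning: proposes REED, which achieves noncoherent OTA-FL aggregation without instantaneous CSI or phase alignment.
cs.LG, cs.AI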
Hao Chen, Zavareh Bozorgasl
Over-the-air federated learning (OTA-FL) reduces uplink latency by exploiting waveform superposition, but conventional analog aggregation schemes typically require instantaneous channel state information (CSI), channel inversion, and coherent phase alignment, which can be difficult to maintain in practical wireless systems. This paper proposes resource-element energy difference (REED), a noncoherent aggregation primitive for continuous signed updates that avoids instantaneous CSI. REED maps the ...
749 How Big Should a Wireless Foundation Model Be?
2605.07266
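Scaling laws for wireless foundation models: uses the nonlinear manifold dimension of the channel to derive a scaling ceiling for wireless foundation models.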
cs.LG
Wei-Lun Cheng, Wanjiun Liao
Wireless foundation models are rapidly emerging as a key enabler of AI-native communication systems, yet a fundamental question remains unanswered: how large should these models be? We present a principled, physics-grounded answer, showing that the intrinsic dimensionality (dNL, the nonlinear manifold dimension of the channel) acts as the fundamental bottleneck, defining the scaling ceiling once a data-sufficient regime is reached. This dimensionality is not a design choice but a physical constr...
750 Structured Role-Aware Policy Optimization for Multimodal Reasoning
2605.07274
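Role-aware RL for multimodal reasoning: proposes role-aware policy optimization so that rewards distinguish visual evidence from textual function.
cs.LG, cs.AI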
Bingqing Jiang, Difan Zou
Reinforcement learning from verifiable rewards (RLVR), especially with Group Relative Policy Optimization (GRPO), has shown strong potential for improving the reasoning capabilities of large vision-language models (LVLMs). However, in multimodal reasoning, final-answer rewards are typically assigned at the sequence level and do not distinguish the functional roles of different tokens, making it difficult to determine whether a correct answer is supported by task-relevant visual evidence. In this...
751 Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE
2605.07286
Sparse random-feature networks: sparsifies RFNNs with a Krylov-based SVD to solve singularly perturbed ODEs.
cs.LG
Kevin Kurian Thomas Vaidyan, Siddharth Rout
Random-feature neural networks (RFNNs), including architectures with fixed hidden layers and analytically determined output weights, offer fast training but often suffer from issues due to dense representations of the hidden layer activation. Their reliance on dense feature mappings and least squares solvers can limit scalability and numerical stability, particularly for high-dimensional or stiff systems. Specifically, the activation matrix is observed to be low-rank and extremely ill-conditione...
752 Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers
2605.07297
Transformer generalization bounds: proposes spectrum-adaptive post hoc generalization bounds for Transformers.
cs.LG
Mana Sakai, Masaaki Imaizumi
Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc...
753 Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation
2605.07323
ODE discovery with LLMs: uses LLM-based qualitative and quantitative evaluation to discover ordinary differential equations.
cs.LG, cs.AI
Sum Kyun Song, Bong Gyun Shin, Jae Yong Lee
Discovering governing differential equations from observational data is a fundamental challenge in scientific machine learning. Existing symbolic regression approaches rely primarily on quantitative metrics; however, real-world differential equation modeling also requires incorporating domain knowledge to ensure physical plausibility. To address this gap, we propose DoLQ, a method for discovering ordinary differential equations with LLM-based qualitative and quantitative evaluation. DoLQ employs...
754 Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry
2605.07389
ML team collaboration challenges: investigates collaboration and communication challenges in MLE teams in the semiconductor industry and their impact.
cs.LG
A. Azamnouri, M. Haug, L. Woltmann, M. Fritz, J. Bogner
The integration of machine learning (ML) into complex software systems has increased challenges in collaboration and communication (CoCo) of the teams building these systems. ML engineering (MLE) teams often involve diverse roles (ML engineers, data scientists, software engineers, and domain experts), each bringing unique goals, experiences, and jargon. These interdisciplinary dynamics can make it challenging to deploy, reproduce, and maintain ML-enabled systems over the long term. Previous studi...
755 Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs
2605.07417
DNN fault tolerance memory: proposes lightweight alternatives to ECC to improve the reliability of large-scale DNNs.
cs.LG
Mohammad Hasan Ahmadilivani, Marten Roots, Marco Restifo, Sven-Markus Loorits, Luca Di Mauro
Modern Deep Learning (DL) workloads are increasingly deployed in safety-critical domains, such as automotive systems and hyperscale data centers, where transient hardware faults pose a serious threat to system reliability. These workloads are highly memory-intensive, and their correct functionality strongly depends on model parameters stored in memory, which are typically protected using Error Correction Codes (ECCs). In this work, we study ECC's impact on such models and propose two lightweight...
756 Inference of Qualitative Models from Steady-State Data via Weighted MaxSMT
2605.07433
Qualitative model inference MaxSMT: uses weighted MaxSMT to robustly infer qualitative biological models from steady-state data.
cs.LG
Ondřej Huvar, Nikola Beneš, Martin Jonáš, David Šafránek, Samuel Pastva
Qualitative models provide crucial instruments for modelling complex biological systems. While advances in automated reasoning and symbolic encodings have enabled rigorous inference of these models from data, the process remains highly fragile. First, biological measurement errors inevitably propagate into formal model specifications. Second, when a specification becomes unsatisfiable, distinguishing between fundamental design flaws and minor technical errors is notoriously difficult. This uncer...
757 Breaking QAOA's Fixed Target Hamiltonian Barrier: A Fully Connected Quantum Boltzmann Machine via Bilevel Optimization
2605.07473
Quantum Boltzmann machine QAOA: Extends QAOA via bilevel optimization to realize a fully connected quantum Boltzmann machine.
cs.LG
Jun Liu
To overcome the limitations of classical partially connected Boltzmann machines and mainstream quantum Boltzmann machines (QBMs), this work extends the conventional circuit of the quantum approximate optimization algorithm (QAOA) to a bilevel optimization architecture and proposes a fully connected QBM. The inner-loop training simulates positive phase energy minimization based on the computational process of the conventional QAOA circuit, whereas the outer-loop training simulates negative phase ...
758 Efficient Data Selection for Multimodal Models via Incremental Optimization Utility
2605.07488
Multimodal data selection: Models multimodal data selection as incremental-utility ranking to reduce cost.
cs.LG, cs.AI
Jinhao Jing, Qiannian Zhao, Chao Huang, Zhan Su
The scaling of Large Multimodal Models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Previous approaches, such as LLM-as-a-Judge, have proven their effectiveness in addressing this but suffer from prohibitive computational costs and lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Instead of relying on semantic heuristics, OST es...
759 LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
2605.07505
Compact GUI agent distillation: Uses reinforcement learning to distill lightweight on-device vision-language GUI agents.
cs.LG, cs.AI
Yubin Wu, Zicheng Cai, Liping Ning, Hua Wang, Zhi Chen
Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. However, current on-device agents are constrained by limited model capacity, and further performance improvements remain urgently needed. Traditional Supervised Fine-Tuning (SFT) for small-scale models often leads to overfitting, catastrophic forgetting and policy rigidity, and thus fails to fully address these challenges. In this work, we propose a novel SFT-free trainin...
760 GESR: Graph-Based Edge Semantic Reconstruction for Stealthy Communication Detection with Benign-Only Training
2605.07536
Graph anomaly network security: Uses graph semantic reconstruction to detect stealthy communication under benign-only training.
cs.LG
Henghui Xu, Yuchen Zhang, Xiaobo Ma
Detecting stealthy malicious communications from flow logs under benign-only training remains a critical challenge in network security. Malicious communications often camouflage as normal traffic like standard HTTPS flows. Conventional intrusion detectors rely strictly on known labeled attacks. Alternatively, they score flows completely independently. These approaches fail against sparse and context-dependent suspicious activity. To capture this essential context, graph anomaly detectors have be...
761 A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning
2605.07596
Contrastive learning generalization: Analyzes finer-grained generalization and sample complexity for extreme multi-class supervised contrastive learning.
cs.LG
Nong Minh Hieu, Antoine Ledent
Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are identically and independently distributed, an assumption violated in most practical settings where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting...
762 Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling
2605.07634
Heavy-tailed robust optimization: Uses medoid mini-batch gradient sampling to improve SGD robustness under heavy-tailed noise.
cs.LG
Manojlo Vukovic, Dusan Jakovetic
We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In...
763 Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding
2605.07637
Multi-agent pathfinding communication: Learns local communication policies to scale multi-agent pathfinding to large instances.
cs.LG, cs.AI
Valeriy Vyaltsev, Alsu Sagirova, Anton Andreychuk, Yuri Kuratov, Konstantin Yakovlev
Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (fr...
764 Quotient Semivalues for False-Name-Resistant Data Attribution
2605.07663
False-name-resistant data valuation: Proposes the quotient semivalue mechanism for data attribution that resists false-name manipulation.
cs.LG
Florian A. D. Burnat, Brittany I. Davidson
Data valuation methods allocate payments and audit training data's contribution to machine-learning pipelines; however, they often assume passive contributors. In reality, contributors can split datasets across pseudonymous identities, duplicate high-value examples, create near-duplicates, or launder synthetic variants to inflate their share. We formalize this as false-name manipulation in ML data attribution. Our main construction is the quotient semivalue mechanism: compute Shapley-, Banzhaf-,...
765 Debiased Counterfactual Generation via Flow Matching from Observations
2605.07665
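Counterfactual generation flow matching: Uses flow matching that exploits the relationship to the observational distribution to generate debiased counterfactual distributions.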
cs.LG
Hugh Dance, Johnny Xi, Peter Orbanz, Benjamin Bloem-Reddy
Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close und...
766 Differentially Private Auditing Under Strategic Response
2605.07674
DP auditing with strategic agents: Models differentially private auditing as a Stackelberg game and optimizes the query budget.
cs.LG
Florian A. D. Burnat
Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the w...
767 FactoryBench: Evaluating Industrial Machine Understanding
2605.07675
Industrial telemetry benchmark: Releases FactoryBench to evaluate time-series understanding and causal question answering on industrial robots.
cs.LG, cs.AI
Yanis Merzouki, Coral Izquierdo, Matei Ignuta-Ciuncanu, Marcos Gomez-Bracamonte, Riccardo Maggioni
We introduce FactoryBench, a benchmark for evaluating time-series models and LLMs on machine understanding over industrial robotic telemetry. Q&A pairs are organized along four causal levels (state, intervention, counterfactual, decision) instantiating Pearl's ladder of causation, and span five answer formats: four structured formats are scored deterministically and free-form answers are scored by an LLM-as-judge voting protocol. We propose a scalable Q&A generation framework built around struct...
768 Physics-Informed Reduced-Order Operator Learning for Hyperelasticity in Continuum Micromechanics
2605.07738
Physics-informed reduced operator learning: Combines EquiNO with Q-DEIM for reduced-order operator learning in hyperelastic micromechanics.
cs.LG
Hamidreza Eivazi, Henning Wessels
Physics-informed operator learning is an attractive candidate for surrogate modeling of microstructures, especially in multiscale finite-element simulations. Its practical use, however, is often limited by the high cost of loss evaluation. We address this bottleneck by combining the Equilibrium Neural Operator (EquiNO) with the QR-based discrete empirical interpolation method (Q-DEIM). EquiNO learns only the modal coefficients of reduced displacement-fluctuation and first Piola-Kirchhoff stress ...
769 Flow Matching for Count Data
2605.07746
Flow matching for counts: Proposes a flow-matching generative modeling method for high-dimensional count data.
cs.LG
Ganchao Wei, John Pearson
High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous sp...
770 SMT-Based Active Learning of Weighted Automata
2605.07758
SMT-based active automata learning: Uses SMT to actively learn nondeterministic weighted automata with guaranteed minimality.
cs.LG
Tiago Ferreira, Kevin Batz, Alexandra Silva
We present an SMT-based active learning algorithm for nondeterministic weighted automata (WFAs) as a practical and robust alternative to Hankel/L*-style methods. Our algorithm is parametric in a given semiring and, if it terminates, guaranteed to produce minimal WFAs. We prove partial correctness and provide a sufficient termination condition, which in particular implies termination for all finite semirings. Our extensive experimental evaluation shows that our algorithm is capable of learning nu...
771 Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems
2605.07768
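Distributionally robust MPC planning: Learns the decision distributions of other vehicles and combines PAC learning with distributionally robust MPC for interactive planning.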
cs.LG
Erik Börve, Nikolce Murgovski, Morteza Haghir Chehreghani, Leo Laine
We investigate interactive trajectory planning subject to uncertainty in the decisions of surrounding agents. To control the ego-agent, we aim to first learn the decision distribution and solve a Stochastic Model Predictive Control (SMPC) problem. To account for errors in the learned distribution, we show that it is possible to utilize Probably Approximately Correct (PAC) learning in combination with Distributionally Robust (DR) optimization to obtain a solution which accounts for the errors ind...
772 GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification
2605.07812
Self-supervised graph intrusion detection: Uses self-supervised graph classification for provenance-graph anomaly detection to identify APT attacks.
cs.LG
Robin Buchta, Carsten Kleiner, Felix Heine, Gabi Dreo Rodosek
Advanced persistent threat (APT) attacks remain difficult to detect due to their stealth, adaptability, and use of legitimate system components. Provenance-based intrusion detection systems (PIDS) offer a promising defense by capturing detailed relationships between system components and actions. However, current PIDS rely on predefined or subset-determined thresholds, which limit detection stability and the ability to detect any anomalous behavior in general. Furthermore, related work often neg...
773 NSPOD: accelerating the convergence of Krylov-based iterative linear solvers via approximated PODs
2605.07828
Krylov solver acceleration POD: Uses approximated PODs to accelerate the convergence of Krylov iterative linear solvers for parametric PDEs.
cs.LG
Francesc Levrero-Florencio, Youngkyu Lee, Jay Pathak, George Em Karniadakis
The convergence of Krylov-based linear iterative solvers applied to parametric partial differential equations (PDEs) is often highly sensitive to the domain, its discretization, the location/values of the applied Dirichlet/Neumann boundary conditions, body forces and material properties, among others. We have previously introduced hybridization of classical linear iterative solvers with neural operators for specific geometries, but they tend to not perform well on geometries not previously seen ...
774 PPI-Net connects molecular protein interactions to functional processes in disease
2605.07838
Hierarchical GNN for disease: Uses a hierarchical graph network that fuses PPI and pathway representations to model disease mechanisms.
cs.LG, cs.AI
Kyle Higgins, Guadalupe Gonzalez, Dennis Veselkov, Ivan Laponogov, Kirill Veselkov
Understanding how molecular alterations propagate across biological systems to drive disease remains a central challenge. Although high-throughput profiling enables comprehensive characterization of tumor states, most models neglect structured biological relationships or lack interpretability across scales. Here we present PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction (PPI) networks with pathway-level representations to model disease from molecular inte...
775 Characterizing and Correcting Effective Target Shift in Online Learning
2605.07886
Online learning target shift: Characterizes the effective target shift in online kernel regression and gives a correction method.
cs.LG
Ziyan Li, Naoki Hiratani
Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, in...
776 Statistical inference with belief functions: A survey
2605.07908
Belief function statistical inference: Surveys statistical inference and learning methods based on the belief-function framework.
cs.LG, cs.AI
Fabio Cuzzolin
Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the ar...
777 Exploring the non-convexity in machine learning using quantum-inspired optimization
2605.07947
Quantum-inspired nonconvex optimization: Uses quantum-inspired global search to solve non-convex learning problems with outliers.
cs.LG, cs.AI
Kandula Eswara Sai Kumar, Parth Dhananjay Danve, Abhishek Chopra, Rut Lineswala
The escalating complexity of modern machine learning necessitates solving challenging non-convex optimization problems, particularly in high-dimensional regimes and scenarios contaminated by gross outliers. Traditional approaches, relying on convex relaxations or specialized local search heuristics, frequently succumb to suboptimal local minima and fail to recover the true underlying discrete structures. In this paper, we propose treating these non-convex challenges as a global search problem an...
778 Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means
2605.07964
Bayes-assisted confidence sequences: Proposes Bayes-assisted confidence sequences for bounded means that achieve log-optimality.
cs.LG
Valentin Kilian, Stefano Cortinovis, François Caron
Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively c...
779 Linear Response Estimators for Singular Statistical Models
2605.07970
Linear response in singular models: Defines and consistently estimates linear-response susceptibilities for singular statistical models.
cs.LG
Chris Elliott, Daniel Murfet
We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.
780 Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems
2605.08006
Bilevel minimax optimization methods: Proposes penalty-based first-order methods for bilevel optimization with minimax and constrained lower-level problems.
cs.LG
Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here....
781 Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
2605.08022
Spiking Neural Network Training: Achieves globally optimal SNN training via parameter reconstruction, avoiding the accumulation of surrogate-gradient errors.
cs.LG, cs.AI
Himanshu Udupi, Xiaocong Yang, ChengXiang Zhai
Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks (ANNs). However, the training of SNN usually relies on surrogate gradients due to the non-differentiability of the spike function, introducing approximation errors that accumulate across layers. To address this challenge, we extend the work on convexification of parallel feedforward threshold networks to parallel recurrent threshold networks, wh...
782 Semiparametric Efficient Test for Interpretable Distributional Treatment Effects
2605.08034
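Distributional Treatment Effect Testing: Proposes a semiparametrically efficient finite-location test that localizes the distributional differences caused by treatment effects.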
cs.LG
Houssam Zenati, Arthur Gretton
Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome laws, but global tests do not reveal where the laws differ. We propose DR-ME, to our knowledge the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. DR-ME evaluates an interventional kernel witness at ...
783 PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting
2605.08035
RF Propagation Field Reconstruction: Uses 3D anisotropic Gaussian propagation splatting to reconstruct RF fields and path loss without maps.
cs.LG
William Bjorndahl, Maninder Pal Singh, Farhad Nouri, Joseph Camp
Building a site-specific propagation model typically requires either ray-tracing over detailed 3D maps or dense measurement campaigns. Both approaches are expensive and often infeasible for rapid deployments where geographic data is unavailable or outdated. We present PropSplat, a map-free propagation modeling method that reconstructs radio frequency (RF) fields using 3D anisotropic Gaussian primitives. Each Gaussian encodes a scalar path loss offset relative to an explicit baseline path loss mo...
784 A Note on Non-Negative $L_1$-Approximating Polynomials
2605.08072
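Nonnegative L1 Approximating Polynomials: Studies the existence and properties of non-negative L1-approximating polynomials for indicator functions under Gaussian distributions.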
cs.LG
Jane H. Lee, Anay Mehrotra, Manolis Zampetakis
$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently f...
785 Multi-Stage Prototype Learning for Interpretable Time Series Classification
2106.09636
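Interpretable Time Series Classification: Proposes multi-stage prototype learning that extracts interpretable single-variable and cross-variable temporal patterns for classification.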
cs.LG
Bhavesh Kalisetti, Vincent Wang, Gaurav R. Ghosal, Maryam Bijanzadeh, Reza Abbasi-Asl
Deep learning methods are powerful tools in classifying multivariate time series data. Despite their high performance, these methods are hard to interpret, which diminishes their applications in high-risk domains such as healthcare. In this paper, we propose a novel multi-stage prototype learning framework for multivariate time series classification. By design, our framework identifies predictive temporal patterns in individual variables as well as cross-variable patterns that are highly predict...
786 Testing Noise Assumptions of Learning Algorithms
2501.09189
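Testing Learning Noise Assumptions: Gives an efficient algorithm to test whether a training set satisfies the assumptions of a specific noise model, and extends the testable learning framework.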
cs.LG
Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework o...
787 Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS
2502.17500
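Generalized Logarithm for Optimization: Systematically analyzes the generalized Euler logarithm and connects it to learning algorithms such as natural gradient and mirror descent.
cs.LG, cs.AI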
Andrzej Cichocki
This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle–Tsallis, Kaniadakis–Scarfone, and Tempes...
788 No Forgetting Learning: Buffer-free Continual Learning Classification
2503.04638
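Buffer-free Continual Learning: Proposes a replay-buffer-free continual learning framework that exploits overparameterization redundancy, decomposing the network into a shared backbone and task heads to reduce forgetting.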
cs.LG
Mohammad Ali Vahedifar, Qi Zhang
Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then appli...
789 A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback
2503.12285
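Bicriteria Bandit Combinatorial Optimization: Proposes a resilience framework and an offline-to-online method for bi-criteria combinatorial optimization with noisy bandit feedback.
cs.LG, cs.AI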
Vaneet Aggarwal, Shweta Jain, Subham Pokhriyal, Christopher John Quinn
We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(\alpha,\beta,\delta,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation ...
790 Synergistic Benefits of Joint Molecule Generation and Property Prediction
2504.16559
Joint Molecule Generation Prediction: Proposes Hyformer for joint molecule generation and property prediction, achieving synergistic gains through alternating attention.
cs.LG
Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz
Modeling the joint distribution of data samples and their properties allows one to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism ...
791 Identifiability Challenges in Sparse Linear Ordinary Differential Equations
2506.09816
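Identifiability in Sparse Linear ODEs: Analyzes the identifiability difficulties and conditions that arise when learning sparse linear ordinary differential equations from data.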
cs.LG
Cecilia Casolo, Sören Becker, Niki Kilbertus
Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that "linear ordinary differential equations (ODE) are almost surely identi...
792 From Time Series Analysis to Question Answering: A Survey in the LLM Era
2506.11512
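LLMs for Time Series QA Survey: Surveys progress and challenges in bridging time series analysis with language tasks such as question answering in the LLM era.
cs.LG cs.AI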
Wei Li, Zhe Xie, Yuxuan Liang, Xinli Hao, Yunyao Cheng
Recently, Large Language Models (LLMs) have introduced a novel paradigm in Time Series Analysis (TSA), leveraging strong language capabilities to support tasks such as forecasting and anomaly detection. However, these analysis tasks cannot adequately cover temporal language tasks, such as interpretation and captioning. A fundamental gap remains between TSA and LLMs: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA. To b...
793 HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
2506.12362
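Inductive Hypergraph Link Prediction: Proposes HYPER, a foundation model for knowledge hypergraphs that performs inductive link prediction over novel entities and novel relation types.
cs.LG cs.AI
Xingyue Huang, Mikhail Galkin, Michael M. Bronstein, İsmail İlkan Ceylan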
Inductive link prediction with knowledge hypergraphs is the task of predicting missing hyperedges involving completely novel entities (i.e., nodes unseen during training). Existing methods for inductive link prediction with knowledge hypergraphs assume a fixed relational vocabulary and, as a result, cannot generalize to knowledge hypergraphs with novel relation types (i.e., relations unseen during training). Inspired by knowledge graph foundation models, we propose HYPER as a foundation model fo...
794 Flat Channels to Infinity in Neural Loss Landscapes
2506.14951
Neural Loss Landscape Channels: Characterizes flat channel structures in the loss landscape, explaining the divergence and merging of neuron weights.
cs.LG cs.AI
Flavio Martinelli, Alexander Van Meegen, Berfin Şimşek, Wulfram Gerstner, Johanni Brea
The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm$infinity, and their input weight vectors, $\mathbf{w_i}$ and $\mathbf{w_j}$, become equal to each other. At convergence, the two neurons implement a gate...
795 Discovering Learning-Friendly Generation Orders for Sequential Computation
2506.23875
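Learning-friendly Generation Order Discovery: Uses loss profiling to automatically search for generation orders in sequential computation that make training easier to converge.
cs.LG cs.AI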
Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera
Sequential computation via autoregressive generation can make difficult tasks learnable, but the generation order of intermediate states strongly affects whether training succeeds. We address the problem of discovering a learning-friendly target order automatically, rather than relying on task-specific design. Our key observation is that learning-friendly orders cause a faster loss drop in the early stage of training. We exploit this by loss profiling, which ranks candidate orders by the ...
796 Scalable Equilibrium Propagation via Intermediate Error Signals for Deep Convolutional CRNNs
2508.15989
Scalable Equilibrium Propagation: Proposes a scalable equilibrium propagation training method with intermediate error signals for deep convolutional CRNNs.
cs.LG
Jiaqi Lin, Malyaban Bal, Abhronil Sengupta
Equilibrium Propagation (EP) is a biologically inspired local learning rule first proposed for convergent recurrent neural networks (CRNNs), in which synaptic updates depend only on neuron states from two distinct phases. EP estimates gradients that closely align with those computed by Backpropagation Through Time (BPTT) while significantly reducing computational demands, positioning it as a potential candidate for on-chip training in neuromorphic architectures. However, prior studies on EP have...
797 Normalized Maximum Likelihood Code-Length on Riemannian Data Spaces
2508.21466
NML on Riemannian Manifolds: Extends the normalized maximum likelihood code-length to Riemannian data spaces for model selection and regret minimization.
cs.LG
Kota Fukuzawa, Atsushi Suzuki, Kenji Yamanishi
In recent years, with the large-scale expansion of graph data, there has been an increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, and they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and ar...
798 Scalable Option Learning in High-Throughput Environments
2509.00338
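Scalable Hierarchical Reinforcement Learning: Proposes SOL, a scalable option learning algorithm that improves the training efficiency of hierarchical RL in high-throughput environments.
cs.LG cs.AI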
Mikael Henaff, Scott Fujimoto, Michael Matthews, Michael Rabbat
Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, while promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling online hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm which achieves a ~35x higher throughput compared to existing hierarchical ...
799 Ensemble Learning for Healthcare: A Comparative Analysis of Hybrid Voting and Ensemble Stacking in Obesity Risk Prediction
2509.02826
Ensemble Obesity Risk Prediction: Compares the effectiveness of, and differences between, hybrid voting and stacking ensembles for obesity risk prediction.
cs.LG cs.AI
Towhidul Islam, Md Sumon Ali
Obesity is a critical global health issue driven by dietary, physiological, and environmental factors, and is strongly associated with chronic diseases such as diabetes, cardiovascular disorders, and cancer. Machine learning has emerged as a promising approach for early obesity risk prediction, yet a comparative evaluation of ensemble techniques -- particularly hybrid majority voting and ensemble stacking -- remains limited. This study aims to compare hybrid majority voting and ensemble stacking...
800 Mechanistic Interpretability with Sparse Autoencoder Neural Operators
2509.03738
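Sparse Autoencoder Neural Operators: Proposes sparse autoencoder operators in function space, using function-valued concepts to obtain mechanistically interpretable representations.
cs.LG cs.AI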
Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar
We introduce sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean representations. We formalize the functional representation hypothesis, where data are explained through sparse compositions of structured functions. Unlike standard SAEs that represent concepts with scalar activations, SAE-NOs parameterize concepts as functions, enabling representations that capture not only a concept's presence, ...
801 Hammer and Anvil: Toward a Theory of Backdoors in Federated Learning
2509.08089
Federated Learning Backdoor Theory: Proposes the Hammer and Anvil theoretical framework, which characterizes FL backdoors by update deviation and analyzes defense types.
cs.LG
Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum
Federated Learning (FL) enables distributed model training but is vulnerable to backdoor attacks, where malicious clients embed attacker-controlled behaviors into the global model. Existing defenses fail against adaptive adversaries. In this paper, we present "Hammer and Anvil", a principled theoretical framework that categorizes backdoors by the deviation, $\delta$, of their updates to the mean of the updates. We identify two fundamental defense types: "Type 1 (The Anvil)", comprising outlier d...
802 Inverse Reinforcement Learning with Just Classification and a Few Regressions
2509.21172
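Normalized Inverse Reinforcement Learning: Studies reward recoverability in maximum-entropy IRL under statewise affine normalizations, requiring only classification and a few regressions.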
cs.LG
Lars van der Laan, Nathan Kallus, Aurelien Bibaut
Inverse reinforcement learning (IRL) aims to infer rewards from observed behavior, but rewards are not identified from the policy alone: many reward--value pairs can rationalize the same actions. Meaningful reward recovery therefore requires a normalization, yet existing normalized IRL methods often rely on anchor-action restrictions or specialized neural architectures. We study reward recovery in the maximum-entropy, or Gumbel-shock, model under a broad class of statewise affine normalizations,...
803 BoHA: Blockwise Hadamard Product Adaptation for Parameter-Efficient Fine-Tuning
2509.21637
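Parameter-efficient Fine-tuning Adaptation: Proposes BoHA, a blockwise Hadamard adaptation for PEFT, with attention to how well earlier-task performance is retained under sequential fine-tuning.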
cs.LG
Feng Yu, Jia Hu, Geyong Min
Parameter-efficient fine-tuning (PEFT) of large language models trains a small task-specific parameter set while keeping the pretrained model frozen. The dominant Low-Rank Adaptation (LoRA) family makes this trade-off practical; however, evaluations under the same parameter budget assess single-task accuracy. In sequential adaptation settings, such evaluations should also measure how well performance on the first-stage task is retained after subsequent fine-tuning. To address this gap, we introd...
804 Limitations on Accurate, Trusted, Human-level Reasoning
2509.21654
Limits of Trusted Reasoning: Proves under strict definitions that accuracy, trust, and human-level reasoning are fundamentally incompatible.
cs.LG cs.AI
Rina Panigrahy, Vatsal Sharan
We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or e...
805 Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting
2509.24789
Time Series Forecasting Benchmark: Builds Fidel-TS, a high-fidelity multimodal forecasting benchmark that emphasizes data integrity and leak-free evaluation.
cs.LG
Zhijian Xu, Wanxu Cai, Xilin Dai, Zhaorong Deng, Qiang Xu
The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small-scale, low-frequency, pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and st...
806 TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning
2509.26524
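Federated Foundation Model Personalization: Proposes TAP, a two-stage adaptive personalization method that handles multi-task and multi-modal heterogeneity in federated settings.
cs.LG cs.AI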
Seohyun Lee, Wenzhi Fang, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton
In federated learning (FL), local personalization of models has received significant attention, yet personalized fine-tuning of foundation models remains underexplored. In particular, there is a lack of understanding in the literature on how to personalize foundation models in settings where there exists heterogeneity not only in data, but also in tasks and modalities across the clients. To address this gap, we propose Two-Stage Adaptive Personalization (TAP). In the first stage, TAP leverages mi...
807 DReS: Dual Reconstruction Smoothing for Functional Regularization
2510.00253
Smoothness Regularization via Reconstruction: Proposes DReS, a dual reconstruction smoothing regularizer that induces function smoothness without explicit gradient overhead.
cs.LG
Parsa Moradi, Tayyebeh Jahaninezhad, Hanzaleh Akbarinodehi, Mohammad Ali Maddah-Ali
Smoothness is a key inductive bias in machine learning and is closely related to generalization. Existing smoothness-inducing methods typically rely either on explicit gradient regularization, which often incurs substantial computational and memory overhead, or on data-mixing strategies, which are less naturally applicable to unsupervised and self-supervised settings. In this work, we propose Dual Reconstruction Smoothing (DReS), a nonparametric regularization framework that induces s...
808 Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension
2510.01105
Regression Collapse Geometry: Uses intrinsic dimension to analyze neural regression collapse and explain why it degrades regression performance.
cs.LG
George Andriopoulos, Zixuan Dong, Bimarsha Adhikari, Keith Ross
Neural multivariate regression underpins a wide range of domains, including control, robotics, and finance, yet the geometry of its learned representations remains poorly characterized. While neural collapse has been shown to benefit generalization in classification, we find that analogous collapse in regression consistently degrades performance. To explain this contrast, we analyze regression models through the lens of intrinsic dimension. Across control tasks and synthetic datasets, we estimat...
809 ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
2510.01290
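KV Cache Compression for Reasoning: Proposes ThinKV, which adaptively compresses the KV cache by thought type, using quantization and eviction to reduce GPU memory use during inference.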
cs.LG
Akshat Ramachandran, Marina Neseem, Charbel Sakr, Rangharajan Venkatesan, Brucek Khailany
The long-output context generation of large reasoning models enables extended chain of thought (CoT) but also drives rapid growth of the key-value (KV) cache, quickly overwhelming GPU memory. To address this challenge, we propose ThinKV, a thought-adaptive KV cache compression framework. ThinKV is based on the observation that attention sparsity reveals distinct thought types with varying importance within the CoT. It applies a hybrid quantization-eviction strategy, assigning token precision by ...
810 Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
2510.01510
Knowledge Graph Foundation Model: Proposes Flock, a KG foundation model that learns on random walks and generalizes to zero-shot link prediction.
cs.LG
Jinwoo Kim, Xingyue Huang, Krzysztof Olejniczak, Kyungbin Min, Michael Bronstein
We study the problem of zero-shot link prediction on knowledge graphs (KGs), which requires models to generalize to novel entities and novel relations. Knowledge graph foundation models (KGFMs) address this task by enforcing equivariance over both nodes and relations, which enables them to learn structural properties of nodes and relations that transfer to novel KGs with similar structure. However, the conventional notion of deterministic equivariance inherently limits the expressive power of KG...
811 Closed-Form Last Layer Optimization
2510.04606
Closed-form last-layer training: Updates the final linear layer with a closed-form solution and optimizes only the backbone parameters.
cs.LG
Alexandre Galashov, Nathaël Da Costa, Liyuan Xu, Philipp Hennig, Arthur Gretton
Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution to the linear last layer weights is known in closed-form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters, and optimizing solely for these parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the ...
812 Amortized Multi-Objective Optimization Across Tasks with Generative Solution Modeling
2511.09598
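Amortized multi-objective optimization: Uses a generative solution model to amortize the solving of expensive multi-objective optimization problems across tasks.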
cs.LG
Tingyang Wei, Jiao Liu, Abhishek Gupta, Chin Chun Ooi, Puay Siew Tan
Many real-world applications require solving families of expensive multi-objective optimization problems (EMOPs) under varying operational conditions. This can be formulated as parametric expensive multi-objective optimization problems (P-EMOPs) where each task parameter defines a distinct optimization instance. Current multi-objective Bayesian optimization methods have been widely used for finding finite sets of Pareto optimal solutions for each task. However, P-EMOPs present a fundamental chal...
813 Outlier Smoothing with Closed-Form Rotations for W4A4 Large Language Model Quantization
2511.22316
LLM W4A4 quantization rotations: Uses closed-form rotations and outlier smoothing to improve the convergence and accuracy of W4A4 quantization.
cs.LG
Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun
Quantization of Large Language Models (LLMs) facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization truncation lead to serious convergence pathology. This prolongs quantization time and degrades LLMs' task performance. Our studies confirm that the Straight-Through Estimator (STE) on Stiefel manifolds introduces non-smoothness and gradient noise, obstructing optimization convergence and blocking high-fidelity quantize...
814 Faster Verified Explanations for Neural Networks
2512.00164
Verified neural explanations acceleration: Proposes FaVeX to accelerate the computation of verified explanations for neural networks.
cs.LG
Alessandro De Palma, Greta Dolcetti, Caterina Urban
Verified explanations are a principled way to explain the decisions taken by neural networks, which are otherwise black-box in nature. However, these techniques face significant scalability challenges, as they require multiple calls to neural network verifiers, each of them with an exponential worst-case complexity. We present FaVeX, a novel algorithm to compute verified explanations. FaVeX accelerates the computation by dynamically combining batch and sequential processing of input features, an...
815 ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
2512.03476
Agentic scientific computing framework: Proposes ATHENA, an agentic framework that automates the end-to-end development of numerical algorithms.
cs.LG cs.AI
Juan Diego Toscano, Daniel T. Chen, George Em Karniadakis
Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an onli...
816 Neural CDEs as Correctors for Learned Time Series Models
2512.12116
Neural CDE predictor-corrector: Uses a neural CDE as a corrector to reduce error accumulation in multi-step time series forecasting.
cs.LG
Muhammad Bilal Shahid, Zhanhong Jiang, Prajwal Koirala, Soumik Sarkar, Cody Fleming
Learned time-series models, whether continuous or discrete, are widely used for forecasting the states of dynamical systems but suffer from error accumulation in multi-step forecasts. To address this issue, we propose a Predictor-Corrector framework in which the Predictor is a learned time-series model that generates multi-step forecasts and the Corrector is a neural controlled differential equation that corrects the forecast errors. The Corrector works with irregularly sampled time series and i...
817 Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics
2512.12602
Exact-flow linear attention: Replaces the Euler update with the exact continuous-time flow to obtain exact linear attention.
cs.LG
Jingdi Lei, Di Zhang, Soujanya Poria
In this paper, we introduce Exact Flow Linear Attention (EFLA), an exact-flow formulation of delta-rule linear attention. We show that the delta-rule update can be interpreted as an explicit Euler discretization of an underlying continuous-time system. EFLA replaces this first-order update with the exact closed-form flow. By exploiting the rank-1 structure of the dynamics matrix, both the matrix exponential and the input integral collapse to a simple update that preserves delta-rule linear atten...
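A worked sketch of the Euler-versus-exact-flow idea in the abstract above. It is only an illustration under an assumed, commonly used form of the delta rule, not the paper's own derivation: the state matrix $S$, key $k$, value $v$, and step size $\beta$ are notation introduced here.

```latex
% Assumed delta-rule update (one step with key k, value v, step size \beta):
S_{+} = S + \beta\,(v - S k)\,k^{\top}
% This is one explicit Euler step of size \beta for the linear ODE
\frac{dS}{dt} = (v - S k)\,k^{\top}
% Let e(t) = v - S(t)\,k. Then de/dt = -\|k\|^{2}\,e, so
e(t) = e^{-\|k\|^{2} t}\,e(0)
% Integrating dS/dt = e(t)\,k^{\top} over [0,\beta] gives the exact flow:
S(\beta) = S + \frac{1 - e^{-\beta \|k\|^{2}}}{\|k\|^{2}}\,(v - S k)\,k^{\top}
% First-order expansion: (1 - e^{-\beta\|k\|^{2}})/\|k\|^{2} = \beta + O(\beta^{2}),
% which recovers the Euler (delta-rule) step.
```

Under these assumptions the exact update keeps the rank-1 form of the delta rule and only changes the scalar gain, which is consistent with the abstract's remark that the matrix exponential and input integral collapse to a simple update.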
818 DT-PBO: an Interpretable Tree-based Surrogate Model for Preferential Bayesian Optimization
2512.14263
Interpretable preferential Bayesian optimization: Replaces the GP with an interpretable tree model for preferential Bayesian optimization.
cs.LG cs.AI
Nick Leenders, Thomas Quadt, Boris Cule, Roy Lindelauf, Herman Monsuur
Preferential Bayesian Optimization (PBO) aims to find a decision-maker's most preferred solution in as few pairwise comparisons as possible. Existing approaches rely on Gaussian Process (GP) surrogates, which provide strong performance but limited interpretability. This limits real-world usability in high-stakes domains, such as healthcare, where interpretability and trust are essential. We propose DT-PBO, a novel tree-based surrogate model for PBO that is inherently interpretable while capturin...
819 DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
2512.17129
Differentiable morphogenesis simulation: Uses differentiable agent-based simulation to learn 3D morphogenesis protocols end to end.
cs.LG
Seong Ho Pahng, Guoye Guan, Benjamin Fefferman, Sahand Hormoz
Biological systems can form complex three-dimensional structures through the collective behavior of agents that share a common update rule and operate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that g...
820 Bloom Filter Encoding for Machine Learning
2512.19991
Bloom filter feature encoding: Uses Bloom filter hash encoding to compress samples into fixed-length bit features.
cs.LG
John Cartmell, Mihaela Cardei, Ionut Cardei
We present a method that uses a Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact bit-array representation using hash-based encoding, producing a fixed-length feature space that reduces memory usage and obfuscates original feature values. The encoding does not rely on keyed hashing; however, a key can optionally be used to control the mapping and would be required to reproduce the representation. We evaluate the approach on six datasets spannin...
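A minimal sketch of the kind of encoding the abstract above describes, to make the idea concrete. It is not the authors' implementation: the function name `bloom_encode`, the `name=value` tokenization, the use of SHA-256, and the defaults `num_bits=64` and `num_hashes=3` are all assumptions chosen for illustration.

```python
import hashlib


def bloom_encode(tokens, num_bits=64, num_hashes=3, key=b""):
    """Encode an iterable of string tokens as a fixed-length list of 0/1 bits.

    Hypothetical helper: each token sets up to `num_hashes` positions in a
    bit array of length `num_bits`. An optional `key` changes the mapping.
    """
    bits = [0] * num_bits
    for token in tokens:
        for i in range(num_hashes):
            # Derive the i-th hash by salting SHA-256 with the hash index.
            digest = hashlib.sha256(key + bytes([i]) + token.encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % num_bits
            bits[position] = 1
    return bits


# One sample, represented as feature tokens (an assumed preprocessing step).
sample = ["age=34", "smoker=no", "bmi=27.5"]
encoded = bloom_encode(sample)
print(len(encoded), sum(encoded))
```

The output length depends only on `num_bits`, not on how many features a sample has, which is what gives the fixed-length feature space. Passing a non-empty `key` yields a different mapping that can only be reproduced with the same key.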
821 Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
2512.20974
Deep Bayesian RL with GLMs: Learns basis functions in deep Bayesian RL and models transitions and rewards with GLMs.
cs.LG cs.AI
Jingyang You, Hanna Kurniawati
Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference. This oft...
822 SB-TRPO: Towards Safe Reinforcement Learning with Hard Constraints
2512.23770
Hard-constrained safe reinforcement learning: Proposes SB-TRPO, which achieves safe RL with near-zero violations within a trust region.
cs.LG cs.AI
Dominik Wagner, Ankit Kanwar, Luke Ong
In safety-critical domains, reinforcement learning (RL) agents must often satisfy strict, zero-cost safety constraints while accomplishing tasks. Existing model-free methods frequently either fail to achieve near-zero safety violations or become overly conservative. We introduce Safety-Biased Trust Region Policy Optimisation (SB-TRPO), a principled algorithm for hard-constrained RL that dynamically balances cost reduction with reward improvement. At each step, SB-TRPO updates via a dynamic conve...
823 FANoS-v2: Feedback-Controlled Momentum with Thermostat Damping for Lightweight Neural Optimization
2601.00889
Feedback-controlled neural optimizer: Gives a complete mathematical definition of the FANoS-v2 optimizer along with stability diagnostics.
cs.LG
Nalin Dhiman
FANoS-v2 is a PyTorch optimizer that augments RMS-preconditioned momentum with a scalar feedback controller over update energy. The public reference implementation stores momentum in parameter-update units, applies a non-negative thermostat damping coefficient, supports diagonal, factored, and raw-gradient preconditioning, and exposes diagnostics intended for stability audits. This study gives a complete mathematical specification of the released optimizer, including the exact parameter-unit upd...
824 Partially Lazy Gradient Descent for Smoothed Online Learning
2601.15984
Partially lazy online gradient descent: Proposes k-lazyGD, which trades off reactivity and stability in SOCO, with guarantees.
cs.LG
Naram Mhaisen, George Iosifidis
We introduce $k$-lazyGD, an online learning algorithm that bridges the gap between greedy Online Gradient Descent (OGD, for $k=1$) and lazy GD/dual-averaging (for $k=T$), creating a spectrum between reactive and stable updates. We analyze this spectrum in Smoothed Online Convex Optimization (SOCO), where the learner incurs both hitting and movement costs. Our main contribution is establishing that laziness is possible without sacrificing hitting performance: we prove that $k...
825 ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule
2601.18681
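RL timestep scheduling for diffusion: Uses reinforcement learning to adaptively reparameterize the timestep schedule and improve diffusion sampling efficiency.
cs.LG, cs.AI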
Yilie Huang, Wenpin Tang, Xunyu Zhou
We consider time discretization for score-based diffusion models to generate samples from a learned reverse-time dynamic on a finite grid. Uniform and hand-crafted grids can be suboptimal given a budget on the number of time steps. We introduce Adaptive Reparameterized Time (ART), which controls the clock speed of a reparameterized time variable to redistribute computation along the sampling trajectory while preserving the terminal time, with the objective of minimizing the aggregate Euler discr...
826 R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes
2601.20599
Singular-regime GTD analysis: Gives a geometric convergence analysis of GTD learning when the feature interaction matrix is singular.
cs.LG, cs.AI
Hyunjun Na, Donghwan Lee
Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. While some prior works have applied regularization to relax the nonsingularity assumption, their theoretical guarantees inevitably rel...
827 Exact Gaussian Moment Matching for Residual Networks: a Second-Order Method
2601.22307
Exact Gaussian moment matching: Derives exact propagation of Gaussian means and covariances through residual networks for several activation functions.
cs.LG
Simon Kuang, Xinfan Lin
We study the problem of propagating the mean and covariance of a general multivariate Gaussian distribution through a deep (residual) neural network using layer-by-layer moment matching. We close a longstanding gap by deriving exact moment matching for the probit, GeLU, ReLU (as a limit of GeLU), Heaviside (as a limit of probit), and sine activation functions; for both feedforward and generalized residual layers. On random networks, we find orders-of-magnitude improvements in the KL divergence e...
828 Sparse Attention as Compact Kernel Regression
2601.22766
Sparse attention kernel interpretation: Proves that sparse attention is equivalent to kernel regression with compact (bounded-support) kernels.
cs.LG
Saul Santos, Nuno Gonçalves, Daniel C. McNamee, Marcos Treviso, André F. T. Martins
Recent work has revealed a link between self-attention mechanisms in transformers and test-time kernel regression via the Nadaraya-Watson estimator, with standard softmax attention corresponding to a Gaussian kernel. However, a kernel-theoretic understanding of sparse attention mechanisms is currently missing. In this paper, we establish a formal correspondence between sparse attention and compact (bounded support) kernels. We show that normalized ReLU and sparsemax attention arise from Epanechn...
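The softmax/Gaussian-kernel link mentioned in this abstract can be checked numerically in a special case. The sketch below is an illustration rather than the paper's construction: it assumes unit temperature (no 1/sqrt(d) scaling) and keys of equal norm, in which case exp(-||q - k||^2 / 2) = exp(-||q||^2 / 2) exp(-||k||^2 / 2) exp(q·k), so the key-independent factors cancel when the weights are normalized. All function names are hypothetical.

```python
import math


def softmax_attention_weights(query, keys):
    """Softmax over dot-product scores (unit temperature, an assumption)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_nw_weights(query, keys):
    """Nadaraya-Watson weights with a Gaussian kernel of unit bandwidth."""
    sq_dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    kernel = [math.exp(-0.5 * d) for d in sq_dists]
    total = sum(kernel)
    return [w / total for w in kernel]


def unit(vector):
    """Rescale a vector to unit Euclidean norm."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


# Equal-norm keys are the assumption under which the two weightings coincide.
keys = [unit(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [-1.0, 1.0])]
query = [0.3, 0.8]
attn = softmax_attention_weights(query, keys)
nw = gaussian_nw_weights(query, keys)
```

With keys of unequal norm the two weightings generally differ, so the equality here should be read as a property of this restricted setting only.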
829 PAIR-Former: Budgeted Relational Multi-Instance Learning for Functional miRNA Target Prediction
2602.00465
Relational MIL for miRNA targets: Proposes multi-instance learning with budgeted relational aggregation to predict miRNA targets.
cs.LG, cs.AI
Jiaqi Yin, Baiming Chen, Jia Fei, Mingjun Yang
Functional miRNA--mRNA targeting is a large-bag prediction problem where each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. Prior methods use max-pooling over individual CTS scores, ignoring relational patterns among sites, but modeling these patterns is critical for accuracy. The challenge is that naive relational aggregation incurs $\mathcal{O}(n^2)$ cost, prohibitive when $n$ reaches thousands, yet a cheap scan alone discards ...
830 Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs
2602.00513
Verifiable-reward RL for CTI LLMs: Uses reinforcement learning with verifiable rewards to improve LLM generation of structured CTI outputs.
cs.LG
Md Tanvirul Alam, Aritran Piplai, Ionut Cardei, Nidhi Rastogi, Peter J Worth Jr
Cyber threat intelligence (CTI) analysts routinely convert noisy, unstructured security artifacts into standardized, automation-ready representations. Although large language models (LLMs) show promise for this task, existing approaches remain brittle when producing structured CTI outputs and have largely relied on supervised fine-tuning (SFT). In contrast, CTI standards and community-maintained resources define canonical identifiers and schemas that enable deterministic verification of model ou...
831 ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning
2602.01003
Memory-efficient RL fine-tuning: ESSAM combines ES with SAM to reduce the GPU memory cost of RL fine-tuning for LLMs.
cs.LG, cs.AI
Zhishen Sun, Sizhe Dang, Guang Dai, Haishan Ye
Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but it often has high GPU memory usage, which makes it hard to use in settings with limited resources. To reduce these issues, we propose Evolution Strategies with Sharpness-Aware Maximization (ESSAM), a full parameter fine-tuning framework that tightly combines the zero-order search in parameter space from Evolution Strategies (ES) with the Sharpness-Aware Maximizatio...
832 The Effect of Mini-Batch Noise on the Implicit Bias of Adam
2602.01642
Adam implicit bias under batch noise: Theoretically analyzes how mini-batch noise affects Adam's implicit bias and solution selection.
cs.LG, cs.AI
Matias D. Cattaneo, Boris Shigida
With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(\beta_1, \beta_2)$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the...
833 TopoPrune: Robust Data Pruning via Unified Latent Space Topology
2602.02739
Topology-based robust data pruning: Uses latent-space topology to make data pruning more robust across architectures.
cs.LG, cs.AI
Arjun Roy, Prajna G. Malettira, Manish Nagaraj, Kaushik Roy
Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent space perturbations, causing performance to degrade during cross-architecture transfer or in the presence of feature noise. We introduce TopoPrune, a framework which resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data. TopoPrune operates at two scales, (1) utilizing ...
834 Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting
2602.02832
Continuous-time Koopman autoencoders: Uses continuous-time Koopman latent dynamics to forecast fluid flows at arbitrary step sizes.
cs.LG
Rares Grozavescu, Pengyu Zhang, Etienne Meunier, Mark Girolami
Forecasting physical systems over long horizons from irregularly sampled observations demands models that are stable, computationally efficient, and free of fixed-timestep assumptions. We address this with a continuous-time Koopman autoencoder whose latent dynamics obey $dz/dt = \mathbf{K}_{\mathrm{cont}} z$, yielding closed-form inference via $z(\tau) = \exp(\mathbf{K}_{\mathrm{cont}} \tau) z(0)$ at any horizon $\tau$ in a single step. This decouples forecast cost from forecast length at infere...
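The closed-form latent update quoted in this abstract, $z(\tau) = \exp(\mathbf{K}_{\mathrm{cont}} \tau) z(0)$, can be illustrated with a small pure-Python sketch. This is not the paper's model: the 2x2 generator `k_cont` (a damped rotation), the truncated Taylor-series matrix exponential, and all function names are assumptions chosen so the result can be compared with a known analytic solution.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds m^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def latent_at(k_cont, z0, tau):
    """Evaluate z(tau) = exp(k_cont * tau) z0 in a single step."""
    scaled = [[x * tau for x in row] for row in k_cont]
    propagator = mat_exp(scaled)
    return [sum(p * z for p, z in zip(row, z0)) for row in propagator]


# Hypothetical generator: rotation at unit frequency with decay rate 0.1.
k_cont = [[-0.1, -1.0], [1.0, -0.1]]
z0 = [1.0, 0.0]
z_direct = latent_at(k_cont, z0, 0.7)
z_two_steps = latent_at(k_cont, latent_at(k_cont, z0, 0.3), 0.4)
```

For this generator the analytic solution is exp(-0.1 tau) [cos tau, sin tau], and evaluating at 0.7 directly agrees with stepping 0.3 then 0.4, which is the sense in which the horizon can be chosen freely. A production implementation would more likely use a library routine for the matrix exponential.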
835 SLOPE: Optimistic Potential Landscape Shaping for Model-based Reinforcement Learning
2602.03201
Sparse-reward model-based RL shaping: Shapes the reward landscape with optimistic potential functions to ease sparse-reward MBRL.
cs.LG
Yao-Hui Li, Zeyu Wang, Xin Li, Wei Pang, Yingfang Yuan
Model-based reinforcement learning (MBRL) is sample-efficient but struggles in sparse reward settings. A critical bottleneck arises from the lack of informative gradients in sparse settings, where standard reward models often yield flat landscapes that struggle to guide planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting sparse scalars to constructing informative potential la...
836 Bayesian Conformal Prediction as a Decision Risk Problem
2602.03331
Bayesian conformal prediction risk control: Combines Bayesian posteriors with conformal risk control to produce prediction sets with finite-sample coverage.
cs.LG
Fanyi Wu, Veronika Lohmanova, Samuel Kaski, Michele Caprio
We propose Bayesian Conformal Prediction (BCP), a framework that combines Bayesian posterior predictive distributions with PAC-style conformal risk control to produce prediction sets with finite-sample coverage guarantees. Standard quantile-threshold conformal methods often construct prediction sets using a single fixed threshold, which typically yields connected prediction sets. While valid, such sets can be inefficient when the posterior predictive distribution is multimodal, since they may sp...
837 Path Integration and Object-Location Binding Emerge in an Action-Conditioned Predictive Sequence Network
2602.03490
Action-conditioned predictive world models: Path integration and object-location binding representations emerge in an action-conditioned predictive network.
cs.LG
Linda Ariel Ventura, Victoria Bosch, Tim C Kietzmann, Sushrut Thorat
Adaptive cognition requires structured internal models of objects and their relations. Predictive neural networks are often proposed to learn such world models, but how these are instantiated and how they support prediction remain unclear. We investigate this in a minimal in-silico setting. A recurrent neural network samples tokens sequentially from 2D continuous token scenes and is trained to predict the upcoming token from the current input and a saccade-like displacement. On novel scenes, pre...
838 Manifold Random Features
2602.03797
Random features on manifolds: Proposes Manifold Random Features to approximate kernels and bivariate functions on manifolds.
cs.LG
Ananya Parashar, Derek Long, Dwaipayan Saha, Krzysztof Choromanski
We present a new paradigm for creating random features to approximate bi-variate functions (in particular, kernels) defined on general manifolds. This new mechanism of Manifold Random Features (MRFs) leverages discretization of the manifold and the recently introduced technique of Graph Random Features (GRFs) to learn continuous fields on manifolds. Those fields are used to find continuous approximation mechanisms that otherwise, in general scenarios, cannot be derived analytically. MRFs provide...
839 Mixture of Masters: Sparse Chess Language Models with Player Routing
2602.04447
Mixture-of-experts chess language models: A sparse MoE chess language model with grandmaster routing that preserves distinct playing styles.
cs.LG, cs.AI
Giacomo Frisoni, Lorenzo Molfetta, Davide Freddi, Gianluca Moro
Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. For each move, a post-hoc ...
840 SOCKET: SOft Collision Kernel EsTimator for Sparse Attention
2602.06283
Soft LSH kernel for sparse attention: Proposes SOCKET, which uses soft collision kernel estimation to efficiently select tokens for sparse attention.
cs.LG
Sahil Joshi, Agniva Chowdhury, Wyatt Bellinger, Amar Kanakamedala, Ekam Singh
Exploiting sparsity during long-context inference is key to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost by restricting computation to a subset of tokens, but its effectiveness depends on efficient scoring and selection at inference time. We revisit Locality-Sensitive Hashing (LSH) and introduce SOCKET, a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregati...
841 When Does Embedding Magnitude Matter? A Cross-Task Functional-Symmetry Framework
2602.09229
Embedding normalization in retrieval: Proposes a query/document normalization framework and shows that one-sided normalization performs better for retrieval.
cs.LG
Xincan Feng, Taro Watanabe
Cosine similarity normalizes both sides; dot product normalizes neither. We propose a 2x2 framework that independently controls query-side and document-side normalization, exposing two intermediate variants (QNorm, DNorm) that have not been previously studied. On retrieval with four encoders, evaluated in-domain on MS MARCO and out-of-domain on BEIR, BRIGHT, and multi-hop QA, the unilateral variants outperform both cosine and dot product, with relative gains of up to +72% out-of-domain and +24% ...
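The 2x2 design described in this abstract can be sketched as a single scoring function with two independent switches. This is only an illustration: the function and variable names are hypothetical, and it assumes that QNorm means normalizing the query side only and DNorm means normalizing the document side only, which the abstract implies but does not spell out.

```python
import math


def l2_norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def score(query, doc, normalize_query, normalize_doc):
    """Dot product with optional, independent query/document normalization."""
    value = sum(q * d for q, d in zip(query, doc))
    if normalize_query:
        value /= l2_norm(query)
    if normalize_doc:
        value /= l2_norm(doc)
    return value


# The four cells of the 2x2 grid: (normalize_query, normalize_doc).
VARIANTS = {
    "dot": (False, False),
    "qnorm": (True, False),   # assumed meaning: query-side only
    "dnorm": (False, True),   # assumed meaning: document-side only
    "cosine": (True, True),
}

query = [3.0, 4.0]   # norm 5
doc = [6.0, 8.0]     # norm 10
scores = {name: score(query, doc, *flags) for name, flags in VARIANTS.items()}
```

Under the assumed QNorm, document magnitude still affects the score while query magnitude does not. Under the assumed DNorm the roles are reversed.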
842 Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers
2602.10512
Hierarchical agentic theorem proving: Proves that theorem provers using hierarchical lemma decomposition have an exponential sample-complexity advantage.
cs.LG
Sho Sonoda, Shunta Akiyama, Yuya Uezato
Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces...
843 VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
2602.10693
Off-policy RL for LLMs: Proposes sequence-level variational soft policy optimization for stable, low-variance off-policy LLM training.
cs.LG, cs.AI
Guobin Shen, Chenxiao Zhao, Xiang Cheng, Lei Huang, Xing Yu
Off-policy updates are inevitable in reinforcement learning (RL) for large language models (LLMs) due to rollout staleness from asynchronous training and mismatches between training and inference engines. Naive importance sampling gives an unbiased correction but suffers from high variance, which is amplified by unbounded ratios and autoregressive generation. Prior remedies either rely on scenario-specific engineering, or trade bias for variance via token-level clipping or sequence-level normali...
844 Amortized Molecular Optimization via Group Relative Policy Optimization
2602.12162
Amortized molecular optimization: Trains a transferable policy with Group Relative Policy Optimization to speed up constrained molecular optimization.
cs.LG
Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay
In structurally constrained molecular optimization, state-of-the-art methods restart an expensive oracle-driven search from scratch for every new input structure, scaling poorly to settings with many starting structures or expensive oracles. While amortized approaches that learn a transferable policy could in principle remove this bottleneck, existing methods struggle to generalize to diverse structural constraints at inference time. We present AMORTIX, an amortized Graph Transformer model that ...
845 $\gamma$-weakly $\theta$-up-concavity: A Unified Framework for Non-Convex Optimization Beyond DR-Submodular and OSS Functions
2602.13506
Non-convex optimization theory: Proposes γ-weakly θ-up-concavity to unify and generalize several classes of tractable non-convex functions.
cs.LG, cs.AI
Mohammad Pedramfar, Vaneet Aggarwal
Optimizing non-convex functions is a fundamental challenge across machine learning and combinatorial optimization. We introduce and study $\gamma$-weakly $\theta$-up-concavity, a novel first-order condition that characterizes a broad class of such functions. This condition provides a powerful unifying framework, strictly generalizing both DR-submodular and One-Sided Smooth (OSS) functions while capturing broader forms of scale-dependent curvature, including accumulating-then-diminishing returns ...
846 Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
2602.14868
Curriculum RL for reasoning: Adaptively tunes task difficulty to mitigate sparse rewards and improve the efficiency of RL for reasoning.
cs.LG, cs.AI
Ilia Mahrooghi, Aryo Lotfi, Emmanuel Abbe
Reinforcement learning has emerged as a powerful paradigm for unlocking reasoning capabilities in language models. However, relying on sparse rewards makes this process highly sample-inefficient, as models must navigate vast search spaces with minimal feedback. While classic curriculum learning aims to mitigate this by ordering data based on complexity, prior works have primarily targeted small datasets and do not directly transfer to the large-scale settings typical of modern LM training. Furth...
847 RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion
2602.16548
RNA 3D inverse design: Uses an RL-guided diffusion model for inverse design of RNA 3D structures, optimizing structural fidelity.
cs.LG
Tianmeng Hu, Yongzheng Cui, Biao Luo, Ke Li
The inverse design of RNA three-dimensional (3D) structures is crucial for engineering functional RNAs in synthetic biology and therapeutics. While recent deep learning approaches have advanced this field, they are typically optimized and evaluated using native sequence recovery, which is a limited surrogate for structural fidelity, since different sequences can fold into similar 3D structures and high recovery does not necessarily indicate correct folding. To address this limitation, we propose...
848 Structured Prototype-Guided Adaptation for EEG Foundation Models
2602.17251
EEG foundation model adaptation: Uses structured prototype-guided fine-tuning to mitigate mismatch and drift in EEG models with few labels.
cs.LG
Jingying Ma, Feng Wu, Yucheng Xing, Qika Lin, Tianyu Liu
Electroencephalography (EEG) foundation models (EFMs) have shown strong potential for transferable representation learning, yet their adaptation in realistic settings remains challenging when only a few labeled subjects are available. We show that this challenge stems from a structural mismatch between noisy, limited supervision and the highly plastic parameter space of EFMs, reflected in three key failure modes: overconfident miscalibration, prediction collapse, and representation drift caused ...
849 Emergent Manifold Separability during Reasoning in Large Language Models
2602.20338
LLM reasoning representation geometry: Uses manifold capacity theory to analyze how representation separability emerges over time during CoT reasoning.
cs.LG
Chanwoo Chun, Alexandre Polo, SueYeon Chung
Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to two compositional reasoning tasks: a controlled Boolean logic tree that supports deep mechanistic analysis, and a natural-language eligibility task in which the model has to extract attributes from prose, compare them to thresholds, and c...
850 MAST: A Multi-fidelity Augmented Surrogate model via Spatial Trust-weighting
2602.20974
Multi-fidelity surrogate modeling: Proposes a multi-fidelity augmented surrogate model with spatial trust weighting to balance accuracy and cost.
cs.LG
Ahmed Mohamed Eisa Nasr, Ali Elham, Haris Moazam Sheikh
In engineering design and scientific computing, computational cost and predictive accuracy are intrinsically coupled. High-fidelity simulations provide accurate predictions but at substantial computational costs, while lower-fidelity approximations offer efficiency at the expense of accuracy. Multi-fidelity surrogate modelling addresses this trade-off by combining abundant low-fidelity data with sparse high-fidelity observations. However, existing methods rely on global correlation assumptions t...
851 Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
2602.23811
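Offline RL with function approximation: Gives a theory of offline policy optimization with parametric policies that goes beyond the limits of state-wise mirror descent.
cs.LG, cs.AI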
Xiang Li, Yuheng Zhang, Nan Jiang
We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offline data via pessimism, existing algorithms that are computationally tractable (often in an oracle-efficient sense), such as PSPI, only apply to finite and small action spaces. Moreover, these algorithms rely on state-wise mirror descent and require actors to b...
852 Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies
2603.00041
Causal discovery in time series: Compares econometric and causal structure-learning methods for time-series policy decisions on UK COVID-19 policies.
cs.LG, cs.AI
Bruno Petrungaro, Anthony C. Constantinou
Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains the subject of ongoing research in causal ML. In addition to traditional causal ML, this study assesses econometric methods that some argue can recover causal structures from time series data. The use of these methods can be explain...
853 VDCook: DIY video data cook your MLLMs
2603.05539
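Video data construction for MLLMs: Builds a self-evolving video data operating system that produces traceable data packages through retrieval and synthesis.
cs.LG, cs.AI, cs.MM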
Chengwei Wu
We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain teams. Users initiate data requests via natural language queries and adjustable parameters (scale, retrieval-synthesis ratio, quality threshold). The system automatically performs query optimization, concurrently running real video retrieval and controlled synthesis modules. It ultimately generates in-domain data packages with complete provenance a...
854 Exact Is Easier: Credit Assignment for Cooperative LLM Agents
2603.06859
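Credit assignment for LLM agents: Shows that counterfactual evaluations such as agent removal distort results and proposes a more exact credit-assignment method for cooperation.
cs.LG, cs.AI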
Yanjun Chen, Yirong Sun, Hanlin Wang, Jinghan Wang, Xinming Zhang
Removing an agent from a cooperative team to measure its contribution seems natural, yet in multi-agent LLM systems this evaluation distorts the result it claims to measure. This failure is not isolated: learned critics, trajectory-level baselines, and agent-removal counterfactuals all inherit from standard multi-agent reinforcement learning a premise that exact counterfactual evaluation requires privileged environment access, and therefore approximate. In cooperative LLM systems, this premise i...
855 Upper Generalization Bounds for Neural Oscillators
2603.09742
Generalization bounds for neural oscillators: Derives upper bounds on the generalization error of second-order-ODE neural oscillator architectures.
cs.LG
Zifeng Huang, Konstantin M. Zuev, Yong Xia, Michael Beer
Neural oscillators that originate from second-order ordinary differential equations (ODEs) have shown competitive performance in learning mappings between dynamic loads and responses of complex nonlinear structural systems. Despite this empirical success, theoretically quantifying the generalization capacities of their neural network architectures remains undeveloped. In this study, the neural oscillator consisting of a second-order ODE followed by a multilayer perceptron (MLP) is considered. It...
856 How Log-Barrier Helps Exploration in Policy Optimization
2603.15001
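Exploration in policy optimization: uses log-barrier regularization to introduce explicit exploration into policy optimization and improve convergence guarantees.
cs.LG cs.AI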
Leonardo Cesani, Matteo Papini, Marcello Restelli
Recently, it has been shown that the Stochastic Gradient Bandit (SGB) algorithm converges to a globally optimal policy with a constant learning rate. However, these guarantees rely on unrealistic assumptions about the learning process, namely that the probability of the optimal action is always bounded away from zero. We attribute this to the lack of an explicit exploration mechanism in SGB. To address these limitations, we propose to regularize the SGB objective with a log-barrier on the parame...
857 A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks
2603.22586
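Instruction-conditioned time-series ICL: proposes an instruction-conditioned time-series foundation model that performs multi-task in-context learning from example prompts.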
cs.LG
Anish Saha, Konstantin Shmakov
In-context learning (ICL) enables task adaptation at inference time by conditioning on demonstrations rather than updating model parameters. Although recent time-series foundation models incorporate contextual conditioning, retrieval, or example-based prompting, they typically rely on implicit positional structure or task-specific objectives rather than explicit instruction-conditioned input-output demonstrations. We introduce iAmTime, a time-series foundation model trained with instruction-cond...
858 Demystifying Lipschitz verification: positive matrices, negative results
2603.28113
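Lipschitz constant verification: analyzes the limitations of SDP-based and other Lipschitz verification methods and gives negative results from a positive-matrix perspective.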
cs.LG
Simon Kuang, Yuezhu Xu, S. Sivaranjani, Xinfan Lin
The global Lipschitz constant of a neural network is related to robustness and generalization, yet unlike in many classical models, it is not plainly legible from the parameters. This has motivated sophisticated verification algorithms, especially semidefinite programming (SDP) based on incremental quadratic constraints on the activation functions, to improve on the fast but often loose product of layerwise Lipschitz constants (the trivial bound). We ask why Lipschitz verification is a problem i...
859 ASPECT: Node-Level Adaptive Spectral Fusion for Graph Contrastive Learning
2604.01878
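Graph contrastive spectral fusion: proposes node-level adaptive spectral fusion to improve how low- and high-frequency views are combined in graph contrastive learning.
cs.LG cs.AI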
Zhuolong Li, Boxue Yang, Haopeng Chen
Spectral graph contrastive learning often constructs low- and high-frequency views to capture complementary graph signals, but these views are commonly combined by graph-level or node-agnostic fusion rules. We show that graph-level fusion can incur irreducible regret on mixed graphs with separated node-wise spectral preferences. Motivated by this result, we propose ASPECT, a spectral graph contrastive learning method that adaptively fuses low- and high-frequency views at the node level. ASPECT l...
860 AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
2604.02525
Low-precision training stabilization: proposes an outlier-pattern-aware rotation strategy to improve the speed and accuracy of low-precision training.
cs.LG
Seonggon Kim, Alireza Khodamoradi, Pranathi Vasireddy, Kristof Denolf, Eunhyeok Park
Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns, Row-wise, Colum...
861 Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
2604.07277
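Efficient online RL for Android agents: proposes single-state multiple-action updates to cut interaction cost and improve online training efficiency for Android agents.
cs.LG cs.AI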
Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang
Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully ex...
862 The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts
2604.11962
Feature geometry in deep networks: proposes the Linear Centroids Hypothesis, which interprets features as centroid directions learned by local experts.
cs.LG
Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
The Linear Representation Hypothesis (LRH) identifies features of a trained deep network (DN) as linear directions in the activation spaces, i.e., output spaces of intermediate layers. This characterization decouples the input-output maps learned by a DN from the organization of feature directions in its activation spaces. We introduce the Linear Centroids Hypothesis (LCH), which instead identifies features with linear directions among a DN's centroid spaces -- where any vector denotes a centroi...
863 Loss-Driven Bayesian Active Learning
2604.11995
Bayesian active learning objectives: proposes Bayesian active learning acquisition objectives and algorithms derived directly from the downstream loss.
cs.LG
Zhuoyue Huang, Freddie Bickford Smith, Tom Rainforth
The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We propose a rigorous loss-driven approach to Bayesian active learning that allows data acquisition to directly target the loss associated with a given decision problem. In particular, we show how any loss can be used to derive a unique objective for optimal data ac...
864 Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
2604.13010
Offline on-policy distillation: proposes offline on-policy distillation to reduce teacher-serving overhead, and analyzes failure conditions and their fixes.
cs.LG cs.AI
Yecheng Wu, Song Han, Hai Cai
On-policy distillation (OPD) is an effective post-training paradigm for large language models but requires a live teacher server throughout training, resulting in substantial infrastructure overhead. We investigate whether OPD can be performed offline by precomputing teacher log-probabilities once over SFT rollouts and reusing them during training. We find that naively doing so fails to reliably match standard OPD, and trace the root cause to a previously overlooked condition we term teacher con...
865 Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction
2604.15694
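Discrete diffusion via CTMCs: decomposes the CTMC reverse rate into jump timing and jump direction and builds a discrete diffusion generative model on that decomposition.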
cs.LG
Jingyuan Li, Xiaoyi Jiang, Fukang Wen, Wei Liu, Renqian Luo
Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix monolithically, through proxies such as concrete scores (SEDD) or clean-data predictions (MDLM, GIDD), rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. We propose Neural CTMC, which exploits the underlyi...
866 EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning
2604.16579
Uncertainty-aware depression estimation: uses disentangled evidential learning to output both predictions and uncertainty in multimodal depression estimation.
cs.LG cs.AI
Fangyuan Liu, Sirui Zhao, Zeyu Zhang, Jinyang Huang, Feng-Qi Cui
Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, however, produce uncalibrated point estimates without quantifying predictive uncertainty, exposing decision-making to the risk of overconfident, untrustworthy estimates. To establish a reliable and trustworthy estimation paradigm, we propose EviDep, an evidential learning framework that jointly quantifies dep...
867 Robustness of Spatio-temporal Graph Neural Networks for Fault Location in Partially Observable Distribution Grids
2604.20403
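Robust spatio-temporal GNNs for grids: evaluates and strengthens the robustness of spatio-temporal graph neural networks for fault location in partially observable distribution grids.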
cs.LG
Burak Karabulut, Carlo Manna, Chris Develder
Fault location in distribution grids is critical for reliability and minimizing outage durations. Yet, it remains challenging due to partial observability, given sparse measurement infrastructure. Recent works show promising results by combining Recurrent Neural Networks (RNNs) and Graph Neural Networks (GNNs) for spatio-temporal learning. Still, many modern GNN architectures remain untested for this grid application, while existing GNN solutions have not explored GNN topology definitions beyond...
868 Transferable SCF-Acceleration through Solver-Aligned Initialization Learning
2604.21657
ML initialization for DFT SCF: learns solver-aligned initializations that transferably accelerate self-consistent field convergence in KS-DFT.
cs.LG
Eike S. Eberhard, Viktor Kotsev, Timm Güthle, Stephan Günnemann
The cost of Kohn-Sham density functional theory (KS-DFT) calculations scales with the number of solver iterations, which depends on the quality of the initial guess. Machine learning methods that predict initial guesses from molecular geometry can reduce this cost, but matrix-prediction models fail when extrapolating to larger molecules, degrading rather than accelerating convergence [Liu et al., 2025]. We show that this failure is a supervision problem, not an extrapolation problem: models trai...
869 The Role of Symmetry in Optimizing Overparameterized Networks
2604.25150
Symmetry in overparameterized optimization: analyzes how the weight symmetries introduced by overparameterization improve conditioning and aid optimization.
cs.LG cs.AI
Kusha Sareen, Mohammad Pedramfar, Sékou-Oumar Kaba, Mehran Shakerinava, Siamak Ravanbakhsh
Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, enabling the existence of better-conditioned minima within each equivalence class of...
870 Polynomial-Time Optimal Group Selection via the Double-Commutator Eigenvalue Problem
2605.00834
Polynomial-time group selection: recasts group selection as a double-commutator eigenvalue problem and gives a polynomial-time optimal algorithm.
cs.LG
Mitchell A. Thornton
The algebraic diversity framework generalizes temporal averaging over multiple observations to algebraic group action on a single observation for second-order statistical estimation. The central open problem in this framework is group selection: given an $M$-dimensional observation with unknown covariance structure, find the finite group whose spectral decomposition best matches the covariance. Naive enumeration of all subgroups of the symmetric group $S_M$ requires exponential time i...
871 Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI
2605.01240
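Self-supervised fMRI pretraining: proposes a self-supervised pretraining framework for resting-state fMRI that combines region-aware masking with a hybrid Attention-Mamba architecture.
cs.LG cs.AI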
Ruthwik Reddy Doodipala, Pankaj Pandey, Pratheek Eranki, Carolina Torres-Rojas, Manob Jyoti Saikia
Self-supervised pretraining is promising for large-scale neuroimaging, yet the impact of region-aware masking and hybrid sequence modeling remains underexplored. In this work, we introduce Rhamba, a region-aware pretraining framework that integrates anatomically guided masking with hybrid Attention-Mamba architectures for resting state functional magnetic resonance imaging (fMRI) analysis. Models were pretrained on the ABIDE dataset using region-aligned patch embeddings and three masking strateg...
872 A Theory of Saddle Escape in Deep Nonlinear Networks
2605.01288
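Saddle escape theory: derives a theoretical characterization of saddle escape and abrupt feature acquisition in deep nonlinear networks, and classifies activation functions.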
cs.LG
Divit Rawal, Michael R. DeWeese
In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universali...
873 QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
2605.01862
Offline goal-conditioned RL: proposes a Q-conditioned hybrid Attention-Mamba sequence model to improve history modeling in offline goal-conditioned reinforcement learning.
cs.LG
Xing Lei, Jincheng Wang, Xuetao Zhang, Donglin Wang
Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian structure that violates standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for long-term dependency modeling, yet pure attention is inefficient and brittle when handling local Markovian structure and long-range context simultaneously. Althoug...
874 DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning
2605.02196
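Machine unlearning attack: reveals that INT4 quantization restores forgotten content, and presents the quantization recovery attack with a systematic evaluation.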
cs.LG
Abdullah Ahmad Khan, Ferdous Sohel
Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16); we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robu...
875 AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation
2605.02948
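Talking head video generation: uses asymmetric distillation to mitigate identity drift in chunk-wise generation, enabling identity-consistent long-term talking head video.
cs.LG cs.AI cs.SD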
Yuxin Lu, Qian Qiao, Jiayang Sun, Guibo Zhu, Min Cao
Diffusion-based talking head generation has achieved remarkable visual quality, yet scaling it to long-term videos remains challenging. The widely adopted chunk-wise paradigm introduces two fundamental failures: (1) temporal-spatial misalignment between static identity references and dynamic audio streams, and (2) cascading identity drift propagated through self-generated continuity references across chunks. To address both issues, we propose AsymTalker, a novel diffusion-based talking head gene...
876 DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment
2605.03327
LLM RL credit assignment: proposes DGPO to assign credit to reasoning steps at fine granularity and stabilize KL-constrained policy optimization.
cs.LG cs.AI
Hongbo Jin, Rongpeng Zhu, Zhongjing Du, Xu Jiang, Jingqi Tian
Reinforcement learning is crucial for aligning large language models to perform complex reasoning tasks. However, current algorithms such as Group Relative Policy Optimization suffer from coarse grained, sequence level credit assignment, which severely struggles to isolate pivotal reasoning steps within long Chain of Thought generations. Furthermore, the standard unbounded Kullback Leibler divergence penalty induces severe gradient instability and mode seeking conservatism, ultimately stifling t...
877 Time series causal discovery with variable lags
2605.04081
Time-series causal discovery: studies causal structure learning for time series with variable lags in order to recover the causal graph.
cs.LG cs.AI
Bruno Petrungaro, Anthony C. Constantinou
Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at dif...
878 Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention
2605.04279
Multi-head attention dynamics: develops a gradient-flow theory of multi-head self-attention dynamics and analyzes clustering and inter-head interference.
cs.LG
Ayan Pendharkar
Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has established clustering behavior for single-head attention, the multi-head setting remains less understood due to geometric interference between heads, which invalidates standard monotonicity arguments. In this work, we develop a theoretical framework for multi-head self-attention dynamics and resolve sever...
879 LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
2605.04323
Soil multimodal dataset: releases LUCAS-MEGA, a large-scale multimodal soil-environment dataset to support representation learning.
cs.LG
Kuangdai Leng, Simon Jeffery, Panos Panagos, Tarje Nissen-Meyer
Understanding soil is fundamental to agriculture, carbon cycling, and environmental sustainability, yet progress is limited by fragmented and heterogeneous datasets that constrain modeling to small-scale predictive settings rather than high-dimensional representation learning. We introduce LUCAS-MEGA, a large-scale multimodal dataset constructed through systematic data fusion of European soil-environment observations, with the LUCAS survey as its backbone. The fused dataset comprises over 70,000...
880 Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention
2605.04460
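Sparse counterfactual intervention: uses latent adjustment to discover sparse, feasible survey-variable interventions that achieve counterfactual shifts for a community.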
cs.LG
Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang
Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional a...
881 SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
2605.04712
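Continual DRL with MoE: proposes SPHERE to mitigate the loss of spectral plasticity and the performance degradation of MoE in continual reinforcement learning.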
cs.LG
Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li
In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating ...
882 Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
2605.05112
Binary-reward RL steering: proposes controlling the rollout pass rate to steer binary-reward RL toward its most informative training regime.
cs.LG
Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao
Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Pr...
883 MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
2605.05225
Efficient multimodal MoE inference: proposes MACS, which scales capacity in a modality-aware way to reduce the straggler bottleneck in multimodal MoE inference.
cs.LG cs.AI
Bo Li, Chuan Wu, Shaolin Zhu
Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones, and (2) Modality Dynamics, where varying visual to text...
884 Online Localized Conformal Prediction
2605.05497
Online conformal prediction: proposes online localized conformal prediction for more efficient calibration and coverage on non-exchangeable data.
cs.LG
Yuheng Lai, Garvesh Raskutti
Conformal prediction is a framework that provides valid uncertainty quantification for general models with exchangeable data. However, in the online learning and time-series settings, exchangeability is not satisfied. Existing online conformal methods, such as adaptive conformal inference (ACI), can achieve long-run validity, yet they remain inefficient under covariate heterogeneity because they rely on global calibration. We propose Online Localized Conformal Prediction (OLCP), which com...
885 LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites
2605.05615
LLM inference carbon modeling: builds LLMSpace, a framework for modeling the lifecycle energy use and carbon footprint of LLM inference on LEO satellites.
cs.LG
Lei Jiang, Adrian Ildefonso, Daniel Loveless, Fan Chen
Large language models (LLMs) impose rapidly growing energy demands, creating an emerging energy and carbon crisis driven by large-scale inference. Solar-powered, AI-enabled low Earth orbit (LEO) satellites have been proposed to mitigate terrestrial electricity consumption, but their lifecycle carbon footprint remains poorly understood due to launch emissions, satellite manufacturing, and radiation-hardened hardware requirements. This paper presents LLMSpace, the first carbon modeling fr...
886 CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
2605.05732
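Continual learning without weight updates: proposes CRAFT, which replaces weight updates with low-rank interventions on hidden representations to reduce forgetting during continual fine-tuning.
cs.LG cs.AI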
Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari
Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence again...
887 Retrieval from Within: An Intrinsic Capability of Attention-Based Models
2605.05806
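Intrinsic retrieval in attention: proposes INTRA, which lets encoder-decoder models use attention to retrieve evidence from their internal representations and reuse it for generation.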
cs.LG
Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generat...
888 Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs
2605.05957
LLM factual correction suppression: Builds a benchmark and analyzes how task requests suppress LLMs' factual correction of false premises.
cs.LG
Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang
LLMs reliably correct false claims when presented in isolation, yet when the same claims are embedded in task-oriented requests, they often comply rather than correct. We term this failure mode correction suppression and construct a benchmark of 300 false premises to systematically evaluate it across eight models. Suppression rates range from 19% to 90%, with four models exceeding 80%, establishing correction suppression as a prevalent and severe phenomenon. Mechanistic analysis reveal...
889 Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning
2605.06156
Offline RL with generative policies: Proposes entropy-regularized adjoint matching to mitigate the popularity bias and support binding of flow-based policies in offline RL.
cs.LG cs.AI
Abdelghani Ghanem, Mounir Ghogho
Integrating expressive generative policies, such as flow-matching models, into offline reinforcement learning (RL) allows agents to capture complex, multi-modal behaviors. While Q-learning with Adjoint Matching (QAM) stabilizes policy optimization via the continuous adjoint method, it remains inherently bound to the fixed behavior distribution. This dependence induces a popularity bias that can suppress high-reward actions in low-density regions, and creates a support binding t...
890 AffineLens: Capturing the Continuous Piecewise Affine Functions of Neural Networks
2605.06218
Piecewise affine network analysis: Proposes AffineLens to enumerate and characterize, in practice, the continuous piecewise affine region structure of neural networks.
cs.LG
Yi Wei, Xuan Qi, Furao Shen, Jian Zhao, Vittorio Murino
Piecewise affine neural networks (PANNs) provide a principled geometric perspective on neural network expressivity by characterizing the input--output map as a continuous piecewise affine (CPA) function whose complexity is governed by the number, arrangement, and shapes of its affine regions. However, existing interpretability and expressivity analyses often rely on indirect proxies (e.g., activation statistics or theoretical upper bounds) and rarely offer practical, accurate tools for enumerati...
891 MinMax Recurrent Neural Cascades
2605.06384
MinMax recurrent networks: Proposes MinMax recurrent cascade networks that avoid vanishing and exploding gradients while remaining highly expressive.
cs.LG cs.AI
Alessandro Ronca
We show that the MinMax algebra provides a form of recurrence that is expressively powerful, efficiently implementable, and most importantly it is not affected by vanishing or exploding gradient. We call MinMax Recurrent Neural Cascades (RNCs) the models obtained by cascading several layers of neurons that employ such recurrence. We show that MinMax RNCs enjoy many favourable theoretical properties. First, their formal expressivity includes all regular languages, arguably the maximal expressivit...
892 Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level
2605.06387
On-policy distillation for LLMs: Proposes asymmetric on-policy distillation to reduce variance and improve token-level error correction and exploration.
cs.LG cs.AI
Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang
On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three structural weaknesses, including high variance updates, vanishing gradients in zero-advantage regions, and exploration bottlenecks when corrective signals are insufficient. We therefore propose Asymmetric On-Policy Distillat...
893 Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching
2605.06474
Off-policy evaluation theory: Proposes Q-MMR, which learns sample weights via recursive reweighting and moment matching to estimate the target policy's return.
cs.LG cs.AI
Xiang Li, Nan Jiang
We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation ca...
894 Learning quantum Hamiltonians at any temperature in polynomial time
2310.02243
Quantum Hamiltonian learning: Gives a polynomial-time algorithm for learning local quantum Hamiltonians at any temperature.
cs.LG
Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang
We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/\textrm{tr}(e^{-\beta H})$ at a known inverse temperature $\beta>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [...
895 Active teacher selection for reward learning
2310.15288
Multi-teacher reward learning: Proposes the HUB framework, which actively selects teachers to learn rewards efficiently from heterogeneous human feedback.
cs.LG cs.AI
Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell
Reward learning techniques enable machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite gathering feedback from large and heterogeneous populations. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algor...
896 Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia
2311.08433
Sepsis bacteremia biomarkers: Retrospectively analyzes laboratory markers of ICU sepsis patients to assess how well they predict bacteremia.
cs.LG
Sangwon Baek, Seung Jun Lee
Few studies have investigated the diagnostic utilities of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the prediction power of laboratory biomarkers to utilize those markers with high performance to optimize the predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients qualifying SEPSI...
897 Structure learning of Hamiltonians from real-time evolution
2405.00082
Hamiltonian structure learning: Efficiently recovers the interaction terms and structure of an unknown Hamiltonian from its real-time evolution.
cs.LG
Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang
We study the problem of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m \lambda_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-understood under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $\lambda_a$, are unknown. But how efficiently can we learn a local Hamiltonian without prior knowledge of its interac...
898 Optimising MFCC parameters for the automatic detection of respiratory diseases
2408.07522
MFCC for respiratory diagnosis: Systematically optimizes MFCC extraction parameters to improve automatic detection of respiratory diseases.
cs.LG cs.SD eess.AS
Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M. E. Franssen
Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) is widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we addres...
899 A Hybrid Graph Neural Network for Enhanced EEG-Based Depression Detection
2410.18103
EEG depression detection GNN: Proposes a hybrid graph neural network that models both common and individualized brain connectivity in depression to improve detection.
cs.LG cs.AI
Yiye Wang, Wenming Zheng, Yang Li, Hao Yang
Graph neural networks (GNNs) are becoming increasingly popular for EEG-based depression detection. However, previous GNN-based methods fail to sufficiently consider the characteristics of depression, thus limiting their performance. Firstly, studies in neuroscience indicate that depression patients exhibit both common and individualized brain abnormal patterns. Previous GNN-based approaches typically focus either on fixed graph connections to capture common abnormal brain patterns or on adaptive...
900 Pretraining a Foundation Model for Small-Molecule Natural Products
2503.17656
Molecular foundation model pretraining: Pretrains a foundation model for small-molecule natural products to improve generalization across multiple downstream tasks.
cs.LG cs.AI
Yuheng Ding, Bo Qiang, Shaoning Li, Yiran Zhou, Jie Yu
Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Nowadays, existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods...
901 An abstract effective convergence theorem for stochastic processes, with applications to stochastic approximation
2504.12922
Stochastic Approximation Convergence: Proposes an effective convergence theorem for stochastic processes under a relaxed supermartingale condition, with quantitative guarantees.
cs.LG
Morenikeji Neri, Nicholas Pischke, Thomas Powell
We provide a general theorem on the asymptotic behavior of stochastic processes that conform to a relaxed supermartingale condition. The distinguishing feature of our result is that it provides quantitative convergence guarantees at a much higher level of abstraction and generality than is typically seen in the stochastic approximation literature, formulated in particular in terms of a general modulus $\tau$ that, on an intuitive level, captures an effective variant of the uniqueness in expectat...
902 Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors
2505.11325
Uncertainty for PFNs: Uses martingale posteriors to provide efficient, tuning-free uncertainty quantification for PFN predictive means and quantiles.
cs.LG cs.AI
Thomas Nagler, David Rügamer
Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale pos...
903 Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC
2508.02001
SmartNIC-Offloaded Traffic Analysis: Offloads a network foundation model to a SmartNIC to combine versatility with low latency in encrypted traffic analysis.
cs.LG
Chungang Lin, Xuying Meng, Tianyu Zuo, Weiyao Zhang, Meng Shen
Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with pr...
904 Chordless cycle filtrations for dimensionality detection in complex networks via topological data analysis
2509.08350
TDA Network Dimensionality: Uses a topological weighted filtration based on chordless cycles to estimate the latent dimensionality of complex networks in a data-driven way.
cs.LG
Aina Ferrà Marcús, Robert Jankowski, Meritxell Vila Miñana, Carles Casacuberta, M. Ángeles Serrano
Many complex networks, ranging from social to biological systems, exhibit structural patterns consistent with an underlying hyperbolic geometry. Revealing the dimensionality of this latent space can disentangle the structural complexity of communities, impact efficient network navigation, and fundamentally shape connectivity and system behavior. We introduce a topological data analysis weighting scheme for graphs based on chordless cycles to estimate network dimensionality in a data-driven way. ...
905 Privately Estimating Black-Box Statistics
2510.00322
Black-Box Differential Privacy: Proposes a differentially private estimation method for black-box statistics that requires no known sensitivity and improves data efficiency.
cs.LG
Günter F. Steinke, Thomas Steinke
Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this ...
906 Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining
2510.18516
Calcium Imaging SSL Pretraining: Proposes cell-pattern-aware pretraining to robustly decode dynamic visual experience from calcium imaging.
cs.LG
Sangyoon Bae, Mehdi Azabou, Blake Richards, Jiook Cha
Neural recordings exhibit a distinctive form of heterogeneity rooted in differences in cell types, intrinsic circuit dynamics, and stochastic stimulus-response variability that goes beyond ordinary dataset variability, mixing statistically regular neurons with highly stochastic, stimulus-contingent ones within the same dataset. This heterogeneity poses a challenge for self-supervised learning (SSL) -- learnable statistical regularity -- thereby destabilizing representation learning and limiting ...
907 Benchmarking World-Model Learning with Environment-Level Queries
2510.19788
World-Model Evaluation Benchmark: Builds an evaluation with environment-level queries to test whether world models can answer many kinds of questions.
cs.LG cs.AI
Archana Warrier, Dat Nguyen, Michelangelo Naim, Moksh Jain, Yichao Liang
World models are central to building AI agents capable of flexible reasoning and planning. Yet current evaluations (i) test only properties measurable from observed interactions, such as next-frame prediction or task return, and (ii) do not test whether a learned model supports diverse queries about the environment. In contrast, humans build general-purpose models that can answer many different questions about an environment, including questions that require understandi...
908 RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics
2510.24736
mRNA Sequence Generation: Uses manifold Langevin dynamics for property-guided mRNA sequence generation while preserving biological viability.
cs.LG
Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang
Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for m...
909 Understanding Robustness of Model Editing in Code LLMs
2511.03182
Code LLM Model Editing: Builds a benchmark to analyze how model editing in code LLMs generalizes and preserves robustness during API migration.
cs.LG
Vinaik Chhetri, Moghis Fereidouni, A. B Siddique, Umar Farooq
Large language models (LLMs) for code are increasingly used in software development, but they remain static after pretraining while APIs and software libraries continue to evolve. Model editing offers a lightweight alternative to retraining for incorporating API updates, yet it remains unclear whether existing editing methods can induce correct API migration, generalize that behavior to unseen tasks, and preserve performance on tasks involving unmodified APIs. We present a controlled benchmark f...
910 Assumed Density Filtering and Smoothing with Neural Network Surrogate Models
2511.09016
Neural ADF Filtering: Implements ADF filtering and RTS smoothing via analytic moment propagation through neural network surrogate models with Gaussian inputs.
cs.LG
Simon Kuang, Xinfan Lin
The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. With nonlinear systems, the challenge consists in how to propagate uncertainty through the state transitions and output function. For the case of a neural network model, we enable accurate uncertainty propagation using a recent state-of-the-art analytic formula for computing the mean and covariance of a deep neural network with Gaussian input. We argue that cross entropy is a more...
911 End-to-end PDDL Planning with Hardcoded and Dynamic Agents
2512.09629
End-to-End PDDL Planning: Converts natural-language requirements into PDDL and has multiple agents iteratively refine it to produce verifiable plans.
cs.LG cs.AI
Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge
We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguities and contradictions that may exist in the human specification. We support two categories of agents:...
912 Evaluating Large Language Models in Scientific Discovery
2512.15567
LLMs for Scientific Discovery: Proposes a scenario-grounded benchmark that evaluates LLMs' iterative reasoning and hypothesis generation in scientific discovery.
cs.LG cs.AI
Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M. Pruyn
Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define research projects of genuine interest and decompose them into modular research scenarios...
913 Bellman Calibration for $V$-Learning in Offline Reinforcement Learning
2512.23694
Offline RL Value Calibration: Proposes a Bellman calibration criterion and error metric to improve the long-horizon reliability of offline V-learning.
cs.LG
Lars van der Laan, Nathan Kallus
Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We introduce Bellman calibration, a weak reliability criterion requiring that states assigned similar predicted values have average Bellman targets that agree with those predictions. This criterion yields a scalar calibration error for...
914 Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting
2512.23805
FQE without Bellman Completeness: Achieves FQE stability without a completeness assumption via a regression norm weighted by the target policy's stationary distribution.
cs.LG
Lars van der Laan, Nathan Kallus
Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study an alternative route: changing the norm used in the regression step. The policy-evaluation Bellman operator is contractive in the $L^2$ norm induced by the target policy's stationary state-action distribution, whereas standard off-policy FQE projects Bellman ...
915 Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
2512.23927
Soft FQI Local Convergence: Shows that soft FQI can converge locally under stationary norm alignment without Bellman completeness.
cs.LG
Lars van der Laan, Nathan Kallus
Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment. Near the soft-optimal fixed point, the soft Bellman operator has the same first-order behavior as the ...
916 SWaRL: Safeguard Code Watermarking via Reinforcement Learning
2601.02602
RL-based Code Watermarking: Uses reinforcement learning to embed verifiable, hard-to-remove code watermarks while preserving functional correctness.
cs.LG
Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLMs by embedding unique and verifiable signatures in the generated program. Existing watermarking approaches either rely on handcrafted code transformations or manipulate token generation probabilities at inference time, making them vulnerable to removal attacks or prone to breaking functional correctness. To address these challenges, SWaRL employs a reinforcement lear...
917 Multi-environment Invariance Learning with Missing Data
2601.07247
Invariant Learning with Missing Data: Studies modeling and generalization methods for multi-environment invariance learning with missing data.
cs.LG
Yiran Jia, Jelena Bradic
Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging ...
918 TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models
2601.18744
Time Series Reasoning Benchmark: Proposes the multi-task, multi-modal TSRBench to comprehensively evaluate generalist models' time series reasoning.
cs.LG cs.AI
Fangxu Yu, Xingang Guo, Lingzhi Yuan, Haoqiang Kang, Hongyu Zhao
Time series are ubiquitous in real-world scenarios and crucial for applications ranging from energy management to traffic control. Consequently, the ability to reason over time series is a fundamental skill for generalist models to solve complex problems. However, current benchmarks for generalist models largely overlook this dimension. To bridge this gap, we introduce TSRBench, a comprehensive multi-modal benchmark designed to stress-test the full spectrum of time series reasoning capabilities....
919 Test-Time Compute Games
2601.21839
Economics of Test-Time Compute: Uses game theory to analyze the social inefficiency and distorted provider incentives caused by compute-based LLM pricing.
cs.LG cs.AI
Ander Artola Velasco, Dimitrios Rontogiannis, Stratis Tsirtsis, Manuel Gomez-Rodriguez
Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased how much users pay cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute they use to generate an output. In our work, we show that the market of LLM-as-a-service is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even...
920 Diffusion Path Samplers via Sequential Monte Carlo
2601.21951
SMC Diffusion Samplers: Combines diffusion paths with sequential Monte Carlo to construct samplers and provide score and density estimates.
cs.LG
James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz
We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target, popularised by diffusion models. We tackle the score estimation problem by developing an efficient sequential Monte Carlo sampler that evolves auxiliary variables from conditional distributions along the path, providing principled score and density estimates for time-var...
921 Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients
2602.00474
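Markov Chain Policy Evaluation: Uses a minimal peripheral quotient decomposition to separate persistent phase behavior from transient effects, improving Markov chain evaluation.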
cs.LG
Yang Xu, Vaneet Aggarwal
We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $\mathcal{...
922 Emergence of Distortions in High-Dimensional Guided Diffusion Models
2602.00716
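Classifier-Free Guidance Distortions: Uses statistical physics to characterize the conditional-distribution distortions that CFG introduces in high dimensions and what they depend on.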
cs.LG
Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello
Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often reduces sample diversity. Using tools from statistical physics, we analyze the emergence of generative distortions induced by CFG, namely the mismatch between the CFG sampling distribution and the true conditional distribution. We study this phenomenon in analytically tractable settings with exact score functions, characterizing its dependence on data dimensionality and the number o...
923 Robust Sublinear Convergence Rates for Iterative Bregman Projections
2602.01372
Bregman Projection Convergence: Gives a proof framework for robust O(1/k) dual convergence rates of iterative KL-type Bregman projections.
cs.LG
Gabriel Peyré
Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/\gamma$, where $\gamma$ is the entrop...
924 CGF-Softmax: A Cumulant-Based Softmax Reformulation for Efficient Inference under Homomorphic Encryption
2602.01621
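HE-Friendly Softmax Approximation: Proposes a softmax reformulation based on the cumulant generating function to reduce the cost of inference under homomorphic encryption.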
cs.LG
Hanjun Park, Byeongseo Min, Jiheon Woo, Min-Wook Jeong, Jongho Shin
Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of transformer architectures, remains particularly challenging in HE due to its multivariate structure, the large dynamic range induced by exponential functions, and the costly division operation. In this paper, we propose CGF-softmax, which reformulates the softmax denominator through the cumulant generating fu...
925 Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
2602.04774
Optimal Learning Rate Schedules: Derives optimal SGD learning rate schedules and scaling laws in a solvable random feature model.
cs.LG
Blake Bordelon, Francesco Mori
Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $\eta_T^\star(t)$ where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both as a numerical optimization problem and a...
926 On the Meta-Design of Allocation Problems
2602.08786
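Meta-Design of Resource Allocation: Treats the design parameters of resource allocation as optimization variables, jointly improving overall welfare and the service policy.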
cs.LG
Unai Fischer-Abaigar, Emily Aiken, Christoph Kern, Juan Carlos Perdomo
There is an extensive literature that studies how to find optimal policies in resource allocation problems, taking the underlying design parameters that define the allocation, such as what data is collected, how many people can be served, and quality of service as fixed constraints. Yet, from a planner's perspective, these design parameters are themselves optimization variables that are just as important in determining overall welfare as selecting the optimal targeting rule for a given set of co...
927 From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model
2602.09457
Random-Order Online Learning Regret: Derives small-loss regret bounds in the random-order model from average sensitivity and stability conditions.
cs.LG
Shinsaku Sakaue, Yuichi Yoshida
We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. By extending the batch-to-online transformation of Dong and Yoshida (2023), we show that if an offline algorithm enjoys a $(1+\varepsilon)$-approximation guarantee, an average sensitivity bound controlled by a function $\varphi(\varepsilon)$, and stability with respect to $\varepsilon$, then we can obtain a small-loss regret bound typically of...
928 MobileDev-Bench: A Benchmark for Issue Resolution in Mobile Application Development
2603.24946
Mobile App Issue Benchmark: Releases MobileDev-Bench to evaluate how well large models fix multi-file defects in real mobile applications.
cs.LG
Moshood A. Fakorede, Krishna Upadhyay, A. B. Siddique, Umar Farooq
Large language models (LLMs) have shown strong performance on automated software engineering tasks, yet existing benchmarks focus primarily on library-style repositories, leaving mobile application development largely unexplored despite its framework-specific build systems, heterogeneous artifact types, and coordinated multi-file fix requirements. We introduce MobileDev-Bench, a benchmark comprising 407 real-world issue-resolution tasks collected from 19 production mobile applications spanning A...
929 Beyond Pessimism: Offline Learning in KL-regularized Games
2604.06738
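Offline Learning in Regularized Games: Proposes a pessimism-free algorithm that improves the statistical efficiency of offline learning in KL-regularized zero-sum games.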
cs.LG
Yuheng Zhang, Claire Chen, Nan Jiang
We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized with respect to a fixed reference policy through KL regularization. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt n)$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the ...
930 Steered LLM Activations are Non-Surjective
2604.09839
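Limits of Activation Steering: Proves that the behavior map induced by activation steering is non-surjective and analyzes the range of steered behavior that no prompt can realize.
cs.LG cs.AI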
Aayush Mishra, Daniel Khashabi, Anqi Liu
Activation steering is a popular white-box control technique that modifies model activations to elicit an abstract change in its behavior. It has also become a standard tool in interpretability (e.g., probing truthfulness, or translating activations into human-readable explanations) and safety research (e.g., jailbreakability). However, it is unclear whether steered behavior is realizable by any textual prompt. In this work, we cast this question as a surjectivity problem: for a fixed model, doe...
931 One-Shot Generative Flows: Existence and Obstructions
2604.15439
Generative Flow Transport Theory: Studies dynamic measure transport flows in generative modeling and the obstructions to their existence.
cs.LG
Panos Tsimpos, Daniel Sharp, Youssef Marzouk
We study dynamic measure transport for generative modeling, focusing on transport maps that connect a source measure $P_0$ to a target measure $P_1$ by integrating a velocity field of the form $v_t(x) = \mathbb{E}[\dot X_t \mid X_t = x]$, where $X_\bullet = (X_t)_t$ is a stochastic process satisfying $(X_0,X_1)\sim{P_0}\otimes{P_1}$ and $\dot X_t$ is its time derivative. We investigate when $X_\bullet$ induces a \emph{straight-line flow}: a flow whose pointwise acceleration vanishes and is there...
932 Verification Modulo Tested Library Contracts
2604.15533
Tested Library Contract Verification: Synthesizes testable contracts for library methods to verify client programs modularly.
cs.LG
Abhishek Uppar, Omar Muhammad, Sumanth Prabhu, Deepak D'Souza, Madhusudan P
We consider the problem of verification modulo tested library contracts as a step towards automating the verification of client programs that use complex libraries. We formulate this problem as the synthesis of modular contracts for the library methods used by the client that are adequate to prove the client correct, and that also pass the scrutiny of a testing engine that tests the library against these contracts. We also consider a new form of method contracts called contextual contracts that ...
933 Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation
2604.18972
Continuous-Time Policy Evaluation: Uses high-order generator regression to improve the accuracy of continuous-time policy evaluation.
cs.LG
Yaowei Zheng, Richong Zhang, Shenxi Wu, Shirui Bian, Haosong Zhang
We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the grid width. We estimate the time-dependent generator from multi-step transitions using moment-matching coefficients that cancel lower-order truncation terms, and combine the resulting surrogate with backward regression. ...
934 FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
2604.26733
Live RL Future Prediction: Builds a live future-event prediction environment for training agents that can learn continually.
cs.LG cs.AI
Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue
Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. It can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage the advantages of future prediction, we present FutureWorld, a live agentic reinforc...
935 Geometric and dynamical analysis of attractor boundaries and storage limits in kernel Hopfield networks
2605.00366
Kernel Hopfield Network Dynamics: Analyzes the attractor basin boundaries and storage capacity limits of kernel Hopfield networks.
cs.LG
Akira Tamamori
High-capacity associative memories based on Kernel Logistic Regression (KLR) exhibit strong storage capabilities, but the dynamical and geometric mechanisms underlying their stability remain poorly understood. This paper investigates the global geometry of attractor basins and the mechanisms governing the storage limit in KLR-trained Hopfield networks. We combine empirical evaluations using random sequences and real-world image embeddings (CIFAR-10) with morphing experiments and statistical Sign...
936 Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
2605.00754
Multilingual Code Reward Models: Trains robust multilingual code reward models that support flexible multi-criteria scoring.
cs.LG
Indraneil Paul, Goran Glavaš, Iryna Gurevych
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over self-contained executable code. In this work, we examine the training and evaluation of multilingual, multi-cr...
937 Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
2605.01041
Multi-Agent UAV Deconfliction: Uses multi-agent reinforcement learning to assure safe separation between heterogeneous UAV fleets.
cs.LG cs.AI
Iman Sharifi, Hyeong Tae Kim, Maheed Hatem Ahmed, Mahsa Ghasemi, Peng Wei
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free...
938 Saliency-Aware Regularized Quantization Calibration for Large Language Models
2605.05693
LLM Quantization Calibration: Proposes saliency-aware regularized calibration to improve the generalization of post-training quantization for LLMs.
cs.LG cs.AI
Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang
Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limi...
939 Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
2605.05703
Active Learning for LLM-MAS: Uses active learning to select tasks for optimizing multi-agent communication structure while saving tokens.
cs.LG cs.AI
Huchen Yang, Xinghao Dong, Dan Negrut, Jin-Long Wu
Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled training tasks. However, tasks may differ substantially in difficulty and domain, and thus they are not equally informative for updating communication structure, making optimization under limited training budgets often unstable and highly sensitive to the particular training ...
940 Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
2605.06055
Efficient MoE Communication: Proposes a relay-buffer-free communication scheme for MoE inference based on shared HBM.
cs.LG
Tianlun Hu, Tiancheng Hu, Shengsheng Litang, Sheng Wang, Xiaoming Bao
Mixture-of-Experts (MoE) inference requires large-scale token exchange across devices, making dispatch and combine major bottlenecks in both prefill and decode. Beyond network transfer, routing-driven layout transformation, temporary relay, and output restoration can add substantial overhead. Existing MoE communication paths are often buffer-centric, using explicit inter-process relay and reordering buffers around collective transfer. This report presents a relay-buffer-free communication design...
cs.MM 1 paper
1102 Forensic analysis of video data deletion and recovery in Honeywell surveillance file system
2605.07430
Surveillance File System Forensics: Reverse-engineers Honeywell's proprietary surveillance file system to study forensic deletion and recovery of video.
cs.MM
Jinhee Yoon, Sungjae Hwang
Real-time video surveillance systems store recorded video using digital video recorders (DVRs) and network video recorders (NVRs). To support continuous high-volume video storage, these devices employ specialized, nonstandard file systems that are often proprietary and undocumented. This lack of documentation significantly increases the time and effort required for forensic analysis. In this study, we analyze an undocumented proprietary file system used by Honeywell video surveillance devices-on...
cs.SD 3 papers
1096 An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire
2605.06685
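Piano Transcription and Information Profiling: Builds a high-accuracy pipeline from piano transcription to analysis that produces information-theoretic style profiles of composers.
cs.SD eess.AS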
Fred Jalbert-Desforges
We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives emp...
1097 A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation
2605.07489
Retrieval-Edit-Rerank Chord Generation: Generates chords with a decomposed retrieve-edit-rerank framework that balances diversity with music-theory constraints.
cs.SD cs.MM
Qiqi He, Dichucheng Li, Xiaoheng Sun, Anqi Huang
Chord generation is an inherently constrained creative task that requires balancing stylistic diversity with music-theoretic feasibility. Existing approaches typically entangle candidate generation and constraint enforcement within a single model, making the diversity-feasibility trade-off difficult to control and interpret. In this work, we approach chord generation from a system-level perspective, introducing a Retrieval-Edit-Rerank (RER) framework that decomposes the task into three explicit ...
1098 TARNet: A Temporal-Aware Multi-Scale Architecture for Closed-Set Speaker Identification
2605.07735
Temporal Multi-Scale Speaker Identification: Proposes TARNet, a multi-scale temporal modeling network that improves closed-set speaker identification.
cs.SD
Yassin Terraf, Youssef Iraqi
Closed-Set speaker identification aims to assign a speech utterance to one of a predefined set of enrolled speakers and requires robust modeling of speaker-specific characteristics across multiple temporal scales. While recent deep learning approaches have achieved strong performance, many existing architectures provide limited mechanisms for modeling temporal dependencies across different time scales, which can restrict the effective use of complementary short-, mid-, and long-term speaker char...
eess.AS 3 papers
1099 Evaluating voice anonymisation using similarity rank disclosure
2605.07291
Voice Anonymisation Privacy Metric: Measures the privacy risk of anonymisation with similarity rank disclosure (SRD), avoiding the limitations of EER.
eess.AS
Shilpa Chandra, Matteo Pettenò, Nicholas Evans, Michele Panariello, Massimiliano Todisco
The evaluation of voice anonymisation remains challenging. Current practice relies on automatic speaker verification metrics such as the equal error rate (EER). Performance estimates dependent on the classifier and operating point provide an incomplete or even misleading characterisation of privacy risk. We investigate the use of similarity rank disclosure (SRD), an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-ind...
1100 Asymmetric Phase Coding Audio Watermarking
2605.07241
Cryptographic Audio Watermarking: Proposes APC, phase coding combined with digital signatures and error correction, for a training-free audio provenance watermark.
eess.AS
Guang Yang, Amir Ghasemian, Ninareh Mehrabi, Homa Hosseinmardi
The proliferation of deepfake audio challenges voice-based authentication systems; passive forensic detectors are sensitive to evolving generative models and to real-world channel distortions. We propose Asymmetric Phase Coding (APC), a training-free cryptographic signing layer for audio, designed as a compact and auditable provenance primitive that can stand alone or be stacked with learned watermarks. APC combines Ed25519 digital signatures (EdDSA, FIPS 186-5; 64-byte signatures) with Reed-Sol...
1101 Multi-Axis Speech Similarity via Factor-Partitioned Embeddings
2605.02804
Factor-Partitioned Speech Embeddings: Partitions speech embeddings by factors such as content and speaker to obtain a multi-axis similarity representation.
eess.AS
Jim O'Regan, Jens Edlund
Speech encodes multiple simultaneous attributes (linguistic content, speaker identity, dialect, gender) that conventional single-vector embeddings conflate. We present a factor-partitioned embedding framework that maps each utterance into a single vector whose subspaces correspond to distinct axes of variation. A shared acoustic encoder feeds per-axis linear projection heads, each trained via distillation from a specialist teacher or a contrastive objective over shared-label pairs. The result...