arXiv Daily Index - 2026-05-06

#	Title	Categories	Authors	Abstract
cond-mat.mtrl-sci 1 paper
296	Building informative materials datasets beyond targeted objectives 2605.05104 Informative materials dataset design: Proposes a materials data collection and dataset construction framework that maximizes a dataset's long-term informativeness.	cond-mat.mtrl-sci cs.AI cs.DB cs.LG stat.AP	Rafael Espinosa Castañeda, Ashley Dale, Hongchen Wang, Yonatan Kurniawan, Hao Wan	Materials science data collection can be expensive, making the reuse and long-term utility of datasets critically important for future discovery campaigns. In practice, researchers prioritize a subset of properties due to research interests. However, ignoring a subset of outcomes in data collection campaigns potentially generates datasets poorly suited for future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest wh...
cs.AI 34 papers
35	Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone 2605.04454 Deployment Alignment Evaluation: Argues that deployment alignment cannot be inferred from model-level benchmarks alone and proposes a tiering of evidence.	cs.AI cs.HC cs.LG cs.SE	Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais	Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs, such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about deployed alignment. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Alignment claims should instead be indexed to the level at which evidence is collected: model-level, re...
47	How Does Thinking Mode Change LLM Moral Judgments? A Controlled Instant-vs-Thinking Comparison Across Five Frontier Models 2605.04488 Reasoning Mode Moral Judgments: Compares the moral judgments of the same LLM under instant and thinking modes.	cs.AI	Sai Sourabh Madur	We evaluate whether enabling provider-exposed reasoning mode changes moral judgments within the same model checkpoint. Across 100 moral-judgment scenarios and five frontier reasoning-trained LLMs (Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B), aggregate binary-verdict agreement remains high and statistically indistinguishable between instant and thinking modes (Krippendorff's alpha = 0.78 vs. 0.79). However, disagreement is concentrated in 21 model-disputed scenari...
94	From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning 2605.04572 LLM safety degradation dynamics: Uses parameter dynamics to analyze and quantify the sample-level safety degradation risk caused by fine-tuning.	cs.AI cs.LG	Xiao Wang, Yifei Zhang, YongKang Liu, Xiaocui Yang, Zihan Wang	Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics, where benign f...
106	SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition 2605.04608 Multi-agent IMU activity recognition: Builds a multi-agent collaborative framework to improve the robustness and interpretability of IMU activity recognition.	cs.AI	Naiyu Zheng, Tianlong Yu, Haochen Yin, Xiaoyi Fan, Xiping Hu	Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction. However, current deep learning-based HAR models often struggle with heavy reliance on labeled data, position-specific ambiguity, and a lack of transparent reasoning. Inspired by the advanced agents framework, which emulates a collaborative agent using Large Language Models (LLMs), we propose SensingAgents, a novel multi-agent system...
113	AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair 2605.04624 Agent repair ranking instability benchmark: Releases a paired-execution trace corpus to audit evaluator-channel instability in repair agents.	cs.AI cs.SE	Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song	Agent-repair leaderboards reorder under evaluator reconfiguration, and a measurable share of the reordering is produced by methods that consult evaluator-derived signal during internal selection of candidate repairs. We document this failure mode on a public leaderboard and release AuditRepairBench, a paired-execution trace corpus of 576,000 registered cells (96,000 executed) that operationalizes evaluator-channel-blocking ranking instability within a declared observability boundary. A modular s...
141	Budget-aware Auto Optimizer Configurator 2605.04711 Memory-Efficient Optimizer Configuration: Proposes BAOC, which allocates optimizer states per network block to substantially reduce training GPU memory under a budget.	cs.AI cs.LG math.OC	Kang Liu, Wei Peng, Jianchen Hu	Optimizer states occupy massive GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and using a global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given bu...
154	Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing 2605.04733 Video role-playing dialogue reinforcement learning: Proposes an RL framework that separates perception, reasoning, and expression to generate immersive video dialogue.	cs.AI	Miao Wang, Yuling Shi, Yijiang Li, Yeheng Chen, Xiaodong Gu	Text-based role-playing models can imitate character styles, yet they often fail to reflect a scene's atmosphere and evolving tension, both essential for immersive applications such as Virtual Reality (VR) games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that explicitly separates observation ([perception]), reasoning ([think]), and utterance ([answer]). This structure pr...
170	AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use 2605.04785 Runtime safety for AI agent tool calls: Proposes a runtime safety evaluation and interception mechanism to prevent dangerous tool calls by AI agents.	cs.AI cs.CR	Chenglin Yang	Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding wha...
173	DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents 2605.04808 AI agent red-teaming platform: Proposes a controllable, interactive red-teaming platform for systematically evaluating AI agent security risks.	cs.AI	Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie	AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is in...
210	Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games 2605.04906 LLM strategic reasoning in multi-agent games: Proposes Strat-Reasoner, which uses reinforcement learning to improve LLM strategic reasoning in multi-agent games.	cs.AI	Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang	While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents brings significant challenges on the evaluation of the reasoning process and the credit assignment over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges as they d...
211	Curated AI beats frontier LLMs at pharma asset discovery 2605.04908 Pharma asset discovery benchmarking: Compares a human-annotated drug-asset platform with frontier LLMs on pipeline retrieval.	cs.AI q-bio.QM	Łukasz Kidziński, Kevin Thomas	General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-devel...
214	A Foundation Model for Zero-Shot Logical Rule Induction 2605.04916 Zero-shot logical rule induction: Pretrains a rule-induction model that uses statistical features to achieve zero-shot ILP rule learning.	cs.AI cs.LG cs.SC	Yin Jun Phua	Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across v...
241	On-line Learning in Tree MDPs by Treating Policies as Bandit Arms 2605.04979 Tree MDP online learning: Treats policies as bandit arms to study PAC and regret learning in tree-structured MDPs.	cs.AI cs.LG	Anvay Shah, Ramsundar Anandanarayanan, Sharayu Moharir, Shivaram Kalyanakrishnan	A Tree Markov Decision Problem (T-MDP) is a finite-horizon MDP with a starting state $s_{1}$, in which every state is reachable from $s_{1}$ through exactly one state-action trajectory. T-MDPs arise naturally as abstractions of decision making in sequential games with perfect recall, against stationary opponents. We consider the problem of on-line learning in T-MDPs, both in the PAC and the regret-minimisation regimes. We show that well-known bandit algorithms -- \textsc{Lucb} and \textsc{Ucb} -...
252	Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation 2605.05007 Multi-agent routing orchestration: Proposes Uno-Orchestra, which learns to selectively decompose tasks and route them to suitable models and tools.	cs.AI	Zhiqing Cui, Haotong Xie, Jiahao Yuan, Cheng Yang, Hanqing Wang	Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL traj...
257	Position: Embodied AI Requires a Privacy-Utility Trade-off 2605.05017 Embodied AI privacy trade-off: Argues that deploying embodied AI in real environments requires systematically trading off privacy leakage against utility.	cs.AI cs.RO	Xiaoliang Fan, Jiarui Chen, Zhuodong Liu, Ziqi Yang, Peixuan Xu	Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemi...
312	Executable World Models for ARC-AGI-3 in the Era of Coding Agents 2605.05138 Executable world model agent: A coding agent that builds an executable Python world model for ARC-AGI-3 problem solving and planning.	cs.AI	Sergey Rodionov	We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results ...
335	LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents 2605.05191 Context management for long-horizon search agents: Proposes an elastic context orchestration method that compresses context while retaining key reasoning trajectories.	cs.AI	Yijun Lu, Rui Ye, Yuwen Du, Jiajun Wang, Songhua Liu	Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context management should be adaptive: parts of the agent's trajectory are maintained at different levels of detail depending on their current relevance to the task. To operationalize this principle, we introduce Context-ReAct, a genera...
344	Understanding Annotator Safety Policy with Interpretability 2605.05329 Interpretability analysis of safety annotation policy: Uses interpretability to distinguish sources of annotation disagreement and understand problems in safety policy execution.	cs.AI cs.LG	Alex Oesterling, Donghao Ren, Yannick Assogba, Dominik Moritz, Sunnie S. Y. Kim	Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand or misexecute the task), policy ambiguity (policy wording leaves room for interpretation), or value pluralism (different annotators hold different perspectives on safety). Distinguishing these sources matters. For example, operational failures call ...
357	ZAYA1-8B Technical Report 2605.05365 MoE reasoning LLM technical report: Introduces the ZAYA1-8B MoE training stack and its performance on mathematics and coding benchmarks.	cs.AI cs.CL	Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach	We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-wei...
363	Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems 2605.05379 Restricted-evidence benchmark evaluation: Proposes Partial Evidence Bench to evaluate missing-evidence errors under authorization limits.	cs.AI cs.CC cs.ET	Krti Tallam	Enterprise agents increasingly operate inside scoped retrieval systems, delegated workflows, and policy-constrained evidence environments. In these settings, access control can be enforced correctly while the system still produces an answer that appears complete even though material evidence lies outside the caller's authorization boundary. This paper introduces Partial Evidence Bench, a deterministic benchmark for measuring that failure mode. The benchmark ships three scenario families -- due d...
366	BALAR: A Bayesian Agentic Loop for Active Reasoning 2605.05386 Bayesian active dialogue reasoning: Proposes BALAR, an outer loop that actively asks questions to fill in missing information and complete tasks.	cs.AI cs.CL cs.LG	Aymen Echarghaoui, Dongxia Wu, Emily B. Fox	Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next. We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction be...
373	Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections 2605.05402 CCTV assessment of soft traffic infrastructure: Uses vision-based speed estimation to analyze the effect of soft interventions at intersections on vehicle speed and safety.	cs.AI cs.CV eess.IV	Vinit Katariya, Seungjin Kim, Curtis Craig, Nichole Morris, Hamed Tabkhi	Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study introduces an AI-enabled analytics framework leveraging existing CCTV infrastructure to evaluate the impact of soft interventions, such as temporary pedestrian refuges and curb extensions, on vehicle speed and safety. Using deep learning and perspective-based speed estimation, we evaluated driver behavior before and after interventions, with repeated post-installation monitoring in Week 1...
374	When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models 2605.05403 LLM sycophancy and alignment: Interprets sycophancy as a boundary failure between social alignment and epistemic integrity.	cs.AI	Jiechen Li, Catherine A. Barry, Rishika Randev, Janet Chen, Ella Jorgensen	This position paper argues that sycophancy in LLMs is a boundary failure between social alignment and epistemic integrity. Existing work often operationalizes sycophancy through external behavior such as agreement with incorrect user beliefs, position reversals, or deviation from an objective standard of correctness. These formulations capture only overt forms of the phenomenon and leave subtler boundary failures involving epistemic integrity and social alignment underspecified. We argue that sy...
376	PRISM: Perception Reasoning Interleaved for Sequential Decision Making 2605.05407 Interleaved perception-reasoning decision making: Couples a VLM and an LLM through dynamic question answering to improve multimodal decision making.	cs.AI	Mohamed Salim Aissi, Clemence Grislain, Clement Romac, Laure Soulier, Mohamed Chetouani	Scaling LLM-based embodied agents from text-only environments to complex multimodal settings remains a major challenge. Recent work identifies a perception-reasoning-decision gap in standalone Vision-Language Models (VLMs), which often overlook task-critical information. In this paper, we introduce PRISM, a framework that tightly couples perception (VLM) and decision (LLM) through a dynamic question-answer (DQA) pipeline. Instead of passively accepting the VLM's description, the LLM critiques it...
377	Agentic Retrieval-Augmented Generation for Financial Document Question Answering 2605.05409 Agentic RAG for financial documents: Proposes FinAgent-RAG, which iterates retrieval and reasoning to answer complex questions about financial reports.	cs.AI cs.CL	Yang Shu, Yingmin Liu, Zequn Xie	Financial document question answering (QA) demands complex multi-step numerical reasoning over heterogeneous evidence--structured tables, textual narratives, and footnotes--scattered across corporate filings. Existing retrieval-augmented generation (RAG) approaches adopt a single-pass retrieve-then-generate paradigm that struggles with the compositional reasoning chains prevalent in financial analysis. We propose FinAgent-RAG, an agentic RAG framework that orchestrates iterative retrieval-reason...
378	LaTA: A Drop-in, FERPA-Compliant Local-LLM Autograder for Upper-Division STEM Coursework 2605.05410 Local LLM autograding: Proposes LaTA, which autogrades LaTeX assignments on local hardware in a FERPA-compliant way.	cs.AI cs.HC physics.ed-ph	Jesse A. Rodríguez	Large-language-model (LLM) graders promise to relieve the grading burden of upper-division STEM courses, but most deployments to date send student work to third-party APIs, violating FERPA and exposing institutions to data risk while requiring substantial assignment modification. We present $\textbf{LaTA}\ (\textit{LaTeX Teaching Assistant})$, a drop-in, open-source autograder that runs entirely on commodity on-premises hardware and assumes a LaTeX-native workflow already adopted by many enginee...
380	From History to State: Constant-Context Skill Learning for LLM Agents 2605.05413 Constant-context skill learning: Proposes a constant-context skill representation that reduces reliance on history while balancing privacy and capability.	cs.AI	Haoyang Xie, Xinyuan Wang, Yancheng Wang, Puda Zhao, Feng Ju	Large language model (LLM) agents are increasingly used to operate browsers, files, code and tools, making personal assistants a natural deployment target. Yet personal agents face a privacy-cost-capability tension: cloud models execute multi-step workflows well but expose sensitive intermediate context to external APIs, while local models preserve privacy but remain less reliable. Both settings also pay repeatedly for long skill prompts and growing histories. We propose constant-context skill l...
383	The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias 2605.05427 Causal audit of LLM safety bias: Uses a PGM and the do-operator to causally analyze regional bias and the fairness of safety guardrails.	cs.AI	Alif Al Hasan	As Large Language Models (LLMs) are integrated into global software systems, ensuring equitable safety guardrails is a critical requirement. Current fairness evaluations predominantly measure bias observationally, a methodology confounded by the inherent toxicity of topics naturally paired with specific demographics in testing datasets. This study introduces a Probabilistic Graphical Model (PGM) framework to audit LLM safety mechanisms causally. By applying Pearl's do-operator, we mathematically...
389	Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure 2605.05440 Authorization propagation governance in multi-agent systems: Poses the authorization propagation problem and discusses identity governance infrastructure for multi-agent systems.	cs.AI	Krti Tallam	The security discussion around agentic AI focuses heavily on prompt injection. This paper argues that multi-agent systems also create a distinct authorization problem: maintaining authorization invariants as non-human principals retrieve data, delegate tasks, and synthesize results across changing boundaries. We call this problem authorization propagation. It is not reducible to prompt injection and is not fully addressed by classical access-control models such as RBAC, ABAC, or ReBAC. The paper...
394	Agentic Discovery of Exchange-Correlation Density Functionals 2605.05460 LLM-Driven XC Functional Discovery: Uses LLM agent search to automatically generate and evaluate DFT exchange-correlation functionals.	cs.AI physics.chem-ph	Titouan Duston, Jiashu Liang, Yuanheng Wang, Weihao Gao, Xuelan Wen	The development of accurate exchange-correlation (XC) functionals remains a longstanding challenge in density functional theory (DFT). The vast majority of XC functionals have been hand designed by human researchers combining physical insight, exact constraints, and empirical fitting. Recent advances in large language models enable a systematic, automated alternative to this human-driven design loop. This report presents an agentic search system in which an LLM proposes structured functional-for...
397	Intentionality is a Design Decision: Measuring Functional Intentionality for Accountable AI Systems 2605.05475 Measuring AI Functional Intentionality: Proposes measurable functional-intentionality metrics to support accountable AI governance.	cs.AI	Allessia Chiappetta, Robert Mahari	As AI systems increasingly exhibit autonomous, goal-directed, and long-horizon behavior, users lack a standardized way to detect the degree to which a system functions like an intentional actor for governance and accountability purposes. This position paper defines intentionality not as consciousness, but as a behavioral profile characterized by purpose, foresight, volition, temporal commitment, and coherence - criteria long used in legal and philosophical contexts to infer intent. These propert...
399	LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks 2605.05478 Neurosymbolic RL Transfer: Proposes a multi-source neurosymbolic transfer framework that adaptively fuses experience and reasoning.	cs.AI	Mahyar Alinejad, Yue Wang, Amrit Singh Bedi, George Atia	Transfer learning in reinforcement learning (RL) seeks to accelerate learning in new tasks by leveraging knowledge from related sources. Existing neurosymbolic transfer methods, however, typically rely on manually specified task automata, assume a single source task, and use fixed knowledge-integration mechanisms that cannot adapt to varying source relevance. We propose LANTERN, a unified framework for multi-source neurosymbolic transfer that addresses these limitations through three components:...
402	FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking 2605.05482 Grounded QA LLM for Banking: Gives a deployable training recipe for banking that improves citation grounding and refusal calibration.	cs.AI cs.CL cs.MA	Denys Katerenchuk, Pablo Duboue, Keelan Evanini, David Gondek, Nithin Govindugari	Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in the banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline ...
409	FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis 2605.05499 Multimodal LLM Agent for Food Analysis: Builds a multimodal LLM agent for fine-grained food recognition and attribute analysis.	cs.AI	Woojin Lee, Pranav Mekkoth, Ye Tian, Onat Gungor, Tajana Rosing	The widespread adoption of camera-equipped mobile devices and wearables has enabled convenient capture of meal images, making food recognition a key component for real-time dietary monitoring. However, real-world food images present challenges due to high intra-class similarity and the frequent presence of multiple food items within a single image. While deep learning models achieve strong performance in coarse-grained classification, they often struggle to capture fine-grained attributes such a...
cs.AR 1 papers
327	Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours 2605.05170 Multi-agent hardware design: Shows a multi-agent system automatically building a TurboQuant inference accelerator in 80 hours.	cs.AR cs.AI	The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin	Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced "Design Conductor" (or just "Conductor"), a system capable of building a 5-stage Linux-capable RISC-V CPU in 12 hours. In this work, we introduce an updated multi-agent harness powered by frontier models released in April 2026, which is able to handle 80x larger tasks, at higher quality, fully autonomously. Following a brief ...
cs.CC 1 papers
132	Average Attention Transformers and Arithmetic Circuits 2605.04683 Transformer Computational Power: Proves that average-attention transformers can simulate certain constant-depth arithmetic circuits and analyzes their expressive power.	cs.CC cs.AI cs.LG	Lena Ehrmuth, Laura Strieker	We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions ...
cs.CE 1 papers
48	A Hybrid Method for Low-Resource Named Entity Recognition 2605.04489 Low-Resource NER Neurosymbolic: Two-stage fusion of rules and deep models improves low-resource Vietnamese NER.	cs.CE cs.AI cs.CL	Do Minh Duc, Quan Xuan Truong, Viet Tran Hong, Le Hoang Anh, Mac Thi Minh Tra	Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, NER in specific domains for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea involves a two-stage pi...
cs.CL 51 papers
23	Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting 2605.04426 Prompt Compression via Rewriting: Proposes Telegraph English, a symbolic structured rewrite that adaptively compresses prompts semantically.	cs.CL	Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets	We introduce Telegraph English (TE), a prompt-compression protocol that rewrites natural language into a symbol-rich, formally-structured dialect. Where token-deletion methods such as LLMLingua-2 train a classifier to delete low-importance tokens at a fixed ratio, TE performs a full semantic rewrite: it decomposes the input into atomic fact lines, substitutes verbose phrases with ~40 logical and relational symbols, and lets the compression ratio adapt to each document's information density....
31	GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking 2605.04449 Dialogue State Tracking MoE: Combines graph neural networks with ReAct expert routing to improve DST.	cs.CL cs.AI	Ziqi Zhu, Adithya Suresh, Tomal Deb, Iman Abbasnejad	Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Ne...
36	DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation 2605.04458 QA Nugget Generation for Evaluation: Automatically generates question-answer nuggets for evaluating the coverage of long reports.	cs.CL cs.IR	Bryan Li, William Walden, Yu Hou, Gabrielle Kaili-May Liu, Dawn Lawrie	Evaluation of long-form, citation-backed reports has lately received significant attention due to the wide-scale adoption of retrieval-augmented generation (RAG) systems. Core to many evaluation frameworks is the use of atomic facts, or nuggets, to assess a report's coverage of query-relevant information attested in the underlying collection. While nuggets have traditionally been represented as short statements, recent work has used question-answer (QA) representations, enabling fine-grained eva...
51	CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation 2605.04495 Confidence-Aware RAG Reranking: Reranks evidence by generator confidence change, without training, to improve RAG.	cs.CL cs.AI	Zhipeng Song, Yizhi Zhou, Xiangyu Kong, Jiulong Jiao, Xuezhou Ye	Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query-document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator's uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a ...
52	SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States 2605.04496 Active Long-Text Information Foraging: Actively retrieves sparse key information using decoupled epistemic states for long-text understanding.	cs.CL	Zhenliang Zhang, Wenqing Wang, Yong Hu, Yaming Yang, Jiaheng Gao	Long-Text Understanding (LTU) at million-token scale requires balancing reasoning fidelity with computational efficiency. Frontier long-context LLMs can process million-token contexts end-to-end, but they suffer from high token consumption and attention dilution. In parallel, specialized LTU agents often sacrifice fidelity through task-agnostic abstractions like graph construction or indexing. We identify a key insight for LTU: query-relevant information is typically sparse relative to the f...
55	Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties 2605.04500 Generalization to Unseen Language Varieties: Uses linguistic dissimilarity signals to improve generalization to unseen low-resource varieties.	cs.CL cs.AI	Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen	Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework ...
62	Distilling Bayesian Belief States into Language Models for Auditable Negotiation 2605.04507 Auditable negotiation with belief distillation: Distills Bayesian opponent beliefs into an LLM for auditable negotiation decisions.	cs.CL	Zongqi Cui, Baihan Lin	Negotiation agents must infer what their counterpart values, update those beliefs over dialogue turns, and choose actions under uncertainty. End-to-end large language models (LLMs) can imitate negotiation dialogue, but their opponent beliefs are usually implicit and difficult to inspect. We propose BOND (Bayesian Opponent-belief Negotiation Distillation), a framework for auditable negotiation. BOND consists of an LLM-based Bayesian teacher that scores dialogue contexts against the six possible o...
69	RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation 2605.04523 Judge-orchestrated LLM ensemble generation: Uses a judge model to select among multi-LLM candidates for faithful multi-turn generation, winning the SemEval task.	cs.CL cs.AI cs.LG	Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov	We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential...
78	RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization 2605.04539 Hybrid DPO for logical LLM alignment: Fuses logical discrimination with preference optimization to narrow the logical alignment gap caused by DPO's fluency bias.	cs.CL cs.AI	Qiming Bao, Juho Leinonen, Paul Denny, Michael J. Witbrock	Direct Preference Optimization (DPO), the efficient alternative to PPO-based RLHF, falls short on knowledge-intensive generation: standard preference signals from human annotators or LLM judges exhibit a systematic verbosity bias that rewards fluency over logical correctness. This blindspot leaves a logical alignment gap -- SFT models reach NLI entailment of only 0.05-0.22 despite producing fluent text. We propose RLearner-LLM with Hybrid-DPO: an automated preference pipeline that fuses a DeBERT...
81	UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding 2605.04543 Unified speculative decoding verification: Uses optimal transport to unify verification and acceleration strategies for multi-step, multi-draft speculative decoding.	cs.CL cs.LG	Yepeng Weng, Qiao Hu, Takehisa Yairi	Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exp...
85	The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil 2605.04552 International media attention to disasters: Time-series analysis of peaks in international media coverage to characterize the newsworthiness of Brazilian disasters.	cs.CL	Brielen Madureira, Andreas Niekler, Marc Keuschnigg, Mariana Madruga de Brito	Media coverage influences disaster response, yet the drivers of international media attention to local events remain unevenly understood. Brazil offers a compelling case: some of its natural and technological disasters occasionally hit the international headlines. However, systematic analyses of what makes these events be discussed abroad are still missing. Addressing this gap requires representative, validated and country-specific news datasets. This paper presents a peak analysis of 2k news ab...
96	Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus 2605.04576 Tajik POS tagging benchmark: Builds a Tajik POS tagging benchmark and compares several neural models.	cs.CL	Mullosharaf K. Arabov	This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's languages, their capacity for grammatical analysis of Tajik has remained unexplored until now. The aim of this study is to fill this gap through a systematic comparison of classical neural network architectures and modern multilingual transformers. Experiments were co...
98	TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script) 2605.04583 Tajik NLP toolkit: Releases an open-source Tajik text-processing toolchain that preserves Cyrillic orthography.	cs.CL	Mullosharaf K. Arabov	The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library that provides the first comprehensive pipeline for processing authentic Tajik text while preserving the original Cyrillic orthography. The library implements a modular architecture centered around a unified Doc object, ena...
115	Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models 2605.04638 Gradient-based LLM uncertainty estimation: Proposes sampling-free LLM uncertainty estimation based on gradients of semantics-preserving embeddings.	cs.CL cs.AI	Mingda Li, Rundong Lv, Xinyu Li, Weinan Zhang, Ting Liu	Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient. Unlike prior gradient-based methods developed for classification tasks t...
117	Graph-Augmented LLMs for Swiss MP Ideology Prediction 2605.04643 Graph-augmented LLM ideology prediction: Combines a parliamentary relation graph with LLMs to predict the ideological positions of Swiss MPs.	cs.CL	Yifei Yuan, Luis Salamanca, Sophia Schlosser, Laurence Brandenberger	Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences. While Large Language Models (LLMs) have shown promising results in estimating MPs' ideological stances, there are more actors and elements in the parliamentary system, and relations between them, that could provide a wider and more informative picture. However, due to the complexity of integ...
119	CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning 2605.04652 Temporal knowledge graph reasoning: Jointly learns historical evidence and evolutionary dynamics to improve temporal knowledge graph reasoning.	cs.CL	Shuai-long Lei, Xiaobin Zhu, Jiarui Liang, Guoxi Sun, Zhiyu Fang	Temporal knowledge graph (TKG) reasoning aims to predict future events from historical facts. A key challenge lies in jointly capturing two sources of predictive information in TKGs: historical evidence and evolutionary dynamics. However, existing methods typically focus on only one of these sources, which limits the ability to fully exploit the complementary predictive signals in TKGs. To address this, we propose CHE-TKG, a novel collaborative dual-view learning framework for TKG reasoning. CHE...
123	Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs 2605.04665 LLM Prompt Robustness: Shows that semantically equivalent rewrites can collapse LLM output format, with a systematic evaluation.	cs.CL	Aofan Liu, Jingxiang Meng	When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversatio...
144	Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL 2605.04719 RL Credit Assignment for Text-to-SQL: Proposes step-level reward assignment to improve reinforcement learning for tool-calling Text-to-SQL.	cs.CL	Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao	Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models ...
163	Gyan: An Explainable Neuro-Symbolic Language Model 2605.04759 Explainable neuro-symbolic language model: Proposes an explainable neuro-symbolic language model to reduce hallucination and improve maintainability.	cs.CL cs.AI cs.ET cs.LG	Venkat Srinivasan, Vishaal Jatav, Anushka Chandrababu, Geetika Sharma	Transformer-based pre-trained large language models have become ubiquitous. There is increasing evidence to suggest that even with large-scale pre-training, these models do not capture complete compositional context and certainly not the full human-analogous context. Besides, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable and require enormous compute resources for training and inference. Here, we describe Gyan, an explai...
165	Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations 2605.04764 LLM surrogate modeling under sparse observations: Studies how prompts and query protocols affect LLM surrogates' predictions and uncertainty alignment.	cs.CL	Ge Lei, Samuel J. Cooper	Large language models are increasingly used as surrogate models for low-data optimization, but their optimizer-facing prediction and its uncertainty remain poorly understood. We study the surrogate belief elicited from an LLM under sparse observations, showing that it depends strongly on prompt text and query protocol. We introduce an uncertainty-alignment criterion that measures whether model uncertainty tracks residual ambiguity among sample-consistent functions. Across controlled inference ta...
179	StoryAlign: Evaluating and Training Reward Models for Story Generation 2605.04831 Story generation reward model alignment: Systematically evaluates and trains reward models for story generation to better match human narrative preferences.	cs.CL cs.AI	Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu	Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of h...
188	Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset 2605.04857 Eye-tracking dataset for L2 idioms: Releases an eye-tracking dataset quantifying the cognitive load of idiom processing in L2 learners.	cs.CL cs.AI cs.CV	Eduardo Santos, Juliana Carvalho, César Rennó-Costa	This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although t...
191	Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment 2605.04873 Unsupervised psychological state assessment: Uses unsupervised semantic projection to measure psychological states directly from text with better interpretability.	cs.CL	Maria Luongo, Davide Marocco, Nicola Milano	Recent advances in natural language processing have enabled increasingly accurate estimation of psychological traits from language. However, most existing approaches rely on supervised models trained to predict questionnaire scores, limiting interpretability and generalizability across contexts. The present study introduces a theory-driven and fully unsupervised framework for measuring psychological states directly from natural language using semantic projection. Psychological constructs were op...
193	Anticipating Innovation Using Large Language Models 2605.04875 Innovation forecasting from patents: Uses LLMs to extract early signals from patent language and predict future technology-combination innovation.	cs.CL cs.AI cs.CY	Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella	Forecasting innovation, intended as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transf...
197	A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter 2605.04885 Indonesian hate speech detection comparison: Compares PyCaret with conventional features against CNN-BiLSTM for hate speech detection on Indonesian Twitter.	cs.CL	Tanty Widiyastuti, Mayada, Adisty Syawalda Ariyanto, Luluk Muthoharoh, Ardika Satria	This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase pat...
198	BenCSSmark: Making the Social Sciences Count in LLM Research 2605.04886 LLM benchmarks for social science: Argues for including social science tasks in LLM benchmarks to improve evaluation and research direction.	cs.CL	Arnault Chatelain, Étienne Ollion, Qianwen Guan, Diandra Fabre, Lorraine Goeuriot	This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing computational systems -- are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress -- they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despit...
199	Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm 2605.04887 YouTube comment sentiment with XGBoost: Uses TF-IDF and XGBoost on YouTube comments for sentiment analysis and e-commerce satisfaction prediction.	cs.CL	Ridho Benedictus Togi Manik, Muhammad Aqil Ramadhan, Ihsan Maulana Yusuf, Luluk Muthoharoh, Ardika Satria	The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study investigates and constructs a predictive model for customer satisfaction leveraging the Extreme Gradient Boosting (XGBoost) architecture coupled with Term Frequency-Inverse Document Freque...
200	A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset 2605.04888 Tweet sentiment model comparison: Compares TF-IDF logistic regression with BiLSTM for tweet sentiment classification on Sentiment140.	cs.CL	Vita Anggraini, Cintya Bella, Bastian, Luluk Muthoharoh, Ardika Satria	The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep l...
204	Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall 2605.04897 Retrieval-centered agent memory: Proposes True Memory, which replaces extraction at ingestion with multi-stage retrieval to provide traceable agent memory.	cs.CL cs.AI cs.IR	Joshua Adler, Guy Zehavi	Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retrieval pipeline operating over events preserved verbatim. The full system runs as a single SQLite file on commodity CPU with no external database, vector index, graph store, or GPU. On LoCoMo (1,540 questions across 10 multi...
213	Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training 2605.04913 Local learning for LLM post-training: Proposes a cheaper, faster local backpropagation scheme to reduce LLM post-training cost.	cs.CL cs.LG	Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang	LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we prop...
219	Unintended Negative Impacts of Promotional Language in Patent Evaluation 2605.04926 Promotional language in patent evaluation: Large-scale analysis of the negative effect of promotional wording in patent text on examination outcomes.	cs.CL	Bingkun Zhao, Chenwei Zhang, Hao Peng	Promotional language has been increasingly used to aid the communication of innovative ideas in science. Yet, less is known about its role in the context of technological innovation. Here, we use a validated and domain-diagnosed lexicon of 135 promotional words to study the association between promotional language and patent evaluation outcomes among 2.7 million USPTO patent applications. Our large-scale study reveals three unexpected findings. First, in contrast to scientific evaluation, we fin...
222	HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities 2605.06157 Hard negatives for image-text matching: Constructs hard negative captions for ITM training to improve fine-grained vision-language understanding.	cs.CL cs.AI cs.CV	Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Thang Vu, Carina Silberer	Image-Text-Matching (ITM) is one of the de facto methods of learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between the web-collected image-text pairs, models fail to show a fine-grained understanding of the combined semantics of these modalities. To address this issue we propose Hard Negative Captions (HNC): an automatically created dataset containing foiled hard negative captions for ITM training towards achieving fine-g...
224	UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning 2605.04941 Neuro-symbolic syllogistic reasoning: Combines small-LLM parsing with a theorem prover for an efficient syllogistic reasoning system.	cs.CL	Ivan Kartáč, Kristýna Onderková, Jan Bronec, Zdeněk Kasner, Mateusz Lango	This paper describes our system submitted to SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. We present an efficient modular neuro-symbolic approach, combining a symbolic prover with small reasoning LLMs (4B parameters). The system consists of an LLM-based parser that translates natural language syllogisms to a first-order logic (FOL) representation, an automated theorem prover, and two optional modules: machine translation for multilingual inputs and a...
227	Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir 2605.04948 PEFT for low-resource language adaptation: Compares LoRA and QLoRA for adapting LLMs to Bashkir, a low-resource agglutinative language.	cs.CL	Mullosharaf K. Arabov, Svetlana S. Khaybullina	This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of...
234	TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding 2605.04962 Tabular embedding benchmark and models: Introduces TabBench and learns general-purpose tabular embeddings to support retrieval and understanding.	cs.CL cs.IR	Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng	Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack retrieval-compatible vector outputs, whereas text embedding models often fail to capture tabular structure and numerical semantics. To bridge this gap, we first introduce the Tabular Embedding Benchmark (TabBench), a comprehensive suite designed to evaluate the tabular un...
238	Why Expert Alignment Is Hard: Evidence from Subjective Evaluation 2605.04972 Expert alignment in subjective evaluation: Uses expert evaluations and questionnaire analysis to show why aligning with expert judgment is hard in subjective tasks.	cs.CL	Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen	Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. Fir...
251	Misaligned by Reward: Socially Undesirable Preferences in LLMs 2605.05003 Social alignment evaluation of reward models: Extends reward-model benchmarking to domains such as bias, safety, and morality, revealing undesirable preferences.	cs.CL cs.AI cs.CY	Gayane Ghazaryan, Esra Dönmez	Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden. We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoni...
261	Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals 2605.05025 LLM hallucination detection: Detects hallucinations in a single forward pass using KL-divergence features between internal attention and a uniform distribution.	cs.CL	Gijs van Dijk	We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, atte...
278	The Impossibility Triangle of Long-Context Modeling 2605.05066 Long-context modeling trade-offs: Proves an impossibility triangle among efficiency, compactness, and recall in long-context models.	cs.CL cs.AI cs.LG	Yan Zhou	We identify and prove a fundamental trade-off governing long-sequence models: no model can simultaneously achieve (i) per-step computation independent of sequence length (Efficiency), (ii) state size independent of sequence length (Compactness), and (iii) the ability to recall a number of historical facts proportional to sequence length (Recall). We formalize this trade-off within an Online Sequence Processor abstraction that unifies Transformers, state space models, linear recurrent networks, a...
283	The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences 2605.05080 LLM psychometric differences: Uses multi-questionnaire assessment to show that the primary axis of LLM differences is a "phenomenality of experience" dimension.	cs.CL	Hubert Plisiecki, Sabina Siudaj, Kacper Dudzic, Anna Sterna, Maciej Gorski	We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p<.0001$). To test this...
288	Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models 2605.05090 Auditing LLM intervention effects: Automatically contrasts generations and statistically validates hypotheses to find unexpected side effects of interventions on LLM behavior.	cs.CL cs.AI	Quintin Pope, Ajay Hayagreeve Balaji, Jacques Thibodeau, Xiaoli Fern	We present an automated, contrastive evaluation pipeline for auditing the behavioral impact of interventions on large language models. Given a base model $M_1$ and an intervention model $M_2$, our method compares their free-form, multi-token generations across aligned prompt contexts and produces human-readable, statistically validated natural-language hypotheses describing how the models differ, along with recurring themes that summarize patterns across validated hypotheses. We evaluate the a...
295	Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement 2605.05103 Black-box hallucination measurement: Uses a corpus concept field in embedding space to measure novelty and black-box hallucination deviation.	cs.CL cs.AI cs.CY	Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame	We introduce the Concept Field of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $ζ$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direc...
304	Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction 2605.05121 Trustworthy mental health prediction: Proposes evidential reasoning-aware multi-view learning to improve uncertainty estimation and robustness.	cs.CL	Yucheng Ruan, Ling Huang, Qika Lin, Kai He, Mengling Feng	Automated mental health prediction using textual data has shown promising results with deep learning and large language models. However, deploying these models in high-stakes real-world settings remains challenging, as existing approaches largely rely on semantic representations and often produce overconfident predictions under ambiguous, noisy, or shifted data. Moreover, most methods lack reliable uncertainty estimation, undermining trust in risk-sensitive mental health applications. To address...
319	PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation 2605.05159 Multilingual polarization detection system: Uses an ensemble of Gemma models with synthetic data augmentation for binary polarization classification across 22 languages.	cs.CL cs.AI cs.LG	Srikar Kashyap Pulipaka	We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini, with a multi-stage quality filtering pipeline...
325	The First Token Knows: Single-Decode Confidence for Hallucination Detection 2605.05166 First-token confidence for hallucination detection: Uses the logit entropy of the first content token as a single-decode confidence signal for detecting hallucinations.	cs.CL cs.AI	Mina Gabriel	Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first conten...
338	Implicit Representations of Grammaticality in Language Models 2605.05197 Grammaticality representations in language models: Studies whether LMs implicitly learn grammaticality as distinct from likelihood.	cs.CL	Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim	Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality ...
353	Counterargument for Critical Thinking as Judged by AI and Humans 2605.05353 Generative AI and critical writing: Compares human and AI grading to assess counterarguments and critical thinking in student writing.	cs.CL cs.AI	Tosin Adewumi, Marcus Liwicki, Foteini Simistira Liwicki, Lama Alkhaled, Hamam Mokayed	This intervention study investigates the use of counterarguments in writing for critical thinking by students in the context of Generative AI (GenAI). This is especially as risks of cheating and cognitive offloading exist with the use of GenAI. We presented 36 students in a particular university course with 4 carefully selected thesis statements (from a set of popular debates) to write about any one of them. We used six established rubrics (focus, logic, content, style, correctness and reference)...
370	Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets 2605.05392 Query-focused summarization data generation: Automatically generates evidence-based query keywords from query-free summarization data to build QFS datasets.	cs.CL cs.AI	Yllias Chali, Deen Abdullah	Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search for suitable datasets for Query-Focused Summarization (QFS), we identify two research questions: Is it possible to automatically generate evidence-based query keywords from query-free datasets? Does evidence-based query generation support the QFS task? This paper proposes an evidence-based model to generate queries from query-free datasets. To eva...
390	SLAM: Structural Linguistic Activation Marking for Language Models 2605.05443 White-box LLM watermarking: Proposes SLAM, which embeds the watermark in the structural geometry of activations to reduce quality loss.	cs.CL cs.AI	Fabrice Harel-Canada, Amit Sahai	LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding linguistic structure (e.g., voice, tense, clause order),...
403	ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis 2605.05485 Compiling LLM Reasoning to Solvers: Compiles a small set of reasoning traces into symbolic solvers for efficient program synthesis.	cs.CL cs.AI	Atharva Naik, Yash Mathur, Prakam, Carolyn Rose, David Mortensen	LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained DSLs. The resulting solvers require no LLM calls at test time and are strong standalone systems: symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard, outperforming LLMs with test-time scaling...
410	Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks 2605.05503 Attacks on Diffusion LM Watermarks: Proposes multi-step rewriting attacks that weaken watermark detection for diffusion language model text.	cs.CL	Mohd Ruhul Ameen, Akif Islam, Nadim Mahmud, Md. Ekramul Hamid	Statistical watermarking is a common approach for verifying whether text was written by a language model. Most existing schemes assume autoregressive generation, where tokens are produced left to right and contextual hashing is well defined. Diffusion language models generate text by denoising tokens in arbitrary order, so these schemes cannot be applied directly. A recent watermark by Gloaguen et al. addresses this gap for LLaDA 8B Instruct and reports true positive detection above 99%. This pa...
cs.CR 14 papers
54	Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis 2605.04499 LLM Penetration Testing Strategy: Proposes a reasoning framework for penetration testing strategy that plans and analyzes attack paths.	cs.CR cs.AI	Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne	Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific...
125	Differential Privacy in the Extensive-Form Bandit Problem 2605.05266 Differentially Private Bandits: Proposes a locally differentially private algorithm for the extensive-form bandit problem with a regret bound.	cs.CR cs.LG	Stephen Pasteris, Rahul Savani, Theodore Turocy	We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of t...
135	Gray-Box Poisoning of Continuous Malware Ingestion Pipelines 2605.04698 Poisoning Malware ML Pipelines: Constructs functionality-preserving adversarial samples under a gray-box threat model to poison continuous malware ingestion pipelines.	cs.CR cs.LG	Jan Dolejš, Martin Jureček, Róbert Lórencz	Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We evaluate the impact of these poisoned samples when ingested into a defende...
136	Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization 2605.04700 Jailbreaking Audio Language Models: Uses token-aligned gradients to optimize only a few audio segments for efficient jailbreak attacks.	cs.CR cs.AI cs.CL cs.LG cs.SD	Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge	Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motiv...
138	Vol-Mark: A Watermark for 3D Medical Volume Data Via Cubic Difference Expansion and Contrastive Learning 2605.04705 Watermarking 3D Medical Volumes: Proposes Vol-Mark, a reversible-zero watermark that protects ownership of 3D medical volume data and detects tampering.	cs.CR cs.LG	Jiangnan Zhu, Yuntao Wang, Shengli Pan, Yujie Gu	Today, advances in medical technology extensively utilize 3D volume data for accurate and efficient diagnostics. However, sharing these data across networks in telemedicine poses significant security risks of data tampering and unauthorized copying. To address these challenges, this paper proposes a novel reversible-zero watermarking approach, termed Vol-Mark, for medical volume data to protect their ownership and authenticity in telemedicine. The proposed Vol-Mark method offers two key benefits...
147	From Beats to Breaches: How Offensive AI Infers Sensitive User Information from Playlists 2605.04724 PII Inference from Playlists: Builds musicPIIrate to show that attackers can infer sensitive personal information from public playlists.	cs.CR cs.AI	Stefano Cecconello, Mauro Conti, Luca Pajola, Luca Pasa, Pier Paolo Tricomi	The pervasive integration of AI has enabled Offensive AI: the exploitation of AI for malicious ends across the cyber-kill chain. A critical manifestation is the user attribute inference attack, where AI infers sensitive Personally Identifiable Information (PII) from innocuous public data. We explore how music streaming ecosystems, where users routinely release public playlists, can be exploited for Offensive AI. To quantify this threat, we developed musicPIIrate. This novel tool leverages deep l...
201	Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review 2605.05271 Anti-LLM peer review safeguards: Studies safeguards that embed hidden instructions in manuscripts to disrupt chatbot-written reviews.	cs.CR cs.AI	Oubo Ma, Ruixiao Lin, Jiahao Chen, Yuan Su, Yong Yang	As LLMs become increasingly capable, editorial boards and program committees are growing concerned about reviewers who fully outsource peer review to commercial chatbots. This concern stems from prior findings that current chatbots lack the independent critical thinking and depth of reasoning required to assess scientific novelty. One promising direction for mitigating this concern is to embed hidden instructions into manuscripts that disrupt or alter chatbot-generated reviews. However, existing...
206	On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference 2605.04901 Secure Transformer inference vulnerability: Analyzes the shuffling defense in secure Transformer inference and exposes its security flaws.	cs.CR cs.AI	Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu, Jiaping Gui	For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client's input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach signif...
250	Agentic Vulnerability Reasoning on Windows COM Binaries 2605.05000 COM binary vulnerability agent: Proposes SLYP, an agentic pipeline that automatically discovers COM race-condition vulnerabilities and generates verifiable PoCs.	cs.CR cs.LG	Hwiwon Lee, Jongseong Kim, Lingming Zhang	Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agent...
275	SoK: Robustness in Large Language Models against Jailbreak Attacks 2605.05058 LLM jailbreak robustness survey: Systematically surveys jailbreak attacks and defenses and proposes a more comprehensive evaluation perspective.	cs.CR cs.AI	Feiyue Xu, Hongsheng Hu, Chaoxiang He, Sheng Hang, Hanqing Hu	Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce models into generating harmful, unethical, or policy-violating outputs. Such attacks pose real-world risks, eroding safety, trust, and regulatory compliance in high-stakes applications. Although a variety of attack and defense methods have been proposed, existing evaluation practices are inadequate, often relying on narrow metrics like attack succe...
339	Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use 2605.05287 Enterprise multitenant RAG security architecture: Proposes a vendor-neutral, secure and compliant approach to multitenant retrieval and tool use.	cs.CR cs.AI cs.IR cs.SE	Francisco Javier Arceo, Varsha Prasad Narsing	Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank ...
347	How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study 2605.05340 Physical-world VLM privacy awareness evaluation: Empirically evaluates how well VLMs identify and handle private information in real environments.	cs.CR cs.AI	Junran Wang, Xinjie Shen, Zehao Jin, Pan Li	As Vision-Language Models (VLMs) are increasingly deployed as autonomous cognitive cores for embodied assistants, evaluating their privacy awareness in physical environments becomes critical. Unlike digital chatbots, these agents operate in intimate spaces, such as homes and hospitals, where they possess the physical agency to observe and manipulate privacy-sensitive information and artifacts. However, current benchmarks remain limited to unimodal, text-based representations that cannot capture ...
393	Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs 2605.05459 Private Spatial RAG Retrieval: Uses anchor-substitution encoding to achieve location-private retrieval in spatial RAG.	cs.CR cs.LG	Kennedy Edemacu, Mohammad Mahdi Shokri, Vinay M. Shashidhar, Jong Wook Kim	This work introduces PAS -- Privacy Anchor Substitution, a structured mechanism for enabling user location privacy in spatial retrieval-augmented generation (RAG) systems. Unlike conventional differential privacy methods that directly perturb user locations, PAS represents location with relative anchor encoding consisting of an anchor, direction bin, and distance bin, allowing seamless integration with modern RAG pipelines. We evaluate PAS on a synthetic urban dataset and show that it achieves i...
412	Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand 2605.06713 Agentic AI Cyber Offense Forecast: Analyzes how agentic AI compresses the attack chain and proposes defensive priorities for enterprises.	cs.CR cs.AI cs.HC	Christopher Koch	Agentic AI systems can plan, call tools, inspect code, interact with web applications, and coordinate multi-step workflows. These same capabilities change the economics of cyber offense. The central near-term risk is not that every low-skill criminal immediately becomes a frontier exploit researcher; it is that agentic AI compresses the attack lifecycle by lowering the cost of reconnaissance, phishing, credential abuse, vulnerability triage, exploit adaptation, and post-compromise decision suppo...
cs.CV 96 papers
12	Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring 2605.04397 Exposure Control for rPPG: Proposes capture-time adaptive exposure control to improve the robustness of in-vehicle non-contact heart-rate monitoring.	cs.CV eess.SY	Jieying Wang, Xinqi Cai, Caifeng Shan, Wenjin Wang	Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by the highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure controlling during video acquisition -- most existing systems rely on either fixed exposure settings or camera built-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting co...
14	Detecting Deepfakes via Hamiltonian Dynamics 2605.04405 Dynamics-Based Deepfake Detection: Replaces static features with Hamiltonian-dynamics stability analysis to detect deepfakes.	cs.CV cs.AI	Harry Cheng, Ming-Hui Liu, Tianyi Wang, Weili Guan, Liqiang Nie	Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-...
16	UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model 2605.04409 Remote Sensing Change Captioning: Releases an urban-construction change benchmark and proposes a captioning model that generates structured change descriptions.	cs.CV	Yupeng Gao, Tianyu Li, Guoqing Wang, Yang Yang	Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide l...
17	Evaluation Cards for XAI Metrics 2605.04410 XAI Metric Reporting Standard: Proposes an XAI Evaluation Card template to standardize how explainability metrics are defined and reported.	cs.CV cs.AI cs.CY cs.LG	Rokas Gipiškis, Olga Kurasova	The evaluation of explainable AI (XAI) methods is affected by a lack of standardization. Metrics are inconsistently defined, incompletely reported, and rarely validated against common baselines. In this paper, we identify transparency of evaluation reporting as a central, under-addressed problem. We propose the XAI Evaluation Card, a documentation template analogous to model cards, designed to accompany any study that introduces an XAI evaluation metric. The card covers explicit declaration of t...
18	Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion 2605.04412 Style-Controlled 3D Generation: Combines structured 3D latents with 2D diffusion for generalizable stylized 3D generation.	cs.CV	Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou	3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styl...
22	Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning 2605.04425 Interpretable Prompt Learning: Alternates semantic token selection and prompt optimization to make CLIP adaptation more interpretable.	cs.CV	Yating Wang, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun	Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and...
25	Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes 2605.04435 Feedforward 4D Reconstruction: Proposes Ground4D, which uses spatial grounding to improve feedforward 4D reconstruction quality in off-road scenes.	cs.CV	Shuo Wang, Jilin Mei, Fuyang Liu, Wenfei Guan, Fanjie Kong	Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward f...
27	A cross-modal network for facial expression recognition 2605.04439 Cross-Modal Facial Expression Recognition: Proposes CMNet, which uses structural information such as facial symmetry for cross-modal expression recognition.	cs.CV	Chunwei Tian, Jingyuan Xie, Qi Zhang, Chao Li, Wangmeng Zuo	Deep neural networks enriched with structural information have been widely employed for facial expression recognition tasks. However, these methods often depend on hierarchical information rather than face property to finish expression recognition. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet can respectively learn expression information via face symmetry on a whole face, left and right half fac...
29	LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection 2605.04445 Synthetic Image Detection by Generator Cues: Proposes LEGO, which uses LoRA to focus on generator-specific features and improve the generalization of synthetic image detection.	cs.CV	Yutong Xiao, Ran Ran, Jiwei Wei, Shuchang Zhou, Ke Liu	The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, thereby creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases. This severely undermines model generalization. In contrast, focusing only o...
30	Deep Reprogramming Distillation for Medical Foundation Models 2605.04447 Distillation for Medical Foundation Models: Proposes deep reprogramming distillation to efficiently adapt medical foundation models to downstream scenarios.	cs.CV	Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang	Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillati...
33	RemoteZero: Geospatial Reasoning with Zero Human Annotations 2605.04451 Zero-Annotation Geospatial Reasoning: Achieves geolocation reasoning and self-supervised learning without human coordinate annotations.	cs.CV	Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min	Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabe...
34	StableI2I: Spotting Unintended Changes in Image-to-Image Transition 2605.04453 Image-to-Image Consistency Evaluation: Evaluates how well I2I outputs preserve the semantics and spatial structure of the input.	cs.CV cs.AI	Jiayang Li, Shuo Cao, Xiaohui Li, Zhizhen Zhang, Kaiwen Zhu	In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre-post consistency across a wide rang...
38	Stream-T1: Test-Time Scaling for Streaming Video Generation 2605.04461 Test-Time Scaling Video Generation: Uses streaming chunk-wise generation to cut TTS candidate costs and strengthen temporal guidance.	cs.CV	Yijing Tu, Shaojin Wu, Mengqi Huang, Wenchuan Wang, Yuxin Wang	While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering ...
43	Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding 2605.04475 Neuro-symbolic Driving Scene Understanding: Uses information coordination to bridge multi-sensor BEV representations and verifiable reasoning, reducing hallucinations.	cs.CV	Shuo Liu, Lei Shi, Haowen Liu, Jing Xu, Yufei Gao	Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordin...
56	Example-Based Object Detection 2605.04501 Example-Based Open-Vocabulary Detection: Uses example prompts for open-vocabulary object detection without fixed categories.	cs.CV cs.AI	ZhiXin Sun	In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets...
58	DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning 2605.04503 Image Difference Captioning Benchmark: Builds a harder, more robust benchmark and evaluation metrics for image difference captioning.	cs.CV cs.AI	Yuancheng Wei, Haojie Zhang, Linli Yao, Lei Li, Jiali Chen	Image Difference Captioning (IDC) generates natural language descriptions that precisely identify differences between two images, serving as a key benchmark for fine-grained change perception, cross-modal reasoning, and image editing data construction. However, existing benchmarks lack diversity and compositional complexity, and standard lexical-overlap metrics (e.g., BLEU, METEOR) fail to capture semantic consistency or penalize hallucinations, which together prevent a comprehensive and robust ...
59	SpecPL: Disentangling Spectral Granularity for Prompt Learning 2605.04504 Spectral Prompt Learning for VLMs: Improves vision-language prompt learning through spectral-granularity disentanglement and counterfactual supervision.	cs.CV cs.AI cs.CL cs.LG	Jingtao Zhou, Xirui Kang, Feiyang Huang, Lai-Man Po	Existing prompt learning for VLMs exhibits a modality asymmetry, predominantly optimizing text tokens while still relying on frozen visual encoder as holistic extractor and neglecting the spectral granularity essential for fine-grained discrimination. To bridge this, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decom...
61	Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting 2605.04506 Open-vocabulary 3D instance understanding: Jointly optimizes geometry and semantics in 3D Gaussian Splatting for open-vocabulary instance-level understanding.	cs.CV cs.AI	Binh Long Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes, Peyman Moghadam	We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with vi...
65	From Priors to Perception: Grounding Video-LLMs in Physical Reality 2605.04515 Grounding Video-LLMs in physics: Uses physical-reality constraints to improve the fine-grained physical reasoning of Video-LLMs.	cs.CV	Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin	While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only in anti-physics anomalies but also in counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we prop...
66	DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI 2605.04518 Lightweight 3D U-Net tumor segmentation: Designs a lightweight 3D U-Net for efficient brain tumor segmentation from multi-modal MRI.	cs.CV cs.LG cs.NE	Nand Kumar Mishra, Dhruv Mishra, Dr Manu Pratap Singh	Automatic brain tumor segmentation from multi-modal MRI remains challenging because volumetric models often incur substantial computational cost. This paper presents DALight-3D, a compact 3D U-Net variant that combines depthwise separable 3D convolutions, identifier-conditioned normalization, cross-slice attention, and adaptive skip fusion. The method is evaluated on the Medical Segmentation Decathlon Task01 BrainTumour benchmark under matched optimization settings against standard 3D U-Net, Att...
70	High-Fidelity Single-Image Head Modeling with Industry-Grade Topology 2605.04524 Single-image head mesh reconstruction: Reconstructs an identity-preserving head mesh with industry-grade topology from a single image.	cs.CV cs.GR	Yunmu Wang, Zoubin Bi, Bowen Cai, Chenchu Rong, Jinlong Wang	We present a single-image head mesh reconstruction framework that addresses the longstanding challenge of simultaneously preserving facial identity and producing industry-grade topology. Our framework adopts a coarse-to-fine optimization pipeline that refines a rigged template across three stages -- rig, joint, and vertex -- achieving stable convergence and consistent topology. To mitigate the ill-posed nature of single-image 3D face reconstruction and ensure identity preservation, we employ a n...
71	Velox: Learning Representations of 4D Geometry and Appearance 2605.04527 Latent representation learning for 4D objects: Encodes dynamic point clouds into shape tokens to compress and reconstruct 4D geometry and appearance.	cs.CV	Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel	We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, w...
74	Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection 2605.04531 Test-time adaptive open-vocabulary detection: Uses reward-guided semantic evolution to align text and image embeddings at test time, improving detection robustness.	cs.CV	Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li, Changyi Ma	Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. While recent test-time adaptive object detection methods for VLM-based either rely on costly backpropagation or bypass semantic misalignment via external memory, none directly and efficiently align text and vision in a tr...
79	Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection 2605.04541 Image-to-point registration outlier rejection: Uses angle-consistent hierarchical attention to improve outlier rejection and PnP stability in cross-modal registration.	cs.CV	Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu	Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation, grasping, and localization. Existing deep learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Points (PnP) methods may struggle to achieve accurate results. To address this lim...
86	InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery 2605.04554 Interaction-aware multi-person mesh recovery: Explicitly models human-human and human-object interactions for end-to-end multi-person human mesh recovery.	cs.CV	Kaili Zheng, Kaiwen Wang, Xun Zhu, Chenyi Guo, Ji Wu	Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environmen...
89	Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis 2605.04557 Geometry-controlled satellite image synthesis: Adds geometric control to pre-trained diffusion models to synthesize high-resolution satellite images.	cs.CV cs.AI	Vlad Vasilescu, Daniela Faur, Teodor Costachioiu	High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events. This shortage hampers the development and testing of machine learning models for land-cover classification, change detection, and disaster monitoring. In this paper, we tackle the problem of geometry-controlled high-resolution satellite image synthesis by adding control over existing pre-trained diffusion models. We propose a simple yet efficient method for controlling the synthesis pr...
90	SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression 2605.04560 Efficient perceptual image compression with Mamba: Uses a semantic-aware Mamba state space model for low-complexity, high-perceptual-quality compression.	cs.CV	Jiaqian Zhang, Hao Wei, Chenyang Ge, Yanhui Zhou	Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To this end, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular foc...
91	Open-Source Image Editing Models Are Zero-Shot Vision Learners 2605.04566 Zero-shot vision in image editors: Systematically evaluates the zero-shot vision abilities of open-source image-editing models.	cs.CV cs.CL	Wei Liu, Jiaxin Lin, Rui Chen	Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models (Veo 3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box. We conduct a systematic evaluation of three open-source image-editing models -- Qwen-Image-Edit, FireRed-Image-Edit, and LongCat-Image-Edit -...
93	Lightning Unified Video Editing via In-Context Sparse Attention 2605.04569 Sparse attention for video editing: proposes near-lossless sparse attention to accelerate in-context-learning video editing.	cs.CV	Shitong Shao, Zikai Zhou, Haopeng Li, Yingwei Song, Wenliang Zhong	Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that Query sharpness...
95	VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking 2605.04574 UAV-ground visual tracking prompts: uses visual-language prompts to model UAV-view and ground-view object tracking in a unified way.	cs.CV	Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu	UAV-ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two-stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to tracking unreliability. To address these limitations, we propose VL-UniTrack, a fully unified framework enhanced by visual-language prompts. By encoding featur...
97	GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution 2605.04581 Light field super-resolution transformer: uses an omnidirectional EPI Transformer to model multi-angle geometry and improve light field super-resolution.	cs.CV	Kunyu Li, Fei Wang, Lichao Zhang, Junjie Liu, Bihong Li	Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based...
100	From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation 2605.04590 Rectified flow for text segmentation: replaces diffusion features with rectified flow to rework text-prompted image segmentation.	cs.CV cs.AI	Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng	Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, leading to studies of using diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative natures of diffusion mode...
101	DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation 2605.04593 Diffusion-enhanced CLIP for WSSS: uses a diffusion model to enhance CLIP's dense knowledge and improve weakly supervised semantic segmentation.	cs.CV	Zhiwei Yang, Pengfei Song, Yucong Meng, Kexue Fu, Shuo Wang	Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically leverages Class Activation Maps (CAMs) to achieve pixel-level predictions. Recently, Contrastive Language-Image Pre-training (CLIP) has been introduced to generate CAMs in WSSS. However, previous WSSS methods solely adopt CLIP's vision-language paired property for dense localization, neglecting its inherently limited dense knowledge across both visual and text modalities, which renders CAM generation suboptimal. In ...
105	Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness 2605.04606 Unsupervised category-aware object detection: uses reference-based category discovery for annotation-free, category-aware object detection.	cs.CV cs.AI	Yichen Li, Qiankun Liu, Ying Fu	Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels, thus failing to achieve category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware detection without any manually annotated labels. ...
107	Advancing Aesthetic Image Generation via Composition Transfer 2605.04609 Composition transfer for image aesthetics: explicitly models composition and transfers it to improve aesthetic image generation.	cs.CV	Kai Zou, Zhiwei Zhao, Bin Liu, Nenghai Yu	Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model compositio...
111	Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition 2605.04617 Temporal-aware test-time adaptation for HAR: exploits temporal structure to make test-time adaptation more efficient for wearable activity recognition.	cs.CV cs.HC cs.LG	Zishu Zhou, Zaipeng Xie, Xuanyao Jie	Wearable human activity recognition (WHAR) models often suffer from performance degradation under real-world cross-user distribution shifts. Test-time adaptation (TTA) mitigates this degradation by adapting models online using unlabeled test streams, yet existing methods largely inherit assumptions from vision tasks and underexploit the inherent inter-window temporal structure in WHAR streams. In this paper, we revisit such temporal structure as a feature-conditioned inference signal rather than...
114	UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection 2605.04635 Generation-assisted PCB defect inspection: combines controllable generation with detection to ease the scarcity of PCB defect samples and improve detection.	cs.CV	Huan Zhang, Lianghong Tan, Yichu Xu, Jiangzhong Cao, Huanqi Wu	Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework...
116	CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering 2605.04641 Mitigating LVLM object hallucination: uses caption-guided visual attention steering to reduce object hallucination in vision-language models.	cs.CV	Qiming Li, Zekai Ye, Xiaocheng Feng, Weihong Zhong, Libo Qin	Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce contents that deviate from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and training cost, or decoding strategies which significantly increase inference time. In this work, we observe that LVLMs' attention to visual information is significantly enhanced when answering caption queries ...
121	Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling 2605.04662 Dance Motion Synthesis: proposes a contact matrix that models duet-dance interaction constraints to make generated motion more realistic.	cs.CV	Xuhai Chen, Zhi Cen, Huaijin Pi, Sida Peng, Xiaowei Zhou	Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with...
127	Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern 2605.04675 Physical Adversarial Attack on RGB-T: designs adversarial clothing with a non-overlapping RGB-T pattern to evade visible-thermal detectors in real-world scenes.	cs.CV	Xiaopei Zhu, Guanning Zeng, Zhanhao Hu, Jun Zhu, Xiaolin Hu	Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0°–360°) RGB-T atta...
130	Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding 2605.04680 EEG Visual Decoding: proposes multi-level bidirectional biomimetic learning to mitigate scarce paired data and improve alignment in EEG visual decoding.	cs.CV cs.AI	Jingtao Liu, Peiliang Gong, Chuhang Zheng, Yiheng Liu, Qi Zhu	EEG-based visual neural decoding aims to align neural responses with visual stimuli for tasks such as image retrieval. However, limited paired data and a fundamental mismatch between high-fidelity digital images and biological visual perception - distorted by retinotopic mapping and subject-specific neuroanatomy - severely impede cross-modal alignment. To address this, we propose MB2L, a Multi-Level Bidirectional Biomimetic Learning framework that incorporates structured physiological inductive ...
137	FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation 2605.04702 Identity-Preserving Text-to-Video: proposes pose-faithful identity-preservation learning to reduce facial drift under large poses and occlusion.	cs.CV cs.AI	Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu	Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework to improve IPT2V in complex dynamic scenes. The key of FaithfulFaces is a pose-shared id...
143	Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition 2605.04713 Subject-Level Machine Unlearning: studies machine unlearning that removes the influence of noisy subjects after training, at the subject level, to revise engagement recognition data.	cs.CV	Alexander Vedernikov	Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machi...
149	Anny-Fit: All-Age Human Mesh Recovery 2605.04728 All-Age Human Mesh Recovery: proposes joint multi-person camera-space optimization for single-image 3D human mesh recovery in all-age scenes.	cs.CV	Laura Bravo-Sánchez, Matthieu Armando, Romain Brégier, Grégory Rogez, Serena Yeung-Levy	Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals di...
151	ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting 2605.04730 Feature debiasing for 3DGS visual localization: proposes unbiased landmark features to make 3DGS visual localization more robust.	cs.CV	Yingdong Gu, Shaocheng Yan, Zhenjun Zhao, Yuan Kou, Jianxin Luo	Visual localization is a core technology for augmented reality and autonomous navigation. Recent methods combine the efficient rendering of 3D Gaussian Splatting (3DGS) with feature-based localization. These methods rely on direct matching between 2D query features and the 3D Gaussian feature field, but this often results in mismatches due to an inherent bias in the learned Gaussian feature. We theoretically analyze the feature learning process in 3DGS, revealing that the widely adopted $α$-blen...
152	Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation 2605.04731 Joint building height and footprint estimation: couples the two tasks with morphological constraints to jointly predict building height and footprint.	cs.CV	Jinzhen Han, JinByeong Lee, Jisung Kim, HongSik Yun	Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and ...
160	VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision 2605.04750 Thermal vehicle re-identification: proposes viewpoint-conditioned feature selection to improve thermal-infrared vehicle re-identification.	cs.CV eess.SY	Yasod Ginige, Ransika Gunasekara, Darsha Hewavitharana, Manjula Ariyarathne, Peshala Jayasekara	Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poor performance due to high similarity among objects of the same category in the absence of color information (overlooking shape information) and de-emphasized texture information. Furthermore, variability in viewpoint adds more complexity as the features vary from side to side. We address these is...
161	Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition 2605.04752 Multimodal fusion for traffic congestion classification: fuses flow-guided attention with EMD to model scene context and non-stationary motion together for congestion classification.	cs.CV cs.AI	Eugene Kofi Okrah Denteh, Blessing Agyei Kyem, Joshua Kofi Asamoah, Armstrong Aboah	Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations ...
166	Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation 2605.04769 Lightweight cross-spectral face recognition: uses contrastive alignment and distillation to build lightweight cross-spectral face recognition for edge devices.	cs.CV	Anjith George, Sebastien Marcel	Heterogeneous Face Recognition (HFR) aims at matching face images captured across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, enhancing the usability of face recognition systems in challenging real-world conditions. Although recent HFR methods have achieved significant improvements in performance, many rely on computationally expensive models, making them impractical for deployment on resource-limited edge devices. In this work, we introduce a lightweigh...
167	Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction 2605.04770 Zero-shot gaze estimation evaluation for HRI: proposes a zero-shot gaze estimation benchmark and evaluation analysis tailored to human-robot interaction conditions.	cs.CV cs.HC cs.LG cs.RO	Berk Sezer, Ali Görkem Küçük, Erol Şahin, Sinan Kalkan	While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smal...
168	MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education 2605.04772 Multimodal retrieval and generation for medical education: builds a medical image-text retrieval and generation system to provide interactive learning resources.	cs.CV	Miguel Diaz Benito, Cecilia Diana Albelda, Alvaro Garcia Martin, Jesus Bescos Cano, Marcos Escudero-Vinolo	Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval an...
184	QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes 2605.04844 3D Gaussian splatting acceleration: uses four geometry-aware bounding boxes to speed up Gaussian intersection computation in 3DGS rasterization.	cs.CV cs.GR	Xinze Li, Bohan Yang, Pengxu Chen, Yiyuan Wang, Hongcheng Luo	3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware ...
187	3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance 2605.04856 Ultrasound to pseudo-CT synthesis: uses a Transformer-augmented residual network to generate pseudo-CT from 3D ultrasound in real time for intraoperative guidance.	cs.CV	Sapna Sachan, Amulya Kumar Mahto	Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generate...
190	VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA 2605.04870 Video TextVQA keyframe agents: proposes VTAgent, which uses agentic keyframe anchoring to improve evidence-aware Video TextVQA.	cs.CV	Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du	Video text-based visual question answering (Video TextVQA) aims to answer questions by reasoning over visual textual content appearing in videos. Despite the strong multimodal video understanding capabilities of recent Video-LLMs, their performance on existing Video TextVQA benchmarks remains limited. To better understand this gap, we conduct an upper-bound analysis through frame-wise question answering, counting a sample as correct if any frame yields the right answer, which significantly outpe...
196	FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection 2605.04882 Fair vision-language glaucoma detection: proposes FairEnc, which jointly debiases the vision and text encoders for fair glaucoma detection.	cs.CV cs.AI cs.LG eess.IV q-bio.QM	Mohamed Elhabebe, Ayman El-Baz, Qing Liu	Automated glaucoma detection is critical for preventing irreversible vision loss and reducing the burden on healthcare systems. However, ensuring fairness across diverse patient populations remains a significant challenge. In this paper, we propose FairEnc, a fair pretraining method for vision-language models (VLMs) that enables simultaneous debiasing across multiple sensitive attributes. FairEnc jointly mitigates biases in both textual and visual modalities with respect to multiple sensitive at...
208	Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification 2605.04904 Animal re-identification via inpainting embeddings: explores the clustering capability of inpainting-model embeddings for identifying individual animals by skin pattern.	cs.CV	Jens van Bijsterveld, Daniele Avitabile, Fons J. Verbeek, Rita Pucci	In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually ...
225	DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring 2605.04943 Vision-language rope condition monitoring: proposes DART, which outputs rope damage assessments, recommendations, and reports from a single image.	cs.CV cs.AI	Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic	The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architec...
240	ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID) 2605.04977 Privacy-preserving top-view re-identification benchmark: reports the TVRID competition and releases top-view RGB-Depth data and results for privacy-preserving person re-identification.	cs.CV	Raphaël Delécluse, Hazem Wannous, Laurent Guimas	This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes ...
244	Attention-Based Chaotic Self-Supervision for Medical Image Classification 2605.04985 Self-supervised learning for medical imaging: proposes attention-guided chaotic denoising autoencoder pre-training to improve medical image classification.	cs.CV	Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas	Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful alternative, yet common methods like masked autoencoders (MAEs) may inadvertently destroy fine-grained diagnostic features by using random masking. In this paper, we propose a novel SSL pre-training strategy, the Chaotic Denoising Autoencoder (CDAE). Instead of ...
245	Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data 2605.04989 Evaluating LoRA adaptation of geospatial foundation models: uses low-rank adaptation to efficiently fine-tune geospatial foundation models for Sentinel-2 burned-area mapping.	cs.CV	Ali Shibli, Andrea Nascetti, Yifang Ban	Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospa...
255	Chaotic Contrastive Learning for Robust Texture Classification 2605.05012 Robust contrastive learning for texture: proposes a chaotic contrastive learning framework that makes texture classification more robust to scale, illumination, and domain shift.	cs.CV	Joao B Florindo	Texture classification is a pivotal task in computer vision, presenting unique challenges due to high inter-class similarity and the sensitivity of structural patterns to scale and illumination changes. While Convolutional Neural Networks (CNNs) and recent Vision Transformers have set performance benchmarks, they often require extensive labeled datasets or struggle to generalize across domains due to an over-reliance on color and shape features. This paper introduces a novel framework that syner...
256	CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography 2605.05014 Dense 3D dataset for autonomous driving: releases the CARD multi-modal dataset, which provides quasi-dense 3D ground truth under challenging road conditions for reconstruction evaluation.	cs.CV	Gasser Elazab, Frank Neuhaus, Tilman Koß, Malte Splietker, Aditya Date	Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, po...
262	Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models 2605.05026 Diffusion model hallucination analysis: uses local intrinsic dimension to characterize structural hallucinations in diffusion models and explain the source of their instability.	cs.CV cs.AI	Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski, Michał Jan Włodarczyk, Pengfei Jin	Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode from several viewpoints, offering partial explanations for its occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced...
263	Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification 2605.05027 Distillation for lifelong person re-identification: proposes prompt-anchored vision-text distillation to mitigate semantic drift and catastrophic forgetting.	cs.CV	Wen Wen, Hao Chen, Shiliang Zhang	Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the...
265	Computer-Aided Design Generation by Cascaded Discrete Diffusion Model 2605.05031 CAD generation with discrete diffusion: proposes a cascaded discrete diffusion model that generates CAD command sequences, avoiding the semantically invalid outputs caused by continuous perturbation.	cs.CV	Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang	Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, often producing perturbed representations that map to semantically inval...
266	Few-Shot Learning Pipeline for Monkeypox Skin Disease Classification Using CNN Feature Extractors 2605.05034 Few-shot monkeypox skin classification: builds a few-shot pipeline from CNN features and SimpleShot for monkeypox skin disease recognition.	cs.CV	Md. Safirur Rashid, Sabbir Ahmed, Muhammad Usama Islam, Sumona Hoque Mumu, Md. Hasanul Kabir	Despite the strong performance of Convolutional Neural Networks (CNNs) in disease classification, their effectiveness often depends on access to large annotated datasets, which is an impractical requirement for emerging or rare conditions such as Monkeypox. To overcome this limitation, we propose a few-shot learning (FSL) framework that employs SimpleShot, a lightweight, non-parametric, inductive classifier, for Monkeypox and pox-like skin disease recognition from limited labeled examples. The p...
268	When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise 2605.05045 VLM relation hallucination robustness: analyzes relation hallucination in VLMs caused by rotation and noise and evaluates orientation-correction and denoising strategies.	cs.CV cs.CL	Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson	Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while the...
272	Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation 2605.05054 Flow matching few-shot adaptation: uses polar decomposition to decouple radial and angular flows, improving few-shot adaptation.	cs.CV cs.AI cs.LG	Hongxu Chen, Yanghao Wang, Bowei Zhu, Hongxiang Li, Zhen Wang	Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we ...
274	ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection 2605.05057 Open-vocabulary HOI detection: learns scripted state transitions to improve open-vocabulary human-object interaction detection.	cs.CV	Minh Anh Nguyen, Quang Huy Tran, Bao Ngoc Le, SuiYang Guang, Tuan Kiet Pham	Open-vocabulary human-object interaction (HOI) detection requires recognizing interaction phrases that may not appear as annotated categories during training. Recent vision-language HOI detectors improve semantic transfer by matching human-object features with text embeddings, but their predictions are often dominated by object affordance and phrase-level co-occurrence. As a result, a model may predict "cut cake" from the presence of a knife and a cake without verifying whether the hand, ...
280	Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy 2605.05072 Camera-LiDAR occupancy projection: uses height-guided projection reparameterization to improve 2D-to-3D occupancy feature alignment.	cs.CV	Yuan Wu, Zhiqiang Yan, Jiawei Lian, Zhengxue Wang, Jian Yang	3D occupancy prediction aims to infer dense, voxel-wise scene semantics from sensor observations, where the 2D-to-3D view transformation serves as a crucial step in bridging image features and volumetric representations. Most previous methods rely on a fixed projection space, where 3D reference points are uniformly sampled along pillars. However, such sampling struggles to capture the sparsity and height variations of real-world scenes, leading to ambiguous correspondences and unreliable feature...
281	FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching 2605.05077 Language-guided image segmentation: uses flow matching for language-guided dichotomous segmentation while preserving fine-grained structure.	cs.CV	Andranik Sargsyan, Shant Navasardyan	Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image ...
282	A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping 2605.05079 Refractive warping restoration benchmark: builds a unified benchmark and evaluation set for multi-frame image restoration under strong refractive warping.	cs.CV	Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley, Fiona Hwang	Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmarks systematically evaluate restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering a range from turbulence-like mild warping to strong d...
311	CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization 2605.05136 CPCA network for domain generalization: learns a domain-invariant subspace by deep unfolding of CPCA to improve OOD generalization.	cs.CV	Yu-Hsi Chen, Abd-Krim Seghouane	Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), whi...
315	What Matters in Practical Learned Image Compression 2605.05148 Practical perceptual image compression: systematically analyzes and jointly optimizes the perceptual quality and runtime efficiency of learned image codecs.	cs.CV cs.AI cs.LG	Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang	One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime -- in...
318	Aes3D: Aesthetic Assessment in 3D Gaussian Splatting 2605.05155 3DGS aesthetic assessment: proposes Aes3D to assess the composition and aesthetic attributes of 3D Gaussian Splatting scenes.	cs.CV cs.AI	Chuanzhi Xu, Boyu Wei, Haoxian Zhou, Xuanhua Yin, Zihan Deng	As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence...
320	Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution 2605.05283 Counterfactual attribution for medical imaging: uses counterfactual GANs to generate contrast samples that explain the key evidence regions in medical image classification.	cs.CV	Shakeeb Murtaza	Ascription of an image gives insights into the objects that influence the classification of the whole image or its pixels towards a specific category. These insights help radiologists to visualize deformities in medical imaging. Most of the existing visualization techniques are based on discriminative models and highlight regions of the input image participating in the decision-making of a classifier. However, these approaches do not take all noticeable objects into account as their objective is...
321	Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging 2605.05161 VLM-based medical OOD localisation: proposes WALDO, which uses optimal transport to align against reference distributions of normal anatomy for zero-shot anomaly localisation.	cs.CV	Bernhard Kainz, Johanna P Mueller, Matthew Baugh, Cosmin Bercea	Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables c...
322	PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World 2605.05163 Physics-grounded 3D asset generation: proposes PhysForge and PhysDB to generate interactive 3D assets with hierarchical physics and functional logic.	cs.CV	Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen	Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physica...
323	Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation 2605.05164 Geometry-aware WSI representation: proposes a geometry-aware state space model that aggregates pathology WSI patches to improve slide-level prediction.	cs.CV cs.AI	Enhui Chai, Sicheng Chen, Tianyi Zhang, Chad Wong, Kecheng Huang	Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch represent...
332	OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents 2605.05185 Training recipe for multimodal search agents: open-sources the data, trajectory synthesis, and training pipeline recipe for multimodal search agents.	cs.CV	Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai	Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training fr...
333	LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore) 2605.05187 Quality assessment for 4D world models: reports on the PhyScore challenge, which evaluates metrics for the perceptual quality and physical consistency of generated videos.	cs.CV	Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan	This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i....
340	D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models 2605.05204 On-policy self-distillation fine-tuning for diffusion models: uses D-OPSD to continuously fine-tune distilled diffusion models without breaking few-step inference.	cs.CV	Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng	The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For example, applying the commonly used fine-tuning technique would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distill...
341	Taming Outlier Tokens in Diffusion Transformers 2605.05206 Outlier token suppression in diffusion Transformers: analyzes high-norm outlier tokens in DiTs and proposes a method to reduce their adverse effects.	cs.CV cs.AI cs.LG	Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu, Liang-Chieh Chen, Zhe Gan	We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier represen...
342	Syn4D: A Multiview Synthetic 4D Dataset 2605.05207 Multiview synthetic 4D dataset: releases Syn4D, which provides multiview ground-truth depth and tracking annotations for dynamic scenes.	cs.CV	Zeren Jiang, Yushi Lan, Yihang Luo, Yufan Deng, Zihang Lai	Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4...
343	Query2Uncertainty: Robust Uncertainty Quantification and Calibration for 3D Object Detection under Distribution Shift 2605.05328 3D detection uncertainty under distribution shift: proposes a density-aware calibration method that improves the reliability of 3D object detection confidence in shifted scenarios.	cs.CV cs.RO	Till Beemelmanns, Alexey Nekrasov, Stefan Vilceanu, Jonas Steinhaus, Timo Woopen	Reliable uncertainty estimation for 3D object detection is critical for deploying safe autonomous systems, yet modern detectors remain poorly calibrated, especially under distribution shifts. Although post-hoc calibration methods address this issue and provide improved calibration for in-distribution tests, they fail to adapt in distribution-shifted scenarios. In this work, we address this issue and introduce a density-aware calibration method that couples post-hoc calibrators with the feature d...
346	ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters 2605.05331 Large-scale ViT autoencoder tokenizer: scales native-resolution ViT autoencoders to 5 billion parameters with stable training.	cs.CV cs.AI cs.LG	Philippe Hansen-Estruch, Jiahui Chen, Vivek Ramanujan, Orr Zohar, Yan Ping	Vision Transformer (ViT) autoencoders have emerged as compelling tokenizers for images, offering improved reconstruction over convolutional tokenizers. However, existing ViT tokenizers cannot explore this landscape as performance degrades outside training resolutions, and reliance on adversarial losses prevents stable scaling. ViTok (Hansen-Estruch et al., 2025) found that the compression ratio r mediates a reconstruction-generation trade-off where lower r means better reconstructions but harder...
349	Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery 2605.05344 Open-vocabulary retrieval in satellite imagery: uses LLM-guided query embedding refinement to improve open-vocabulary object retrieval in satellite imagery.	cs.CV cs.AI cs.IR	Md Adnan Arefeen, Biplob Debnath, Ravi K. Rajendran, Murugan Sankaradas, Srimat T. Chakradhar	In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant image tiles, as the retrieval system must generalize to a wide range of unseen objects and concepts. While vision-language models (VLMs) such as CLIP are widely used for text-image retrieval, even fine-tuned variants often struggle to accurately align such queries with sa...
352	egenioussBench: A New Dataset for Geospatial Visual Localisation 2605.05351 Benchmark dataset for geospatial visual localisation: proposes egenioussBench, built on a city-scale 3D mesh with high-accuracy smartphone queries for localisation.	cs.CV	Phillipp Fanta-Jende, Francesco Vultaggio, Alexander Kern, Yasmin Loeper, Markus Gerke	We present egenioussBench, a visual localisation benchmark built on geospatial reference data: a city-scale airborne 3D mesh and a CityGML LoD2 model. This pairing reflects deployable mapping assets and supports true scalability beyond traditional SfM-based approaches. The query data comprise smartphone images with centimetre-accurate, map-independent ground truth obtained via PPK and GCP/CP-aided adjustment. From 2,709 images, we derive a non-co-visible subset by estimating the full co-visibili...
358	Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video 2605.05367 3D sign language avatar reconstruction from monocular video: provides SMPL-X annotations for Saudi Sign Language and generates high-fidelity 3D avatars from monocular video.	cs.CV cs.AI	Eyad Alghamdi, Sattam Altuuaim, Obay Ghulam, Abdulrahman Qutah, Yousef Basoodan	Arabic Sign Language (ArSL) and its dialects serve approximately 400 million Arabic speakers worldwide, yet the community lacks high-quality 3D parametric annotations and specialized reconstruction methods for avatar generation. We address this critical gap through two key contributions: First, we introduce the first high-quality 3D parametric annotations for the Ishara-500 Saudi Sign Language dataset, providing precise SMPL-X parameters for 500 culturally authentic SSL signs. Second, we present...
361	Two Steps Are All You Need: Efficient 3D Point Cloud Anomaly Detection with Consistency Models 2605.05372 3D point cloud anomaly detection: uses consistency models to achieve efficient point cloud anomaly detection in two steps.	cs.CV cs.AI	Pranav A, Shashank B, Pranav Siddappa, Dominik Seuss, Minal Moharir	Diffusion models are rapidly redefining 3D anomaly detection in point cloud data. As 3D sensing becomes integral to modern manufacturing, reliable anomaly detection is essential for high-throughput quality assurance and process control. Yet practical deployment on resource-constrained, latency-critical systems remains limited. Existing methods are often computationally prohibitive or unreliable in complex, unmasked regions, and diffusion pipelines are inherently bottlenecked by iterative denoisi...
364	Visual Text Compression as Measure Transport 2605.06708 Evaluating visual text compression: models visual text compression as measure transport to predict task utility.	cs.CV cs.AI	Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li	Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefo...
369	LAMP: Localization Aware Multi-camera People Tracking in Metric 3D World 2605.05390 Multi-camera 3D people tracking: proposes LAMP, which leverages localized and calibrated multi-view input for 3D tracking from headsets.	cs.CV	Nan Yang, Julian Straub, Fan Zhang, Richard Newcombe, Jakob Engel	Tracking 3D human motion from an egocentric multi-camera headset is challenged by severe egomotion, partial visibility or occlusions, and a lack of training data. Existing methods designed for monocular video often require static or slowly-moving cameras and cannot efficiently leverage multi-view, calibrated and localized input. This makes them brittle and prone to fail on dynamic egocentric captures. We propose LAMP (Localization Aware Multi-camera People Tracking): a novel, simple framework to solve...
375	Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response 2605.05405 Zero-shot satellite image retrieval: proposes GeoQuery, which uses joint embeddings for zero-shot natural-language satellite retrieval.	cs.CV	James Walsh, William Fawcett, Grace Colvard, Raúl Ramos-Pollán	Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constrain...
388	Safety-Critical Camera Reliability Monitoring for ADAS via Degradation-Aware Uncertainty Pattern Analysis 2605.05439 ADAS camera reliability monitoring: builds a health index from degradation uncertainty patterns to give early warning of camera risk.	cs.CV	Shiva Aher	Reliable camera input is essential for safety-critical ADAS perception, but most monitoring approaches detect sensor failures only after downstream performance has degraded. We propose a proactive camera reliability monitoring framework that estimates perception risk from degradation-induced uncertainty patterns before downstream failure becomes observable. The method introduces a Global Sensor Health Index (GSHI), a continuous reliability score that aggregates per-degradation severities using a...
392	EchoXFlow: A Beamspace Echocardiography Dataset for Cardiac Motion, Flow, and Function 2605.05447 Beamspace Echocardiography Dataset: releases a raw beamspace ultrasound dataset for learning cardiac motion and blood flow.	cs.CV	Elias Stenhede, Joanna Sulkowska, Eivind Bjørkan Orstad, Henrik Schirmer, Arian Ranjbar	We introduce EchoXFlow, a clinical echocardiography dataset for learning from ultrasound in its native acquisition geometry rather than from scan-converted Cartesian videos. Existing public datasets offer limited opportunities to study cross-modal relationships between cardiac anatomy, myocardial motion, and blood flow, as Doppler is typically absent or fused as RGB overlays, and acquisitions are released after lossy vendor display processing. EchoXFlow comprises 37125 recordings from 666 routin...
411	The First Controllable Bokeh Rendering Challenge at NTIRE 2026 2605.05510 Controllable Bokeh Rendering Benchmark: summarizes the results and winning methods of the NTIRE controllable bokeh rendering challenge.	cs.CV	Tim Seizinger, Florin-Alexandru Vasluianu, Jeffrey Chen, Zhuyun Zhou, Zongwei Wu	This study presents the outcomes of the first Controllable Bokeh Rendering Challenge at NTIRE and highlights the most effective submitted methodologies. In total, 44 participants registered for the competition, of which 8 teams submitted valid solutions after the conclusion of the final test phase. All submissions were evaluated on unseen images, focusing on portraits and intricate subjects with complex and visually appealing bokeh phenomena. In addition to the first track focusing on establishe...
cs.CY 2 papers
110	Guidelines for Designing AI Technologies to Support Adult Learning 2605.04616 AI design for adult learning: summarizes guidelines for designing and evaluating AI educational technologies in adult-learning settings.	cs.CY cs.AI	Jennifer M. Reddig, Glen R. Smith, Sanaz Ahmadzadeh Siyahrood, Wesley G. Morris, Yoojin Bae	AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the Uni...
396	The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking 2605.05472 Pedagogy of AI Errors: uses generative AI errors in course design to foster higher-order thinking.	cs.CY cs.AI	Hadi Hosseini	As generative AI becomes increasingly integrated into higher education, its frequent errors and hallucinations, often seen as limitations, offer a unique pedagogical opportunity. By framing AI as a "learning companion" whose imperfect outputs prompt analysis, evaluation, and reflection, we argue that instructors can engage students in the fundamental processes of higher-order thinking. This paper presents a design-oriented study in which an AI-integrated syllabus in a database design ...
cs.DC 3 papers
32	One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving 2605.04450 GPU HBM Cache Partitioning: adaptively partitions HBM to balance embedding and KV caches and accelerate recommender serving.	cs.DC cs.IR cs.LG	Wenjun Yu, Shuguang Han, Amelie Chi Zhou	Generative Recommender (GR) inference places embedding hot caches (EMB) and KV caches in direct competition for limited GPU HBM: allocating more memory to one improves its efficiency but degrades the other. Existing systems optimize them in isolation, overlooking that the optimal EMB-KV allocation ratio can shift by up to 0.35 across workload regimes, leaving 20-30% latency improvement unrealized. While online reallocation is required to close this gap, naive approaches introduce H2D refill tra...
45	CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training 2605.04478 Distributed Training Hang Diagnosis: builds CCL-D to pinpoint, with high precision, the root causes of slow or hung communication in large-model training.	cs.DC cs.AI	Yida Gu, Fakang Wang, Jianhao Fu, Zhenhang Sun, Qianyu Zhang	As training scales grow, collective communication libraries (CCL) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root cause analysis. To address this, we propose CCL-D, a high-precision diagn...
269	Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism 2605.05049 MoE training system optimization: proposes Piper, which uses resource modeling and pipelined hybrid parallelism to improve large-scale MoE training efficiency.	cs.DC cs.AI cs.LG	Sajal Dash, Feiyi Wang	Frontier models increasingly adopt Mixture-of-Experts (MoE) architectures to achieve large-model performance at reduced cost. However, training MoE models on HPC platforms is hindered by large memory footprints, frequent large-scale communication across heterogeneous networks, and severe workload imbalance. To characterize these challenges, we develop a mathematical model that quantifies memory, compute, and communication requirements for MoE configurations under various parallelization schemes,...
cs.GR 3 papers
63	CoherentRaster: Efficient 3D Gaussian Splatting for Light Field Displays 2605.04509 Gaussian splatting for light field displays: proposes a 3D Gaussian splatting rasterization acceleration method for efficient light field display rendering.	cs.GR	Gyujin Sim, Seungjoo Shin, Hosung Jeon, Gwangsoon Lee, Hyon-Gon Choo	Light field displays (LFDs) require rendering an interlaced image that encodes many view-dependent observations. This multi-view requirement introduces substantial computational overhead, making real-time rendering difficult to achieve. While 3D Gaussian Splatting (3DGS) is efficient for single-view rendering on 2D displays, directly extending it to LFDs is computationally expensive. Moreover, prior accelerations either suffer from GPU inefficiency under spatially incoherent subpixel layouts or ...
169	AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC 2605.04773 Adaptive coarsening for GPU implicit integration: proposes in-solve algebraic adaptive coarsening to accelerate the linear solves in GPU implicit time integration.	cs.GR cs.PF	Xuan Wang, Zhaofeng Luo, Minchen Li, Taku Komura, Kemeng Huang	Implicit time integration is key to robustly simulating stiff materials and large deformations, but its performance is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) to where it is most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), complicating parallel implementations, harming memory locality, and sometimes being disallowed when it may introduce local g...
292	A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry 2605.05095 Bayesian next-best-view selection: uses Bayesian decision theory to select the task-relevant next-best scanning view under geometric uncertainty.	cs.GR cs.CV cs.LG stat.ML	Jingsen Zhu, Silvia Sellán, Alexander Terenin	We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds, by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently-developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, then (c) using the posterior distribution to carefully reason about which view to scan next. This enable...
cs.HC 3 papers
150	AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations 2605.04729 AI Feedback for Presentation Slides: implements AISSA, which combines LLMs and learning analytics dashboards to give rubric-based feedback on student slides.	cs.HC cs.AI cs.SE	Alvaro Becerra, Diego Gomez, Ruth Cobos	Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload thei...
156	AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education 2605.04740 Multi-LLM peer feedback system for higher education: implements and deploys a multi-LLM collaborative peer feedback system to improve comment quality.	cs.HC cs.AI cs.SE	Alvaro Becerra, Alejandra Palma, Ruth Cobos	Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthes...
350	Making AI Drafts Count: A Quality Threshold in Audio Description Workflows 2605.05348 Quality threshold for AI drafts in audio description: studies how AI draft quality affects editing efficiency and outcomes in accessible audio description.	cs.HC cs.AI	Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper, Sanjay Mirani, Gio Jung	Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. We investigate this through GenAD, an AD generation pipeline that incorporates accessibility guidelines and contextual video information, and RefineAD, an editing interface fo...
cs.IR 3 papers
46	Career-Aware Resume Tailoring via Multi-Source Retrieval-Augmented Generation with Provenance Tracking: A Case Study 2605.05257 RAG Resume Tailoring with Provenance: uses multi-source RAG with provenance tracking to generate job-tailored resumes from a career vault.	cs.IR cs.AI cs.CL	Kumar Abhinav	AI-assisted resume tailoring systems commonly operate on a single uploaded resume, which limits their ability to recover relevant experience omitted from the current draft and makes it difficult for users to distinguish grounded edits from model-generated suggestions. This paper presents Resume Tailor, an agentic resume-tailoring system that maintains a longitudinal career vault in a vector database and uses multi-source retrieval-augmented generation (RAG) to assemble job-specific resume conten...
146	Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation 2605.04723 Efficient Sequential Recommendation: revisits convolutional networks for attribute-aware sequential recommendation to reduce the compute and memory cost of long-sequence modeling.	cs.IR cs.LG	Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme	Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user h...
172	Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes 2605.04797 Crowdsourced audiovisual deepfake detection: quantifies how consistently crowd workers identify audiovisual deepfakes and how accurately they identify manipulation type and timing.	cs.IR cs.AI	Michael Soprano, Andrea Cioci, Stefano Mizzaro	Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing...
cs.IT 3 papers
13	Contextual Memory-Enhanced Source Coding for Low-SNR Communications 2605.04400 Robust Source Coding in Low-SNR: uses contextual-memory-enhanced source coding to reduce error propagation in low-SNR text transmission.	cs.IT cs.LG	Ziqiong Wang, Rongpeng Li	While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR regimes, even a small number of residual bit errors after channel decoding may derail the subsequent lossless reconstruction process, especially when Arithmetic Coding (AC) relies on Large Language Model (LLM)-based probability estimation. Existing remedies eith...
382	Information-theoretic Limits of Learning and Estimation 2605.06710 Information-theoretic limits of learning and estimation: systematically introduces information-theoretic tools for characterizing fundamental lower bounds on learning and estimation.	cs.IT cs.LG math.ST	Abbas El Gamal, Maxim Raginsky	Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can -- and cannot -- achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entr...
414	When Semantic Communication Meets Queueing: Cross-Layer Latency and Task Fidelity Optimization 2605.05514 Semantic Communication with Queueing Optimization: jointly optimizes queueing and semantic communication for wireless transmission latency and task fidelity.	cs.IT cs.AI cs.LG cs.NI eess.SP	Yalin E. Sagduyu, Tugba Erpek	Semantic communication (SemCom) with learned encoder-decoder architectures enables end-to-end learning of compact task-oriented representations optimized for the wireless channel, reducing channel resources needed to convey task-relevant information and improving spectrum efficiency. This paper studies semantic image transmission over block Rayleigh fading with AWGN using a multi-task semantic autoencoder that jointly reconstructs images and predicts labels from the received waveform. The latent...
cs.LG 137 papers
1	Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment 2605.04363 Tabular Label Shift Adaptation: proposes DistPFN, which adjusts posteriors at test time to mitigate label shift in tabular models.	cs.LG cs.AI	Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim	TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence...
2	Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares 2605.04364 Online Prediction for LDS: proposes predictive online least squares to achieve logarithmic regret in marginally stable systems.	cs.LG eess.SY math.OC	Chih-Fan Pai, Yang Zheng	We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online ...
4	Extending Differential Temporal Difference Methods for Episodic Problems 2605.04368 Episodic Differential TD Learning: extends differential TD methods to episodic tasks while preventing reward centering from altering the policy.	cs.LG cs.AI	Kris De Asis, Mohamed Elsayed, Jiamin He	Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This keeps the return bounded and removes a value function's state-independent offset. However, reward centering can alter the optimal policy in episodic problems, limiting its applicability. Motivated by recent works that emphasize the role of normalization in str...
6	$p$-adic Manifold Learning and Benchmark Tasks from Impartial Games 2605.04374 p-adic Manifold Learning: proposes a p-adic manifold learning algorithm and builds benchmark tasks from impartial games.	cs.LG math.NT	Tomoki Mihara	We introduce $p$-adic manifold learning, propose an algorithm to solve it, and propose benchmark tasks from impartial games.
8	GraphPI: Efficient Protein Inference with Graph Neural Networks 2605.04376 GNN Protein Inference: models protein inference as graph node classification and uses GNNs to mitigate label scarcity.	cs.LG	Zheng Ma, Jiazhen Chen, Lei Xin, Ali Ghodsi	The integration of deep learning approaches in biomedical research has been transformative, enabling breakthroughs in various applications. Despite these strides, its application in protein inference is impeded by the scarcity of extensively labeled datasets, a challenge compounded by the high costs and complexities of accurate protein annotation. In this study, we introduce GraphPI, a novel framework that treats protein inference as a node classification problem. We treat proteins as interconne...
11	Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize 2605.04396 Transformer Complexity Control Timing: shows that critical time windows during training determine whether a Transformer ends up reasoning or memorizing.	cs.LG cs.AI	Sarwan Ali	Recent work has shown that Transformers' compositional generalization is governed by complexity control, initialization scale and weight decay, which steers training toward low-complexity reasoning solutions rather than high-complexity memorization. Existing analyses, however, treat complexity control as a single static hyperparameter choice, leaving open when during training this control is actually decisive. We show that the memorization-versus-reasoning fate of a Transformer is ...
15	Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning 2605.04406 Learnable SPD Geometry: proposes the spline-pullback metric for universal diffeomorphic SPD representation learning.	cs.LG	Tushar Das, Subrata Dutta, Sarmistha Neogy, Koushlendra Kumar Singh	The integration of Symmetric Positive Definite (SPD) matrices into deep learning has historically relied on fixed algebraic Riemannian metrics. Analogous to hand-crafted features in classical machine learning, these static formulations impose rigid geometries limiting network expressivity and adaptability. Recent attempts to parameterize these geometries often violate the axioms of primary matrix functions through unconstrained powers or rank-dependent scaling, inviting spatial folding, loss of ...
19	Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models 2605.04413 Counterfactual Identifiability without Monotonicity: proposes non-monotone triangular SCMs and gives counterfactual identifiability conditions without global monotonicity.	cs.LG stat.ME	Pengcheng Tan, Jiang Chen, Dehui Du	Structural causal models provide a unified semantics for interventions and counterfactuals, but most identifiability results rely on restrictive assumptions like global monotonicity, which are often violated in embodied interaction, where the same exogenous perturbation can induce opposite responses under different contact contexts. We ask what structure still suffices once global monotonicity is dropped. We introduce non-monotone triangular structural causal models (NM-TM-SCM), which retain tri...
20	Demystifying Manifold Constraints in LLM Pre-training 2605.04418 Manifold Constraints in LLM Training: systematically analyzes how explicit manifold constraints affect stability and performance in LLM pre-training.	cs.LG cs.AI math.OC	Kang An, Jiaxiang Li, Donald Goldfarb, Shiqian Ma	The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints still remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the ...
21	FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning 2605.04421 Continuous-Time Transformer Attention: proposes FLUID, which uses liquid attention to incorporate continuous dynamics directly into the attention computation.	cs.LG cs.AI	Waleed Razzaq, Yun-Bo Zhao	Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting inputs or outputs embeddings with continuous dynamics. However, the core scaled-dot-product-attention (SDPA) mechanism remains inherently discrete. We propose FLUID (Flexible Unified Information Dynamics), a CT Transformer that incorporates continuous dynamics directly into the attention computation by replacing it with Liquid Attention Network (LAN). LAN reinterprets attention logits as contin...
37	Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention 2605.04460 Sparse Counterfactual Community Intervention: learns sparse adjustments to controllable variables from survey data to design intervention strategies.	cs.LG	Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang	Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional a...
39	Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control 2605.04468 Controlled Drift LLM Fine-tuning: uses a dynamic anchor to control distributional drift, stabilizing SFT and reducing forgetting.	cs.LG cs.AI cs.CL	Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang	Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of m...
40	CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies 2605.04470 RL Fine-tuning for Driving: combines counterfactual supervision with interactive RL fine-tuning to improve driving policy robustness.	cs.LG cs.RO	Keyu Chen, Nanfei Ye, Yida Wang, Wenchao Sun, Danqi Zhao	Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future es...
41	Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs 2605.04472 Neuro-symbolic Combinatorial Proofs: uses the WZ method to guide LLMs in generating executable formal proofs of combinatorial identities.	cs.LG	Beibei Xiong, Hangyu Lv, Junqi Liu, Yisen Wang, Shaoshi Chen	Automating formal proofs of combinatorial identities is challenging for LLM-based provers, as long-horizon proof planning is required and unconstrained search quickly explodes. Symbolic methods such as the Wilf-Zeilberger (WZ) method can achieve a mechanized proof of combinatorial identities by constructing special auxiliary functions and demonstrating that they satisfy specific recurrence relations. We propose WZ-LLM, a neuro-symbolic framework that turns WZ proof plans into executable proof sk...
42	Geometry-Aware Neural Optimizer for Shape Optimization and Inversion 2605.04474 Geometry-Aware Neural Shape Optimization: Proposes a geometry-aware neural optimizer that provides end-to-end gradients for shape optimization and inversion.	cs.LG	Guoze Sun, Tianya Miao, Haoyang Huang, Huaguan Chen, Han Wan	Geometry is central to PDE-governed systems, motivating shape optimization and inversion. Classical pipelines conduct costly forward simulation with geometry processing, requiring substantial expert effort. Neural surrogates accelerate forward analysis but do not close the loop because gradients from objectives to geometry are often unavailable. Existing differentiable methods either rely on restrictive parameterizations or unstable latent optimization driven by scalar objectives, limiting inter...
44	Data-dependent Exploration for Online Reinforcement Learning from Human Feedback 2605.04477 Exploration in Online RLHF: Proposes a data-dependent exploration strategy to improve the sample efficiency of online RLHF.	cs.LG	Zhen-Yu Zhang, Yuting Tang, Jiandong Zhang, Lanjihong Ma, Masashi Sugiyama	Online reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs) by continuously collecting new preference feedback during training. A foundational challenge in this setting is exploration, which requires algorithms that enable the LLMs to generate informative comparisons that improve sample-efficiency in online RLHF. Existing exploration strategies often derive bonuses via on-policy expectations, which are difficult to estima...
49	Towards General Preference Alignment: Diffusion Models at Nash Equilibrium 2605.04494 Preference Alignment for Diffusion Models: Formulates diffusion preference alignment as a Nash equilibrium to improve general alignment.	cs.LG cs.CV	Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis	Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human prefere...
53	Quadrature-TreeSHAP: Depth-Independent TreeSHAP and Shapley Interactions 2605.04497 Depth-Independent TreeSHAP: Reformulates TreeSHAP with numerical quadrature to make it depth-independent and support higher-order interactions.	cs.LG	Ron Wettenstein, Rory Mitchell, Peng Yu	Shapley values are a standard tool for explaining predictions of tree ensembles, with Path-Dependent SHAP being the most widely used variant. Despite substantial progress, existing methods still exhibit trade-offs between depth-dependent runtime, numerical stability, and support for higher-order interactions. To address these challenges, we introduce Quadrature-TreeSHAP, a quadrature-based reformulation of Path-Dependent TreeSHAP that is numerically stable, naturally extends to any-order Shapley...
57	Gradient Scaling Effects in Adaptive Spectral PINNs for Stiff Nonlinear ODEs 2605.04502 Optimization in Spectral PINNs: Analyzes the gradient scaling induced by IC gating and improves PINN training on stiff ODEs.	cs.LG	Isabela M. Yepes, Pavlos Protopapas	Physics-Informed Neural Networks (PINNs) often struggle to train reliably on stiff and oscillatory dynamical systems due to poor optimization conditioning. While prior work has emphasized representational remedies such as spectral parameterizations, the optimization implications of initial-condition (IC) embeddings in adaptive spectral PINNs have not been well characterized. In this work, we show that the choice of IC gating function induces explicit time-dependent gradient scaling, which intera...
67	FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling 2605.04519 Federated learning for scATAC-seq: Uses a federated learning framework with adaptive sampling for privacy-preserving scATAC analysis.	cs.LG stat.ML	Guangyi Zhang, Yi Dai, Yiyun He, Junhao Liu	Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key...
72	YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts 2605.04528 Zero-shot cross-domain fault diagnosis: Uses a domain-conditioned mixture of experts to achieve zero-shot cross-domain generalization in mechanical fault diagnosis.	cs.LG cs.MA	Zesen Wang, Zihao Wu, Yue Hu, Yang Gao, Fuzhen Xuan	Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep learning based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1...
77	From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics 2605.04535 Video-to-PDE system identification: Robustly extracts fields from dye plume videos and learns a nonlinear PDE via a weak formulation.	cs.LG math.NA physics.comp-ph stat.AP stat.ML	Cesar Acosta-Minoli, Sayantan Sarkar	Inferring continuum models directly from video is hampered by two facts: the recorded field is uncalibrated image intensity rather than a physical state, and direct numerical differentiation of noisy frames is unstable. We develop a video-to-PDE pipeline that converts grayscale recordings of an ink plume into a normalised scalar field $u(x,y,t)$, isolates a bulk drift $\mathbf{v}(t)$ from intrinsic spreading via the intensity-weighted centroid, and identifies an effective transport law by weak-f...
80	Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation 2605.04542 Power distribution linking sampling and RL: Shows that the power distribution gives a unified account of the relationship among sampling, self-reward RL, and self-distillation.	cs.LG	Akiyoshi Tomihari, Issei Sato	Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including power sampling, have emerged as effective ways to improve LLM performance. However, the relationship among RL, distillation, and sampling remains unclear. In this study, we focus on the power distribution, the target distribution of power sampling, and show that the power distribution bridges sampli...
83	Event-Based Early Warning of Vineyard Disease Risk from Environmental Time Series 2605.04548 Event-based vineyard disease early warning: Replaces daily classification with event prediction to provide actionable disease warnings from environmental series.	cs.LG	Ivica Dimitrovski, Ivan Kitanovski, Danco Davcev, Slobodan Kalajdziski, Kosta Mitreski	Accurate early warning of vineyard disease risk from environmental observations is essential for timely intervention and more sustainable crop protection. However, many existing studies formulate disease prediction as daily presence classification, which can favor persistence-driven predictions and provide only limited support for actionable short-horizon warning. In this paper, we present an event-based approach for early warning of vineyard disease risk from environmental time series and evalu...
87	Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models 2605.04555 Counterfactual model-based HVAC RL: Uses MBRL with counterfactual building models for more data-efficient learning of HVAC control policies.	cs.LG eess.SY	Jan Marco Ruiz de Vargas, Fabian Raisch, Zoltan Nagy, Pierre Pinson, Christoph Goebel	Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models attempt to predict the entire state-space, including weat...
92	Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination 2605.04568 Gradient-based MPC in latent space: Uses a latent imagination model to enable differentiable, gradient-based MPC planning and control.	cs.LG cs.AI cs.RO	Jonathan Spieler, Sven Behnke	State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expens...
102	HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily 2605.04594 Heterophily heterogeneous graph learning: Decouples semantics from structure to improve representation learning on heterophilous heterogeneous graphs.	cs.LG cs.AI	Xinyi Li, Ming Li, Lu Bai, Lixin Cui, Feilong Cao	Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decouplin...
103	A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints 2605.04595 Queueing analysis of LLM inference: Uses queueing theory to analyze LLM inference stability under joint computation and KV cache constraints.	cs.LG cs.AI math.OC	Chengyi Nie, Nian Si, Zijie Zhou	The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and the memory overhead of key-value (KV) caching, which accelerates decoding but quickly exhausts GPU memory. In this paper, we introduce the first queueing-theoretic framework that explicitly incorporates both computation and GPU memory constraints into the analysis of LLM inference. Based on this ...
112	Library learning with e-graphs on jazz harmony 2605.04622 E-graph library learning for jazz harmony: Uses e-graph library learning to induce concise generative rules from a jazz harmony corpus.	cs.LG cs.AI cs.SC	Zeng Ren, Maddy Bowers, Xinyi Guan, Martin Rohrmeier	Humans can acquire a highly structured intuitive understanding of musical patterns, yet these patterns often require multiple iterations of reflection and re-listening to internalize fully. To capture such an internalization process, we present a computational model for the learning of jazz harmonic patterns based on library learning. Given a corpus of harmonic progressions, our model searches over a space of programs composed of primitive harmonic relations in order to discover concise generati...
118	FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation 2605.04651 Forward-only fast-weights adaptation: Compiles labeled examples into fast weights in a single forward pass for test-time supervised adaptation.	cs.LG cs.CL	Guangsheng Bao, Hongbo Zhang, Han Cui, Ke Sun, Yanbin Zhao	Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification a...
120	Threshold-Guided Optimization for Visual Generative Models 2605.04653 Threshold-guided alignment for generative models: Proposes threshold-guided optimization to align visual generative models efficiently using scalar ratings.	cs.LG	Jinbin Bai, Yu Lei, Qingyu Shi, Aosong Feng, Yi Xin	Aligning large visual generative models with human feedback is often performed through pairwise preference optimization. While such approaches are conceptually simple, they fundamentally rely on annotated pairs, limiting scalability in settings where feedback is collected as independent scalar ratings. In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample's reward to an instance-specific baseline that is generally intract...
122	Evidence-based anomaly detection in clinical domains 2605.04664 Clinical Anomaly Detection: Uses probabilistic models such as Bayesian networks to detect anomalous clinical management decisions.	cs.LG	Milos Hauskrecht, Michal Valko, Branislav Kveton, Shyam Visweswaran, Gregory Cooper	Anomaly detection methods can be very useful in identifying interesting or concerning events. In this work, we develop and examine new probabilistic anomaly detection methods that let us evaluate management decisions for a specific patient and identify those decisions that are highly unusual with respect to patients with the same or similar condition. The statistics used in this detection are derived from probabilistic models such as Bayesian networks that are learned from a database of past pat...
124	Feature importance analysis for patient management decisions 2605.04666 Clinical Decision Feature Importance: Analyzes how electronic health record features influence lab-order and medication decisions and reports importance statistics.	cs.LG	Michal Valko, Milos Hauskrecht	The objective of this paper is to understand what characteristics and features of clinical data influence physician's decision about ordering laboratory tests or prescribing medications the most. We conduct our analysis on data and decisions extracted from electronic health records of 4486 post-surgical cardiac patients. The summary statistics for 335 different lab order decisions and 407 medication decisions are reported. We show that in many cases, physician's lab-order and medication decision...
126	ITBoost: Information-Theoretic Trust for Robust Boosting 2605.04671 Robust Boosting with Noisy Labels: Assesses sample reliability with an information-theoretic trust measure to make boosting more robust to noisy labels.	cs.LG	Ye Su, Longlong Zhao, Diego Garcia-Gil, Jipeng Guo, Gangchun Zhang	Gradient boosting remains a strong and widely used method for tabular data learning, but its performance often degrades when training labels are noisy. This behavior is largely related to the way boosting algorithms emphasize samples with large gradients, without explicitly accounting for whether such errors originate from informative hard cases or from unreliable labels. We address this issue by reconsidering how sample reliability is evaluated during boosting. Instead of relying on instantaneo...
131	HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction 2605.04682 Spatial Transcriptomics Prediction: Proposes a hexagonal shifted-window Transformer that predicts spatial gene expression from H&E slides and matches hexagonal sampling.	cs.LG cs.CV	Keunho Byeon, Jin Tae Kwak	Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, a...
134	Learning Time-Inhomogeneous Markov Dynamics in Financial Time Series via Neural Parameterization 2605.04690 Nonstationary Markov Modeling: Learns time-varying Markov dynamics through neural parameterization to model non-stationarity in financial time series.	cs.LG q-fin.MF	Jan Rovirosa, Jesse Schmolze	Modeling the dynamics of non-stationary stochastic systems requires balancing the representational power of deep learning with the mathematical transparency of classical models. While classical Markov transition operators provide explicit, theoretically grounded rules for system evolution, their empirical estimation collapses due to severe data sparsity when applied to high-resolution, high-noise environments. We explore this statistical barrier using financial time series as a canonical, real-w...
139	Differentiable Chemistry in PINNs for Solving Parameterized and Stiff Reaction Systems 2605.04708 PINNs for Stiff Chemistry: Integrates a differentiable chemistry solver into a PINN to solve parameterized, stiff reaction systems.	cs.LG	Miloš Babić, Franz M. Rohrhofer, Stefan Posch	From neural ODEs to continuous-time machine learning, differentiable solvers allow physics, optimization, and simulation to become trainable components within deep learning systems. This has opened the path to a new generation of deep learning frameworks for scientific computing, with many promising applications still emerging. In this paper, we integrate a differentiable chemistry solver into a modified physics-informed neural network to solve parameterized reaction systems that are inherently ...
140	ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC 2605.04709 Long-Horizon Visual MPC: Proposes ELVIS, which uses ensemble-calibrated latent imagination to make long-horizon visual MPC planning more reliable.	cs.LG cs.RO eess.SY	Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry	A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model...
142	SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning 2605.04712 Continual RL Plasticity in MoE: Proposes SPHERE to mitigate the loss of spectral plasticity and the performance degradation of MoE in continual reinforcement learning.	cs.LG	Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li	In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating ...
145	Exact Dual Geometry of SOC-ICNN Value Functions 2605.04722 Dual Geometry of SOC-ICNNs: Derives the exact dual geometry of SOC-ICNN value functions to support interpretable convex inference.	cs.LG cs.AI math.OC	Kang Liu, Jianchen Hu, Wei Peng	Input Convex Neural Networks (ICNNs) are commonly used in a two-stage manner: one first trains a convex network and then minimizes it over its input in a downstream inference problem. Recent second-order-cone ICNNs (SOC-ICNNs) enrich ReLU-based ICNNs with quadratic and conic modules and admit an exact representation as value functions of second-order cone programs (SOCPs). This value-function structure enables an explicit convex-analytic treatment of SOC-ICNN inference. In this paper, we study t...
148	Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols 2605.04727 Programming Knowledge Tracing Reliability: Reproduces experiments and evaluates how sensitive attention-augmented PKT models are to implementation details and protocol choices.	cs.LG cs.SE	Jaewook Kim, Hyeoncheol Kim	Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction ...
153	Using Common Random Numbers for Simulation-based Planning with Rollouts 2605.04732 Common random numbers for rollout planning: Analyzes how common random numbers affect utility estimation and action selection in rollout planning.	cs.LG	Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan	Simulation-based planning with rollouts is a widely-deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple rec...
155	OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization 2605.04738 Low-bit LLM weight quantization: Proposes an outlier self-absorption method to improve the accuracy of low-bit post-training LLM quantization.	cs.LG	Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu	Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and r...
157	MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction 2605.04744 Genotype-environment interaction prediction for breeding: Combines mixed models and deep learning to predict genotype × environment interactions and accelerate breeding.	cs.LG	Aike Potze, Fred van Eeuwijk, Ioannis N. Athanasiadis	Plant breeding underpins global food security through incremental, accumulating improvements in crop yield, quality and sustainability, achieved via repeated cycles of crop ranking, selection and crossing. Climate change disrupts this process by altering local growing conditions, thereby shifting the relative performance of crop genotypes. Predicting these relative changes in yield is critical for food security. Yet, this problem remains an open challenge in plant breeding, and relatively unexpl...
158	Knowledge-Free Correlated Agreement for Incentivizing Federated Learning 2605.04747 Contribution incentive mechanism for federated learning: Proposes a correlated-agreement reward mechanism that needs no ground truth to incentivize federated learning clients truthfully.	cs.LG cs.AI cs.GT	Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger	We introduce Knowledge-Free Correlated Agreement (KFCA) to reward client contributions in federated learning (FL) without relying on ground truth, a public test set, or distribution knowledge. Under categorical reports and an honest majority, KFCA is strictly truthful, addressing the label-flipping vulnerability of Correlated Agreement (CA). We evaluate KFCA on federated LLM adapter tuning and a real-world PCB inspection task, showing efficient real-time reward computation suitable for decentral...
162	AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures 2605.04754 Approximate multipliers and MoE inference: Evaluates the combined impact of approximate multipliers on the accuracy, efficiency, and energy consumption of MoE networks.	cs.LG cs.AR	Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan	Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction rema...
164	Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop 2605.04761 Cognitive twin modeling for education: Builds a hierarchical, interpretable personalized thinking model and adds human-in-the-loop collaboration to improve performance.	cs.LG cs.AI cs.HC	Wu-Yuin Hwang, Nur Alif Ilyasa, Muhammad Irfan Luthfi, Yuniar Indrihapsari	This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano's New Taxonomy of Educational Objectives and tries to clone learner's thinking model and build cognitive twin. It was construc...
171	Bilinear Mamba-Koopman Neural MPC for Varying Dynamics 2605.04793 Koopman neural MPC for varying dynamics: introduces control-dependent bilinear coupling to improve the adaptation of Koopman neural MPC to time-varying dynamics.	cs.LG math.OC	Matan Pagi, Zohar Sorek	Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution. We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent ...
174	A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction 2605.04813 Dynamic QoS prediction via tensor decomposition: proposes a biased nonnegative block term tensor decomposition for predicting dynamic service QoS.	cs.LG	Wenjing Liu, Yujia Lei, Qu Wang	With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic...
175	Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs 2605.04819 SAT unsat-core prediction: a GNN performs polarity-aware representation learning over clause-literal hypergraphs to predict unsat cores.	cs.LG	Zhenchao Sun, Shuai Ma, Ping Lu, Chongyang Tao	Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in mo...
176	Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding 2605.04825 Initial data coverage design for FMQA: designs initial sampling around marginal bit coverage to improve FMQA optimization under one-hot encoding.	cs.LG cond-mat.stat-mech	Taiga Hayashi, Yuya Seki, Kotaro Terada, Yosuke Mukasa, Shuta Kikuchi	Factorization machine with quadratic-optimization annealing (FMQA) is a black-box optimization method that combines a factorization machine (FM) surrogate with QUBO-based search by an Ising machine. When FMQA is applied to integer or discretized continuous variables via one-hot encoding, uniform random initial sampling can leave many binary variables never active in the initial training data, and the corresponding FM parameters receive no direct gradient updates from the observed responses. We a...
177	Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity 2605.04827 Trustworthy federated label distribution learning: proposes a trustworthy Fed-LDL method for robust aggregation and noise suppression under annotation quality disparity.	cs.LG	Junxiang Wu, Zhiqiang Kou, Hongwei Zeng, Wenke Huang, Biao Liu	Label Distribution Learning (LDL) models supervision as an instance-wise probability distribution, enabling fine-grained learning under inherent ambiguity, but its success relies on high-fidelity label distributions that are costly to obtain and thus often noisy. Motivated by privacy-sensitive applications, we study Federated Label Distribution Learning (Fed-LDL), where data isolation further induces heterogeneous annotation quality across clients, making local updates unevenly reliable and brea...
178	Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models 2605.04830 Phase transitions and nonlocality in diffusion models: studies whether the symmetry-breaking and nonlocality critical transitions occur concurrently in diffusion transformers.	cs.LG cond-mat.stat-mech	Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao	Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the...
180	Replay-Based Continual Learning for Physics-Informed Neural Operators 2605.04832 Continual learning for physics-informed neural operators: uses replay-based continual learning to improve the performance of physics-informed neural operators on out-of-distribution data.	cs.LG	Yizheng Wang, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu	Neural operators generally demonstrate strong predictive performance on in-distribution (ID) problems. However, a critical limitation of existing methods is their significant performance degradation when encountering out-of-distribution (OOD) data. To address this issue, this work introduces continual learning into physics-informed neural operators, with particular emphasis on neural operators built upon the Transolver architecture, and proposes a simple yet effective replay-based continual lear...
181	Bridging Input Feature Spaces Towards Graph Foundation Models 2605.04834 Graph feature space alignment: proposes ALL-IN, a projection that aligns node features for transfer across graph datasets.	cs.LG	Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua, Bruno Ribeiro, Carola-Bibiane Schönlieb	Unlike vision and language domains, graph learning lacks a shared input space, as input features differ across graph datasets not only in semantics, but also in value ranges and dimensionality. This misalignment prevents graph models from generalizing across datasets, limiting their use as foundation models. In this work, we propose ALL-IN, a simple and theoretically grounded method that enables transferability across datasets with different input features. Our approach projects node features in...
185	Quantile-Free Uncertainty Quantification in Graph Neural Networks 2605.04847 Uncertainty quantification for GNNs: proposes QpiGNN, which builds quantile-free prediction intervals on graphs under weak assumptions.	cs.LG cs.AI	Soyoung park, Hwanjun Song, Sungsu Lim	Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice. Moreover, achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR...
186	Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations 2605.04853 Neural-corrected PDE solvers: a neural operator learns the truncation error to correct a low-regularity integrator for solving dispersive PDEs.	cs.LG	Zhangyong Liang	We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Li...
192	Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models 2605.04874 Uncertainty-aware multimodal DPO: proposes uncertainty-aware exploratory DPO to reduce visual hallucination in multimodal models.	cs.LG cs.CL cs.CV	Huatian Zhang, Zhendong Mao, Lei Zhang, Yongdong Zhang	Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. One of its key challenges lies in how to transfer the sequence-level preference into fine-grained supervision on visual fidelity. To safeguard vision-related tokens that are prone to hallucination, existing methods typically allocate training emphasis according to the model's self-assessed visual sensitivity signal...
195	A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs 2605.04880 Average-reward RL in SMDPs: recasts the SMDP average-reward objective in harmonic mean form and derives corresponding algorithms.	cs.LG cs.AI	Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka	Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are...
202	Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics 2605.04893 Spectral diagnostics of attention transport: proves that symmetric spectral diagnostics have a structural limitation in attention transport, namely that they are insensitive to orientation.	cs.LG cs.CL stat.ML	Dominik Dahlem, Diego Maniloff, Mac Misiura	Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation...
203	Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization 2605.04895 Regime-conditioned transfer Bayesian optimization: proposes evaluating transfer Bayesian optimization methods conditioned on regime variables such as budget and prior quality.	cs.LG stat.ML	Noel Thomas	Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from NeurIPS, ICML, ICLR, AISTATS, UAI, TMLR, JMLR, and AutoML-Conf (2022-2025) finds that 98% never vary B/|A| as a controlled axis. On the same GDSC2 benchmark, changing only the budget reverses the ranking: at B=50, Greedy out...
205	A geometric relation of the error introduced by sampling a language model's output distribution to its internal state 2605.04899 Geometric analysis of LM sampling error: derives a 1-form and its curvature from embedding geometry to characterize sampling error and relate it to semantic state.	cs.LG	Albert F. Modenbach	GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf i...
207	Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs 2605.04903 LLM-driven NAS via code diffs: has a fine-tuned LLM generate code diffs that iteratively modify baseline networks for efficient NAS.	cs.LG cs.AI cs.CV	Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov	Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHa...
209	Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features 2605.04905 Robust feature importance across models: compares feature importance across multiple models to separate robust from model-dependent variables in electrospinning.	cs.LG cs.DB	Mehrab Mahdian, Ferenc Ender, Tamas Pardy	Electrospinning is a highly sensitive fabrication process in which small variations in operating parameters can significantly influence fiber morphology and material performance. Machine learning (ML) methods are increasingly employed to model these process-structure relationships and to identify the relative importance of processing variables. However, most existing studies rely on a single ML model, implicitly assuming that the resulting feature importance is robust and reproducible. In this s...
212	Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning 2605.04911 Private tabular data synthesis: generates tabular data with in-context learning to improve quality while reducing memorization leakage.	cs.LG	Xinyan Han, Yan Lu, Xiaoyu Lin, Yuanyuan Jiang, Yuanrui Wang	Tabular data synthesis aims to generate high-quality data while preserving privacy. However, we find that existing tabular generative models exhibit a clear tradeoff in the small-data regime: improving data quality typically comes at the cost of increased memorization of training samples, thereby weakening privacy protection. This tradeoff arises because small training sets make it difficult for dataset-specific generative models to distinguish generalizable structure from sample-specific patter...
215	Koopman Identification of Nonlinear Systems via Reservoir Liftings 2605.04917 Koopman learning with reservoirs: uses reservoir liftings to construct Koopman dictionaries that linearize nonlinear dynamical systems.	cs.LG cs.RO	Weibin Gu, Chen Yang, Lu Shi	Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by the Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets the reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees...
217	Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization 2605.04920 RL for compositional generalization: replaces imitation learning with outcome-level reinforcement learning to improve compositional generalization.	cs.LG cs.CL	Xiyan Fu, Wei Liu	Compositional generalization refers to correctly interpreting novel combinations of known primitives, which remains a major challenge. Existing approaches often rely on supervised fine-tuning, which encourages models to imitate target outputs. This token-level training paradigm fails to capture the global compositional structure required for generalizing to unseen combinations. In this work, we investigate whether compositional generalization can instead be improved through outcome-level reinforcem...
220	When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data 2605.04930 Failure modes of GRN inference: uses a controlled diagnostic benchmark to analyze when causal GRN inference methods fail on single-cell data.	cs.LG cs.AI q-bio.GN q-bio.QM stat.ML	Miguel Fernandez-de-Retana, Ruben Sanchez-Corcuera, Unai Zulaika, Aritz Bilbao-Jayo, Aitor Almeida	Despite theoretical advantages, causal methods for Gene Regulatory Network (GRN) inference from single-cell RNA-seq data consistently fail to match or outperform correlation-based baselines in many realistic benchmarks, a persistent puzzle which casts doubt on the value of causality for this task. We argue that existing benchmarks are insufficiently controlled to answer this question because they evaluate on real or semi-real data where multiple pathologies co-occur, confounding failure modes, a...
226	Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks 2605.04946 BatchNorm effects on partition geometry: analyzes how training-time BN changes the hyperplane and region-partition geometry of piecewise-affine networks.	cs.LG stat.ML	Xuan Qi, Yi Wei, Fanqi Yu, Furao shen, Vittorio Murino	Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are pa...
228	Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts 2605.04952 Efficient routing for granular MoE: proposes inverted-index-style adaptive routing to reduce the routing cost of fine-grained MoE.	cs.LG	Klaus-Rudolf Kladny, Maximilian Mordig, Bernhard Schölkopf, Michael Muehlebach	Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ...
230	Order-based Rehearsal Learning 2605.04955 Order-based rehearsal learning: proposes a rehearsal learning method that uses only order structure to make decisions that avoid undesired futures.	cs.LG	Yu-Xuan Tao, Tian-Zuo Wang, Zhi-Hua Zhou	When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-bas...
231	KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels 2605.04956 Benchmark for LLM-generated GPU kernels: releases KernelBench-X to evaluate the correctness and efficiency of LLM-generated Triton kernels.	cs.LG cs.PF	Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen	LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than metho...
232	Delving into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series 2605.04957 Conformal prediction for graph time series: studies how non-exchangeability affects conformal prediction coverage in graph-structured multivariate time series.	cs.LG	Ruichao Guo, Xingyao Han, Luo Wenshui, Zhe Liu, Chen Gong	Point forecasting for graph-structured multivariate time series is a fundamental problem, but rigorous uncertainty quantification for such predictions is still underexplored. Conformal prediction (CP) offers uncertainty estimation with a solid coverage guarantee under the exchangeability assumption, which requires the joint data distribution to be unchanged under permutation. However, in graph-structured time series, inherent cross-node coupling can violate the exchangeability condition, making ...
233	EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance 2605.04960 Improved GRPO for RLVR: proposes EP-GRPO to mitigate GRPO's credit assignment problems and improve reinforcement learning for reasoning.	cs.LG cs.AI	Song Yu, Li Li, Wenwen Zhao, Zhisheng Yang	Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token inf...
235	Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport 2605.04965 Optimal transport for distribution shifts: proposes ReshapeOT, which reshapes the ground metric using sample displacements to model distribution shift more reliably.	cs.LG cs.AI	Philip Naumann, Jacob Kauffmann, Klaus-Robert Müller, Grégoire Montavon	Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes the ground metric by integrating observed sample displacements as an additional source of knowledge. Technically, ReshapeOT replaces th...
236	Skill Neologisms: Towards Skill-based Continual Learning 2605.04970 Skill-based continual learning via soft tokens: extends LLMs with new skills through trainable soft tokens ("skill neologisms") while reducing forgetting.	cs.LG cs.AI	Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar	Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--i.e., soft tokens integrated in the model's vocabulary and optimized to improv...
237	Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking 2605.04971 Geometric continuity in deep networks: explains how residual connections and symmetry breaking produce geometric continuity, the alignment of singular vectors across adjacent layers.	cs.LG cs.AI cs.CL	Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh	Weight matrices in deep networks exhibit geometric continuity -- principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventin...
242	Conceptors for Semantic Steering 2605.04980 Semantic steering with conceptors: uses conceptors, soft projection matrices, to preserve multidimensional concept subspaces when steering LLM behavior.	cs.LG cs.CL	Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai	Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vec...
243	Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers 2605.04984 Verifier-free credit assignment: proposes self-induced outcome potential to give long-horizon agents label-free, turn-level reward signals.	cs.LG cs.CL	Senkang Hu, Yong Dai, Xudong Han, Zhengru Fang, Yuzhi Zhao	Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory...
246	Federated Learning for Early Prediction of EV Charging Demand 2605.04993 Federated learning for charging demand prediction: uses federated learning to predict the total energy demand of an EV session early in charging, in support of scheduling.	cs.LG cs.AI	Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis, Panagiotis Thomakos, Andreas Sendros	Accurate forecasting of electric vehicle (EV) charging demand is critical for grid stability, infrastructure planning, and real-time charging optimization. In this work, we study the problem of early prediction of charging demand, where the total energy of a session is estimated using only information available at plug-in time and during the first minutes of charging. This enables actionable decisions while the session is still in progress, which is of direct importance for EV network operators....
247	Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning 2605.04995 Adaptive-query learning theory: compares the approximation power of in-context and agentic (adaptive-query) learning under realizability constraints.	cs.LG math.ST stat.ML	Anastasis Kratsios, A. Martina Neuman, Philipp Petersen	We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to ...
248	DualTCN: A Physics-Constrained Temporal Convolutional Network for 2 Time-Domain Marine CSEM Inversion 2605.04997 Physics-constrained CSEM inversion: proposes DualTCN, a physics-constrained temporal convolutional network for inverting marine CSEM transient parameters.	cs.LG	Khaled Ahmed, Ghada Omar	DualTCN is the first deep-learning framework for inverting time-domain marine controlled-source electromagnetic (MCSEM) transient data. Moving away from traditional subsurface discretization, the framework regresses four earth-model parameters -- $σ_1$, $σ_2$, $d_1$, $d_2$ -- and reconstructs conductivity-depth profiles using a differentiable soft-step decoder. The optimized architecture (379K parameters) features a Temporal Convolutional Network (TCN) encoder paired with a late-time branch and ...
254	Learned Neighbor Trust for Collaborative Deployment in Model-Agnostic Decentralized Learning 2605.05009 Decentralized collaborative inference: learns neighbor trust so that decentralized nodes can combine predictions at deployment instead of acting in isolation.	cs.LG	Michael Lanier, Luise Ge, Sastry Kompella, Yevgeniy Vorobeychik	Many decentralized distillation methods are designed around training-time coordination, yet deploy each node in isolation even when more capable neighbors remain available at inference time. This is an incomplete objective for settings such as IoT, where devices are heterogeneous, data is scarce and skewed, and a node's strongest neighbors may far exceed its own local capacity. We study how nodes should train so that their predictions compose well at deployment, and how each node should learn wh...
258	Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning 2605.05020 Multi-agent diversity measurement: proposes Graph-SND, which approximates SND by aggregating over sparse graph edges to reduce the cost of computing diversity.	cs.LG cs.MA	Shawn Ray	System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yiel...
259	CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels 2605.05023 LLM understanding and reconstruction of CUDA kernels: proposes CuBridge, which uses LLMs to understand and reconstruct high-performance attention CUDA kernels, balancing performance and flexibility.	cs.LG	Xing Ma, Yangjie Zhou, Wu Sun, Zihan Liu, Jingwen Leng	Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex opera...
264	The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence 2605.05029 Causal gap in predictive representations: gives an impossibility theorem and large-scale evidence that predictive encoders favor the environment over the system's causal structure.	cs.LG	Kejun Liu	We report a systematic failure mode in predictive representation learning. Across 2695 neural network configurations trained to predict linear-Gaussian dynamics, the optimal encoder tracks the environment rather than the system it is meant to model. The mean causal fidelity -- the fraction of encoder sensitivity allocated to system degrees of freedom -- is 0.49, and only 2.5% of configurations exceed 0.70. The failure intensifies with dimension: at N=100, the optimal encoder becomes causally bli...
267	Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization 2605.05040 Preference-based self-distillation training: proposes reward-regularized preference self-distillation that goes beyond KL matching to improve stability and performance.	cs.LG cs.AI	Xin Yu, Liuchen Liao, Yiwen Zhang, Yingchen Yu, Lingzhou Xue	On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and...
273	Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework 2605.05055 AoA outdoor localization learning: proposes an adaptive training and feature-selection framework for AoA-based localization in 5G/6G.	cs.LG cs.AI eess.SP	Bac Trinh-Nguyen, Sara Berri, Sin G. Teo, Tram Truong-Huu, Arsenia Chorti	Localization in 5G and 6G networks is essential for important use cases such as intelligent transportation, smart factories, and smart cities. Although deep learning has enabled improving localization accuracy, depending on the deployment scenario and the effort required for dataset collection campaigns on a given infrastructure, the training process for localization models can vary significantly. Furthermore, with respect to feature selection, recent works have demonstrated the robustness of an...
276	Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry 2605.05062 CMP modeling with FCN: trains a fully convolutional network on white light interferometry data for full-chip CMP topography modeling.	cs.LG	Jules Exbrayat, Renan Bouis, Elie Sezestre, Viorel Balan, Arnaud Cornelis	As time-to-market is crucial in the Integrated Circuit (IC) industry, speeding up layout manufacturability verification is essential. Chemical-Mechanical Polishing (CMP) plays a vital role in IC fabrication but is significantly influenced by Layout-Dependent Effects (LDE). An accurate and efficient CMP model enables design teams to correct surface unevenness before fabrication, reducing costs and accelerating the design phase. However, existing models often rely on Density Step Height (DSH) mod...
277	Expert Routing for Communication-Efficient MoE via Finite Expert Banks 2605.05278 Communication-efficient MoE routing: treats the MoE gate as a channel and uses finite expert banks to improve routing and communication efficiency.	cs.LG cs.IT	Mohammad Reza Deylam Salehi, Ali Khalesi	Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we d...
284	Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations 2605.05081 Imitation learning for plasma control: distills a fully observed expert policy into a plasma-stabilizing controller that uses only macroscopic observations.	cs.LG math.AP math.OC physics.plasm-ph	Xiaofan Xia, Qin Li, Wenlong Mou	We consider the stabilization of Vlasov--Poisson plasma dynamics, a central control problem in nuclear fusion. Our focus is the gap between what an ideal controller would use and what experiments can actually observe: while optimal policy may rely on the full phase-space state, practical feedback is typically limited to sparse macroscopic diagnostics. We therefore study imitation learning methods that distill a fully observed expert policy into controllers operating only on macroscopic measureme...
286	Order Matters: Improving Domain Adaptation by Reordering Data 2605.05084 Domain adaptation via data reordering: optimally reorders training data to reduce the variance of discrepancy estimates and improve UDA.	cs.LG	Andrea Napoli, Paul White	Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique whi...
287	Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis 2605.05088 Multimodal building energy prediction: uses a gated multimodal model to predict building energy performance and analyze the impact of retrofit scenarios.	cs.LG physics.soc-ph	Yunfei Bai, Aaron Tesfa Tsion, Raul Rosales, Barbara Shollock, Wei He	Achieving resilient and sustainable cities requires scalable approaches to decarbonising residential buildings, which account for about 20% of UK greenhouse gas emissions and 25% of energy-related emissions in the European Union. Energy Performance Certificates (EPCs) support regulation and retrofit planning, but their reliance on on-site inspections limits timely city-scale assessment. This study introduces a gated multimodal model to predict Standard Assessment Procedure (SAP) energy efficienc...
293	Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics 2605.05097 Multi-timescale memory for LLMs: proposes a multi-timescale external memory mechanism for continual knowledge updating in LLM systems.	cs.LG cs.AI cs.CL	Andreas Pattichis, Constantine Dovrolis	LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen what repetition confirms, and let the rest fade. We argue that external memory should follow a similar principle. In Memini, this view takes the form of an associative memory that org...
294	Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning 2605.05102 Distributional regret in RL/bandits: gives a unified characterization of distributional regret in bandits and reinforcement learning, with algorithms and bounds.	cs.LG stat.ML	Harin Lee, Min-hwan Oh	We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $δ\in (0,1]$, thereby characterizing the regret distribution across the full range of $δ$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_...
298	Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime 2605.05112 Pass-rate control in binary-reward RL: steers binary-reward RL toward its most informative regime by controlling the rollout pass rate.	cs.LG	Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao	Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Pr...
299	How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences 2605.05113 Finite-width recurrent signal propagation: derives signal propagation formulas for finite-width linear recurrences and analyzes how long the infinite-width approximation remains valid.	cs.LG	Mariia Seleznova	We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state s...
300	Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior 2605.05115 Activation manifold steering: intervenes on representations along geometric paths on the activation manifold to test their causal effect on model behavior.	cs.LG	Daniel Wurgaft, Can Rager, Matthew Kowal, Vasudev Shyam, Sheridan Feucht	Neural representations carry rich geometric structure; but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries, and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and ...
301	On the Hardness of Junking LLMs 2605.05116 Hardness of LLM jailbreak attacks: studies hardness bounds for jailbreaking LLMs with unstructured, random-search prompts.	cs.LG	Marco Rando, Samuel Vaiter	Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated pro...
302	On the Wasserstein Gradient Flow Interpretation of Drifting Models 2605.05118 Theory of drifting generative models: analyzes the fixed-point mechanism of generative modeling via drifting from the Wasserstein gradient flow perspective.	cs.LG cs.AI stat.ML	Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli	Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main re...
303	Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals 2605.05120 Driver behavior classification from physiological signals: selects features with SHAP and fuses EEG/EMG/GSR via gradient boosting to recognize driving behaviors.	cs.LG eess.SP	Sahar Askari, Mohammad Mahdi Mirza Ali Mohammadi, Fatemeh Ensafdoust, Amin Golnari, Saeid Sanei	An interpretable and scalable framework for decoding driving behaviors from multimodal physiological signals is proposed in this study. We utilize a large-scale multimodal physiological driving behavior dataset comprising synchronized electroencephalogram (EEG), electromyography (EMG), and galvanic skin response (GSR) signals. Our approach involves rigorous preprocessing followed by a domain-specific feature extraction pipeline targeting time-domain, frequency-domain, and derived physiological ind...
305	Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning 2605.05123 Offline-to-online reinforcement learning: adaptively selects policies and fine-tunes them online under interaction budgets to improve O2O-RL performance.	cs.LG cs.AI	Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang, Miroslav Pajic, Yuichi Motai	In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can...
306	Conditional outlier detection for clinical alerting 2605.05124 Anomaly detection for clinical alerting: performs conditional outlier detection on EHR data to identify unusual patient-management actions and raise alerts.	cs.LG cs.CY	Milos Hauskrecht, Michal Valko, Shyam Visweswaran, Iyad Batal, Gilles Clermont	We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac s...
307	Forecasting Green Skill Demand in the Automotive Industry: Evidence from Online Job Postings 2605.05280 Green skill demand forecasting: extracts skills from automotive-industry job postings and forecasts green skill demand trends.	cs.LG	Sabur Butt, Joshua N. Arrazola E., Hector G. Ceballos, Patricia Caratozzolo	The global transition toward sustainable economies is reshaping labor markets, yet systematic methods for identifying and forecasting green skills remain limited. This study presents a computational framework to measure and predict green skill demand using online job postings from Mexico's automotive industry, which contributes about 4% of national GDP. We compile a dataset of job advertisements from Indeed Mexico, OCC Mundial, and LinkedIn (July 2024 to July 2025), yielding 204,373 skill record...
308	Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation 2605.05125 Causal effect estimation from incomplete EHRs: estimates joint treatment effects with temporal causal flow models combined with LLM-driven MNAR imputation.	cs.LG cs.AI	Olivia Jullian Parra, Sara Zoccheddu, David Catalan Cerezo, Tom Forzy, Franziska Ulrich	Target trial emulation (TTE) enables causal questions to be studied with observational data when randomized controlled trials (RCTs) are infeasible. Yet treatment-effect methods often address causal estimation, missingness, and temporal structure separately, limiting their robustness in electronic health records (EHRs), where time-varying confounding and missing-not-at-random (MNAR) biomarkers can reach 50%--80%. We propose a two-stage pipeline for treatment effect estimation from incomplete lon...
309	Transformed Latent Variable Multi-Output Gaussian Processes 2605.05133 Scalable multi-output Gaussian processes: proposes a transformed latent variable MOGP that keeps expressiveness and scalability with high-dimensional outputs.	cs.LG	Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez	Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions, such as employing low-rank or sum-of-separable kernels, which can limit expressiveness. We propose the Transformed Latent Variable MOGP (T-LVMOGP), a novel framework that scales MOGPs to a massive nu...
310	Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction 2605.05134 Black-box LLM hallucination detection: treats the LLM as a dynamical system and uses embedding-sequence prediction for low-cost hallucination detection.	cs.LG math.DS	Dan Wilson, Mohamed Akrout	Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations...
314	Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting 2605.05144 Human-AI co-mentorship in project-based learning: a case study of how AI tools support students' co-mentored learning in a financial forecasting project.	cs.LG cs.CY	Freyaa Chawla, Ahan Chawla, Rishi Singh, Joe Germino, Grigorii Khvatskii	This paper reflects on an AI research project carried out by a team of high-school and early-undergraduate students under the mentorship of graduate researchers and ably assisted by AI tools. We share our experience, covering not only the learning experience for the high school students but also how AI tools accelerated the process that enabled the high school students to focus on higher-order problem formulation and solution. Although the participants entered the project with limited background ...
316	Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting 2605.05151 Mechanistic interpretation of time series Transformers: uses sparse autoencoders to dissect the representational mechanisms Transformers use for forecasting and questions whether superposition is necessary.	cs.LG cs.AI	Alper Yıldırım	Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the interna...
326	Attribution-Guided Continual Learning for Large Language Models 2605.05285 Forgetting-resistant continual learning for LLMs: uses attribution information to choose which parameters to preserve or update, mitigating catastrophic forgetting.	cs.LG	Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie	Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awareness of internal knowledge distribution in LLMs. As a result, they cannot distinguish parameters that should be preserved or updated. We propose an attribution-guided continual fine-t...
330	Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer 2605.05176 ICL theory for nonlinear regression: analyzes how attention acts as a featurizer when Transformers perform in-context nonlinear regression.	cs.LG math.NA	Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai	Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realiz...
331	Estimating the expected output of wide random MLPs more efficiently than sampling 2605.05179 Expected-output estimation for wide MLPs: estimates the expected output of random MLPs without sampling, using cumulants and Hermite expansions.	cs.LG cond-mat.dis-nn stat.ML	Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton	By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite e...
345	Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS 2605.05330 Graph normalization dynamics for differentiable MWIS: proposes Graph Normalization, a differentiable dynamical system that quickly approximates maximum weight independent sets.	cs.LG cs.AI cs.DM cs.NE	Laurent Guigues	We introduce Graph Normalization (GN), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP-hard Maximum Weight Independent Set (MWIS) problem. MWIS encompasses many combinatorial challenges, including optimal assignment, scheduling, set packing, and MAP inference in discrete Markov Random Fields. Unlike Belief Propagation, we prove GN always converges to a binary indicator of a Maximum Independent Set. GN realizes a fast quasi-Newton descent th...
348	Feature Starvation as Geometric Instability in Sparse Autoencoders 2605.05341 Feature starvation mechanism in sparse autoencoders: explains dead SAE features as a geometric instability and suggests ideas for improved training.	cs.LG cs.AI math.OC stat.ML	Faris Chaudhry, Keisuke Yano, Anthea Monod	Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation (dead neurons) and shrinkage bias, often requiring computationally expensive heuristic resampling and nondifferentiable hard-masking methods to bypass these challenges. We argue that feature starvation is not merely an empirical artifact of poor data...
354	A Multi-Head Attention Approach for SLA Compliance Monitoring in Data Centers 2605.05354 SLA compliance monitoring model for data centers: encodes SLA rules as JSON to generate training data and uses a transformer to predict violation risk.	cs.LG	Omanshu Thapliyal	Service level agreements (SLAs) in data center colocation contracts define precise thresholds for power, temperature, and humidity, with tiered violation penalties expressed as credits against monthly recurring charges. Traditional reactive monitoring detects breaches only after they occur, limiting remediation opportunities. We present a framework that encodes SLA rules as structured JSON objects to generate training data without manual annotation. We train a per-customer multi-head transformer...
355	Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks 2605.05358 Stability of sequentially trained early-exit networks: mitigates interference of newly added exits with earlier ones, balancing stability and plasticity.	cs.LG cs.CV	Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho	Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can cause newly introduced exits to interfere with previously learned ones, degrading the performance of earlier classifiers. We address this problem by retaining the knowledge embedded...
356	COPYCOP: Ownership Verification for Graph Neural Networks 2605.05360 GNN model ownership verification: proposes CopyCop to verify whether a GNN was copied, even under differing architectures and embedding transformations.	cs.LG cs.AI	Rahul Nandakumar, Deepayan Chakrabarti	Given two GNNs that output node embeddings, how can we determine if they were trained independently? An adversary could have trained one GNN specifically to mimic the other GNN's embeddings. To obscure this relationship between the GNNs, the adversarial GNN might then transform its output embeddings. The two GNNs could have different architectures, weights, and embedding dimensions, and the adversary can transform the embeddings. Despite these stringent conditions, our algorithm (named CopyCop) ...
360	SPADE: Faster Drug Discovery by Learning from Sparse Data 2605.05370 Drug discovery from sparse data: proposes SPADE, which efficiently selects ligands from little experimental data to speed up drug screening.	cs.LG cs.AI	Rahul Nandakumar, Ben Fauber, Deepayan Chakrabarti	Drug discovery seeks molecules (ligands) that bind strongly and selectively to a target protein. However, fewer than 5% of candidate ligands pass the bar for even the early stages of drug discovery. Furthermore, we want methods that work for novel proteins for which we have no prior data. Starting from scratch, we have to iteratively select and test candidate ligands such that we find enough ligands of the desired quality in as few tests as possible. Our proposed algorithm, named SPADE, introduc...
362	Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning 2605.05373 Interpretable recurrent reinforcement learning: links the hidden states of RNN policies to PMP co-states to improve interpretability.	cs.LG	David Leeftink, Max Hinne, Marcel van Gerven	A key capability of intelligent agents is operating under partial observability: reasoning and acting effectively despite missing or incomplete state observations. While recurrent (memory-based) policies learned via reinforcement learning address this by encoding history into latent state representations, their internal dynamics remain uninterpretable black boxes. This paper establishes a formal link between these hidden states and the Pontryagin minimum principle (PMP) from optimal control. We ...
367	Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees 2605.05387 Conditional diffusion sampling under linear constraints: studies conditional diffusion sampling under linear constraints and gives mixing and information-theoretic guarantees.	cs.LG cs.IT	Ahmad Aghapour, Erhan Bayraktar, Asaf Cohen	We study zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, including inpainting and super-resolution. In these problems, the observation determines only part of the unknown signal. The remaining degrees of freedom must be sampled according to the correct conditional data distribution. Existing projection-based samplers enforce measurement consistency by correcting the observed component during reverse diffusion. However, measurement consistency alone do...
368	Two-Stage Learned Decomposition for Scalable Routing on Multigraphs 2605.05389 Vehicle routing on multigraphs: achieves scalable multigraph routing via two-stage decomposition and node-edge factorization.	cs.LG cs.AI	Filip Rydin, Morteza Haghir Chehreghani, Balázs Kulcsár	Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node per...
371	Differentiable Parameter Optimization for DAEs with State-Dependent Events 2605.05395 Differentiable optimization for DAEs with events: develops differentiable parameter learning and gradient computation for DAEs with state-dependent events.	cs.LG cs.MS	Ion Matei, Maksym Zhenirovskyy, Anthony Wong	Differential-algebraic equations (DAEs) with state-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations. Gradient-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities. This paper studies differentiable parameter optimization for semi...
381	Information Theoretic Adversarial Training of Large Language Models 2605.05415 Information-theoretic adversarial training for LLMs: designs scalable adversarial training with an information-theoretic objective to improve robustness to adversarial prompts.	cs.LG cs.AI cs.CR	Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco	Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous Adversarial Training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradi...
385	Active Learning for Conditional Generative Compressed Sensing 2605.05435 Active learning for generative compressed sensing: studies active design of the sampling distribution and recovery in conditional generative compressed sensing.	cs.LG math.NA	Alexander DeLise, Nick Dexter	Generative compressed sensing uses the range of a pretrained generator as a nonlinear model for recovering structured signals from limited measurements. We study a conditional version of this problem for image recovery from subsampled Fourier measurements using prompt-conditioned generative models. Our framework separates two roles of conditioning: the prompt used to design the sampling distribution and the prompt used to define the recovery model. For ReLU and Lipschitz conditional generators, ...
387	On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning 2605.05438 Preventing collapse when fine-tuning for causal reasoning: uses a semantic loss to suppress model collapse and shortcut solutions during fine-tuning on causal tasks.	cs.LG cs.AI	Pratik Deshmukh, Atirek Gupta	Standard fine-tuning of transformer models on causal reasoning tasks leads to catastrophic model collapse, where models learn trivial solutions such as always predicting "Yes" or "No" regardless of input structure. We demonstrate that fine-tuning Gemma 270M on transitivity and d-separation tasks without semantic loss results in 100% collapse rate, with models achieving misleadingly high accuracy (73.9%) while learning no causal reasoning. We propose a semantic loss function with graph-based logi...
395	Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs 2605.05463 Robust Graph Self-Supervised Learning: studies the robustness of GSSL under noise in text-extracted biomedical graphs.	cs.LG cs.AI	Othmane Kabal, Mounira Harzallah, Fabrice Guillet, Hideaki Takeda, Ryutaro Ichise	Graph Self-Supervised Learning (GSSL) offers a powerful paradigm for learning graph representations without labeled data. However, existing work assumes clean, manually curated graphs. Recent advances in NLP enable the large-scale automatic extraction of knowledge graphs from text, opening new opportunities for GSSL while introducing substantial real-world noise. This type of noise remains largely unexplored, as prior robustness studies typically rely on synthetic perturbations. To address this ...
398	A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks 2605.05476 Benchmark for Text-KG and GNNs: builds a unified benchmark that separates the effect of knowledge-graph construction quality from that of GNN performance.	cs.LG cs.AI cs.CL	Othmane Kabal, Mounira Harzallah, Fabrice Guillet, Hideaki Takeda, Ryutaro Ichise	Knowledge graphs automatically constructed from text are increasingly used in real-world applications. However, their inherent noise, fragmentation, and semantic inconsistencies significantly affect the performance of Graph Neural Networks (GNNs) on downstream tasks. Assessing their performance and robustness remains difficult, as it is often unclear whether observed results stem from the learning model or from the quality of the constructed graph itself. In this work, we introduce a dual-purpos...
400	GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation 2605.05480 Unified Linear Attribution Theory: uses the Riesz representation to characterize multiple linear attribution methods in a unified way.	cs.LG cs.AI stat.ML	Raimondo Fanale	The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on L^2(Q,mu) admits a unique canonical representation (Q, w, Delta), proved necessary by the Riesz Repr...
401	Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL 2605.05481 Next-Policy Sampling in Deep RL: replaces conservative updates with approximate next-policy sampling to make policy improvement in RL more efficient.	cs.LG	Dillon Sandhu, Ronald Parr	We revisit a classic "chicken-and-egg" problem in reinforcement learning: to safely improve a policy, the value function must be accurate on the state-visitation distribution of the updated policy. That distribution over states is unknown and cannot be sampled for the purposes of training the value function. Conservative updates solve this problem, but at the cost of shrinking the policy update. This paper explores an alternative solution, Approximate Next Policy Sampling (ANPS), which addresses...
404	A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers 2605.05488 Context-Conditioned Neural Operators: injects context with a recurrent ViT to generate operator parameters for solving conservation laws.	cs.LG	Taeyoung Kim, Joon-Hyuk Ko	We propose an architecture that augments the Flux Neural Operator (Flux NO), which combines the classical finite volume method (FVM) with neural operators, with ViT-based context injection. Our model is formulated as a hypernetwork: it extracts solution dynamics over a finite temporal window, encodes them with a recurrent Vision Transformer, and generates the parameters of a context-conditioned neural operator. This enables the model to infer and solve conservation laws without explicit access t...
405	MEMOA: Massive Mixtures of Online Agents via Mean-Field Decentralized Nash Equilibria 2605.05492 Mean-Field Decentralized Multi-Agent Learning: derives mean-field decentralized Nash policies to scale mixtures of massive numbers of online agents.	cs.LG	Xuwei Yang, David B. Emerson, Fatemeh Tavakoli, Anastasis Kratsios	In the modern age of large-scale AI, federated learning has become an increasingly important tool for training large populations of AI agents; however, its computational and communication costs can rapidly fail to scale with the number of agents. This is precisely where decentralized agentic strategies shine: each agent acts autonomously, using only its own state together with a minimal summary of the ensemble, namely the mean-field. We derive the unique optimal decentralized policy in closed fo...
407	Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning 2605.05495 Transformer Shortcuts in Continual Reasoning: shows that shortcut solutions learned by transformers impair continual compositional reasoning.	cs.LG	William T. Redman, Erik C. Johnson, Brian Robinson	Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural network models can perform compositional reasoning has been the subject of intensive recent investigation, little work has been done to systematically understand how well these models...
408	Online Localized Conformal Prediction 2605.05497 Online Localized Conformal Prediction: proposes online localized conformal prediction to handle calibration under heterogeneous covariates.	cs.LG	Yuheng Lai, Garvesh Raskutti	Conformal prediction is a framework that provides valid uncertainty quantification for general models with exchangeable data. However, in the online learning and time-series settings, exchangeability is not satisfied. Existing online conformal methods, such as adaptive conformal inference (ACI), can achieve long-run validity, yet they remain inefficient under covariate heterogeneity because they rely on global calibration. We propose Online Localized Conformal Prediction (OLCP), which com...
413	Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients 2605.05511 Active Feature Acquisition with Policy Gradients: uses pathwise policy gradients to make non-myopic active feature acquisition decisions.	cs.LG stat.ML	Linus Aronsson, Morteza Haghir Chehreghani	Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a c...
415	OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination 2605.05519 Datacenter-Grid Coordination Simulation: provides a simulation platform for studying runtime coordination between datacenter load and grid signals.	cs.LG cs.DC	Jae-Won Chung, Zhirui Liang, Yanyong Mao, Jiasi Chen, Mosharaf Chowdhury	AI's growing compute demand and new datacenter buildouts present major capacity and reliability challenges for the electricity grid, leading to multi-year interconnection delays for new datacenters and bottlenecking AI growth. To ease this strain, datacenters increasingly offer rapid power flexibility in response to grid signals, where the datacenter can increase or decrease its power consumption by adapting its workload in real time. In order to understand the impact of large datacenters on t...
416	Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors 2605.05520 Bayesian Rain Reconstruction with Diffusion Priors: formulates rain field reconstruction as a Bayesian inverse problem with diffusion priors.	cs.LG stat.AP stat.ML	Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati	Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-...
419	MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series 2605.05524 Identifiable Causal Module Discovery: discovers modular structure in scientific time series via sparse additive identifiable causal learning.	cs.LG cs.AI	Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang	Causal representation learning (CRL) seeks to recover latent variables with identifiability guarantees, typically up to permutation and component-wise reparameterization under appropriate assumptions. However, identifiability does not imply interpretability: latent semantics are typically assigned post hoc by alignment with known ground-truth factors. This limitation is particularly acute in scientific time series, where underlying mechanisms are unknown and discovering interpretable structure i...
cs.MA 2 papers
68	DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration 2605.04522 DAO-governed decentralized physical AI: proposes a DAO-governed decentralized physical AI architecture to coordinate humans, machines, and infrastructure.	cs.MA cs.AI cs.CY econ.GN	Mark C. Ballandies, Florian Spychiger, Uwe Serdült, Claudio J. Tessone	We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networ...
218	Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation 2605.04922 Graph-based multi-agent ideation: coordinates multiple agents through an idea graph with learnable edits and commits to iteratively generate research ideas.	cs.MA cs.AI	Jiangwen Dong, Bo Li, Wanyu Lin	LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through temporary texts, such as drafts or chat logs; it is difficult to pinpoint the weaknesses in the generated ideas and how the agents refine them. To this end, we introduce Evolving Idea Graphs (EIG), a graph-based multi-agent scientific ideation framework that can generate high-performance research ideas ...
cs.MM 1 papers
194	To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition 2605.04877 Modality conflict in emotion recognition: proposes dual-path learning that distinguishes resolvable from unresolvable conflicts to improve multimodal emotion recognition.	cs.MM cs.HC cs.LG	Yangchen Yu, Qian Chen, Jia Li, Zhenzhen Hu, Jinpeng Hu	Multimodal emotion recognition (MER) benefits from combining text, audio, and vision, yet standard fusion often fails when modalities conflict. Crucially, conflicts differ in resolvability: benign conflicts stem from missing, weak, or ambiguous cues and can be mitigated by cross-modal calibration, while severe conflicts arise from intrinsically contradictory (e.g., sarcasm) or misleading signals, for which forced fusion may amplify errors. Recognizing this, we propose Dual-Path Conflict Resoluti...
cs.NE 2 papers
229	On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization 2605.04954 Budgeted algorithm selection for BBO: studies what share of the budget spent on feature computation makes algorithm selection worthwhile in black-box optimization.	cs.NE cs.LG	Koen van der Blom, Diederick Vermetten	Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes...
324	Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles 2605.05284 Optimizers derived from evolutionary principles: derives lineage simulations from Darwinian first principles and obtains a family of gradient-based optimization algorithms.	cs.NE cs.LG q-bio.PE q-bio.QM	Daniel Grimmer	Evolutionary computation has long promised to deliver both high-performance optimization tools as well as rigorous scientific simulations of Darwinian evolution. However, modern algorithms frequently abandon evolutionary fidelity for physics-inspired heuristics or superficial biological metaphors. This paper derives a suite of advanced gradient-based optimization algorithms directly from evolutionary first principles. We introduce Darwinian Lineage Simulations (DLS) to prove that, in an asexual ...
cs.NI 4 papers
5	Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers 2605.04373 Robust RL Network Control: proposes ReGuard, which discovers worst-case network conditions and protects RL controller performance at runtime.	cs.NI cs.AI eess.SY	Hongyu Hè, Minhao Jin, Maria Apostolaki	RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical. We present ReGuard, a framewor...
26	Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV 2605.04436 Multi-UAV IoV Joint Optimization: jointly optimizes multi-UAV trajectories, resource allocation, and task offloading to reduce delay and energy consumption.	cs.NI cs.AI	Maoxin Ji, Qiong Wu, Pingyi Fan, Cui Zhang, Nan Cheng	This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non-convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory ...
73	SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting 2605.04530 LLM agent for network troubleshooting: improves LLM root-cause localization of network faults with a symptom-driven, layered escalation procedure.	cs.NI cs.AI	Kuan-Hao Tseng, Niruth Bogahawatta, Yasod Ginige, Kosta Dekic, Arunan Sivanathan	Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that enc...
279	Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity 2605.05071 Vision-aided mmWave beam management: uses camera priors for fast double-directional mmWave beam alignment and tracking in vehicular networks.	cs.NI cs.AI cs.CE cs.CV eess.SY	Avhishek Biswas, Apala Pramanik, Eylem Ekici, Mehmet C. Vuran	Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming (VIBE), a hybrid model-based, closed-loop, learning architecture for...
cs.PL 1 paper
313	Beyond BLEU: A Semantic Evaluation Method for Code Translation 2605.05282 Semantic evaluation of code translation: applies compiler-testing ideas to assess the semantic equivalence of code translations as a replacement for BLEU.	cs.PL cs.CL	Julius Näumann, Sven Keidel, Amir Molzam Sharifloo, Mira Mezini	Code translation is one of the core capabilities of LLMs. However, evaluating the correctness of translations remains difficult, as commonly used metrics such as BLEU measure only syntactic similarity, disregarding program semantics. We propose a novel evaluation methodology for code translation tasks, emphasizing semantic equivalence over surface-level string similarity. Our approach applies established compiler testing methodology to a new domain, allowing the assessment of an LLM fine-tuned f...
cs.RO 8 papers
3	Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation 2605.04366 Traffic Scenario Generation: uses conditional flow matching to generate realistic, scalable safety-critical traffic scenarios.	cs.RO cs.LG	Zimu Gong, Brian Zhaoning Zhang, Chris Zhang, Kelvin Wong, Raquel Urtasun	Safety-critical scenarios are essential for the development of autonomous vehicles (AVs) but are rare in real-world driving data. While simulation offers a way to generate such scenarios, manually designed test cases lack scalability, and adversarial optimization often produces unrealistic behaviors. In this work, we introduce a conditional latent flow matching approach for scalable and realistic safety-critical scenario generation. Our method uses distribution matching to transform nominal scen...
129	From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models 2605.04678 Latent Action Supervision for VLA: systematically compares image-based and action-based latent action supervision to unify VLA model training.	cs.RO cs.CV	Yihan Lin, Haoyang Li, Yang Li, Haitao Shen, Yihan Zhao	Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified V...
223	Modular Reinforcement Learning For Cooperative Swarms 2605.04939 Modular RL for robot swarms: proposes modular multi-agent reinforcement learning for cooperative robot swarm control.	cs.RO cs.AI	Erel Shtossel, Gal A. Kaminka	A cooperative robot swarm is a collective of computationally-limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, t...
271	Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception 2605.05053 Neural reduced-order tactile simulation: uses coarse-grained MPM plus a neural decoder to efficiently reconstruct tactile detail.	cs.RO cs.CV	Yuhu Guo, Zhikai Shen, Jiasheng Qu, Chenghao Qian, Yuming Huang	Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a reduced-order neural simulation framework that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent sta...
290	Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout 2605.05092 Driver-centric latent world model: proposes a traffic-conditioned latent world model of the in-cabin driver for multi-step rollout prediction.	cs.RO cs.AI cs.CV	Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li	Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics foreca...
297	LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts 2605.05110 Line-guided RL for robot stunts: trains bicycle-robot stunts from a user-provided spatial guideline and sparse orientation constraints.	cs.RO cs.AI	Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha	Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physi...
328	When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning 2605.05172 RL with Q-functions extracted from BC: proposes Q2RL, which estimates and gates Q-values from behavior cloning for offline-to-online robot improvement.	cs.RO cs.AI	Lakshita Dodeja, Ondrej Biza, Shivam Vats, Stephen Hart, Stefanie Tellex	Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline...
379	Creative Robot Tool Use by Counterfactual Reasoning 2605.05411 Counterfactual robot tool use: uses causal discovery and counterfactual reasoning for creative robot tool selection.	cs.RO cs.AI	M. Tuluhan Akbulut, Varun Satheesh, Ahmed Jaafar, Alper Ahmetoglu, Shane Parr	We propose a causal reasoning framework for creative robot tool use where a suitable tool for a task is correctly identified for use beyond its primary objectives. The proposed framework first discovers the causal relationships between the tool and the task by conducting simulated experiments in a dynamics model. We decouple the causal discovery problem into two complementary components: VLM-based feature suggestion and counterfactual tool generation via targeted geometric and physical feature p...
cs.SD 5 papers
82	Stage-adaptive audio diffusion modeling 2605.04547 Stage-adaptive audio diffusion training: proposes a stage-adaptive training strategy that lowers audio diffusion training cost and improves quality.	cs.SD cs.AI	Xuanhao Zhang, Chang Li	Recent progress in diffusion-based audio generation and restoration has substantially improved performance across heterogeneous conditioning regimes, including text-conditioned audio generation and audio-conditioned super-resolution. However, training audio diffusion models remains computationally expensive, and most existing pipelines still rely on static optimization recipes that treat the relative importance of training signals as fixed throughout learning. In this work, we argue that a major...
88	Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB) 2605.04556 Benchmarking audio-native LLM embeddings: systematically evaluates the sound-embedding capabilities of several audio-native LLMs on MSEB.	cs.SD cs.LG	Cyril Allauzen, Tom Bagby, Georg Heigold, Ehsan Variani, Ke Wu	The Massive Sound Embedding Benchmark (MSEB) has emerged as a standard for evaluating the functional breadth of audio models. While initial baselines focused on specialized encoders, the shift toward "audio-native" Large Language Models (LLMs) suggests a new paradigm where a single multimodal backbone may replace complex, task-specific pipelines. This paper provides a rigorous empirical evaluation of leading LLMs - including members from the Gemini and GPT families - across the eight core MSEB c...
108	VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models 2605.04613 Singing voice transcription with LALM: uses a large audio language model for unified, scalable singing voice transcription and alignment.	cs.SD cs.AI	Yukun Chen, Tianrui Wang, Zhaoxi Mu, Xinyu Yang, EngSiong Chng	High-quality singing annotations are fundamental to modern Singing Voice Synthesis (SVS) systems. However, obtaining these annotations at scale through manual labeling is unrealistic due to the substantial labor and musical expertise required, making automatic annotation highly necessary. Despite their utility, current automatic transcription systems face significant challenges: they often rely on complex multi-stage pipelines, struggle to recover text-note alignments, and exhibit poor generaliz...
183	Hearing the Ocean: Bio-inspired Gammatone-CNN framework for Robust Underwater Acoustic Target Classification 2605.04839 Underwater acoustic classification: combines bio-inspired Gammatone filtering with a CNN to improve the noise robustness of underwater target recognition.	cs.SD	Rajeshwar Tripathi, Sandeep Kumar, Monika Aggarwal, Neel Kanth Kundu	This study presents a bio-inspired signal processing framework for robust Underwater Acoustic Target Recognition (UATR). The latest state-of-the-art methods often fail to resolve dense low-frequency harmonic structures in vessel propulsion signals under high-noise conditions, which is addressed by the proposed framework using a biologically inspired Gammatone filter bank that emulates the cochlea's nonlinear frequency selectivity. By distributing filters according to the Equivalent Rectangular Ban...
249	Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation 2605.04998 Cross-genre chord generation: empirically studies how the pop/jazz data mix ratio affects fine-tuning transfer in chord generation.	cs.SD cs.IR cs.LG	Jinju Lee	Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generation as a standalone task and addresses a question that arises whenever such a model is adapted across genres: how much old-domain data must be retained during fine-tuning to acquire a new domain without...
cs.SE 8 papers
24	Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning 2605.04431 Failure Management for RFT: proposes an automatic failure management framework to make LLM reinforcement fine-tuning more robust.	cs.SE cs.AI	Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Liancheng Fang, Kening Zheng	Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms. Despite their effectiveness, they largely overlook the problem of failure management at the training-process level. When training goes wrong, practitioners still rely heavily on expert-driven manual inspec...
76	Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap 2605.04532 Accountability of software engineering agents: analyzes coding-agent terms of service to delineate responsibility and proposes a research roadmap.	cs.SE cs.AI	Christoph Treude	AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and productivity impacts of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern...
109	Beyond Retrieval: A Multitask Benchmark and Model for Code Search 2605.04615 Multitask benchmark for code search: proposes a decontaminated multitask benchmark for code retrieval and reranking and trains a reranking model.	cs.SE cs.AI	Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu	Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retri...
128	CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement 2605.04677 LLM-Guided Code Optimization: combines LLMs with evolutionary search and runtime profiling to automatically select hotspots and optimize multi-language code.	cs.SE cs.AI	Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan	We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing...
133	Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code 2605.05267 LLM Code Quality Review: systematically reviews how training-data quality defects lead to bugs and vulnerabilities in LLM-generated code.	cs.SE cs.AI	Kaifeng He, Xiaojun Zhang, Peiliang Cai, Mingwei Liu, Yanlin Wang	Large language models (LLMs) frequently generate defective outputs in code generation tasks, ranging from logical bugs to security vulnerabilities. While these generation failures are often treated as model-level limitations, empirical evidence increasingly traces their root causes to imperfections within the training corpora. Yet, the specific mechanisms linking training data quality issues to generated code quality issues remain largely unmapped. This paper presents a systematic literature rev...
239	Architectural Constraints Alignment in AI-assisted, Platform-based Service Development 2605.04973 Architectural constraint-aware AI development: uses retrieval-augmented scaffolding and clarification loops so that AI-generated code conforms to architectural constraints.	cs.SE cs.AI	Julius Irion, Moritz Leugers, Paul Hartwig, Simon Kling, Tachmyrat Annayev	AI-assisted development tools enable rapid prototyping of services but often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments. Consequently, generated artifacts may exhibit brittle behavior and limited deployability. We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities. B...
351	The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking 2605.06707 LLM single-file web generation evaluation: longitudinally compares first-output HTML generation quality across several LLMs and tracks social reach.	cs.SE cs.AI	Diego Cabezas Palacios	This paper presents an eight-week observational comparison of 68 single-file HTML generations collected across 17 public experiments in the "HTML AI Battle" project between December 10, 2025 and February 4, 2026. Four reasoning model families, GPT, Gemini, Grok, and Claude, were compared under a fixed public-interface protocol with no custom instructions, no personality tuning, and no repair prompts. Each output was evaluated from a rendered browser video using human scores and a Gemini LLM-as-a...
372	Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology 2605.05400 Context engineering for agentic coding: proposes the MEP methodology, which improves coding-agent reliability by preparing context.	cs.SE cs.AI cs.HC	Andrew Zigler	The rapid adoption of AI coding agents has produced a dominant workflow pattern -- often called "vibe coding" -- that prioritizes speed of implementation over deliberate preparation. We argue that this approach creates a systematic alignment problem: agents that lack sufficient context produce code requiring extensive debugging and refactoring, consuming substantial development time. Drawing on the culinary concept of mise en place (everything in its place; abbreviated MEP), we propose a three-p...
eess.AS 2 papers
60	JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions 2605.04505 Instruction-Driven Audio Evaluation: aligns LLMs with natural language instructions for zero-shot audio and speech evaluation.	eess.AS cs.AI cs.SD	Leying Zhang, Bowen Shi, Haibin Wu, Bach Viet Do, Yanmin Qian	The rapid advancement of generative audio models has outpaced the development of robust evaluation methodologies. Existing objective metrics and general multimodal large language models (MLLMs) often struggle with domain generalization, zero-shot capabilities, and instructional flexibility. To address these bottlenecks, we propose JASTIN, a generalizable, instruction-driven audio evaluation framework that formulates audio assessment as a self-instructed reasoning task. JASTIN bridges a frozen hi...
159	Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement 2605.04749 Virtual-microphone speech enhancement: uses a neural network to generate virtual microphone signals that improve the directivity of multichannel speech enhancement.	eess.AS	Dongheon Lee, Ashutosh Pandey, Sanjeel Parekh, Daniel Wong, Jacob Donley	While the spatial directivity of multichannel speech enhancement algorithms improves with the number of microphones, fitting large capture arrays into real-world edge devices is typically limited by physical constraints. To overcome this limitation, we propose Spatial-Magnifier, a neural network designed to generate virtual microphone (VM) signals from a limited set of real microphone (RM) measurements. Moreover, we introduce the Spatial Audio Representation Learning (SARL) framework, which leve...
eess.IV 5 papers
10	Hyperspectral Anomaly Detection Using Einstein Fuzzy Computing and Quantum Neural Network 2605.04388 Hyperspectral Anomaly Detection: combines Einstein fuzzy computing with a quantum neural network for hyperspectral anomaly detection.	eess.IV	Chia-Hsiang Lin, Si-Sheng Young, Reza Langari	In the remote sensing (RS) field, hyperspectral imagery provides rich spectral information and facilitates numerous critical applications, such as material identification. Among these applications, hyperspectral anomaly detection (HAD) aims to detect substances whose spectral characteristics deviate from background spectra, which are termed anomalies. However, many widely used HAD algorithms in the RS community identify anomalies by relying on a "background reconstruction" strategy. Furthermor...
285	External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images 2605.05082 Breast density prediction validation: externally validates, on an independent cohort, deep models that predict BI-RADS breast density from ultrasound.	eess.IV cs.CV	Yuxuan Chen, Arianna Bunnell, Yanqi Xu, Haoyan Yang, Thomas K. Wolfgruber	We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-leve...
317	CTseg: A Tool for Brain CT Segmentation, Spatial Normalisation, and Volumetrics 2605.05154 Brain CT segmentation and volumetrics: releases and validates CTseg, a pipeline for brain CT segmentation, spatial normalisation, and volume estimation.	eess.IV	Mikael Brudfors	This paper presents and validates CTseg, a freely available software for brain CT segmentation, spatial normalisation, and volumetrics. CTseg builds on the Multi-Brain generative modelling framework, providing a CT-specific pipeline that produces tissue maps, deformation fields, and brain volume estimates in the same format as SPM's unified segmentation, thereby extending SPM's established analysis chain from MRI to CT. CTseg is designed for routine hospital CT scans without requiring preprocess...
329	MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge 2605.05175 LLM benchmark for MRI knowledge: builds MRI-Eval, a tiered question bank evaluating LLM knowledge of MRI physics and GE scanner operations.	eess.IV cs.CL physics.med-ph	Perry E. Radau	Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI practice. Purpose: We developed MRI-Eval, a tiered benchmark for relative model comparison on MRI physics and GE scanner operations knowledge using primary multiple-choice questions (MCQ), with stem-only and primed diagnos...
417	Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images 2605.05522 Tumor-Aware Augmentation for MRI Segmentation: uses tumor-aware augmentation and attention analysis to improve rectal cancer MRI segmentation.	eess.IV cs.CV	Aneesh Rangnekar, Joao Miranda, Natally Horvat, Stephanie Chahwan, Samir Alrayess	Pretraining on large-scale datasets has been shown to improve transformer generalizability, even for out-of-domain (OOD) modalities and tasks. However, two common assumptions often fail under OOD transfer: that downstream datasets can be adapted to the fixed input geometry of pretrained models and that pretrained representations transfer effectively across imaging modalities. We show that these assumptions break down through two interacting failure modes in CT-to-MRI transfer: inefficient token ...
eess.SY 2 papers
7	Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery 2605.04375 AI-Controlled Lab Automation: proposes a declarative experiment-as-code stack that lets AI agents operate real laboratories.	eess.SY cs.AI	Zhenning Yang, Yuhan Chen, Patrick Tser Jern Kon, Tongyuan Miao, Hongyi Lin	To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may promp...
270	Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data 2605.05050 Discriminating car-following deceleration behavior: uses NGSIM trajectories to show that gap-closing rate and visual looming swap discriminative dominance across deceleration intensities.	eess.SY cs.LG	Eni Solomon Laughter	Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity - a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver ...
math.AP 1 paper
216	Neural Discovery of Strichartz Extremizers 2605.04918 Neural search for Strichartz extremizers: uses neural networks to numerically search for extremizers of Strichartz inequalities.	math.AP cs.LG math.NA	Nicolás Valenzuela, Ricardo Freire, Claudio Muñoz	Strichartz inequalities are a cornerstone of the modern theory of dispersive PDEs, but their extremizers are known explicitly only in a handful of sharp cases. The non-convexity of the underlying functional makes the problem hard, and to our knowledge no systematic numerical attack has been attempted. We propose a simple neural-network-based pipeline that searches for extremizers as critical points of the Strichartz ratio, and apply it in three settings. First, on the Schrödinger group we recove...
math.CA 1 paper
336	Almost-Orthogonality in Lp Spaces: A Case Study with Grok 2605.05192 Counterexample to an almost-orthogonality inequality in Lp spaces: constructs a counterexample refuting Carbery's sharpened triangle inequality and analyzes its conditions.	math.CA cs.AI math.CO math.PR	Ziang Chen, Jaume de Dios Pont, Paata Ivanisvili, Jose Madrid, Haozhu Wang	Carbery proposed the following sharpened form of the triangle inequality for many functions: for any $p\ge 2$ and any finite sequence $(f_j)_j\subset L^p$ we have \[ \Big\|\sum_j f_j\Big\|_p \ \le\ \left(\sup_{j} \sum_{k} \alpha_{jk}^{\,c}\right)^{1/p'} \Big(\sum_j \|f_j\|_p^p\Big)^{1/p}, \] where $c=2$, $1/p+1/p'=1$, and $\alpha_{jk}=\sqrt{\frac{\|f_{j}f_{k}\|_{p/2}}{\|f_{j}\|_{p}\|f_{k}\|_{p}}}$. In the first part of this paper we construct a counterexample showing that this inequality fails for every $p>2$...
math.LO 1 papers
359	Towards an Inferentialist Account of Information Through Proof-theoretic Semantics 2605.05368 Proof-theoretic semantic foundations of information: Uses proof-theoretic semantics to propose an inferentialist framework and logical foundations for information.	math.LO cs.AI	Matthew Collinson, Timo Eckhardt, David Pym	Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting component...
math.NA 1 papers
84	Neural-Guided Domain Restriction to Accelerate Pseudospectra Computation for Structured Non-normal Banded Matrices 2605.04550 Neural-guided pseudospectra acceleration: Accelerates pseudospectra computation for structured non-normal banded matrices via neural-guided domain restriction.	math.NA cs.LG	Amit Punia, Rakesh Kumar, Madan Lal	Computing pseudospectra of non-normal matrices is essential for understanding the stability and transient behavior of dynamical systems. Such analysis is critical in applications including fluid dynamics, control systems, and differential operators, where non-normality can lead to significant transient amplification and sensitivity to perturbations that are not captured by eigenvalue analysis alone. At large scales, commonly used numerical approaches for pseudospectra computation can become comp...
math.OC 2 papers
64	Predictive and Prescriptive AI toward Optimizing Wildfire Suppression 2605.04510 Wildfire suppression resource optimization: Combines prediction with integer optimization to jointly assign fire crews and optimize suppression resource scheduling.	math.OC cs.AI cs.LG	Leonard Boussioux, Alexandre Jacquillat, Ryne Reger, Jacob Wachspress	Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dy...
365	Meta-learning for sample-efficient Bayesian optimisation of fed-batch processes 2605.05382 Meta-learned Bayesian optimisation: Uses meta-learning to improve the sample efficiency of Bayesian optimisation for fed-batch process recipes.	math.OC cs.LG	Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei	The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (BayesOpt) is a powerful tool for sampling and optimisation of expensive-to-measure functions. Gaussian Processes (GPs), the surrogate models used in BayesOpt, are static, forecast poorly, and lack generalisation across experiments, limiting their applicability to time-v...
math.PR 1 papers
337	Grokability in five inequalities 2605.05193 AI-assisted discovery of mathematical inequalities: Summarizes five inequalities and bound improvements obtained in collaboration with Grok and verified by the authors.	math.PR cs.AI math.AP math.CA math.FA	Paata Ivanisvili, Xinyuan Xie	In this note, we report five mathematical discoveries made in collaboration with Grok, all of which have been subsequently verified by the authors. These include an improved lower bound on the maximal Gaussian perimeter of convex sets in $\mathbb{R}^n$, sharper $L_2$-$L_1$ moment comparison inequalities on the Hamming cube $\{-1,1\}^n$, a strengthened autoconvolution inequality, improved asymptotic bounds on the size of the largest $g$-Sidon sets in $\{1,\dots,n\}$, and an optimal balanced Szare...
math.ST 1 papers
384	Direct Estimation of Schrödinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees 2605.05432 Schrödinger bridge drift estimation: Estimates SB time-series drifts by direct kernel regression with finite-sample guarantees.	math.ST cs.LG stat.ML	Othmane Mazhar, Huyên Pham	We study nonparametric estimation of Schrödinger bridge (SB) drifts from i.i.d. data observed on a single time interval. Starting from the conditional-ratio form of the Schrödinger bridge time-series (SBTS) drift formula, we analyze a direct Nadaraya--Watson plug-in estimator built from kernelized numerator and denominator terms. Unlike recent SB analyses based on entropic-OT potentials, Sinkhorn iterations, or iterative bridge solvers, our approach works directly at the drift level and isolate...
q-bio.BM 1 papers
50	Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building 2605.05259 Cryo-EM Map Segmentation Pipeline: Integrates AlphaFold to improve Phenix segmentation for automated building of more accurate atomic models.	q-bio.BM cond-mat.mtrl-sci cs.AI q-bio.QM	Chenwei Zhang	We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Ph...
q-bio.NC 2 papers
28	Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks 2605.04443 Neural Alignment and Robustness: Analyzes whether the robustness advantage of neurally guided DCNNs stems from changes in spatial frequency reliance.	q-bio.NC cs.AI	Zhenan Shao, Tianyu Ren, Chengxiao Wang, Leyla Isik, Diane M. Beck	Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low...
289	Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior 2605.05091 Think-aloud constrained model discovery: Introduces think-aloud trace constraints to improve the quality of LLM-based automated cognitive model discovery.	q-bio.NC cs.AI	Hanbo Xie, Akshay K. Jagadish, Lan Pan, Robert C. Wilson	Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well-known that models produced from the behavioral trajectory alone are typically under-determined. In this work, we explore the use of Think Aloud traces as an additional form of data constraint during automated model discovery. When applied to the domain of risky decision-making, we find that the models discovered with think-aloud achieve significantly improved pr...
quant-ph 1 papers
104	Generative Quantum-inspired Kolmogorov-Arnold Eigensolver 2605.04604 Quantum-inspired eigensolver for chemistry: Proposes a parameter-efficient generative quantum-inspired eigensolver for quantum chemistry.	quant-ph cs.LG	Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng	High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-efficient extension of the generative quantum eigensolver (GQE) for quantum chemistry. GQKAE replaces the parameter-heavy feed-forward network components in GPT-style generative eige...
stat.ME 3 papers
9	Causal discovery under mean independence and linearity 2605.04381 Causal Discovery with Mean Independence: Proposes LiMIAM, which replaces full independence with mean independence to identify linear causal structure.	stat.ME cs.LG math.ST stat.ML	Geert Mesters, Alvaro Ribot, Anna Seigal, Piotr Zwiernik	Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong causal order, even with infinite data. We introduce the Linear Mean-Independent Acyclic Model (LiMIAM), which replaces full independence with weaker one-sided mean-independence restrictions on the disturbances. Under finit...
182	PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data 2605.04838 CI testing with missing data: Proposes PAIR-CI, which integrates multiple imputation into permutation testing to calibrate CI tests for causal discovery.	stat.ME cs.LG stat.ML	Thomas S. Robinson, Ranjit Lall	The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compar...
406	A renormalization-group inspired lattice-based framework for piecewise generalized linear models 2605.05493 Lattice Piecewise Generalized Linear Models: Proposes an RG-inspired, lattice-partitioned, interpretable framework for piecewise generalized linear models.	stat.ME cond-mat.stat-mech cs.LG math.ST	Joshua C. Chang	We formally introduce a class of models inspired by renormalization group (RG) theory, built on additive hierarchical expansions analogous to those appearing in functional ANOVA and mixed-effects models. Like ReLU convolutional neural networks, they are almost everywhere locally linear; unlike ReLU networks, their partition structure is explicit, interpretable, and easy to modify or constrain. In these models, one defines a multidimensional lattice partition of the input space and uses it to sca...
stat.ML 11 papers
75	Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning 2605.05262 Submodular tree search for tool-use RL: Models tool-use tree search as submodular maximization to increase rollout informativeness under a fixed budget.	stat.ML cs.AI cs.LG	Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song	We formalize Rollout Informativeness under a Fixed Budget (RIFB) as the expected non-vanishing policy-gradient mass that a tool-use rollout set injects into Group Relative Policy Optimization (GRPO). We prove that any budget-agnostic independent sampler suffers a collapse rate bounded away from zero for hard prompts regardless of the budget. Motivated by this, we recast intermediate state selection as a monotone submodular maximization problem, where a greedy one-step selector enjoys a 1 minus 1...
99	Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points 2605.04589 Dynamic network trajectory geometry: Proposes multiscale Euclidean trajectory representations for attribution and change-point detection in dynamic networks.	stat.ML cs.LG math.ST	Haruka Ezoe, Ryohei Hisano	A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity ...
189	Forecasting Oncology Demand Trends with Boosting-Based Bayesian Conjugate Models 2605.05270 Bayesian forecasting for healthcare demand: Forecasts oncology outpatient demand trends with a Gamma-Poisson Bayesian model combined with residual boosting.	stat.ML cs.LG stat.AP	Ademir Batista dos Santos Neto, Tiago Alessandro Espinola Ferreira, Paulo Renato Alves Firmino	Accurate trend forecasting in healthcare time series is essential for planning and resource allocation. This paper proposes a Bayesian framework for predicting oncology demand trends, modeling weekly appointments as a Poisson process with a Gamma prior to the demand rate. To enhance adaptability and capture persistent directional patterns, we incorporate a residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure. This boosting approach allows the model to track both s...
221	Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift 2605.04932 Deployment risk under covariate drift: Uses Jacobian-velocity bounds to characterize long-horizon deployment risk volatility under covariate drift.	stat.ML cs.LG	Jonathan R. Landers	We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality reduces temporal risk volatility to derivative energy, and a Jacobian-velocity theorem identifies directional tangent energy along the deployment path as the governing quantity under explicit along-path regularity and domination assumptions. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regular...
253	Scalable inference of spatial regions and temporal signatures from time series 2605.05008 Spatiotemporal regionalization inference: Scalably infers spatially contiguous regions and their temporal signatures from time series data.	stat.ML cs.LG cs.SI physics.soc-ph	Jiayu Weng, Alec Kirkley	Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a pri...
260	Hypergraph Generation via Structured Stochastic Diffusion 2605.05024 Hypergraph diffusion generative model: Proposes a structured stochastic diffusion that generates faithful hypergraph structure directly on incidence matrices.	stat.ML cs.LG stat.CO stat.ME	Christopher Nemeth	Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE, a generative model defined directly on relaxed incidence matrices via a structured stochastic diffusion. The forward process combines a hypergraph-specific two-sided heat operator with an Ornstein--Uhlenbeck component, preserving structure-aware noising near ...
291	Proximal Projection for Doubly Sparse Regularized Models 2605.05093 Doubly sparse regularized regression: Proposes a proximal projection method that achieves doubly sparse regularization with graph-structured predictors.	stat.ML cs.LG stat.CO stat.ME	Jia Wei He, R. Ayesha Ali, Gerarda Darlington	Regularization is often used in high-dimensional regression settings to generate a sparse model, which can save tremendous computing resources and identify predictors that are most strongly associated with the response. When the predictors can be represented by a Gaussian graphical model, the structure of the predictor graph can be exploited during regularization. Our proposed model exploits this underlying predictor graph structure by decomposing the estimated coefficient vector into a sum of l...
334	Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval 2605.05189 Capacity thresholds of linear associative memory: Analyzes capacity thresholds and constructions for linear memory under different retrieval criteria.	stat.ML cs.IT cs.LG	Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu	How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing...
386	Estimating Implicit Regularization in Deep Learning 2605.05436 Estimating implicit regularization in deep learning: Proposes a method to quantify the strength and form of the implicit regularization induced by training.	stat.ML cs.LG	Joseph H. Rudoler, Kevin Tan, Giles Hooker, Konrad P. Kording	Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization -- connecting it to an equivalent penalty that augments the learning objective. However, modern deep learning systems are complex, carrying modifications to the training procedure and architecture (e.g. early stopping, minibatching, dropout) whose effects are not alw...
391	Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation 2605.05446 Nonconvex Low-Rank Estimation Theory: Proposes a unified theory explaining the implicit convexity of nonconvex low-rank matrix estimation.	stat.ML cs.IT cs.LG math.OC	Chengyu Cui, Gongjun Xu	Nonconvex methods have emerged as a dominant approach for low-rank matrix estimation, a problem that arises widely in machine learning and AI for learning and representing high-dimensional data. Existing analyses for these methods often require additional regularization to mitigate nonconvexity, even though such regularization is often unnecessary in practice. Moreover, most analyses rely on problem-specific arguments that are difficult to generalize to more complex settings. In this paper, we d...
418	Permutation-preserving Functions and Neural Vecchia Covariance Kernels 2605.05523 Neural Vecchia Gaussian Process Kernels: Learns Vecchia-induced covariance parameters to construct scalable GP kernels.	stat.ML cs.LG stat.CO	Jian Cao, Nian Liu, Ying Lin	We introduce a novel framework for constructing scalable and flexible covariance kernels for Gaussian processes (GPs) by directly learning the covariance structure under a regression-type parameterization induced by Vecchia approximations, using deep neural architectures. Specifically, we model kriging coefficients and conditional standard deviations, deterministic quantities that uniquely characterize the covariance, providing stable and informative learning targets. Exploiting the permutation-...