| # | Title | Categories | Authors | Abstract |
|---|---|---|---|---|
| cond-mat.mtrl-sci 1 paper | ||||
| 296 |
Building informative materials datasets beyond targeted objectives
2605.05104
Informative materials dataset design: proposes a framework for collecting and building materials datasets that maximizes their long-term informativeness.
|
cond-mat.mtrl-sci cs.AI cs.DB cs.LG stat.AP
|
Rafael Espinosa Castañeda, Ashley Dale, Hongchen Wang, Yonatan Kurniawan, Hao Wan |
Materials science data collection can be expensive, making the reuse and long-term utility of datasets critically important for future discovery campaigns. In practice, researchers prioritize a subset of properties due to research interests. However, ignoring a subset of outcomes in data collection campaigns potentially generates datasets poorly suited for future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest wh...
|
| cs.AI 34 papers | ||||
| 35 |
Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
2605.04454
Deployment Alignment Evaluation: argues that deployment alignment cannot be inferred from model-level benchmarks alone and proposes tiers of evidence.
|
cs.AI cs.HC cs.LG cs.SE
|
Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais |
Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs, such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about deployed alignment. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Alignment claims should instead be indexed to the level at which evidence is collected: model-level, re...
|
| 47 |
How Does Thinking Mode Change LLM Moral Judgments? A Controlled Instant-vs-Thinking Comparison Across Five Frontier Models
2605.04488
Reasoning Mode Moral Judgments: compares the moral judgments of the same LLM in instant and thinking modes.
|
cs.AI
|
Sai Sourabh Madur |
We evaluate whether enabling provider-exposed reasoning mode changes moral judgments within the same model checkpoint. Across 100 moral-judgment scenarios and five frontier reasoning-trained LLMs (Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B), aggregate binary-verdict agreement remains high and statistically indistinguishable between instant and thinking modes (Krippendorff's alpha = 0.78 vs. 0.79). However, disagreement is concentrated in 21 model-disputed scenari...
|
| 94 |
From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning
2605.04572
LLM safety degradation dynamics: uses parameter dynamics to analyze and quantify the sample-level risk of safety degradation caused by fine-tuning.
|
cs.AI cs.LG
|
Xiao Wang, Yifei Zhang, YongKang Liu, Xiaocui Yang, Zihan Wang |
Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics, where benign f...
|
| 106 |
SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition
2605.04608
Multi-agent IMU activity recognition: builds a multi-agent collaborative framework to improve the robustness and interpretability of IMU activity recognition.
|
cs.AI
|
Naiyu Zheng, Tianlong Yu, Haochen Yin, Xiaoyi Fan, Xiping Hu |
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction. However, current deep learning-based HAR models often struggle with heavy reliance on labeled data, position-specific ambiguity, and a lack of transparent reasoning. Inspired by the advanced agents framework, which emulates a collaborative agent using Large Language Models (LLMs), we propose SensingAgents, a novel multi-agent system...
|
| 113 |
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
2605.04624
Agent repair ranking instability benchmark: releases a paired-execution trace corpus for auditing evaluator-channel instability in repair agents.
|
cs.AI cs.SE
|
Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song |
Agent-repair leaderboards reorder under evaluator reconfiguration, and a measurable share of the reordering is produced by methods that consult evaluator-derived signal during internal selection of candidate repairs. We document this failure mode on a public leaderboard and release AuditRepairBench, a paired-execution trace corpus of 576,000 registered cells (96,000 executed) that operationalizes evaluator-channel-blocking ranking instability within a declared observability boundary. A modular s...
|
| 141 |
Budget-aware Auto Optimizer Configurator
2605.04711
Memory-Efficient Optimizer Configuration: proposes BAOC, which assigns optimizer states per network block to substantially reduce training GPU memory under a budget.
|
cs.AI cs.LG math.OC
|
Kang Liu, Wei Peng, Jianchen Hu |
Optimizer states occupy massive GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and using a global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given bu...
|
| 154 |
Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing
2605.04733
Reinforcement learning for video role-playing dialogue: proposes an RL framework that separates perception, reasoning, and expression to generate immersive video-grounded dialogue.
|
cs.AI
|
Miao Wang, Yuling Shi, Yijiang Li, Yeheng Chen, Xiaodong Gu |
Text-based role-playing models can imitate character styles, yet they often fail to reflect a scene's atmosphere and evolving tension, both essential for immersive applications such as Virtual Reality (VR) games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that explicitly separates observation ([perception]), reasoning ([think]), and utterance ([answer]). This structure pr...
|
| 170 |
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
2605.04785
Runtime safety for AI agent tool calls: proposes a runtime safety evaluation and interception mechanism to prevent dangerous tool calls by AI agents.
|
cs.AI cs.CR
|
Chenglin Yang |
Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding wha...
|
| 173 |
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
2605.04808
Red-teaming platform for AI agents: proposes a controllable, interactive red-teaming platform for systematically evaluating AI agent security risks.
|
cs.AI
|
Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie |
AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is in...
|
| 210 |
Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
2605.04906
LLM strategic reasoning in multi-agent games: proposes Strat-Reasoner, which uses reinforcement learning to improve LLM strategic reasoning in multi-agent games.
|
cs.AI
|
Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang |
While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents brings significant challenges on the evaluation of the reasoning process and the credit assignment over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges as they d...
|
| 211 |
Curated AI beats frontier LLMs at pharma asset discovery
2605.04908
Pharma asset discovery benchmarking: compares the pipeline-retrieval ability of a human-curated drug-asset platform against frontier LLMs.
|
cs.AI q-bio.QM
|
Łukasz Kidziński, Kevin Thomas |
General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-devel...
|
| 214 |
A Foundation Model for Zero-Shot Logical Rule Induction
2605.04916
Zero-shot logical rule induction: a pretrained rule-induction model that uses statistical features to achieve zero-shot ILP rule learning.
|
cs.AI cs.LG cs.SC
|
Yin Jun Phua |
Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across v...
|
| 241 |
On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
2605.04979
Online learning in tree MDPs: treats policies as bandit arms to study PAC and regret-minimizing learning in tree-structured MDPs.
|
cs.AI cs.LG
|
Anvay Shah, Ramsundar Anandanarayanan, Sharayu Moharir, Shivaram Kalyanakrishnan |
A Tree Markov Decision Problem (T-MDP) is a finite-horizon MDP with a starting state $s_{1}$, in which every state is reachable from $s_{1}$ through exactly one state-action trajectory. T-MDPs arise naturally as abstractions of decision making in sequential games with perfect recall, against stationary opponents. We consider the problem of on-line learning in T-MDPs, both in the PAC and the regret-minimisation regimes. We show that well-known bandit algorithms -- \textsc{Lucb} and \textsc{Ucb} -...
|
| 252 |
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
2605.05007
Multi-agent routing and orchestration: proposes Uno-Orchestra, a learned policy that selectively decomposes tasks and routes them to suitable models and tools.
|
cs.AI
|
Zhiqing Cui, Haotong Xie, Jiahao Yuan, Cheng Yang, Hanqing Wang |
Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL traj...
|
| 257 |
Position: Embodied AI Requires a Privacy-Utility Trade-off
2605.05017
Privacy trade-offs in embodied AI: argues that deploying embodied AI in real-world environments requires systematically trading off privacy leakage against utility.
|
cs.AI cs.RO
|
Xiaoliang Fan, Jiarui Chen, Zhuodong Liu, Ziqi Yang, Peixuan Xu |
Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemi...
|
| 312 |
Executable World Models for ARC-AGI-3 in the Era of Coding Agents
2605.05138
Executable world-model agents: a coding agent that builds an executable Python world model for solving and planning in ARC-AGI-3.
|
cs.AI
|
Sergey Rodionov |
We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results ...
|
| 335 |
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
2605.05191
Context management for long-horizon search agents: proposes an elastic context orchestration method that compresses context while preserving key reasoning trajectories.
|
cs.AI
|
Yijun Lu, Rui Ye, Yuwen Du, Jiajun Wang, Songhua Liu |
Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context management should be adaptive: parts of the agent's trajectory are maintained at different levels of detail depending on their current relevance to the task. To operationalize this principle, we introduce Context-ReAct, a genera...
|
| 344 |
Understanding Annotator Safety Policy with Interpretability
2605.05329
Interpretability analysis of safety annotation policy: uses interpretability to distinguish sources of annotation disagreement and understand problems in how safety policies are applied.
|
cs.AI cs.LG
|
Alex Oesterling, Donghao Ren, Yannick Assogba, Dominik Moritz, Sunnie S. Y. Kim |
Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand or misexecute the task), policy ambiguity (policy wording leaves room for interpretation), or value pluralism (different annotators hold different perspectives on safety). Distinguishing these sources matters. For example, operational failures call ...
|
| 357 |
ZAYA1-8B Technical Report
2605.05365
Technical report on an MoE reasoning LLM: describes the ZAYA1-8B MoE training stack and its performance on mathematics and coding benchmarks.
|
cs.AI cs.CL
|
Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach |
We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-wei...
|
| 363 |
Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems
2605.05379
Benchmarking authorization-limited evidence: proposes Partial Evidence Bench to evaluate missing-evidence errors under authorization limits.
|
cs.AI cs.CC cs.ET
|
Krti Tallam |
Enterprise agents increasingly operate inside scoped retrieval systems, delegated workflows, and policy-constrained evidence environments. In these settings, access control can be enforced correctly while the system still produces an answer that appears complete even though material evidence lies outside the caller's authorization boundary. This paper introduces Partial Evidence Bench, a deterministic benchmark for measuring that failure mode. The benchmark ships three scenario families -- due d...
|
| 366 |
BALAR: A Bayesian Agentic Loop for Active Reasoning
2605.05386
Bayesian active dialogue reasoning: proposes BALAR, an outer loop that actively asks questions to fill in missing information and complete the task.
|
cs.AI cs.CL cs.LG
|
Aymen Echarghaoui, Dongxia Wu, Emily B. Fox |
Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next. We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction be...
|
| 373 |
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections
2605.05402
CCTV-based evaluation of soft traffic infrastructure: uses vision-based speed estimation to analyze how soft interventions at intersections affect vehicle speed and safety.
|
cs.AI cs.CV eess.IV
|
Vinit Katariya, Seungjin Kim, Curtis Craig, Nichole Morris, Hamed Tabkhi |
Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study introduces an AI-enabled analytics framework leveraging existing CCTV infrastructure to evaluate the impact of soft interventions, such as temporary pedestrian refuges and curb extensions, on vehicle speed and safety. Using deep learning and perspective-based speed estimation, we evaluated driver behavior before and after interventions, with repeated post-installation monitoring in Week 1...
|
| 374 |
When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models
2605.05403
LLM sycophancy and alignment: interprets sycophancy as a boundary failure between social alignment and epistemic integrity.
|
cs.AI
|
Jiechen Li, Catherine A. Barry, Rishika Randev, Janet Chen, Ella Jorgensen |
This position paper argues that sycophancy in LLMs is a boundary failure between social alignment and epistemic integrity. Existing work often operationalizes sycophancy through external behavior such as agreement with incorrect user beliefs, position reversals, or deviation from an objective standard of correctness. These formulations capture only overt forms of the phenomenon and leave subtler boundary failures involving epistemic integrity and social alignment underspecified. We argue that sy...
|
| 376 |
PRISM: Perception Reasoning Interleaved for Sequential Decision Making
2605.05407
Interleaved perception and reasoning for decision making: couples a VLM and an LLM through dynamic question answering to improve multimodal decision making.
|
cs.AI
|
Mohamed Salim Aissi, Clemence Grislain, Clement Romac, Laure Soulier, Mohamed Chetouani |
Scaling LLM-based embodied agents from text-only environments to complex multimodal settings remains a major challenge. Recent work identifies a perception-reasoning-decision gap in standalone Vision-Language Models (VLMs), which often overlook task-critical information. In this paper, we introduce PRISM, a framework that tightly couples perception (VLM) and decision (LLM) through a dynamic question-answer (DQA) pipeline. Instead of passively accepting the VLM's description, the LLM critiques it...
|
| 377 |
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
2605.05409
Agentic RAG for financial documents: proposes FinAgent-RAG, which iterates retrieval and reasoning to answer complex questions about financial filings.
|
cs.AI cs.CL
|
Yang Shu, Yingmin Liu, Zequn Xie |
Financial document question answering (QA) demands complex multi-step numerical reasoning over heterogeneous evidence--structured tables, textual narratives, and footnotes--scattered across corporate filings. Existing retrieval-augmented generation (RAG) approaches adopt a single-pass retrieve-then-generate paradigm that struggles with the compositional reasoning chains prevalent in financial analysis. We propose FinAgent-RAG, an agentic RAG framework that orchestrates iterative retrieval-reason...
|
| 378 |
LaTA: A Drop-in, FERPA-Compliant Local-LLM Autograder for Upper-Division STEM Coursework
2605.05410
Local LLM autograding: proposes LaTA, which grades LaTeX assignments automatically on local hardware in a FERPA-compliant way.
|
cs.AI cs.HC physics.ed-ph
|
Jesse A. Rodríguez |
Large-language-model (LLM) graders promise to relieve the grading burden of upper-division STEM courses, but most deployments to date send student work to third-party APIs, violating FERPA and exposing institutions to data risk while requiring substantial assignment modification. We present $\textbf{LaTA}\ (\textit{LaTeX Teaching Assistant})$, a drop-in, open-source autograder that runs entirely on commodity on-premises hardware and assumes a LaTeX-native workflow already adopted by many enginee...
|
| 380 |
From History to State: Constant-Context Skill Learning for LLM Agents
2605.05413
Constant-context skill learning: proposes a constant-context skill representation that reduces reliance on history while balancing privacy and capability.
|
cs.AI
|
Haoyang Xie, Xinyuan Wang, Yancheng Wang, Puda Zhao, Feng Ju |
Large language model (LLM) agents are increasingly used to operate browsers, files, code and tools, making personal assistants a natural deployment target. Yet personal agents face a privacy-cost-capability tension: cloud models execute multi-step workflows well but expose sensitive intermediate context to external APIs, while local models preserve privacy but remain less reliable. Both settings also pay repeatedly for long skill prompts and growing histories. We propose constant-context skill l...
|
| 383 |
The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias
2605.05427
Causal auditing of LLM safety bias: uses a PGM and the do-operator to causally analyze regional bias and the fairness of safety guardrails.
|
cs.AI
|
Alif Al Hasan |
As Large Language Models (LLMs) are integrated into global software systems, ensuring equitable safety guardrails is a critical requirement. Current fairness evaluations predominantly measure bias observationally, a methodology confounded by the inherent toxicity of topics naturally paired with specific demographics in testing datasets. This study introduces a Probabilistic Graphical Model (PGM) framework to audit LLM safety mechanisms causally. By applying Pearl's do-operator, we mathematically...
|
| 389 |
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure
2605.05440
Governing authorization propagation in multi-agent systems: introduces the authorization propagation problem and discusses identity-governance infrastructure for multi-agent systems.
|
cs.AI
|
Krti Tallam |
The security discussion around agentic AI focuses heavily on prompt injection. This paper argues that multi-agent systems also create a distinct authorization problem: maintaining authorization invariants as non-human principals retrieve data, delegate tasks, and synthesize results across changing boundaries. We call this problem authorization propagation. It is not reducible to prompt injection and is not fully addressed by classical access-control models such as RBAC, ABAC, or ReBAC. The paper...
|
| 394 |
Agentic Discovery of Exchange-Correlation Density Functionals
2605.05460
LLM-Driven XC Functional Discovery: uses LLM agent search to automatically generate and evaluate DFT exchange-correlation functionals.
|
cs.AI physics.chem-ph
|
Titouan Duston, Jiashu Liang, Yuanheng Wang, Weihao Gao, Xuelan Wen |
The development of accurate exchange-correlation (XC) functionals remains a longstanding challenge in density functional theory (DFT). The vast majority of XC functionals have been hand designed by human researchers combining physical insight, exact constraints, and empirical fitting. Recent advances in large language models enable a systematic, automated alternative to this human-driven design loop. This report presents an agentic search system in which an LLM proposes structured functional-for...
|
| 397 |
Intentionality is a Design Decision: Measuring Functional Intentionality for Accountable AI Systems
2605.05475
Measuring AI Functional Intentionality: proposes measurable indicators of functional intentionality to support accountable AI governance.
|
cs.AI
|
Allessia Chiappetta, Robert Mahari |
As AI systems increasingly exhibit autonomous, goal-directed, and long-horizon behavior, users lack a standardized way to detect the degree to which a system functions like an intentional actor for governance and accountability purposes. This position paper defines intentionality not as consciousness, but as a behavioral profile characterized by purpose, foresight, volition, temporal commitment, and coherence - criteria long used in legal and philosophical contexts to infer intent. These propert...
|
| 399 |
LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks
2605.05478
Neurosymbolic RL Transfer: proposes a multi-source neurosymbolic transfer framework that adaptively fuses experience and reasoning.
|
cs.AI
|
Mahyar Alinejad, Yue Wang, Amrit Singh Bedi, George Atia |
Transfer learning in reinforcement learning (RL) seeks to accelerate learning in new tasks by leveraging knowledge from related sources. Existing neurosymbolic transfer methods, however, typically rely on manually specified task automata, assume a single source task, and use fixed knowledge-integration mechanisms that cannot adapt to varying source relevance. We propose LANTERN, a unified framework for multi-source neurosymbolic transfer that addresses these limitations through three components:...
|
| 402 |
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking
2605.05482
Grounded QA LLM for Banking: gives a deployable training recipe for banking settings that improves citability and refusal calibration.
|
cs.AI cs.CL cs.MA
|
Denys Katerenchuk, Pablo Duboue, Keelan Evanini, David Gondek, Nithin Govindugari |
Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in the banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline ...
|
| 409 |
FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis
2605.05499
Multimodal LLM Agent for Food Analysis: builds a multimodal LLM agent for fine-grained food recognition and attribute analysis.
|
cs.AI
|
Woojin Lee, Pranav Mekkoth, Ye Tian, Onat Gungor, Tajana Rosing |
The widespread adoption of camera-equipped mobile devices and wearables has enabled convenient capture of meal images, making food recognition a key component for real-time dietary monitoring. However, real-world food images present challenges due to high intra-class similarity and the frequent presence of multiple food items within a single image. While deep learning models achieve strong performance in coarse-grained classification, they often struggle to capture fine-grained attributes such a...
|
| cs.AR 1 paper | ||||
| 327 |
Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours
2605.05170
Multi-agent hardware design: demonstrates a multi-agent system that autonomously builds a TurboQuant inference accelerator in 80 hours.
|
cs.AR cs.AI
|
The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin |
Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced "Design Conductor" (or just "Conductor"), a system capable of building a 5-stage Linux-capable RISC-V CPU in 12 hours. In this work, we introduce an updated multi-agent harness powered by frontier models released in April 2026, which is able to handle 80x larger tasks, at higher quality, fully autonomously. Following a brief ...
|
| cs.CC 1 paper | ||||
| 132 |
Average Attention Transformers and Arithmetic Circuits
2605.04683
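Transformer Computational Power: proves that average-attention transformers can simulate certain constant-depth arithmetic circuits and analyzes their expressive power.
|
cs.CC cs.AI cs.LG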
|
Lena Ehrmuth, Laura Strieker |
We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions ...
|
| cs.CE 1 paper | ||||
| 48 |
A Hybrid Method for Low-Resource Named Entity Recognition
2605.04489
Low-Resource NER Neurosymbolic: a two-stage fusion of rules and deep models improves low-resource Vietnamese NER.
|
cs.CE cs.AI cs.CL
|
Do Minh Duc, Quan Xuan Truong, Viet Tran Hong, Le Hoang Anh, Mac Thi Minh Tra |
Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, NER in specific domains for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea involves a two-stage pi...
|
| cs.CL 51 papers | ||||
| 23 |
Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting
2605.04426
Prompt Compression via Rewriting: proposes Telegraph English, which compresses prompts adaptively and semantically through symbolic, structured rewriting.
|
cs.CL
|
Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets |
We introduce Telegraph English (TE), a prompt-compression protocol that rewrites natural language into a symbol-rich, formally-structured dialect. Where token-deletion methods such as LLMLingua-2 train a classifier to delete low-importance tokens at a fixed ratio, TE performs a full semantic rewrite: it decomposes the input into atomic fact lines, substitutes verbose phrases with ~40 logical and relational symbols, and lets the compression ratio adapt to each document's information density....
|
| 31 |
GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
2605.04449
Dialogue State Tracking MoE: combines graph neural networks with ReAct expert routing to improve DST.
|
cs.CL cs.AI
|
Ziqi Zhu, Adithya Suresh, Tomal Deb, Iman Abbasnejad |
Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Ne...
|
| 36 |
DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation
2605.04458
QA Nugget Generation for Evaluation: automatically generates question-answer nuggets for evaluating the coverage of long reports.
|
cs.CL cs.IR
|
Bryan Li, William Walden, Yu Hou, Gabrielle Kaili-May Liu, Dawn Lawrie |
Evaluation of long-form, citation-backed reports has lately received significant attention due to the wide-scale adoption of retrieval-augmented generation (RAG) systems. Core to many evaluation frameworks is the use of atomic facts, or nuggets, to assess a report's coverage of query-relevant information attested in the underlying collection. While nuggets have traditionally been represented as short statements, recent work has used question-answer (QA) representations, enabling fine-grained eva...
|
| 51 |
CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
2605.04495
Confidence-Aware RAG Reranking: reranks evidence without training, using changes in generator confidence, to improve RAG.
|
cs.CL cs.AI
|
Zhipeng Song, Yizhi Zhou, Xiangyu Kong, Jiulong Jiao, Xuezhou Ye |
Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query-document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator's uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a ...
|
| 52 |
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
2605.04496
Active Long-Text Information Foraging: uses decoupled epistemic states to actively retrieve sparse key information for long-text understanding.
|
cs.CL
|
Zhenliang Zhang, Wenqing Wang, Yong Hu, Yaming Yang, Jiaheng Gao |
Long-Text Understanding (LTU) at million-token scale requires balancing reasoning fidelity with computational efficiency. Frontier long-context LLMs can process millions of token contexts end-to-end, but they suffer from high token consumption and attention dilution. In parallel, specialized LTU agents often sacrifice fidelity through task-agnostic abstractions like graph construction or indexing. We identify a key insight for LTU: query-relevant information is typically sparse relative to the f...
|
| 55 |
Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
2605.04500
Generalization to Unseen Language Varieties: uses linguistic-dissimilarity signals to improve generalization to unseen low-resource varieties.
|
cs.CL cs.AI
|
Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen |
Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework ...
|
| 62 |
Distilling Bayesian Belief States into Language Models for Auditable Negotiation
2605.04507
Auditable negotiation with belief distillation: distills Bayesian opponent beliefs into an LLM to enable auditable negotiation decisions.
|
cs.CL
|
Zongqi Cui, Baihan Lin |
Negotiation agents must infer what their counterpart values, update those beliefs over dialogue turns, and choose actions under uncertainty. End-to-end large language models (LLMs) can imitate negotiation dialogue, but their opponent beliefs are usually implicit and difficult to inspect. We propose BOND (Bayesian Opponent-belief Negotiation Distillation), a framework for auditable negotiation. BOND consists of an LLM-based Bayesian teacher that scores dialogue contexts against the six possible o...
|
| 69 |
RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
2605.04523
Judge-orchestrated LLM ensemble generation: a judge model selects among candidates from multiple LLMs for faithful multi-turn generation, winning the SemEval task.
|
cs.CL cs.AI cs.LG
|
Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov |
We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential...
|
| 78 |
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
2605.04539
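Hybrid DPO for logical LLM alignment: fuses logical discrimination with preference optimization to narrow the logical alignment gap caused by DPO's bias toward fluency.
|
cs.CL cs.AI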
|
Qiming Bao, Juho Leinonen, Paul Denny, Michael J. Witbrock |
Direct Preference Optimization (DPO), the efficient alternative to PPO-based RLHF, falls short on knowledge-intensive generation: standard preference signals from human annotators or LLM judges exhibit a systematic verbosity bias that rewards fluency over logical correctness. This blindspot leaves a logical alignment gap -- SFT models reach NLI entailment of only 0.05-0.22 despite producing fluent text. We propose RLearner-LLM with Hybrid-DPO: an automated preference pipeline that fuses a DeBERT...
|
| 81 |
UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding
2605.04543
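Unified speculative decoding verification: uses optimal transport to unify verification and acceleration strategies for multi-step, multi-draft speculative decoding.
|
cs.CL cs.LG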
|
Yepeng Weng, Qiao Hu, Takehisa Yairi |
Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exp...
|
| 85 |
The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil
2605.04552
International media attention to disasters: analyzes peaks in time series of international media coverage to characterize the newsworthiness of disasters in Brazil.
|
cs.CL
|
Brielen Madureira, Andreas Niekler, Marc Keuschnigg, Mariana Madruga de Brito |
Media coverage influences disaster response, yet the drivers of international media attention to local events remain unevenly understood. Brazil offers a compelling case: some of its natural and technological disasters occasionally hit the international headlines. However, systematic analyses of what makes these events be discussed abroad are still missing. Addressing this gap requires representative, validated and country-specific news datasets. This paper presents a peak analysis of 2k news ab...
|
| 96 |
Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus
2605.04576
Tajik POS tagging benchmark: builds a part-of-speech tagging benchmark for Tajik and compares several neural models.
|
cs.CL
|
Mullosharaf K. Arabov |
This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's languages, their capacity for grammatical analysis of Tajik has remained unexplored until now. The aim of this study is to fill this gap through a systematic comparison of classical neural network architectures and modern multilingual transformers. Experiments were co...
|
| 98 |
TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)
2605.04583
Tajik NLP toolkit: releases an open-source toolchain for Tajik text processing that preserves Cyrillic orthography.
|
cs.CL
|
Mullosharaf K. Arabov |
The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library that provides the first comprehensive pipeline for processing authentic Tajik text while preserving the original Cyrillic orthography. The library implements a modular architecture centered around a unified Doc object, ena...
|
| 115 |
Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models
2605.04638
Gradient-based LLM uncertainty estimation: proposes sampling-free LLM uncertainty estimation based on gradients with respect to semantics-preserving embeddings.
|
cs.CL cs.AI
|
Mingda Li, Rundong Lv, Xinyu Li, Weinan Zhang, Ting Liu |
Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient. Unlike prior gradient-based methods developed for classification tasks t...
|
| 117 |
Graph-Augmented LLMs for Swiss MP Ideology Prediction
2605.04643
Graph-augmented LLM ideology prediction: combines a parliamentary relation graph with LLMs to predict the ideological positions of Swiss MPs.
|
cs.CL
|
Yifei Yuan, Luis Salamanca, Sophia Schlosser, Laurence Brandenberger |
Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences. While Large Language Models (LLMs) have shown promising results in estimating MPs' ideological stances, there are more actors and elements in the parliamentary system, and relations between them, that could provide a wider and more informative picture. However, due to the complexity of integ...
|
| 119 |
CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning
2605.04652
Temporal knowledge graph reasoning: jointly learns historical evidence and evolutionary dynamics to improve temporal knowledge graph reasoning.
|
cs.CL
|
Shuai-long Lei, Xiaobin Zhu, Jiarui Liang, Guoxi Sun, Zhiyu Fang |
Temporal knowledge graph (TKG) reasoning aims to predict future events from historical facts. A key challenge lies in jointly capturing two sources of predictive information in TKGs: historical evidence and evolutionary dynamics. However, existing methods typically focus on only one of these sources, which limits the ability to fully exploit the complementary predictive signals in TKGs. To address this, we propose CHE-TKG, a novel collaborative dual-view learning framework for TKG reasoning. CHE...
|
| 123 |
Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs
2605.04665
LLM Prompt Robustness: shows that semantically equivalent rewrites can collapse an LLM's output format, and evaluates the effect systematically.
|
cs.CL
|
Aofan Liu, Jingxiang Meng |
When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversatio...
|
| 144 |
Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL
2605.04719
RL Credit Assignment for Text-to-SQL: proposes step-level reward assignment to improve reinforcement learning for tool-calling Text-to-SQL.
|
cs.CL
|
Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao |
Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models ...
|
| 163 |
Gyan: An Explainable Neuro-Symbolic Language Model
2605.04759
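Explainable neuro-symbolic language model: proposes an explainable neuro-symbolic language model to reduce hallucination and improve maintainability.
|
cs.CL cs.AI cs.ET cs.LG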
|
Venkat Srinivasan, Vishaal Jatav, Anushka Chandrababu, Geetika Sharma |
Transformer based pre-trained large language models have become ubiquitous. There is increasing evidence to suggest that even with large scale pre-training, these models do not capture complete compositional context and certainly not, the full human analogous context. Besides, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable and require enormous compute resources for training and inference. Here, we describe Gyan, an explai...
|
| 165 |
Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations
2605.04764
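LLM surrogate modeling under sparse observations: studies how prompts and query protocols affect an LLM surrogate's predictions and uncertainty alignment.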
|
cs.CL
|
Ge Lei, Samuel J. Cooper |
Large language models are increasingly used as surrogate models for low-data optimization, but their optimizer-facing prediction and its uncertainty remain poorly understood. We study the surrogate belief elicited from an LLM under sparse observations, showing that it depends strongly on prompt text and query protocol. We introduce an uncertainty-alignment criterion that measures whether model uncertainty tracks residual ambiguity among sample-consistent functions. Across controlled inference ta...
|
| 179 |
StoryAlign: Evaluating and Training Reward Models for Story Generation
2605.04831
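Reward model alignment for story generation: systematically evaluates and trains story-generation reward models to better match human narrative preferences.
|
cs.CL cs.AI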
|
Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu |
Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of h...
|
| 188 |
Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset
2605.04857
Eye-tracking dataset for L2 idioms: releases an eye-tracking dataset that quantifies the cognitive load of idiom processing in second-language learners.
|
cs.CL cs.AI cs.CV
|
Eduardo Santos, Juliana Carvalho, César Rennó-Costa |
This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although t...
|
| 191 |
Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment
2605.04873
Unsupervised psychological state assessment: an unsupervised semantic-projection method measures psychological states directly from text and improves interpretability.
|
cs.CL
|
Maria Luongo, Davide Marocco, Nicola Milano |
Recent advances in natural language processing have enabled increasingly accurate estimation of psychological traits from language. However, most existing approaches rely on supervised models trained to predict questionnaire scores, limiting interpretability and generalizability across contexts. The present study introduces a theory-driven and fully unsupervised framework for measuring psychological states directly from natural language using semantic projection. Psychological constructs were op...
|
| 193 |
Anticipating Innovation Using Large Language Models
2605.04875
Innovation forecasting from patents: uses LLMs to extract early signals from patent language and predict future innovations in technology combinations.
|
cs.CL cs.AI cs.CY
|
Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella |
Forecasting innovation, intended as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transf...
|
| 197 |
A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter
2605.04885
Indonesian hate speech detection comparison: compares PyCaret with conventional features against CNN-BiLSTM on hate speech detection in Indonesian Twitter.
|
cs.CL
|
Tanty Widiyastuti, Mayada, Adisty Syawalda Ariyanto, Luluk Muthoharoh, Ardika Satria |
This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase pat...
|
| 198 |
BenCSSmark: Making the Social Sciences Count in LLM Research
2605.04886
LLM benchmarks for social science: argues for including social science tasks in LLM benchmarks to improve evaluation and research direction.
|
cs.CL
|
Arnault Chatelain, Étienne Ollion, Qianwen Guan, Diandra Fabre, Lorraine Goeuriot |
This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing computational systems -- are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress -- they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despit...
|
| 199 |
Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm
2605.04887
YouTube comment sentiment with XGBoost: uses TF-IDF and XGBoost for sentiment analysis of YouTube comments and prediction of e-commerce customer satisfaction.
|
cs.CL
|
Ridho Benedictus Togi Manik, Muhammad Aqil Ramadhan, Ihsan Maulana Yusuf, Luluk Muthoharoh, Ardika Satria |
The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study investigates and constructs a predictive model for customer satisfaction leveraging the Extreme Gradient Boosting (XGBoost) architecture coupled with Term Frequency-Inverse Document Freque...
|
| 200 |
A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset
2605.04888
Tweet sentiment model comparison: Compares TF-IDF logistic regression with BiLSTM for tweet sentiment classification on Sentiment140.
|
cs.CL
|
Vita Anggraini, Cintya Bella, Bastian, Luluk Muthoharoh, Ardika Satria |
The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep l...
|
| 204 |
Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall
2605.04897
Retrieval-centered agent memory: Proposes True Memory, which replaces ingestion-time extraction with multi-stage retrieval to achieve traceable agent memory.
|
cs.CL cs.AI cs.IR
|
Joshua Adler, Guy Zehavi |
Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retrieval pipeline operating over events preserved verbatim. The full system runs as a single SQLite file on commodity CPU with no external database, vector index, graph store, or GPU. On LoCoMo (1,540 questions across 10 multi...
|
| 213 |
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
2605.04913
Local learning for LLM post-training: Proposes a cheaper, faster local backpropagation scheme to reduce LLM post-training cost.
|
cs.CL cs.LG
|
Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang |
LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we prop...
|
| 219 |
Unintended Negative Impacts of Promotional Language in Patent Evaluation
2605.04926
Promotional language in patent evaluation: Large-scale analysis of the negative effect of promotional wording in patent text on examination outcomes.
|
cs.CL
|
Bingkun Zhao, Chenwei Zhang, Hao Peng |
Promotional language has been increasingly used to aid the communication of innovative ideas in science. Yet, less is known about its role in the context of technological innovation. Here, we use a validated and domain-diagnosed lexicon of 135 promotional words to study the association between promotional language and patent evaluation outcomes among 2.7 million USPTO patent applications. Our large-scale study reveals three unexpected findings. First, in contrast to scientific evaluation, we fin...
|
| 222 |
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
2605.06157
Hard negatives for image-text matching: Constructs hard negative captions for ITM training to improve fine-grained vision-language understanding.
|
cs.CL cs.AI cs.CV
|
Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Thang Vu, Carina Silberer |
Image-Text-Matching (ITM) is one of the de facto methods of learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between the web-collected image-text pairs, models fail to show a fine-grained understanding of the combined semantics of these modalities. To address this issue we propose Hard Negative Captions (HNC): an automatically created dataset containing foiled hard negative captions for ITM training towards achieving fine-g...
|
| 224 |
UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning
2605.04941
Neuro-symbolic syllogistic reasoning: Combines a small LLM parser with a theorem prover to build an efficient syllogistic reasoning system.
|
cs.CL
|
Ivan Kartáč, Kristýna Onderková, Jan Bronec, Zdeněk Kasner, Mateusz Lango |
This paper describes our system submitted to SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. We present an efficient modular neuro-symbolic approach, combining a symbolic prover with small reasoning LLMs (4B parameters). The system consists of an LLM-based parser that translates natural language syllogisms to a first-order logic (FOL) representation, an automated theorem prover, and two optional modules: machine translation for multilingual inputs and a...
|
| 227 |
Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir
2605.04948
PEFT for low-resource language adaptation: Compares LoRA and QLoRA for adapting LLMs to Bashkir, a low-resource agglutinative language.
|
cs.CL
|
Mullosharaf K. Arabov, Svetlana S. Khaybullina |
This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of...
|
| 234 |
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
2605.04962
Tabular embedding benchmark and models: Introduces TabBench and learns general-purpose tabular embeddings to support retrieval and understanding.
|
cs.CL cs.IR
|
Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng |
Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack retrieval-compatible vector outputs, whereas text embedding models often fail to capture tabular structure and numerical semantics. To bridge this gap, we first introduce the Tabular Embedding Benchmark (TabBench), a comprehensive suite designed to evaluate the tabular un...
|
| 238 |
Why Expert Alignment Is Hard: Evidence from Subjective Evaluation
2605.04972
Expert alignment in subjective evaluation: Uses expert evaluations and questionnaire analysis to reveal why aligning with expert judgment is hard in subjective tasks.
|
cs.CL
|
Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen |
Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. Fir...
|
| 251 |
Misaligned by Reward: Socially Undesirable Preferences in LLMs
2605.05003
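Reward-model social alignment evaluation: Extends reward-model benchmarks to domains such as bias, safety, and morality, revealing undesirable preferences.
|
cs.CL cs.AI cs.CY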
|
Gayane Ghazaryan, Esra Dönmez |
Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden. We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoni...
|
| 261 |
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
2605.05025
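LLM hallucination detection: Detects hallucinations in a single forward pass using KL-divergence features between internal attention and a uniform distribution.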
|
cs.CL
|
Gijs van Dijk |
We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, atte...
|
| 278 |
The Impossibility Triangle of Long-Context Modeling
2605.05066
Long-context modeling trade-offs: Proves an impossibility triangle among efficiency, compactness, and recall for long-context models.
|
cs.CL cs.AI cs.LG
|
Yan Zhou |
We identify and prove a fundamental trade-off governing long-sequence models: no model can simultaneously achieve (i) per-step computation independent of sequence length (Efficiency), (ii) state size independent of sequence length (Compactness), and (iii) the ability to recall a number of historical facts proportional to sequence length (Recall). We formalize this trade-off within an Online Sequence Processor abstraction that unifies Transformers, state space models, linear recurrent networks, a...
|
| 283 |
The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences
2605.05080
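LLM psychometric differences: Uses multi-questionnaire assessment to show that the primary axis of LLM differences is a "phenomenality of experience" dimension.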
|
cs.CL
|
Hubert Plisiecki, Sabina Siudaj, Kacper Dudzic, Anna Sterna, Maciej Gorski |
We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p<.0001$). To test this...
|
| 288 |
Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models
2605.05090
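Auditing LLM intervention effects: Automatically contrasts generations and validates them statistically to uncover unexpected side effects of interventions on LLM behavior.
|
cs.CL cs.AI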
|
Quintin Pope, Ajay Hayagreeve Balaji, Jacques Thibodeau, Xiaoli Fern |
We present an automated, contrastive evaluation pipeline for auditing the behavioral impact of interventions on large language models. Given a base model $M_1$ and an intervention model $M_2$, our method compares their free-form, multi-token generations across aligned prompt contexts and produces human-readable, statistically validated natural-language hypotheses describing how the models differ, along with recurring themes that summarize patterns across validated hypotheses. We evaluate the a...
|
| 295 |
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
2605.05103
Black-box hallucination measurement: Uses a corpus concept field in embedding space to measure novelty and the degree of black-box hallucination drift.
|
cs.CL cs.AI cs.CY
|
Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame |
We introduce the **Concept Field** of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $ζ$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direc...
|
| 304 |
Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction
2605.05121
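Trustworthy mental health prediction: Proposes evidential reasoning-aware multi-view learning to improve uncertainty estimation and robustness of predictions.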
|
cs.CL
|
Yucheng Ruan, Ling Huang, Qika Lin, Kai He, Mengling Feng |
Automated mental health prediction using textual data has shown promising results with deep learning and large language models. However, deploying these models in high-stakes real-world settings remains challenging, as existing approaches largely rely on semantic representations and often produce overconfident predictions under ambiguous, noisy, or shifted data. Moreover, most methods lack reliable uncertainty estimation, undermining trust in risk-sensitive mental health applications. To address...
|
| 319 |
PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation
2605.05159
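Multilingual polarization detection system: Uses an ensemble of Gemma models with synthetic data augmentation for binary polarization classification across 22 languages.
|
cs.CL cs.AI cs.LG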
|
Srikar Kashyap Pulipaka |
We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini, with a multi-stage quality filtering pipeline...
|
| 325 |
The First Token Knows: Single-Decode Confidence for Hallucination Detection
2605.05166
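First-token confidence for hallucination detection: Uses the logit entropy of the first content token as a single-decode confidence signal to detect hallucinations.
|
cs.CL cs.AI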
|
Mina Gabriel |
Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first conten...
|
| 338 |
Implicit Representations of Grammaticality in Language Models
2605.05197
Grammaticality representations in language models: Studies whether LMs implicitly learn grammaticality as distinct from likelihood.
|
cs.CL
|
Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim |
Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality ...
|
| 353 |
Counterargument for Critical Thinking as Judged by AI and Humans
2605.05353
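Generative AI and critical writing study: Compares human and AI scoring to assess counterarguments and critical thinking in student writing.
|
cs.CL cs.AI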
|
Tosin Adewumi, Marcus Liwicki, Foteini Simistira Liwicki, Lama Alkhaled, Hamam Mokayed |
This intervention study investigates the use of counterarguments in writing for critical thinking by students in the context of Generative AI (GenAI). This is especially relevant because risks of cheating and cognitive offloading exist with the use of GenAI. We presented 36 students in a particular university course with 4 carefully selected thesis statements (from a set of popular debates) to write about any one of them. We used six established rubrics (focus, logic, content, style, correctness and reference)...
|
| 370 |
Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets
2605.05392
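Query-focused summarization data generation: Automatically generates evidence-based query keywords from query-free summarization data to build QFS datasets.
|
cs.CL cs.AI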
|
Yllias Chali, Deen Abdullah |
Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search for suitable datasets for Query-Focused Summarization (QFS), we identify two research questions: Is it possible to automatically generate evidence-based query keywords from query-free datasets? Does evidence-based query generation support the QFS task? This paper proposes an evidence-based model to generate queries from query-free datasets. To eva...
|
| 390 |
SLAM: Structural Linguistic Activation Marking for Language Models
2605.05443
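White-box LLM watermarking: Proposes SLAM, which embeds the watermark in the structural geometry of activations to reduce quality loss.
|
cs.CL cs.AI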
|
Fabrice Harel-Canada, Amit Sahai |
LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding linguistic structure (e.g., voice, tense, clause order),...
|
| 403 |
ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis
2605.05485
Compiling LLM Reasoning to Solvers: Compiles a small set of reasoning traces into symbolic solvers for efficient program synthesis.
|
cs.CL cs.AI
|
Atharva Naik, Yash Mathur, Prakam, Carolyn Rose, David Mortensen |
LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained DSLs. The resulting solvers require no LLM calls at test time and are strong standalone systems: symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard, outperforming LLMs with test-time scaling...
|
| 410 |
Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks
2605.05503
Attacks on Diffusion LM Watermarks: Proposes multi-step rewriting attacks that weaken watermark detection for diffusion language model text.
|
cs.CL
|
Mohd Ruhul Ameen, Akif Islam, Nadim Mahmud, Md. Ekramul Hamid |
Statistical watermarking is a common approach for verifying whether text was written by a language model. Most existing schemes assume autoregressive generation, where tokens are produced left to right and contextual hashing is well defined. Diffusion language models generate text by denoising tokens in arbitrary order, so these schemes cannot be applied directly. A recent watermark by Gloaguen et al. addresses this gap for LLaDA 8B Instruct and reports true positive detection above 99%. This pa...
|
| cs.CR 14 papers | ||||
| 54 |
Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis
2605.04499
LLM Penetration Testing Strategy: Proposes a reasoning framework for penetration testing strategy to plan and analyze attack paths.
|
cs.CR cs.AI
|
Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne |
Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific...
|
| 125 |
Differential Privacy in the Extensive-Form Bandit Problem
2605.05266
Differentially Private Bandits: Proposes an extensive-form bandit algorithm satisfying local differential privacy and gives a regret bound.
|
cs.CR cs.LG
|
Stephen Pasteris, Rahul Savani, Theodore Turocy |
We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of t...
|
| 135 |
Gray-Box Poisoning of Continuous Malware Ingestion Pipelines
2605.04698
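Poisoning Malware ML Pipelines: Crafts functionality-preserving adversarial samples under a gray-box threat model to poison continuous malware ingestion pipelines.
|
cs.CR cs.LG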
|
Jan Dolejš, Martin Jureček, Róbert Lórencz |
Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We evaluate the impact of these poisoned samples when ingested into a defende...
|
| 136 |
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
2605.04700
Jailbreaking Audio Language Models: Uses token-aligned gradients to optimize only a few audio segments for efficient jailbreak attacks.
|
cs.CR cs.AI cs.CL cs.LG cs.SD
|
Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge |
Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motiv...
|
| 138 |
Vol-Mark: A Watermark for 3D Medical Volume Data Via Cubic Difference Expansion and Contrastive Learning
2605.04705
Watermarking 3D Medical Volumes: Proposes Vol-Mark, a reversible-zero watermark for copyright protection and tamper detection of 3D medical volume data.
|
cs.CR cs.LG
|
Jiangnan Zhu, Yuntao Wang, Shengli Pan, Yujie Gu |
Today, advances in medical technology extensively utilize 3D volume data for accurate and efficient diagnostics. However, sharing these data across networks in telemedicine poses significant security risks of data tampering and unauthorized copying. To address these challenges, this paper proposes a novel reversible-zero watermarking approach, termed Vol-Mark, for medical volume data to protect their ownership and authenticity in telemedicine. The proposed Vol-Mark method offers two key benefits...
|
| 147 |
From Beats to Breaches: How Offensive AI Infers Sensitive User Information from Playlists
2605.04724
PII Inference from Playlists: Builds musicPIIrate to show that attackers can infer sensitive personal information from public playlists.
|
cs.CR cs.AI
|
Stefano Cecconello, Mauro Conti, Luca Pajola, Luca Pasa, Pier Paolo Tricomi |
The pervasive integration of AI has enabled Offensive AI: the exploitation of AI for malicious ends across the cyber-kill chain. A critical manifestation is the user attribute inference attack, where AI infers sensitive Personally Identifiable Information (PII) from innocuous public data. We explore how music streaming ecosystems, where users routinely release public playlists, can be exploited for Offensive AI. To quantify this threat, we developed musicPIIrate. This novel tool leverages deep l...
|
| 201 |
Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review
2605.05271
Anti-LLM peer review safeguards: Studies safeguards that embed hidden instructions in manuscripts to disrupt chatbot-written reviews.
|
cs.CR cs.AI
|
Oubo Ma, Ruixiao Lin, Jiahao Chen, Yuan Su, Yong Yang |
As LLMs become increasingly capable, editorial boards and program committees are growing concerned about reviewers who fully outsource peer review to commercial chatbots. This concern stems from prior findings that current chatbots lack the independent critical thinking and depth of reasoning required to assess scientific novelty. One promising direction for mitigating this concern is to embed hidden instructions into manuscripts that disrupt or alter chatbot-generated reviews. However, existing...
|
| 206 |
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
2605.04901
Secure Transformer inference vulnerability: Analyzes the shuffling defense in Transformer secure inference and exposes its security flaws.
|
cs.CR cs.AI
|
Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu, Jiaping Gui |
For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client's input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach signif...
|
| 250 |
Agentic Vulnerability Reasoning on Windows COM Binaries
2605.05000
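COM binary vulnerability agent: proposes SLYP, an agentic pipeline that automatically discovers COM race-condition vulnerabilities and generates verifiable PoCs.
|
cs.CR cs.LG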
|
Hwiwon Lee, Jongseong Kim, Lingming Zhang |
Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agent...
|
| 275 |
SoK: Robustness in Large Language Models against Jailbreak Attacks
2605.05058
LLM jailbreak robustness survey: systematizes jailbreak attacks and defenses and proposes a more comprehensive evaluation perspective.
|
cs.CR cs.AI
|
Feiyue Xu, Hongsheng Hu, Chaoxiang He, Sheng Hang, Hanqing Hu |
Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce models into generating harmful, unethical, or policy-violating outputs. Such attacks pose real-world risks, eroding safety, trust, and regulatory compliance in high-stakes applications. Although a variety of attack and defense methods have been proposed, existing evaluation practices are inadequate, often relying on narrow metrics like attack succe...
|
| 339 |
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
2605.05287
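Enterprise multitenant RAG security architecture: proposes a vendor-neutral, secure and compliant approach to multitenant retrieval and tool use.
|
cs.CR cs.AI cs.IR cs.SE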
|
Francisco Javier Arceo, Varsha Prasad Narsing |
Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank ...
|
| 347 |
How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
2605.05340
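Physical-world VLM privacy awareness evaluation: empirically evaluates VLMs' ability to recognize and handle private information in real environments.
|
cs.CR cs.AI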
|
Junran Wang, Xinjie Shen, Zehao Jin, Pan Li |
As Vision-Language Models (VLMs) are increasingly deployed as autonomous cognitive cores for embodied assistants, evaluating their privacy awareness in physical environments becomes critical. Unlike digital chatbots, these agents operate in intimate spaces, such as homes and hospitals, where they possess the physical agency to observe and manipulate privacy-sensitive information and artifacts. However, current benchmarks remain limited to unimodal, text-based representations that cannot capture ...
|
| 393 |
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs
2605.05459
Private Spatial RAG Retrieval: uses anchor-substitution encoding to achieve location-private retrieval in spatial RAG.
|
cs.CR cs.LG
|
Kennedy Edemacu, Mohammad Mahdi Shokri, Vinay M. Shashidhar, Jong Wook Kim |
This work introduces PAS -- Privacy Anchor Substitution, a structured mechanism for enabling user location privacy in spatial retrieval-augmented generation (RAG) systems. Unlike conventional differential privacy methods that directly perturb user locations, PAS represents location with relative anchor encoding consisting of an anchor, direction bin, and distance bin, allowing seamless integration with modern RAG pipelines. We evaluate PAS on a synthetic urban dataset and show that it achieves i...
|
| 412 |
Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand
2605.06713
Agentic AI Cyber Offense Forecast: analyzes how agentic AI compresses the attack chain and proposes defensive priorities for enterprises.
|
cs.CR cs.AI cs.HC
|
Christopher Koch |
Agentic AI systems can plan, call tools, inspect code, interact with web applications, and coordinate multi-step workflows. These same capabilities change the economics of cyber offense. The central near-term risk is not that every low-skill criminal immediately becomes a frontier exploit researcher; it is that agentic AI compresses the attack lifecycle by lowering the cost of reconnaissance, phishing, credential abuse, vulnerability triage, exploit adaptation, and post-compromise decision suppo...
|
| cs.CV 96 papers | ||||
| 12 |
Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring
2605.04397
Exposure Control for rPPG: proposes adaptive exposure control at capture time to make in-vehicle non-contact heart-rate monitoring more robust.
|
cs.CV eess.SY
|
Jieying Wang, Xinqi Cai, Caifeng Shan, Wenjin Wang |
Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by the highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure controlling during video acquisition -- most existing systems rely on either fixed exposure settings or camera build-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting co...
|
| 14 |
Detecting Deepfakes via Hamiltonian Dynamics
2605.04405
Dynamics-Based Deepfake Detection: replaces static features with Hamiltonian-dynamics stability analysis to detect deepfakes.
|
cs.CV cs.AI
|
Harry Cheng, Ming-Hui Liu, Tianyi Wang, Weili Guan, Liqiang Nie |
Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-...
|
| 16 |
UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model
2605.04409
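Remote Sensing Change Captioning: releases an urban construction change benchmark and proposes a captioning model that generates structured change descriptions.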
|
cs.CV
|
Yupeng Gao, Tianyu Li, Guoqing Wang, Yang Yang |
Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide l...
|
| 17 |
Evaluation Cards for XAI Metrics
2605.04410
XAI Metric Reporting Standard: proposes an XAI Evaluation Card template to standardize how explainability metrics are defined and reported.
|
cs.CV cs.AI cs.CY cs.LG
|
Rokas Gipiškis, Olga Kurasova |
The evaluation of explainable AI (XAI) methods is affected by a lack of standardization. Metrics are inconsistently defined, incompletely reported, and rarely validated against common baselines. In this paper, we identify transparency of evaluation reporting as a central, under-addressed problem. We propose the XAI Evaluation Card, a documentation template analogous to model cards, designed to accompany any study that introduces an XAI evaluation metric. The card covers explicit declaration of t...
|
| 18 |
Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
2605.04412
Style-Controlled 3D Generation: combines structured 3D latents with 2D diffusion for generalizable stylized 3D generation.
|
cs.CV
|
Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou |
3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styl...
|
| 22 |
Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning
2605.04425
Interpretable Prompt Learning: alternates semantic token selection and prompt optimization to make CLIP adaptation more interpretable.
|
cs.CV
|
Yating Wang, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun |
Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and...
|
| 25 |
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes
2605.04435
Feedforward 4D Reconstruction: proposes Ground4D, which uses spatial grounding to improve feedforward 4D reconstruction quality in off-road scenes.
|
cs.CV
|
Shuo Wang, Jilin Mei, Fuyang Liu, Wenfei Guan, Fanjie Kong |
Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward f...
|
| 27 |
A cross-modal network for facial expression recognition
2605.04439
Cross-Modal Facial Expression Recognition: proposes CMNet, which uses structural information such as facial symmetry for cross-modal expression recognition.
|
cs.CV
|
Chunwei Tian, Jingyuan Xie, Qi Zhang, Chao Li, Wangmeng Zuo |
Deep neural networks enriched with structural information have been widely employed for facial expression recognition tasks. However, these methods often depend on hierarchical information rather than face property to finish expression recognition. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet can respectively learn expression information via face symmetry on a whole face, left and right half fac...
|
| 29 |
LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
2605.04445
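Synthetic Image Detection by Generator Cues: proposes LEGO, which uses LoRA to focus on generator-specific features and improve the generalization of synthetic image detection.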
|
cs.CV
|
Yutong Xiao, Ran Ran, Jiwei Wei, Shuchang Zhou, Ke Liu |
The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, thereby creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases. This severely undermines model generalization. In contrast, focusing only o...
|
| 30 |
Deep Reprogramming Distillation for Medical Foundation Models
2605.04447
Distillation for Medical Foundation Models: proposes deep reprogramming distillation to efficiently adapt medical foundation models to downstream scenarios.
|
cs.CV
|
Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang |
Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillati...
|
| 33 |
RemoteZero: Geospatial Reasoning with Zero Human Annotations
2605.04451
Zero-Annotation Geospatial Reasoning: achieves geolocation reasoning and self-supervised learning without human coordinate annotations.
|
cs.CV
|
Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min |
Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabe...
|
| 34 |
StableI2I: Spotting Unintended Changes in Image-to-Image Transition
2605.04453
Image-to-Image Consistency Evaluation: evaluates how well I2I outputs preserve the semantics and spatial structure of the input.
|
cs.CV cs.AI
|
Jiayang Li, Shuo Cao, Xiaohui Li, Zhizhen Zhang, Kaiwen Zhu |
In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre-post consistency across a wide rang...
|
| 38 |
Stream-T1: Test-Time Scaling for Streaming Video Generation
2605.04461
Test-Time Scaling Video Generation: uses streaming chunk-wise generation to cut TTS candidate cost and strengthen temporal guidance.
|
cs.CV
|
Yijing Tu, Shaojin Wu, Mengqi Huang, Wenchuan Wang, Yuxin Wang |
While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering ...
|
| 43 |
Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding
2605.04475
Neuro-symbolic Driving Scene Understanding: uses information coordination to bridge multi-sensor BEV representations and verifiable reasoning, reducing hallucinations.
|
cs.CV
|
Shuo Liu, Lei Shi, Haowen Liu, Jing Xu, Yufei Gao |
Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordin...
|
| 56 |
Example-Based Object Detection
2605.04501
Example-Based Open-Vocabulary Detection: uses example prompts for open-vocabulary object detection without fixed categories.
|
cs.CV cs.AI
|
ZhiXin Sun |
In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets...
|
| 58 |
DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
2605.04503
Image Difference Captioning Benchmark: builds a harder, more robust benchmark and evaluation metrics for image difference captioning.
|
cs.CV cs.AI
|
Yuancheng Wei, Haojie Zhang, Linli Yao, Lei Li, Jiali Chen |
Image Difference Captioning (IDC) generates natural language descriptions that precisely identify differences between two images, serving as a key benchmark for fine-grained change perception, cross-modal reasoning, and image editing data construction. However, existing benchmarks lack diversity and compositional complexity, and standard lexical-overlap metrics (e.g., BLEU, METEOR) fail to capture semantic consistency or penalize hallucinations, which together prevent a comprehensive and robust ...
|
| 59 |
SpecPL: Disentangling Spectral Granularity for Prompt Learning
2605.04504
Spectral Prompt Learning for VLMs: improves vision-language prompt learning through spectral granularity disentanglement and counterfactual supervision.
|
cs.CV cs.AI cs.CL cs.LG
|
Jingtao Zhou, Xirui Kang, Feiyang Huang, Lai-Man Po |
Existing prompt learning for VLMs exhibits a modality asymmetry, predominantly optimizing text tokens while still relying on frozen visual encoder as holistic extractor and neglecting the spectral granularity essential for fine-grained discrimination. To bridge this, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decom...
|
| 61 |
Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting
2605.04506
Open-vocabulary 3D instance understanding: jointly optimizes geometry and semantics in 3D Gaussian Splatting for open-vocabulary instance-level understanding.
|
cs.CV cs.AI
|
Binh Long Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes, Peyman Moghadam |
We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with vi...
|
| 65 |
From Priors to Perception: Grounding Video-LLMs in Physical Reality
2605.04515
Grounding Video-LLMs in physics: uses physical-reality constraints to improve the fine-grained physical reasoning of Video-LLMs.
|
cs.CV
|
Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin |
While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only in anti-physics anomalies but also in counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we prop...
|
| 66 |
DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI
2605.04518
Lightweight 3D U-Net tumor segmentation: designs a lightweight 3D U-Net to efficiently segment brain tumors from multi-modal MRI.
|
cs.CV cs.LG cs.NE
|
Nand Kumar Mishra, Dhruv Mishra, Dr Manu Pratap Singh |
Automatic brain tumor segmentation from multi-modal MRI remains challenging because volumetric models often incur substantial computational cost. This paper presents DALight-3D, a compact 3D U-Net variant that combines depthwise separable 3D convolutions, identifier-conditioned normalization, cross-slice attention, and adaptive skip fusion. The method is evaluated on the Medical Segmentation Decathlon Task01 BrainTumour benchmark under matched optimization settings against standard 3D U-Net, Att...
|
| 70 |
High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
2605.04524
Single-image head mesh reconstruction: reconstructs an identity-preserving head mesh with industry-grade topology from a single image.
|
cs.CV cs.GR
|
Yunmu Wang, Zoubin Bi, Bowen Cai, Chenchu Rong, Jinlong Wang |
We present a single-image head mesh reconstruction framework that addresses the longstanding challenge of simultaneously preserving facial identity and producing industry-grade topology. Our framework adopts a coarse-to-fine optimization pipeline that refines a rigged template across three stages -- rig, joint, and vertex -- achieving stable convergence and consistent topology. To mitigate the ill-posed nature of single-image 3D face reconstruction and ensure identity preservation, we employ a n...
|
| 71 |
Velox: Learning Representations of 4D Geometry and Appearance
2605.04527
Latent representation learning for 4D objects: encodes dynamic point clouds into shape tokens to compress and reconstruct 4D geometry and appearance.
|
cs.CV
|
Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel |
We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, w...
|
| 74 |
Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
2605.04531
Test-time adaptive open-vocabulary detection: uses reward-guided semantic evolution to align text and image embeddings at test time, improving detection robustness.
|
cs.CV
|
Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li, Changyi Ma |
Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. While recent test-time adaptive object detection methods for VLM-based either rely on costly backpropagation or bypass semantic misalignment via external memory, none directly and efficiently align text and vision in a tr...
|
| 79 |
Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection
2605.04541
Image-to-point registration outlier rejection: uses angle-consistent hierarchical attention to improve outlier rejection and PnP stability in cross-modal registration.
|
cs.CV
|
Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu |
Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation, grasping, and localization. Existing deep learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Points (PnP) methods may struggle to achieve accurate results. To address this lim...
|
| 86 |
InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery
2605.04554
Interaction-aware multi-person mesh recovery: explicitly models human-human and human-object interactions to recover multi-person human meshes end to end.
|
cs.CV
|
Kaili Zheng, Kaiwen Wang, Xun Zhu, Chenyi Guo, Ji Wu |
Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environmen...
|
| 89 |
Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis
2605.04557
Geometry-controlled satellite image synthesis: adds geometric control to pre-trained diffusion models to synthesize high-resolution satellite images.
|
cs.CV cs.AI
|
Vlad Vasilescu, Daniela Faur, Teodor Costachioiu |
High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events. This shortage hampers the development and testing of machine learning models for land-cover classification, change detection, and disaster monitoring. In this paper, we tackle the problem of geometry-controlled high-resolution satellite image synthesis by adding control over existing pre-trained diffusion models. We propose a simple yet efficient method for controlling the synthesis pr...
|
| 90 |
SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression
2605.04560
Efficient perceptual image compression with Mamba: uses a semantic-aware Mamba state space model for low-complexity, high-perceptual-quality compression.
|
cs.CV
|
Jiaqian Zhang, Hao Wei, Chenyang Ge, Yanhui Zhou |
Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the ...Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To this end, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular foc...
|
| 91 |
Open-Source Image Editing Models Are Zero-Shot Vision Learners
2605.04566
Zero-shot vision in image editors: systematically evaluates the zero-shot vision capabilities of open-source image-editing models.
|
cs.CV cs.CL
|
Wei Liu, Jiaxin Lin, Rui Chen |
Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models (Veo 3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box. We conduct a systematic evaluation of three open-source image-editing models -- Qwen-Image-Edit, FireRed-Image-Edit, and LongCat-Image-Edit -...
|
| 93 |
Lightning Unified Video Editing via In-Context Sparse Attention
2605.04569
Sparse attention for video editing: proposes near-lossless sparse attention to accelerate in-context-learning video editing.
|
cs.CV
|
Shitong Shao, Zikai Zhou, Haopeng Li, Yingwei Song, Wenliang Zhong |
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that Query sharpness...
|
| 95 |
VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking
2605.04574
UAV-ground visual tracking prompts: uses visual-language prompts to model UAV-view and ground-view object tracking in a unified framework.
|
cs.CV
|
Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu |
UAV-ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two-stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to tracking unreliability. To address these limitations, we propose VL-UniTrack, a fully unified framework enhanced by visual-language prompts. By encoding featur...
|
| 97 |
GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution
2605.04581
Light field super-resolution transformer: uses an omnidirectional EPI Transformer to model multi-angle geometry and improve light field super-resolution.
|
cs.CV
|
Kunyu Li, Fei Wang, Lichao Zhang, Junjie Liu, Bihong Li |
Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based...
|
| 100 |
From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation
2605.04590
Rectified flow for text segmentation: replaces diffusion features with rectified flow to rework text-prompted image segmentation.
|
cs.CV cs.AI
|
Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng |
Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, leading to studies of using diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative natures of diffusion mode...
|
| 101 |
DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
2605.04593
Diffusion-enhanced CLIP for WSSS: uses a diffusion model to enhance CLIP's dense knowledge and improve weakly supervised semantic segmentation.
|
cs.CV
|
Zhiwei Yang, Pengfei Song, Yucong Meng, Kexue Fu, Shuo Wang |
Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically leverages Class Activation Maps (CAMs) to achieve pixel-level predictions. Recently, Contrastive Language-Image Pre-training (CLIP) has been introduced to generate CAMs in WSSS. However, previous WSSS methods solely adopt CLIP's vision-language paired property for dense localization, neglecting its inherently limited dense knowledge across both visual and text modalities, which renders CAM generation suboptimal. In ...
|
| 105 |
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness
2605.04606
Unsupervised category-aware object detection: uses reference-based category discovery to achieve annotation-free, category-aware object detection.
|
cs.CV cs.AI
|
Yichen Li, Qiankun Liu, Ying Fu |
Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels, thus failing to achieve category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware detection without any manually annotated labels. ...
|
| 107 |
Advancing Aesthetic Image Generation via Composition Transfer
2605.04609
Composition transfer for image aesthetics: explicitly models composition and transfers it to improve aesthetic image generation.
|
cs.CV
|
Kai Zou, Zhiwei Zhao, Bin Liu, Nenghai Yu |
Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model compositio...
|
| 111 |
Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition
2605.04617
Temporal-aware test-time adaptation for HAR: exploits temporal structure to improve the efficiency of test-time adaptation for wearable activity recognition.
|
cs.CV cs.HC cs.LG
|
Zishu Zhou, Zaipeng Xie, Xuanyao Jie |
Wearable human activity recognition (WHAR) models often suffer from performance degradation under real-world cross-user distribution shifts. Test-time adaptation (TTA) mitigates this degradation by adapting models online using unlabeled test streams, yet existing methods largely inherit assumptions from vision tasks and underexploit the inherent inter-window temporal structure in WHAR streams. In this paper, we revisit such temporal structure as a feature-conditioned inference signal rather than...
|
| 114 |
UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection
2605.04635
Generation-assisted PCB defect inspection: combines controllable generation with detection to mitigate the scarcity of PCB defect samples and improve detection.
|
cs.CV
|
Huan Zhang, Lianghong Tan, Yichu Xu, Jiangzhong Cao, Huanqi Wu |
Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework...
|
| 116 |
CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering
2605.04641
Mitigating LVLM object hallucination: uses caption-guided visual attention steering to reduce object hallucination in vision-language models.
|
cs.CV
|
Qiming Li, Zekai Ye, Xiaocheng Feng, Weihong Zhong, Libo Qin |
Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce contents that deviate from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and training cost, or decoding strategies which significantly increase inference time. In this work, we observe that LVLMs' attention to visual information is significantly enhanced when answering caption queries ...
|
| 121 |
Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling
2605.04662
Dance Motion Synthesis: proposes a contact matrix that models duet-dance interaction constraints to improve the realism of generated motion.
|
cs.CV
|
Xuhai Chen, Zhi Cen, Huaijin Pi, Sida Peng, Xiaowei Zhou |
Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with...
|
| 127 |
Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern
2605.04675
Physical Adversarial Attack on RGB-T: designs adversarial clothing with a non-overlapping RGB-T pattern to evade visible-thermal detectors in real-world scenes.
|
cs.CV
|
Xiaopei Zhu, Guanning Zeng, Zhanhao Hu, Jun Zhu, Xiaolin Hu |
Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0°-360°) RGB-T atta...
|
| 130 |
Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding
2605.04680
EEG Visual Decoding: proposes multi-level bidirectional biomimetic learning to mitigate scarce paired data and improve alignment in EEG visual decoding.
|
cs.CV cs.AI
|
Jingtao Liu, Peiliang Gong, Chuhang Zheng, Yiheng Liu, Qi Zhu |
EEG-based visual neural decoding aims to align neural responses with visual stimuli for tasks such as image retrieval. However, limited paired data and a fundamental mismatch between high-fidelity digital images and biological visual perception - distorted by retinotopic mapping and subject-specific neuroanatomy - severely impede cross-modal alignment. To address this, we propose MB2L, a Multi-Level Bidirectional Biomimetic Learning framework that incorporates structured physiological inductive ...
|
| 137 |
FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation
2605.04702
Identity-Preserving Text-to-Video: proposes pose-faithful identity-preservation learning to reduce facial identity drift under large poses and occlusions.
|
cs.CV cs.AI
|
Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu |
Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework to improve IPT2V in complex dynamic scenes. The key of FaithfulFaces is a pose-shared id...
|
| 143 |
Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition
2605.04713
Subject-Level Machine Unlearning: studies post-training machine unlearning that removes the influence of noisy participants at the subject level, in order to revise engagement recognition data.
|
cs.CV
|
Alexander Vedernikov |
Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machi...
|
| 149 |
Anny-Fit: All-Age Human Mesh Recovery
2605.04728
All-Age Human Mesh Recovery: proposes joint multi-person camera-space optimization for single-image 3D human mesh recovery in all-age scenes.
|
cs.CV
|
Laura Bravo-Sánchez, Matthieu Armando, Romain Brégier, Grégory Rogez, Serena Yeung-Levy |
Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals di...
|
| 151 |
ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting
2605.04730
Debiasing features for 3DGS visual localization: proposes unbiased landmark features to improve the robustness of 3DGS visual localization.
|
cs.CV
|
Yingdong Gu, Shaocheng Yan, Zhenjun Zhao, Yuan Kou, Jianxin Luo |
Visual localization is a core technology for augmented reality and autonomous navigation. Recent methods combine the efficient rendering of 3D Gaussian Splatting (3DGS) with feature-based localization. These methods rely on direct matching between 2D query features and the 3D Gaussian feature field, but this often results in mismatches due to an inherent bias in the learned Gaussian feature. We theoretically analyze the feature learning process in 3DGS, revealing that the widely adopted $α$-blen...
|
| 152 |
Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation
2605.04731
Joint building height and footprint estimation: couples the two tasks through morphological constraints to jointly predict building height and footprint.
|
cs.CV
|
Jinzhen Han, JinByeong Lee, Jisung Kim, HongSik Yun |
Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and ...
|
| 160 |
VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision
2605.04750
Thermal vehicle re-identification: proposes viewpoint-conditioned feature selection to improve thermal-infrared vehicle re-identification.
|
cs.CV eess.SY
|
Yasod Ginige, Ransika Gunasekara, Darsha Hewavitharana, Manjula Ariyarathne, Peshala Jayasekara |
Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poor performance due to high similarity among objects of the same category in the absence of color information (overlooking shape information) and de-emphasized texture information. Furthermore, variability in viewpoint adds more complexity as the features vary from side to side. We address these is...
|
| 161 |
Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition
2605.04752
Multimodal fusion for traffic congestion classification: fuses flow-guided attention with EMD to jointly model scene context and non-stationary motion for congestion classification.
|
cs.CV cs.AI
|
Eugene Kofi Okrah Denteh, Blessing Agyei Kyem, Joshua Kofi Asamoah, Armstrong Aboah |
Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations ...
|
| 166 |
Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation
2605.04769
Lightweight cross-spectral face recognition: uses contrastive alignment and distillation to achieve lightweight cross-spectral face recognition for edge devices.
|
cs.CV
|
Anjith George, Sebastien Marcel |
Heterogeneous Face Recognition (HFR) aims at matching face images captured across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, enhancing the usability of face recognition systems in challenging real-world conditions. Although recent HFR methods have achieved significant improvements in performance, many rely on computationally expensive models, making them impractical for deployment on resource-limited edge devices. In this work, we introduce a lightweigh...
|
| 167 |
Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction
2605.04770
Zero-shot gaze estimation benchmarking for HRI: proposes a zero-shot gaze estimation benchmark and evaluation analysis for human-robot interaction conditions.
|
cs.CV cs.HC cs.LG cs.RO
|
Berk Sezer, Ali Görkem Küçük, Erol Şahin, Sinan Kalkan |
While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smal...
|
| 168 |
MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education
2605.04772
Multimodal retrieval and generation for medical education: builds a medical image-text retrieval and generation system to provide interactive learning resources.
|
cs.CV
|
Miguel Diaz Benito, Cecilia Diana Albelda, Alvaro Garcia Martin, Jesus Bescos Cano, Marcos Escudero-Vinolo |
Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval an...
|
| 184 |
QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes
2605.04844
3D Gaussian splatting acceleration: uses four geometry-aware bounding boxes to accelerate Gaussian intersection computation in 3DGS rasterization.
|
cs.CV cs.GR
|
Xinze Li, Bohan Yang, Pengxu Chen, Yiyuan Wang, Hongcheng Luo |
3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware ...
|
| 187 |
3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance
2605.04856
Ultrasound to pseudo-CT synthesis: uses a Transformer-augmented residual network to generate pseudo-CT from 3D ultrasound in real time for intraoperative guidance.
|
cs.CV
|
Sapna Sachan, Amulya Kumar Mahto |
Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generate...
|
| 190 |
VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
2605.04870
Video TextVQA keyframe agents: proposes VTAgent, which uses agentic keyframe anchoring to improve evidence-aware Video TextVQA.
|
cs.CV
|
Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du |
Video text-based visual question answering (Video TextVQA) aims to answer questions by reasoning over visual textual content appearing in videos. Despite the strong multimodal video understanding capabilities of recent Video-LLMs, their performance on existing Video TextVQA benchmarks remains limited. To better understand this gap, we conduct an upper-bound analysis through frame-wise question answering, counting a sample as correct if any frame yields the right answer, which significantly outpe...
|
| 196 |
FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection
2605.04882
Fair vision-language glaucoma detection: proposes FairEnc, which jointly debiases the vision and text encoders for fair glaucoma detection.
|
cs.CV cs.AI cs.LG eess.IV q-bio.QM
|
Mohamed Elhabebe, Ayman El-Baz, Qing Liu |
Automated glaucoma detection is critical for preventing irreversible vision loss and reducing the burden on healthcare systems. However, ensuring fairness across diverse patient populations remains a significant challenge. In this paper, we propose FairEnc, a fair pretraining method for vision-language models (VLMs) that enables simultaneous debiasing across multiple sensitive attributes. FairEnc jointly mitigates biases in both textual and visual modalities with respect to multiple sensitive at...
|
| 208 |
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification
2605.04904
Animal re-identification via inpainting embeddings: explores the clustering capability of inpainting-model embeddings to identify individual animals from their skin patterns.
|
cs.CV
|
Jens van Bijsterveld, Daniele Avitabile, Fons J. Verbeek, Rita Pucci |
In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually ...
|
| 225 |
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
2605.04943
Vision-language rope condition monitoring: proposes DART, which outputs rope damage assessments, recommendations, and reports from a single image.
|
cs.CV cs.AI
|
Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic |
The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architec...
|
| 240 |
ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)
2605.04977
Privacy-preserving top-view re-identification benchmark: reports the TVRID competition and releases top-view RGB-Depth data and results for privacy-preserving person re-identification.
|
cs.CV
|
Raphaël Delécluse, Hazem Wannous, Laurent Guimas |
This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes ...
|
| 244 |
Attention-Based Chaotic Self-Supervision for Medical Image Classification
2605.04985
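Self-supervised learning for medical imaging: proposes attention-guided chaotic denoising autoencoder pre-training to improve medical image classification.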
|
cs.CV
|
Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas |
Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful alternative, yet common methods like masked autoencoders (MAEs) may inadvertently destroy fine-grained diagnostic features by using random masking. In this paper, we propose a novel SSL pre-training strategy, the Chaotic Denoising Autoencoder (CDAE). Instead of ...
|
| 245 |
Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
2605.04989
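LoRA adaptation of geospatial foundation models: evaluates efficient low-rank fine-tuning of geospatial foundation models for Sentinel-2 burned-area mapping.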
|
cs.CV
|
Ali Shibli, Andrea Nascetti, Yifang Ban |
Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospa...
|
| 255 |
Chaotic Contrastive Learning for Robust Texture Classification
2605.05012
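Robust contrastive learning for textures: proposes a chaotic contrastive learning framework that makes texture classification more robust to scale, illumination, and domain shift.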
|
cs.CV
|
Joao B Florindo |
Texture classification is a pivotal task in computer vision, presenting unique challenges due to high inter-class similarity and the sensitivity of structural patterns to scale and illumination changes. While Convolutional Neural Networks (CNNs) and recent Vision Transformers have set performance benchmarks, they often require extensive labeled datasets or struggle to generalize across domains due to an over-reliance on color and shape features. This paper introduces a novel framework that syner...
|
| 256 |
CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
2605.05014
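Dense 3D dataset for autonomous driving: releases CARD, a multi-modal dataset with quasi-dense 3D ground truth on challenging road conditions for reconstruction evaluation.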
|
cs.CV
|
Gasser Elazab, Frank Neuhaus, Tilman Koß, Malte Splietker, Aditya Date |
Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, po...
|
| 262 |
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models
2605.05026
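Hallucination analysis in diffusion models: uses local intrinsic dimension to characterize structural hallucinations in diffusion models and explain the source of their instability.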
|
cs.CV cs.AI
|
Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski, Michał Jan Włodarczyk, Pengfei Jin |
Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode from several viewpoints, offering partial explanations to their occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced...
|
| 263 |
Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
2605.05027
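Distillation for lifelong person re-identification: proposes prompt-anchored vision-text distillation to mitigate semantic drift and catastrophic forgetting.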
|
cs.CV
|
Wen Wen, Hao Chen, Shiliang Zhang |
Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the...
|
| 265 |
Computer-Aided Design Generation by Cascaded Discrete Diffusion Model
2605.05031
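Discrete diffusion for CAD generation: proposes a cascaded discrete diffusion model that generates CAD command sequences, avoiding the semantically invalid outputs of continuous perturbation.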
|
cs.CV
|
Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang |
Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, often producing perturbed representations that map to semantically inval...
|
| 266 |
Few-Shot Learning Pipeline for Monkeypox Skin Disease Classification Using CNN Feature Extractors
2605.05034
Few-shot monkeypox skin classification: builds a few-shot pipeline from CNN features and SimpleShot to recognize monkeypox skin disease.
|
cs.CV
|
Md. Safirur Rashid, Sabbir Ahmed, Muhammad Usama Islam, Sumona Hoque Mumu, Md. Hasanul Kabir |
Despite the strong performance of Convolutional Neural Networks (CNNs) in disease classification, their effectiveness often depends on access to large annotated datasets, which is an impractical requirement for emerging or rare conditions such as Monkeypox. To overcome this limitation, we propose a few-shot learning (FSL) framework that employs SimpleShot, a lightweight, non-parametric, inductive classifier, for Monkeypox and pox-like skin disease recognition from limited labeled examples. The p...
|
| 268 |
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise
2605.05045
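Robustness of VLMs to relation hallucination: analyzes how rotation and noise induce relation hallucination in VLMs and evaluates correction and denoising strategies.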
|
cs.CV cs.CL
|
Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson |
Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while the...
|
| 272 |
Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation
2605.05054
Flow matching few-shot adaptation: decouples radial and angular flows via polar decomposition to improve few-shot adaptation.
|
cs.CV cs.AI cs.LG
|
Hongxu Chen, Yanghao Wang, Bowei Zhu, Hongxiang Li, Zhen Wang |
Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we ...
|
| 274 |
ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
2605.05057
Open-vocabulary HOI detection: learns scripted state transitions to improve open-vocabulary human-object interaction detection.
|
cs.CV
|
Minh Anh Nguyen, Quang Huy Tran, Bao Ngoc Le, SuiYang Guang, Tuan Kiet Pham |
Open-vocabulary human-object interaction (HOI) detection requires recognizing interaction phrases that may not appear as annotated categories during training. Recent vision-language HOI detectors improve semantic transfer by matching human-object features with text embeddings, but their predictions are often dominated by object affordance and phrase-level co-occurrence. As a result, a model may predict "cut cake" from the presence of a knife and a cake without verifying whether the hand, ...
|
| 280 |
Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy
2605.05072
Camera-LiDAR occupancy projection: uses height-guided projection reparameterization to improve 2D-to-3D occupancy feature alignment.
|
cs.CV
|
Yuan Wu, Zhiqiang Yan, Jiawei Lian, Zhengxue Wang, Jian Yang |
3D occupancy prediction aims to infer dense, voxel-wise scene semantics from sensor observations, where the 2D-to-3D view transformation serves as a crucial step in bridging image features and volumetric representations. Most previous methods rely on a fixed projection space, where 3D reference points are uniformly sampled along pillars. However, such sampling struggles to capture the sparsity and height variations of real-world scenes, leading to ambiguous correspondences and unreliable feature...
|
| 281 |
FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching
2605.05077
Language-guided image segmentation: uses flow matching for language-guided dichotomous segmentation while preserving fine-grained structure.
|
cs.CV
|
Andranik Sargsyan, Shant Navasardyan |
Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image ...
|
| 282 |
A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
2605.05079
Refractive warping restoration benchmark: builds a unified benchmark and evaluation set for multi-frame image restoration under strong refractive warping.
|
cs.CV
|
Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley, Fiona Hwang |
Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmarks systematically evaluate restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering a range from turbulence-like mild warping to strong d...
|
| 311 |
CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization
2605.05136
CPCA domain generalization network: learns a domain-invariant subspace by deep unfolding of CPCA to improve OOD generalization.
|
cs.CV
|
Yu-Hsi Chen, Abd-Krim Seghouane |
Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), whi...
|
| 315 |
What Matters in Practical Learned Image Compression
2605.05148
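Practical perceptual image compression: systematically analyzes and jointly optimizes the perceptual quality and runtime efficiency of learned image codecs.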
|
cs.CV cs.AI cs.LG
|
Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang |
One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime -- in...
|
| 318 |
Aes3D: Aesthetic Assessment in 3D Gaussian Splatting
2605.05155
3DGS aesthetic assessment: proposes Aes3D to assess the composition and aesthetic attributes of 3D Gaussian Splatting scenes.
|
cs.CV cs.AI
|
Chuanzhi Xu, Boyu Wei, Haoxian Zhou, Xuanhua Yin, Zihan Deng |
As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence...
|
| 320 |
Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution
2605.05283
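Counterfactual attribution for medical imaging: uses counterfactual GANs to generate contrast samples that explain the key evidence regions behind medical image classification.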
|
cs.CV
|
Shakeeb Murtaza |
Ascription of an image gives insights into the objects that influence the classification of the whole image or its pixels towards a specific category. These insights help radiologists to visualize deformities in medical imaging. Most of the existing visualization techniques are based on discriminative models and highlight regions of the input image participating in the decision-making of a classifier. However, these approaches do not take all noticeable objects into account as their objective is...
|
| 321 |
Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging
2605.05161
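VLM-based medical OOD localisation: proposes WALDO, which uses optimal transport to align against reference distributions of normal anatomy for zero-shot anomaly localisation.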
|
cs.CV
|
Bernhard Kainz, Johanna P Mueller, Matthew Baugh, Cosmin Bercea |
Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables c...
|
| 322 |
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
2605.05163
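Physics-constrained 3D asset generation: proposes PhysForge and PhysDB to generate interactive 3D assets with hierarchical physical and functional logic.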
|
cs.CV
|
Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen |
Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physica...
|
| 323 |
Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation
2605.05164
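Geometry-aware WSI representation: proposes a geometry-aware state space model that aggregates pathology WSI patches to improve slide-level prediction.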
|
cs.CV cs.AI
|
Enhui Chai, Sicheng Chen, Tianyi Zhang, Chad Wong, Kecheng Huang |
Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch represent...
|
| 332 |
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
2605.05185
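Training recipe for multimodal search agents: open-sources the data, trajectory synthesis, and training pipeline for multimodal search agents.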
|
cs.CV
|
Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai |
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training fr...
|
| 333 |
LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)
2605.05187
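Quality assessment for 4D world models: reports on the PhyScore challenge, which evaluates metrics for the perceptual quality and physical consistency of generated videos.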
|
cs.CV
|
Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan |
This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i....
|
| 340 |
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
2605.05204
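On-policy self-distillation fine-tuning for diffusion models: uses D-OPSD to continuously fine-tune distilled diffusion models without breaking few-step inference.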
|
cs.CV
|
Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng |
The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For example, applying the commonly used fine-tuning technique would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distill...
|
| 341 |
Taming Outlier Tokens in Diffusion Transformers
2605.05206
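Suppressing outlier tokens in diffusion Transformers: analyzes high-norm outlier tokens in DiTs and proposes a method to reduce their adverse effects.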
|
cs.CV cs.AI cs.LG
|
Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu, Liang-Chieh Chen, Zhe Gan |
We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier represen...
|
| 342 |
Syn4D: A Multiview Synthetic 4D Dataset
2605.05207
Multiview synthetic 4D dataset: releases Syn4D, with multiview ground-truth depth and tracking annotations for dynamic scenes.
|
cs.CV
|
Zeren Jiang, Yushi Lan, Yihang Luo, Yufan Deng, Zihang Lai |
Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4...
|
| 343 |
Query2Uncertainty: Robust Uncertainty Quantification and Calibration for 3D Object Detection under Distribution Shift
2605.05328
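3D detection uncertainty under distribution shift: proposes a density-aware calibration method that makes 3D object detection confidence more reliable under shift.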
|
cs.CV cs.RO
|
Till Beemelmanns, Alexey Nekrasov, Stefan Vilceanu, Jonas Steinhaus, Timo Woopen |
Reliable uncertainty estimation for 3D object detection is critical for deploying safe autonomous systems, yet modern detectors remain poorly calibrated, especially under distribution shifts. Although post-hoc calibration methods address this issue and provide improved calibration for in-distribution tests, they fail to adapt in distribution-shifted scenarios. In this work, we address this issue and introduce a density-aware calibration method that couples post-hoc calibrators with the feature d...
|
| 346 |
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters
2605.05331
Large-scale ViT autoencoder tokenizer: scales native-resolution ViT autoencoders to 5 billion parameters with stable training.
|
cs.CV cs.AI cs.LG
|
Philippe Hansen-Estruch, Jiahui Chen, Vivek Ramanujan, Orr Zohar, Yan Ping |
Vision Transformer (ViT) autoencoders have emerged as compelling tokenizers for images, offering improved reconstruction over convolutional tokenizers. However, existing ViT tokenizers cannot explore this landscape as performance degrades outside training resolutions, and reliance on adversarial losses prevents stable scaling. ViTok (Hansen-Estruch et al., 2025) found that the compression ratio r mediates a reconstruction-generation trade-off where lower r means better reconstructions but harder...
|
| 349 |
Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery
2605.05344
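Open-vocabulary retrieval in satellite imagery: uses LLM-guided query embedding refinement to improve open-vocabulary object retrieval in satellite imagery.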
|
cs.CV cs.AI cs.IR
|
Md Adnan Arefeen, Biplob Debnath, Ravi K. Rajendran, Murugan Sankaradas, Srimat T. Chakradhar |
In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant image tiles, as the retrieval system must generalize to a wide range of unseen objects and concepts. While vision-language models (VLMs) such as CLIP are widely used for text-image retrieval, even fine-tuned variants often struggle to accurately align such queries with sa...
|
| 352 |
egenioussBench: A New Dataset for Geospatial Visual Localisation
2605.05351
Geospatial visual localisation benchmark dataset: proposes egenioussBench, built on a city-scale 3D mesh and high-accuracy smartphone queries.
|
cs.CV
|
Phillipp Fanta-Jende, Francesco Vultaggio, Alexander Kern, Yasmin Loeper, Markus Gerke |
We present egenioussBench, a visual localisation benchmark built on geospatial reference data: a city-scale airborne 3D mesh and a CityGML LoD2 model. This pairing reflects deployable mapping assets and supports true scalability beyond traditional SfM-based approaches. The query data comprise smartphone images with centimetre-accurate, map-independent ground truth obtained via PPK and GCP/CP-aided adjustment. From 2,709 images, we derive a non-co-visible subset by estimating the full co-visibili...
|
| 358 |
Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video
2605.05367
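3D sign language avatars from monocular video: provides SMPL-X annotations for Saudi Sign Language and generates high-fidelity 3D avatars from monocular video.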
|
cs.CV cs.AI
|
Eyad Alghamdi, Sattam Altuuaim, Obay Ghulam, Abdulrahman Qutah, Yousef Basoodan |
Arabic Sign Language (ArSL) and its dialects serve approximately 400 million Arabic speakers worldwide, yet the community lacks high-quality 3D parametric annotations and specialized reconstruction methods for avatar generation. We address this critical gap through two key contributions: First, we introduce the first high-quality 3D parametric annotations for the Ishara-500 Saudi Sign Language dataset, providing precise SMPL-X parameters for 500 culturally authentic SSL signs. Second, we present...
|
| 361 |
Two Steps Are All You Need: Efficient 3D Point Cloud Anomaly Detection with Consistency Models
2605.05372
3D point cloud anomaly detection: uses consistency models for efficient two-step point cloud anomaly detection.
|
cs.CV cs.AI
|
Pranav A, Shashank B, Pranav Siddappa, Dominik Seuss, Minal Moharir |
Diffusion models are rapidly redefining 3D anomaly detection in point cloud data. As 3D sensing becomes integral to modern manufacturing, reliable anomaly detection is essential for high-throughput quality assurance and process control. Yet practical deployment on resource-constrained, latency-critical systems remains limited. Existing methods are often computationally prohibitive or unreliable in complex, unmasked regions, and diffusion pipelines are inherently bottlenecked by iterative denoisi...
|
| 364 |
Visual Text Compression as Measure Transport
2605.06708
Visual text compression evaluation: models visual text compression as measure transport to predict task utility.
|
cs.CV cs.AI
|
Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li |
Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefo...
|
| 369 |
LAMP: Localization Aware Multi-camera People Tracking in Metric 3D World
2605.05390
Multi-camera 3D people tracking: proposes LAMP, which uses localized, calibrated multi-view input for 3D tracking from a headset.
|
cs.CV
|
Nan Yang, Julian Straub, Fan Zhang, Richard Newcombe, Jakob Engel |
Tracking 3D human motion from an egocentric multi-camera headset is challenged by severe egomotion, partial visibility or occlusions, and lack of training data. Existing methods designed for monocular video often require static or slowly-moving cameras and cannot efficiently leverage multi-view, calibrated and localized input. This makes them brittle and prone to fail on dynamic egocentric captures. We propose LAMP (Localization Aware Multi-camera People Tracking): a novel, simple framework to solve...
|
| 375 |
Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response
2605.05405
Zero-shot satellite image retrieval: proposes GeoQuery, which uses joint embeddings for zero-shot natural-language satellite retrieval.
|
cs.CV
|
James Walsh, William Fawcett, Grace Colvard, Raúl Ramos-Pollán |
Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constrain...
|
| 388 |
Safety-Critical Camera Reliability Monitoring for ADAS via Degradation-Aware Uncertainty Pattern Analysis
2605.05439
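ADAS camera reliability monitoring: builds a health index from degradation-induced uncertainty patterns to give early warning of camera risk.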
|
cs.CV
|
Shiva Aher |
Reliable camera input is essential for safety-critical ADAS perception, but most monitoring approaches detect sensor failures only after downstream performance has degraded. We propose a proactive camera reliability monitoring framework that estimates perception risk from degradation-induced uncertainty patterns before downstream failure becomes observable. The method introduces a Global Sensor Health Index (GSHI), a continuous reliability score that aggregates per-degradation severities using a...
|
| 392 |
EchoXFlow: A Beamspace Echocardiography Dataset for Cardiac Motion, Flow, and Function
2605.05447
Beamspace Echocardiography Dataset: releases a raw beamspace ultrasound dataset for learning cardiac motion and blood flow.
|
cs.CV
|
Elias Stenhede, Joanna Sulkowska, Eivind Bjørkan Orstad, Henrik Schirmer, Arian Ranjbar |
We introduce EchoXFlow, a clinical echocardiography dataset for learning from ultrasound in its native acquisition geometry rather than from scan-converted Cartesian videos. Existing public datasets offer limited opportunities to study cross-modal relationships between cardiac anatomy, myocardial motion, and blood flow, as Doppler is typically absent or fused as RGB overlays, and acquisitions are released after lossy vendor display processing. EchoXFlow comprises 37125 recordings from 666 routin...
|
| 411 |
The First Controllable Bokeh Rendering Challenge at NTIRE 2026
2605.05510
Controllable Bokeh Rendering Benchmark: summarizes the results and winning methods of the NTIRE controllable bokeh rendering challenge.
|
cs.CV
|
Tim Seizinger, Florin-Alexandru Vasluianu, Jeffrey Chen, Zhuyun Zhou, Zongwei Wu |
This study presents the outcomes of the first Controllable Bokeh Rendering Challenge at NTIRE and highlights the most effective submitted methodologies. In total, 44 participants registered for the competition, of which 8 teams submitted valid solutions after the conclusion of the final test phase. All submissions were evaluated on unseen images, focusing on portraits and intricate subjects with complex and visually appealing bokeh phenomena. In addition to the first track focusing on establishe...
|
| cs.CY 2 papers | ||||
| 110 |
Guidelines for Designing AI Technologies to Support Adult Learning
2605.04616
AI design for adult learning: summarizes design and evaluation guidelines for AI educational technologies in adult-learning settings.
|
cs.CY cs.AI
|
Jennifer M. Reddig, Glen R. Smith, Sanaz Ahmadzadeh Siyahrood, Wesley G. Morris, Yoojin Bae |
AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the Uni...
|
| 396 |
The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking
2605.05472
Pedagogy of AI Errors: uses generative AI errors in course design to foster higher-order thinking.
|
cs.CY cs.AI
|
Hadi Hosseini |
As generative AI becomes increasingly integrated into higher education, its frequent errors and hallucinations, often seen as limitations, offer a unique pedagogical opportunity. By framing AI as a "learning companion" whose imperfect outputs prompt analysis, evaluation, and reflection, we argue that instructors can engage students in the fundamental processes of higher-order thinking. This paper presents a design-oriented study in which an AI-integrated syllabus in a database design ...
|
| cs.DC 3 papers | ||||
| 32 |
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
2605.04450
GPU HBM Cache Partitioning: adaptively partitions HBM to balance embedding and KV caches and accelerate recommender serving.
|
cs.DC cs.IR cs.LG
|
Wenjun Yu, Shuguang Han, Amelie Chi Zhou |
Generative Recommender (GR) inference places embedding hot caches (EMB) and KV caches in direct competition for limited GPU HBM: allocating more memory to one improves its efficiency but degrades the other. Existing systems optimize them in isolation, overlooking that the optimal EMB-KV allocation ratio can shift by up to 0.35 across workload regimes, leaving 20-30% latency improvement unrealized. While online reallocation is required to close this gap, naive approaches introduce H2D refill tra...
|
| 45 |
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
2605.04478
Distributed Training Hang Diagnosis: builds CCL-D to pinpoint, with high precision, the root causes of slow or hung communication in large-model training.
|
cs.DC cs.AI
|
Yida Gu, Fakang Wang, Jianhao Fu, Zhenhang Sun, Qianyu Zhang |
As training scales grow, collective communication libraries (CCL) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root cause analysis. To address this, we propose CCL-D, a high-precision diagn...
|
| 269 |
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
2605.05049
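MoE training system optimization: proposes Piper, which uses resource modeling and pipelined hybrid parallelism to improve large-scale MoE training efficiency.
|
cs.DC cs.AI cs.LG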
|
Sajal Dash, Feiyi Wang |
Frontier models increasingly adopt Mixture-of-Experts (MoE) architectures to achieve large-model performance at reduced cost. However, training MoE models on HPC platforms is hindered by large memory footprints, frequent large-scale communication across heterogeneous networks, and severe workload imbalance. To characterize these challenges, we develop a mathematical model that quantifies memory, compute, and communication requirements for MoE configurations under various parallelization schemes,...
|
| cs.GR 3 papers | ||||
| 63 |
CoherentRaster: Efficient 3D Gaussian Splatting for Light Field Displays
2605.04509
Gaussian splatting for light field displays: proposes a 3D Gaussian splatting rasterization acceleration method for efficient light field display rendering.
|
cs.GR
|
Gyujin Sim, Seungjoo Shin, Hosung Jeon, Gwangsoon Lee, Hyon-Gon Choo |
Light field displays (LFDs) require rendering an interlaced image that encodes many view-dependent observations. This multi-view requirement introduces substantial computational overhead, making real-time rendering difficult to achieve. While 3D Gaussian Splatting (3DGS) is efficient for single-view rendering on 2D displays, directly extending it to LFDs is computationally expensive. Moreover, prior accelerations either suffer from GPU inefficiency under spatially incoherent subpixel layouts or ...
|
| 169 |
AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC
2605.04773
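Adaptive coarsening for GPU implicit integration: proposes in-solve adaptive algebraic coarsening to accelerate the linear solves in GPU implicit time integration.
|
cs.GR cs.PF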
|
Xuan Wang, Zhaofeng Luo, Minchen Li, Taku Komura, Kemeng Huang |
Implicit time integration is key to robustly simulating stiff materials and large deformations, but its performance is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) where they are most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), complicating parallel implementations, harming memory locality, and sometimes being disallowed when it may introduce local g...
|
| 292 |
A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry
2605.05095
Bayesian next-best-view selection: uses Bayesian decision theory to select the task-relevant next-best scanning view under geometric uncertainty.
|
cs.GR cs.CV cs.LG stat.ML
|
Jingsen Zhu, Silvia Sellán, Alexander Terenin |
We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds, by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently-developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, then (c) using the posterior distribution to carefully reason about which view to scan next. This enable...
|
| cs.HC 3 papers | ||||
| 150 |
AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations
2605.04729
AI Feedback for Presentation Slides: implements AISSA, which combines LLMs and learning analytics dashboards to give rubric-based feedback on student slides.
|
cs.HC cs.AI cs.SE
|
Alvaro Becerra, Diego Gomez, Ruth Cobos |
Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload thei...
|
| 156 |
AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education
2605.04740
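Multi-LLM peer feedback system for higher education: implements and deploys a collaborative multi-LLM peer feedback system to improve comment quality.
|
cs.HC cs.AI cs.SE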
|
Alvaro Becerra, Alejandra Palma, Ruth Cobos |
Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthes...
|
| 350 |
Making AI Drafts Count: A Quality Threshold in Audio Description Workflows
2605.05348
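Quality threshold for AI drafts in audio description: studies how AI draft quality affects editing efficiency and outcomes in accessible audio description.
|
cs.HC cs.AI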
|
Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper, Sanjay Mirani, Gio Jung |
Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. We investigate this through GenAD, an AD generation pipeline that incorporates accessibility guidelines and contextual video information, and RefineAD, an editing interface fo...
|
| cs.IR 3 papers | ||||
| 46 |
Career-Aware Resume Tailoring via Multi-Source Retrieval-Augmented Generation with Provenance Tracking: A Case Study
2605.05257
RAG Resume Tailoring with Provenance: uses multi-source RAG and provenance tracking to generate job-tailored resumes from a career vault.
|
cs.IR cs.AI cs.CL
|
Kumar Abhinav |
AI-assisted resume tailoring systems commonly operate on a single uploaded resume, which limits their ability to recover relevant experience omitted from the current draft and makes it difficult for users to distinguish grounded edits from model-generated suggestions. This paper presents Resume Tailor, an agentic resume-tailoring system that maintains a longitudinal career vault in a vector database and uses multi-source retrieval-augmented generation (RAG) to assemble job-specific resume conten...
|
| 146 |
Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation
2605.04723
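Efficient Sequential Recommendation: revisits convolutional networks for attribute-aware sequential recommendation to reduce the compute and memory cost of long-sequence modeling.
|
cs.IR cs.LG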
|
Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme |
Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user h...
|
| 172 |
Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes
2605.04797
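Crowdsourced audiovisual deepfake detection: quantifies how consistently crowd workers identify audiovisual deepfakes and how accurately they identify manipulation type and timing.
|
cs.IR cs.AI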
|
Michael Soprano, Andrea Cioci, Stefano Mizzaro |
Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing...
|
| cs.IT 3 papers | ||||
| 13 |
Contextual Memory-Enhanced Source Coding for Low-SNR Communications
2605.04400
Robust Source Coding in Low-SNR: uses contextual memory-enhanced source coding to reduce error propagation in low-SNR text transmission.
|
cs.IT cs.LG
|
Ziqiong Wang, Rongpeng Li |
While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR regimes, even a small number of residual bit errors after channel decoding may derail the subsequent lossless reconstruction process, especially when Arithmetic Coding (AC) relies on Large Language Model (LLM)-based probability estimation. Existing remedies eith...
|
| 382 |
Information-theoretic Limits of Learning and Estimation
2605.06710
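Information-theoretic limits of learning and estimation: systematically introduces information-theoretic tools for characterizing fundamental lower bounds in learning and estimation.
|
cs.IT cs.LG math.ST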
|
Abbas El Gamal, Maxim Raginsky |
Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can, and cannot, achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entr...
|
| 414 |
When Semantic Communication Meets Queueing: Cross-Layer Latency and Task Fidelity Optimization
2605.05514
Semantic Communication with Queueing Optimization: jointly optimizes queueing and semantic communication for wireless transmission latency and task fidelity.
|
cs.IT cs.AI cs.LG cs.NI eess.SP
|
Yalin E. Sagduyu, Tugba Erpek |
Semantic communication (SemCom) with learned encoder-decoder architectures enables end-to-end learning of compact task-oriented representations optimized for the wireless channel, reducing channel resources needed to convey task-relevant information and improving spectrum efficiency. This paper studies semantic image transmission over block Rayleigh fading with AWGN using a multi-task semantic autoencoder that jointly reconstructs images and predicts labels from the received waveform. The latent...
|
| cs.LG 137 papers | ||||
| 1 |
Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment
2605.04363
Tabular Label Shift Adaptation: proposes DistPFN, which adjusts posteriors at test time to mitigate label shift in tabular models.
|
cs.LG cs.AI
|
Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim |
TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence...
|
| 2 |
Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares
2605.04364
Online Prediction for LDS: proposes predictive online least squares to achieve logarithmic regret in marginally stable systems.
|
cs.LG eess.SY math.OC
|
Chih-Fan Pai, Yang Zheng |
We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online ...
|
| 4 |
Extending Differential Temporal Difference Methods for Episodic Problems
2605.04368
Episodic Differential TD Learning: extends differential TD methods to episodic tasks while preventing reward centering from altering the policy.
|
cs.LG cs.AI
|
Kris De Asis, Mohamed Elsayed, Jiamin He |
Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This keeps the return bounded and removes a value function's state-independent offset. However, reward centering can alter the optimal policy in episodic problems, limiting its applicability. Motivated by recent works that emphasize the role of normalization in str...
|
| 6 |
$p$-adic Manifold Learning and Benchmark Tasks from Impartial Games
2605.04374
p-adic Manifold Learning: proposes a p-adic manifold learning algorithm and builds benchmark tasks from impartial games.
|
cs.LG math.NT
|
Tomoki Mihara |
We introduce $p$-adic manifold learning, propose an algorithm to solve it, and propose benchmark tasks from impartial games.
|
| 8 |
GraphPI: Efficient Protein Inference with Graph Neural Networks
2605.04376
GNN Protein Inference: models protein inference as graph node classification and uses GNNs to mitigate label scarcity.
|
cs.LG
|
Zheng Ma, Jiazhen Chen, Lei Xin, Ali Ghodsi |
The integration of deep learning approaches in biomedical research has been transformative, enabling breakthroughs in various applications. Despite these strides, its application in protein inference is impeded by the scarcity of extensively labeled datasets, a challenge compounded by the high costs and complexities of accurate protein annotation. In this study, we introduce GraphPI, a novel framework that treats protein inference as a node classification problem. We treat proteins as interconne...
|
| 11 |
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
2605.04396
Transformer Complexity Control Timing: shows that critical time windows during training determine whether a Transformer ends up reasoning or memorizing.
|
cs.LG cs.AI
|
Sarwan Ali |
Recent work has shown that Transformers' compositional generalization is governed by complexity control, initialization scale and weight decay, which steers training toward low-complexity reasoning solutions rather than high-complexity memorization. Existing analyses, however, treat complexity control as a single static hyperparameter choice, leaving open when during training this control is actually decisive. We show that the memorization-versus-reasoning fate of a Transformer is ...
|
| 15 |
Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning
2605.04406
Learnable SPD Geometry: proposes a spline-pullback metric for universal diffeomorphic SPD representation learning.
|
cs.LG
|
Tushar Das, Subrata Dutta, Sarmistha Neogy, Koushlendra Kumar Singh |
The integration of Symmetric Positive Definite (SPD) matrices into deep learning has historically relied on fixed algebraic Riemannian metrics. Analogous to hand-crafted features in classical machine learning, these static formulations impose rigid geometries limiting network expressivity and adaptability. Recent attempts to parameterize these geometries often violate the axioms of primary matrix functions through unconstrained powers or rank-dependent scaling, inviting spatial folding, loss of ...
|
| 19 |
Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models
2605.04413
Counterfactual Identifiability without Monotonicity: proposes non-monotone triangular SCMs and gives conditions for counterfactual identifiability without global monotonicity.
|
cs.LG stat.ME
|
Pengcheng Tan, Jiang Chen, Dehui Du |
Structural causal models provide a unified semantics for interventions and counterfactuals, but most identifiability results rely on restrictive assumptions like global monotonicity, which are often violated in embodied interaction, where the same exogenous perturbation can induce opposite responses under different contact contexts. We ask what structure still suffices once global monotonicity is dropped. We introduce non-monotone triangular structural causal models (NM-TM-SCM), which retain tri...
|
| 20 |
Demystifying Manifold Constraints in LLM Pre-training
2605.04418
Manifold Constraints in LLM Training: systematically analyzes how explicit manifold constraints affect stability and performance in LLM pre-training.
|
cs.LG cs.AI math.OC
|
Kang An, Jiaxiang Li, Donald Goldfarb, Shiqian Ma |
The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints still remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the ...
|
| 21 |
FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
2605.04421
Continuous-Time Transformer Attention: proposes FLUID, which uses liquid attention to incorporate continuous dynamics directly into the attention computation.
|
cs.LG cs.AI
|
Waleed Razzaq, Yun-Bo Zhao |
Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting input or output embeddings with continuous dynamics. However, the core scaled-dot-product-attention (SDPA) mechanism remains inherently discrete. We propose FLUID (Flexible Unified Information Dynamics), a CT Transformer that incorporates continuous dynamics directly into the attention computation by replacing it with a Liquid Attention Network (LAN). LAN reinterprets attention logits as contin...
|
| 37 |
Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention
2605.04460
Sparse Counterfactual Community Intervention: learns sparse adjustments to controllable variables from survey data to design intervention strategies.
|
cs.LG
|
Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang |
Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional a...
|
| 39 |
Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control
2605.04468
Controlled Drift LLM Fine-tuning: Controls distributional drift with a dynamic anchor to stabilize SFT and mitigate forgetting.
|
cs.LG cs.AI cs.CL
|
Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang |
Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of m...
|
| 40 |
CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies
2605.04470
RL Fine-tuning for Driving: Combines counterfactual supervision with interactive RL fine-tuning to improve the robustness of driving policies.
|
cs.LG cs.RO
|
Keyu Chen, Nanfei Ye, Yida Wang, Wenchao Sun, Danqi Zhao |
Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future es...
|
| 41 |
Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs
2605.04472
Neuro-symbolic Combinatorial Proofs: Uses the WZ method to guide LLMs in generating executable formal proofs of combinatorial identities.
|
cs.LG
|
Beibei Xiong, Hangyu Lv, Junqi Liu, Yisen Wang, Shaoshi Chen |
Automating formal proofs of combinatorial identities is challenging for LLM-based provers, as long-horizon proof planning is required and unconstrained search quickly explodes. Symbolic methods such as the Wilf-Zeilberger (WZ) method can achieve a mechanized proof of combinatorial identities by constructing special auxiliary functions and demonstrating that they satisfy specific recurrence relations. We propose WZ-LLM, a neuro-symbolic framework that turns WZ proof plans into executable proof sk...
|
| 42 |
Geometry-Aware Neural Optimizer for Shape Optimization and Inversion
2605.04474
Geometry-Aware Neural Shape Optimization: Proposes a geometry-aware neural optimizer that provides end-to-end gradients for shape optimization and inversion.
|
cs.LG
|
Guoze Sun, Tianya Miao, Haoyang Huang, Huaguan Chen, Han Wan |
Geometry is central to PDE-governed systems, motivating shape optimization and inversion. Classical pipelines conduct costly forward simulation with geometry processing, requiring substantial expert effort. Neural surrogates accelerate forward analysis but do not close the loop because gradients from objectives to geometry are often unavailable. Existing differentiable methods either rely on restrictive parameterizations or unstable latent optimization driven by scalar objectives, limiting inter...
|
| 44 |
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
2605.04477
Exploration in Online RLHF: Proposes a data-dependent exploration strategy to improve the sample efficiency of online RLHF.
|
cs.LG
|
Zhen-Yu Zhang, Yuting Tang, Jiandong Zhang, Lanjihong Ma, Masashi Sugiyama |
Online reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs) by continuously collecting new preference feedback during training. A foundational challenge in this setting is exploration, which requires algorithms that enable the LLMs to generate informative comparisons that improve sample-efficiency in online RLHF. Existing exploration strategies often derive bonuses via on-policy expectations, which are difficult to estima...
|
| 49 |
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium
2605.04494
Preference Alignment for Diffusion Models: Formulates diffusion preference alignment as a Nash equilibrium to improve general preference alignment.
|
cs.LG cs.CV
|
Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis |
Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human prefere...
|
| 53 |
Quadrature-TreeSHAP: Depth-Independent TreeSHAP and Shapley Interactions
2605.04497
Depth-Independent TreeSHAP: Reformulates TreeSHAP with numerical quadrature to achieve depth independence and support higher-order interactions.
|
cs.LG
|
Ron Wettenstein, Rory Mitchell, Peng Yu |
Shapley values are a standard tool for explaining predictions of tree ensembles, with Path-Dependent SHAP being the most widely used variant. Despite substantial progress, existing methods still exhibit trade-offs between depth-dependent runtime, numerical stability, and support for higher-order interactions. To address these challenges, we introduce Quadrature-TreeSHAP, a quadrature-based reformulation of Path-Dependent TreeSHAP that is numerically stable, naturally extends to any-order Shapley...
|
| 57 |
Gradient Scaling Effects in Adaptive Spectral PINNs for Stiff Nonlinear ODEs
2605.04502
Optimization in Spectral PINNs: Analyzes the gradient scaling induced by IC gating and improves PINN training on stiff ODEs.
|
cs.LG
|
Isabela M. Yepes, Pavlos Protopapas |
Physics-Informed Neural Networks (PINNs) often struggle to train reliably on stiff and oscillatory dynamical systems due to poor optimization conditioning. While prior work has emphasized representational remedies such as spectral parameterizations, the optimization implications of initial-condition (IC) embeddings in adaptive spectral PINNs have not been well characterized. In this work, we show that the choice of IC gating function induces explicit time-dependent gradient scaling, which intera...
|
| 67 |
FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling
2605.04519
Federated learning for scATAC-seq: Enables privacy-preserving scATAC analysis through a federated learning framework with adaptive sampling.
|
cs.LG stat.ML
|
Guangyi Zhang, Yi Dai, Yiyun He, Junhao Liu |
Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key...
|
| 72 |
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
2605.04528
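Zero-shot cross-domain fault diagnosis: Achieves zero-shot cross-domain generalization in mechanical fault diagnosis with a domain-conditioned mixture of experts.
|
cs.LG cs.MA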
|
Zesen Wang, Zihao Wu, Yue Hu, Yang Gao, Fuzhen Xuan |
Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep learning based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1...
|
| 77 |
From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics
2605.04535
Video-to-PDE system identification: Robustly extracts fields from dye plume videos and learns a nonlinear PDE via a weak formulation.
|
cs.LG math.NA physics.comp-ph stat.AP stat.ML
|
Cesar Acosta-Minoli, Sayantan Sarkar |
Inferring continuum models directly from video is hampered by two facts: the recorded field is uncalibrated image intensity rather than a physical state, and direct numerical differentiation of noisy frames is unstable. We develop a video-to-PDE pipeline that converts grayscale recordings of an ink plume into a normalised scalar field $u(x,y,t)$, isolates a bulk drift $\mathbf{v}(t)$ from intrinsic spreading via the intensity-weighted centroid, and identifies an effective transport law by weak-f...
|
| 80 |
Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
2605.04542
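Power distribution linking sampling and RL: Shows that the power distribution gives a unified account of the relationship among sampling, self-reward reinforcement learning, and self-distillation.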
|
cs.LG
|
Akiyoshi Tomihari, Issei Sato |
Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including power sampling, have emerged as effective ways to improve LLM performance. However, the relationship among RL, distillation, and sampling remains unclear. In this study, we focus on the power distribution, the target distribution of power sampling, and show that the power distribution bridges sampli...
|
| 83 |
Event-Based Early Warning of Vineyard Disease Risk from Environmental Time Series
2605.04548
Event-based vineyard disease early warning: Replaces daily classification with event prediction to provide actionable disease warnings from environmental time series.
|
cs.LG
|
Ivica Dimitrovski, Ivan Kitanovski, Danco Davcev, Slobodan Kalajdziski, Kosta Mitreski |
Accurate early warning of vineyard disease risk from environmental observations is essential for timely intervention and more sustainable crop protection. However, many existing studies formulate disease prediction as daily presence classification, which can favor persistence-driven predictions and provide only limited support for actionable short-horizon warning. In this paper, we present an event-based approach for early warning of vineyard disease risk from environmental time series and evalu...
|
| 87 |
Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models
2605.04555
Counterfactual model-based HVAC RL: Uses MBRL with counterfactual building models to learn HVAC control policies with less data.
|
cs.LG eess.SY
|
Jan Marco Ruiz de Vargas, Fabian Raisch, Zoltan Nagy, Pierre Pinson, Christoph Goebel |
Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models attempt to predict the entire state-space, including weat...
|
| 92 |
Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
2605.04568
Gradient-based MPC in latent space: Uses a latent imagination model to enable differentiable, gradient-based MPC planning and control.
|
cs.LG cs.AI cs.RO
|
Jonathan Spieler, Sven Behnke |
State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expens...
|
| 102 |
HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily
2605.04594
Heterophily heterogeneous graph learning: Improves representation learning on heterophilous heterogeneous graphs by decoupling semantics from structure.
|
cs.LG cs.AI
|
Xinyi Li, Ming Li, Lu Bai, Lixin Cui, Feilong Cao |
Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decouplin...
|
| 103 |
A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints
2605.04595
Queueing analysis of LLM inference: Uses queueing theory to analyze LLM inference stability under joint computation and KV cache constraints.
|
cs.LG cs.AI math.OC
|
Chengyi Nie, Nian Si, Zijie Zhou |
The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and the memory overhead of key-value (KV) caching, which accelerates decoding but quickly exhausts GPU memory. In this paper, we introduce the first queueing-theoretic framework that explicitly incorporates both computation and GPU memory constraints into the analysis of LLM inference. Based on this ...
|
| 112 |
Library learning with e-graphs on jazz harmony
2605.04622
E-graph library learning for jazz harmony: Uses e-graph library learning to induce concise generative rules from a jazz harmony corpus.
|
cs.LG cs.AI cs.SC
|
Zeng Ren, Maddy Bowers, Xinyi Guan, Martin Rohrmeier |
Humans can acquire a highly structured intuitive understanding of musical patterns, yet these patterns often require multiple iterations of reflection and re-listening to internalize fully. To capture such an internalization process, we present a computational model for the learning of jazz harmonic patterns based on library learning. Given a corpus of harmonic progressions, our model searches over a space of programs composed of primitive harmonic relations in order to discover concise generati...
|
| 118 |
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
2605.04651
Forward-only fast-weights adaptation: Compiles labeled examples into fast weights in a single forward pass for test-time supervised adaptation.
|
cs.LG cs.CL
|
Guangsheng Bao, Hongbo Zhang, Han Cui, Ke Sun, Yanbin Zhao |
Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification a...
|
| 120 |
Threshold-Guided Optimization for Visual Generative Models
2605.04653
Threshold-guided alignment for generative models: Proposes threshold-guided optimization to align visual generative models efficiently using scalar ratings.
|
cs.LG
|
Jinbin Bai, Yu Lei, Qingyu Shi, Aosong Feng, Yi Xin |
Aligning large visual generative models with human feedback is often performed through pairwise preference optimization. While such approaches are conceptually simple, they fundamentally rely on annotated pairs, limiting scalability in settings where feedback is collected as independent scalar ratings. In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample's reward to an instance-specific baseline that is generally intract...
|
| 122 |
Evidence-based anomaly detection in clinical domains
2605.04664
Clinical Anomaly Detection: Detects anomalous clinical management decisions using probabilistic models such as Bayesian networks.
|
cs.LG
|
Milos Hauskrecht, Michal Valko, Branislav Kveton, Shyam Visweswaran, Gregory Cooper |
Anomaly detection methods can be very useful in identifying interesting or concerning events. In this work, we develop and examine new probabilistic anomaly detection methods that let us evaluate management decisions for a specific patient and identify those decisions that are highly unusual with respect to patients with the same or similar condition. The statistics used in this detection are derived from probabilistic models such as Bayesian networks that are learned from a database of past pat...
|
| 124 |
Feature importance analysis for patient management decisions
2605.04666
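Clinical Decision Feature Importance: Analyzes how electronic health record features influence lab-order and medication decisions and reports importance statistics.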
|
cs.LG
|
Michal Valko, Milos Hauskrecht |
The objective of this paper is to understand which characteristics and features of clinical data most influence physicians' decisions about ordering laboratory tests or prescribing medications. We conduct our analysis on data and decisions extracted from electronic health records of 4486 post-surgical cardiac patients. The summary statistics for 335 different lab order decisions and 407 medication decisions are reported. We show that in many cases, physician's lab-order and medication decision...
|
| 126 |
ITBoost: Information-Theoretic Trust for Robust Boosting
2605.04671
Robust Boosting with Noisy Labels: Assesses sample reliability with an information-theoretic trust measure to make boosting more robust to noisy labels.
|
cs.LG
|
Ye Su, Longlong Zhao, Diego Garcia-Gil, Jipeng Guo, Gangchun Zhang |
Gradient boosting remains a strong and widely used method for tabular data learning, but its performance often degrades when training labels are noisy. This behavior is largely related to the way boosting algorithms emphasize samples with large gradients, without explicitly accounting for whether such errors originate from informative hard cases or from unreliable labels. We address this issue by reconsidering how sample reliability is evaluated during boosting. Instead of relying on instantaneo...
|
| 131 |
HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction
2605.04682
Spatial Transcriptomics Prediction: Proposes a hexagonal shifted-window Transformer that predicts spatial gene expression from H&E slides and matches hexagonal sampling.
|
cs.LG cs.CV
|
Keunho Byeon, Jin Tae Kwak |
Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, a...
|
| 134 |
Learning Time-Inhomogeneous Markov Dynamics in Financial Time Series via Neural Parameterization
2605.04690
Nonstationary Markov Modeling: Learns time-varying Markov dynamics through neural parameterization to model non-stationarity in financial time series.
|
cs.LG q-fin.MF
|
Jan Rovirosa, Jesse Schmolze |
Modeling the dynamics of non-stationary stochastic systems requires balancing the representational power of deep learning with the mathematical transparency of classical models. While classical Markov transition operators provide explicit, theoretically grounded rules for system evolution, their empirical estimation collapses due to severe data sparsity when applied to high-resolution, high-noise environments. We explore this statistical barrier using financial time series as a canonical, real-w...
|
| 139 |
Differentiable Chemistry in PINNs for Solving Parameterized and Stiff Reaction Systems
2605.04708
PINNs for Stiff Chemistry: Integrates a differentiable chemistry solver into a PINN to solve parameterized, stiff reaction systems.
|
cs.LG
|
Miloš Babić, Franz M. Rohrhofer, Stefan Posch |
From neural ODEs to continuous-time machine learning, differentiable solvers allow physics, optimization, and simulation to become trainable components within deep learning systems. This has opened the path to a new generation of deep learning frameworks for scientific computing, with many promising applications still emerging. In this paper, we integrate a differentiable chemistry solver into a modified physics-informed neural network to solve parameterized reaction systems that are inherently ...
|
| 140 |
ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
2605.04709
Long-Horizon Visual MPC: Proposes ELVIS, which uses ensemble-calibrated latent imagination to make long-horizon visual MPC planning more reliable.
|
cs.LG cs.RO eess.SY
|
Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry |
A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model...
|
| 142 |
SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
2605.04712
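Continual RL Plasticity in MoE: Proposes SPHERE to mitigate the loss of spectral plasticity and the performance degradation of MoE in continual reinforcement learning.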
|
cs.LG
|
Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li |
In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating ...
|
| 145 |
Exact Dual Geometry of SOC-ICNN Value Functions
2605.04722
Dual Geometry of SOC-ICNNs: Derives the exact dual geometry of SOC-ICNN value functions to support interpretable convex inference.
|
cs.LG cs.AI math.OC
|
Kang Liu, Jianchen Hu, Wei Peng |
Input Convex Neural Networks (ICNNs) are commonly used in a two-stage manner: one first trains a convex network and then minimizes it over its input in a downstream inference problem. Recent second-order-cone ICNNs (SOC-ICNNs) enrich ReLU-based ICNNs with quadratic and conic modules and admit an exact representation as value functions of second-order cone programs (SOCPs). This value-function structure enables an explicit convex-analytic treatment of SOC-ICNN inference. In this paper, we study t...
|
| 148 |
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
2605.04727
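Programming Knowledge Tracing Reliability: Reproduces experiments and evaluates how sensitive attention-augmented PKT models are to implementation details and protocol choices.
|
cs.LG cs.SE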
|
Jaewook Kim, Hyeoncheol Kim |
Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction ...
|
| 153 |
Using Common Random Numbers for Simulation-based Planning with Rollouts
2605.04732
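Common random numbers for rollout planning: Analyzes how common random numbers affect utility estimation and action selection in rollout-based planning.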
|
cs.LG
|
Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan |
Simulation-based planning with rollouts is a widely-deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple rec...
|
| 155 |
OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
2605.04738
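Low-bit LLM weight quantization: Proposes an outlier self-absorption method to improve the accuracy of low-bit post-training LLM quantization.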
|
cs.LG
|
Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu |
Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and r...
|
| 157 |
MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction
2605.04744
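Genotype-environment interaction prediction for breeding: combines mixed models and deep learning to predict genotype × environment interactions and accelerate breeding.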
|
cs.LG
|
Aike Potze, Fred van Eeuwijk, Ioannis N. Athanasiadis |
Plant breeding underpins global food security through incremental, accumulating improvements in crop yield, quality and sustainability, achieved via repeated cycles of crop ranking, selection and crossing. Climate change disrupts this process by altering local growing conditions, thereby shifting the relative performance of crop genotypes. Predicting these relative changes in yield is critical for food security. Yet, this problem remains an open challenge in plant breeding, and relatively unexpl...
|
| 158 |
Knowledge-Free Correlated Agreement for Incentivizing Federated Learning
2605.04747
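Contribution incentive mechanisms for federated learning: proposes a ground-truth-free correlated agreement reward mechanism to incentivize federated learning clients reliably.
|
cs.LG cs.AI cs.GT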
|
Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger |
We introduce Knowledge-Free Correlated Agreement (KFCA) to reward client contributions in federated learning (FL) without relying on ground truth, a public test set, or distribution knowledge. Under categorical reports and an honest majority, KFCA is strictly truthful, addressing the label-flipping vulnerability of Correlated Agreement (CA). We evaluate KFCA on federated LLM adapter tuning and a real-world PCB inspection task, showing efficient real-time reward computation suitable for decentral...
|
| 162 |
AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures
2605.04754
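Approximate multipliers and MoE inference: evaluates the combined impact of approximate multipliers on the accuracy, efficiency, and energy consumption of MoE networks.
|
cs.LG cs.AR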
|
Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan |
Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction rema...
|
| 164 |
Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop
2605.04761
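Cognitive twin modeling for education: builds a hierarchical, interpretable personalized thinking model and adds human-in-the-loop collaboration to improve performance.
|
cs.LG cs.AI cs.HC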
|
Wu-Yuin Hwang, Nur Alif Ilyasa, Muhammad Irfan Luthfi, Yuniar Indrihapsari |
This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI-supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano's New Taxonomy of Educational Objectives and aims to clone a learner's thinking model and build a cognitive twin. It was construc...
|
| 171 |
Bilinear Mamba-Koopman Neural MPC for Varying Dynamics
2605.04793
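Koopman neural MPC under varying dynamics: introduces control-dependent bilinear coupling to improve the adaptation of Koopman neural MPC to time-varying dynamics.
|
cs.LG math.OC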
|
Matan Pagi, Zohar Sorek |
Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution. We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent ...
|
| 174 |
A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction
2605.04813
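Dynamic QoS prediction via tensor decomposition: proposes a biased nonnegative block term tensor decomposition to predict dynamic service QoS.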
|
cs.LG
|
Wenjing Liu, Yujia Lei, Qu Wang |
With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic...
|
| 175 |
Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs
2605.04819
Unsat core prediction for SAT: uses a GNN for polarity-aware representation learning over clause-literal hypergraphs to predict unsat cores.
|
cs.LG
|
Zhenchao Sun, Shuai Ma, Ping Lu, Chongyang Tao |
Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in mo...
|
| 176 |
Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding
2605.04825
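Initial data coverage design for FMQA: designs initial sampling around marginal bit coverage to improve FMQA optimization under one-hot encoding.
|
cs.LG cond-mat.stat-mech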
|
Taiga Hayashi, Yuya Seki, Kotaro Terada, Yosuke Mukasa, Shuta Kikuchi |
Factorization machine with quadratic-optimization annealing (FMQA) is a black-box optimization method that combines a factorization machine (FM) surrogate with QUBO-based search by an Ising machine. When FMQA is applied to integer or discretized continuous variables via one-hot encoding, uniform random initial sampling can leave many binary variables never active in the initial training data, and the corresponding FM parameters receive no direct gradient updates from the observed responses. We a...
|
| 177 |
Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity
2605.04827
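Trustworthy federated label distribution learning: proposes a trustworthy Fed-LDL method for robust aggregation and noise suppression under annotation quality disparity.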
|
cs.LG
|
Junxiang Wu, Zhiqiang Kou, Hongwei Zeng, Wenke Huang, Biao Liu |
Label Distribution Learning (LDL) models supervision as an instance-wise probability distribution, enabling fine-grained learning under inherent ambiguity, but its success relies on high-fidelity label distributions that are costly to obtain and thus often noisy. Motivated by privacy-sensitive applications, we study Federated Label Distribution Learning (Fed-LDL), where data isolation further induces heterogeneous annotation quality across clients, making local updates unevenly reliable and brea...
|
| 178 |
Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models
2605.04830
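Phase transitions and nonlocality in diffusion models: studies whether the symmetry-breaking and nonlocality critical transitions occur concurrently in diffusion transformers.
|
cs.LG cond-mat.stat-mech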
|
Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao |
Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the...
|
| 180 |
Replay-Based Continual Learning for Physics-Informed Neural Operators
2605.04832
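Continual learning for physics-informed neural operators: uses replay-based continual learning to improve the performance of physics-informed neural operators on out-of-distribution data.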
|
cs.LG
|
Yizheng Wang, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu |
Neural operators generally demonstrate strong predictive performance on in-distribution (ID) problems. However, a critical limitation of existing methods is their significant performance degradation when encountering out-of-distribution (OOD) data. To address this issue, this work introduces continual learning into physics-informed neural operators, with particular emphasis on neural operators built upon the Transolver architecture, and proposes a simple yet effective replay-based continual lear...
|
| 181 |
Bridging Input Feature Spaces Towards Graph Foundation Models
2605.04834
Graph feature space alignment: proposes the ALL-IN projection to align node features for transfer across graph datasets.
|
cs.LG
|
Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua, Bruno Ribeiro, Carola-Bibiane Schönlieb |
Unlike vision and language domains, graph learning lacks a shared input space, as input features differ across graph datasets not only in semantics, but also in value ranges and dimensionality. This misalignment prevents graph models from generalizing across datasets, limiting their use as foundation models. In this work, we propose ALL-IN, a simple and theoretically grounded method that enables transferability across datasets with different input features. Our approach projects node features in...
|
| 185 |
Quantile-Free Uncertainty Quantification in Graph Neural Networks
2605.04847
Uncertainty quantification for GNNs: proposes QpiGNN to build quantile-free prediction intervals on graphs under weak assumptions.
|
cs.LG cs.AI
|
Soyoung Park, Hwanjun Song, Sungsu Lim |
Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice. Moreover, achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR...
|
| 186 |
Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations
2605.04853
Neural-corrected PDE solvers: trains a neural operator on truncation error to correct a low-regularity integrator for solving dispersive PDEs.
|
cs.LG
|
Zhangyong Liang |
We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Li...
|
| 192 |
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
2605.04874
Uncertainty-aware multimodal DPO: proposes uncertainty-aware exploratory DPO to reduce visual hallucination in multimodal models.
|
cs.LG cs.CL cs.CV
|
Huatian Zhang, Zhendong Mao, Lei Zhang, Yongdong Zhang |
Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. One of its key challenges lies in how to transfer the sequence-level preference into fine-grained supervision on visual fidelity. To safeguard vision-related tokens that are prone to hallucination, existing methods typically allocate training emphasis according to the model's self-assessed visual sensitivity signal...
|
| 195 |
A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs
2605.04880
Average-reward RL in SMDPs: recasts the SMDP average-reward objective in harmonic-mean form and derives corresponding algorithms.
|
cs.LG cs.AI
|
Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka |
Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are...
|
| 202 |
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
2605.04893
Spectral diagnostics of attention transport: proves that symmetric spectral diagnostics of attention transport are structurally insensitive to orientation.
|
cs.LG cs.CL stat.ML
|
Dominik Dahlem, Diego Maniloff, Mac Misiura |
Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation...
|
| 203 |
Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization
2605.04895
Regime-conditioned transfer Bayesian optimization: proposes evaluating transfer Bayesian optimization methods conditioned on regime variables such as budget and prior quality.
|
cs.LG stat.ML
|
Noel Thomas |
Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from NeurIPS, ICML, ICLR, AISTATS, UAI, TMLR, JMLR, and AutoML-Conf (2022-2025) finds that 98% never vary B/|A| as a controlled axis. On the same GDSC2 benchmark, changing only the budget reverses the ranking: at B=50, Greedy out...
|
| 205 |
A geometric relation of the error introduced by sampling a language model's output distribution to its internal state
2605.04899
Geometric analysis of LM sampling error: derives a 1-form and its curvature from embedding geometry to characterize sampling error and relate it to semantic state.
|
cs.LG
|
Albert F. Modenbach |
GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf i...
|
| 207 |
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
2605.04903
LLM-driven NAS via code diffs: has a fine-tuned LLM generate code diffs that iteratively modify baseline networks for efficient NAS.
|
cs.LG cs.AI cs.CV
|
Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov |
Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHa...
|
| 209 |
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features
2605.04905
Robust feature importance across models: compares feature importance across multiple models to separate robust from model-dependent variables in electrospinning.
|
cs.LG cs.DB
|
Mehrab Mahdian, Ferenc Ender, Tamas Pardy |
Electrospinning is a highly sensitive fabrication process in which small variations in operating parameters can significantly influence fiber morphology and material performance. Machine learning (ML) methods are increasingly employed to model these process-structure relationships and to identify the relative importance of processing variables. However, most existing studies rely on a single ML model, implicitly assuming that the resulting feature importance is robust and reproducible. In this s...
|
| 212 |
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
2605.04911
Private tabular data synthesis: generates tabular data with in-context learning to improve quality while reducing memorization leakage.
|
cs.LG
|
Xinyan Han, Yan Lu, Xiaoyu Lin, Yuanyuan Jiang, Yuanrui Wang |
Tabular data synthesis aims to generate high-quality data while preserving privacy. However, we find that existing tabular generative models exhibit a clear tradeoff in the small-data regime: improving data quality typically comes at the cost of increased memorization of training samples, thereby weakening privacy protection. This tradeoff arises because small training sets make it difficult for dataset-specific generative models to distinguish generalizable structure from sample-specific patter...
|
| 215 |
Koopman Identification of Nonlinear Systems via Reservoir Liftings
2605.04917
Koopman learning with reservoirs: builds Koopman dictionaries from reservoir liftings to linearize nonlinear dynamical systems.
|
cs.LG cs.RO
|
Weibin Gu, Chen Yang, Lu Shi |
Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by the Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets the reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees...
|
| 217 |
Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization
2605.04920
RL for compositional generalization: replaces imitation learning with outcome-level reinforcement learning to improve compositional generalization.
|
cs.LG cs.CL
|
Xiyan Fu, Wei Liu |
Compositional generalization refers to correctly interpreting novel combinations of known primitives, which remains a major challenge. Existing approaches often rely on supervised fine-tuning, which encourages models to imitate target outputs. This token-level training paradigm fails to capture the global compositional structure required for generalizing to unseen combinations. In this work, we investigate whether compositional generalization can instead be improved through outcome-level reinforcem...
|
| 220 |
When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data
2605.04930
Failure modes of GRN inference: uses a controlled diagnostic benchmark to analyze when causal GRN inference methods fail on single-cell data.
|
cs.LG cs.AI q-bio.GN q-bio.QM stat.ML
|
Miguel Fernandez-de-Retana, Ruben Sanchez-Corcuera, Unai Zulaika, Aritz Bilbao-Jayo, Aitor Almeida |
Despite theoretical advantages, causal methods for Gene Regulatory Network (GRN) inference from single-cell RNA-seq data consistently fail to match or outperform correlation-based baselines in many realistic benchmarks, a persistent puzzle which casts doubt on the value of causality for this task. We argue that existing benchmarks are insufficiently controlled to answer this question because they evaluate on real or semi-real data where multiple pathologies co-occur, confounding failure modes, a...
|
| 226 |
Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
2605.04946
BatchNorm effects on partition geometry: analyzes how training-time BN changes the hyperplane and region-partition geometry of piecewise-affine networks.
|
cs.LG stat.ML
|
Xuan Qi, Yi Wei, Fanqi Yu, Furao Shen, Vittorio Murino |
Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are pa...
|
| 228 |
Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
2605.04952
Efficient routing for granular MoE: proposes adaptive inverted-index routing to reduce routing cost in fine-grained MoE.
|
cs.LG
|
Klaus-Rudolf Kladny, Maximilian Mordig, Bernhard Schölkopf, Michael Muehlebach |
Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ...
|
| 230 |
Order-based Rehearsal Learning
2605.04955
Order-based rehearsal learning: proposes a rehearsal learning method that uses only order structure to make decisions that avoid undesired futures.
|
cs.LG
|
Yu-Xuan Tao, Tian-Zuo Wang, Zhi-Hua Zhou |
When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-bas...
|
| 231 |
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
2605.04956
Benchmark for LLM-generated GPU kernels: releases KernelBench-X to evaluate the correctness and efficiency of LLM-generated Triton kernels.
|
cs.LG cs.PF
|
Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen |
LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than metho...
|
| 232 |
Delving into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series
2605.04957
Conformal prediction for graph time series: studies how non-exchangeability affects conformal prediction coverage in graph-structured multivariate time series.
|
cs.LG
|
Ruichao Guo, Xingyao Han, Luo Wenshui, Zhe Liu, Chen Gong |
Point forecasting for graph-structured multivariate time series is a fundamental problem, but rigorous uncertainty quantification for such predictions is still underexplored. Conformal prediction (CP) offers uncertainty estimation with a solid coverage guarantee under the exchangeability assumption, which requires the joint data distribution to be unchanged under permutation. However, in graph-structured time series, inherent cross-node coupling can violate the exchangeability condition, making ...
|
| 233 |
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
2605.04960
Improved GRPO for RLVR: proposes EP-GRPO to mitigate GRPO's credit assignment problems and improve reinforcement learning for reasoning.
|
cs.LG cs.AI
|
Song Yu, Li Li, Wenwen Zhao, Zhisheng Yang |
Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token inf...
|
| 235 |
Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport
2605.04965
Optimal transport for distribution shifts: proposes ReshapeOT, which uses sample displacements to reshape the ground metric and model distribution shifts more reliably.
|
cs.LG cs.AI
|
Philip Naumann, Jacob Kauffmann, Klaus-Robert Müller, Grégoire Montavon |
Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes the ground metric by integrating observed sample displacements as an additional source of knowledge. Technically, ReshapeOT replaces th...
|
| 236 |
Skill Neologisms: Towards Skill-based Continual Learning
2605.04970
Skill-based continual learning via soft tokens: uses trainable soft tokens ("skill neologisms") to add new skills to LLMs while reducing forgetting.
|
cs.LG cs.AI
|
Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar |
Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms, i.e., soft tokens integrated in the model's vocabulary and optimized to improv...
|
| 237 |
Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking
2605.04971
Geometric continuity in deep networks: explains how residual connections and symmetry breaking produce geometric continuity, i.e., aligned singular vectors in adjacent layers.
|
cs.LG cs.AI cs.CL
|
Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh |
Weight matrices in deep networks exhibit geometric continuity -- principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventin...
|
| 242 |
Conceptors for Semantic Steering
2605.04980
Conceptor-based semantic steering: uses conceptors (soft projection matrices) that preserve a concept's multidimensional subspace to steer LLM behavior.
|
cs.LG cs.CL
|
Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai |
Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vec...
|
| 243 |
Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers
2605.04984
Verifier-free credit assignment: proposes a self-induced outcome potential that gives long-horizon agents label-free, turn-level reward signals.
|
cs.LG cs.CL
|
Senkang Hu, Yong Dai, Xudong Han, Zhengru Fang, Yuzhi Zhao |
Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory...
|
| 246 |
Federated Learning for Early Prediction of EV Charging Demand
2605.04993
Federated learning for charging demand prediction: uses federated learning to predict an EV session's total energy demand early in charging to support scheduling.
|
cs.LG cs.AI
|
Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis, Panagiotis Thomakos, Andreas Sendros |
Accurate forecasting of electric vehicle (EV) charging demand is critical for grid stability, infrastructure planning, and real-time charging optimization. In this work, we study the problem of early prediction of charging demand, where the total energy of a session is estimated using only information available at plug-in time and during the first minutes of charging. This enables actionable decisions while the session is still in progress, which is of direct importance for EV network operators....
|
| 247 |
Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
2605.04995
Theory of adaptive query learning: compares the approximation power of in-context (fixed-query) and agentic (adaptive-query) learning under realizability constraints.
|
cs.LG math.ST stat.ML
|
Anastasis Kratsios, A. Martina Neuman, Philipp Petersen |
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to ...
|
| 248 |
DualTCN: A Physics-Constrained Temporal Convolutional Network for Time-Domain Marine CSEM Inversion
2605.04997
Physics-constrained CSEM inversion: proposes DualTCN, a physics-constrained temporal convolutional network that inverts marine CSEM transient data for earth-model parameters.
|
cs.LG
|
Khaled Ahmed, Ghada Omar |
DualTCN is the first deep-learning framework for inverting time-domain marine controlled-source electromagnetic (MCSEM) transient data. Moving away from traditional subsurface discretization, the framework regresses four earth-model parameters -- $σ_1$, $σ_2$, $d_1$, $d_2$ -- and reconstructs conductivity-depth profiles using a differentiable soft-step decoder. The optimized architecture (379K parameters) features a Temporal Convolutional Network (TCN) encoder paired with a late-time branch and ...
|
| 254 |
Learned Neighbor Trust for Collaborative Deployment in Model-Agnostic Decentralized Learning
2605.05009
Decentralized collaborative inference: learns neighbor trust so that decentralized nodes can combine predictions collaboratively at deployment instead of acting in isolation.
|
cs.LG
|
Michael Lanier, Luise Ge, Sastry Kompella, Yevgeniy Vorobeychik |
Many decentralized distillation methods are designed around training-time coordination, yet deploy each node in isolation even when more capable neighbors remain available at inference time. This is an incomplete objective for settings such as IoT, where devices are heterogeneous, data is scarce and skewed, and a node's strongest neighbors may far exceed its own local capacity. We study how nodes should train so that their predictions compose well at deployment, and how each node should learn wh...
|
| 258 |
Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning
2605.05020
Multi-agent diversity measurement: proposes Graph-SND, which approximates SND by aggregating over sparse graph edges to reduce the cost of computing diversity.
|
cs.LG cs.MA
|
Shawn Ray |
System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yiel...
|
| 259 |
CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels
2605.05023
LLM-based understanding and reconstruction of CUDA kernels: proposes CuBridge, which uses LLMs to understand and reconstruct high-performance attention CUDA kernels, balancing performance and flexibility.
|
cs.LG
|
Xing Ma, Yangjie Zhou, Wu Sun, Zihan Liu, Jingwen Leng |
Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex opera...
|
| 264 |
The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence
2605.05029
The causal gap in predictive representations: gives an impossibility theorem and large-scale evidence that predictive encoders favor the environment over the system's causal structure.
|
cs.LG
|
Kejun Liu |
We report a systematic failure mode in predictive representation learning. Across 2695 neural network configurations trained to predict linear-Gaussian dynamics, the optimal encoder tracks the environment rather than the system it is meant to model. The mean causal fidelity -- the fraction of encoder sensitivity allocated to system degrees of freedom -- is 0.49, and only 2.5% of configurations exceed 0.70. The failure intensifies with dimension: at N=100, the optimal encoder becomes causally bli...
|
| 267 |
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization
2605.05040
Preference-based self-distillation training: proposes reward-regularized preference-based self-distillation that goes beyond KL matching to improve stability and performance.
|
cs.LG cs.AI
|
Xin Yu, Liuchen Liao, Yiwen Zhang, Yingchen Yu, Lingzhou Xue |
On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and...
|
| 273 |
Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework
2605.05055
AoA outdoor localization learning: proposes an adaptive training and feature selection framework for AoA-based localization in 5G/6G networks.
|
cs.LG cs.AI eess.SP
|
Bac Trinh-Nguyen, Sara Berri, Sin G. Teo, Tram Truong-Huu, Arsenia Chorti |
Localization in 5G and 6G networks is essential for important use cases such as intelligent transportation, smart factories, and smart cities. Although deep learning has enabled improving localization accuracy, depending on the deployment scenario and the effort required for dataset collection campaigns on a given infrastructure, the training process for localization models can vary significantly. Furthermore, with respect to feature selection, recent works have demonstrated the robustness of an...
|
| 276 |
Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry
2605.05062
CMP modeling with FCN: trains a fully convolutional network on white light interferometry data for full-chip CMP topography modeling.
|
cs.LG
|
Jules Exbrayat, Renan Bouis, Elie Sezestre, Viorel Balan, Arnaud Cornelis |
As time-to-market is crucial in the Integrated Circuit (IC) industry, speeding up layout manufacturability verification is essential. Chemical-Mechanical Polishing (CMP) plays a vital role in IC fabrication but is significantly influenced by Layout-Dependent Effects (LDE). An accurate and efficient CMP model enables design teams to correct surface unevenness before fabrication, reducing costs and accelerating the design phase. However, existing models often rely on Density Step Height (DSH) mod...
|
| 277 |
Expert Routing for Communication-Efficient MoE via Finite Expert Banks
2605.05278
Communication-efficient MoE routing: treats the MoE gate as a channel and uses finite expert banks to improve routing and communication efficiency.
|
cs.LG cs.IT
|
Mohammad Reza Deylam Salehi, Ali Khalesi |
Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we d...
|
| 284 |
Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations
2605.05081
Imitation learning for plasma control: distills a fully observed expert policy into a plasma stabilization controller that uses only macroscopic observations.
|
cs.LG math.AP math.OC physics.plasm-ph
|
Xiaofan Xia, Qin Li, Wenlong Mou |
We consider the stabilization of Vlasov--Poisson plasma dynamics, a central control problem in nuclear fusion. Our focus is the gap between what an ideal controller would use and what experiments can actually observe: while optimal policy may rely on the full phase-space state, practical feedback is typically limited to sparse macroscopic diagnostics. We therefore study imitation learning methods that distill a fully observed expert policy into controllers operating only on macroscopic measureme...
|
| 286 |
Order Matters: Improving Domain Adaptation by Reordering Data
2605.05084
Domain adaptation via data reordering: optimally reorders training data to reduce the variance of discrepancy estimates and improve UDA.
|
cs.LG
|
Andrea Napoli, Paul White |
Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique whi...
|
| 287 |
Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis
2605.05088
Multimodal building energy prediction: uses a gated multimodal model to predict building energy performance and analyze the impact of retrofit scenarios.
|
cs.LG physics.soc-ph
|
Yunfei Bai, Aaron Tesfa Tsion, Raul Rosales, Barbara Shollock, Wei He |
Achieving resilient and sustainable cities requires scalable approaches to decarbonising residential buildings, which account for about 20% of UK greenhouse gas emissions and 25% of energy-related emissions in the European Union. Energy Performance Certificates (EPCs) support regulation and retrofit planning, but their reliance on on-site inspections limits timely city-scale assessment. This study introduces a gated multimodal model to predict Standard Assessment Procedure (SAP) energy efficienc...
|
| 293 |
Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics
2605.05097
Multi-timescale memory for LLMs: proposes a multi-timescale external memory mechanism for continual knowledge updating in LLM systems.
|
cs.LG cs.AI cs.CL
|
Andreas Pattichis, Constantine Dovrolis |
LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen what repetition confirms, and let the rest fade. We argue that external memory should follow a similar principle. In Memini, this view takes the form of an associative memory that org...
|
| 294 |
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
2605.05102
Distributional regret in RL/bandits: gives a unified characterization of distributional regret in bandits and reinforcement learning, with algorithms and bounds.
|
cs.LG stat.ML
|
Harin Lee, Min-hwan Oh |
We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $δ\in (0,1]$, thereby characterizing the regret distribution across the full range of $δ$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_...
|
| 298 |
Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
2605.05112
Pass-rate control in binary-reward RL: controls the rollout pass rate to steer binary-reward RL toward its most informative regime.
|
cs.LG
|
Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao |
Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Pr...
|
| 299 |
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
2605.05113
Finite-width recurrent signal propagation: derives signal propagation formulas for finite-width linear recurrences and analyzes how long the infinite-width approximation remains valid.
|
cs.LG
|
Mariia Seleznova |
We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state s...
|
| 300 |
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
2605.05115
Activation manifold steering: intervenes on representations along geometric paths on the activation manifold to test their causal effect on model behavior.
|
cs.LG
|
Daniel Wurgaft, Can Rager, Matthew Kowal, Vasudev Shyam, Sheridan Feucht |
Neural representations carry rich geometric structure; but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries, and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and ...
|
| 301 |
On the Hardness of Junking LLMs
2605.05116
Hardness of LLM jailbreak attacks: studies the hardness limits of jailbreaking LLMs with unstructured, random-search-style prompts.
|
cs.LG
|
Marco Rando, Samuel Vaiter |
Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated pro...
|
| 302 |
On the Wasserstein Gradient Flow Interpretation of Drifting Models
2605.05118
Theory of drifting generative models: analyzes the fixed-point mechanism of generative modeling via drifting from the Wasserstein gradient flow perspective.
|
cs.LG cs.AI stat.ML
|
Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli |
Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main re...
|
| 303 |
Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals
2605.05120
Driver behavior classification from physiological signals: selects features with SHAP and fuses EEG, EMG, and GSR with gradient boosting to recognize driving behaviors.
|
cs.LG eess.SP
|
Sahar Askari, Mohammad Mahdi Mirza Ali Mohammadi, Fatemeh Ensafdoust, Amin Golnari, Saeid Sanei |
An interpretable and scalable framework for decoding driving behaviors from multimodal physiological signals is proposed in this study. We utilize a large-scale multimodal physiological driving behavior dataset comprising synchronized electroencephalogram (EEG), electromyography (EMG), and galvanic skin response (GSR) signals. Our approach involves rigorous preprocessing followed by a domain-specific feature extraction pipeline targeting time-domain, frequency-domain, and derived physiological ind...
|
| 305 |
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning
2605.05123
Offline-to-online reinforcement learning: adaptively selects policies and fine-tunes them online under an interaction budget to improve O2O-RL performance.
|
cs.LG cs.AI
|
Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang, Miroslav Pajic, Yuichi Motai |
In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can...
|
| 306 |
Conditional outlier detection for clinical alerting
2605.05124
Outlier detection for clinical alerting: performs conditional outlier detection on EHR data to identify unusual patient-management actions and raise alerts.
|
cs.LG cs.CY
|
Milos Hauskrecht, Michal Valko, Shyam Visweswaran, Iyad Batal, Gilles Clermont |
We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac s...
|
| 307 |
Forecasting Green Skill Demand in the Automotive Industry: Evidence from Online Job Postings
2605.05280
Green skill demand forecasting: extracts skills from automotive industry job postings and forecasts trends in green skill demand.
|
cs.LG
|
Sabur Butt, Joshua N. Arrazola E., Hector G. Ceballos, Patricia Caratozzolo |
The global transition toward sustainable economies is reshaping labor markets, yet systematic methods for identifying and forecasting green skills remain limited. This study presents a computational framework to measure and predict green skill demand using online job postings from Mexico's automotive industry, which contributes about 4% of national GDP. We compile a dataset of job advertisements from Indeed Mexico, OCC Mundial, and LinkedIn (July 2024 to July 2025), yielding 204,373 skill record...
|
| 308 |
Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation
2605.05125
Causal effect estimation from incomplete EHRs: combines temporal causal normalizing flows with LLM-driven MNAR imputation to estimate joint treatment effects.
|
cs.LG cs.AI
|
Olivia Jullian Parra, Sara Zoccheddu, David Catalan Cerezo, Tom Forzy, Franziska Ulrich |
Target trial emulation (TTE) enables causal questions to be studied with observational data when randomized controlled trials (RCTs) are infeasible. Yet treatment-effect methods often address causal estimation, missingness, and temporal structure separately, limiting their robustness in electronic health records (EHRs), where time-varying confounding and missing-not-at-random (MNAR) biomarkers can reach 50%--80%. We propose a two-stage pipeline for treatment effect estimation from incomplete lon...
|
| 309 |
Transformed Latent Variable Multi-Output Gaussian Processes
2605.05133
Scalable multi-output Gaussian processes: proposes a transformed latent variable MOGP that stays expressive and scalable with high-dimensional outputs.
|
cs.LG
|
Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez |
Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions, such as employing low-rank or sum-of-separable kernels, which can limit expressiveness. We propose the Transformed Latent Variable MOGP (T-LVMOGP), a novel framework that scales MOGPs to a massive nu...
|
| 310 |
Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction
2605.05134
Black-box LLM hallucination detection: treats the LLM as a dynamical system and predicts embedding sequences for low-cost hallucination detection.
|
cs.LG math.DS
|
Dan Wilson, Mohamed Akrout |
Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations...
|
| 314 |
Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting
2605.05144
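Human-AI co-mentorship in project-based learning: a case study of how AI tools support students in co-mentored learning on a financial forecasting project.
|
cs.LG cs.CY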
|
Freyaa Chawla, Ahan Chawla, Rishi Singh, Joe Germino, Grigorii Khvatskii |
This paper reflects on an AI research project carried out by a team of high-school and early-undergraduate students under the mentorship of graduate researchers and ably assisted by AI tools. We share our experience, covering not only the learning experience for the high school students but also how AI tools accelerated the process that enabled the high school students to focus on higher-order problem formulation and solution. Although the participants entered the project with limited background ...
|
| 316 |
Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
2605.05151
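Mechanistic interpretability of time-series Transformers: uses sparse autoencoders to dissect the representations Transformers rely on for forecasting and questions whether superposition is necessary.
|
cs.LG cs.AI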
|
Alper Yıldırım |
Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the interna...
|
| 326 |
Attribution-Guided Continual Learning for Large Language Models
2605.05285
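Forgetting-resistant continual learning for LLMs: uses attribution information to select which parameters to preserve or update, mitigating catastrophic forgetting.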
|
cs.LG
|
Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie |
Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awareness of internal knowledge distribution in LLMs. As a result, they cannot distinguish parameters that should be preserved or updated. We propose an attribution-guided continual fine-t...
|
| 330 |
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
2605.05176
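ICL theory for nonlinear regression: analyzes how attention acts as a featurizer when Transformers perform in-context learning for nonlinear regression.
|
cs.LG math.NA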
|
Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai |
Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realiz...
|
| 331 |
Estimating the expected output of wide random MLPs more efficiently than sampling
2605.05179
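Expected-output estimation for wide MLPs: estimates the expected output of random MLPs without sampling, using cumulants and Hermite expansions.
|
cs.LG cond-mat.dis-nn stat.ML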
|
Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton |
By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite e...
|
| 345 |
Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS
2605.05330
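Graph normalization dynamics for differentiable MWIS: proposes Graph Normalization, a differentiable dynamical system that quickly approximates the maximum weight independent set.
|
cs.LG cs.AI cs.DM cs.NE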
|
Laurent Guigues |
We introduce Graph Normalization (GN), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP-hard Maximum Weight Independent Set (MWIS) problem. MWIS encompasses many combinatorial challenges, including optimal assignment, scheduling, set packing, and MAP inference in discrete Markov Random Fields. Unlike Belief Propagation, we prove GN always converges to a binary indicator of a Maximum Independent Set. GN realizes a fast quasi-Newton descent th...
|
| 348 |
Feature Starvation as Geometric Instability in Sparse Autoencoders
2605.05341
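Feature starvation in sparse autoencoders: explains dead SAE features as a geometric instability and suggests ways to improve training.
|
cs.LG cs.AI math.OC stat.ML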
|
Faris Chaudhry, Keisuke Yano, Anthea Monod |
Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation (dead neurons) and shrinkage bias, often requiring computationally expensive heuristic resampling and nondifferentiable hard-masking methods to bypass these challenges. We argue that feature starvation is not merely an empirical artifact of poor data...
|
| 354 |
A Multi-Head Attention Approach for SLA Compliance Monitoring in Data Centers
2605.05354
SLA compliance monitoring for data centers: encodes SLA rules as JSON to generate training data and uses a Transformer to predict violation risk.
|
cs.LG
|
Omanshu Thapliyal |
Service level agreements (SLAs) in data center colocation contracts define precise thresholds for power, temperature, and humidity, with tiered violation penalties expressed as credits against monthly recurring charges. Traditional reactive monitoring detects breaches only after they occur, limiting remediation opportunities. We present a framework that encodes SLA rules as structured JSON objects to generate training data without manual annotation. We train a per-customer multi-head transformer...
|
| 355 |
Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks
2605.05358
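Stability of sequentially trained early-exit networks: reduces interference from newly added exits on earlier ones, balancing stability and plasticity.
|
cs.LG cs.CV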
|
Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho |
Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can cause newly introduced exits to interfere with previously learned ones, degrading the performance of earlier classifiers. We address this problem by retaining the knowledge embedded...
|
| 356 |
COPYCOP: Ownership Verification for Graph Neural Networks
2605.05360
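Ownership verification for GNN models: proposes CopyCop, which verifies whether a GNN was copied even when architectures differ and embeddings are transformed.
|
cs.LG cs.AI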
|
Rahul Nandakumar, Deepayan Chakrabarti |
Given two GNNs that output node embeddings, how can we determine if they were trained independently? An adversary could have trained one GNN specifically to mimic the other GNN's embeddings. To obscure this relationship between the GNNs, the adversarial GNN might then transform its output embeddings. The two GNNs could have different architectures, weights, and embedding dimensions, and the adversary can transform the embeddings. Despite these stringent conditions, our algorithm (named CopyCop) ...
|
| 360 |
SPADE: Faster Drug Discovery by Learning from Sparse Data
2605.05370
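Drug discovery from sparse data: proposes SPADE, which selects ligands efficiently from few experimental measurements to speed up drug screening.
|
cs.LG cs.AI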
|
Rahul Nandakumar, Ben Fauber, Deepayan Chakrabarti |
Drug discovery seeks molecules (ligands) that bind strongly and selectively to a target protein. However, fewer than 5% of candidate ligands pass the bar for even the early stages of drug discovery. Furthermore, we want methods that work for novel proteins for which we have no prior data. Starting from scratch, we have to iteratively select and test candidate ligands such that we find enough ligands of the desired quality in as few tests as possible. Our proposed algorithm, named SPADE, introduc...
|
| 362 |
Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning
2605.05373
Interpretable recurrent reinforcement learning: links the hidden states of RNN policies to PMP co-states to improve interpretability.
|
cs.LG
|
David Leeftink, Max Hinne, Marcel van Gerven |
A key capability of intelligent agents is operating under partial observability: reasoning and acting effectively despite missing or incomplete state observations. While recurrent (memory-based) policies learned via reinforcement learning address this by encoding history into latent state representations, their internal dynamics remain uninterpretable black boxes. This paper establishes a formal link between these hidden states and the Pontryagin minimum principle (PMP) from optimal control. We ...
|
| 367 |
Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
2605.05387
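Conditional diffusion sampling under linear constraints: studies conditional sampling with diffusion models under linear constraints and gives mixing and information-theoretic guarantees.
|
cs.LG cs.IT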
|
Ahmad Aghapour, Erhan Bayraktar, Asaf Cohen |
We study zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, including inpainting and super-resolution. In these problems, the observation determines only part of the unknown signal. The remaining degrees of freedom must be sampled according to the correct conditional data distribution. Existing projection-based samplers enforce measurement consistency by correcting the observed component during reverse diffusion. However, measurement consistency alone do...
|
| 368 |
Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
2605.05389
Vehicle routing on multigraphs: achieves scalable multigraph routing through two-stage decomposition and node-edge factorization.
|
cs.LG cs.AI
|
Filip Rydin, Morteza Haghir Chehreghani, Balázs Kulcsár |
Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node per...
|
| 371 |
Differentiable Parameter Optimization for DAEs with State-Dependent Events
2605.05395
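Differentiable optimization for DAEs with events: develops differentiable parameter learning and gradient computation for DAEs with state-dependent events.
|
cs.LG cs.MS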
|
Ion Matei, Maksym Zhenirovskyy, Anthony Wong |
Differential-algebraic equations (DAEs) with state-dependent events arise in systems whose continuous dynamics are constrained by algebraic equations and interrupted by mode changes, switching logic, impacts, or state reinitializations. Gradient-based parameter learning for such systems is challenging because algebraic variables are implicitly defined, event times depend on the parameters, and reset maps introduce discontinuities. This paper studies differentiable parameter optimization for semi...
|
| 381 |
Information Theoretic Adversarial Training of Large Language Models
2605.05415
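Information-theoretic adversarial training of LLMs: designs scalable adversarial training around an information-theoretic objective to improve robustness to adversarial prompts.
|
cs.LG cs.AI cs.CR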
|
Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco |
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous Adversarial Training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradi...
|
| 385 |
Active Learning for Conditional Generative Compressed Sensing
2605.05435
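Active learning for generative compressed sensing: studies active design of the sampling distribution, and recovery, in conditional generative compressed sensing.
|
cs.LG math.NA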
|
Alexander DeLise, Nick Dexter |
Generative compressed sensing uses the range of a pretrained generator as a nonlinear model for recovering structured signals from limited measurements. We study a conditional version of this problem for image recovery from subsampled Fourier measurements using prompt-conditioned generative models. Our framework separates two roles of conditioning: the prompt used to design the sampling distribution and the prompt used to define the recovery model. For ReLU and Lipschitz conditional generators, ...
|
| 387 |
On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning
2605.05438
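Preventing collapse when fine-tuning for causal reasoning: uses a semantic loss to suppress model collapse and shortcut solutions during fine-tuning on causal tasks.
|
cs.LG cs.AI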
|
Pratik Deshmukh, Atirek Gupta |
Standard fine-tuning of transformer models on causal reasoning tasks leads to catastrophic model collapse, where models learn trivial solutions such as always predicting "Yes" or "No" regardless of input structure. We demonstrate that fine-tuning Gemma 270M on transitivity and d-separation tasks without semantic loss results in 100% collapse rate, with models achieving misleadingly high accuracy (73.9%) while learning no causal reasoning. We propose a semantic loss function with graph-based logi...
|
| 395 |
Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs
2605.05463
Robust Graph Self-Supervised Learning: studies the robustness of GSSL to noise in biomedical graphs extracted from text.
|
cs.LG cs.AI
|
Othmane Kabal, Mounira Harzallah, Fabrice Guillet, Hideaki Takeda, Ryutaro Ichise |
Graph Self-Supervised Learning (GSSL) offers a powerful paradigm for learning graph representations without labeled data. However, existing work assumes clean, manually curated graphs. Recent advances in NLP enable the large-scale automatic extraction of knowledge graphs from text, opening new opportunities for GSSL while introducing substantial real-world noise. This type of noise remains largely unexplored, as prior robustness studies typically rely on synthetic perturbations. To address this ...
|
| 398 |
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks
2605.05476
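Benchmark for Text-KG and GNNs: builds a unified benchmark that separates the effect of knowledge graph construction quality from that of GNN performance.
|
cs.LG cs.AI cs.CL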
|
Othmane Kabal, Mounira Harzallah, Fabrice Guillet, Hideaki Takeda, Ryutaro Ichise |
Knowledge graphs automatically constructed from text are increasingly used in real-world applications. However, their inherent noise, fragmentation, and semantic inconsistencies significantly affect the performance of Graph Neural Networks (GNNs) on downstream tasks. Assessing their performance and robustness remains difficult, as it is often unclear whether observed results stem from the learning model or from the quality of the constructed graph itself. In this work, we introduce a dual-purpos...
|
| 400 |
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
2605.05480
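Unified Linear Attribution Theory: uses the Riesz representation to characterize several linear attribution methods for explainability within one framework.
|
cs.LG cs.AI stat.ML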
|
Raimondo Fanale |
The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on $L^2(Q, \mu)$ admits a unique canonical representation $(Q, w, \Delta)$, proved necessary by the Riesz Repr...
|
| 401 |
Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
2605.05481
Next-Policy Sampling in Deep RL: replaces conservative updates with approximate next-policy sampling to make policy improvement more efficient.
|
cs.LG
|
Dillon Sandhu, Ronald Parr |
We revisit a classic "chicken-and-egg" problem in reinforcement learning: to safely improve a policy, the value function must be accurate on the state-visitation distribution of the updated policy. That distribution over states is unknown and cannot be sampled for the purposes of training the value function. Conservative updates solve this problem, but at the cost of shrinking the policy update. This paper explores an alternative solution, Approximate Next Policy Sampling (ANPS), which addresses...
|
| 404 |
A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers
2605.05488
Context-Conditioned Neural Operators: injects context with a recurrent ViT that generates the operator parameters used to solve conservation laws.
|
cs.LG
|
Taeyoung Kim, Joon-Hyuk Ko |
We propose an architecture that augments the Flux Neural Operator (Flux NO), which combines the classical finite volume method (FVM) with neural operators, with ViT-based context injection. Our model is formulated as a hypernetwork: it extracts solution dynamics over a finite temporal window, encodes them with a recurrent Vision Transformer, and generates the parameters of a context-conditioned neural operator. This enables the model to infer and solve conservation laws without explicit access t...
|
| 405 |
MEMOA: Massive Mixtures of Online Agents via Mean-Field Decentralized Nash Equilibria
2605.05492
Mean-Field Decentralized Multi-Agent Learning: derives mean-field decentralized Nash policies to scale mixtures of online agents to massive populations.
|
cs.LG
|
Xuwei Yang, David B. Emerson, Fatemeh Tavakoli, Anastasis Kratsios |
In the modern age of large-scale AI, federated learning has become an increasingly important tool for training large populations of AI agents; however, its computational and communication costs can rapidly fail to scale with the number of agents. This is precisely where decentralized agentic strategies shine: each agent acts autonomously, using only its own state together with a minimal summary of the ensemble, namely the mean-field. We derive the unique optimal decentralized policy in closed fo...
|
| 407 |
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning
2605.05495
Transformer Shortcuts in Continual Reasoning: shows that shortcut solutions learned by Transformers impair continual compositional reasoning.
|
cs.LG
|
William T. Redman, Erik C. Johnson, Brian Robinson |
Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural network models can perform compositional reasoning has been the subject of intensive recent investigation, little work has been done to systematically understand how well these models...
|
| 408 |
Online Localized Conformal Prediction
2605.05497
Online Localized Conformal Prediction: proposes online localized conformal prediction to handle calibration under heterogeneous covariates.
|
cs.LG
|
Yuheng Lai, Garvesh Raskutti |
Conformal prediction is a framework that provides valid uncertainty quantification for general models with exchangeable data. However, in the online learning and time-series settings, exchangeability is not satisfied. Existing online conformal methods, such as adaptive conformal inference (ACI), can achieve long-run validity, yet they remain inefficient under covariate heterogeneity because they rely on global calibration. We propose Online Localized Conformal Prediction (OLCP), which com...
|
| 413 |
Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
2605.05511
Active Feature Acquisition with Policy Gradients: uses pathwise policy gradients to make non-myopic active feature acquisition decisions.
|
cs.LG stat.ML
|
Linus Aronsson, Morteza Haghir Chehreghani |
Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a c...
|
| 415 |
OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination
2605.05519
Datacenter-Grid Coordination Simulation: provides a simulation platform for studying runtime coordination between datacenter load and grid signals.
|
cs.LG cs.DC
|
Jae-Won Chung, Zhirui Liang, Yanyong Mao, Jiasi Chen, Mosharaf Chowdhury |
AI's growing compute demand and new datacenter buildouts present major capacity and reliability challenges for the electricity grid, leading to multi-year interconnection delays for new datacenters and bottlenecking AI growth. To ease this strain, datacenters increasingly offer rapid power flexibility in response to grid signals, where the datacenter can increase or decrease its power consumption by adapting its workload in real time. In order to understand the impact of large datacenters on t...
|
| 416 |
Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors
2605.05520
Bayesian Rain Reconstruction with Diffusion Priors: formulates rain field reconstruction as a Bayesian inverse problem with diffusion priors.
|
cs.LG stat.AP stat.ML
|
Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati |
Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-...
|
| 419 |
MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series
2605.05524
Identifiable Causal Module Discovery: discovers modular structure in scientific time series via sparse additive identifiable causal learning.
|
cs.LG cs.AI
|
Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang |
Causal representation learning (CRL) seeks to recover latent variables with identifiability guarantees, typically up to permutation and component-wise reparameterization under appropriate assumptions. However, identifiability does not imply interpretability: latent semantics are typically assigned post hoc by alignment with known ground-truth factors. This limitation is particularly acute in scientific time series, where underlying mechanisms are unknown and discovering interpretable structure i...
|
| cs.MA 2 papers | ||||
| 68 |
DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration
2605.04522
DAO-governed decentralized physical AI: proposes a DAO-governed decentralized physical AI architecture for coordinating humans, machines, and infrastructure.
|
cs.MA cs.AI cs.CY econ.GN
|
Mark C. Ballandies, Florian Spychiger, Uwe Serdült, Claudio J. Tessone |
We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networ...
|
| 218 |
Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation
2605.04922
Graph-based multi-agent ideation: coordinates multiple agents through an idea graph with learnable edits and commits to iteratively generate research ideas.
|
cs.MA cs.AI
|
Jiangwen Dong, Bo Li, Wanyu Lin |
LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through temporary texts, such as drafts or chat logs; it is difficult to pinpoint the weaknesses in the generated ideas and how the agents refine them. To this end, we introduce Evolving Idea Graphs (EIG), a graph-based multi-agent scientific ideation framework that can generate high-performance research ideas ...
|
| cs.MM 1 paper | ||||
| 194 |
To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition
2605.04877
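Modality conflict in emotion recognition: proposes dual-path learning that separates resolvable from unresolvable conflicts to improve multimodal emotion recognition.
|
cs.MM cs.HC cs.LG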
|
Yangchen Yu, Qian Chen, Jia Li, Zhenzhen Hu, Jinpeng Hu |
Multimodal emotion recognition (MER) benefits from combining text, audio, and vision, yet standard fusion often fails when modalities conflict. Crucially, conflicts differ in resolvability: benign conflicts stem from missing, weak, or ambiguous cues and can be mitigated by cross-modal calibration, while severe conflicts arise from intrinsically contradictory (e.g., sarcasm) or misleading signals, for which forced fusion may amplify errors. Recognizing this, we propose Dual-Path Conflict Resoluti...
|
| cs.NE 2 papers | ||||
| 229 |
On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization
2605.04954
Budgeted algorithm selection for BBO: Studies what share of the budget spent on feature computation makes algorithm selection worthwhile in black-box optimization.
|
cs.NE cs.LG
|
Koen van der Blom, Diederick Vermetten |
Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes...
|
| 324 |
Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles
2605.05284
Deriving optimizers from evolutionary principles: Derives lineage simulations from Darwinian first principles and obtains a suite of gradient-based optimization algorithms.
|
cs.NE cs.LG q-bio.PE q-bio.QM
|
Daniel Grimmer |
Evolutionary computation has long promised to deliver both high-performance optimization tools and rigorous scientific simulations of Darwinian evolution. However, modern algorithms frequently abandon evolutionary fidelity for physics-inspired heuristics or superficial biological metaphors. This paper derives a suite of advanced gradient-based optimization algorithms directly from evolutionary first principles. We introduce Darwinian Lineage Simulations (DLS) to prove that, in an asexual ...
|
| cs.NI 4 papers | ||||
| 5 |
Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
2605.04373
Robust RL Network Control: Proposes ReGuard to discover worst-case network conditions and protect RL controller performance at runtime.
|
cs.NI cs.AI eess.SY
|
Hongyu Hè, Minhao Jin, Maria Apostolaki |
RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical. We present ReGuard, a framewor...
|
| 26 |
Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV
2605.04436
Multi-UAV IoV Joint Optimization: Jointly optimizes multi-UAV trajectories, resource allocation, and task offloading to reduce delay and energy consumption.
|
cs.NI cs.AI
|
Maoxin Ji, Qiong Wu, Pingyi Fan, Cui Zhang, Nan Cheng |
This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non-convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory ...
|
| 73 |
SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting
2605.04530
LLM agent for network troubleshooting: Improves LLM root-cause localization of network faults with a symptom-driven, layered escalation procedure.
|
cs.NI cs.AI
|
Kuan-Hao Tseng, Niruth Bogahawatta, Yasod Ginige, Kosta Dekic, Arunan Sivanathan |
Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that enc...
|
| 279 |
Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity
2605.05071
Vision-aided mmWave beam management: Uses camera priors for fast double-directional mmWave beam alignment and tracking in vehicular networks.
|
cs.NI cs.AI cs.CE cs.CV eess.SY
|
Avhishek Biswas, Apala Pramanik, Eylem Ekici, Mehmet C. Vuran |
Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming (VIBE), a hybrid model-based, closed-loop, learning architecture for...
|
| cs.PL 1 paper | ||||
| 313 |
Beyond BLEU: A Semantic Evaluation Method for Code Translation
2605.05282
Semantic evaluation of code translation: Applies compiler-testing ideas to assess the semantic equivalence of code translations as an alternative to BLEU.
|
cs.PL cs.CL
|
Julius Näumann, Sven Keidel, Amir Molzam Sharifloo, Mira Mezini |
Code translation is one of the core capabilities of LLMs. However, evaluating the correctness of translations remains difficult, as commonly used metrics such as BLEU measure only syntactic similarity, disregarding program semantics. We propose a novel evaluation methodology for code translation tasks, emphasizing semantic equivalence over surface-level string similarity. Our approach applies established compiler testing methodology to a new domain, allowing the assessment of an LLM fine-tuned f...
|
| cs.RO 8 papers | ||||
| 3 |
Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation
2605.04366
Traffic Scenario Generation: Uses conditional flow matching to generate realistic, scalable safety-critical traffic scenarios.
|
cs.RO cs.LG
|
Zimu Gong, Brian Zhaoning Zhang, Chris Zhang, Kelvin Wong, Raquel Urtasun |
Safety-critical scenarios are essential for the development of autonomous vehicles (AVs) but are rare in real-world driving data. While simulation offers a way to generate such scenarios, manually designed test cases lack scalability, and adversarial optimization often produces unrealistic behaviors. In this work, we introduce a conditional latent flow matching approach for scalable and realistic safety-critical scenario generation. Our method uses distribution matching to transform nominal scen...
|
| 129 |
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
2605.04678
Latent Action Supervision for VLA: Systematically compares image-based and action-based latent action supervision to unify VLA model training.
|
cs.RO cs.CV
|
Yihan Lin, Haoyang Li, Yang Li, Haitao Shen, Yihan Zhao |
Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified V...
|
| 223 |
Modular Reinforcement Learning For Cooperative Swarms
2605.04939
Modular RL for robot swarms: Proposes modular multi-agent reinforcement learning for controlling cooperative robot swarms.
|
cs.RO cs.AI
|
Erel Shtossel, Gal A. Kaminka |
A cooperative robot swarm is a collective of computationally limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, t...
|
| 271 |
Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception
2605.05053
Neural reduced-order tactile simulation: Combines coarse-grained MPM with a neural decoder to reconstruct tactile details efficiently.
|
cs.RO cs.CV
|
Yuhu Guo, Zhikai Shen, Jiasheng Qu, Chenghao Qian, Yuming Huang |
Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a reduced-order neural simulation framework that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent sta...
|
| 290 |
Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout
2605.05092
Driver-centric latent world model: Proposes a traffic-conditioned latent world model of the in-cabin driver for multi-step rollout prediction.
|
cs.RO cs.AI cs.CV
|
Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li |
Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics foreca...
|
| 297 |
LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
2605.05110
Line-guided RL for robot stunts: Trains bicycle robot stunts from a user-provided spatial guideline and sparse orientation constraints.
|
cs.RO cs.AI
|
Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha |
Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physi...
|
| 328 |
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
2605.05172
RL with Q-functions extracted from BC: Proposes Q2RL, which estimates and gates Q-values from behavior cloning for offline-to-online robot improvement.
|
cs.RO cs.AI
|
Lakshita Dodeja, Ondrej Biza, Shivam Vats, Stephen Hart, Stefanie Tellex |
Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline...
|
| 379 |
Creative Robot Tool Use by Counterfactual Reasoning
2605.05411
Counterfactual robot tool use: Uses causal discovery and counterfactual reasoning for creative robot tool selection.
|
cs.RO cs.AI
|
M. Tuluhan Akbulut, Varun Satheesh, Ahmed Jaafar, Alper Ahmetoglu, Shane Parr |
We propose a causal reasoning framework for creative robot tool use where a suitable tool for a task is correctly identified for use beyond its primary objectives. The proposed framework first discovers the causal relationships between the tool and the task by conducting simulated experiments in a dynamics model. We decouple the causal discovery problem into two complementary components: VLM-based feature suggestion and counterfactual tool generation via targeted geometric and physical feature p...
|
| cs.SD 5 papers | ||||
| 82 |
Stage-adaptive audio diffusion modeling
2605.04547
Stage-adaptive audio diffusion training: Proposes a stage-adaptive training strategy that lowers the training cost of audio diffusion models and improves quality.
|
cs.SD cs.AI
|
Xuanhao Zhang, Chang Li |
Recent progress in diffusion-based audio generation and restoration has substantially improved performance across heterogeneous conditioning regimes, including text-conditioned audio generation and audio-conditioned super-resolution. However, training audio diffusion models remains computationally expensive, and most existing pipelines still rely on static optimization recipes that treat the relative importance of training signals as fixed throughout learning. In this work, we argue that a major...
|
| 88 |
Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)
2605.04556
Benchmarking audio-native LLM embeddings: Systematically evaluates the sound-embedding capabilities of several audio-native LLMs on MSEB.
|
cs.SD cs.LG
|
Cyril Allauzen, Tom Bagby, Georg Heigold, Ehsan Variani, Ke Wu |
The Massive Sound Embedding Benchmark (MSEB) has emerged as a standard for evaluating the functional breadth of audio models. While initial baselines focused on specialized encoders, the shift toward "audio-native" Large Language Models (LLMs) suggests a new paradigm where a single multimodal backbone may replace complex, task-specific pipelines. This paper provides a rigorous empirical evaluation of leading LLMs - including members from the Gemini and GPT families - across the eight core MSEB c...
|
| 108 |
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
2605.04613
Singing voice transcription with LALM: Uses large audio language models for unified, scalable singing voice transcription and alignment.
|
cs.SD cs.AI
|
Yukun Chen, Tianrui Wang, Zhaoxi Mu, Xinyu Yang, EngSiong Chng |
High-quality singing annotations are fundamental to modern Singing Voice Synthesis (SVS) systems. However, obtaining these annotations at scale through manual labeling is unrealistic due to the substantial labor and musical expertise required, making automatic annotation highly necessary. Despite their utility, current automatic transcription systems face significant challenges: they often rely on complex multi-stage pipelines, struggle to recover text-note alignments, and exhibit poor generaliz...
|
| 183 |
Hearing the Ocean: Bio-inspired Gammatone-CNN framework for Robust Underwater Acoustic Target Classification
2605.04839
Underwater acoustic classification: Combines bio-inspired Gammatone filtering with a CNN to improve the noise robustness of underwater target recognition.
|
cs.SD
|
Rajeshwar Tripathi, Sandeep Kumar, Monika Aggarwal, Neel Kanth Kundu |
This study presents a bio-inspired signal processing framework for robust Underwater Acoustic Target Recognition (UATR). The latest state-of-the-art methods often fail to resolve dense low-frequency harmonic structures in vessel propulsion signals under high-noise conditions. The proposed framework addresses this with a biologically inspired Gammatone filter bank that emulates the cochlea's nonlinear frequency selectivity. By distributing filters according to the Equivalent Rectangular Ban...
|
| 249 |
Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation
2605.04998
Cross-genre chord generation: Empirically studies how the mix ratio of pop and jazz data affects fine-tuning transfer in chord generation.
|
cs.SD cs.IR cs.LG
|
Jinju Lee |
Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generation as a standalone task and addresses a question that arises whenever such a model is adapted across genres: how much old-domain data must be retained during fine-tuning to acquire a new domain without...
|
| cs.SE 8 papers | ||||
| 24 |
Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
2605.04431
Failure Management for RFT: Proposes an automatic failure management framework to make LLM reinforcement fine-tuning more robust.
|
cs.SE cs.AI
|
Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Liancheng Fang, Kening Zheng |
Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms. Despite their effectiveness, they largely overlook the problem of failure management at the training-process level. When training goes wrong, practitioners still rely heavily on expert-driven manual inspec...
|
| 76 |
Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
2605.04532
Accountability of software engineering agents: Analyzes how coding agents' terms of service define responsibility and proposes a research roadmap.
|
cs.SE cs.AI
|
Christoph Treude |
AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and productivity impacts of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern...
|
| 109 |
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
2605.04615
Multitask benchmark for code search: Proposes a decontaminated multitask benchmark for code retrieval and reranking and trains a reranking model.
|
cs.SE cs.AI
|
Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu |
Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retri...
|
| 128 |
CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement
2605.04677
LLM-Guided Code Optimization: Combines LLMs with evolutionary search and runtime profiling to automatically select hotspots and optimize multi-language code.
|
cs.SE cs.AI
|
Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan |
We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing...
|
| 133 |
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
2605.05267
LLM Code Quality Review: Systematically reviews how training-data quality defects lead to bugs and vulnerabilities in LLM-generated code.
|
cs.SE cs.AI
|
Kaifeng He, Xiaojun Zhang, Peiliang Cai, Mingwei Liu, Yanlin Wang |
Large language models (LLMs) frequently generate defective outputs in code generation tasks, ranging from logical bugs to security vulnerabilities. While these generation failures are often treated as model-level limitations, empirical evidence increasingly traces their root causes to imperfections within the training corpora. Yet, the specific mechanisms linking training data quality issues to generated code quality issues remain largely unmapped. This paper presents a systematic literature rev...
|
| 239 |
Architectural Constraints Alignment in AI-assisted, Platform-based Service Development
2605.04973
Architectural constraint-aware AI development: Uses retrieval-augmented scaffolding and clarification loops so that AI-generated code conforms to architectural constraints.
|
cs.SE cs.AI
|
Julius Irion, Moritz Leugers, Paul Hartwig, Simon Kling, Tachmyrat Annayev |
AI-assisted development tools enable rapid prototyping of services but often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments. Consequently, generated artifacts may exhibit brittle behavior and limited deployability. We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities. B...
|
| 351 |
The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
2605.06707
Evaluation of LLM single-file web page generation: Longitudinally compares the first-output HTML generation quality of several LLMs and tracks social reach.
|
cs.SE cs.AI
|
Diego Cabezas Palacios |
This paper presents an eight-week observational comparison of 68 single-file HTML generations collected across 17 public experiments in the "HTML AI Battle" project between December 10, 2025 and February 4, 2026. Four reasoning model families, GPT, Gemini, Grok, and Claude, were compared under a fixed public-interface protocol with no custom instructions, no personality tuning, and no repair prompts. Each output was evaluated from a rendered browser video using human scores and a Gemini LLM-as-a...
|
| 372 |
Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology
2605.05400
Context engineering for agentic coding: Proposes the MEP methodology, which improves coding agent reliability by preparing context.
|
cs.SE cs.AI cs.HC
|
Andrew Zigler |
The rapid adoption of AI coding agents has produced a dominant workflow pattern, often called "vibe coding", that prioritizes speed of implementation over deliberate preparation. We argue that this approach creates a systematic alignment problem: agents that lack sufficient context produce code requiring extensive debugging and refactoring, consuming substantial development time. Drawing on the culinary concept of mise en place (everything in its place; abbreviated MEP), we propose a three-p...
|
| eess.AS 2 papers | ||||
| 60 |
JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
2605.04505
Instruction-Driven Audio Evaluation: Aligns LLMs with natural language instructions for zero-shot audio and speech evaluation.
|
eess.AS cs.AI cs.SD
|
Leying Zhang, Bowen Shi, Haibin Wu, Bach Viet Do, Yanmin Qian |
The rapid advancement of generative audio models has outpaced the development of robust evaluation methodologies. Existing objective metrics and general multimodal large language models (MLLMs) often struggle with domain generalization, zero-shot capabilities, and instructional flexibility. To address these bottlenecks, we propose JASTIN, a generalizable, instruction-driven audio evaluation framework that formulates audio assessment as a self-instructed reasoning task. JASTIN bridges a frozen hi...
|
| 159 |
Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement
2605.04749
Virtual-microphone speech enhancement: Uses a neural network to generate virtual microphone signals that improve the directivity of multichannel speech enhancement.
|
eess.AS
|
Dongheon Lee, Ashutosh Pandey, Sanjeel Parekh, Daniel Wong, Jacob Donley |
While the spatial directivity of multichannel speech enhancement algorithms improves with the number of microphones, fitting large capture arrays into real-world edge devices is typically limited by physical constraints. To overcome this limitation, we propose Spatial-Magnifier, a neural network designed to generate virtual microphone (VM) signals from a limited set of real microphone (RM) measurements. Moreover, we introduce the Spatial Audio Representation Learning (SARL) framework, which leve...
|
| eess.IV 5 papers | ||||
| 10 |
Hyperspectral Anomaly Detection Using Einstein Fuzzy Computing and Quantum Neural Network
2605.04388
Hyperspectral Anomaly Detection: Combines Einstein fuzzy computing with a quantum neural network for hyperspectral anomaly detection.
|
eess.IV
|
Chia-Hsiang Lin, Si-Sheng Young, Reza Langari |
In the remote sensing (RS) field, hyperspectral imagery provides rich spectral information and facilitates numerous critical applications, such as material identification. Among these applications, hyperspectral anomaly detection (HAD) aims to detect substances whose spectral characteristics deviate from background spectra, which are termed anomalies. However, many widely used HAD algorithms in the RS community identify anomalies by relying on a "background reconstruction" strategy. Furthermor...
|
| 285 |
External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images
2605.05082
Breast density prediction validation: Externally validates, on an independent cohort, deep models that predict BI-RADS breast density from ultrasound.
|
eess.IV cs.CV
|
Yuxuan Chen, Arianna Bunnell, Yanqi Xu, Haoyan Yang, Thomas K. Wolfgruber |
We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-leve...
|
| 317 |
CTseg: A Tool for Brain CT Segmentation, Spatial Normalisation, and Volumetrics
2605.05154
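Brain CT segmentation and volumetrics: Releases and validates CTseg, a pipeline for brain CT segmentation, spatial normalisation, and volume estimation.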
|
eess.IV
|
Mikael Brudfors |
This paper presents and validates CTseg, a freely available software for brain CT segmentation, spatial normalisation, and volumetrics. CTseg builds on the Multi-Brain generative modelling framework, providing a CT-specific pipeline that produces tissue maps, deformation fields, and brain volume estimates in the same format as SPM's unified segmentation, thereby extending SPM's established analysis chain from MRI to CT. CTseg is designed for routine hospital CT scans without requiring preprocess...
|
| 329 |
MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge
2605.05175
MRI knowledge LLM benchmark: Builds MRI-Eval, a tiered question bank for evaluating LLM knowledge of MRI physics and GE scanner operations.
|
eess.IV cs.CL physics.med-ph
|
Perry E. Radau |
Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI practice. Purpose: We developed MRI-Eval, a tiered benchmark for relative model comparison on MRI physics and GE scanner operations knowledge using primary multiple-choice questions (MCQ), with stem-only and primed diagnos...
|
| 417 |
Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images
2605.05522
Tumor-Aware Augmentation for MRI Segmentation: Uses tumor-aware augmentation and attention analysis to improve rectal cancer MRI segmentation.
|
eess.IV cs.CV
|
Aneesh Rangnekar, Joao Miranda, Natally Horvat, Stephanie Chahwan, Samir Alrayess |
Pretraining on large-scale datasets has been shown to improve transformer generalizability, even for out-of-domain (OOD) modalities and tasks. However, two common assumptions often fail under OOD transfer: that downstream datasets can be adapted to the fixed input geometry of pretrained models and that pretrained representations transfer effectively across imaging modalities. We show that these assumptions break down through two interacting failure modes in CT-to-MRI transfer: inefficient token ...
|
| eess.SY 2 papers | ||||
| 7 |
Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
2605.04375
AI-Controlled Lab Automation: Proposes a declarative experiment-as-code stack that lets AI agents operate real laboratories.
|
eess.SY cs.AI
|
Zhenning Yang, Yuhan Chen, Patrick Tser Jern Kon, Tongyuan Miao, Hongyi Lin |
To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may promp...
|
| 270 |
Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data
2605.05050
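Car-following deceleration behavior discrimination: Uses NGSIM trajectories to show that gap-closing rate and visual looming swap discriminative dominance across deceleration intensities.
|
eess.SY cs.LG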
|
Eni Solomon Laughter |
Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity - a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver ...
|
| math.AP 1 paper | ||||
| 216 |
Neural Discovery of Strichartz Extremizers
2605.04918
Neural search for Strichartz extremizers: Uses neural networks to search numerically for extremizers of Strichartz inequalities.
|
math.AP cs.LG math.NA
|
Nicolás Valenzuela, Ricardo Freire, Claudio Muñoz |
Strichartz inequalities are a cornerstone of the modern theory of dispersive PDEs, but their extremizers are known explicitly only in a handful of sharp cases. The non-convexity of the underlying functional makes the problem hard, and to our knowledge no systematic numerical attack has been attempted. We propose a simple neural-network-based pipeline that searches for extremizers as critical points of the Strichartz ratio, and apply it in three settings. First, on the Schrödinger group we recove...
|
| math.CA 1 paper | ||||
| 336 |
Almost-Orthogonality in Lp Spaces: A Case Study with Grok
2605.05192
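Counterexample to an almost-orthogonality inequality in Lp spaces: Constructs a counterexample refuting Carbery's sharpened triangle inequality and analyzes its conditions.
|
math.CA cs.AI math.CO math.PR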
|
Ziang Chen, Jaume de Dios Pont, Paata Ivanisvili, Jose Madrid, Haozhu Wang |
Carbery proposed the following sharpened form of triangle inequality for many functions: for any $p\ge 2$ and any finite sequence $(f_j)_j\subset L^p$ we have \[ \Big\|\sum_j f_j\Big\|_p \ \le\ \left(\sup_{j} \sum_{k} \alpha_{jk}^{\,c}\right)^{1/p'} \Big(\sum_j \|f_j\|_p^p\Big)^{1/p}, \] where $c=2$, $1/p+1/p'=1$, and $\alpha_{jk}=\sqrt{\frac{\|f_{j}f_{k}\|_{p/2}}{\|f_{j}\|_{p}\|f_{k}\|_{p}}}$. In the first part of this paper we construct a counterexample showing that this inequality fails for every $p>2$...
|
| math.LO 1 paper | ||||
| 359 |
Towards an Inferentialist Account of Information Through Proof-theoretic Semantics
2605.05368
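Proof-theoretic semantic foundations of information: Uses proof-theoretic semantics to propose an inferentialist framework and logical foundations for information.
|
math.LO cs.AI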
|
Matthew Collinson, Timo Eckhardt, David Pym |
Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting component...
|
| math.NA 1 paper | ||||
| 84 |
Neural-Guided Domain Restriction to Accelerate Pseudospectra Computation for Structured Non-normal Banded Matrices
2605.04550
Neural-guided pseudospectra acceleration: Uses neural-guided domain restriction to accelerate pseudospectra computation for structured non-normal banded matrices.
|
math.NA cs.LG
|
Amit Punia, Rakesh Kumar, Madan Lal |
Computing pseudospectra of non-normal matrices is essential for understanding the stability and transient behavior of dynamical systems. Such analysis is critical in applications including fluid dynamics, control systems, and differential operators, where non-normality can lead to significant transient amplification and sensitivity to perturbations that are not captured by eigenvalue analysis alone. At large scales, commonly used numerical approaches for pseudospectra computation can become comp...
|
| math.OC 2 papers | ||||
| 64 |
Predictive and Prescriptive AI toward Optimizing Wildfire Suppression
2605.04510
Wildfire suppression resource optimization: Combines prediction with integer optimization to assign firefighting crews and optimize suppression resource scheduling.
|
math.OC cs.AI cs.LG
|
Leonard Boussioux, Alexandre Jacquillat, Ryne Reger, Jacob Wachspress |
Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dy...
|
| 365 |
Meta-learning for sample-efficient Bayesian optimisation of fed-batch processes
2605.05382
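Meta-learned Bayesian optimisation: Uses meta-learning to improve the sample efficiency of Bayesian optimisation for fed-batch process recipes.
|
math.OC cs.LG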
|
Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei |
The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (BayesOpt) is a powerful tool for sampling and optimisation of expensive-to-measure functions. Gaussian Processes (GPs), the surrogate models used in BayesOpt, are static, forecast poorly, and lack generalisation across experiments, limiting their applicability to time-v...
|
| math.PR 1 paper | ||||
| 337 |
Grokability in five inequalities
2605.05193
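AI-assisted discovery of mathematical inequalities: Summarizes five inequalities and improved bounds obtained in collaboration with Grok and verified by the authors.
|
math.PR cs.AI math.AP math.CA math.FA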
|
Paata Ivanisvili, Xinyuan Xie |
In this note, we report five mathematical discoveries made in collaboration with Grok, all of which have been subsequently verified by the authors. These include an improved lower bound on the maximal Gaussian perimeter of convex sets in $\mathbb{R}^n$, sharper $L_2$-$L_1$ moment comparison inequalities on the Hamming cube $\{-1,1\}^n$, a strengthened autoconvolution inequality, improved asymptotic bounds on the size of the largest $g$-Sidon sets in $\{1,\dots,n\}$, and an optimal balanced Szare...
|
| math.ST 1 paper | ||||
| 384 |
Direct Estimation of Schrödinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees
2605.05432
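Schrödinger bridge drift estimation: Estimates SB time-series drifts by direct kernel regression and provides finite-sample guarantees.
|
math.ST cs.LG stat.ML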
|
Othmane Mazhar, Huyên Pham |
We study nonparametric estimation of Schrödinger bridge (SB) drifts from i.i.d. data observed on a single time interval. Starting from the conditional-ratio form of the Schrödinger bridge time-series (SBTS) drift formula, we analyze a direct Nadaraya-Watson plug-in estimator built from kernelized numerator and denominator terms. Unlike recent SB analyses based on entropic-OT potentials, Sinkhorn iterations, or iterative bridge solvers, our approach works directly at the drift level and isolate...
|
| q-bio.BM 1 paper | ||||
| 50 |
Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building
2605.05259
Cryo-EM Map Segmentation Pipeline: Integrates AlphaFold to improve Phenix segmentation and automatically build more accurate atomic models.
|
q-bio.BM cond-mat.mtrl-sci cs.AI q-bio.QM
|
Chenwei Zhang |
We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Ph...
|
| q-bio.NC 2 papers | ||||
| 28 |
Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks
2605.04443
Neural Alignment and Robustness: Analyzes whether the robustness advantage of neurally guided DCNNs stems from changes in spatial-frequency reliance.
|
q-bio.NC cs.AI
|
Zhenan Shao, Tianyu Ren, Chengxiao Wang, Leyla Isik, Diane M. Beck |
Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low...
|
| 289 |
Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior
2605.05091
Think-aloud constrained model discovery: Introduces think-aloud traces as a constraint to improve the quality of LLM-based automated cognitive model discovery.
|
q-bio.NC cs.AI
|
Hanbo Xie, Akshay K. Jagadish, Lan Pan, Robert C. Wilson |
Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well-known that models produced from the behavioral trajectory alone are typically under-determined. In this work, we explore the use of Think Aloud traces as an additional form of data constraint during automated model discovery. When applied to the domain of risky decision-making, we find that the models discovered with think-aloud achieve significantly improved pr...
|
| quant-ph 1 paper | ||||
| 104 |
Generative Quantum-inspired Kolmogorov-Arnold Eigensolver
2605.04604
Quantum-inspired eigensolver for chemistry: Proposes a parameter-efficient generative quantum-inspired eigensolver for quantum chemistry.
|
quant-ph cs.LG
|
Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng |
High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-efficient extension of the generative quantum eigensolver (GQE) for quantum chemistry. GQKAE replaces the parameter-heavy feed-forward network components in GPT-style generative eige...
|
| stat.ME 3 papers | ||||
| 9 |
Causal discovery under mean independence and linearity
2605.04381
Causal Discovery with Mean Independence: Proposes LiMIAM, which replaces full independence with mean independence to identify linear causal structure.
|
stat.ME cs.LG math.ST stat.ML
|
Geert Mesters, Alvaro Ribot, Anna Seigal, Piotr Zwiernik |
Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong causal order, even with infinite data. We introduce the Linear Mean-Independent Acyclic Model (LiMIAM), which replaces full independence with weaker one-sided mean-independence restrictions on the disturbances. Under finit...
|
| 182 |
PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data
2605.04838
CI testing with missing data: Proposes PAIR-CI, which integrates multiple imputation into a permutation test to calibrate CI testing for causal discovery.
|
stat.ME cs.LG stat.ML
|
Thomas S. Robinson, Ranjit Lall |
The standard constraint-based paradigm for causal discovery with incomplete data (impute first, test second) is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compar...
|
| 406 |
A renormalization-group inspired lattice-based framework for piecewise generalized linear models
2605.05493
Lattice Piecewise Generalized Linear Models: Proposes an RG-inspired, lattice-partitioned framework for interpretable piecewise generalized linear models.
|
stat.ME cond-mat.stat-mech cs.LG math.ST
|
Joshua C. Chang |
We formally introduce a class of models inspired by renormalization group (RG) theory, built on additive hierarchical expansions analogous to those appearing in functional ANOVA and mixed-effects models. Like ReLU convolutional neural networks, they are almost everywhere locally linear; unlike ReLU networks, their partition structure is explicit, interpretable, and easy to modify or constrain. In these models, one defines a multidimensional lattice partition of the input space and uses it to sca...
|
| stat.ML 11 papers | ||||
| 75 |
Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
2605.05262
Submodular tree search for tool-use RL: Models tool-use tree search as submodular maximization to increase rollout informativeness under a fixed budget.
|
stat.ML cs.AI cs.LG
|
Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song |
We formalize Rollout Informativeness under a Fixed Budget (RIFB) as the expected non-vanishing policy-gradient mass that a tool-use rollout set injects into Group Relative Policy Optimization (GRPO). We prove that any budget-agnostic independent sampler suffers a collapse rate bounded away from zero for hard prompts regardless of the budget. Motivated by this, we recast intermediate state selection as a monotone submodular maximization problem, where a greedy one-step selector enjoys a 1 minus 1...
|
| 99 |
Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points
2605.04589
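Dynamic network trajectory geometry: Proposes a multiscale Euclidean trajectory representation for attribution and change-point detection in dynamic networks.
|
stat.ML cs.LG math.ST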
|
Haruka Ezoe, Ryohei Hisano |
A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity ...
|
| 189 |
Forecasting Oncology Demand Trends with Boosting-Based Bayesian Conjugate Models
2605.05270
Bayesian forecasting for healthcare demand: Combines a Gamma-Poisson Bayesian model with residual boosting to forecast oncology outpatient demand trends.
|
stat.ML cs.LG stat.AP
|
Ademir Batista dos Santos Neto, Tiago Alessandro Espinola Ferreira, Paulo Renato Alves Firmino |
Accurate trend forecasting in healthcare time series is essential for planning and resource allocation. This paper proposes a Bayesian framework for predicting oncology demand trends, modeling weekly appointments as a Poisson process with a Gamma prior to the demand rate. To enhance adaptability and capture persistent directional patterns, we incorporate a residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure. This boosting approach allows the model to track both s...
|
| 221 |
Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift
2605.04932
Deployment risk under covariate drift: Uses Jacobian-velocity bounds to characterize long-horizon deployment risk volatility under covariate drift.
|
stat.ML cs.LG
|
Jonathan R. Landers |
We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality reduces temporal risk volatility to derivative energy, and a Jacobian-velocity theorem identifies directional tangent energy along the deployment path as the governing quantity under explicit along-path regularity and domination assumptions. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regular...
|
| 253 |
Scalable inference of spatial regions and temporal signatures from time series
2605.05008
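Spatiotemporal regionalization inference: Scalably infers spatially contiguous regions and their temporal signatures from time series data.
|
stat.ML cs.LG cs.SI physics.soc-ph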
|
Jiayu Weng, Alec Kirkley |
Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a pri...
|
| 260 |
Hypergraph Generation via Structured Stochastic Diffusion
2605.05024
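Hypergraph diffusion generative model: Proposes a structured stochastic diffusion that generates faithful hypergraph structure directly on incidence matrices.
|
stat.ML cs.LG stat.CO stat.ME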
|
Christopher Nemeth |
Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE, a generative model defined directly on relaxed incidence matrices via a structured stochastic diffusion. The forward process combines a hypergraph-specific two-sided heat operator with an Ornstein-Uhlenbeck component, preserving structure-aware noising near ...
|
| 291 |
Proximal Projection for Doubly Sparse Regularized Models
2605.05093
Doubly sparse regularized regression: Proposes a proximal projection method that achieves doubly sparse regularization with graph-structured predictors.
|
stat.ML cs.LG stat.CO stat.ME
|
Jia Wei He, R. Ayesha Ali, Gerarda Darlington |
Regularization is often used in high-dimensional regression settings to generate a sparse model, which can save tremendous computing resources and identify predictors that are most strongly associated with the response. When the predictors can be represented by a Gaussian graphical model, the structure of the predictor graph can be exploited during regularization. Our proposed model exploits this underlying predictor graph structure by decomposing the estimated coefficient vector into a sum of l...
|
| 334 |
Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval
2605.05189
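Capacity thresholds of linear associative memory: Analyzes the capacity thresholds and constructions of linear memories under different retrieval criteria.
|
stat.ML cs.IT cs.LG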
|
Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu |
How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing...
|
| 386 |
Estimating Implicit Regularization in Deep Learning
2605.05436
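Estimating implicit regularization in deep learning: Proposes a method to quantify the strength and form of the implicit regularization induced by training.
|
stat.ML cs.LG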
|
Joseph H. Rudoler, Kevin Tan, Giles Hooker, Konrad P. Kording |
Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization, connecting it to an equivalent penalty that augments the learning objective. However, modern deep learning systems are complex, carrying modifications to the training procedure and architecture (e.g. early stopping, minibatching, dropout) whose effects are not alw...
|
| 391 |
Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation
2605.05446
Nonconvex Low-Rank Estimation Theory: Proposes a unified theory explaining the implicit convexity of nonconvex low-rank matrix estimation.
|
stat.ML cs.IT cs.LG math.OC
|
Chengyu Cui, Gongjun Xu |
Nonconvex methods have emerged as a dominant approach for low-rank matrix estimation, a problem that arises widely in machine learning and AI for learning and representing high-dimensional data. Existing analyses for these methods often require additional regularization to mitigate nonconvexity, even though such regularization is often unnecessary in practice. Moreover, most analyses rely on problem-specific arguments that are difficult to generalize to more complex settings. In this paper, we d...
|
| 418 |
Permutation-preserving Functions and Neural Vecchia Covariance Kernels
2605.05523
Neural Vecchia Gaussian Process Kernels: Learns Vecchia-induced covariance parameters to construct scalable GP kernels.
|
stat.ML cs.LG stat.CO
|
Jian Cao, Nian Liu, Ying Lin |
We introduce a novel framework for constructing scalable and flexible covariance kernels for Gaussian processes (GPs) by directly learning the covariance structure under a regression-type parameterization induced by Vecchia approximations, using deep neural architectures. Specifically, we model kriging coefficients and conditional standard deviations, deterministic quantities that uniquely characterize the covariance, providing stable and informative learning targets. Exploiting the permutation-...
|