| # | Title | Categories | Authors | Abstract |
|---|---|---|---|---|
| astro-ph.IM 1 paper | ||||
| 24 |
AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification
2605.05573
|
astro-ph.IM cs.AI
|
Claire Chen, Jiabao Sean Xiao, Shuze Daniel Liu, Facundo Perez Paolino, Luke Handley |
Modern astronomical observatories generate a massive volume of multimodal data, creating a critical bottleneck for expert human review. While multimodal large language models (LLMs) have shown promise in interpreting complex visual and textual inputs, their ability to perform specialized scientific classification while providing interpretable reasoning remains understudied. We introduce AstroAlertBench, a comprehensive multimodal benchmark designed to evaluate LLM performance in astronomical eve...
|
| cond-mat.mtrl-sci 1 paper | ||||
| 113 |
Polarizable atomic multipoles for learning long-range electrostatics
2605.05746
|
cond-mat.mtrl-sci cs.LG physics.chem-ph physics.comp-ph
|
Dongjin Kim, Daniel S. King, Yoonjae Park, Roya Savoj, Sebastien Hamel |
Long-range electrostatics and polarization remain central obstacles to extending machine learning interatomic potentials (MLIPs) to ionic, polar, and interfacial systems. Here, we introduce a semi-local framework for learning electrostatics from energies and forces using polarizable atomic multipoles. Local equivariant descriptors predict environment-dependent latent monopoles, dipoles, and quadrupoles, while residual non-local charge transfer and polarization are captured by non-self-consistent...
|
| cs.AI 124 papers | ||||
| 7 |
Housing Potential Common Data Model and City Digital Twin
2605.05535
|
cs.AI
|
Megan Katsumi, Mark Fox, Anderson Wong, Divnoor Chatha |
The evaluation of housing potential requires consideration of a location from multiple perspectives, ranging from zoning and land use to population characteristics and access to services. This research introduces the Housing Potential Common Data Model (HPCDM) to overcome existing data silos, serving as a standard to support integration and interoperability across the diverse range of datasets that are required for housing potential analysis. This report details the evaluation of the model along...
|
| 8 |
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
2605.05538
|
cs.AI cs.IR
|
Susheel Suresh, Hazel Mak, Shangpo Chou, Fred Kroon, Sahil Bhatnagar |
We present AgenticRAG, a practical agentic harness for retrieval and analysis over enterprise knowledge bases. Standard RAG pipelines place a significant burden of grounding on the search stack, constraining the language model to a fixed candidate set chosen deep in the retrieval process. Our approach reduces this overdependence by layering a lightweight harness on top of existing enterprise search infrastructure, equipping a reasoning LLM with search, find, open, and summarize tools enabling the ...
|
| 11 |
SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
2605.05546
|
cs.AI
|
Hyobin Park, Taeseop Kim, Dong-Geol Choi |
Self-play reinforcement learning has shown strong performance in domains with formally verifiable structure, such as mathematics and coding, where both problem generation and reward computation can be grounded in explicit rules. Extending this paradigm to scientific literature is more challenging: the relationships among multi-modal elements within and across documents are rarely made explicit in text, which makes automatic generation of relational reasoning questions difficult and weakens the r...
|
| 17 |
Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages
2605.05558
|
cs.AI cs.CY
|
Siqi Zhu |
A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied highly elastically, placing downward pressure on cognitive-labor wages when it closely substitutes for human labor. We argue this framing is wrong in mechanism but partially correct in conclusion, and that the correction matters for both theory and policy. \textbf{Agents are not labor; they are a production technology that converts compute capital ...
|
| 18 |
BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models
2605.05561
|
cs.AI
|
Sai Babu Patarlapalli, Surya Teja Avvaru |
Post-training quantization makes large reasoning models practical under tight memory and latency budgets, but it can distort the online signals that drive adaptive test-time compute allocation. Under a fixed cap on the number of newly generated tokens, miscalibrated confidence can lead to harmful early halting: the model may surface a plausible final line while the underlying reasoning is still wrong, or the controller may stop before the trace has stabilized. We study this interaction for greed...
|
| 19 |
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
2605.05566
|
cs.AI cs.CL cs.LG
|
Langlin Huang, Chengsong Huang, Jinyuan Li, Donghong Cai, Yuyi Yang |
Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the "zero-advantage problem": when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model loses effective training signals for these questions, wasting the training data and computational budget. While...
|
| 20 |
Locality-aware Private Class Identification for Domain Adaptation with Extreme Label Shift
2605.05567
|
cs.AI
|
Chuan-Xian Ren, Cheng-Jun Guo, Hong Yan |
Domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain with different distributions. In real-world scenarios, the label spaces of the two domains often have an inclusion relationship, where some classes exist only in one domain but not the other. These non-overlapping classes are referred to as private classes. Identifying private class samples and mitigating their adverse effects is critical in the literature. Existing methods rely on the assumpt...
|
| 26 |
AlphaCrafter: A Full-Stack Multi-Agent Framework for Cross-Sectional Quantitative Trading
2605.05580
|
cs.AI
|
Yishuo Yuan, Jiayi Sheng, Sirui Zeng, Jiaqi Wang, Jiaheng Liu |
Financial markets are inherently non-stationary, driven by complex interactions among macroeconomic regimes, microstructural frictions, and behavioral dynamics. Building quantitative strategies that remain profitable demands the continuous coupling of factor discovery, regime-adaptive selection, and risk-constrained execution. Prevailing approaches, however, optimize these components under static or isolated assumptions. Factor mining frameworks typically treat alpha discovery as a one-time sear...
|
| 28 |
Belief Memory: Agent Memory Under Partial Observability
2605.05583
|
cs.AI cs.CL
|
Junfeng Liao, Qizhou Wang, Jianing Zhu, Bo Du, Rui Yan |
LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed" from temporary errors), even though such observations are inherently partial and potentially ambiguous. By committing to one conclusion and discarding uncertainty, these methods introduce self-reinforcing error: the agent acts on the stored conclusion, never revisits a...
|
| 33 |
Causal Probing for Internal Visual Representations in Multimodal Large Language Models
2605.05593
|
cs.AI
|
Zehao Deng, Tianjie Ju, Zheng Wu, Liangbo He, Jun Lan |
Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose a causal framework based on activation steering to actively probe and manipulate internal visual representations. Through systematic intervention across four visual concept categories, our results reveal a divergence in concept encoding: entities exhibit disti...
|
| 35 |
Prober.ai: Gated Inquiry-Based Feedback via LLM-Constrained Personas for Argumentative Writing Development
2605.05598
|
cs.AI cs.HC
|
Ran Bi, Shiyao Wei, Yuanyiyi Zhou |
The proliferation of large language models (LLMs) in educational settings has paradoxically undermined the cognitive processes they purport to support. Students increasingly outsource critical thinking to AI assistants that generate polished text on demand, resulting in measurable cognitive debt and diminished argumentative reasoning skills. We present Prober.ai, a web-based writing environment that inverts the conventional AI-tutoring paradigm: rather than generating or rewriting student text, ...
|
| 49 |
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms
2605.06716
|
cs.AI cs.CL
|
Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin |
Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey propos...
|
| 53 |
Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG
2605.05643
|
cs.AI cs.IR
|
Jiarui Zhong, Hong Cai Chen |
Retrieval-Augmented Generation (RAG) has become a core paradigm for enhancing factual grounding and multi-hop reasoning in Large Language Models (LLMs). Traditional text-based RAG often retrieves logically irrelevant pseudo-evidence, while graph-based RAG is frequently hindered by search-time pruning, which may discard potentially valid reasoning paths. Existing hybrid approaches primarily adopt simple evidence concatenation or unidirectional enhancement, which fails to address the fundamental "...
|
| 60 |
Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
2605.05657
|
cs.AI cs.MA
|
Abhijit Talluri, Pujith Anne, Bhagavan Choudary Pendiyala, Raghavendra Chilukuri |
Multi-agent LLM systems for code generation face a fundamental routing problem: the optimal orchestration topology depends on the structural complexity of the code under modification, yet existing systems select topologies without consulting the codebase. We present Retrieval-Guided Adaptive Orchestration (RGAO), an architecture that closes this loop by extracting a structural complexity vector from a hierarchical code index before selecting the orchestration topology. RGAO operates within Code-...
|
| 65 |
Large Vision-Language Models Get Lost in Attention
2605.05668
|
cs.AI cs.CV
|
Gongli Xi, Ye Tian, Mengyu Yang, Huahui Yi, Liang Lin |
Despite the rapid evolution of training paradigms, the decoder backbone of large vision-language models (LVLMs) remains fundamentally rooted in the residual-connection Transformer architecture. Therefore, deciphering the distinct roles of internal modules is critical for understanding model mechanics and guiding architectural optimization. While prior statistical approaches have provided valuable attribution-based insights, they often lack a unified theoretical basis. To bridge this gap, we pro...
|
| 68 |
Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
2605.05678
|
cs.AI
|
Xiaomin Li, Jianheng Hou, Zheyuan Deng, Zhiwei Zhang, Taoran Li |
Large reasoning models (LRMs) increasingly expose chain-of-thought-like reasoning for transparency, verification, and deliberate problem solving. This creates a safety blind spot: harmful or policy-violating content may appear in reasoning traces even when final answers appear safe. We test whether final-answer safety is a sufficient proxy for the full reasoning-answer trajectory by scoring both stages under a unified twenty-principle safety rubric. Using prompts from seven public harmfulness an...
|
| 73 |
Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination
2605.05686
|
cs.AI
|
Qiyao Liang, Risto Miikkulainen, Ila Fiete |
Language models draw on two knowledge sources: facts baked into weights (parametric memory, PM) and information in context (working memory, WM). We study two mechanistically distinct failure modes: conflict, when PM and WM disagree and interfere; and hallucination, when the queried fact was never learned. Both produce confident output regardless, making output-based monitoring blind by design. We show both failures share a unified geometric account. In the hidden-state space of autoregressive ge...
|
| 74 |
DataDignity: Training Data Attribution for Large Language Models
2605.05687
|
cs.AI
|
Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier |
Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely supports the knowledge expressed in a response. We study this as pinpoint provenance: given a prompt, a target-model response, and a candidate corpus, rank the documents that best support the response. We introduce FakeWiki, a controlled benchmark of 3,537 fabricated Wikipedia-style articles designed to preserve ground-truth provenance while weakening le...
|
| 76 |
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
2605.05689
|
cs.AI
|
Shaozhen Ma, Wei Huang, Hanchen Wang, Dong Wen, Wenjie Zhang |
Conditional generative models, particularly diffusion-based methods, have recently been applied to graph prediction by modeling the target as a conditional distribution given the input graph, yielding competitive results compared to deterministic predictors. However, existing diffusion-based prediction methods typically require expensive iterative denoising at inference and often suffer from unstable sampling, which motivates recent efforts to reduce inference denoising steps and enable stable sa...
|
| 78 |
Saliency-Aware Regularized Quantization Calibration for Large Language Models
2605.05693
|
cs.AI cs.LG
|
Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang |
Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limi...
|
| 84 |
Inference-Time Budget Control for LLM Search Agents
2605.05701
|
cs.AI
|
Zhengru Fang, Senkang Forest Hu, Zhonghao Chang, Yu Guo, Yihang Tao |
LLM search agents increasingly rely on tools at inference time, but their trajectories are often constrained by hard limits on both tool calls and generated tokens. Under such dual budgets, better answers require not only stronger models, but also explicit control over which search action should receive the next budget unit and when the accumulated evidence is sufficient to commit a final answer. We study this problem in multi-hop question answering (QA) and formulate it as two-stage inference-t...
|
| 85 |
Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents
2605.05702
|
cs.AI
|
Huyu Wu, Jun Liu, Xiaochi Wei, Yan Gao, Yi Wu |
Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer and Solver framework in which questions are generated and answered via multi-step search and reasoning. In practice, however, SSP faces two bottlenecks: the Proposer constructs questions from isolated answer entities without relational context, yielding many invalid or unverifiable questions in early self-p...
|
| 89 |
Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine
2605.05706
|
cs.AI q-bio.QM
|
Peisong Zhang, Manqiang Peng, Yuxuan Wu, Pawit Phadungsaksawasdi, Wesley Yeung |
Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global ad...
|
| 90 |
Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs
2605.05709
|
cs.AI
|
Md Farhamdur Reza, Richeng Jin, Tianfu Wu, Huaiyu Dai |
Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to bypass safety mechanisms. We show that such attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide harmful intent from safety filters while remaining recoverable enough for the victim model to reconstruct the original request. Through a reconstruction analysis of three representative black-box methods...
|
| 95 |
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
2605.05715
|
cs.AI cs.CL cs.LG
|
Ming Liu |
Can linearly decodable failure signals in LLM hidden states be leveraged to correct those failures? We investigate this classification-correction gap via Overthinking (OT), a stable behavioral regime (Jaccard >= 0.81, 94% inter-annotator agreement) in medical QA where models answer correctly under resampling yet fail in extended chain-of-thought. OT is linearly decodable at 71.6% balanced accuracy (p < 10^{-16}). Yet five families of fixed linear steering (29 configurations, n=1,273) all yield D...
|
| 96 |
More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding
2605.05716
|
cs.AI cs.CL
|
Ming Liu |
LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destructively. We run a full factorial experiment over all 2^5=32 subsets of five components on HotpotQA and GSM8K with Llama-3.1-8B/70B (96 conditions, up to 10 seeds). The All-In system is consistently suboptimal: on HotpotQA, a single-tool agent surpasses All-In by 32% (F1...
|
| 100 |
Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers
2605.05725
|
cs.AI
|
Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang |
Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialized Analyzer Group for Expert-like Detection), a multi-agent framework for structured anomaly diagnosis in univariate time series. It decomposes anomaly analysis into four specialize...
|
| 101 |
SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents
2605.05726
|
cs.AI
|
Hongcheol Cho, Ryangkyung Kang, Youngeun Kim |
As LLM agents are increasingly deployed with large libraries of reusable skills, selecting the right skill for a user request has become a critical systems challenge. In small libraries, users may invoke skills explicitly by name, but this assumption breaks down as skill ecosystems grow under tight context and latency budgets. Despite its practical importance, skill retrieval remains underexplored, with limited benchmarks and little understanding of retrieval behavior on realistic skill librarie...
|
| 103 |
Knee Osteoarthritis Severity Grading Using Optimized Deep Learning and LLM-Driven Intelligent AI on Computationally Limited Systems
2605.05731
|
cs.AI
|
Dayam Nadeem, Neha, Safdar Mustafa, Adnan Alvi, Mohd Hussain |
Knee osteoarthritis (KOA) is among the musculoskeletal disorders that considerably restrict joint mobility, cause severe chronic pain, and negatively impact quality of life. It is one of the persistent health issues worldwide. Generally, subjectivity and inter-observer variability undermine the conventional practices and evaluation processes adopted to address such health issues. Hence, precise and timely diagnosis would be one of the effective ways to assess its severity. This pape...
|
| 105 |
SDFlow: Similarity-Driven Flow Matching for Time Series Generation
2605.05736
|
cs.AI
|
Wei Li, Shibo Feng, Pengcheng Wu, Min Wu, Peilin Zhao |
Vector quantization (VQ) with autoregressive (AR) token modeling is a widely adopted and highly competitive paradigm for time-series generation. However, such models are fundamentally limited by exposure bias: during inference, errors can accumulate across sequential predictions, leading to pronounced quality degradation in long-horizon generation. To address this, we propose SDFlow (Similarity-Driven Flow Matching), a non-autoregressive framework that operates e...
|
| 106 |
ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning
2605.05737
|
cs.AI cs.CL
|
Fan Huang |
Current reasoning paradigms for LLMs include chain-of-thought, ReAct, and post-hoc self-critique. These paradigms rely on two assumptions that fail on long-horizon, multi-stage tasks. As a result, errors accumulate silently across reasoning steps, leaving an open question: can a reasoning system effectively detect and recover from its own failures? We present ReFlect, a harness system for LLM reasoning that creates standalone error detection and recovery logic as a deterministic wrapper a...
|
| 109 |
HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory
2605.05741
|
cs.AI
|
Chengda Lu, Xiaoyu Fan, Wei Xu |
While Large Language Models (LLMs) achieve strong performance across diverse tasks, their inference dynamics remain poorly understood because of the limited resolution of existing analysis tools. In this work, we identify an intrinsic magnification mechanism in transformer architectures: deeper layers inherently magnify the small changes of layer-wise confidence, providing a fine-grained confidence trajectory. Building on this insight, we introduce HyperLens, a high-resolution probe designed to ...
|
| 112 |
Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback
2605.05745
|
cs.AI
|
Qirun Zeng, Xuchuang Wang, Jiayi Shen, Xutong Liu, Fang Kong |
We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumpt...
|
| 114 |
Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI
2605.05748
|
cs.AI
|
Vanessa Buhrmester, David Muench, Dimitri Bulatov, Michael Arens |
Explainable Artificial Intelligence (XAI) is increasingly recognized as essential for deploying machine learning systems in safety-critical environments. In Automatic Target Recognition (ATR), where models operate on image, video, radar, and multisensor data, high predictive performance alone is insufficient. Model decisions must also be interpretable, reliable, and suitable for validation. This paper presents a structured evaluation of explainability methods in the context of safety-critical ...
|
| 128 |
Confidence is the key: how conformal prediction enhances the generative design of permeable peptides
2605.05770
|
cs.AI
|
Laura van Weesep, Sunay Chankeshwara, Leonardo De Maria, Florian David, Ola Engkvist |
Generative models coupled with reinforcement learning (RL), such as REINVENT and PepINVENT, have emerged as a powerful framework for de novo molecular design. During the ideation process these generative frameworks utilize various predictive models as part of the optimization objectives. However, the utility of the predictive models can be limited by their domain of applicability. When RL is used to explore the chemical space with predictive models, it can suggest molecules that lie outside the ...
|
| 129 |
CircuitFormer: A Circuit Language Model for Analog Topology Design from Natural Language Prompt
2605.05773
|
cs.AI
|
Md Touhidul Islam, Sujan Kumar Saha, Farimah Farahmandi, Mark Tehranipoor |
Automating analog circuit design remains a longstanding challenge in Electronic Design Automation (EDA). While Transformer-based Large Language Models (LLMs) have revolutionized software code generation, their application to analog hardware design is hindered by two critical limitations: (i) the scarcity of analog design datasets containing natural language description of a design and its corresponding netlist, and (ii) the inefficiency of general-purpose tokenizers (e.g., Byte Pair Encoding (BP...
|
| 131 |
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
2605.05776
|
cs.AI
|
Yu Feng, Zhen Tian, Haoran Luo, Xie Yu, Diancheng Cheng |
Domain Incremental Learning is a critical scenario that requires models to continuously adapt to new data domains without retraining. However, domain shifts often cause severe performance degradation. To address this, we propose Hybrid Energy-Distance Prompt, a domain-incremental framework inspired by Helmholtz free energy. HEDP introduces an energy regularization loss to enhance the separability of domain representations and a hybrid energy-distance weighted mechanism that fuses energy-based an...
|
| 133 |
Von Neumann Networks
2605.05780
|
cs.AI cs.CV cs.LG
|
Shekhar S. Chandra |
In the mid-twentieth century, mathematician and polymath John von Neumann created a computational system on an array of cells as a simple model of the human brain, where each cell had one of a finite set of roles or states that he predicted would be modelled by a diffusion process. In this work, we show that such a system, when developed in a modern deep learning setting, enables the construction of an artificial neuron having specialized roles that can be learnt. We refer to this neuron as the ...
|
| 145 |
Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding
2605.05811
|
cs.AI
|
Yiming Lei, Yiqi Wang, Yujia Zhang, Bo Guan, Depei Zhu |
Workbook-scale spreadsheet understanding is increasingly important for language-model-based data analysis agents, but remains challenging because relevant information is often distributed across multiple sheets with heterogeneous schemas, layouts, and implicit relationships. Existing retrieval-augmented approaches typically decompose spreadsheets into rows, columns, or blocks to improve scalability; however, such chunk-centric representations can fragment worksheets into isolated text spans and ...
|
| 146 |
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
2605.05812
|
cs.AI
|
Armaan A. Abraham, Lucy Xiaoyang Shi, Chelsea Finn |
Off-policy, value-based reinforcement learning methods such as Q-learning are appealing because they can learn from arbitrary experience, including data collected by older policies or other agents. In practice, however, bootstrapping makes long-horizon learning brittle: estimation errors at later states propagate backward through temporal-difference (TD) updates and can compound over time. We propose long-horizon Q-learning (LQL), which introduces a principled backstop against compounding error ...
|
| 151 |
AGPO: Asymmetric Group Policy Optimization for Verifiable Reasoning and Search Ads Relevance at JD
2605.05826
|
cs.AI
|
Yang Xu, Kun Yao, Yiming Deng, Zheng Fang, Kai Ming Ting |
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated notable success in enhancing the reasoning performance of large language models (LLMs). However, recent studies reveal that while current RLVR methods improve sampling efficiency towards correct paths, they do not elicit fundamentally new reasoning patterns. Instead, the reasoning capability boundary of trained models often narrows compared to their base models, with base models achieving higher coverage at large sample sizes...
|
| 153 |
MolRecBench-Wild: A Real-World Benchmark for Optical Chemical Structure Recognition
2605.05832
|
cs.AI
|
Haote Yang, Hui Wang, Chen Zhu, Jingchao Wang, Linye Li |
Optical Chemical Structure Recognition (OCSR) aims to translate molecular diagrams in scientific literature into machine-readable formats, but current systems remain unreliable on real-world images due to substantial visual and chemical complexity. We introduce MOSAIC, a dual-dimensional difficulty framework with 37 fine-grained labels that jointly characterize visual interference and chemical semantic challenges in molecular diagrams. Based on this framework, we construct MolRecBench-Wild, a be...
|
| 154 |
On the Role of Language Representations in Auto-Bidding: Findings and Implications
2605.05833
|
cs.AI
|
Guanyu Zhu, Jining Luan, Hanwen Du, Xinyu Fang, Sibo Xu |
Auto-bidding is a crucial task in real-time advertising markets, where policies must optimize long-horizon value under delivery constraints (e.g., budget and CPA). Existing methods for auto-bidding rely on compact numerical state representations: while they can implicitly capture delivery dynamics, they offer limited support for explicitly representing and controlling high-level intent, evolving feedback, and operator-style strategic guidance in real campaigns. Meanwhile, Large Language Models (...
|
| 157 |
Taklif.AI: LLM-Powered Platform for Interest-Based Personalized College Assignments
2605.05842
|
cs.AI
|
Zaki Kurdya, Mohammed Zuqlam, Salem Amassi, Shady Telbany, Motaz Saad |
Educators face significant challenges in creating engaging, personalized assignments that accommodate students' diverse interests and cognitive abilities. Traditional one-size-fits-all assignments frequently lead to decreased student engagement and increased reliance on unethical practices such as plagiarism. To address these challenges, we present Taklif.AI, a platform that leverages Large Language Models (LLMs) to automatically generate personalized assignments tailored to individual student i...
|
| 162 |
AirQualityBench: A Realistic Evaluation Benchmark for Global Air Quality Forecasting
2605.05854
|
cs.AI
|
Xing Xu, Xu Wang, Yudong Zhang, Huilin Zhao, Zhengyang Zhou |
Air-quality forecasting models are commonly evaluated on regional, preprocessed, and normalized datasets, where missing observations are removed or artificially completed. Such protocols simplify comparison but hide the conditions that dominate real monitoring networks: uneven global coverage, structured missingness, heterogeneous pollutant scales, and deployment cost. We introduce \textbf{AirQualityBench}, a global multi-pollutant benchmark designed to evaluate forecasting models under these re...
|
| 166 |
SANEmerg: An Emergent Communication Framework for Semantic-aware Agentic AI Networking
2605.05861
|
cs.AI cs.NI
|
Yong Xiao, Haoran Zhou, Yujie Zhou, Marwan Krunz |
Future networking systems are envisioned to become part of an agentic AI-native ecosystem in which a vast number of heterogeneous and specialized AI agents cooperate seamlessly to fulfill complex user requirements in real time. However, traditional networking paradigms are characterized by a rigid decoupling of communication and computation, which often leads to significant inefficiencies in large-scale agentic AI networking (AgentNet) systems. Emergent communication offers a novel solution by e...
|
| 170 |
XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction
2605.05866
|
cs.AI cond-mat.mtrl-sci cs.LG
|
Hanyu Gao, Bin Cao, Yunyue Su, Tong-Yi Zhang, Qiang Liu |
Multiphase powder X-ray diffraction (PXRD) analysis remains a fundamental bottleneck in structure identification, as real-world synthesis often produces complex mixtures whose constituent phases (components) cannot be reliably disentangled. While recent advances in representation-based crystal retrieval and generation suggest the possibility of inferring structures directly from PXRD, existing approaches largely assume single-phase inputs and break down in multiphase settings. Here, we present X...
|
| 171 |
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
2605.06723
|
cs.AI cs.CL cs.LG
|
Long Zhang, Wei-neng Chen, Feng-feng Wei, Zi-bo Qin |
Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: \emph{finite-answer preference stabilization}. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $δ(ξ)=S_θ(\mathrm{yes}\midξ)-S_θ(\mathrm{no}\midξ)...
|
| 176 |
Agentic, Context-Aware Risk Intelligence in the Internet of Value
2605.05878
|
cs.AI
|
Basel Magableh, OmniRisk Research |
The Internet of Value (IoV) is a heterogeneous, partially-trusted network in which the dominant marginal risk is composite (route, sentiment, liquidity, and the policy a system is willing to commit to) rather than a property of any single chain. We argue that a risk primitive adequate for this regime is a composition of five engines: a prediction engine over price, liquidity, volatility, and route health; a Bittensor verification subnet that decentralises and economically scores prediction outpu...
|
| 190 |
Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning
2605.05909
|
cs.AI
|
Yuhang Wang, Zhenxing Niu, Haoxuan Ji, Guangyu He, Linlin Zhang |
The core challenge of machine unlearning is to strike a balance between target knowledge removal and non-target knowledge retention. In the context of Multimodal Large Language Models (MLLMs), this challenge becomes even more pronounced, as knowledge is further divided into visual and textual modalities that are tightly intertwined. In this paper, we introduce an MLLM unlearning approach that aims to forget target visual knowledge while preserving non-target visual knowledge and all textual know...
|
| 192 |
PREFER: Personalized Review Summarization with Online Preference Learning
2605.05911
|
cs.AI cs.GT cs.LG eess.SY math.OC
|
Millend Roy, Agostino Capponi, Vineet Goyal |
Product reviews significantly influence purchasing decisions on e-commerce platforms. However, the sheer volume of reviews can overwhelm users, obscuring the information most relevant to their specific needs. Current e-commerce summarization systems typically produce generic, static summaries that fail to account for the fact that (i) different users care about different product characteristics, and (ii) these preferences may evolve with interactions. To address the challenge of unknown latent p...
|
| 194 |
Wisteria: A Unified Multi-Scale Feature Learning Framework for DNA Language Model
2605.05913
|
cs.AI
|
Weihua Wang, Haoji Li, Feilong Bao, Lei Yang, Guanglai Gao |
DNA language models aim to decipher the regulatory grammar and semantics of genomes by capturing long-range dependencies in DNA sequences. Existing methods emphasize long-range token interactions but often ignore the interplay between local motifs and global dependencies. In this paper, we propose Wisteria, a genomic language model that integrates multi-scale feature learning within a unified framework for DNA sequences. Specifically, Wisteria augments the Mamba-based architecture with gated dilat...
|
| 197 |
Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery
2605.05921
|
cs.AI cs.HC
|
Alex Bäuerle, Adam Connors, Alexander Novikov, Adam Zsolt Wagner, Ngân Vũ |
Artificial intelligence offers powerful new tools for scientific discovery, but the interaction paradigms required to effectively harness these systems remain underexplored. In this paper, we present findings from a formative user study with 11 expert mathematicians who used AlphaEvolve, an evolutionary coding agent, to tackle advanced problems in their fields of expertise. We identify and characterize a distinct workflow we term intentmaking, the iterative process of discovering, defining, and ...
|
| 201 |
Which Are the Low-Resource Languages of the Semantic Web?
2605.05929
|
cs.AI
|
Ndeye-Emilie Mbengue, Pierre Monnin, Miguel Couceiro, Fabien Gandon |
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from the global digital transformation. Multilingual Linked Open Data Knowledge Graphs (LOD KGs) could contribute to mitigating this divide through cross-lingual transfer; however, no clear quantitative definition of low-resource languages has yet been established in the context of LOD KGs. In this poster, we present a methodology to ana...
|
| 202 |
In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
2605.05931
|
cs.AI
|
Ndeye-Emilie Mbengue |
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from participating in the global digital transformation. In this PhD proposal, we aim to address this gap, focusing on the language coverage of Linked Open Data knowledge graphs (LOD KGs). First, we identify key variables that characterize language distribution in LOD, including the number of Wikipedia articles per language edition and t...
|
| 204 |
ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models
2605.05938
|
cs.AI
|
Yuhang Wang, Wenjie Mei, Junkai Zhang, Guangyu He, Zhenxing Niu |
Although Multimodal Large Language Models (MLLMs) have achieved remarkable progress across many domains, their training on large-scale multimodal datasets raises serious privacy concerns, making effective machine unlearning increasingly necessary. However, existing benchmarks mainly focus on static or short-sequence settings, offering limited support for evaluating continual privacy deletion requests in realistic deployments. To bridge this gap, we introduce ICU-Bench, a continual multimodal unl...
|
| 209 |
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
2605.05949
|
cs.AI cs.SE
|
Yuliang Xu, Xiang Xu, Yao Wan, Hu Wei, Tong Jia |
Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods leveraging external tools or prompting techniques (e.g., chain-of-thought) are often fragmente...
|
| 211 |
HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning
2605.05951
|
cs.AI
|
Haoyun Tang, Haodong Cui, Keyao Xu, Kun Wang, Zhandong Mei |
World models enable model-based planning through learned latent dynamics, but imagined rollouts become unstable as the planning horizon grows or the dynamics distribution shifts. We argue that this instability reflects two missing structures in planner-facing latents: history-conditioned memory for approximate Markov completeness, and geometric organization that separates configuration, momentum, and task semantics. We propose HaM-World (HMW), a structured world model that decomposes the latent ...
|
| 215 |
Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing
2605.05958
|
cs.AI
|
Peilin Zhan, Wei Chen, Weilin Chen, Shuyi Pan, Ruichu Cai |
Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulati...
|
| 216 |
From Coordinate Matching to Structural Alignment: Rethinking Prototype Alignment in Heterogeneous Federated Learning
2605.05959
|
cs.AI cs.DC cs.LG
|
Xinghao Wu, Jianwei Niu, Guogang Zhu, Xuefeng Liu, Shaojie Tang |
Heterogeneous federated learning (HtFL) aims to enable collaboration among clients that differ in both data distributions and model architectures. Prototype-based methods, which communicate class-level feature centers (prototypes) instead of full model parameters, have recently shown strong potential for HtFL. Existing prototype-based HtFL methods typically reuse the MSE-based or cosine-based alignment mechanism developed for homogeneous FL when aligning client-specific representations with glob...
|
| 218 |
TheraAgent: Self-Improving Therapeutic Agent for Precise and Comprehensive Treatment Planning
2605.05963
|
cs.AI cs.CL
|
Junkai Li, Yunghwei Lai, Tianyi Zhu, Zheng Long Lee, Weizhi Ma |
Formulating a treatment plan is inherently a complex reasoning and refinement task rather than a simple generation problem. However, existing large language models (LLMs) mainly rely on one-shot output without explicit verification, which may result in rough, incomplete, and potentially unsafe treatment plans. To address these limitations, we propose TheraAgent, an agentic framework that replaces one-shot generation with an iterative generate-judge-refine pipeline. By mirroring the actual reason...
|
| 226 |
BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning
2605.05977
|
cs.AI
|
Yinbo Yu, Xueyu Yin, Jiadai Wang, Chunwei Tian, Sai Xu |
Backdoor attacks pose a serious threat to deep reinforcement learning (DRL). Current defenses typically rely on reward anomalies to reverse-engineer triggers and model fine-tuning to remove backdoors. However, complex trigger patterns undermine their robustness, and fine-tuning entails high costs, limiting practical utility. Therefore, we shift defense concerns to trigger-agnostic backdoor output behaviors and propose BehaviorGuard, an online behavior-based backdoor detection and mitigation frame...
|
| 228 |
TACT: Mitigating Overthinking and Overacting in Coding Agents via Activation Steering
2605.05980
|
cs.AI
|
Yuan Sui, Yulin Chen, Yibo Li, Xue Jiang, Yufei He |
When language model agents tackle complex software engineering tasks, they often degrade over long trajectories, which we define as *agent drift*. We focus on two recurring failure modes, *overthinking* and *overacting*, i.e., where the agent repeatedly reasons over information it already has, and where it issues tool calls without integrating recent observations or acquiring new evidence. In this paper, we introduce TACT (Think-Act Calibration via activation Steering) to detect and mitigate age...
|
| 231 |
BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine
2605.05985
|
cs.AI cs.MA q-bio.QM
|
Remigiusz Kinas, Joanna Krawczyk, Rafał Powalski, Przemysław Pietrzak, Agnieszka Kowalewska |
Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and fall short on the auditable, scenario-specific workflows that heterogeneous bi...
|
| 248 |
Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals
2605.06024
|
cs.AI
|
Wenliang Huang, Zengyi Yu |
Large Language Models (LLMs) are evolving into autonomous trading agents, yet existing benchmarks often overlook the interplay between architectural reasoning and strategy consistency. We propose Strat-LLM, a framework grounded in Stratified Strategy Alignment. Operating in a live-forward setting throughout 2025, it integrates heterogeneous data including sequential prices, real-time news, and annual reports to eliminate look-ahead bias. Extensive stress tests on A-share and U.S. markets reveal:...
|
| 250 |
Pathways to AGI
2605.06029
|
cs.AI
|
Gordon Fletcher, Saomai Vu Khan |
Our focus is on five related questions that stem from a critical software studies perspective. Underpinning this view is the acknowledged need to avoid assumptions regarding the inevitability of the current situation relating to AI. What we need to see is the closeness of the linkage between current commercial AI development and our prevailing social, political and economic circumstances. This does mean that the perspectives presented here are offered critically and conditionally. Most importantly...
|
| 259 |
Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning
2605.06040
|
cs.AI cs.CL
|
Leon Hamm, Zlatan Ajanovic |
Although advances such as chain-of-thought, tree-of-thought or reinforcement learning have improved the performance of LLMs in reasoning and planning tasks, they are still brittle and have not achieved human-level performance in many domains, and often suffer from high time and token costs. Inspired by the success of width-based search in planning, we explore how the concept of novelty can be transferred to language domains and how it can improve tree-of-thought reasoning. A tree of thoughts rel...
|
| 269 |
Visual Fingerprints for LLM Generation Comparison
2605.06054
|
cs.AI cs.HC
|
Amal Alnouri, Andreas Hinterreiter, Christina Humer, Furui Cheng, Marc Streit |
Large language model (LLM) outputs arise from complex interactions among prompts, system instructions, model parameters, and architecture. We refer to specific configurations of these factors as generation conditions, each of which can bias outputs in various ways. Understanding how different generation conditions shape model behaviors is essential for tasks such as prompt design and model evaluation, yet it remains challenging due to the stochastic and open-ended nature of text generation. We p...
|
| 278 |
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
2605.06068
|
cs.AI cs.DC
|
Keisuke Kamahori, Shihang Li, Simon Peter, Baris Kasikci |
For years, we have built LLM serving systems like any other critical infrastructure: a single general-purpose stack, hand-tuned over many engineer-years, meant to support every model and workload. In this paper, we take the opposite bet: a multi-agent loop that automatically synthesizes bespoke serving systems for different usage scenarios. We propose VibeServe, the first agentic loop that generates entire LLM serving stacks end-to-end. VibeServe uses an outer loop to plan and track the search o...
|
| 291 |
Safety Certification is Classification
2605.06087
|
cs.AI eess.SY
|
Oliver Schön, Licio Romao, Sadegh Soudjani |
The goal of this paper is certifying safety of dynamical systems subject to uncertainty. Existing approaches use trajectory data to estimate transition probabilities, and compute safety probabilities recursively via dynamic programming (DP). This recursion may lead to compounding errors in the certified safety probability, thus collapsing to a vacuous lower bound for growing horizons $T$. We propose a kernel embedding framework that treats safety certification as a classification problem on traj...
|
| 299 |
Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility
2605.06105
|
cs.AI
|
Jungsuk Oh, Hyeseo Jeon, Hyunjune Ji, Kyongmin Kong, Jay-Yoon Lee |
Long-context inference in decoder-only language models is costly because long prompts are processed during Prefill, cached at every layer, and repeatedly attended to during autoregressive Decode. We introduce Shallow Prefill, dEEp Decode (SPEED), a phase-asymmetric KV-visibility policy that materializes non-anchor prompt-token KV states only in lower layers while keeping Decode-phase tokens full-depth. Unlike previous approaches that make upper-layer prompt KV states cheaper to store or c...
|
| 301 |
On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows
2605.06110
|
cs.AI cs.CL
|
Xinglin Wang, Zishen Liu, Shaoxiong Feng, Peiwen Yuan, Yiwei Li |
Agentic systems increasingly solve complex user requests by executing orchestrated workflows, where subtasks are assigned to specialized models or tools and coordinated according to their dependencies. While recent work improves agent efficiency by optimizing the performance–cost–latency frontier, real deployments often impose concrete requirements: a workflow must be completed within a specified budget and before a specified deadline. This shifts the goal from average efficiency optimization ...
|
| 305 |
CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs
2605.06115
|
cs.AI
|
Zhen Zeng, Leijiang Gu, Feng Li, Jing Yu, Zenglin Shi |
Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we introduce CrossCult-KIBench, a comprehensive evaluation benchmark for assess...
|
| 306 |
Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning
2605.06116
|
cs.AI
|
Wenwen Si, Insup Lee, Osbert Bastani |
Inference-time computation has greatly enhanced the performance of large language models (LLMs) on challenging reasoning tasks, but this strategy can incur high inference costs. One solution is to route intermediate chain-of-thought (CoT) states to language models of different sizes; however, existing approaches rely on handcrafted routing strategies that limit performance, or on training large process reward models that may be infeasible in many applications. We formulate stepwise model routing...
|
| 309 |
Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs
2605.06123
|
cs.AI
|
Nguyen Viet Tuan Kiet, Bui Dinh Pham, Dao Van Tung, Tran Cong Dao, Huynh Thi Thanh Binh |
Large language models (LLMs) have recently advanced automatic heuristic design (AHD) for combinatorial optimization (CO), where candidate heuristics are iteratively proposed, evaluated, and refined. Most existing approaches search over executable programs and distill insights from execution feedback to guide later iterations. Because this process moves from low-level implementations to high-level principles, we refer to it as a bottom-up paradigm. We argue that this view is incomplete and introd...
|
| 310 |
P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference
2605.06124
|
cs.AI
|
Xin Peng, Ang Gao |
Classifier-Free Guidance (CFG) is essential for high-fidelity conditional generation in flow matching, yet it imposes significant computational overhead by requiring dual forward passes at each sampling step. In this work, we address this bottleneck by introducing P-Guide, a framework that achieves high-quality guidance through a single inference pass by modulating only the initial latent state. We further show that, under a first-order approximation, P-Guide is equivalent to CFG in the...
|
| 312 |
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
2605.06130
|
cs.AI
|
Yaorui Shi, Yuxin Chen, Zhengxi Lu, Yuchun Miao, Shugui Liu |
A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or with separate reward sources, resulting in partial and conflicting evolution. We propose Skill1, a framework that trains a single policy to co-evolve skill selection, ...
|
| 327 |
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models
2605.06154
|
cs.AI cs.LG
|
Kossi Amouzouvi, Robert Wardenga, Jens Lehmann, Sahar Vahdati |
Foundation models excel at language, where sentences become tokens, and vision, where images become pixels, because both reduce to discrete symbols on a shared, fixed grid. Knowledge Graphs share the discreteness, but not the geometry. Their entities and relations are discrete symbols, yet their arrangement is relational and lacks a common, fixed grid. Knowledge Graphs (KGs) share the discreteness, but not the geometry. They form irregular, non-Euclidean topologies whose local neighborhoods diff...
|
| 330 |
Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges
2605.06161
|
cs.AI cs.SE
|
Shihao Weng, Yang Feng, Xiaofei Xie |
LLM-as-a-Judge pipelines have become the de facto evaluator for agent safety, yet existing benchmarks treat their verdicts as ground-truth proxies without checking whether the verdicts depend on the agent's behavior or merely on how the evaluation policy happens to be worded. We argue that any trustworthy safety judge must satisfy a basic property we call policy invariance, and we operationalize it as three testable principles: rubric-semantics invariance under certified-equivalent rewrites, rub...
|
| 332 |
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
2605.06165
|
cs.AI
|
Richmond Sin Jing Xuan, Rishabh Bhardwaj, Soujanya Poria |
As the widespread adoption of Large Language Models (LLMs) accelerates, token consumption from intermediate reasoning traces increasingly contributes to inference latency and operational cost. Recent studies suggest that many real-world tasks require little to no explicit reasoning, with additional reasoning sometimes even degrading performance. In this work, we propose Post-Reasoning, a simple yet effective approach that improves instruction-tuned models by conditioning them to justify...
|
| 338 |
BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents
2605.06177
|
cs.AI
|
Jinge Wu, Hongjian Zhou, Mingde Zeng, Jiayuan Zhu, Junde Wu |
Building a deep research agent today is an exercise in glue code: the same backbone evaluated on the same benchmark can report different accuracies in different papers because harness and tool registry all differ, and integrating a new foundation model into a comparable evaluation surface costs weeks of model-specific engineering. We call this the per-paper engineering tax and release BioMedArena, an open-source toolkit that not only alleviates it but also provides an arena for fair comparison o...
|
| 341 |
Rethinking Adapter Placement: A Dominant Adaptation Module Perspective
2605.06183
|
cs.AI cs.CL cs.LG
|
Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen |
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving where to place a limited number of adapters to maximize performance largely open. To investigate this, we introduce PAGE (Projected Adapter Gradient Energy), a gradient-based sensiti...
|
| 343 |
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
2605.06185
|
cs.AI cs.CV
|
Peizheng Yan, Yu Zhao, Liang Xie, Juntong Qi, Mingming Wang |
Recent large vision-language models have achieved strong performance on short- and medium-length video understanding, yet they remain inadequate for ultra-long or even infinite video reasoning, where models must preserve coherent memory over extended durations and infer causal dependencies across temporally distant events. Existing end-to-end video understanding methods are fundamentally limited by the $O(n^2)$ complexity of self-attention, while recent retrieval-augmented generation (RAG) appro...
|
| 346 |
OPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Models
2605.06188
|
cs.AI cs.CL
|
Jaehoon Kim, Dongha Lee |
On-Policy Self-Distillation (OPSD) has recently emerged as an alternative to Reinforcement Learning with Verifiable Rewards (RLVR), promising higher accuracy and shorter responses through token-level credit assignment from a self-teacher conditioned on privileged context. However, this promise does not carry over to thinking-enabled mathematical reasoning, where reported accuracy gains shrink and sometimes turn negative. We hypothesize that hindsight supervision can specify better token-level al...
|
| 349 |
Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction
2605.06191
|
cs.AI
|
Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan |
This paper evaluates zero-shot and few-shot large language models (LLMs) for safety-critical clinical action extraction using the CLIP discharge-note dataset, with particular emphasis on transitions of care and post-discharge patient safety. To manage the complexity of clinical documentation, we introduce a two-stage extraction framework that decomposes discharge notes, which are written in narrative form, into fine-grained, explicitly actionable clinical tasks through a staged prompt...
|
| 351 |
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
2605.06196
|
cs.AI cs.CL
|
Chonghan Qin, Xiachong Feng, Ziyun Song, Xiaocheng Feng, Jing Xiong |
Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the p...
|
| 354 |
Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric
2605.06201
|
cs.AI
|
Ying Gu, Mei Chee Leong, Hui Li Tan, Shangbo Mao, Liyuan Li |
Dominant accuracy evaluation might reward unwarranted guessing by Large Language Models, and it might not be applicable to novel tasks for model validation without ground-truth (gt) annotation. Based on basic principles of logic, we propose a novel framework to evaluate the vision-language logical consistency of MLLMs on both sufficient and necessary cause-effect relations. We define the Vision-Language Logical Consistency Metric (VL-LCM) on traditional MC-VQA tests, and recent NaturalBench tests withou...
|
| 364 |
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
2605.06213
|
cs.AI
|
Haoxiang Wang, Da Yu, Huishuai Zhang |
Evaluating large language models (LLMs) today rests on fixed benchmarks that apply the same set of items to any model, producing ceiling and floor effects that mask capability gaps. We argue that the most informative evaluation signal lies at the boundary, where the per-prompt pass probability is near $0.5$ under random-sampling decoding, and propose Dynamic Boundary Evaluation (DBE), which actively locates each model's boundary and places it on a globally comparable difficulty scale. DBE delive...
|
| 369 |
Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization
2605.06219
|
cs.AI
|
Yunzhen Yao, Hongye Wang, Yahong Wang, Michael C. Gastpar, Bo Jiang |
This paper studies test-time aggregation, an approach that generates multiple reasoning traces and aggregates them into a final answer. Most existing methods rely on evaluation signals collected from candidate traces in isolation or answer frequencies, while ignoring comparative interactions among candidates. We propose Joint Consistency (JC), formulated as a constrained Ising-type energy minimization problem, where independent evaluation signals act as external fields and pairwise comparisons a...
|
| 372 |
Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
2605.06223
|
cs.AI cs.RO
|
Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok |
Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collec...
|
| 374 |
A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization
2605.06226
|
cs.AI q-bio.GN
|
Tianyu Liu, Wangjie Zheng, Rui Yang, Benny Kai Guo Loo, Hui Zhang |
Accurate and timely diagnosis is essential for effective treatment, particularly in the context of rare diseases. However, current diagnostic workflows often lead to prolonged assessment times and low accuracy. To address these limitations, we introduce Hygieia, a multi-modal AI agent system designed to support precision disease diagnosis by integrating diverse data sources, including phenotypic features, genetic profiles, and clinical records. Hygieia features a router-based and knowledge-enhan...
|
| 375 |
Price of Fairness in Short-Term and Long-Term Algorithmic Selections
2605.06227
|
cs.AI
|
Shahin Jabbari, Chen Wang |
Algorithmic decision-making in high-stakes settings can have profound impacts on individuals and populations. While much prior work studies fairness in static settings, recent results show that enforcing static fairness constraints may exacerbate long-run disparities. Motivated by this tension, we study a stylized sequential selection problem in which a decision-maker repeatedly selects individuals, affecting both immediate utility and the population distribution over time. We introduce notions ...
|
| 378 |
Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence
2605.06230
|
cs.AI cs.DC
|
Xinquan Chen, Zhenyun Yin, Shan He, Bin Huang, Shanzhe Lei |
As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelli...
|
| 408 |
Data Language Models: A New Foundation Model Class for Tabular Data
2605.06290
|
cs.AI
|
Eda Erol, Giuliano Pezzoli, Ozer Cem Kelahmet |
Every major data modality now has a foundation model that understands it natively: text has language models, images have vision models, audio has audio models. Tabular data, the modality on which many consequential real-world AI decisions are made, does not. Every approach to tabular AI today, from gradient-boosted trees to the latest tabular foundation models, requires a preprocessing pipeline before any model can consume the data. None of them understand tabular data as a modality. We introduc...
|
| 414 |
Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs
2605.06305
|
cs.AI cs.IR
|
Thomas Cory, Axel Küpper |
Automated privacy audits of web and mobile applications often analyse outbound HTTP traffic to detect Personally Identifiable Information (PII) leakage. However, existing learning-based detectors typically depend on scarce, manually labelled traffic and are tightly coupled to fixed label taxonomies, limiting transferability across domains and evolving definitions of PII. This paper investigates whether Large Language Models (LLMs) can support taxonomy-agnostic annotation of explicitly transmitte...
|
| 416 |
Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization
2605.06308
|
cs.AI
|
Marc Boubnovski Martell, Josefa Lia Stoisser, Kaspar Märtens, Jialin Yu, Robert Kitchen |
Reliable confidence estimation enables safe deployment of chain-of-thought (CoT) reasoning through text-only APIs. Yet the dominant black-box baseline, self-consistency over K samples, is linearly expensive and ignores the geometry of the trace. We propose a black-box trajectory-confidence score: we embed a CoT as a sliding-window trajectory and measure its convergence to external answer anchors with a one-parameter softmax. The method needs no logits, hidden states, or supervised calibrators. A...
|
| 436 |
A Regime Theory of Controller Class Selection for LLM Action Decisions
2605.06339
|
cs.AI
|
Zhaoyang Jiang, Zhizhong Fu, Yunsoo Kim, Jiacong Mi, Zicheng Li |
Deployed language and vision-language models must decide, on each input, whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. Contrary to the common monotonicity intuition, greater per-input expressivity is not uniformly beneficial in finite samples: under identical strict cross-validation, different benchmarks prefer different controller classes. This reflects a finite-sample limitation of instance-level uncertainty signals, which can be exhausted at a distribut...
|
| 441 |
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models
2605.06343
|
cs.AI
|
Alex O. Davies, Telmo de Menezes e Silva Filho, Nirav Ajmeri |
Tabular foundation models are pre-trained on one of three classes of corpus: curated datasets drawn from benchmark repositories, tables harvested at scale from the web, or synthetic tables sampled from a parametric generative prior. Despite the centrality of pre-training data to model performance, little is known about how these corpora relate to one another in distribution, and the impact this has on downstream performance. In this work we take three canonical, archetypal datasets used to train...
|
| 442 |
More Than Can Be Said: A Benchmark and Framework for Pre-Question Scientific Ideation
2605.06345
|
cs.AI
|
Jie Yu, Song Qiu |
AI research agents have shown strong potential in automating literature search and manuscript refinement, yet most assume a clear and actionable initial input, operating only after a research question has been made explicit. In contrast, human research often begins with tacit friction, a sense of misalignment before a question can be formed. We introduce InciteResearch, a multi-agent framework designed to make a researcher's implicit understanding explicit, inspectable, and actionable. InciteRes...
|
| 443 |
Prediction and Empowerment: A Theory of Agency through Bridge Interfaces
2605.06346
|
cs.AI
|
Richard Csaky |
We study agency under partial observability in deterministic physical or simulated worlds, where apparent randomness arises from uncertainty over initial conditions, fixed law bits, and unrolled exogenous noise. We model sensing and actuation as bridge interfaces split between agent-controlled parameters and environment-controlled channel state, inducing a deterministic POMDP through a prior over latent microstates and many-to-one observation coarsening. Within this framework, we prove a separat...
|
| 454 |
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
2605.06365
|
cs.AI cs.MA cs.SE
|
Josh Rosen, Seth Rosen |
Large language model systems are increasingly deployed as agentic workflows that interleave reasoning, tool use, memory, and iterative refinement. These systems are effective at producing answers, but they often rely on implicit conversational state, making it difficult to preserve stable work products, isolate irrelevant updates, or propagate changes through intermediate artifacts. We introduce execution lineage: an execution model in which AI-native work is represented as a directed acyclic ...
|
| 458 |
Debiased Multimodal Personality Understanding through Dual Causal Intervention
2605.06371
|
cs.AI
|
Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang |
Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, they often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental states), as subjects originate from diverse demographic backgrounds. Learning such spurious associations between multimodal features and traits may lead to unfair personality unde...
|
| 464 |
Rethinking Vacuity for OOD Detection in Evidential Deep Learning
2605.06382
|
cs.AI
|
Claire McNamara |
Vacuity, or Uncertainty Mass (UM), is commonly used as a metric to evaluate Out-of-Distribution (OOD) detection in Evidential Deep Learning (EDL). It generally involves dividing the number of classes ($K$) by the total strength of belief ($S$) of the model's predictions, where $S$ is derived from summing the Dirichlet parameters. As such, UM is sensitive to the cardinality of $K$. In particular, it is unlikely in practice that there is a linear relationship between $K$ and $S$ as $K$ and $S$ inc...
|
| 470 |
Automated alignment is harder than you think
2605.06390
|
cs.AI
|
Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau, Geoffrey Irving |
A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as capabilities improve. We argue that, even when research agents are not scheming to deliberately sabotage alignment work, this plan could produce compelling but catastrophically misleading safety assessments resulting in the unintentional deployment of misaligned AI. This could happen because alignment research involves many hard-to-supervise fuzzy tas...
|
| 483 |
Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification
2605.06434
|
cs.AI
|
Vaisakh Naduvodi Viswambharan, Keerthan Kopparam Radhakrishna, Deepak Narayan Gadde, Aman Kumar |
Recent advances in Large Language Models (LLMs) have enabled workflows that generate SystemVerilog Assertions (SVAs) from natural-language specifications, with the potential to accelerate Formal Verification (FV). However, high-quality assertion synthesis remains challenging because specifications are often ambiguous or incomplete and critical micro-architectural details reside in the Register Transfer Level (RTL). Many existing approaches treat the specification and RTL as loosely structured te...
|
| 488 |
SCRuB: Social Concept Reasoning under Rubric-Based Evaluation
2605.06444
|
cs.AI
|
Jamelle Watson-Daniels, Himaghna Bhattacharjee, Skyler Wang, Brandon Handoko, Antonio Li |
While many studies of Large Language Model (LLM) reasoning capabilities emphasize mathematical or technical tasks, few address reasoning about social concepts: the abstract ideas shaping social norms, culture, and institutions. This understudied capability is essential for modern models acting as social agents, yet no systematic evaluation methodology targets it. We introduce SCRuB (Social Concept Reasoning under Rubric-Based Evaluation), a framework designed for this setting of task indetermina...
|
| 494 |
PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
2605.06455
|
cs.AI
|
Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong |
Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixGuard, a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. StepView induces deterministic typed-step adapters from raw trac...
|
| 495 |
Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems
2605.06457
|
cs.AI
|
Donghao Huang, Joon Kiat Chua, Zhaoxia Wang |
LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduce the Agentic Success Rate (ASR), a trajectory-fidelity metric that compares observed and expected agent execution sequences at the transition level, decomposing performance into Transition Recall and Transition Precision. Applied to the Hierarchical Multi-Agent Syste...
|
| 505 |
Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
2605.06475
|
cs.AI
|
Ranjith Chodavarapu |
We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of aggregating centuries into classes as is standard in the previous literature, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural network to output a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Invers...
|
| 509 |
Patch-Effect Graph Kernels for LLM Interpretability
2605.06480
|
cs.AIcs.CL
|
Ruben Fernandez-Boullon, David N. Olivieri |
Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare systematically. We propose a framework that reframes mechanistic analysis as a graph machine-learning problem by representing activation-patching profiles as patch-effect graphs over model components. We ...
|
| 510 |
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
2605.06483
|
cs.AI
|
Bowen Ye, Zhijian Li, Junyue Huang, Junkai Ma, Xiang Yin |
Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic experti...
|
| 514 |
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
2605.06490
|
cs.AIcs.CY
|
Jonas Wiedermann-Möller, Leonard Dung, Maksym Andriushchenko |
AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: Do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring model propensity for instrumental convergence (IC) behaviour in terminal-based agents. This is behaviour such as self-preservation that has been hypothesised to play a key role in risks from highly capable AI agents. Our ben...
|
| 515 |
From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features
2605.06494
|
cs.AI
|
Ruben Fernandez-Boullon, Pablo Magariños-Docampo, Javier Perez-Robles |
Sparse autoencoders (SAEs) have become central to mechanistic interpretability, decomposing transformer activations into monosemantic features. Yet existing analyses characterise features almost exclusively through top-activating token lists or decoder weight vectors, leaving the higher-order co-occurrence structure shared across features largely unexamined. We introduce a graph-structured representation in which each SAE feature is modelled as a token co-occurrence graph: nodes are the tokens m...
|
| 532 |
Process Matters more than Output for Distinguishing Humans from Machines
2605.06524
|
cs.AI
|
Milena Rmus, Mathew D. Hardy, Thomas L. Griffiths, Mayank Agrawal |
Reliable human-machine discrimination is becoming increasingly important as large language models and autonomous agents are deployed in online settings. Existing approaches evaluate whether a system can produce behavior or responses indistinguishable from those of a human, following the emphasis on outputs as a criterion for intelligence proposed by Alan Turing. Cognitive science offers an alternative perspective: evaluating the process by which behavior is produced. To test whether cognitive pr...
|
| 534 |
Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State
2605.06529
|
cs.AIcs.LG
|
Peiying Zhu, Sidi Chang |
Outcome metrics can certify the wrong behavior. We study this failure in a two-hotel revenue-management simulator where Hotel A trains an agent against a fixed rule-based revenue-management competitor, Hotel B. A standard learning agent can obtain near-reference revenue per available room (RevPAR) while failing to learn market-like yield management: it sells too aggressively, undercuts, or collapses to modal price buckets. We diagnose this as a Goodhart-style failure under partial observability....
|
| 535 |
SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting
2605.06530
|
cs.AI
|
Ruiqi Lyu, Alistair Turcan, Bryan Wilder |
Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data. Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts. Despite growing interest in spatial information, no standardized benchmark exists, and current evaluations often use simple chronological train-test splits that do not reflect real-time...
|
| 540 |
Ex Ante Evaluation of AI-Induced Idea Diversity Collapse
2605.06540
|
cs.AIcs.GT
|
Nafis Saami Azad, Raiyan Abdul Baten |
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding ri...
|
| 559 |
Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline
2605.06583
|
cs.AI
|
Zhengyi Guo, Jiayuan Sheng, David D. Yao, Wenpin Tang |
We propose a deterministic adjoint matching framework that formulates human preference alignment for flow-based generative models as an optimal control problem over velocity fields. One can directly regress the control toward a value-gradient-induced target under the current policy, leading to a simple and stable training objective. Building on this perspective, we introduce a truncated adjoint scheme that focuses computation on the terminal portion of the trajectory, where reward-relevant signa...
|
| 560 |
NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
2605.06584
|
cs.AI
|
Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo huang, Jiaxin Yue |
Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, evaluation protocols, and data-format conventions, creating additional barriers between raw acquisitions and reproducible scientific analysis. We present NeuroAgent, an LLM-driven ...
|
| 563 |
Weblica: Scalable and Reproducible Training Environments for Visual Web Agents
2605.06761
|
cs.AIcs.CVcs.LG
|
Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan |
The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stabl...
|
| 580 |
SkillOS: Learning Skill Curation for Self-Evolving Agents
2605.06614
|
cs.AIcs.CL
|
Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang |
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex ...
|
| 584 |
MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
2605.06623
|
cs.AIcs.CLcs.LGcs.MA
|
Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang |
Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iterati...
|
| 594 |
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
2605.06638
|
cs.AIcs.CL
|
Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang |
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent control over two axes of difficulty: the depth of the required proof planning (i.e., the horizon) and the expressiveness of the underlying logic. Our proposed framework supports a ...
|
| 597 |
GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation
2605.06641
|
cs.AIcs.CV
|
Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu |
Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, the first dataset for AI-assisted glaze design. Comprising 23,148 real glaze formulations, GlazyBench supports two primary tasks: predicting post-firing surface properties, such as...
|
| cs.AR 2 papers | ||||
| 196 |
LLM-Driven Design Space Exploration of FPGA-based Accelerators
2605.05920
|
cs.ARcs.AIcs.PF
|
Vinamra Sharma, Xingjian Fu, Jude Haris, José Cano |
Designing field-programmable gate array (FPGA)-based accelerators for modern artificial intelligence workloads requires navigating a large and complex hardware design space encompassing architectural parameters, dataflow strategies, and memory hierarchies, making the process time-consuming and resource-intensive. While the SECDA methodology enables rapid hardware-software co-design of accelerators through SystemC simulation and FPGA execution, identifying optimal accelerator configurations still...
|
| 286 |
PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs
2605.06082
|
cs.ARcs.LGcs.PF
|
Rappy Saha, Jude Haris, Nicolas Bohm Agostini, David Kaeli, José Cano |
Power-of-two (PoT) quantization significantly reduces the size of deep neural networks (DNNs) and replaces multiplications with bit-shift operations for inference. Prior work has shown that PoT-quantized DNNs can preserve accuracy for tasks such as image classification; however, their performance on resource-constrained edge devices remains insufficiently understood. While general-purpose edge CPUs and GPUs do not provide optimized backends for bit-shift operations, custom hardware accelerators ...
|
| cs.CE 1 papers | ||||
| 2 |
Discrete Elastic Ribbons: A Unified Discrete Differential Geometry Framework for One-Dimensional Energy Models
2605.05529
|
cs.CEcs.GRcs.LG
|
Shivam Kumar Panda, M Khalid Jawed |
Elastic ribbons, slender structures whose length ($L$), width ($W$), and thickness ($b$) satisfy $L \gg W \gg b$, exhibit mechanical behaviors intermediate between one-dimensional rods ($L \gg W, b$) and two-dimensional plates ($L, W \gg b$). In quadratic Kirchhoff-type rod-based frameworks, such as Discrete Elastic Rods (DER), the governing equilibrium equations are independent of width, and therefore these models cannot capture width-dependent mechanical effects. Reduced centerline-based ribbo...
|
| cs.CL 58 papers | ||||
| 5 |
A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction
2605.05532
|
cs.CLcs.CY
|
Nicole Lincoln, Nick Whitehouse, Jaron Mar, Rivindu Perera |
This paper evaluates whether a domain trained Small Language Model (SLM) can outperform frontier Large Language Models on structured contract extraction at radically lower cost. We test Olava Extract, a self hosted legal domain Mixture of Experts model, against five frontier models. Olava Extract achieved the strongest aggregate performance in the study, with a macro F1 of 0.812 and a micro F1 of 0.842, while reducing inference cost by 78% to 97% compared with the frontier models tested. It al...
|
| 34 |
The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation
2605.05594
|
cs.CLcs.CVcs.LG
|
Hoin Jung, Xiaoqian Wang |
While Multimodal Large Language Models (MLLMs) are increasingly integrated with Retrieval-Augmented Generation (RAG) to mitigate hallucinations, the introduction of external documents can conceal severe failure modes at the instance level. We identify and formalize the phenomenon of recorruption, where the introduction of even perfectly accurate "oracle" context causes a capable model to abandon an initially correct prediction. Through a mechanistic diagnosis of internal attention matrices, we s...
|
| 44 |
When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models
2605.05626
|
cs.CLcs.AI
|
Vihaan Nama, Shreya Mendi, Zian Ye, Brinnae Bent |
Large Language Models (LLMs) excel at generating contextually appropriate responses but remain poorly calibrated for multi-party conversations, where deciding when to speak is as critical as what to say. In such settings, naively responding at every turn leads to excessive interruptions and degraded conversational coherence. We introduce When2Speak, a grounded synthetic dataset and four-stage generation pipeline for learning intervention timing in group interactions. The dataset comprises over 2...
|
| 47 |
One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
2605.05630
|
cs.CLcs.AIcs.CR
|
Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang, Ruihan Wu |
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial models with advanced guardrails remain vulnerable to such attacks despite advances in safety alignment and external guardrails. In this work, we address this challenge by detecting t...
|
| 58 |
Negative Before Positive: Asymmetric Valence Processing in Large Language Models
2605.05653
|
cs.CL
|
Sohan Venkatesh |
Mechanistic interpretability has revealed how concepts are encoded in large language models (LLMs), but emotional content remains poorly understood at the mechanistic level. We study whether LLMs process emotional valence through dedicated internal structure or through surface token matching. Using activation patching and steering on open-source LLMs, we find that negative and positive valence are processed at different network depths. Negative outcomes localize to early layers while positive ou...
|
| 63 |
XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity
2605.05662
|
cs.CLcs.AI
|
Dasol Choi, Eugenia Kim, Jaewon Noh, Sang Seo, Eunmi Kim |
Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover, they rarely evaluate a model's ability to detect culturally embedded sensitivities as distinct from universal harms. We introduce XL-SafetyBench, a suite of 5,500 test cases across 10 country-language pairs, comprising a Jailbreak Benchmark of country-grounded adversarial prompts and a Cultural Benchmark where local sensitivities are embedded within ...
|
| 67 |
Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning
2605.05676
|
cs.CLcs.AI
|
Bing Wang, Ximing Li, Changchun Li, Jinjin Chi, Gang Niu |
Recently, the prominent performance of large language models (LLMs) has been largely driven by multi-task instruct-tuning. Unfortunately, this training paradigm suffers from a key issue, named cross-task interference, due to conflicting gradients over shared parameters among different tasks. Some previous methods mitigate this issue by isolating task-specific parameters, e.g., task-specific neuron selection and mixture-of-experts. In this paper, we empirically reveal that the cross-task interfer...
|
| 120 |
BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
2605.05758
|
cs.CL
|
Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie |
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain larg...
|
| 132 |
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
2605.05777
|
cs.CL
|
Huizi Cui, Huan Ma, Qilin Wang, Yuhang Gao, Changqing Zhang |
Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet LLM hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive multiple sampling or internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To ...
|
| 155 |
Evaluation Awareness in Language Models Has Limited Effect on Behaviour
2605.05835
|
cs.CL cs.CY
|
Amelie Knecht, Lucas Florin, Thilo Hagendorff |
Large reasoning models (LRMs) sometimes note in their chain of thought (CoT) that they may be under evaluation. Researchers worry that this verbalised evaluation awareness (VEA) causes models to adapt their outputs strategically, optimising for perceived evaluation criteria, which, for instance, can make models appear safer than they actually are. However, whether VEA actually has this effect is largely unknown. We tested this across open-weight LRMs and benchmarks covering safety, alignment, mo...
|
| 182 |
Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention
2605.05892
|
cs.CL cs.LG
|
Zehao Jin, Ruixuan Deng, Junran Wang, Xinjie Shen, Chao Zhang |
Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering methods are often outperformed by simple in-context prompting and generalize poorly to unseen concepts. We hypothesize that these limitations arise from unvalidated simplifying assumptions shared across prior methods, which t...
|
| 183 |
Logic-Regularized Verifier Elicits Reasoning from LLMs
2605.05893
|
cs.CL cs.AI
|
Xinyu Wang, Changzhi Sun, Lian Cheng, Yuanbin Wu, Dell Zhang |
Verifiers are crucial components for enhancing modern LLMs' reasoning capability. Typical verifiers require resource-intensive supervised dataset construction, which is costly and faces limitations in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, utilizing internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, a...
|
| 199 |
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM
2605.05927
|
cs.CL cs.SD eess.AS
|
Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King |
Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech generation more text-like, but the gap remains. We argue that the key remaining bottleneck lies on the input side. We propose TextPro-SLM, an SLM that makes spoken input more closely resemble that of a prosody-aware text LLM. TextPro-SLM combines Whi...
|
| 210 |
Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation
2605.05950
|
cs.CL
|
Siyuan Li, Aodu Wulianghai, Xi Lin, Xibin Yuan, Qinghua Mao |
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from LLM-generated counterparts a critical task for multimedia moderation. Existing detectors often rely on statistical cues or model-specific heuristics, making them vulnerable to paraphrasing and adversarial manipulations, and consequently limiting their robustness and interpretability. In this work, we propose LiSCP, a novel lightweight stylistic consistency prof...
|
| 212 |
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
2605.05953
|
cs.CL cs.AI
|
Erik Nielsen, Elia Cunegatti, Marcus Vukojevic, Giovanni Iacca |
One of the most critical challenges in Large Language Models is their tendency to hallucinate, i.e., produce factually incorrect responses. Existing approaches show promising results in terms of hallucination correction, but still suffer from a main limitation: they apply corrections indiscriminately to every token, corrupting also the originally correct generations. To overcome this drawback, we propose PCNET, a Probabilistic Circuit trained as a tractable density estimator over the LLM residua...
|
| 213 |
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
2605.05955
|
cs.CL cs.CV
|
Zheyuan Yang, Liqiang Shang, Junjie Chen, Xun Yang, Chenglong Xu |
We introduce TableVista, a comprehensive benchmark for evaluating foundation models in multimodal table reasoning under visual and structural complexity. TableVista consists of 3,000 high-quality table reasoning problems, where each instance is expanded into 10 distinct visual variants through our multi-style rendering and transformation pipeline. This process encompasses diverse scenario styles, robustness perturbations, and vision-only configurations, culminating in 30,000 multimodal samples f...
|
| 217 |
Tatarstan Toponyms: A Bilingual Dataset and Hybrid RAG System for Geospatial Question Answering
2605.05962
|
cs.CL
|
Mullosharaf K. Arabov |
This paper addresses automatic geospatial question answering over multilingual toponymic data. An original bilingual dataset of toponyms of the Republic of Tatarstan is introduced, comprising 9,688 structured records with linguistic, etymological, administrative, and coordinate information (93.1% georeferenced). Based on this dataset, a question-answering corpus of approximately 39,000 question-context-answer triples is constructed with guaranteed answer localization. A hybrid retriever integrat...
|
| 241 |
From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence
2605.06006
|
cs.CL
|
Premtim Sahitaj, Jawan Kolanowski, Ariana Sahitaj, Veronika Solopova, Max Upravitelev |
Fact-checking articles encode rich supporting evidence and reasoning, yet this evidence remains largely inaccessible to automated verification systems due to unstructured presentation. We introduce PrimeFacts, a methodology and resource for extracting fine-grained evidence from full fact-checking articles. We compile 13,106 PolitiFact articles with claims, verdicts, and all referenced sources, and we identify 49,718 in-article hyperlinks as natural anchors to pinpoint key evidence. Our framework...
|
| 242 |
PersonaKit (PK): A Plug-and-Play Platform for User Testing Diverse Roles in Full-Duplex Dialogue
2605.06007
|
cs.CL cs.AI cs.HC
|
Hyunbae Jeon, Jinho D. Choi |
As spoken dialogue systems expand beyond traditional assistant roles to encompass diverse personas -- such as authoritative instructors, uncooperative merchants, or distracted workers -- they require distinct, human-like turn-taking behaviors to maintain psychological immersion. However, current full-duplex systems often default to a rigid, overly accommodating "always-yield" policy during overlapping speech, which severely undermines character consistency for non-submissive roles. Evaluating ...
|
| 252 |
More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs
2605.06030
|
cs.CL
|
Adrián Gude, Roi Santos-Ríos, Francis Bond, Dan Flickinger, Carlos Gómez-Rodríguez |
This study contributes to a growing line of research in comparing LLM-generated texts with human-authored text, in this case, English news text. We focus in particular on the evaluation of syntactic properties through formal grammar frameworks. Our analysis compares two generations of LLMs in the context of two human-authored English news datasets from two different years. Employing the Head-Driven Phrase Structure Grammar (HPSG) formalism, we investigate the distributions of syntactic structure...
|
| 281 |
Navigating by Old Maps: The Pitfalls of Static Mechanistic Localization in LLM Post-Training
2605.06076
|
cs.CL
|
Hang Chen, Jiaying Zhu, Hongyang Chen, Hongxu Liu, Xinyu Yang |
The "Locate-then-Update" paradigm has become a predominant approach in the post-training of large language models (LLMs), identifying critical components via mechanistic interpretability for targeted parameter updates. However, this paradigm rests on a fundamental yet unverified assumption: can mechanisms derived from current static parameters reliably guide future dynamic parameter updates? To investigate this, we systematically track the structural evolution of Transformer circuits throughout ...
|
| 283 |
Milestone-Guided Policy Learning for Long-Horizon Language Agents
2605.06078
|
cs.CL cs.AI
|
Zixuan Wang, Yuchen Yan, Hongxing Li, Teng Pan, Dingming Li |
While long-horizon agentic tasks require language agents to perform dozens of sequential decisions, training such agents with reinforcement learning remains challenging. We identify two root causes: credit misattribution, where correct early actions are penalized due to terminal failures, and sample inefficiency, where scarce successful trajectories result in near-total loss of learning signal. We introduce a milestone-guided policy learning framework, BEACON, that leverages the compositional st...
|
| 296 |
Uncovering Entity Identity Confusion in Multimodal Knowledge Editing
2605.06096
|
cs.CL cs.CV
|
Shu Wu, Xiaotian Ye, Xinyu Mou, Dongsheng Liu, Xiaohan Wang |
Multimodal knowledge editing (MKE) aims to correct the internal knowledge of large vision-language models after deployment, yet the behavioral patterns of post-edit models remain underexplored. In this paper, we identify a systemic failure mode in edited models, termed Entity Identity Confusion (EIC): edited models exhibit an absurd behavior where text-only queries about the original entity's identity unexpectedly return information about the new entity. To rigorously investigate EIC, we constru...
|
| 313 |
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
2605.06132
|
cs.CL
|
Chunyu Li, Jingyi Kang, Ding Chen, Mengyuan Zhang, Jiajun Shen |
In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semantic similarity matching and lack genuine reasoning capabilities, leading to a problem where recalled results are semantically highly relevant yet do not contain the key information needed to answer the question. This deficiency manifests in memory scenarios as three spe...
|
| 320 |
IRC-Bench: Recognizing Entities from Contextual Cues in First-Person Reminiscences
2605.06142
|
cs.CL cs.AI
|
Yehudit Aperstein, Eden Moran, Alexander Apartsin |
When people recount personal memories, they often refer to people, places, and events indirectly, relying on contextual cues rather than explicit names. Such implicit references are central to reminiscence narratives: first-person accounts of lived experience used in therapeutic, archival, and social settings. They pose a difficult computational problem because the intended entity must be inferred from dispersed narrative evidence rather than from a local mention. We introduce IRC-Bench, the Imp...
|
| 353 |
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
2605.06200
|
cs.CL
|
Dingwei Chen, Zefang Zong, Zhipeng Ma, Leo Luo, Yang Li |
Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn interactions. Existing approaches to such process credit assignment either depend on separate external process reward models that introduce additional consumption, or tree-based structural rollout that merely redistributes the outcome signal while constraining trajectory diversity...
|
| 367 |
TIDE: Every Layer Knows the Token Beneath the Context
2605.06216
|
cs.CL cs.AI cs.LG
|
Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar |
We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary causes rare-token embeddings to be chronically under-trained due to receiving a fraction of the cumulative gradient signal compared to common tokens; and (ii) the Contextual Collapse...
|
| 370 |
UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification
2605.06221
|
cs.CL
|
Qihang Fan, Huaibo Huang, Zhiying Wu, Bingning Wang, Ran He |
As large language models (LLMs) continue to advance rapidly, they are becoming increasingly capable while simultaneously demanding ever-longer context lengths. To improve the inference efficiency of long-context processing, several novel low-complexity hybrid architectures have recently been proposed, effectively alleviating the computational burden of long-context inference. However, existing research on long-context prefill acceleration remains predominantly focused on sparse attention mechani...
|
| 379 |
YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
2605.06231
|
cs.CL
|
Fengze Guo, Yue Chang |
This paper presents our system for SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization, which identifies polarized social media content in 22 languages through three subtasks: binary detection, target classification, and manifestation identification. We propose a heterogeneous ensemble of multilingual pretrained models, combining XLM-RoBERTa-large and mDeBERTa-v3-base. We investigate techniques such as multi-task learning, translation-based data augmenta...
|
| 384 |
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
2605.06241
|
cs.CL
|
Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna |
Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base model already contains. In this work, we ask: if RL merely steers the model toward paths it already knows, is the RL optimization loop itself necessary? Through token-level analysis across multiple model families and RL algorithms, we find that RL's beneficial footprint ...
|
| 399 |
Linear Semantic Segmentation for Low-Resource Spoken Dialects
2605.06276
|
cs.CL cs.AI
|
Kirill Chirkunov, Younes Samih, Abed Alhakim Freihat, Hanan Aldarmaki |
Semantic segmentation is a core component of discourse analysis, yet existing models are primarily developed and evaluated on high-resource written text, limiting their effectiveness on low-resource spoken varieties. In particular, dialectal Arabic exhibits informal syntax, code-switching, and weakly marked discourse structure that challenge standard segmentation approaches. In this paper, we introduce a new multi-genre benchmark (more than 1000 samples) for semantic segmentation in conversation...
|
| 404 |
Quantifying the Statistical Effect of Rubric Modifications on Human-Autorater Agreement
2605.06283
|
cs.CL
|
Jessica Huynh, Alfredo Gomez, Athiya Deviyani, Renee Shelby, Jeffrey P. Bigham |
Autoraters, also referred to as LLM-as-judges, are increasingly used for evaluation and automated content moderation. However, there is limited statistical analysis of how modifications in a rubric presented to both humans and autoraters affect their score agreement. Rubrics that ask for an overall or *holistic* judgment - for example, rating the "quality" of an essay - may be inconsistently interpreted due to the complexity or subjectivity of the criteria. Conversely, rubrics can ask for...
|
| 405 |
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
2605.06285
|
cs.CL cs.LG
|
Yijia Zheng, Marcel Worring |
Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large language model (LLM) acts as a search agent that generates intermediate thoughts and subqueries to iteratively interact with the retrieval system. This iterative process incurs substantial latency due...
|
| 409 |
Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text
2605.06294
|
cs.CL cs.AI cs.LG
|
Tom Kempton, Viktor Drobnyi, Maeve Madigan, Stuart Burrell |
The ability to reliably distinguish human-written text from that generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: that machine-generated text should appear more probable to a detector language model than human-written text. However, we demonstrate that the token-level signal distinguishing human and machine text is non-uniform across the hidden space of the detector model, and naively averaging likeli...
|
| 417 |
MultiLinguahah: A New Unsupervised Multilingual Acoustic Laughter Segmentation Method
2605.06309
|
cs.CL
|
Callejas Sofia, Gomez Nahuel, Pelachaud Catherine, Ravenet Brian, Barriere Valentin |
Laughter is a social non-vocalization that is universal across cultures and languages, and is crucial for human communication, including social bonding and communication signaling. However, detecting laughter in audio is a challenging task, and segmenting is even more difficult. Currently, Machine Learning methods generally rely on costly manual annotation, and their datasets are mostly based on English contexts. Thus, we propose an unsupervised multilingual method that sets up the laughter segm...
|
| 423 |
Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation
2605.06318
|
cs.CL cs.CY
|
Maximilian Maurer, Maximilian Linde, Gabriella Lapesa |
Human label variation has been established as a central phenomenon in NLP: the perspectives different annotators have on the same item need to be embraced. Data collection practices thus shifted towards increasing the annotator numbers and releasing disaggregated datasets, harmful language being most resourced due to its high subjectivity. While this resulted in rich information about *who* annotated (sociodemographics, attitudes, etc.), the *what* (e.g., linguistic properties of i...
|
| 427 |
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
2605.06326
|
cs.CL
|
Qianjia Cheng, Yuchen Zhang, Zhilin Wang, Yuxin Zuo, Shunkai Zhang |
Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when the strong thinking models make almost no actual tool calls. In this paper, we investigate how to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability, and present a comprehensive TIR recipe. We highlight that (i) the effe...
|
| 428 |
Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity
2605.06327
|
cs.CL cs.AI cs.LG
|
Florian A. D. Burnat, Brittany I. Davidson |
Safety benchmarks are routinely treated as evidence about how a language model will behave once deployed, but this inference is fragile if behavior depends on whether a prompt looks like an evaluation. We define evaluation-context divergence as an observable within-item change in behavior induced by framing a fixed task as an evaluation, a live deployment interaction, or a neutral request, and present a paired-prompt protocol that measures it in open-weight LLMs while controlling for paraphrase ...
|
| 433 |
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
2605.06334
|
cs.CL cs.LG cs.LO
|
Ashwani Anand, Ivi Chatzi, Ritam Raha, Anne-Kathrin Schmuck |
Tool-using large language model (LLM) agents are increasingly deployed in settings where their reliable behavior is governed by strict procedural manuals. Ensuring that such agents comply with the rules from these manuals is challenging, as they are typically written for humans in natural language while agent behavior manifests as an execution trace of tool calls. Existing evaluations of LLM agents rely on manually constructed benchmarks or LLM-based judges, which either do not scale or lack rel...
|
| 440 |
Don't Lose Focus: Activation Steering via Key-Orthogonal Projections
2605.06342
|
cs.CL
|
Haoyan Luo, Mateo Espinosa Zarlenga, Mateja Jamnik |
Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matching, shifting attention away from contextually important tokens toward less informative ones. To address this, we propose Steering via Key-Orthogonal Projections (SKOP), a steering method that constrains harmful attenti...
|
| 447 |
SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following
2605.06353
|
cs.CL
|
Beatriz Canaverde, Duarte M. Alves, José Pombal, Giuseppe Attanasio, André F. T. Martins |
In a conversation, a helpful assistant must reliably follow user directives, even as they refine, modify, or contradict earlier requests. Yet most instruction-following benchmarks focus on single-turn or short multi-turn scenarios, leaving open how well models handle long-horizon instruction-following tasks. To bridge this gap, we present SEQUOR, an automatic benchmark for evaluating constraint adherence in long multi-turn conversations. SEQUOR consists of simulated persona-driven interactions b...
|
| 473 |
GATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotation
2605.06403
|
cs.CL cs.IR
|
Zhonghui Zhang, Feng Jiang, Shaowei Qin, Jiahao Zhao, Min Yang |
Zero-shot single-cell cell-type annotation aims to determine a cell's type from a given set of expressed genes without any training. Existing knowledge-graph-based RAG approaches retrieve evidence by expanding from source entities and relying on iterative LLM reasoning. However, in this setting each query contains tens to hundreds of genes, where no single gene is decisive and the label emerges only from their collective co-occurrence. Such hyper-entity queries fundamentally challenge local, ent...
|
| 479 |
MiA-Signature: Approximating Global Activation for Long-Context Understanding
2605.06416
|
cs.CL
|
Yuqing Li, Jiangnan Li, Mo Yu, Zheng Lin, Weiping Wang |
A growing body of work in cognitive science suggests that reportable conscious access is associated with *global ignition* over distributed memory systems, while such activation is only partially accessible as individuals cannot directly access or enumerate all activated contents. This tension suggests a plausible mechanism that cognition may rely on a compact representation that approximates the global influence of activation on downstream processing. Inspired by this idea, we introduce th...
|
| 481 |
From 124 Million Tokens to 1,021 Neologisms: A Large-Scale Pipeline for Automatic Neologism Detection
2605.06426
|
cs.CL
|
Diego Rossini, Lonneke van der Plas |
We present a scalable, modular pipeline for automatic neologism detection that combines rule-based filtering with LLM classification. The pipeline is grounded in two complementary word-formation frameworks, grammatical and extra-grammatical morphology, which jointly define the scope of what counts as a neologism and inform a four-class classification scheme (neologism, entity, foreign, none). While designed to be modular and transferable at the architectural level, the pipeline is instantiated o...
|
| 484 |
COVID-19 Infodemic. Understanding content features in detecting fake news using a machine learning approach
2605.06435
|
cs.CL cs.AI cs.LG
|
Vimala Balakrishnan, Lee Zing Hii, Eric Laporte |
The use of content features, particularly textual and linguistic for fake news detection is under-researched, despite empirical evidence showing the features could contribute to differentiating real and fake news. To this end, this study investigates a selection of content features such as word bigrams, part of speech distribution etc. to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic and using Decision Tree, K-Nearest Nei...
|
| 506 |
Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts
2605.06476
|
cs.CL
|
Sneha Oram, Ojaswita Bhushan, Pushpak Bhattacharyya |
In this work, we conduct an analysis to examine the consistency of Large Language Models (LLMs) with respect to their own generated responses in an emotionally-driven conversational context. Specifically, the text generated by LLM is framed as a query to the same model, and its responses are subsequently assessed. This is performed with three queries across two dimensions of extreme and moderate emotions. The three queries are, in particular, false claim queries that contain inherently wrong ass...
|
| 512 |
Litespark Inference on Consumer CPUs: Custom SIMD Kernels for Ternary Neural Networks
2605.06485
|
cs.CL cs.AI
|
Nii Osae Osae Dade, Tony Morri, Moinul Hossain Rahat, Sayandip Pal |
Large language models (LLMs) have transformed artificial intelligence, but their computational requirements remain prohibitive for most users. Standard inference demands expensive datacenter GPUs or cloud API access, leaving over one billion personal computers underutilized for AI workloads. Ternary models offer a path forward: their weights are constrained to {-1, 0, +1}, theoretically eliminating the need for floating-point multiplication. However, existing frameworks fail to exploit this stru...
|
| 520 |
The Frequency Confound in Language-Model Surprisal and Metaphor Novelty
2605.06506
|
cs.CL
|
Omar Momen, Sina Zarrieß |
Language-model (LM) surprisal is widely used as a proxy for contextual predictability and has been reported to correlate with metaphor novelty judgments. However, surprisal is tightly intertwined with lexical frequency. We explore this interaction on metaphor novelty ratings using two different word frequency measures. We analyse surprisal estimates from eight Pythia model sizes and 154 training checkpoints. Across settings, word frequency is a stronger predictor of metaphor novelty than surpris...
|
| 533 |
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?
2605.06527
|
cs.CL
|
Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun |
Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Conflict: a later observation invalidates an earlier memory without explicit negation, requiring contextual inference and commonsense reasoning to detect. To rigorously evaluate this ...
|
| 542 |
Efficient Pre-Training with Token Superposition
2605.06546
|
cs.CL
|
Bowen Peng, Théo Gigant, Jeffrey Quesnelle |
Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training (TST), a simple drop-in method that significantly improves the data throughput per FLOPs during pre-training without modifying the parallelism, optimizer, tokenizer, data, or model architecture. TST is done in two phases: (i) A highly efficient superposition pha...
|
| 543 |
Continuous Latent Diffusion Language Model
2605.06548
|
cs.CL cs.AI cs.CV
|
Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu |
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable ...
|
| 546 |
Long Context Pre-Training with Lighthouse Attention
2605.06554
|
cs.CL
|
Bowen Peng, Subho Ghosh, Jeffrey Quesnelle |
Training causal transformers at extreme sequence lengths is bottlenecked by the quadratic time and memory of scaled dot-product attention (SDPA). In this work, we propose Lighthouse Attention, a training-only symmetrical selection-based hierarchical attention algorithm that wraps around ordinary SDPA and can be easily removed towards the end of the training. Our hierarchical selection is also gradient-free, which exempts us from dealing with a complicated and potentially inefficient backward pas...
|
| 567 |
Automated Clinical Report Generation for Remote Cognitive Remediation: Comparing Knowledge-Engineered Templates and LLMs in Low-Resource Settings
2605.06594
|
cs.CL
|
Yongxin Zhou, Fabien Ringeval, François Portet |
The growing demand for cognitive remediation therapy, combined with limited speech therapist availability, has accelerated the adoption of remote rehabilitation tools. These systems generate large volumes of interaction data that are difficult for clinicians to review efficiently. This paper investigates automated clinical report generation for avatar-guided, home-based cognitive remediation sessions in a low-resource setting with no reference reports. We present and compare two approaches: (1) ...
|
| 570 |
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
2605.06597
|
cs.CL cs.AI cs.LG
|
Yiqiao Jin, Yiyang Wang, Lucheng Fu, Yijia Xiao, Yinyi Luo |
Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and plausible rationales can still provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose Uni...
|
| 583 |
Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance
2605.06619
|
cs.CL cs.CY
|
Jan Fillies, Ronald E. Robertson, Jeffrey Hancock |
As large language models (LLMs) increasingly mediate both content generation and moderation, linguistic evasion strategies known as Algospeak have intensified the coevolution between evaders and detectors. This research formalizes the underlying dynamics grounded in a joint action model: when Algospeak increases, detectability and understandability decrease. Further, the concept of Majority Understandable Modulation (MUM) is introduced and defined as the modulation level at which additional evas...
|
| 586 |
Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation
2605.06625
|
cs.CL
|
Hakyung Sung, Gyu-Ho Shin |
We propose a simplified human-in-the-loop workflow for second language (L2) Korean morphosyntactic annotation by leveraging agreement between two domain-adapted parsers. We first evaluate whether parser agreement can serve as a proxy for annotation correctness by comparing it with independent human judgments. The results show strong correspondence between parser and human judgments, supporting the feasibility of semi-automatic L2-Korean UD annotation. Further analysis demonstrates that parser di...
|
| 592 |
Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents
2605.06635
|
cs.CL
|
Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah |
Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not validate source accessibility, relevance, or factual consistency. We introduce the first source attribution evaluation framework that uses a reproducible AST parser to extract and...
|
| 599 |
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
2605.06642
|
cs.CL cs.AI
|
Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr |
Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajectory Abstraction (StraTA), a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL). StraTA samples a compact strategy from...
|
| cs.CR 17 papers | ||||
| 48 |
Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning
2605.05632
|
cs.CR cs.CL cs.LG
|
Samuel Korn |
Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively against vanilla retrieve-then-generate pipelines. Architectures designed to handle conflicting retrieved information - multi-agent debate, agentic retrieval, recursive language models - remain untested against adversarially optimized contradictions. We evaluate four RAG architectures (vanilla RAG, agentic RAG, MADAM-RAG, and Recursive Language Model...
|
| 59 |
TUANDROMD-X: Advanced Entropy and Visual Analytics Dataset for Enhanced Malware Detection and Classification
2605.06718
|
cs.CR cs.LG
|
Parthajit Borah, Upasana Sarmah, D. K. Bhattacharyya, J. K. Kalita |
Malware and malware-based attacks are becoming more prevalent and complex. Attackers regularly come up with new techniques that have the ability to evade conventional and signature-based malware defense. In order to address such threats, there is an increasing demand for advanced and better defense solutions. Machine learning-based techniques are efficiently capable of defending against malware and malware-based attacks. Nevertheless, creating and efficiently testing such techniques demand high-...
|
| 87 |
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
2605.05704
|
cs.CR cs.AI
|
Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou, Deyue Zhang |
With the rapid evolution of foundation models, Large Language Model (LLM) agents have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces significant security risks, as malicious actors can manipulate agents into executing tools to generate harmful content. While existing defensive mechanisms are effective, they frequently suffer from the over-refusal problem, where increased safety strictness compromises the agent's utility on benign tasks. To mitigate...
|
| 135 |
Stego Battlefield: Evaluating Image Steganography Attacks and Steganalysis Defenses
2605.05789
|
cs.CR cs.CV
|
Zhen Sun, Zongmin Zhang, Leyi Sheng, Yule Liu, Yifan Liao |
Image steganography is widely used to protect user privacy and enable covert communication. However, it can also be abused by the adversary as a covert channel to bypass content moderation, disseminate harmful semantics, and even hide malicious instructions in images to elicit dangerous outputs from large models, posing a practical security risk that continues to evolve. To address the lack of a unified and systematic evaluation framework, we propose SADBench, a systematic benchmark that assesse...
|
| 142 |
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution
2605.05807
|
cs.CR cs.AI
|
Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan, Ali Shoker |
LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and insufficient code-level grounding for identifying malicious and vulnerable code segments. To address these limitations, this research introduces LCC-LLM, a code-centric benchmark dataset and evidence-grounded framework for malware attribution and multi-task static malware analysis. The proposed LCCD dataset contains approximately 34K PE samples process...
|
| 148 |
LeakDojo: Decoding the Leakage Threats of RAG Systems
2605.05818
|
cs.CR cs.AI cs.CL
|
Maosen Zhang, Jianshuo Dong, Boting Lu, Wenyue Li, Xiaoping Zhang |
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG systems grow more complex and LLMs exhibit stronger instruction-following capabilities, existing studies fall short of systematically assessing RAG leakage risks. We present LeakDojo, a configurable framework for controlled evaluation of RAG leakage. Using LeakDojo, we benchmark six existing attacks across fourteen LLMs, four...
|
| 158 |
LoopTrap: Termination Poisoning Attacks on LLM Agents
2605.05846
|
cs.CR cs.AI
|
Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu, Yaopeng Wang |
Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to determine when a task is complete. In this work, we show that while this self-directed loop facilitates autonomy, it also introduces a critical risk: by injecting malicious prompts into the agent's context, an adversary can distort the agent's termination judgment, making it believe the task remains incomplete and leading to unbounded computation. To un...
|
| 224 |
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
2605.05974
|
cs.CR cs.AI
|
Qinfeng Li, Yuntai Bao, Jianghui Hu, Wenqi Zhang, Jintao Chen |
LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy and reuse these prompts with other proprietary LLMs, causing economic losses. To protect these prompts, we identify four key challenges: proactivity, runtime protection, usability, and non-portability that existing approaches fail to address. We present PragLocker, a prompt protection scheme that ...
|
| 235 |
Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
2605.05995
|
cs.CR cs.AI cs.CL
|
Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun, Hao Zhou |
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on parameters, gradients, or internal representations, we observe that they can be effectively circumvented under persistent HFT. Our analysis traces this failure to the inherent redundancy of the high-dimensional parameter space: attackers exploit optimization trajectories that are orthogonal to defense constraints to restore harmful capabilities while...
|
| 304 |
When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents
2605.06731
|
cs.CR cs.CL cs.LG
|
Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye |
Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet, this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as **unintended long-term state poisoning**. To systematically study it, we i...
|
| 326 |
Secure Seed-Based Multi-bit Watermarking for Diffusion Models from First Principles
2605.06153
|
cs.CR cs.CV
|
Enoal Gesny, Eva Giboulot |
The rapid emergence of generative image models has led to the development of specialized watermarking techniques, particularly in-generation methods such as seed-based embedding. However, current evaluations in this area remain largely empirical, making them heavily reliant on the specific model architectures used for generation and inversion. This prevents any clear conclusion on the performance of any method, especially regarding security, for which a rigorous definition is lacking. Against th...
|
| 415 |
From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents
2605.06738
|
cs.CR cs.AI
|
Lars Kersten Kroehl |
Autonomous AI agents now transact at production scale -- 69,000 bots executing 165 million transactions across 50 million USDC in cumulative volume on a single marketplace -- without any shared trust layer between participants. Regulatory frameworks (Singapore IMDA, NIST CAISI, EU AI Act) and major AI laboratories (Anthropic, Google) have independently converged on the same structural requirement: an open, portable, cryptographically verifiable trust infrastructure for autonomous agents that no ...
|
| 426 |
Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation
2605.06324
|
cs.CR cs.CY cs.LG
|
Florian A. D. Burnat, Brittany I. Davidson |
Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence. Once announced, such a metric also becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing true harm. We ask when such an audit metric can still certify a genuine reduction in harm. The protocol is modeled as a published transformation grap...
|
| 430 |
Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis
2605.06330
|
cs.CR cs.AI
|
Siraaj Akhtar, Saad Khan, Simon Parkinson |
Large language models (LLMs) have shown promise for event log analysis, but their high computational requirements, reliance on cloud infrastructure, and security concerns limit practical deployment. In addition, most existing approaches focus only on the identification of the problem and do not provide actionable remediation. Small language models (SLMs) present a light-weight alternative that can be fine-tuned for a specific purpose and hosted locally. This paper investigates whether SLMs, when...
|
| 522 |
On the Security of Research Artifacts
2605.06508
|
cs.CR cs.AI
|
Nanda Rani, Christian Rossow |
Research artifacts are widely shared to support reproducibility, and artifact evaluation (AE) has become common at many leading conferences. However, AE mainly checks whether artifacts work as claimed and can be reproduced. It largely overlooks potential security risks. Since these artifacts are publicly released and reused, they may unintentionally create opportunities for misuse and raise concerns about safe and responsible sharing. We study 509 research artifacts from top-tier security venues...
|
| 569 |
FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning
2605.06596
|
cs.CR cs.LG
|
Su Zhang, Junfeng Guo, Heng Huang |
Watermark radioactivity testing type of methods can detect whether a model was trained on watermarked documents, and have become key tools for protecting data ownership in the fine-tuning of large language models (LLMs). Existing works have proved their effectiveness in centralized LLM fine-tuning. However, this type of method faces several challenges and remains underexplored in federated learning (FL), a widely-applied paradigm for fine-tuning LLMs collaboratively on private data across differ...
|
| 572 |
Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
2605.06601
|
cs.CR cs.AI
|
Isaac David, Arthur Gervais |
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted to local binary-derived evidence, can reconstruct the security meaning of Linux distribution updates. Patch2Vuln is a local, resumable pipeline that extracts old/new ELF pairs, diff...
|
| cs.CV 108 papers | ||||
| 4 |
Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
2605.06714
|
cs.CV cs.AI
|
Yiwen Xu, Tariq M. Khan, Yang Song, Erik Meijering |
Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presen...
|
| 12 |
Characterizing Brazilian Atlantic Forest Restoration Outcomes with Geospatial AlphaEarth Embeddings
2605.05547
|
cs.CV
|
Alice Heiman |
The Atlantic Forest in Brazil is a critical biodiversity hotspot, yet less than 12-15% of its original cover remains. Although monitoring forest restoration on a large scale is essential, traditional methods are limited by the impracticality of on-the-ground reporting on such a scale and by the saturation of remote-sensing indices such as NDVI. Furthermore, reforestation is a gradual process as opposed to the rapid spectral changes caused by deforestation. In this study, we examine 1,729 restora...
|
| 13 |
A Novel Graph-Regulated Disentangling Mamba Model with Sparse Tokens for Enhanced Tree Species Classification from MODIS Time Series
2605.05549
|
cs.CV
|
Motasem Alkayid, Zhengsen Xu, Saeid Taleghanidoozdoozan, Yimin Zhu, Megan Greenwood |
Although tree species classification from Moderate Resolution Imaging Spectroradiometer (MODIS) time series data is critical for supporting various environmental applications, it is a challenging task due to several key difficulties: the subtle signature differences among tree species, strong spatial-spectral-temporal information coupling, and the difficulty of modeling large-scale topological context information. To better address these challenges, this paper presents a novel Graph-regulated Di...
|
| 16 |
An extremely coarse feedback signal is sufficient for learning human-aligned visual representations
2605.05556
|
cs.CV
|
Yash Mehta, Michael F. Bonner |
Artificial neural networks trained on visual tasks develop internal representations resembling those of the primate visual system, a discovery that has guided a decade of computational neuroscience. Research on building brain-aligned models has progressively embraced finer-grained supervisory signals, from object classification to contrastive self-supervised objectives that maximize distinctions among individual images, yet the role of supervisory signal granularity on brain alignment remains la...
|
| 23 |
Text-to-CAD Retrieval: a Strong Baseline
2605.05572
|
cs.CV
|
Honghu Pan, Zibo Du, Daxiang Liu, Chengliang Liu, Xiaoling Luo |
Text-based retrieval of Computer-Aided Design (CAD) models is a critical yet underexplored task for the reuse of legacy industrial designs. Existing CAD repositories are typically searched using filenames or directories, which limits the efficiency, scalability, and accuracy of design retrieval. In this paper, we formally introduce text-to-CAD retrieval as a new cross-modal retrieval task, aiming to retrieve semantically relevant CAD models from large-scale databases given natural language queri...
|
| 30 |
Uncertainty-Guided Edge Learning for Deep Image Regression in Remote Sensing
2605.05590
|
cs.CV
|
Anh Vu Nguyen, Dino Sejdinovic, Tat-Jun Chin |
Edge learning refers to training machine learning models deployed on edge platforms, typically using new data accumulated onboard. The computational limitations on edge devices affect not only model optimisation, but also calculation of the predictive uncertainty of the current model on the unlabelled data, which is vital for informing model updating. In this paper, we investigate edge learning in the context of performing deep image regression on a remote sensing satellite, where a deep network...
|
| 41 |
RAM-H1200: A Unified Evaluation and Dataset on Hand Radiographs for Rheumatoid Arthritis
2605.05616
|
cs.CV cs.LG
|
Songxiao Yang, Haolin Wang, Yao Fu, Junmu Peng, Lin Fan |
Rheumatoid arthritis (RA) assessment from hand radiographs requires multi-level analysis and modeling of anatomical structures and fine-grained local pathological changes. However, existing public resources do not support such unified multi-level analysis, often lacking full-hand coverage, fine-grained annotations, and consistent integration with clinical scoring systems. In particular, annotations that enable quantitative analysis of bone erosion (BE) remain scarce. RAM-H1200 contains 1,200 han...
|
| 45 |
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
2605.05627
|
cs.CV cs.AI cs.LG cs.RO
|
Gabriel Jeanson, David-Alexandre Duclos, William Larrivée-Hardy, Noé Cochet, Matěj Boxan |
Sustainable forest management relies on precise species composition mapping, yet traditional ground surveys are labour-intensive and geographically constrained. While Uncrewed Aerial Vehicles (UAVs) offer scalable data collection, the transition to deep learning-based interpretation is bottlenecked by the severe scarcity of expert-annotated imagery, particularly in complex, visually heterogeneous regeneration zones. This paper addresses the dual challenges of data scarcity and extreme class imba...
|
| 50 |
Learning a Delighting Prior for Facial Appearance Capture in the Wild
2605.05636
|
cs.CV cs.GR
|
Yuxuan Han, Xin Ming, Tianxiao Li, Zhuofan Shen, Qixuan Zhang |
High-quality facial appearance capture has traditionally required costly studio recording. Recent works consider an in-the-wild smartphone-based setup; however, their model-based inverse rendering paradigm struggles with the complex disentanglement of reflectance from unknown illumination. To bridge this gap, we propose to shift the paradigm into training a powerful delighting network as a prior to constrain the optimization. We leverage the OLAT dataset and the rendered Light Stage scans for tr...
|
| 52 |
AffectSeek: Agentic Affective Understanding in Long Videos under Vague User Queries
2605.05640
|
cs.CV
|
Zhen Zhang, Yuhang Yang, Yunxiang Jiang, Yuhuan Lu, Haifeng Lu |
Existing affective understanding studies have mainly focused on recognizing emotions from images, audio signals, or pre-clipped video clips, where the affective evidence is already given. This passive and clip-centered setting does not fully reflect real-world scenarios, in which users often interact with long videos and express their needs through natural-language queries. In this paper, we study Vague-Query-driven video Affective Understanding (VQAU), a new task that requires models to...
|
| 55 |
MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality
2605.05646
|
cs.CV
|
Panqi Yang, Haodong Jing, Jiahao Chao, Tingyan Xiang, Li Lin |
Unified visual tokenization faces a fundamental trade-off between high-fidelity pixel reconstruction (spatial equivariance) and semantic abstraction (conceptual invariance). We attribute this conflict to Manifold Misalignment: naive joint optimization induces opposing gradients, creating a zero-sum game between reconstruction and perception. To address this, we propose MUSE, a framework based on Topological Orthogonality. By treating Structure as an orthogonal bridge, MUSE decouples optimization...
|
| 64 |
Sparse-to-Complete: From Sparse Image Captures to Complete 3D Scenes
2605.05664
|
cs.CV
|
Yiyang Shen, Yin Yang, Kun Zhou, Tianjia Shao |
We introduce S2C-3D, a novel sparse-view 3D reconstruction framework for high-fidelity and complete scene reconstruction from as few as six to eight images. Our framework features three components: a specialized diffusion model for scene-specific image restoration, a training-free view-consistency conditioned sampling process in the diffusion model for refined Gaussian optimization, and a camera trajectory planning scheme to ensure comprehensive scene coverage. The specialized diffusion model is...
|
| 66 |
EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation
2605.05674
|
cs.CV cs.AI cs.LG
|
Dongfang Zhao |
Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local...
|
| 69 |
MotionGRPO: Overcoming Low Intra-Group Diversity in GRPO-Based Egocentric Motion Recovery
2605.05680
|
cs.CV
|
Nanjie Yao, Junlong Ren, Wenhao Shen, Hao Wang |
This paper studies full-body 3D human motion recovery from head-mounted device signals. Existing diffusion-based methods often rely on global distribution matching, leading to local joint reconstruction errors. We propose MotionGRPO, a novel framework leveraging reinforcement learning post-training to inject fine-grained guidance into the diffusion process. Technically, we model diffusion sampling as a Markov decision process optimized via Group Relative Policy Optimization (GRPO). To this end, ...
|
| 75 |
R2H-Diff: Guided Spectral Diffusion Model for RGB-to-Hyperspectral Reconstruction
2605.05688
|
cs.CV
|
Songyu Ding, Ronggiang Zhao, Mingchun Sun, Jie Liu |
RGB-to-hyperspectral image reconstruction is a highly ill-posed inverse problem, since multiple plausible spectral distributions may correspond to the same RGB observation. Existing regression-based methods usually learn a deterministic mapping, which limits their ability to model reconstruction uncertainty and often leads to over-smoothed spectral responses. Although diffusion models provide strong distribution modeling capability, their direct application to hyperspectral reconstruction remain...
|
| 77 |
CFE-PPAR: Compression-friendly encryption for privacy-preserving action recognition leveraging video transformers
2605.05692
|
cs.CV cs.AI cs.CR
|
Haiwei Lin, Shoko Imaizumi, Hitoshi Kiya |
Privacy-preserving action recognition (PPAR) enables machines to understand human activities in videos without revealing sensitive visual content. Among the various strategies for PPAR, encryption-based methods achieve strong privacy protection while maintaining high recognition performance. However, these methods lead to a catastrophic decrease in recognition performance and visual quality when the encrypted videos are compressed. That is, the previous methods are not compression-friendly. To a...
|
| 79 |
Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition
2605.05694
|
cs.CV
|
Xiwen Luo, Jia Li, Rencheng Song, Yu Liu, Juan Cheng |
Emotion recognition from facial videos enables non-contact inference of human emotional states. Although facial expressions are widely used cues, they cannot fully reflect intrinsic affective states. Remote photoplethysmography (rPPG) provides complementary physiological information, but it is highly susceptible to noise and inter-subject variability, limiting generalization to unseen individuals. Existing multimodal methods combine facial and rPPG features, yet their fusion strategies often dis...
|
| 92 |
Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling
2605.05711
|
cs.CV cs.GR cs.HC cs.LG cs.MM
|
Anh H. Vo, Sungyo Lee, Phil-Joong Kim, Soo-Mi Choi, Yong-Guk Kim |
Recent advances in large language models (LLMs) have significantly improved language-driven 3D content generation, but most existing approaches still treat scene generation and user interaction as separate processes, limiting the adaptability and immersive potential of interactive multimedia systems. This paper presents a unified framework that closes the loop between language-driven 3D scene generation and immersive user interaction. Given natural language instructions, the system first constru...
|
| 93 |
EgoEMG: A Multimodal Egocentric Dataset with Bilateral EMG and Vision for Hand Pose Estimation
2605.05712
|
cs.CV
|
Ziheng Xi, Jiayi Yu, Yitao Wang, Yanbo Duan, Jianjiang Feng |
Surface electromyography (sEMG) records muscle activity during hand movement and can be decoded to recover detailed hand articulation. EMG and egocentric vision are complementary for hand sensing: EMG captures fine-grained finger articulation even under occlusion and poor lighting, while vision provides global hand configuration. However, no existing dataset synchronizes both modalities. We present EgoEMG, a multimodal egocentric dataset for bimanual hand pose estimation. EgoEMG includes bilater...
|
| 94 |
TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation
2605.05714
|
cs.CV cs.RO
|
Hanyu Zhou, Chuanhao Ma, Gim Hee Lee |
Vision-language-action (VLA) models perform well on training-seen robotic tasks but struggle to generalize to unseen scenes and objects. A key limitation lies in their implicit visual representations, which entangle object appearance, background, and scene layout. This makes policies sensitive to visual variations. Prior work improves transferability through structured intermediate representations that objectify visual content. However, these representations mainly capture scene semantics instea...
|
| 98 |
$\mathcal{B}^{3}$-Net: Controlled Posterior Bridge Learning for Multi-Task Dense Prediction
2605.05722
|
cs.CV
|
Meihua Zhou, Li Yang |
Multi-task dense prediction solves complementary pixel-level tasks in a unified model, such as semantic segmentation, depth estimation, surface normal estimation, and edge detection. Existing decoder-side interactions use attention, prompts, routing, diffusion, Mamba, or bridge features to exchange task evidence, but most of them organize this evidence implicitly. They usually fuse task features by similarity or affinity, without explicitly modeling that evidence reliability varies across tasks ...
|
| 115 |
Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction
2605.05749
|
cs.CV
|
Feifei Li, Qi Song, Chi Zhang, Rui Huang |
Dense 3D reconstruction from continuous image streams requires both accurate geometric aggregation and stable long-term memory management. Recent feed-forward reconstruction frameworks integrate observations through persistent memory representations, yet most rely primarily on appearance-based similarity when updating memory. Such appearance-driven integration often leads to redundant accumulation of observations and unstable geometry when viewpoint changes occur. In this work, we propose a ray-...
|
| 117 |
Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation
2605.05753
|
cs.CV
|
Xianghan Meng, Zhiyuan Huang, Zhengyu Tong, Chun-Guang Li |
Human Motion Segmentation (HMS), which aims to partition a video into non-overlapping segments corresponding to different human motions, has recently attracted increasing research attention. Existing HMS approaches are predominantly based on subspace clustering, which are grounded on the assumption that the distribution of high-dimensional temporal features well aligns with a Union-of-Subspaces (UoS). For videos in the real world, however, the raw frame-level features often violate the UoS assum...
|
| 122 |
iTRIALSPACE: Programmable Virtual Lesion Trials for Controlled Evaluation of Lung CT Models
2605.05761
|
cs.CV
|
Fakrul Islam Tushar, Umme Hafsa Momy, Joseph Y. Lo, Geoffrey D. Rubin |
We introduce iTRIALSPACE, a programmable evaluation framework for controlled assessment of lung CT models. Standard benchmarks are static retrospective collections that entangle lesion size, lobe prevalence, anatomy, and acquisition context, making it difficult to determine what structurally drives model accuracy. iTRIALSPACE addresses this limitation by composing real clinical CTs and lesion profiles into controlled virtual lesion trials through a four-stage pipeline: multidataset nodule profil...
|
| 124 |
X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction
2605.05765
|
cs.CV
|
Xiaoming Ren, Ru Zhen, Chao Li, Yang Song, Qiuxia Hou |
Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified ...
|
| 130 |
The autoPET3 Challenge -- Automated Lesion Segmentation in Whole-Body PET/CT - Multitracer Multicenter Generalization
2605.05775
|
cs.CV cs.AI
|
Jakob Dexl, Katharina Jeblick, Andreas Mittermeier, Balthasar Schachtner, Anna Theresa Stüber |
We report the design and results of the third autoPET challenge (MICCAI 2024), which benchmarked automated lesion segmentation in whole-body PET/CT under a compositional generalization setting. Training data comprised 1,014 [18F]-FDG PET/CT studies from the University Hospital Tübingen and 597 [18F]/[68Ga]-PSMA PET/CT studies from the LMU University Hospital Munich, constituting the largest publicly available annotated PSMA PET/CT dataset to date. The held-out test set of 200 studies covered fou...
|
| 134 |
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
2605.05781
|
cs.CV cs.AI
|
Zeyu Liu, Zanlin Ni, Yang Yue, Cheng Da, Huan Yang |
Unified multimodal models are envisioned to bridge the gap between understanding and generation. Yet, to achieve competitive performance, state-of-the-art models adopt largely decoupled understanding and generation components. This design, while effective for individual tasks, weakens the connection required for mutual enhancement, leaving the potential synergy empirically uncertain. We propose to explicitly restore this synergy by introducing Understanding-Oriented Post-Training (UNO), a lightw...
|
| 140 |
Na-IRSTD: Enhancing Infrared Small Target Detection via Native-Resolution Feature Selection and Fusion
2605.05804
|
cs.CV
|
Qian Xu, Chi Zhang, Qiming Zhang, Xi Li, Haojuan Yuan |
Infrared small target detection (IRSTD) faces the inherent challenge of precisely localizing dim targets amid complex background clutter. While progress has been made, existing methods usually follow conventional strategies to downsample features and discard small targets' details, resulting in suboptimal performance. In this paper, we present Na-IRSTD, a native-resolution feature extraction and fusion framework for IRSTD. This framework elegantly incorporates native-resolution features to prese...
|
| 144 |
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
2605.05810
|
cs.CV
|
Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang |
When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attraction, where a model is drawn to a negated answer option even when it conflicts with both the visual evidence and the question. We introduce CXR-ContraBench (Chest X-Ray Contradiction...
|
| 150 |
ChartZero: Synthetic Priors Enable Zero Shot Chart Data Extraction
2605.05820
|
cs.CV
|
Md Touhidul Islam, Yasir Mahmud, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi |
Automated data extraction from line charts remains fundamentally bottlenecked by extreme stylistic diversity and a severe scarcity of comprehensively annotated, real-world datasets. Current end-to-end pipelines depend heavily on costly manual annotations, crippling their ability to generalize across arbitrary aesthetics and grid layouts. Furthermore, existing models suffer from two critical failure modes during reconstruction. First, extracting thin, intersecting curves frequently causes structu...
|
| 152 |
Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media
2605.05831
|
cs.CV
|
Megha Mariam K. M, Vineeth N. Balasubramanian, C. V. Jawahar |
The communication of scientific knowledge has become increasingly multimodal, spanning text, visuals, and speech through materials such as research papers, slides, and recorded presentations. These different representations collectively convey a study's reasoning, results, and insights, offering complementary perspectives that enrich understanding. However, despite their shared purpose, such materials are rarely connected in a structured way. The absence of explicit links across formats makes it...
|
| 159 |
VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
2605.05848
|
cs.CV cs.AI
|
Kuanwei Lin, Wenhao Zhang, Ge Li |
Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long visual-token sequences, which sharply increase memory and latency during inference. While existing compression methods are effective in specific settings, most are either weakly query-aware or apply a fixed compression policy across frames, proving suboptimal when visual evidence is unevenly distributed over time. To address this, we present VideoRouter, a query-adaptive dual-router fra...
|
| 160 |
Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection
2605.05850
|
cs.CV
|
Letian Bai, Xuanming Cao, Juan Du, Chengyu Tao |
Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from target categories. However, existing methods mainly rely on projecting 3D observations into multi-view representations that primarily capture geometric cues rather than realistic visual semantics and process them with vision encoders pretrained on RGB data, leading to a significant domain gap between the encoder and the projected representations. To address this issue, we propose Align3D-AD, a unified ...
|
| 169 |
InkDiffuser: High-Fidelity One-shot Chinese Calligraphy via Differentiable Morphological Optimization
2605.05865
|
cs.CV
|
Kunchong Shi, Jing Zhang |
Current Chinese calligraphy generation methods suffer from poor stroke rendering and unrealistic ink morphology, resulting in outputs with limited visual fidelity and artistic fluidity. To address this problem, we propose InkDiffuser, a diffusion-based generative framework for one-shot Chinese calligraphy synthesis. To guarantee high-fidelity rendering, we introduce two core contributions: a high-frequency enhancement mechanism and a Differentiable Ink Structure (DIS) loss that explicit...
|
| 178 |
Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models
2605.05886
|
cs.CV
|
Daniel Sungho Jung, Kyoung Mu Lee |
Dense hand contact estimation requires both high-level semantic understanding and fine-grained geometric reasoning of human interaction to accurately localize contact regions. Recently, multi-modal large language models (MLLMs) have demonstrated strong capabilities in understanding visual semantics, enabled by vision-language priors learned from large-scale data. However, leveraging MLLMs for dense hand contact estimation remains underexplored. There are two major challenges in applying MLLMs to...
|
| 179 |
DBMSolver: A Training-free Diffusion Bridge Sampler for High-Quality Image-to-Image Translation
2605.05889
|
cs.CV cs.AI cs.LG math.NA
|
Sankarshana Venugopal, Mohammad Mostafavi, Jonghyun Choi |
Diffusion-based image-to-image (I2I) translation excels in high-fidelity generation but suffers from slow sampling in state-of-the-art Diffusion Bridge Models (DBMs), often requiring dozens of function evaluations (NFEs). We introduce DBMSolver, a training-free sampler that exploits the semi-linear structure of DBM's underlying SDE and ODE via exponential integrators, yielding highly-efficient 1st- and 2nd-order solutions. This reduces NFEs by up to 5x while boosting quality (e.g., FID drops 53%...
|
| 181 |
MTL-MAD: Multi-Task Learners are Effective Medical Anomaly Detectors
2605.05891
|
cs.CV cs.AI cs.LG
|
Bogdan Alexandru Bercean, Florinel Alin Croitoru, Vlad Hondru, Ciprian Mihai Ceausescu, Andreea Iuliana Ionescu |
Anomaly detection in medical images is a challenging task, since anomalies are not typically available during training. Recent methods leverage a single pretext task coupled with a large-scale pre-trained model to reach state-of-the-art performance. Instead, we propose to learn multiple self-supervised and pseudo-labeling tasks from scratch, using a joint model based on Mixture-of-Experts (MoE). By carefully integrating multiple proxy tasks, the joint model effectively learns a robust representa...
|
| 184 |
Detecting AI-Generated Videos with Spiking Neural Networks
2605.05895
|
cs.CV cs.AI
|
Minsuk Jang, Yujin Yang, Heeseon Kim, Minseok Son, Younghun Kim |
Modern AI-generated videos are photorealistic at the single-frame level, leaving inter-frame dynamics as the main remaining axis for detection. Existing detectors typically handle this temporal evidence in three ways: feeding the full frame sequence to a generic temporal backbone, reducing one dominant temporal cue to fixed video-level descriptors, or comparing temporal features to real-video statistics through a detection metric. These strategies degrade sharply under cross-generator evaluation...
|
| 187 |
Understanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modeling
2605.05900
|
cs.CV
|
Sana Al-azzawi, Chang Liu, Nudrat Habib, Elisa Barney, Marcus Liwicki |
Handwritten Text Recognition (HTR) for Arabic-script languages benefits from cross-language joint training under low-resource conditions, particularly when using CRNN-based models that combine convolutional encoders with sequence modeling. However, it remains unclear whether these improvements are better explained by shared visual representations or sequence-level dependencies. In this work, we conduct a controlled architectural study of line-level Arabic-script HTR, comparing CNN-only models wi...
|
| 189 |
Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers
2605.05908
|
cs.CV cs.AI
|
Frederik Schäfer, Luis Mandl, Lars Kälber, Tim Ricken |
Label noise remains a critical bottleneck for the generalization of supervised deep learning models, particularly when errors are structured rather than random. Standard robust training methods often fail in the presence of such semantically proximal classification errors. This work presents an architecture-agnostic Lipschitz-constant Bayesian header that can be integrated into feature extractors such as vision transformers, yielding the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB...
|
| 191 |
Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model
2605.05910
|
cs.CV
|
Junhui Yin, Nan Pu, Xinyu Zhang, Lingfeng Yang, Lin Wu |
Prompt learning has become an effective and widely used technique in enhancing vision-language models (VLMs) such as CLIP for various downstream tasks, particularly in zero-shot classification within specific domains. Existing methods typically focus on either learning class-shared prompts for a given domain or generating instance-specific prompts through conditional prompt learning. While these methods have achieved promising performance, they often overlook class-specific knowledge in prompt d...
|
| 198 |
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
2605.05922
|
cs.CV
|
Yuan Wang, Ouxiang Li, Yulong Xu, Borui Liao, Jiajun Liang |
Recent advances in generative video models are increasingly driven by post-training and test-time scaling, both of which critically depend on the quality of video reward models (RMs). An ideal reward model should predict accurate rewards that align with human preferences across diverse scenarios. However, existing paradigms face a fundamental dilemma: Discriminative RMs regress rewards directly on features extracted by multimodal large language models (MLLMs) without explicit reasoning,...
|
| 200 |
Backdoor Mitigation in Object Detection via Adversarial Fine-Tuning
2605.05928
|
cs.CV cs.CR
|
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak |
Backdoor attacks can implant malicious behaviours into deep models while preserving performance on clean data, posing a serious threat to safety-critical vision systems. Although backdoor mitigation has been studied extensively for image classification, defenses for object detection remain comparatively underdeveloped. Adversarial fine-tuning is a common backdoor mitigation approach in classification, but adapting it to detection is nontrivial as classification-oriented adversarial generation do...
|
| 203 |
Whole-body CT attenuation and volume charts from routine clinical scans via evidence-grounded LLM report filtering
2605.05933
|
cs.CV
|
Christian Wachinger, Bernhard Renger, Christopher Späth, Jan Kirschke, Marcus Makowski |
Interpreting quantitative CT biomarkers, such as organ volume and tissue attenuation, requires large-scale healthy reference distributions. However, creating these is challenging because clinical datasets are often heavily enriched with pathology. Here, we develop an evidence-grounded, cross-verified large language model (LLM) ensemble to filter pathological findings from radiology reports, enabling the construction of pathology-reduced cohorts from over 350,000 CT examinations. Five LLMs, first...
|
| 206 |
RAWild: Sensor-Agnostic RAW Object Detection via Physics-Guided Curve and Grid Modeling
2605.05941
|
cs.CV
|
Shuhong Liu, Gengjia Chang, Jun Liu, Xuangeng Chu, Yinqiang Zheng |
Camera sensor RAW data offers intrinsic advantages for object detection, including deeper bit depth, preserved physical information, and freedom from image signal processor (ISP) distortions. However, varying exposure conditions, spectral sensitivities, and bit depths across devices introduce substantially larger domain gaps than sRGB, making sensor-agnostic generalization a fundamental challenge. In this study, we present RAWild, a physics-guided global-local tone mapping framework for...
|
| 208 |
MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware
2605.05945
|
cs.CV cs.CL
|
Senthil Palanisamy, Abhishek Anand, Satpal Singh Rathor, Pratyush Patnaik, Shubhanshu Khatana |
The recent advancement of Vision Language Action (VLA) models has driven a critical demand for large scale egocentric datasets. However, existing datasets are often limited by short episode durations, typically spanning only a few minutes, which fails to capture the long horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour plus egocentric trajectories using ...
|
| 227 |
Prompt-Free and Efficient SAM2 Adaptation for Biomedical Semantic Segmentation via Dual Adapters
2605.05979
|
cs.CV
|
Hinako Mitsuoka, Kazuhiro Hotta |
Segment Anything Model 2 (SAM2) demonstrated impressive zero-shot capabilities on natural images but faces challenges in biomedical segmentation due to significant domain shifts and prompt dependency. To address these limitations, we propose a prompt-free, parameter-efficient fine-tuning framework designed for multi-class segmentation on variable-sized inputs. We introduce a convolutional Positional Encoding Generator to adapt effectively to arbitrary aspect ratios and present a dual-adapter str...
|
| 232 |
iPhoneBlur: A Difficulty-Stratified Benchmark for Consumer Device Motion Deblurring
2605.05990
|
cs.CV cs.AI
|
Abdullah Al Shafi, Kazi Saeed Alam |
Motion blur restoration on consumer mobile devices is typically evaluated using aggregate metrics that obscure performance variation across blur difficulty, masking model behavior under real deployment conditions. This work introduces iPhoneBlur, a difficulty-stratified benchmark of 7,400 image pairs synthesized from high-framerate iPhone 17 Pro videos captured in diverse real-world scenarios. Samples are partitioned into Easy, Medium, and Hard categories through PSNR-guided adaptive temporal wi...
|
| 237 |
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
2605.05997
|
cs.CV
|
Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xiang An, Bo Li |
Dynamic spatial reasoning from monocular video is essential for bridging visual intelligence and the physical world, yet remains challenging for vision-language models (VLMs). Prior approaches either verbalize spatial-temporal reasoning entirely as text, which is inherently verbose and imprecise for complex dynamics, or rely on external geometric modules that increase inference complexity without fostering intrinsic model capability. In this paper, we present 4DThinker, the first framework that ...
|
| 240 |
Neuromorphic visual attention for Sign-language recognition on SpiNNaker
2605.06005
|
cs.CV
|
Sarka Liskova, Olha Vedmedenko, Mazdak Fatahi, Matej Hoffmann, P. Michael Furlong |
Sign-language recognition has achieved substantial gains in classification accuracy in recent years; however, the latency and power requirements of most existing methods limit their suitability for real-time deployment. Neuromorphic sensing and processing offer an alternative paradigm based on sparse, event-driven computation that supports low-latency and energy-efficient perception. In this work, we introduce an end-to-end neuromorphic architecture for American Sign Language (ASL) fingerspellin...
|
| 243 |
Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models
2605.06010
|
cs.CV cs.AI
|
Yuchen Guo, Junli Gong, Wenjun Dong, Yiuming Cheung, Weifeng Su |
Purely RGB-based vision models often fail to provide reliable cues in challenging scenarios such as nighttime and fog, leading to degraded performance and safety risks. Infrared imaging captures heat-emitting sources and provides critical complementary information, but existing high-fidelity fusion methods suffer from prohibitive latency, rendering them impractical for real-time edge deployment. To address this, we propose FusionProxy, a real-time image fusion module designed as a fully independ...
|
| 244 |
T2I-VeRW: Part-level Fine-grained Perception for Text-to-Image Vehicle Retrieval
2605.06012
|
cs.CV cs.AI
|
Xiao Wang, Ziwen Wang, Weizhe Kong, Wentao Wu, Yuehang Li |
Vehicle Re-identification (Re-ID) aims to retrieve the most similar image to a given query from images captured by non-overlapping cameras. Extending vehicle Re-ID from image-only queries to text-based queries enables retrieval in real-world scenarios where only a witness description of the target vehicle is available. In this paper, we propose PFCVR, a Part-level Fine-grained Cross-modal Vehicle Retrieval model for text-to-image vehicle re-identification. PFCVR constructs locally paired images ...
|
| 247 |
PlotPick: AI-powered batch extraction of numerical data from scientific figures
2605.06021
|
cs.CV cs.DL
|
Tommy Carstensen |
Systematic reviews and meta-analyses frequently require numerical data that authors report only as figures, yet manual digitisation is slow and does not scale. We present PlotPick, an open-source tool that uses vision-language models (VLMs) to batch-extract structured tabular data from scientific figures. We evaluate six VLMs from three providers on two established chart-to-table benchmarks (ChartX and PlotQA) and compare against the dedicated chart-to-table model DePlot. All six VLMs outperform...
|
| 260 |
Domain Generalization through Spatial Relation Induction over Visual Primitives
2605.06043
|
cs.CV
|
Dat Nguyen, Duc-Duy Nguyen |
Domain generalization requires identifying stable representations that support reliable classification across domains. Most existing methods seek such stability through improving the training process, for example, through model selection strategies, data augmentation, or feature-alignment objectives. Although these strategies can be effective, they leave the representation learning of structural composition implicit, which may limit performance on compositional domain generalization benchmarks. ...
|
| 263 |
Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization
2605.06049
|
cs.CV
|
Weijian Su, Songqian Zhang, Yuqi Han, Jian Zhuang, Yongdong Huang |
As a key technique in multi-modal processing, infrared and visible image fusion (IVIF) plays a crucial role in integrating complementary spectral information for visual enhancement and downstream vision tasks. Despite remarkable progress, existing methods struggle to flexibly accommodate heterogeneous demands. Achieving adaptive fusion that aligns with various preferences from both human and machine vision remains an open and challenging problem. To address this challenge, we propose DPOFusion, ...
|
| 265 |
RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control
2605.06051
|
cs.CV
|
Youcan Xu, Jiaxin Shi, Zhen Wang, Wensong Song, Feifei Shao |
Camera-controlled video-to-video (V2V) generation enables dynamic viewpoint synthesis from monocular footage, holding immense potential for interactive filmmaking and live broadcasting. However, existing implicit synthesis methods fundamentally rely on non-causal, full-sequence processing and rigid prefix-style temporal concatenation. This architectural paradigm mandates bidirectional attention, resulting in prohibitive computational latency, quadratic complexity scaling, and inherent incompatib...
|
| 275 |
PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers
2605.06064
|
cs.CV
|
Xiangyue Zhang, Yiyi Cai, Kunhang Li, Kaixing Yang, You Zhou |
We propose PersonaGesture, a diffusion-based pipeline for single-reference co-speech gesture personalization of unseen speakers. Given target speech and one motion clip from a new speaker, the model must synthesize gestures that follow the new utterance while retaining speaker-specific pose choices, without per-speaker optimization. This setting is useful for avatars and virtual agents, but it is hard because the reference mixes stable speaker habits with utterance-specific trajectories. Persona...
|
| 279 |
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models
2605.06070
|
cs.CV
|
Zhikai Li, Yue Zhao, Edward Zhongwei Zhang, Xuewen Liu, Jing Zhang |
Reinforcement learning from human feedback (RLHF) effectively promotes preference alignment of text-to-image (T2I) diffusion models. To improve computational efficiency, direct preference optimization (DPO), which avoids explicit reward modeling, has been widely studied. However, its reliance on binary feedback limits it to coarse-grained modeling on chosen-rejected pairs, resulting in suboptimal optimization. In this paper, we propose ArenaPO, which leverages Arena scores as offline rewards to ...
|
| 284 |
MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation
2605.06080
|
cs.CV
|
Shichao Kan, Xuyang Zhang, Haojie Zhang, Zhe Zhu, Yigang Cen |
Evaluating image captions without references remains challenging because global embedding similarity often misses fine-grained mismatches such as hallucinated objects, missing attributes, or incorrect relations. We propose MSD-Score, a reference-free metric that models image patch and text token embeddings as von Mises-Fisher mixtures on the unit hypersphere. Instead of treating each modality as a single point, MSD-Score formulates image-text matching as a multi-scale distributional scoring prob...
|
| 287 |
Revisiting Uncertainty: On Evidential Learning for Partially Relevant Video Retrieval
2605.06083
|
cs.CV cs.IR cs.LG cs.MM
|
Jun Li, Peifeng Lai, Xuhang Lou, Jinpeng Wang, Yuting Wang |
Partially relevant video retrieval aims to retrieve untrimmed videos using text queries that describe only partial content. However, the inherent asymmetry between brief queries and rich video content inevitably introduces uncertainty into the retrieval process. In this setting, vague queries often induce semantic ambiguity across videos, a challenge that is further exacerbated by the sparse temporal supervision within videos, which fails to provide sufficient matching evidence. To address this,...
|
| 288 |
AMIEOD: Adaptive Multi-Experts Image Enhancement for Object Detection in Low-Illumination Scenes
2605.06084
|
cs.CV
|
Xiaochen Huang, Honggang Chen, Weicheng Zhang, Xiaobo Dai, Yongyi Li |
In multimedia application scenarios, images captured under low-illumination conditions often lead to lower accuracy in visual perception tasks compared to those taken in well-lit environments. To tackle this challenge, we propose AMIEOD, an image enhancement-enabled object detection framework for low-illumination scenes, where the two tasks are jointly optimized in a detection performance-oriented manner. Specifically, to fully exploit the information in poorly lit images, a Multi-Experts Im...
|
| 289 |
LARGO: Low-Rank Hypernetwork for Handling Missing Modalities
2605.06086
|
cs.CV
|
Niels Vyncke, Pooya Ashtari, Aleksandra Pižurica |
Addressing missing modalities is an important challenge in multimodal image analysis and often relies on complex architectures that do not transfer easily to different datasets without architectural modifications or hyperparameter tuning. While most existing methods tackle this problem in feature space by engineering representations that are robust to missing inputs, we instead operate in weight space. We propose LARGO, a hypernetwork that compresses the $2^N-1$ dedicated missing-modality models...
|
| 290 |
OpenGaFF: Open-Vocabulary Gaussian Feature Field with Codebook Attention
2605.06088
|
cs.CV
|
Kunyi Li, Michael Niemeyer, Sen Wang, Stefano Gasperini, Nassir Navab |
Understanding open-vocabulary 3D scenes with Gaussian-based representations remains challenging due to fragmented and spatially inconsistent semantic predictions across multi-view observations. In this paper, we present OpenGaFF, a novel framework for open-vocabulary 3D scene understanding built upon 3D Gaussian Splatting. At the core of our method is a Gaussian Feature Field that models semantics as a continuous function of Gaussian geometry and appearance. By explicitly conditioning semantic p...
|
| 293 |
Boosting Self-Supervised Tracking with Contextual Prompts and Noise Learning
2605.06092
|
cs.CV
|
Yaozong Zheng, Qihua Liang, Bineng Zhong, Shuimu Zeng, Yuanliang Xue |
Learning robust contextual knowledge from unlabeled videos is essential for advancing self-supervised tracking. However, conventional self-supervised trackers lack effective context modeling, while existing context association methods based on non-semantic queries struggle to adapt to unlabeled tracking scenarios, making it difficult to learn reliable contextual cues. In this work, we propose a novel self-supervised tracking framework, named \textbf{\tracker}, which introduces a dual-modal conte...
|
| 294 |
VISD: Enhancing Video Reasoning via Structured Self-Distillation
2605.06094
|
cs.CV cs.AI
|
Hao Lin, Kunyang Lv, Xu Jiang, Jingqi Tian, Zhongjing Du |
Training VideoLLMs for complex reasoning remains challenging due to sparse sequence level rewards and the lack of fine grained credit assignment over long, temporally grounded reasoning trajectories. While reinforcement learning with verifiable rewards (RLVR) provides reliable supervision, it fails to capture token level contributions, leading to inefficient learning. Conversely, existing self distillation methods offer dense supervision but lack structure and diagnostic specificity, and often i...
|
| 295 |
Metonymy in vision models undermines attention-based interpretability
2605.06095
|
cs.CV
|
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Massimiliano Mancini, Diego Marcos |
Part-based reasoning is a classical strategy to make a computer vision model directly focus on the object parts that are relevant to the downstream task. In the context of deep learning, this also serves to improve by-design interpretability, often by using part-centric attention mechanisms on top of a latent image representation provided by a standard, black-box model. This approach is based on a locality assumption: that the latent representation of an object part encodes primarily information...
|
| 303 |
Dynamic Pondering Sparsity-aware Mixture-of-Experts Transformer for Event Stream based Visual Object Tracking
2605.06112
|
cs.CV cs.AI
|
Shiao Wang, Xiao Wang, Duoqing Yang, Wenhao Zhang, Bo Jiang |
Despite significant progress, RGB-based trackers remain vulnerable to challenging imaging conditions, such as low illumination and fast motion. Event cameras offer a promising alternative by asynchronously capturing pixel-wise brightness changes, providing high dynamic range and high temporal resolution. However, existing event-based trackers often neglect the intrinsic spatial sparsity and temporal density of event data, while relying on a single fixed temporal-window sampling strategy that is ...
|
| 308 |
Pest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learning
2605.06121
|
cs.CV
|
Xueheng Li, Yu Wang, Tao Hu, Ji Huang, Ke Cao |
Pest-induced crop losses pose a major threat to global food security and sustainable agricultural development. While recent advances in Multimodal Large Language Models (MLLMs) have shown strong potential for visual understanding and smart agriculture, their direct application to pest recognition remains limited due to the domain's unique challenges such as high inter-species complexity, intra-species variability, and the scarcity of expert-annotated data. In this work, we introduce Pest-Thinker...
|
| 311 |
Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration
2605.06127
|
cs.CV cs.AI
|
Haisen He, Xiangyu Zou, SongLin Dong, Heng Li, Yihong Gong |
Real-world image degradation is often unknown, spatially non-uniform, and compositional, requiring all-in-one restoration models to adapt a single set of weights to diverse local corruption patterns without test-time degradation labels. Existing methods typically modulate a shared backbone with global prompts or degradation descriptors, or route features through predefined expert pools. However, compact global conditioning can bottleneck localized degradation evidence, while static expert routin...
|
| 316 |
Autoregressive Visual Generation Needs a Prologue
2605.06137
|
cs.CV cs.AI cs.LG
|
Bowen Zheng, Weijian Luo, Guang Yang, Colin Zhang, Tianyang Hu |
In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation throu...
|
| 321 |
AI-Generated Images: What Humans and Machines See When They Look at the Same Image
2605.06143
|
cs.CV cs.AI
|
Silvia Poletti, Justin Ilyes, Marcel Hasenbalg, David Fischinger, Martin Boyer |
The misuse of generative AI in online disinformation campaigns highlights the urgent need for transparent and explainable detection systems. In this work, we investigate how detectors for AI-generated images can be more effective in providing human-understandable explanations for their predictions. To this end, we develop a suite of detectors with various architectures and fine-tuning strategies, trained on our large-scale photorealistic fake image dataset, AIText2Image, and assess their perform...
|
| 323 |
Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow
2605.06148
|
cs.CV cs.AI cs.LG
|
Bowen Zheng, Yihong Luo, Tianyang Hu |
Discrete image tokenizers are commonly trained in two stages: first for reconstruction, and then with a prior model fitted to the frozen token sequences. This decoupling leaves the tokenizer unaware of the model that will later generate its tokens. As a result, the learned tokens may preserve image information well but still be difficult for an autoregressive (AR) prior to predict from left to right. We analyze this mismatch using Tripartite Variational Consistency (TVC), which decomposes latent...
|
| 329 |
Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study
2605.06160
|
cs.CV
|
Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang |
Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently standardized for real-world clinical settings. Second, existing research has been primarily focused on mitigating forgetting, overlooking the other essential properties such as plastic...
|
| 335 |
DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models
2605.06170
|
cs.CV
|
Juntong Wang, Jiarui Wang, Huiyu Duan, Lewei Li, Guangtao Zhai |
Existing text-to-image (T2I) benchmarks largely rely on fixed prompt sets, leaving them vulnerable to overfitting and benchmark contamination once publicly released and repeatedly reused. In this work, we propose DynT2I-Eval, a fully automated dynamic evaluation framework for T2I models. It constructs a structured visual semantic space from long-form descriptions, decomposing prompts into controllable dimensions (e.g., subject, logical constraint, environment, and composition). This enables the ...
|
| 337 |
Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation
2605.06173
|
cs.CV cs.AI
|
Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier |
Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading, macular edema (ME) detection, and report generation. The architecture decouples a high-performance retinal classifier and a parameter-efficient vision-language model (Qwen2.5-VL-...
|
| 340 |
SuperFace: Preference-Aligned Facial Expression Estimation Beyond Pseudo Supervision
2605.06179
|
cs.CV
|
Zejian Kang, Xuanyang Xu, Wentao Yang, Kai Zheng, Yuanchen Fei |
Accurate facial estimation is crucial for realistic digital human animation, and ARKit blendshape coefficients offer an interpretable representation by mapping facial motions to semantic animation controls. However, learning high-quality ARKit coefficient prediction remains limited by the absence of reliable ground-truth supervision. Existing methods typically rely on capture software such as Live Link Face to provide pseudo labels, which may contain noisy activations, biased coefficient magnitu...
|
| 350 |
EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields
2605.06192
|
cs.CV cs.AI cs.RO
|
Zhaoyang Yang, Yurun Jin, Lizhe Qi, Cong Huang, Kai Chen |
Pretrained video diffusion models provide powerful spatiotemporal generative priors, making them a natural foundation for robotic world models. While recent world-action models jointly optimize future videos and actions, they predominantly treat video generation as an auxiliary representation for policy learning. Consequently, they insufficiently explore the inverse problem: leveraging action signals to guide video synthesis, thereby often failing to preserve precise robot spatial geometry and f...
|
| 352 |
Bridging visual saliency and large language models for explainable deep learning in medical imaging
2605.06197
|
cs.CV cs.LG
|
Paul Valery Nguezet, Elie Tagne Fute, Yusuf Brima, Benoit Martin Azanguezet, Marcellin Atemkeng |
The opaque nature of deep learning models remains a significant barrier to their clinical adoption in medical imaging. This paper presents a multimodal explainability framework that bridges the gap between convolutional neural network (CNN) predictions and clinically actionable insights for brain tumor classification, leveraging large language models (LLMs) to deliver human-interpretable diagnostic narratives. The proposed framework operates through three coupled stages. First, nine CNN architec...
|
| 360 |
Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
2605.06207
|
cs.CV cs.AI cs.LG
|
Bowen Zheng, Weijian Luo, Guang Yang, Colin Zhang, Tianyang Hu |
Most discrete visual tokenizers rely on a default design: every position in the sequence shares the same codebook. Researchers try to scale the codebook size $K$ to get better reconstruction performance. Such a constant-codebook design hits a fundamental information-theoretic limit. We observe that the per-position conditional entropy of the training set decays so quickly along the sequence that, after a few positions, the conditional distribution becomes essentially deterministic. On ImageNet w...
|
| 365 |
Differentiable Adaptive 4D Structured Illumination for Joint Capture of Shape and Reflectance
2605.06214
|
cs.CV
|
Huakeng Ding, Yaowen Chen, Kun Zhou, Hongzhi Wu |
We present a differentiable framework to adaptively compute 4D illumination conditions with respect to an object, for efficient, high-quality simultaneous acquisition of its shape and reflectance, with a unified spatial-angular structured light and a single camera. Using a simple histogram-based pixel-level probability model for depth and reflectance, we differentiably link the next illumination condition(s) with a loss that encourages the reduction in depth uncertainty. As new structured illumi...
|
| 377 |
Look Beyond Saliency: Low-Attention Guided Dual Encoding for Video Semantic Search
2605.06229
|
cs.CV
|
Faisal Aljehrai, Mohammed A. Alkhrashi, Alreem Almuhrij, Sarah Abuhimed, Noorh Aldossary |
Video semantic search in densely crowded scenes remains a challenging task due to visual encoders' tendency to prioritize salient foreground regions while neglecting contextually important background areas. We propose an Inverse Attention Embedding mechanism that explicitly captures and highlights these overlooked regions. By combining inverse attention embeddings with traditional visual embeddings, our method significantly enhances semantic retrieval performance without additional training. Ini...
|
| 394 |
ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision
2605.06266
|
cs.CV
|
Ke Zhang, Bomin Wang, Hangqi Zhou, Xiahai Zhuang |
Curating fully annotated datasets for medical image segmentation is labour-intensive and expertise-demanding. To alleviate this problem, prior studies have explored scribble annotations for weakly supervised segmentation. Existing solutions mainly compute losses on annotated areas and generate pseudo labels by propagating annotations to adjacent regions. However, these methods often suffer from inaccurate and unrealistic segmentations due to insufficient supervision and incomplete shape informat...
|
| 395 |
Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction
2605.06270
|
cs.CV
|
Zecheng Tang, Jiaye Fu, Qiankun Gao, Haijie Li, Yanmin Wu |
Feed-forward 3D reconstruction models based on Vision Transformers can directly estimate scene geometry and camera poses from a small set of input images, but scaling them to video inputs with hundreds or thousands of frames remains challenging due to the quadratic cost of global attention layers. Recent token-merging methods accelerate these models by compressing the token sequence within the global attention layers, but they apply a uniform reduction to query tokens and key-value tokens, ignor...
|
| 397 |
On-Orbit Real-Time Wildfire Detection Under On-Board Constraints
2605.06273
|
cs.CV cs.AR
|
Matthias Rötzer, Veronika Pörtge, Martin Ickerott, Jayendra Praveen Kumar Chorapalli, Dimitri Scheftelowitsch |
We present a deployed system for on-orbit wildfire detection aboard a nine-satellite commercial thermal infrared constellation, operating under demanding joint constraints: sub-megabyte model footprint, sub-150 ms per-batch TensorRT FP16 inference on an NVIDIA Jetson Xavier NX, and an end-to-end alert pipeline targeting under 10 minutes from satellite overpass to fire event communication. The system operates on uncalibrated mid-wave infrared (MWIR) single-band imagery at 200 m ground sampling di...
|
| 402 |
Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency
2605.06280
|
cs.CV
|
Thong Nguyen, Khoi M. Le, Cong-Duy Nguyen, Luu Anh Tuan, See-Kiong Ng |
Recent advancements in image animation have utilized diffusion models to breathe life into static images. However, existing controllable frameworks typically rely on Lagrangian motion guidance, where optical flow is estimated relative to the initial frame. This paper revisits the same optical-flow primitive through a more local supervision design: we use adjacent-frame Eulerian motion fields to guide generation, where the motion signal always describes a short temporal hop. This shift enables pa...
|
| 411 |
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
2605.06298
|
cs.CV cs.AI
|
Roussel Desmond Nzoyem, Mauro Comi |
Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural re...
|
| 422 |
NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps
2605.06317
|
cs.CV cs.AI
|
Dijia Zhan, Jinyi Li, Chenxi Zheng, Shaoyu Huang, Yong Li |
Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path plann...
|
| 432 |
TinyBayes: Closed-Form Bayesian Inference via Jacobi Prior for Real-Time Image Classification on Edge Devices
2605.06333
|
cs.CV cs.AI cs.LG stat.AP stat.ML
|
Shouvik Sardar, Sourish Das |
Cocoa (Theobroma cacao) is a critical cash crop for millions of smallholder farmers in West Africa, where Cocoa Swollen Shoot Virus Disease (CSSVD) and anthracnose cause devastating yield losses. Automated disease detection from leaf images is essential for early intervention, yet deploying such systems in resource-constrained settings demands models that are small, fast, and require no internet connectivity. Existing edge-deployable plant disease systems rely on end-to-end deep learning without...
|
| 435 |
Earth-o1: A Grid-free Observation-native Atmospheric World Model
2605.06337
|
cs.CV
|
Junchao Gong, Kaiyi Xu, Wangxu Wei, Siwei Tu, Jingyi Xu |
Despite the unprecedented volume of multimodal data provided by modern Earth observation systems, our ability to model atmospheric dynamics remains constrained. Traditional modeling frameworks force heterogeneous measurements into predefined spatial grids, inherently limiting the full exploitation of raw sensor data and creating severe computational bottlenecks. Here we present Earth-o1, an observation-native atmospheric world model that overcomes these structural limitations. Rather than relyin...
|
| 449 |
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
2605.06356
|
cs.CV
|
YaoYang Liu, Yuechen Zhang, Wenbo Li, Yufei Zhao, Rui Liu |
High-resolution image-to-video (I2V) generation aims to synthesize realistic temporal dynamics while preserving fine-grained appearance details of the input image. At 2K resolution, it becomes extremely challenging, and existing solutions suffer from various weaknesses: 1) end-to-end models are often prohibitively expensive in memory and latency; 2) cascading low-resolution generation with a generic video super-resolution tends to hallucinate details and drift from input-specific local structure...
|
| 457 |
eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
2605.06368
|
cs.CV cs.AI cs.LG
|
Paulo Mario P. Medina, Jose Marie Antonio Miñoza, Sebastian C. Ibañez |
Despite extensive research into mitigating distribution shifts, many existing algorithms yield inconsistent performance, often failing to outperform baseline Empirical Risk Minimization (ERM) across diverse scenarios. Furthermore, high algorithmic complexity frequently limits interpretability and offers only an indirect means of addressing spurious correlations. We propose eXplaining to Learn (eX2L): an interpretable, explanation-based framework that decorrelates confounding features from a clas...
|
| 461 |
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
2605.06376
|
cs.CV cs.AI
|
Tao Liu, Hao Yan, Mengting Chen, Taihang Hu, Zhengrong Yue |
Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency Distillation are two representative paradigms. While consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it toward the clean data manifold, vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation and mode-seeking nature of the reverse KL diverge...
|
| 463 |
Empirical Evidence for Simply Connected Decision Regions in Image Classifiers
2605.06380
|
cs.CV cs.LG
|
Arjhun Swaminathan, Mete Akgün |
Understanding the topology of decision regions is central to explaining the inner workings of deep neural networks. Prior empirical work has provided evidence that these regions are path connected. We study a stronger topological question: whether closed loops inside a decision region can be contracted without leaving that region. To this end, we propose an iterative quad-mesh filling procedure that constructs a finite-resolution label-preserving surface bounded by a given loop and lying entirel...
|
| 469 |
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models
2605.06388
|
cs.CV cs.LG cs.RO
|
Nilaksh, Saurav Jha, Artem Zholus, Sarath Chandar |
World model-based policy evaluation is a practical proxy for testing real-world robot control by rolling out candidate actions in action-conditioned video diffusion models. As these models increasingly adopt latent diffusion modeling (LDM), choosing the right latent space becomes critical. While the status quo uses autoencoding latent spaces like VAEs that are primarily trained for pixel reconstruction, recent work suggests benefits from pretrained encoders with representation-aligned semantic l...
|
| 476 |
HumanNet: Scaling Human-centric Video Learning to One Million Hours
2605.06747
|
cs.CV cs.RO
|
Yufan Deng, Daquan Zhou |
Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, learning physical interaction remains constrained by the lack of large, diverse, and richly annotated human activity data. We present HumanNet, a one-million-hour human-centric video corpus that captures how humans interact with the physical world at scale. HumanNet spans both first-person and third-person perspectives and covers fine-grained activi...
|
| 480 |
FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation
2605.06421
|
cs.CV cs.LG
|
Mingfeng Lin, Jiakun Chen, Liang Han, Liqiang Nie |
Pixel-space diffusion has re-emerged as a promising alternative to latent-space generation because it avoids the representation bottleneck introduced by VAEs. Yet most existing methods still treat image generation as a frequency-homogeneous process, overlooking the distinct roles and learning dynamics of low- and high-frequency components. To address this, we propose FREPix, a FREquency-heterogeneous flow matching framework for Pixel-space image generation. FREPix explicitly decomposes generatio...
|
| 507 |
GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
2605.06477
|
cs.CV
|
Pranav Mantini, Shishir K. Shah |
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermor...
|
| 513 |
3D MRI Image Pretraining via Controllable 2D Slice Navigation Task
2605.06487
|
cs.CV cs.AI
|
Yu Wang, Qingchao Chen |
Self-supervised pretraining has become the mainstream approach for learning MRI representations from unlabeled scans. However, most existing objectives still treat each scan primarily as static aggregations of slices, patches or volumes. We ask whether there exists an intrinsic form of self-supervision signal that is different from reconstructing the masked patches, through transforming the 3D volumes into controllable 2D rendered sequences: by rendering slices at continuous positions, orientati...
|
| 521 |
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
2605.06507
|
cs.CV cs.LG
|
Canyu Zhao, Hao Chen, Yunze Tong, Yu Qiao, Jiacheng Li |
Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practices deal with multiple rewards by training one specialist model per reward, optimizing a weighted-sum reward $R(x)=\sum_k w_k R_k(x)$, or sequentially fine-tuning with a hand-crafted stage schedule. These approaches either fa...
|
| 523 |
FreeSpec: Training-Free Long Video Generation via Singular-Spectrum Reconstruction
2605.06509
|
cs.CV
|
Fangda Chen, Shanshan Zhao, Longrong Yang, Chuanfu Xu, Zhigang Luo |
Video diffusion models perform well in short-video synthesis, but their training-free extension to long videos often suffers from content drift, temporal inconsistency, and over-smoothed dynamics. Existing methods improve temporal consistency by combining a global branch with a local branch, but they often further decompose appearance consistency and temporal dynamics within each branch using predefined criteria. This assignment is unreliable when appearance and action progression are tightly co...
|
| 525 |
DCR: Counterfactual Attractor Guidance for Rare Compositional Generation
2605.06512
|
cs.CV
|
Taewon Kang, Matthias Zwicker |
Diffusion models generate realistic visual content, yet often fail to produce rare but plausible compositions. When prompted with combinations that are valid but underrepresented in training data, such as a snowy beach or a rainbow at night, the generation process frequently collapses toward more common alternatives. We identify this failure mode as default completion bias, where denoising trajectories are implicitly attracted toward high-frequency semantic configurations. Existing guidance mech...
|
| 537 |
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
2605.06535
|
cs.CV cs.AI
|
Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou |
In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Background Replacement, a task central to creative applications such as film production and advertising, requires synthesizing entirely new, temporally consistent scenes while maintaining a...
|
| 538 |
MedHorizon: Towards Long-context Medical Video Understanding in the Wild
2605.06537
|
cs.CV
|
Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu |
Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse, spatially subtle, and context dependent. Existing benchmarks often assume this evidence has already been localized through images, short clips, or pre-segmented videos, leaving th...
|
| 547 |
R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations
2605.06758
|
cs.CV cs.AI cs.LG cs.RO
|
Zhifeng Gu, Yuqi Wang, Bing Wang |
Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. ...
|
| 555 |
Solving Minimal Problems Without Matrix Inversion Using FFT-Based Interpolation
2605.06572
|
cs.CV math.NA
|
Haidong Wu, Snehal Bhayani, Janne Heikkilä |
Estimating camera geometry typically involves solving minimal problems formulated as systems of multivariate polynomial equations, which often pose computational challenges when using existing Gröbner-basis or resultant-based methods due to matrix inversion needed in the online solver. Here we propose a sampling-based, matrix inversion-free method that constructs the solvers using sparse hidden-variable resultants. The determinant polynomial in the hidden variable is efficiently reconstructed vi...
|
| 565 |
DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency
2605.06592
|
cs.CV cs.AI cs.LG
|
Shuyang Jiang, Nan Yu, Yiming Zhang, Zenghui Ding, Zhenyu Wu |
Contrastive language-image pretraining (CLIP) suffers from two structural weaknesses: the symmetric InfoNCE loss discards the relative ordering among unmatched in-batch pairs, and global pooling collapses the visual representation into a semantic bottleneck that is poorly sensitive to fine-grained local structure. RANKCLIP partially addresses the first issue with a list-wise Plackett-Luce ranking-consistency loss, but its model is strictly first-order and inherits the second weakness untouched. ...
|
| 593 |
DPM++: Dynamic Masked Metric Learning for Occluded Person Re-identification
2605.06637
|
cs.CV
|
Lei Tan, Yingshi Luan, Pincong Zou, Pingyang Dai, Liujuan Cao |
Although person re-identification has made impressive progress, occlusion caused by obstacles remains an unsettled issue in real applications. The difficulty lies in the mismatch between incomplete occluded samples and holistic identity representations. Severe occlusion removes discriminative body cues and introduces interference from background clutter and occluders, making global metric learning unreliable. Existing methods mainly rely on extra pre-trained models to estimate visible parts for ...
|
| 598 |
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
2605.06643
|
cs.CV cs.AI cs.LG cs.MM
|
Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi |
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world...
|
| cs.CY 3 papers | ||||
| 56 |
The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness
2605.05648
|
cs.CY cs.AI cs.HC
|
Rose Niousha, Samantha Boatright Smith, Bita Akram, Peter Brusilovsky, Arto Hellas |
Current Artificial Intelligence (AI)-based tutoring systems (AI tutors) are primarily evaluated based on the pedagogical quality of their feedback messages. While important, pedagogy alone is insufficient because it ignores a critical question: what do students actually do with the feedback they receive? We argue that AI tutor evaluation should be extended with a behavioral dimension grounded in student interaction data, which complements pedagogical assessment. We propose an evaluation framewor...
|
| 437 |
A Benchmark for Strategic Auditee Gaming Under Continuous Compliance Monitoring
2605.06340
|
cs.CY cs.GT cs.LG
|
Florian A. D. Burnat, Brittany I. Davidson |
Continuous post-deployment compliance audits, mandated by emerging regulations such as the EU AI Act and Digital Services Act, create a class of strategic gaming distinct from the one-shot input/output gaming studied in prior work. Regulated systems can delay outcome reporting, drift their reports within plausible noise envelopes, exploit longitudinal sample attrition, and cherry-pick among ambiguous metric definitions. We formalize continuous auditing as a $T$-round Stackelberg game between an ...
|
| 486 |
From Review to Design: Ethical Multimodal Driver Monitoring Systems for Risk Mitigation, Incident Response, and Accountability in Automated Vehicles
2605.06439
|
cs.CY cs.CV cs.ET
|
Bilal Khana, Waseem Shariff, Rory Coyne, Muhammad Ali Farooq, Peter Corcoran |
As vehicles transition toward higher levels of automation, Driver Monitoring Systems (DMS) have become essential for ensuring human oversight, safety, and regulatory compliance in a vehicle. These systems rely on multimodal sensing and AI-driven inference to assess driver attention, cognitive state, and readiness to take control. While technologically promising, their deployment introduces a complex set of ethical and legal challenges - ranging from privacy and consent to data ownership and algo...
|
| cs.DB 1 paper | ||||
| 1 |
Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation
2605.05525
|
cs.DB cs.CL
|
Vicki Stover Hertzberg, Eduardo Valverde, Joyce C. Ho |
Natural language interfaces to databases have gained popularity, yet the theoretical foundations for evaluating and designing these systems remain underdeveloped. We present QUEST (Query Understanding Evaluation through Semantic Translation), a framework resting on two independently motivated components: the FAR structural invariant, which holds that every well-formed query reduces to Filter, Aggregate, and Return operations; and the W5H dimensional framework, which holds that all filtering crit...
|
| cs.DC 3 papers | ||||
| 27 |
A Scalable Digital Twin Framework for Energy Optimization in Data Centers
2605.05581
|
cs.DC cs.LG
|
Raphael Hendrigo de Souza Gonçalves, Wendel Marcos dos Santos |
This study proposes a scalable Digital Twin framework for energy optimization in data centers. The framework integrates IoT-based data acquisition, cloud computing, and machine learning techniques to enable real-time monitoring, forecasting, and intelligent energy management. A controlled small-scale data center environment was developed to monitor variables such as power consumption, temperature, and computational workload. Long Short-Term Memory (LSTM) models were employed to predict energy dem...
|
| 80 |
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
2605.05696
|
cs.DC cs.AI cs.LG
|
Bole Ma, Jan Eitzinger, Harald Köstler |
Agentic LLM workloads put bit-identical tokens at shifted positions every turn, voiding prefix caches at the first byte of divergence. Operators report cache-hit regressions ranging from moderate slowdowns to severe TTFT spikes of 10-16s on unchanged content. Prior position-independent caching systems correct RoPE on the full $d_K$-dimensional key, an architectural cost imposed by GQA, not by caching itself. Multi-Head Latent Attention, deployed at scale in DeepSeek-V2/V3/R1, Kimi-K2/Moonlight, ...
|
| 270 |
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
2605.06055
|
cs.DC cs.LG
|
Tianlun Hu, Tiancheng Hu, Shengsheng Litang, Sheng Wang, Xiaoming Bao |
Mixture-of-Experts (MoE) inference requires large-scale token exchange across devices, making dispatch and combine major bottlenecks in both prefill and decode. Beyond network transfer, routing-driven layout transformation, temporary relay, and output restoration can add substantial overhead. Existing MoE communication paths are often buffer-centric, using explicit inter-process relay and reordering buffers around collective transfer. This report presents a relay-buffer-free communication design...
|
| cs.DL 1 paper | ||||
| 254 |
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge
2605.06033
|
cs.DL cs.AI cs.CY cs.SI
|
Andrés F. Castro Torres, Joan Giner-Miguelez, Mercè Crosas |
The extent to which Artificial Intelligence (AI) can trigger generalized paradigm shifts in science is unclear. Although some of these technologies have revolutionized data collection and analysis in specific scientific fields such as Chemistry, their overall impact depends on the scope of adoption and the ways scholars use them. In this study, we document substantial differences in the timing and extent of AI adoption across countries and scientific domains from 1960 to 2015. After 2015, we fin...
|
| cs.DS 1 paper | ||||
| 36 |
Nearly Optimal Attention Coresets
2605.05602
|
cs.DS cs.AI
|
Edo Liberty, Alexandr Andoni, Eldar Kleiner |
We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho$. This o...
|
| cs.GR 2 papers | ||||
| 175 |
3DSS: 3D Surface Splatting for Inverse Rendering
2605.05876
|
cs.GR cs.CV
|
Mae Younes, Adnane Boukhayma |
We present 3D Surface Splatting (3DSS), the first differentiable surface splatting renderer for physically-based inverse rendering from multi-view images. Our central insight is that the surface separation problem at the heart of surface splatting admits a direct formulation in terms of the reconstruction kernels themselves. From this foundation we derive a coverage-based compositing model whose per-layer opacity arises directly from the accumulated Elliptical Weighted Average reconstruction wei...
|
| 274 |
Reality Check: How Avatar and Face Representation Affect the Perceptual Evaluation of Synthesized Gestures
2605.06063
|
cs.GR cs.HC
|
Haoyang Du, Yinghan Xu, John Dingliana, Brian Keegan, Rachel McDonnell |
The capacity to create realistic virtual humans has progressed significantly, and such characters can be found in many applications across entertainment, education and health. As an essential element of interactive virtual humans, speech-driven 3D gesture generation still depends heavily on perceptual evaluation, yet studies often vary avatar appearance and facial presentation when judging the generated motions. Prior work suggests these visual choices can bias motion judgments, but controlled e...
|
| cs.GT 2 papers | ||||
| 462 |
Independent Learning of Nash Equilibria in Partially Observable Markov Potential Games with Decoupled Dynamics
2605.06377
|
cs.GT cs.LG cs.MA
|
Philip Jordan, Maryam Kamgarpour |
We study Nash equilibrium learning in partially observable Markov games (POMGs), a multi-agent reinforcement learning framework in which agents cannot fully observe the underlying state. Prior work in this setting relies on centralization or information sharing, and suffers from sample and computational complexity that scales exponentially in the number of players. We focus on a subclass of POMGs with independent state transitions, where agents remain coupled through their rewards, and assume th...
|
| 529 |
Optimizing Social Utility in Sequential Experiments
2605.06520
|
cs.GT cs.LG cs.MA stat.ME
|
Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez |
Regulatory approval of products in high-stakes domains such as drug development requires statistical evidence of safety and efficacy through large-scale randomized controlled trials. However, the high financial cost of these trials may deter developers who lack absolute certainty in their product's efficacy, ultimately stifling the development of 'moonshot' products that could offer high social utility. To address this inefficiency, in this paper, we introduce a statistical protocol for experime...
|
| cs.HC 3 papers | ||||
| 70 |
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
2605.05682
|
cs.HC cs.AI cs.CY
|
Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim, Akshita Jha, Lauren Wilcox |
Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work...
|
| 125 |
Priming, Path-dependence, and Plasticity: Understanding the molding of user-LLM interaction and its implications from (many) chat logs in the wild
2605.05767
|
cs.HC cs.CL
|
Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno |
User interactions with LLMs are shaped by prior experiences and individual exploration, but in-lab studies do not provide system designers with visibility into these in-the-wild factors. This work explores a new approach to studying real-world user-LLM interactions through large-scale chat logs from the wild. Through analysis of 140K chatbot sessions from 7,955 anonymized global users over time, we demonstrate key patterns in user expressions despite varied tasks: (1) LLM users are not tabula ra...
|
| 444 |
Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective
2605.06347
|
cs.HC cs.AI
|
Xuening Wu, Yanlan Kang, Qianya Xu, Kexuan Xie, Jiaqi Mi |
Large language models (LLMs) are reshaping how knowledge is produced, with increasing reliance on AI systems for generation, summarization, and reasoning. While prior work has studied cognitive offloading in humans and model collapse in recursive training, these effects are typically considered in isolation. We propose a unified perspective: humans and language models form a coupled dynamical system linked by a feedback loop of usage, generation, and retraining. We introduce a minimal model with...
|
| cs.IR 2 papers | ||||
| 163 |
Bridging Passive and Active: Enhancing Conversation Starter Recommendation via Active Expression Modeling
2605.05855
|
cs.IR cs.CL
|
Yiqing Wu, Haoming Li, Guanyu Jiang, Jiahao Liang, Yongchun Zhu |
Large Language Model (LLM)-driven conversational search is shifting information retrieval from reactive keyword matching to proactive, open-ended dialogues. In this context, Conversation Starters are widely deployed to provide personalized query recommendations that help users initiate dialogues. Conventionally, recommending these starters relies on a closed "exposure-click" loop. Yet, this feedback loop mechanism traps the system in an echo chamber where, compounded by data sparsity, it fails t...
|
| 380 |
OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries
2605.06235
|
cs.IR cs.AI
|
Diane Tchuindjo, Devavrat Shah, Omar Khattab |
Retrieval benchmarks are increasingly saturating, but we argue that efficient search is far from a solved problem. We identify a class of queries we call oblique, which seek documents that instantiate a latent pattern, like finding all tweets that express an implicit stance, chat logs that demonstrate a particular failure mode, or transcripts that match an abstract scenario. We study three mechanisms through which obliqueness may arise and introduce OBLIQ-Bench, a suite of five oblique search pr...
|
| cs.LG 193 papers | ||||
| 3 |
Energy Generative Modeling: A Lyapunov-based Energy Matching Perspective
2605.05530
|
cs.LG
|
Yixuan Wang, Wenqian Xue, Warren E. Dixon |
Generative models based on static scalar energy functions represent an emerging paradigm in which a single time independent potential drives sample generation through its gradient field, eliminating the need for time conditioning entirely. We unify the training and sampling phases of this paradigm, conventionally treated as separate procedures, within a single framework: density transport on the Wasserstein space, cast as a nonlinear control problem in which the Kullback Leibler (KL) divergence ...
|
| 6 |
Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation
2605.05534
|
cs.LG
|
Tran Gia Bao Ngo, Zulfikar Alom, Federico Errica, Murat Kantarcioglu, Cuneyt Gurcan Akcora |
Adversarial learning and the robustness of Graph Neural Networks (GNNs) are topics of widespread interest in the machine learning community, as documented by the number of adversarial attacks and defenses designed for these purposes. While a rigorous evaluation of these adversarial methods is necessary to understand the robustness of GNNs in real-world applications, we posit that many works in the literature do not share the same experimental settings, leading to ambiguous and potentially contra...
|
| 9 |
Towards Scalable One-Step Generative Modeling for Autoregressive Dynamical System Forecasting
2605.05540
|
cs.LG physics.flu-dyn
|
Tianyue Yang, Xiao Xue |
Fast surrogate modeling for high-dimensional physical dynamics requires more than low short-term error: useful models must roll out efficiently while preserving the statistical structure of long trajectories. Neural operators provide inexpensive autoregressive forecasts but can drift in turbulent regimes, whereas rolling diffusion and latent generative surrogates can represent stochastic transitions at the cost of multi-step denoising, noise-schedule design, or auxiliary compression models. We p...
|
| 10 |
Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
2605.05544
|
cs.LG cs.RO
|
Nandiraju Gireesh, Yuanliang Ju, He Wang |
Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use a fixed chunk size across every state. This is suboptimal: near contact events the agent needs short chunks for reactive control, while during free-space motion long chunks provide better credit assignment. The natural solution is to train critics for several chunk sizes and select the best one at each state, but naive compa...
|
| 14 |
FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings
2605.05553
|
cs.LG
|
Quang-Huy Nguyen, Jiaqi Wang, Wei-shinn Ku |
Federated learning (FL) operates in heterogeneous environments, where variations in data distributions and asymmetric model design often result in negative transfer. While federated knowledge distillation (FKD) avoids direct model parameter sharing, existing methods typically rely on public datasets or assume that transferred knowledge is uniformly reliable, which limits their robustness in practice. This paper presents FedeKD, a reliability-aware FKD framework that makes sample-wise trust estim...
|
| 25 |
Accelerating LMO-Based Optimization via Implicit Gradient Transport
2605.05577
|
cs.LG cs.AI
|
Won-Jun Jang, Si-Hyeon Lee |
Recent optimizers such as Lion and Muon have demonstrated strong empirical performance by normalizing gradient momentum via linear minimization oracles (LMOs). While variance reduction has been explored to accelerate LMO-based methods, it typically incurs substantial computational overhead due to additional gradient evaluations. At the same time, the theoretical understanding of LMO-based methods remains fragmented across unconstrained and constrained formulations. Motivated by these limitations...
|
| 29 |
AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling
2605.05586
|
cs.LG
|
Francisco Giral, Abhijeet Vishwasrao, Andrea Arroyo Ramo, Mahmoud Golestanian, Federica Tonti |
Aerodynamic surrogate models are increasingly used to replace repeated high-fidelity CFD evaluations in many-query design settings, but current approaches still face two important limitations: they often scale poorly to the very large fields arising in realistic 3D aerodynamics, and they rarely produce latent representations that are directly useful for analysis and design. We introduce AeroJEPA, a Joint-Embedding Predictive Architecture for aerodynamic field modeling that addresses both issues....
|
| 32 |
When Can Voting Help, Hurt, or Change Course? Exact Structure of Binary Test-Time Aggregation
2605.05592
|
cs.LG cs.IT
|
Yi Liu |
Majority voting is one of the few black-box interventions that can improve a fixed stochastic predictor: repeated access can be cheaper than changing a high-capability model. Classical fixed-competence theory makes this intervention look monotone -- more votes help above the majority threshold and hurt below it. We show that this picture is fundamentally incomplete. Under the de Finetti representation for exchangeable repeated correctness, voting is governed by a latent distribution of per-examp...
|
| 38 |
Optimal Contextual Pricing under Agnostic Non-Lipschitz Demand
2605.05609
|
cs.LG econ.EM stat.ML
|
Jianyu Xu, Yu-Xiang Wang |
We study contextual dynamic pricing with linear valuations and bounded-support agnostic noise, whose induced demand curve may be non-Lipschitz with arbitrary jumps and atoms. Such discontinuities break the cross-context interpolation arguments used by smooth-demand pricing algorithms, while the best previous method achieved only $\tilde O(T^{3/4})$ regret. We propose Conservative-Markdown Redirect-UCB Pricing, a polynomial-time algorithm that combines randomized parameter estimation, conservativ...
|
| 40 |
LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites
2605.05615
|
cs.LG cs.CY
|
Lei Jiang, Adrian Ildefonso, Daniel Loveless, Fan Chen |
Large language models (LLMs) impose rapidly growing energy demands, creating an emerging energy and carbon crisis driven by large-scale inference. Solar-powered, AI-enabled low Earth orbit (LEO) satellites have been proposed to mitigate terrestrial electricity consumption, but their lifecycle carbon footprint remains poorly understood due to launch emissions, satellite manufacturing, and radiation-hardened hardware requirements. This paper presents LLMSpace, the first carbon modeling fr...
|
| 42 |
Region-adaptable retrieval of coastal biogeochemical parameters from near-surface hyperspectral remote sensing reflectance using physics-aware meta-learning
2605.05623
|
cs.LG
|
Yiqing Guo, Nagur R. C. Cherukuru, Eric A. Lehmann, S. L. Kesav Unnithan, Tim J. Malthus |
Hyperspectral in situ sensing has shown promise in retrieving aquatic biogeochemical (BGC) parameters, such as total suspended solids, dissolved organic carbon, and total chlorophyll-a, for cost-effective monitoring of coastal water quality. However, generalising such retrieval algorithms across water bodies remains challenging, as the relationship between remote sensing reflectance (Rrs) and BGC parameters can vary considerably from one region to another due to regional distinctions in environm...
|
| 51 |
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
2605.05638
|
cs.LG
|
Brett Barkley, Preston Culbertson, David Fridovich-Keil |
Models trained with deep learning often fail to signal when inputs fall outside their training data manifold, leading to unreliable predictions under distribution shift. Prior work suggests that effective out-of-distribution (OOD) detection often requires class-conditional modeling or specialized models obtained through supervised fine-tuning. We revisit this assumption in modern pretrained models and show that their frozen representations already encode sufficient geometric structure for accura...
|
| 57 |
Information-Preserving Domain Transfer with Unlabeled Data in Misspecified Simulation-Based Inference
2605.05652
|
cs.LG
|
Joon Jang, Eunho Jeong, Kyu Sung Choi, Hyeonjin Kim |
Simulation-based inference (SBI) provides amortized Bayesian parameter inference from simulator-generated data without requiring explicit likelihood evaluation. Its reliability can degrade under model misspecification, where real-world observations are not well represented by the simulator used for training. Existing methods using unlabeled real-world data often align simulated and real-world data distributions, but marginal alignment alone does not directly preserve parameter-relevant informati...
|
| 61 |
Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks
2605.05659
|
cs.LG
|
Ying Chen, Aoxi Li, Jihun Kim, Javad Lavaei |
The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power are less explored, often relying on restrictive priors like a pretrained base matrix, ReLU activations or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without ...
|
| 62 |
Distributionally Robust Multi-Objective Optimization
2605.05660
|
cs.LG math.OC
|
Yufeng Yang, Fangning Zhuo, Ziyi Chen, Heng Huang, Yi Zhou |
Multi-objective optimization (MOO) has received growing attention in applications that require learning under multiple criteria. However, the existing MOO formulations do not explicitly account for distributional shifts in the data. We introduce distributionally robust multi-objective optimization (DR-MOO), which minimizes multiple objectives under their respective worst-case distributions. We propose Pareto-type solution concepts for DR-MOO and develop multi-gradient descent algorithms (MGDA) w...
|
| 72 |
Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting
2605.05685
|
cs.LG cs.AI stat.ML
|
Naveen Mysore |
Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose explicit learnable edge functions on every connection, enabling mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that transforms KAN edge functions from latent visualizations into faithful, temporally grounded explanations. Built on a gated residual KAN that decomposes forecasts into a linear base and a sparsely activated KAN correction, the framework (i) maps each edge to inp...
|
| 81 |
Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers
2605.05697
|
cs.LG cs.AI
|
Amrit Nidhi |
Transformers usually expose one inference cost per trained model, while deployed systems often need multiple cost-quality operating points. We study Budgeted Attention Allocation, a monotone head-gating mechanism conditioned on a requested attention budget. Dense warm-starting is important for stability: on a robust synthetic sequence task, one budgeted model reaches 99.7% accuracy at 0.303 estimated attention cost and 100.0% accuracy at 0.504 cost. On held-out AG News with a custom word-level t...
|
| 91 |
On the Blessing of Pre-training in Weak-to-Strong Generalization
2605.05710
|
cs.LG
|
Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu |
The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work, we identify pre-training as the essential prerequisite for the emergence of W2SG. Theoretically, we formalize the W2SG problem within a high-dimensional single-index model framework using spiked Gaussian data, modeling pre-training as a spectral initialization step. ...
|
| 97 |
Enabling Federated Inference via Unsupervised Consensus Embedding
2605.05718
|
cs.LG
|
Yui Hashimoto, Takayuki Nishio, Yuichi Kitagawa, Takahito Tanimura |
Cooperative inference across independently deployed machine learning models is increasingly desirable in distributed environments, as there is a growing need to leverage multiple models while keeping their data and model parameters private. However, existing cooperative frameworks typically rely on sharing input data, model parameters, or a common encoder, which limits their applicability in privacy-sensitive or cross-organizational settings. To address this challenge, we propose Consensus Embed...
|
| 102 |
WARP: A Benchmark for Primal-Dual Warm-Starting of Interior-Point Solvers
2605.05728
|
cs.LG cs.AI eess.SY math.OC
|
Dhruv Suri, Helgi Hilmarsson, Shourya Bose |
Solving AC Optimal Power Flow (AC-OPF) is of central importance in electricity market operations, where interior-point methods (IPMs) such as IPOPT are the standard solvers. A growing body of work uses machine learning to predict primal warm-start iterates, reporting iteration reductions of 30-46\%. We show that these reported gains rest on an inappropriate evaluation baseline: prior methods benchmark against the flat start $V_m = 1, V_a = 0$, whereas the solver's actual default - the variable-b...
|
| 104 |
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
2605.05732
|
cs.LG cs.AI
|
Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari |
Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence again...
|
| 107 |
CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction
2605.05738
|
cs.LG cs.AI
|
Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou |
In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the continuously expanding and evolving patterns in streaming traffic networks. To address this challenge, we propose a simple yet efficient dual-branch continual learning framework for tr...
|
| 108 |
Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using LLM Judges with Closed-Loop Reinforcement Learning Feedback
2605.05739
|
cs.LG cs.AI cs.CL q-fin.CP
|
Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman |
Agentic stock prediction systems make sequences of interdependent decisions (regime detection, pathway routing, reinforcement learning control) whose individual quality is hidden by aggregate metrics such as mean absolute percentage error (MAPE) or directional accuracy. We present a behavioral evaluation framework that addresses this gap. Behavioral traces logged at every autonomous decision point are grouped into five-day episodes and scored along six domain-specific dimensions (regime detectio...
|
| 110 |
Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)
2605.05742
|
cs.LG
|
Scott Geng, Dutch Hansen, Jerry Li |
Weak-to-strong generalization is a phenomenon in post-training whereby a strong student model, when finetuned solely with feedback from a weaker teacher, can not only surpass the teacher, but can improve upon its own capabilities. Recent work of Burns et al. (2023) demonstrated that this can occur in the setting of frontier language models, and subsequently there has been a flurry of both empirical work trying to exploit this phenomenon, as well as theoretical work attempting to understand it. I...
|
| 116 |
RVPO: Risk-Sensitive Alignment via Variance Regularization
2605.05750
|
cs.LG cs.CL
|
Ivan Montero, Tomasz Jurczyk, Bhuwan Dhingra |
Current critic-less RLHF methods aggregate multi-objective rewards via an arithmetic mean, leaving them vulnerable to constraint neglect: high-magnitude success in one objective can numerically offset critical failures in others (e.g., safety or formatting), masking low-performing "bottleneck" rewards vital for reliable multi-objective alignment. We propose Reward-Variance Policy Optimization (RVPO), a risk-sensitive framework that penalizes inter-reward variance during advantage aggregation, sh...
|
| 121 |
Full-Spectrum Graph Neural Network: Expressive and Scalable
2605.05759
|
cs.LG
|
Xiaohan Wang, Deyu Bo, Longlong Li, Kelin Xia |
It is well established that spectral graph neural networks (GNNs) can universally approximate node signals; however, their expressive power remains bounded by the 1-dimensional Weisfeiler-Lehman test, which is mirrored in their lack of universality for higher-order signals. To go beyond this bound, we propose the Full-Spectrum GNN (FSpecGNN), a second-order generalization of classical spectral GNNs. FSpecGNN advances spectral filtering in two perspectives: (1) it lifts the signal from the node d...
|
| 123 |
Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion
2605.06720
|
cs.LG cs.AI
|
Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez |
Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible c...
|
| 127 |
Adaptive Selection of LoRA Components in Privacy-Preserving Federated Learning
2605.05769
|
cs.LG cs.AI cs.CL
|
Myoungjun Kim, Sangwoo Park, Yoseob Han, Jin-Hyun Ahn |
Differentially private federated fine-tuning of large models with LoRA suffers from aggregation error caused by LoRA's multiplicative structure, which is further amplified by DP noise and degrades both stability and accuracy. Existing remedies apply a single update mode uniformly across all layers and all communication rounds (or alternate them on a fixed schedule), ignoring both the structural asymmetry between the two LoRA factors and the round-wise dynamics of training. We propose AS-LoRA, an...
|
| 136 |
A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration
2605.05791
|
cs.LG
|
Manuel Haussmann, Mustafa Mert Çelikok, Melih Kandemir |
While reinforcement learning (RL) promises to revolutionize the control of complex nonlinear robotic systems, a profound gap persists between the heuristic success of model-free off-policy deep RL and the underlying theory, which remains largely confined to tabular or linearizable settings. We identify the cause of this gap as an emergent isolation of three traditions: (i) measure-theoretic MDP foundations on general spaces limit their analysis to exact dynamic programming and ignore all error s...
|
| 137 |
Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio
2605.05794
|
cs.LG cs.AI
|
Ziqing Wen, Zhouyang Liu, Jiahuan Wang, Ping Luo, Li Shen |
The impressive performance of large language models (LLMs) arises from their massive scale and heterogeneous module composition. However, this structural heterogeneity introduces additional optimization challenges. While adaptive optimizers such as Adam(W) provide per-parameter adaptivity, they do not explicitly account for module-level gradient heterogeneity, resulting in slower convergence, suboptimal performance, or training instability. Existing approaches typically rely on manually tuned mo...
|
| 138 |
Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs
2605.05795
|
cs.LG
|
Nicholas Potteiger, Ankita Samaddar, Taylor T. Johnson, Xenofon Koutsoukos |
Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking, however none of them fully address reactivity to subtask failure and modularity to varying objects for compositional tasks....
|
| 139 |
Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
2605.05802
|
cs.LG
|
Zhiyuan Zhai, Xin Wang |
Group-relative RL training (GRPO) samples a small group of parallel rollouts for every training prompt and uses their within-group reward spread to compute per-trajectory advantages. In agentic environments each rollout is a long multi-turn dialogue with one LLM call per step, so this multi-sample multiplier dominates the total training cost. When every rollout of a prompt ends with the same reward, the group has zero reward variance and contributes no gradient, so the extra rollouts add no info...
|
| 141 |
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
2605.05806
|
cs.LG
|
Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry |
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generat...
|
| 147 |
A Testable Certificate for Constant Collapse in Teacher-Guided VAEs
2605.05813
|
cs.LG cs.AI
|
Zegu Zhang, Jianhua Peng, Jian Zhang |
Posterior collapse in variational autoencoders is often diagnosed by its symptoms: a small KL term, a strong decoder, or weak use of the latent code. These signals are useful, but they do not define a collapse boundary. We study a concrete failure mode, input-independent constant collapse, and show that this case admits an exact threshold. For any fixed nonconstant teacher distribution \(T(\cdot\mid x)\), the best constant student is the dataset-average teacher distribution, and its alignment co...
|
| 149 |
HCInfer: An Efficient Inference System via Error Compensation for Resource-Constrained Devices
2605.05819
|
cs.LG
|
Shen Xu, Xiangwen Zhuge, Zhe Xu, Yingkun Hu, Zheng Yang |
LLMs often struggle with memory-constrained deployment on consumer-grade hardware due to their massive parameter sizes. While existing solutions such as model compression and offloading improve deployment feasibility, they often suffer from substantial accuracy degradation or severe throughput bottlenecks. Recent error compensation methods recover accuracy through auxiliary LoRA-style branches, and we observe that these branches are inherently amenable to offloading: they require substantial par...
|
| 156 |
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
2605.05838
|
cs.LG cs.NE
|
Yulong Huang, Xiang Liu, Hongxiang Huang, Xiaopeng Lin, Zunchang Liu |
Linear Attention (LA) offers a promising paradigm for scaling large language models (LLMs) to long sequences by avoiding the quadratic complexity of self-attention. Recent LA models such as Mamba2 and GDN interpret linear recurrences as closed-form online stochastic gradient descent (SGD), but naive SGD updates suffer from rapid information decay and suboptimal convergence in optimization. While momentum-based optimizers provide a natural remedy, they pose challenges in simultaneously achieving ...
|
| 161 |
Hypothesis generation and updating in large language models
2605.05851
|
cs.LG
|
Hua-Dong Xiong |
Large language models (LLMs) increasingly help people solve problems, from debugging code to repairing machinery. This process requires generating plausible hypotheses from partial descriptions, then updating them as more information arrives. Yet how LLMs perform this form of inference, and how close it is to optimal, remains unclear. We study this question in the number game, a controlled setting in which a learner infers the hypothesis supported by a few positive integers, such as $\{16, 8, 2,...
|
| 164 |
Measuring Learning Progress via Gradient-Momentum Coupling
2605.05856
|
cs.LG
|
Samuel Blad, Martin Längkvist, Amy Loutfi |
Measuring learning progress is essential for curiosity-driven exploration in reinforcement learning, but widely used signals such as prediction error often fail to distinguish meaningful, learnable patterns from random noise. This paper proposes Gradient-Momentum Coupling (GMC), a signal derived from optimization dynamics that quantifies how useful each sample's gradient is for ongoing learning by measuring its per-parameter normalized absolute product with the momentum from previous gradients. ...
|
| 165 |
Offline Reinforcement Learning for Rotation Profile Control in Tokamaks
2605.05857
|
cs.LG
|
Rohit Sonker, Hiro Josep Farre Kaga, Jiayu Chen, Andrew Rothstein, Ian Char |
Tokamaks remain leading candidates for achieving practical fusion energy, yet many important control problems inside these devices are still difficult or unsolved. One such challenge is controlling the plasma rotation profile, which strongly influences stability, confinement, and transport. While the average rotation can be controlled, controlling the full profile is challenging due to high dimensionality, response to multiple actuators and dependence on plasma condition. Learning-based control ...
|
| 167 |
Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning
2605.05862
|
cs.LG
|
Yanming Xia, Angelica I. Aviles-Rivero |
Neural operators perform well on structured domains, yet their behaviour on irregular geometries remains poorly understood. We show that this limitation is not merely an encoding issue, but a depth-wise failure mode inherent to deep operator architectures. We formalise the Geometric Forgetting Hypothesis: due to the Markovian structure of operator layers and their reliance on global mixing mechanisms, neural operators progressively lose access to domain geometry as depth increases. Using layer...
|
| 168 |
SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
2605.05863
|
cs.LG cs.AI
|
Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov |
Incorporating prior data into online reinforcement learning accelerates training but typically forces a difficult trade-off between high computational costs and long, multi-stage training pipelines. While fixed-length stabilization phases are significantly more computationally efficient than static update schedules, they require task-dependent manual tuning, risking either the waste of prior knowledge or severe overfitting. To address this, we propose SOPE, an algorithm that uses an actor-aligne...
|
| 172 |
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
2605.05870
|
cs.LG
|
Majid Mohammadi, Grigory Reznikov, Pavel Sinitcyn, Krikamol Muandet, Siu Lun Chau |
We study the efficient computation of Shapley values for \emph{product games} -- cooperative games in which the coalition value factorizes as a product of per-player terms. Such games arise in machine learning explainability whenever the value function inherits a multiplicative structure from the underlying model, as in kernel methods with product kernels and tree-based models. Our key result is that the Shapley value of each player in a product game admits an exact one-dimensional integral repr...
|
| 173 |
Retain-Neutral Surrogates for Min-Max Unlearning
2605.05871
|
cs.LG
|
Junhao Cai, Dohun Kim, Dowon Kim, Sung Il Choi, Chengjun Jin |
Machine unlearning seeks to remove the influence of designated training data while preserving performance on the remaining data. Approximate unlearning can be viewed as a local editing problem; in min-max unlearning, the key local object is the surrogate point at which the retain objective is evaluated. When forget and retain gradients are strongly aligned, an unconstrained forget-maximizing perturbation can move to a surrogate point that increases retain loss. We propose Retain-Orthogonal Surro...
|
| 180 |
RepFlow: Representation Enhanced Flow Matching for Causal Effect Estimation
2605.05890
|
cs.LG stat.ME
|
Yifei Xie, Jian Huang |
Estimating causal effects from observational data has become increasingly critical in diverse fields including healthcare, economics, and social policy. The fundamental challenge in causal inference arises from the missing counterfactuals and the selection bias. Existing methods are largely limited to point estimates and lack the capacity for distribution modeling. In this work, we propose RepFlow, a novel framework that formulates causal effect estimation as a joint optimization problem integra...
|
| 185 |
VARS-FL: Validation-Aligned Client Selection for Non-IID Federated Learning in IoT Systems
2605.05896
|
cs.LG cs.AI
|
Mohamed Lakas, Mohamed Amine Ferrag |
Federated learning (FL) systems typically employ stateless client selection, treating each communication round independently and ignoring accumulated evidence of client contribution quality. Under non-IID data, this leads to slow convergence and unstable training, particularly when selection relies on local proxies (e.g., training loss) that are misaligned with the global optimization objective. These challenges are especially pronounced in Internet of Things (IoT) and Industrial IoT (IIoT) envi...
|
| 186 |
VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading
2605.05899
|
cs.LG
|
Cheng Xu, Xiaofeng Hou, Jiacheng Liu, Chao Li |
Large-scale vision-language mixture-of-experts (VL-MoE) models provide strong multimodal capability, but efficient deployment on memory-constrained platforms remains difficult. Existing MoE offloading systems are largely designed for text-centric workloads and become much less effective for visual-heavy inputs, where large numbers of visual tokens induce broader and less predictable expert accesses. We present VisMMoE, a VL-MoE offloading system built on a single systems insight: pruning redunda...
|
| 188 |
Quadratic Objective Perturbation: Curvature-Based Differential Privacy
2605.05905
|
cs.LG math.OC
|
Daniel Cortild, Coralia Cartis |
Objective perturbation is a standard mechanism in differentially private empirical risk minimization. In particular, Linear Objective Perturbation (LOP) enforces privacy by adding a random linear term, while strong convexity and stability are ensured by an additional deterministic quadratic term. However, this approach requires the strong assumption of bounded gradients of the loss function, which excludes many modern machine learning models. In this work, we introduce Quadratic Objective Pertur...
|
| 193 |
From Drops to Grid: Noise-Aware Spatio-Temporal Neural Process for Rainfall Estimation
2605.05912
|
cs.LG cs.CV
|
Rafael Pablos Sarabia, Joachim Nyborg, Morten Birk, Ira Assent |
High-resolution rainfall observations are crucial for weather forecasting, water management, and hazard mitigation. Traditional operational measurements are often biased and low-resolution, limiting their ability to capture local rainfall. Accurate high-resolution rainfall maps require integrating sparse surface observations, yet existing deep learning densification methods are hindered by rainfall's skewed, localized nature, noise, and limited spatio-temporal fusion. We present DropsToGrid, a N...
|
| 205 |
Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing
2605.05940
|
cs.LG cs.CL
|
Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang |
Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning (RL) frameworks. To improve efficiency, we propose Near-Policy Distillation (NPD), an asynchronous approach that decouples student generation from training. This reformulation enables Supervised Fine-Tuning (SFT) with sequence packing. However, asynchronous u...
|
| 214 |
Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs
2605.05957
|
cs.LG
|
Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang |
LLMs reliably correct false claims when presented in isolation, yet when the same claims are embedded in task-oriented requests, they often comply rather than correct. We term this failure mode *correction suppression* and construct a benchmark of 300 false premises to systematically evaluate it across eight models. Suppression rates range from 19% to 90%, with four models exceeding 80%, establishing correction suppression as a prevalent and severe phenomenon. Mechanistic analysis reveal...
|
| 219 |
Uncertainty Estimation via Hyperspherical Confidence Mapping
2605.05964
|
cs.LG
|
Eunseo Choi, Ho-Yeon Kim, Jaewon Lee, Taeyong Jo, Myungjun Lee |
Quantifying uncertainty in neural network predictions is essential for high-stakes domains such as autonomous driving, healthcare, and manufacturing. While existing approaches often depend on costly sampling or restrictive distributional assumptions, we propose Hyperspherical Confidence Mapping (HCM), a simple yet principled framework for sampling-free and distribution-free uncertainty estimation. HCM decomposes outputs into a magnitude and a normalized direction vector constrained to lie on the...
|
| 220 |
Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR
2605.05965
|
cs.LG cs.AI
|
Chaoli Mou, Zhan Zhuang, Xinning Chen, Yu Zhang |
Reinforcement Learning with Verifiable Rewards (RLVR) has become a key approach for improving the reasoning abilities of large language models. However, widely used critic-free algorithms such as Group Relative Policy Optimization (GRPO) necessitate a "uniform credit assignment" assumption that indiscriminately broadcasts trajectory-level advantages, hindering learning efficiency by failing to distinguish critical reasoning steps. To address this limitation, we propose Selective Eligibility Tra...
|
| 221 |
Sharper Guarantees for Misspecified Kernelized Bandit Optimization
2605.05967
|
cs.LG math.OC stat.ML
|
Davide Maran, Csaba Szepesvári |
Existing guarantees for misspecified kernelized bandit optimization pay for misspecification through kernel complexity: in generic offline bounds, the misspecification level $\varepsilon$ is multiplied by $\sqrt{d_\mathrm{eff}}$, where $d_\mathrm{eff}$ is the kernel effective dimension, while in online regret bounds, the corresponding penalty is $\sqrt{\gamma_n}\,n\varepsilon$, where $\gamma_n$ is the maximum information gain after $n$ rounds of interaction. In this work, we show that, for a large class...
|
| 222 |
Training Transformers for KV Cache Compressibility
2605.05971
|
cs.LG
|
Yoav Gelberg, Yam Eitan, Michael Bronstein, Yarin Gal, Haggai Maron |
Long-context language modeling is increasingly constrained by the Key-Value (KV) cache, whose memory and decode-time access costs scale linearly with the prefix length. This bottleneck has motivated a range of context-compression methods, from token-level summarization to recent optimization-based KV compression methods. These post-hoc methods operate on the KV cache of a fixed pretrained model, so their effectiveness is fundamentally limited by how well the model's internal representations can ...
|
| 225 |
Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems
2605.05975
|
cs.LG physics.flu-dyn
|
Sicheng Ma, Tianyue Yang, Xiuzhe Wu, Xiao Xue |
Reconstructing high-fidelity flow fields from low-fidelity observations is a central problem in scientific machine learning, yet recent diffusion and flow-matching models typically rely on iterative sampling, making them costly for latency-sensitive workflows such as ensemble forecasting, real-time visualization, and simulation-in-the-loop inference. We study whether a high-fidelity flow-matching generative model can be compressed into a compact one-step model for fast scientific flow reconstruc...
|
| 230 |
Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions
2605.05983
|
cs.LG
|
Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su |
Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time. Second, they operate as full-sequence SVs (FSSVs), which ca...
|
| 234 |
DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
2605.05994
|
cs.LG
|
Nobutaka Ono |
In this paper, we propose DiBA (Diagonal and Binary Matrix Approximation), a compact matrix factorization for neural network weight compression. Many components of modern networks, including linear layers, $1\times1$ convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates $A\in\mathbb{R}^{m\times n}$ by $\widehat A=D_1B_1D_2B_2D_3$, where $D_1,D_2,D_3$ are diagonal matrices and $B_1,B_2$ are $0/1$ binary matrices. The intermediate dimension $k$ con...
|
| 238 |
A Fine-Grained Understanding of Uniform Convergence for Halfspaces
2605.06004
|
cs.LG cs.AI math.ST
|
Aryeh Kontorovich, Kasper Green Larsen |
We study the fine-grained uniform convergence behavior of halfspaces beyond worst-case VC bounds. For inhomogeneous halfspaces in $\mathbb{R}^d$ with $d\ge 2$, we show that standard first-order VC bounds are essentially tight: even consistent hypotheses can incur population error $\Theta(d\ln(n/d)/n)$, and in the agnostic setting the deviation scales as $\sqrt{\tau\ln(1/\tau)}$ at true error $\tau$. In contrast, homogeneous halfspaces in $\mathbb{R}^2$ exhibit a markedly different behavior. In the realizable ...
|
| 239 |
Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning
2605.06724
|
cs.LG cs.AI eess.SP
|
Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic |
Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, w...
|
| 245 |
Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
2605.06014
|
cs.LG cs.AI cs.DS cs.NI
|
Ran Ben-Basat, William Kuszmaul, Michael Mitzenmacher, Amit Portnoy, Shay Vargaftik |
Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate i...
|
| 246 |
Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards
2605.06017
|
cs.LG math.PR
|
Pei-Sen Li |
Sequence-level evaluations in autoregressive Large Language Models (LLMs) rely on highly dependent token generation. Establishing tight concentration bounds for these processes remains a challenge due to two fundamental bottlenecks in existing frameworks: (i) classical inequalities typically separate dependency structures from target sensitivities, leading to a scalar collapse that inflates the variance proxy to a suboptimal $\mathcal{O}(N)$ for sparse terminal rewards; (ii) conversely, while ce...
|
| 249 |
Multi-agent decision making: A Blackwell's informativeness approach
2605.06028
|
cs.LG
|
Zheng Zhang, Cuong C. Nguyen, Kevin Wells, Gustavo Carneiro |
The rapid development of large language models (LLMs) has motivated research on decision-making in multi-agent systems, where multiple agents collaborate to achieve shared objectives. Existing aggregation approaches, such as voting and debate, are largely ad hoc and lack formal guarantees regarding the informativeness of the resulting decisions. In this paper, we provide a principled approach to analyse decisions made in the multi-LLM setting using Blackwell's informativeness framework. Within t...
|
| 251 |
Transformer-Based Wildlife Species Classification from Daily Movement Trajectories
2605.06726
|
cs.LG
|
Obed Irakoze, Prasenjit Mitra |
Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy with gains of ...
|
| 253 |
Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
2605.06032
|
cs.LG cs.AI
|
Hugo Cazaux, Eyjólfur Ingi Ásgeirsson, Hlynur Stefánsson |
Synthetic data has transformed language model training, yet its role in time series forecasting remains poorly understood. We present a large-scale empirical study: nine experiment groups, 4,218 runs systematically evaluating synthetic time series augmentation across five architectures, four synthetic signals and seven datasets. The effect is sharply architecture-conditional: channel-mixing models (TimesNet, iTransformer) benefit in the majority of trials, while channel-independent models (DLine...
|
| 256 |
Optimal Transport for LLM Reward Modeling from Noisy Preference
2605.06036
|
cs.LG cs.AI
|
Licheng Pan, Haochen Yang, Haoxuan Li, Yunsheng Lu, Yongqi Tong |
Reward models are fundamental to Reinforcement Learning from Human Feedback (RLHF), yet real-world datasets are inevitably corrupted by noisy preference. Conventional training objectives tend to overfit these errors, while existing denoising approaches often rely on homogeneous noise assumptions that fail to capture the complexity of linguistic preferences. To handle these challenges, we propose SelectiveRM, a framework grounded in optimal transport. We first devise a Joint Consistency Discrepan...
|
| 257 |
Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training
2605.06727
|
cs.LG cs.ET eess.IV
|
Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo |
We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurement...
|
| 261 |
Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference
2605.06046
|
cs.LG
|
Saksham Rathi, Preeti, Mythili Vutukuru |
Auto-regressive token generation in large language models is memory-bound because it requires "attending to" key and value tensors (KV cache) of all previous tokens. Prior work aims to improve the efficiency of this decode process by batching multiple requests together, and maximizing batch size subject to GPU memory constraints. The key observation of our work is that with prefix-sharing workloads, smaller, prefix-homogeneous batches -- where all requests share a common prefix -- can achieve hi...
|
| 262 |
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
2605.06047
|
cs.LG cs.AI
|
Duong Nguyen, Mohammed Jawhar, Nicolas Chesneau |
Tabular foundation models (TFMs), such as TabPFN-2.6, TabICLv2, ConTextTab, Mitra, LimiX, and TabDPT, achieve strong zero-shot performance through in-context learning, but their inductive biases remain fixed at inference time. Adapting a pretrained TFM to a specific dataset or task typically requires either full fine-tuning, which is computationally expensive, or parameter-efficient tuning methods (PEFT) such as LoRA, which must be tailored to the internal architecture of each TFM. Furthermore, ...
|
| 264 |
When Brain Networks Travel: Learning Beyond Site
2605.06050
|
cs.LG
|
Yingxu Wang, Kunyu Zhang, Yanwu Yang, Thomas Wolfers, Yujie Wu |
Graph-based learning on functional magnetic resonance imaging (fMRI) has shown strong potential for brain network analysis. However, existing methods degrade under cross-site out-of-distribution (OOD) settings because site-conditioned confounders induce non-pathological shortcuts, while functional connectivity constructed by temporal averaging obscures transient neurodynamics, limiting generalization to unseen sites. In this paper, we propose Cross-site OOD Robust brain nEtwork (CORE), a unified...
|
| 266 |
The E$\Delta$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
2605.06729
|
cs.LG cs.AI
|
Arash Shahmansoori |
We present the E$\Delta$-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform to obtain input-adaptive, unconditionally orthogonal residual connections. Unlike DDL, whose Householder operator is orthogonal only at $\beta\in \{0,2\}$, our Data-Dependent Cayley rotation $Q(x)=(I+(\beta/2)A(x))^{-1}(I-(\beta/2)A(x))$ preserves orthogonality for all $\beta$ and all inputs. To handle negation, an eigenvalue $-1$ case that ...
|
| 267 |
Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics
2605.06730
|
cs.LG
|
Likhita Yerra, Remi Uttejitha Allam |
We introduce Semantic State Abstraction Interfaces (SSAI): a methodological template for mapping sparse unstructured text into $K$ auditable, named coordinates with neutral defaults on no-news days, designed to separate representation hypotheses from optimisation variance in sequential decision systems. Our contribution is the framework and its evaluation protocol, not a claim that SSAI outperforms denser alternatives. We instantiate SSAI with $K=4$ axes (sentiment, risk, confidence, volatilit...
|
| 268 |
Towards Generation-Efficient Uncertainty Estimation in Large Language Models
2605.06053
|
cs.LG
|
Mingcheng Zhu, Yu Liu, Tingting Zhu |
Uncertainty estimation is important for deploying LLMs in high-stakes applications such as healthcare and finance, where hallucinations can appear fluent and plausible while being factually incorrect, making it difficult for users to judge whether an output should be trusted. Existing methods require one or more full autoregressive generations to estimate uncertainty, which introduces substantial inference cost and often delays uncertainty assessment. In this paper, we investigate whether effect...
|
| 271 |
Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
2605.06058
|
cs.LG cs.CV
|
Kjetil Indrehus, Adrian Duric, Changkyu Choi, Ali Ramezani-Kebrya |
Document Visual Question Answering (DocVQA) requires vision-language models to reason not only about what information in a document is relevant to a question, but also where the answer is grounded on the page. Existing DocVQA models entangle question-relevant evidence and answer localization and operate largely as black boxes, offering limited means to verify how predictions depend on visual evidence. We propose CoExVQA, a self-explainable DocVQA framework with a grounded reasoning process throu...
|
| 273 |
Geometry-Aware Simplicial Message Passing
2605.06061
|
cs.LG cs.CG math.AT
|
Elena Xinyi Wang, Bastian Rieck |
The Weisfeiler-Lehman (WL) test and its simplicial extension (SWL) characterize the combinatorial expressivity of message passing networks, but they are blind to geometry, i.e., meshes with identical connectivity but different embeddings are indistinguishable. We introduce the Geometric Simplicial Weisfeiler-Lehman (GSWL) test, which incorporates vertex coordinates into color refinement for geometric simplicial complexes. In addition, we show that (i) the expressivity of geometry-aware simplic...
|
| 276 |
Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark
2605.06066
|
cs.LG cs.AI
|
Cristiano da Costa Cunha, Ajmal Mian, Tim French, Wei Liu |
Causal reinforcement learning (RL) lacks benchmarks for complex systems that combine sequential decision making, hidden information, large masked action spaces, and explicit causal structure. We introduce MTG-Causal-RL, a Gymnasium benchmark built on Magic: The Gathering with a 3,077-dimensional partial observation, a 478-action masked discrete action space, five competitive Standard archetypes, three reward schemes, and a hand-specified Structural Causal Model (SCM) over strategic variables. Ev...
|
| 277 |
Normalized Architectures are Natively 4-Bit
2605.06067
|
cs.LG cs.AI
|
Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry, Boris Ginsburg |
Training large language models at 4-bit precision is critical for efficiency. We show that nGPT, an architecture that constrains weights and hidden representations to the unit hypersphere, is inherently more robust to low-precision arithmetic. This removes the need for interventions, such as applying random Hadamard transforms and performing per-tensor scaling calculations, to preserve model quality, and it enables stable end-to-end NVFP4 training. We validate this approach on both a 1.2B dense mo...
|
| 280 |
PRISM: Iterative Cross-Modal Posterior Refinement for Dynamic Text-Attributed Graphs
2605.06073
|
cs.LG
|
Trimble Chang, Yihang Liu, Mingjing Han, Han Zhang |
Dynamic text-attributed graphs (DyTAGs) provide a powerful framework for modeling evolving systems in which node semantics and time-dependent interactions are tightly coupled. Recently, multimodal learning has emerged as a promising yet underexplored direction for enhancing DyTAG representation learning. However, existing methods typically rely on rigid modality partitions and one-shot fusion strategies, which limit their ability to capture the intrinsic and evolving dependencies between node se...
|
| 282 |
Understanding diffusion models requires rethinking (again) generalization
2605.06077
|
cs.LG
|
Pierre Marion, Yu-Han Wu |
This position paper argues that understanding generalization in diffusion models requires fundamentally new theoretical frameworks that go beyond both classical statistical learning theory and the benign overfitting paradigm developed for supervised learning. In diffusion models, unlike in supervised learning, memorization of training data and generalization to novel samples are incompatible: a model that has fully memorized its training set generates copies rather than novel data. Several theor...
|
| 285 |
Fast Gauss-Newton for Multiclass Cross-Entropy
2605.06081
|
cs.LG
|
Mikalai Korbit, Mario Zanon |
In multiclass softmax cross-entropy, the full generalized Gauss-Newton (GGN) curvature couples all output logits through the softmax covariance, making curvature-vector products harder to scale as the number of classes grows. We show that the standard multiclass GGN can be decomposed exactly into a true-vs-rest term and a positive semidefinite within-competitor covariance term. Fast Gauss-Newton (FGN) retains the first term and drops the second, yielding a positive semidefinite under-approximati...
|
| 298 |
Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer
2605.06104
|
cs.LG cs.AI
|
Yongyi Wang, Hanyu Liu, Lingfeng Li, Bozhou Chen, Ang Li |
Decision Transformer (DT) formulates offline reinforcement learning as autoregressive sequence modeling, achieving promising results by predicting actions from a sequence of Return-to-Go (RTG), state, and action tokens. However, RTG is a scalar that summarizes future rewards, containing far less information than typical state or action vectors, yet it consumes the same computational budget per token. Worse, the self-attention cost of Transformers grows quadratically with sequence length, so incl...
|
| 307 |
BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification
2605.06117
|
cs.LG
|
Yi-Siang Wang, Kuan-Yu Chen, Yu-Chen Den, Darby Tien-Hao Chang |
Large language models (LLMs) have recently been adapted to tabular prediction by serializing structured features into natural language, but their performance in low-data regimes remains limited compared to gradient-boosted decision trees (GBDTs). In this work, we revisit the boosting paradigm, traditionally associated with tree ensembles, and ask whether it can be applied as a general training principle for LLM fine-tuning. We propose BoostLLM, a framework that transforms parameter-efficient fin...
|
| 317 |
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
2605.06139
|
cs.LG cs.AI
|
Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang |
Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent, which samples a group of responses per prompt and updates the policy via group-relative advantage signals. This work reveals that these optimization strategies share a common geometric structure: each implicitly defines a target distribution on the response simplex a...
|
| 318 |
SymDrift: One-Shot Generative Modeling under Symmetries
2605.06140
|
cs.LG cs.AI
|
Samir Darouich, Vinh Tong, Lluís Pastor-Pérez, Tanja Bien, Loay Mualem |
Generative modeling of physical systems, such as molecules, requires learning distributions that are invariant under global symmetries, such as rotations in three-dimensional space. Equivariant diffusion and flow matching models can incorporate such invariances effectively, even when trained on a non-invariant empirical distribution, but they typically rely on costly multi-step sampling. Recently, drifting models have emerged as an efficient alternative, enabling single-step generation and achie...
|
| 319 |
Matrix-Valued Optimism is Matrix-Valued Augmentation: Additive Hybrid Designs for Constrained Optimization
2605.06141
|
cs.LG
|
Jiayi Zhao |
Augmented Lagrangian and optimistic primal--dual methods stabilize equality-constrained optimization through seemingly different mechanisms: the former adds constraint-dependent primal curvature, while the latter adds dual memory. Recent work has shown that these mechanisms are equivalent for scalar parameters. We extend this equivalence to matrix-valued correction. We prove an additivity principle: for symmetric matrix parameters, the ideal primal trajectory depends only on the summed correctio...
|
| 322 |
Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization
2605.06145
|
cs.LG cs.AI eess.SY
|
Alireza Modirshanechi, Benjamin Eysenbach, Peter Dayan, Eric Schulz |
Unsupervised pretraining has driven empirical advances in goal-conditioned reinforcement learning (GCRL), but its theoretical foundations remain poorly understood. In particular, an influential class of methods, mutual information skill learning (MISL), discovers behaviorally diverse skills that can later be used for downstream goal-reaching. However, it remains a theoretical mystery why skills learned through MISL should support goal-reaching. A subtle challenge is that both GCRL and MISL are u...
|
| 324 |
AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
2605.06149
|
cs.LG cs.AI
|
Yaomin Wang, Jianting Pan, Ran Tian, Xiaoyang Li, Yu Zhang |
The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor--critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor--critic method for state-dependent discounting that learns a state-dependent discount function tog...
|
| 325 |
Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
2605.06152
|
cs.LG cs.CL math.OC stat.ML
|
Liu Hanqing, Jianjun Cao, Yuanze Li, Zijian Zhou |
Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes this to intrinsic optimization dynamics, but its triggering mechanism remains unclear. This paper proves that this phenomenon is a result of floating-point arithmetic precision limits. As training enters a high-confidence stage, the difference between the correct-class logit and the other logits may exceed the absorption-erro...
|
| 328 |
Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning
2605.06156
|
cs.LG cs.AI
|
Abdelghani Ghanem, Mounir Ghogho |
Integrating expressive generative policies, such as flow-matching models, into offline reinforcement learning (RL) allows agents to capture complex, multi-modal behaviors. While Q-learning with Adjoint Matching (QAM) stabilizes policy optimization via the continuous adjoint method, it remains inherently bound to the fixed behavior distribution. This dependence induces a \textit{popularity bias} that can suppress high-reward actions in low-density regions, and creates a \textit{support binding} t...
|
| 331 |
On Training in Imagination
2605.06732
|
cs.LG
|
Nadav Timor, Ravid Shwartz-Ziv, Micah Goldblum, Yann LeCun, David Harel |
State-of-the-art model-based reinforcement learning methods train policies on imagined rollouts. These rollouts are trajectories generated by a learned dynamics model and are scored by a learned reward model, but without querying the true environment during policy updates. We study this training paradigm by quantifying how errors in learned dynamics and reward models affect returns and policy optimization. First, we extend the analysis of Asadi et al. (2018) to MDPs with learned reward models, a...
|
| 333 |
One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning
2605.06166
|
cs.LG
|
Xinrui Chen, Liu Yang, Ou Wu |
In Large Language Model (LLM) fine-tuning, parameter and data selection are common strategies for reducing fine-tuning cost, yet they are typically driven by separate scoring mechanisms. When a parameter mask and data subset jointly determine restricted fine-tuning, this separation incurs redundant overhead and makes coordinated selection difficult. We cast parameter and data selection as two bilevel selection problems under a common validation objective and derive a shared local response-surrog...
|
| 334 |
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers
2605.06169
|
cs.LG cs.CV
|
Pengqi Lu |
Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network ...
|
| 339 |
Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA
2605.06733
|
cs.LG cs.AI
|
Jinqian Chen, Chang Liu, Jihua Zhu |
Federated LoRA enables parameter-efficient adaptation of large language models under decentralized data and limited client resources. However, directly averaging LoRA factors is representation-dependent: the same intrinsic update admits infinitely many gauge-equivalent factorizations, so factor-level aggregation can change under arbitrary coordinate choices while the underlying update remains unchanged. This reveals a semantic mismatch in existing federated LoRA aggregation rules. We propose \tex...
|
| 344 |
In-Context Black-Box Optimization with Unreliable Feedback
2605.06187
|
cs.LG cs.AI
|
Nicolas Samuel Blumer, Julien Martinelli, Samuel Kaski |
Black-box optimization in science and engineering often comes with side information: experts, simulators, pretrained predictors, or heuristics can suggest which candidates look promising. This information can accelerate search, but it can also be biased, input-dependent, or misleading. Feedback-aware BO methods typically handle one task at a time, limiting their ability to generalize over multiple sources of feedback. In-context optimizers address cross-task adaptation, but usually assume that o...
|
| 345 |
Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning
2605.06734
|
cs.LG cs.AI quant-ph
|
Kuo-Chung Peng, Samuel Yen-Chi Chen, Jiun-Cheng Jiang, Chen-Yu Liu, En-Jui Kuo |
Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov...
|
| 348 |
Constrained Contextual Bandits with Adversarial Contexts
2605.06190
|
cs.LG
|
Dhruv Sarkar, Abhishek Sinha |
We study budget-constrained contextual bandits with adversarial contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes. We focus on the continuing setting, in which the algorithm operates over the entire horizon even after the budget for cumulative cost is exhausted. In thi...
|
| 355 |
STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification
2605.06736
|
cs.LG cs.AI cs.HC
|
Unaza Tallal, Shruti Kshirsagar, Ankita Shukla |
Accurate sleep stage classification across datasets remains challenging due to variability in EEG channel montages, sampling rates, recording environments, and subject populations. Although deep learning has shown considerable promise for automated sleep staging, most existing cross-dataset methods rely on one-dimensional EEG signal representations, whereas the use of two-dimensional spectrogram-based inputs within an unsupervised domain adaptation framework has remained largely unexplored. Here...
|
| 356 |
Bandit Learning in General Open Multi-agent Systems
2605.06202
|
cs.LG stat.ML
|
Mengfan Xu |
Recent developments in digital platforms have highlighted the prevalence of open systems, where agents can arrive and depart over time. While bandit learning in open systems has recently received initial attention, existing work imposes structural assumptions that are frequently violated in practice. A learning paradigm for general open systems creates fresh challenges: newly arriving agents induce endogenous non-stationarity; agent patterns determine how quickly information accumulates; and new...
|
| 359 |
Federation of Experts: Communication Efficient Distributed Inference for Large Language Models
2605.06206
|
cs.LG
|
Muhammad Shahir Abdurrahman, Chun Deng, Azalia Mirhoseini, Philip Levis |
Mixture of experts has emerged as the primary mechanism for making Large Language Models (LLMs) computationally efficient. However, in distributed settings, communicating token embeddings between experts is a significant bottleneck. We present the novel Federation of Experts (FoE) architecture. FoE restructures the MoE block of a transformer layer into multiple MoE clusters. Each cluster is responsible for only one of the KV heads and expert parallelism is applied between those experts. Betwee...
|
| 362 |
Contrastive Identification and Generation in the Limit
2605.06211
|
cs.LG cs.AI cs.CL cs.DS
|
Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao |
In the classical identification in the limit model of Gold [1967], a stream of positive examples is presented round by round, and the learner must eventually recover the target hypothesis. Recently, Kleinberg and Mullainathan [2024] introduced generation in the limit, where the learner instead must eventually output novel elements of the target's support. Both lines of work focus on positive-only or fully labeled data. Yet many natural supervision signals are inherently relational rather than si...
|
| 363 |
Playing the network backward: A Game Theoretic Attribution Framework
2605.06212
|
cs.LG cs.CV
|
Jakob Paul Zimmermann, Jim Berend, Georg Loho, Sebastian Lapuschkin, Wojciech Samek |
Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specific rules, lack a shared framework in which to compare the underlying backward calculations. We introduce such a framework by recasting backward attribution as a two-player game on an extended network graph, building on Gaubert and Vlassopoulos' ReLU Net Game. Gradient...
|
| 368 |
AffineLens: Capturing the Continuous Piecewise Affine Functions of Neural Networks
2605.06218
|
cs.LG
|
Yi Wei, Xuan Qi, Furao Shen, Jian Zhao, Vittorio Murino |
Piecewise affine neural networks (PANNs) provide a principled geometric perspective on neural network expressivity by characterizing the input--output map as a continuous piecewise affine (CPA) function whose complexity is governed by the number, arrangement, and shapes of its affine regions. However, existing interpretability and expressivity analyses often rely on indirect proxies (e.g., activation statistics or theoretical upper bounds) and rarely offer practical, accurate tools for enumerati...
|
| 373 |
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
2605.06225
|
cs.LG cs.AI
|
Andy Zeyi Liu, Michael Zhang, Ilana Greenberg, Adam Alnasser, Lucas Baker |
Steering large language models (LLMs) is usually done by either instruction prompting or activation steering. Prompting often gives strong control, but caches guidance tokens at every layer and can clutter long interactions; activation steering is compact but typically weaker and does not support large structured reminders. We introduce memory inception (MI), a training-free method that steers in latent attention space by inserting text-derived key-value (KV) banks only at selected layers. Rathe...
|
| 376 |
Soft Deterministic Policy Gradient with Gaussian Smoothing
2605.06228
|
cs.LG cs.AI
|
Hyunjun Na, Donghwan Lee |
Deterministic policy gradient (DPG) is widely utilized for continuous control; however, it inherently relies on the differentiability of the critic with respect to the action during policy updates. This assumption is violated in practical control problems involving sparse or discrete rewards, leading to ill-defined policy gradients and unstable learning. To address these challenges, we propose a principled alternative based on a smoothed Bellman equation formulated via Gaussian smoothing. Specif...
|
| 381 |
Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks
2605.06238
|
cs.LG cs.AI
|
Guanmeng Xian, Ning Yang, Philip S. Yu |
Multimodal recommender systems exploit visual and textual signals to alleviate data sparsity, but this also makes them more vulnerable to evasion-based promotion attacks. Existing defenses are largely limited to single-modal settings and mainly focus on poisoning-based threats, leaving evasion-based threats underexplored. In this work, we first identify a cross-modal gradient mismatch under the multi-user promotion setting, where visual and textual perturbations are optimized in inconsistent dir...
|
| 382 |
When Graph Language Models Go Beyond Memorization
2605.06239
|
cs.LG
|
Masatsugu Yamada, Mahito Sugiyama |
It remains unclear whether graph language models learn structural regularities or merely memorize training graphs; this cannot be resolved by current aggregate fidelity metrics alone. We develop a calibrated diagnostic protocol that combines frequent subgraph mining, a graph-level bootstrap baseline, and three-level frequency stratification to disentangle memorization from structural alignment. Using this framework, we show that graph language models can acquire structural regularities beyond me...
|
| 383 |
Cumulative-Goodness Free-Riding in Forward-Forward Networks: Real, Repairable, but Not Accuracy-Dominant
2605.06240
|
cs.LG cs.AI
|
Amirhossein Yousefiramandi |
Forward-Forward (FF) training allows each layer to learn from a local goodness criterion. In cumulative-goodness variants, however, later layers can inherit a task that earlier layers have already partially separated. We formalize this phenomenon as layer free-riding: under the softplus FF criterion, the class-discrimination gradient reaching block $d$ decays exponentially with the positive margin accumulated by preceding blocks. We then study three local remedies -- per-block, hardness-gated, a...
|
| 386 |
Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations
2605.06246
|
cs.LG cs.RO
|
Jan-Hendrik Ewering, Kathrin Flaßkamp, Niklas Wahlström, Thomas B. Schön, Thomas Seel |
In this paper, we propose Lagrangian Gaussian Processes (LGPs) for probabilistic and data-efficient learning of dynamics via discrete forced Euler-Lagrange equations. Importantly, the geometric structure of the Lagrange-d'Alembert principle, which governs the motion of dynamical systems, is preserved by construction in the absence of external forces. This allows learning physically consistent models that overcome erroneous drift in the system's energy, thereby providing stable long-term predicti...
|
| 387 |
The Role of Node Features in Graph Pooling
2605.06250
|
cs.LG
|
Jan von Pichowski, Alžbeta Hrabošová, Ingo Scholtes, Christopher Blöcker |
Graph pooling is commonly applied in graph classification, yet its empirical gains over standard WL-1 expressive GNNs are often marginal or inconsistent. We study this gap by analysing the interaction between node features and graph topology and their effect on pooling objectives. Our analysis reveals that pooling operators require node features that are well-aligned with the graph's topology -- a condition often overlooked and not guaranteed in empirical networks. We formalise fundamental requi...
|
| 388 |
The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks
2605.06258
|
cs.LG cs.AI
|
Taehun Cha, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Donghun Lee |
Understanding how deep neural networks learn representations remains a central challenge in machine learning theory. In this work, we propose a feature-centric framework for analyzing neural network training by relating weight updates to feature evolution. We introduce a simple identity, the Feature Learning Equation, which identifies the weight Gram matrix as the key object capturing feature dynamics. This enables us to interpret gradient descent as implicitly inducing a hypothetical evolution ...
|
| 389 |
Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds
2605.06259
|
cs.LG cs.CR
|
Marten van Dijk, Murat Bilgehan Ertan |
We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $σ\geq \sqrt{3/\ln M}$, where $σ$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits ...
|
| 390 |
Beyond Rigid Alignment: Graph Federated Learning via Dual Manifold Calibration
2605.06260
|
cs.LG
|
Wentao Yu, Bo Han, Jie Yang, Chen Gong |
Graph Federated Learning (GFL) enables collaborative representation learning across distributed subgraphs while preserving privacy. However, heterogeneity remains a critical challenge, as subgraphs across clients typically differ significantly in both semantics and structures. Existing methods address heterogeneity by enforcing the rigid alignment of model parameters or prototypes between clients and the server. However, these alignments implicitly rely on a restrictive global linearity assumpti...
|
| 391 |
Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
2605.06261
|
cs.LG cs.AI
|
Eugenio Lomurno, Filippo Balzarini, Francesco Benelle, Francesca Pia Panaccione, Matteo Matteucci |
Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training time, via architectural advances, scaling, and retraining of monolithic generators. The inference-time alternative, i.e., refining the outputs of a pre-trained backbone with parameters left untouched, has remained largely unexplored for tabular synthesis. We introduce...
|
| 392 |
Can Attribution Predict Risk? From Multi-View Attribution to Planning Risk Signals in End-to-End Autonomous Driving
2605.06264
|
cs.LG
|
Le Yang, Ruoyu Chen, Haijun Liu, Jiawei Liang, ShangQuan Sun |
End-to-end autonomous driving models generate future trajectories from multi-view inputs, improving system integration but introducing opaque decisions and hard-to-localize risks. Existing methods either rely on auxiliary monitoring models or generate textual explanations, but are decoupled from the planning process and fail to reveal the visual evidence underlying trajectory generation. While attribution offers a direct alternative, planning differs from image classification by taking six-view ...
|
| 396 |
A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions
2605.06272
|
cs.LG
|
Tyler Ingebrand, Ruihan Zhao, Kushagra Gupta, David Fridovich-Keil, Sandeep P. Chinchali |
While generative modeling has achieved remarkable success on tasks like natural language-conditioned image generation, enabling model adaptation from example data points remains a relatively underexplored and challenging problem. To this end, we propose Function Projection for Flow Matching (FP-FM), an algorithm that directly conditions generation on samples from the target distribution. FP-FM learns basis functions to span the velocity fields corresponding to a set of training distributions, an...
|
| 398 |
When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy
2605.06274
|
cs.LG cs.CV
|
April Chan, Davide D'Ascenzo, Sebastiano Cultrera di Montesano |
Standard cross-entropy is the default classification loss across virtually all of machine learning, yet it treats all misclassifications equally, ignoring the semantic distances that a class hierarchy encodes. We propose Hierarchy-Aware Cross-Entropy (HACE), a drop-in replacement for standard cross-entropy that incorporates a known class hierarchy directly into the loss. HACE combines two components: prediction aggregation, which propagates the model's probability mass upward through the class h...
|
| 400 |
PACE: Prune-And-Compress Ensemble Models
2605.06278
|
cs.LG math.OC
|
Fabian Akkerman, Julien Ferry, Théo Guyard, Thibaut Vidal |
Ensemble models achieve state-of-the-art performance on prediction tasks, but usually require aggregating a large number of weak learners. This can hinder deployment, interpretability, and downstream tasks such as robustness verification. Remedies to this issue fall into two main camps: pruning, which discards redundant learners, and compression, which generates new ones from scratch. We introduce PACE, a framework that interleaves these paradigms in a two-phase strategy. First, new learners are...
|
| 403 |
INEUS: Iterative Neural Solver for High-Dimensional PIDEs
2605.06281
|
cs.LG math.NA q-fin.CP
|
Jean-Loup Dupret, Davide Gallon, Patrick Cheridito |
In this paper, we introduce INEUS, a meshfree iterative neural solver for partial integro-differential equations (PIDEs). The method replaces the explicit evaluation of nonlocal jump integrals with single-jump sampling and reformulates PIDE solving as a sequence of recursive regression problems. Like Physics-Informed Neural Networks (PINNs), INEUS learns global solutions over the entire space-time domain, yet it offers a more efficient treatment of nonlocal terms and avoids the computationally e...
|
| 410 |
Attributions All the Way Down? The Metagame of Interpretability
2605.06295
|
cs.LG cs.AI stat.ML
|
Hubert Baniecki, Przemyslaw Biecek, Fabian Fumagalli |
We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $φ(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution $\varphi_{j \to i}(f)$, by treating the attribution method itself as a cooperative game and computing its Shapley value. Theoretically, we prove that attributions hierarchically decompose into meta-at...
|
| 412 |
Region Seeding via Pre-Activation Regularization: A Geometric View from Piecewise Affine Neural Networks
2605.06300
|
cs.LG
|
Yi Wei, Xuan Qi, Furao Shen |
Deep networks with continuous piecewise affine activations induce polyhedral partitions of the input space, making the number of realized affine regions a natural measure of expressive capacity and a key determinant of how well the model can approximate nonlinear target functions. In practice, standard training realizes far fewer region refinements in data-visited neighborhoods than the architecture could in principle support, while existing region-count theory is primarily architectural and off...
|
| 413 |
Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces
2605.06303
|
cs.LG
|
Zakaria Elabid, Jan Andrzejewski, Bartosz Brzoza, Attila Cangi |
Molecular generative models often assume meaningful latent geometry, but apparent property predictability can reflect sequence-level shortcuts rather than chemical organization. We study this issue in an unsupervised autoregressive Transformer-VAE trained on SELFIES. After training, we freeze the model, fit linear probes to RDKit descriptors, and use the probe weights as candidate global steering directions. To separate chemical signal from SELFIES artifacts, we introduce a confound-aware evalua...
|
| 418 |
Perceive, Route and Modulate: Dynamic Pattern Recalibration for Time Series Forecasting
2605.06310
|
cs.LG
|
Siru Zhong, Zhao Meng, Haohuan Fu, Haoyang Li, Qingsong Wen |
Local temporal patterns in real-world time series continuously shift, rendering globally shared transformations suboptimal. Current deep forecasting models, despite their scale and complexity, rely on fixed weight matrices applied uniformly to all temporal tokens. This creates a static pattern response: models settle into a compromised average, unable to adapt to changing local dynamics. We introduce Dynamic Pattern Recalibration (DPR), a backbone-agnostic mechanism that resolves this via token-...
|
| 419 |
When Does $\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\ell_1$ Implicit Bias
2605.06314
|
cs.LG
|
Ye Su, Jian Li, Yong Liu |
Benign overfitting is well-characterized in $\ell_2$ geometries, but its behavior under the $\ell_1$ implicit bias of greedy ensembles remains challenging. The analytical barrier stems from the non-linear coupling of coordinate selection thresholds, which invalidates standard spectral resolvent tools. To isolate this algorithmic bias, we characterize the high-dimensional risk of continuous-time $\ell_2$-Boosting over $p$ features and $n$ samples. By coupling the Convex Gaussian Minimax Theorem w...
|
| 421 |
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
2605.06316
|
cs.LG cs.AI
|
Ruotong Sun, Ermin Wei |
Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the preconditioner via KL divergence minimization -- and orthogonalization of the gradient momentum, exemplified by Muon and analyzed as steepest descent under the spectral norm. The two routes are typically developed in isolation. We make a structural observation about KL-Shampoo's ...
|
| 425 |
SMolLM: Small Language Models Learn Small Molecular Grammar
2605.06322
|
cs.LG
|
Akhil Jindal, Harang Ju |
Language models for molecular design have scaled to hundreds of millions of parameters, yet how they learn chemical grammar is poorly understood. We train SMolLM, a 53K-parameter weight-shared transformer, to generate novel SMILES with 95% validity on the ZINC-250K drug-like-molecule benchmark, outperforming a standard GPT with 10 times more parameters. Mechanistically, the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as s...
|
| 429 |
Geometric Kolmogorov--Arnold Network (GeoKAN)
2605.06740
|
cs.LG cs.AI
|
Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar |
We introduce Geometric Kolmogorov--Arnold Networks (GeoKANs), a family of geometry-aware KAN-type models in which approximation is carried out in learned, geometry-adapted coordinates rather than in fixed Euclidean input coordinates. GeoKAN achieves this by learning a diagonal Riemannian metric that warps the input before basis expansion and feature mixing. The learned metric provides a geometric inductive bias through local length scaling and volume distortion, and in physics-informed settings ...
|
| 431 |
LINC: Decoupling Local Consequence Scoring from Hidden Matching in Constructive Neural Routing
2605.06332
|
cs.LG
|
Shaofeng Qin, Li Wang |
Constructive neural routing solvers usually score the next action by matching a decoder context to candidate embeddings, hiding deterministic one-step consequences such as travel, waiting, slack, and capacity changes. We propose LINC (Local Inference via Normed Comparison), a decoder-side candidate decision architecture that computes these consequences explicitly. LINC uses them according to their decision role: centered relative consequences are compared by a shared linear local scorer, while f...
|
| 434 |
Eliciting associations between clinical variables from LLMs via comparison questions across populations
2605.06335
|
cs.LG
|
Fabian Kabus, Kian Kordtomeikel, Thomas Brox, Heinz Wiendl, Daiana Stolz |
The training data of large language models (LLMs) comprises a wide range of biomedical literature, reflecting data from many different patient populations. We investigate how it might be possible to recover information on correlation and causal links between patient characteristics, as a key building block for medical decision making. To avoid the pitfalls of direct elicitation, we propose an approach based on structured comparison questions, specifically patient comparison triplet questions. Th...
|
| 438 |
A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
2605.06741
|
cs.LG
|
Zixi Li, Youzhen Li |
Learning-rate steps are usually treated as hyperparameters. This paper isolates a local belief-space calculation: when an update is modeled as a projected forward step on the probability simplex, admissibility means contractivity in the natural KL/Bregman geometry. Under this model, the upper bound of an admissible step is not a tuning slogan but a formula.
|
| 445 |
Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
2605.06350
|
cs.LG cs.AI cs.CL
|
Dylan Bouchard |
Model cascades, in which a cheap LLM defers to an expensive one on low-confidence queries, are widely used to navigate the cost-quality tradeoff at deployment. Existing approaches largely treat the deferral threshold as an empirical hyperparameter, with limited guidance on the geometry of the resulting cost-quality frontier over a model pool. We develop a decision-theoretic framework grounded in constrained optimization and duality. For a two-model cascade, we establish piecewise concavity of th...
|
| 446 |
Topological Signatures of Grokking
2605.06352
|
cs.LG cs.AI stat.ML
|
Yifan Tang, Qiquan Wang, Inés García-Redondo, Anthea Monod |
We study the grokking phenomenon through the lens of topology. Using persistent homology on point clouds derived from the embedding matrices of a range of models trained on modular arithmetic with varying primes, we identify a clear and consistent topological signature of grokking: a sharp increase in both the maximum and total persistence of first homology ($H_1$). Persistence diagrams reveal the emergence of a dominant long-lived topological feature together with increasingly structured second...
|
| 448 |
Order-Agnostic Autoregressive Modelling with Missing Data
2605.06355
|
cs.LG stat.ML
|
Ignacio Peis, Pablo M. Olmos, Jes Frellsen |
Order-Agnostic autoregressive models have demonstrated strong performance in deep generative modeling, yet their use in settings with incomplete data remains largely unexplored. In this work, we reinterpret them through the lens of missing data. First, we show that their standard training procedure on fully observed data implicitly performs imputation under a missing completely at random mechanism, resulting in robust out-of-sample imputation performance in settings with high missingness. Second...
|
| 450 |
Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations
2605.06357
|
cs.LG cs.AI cs.CV
|
Yuan Du, Mitchel Hill, HanQin Cai |
This work studies the robust evaluation of iterative stochastic purification defenses under white-box adversarial attacks. Our key technical insight is that gradient checkpointing makes exact end-to-end gradient computation through long purification trajectories practical by trading additional recomputation for substantially lower memory usage. This enables full-gradient adaptive attacks against diffusion- and Langevin-based purification defenses, where prior evaluations often resort to approxim...
|
| 452 |
Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction
2605.06361
|
cs.LG
|
Alessandro Pagani, Marco Cominelli, Liying Han, Gaofeng Dong, Sergio Benini |
This paper presents a preliminary analysis of the ability of the Chronos foundation model to process and internally represent frequency domain information. Foundation models that process time-series data offer practitioners a unified architecture capable of learning generic temporal representations across diverse tasks and domains, reducing the need for task-specific feature engineering and enabling transfer across signal modalities. Despite their growing adoption, the extent to which such models en...
|
| 453 |
Flow Matching with Arbitrary Auxiliary Paths
2605.06364
|
cs.LG cs.AI
|
Xin Peng, Ang Gao |
We introduce a new generative modeling framework, Flow Matching with Arbitrary Auxiliary Paths (AuxPath-FM), which generalizes conditional flow matching by incorporating an auxiliary variable drawn from an arbitrary distribution into the probability path. Unlike prior methods that restrict auxiliary components to Gaussian noise, AuxPath-FM allows the variable $η$ to follow any distribution, producing trajectories of the form $X_t = a(t)X_1 + b(t)X_0 + c(t)η$. We theoretically demonstrat...
|
| 455 |
Layer Collapse in Diffusion Language Models
2605.06366
|
cs.LG
|
Alexander Conzelmann, Albert Catalan-Tatjer, Shiwei Liu |
Diffusion language models (DLMs) have recently emerged as competitive alternatives to autoregressive (AR) language models, yet differences in their activation dynamics remain poorly understood. We characterize these dynamics in LLaDA-8B and identify a striking layer-collapse property: a few early layers exhibit highly similar, collapsed activation patterns dominated by a single large super-outlier persisting over a long token range. Despite its apparent redundancy, this outlier is critical: prun...
|
| 460 |
A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment
2605.06375
|
cs.LG math.ST
|
Hao Yu |
Large language model (LLM) alignment via reinforcement learning from human preferences (RLHF) suffers from unstable policy updates, ambiguous gradient directions, poor interpretability, and high gradient variance in mainstream pairwise preference learning paradigms. To systematically address these limitations, we establish a unified theoretical framework for preference-based RL optimization centered on the Pair-GRPO family, comprising two tightly coupled variants: Soft-Pair-GRPO and Hard-Pair-GR...
|
| 465 |
MinMax Recurrent Neural Cascades
2605.06384
|
cs.LG cs.AI cs.FL
|
Alessandro Ronca |
We show that the MinMax algebra provides a form of recurrence that is expressively powerful, efficiently implementable, and most importantly it is not affected by vanishing or exploding gradient. We call MinMax Recurrent Neural Cascades (RNCs) the models obtained by cascading several layers of neurons that employ such recurrence. We show that MinMax RNCs enjoy many favourable theoretical properties. First, their formal expressivity includes all regular languages, arguably the maximal expressivit...
|
| 466 |
Data-Driven Covariate Selection for Nonparametric and Cycle-Agnostic Causal Effect Estimation
2605.06385
|
cs.LG
|
Ana Leticia Garcez Vicente, Gijs van Seeventer, Saber Salehkaleybar |
Estimating causal effects from observational data requires identifying valid adjustment sets. This task is especially challenging in realistic settings where latent confounding and feedback loops are present. Existing approaches typically assume acyclicity or rely on global causal structure learning, limiting applicability and computational efficiency. In this work, we study a local, data-driven method for covariate selection based on conditional independence information. While this method is kn...
|
| 468 |
Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level
2605.06387
|
cs.LG cs.AI
|
Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang |
On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three structural weaknesses, including high variance updates, vanishing gradients in zero-advantage regions, and exploration bottlenecks when corrective signals are insufficient. We therefore propose Asymmetric On-Policy Distillat...
|
| 471 |
Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves
2605.06395
|
cs.LG cs.AI eess.SP
|
Kartik Tandon, Julian Gould, Tanishq Bhatia, Francesca Dominici, Alejandro Ribeiro |
Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with ...
|
| 472 |
SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask
2605.06402
|
cs.LG
|
Liu Hanzuo, Chaofan Lin, Weixuan Sun, Yulong Wang, Key |
Semi-structured sparsity provides a practical path to accelerate large language models (LLMs) with native hardware support, but post-training semi-structured pruning often suffers from substantial quality degradation due to strong structural coupling. Existing methods rely on large-scale sparse retraining to recover accuracy, resulting in high computational cost. We propose SparseForge, a post-training framework that improves recovery efficiency by directly optimizing the sparsity mask rather ...
|
| 474 |
FRInGe: Distribution-Space Integrated Gradients with Fisher--Rao Geometry
2605.06404
|
cs.LG
|
Gabriele Martino, Sebastian Tschiatschek |
Gradient-based attribution methods are model-faithful and scalable, but Integrated Gradients (IG) can be brittle because explanations depend on heuristic baselines, straight-line paths, discretization, and saturation. We propose Fisher--Rao Integrated Gradients (FRInGe), which defines both the reference and interpolation schedule in predictive distribution space. FRInGe replaces input baselines with a maximum-entropy predictive reference and follows a Fisher-Rao geodesic on the probability simpl...
|
| 478 |
E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology
2605.06415
|
cs.LG cs.AI cs.CL cs.CV
|
Qingjun Zhang |
We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temperature T, routing entropy weight H, oracle weight O, and balance weight B -- into a single quantity. Through 12 controlled experiments (8 vision, 4 language) totaling over 11,000 training epochs, we establish that E >= 0.5 alone is sufficient to guarantee zero dead e...
|
| 482 |
Federated Cross-Client Subgraph Pattern Detection
2605.06433
|
cs.LG
|
Selin Ceydeli, Rui Wang, Kubilay Atasu |
Subgraph pattern detection aims to uncover complex interaction structures in graphs. However, state-of-the-art graph neural network (GNN)-based solutions assume centralized access to the entire graph. When graphs are instead distributed across multiple parties, client-local GNN computations diverge from those of a centralized model, resulting in a representation-equivalence gap. We formalize this as a structural observability problem, where subgraph patterns crossing partition boundaries become ...
|
| 487 |
Hyperbolic Concept Bottleneck Models
2605.06440
|
cs.LGcs.CV
|
Daniel Uyterlinde, Swasti Shreya Mishra, Pascal Mettes |
Concept Bottleneck Models (CBMs) have become a popular approach to enable interpretability in neural networks by constraining classifier inputs to a set of human-understandable concepts. While effective, current models embed concepts in flat Euclidean space, treating them as independent, orthogonal dimensions. Concepts, however, are highly structured and organized in semantic hierarchies. To resolve this mismatch, we propose Hyperbolic Concept Bottleneck Models (HypCBM), a post-hoc framework tha...
|
| 490 |
FedFrozen: Two-Stage Federated Optimization via Attention Kernel Freezing
2605.06446
|
cs.LG
|
Junye Du, Zhenghao Li, Yushi Feng, Long Feng |
Federated learning with heterogeneous clients remains a significant challenge for deep learning, primarily due to client drift arising from inconsistent local updates. Existing federated optimization methods typically address this issue through objective-level regularization or update-correction mechanisms. Recent studies, however, suggest that Transformer-based architectures may be inherently more robust than conventional models under heterogeneous federated training. Motivated by this observat...
|
| 491 |
Scene-Adaptive Continual Learning for CSI-based Human Activity Recognition with Mixture of Experts
2605.06447
|
cs.LG
|
Wenhan Zheng, Yuyi Mao, Ivan Wang-Hei Ho |
Channel state information (CSI)-based human activity recognition (HAR) is vulnerable to performance degradation under domain shifts across varying physical environments. Continual learning (CL) offers a principled way to learn new domains sequentially while preserving past knowledge, but existing CL solutions for CSI-based HAR scale poorly with accumulating domains, rely on a large replay buffer, or incur linearly growing inference cost. In this letter, we propose Scene-Adaptive Mixture of Exper...
|
| 493 |
ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization
2605.06454
|
cs.LG cs.AI
|
Maresa Schröder, Pascal Janetzky, Michael Klar, Stefan Feuerriegel |
Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mode that was previously overlooked: even when the surrogate model and acquisition target are correctly specified, finite-sample Monte Carlo error can perturb acquisition values. This can, in turn, flip candidate rankings and lead to suboptimal BO decisions. As a remedy...
|
| 496 |
Invariant Features in Language Models: Geometric Characterization and Model Attribution
2605.06458
|
cs.LG cs.CL
|
Agnibh Dasgupta, Abdullah Tanvir, Xin Zhong |
Language models exhibit strong robustness to paraphrasing, suggesting that semantic information may be encoded through stable internal representations, yet the structure and origin of such invariance remain unclear. We propose a local geometric framework in which semantically equivalent inputs occupy structured regions in latent space, with paraphrastic variation along nuisance directions and semantic identity preserved in invariant subspaces. Building on this view, we make three contributions: ...
|
| 497 |
MINER: Mining Multimodal Internal Representation for Efficient Retrieval
2605.06460
|
cs.LG
|
Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang |
Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high serving costs. By contrast, dense single-vector retrievers retain storage and latency advantages but consistently lag in quality because they compress all information into a single fi...
|
| 498 |
Invariant-Based Diagnostics for Graph Benchmarks
2605.06462
|
cs.LG math.CO
|
Richard von Moos, Mathieu Alain, Bastian Rieck |
Progress on graph foundation models is hindered by benchmark practices that conflate the contributions of node features and graph structure, making it hard to tell whether a model actually learns from connectivity, or whether it even needs to. We propose addressing this using graph invariants, i.e., permutation-invariant, task-agnostic structural descriptors that serve as a diagnostic framework for graph benchmarks. We show that (i) invariants are more expressive than standard GNNs, (ii) invaria...
|
| 499 |
Diversity Curves for Graph Representation Learning
2605.06466
|
cs.LG
|
Katharina Limbeck, Nadja Häusermann, Martin Carrasco, Guy Wolf, Bastian Rieck |
Graph-level representations are crucial tools for characterising structural differences between graphs. However, comparing graphs with different cardinalities, even when sampled from the same underlying distribution, remains challenging. Unsupervised tasks in particular require interpretable, scalable, and reliable size-aware graph representations. Our work addresses these issues by tracking the structural diversity of a graph across coarsening levels. The resulting graph embeddings, which we de...
|
| 500 |
No Triangulation Without Representation: Generalization in Topological Deep Learning
2605.06467
|
cs.LG, math.AT
|
Johannes S. Schmidt, Martin Carrasco, Ernst Röell, Guy Wolf, Nello Blaser |
Despite an ever-increasing interest in topological deep learning models that target higher-order datasets, there is no consensus on how to evaluate such models. This is exacerbated by the fact that topological objects permit operations, such as structural refinements, that are not appropriate for graph data. In this work, we extend MANTRA, a benchmark dataset containing manifold triangulations, to a larger class of manifolds with more diverse homeomorphism types. We show that, unlike prior claim...
|
| 502 |
Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies
2605.06470
|
cs.LG
|
Magnus Victor Boock, Abdullah Akgül, Mustafa Mert Çelikok, Melih Kandemir |
We present a new operator-theoretic representation learning framework for offline reinforcement learning that recovers the directed temporal geometry of a controlled Markov process from hitting time observations. While prior art often produces symmetric distances or fails to satisfy the triangle inequality, our framework learns a Hilbert-space displacement geometry where expected hitting times are realized as linear functionals of latent displacements. We prove that this representation exists un...
|
| 503 |
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
2605.06472
|
cs.LG
|
Haoyu Zheng, Fangcheng Fu, Jia Wu, Binhang Yuan, Yongqiang Zhang |
LLM-based workflows compose specialized agents to execute complex tasks, and these agents usually share substantial context, allowing KV-Cache reuse to save computation. Existing approaches either manage KV-Cache at agent level and fail to exploit the reuse opportunities within workflows, or manage cache at the workflow level but assume that each workflow calls a static sequence of agents. However, practical workflows are typically dynamic, where the sequence of invoked agents and thus induced c...
|
| 504 |
Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching
2605.06474
|
cs.LG, cs.AI, stat.ML
|
Xiang Li, Nan Jiang |
We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation ca...
|
| 516 |
Operator-Guided Invariance Learning for Continuous Reinforcement Learning
2605.06500
|
cs.LG, cs.AI
|
Zuyuan Zhang, Fei Xu Yu, Tian Lan |
Reinforcement learning (RL) with continuous time and state/action spaces is often data-intensive and brittle under nuisance variability and shift, motivating methods that exploit value-preserving structures to stabilize and improve learning. Most existing approaches focus on special cases, such as prescribed symmetries and exact equivariance, without addressing how to discover more general structures that require nonlinear operators to transform and map between continuous state/action systems wi...
|
| 517 |
Cubit: Token Mixer with Kernel Ridge Regression
2605.06501
|
cs.LG, cs.CL
|
Chuanyang Zheng, Jiankai Sun, Yihang Gao, Yuehao Wang, Liangchen Tan |
Since its introduction in 2017, the Transformer has become one of the most widely adopted architectures in modern deep learning. Despite extensive efforts to improve positional encoding, attention mechanisms, and feed-forward networks, the core token-mixing mechanism in Transformers remains attention. In this work, we show that the attention module in Transformers can be interpreted as performing Nadaraya-Watson regression, where it computes similarities between tokens and aggregates the corresp...
|
| 518 |
Gradient Extrapolation-Based Policy Optimization
2605.06755
|
cs.LG, cs.AI
|
Ismam Nur Swapnil, Aranya Saha, Tanvir Ahmed Khan, Mohammad Ariful Haque, Ser-Nam Lim |
Reinforcement learning is widely used to improve the reasoning ability of large language models, especially when answers can be automatically checked. Standard GRPO-style training updates the model using only the current step, while full multi-step lookahead can give a better update direction but is too expensive because it needs many backward passes. We propose Gradient Extrapolation-Based Policy Optimization (GXPO), a plug-compatible policy-update rule for GRPO-style reasoning RL. GXPO approxi...
|
| 519 |
PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
2605.06505
|
cs.LG, cs.AI, cs.CR
|
Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas |
We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the...
|
| 524 |
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
2605.06510
|
cs.LG, cs.AI
|
Amir Rezaei Balef, Mykhailo Koshil, Katharina Eggensperger |
Transformer-based tabular foundation models (TFMs) dominate small to medium tabular predictive benchmark tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in 6 state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference and reveal latent-space dynamics that differ from those of language models. Our findings indicate substantial dep...
|
| 527 |
Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning
2605.06756
|
cs.LG, eess.SY
|
Umme Mahbuba Nabila, Paul Seurin, Linyu Lin, Majdi I. Radaideh |
Real-time supervisory control of thermal energy distribution systems requires digital twins that are accurate, interpretable, and uncertainty-aware, yet remain data and computationally efficient. High-fidelity simulations alone are costly, while purely data-driven surrogates often lack robustness. To address these challenges, this work proposes an active learning (AL) framework that couples system-level Modelica simulations with four simpler physics-informed and data-driven surrogate modeling ap...
|
| 528 |
Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees
2605.06519
|
cs.LG
|
Edward Tansley, Roy Makhlouf, Estelle Massart, Coralia Cartis |
Data reconstruction attacks on trained neural networks aim to recover the data on which the network has been trained and pose a significant threat to privacy, especially if the training dataset contains sensitive information. Here, we propose a unified optimization formulation of the data reconstruction problem based on initial and trained parameter values, incorporating state-of-the-art proposals. We show that in the random feature model, this formulation provably leads to training data reconst...
|
| 530 |
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
2605.06522
|
cs.LG, cs.CV
|
Xin Wang, Haibo Chen, Wenxuan Liu, Wenwu Zhu |
Foundation models (FMs) are increasingly deployed in open-world settings where distribution shift is the rule rather than the exception. The out-of-distribution (OOD) phenomena they face -- knowledge boundaries, capability ceilings, compositional shifts, and open-ended task variation -- differ in kind from the settings that have shaped prior OOD research, and are further complicated because the pretraining and post-training distributions of modern FMs are often only partially observed. Our posit...
|
| 531 |
On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR
2605.06523
|
cs.LG, cs.AI
|
Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang |
Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counterintuitive phenomenon: RLVR may exhibit implicit reward overfitting to the training dataset. Specifically, the model can achieve satisfactory performance on the test set even when it...
|
| 539 |
Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability
2605.06538
|
cs.LG
|
Matias G. Delgadino, Sebastien Motsch, Advait Parulekar, William Porteous, Sanjay Shakkottai |
Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are biased, and in low-temperature regimes their discretizations can become unstable. We characterize this bias by introducing a tractable surrogate path connecting the true posterior to a standard Gaussian and comparing it to ...
|
| 541 |
Hedging Memory Horizons for Non-Stationary Prediction via Online Aggregation
2605.06541
|
cs.LG, stat.ML
|
Yutong Wang, Yannig Goude, Qiwei Yao |
We study online prediction under distribution shift, where inputs arrive chronologically and outcomes are revealed only after prediction. In this setting, predictors must remain stable in quiet regimes yet adapt when regimes shift, and the right adaptation memory is unknown in advance. We propose MELO (Memory-hedged Exponentially Weighted Least-Squares Online aggregation), a model-agnostic method that hedges across adaptation scales: it wraps any non-anticipating base-predictor pool with exponen...
|
| 544 |
Sequential Design of Genetic Circuits Under Uncertainty With Reinforcement Learning
2605.06552
|
cs.LG
|
Michal Kobiela, Diego A. Oyarzún, Michael U. Gutmann |
The design of biological systems is hindered by uncertainty arising from both intrinsic stochasticity of biomolecular reactions and variability across laboratory or experimental conditions. In this work, we present a sequential framework to optimize genetic circuits under both forms of uncertainty. By employing simulator models based on differential equations or Markov jump processes alongside a reinforcement learning (RL) policy-based approach, our method suggests experiments that adapt to unkn...
|
| 545 |
Diverse Sampling in Diffusion Models with Marginal Preserving Particle Guidance
2605.06553
|
cs.LG
|
Gal Vinograd, Idan Achituve, Ethan Fetaya |
We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symmetries of the Fokker-Planck equation, using drift perturbations that change particle trajectories while preserving the evolving marginal distribution. We instantiate this principle through kernel-based anti-symmetric pairwise matrix fields, constructed from the repuls...
|
| 549 |
Optimal Counterfactual Search in Tree Ensembles: A Study Across Modeling and Solution Paradigms
2605.06561
|
cs.LG
|
Awa Khouna, Youssouf Emine, Julien Ferry, Thibaut Vidal |
Trust in counterfactual explanations depends critically on whether their recommended changes are truly minimal: suboptimal explanations may vastly overshoot the actual changes needed to alter a decision, and heuristic errors can affect individuals unevenly, giving some users relevant recourse while assigning others unnecessarily costly recommendations. Consequently, we study the problem of computing optimal counterfactual explanations for tree ensembles under plausibility and actionability const...
|
| 550 |
Feature Dimensionality Outweighs Model Complexity in Breast Cancer Subtype Classification Using TCGA-BRCA Gene Expression Data
2605.06562
|
cs.LG, q-bio.GN
|
Meena Al Hasani |
Accurate classification of breast cancer subtypes from gene expression data is critical for diagnosis and treatment selection. However, such datasets are characterized by high dimensionality and limited sample size, posing challenges for machine learning models. In this study, we evaluate the impact of model complexity and feature selection on subtype classification performance using TCGA-BRCA gene expression data. Logistic regression, random forest, and support vector machine (SVM) models wer...
|
| 551 |
Criticality and Saturation in Orthogonal Neural Networks
2605.06563
|
cs.LG
|
Max Guillen, Jan E. Gerken |
It has been known for a long time that initializing weight matrices to be orthogonal instead of having i.i.d. Gaussian components can improve training performance. This phenomenon can be analyzed using finite-width corrections, where the infinite-width statistics are supplemented by a power series in $1/\mathrm{width}$. In particular, recent empirical results by Day et al. show that the tensors appearing in this treatment stabilize for large depth, as opposed to the tensors of i.i.d.-initialized...
|
| 553 |
SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
2605.06570
|
cs.LG, math.OC, q-fin.CP, q-fin.MF, q-fin.RM
|
Dmitri Goloubentsev, Natalija Karpichina |
Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy O...
|
| 554 |
CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification
2605.06571
|
cs.LG, cs.CR, cs.DC, cs.NI
|
Iason Ofeidis, Nikos Papadis, Randeep Bhatia, Leandros Tassiulas, TV Lakshman |
The rapid expansion of the Internet of Things (IoT) and Industrial IoT (IIoT) has created a massive, heterogeneous attack surface that challenges traditional network security mechanisms. While Federated Learning (FL) offers a privacy-preserving alternative to centralized Intrusion Detection Systems (IDS), standard approaches struggle to generalize across diverse device behaviors and typically fail to utilize the vast amounts of unlabeled data present in realistic edge environments. To bridge the...
|
| 556 |
Directional Consistency as a Complementary Optimization Signal: The GONO Framework
2605.06575
|
cs.LG, cs.AI
|
Victor Daniel Gera |
We identify and formalize an underexplored phenomenon in deep learning optimization: directional alignment and loss convergence can be decoupled. An optimizer can exhibit near-perfect directional consistency (cc_t -> 1, measured via consecutive gradient cosine similarity) while the loss remains high or decreases slowly. This observation reveals that existing optimizers such as Adam, SGD, and RMSprop lack explicit mechanisms to exploit temporal consistency in gradient directions, relying instead ...
|
| 557 |
On the Safety of Graph Representation Learning
2605.06576
|
cs.LG
|
Xiaoguang Guo, Zehong Wang, Ziming Li, Shawn Spitzel, Soonwoo Kwon |
Graph representation learning (GRL) has evolved from topology-only graph embeddings to task-specific supervised GNNs, and more recently to reusable representations and graph foundation models (GFMs). However, existing evaluations mainly measure clean transfer, adaptation, and task coverage. It remains unclear whether GRL methods stay reliable when deployment stresses affect graph signals, graph contexts, label support, structural groups, or predictive evidence. We introduce GRL-Safety, a multi-a...
|
| 558 |
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
2605.06582
|
cs.LG, cs.CL, cs.SD
|
Adhiraj Banerjee, Vipul Arora |
Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly. We introduce PairAlign, a framework for compact au...
|
| 561 |
Distributionally-Robust Learning to Optimize
2605.06585
|
cs.LG, math.OC
|
Vinit Ranjan, Jisun Park, Bartolomeo Stellato |
We propose a distributionally robust approach to learning hyperparameters for first-order methods in convex optimization. Given a dataset of problem instances, we minimize a Wasserstein distributionally robust version of the performance estimation problem (PEP) over algorithm parameters such as step sizes. Our framework unifies two extremes: as the robustness radius vanishes, we recover classical learning to optimize (L2O); as it grows, we recover worst-case optimal algorithm design via PEP. We ...
|
| 562 |
Towards Metric-Faithful Neural Graph Matching
2605.06588
|
cs.LG, cs.AI
|
Jyotirmaya Shivottam, Subhankar Mishra |
Graph Edit Distance (GED) is a fundamental, albeit NP-hard, metric for structural graph similarity. Recent neural graph matching architectures approximate GED by first encoding graphs with a Graph Neural Network (GNN) and then applying either a graph-level regression head or a matching-based alignment module. Despite substantial architectural progress, the role of encoder geometry in neural GED estimation remains poorly understood. In this paper, we develop a theoretical framework that connects ...
|
| 564 |
BRICKS: Compositional Neural Markov Kernels for Zero-Shot Radiation-Matter Simulation
2605.06591
|
cs.LG, hep-ph
|
Richard Hildebrandt, Evangelos Kourlitis, Baran Hashemi, Manuel Bünstorf, Thierry Meyer |
We introduce a new strategy for compositional neural surrogates for radiation-matter interactions, a key task spanning domains from particle physics through nuclear and space engineering to medical physics. Exploiting the locality and the Markov nature of particle interactions, we create a \emph{next-particle prediction} kernel using hybrid discrete-continuous transformer models based on Riemannian Flow Matching on product manifolds. The model generates variable-sized typed sets of particles and...
|
| 571 |
Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization
2605.06599
|
cs.LG, eess.AS
|
Abhijit Das, Sayantan Dutta |
Weight decay is widely used as a regularizer in large language models, yet its precise role in shaping Transformer loss landscapes remains theoretically underexplored. This paper provides the first rigorous functional-analytic characterization of the standard Transformer objective--cross-entropy loss with $L^2$ regularization--by proving it satisfies Villani's criteria for coercive energy functions. Specifically, we show that the regularized loss $\mathcal{F}$ is infinitely differentiable, grows...
|
| 573 |
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
2605.06605
|
cs.LG
|
Shai Feldman, Yaniv Romano |
Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally expensive; key events -- e.g., jailbreaks or successful task completion by an agent -- often emerge only after repeated interactions. These events might be rare, and under any feasible computational budget, remain unobserved. Recent conformal survival frameworks construct reliable lower predictive bounds (LPBs) on the number of iterations to trigger th...
|
| 576 |
Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
2605.06609
|
cs.LG, stat.ML
|
Chenyang Zhang, Yuan Cao |
Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regr...
|
| 577 |
SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders
2605.06610
|
cs.LG, cs.CV
|
Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek |
Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features ...
|
| 578 |
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
2605.06611
|
cs.LG, cs.AI, stat.ML
|
Siquan Li, Kaiqi Jiang, Jiacheng Sun, Tianyang Hu |
Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a \textit{mechanistic explanation} for this phenomenon. First, we trace its root to the value aggregation process inherent in self-attention, which induces a systematic variance discrepancy. We further demonstrate that this discrepancy is drastically amplified by the activation of su...
|
| 579 |
Online Bayesian Calibration under Gradual and Abrupt System Changes
2605.06612
|
cs.LG cs.ET stat.ML
|
Yang Xu, Chiwoo Park |
Bayesian model calibration is central to digital twins and computer experiments, as it aligns model outputs with field observations by estimating calibration parameters and correcting systematic model bias. Classical Bayesian calibration introduces latent parameters and a discrepancy function to model bias, but suffers from parameter--discrepancy confounding and is typically formulated as an offline procedure under a stationary data-generating assumption. These limitations are restrictive in mod...
|
| 581 |
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
2605.06615
|
cs.LG cs.AI cs.CL math.OC
|
Hongyi Tao, Dingzhi Yu, Lijun Zhang |
Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based methods outperform vanilla SGD. The core obstacle is that under standard smoothness and finite variance conditions, SGD is known to be minimax optimal for finding stationary points measured by $\ell_2$-norms, thereby fundamen...
|
| 585 |
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
2605.06763
|
cs.LG
|
Mohsen Dehghankar, Abolfazl Asudeh |
Sparse attention improves LLM inference efficiency by selecting a subset of key-value entries, but at the cost of potential accuracy degradation. In particular, omitting critical KV entries can induce substantial errors in model outputs. Existing methods typically operate under fixed or adaptive token budgets and provide empirical robustness or partial theoretical guarantees, yet they do not ensure zero false negatives in decoding steps, particularly since the set of relevant tokens is both quer...
|
| 589 |
Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows
2605.06629
|
cs.LG
|
Prateek Paudel, Nitin Jha, Abhishek Parakh, Mahadevan Subramaniam |
Classical generative adversarial networks (GANs) have been applied to generate adversarial network traffic capable of attacking intrusion detection systems, but they suffer from shortcomings such as the need for large amounts of high-dimensional datasets, mode collapse, and high computational overhead. In this work, we propose a hybrid quantum-classical GAN (QC-GAN) framework where a variational quantum generator is used to generate synthetic network traffic flows mimicking malicious traffic usi...
|
| 591 |
Crafting Reversible SFT Behaviors in Large Language Models
2605.06632
|
cs.LG
|
Yuping Lin, Pengfei He, Yue Xing, Yingqian Cui, Jiayuan Ding |
Supervised fine-tuning (SFT) induces new behaviors in large language models, yet imposes no structural constraint on how these behaviors are distributed within the model. Existing behavior interpretation methods, such as circuit attribution approaches, identify sparse subnetworks correlated with SFT-induced behaviors post-hoc. However, such correlations do not imply *causal necessity*, limiting the ability to selectively control SFT-induced behaviors at inference time. We pursue an alternative b...
|
| 595 |
Recursive Agent Optimization
2605.06639
|
cs.LG cs.AI cs.CL cs.MA
|
Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig |
We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents ...
|
| 596 |
Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
2605.06640
|
cs.LG cs.AI
|
Ronaldo Canizales, Divya Gopinath, Corina Păsăreanu, Ravi Mangal |
*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set ...
|
| 600 |
Edge-specific signal propagation on mature chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction
2605.06644
|
cs.LG
|
Yuchen Xiong, Swee Keong Yeap, Steven Aw Yoong Kit |
Fluorescent protein quantum yield (QY) is governed by the mature chromophore and its three-dimensional microenvironment rather than sequence identity alone. Protein language models and emission-band averages capture global trends, but do not model how local physical signals act on specific chromophore regions. We present a chromophore-centred mechanism graph algorithm for QY prediction. Each PDB structure is converted into a typed 3D residue graph, registered to a mature-CRO state, partitioned...
|
| cs.MA 4 papers | ||||
| 86 |
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
2605.05703
|
cs.MA cs.AI cs.LG
|
Huchen Yang, Xinghao Dong, Dan Negrut, Jin-Long Wu |
Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled training tasks. However, tasks may differ substantially in difficulty and domain, and thus they are not equally informative for updating communication structure, making optimization under limited training budgets often unstable and highly sensitive to the particular training ...
|
| 99 |
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
2605.05724
|
cs.MA cs.AI
|
Jingjie Ning, Xiaochuan Li, Ji Zeng, Hao Kang, Chenyan Xiong |
We study auto research as a closed empirical loop driven by external measurement. Each submitted trial carries a hypothesis, an executable code edit, an evaluator-owned outcome, and feedback that shapes the next proposal. The output is not a generated paper or a single model checkpoint, but an auditable trajectory of proposals, code diffs, experiments, scores, and failure labels. We instantiate this loop with specialist agents that partition recipe surfaces and share measured lineage across tria...
|
| 424 |
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs
2605.06320
|
cs.MA cs.AI cs.CL
|
Elizabeth Mieczkowski, Alexander Ku, Tiwalayo Eisape, Dilip Arumugam, John Matters |
Large language models (LLMs) are increasingly deployed in teams, yet existing coordination approaches often occupy two extremes. Highly structured methods rely on fixed roles, pipelines, or task decompositions assigned a priori. In contrast, fully unstructured teams enable adaptability and exploration but suffer from inefficiencies such as error propagation, inter-agent conflicts, and wasted resources (measured in time, tokens, or file operations). We introduce Language Agent Teams for Task Evol...
|
| 548 |
Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
2605.06557
|
cs.MA cs.AI cs.LG
|
Maria Ana Cardei, Matthew Landers, Afsaneh Doryab |
Cooperative multi-agent reinforcement learning (MARL) benchmarks commonly emphasize aggregate outcomes such as return, success rate, or completion time. While essential, these metrics often fail to reveal how agents coordinate, particularly in settings where agents, tasks, and joint assignment choices scale combinatorially. We propose a coordination-aware evaluation perspective that supplements return with process-level diagnostics. We instantiate this perspective using STAT, a controlled commit...
|
| cs.MM 1 paper | ||||
| 385 |
Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition
2605.06245
|
cs.MM
|
Yan Zhuang, Minhao Liu, Yanru Zhang, Jiawen Deng, Fuji Ren |
Multimodal Emotion Recognition (MER) has attracted growing attention with the rapid advancement of human-computer interaction. However, different modalities exhibit substantial discrepancies in semantics, quality, and availability, leading to highly heterogeneous modality combinations and posing significant challenges to achieving consistent and reliable emotion understanding. To address this challenge, we propose the Modality-Aware Contrastive and Uncertainty-Regularized (MCUR) framework, which...
|
| cs.NE 1 paper | ||||
| 439 |
CoupleEvo: Evolving Heuristics for Coupled Optimization Problems Using Large Language Models
2605.06341
|
cs.NE cs.AI math.OC
|
Thomas Bömer, Bastian Amberg, Max Disselnmeyer, Anne Meyer |
Many real-world optimization problems consist of multiple tightly coupled subproblems whose solutions must be coordinated to achieve high overall performance. However, existing large language model driven automated heuristic design approaches are limited to single-problem settings. In this paper, we propose CoupleEvo. CoupleEvo proposes three evolutionary coordination strategies to evolve heuristics for coupled optimization problems: the sequential strategy evolves heuristics for one subproblem ...
|
| cs.PF 1 paper | ||||
| 82 |
When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
2605.05699
|
cs.PF cs.AI
|
Mohamed Amine Bergach |
KV-cache quantization is framed as a quality--latency trade-off. We show it is *inverted* on Apple Silicon's unified memory: a single fused Metal kernel (sign-randomized FFT $+$ per-channel $λ$ $+$ per-group abs-max $+$ int4 nibble pack), exposed as a HuggingFace `Cache` subclass, runs *faster than fp16* across $256$--$4096$-token prefixes on Gemma-3 1B ($-3$ to $-8\%$ ms/tok) and at short context on Qwen2.5-1.5B ($-0.7$ to $-2.6\%$ through $1$K), with $3\times$ persistent memor...
|
| cs.RO 4 papers | ||||
| 119 |
MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation
2605.05756
|
cs.RO cs.CV
|
Hao Wang, Shiqi Wang, Qi Liu |
Generating realistic 3D Human-Object Interactions (HOI) is a fundamental task for applications ranging from embodied AI to virtual content creation, which requires harmonizing high-level semantic intent with strict low-level physical constraints. Existing methods excel at semantic alignment; however, they struggle to maintain precise object contact. We reveal a key finding termed *Geometric Forgetting*: as diffusion model depth increases, semantic features tend to overshadow object geometr...
|
| 371 |
When to Trust Imagination: Adaptive Action Execution for World Action Models
2605.06222
|
cs.RO cs.AI
|
Rui Wang, Yue Zhang, Jiehong Lin, Kuncheng Luo, Jianan Wang |
World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined future remains consistent with the actual physical rollout. In this work, we formulate adaptive WAM execution as a future-reality verification problem: the robot should execute long...
|
| 566 |
ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting
2605.06593
|
cs.RO cs.GR cs.LG
|
David Müller, Agon Serifi, Sammy Christen, Ruben Grandia, Espen Knoop |
Retargeting human kinematic reference motion onto a robot's morphology remains a formidable challenge. Existing methods often produce physical inconsistencies, such as foot sliding, self-collisions, or dynamically infeasible motions, which hinder downstream imitation learning. We propose a bilevel optimization framework that jointly adapts reference motions to a robot's morphology while training a tracking policy using reinforcement learning. To make the optimization tractable, we derive an appr...
|
| 568 |
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
2605.06595
|
cs.RO cs.AI cs.LG cs.MA
|
Shuo Liu, Xinzichen Li, Christopher Amato |
Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of...
|
| cs.SD 4 papers | ||||
| 39 |
X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning
2605.05611
|
cs.SD cs.AI eess.AS
|
Rixi Xu, Qingyu Liu, Haitao Li, Yushen Chen, Zhikang Niu |
In this paper, we present X-Voice, a 0.4B multilingual zero-shot voice cloning model that clones arbitrary voices and enables everyone to speak 30 languages. X-Voice is trained on a 420K-hour multilingual corpus using the International Phonetic Alphabet (IPA) as a unified representation. To eliminate the reliance on prompt text without complex preprocessing like forced alignment, we design a two-stage training paradigm. In Stage 1, we establish X-Voice$_{\text{s1}}$ through standard conditional ...
|
| 229 |
Do Melody and Rhythm Coevolve?
2605.05982
|
cs.SD
|
Harin Lee, Rainer Polak, Manuel Anglada-Tort, Marc Schönwiesner, Minsu Park |
Music comprises two core structural components, melody and rhythm, that vary widely across cultures. Whether these components coevolve in a coupled way or follow independent trajectories remains unclear. We introduce a novel computational pipeline to extract vocal melodic pitch-interval and percussive inter-onset timing distributions from 27,628 popular songs across 59 countries, enabling large-scale cross-cultural comparison that bypasses traditional music annotations. Musical similarities betw...
|
| 255 |
Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features
2605.06035
|
cs.SD cs.AI
|
Lisan Al Amin, Rakib Hossain, Mahbubul Islam, Faisal Quader, Thanh Thi Nguyen |
Quantum machine learning has emerged as a promising tool for pattern recognition, yet many audio-focused approaches still treat spectrograms as generic images and do not explicitly exploit their time-frequency structure. We propose Q-Patch, a quantum feature map tailored to audio that encodes local time-frequency patches from mel-spectrograms into quantum states using shallow, hardware-efficient circuits with adjacency-aware entanglement. Each selected patch is summarized by a compact four-dimen...
|
| 587 |
PianoCoRe: Combined and Refined Piano MIDI Dataset
2605.06627
|
cs.SD cs.LG
|
Ilya Borovik |
Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763...
|
| cs.SE 8 papers | ||||
| 54 |
Agentic Coding Needs Proactivity, Not Just Autonomy
2605.06717
|
cs.SE cs.AI
|
Nghi D. Q. Bui, Georgios Evangelopoulos |
Coding agents are rapidly changing the landscape of software development, moving from inline completion to autonomous systems that edit repositories, open pull requests, respond to issues, and run scheduled or webhook triggered routines across the development life cycle. The next generation is increasingly described as proactive and long-horizon: agents should notice relevant changes before the developer asks, connect signals across tools, decide when to interrupt, and carry preferences across s...
|
| 83 |
An Empirical Study of Proactive Coding Assistants in Real-World Software Development
2605.05700
|
cs.SE cs.AI
|
Lehui Li, Ruixuan Jia, Guo-Ye Yang, Jia Li |
Large language model (LLM)-based coding assistants have made substantial progress, yet most systems remain reactive, requiring developers to explicitly formulate their needs. Proactive coding assistants aim to infer latent developer intent from integrated development environment (IDE) interactions and repository context, thereby reducing interaction overhead and supporting more seamless assistance. However, research in this direction is limited by the scarcity of large-scale real-world developer...
|
| 302 |
Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs
2605.06111
|
cs.SE cs.AI
|
Yujia Chen, Yang Ye, Xiao Chu, Yuchi Ma, Cuiyun Gao |
Reinforcement learning (RL) with verifiable rewards has proven effective at post-training LLMs for coding, yet deploying separate task-specific specialists incurs costs that scale with the number of tasks, motivating a unified multi-task RL (MTRL) approach. However, existing MTRL methods treat all coding tasks uniformly, relying on fixed data curricula under a shared optimization strategy, ultimately limiting the effectiveness of multi-task training. To address these limitations, we propose ASTO...
|
| 315 |
BUILD-AND-FIND: An Effort-Aware Protocol for Evaluating Agent-Managed Codebases
2605.06136
|
cs.SE cs.AI
|
Jhen-Ke Lin |
Most coding-agent benchmarks ask whether generated code behaves correctly. That remains essential, but repository-level engineering is increasingly agent-managed: one agent writes a repository, and later agents inspect, audit, or extend it as working context. In that setting, a generated repository is not only an answer to a task but also a communication artifact for future work. Even when strong agents nearly satisfy the visible behavioral objective, repositories can differ in how clearly they ...
|
| 342 |
Teaching LLMs Program Semantics via Symbolic Execution Traces
2605.06184
|
cs.SE cs.LG cs.PL
|
Jonas Bayer, Stefan Zetzsche, Olivier Bouissou, Remi Delmas, Michael Tautschnig |
We introduce an evaluation framework of 500 C verification tasks across five property types (memory safety, overflow, termination, reachability, data races) built on SV-COMP 2025, and evaluate 14 models across six families. We find that high overall accuracy masks a critical weakness: while most models reliably confirm properties hold, violation detection varies widely and degrades sharply with program length. To close this gap, we train on formal verification artifacts: running the Soteria symb...
|
| 357 |
A Self-Healing Framework for Reliable LLM-Based Autonomous Agents
2605.06737
|
cs.SE cs.AI
|
Cheonsu Jeong, Younggun Shin |
Autonomous agents based on Large Language Models (LLMs) are increasingly being utilized in complex software systems. However, reliability remains a significant challenge due to unpredictable failures such as hallucinations, execution errors, and inconsistent reasoning. This paper proposes a reliability-aware self-healing framework for LLM-based software agents. The framework integrates failure detection, reliability assessment, and automated recovery mechanisms. First, we define a taxonomy of fa...
|
| 401 |
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
2605.06279
|
cs.SE cs.AI
|
Chengjie Wang, Jingzheng Wu, Xiang Ling, Tianyue Luo, Chen Zhao |
Large language models (LLMs) are now largely involved in software development workflows, and the code they generate routinely includes third-party library (TPL) imports annotated with specific version identifiers. These version choices can carry security and compatibility risks, yet they have not been systematically studied. We present the first large-scale measurement study of version-level risk in LLM-generated Python code, evaluating 10 LLMs on PinTrace, a curated benchmark of 1,000 Stack O...
|
| 489 |
Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
2605.06445
|
cs.SE cs.AI
|
Francesco Dente, Dario Satriani, Paolo Papotti |
Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural patterns, databases, and object-relational mappings. Existing benchmarks often overlook these non-functional requirements, rewarding functionally correct but structurally arbitrary solutions. We present a systematic study evaluating how well agents handle structural ...
|
| econ.EM 1 paper | ||||
| 467 |
Covariate Balancing and Riesz Regression Should Be Guided by the Neyman Orthogonal Score in Debiased Machine Learning
2605.06386
|
econ.EM cs.LG math.ST stat.ME stat.ML
|
Masahiro Kato |
This position paper argues that, in debiased machine learning, balancing functions should be derived from the Neyman orthogonal score, not chosen only as functions of covariates. Covariate balancing is effective when the regression error entering the score can be represented by functions of covariates alone, and it is the natural finite-dimensional approximation for targets such as ATT counterfactual means. For ATE estimation under treatment effect heterogeneity, however, the score error general...
|
| eess.AS 5 papers | ||||
| 15 |
Optimal Transport Audio Distance with Learned Riemannian Ground Metrics
2605.05554
|
eess.AS cs.SD
|
Wonwoo Jeong |
In audio generation evaluation, Fréchet Audio Distance (FAD) is a 2-Wasserstein distance with structural constraints for both primitives: the cost is a frozen embedding pullback whose invariance set hides severe artifacts, and the coupling is a Gaussian fit that dilutes rank-1 contamination relative to discrete OT. We propose Optimal Transport Audio Distance (OTAD), which corrects each primitive with one dedicated mechanism -- a residual Riemannian ground-metric adapter for the cost and entropic...
|
| 300 |
NDF+: Joint Neural Directional Filtering and Diffuse Sound Extraction
2605.06108
|
eess.AS
|
Weilong Huang, Le Nhat Tam Huynh, Oliver Thiergart, Emanuël A. P. Habets |
Recently, neural directional filtering (NDF) has been introduced as a flexible approach for reconstructing a virtual directional microphone (VDM) with a desired directivity pattern for spatial sound capture. Building on this idea, we propose NDF+, which enables joint neural directional filtering and diffuse sound extraction. NDF+ reformulates VDM estimation into two coupled subtasks: dereverberated VDM reconstruction and diffuse sound extraction. This reformulation enables NDF+ to manipulate dif...
|
| 347 |
Predictive-Generative Drift Decomposition for Speech Enhancement and Separation
2605.06189
|
eess.AS cs.LG
|
Julius Richter, Yoshiki Masuyama, Christoph Boeddeker, Takahiro Edo, Gordon Wichern |
We propose a plug-and-play framework for speech enhancement and separation that augments predictive methods with a generative speech prior. Our approach, termed Stochastic Interpolant Prior for Speech (SIPS), builds on stochastic interpolants and leverages their flexibility to bridge predictive and generative modeling. Specifically, we decompose the interpolation dynamics into a task-specific drift and a stochastic denoising component, allowing a predictive estimate to be integrated directly int...
|
| 475 |
WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling
2605.06407
|
eess.AS cs.AI cs.CL
|
Guanrou Yang, Tian Tan, Qian Chen, Zhikang Niu, Yakun Song |
Integrating speech understanding and generation is a pivotal step toward building unified speech models. However, the different representations required for these two tasks currently pose significant compatibility challenges. Typically, semantics-oriented features are learned from self-supervised learning (SSL), and acoustic-oriented features from reconstruction. Such fragmented representations hinder the realization of truly unified speech systems. We present WavCube, a compact continuous laten...
|
| 590 |
Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models
2605.06631
|
eess.AS
|
Amir Ivry |
Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We study answer-preserving audio compression, judging a compressor by the excess answer-error it induces, especially for the worst-affected family. We formulate this theoretically as ...
|
| eess.IV 2 papers | ||||
| 536 |
Histogramless Time-Domain Sketched Fluorescence Lifetime Imaging
2605.06532
|
eess.IV
|
Zhenya Zang, Istvan Gyongy, Mike Davies |
We present a statistics-aware compression strategy that processes photon timestamps directly from time-correlated single-photon counting (TCSPC) modules for time-domain fluorescence lifetime imaging (FLIM). Rather than storing or transmitting the full histogram per pixel, timestamps are projected onto sparse, non-uniform one-dimensional spline sketches, with knot positions optimally allocated based on Fisher information. This knot allocation concentrates sketch channels where the decay signal ex...
|
| 588 |
LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
2605.06628
|
eess.IV cs.LG cs.MM eess.AS eess.SP
|
Dan Jacobellis, Neeraja J. Yadwadkar |
Modern sensors generate rich, high-fidelity data, yet applications operating on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality but are designed for human perception, limiting their applicability to machine-perception tasks and non-traditional modalities such as spatial audio arrays, hyperspectral images, and 3D medical images. General-purpose compre...
|
| eess.SP 2 papers | ||||
| 297 |
CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision
2605.06100
|
eess.SP cs.AI cs.LG cs.RO
|
Liang Qian, Penggao Yan, Penghui Xu, Li-Ta Hsu |
Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods already learn measurement weighting through the solver, but they still use position-only objectives. As a result, the mean estimate may improve while the reported covariance remains too small, too large, or wrong in shape. In this work, we propose CredibleDFG...
|
| 451 |
The frame-level leakage trap: rethinking evaluation protocols for intrinsic image decomposition, with source-separable uncertainty as a case study
2605.06359
|
eess.SP cs.CV
|
Jihwan Woo |
Evaluation protocols for learned intrinsic image decomposition on MPI Sintel have been inconsistent. Several prior works split the dataset by frames, which allows spatially similar frames of the same scene to appear in both train and test partitions. We quantify this leakage effect for the first time, across three architectures: a frame-level split inflates test R_PSNR by 1.6 to 2.0 dB (p less than 0.01 for all three, paired t-test across 3 seeds) relative to a scene-level split, confirming an a...
|
| hep-lat 1 paper | ||||
| 314 |
Diffusion model for SU(N) gauge theories
2605.06134
|
hep-lat cs.LG
|
Javad Komijani, Marina K. Marinkovic, Lara Turgut |
Implicit score matching provides a computationally efficient approach for training diffusion models and generating high-quality samples from complex distributions. In this work, we develop a score-matching framework for SU(N) lattice gauge theories, which can be extended to other Lie groups. We apply the method to SU(3) gauge configurations with the Wilson gauge action in two and four dimensions and assess the quality of the generated samples by comparison with Hybrid Monte Carlo (HMC) simulatio...
|
| math.NA 1 paper | ||||
| 88 |
Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature
2605.05705
|
math.NA cs.LG math.PR stat.ML
|
Satoshi Hayakawa |
Kernel quadrature can exploit RKHS spectral structure and outperform Monte Carlo on smooth integrands, but optimized quadrature weights are generally signed and may be numerically unstable. We study whether spectral acceleration remains possible when the weights are constrained to be positive, i.e., simplex weights. In the exact-target fixed-pool setting, an evaluated i.i.d. candidate pool of size $N$ is already available and the task is to reweight it so as to approximate the kernel mean embedd...
|
| math.OC 3 papers | ||||
| 22 |
Stability of the Monge Map in Semi-Dual Optimal Transport
2605.05569
|
math.OC cs.LG
|
Anton Selitskiy, David Millard |
This paper shows that the semi-dual formulation of the optimal transport problem has a degenerate saddle-point structure, and that its numerical solution is equivalent to solving a constrained optimization problem. We derive necessary and sufficient conditions for the convergence of Monge maps without requiring optimality of the dual potential. This analysis helps explain why, in practice, numerical algorithms often require more iterations to update the transport map than the potential.
|
| 501 |
Dynamic Controlled Variables Based Dynamic Self-Optimizing Control
2605.06469
|
math.OC cs.LG eess.SY
|
Chenchen Zhou, Shaoqi Wang, Hongxin Su, Xinhui Tang, Yi Cao |
Self-optimizing control is a strategy for selecting controlled variables, where the economic objective guides the selection and design of controlled variables, with the expectation that maintaining the controlled variables at constant values can achieve optimization effects, translating the process optimization problem into a process control problem. Currently, self-optimizing control is widely applied to steady-state optimization problems. However, the development of process systems exhibits a ...
|
| 526 |
Learning to Cut: Reinforcement Learning for Benders Decomposition
2605.06516
|
math.OC cs.AI
|
Haochen Cai, Xian Yu |
Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algo...
|
| math.ST 2 papers | ||||
| 126 |
Optimal Confidence Band for Kernel Gradient Flow Estimator
2605.05768
|
math.ST cs.LG stat.ML
|
Yuqian Cheng, Zhuo Chen, Qian Lin |
In this paper, we investigate the supremum-norm generalization error and the uniform inference for a specific class of kernel regression methods, namely the kernel gradient flows. Under the widely adopted capacity-source condition framework in the kernel regression literature, we first establish convergence rates for the supremum norm generalization error of both continuous and discrete kernel gradient flows under the source condition $s>α_0$, where $α_0\in(0,1)$ denotes the embedding index of t...
|
| 292 |
Time-Inhomogeneous Preconditioned Langevin Dynamics
2605.06091
|
math.ST cs.LG math.PR stat.CO
|
Alexander Falk, Laurenz Nagler, Andreas Habring, Thomas Pock |
Langevin sampling from distributions of the form $p(x) \propto \exp(-Ψ(x))$ faces two major challenges: (global) mode coverage and (local) mode exploration. The first challenge is particularly relevant for multi-modal distributions with disjoint modes, whereas the second arises when the potential $Ψ$ exhibits diverse and ill-conditioned local mode geometry. To address these challenges, a common approach is to precondition Langevin dynamics with problem-specific information, such as the sample co...
|
| physics.chem-ph 1 paper | ||||
| 366 |
FunctionalAgent: Towards end-to-end on-top functional design
2605.06215
|
physics.chem-ph cs.AI
|
Yuhao Chen, Donald G. Truhlar, Xiao He |
Multiconfiguration pair-density functional theory (MC-PDFT) offers an efficient and accurate framework for computing electronic energies in strongly correlated molecular systems, with the quality of the on-top functional being a key determinant of its predictive accuracy. Here we introduce FunctionalAgent, an agentic system for fully automated functional development. FunctionalAgent orchestrates a team of specialized sub-agents to decompose the development process into dataset construction, acti...
|
| physics.flu-dyn 1 paper | ||||
| 574 |
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
2605.06607
|
physics.flu-dyn cs.AI
|
Nithin Somasekharan, Rabi Pathak, Manushri Dhanakoti, Tingwen Zhang, Ling Yue |
Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge,...
|
| q-bio.GN 2 papers | ||||
| 258 |
OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
2605.06728
|
q-bio.GN cs.AI q-bio.CB
|
Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński, Remigiusz Kinas, Przemysław Pietrzak |
Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language without direct access to quantitative omics measurements. We introduce OmicsLM, a multimodal LLM that connects quantitative omics profiles with natural-language biological tasks. OmicsLM represents each transcriptomic profile as a compact continuous representation...
|
| 582 |
A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine
2605.06762
|
q-bio.GN cs.AI
|
Yibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran, Ambika Chandra, Azlan Zahid |
Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide poly...
|
| quant-ph 3 papers | ||||
| 43 |
Quantum Kernels for Parity-Structured Classification: A Hybrid Pipeline
2605.05625
|
quant-ph cs.LG
|
Tushar Pandey |
Parity (XOR) classification requires detecting discrete, high-order feature interactions that smooth classical kernels cannot efficiently capture. We study how quantum kernel advantage depends on parity complexity, the number of features entering the XOR rule, and find a clear threshold behavior. We pair a ZZ quantum feature map with binary {0, pi} encoding (features median thresholded before circuit input) to expose parity structure. A binary encoding ablation, RBF SVM trained on the identical ...
|
| 195 |
Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters
2605.05914
|
quant-ph cs.AI cs.LG
|
Borja Aizpurua, Sukhbinder Singh, Augustine Kshetrimayum, Saeed S. Jahromi, Roman Orus |
Large language models (LLMs) have transformed artificial intelligence, yet classical architectures impose a fundamental constraint: every trainable parameter demands classical memory that scales unfavourably with model size. Quantum computing offers a qualitatively different pathway, but practical demonstrations on real hardware have remained elusive for models of practical relevance. Here we show that Cayley-parameterised unitary adapters -- quantum circuit blocks inserted into the frozen proje...
|
| 207 |
Architecture Shape Governs QNN Trainability: Jacobian Null Space Growth and Parameter Efficiency
2605.05942
|
quant-ph cs.LG
|
Michael Poppel, David Bucher, Maximilian Zorn, Markus Baumann, Sebastian Wölckert |
Variational quantum circuits with angle encoding implement truncated Fourier series, and architectures arranging $N$ qubits with $L$ encoding layers each -- sharing encoding budget $E = NL$ -- generate identical frequency spectra, identical frequency redundancy, and require the same minimum parameter count for coefficient control. Despite this equivalence, trainability varies substantially with architecture shape $(N,L)$ at fixed $E$. We identify structural rank deficiency of the coefficient mat...
|
| stat.AP 1 paper | ||||
| 272 |
Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models
2605.06059
|
stat.AP cs.LG
|
Jose Benitez-Aurioles, Ricardo Silva, Brian McMillan, Matthew Sperrin |
In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity may also determine testing frequency. Such heterogeneous detection rates across a population induce label error. This causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct for such bias in prediction models due to differential diagnostic delay. We use ...
|
| stat.ME 3 papers | ||||
| 406 |
A Topological Sorting Criterion for Random Causal Directed Acyclic Graphs
2605.06288
|
stat.ME cs.AI
|
Alexander G. Reisach, Antoine Chambaz, Gilles Blanchard, Sebastian Weichwald |
Random directed acyclic graphs (DAGs) based on imposing an order on Erdős-Rényi and scale free random graphs are widely used for evaluating causal discovery algorithms. We show that in such DAGs, the set of nodes reachable via open paths, termed relatives, increases monotonically along the causal order. We assess the prevalence of this pattern numerically, and demonstrate that it can be exploited for causal order recovery via sorting by the estimated number of relatives. We note that many simula...
|
| 492 |
A Statistical Framework for Algorithmic Collective Action with Multiple Collectives
2605.06749
|
stat.ME cs.AI
|
Claudio Battiloro, Pietro Greiner, Dario Rancati, Bret Nestor, Oumaima Amezgar |
As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature ...
|
| 511 |
Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts
2605.06484
|
stat.ME cs.LG stat.ML
|
Steven Wilkins-Reeves, Alexandra N. M. Darmon, Deeksha Sinha |
In many scientific domains, including experimentation, researchers rely on measurements of proxy outcomes to achieve faster and more frequent reads, especially when the primary outcome of interest is challenging to measure directly. While proxies offer a more readily accessible observation for inference, the ultimate goal is to draw statistical inferences about the primary outcome parameter and proxy data are typically imperfect in some ways. To correct for these imperfections, current statistic...
|
| stat.ML 26 papers | ||||
| 21 |
Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale
2605.05568
|
stat.ML cs.LG
|
Sunmin Oh, Sang-Yun Oh, Gunwoong Park |
Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill i...
|
| 31 |
In-Context Positive-Unlabeled Learning
2605.05591
|
stat.ML cs.LG stat.CO
|
Siyan Liu, Yi Chang, Manli Cheng, Qinglong Tian, Pengfei Li |
Positive-unlabeled (PU) learning addresses binary classification when only a set of labeled positives is available alongside a pool of unlabeled samples drawn from a mixture of positives and negatives. Existing PU methods typically require dataset-specific training or iterative optimization, which limits their applicability when many tasks must be solved quickly or with little tuning. We introduce PUICL, a pretrained transformer that solves PU classification entirely through in-context learning....
|
| 37 |
Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows
2605.05606
|
stat.ML cs.LG math.PR
|
Yu Wang, Arnab Ganguly |
Stochastic differential equations (SDEs) provide a flexible framework for modeling temporal dynamics in partially observed systems. A central task is to calibrate such models from data, which requires inferring latent trajectories and parameters from sparse, noisy observations. Classical smoothing methods for this problem are often limited by path degeneracy and poor scalability. In this work, we developed a novel method based on characterization of the posterior SDE in terms of conditional back...
|
| 46 |
Spherical Flows for Sampling Categorical Data
2605.05629
|
stat.ML cs.CL cs.LG
|
Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl |
We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity eq...
|
| 71 |
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
2605.05683
|
stat.ML cs.LG
|
Andy Zeyi Liu, Elliot Paquette, John Sous |
Training loss and throughput can hide distinct internal representation in language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we introduce an empirical protocol based on activation covariance and per-sample gradient SVD spectra. This dual-view reveals three empirical findings and one mechanistic explanation. First, batch size ac...
|
| 111 |
Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring and FFCI Testing in Mixed Data
2605.05743
|
stat.ML cs.AI cs.LG
|
Joseph D. Ramsey |
Gaussian process marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present two complementary RFF-based methods forming a practical toolkit for score-based, constraint-based, and hybrid causal discovery. The Fourier Feature Marginal Likelihood (FFML) score approximates the exact GP marginal likelihood by replacing the n x n kernel Gram matrix with a finite-dimensional featu...
|
| 118 |
Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement
2605.05755
|
stat.ML cs.AI cs.LG
|
Haodong Liang, Lifeng Lai |
We investigate the ability of transformers to perform in-context reinforcement learning (ICRL), where a model must infer and execute learning algorithms from trajectory data without parameter updates. We show that a linear self-attention transformer block can provably implement policy-improvement methods, including semi-gradient SARSA and actor-critic, via explicit parameter constructions. Beyond existence, we design a teacher-mimicking training procedure, analyze its gradient-flow dynamics, and...
|
| 143 |
Ratio-based Loss Functions
2605.05808
|
stat.ML cs.LG math.ST
|
Lena Helgerth, Andreas Christmann |
Algorithms in machine learning and AI do critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, margin-based loss functions for classification tasks depending ...
|
| 174 |
CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency
2605.05873
|
stat.ML cs.AI cs.LG math.ST stat.ME
|
Hirofumi Ota, Naoto Iwase, Yuki Ichihara, Junpei Komiyama, Masaaki Imaizumi |
Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains a challenging task. In particular, deciding when to stop sampling remains difficult when the stopping rule is data-dependent and the set of possible answers is not known in advance. We study anytime-valid certification of a prespecified target answer as the unique mode of the model's response distribution, a guarantee distinct f...
|
| 177 |
Tuning Derivatives for Causal Fairness in Machine Learning
2605.05882
|
stat.ML cs.AI cs.CY cs.LG
|
Filip Edström, Guilherme W. F. Barros, Tetiana Gorbach, Xavier de Luna |
Artificial-intelligence systems are becoming ubiquitous in society, yet their predictions typically inherit biases with respect to protected attributes such as race, gender, or age. Classical fairness notions, most notably Statistical Parity (SP), demand that predictions be independent of the protected attributes, but are overly restrictive when these attributes influence mediating variables that are considered business necessities. Recent causal formulations relax SP by distinguishing allowed f...
|
| 223 |
Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
2605.05973
|
stat.ML cs.AI cs.LG stat.AP
|
Yang Xu, Jiefu Zhang, Haixiang Sun, Zihan Zhou, Tianyu Cao |
Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner's score need not estimate the fresh-data performance of the full tune-then-deploy procedure. We study inference for this procedure-level target under explicit tuning budgets. We propose SIREN, a selection-aware repeated-split reporting protocol that freezes the post-search shortlist, separates splitwise selection from held-out evaluation, and uses an ite...
|
| 233 |
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
2605.05993
|
stat.ML cs.LG stat.ME stat.OT
|
Geping Chen, Chunlin Li, Tianzhong Yang, Zhengyuan Zhu, Jing Zhou |
Instrumental variable (IV) and control function (CF) methods are powerful tools for causal effect estimation in the presence of unmeasured confounding, yet most existing approaches target only mean effects and/or demand substantial fitting and tuning effort. In this paper, we introduce a simple method, TabCF, for control function regression using tabular foundation models, which enables accurate, fast, identification-transparent, and tuning-light causal estimation of distributional quantities, s...
|
| 236 |
Gaussian mixture models in Hilbert spaces via kernel methods
2605.05996
|
stat.ML cs.LG
|
Daniel López-Montero, Antonio Álvarez-López, Marcos Matabuena |
Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings, characterizing probability measures, for example, through densities, can be ill-defined or technically challenging. Motivated by clustering applications, we propose a Gaussian mixture framework for Hilbert-space-valued data based on kernel mean embeddings and develop effic...
|
| 336 |
Expressivity of Bi-Lipschitz Normalizing Flows: A Score-Based Diffusion Perspective
2605.06172
|
stat.ML cs.LG math.NA math.PR
|
Meira Iske, Carola-Bibiane Schönlieb |
Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of score-based diffusion models. For the probability flow ODE of a variance-preserving diffusion, Lipschitz regularity of the score induces a flow of bi-Lipschitz diffeomorphic transport maps. This ODE bridge allows us to analyze the distributional approximation power of bi-Li...
|
| 358 |
When Does Trimming Help Conformal Prediction? A Retained-Law Diagnostic under Calibration Contamination
2605.06204
|
stat.ML cs.LG
|
Congye Wang |
Trimming suspicious calibration points is a common response to contamination in conformal prediction. Its effect on clean-target coverage, however, is governed by the retained law induced by trimming, not by the contamination level alone. We analyse fixed-threshold trimming as conditioning rather than purification. It replaces the contaminated calibration law with a retained law, reducing clean-target coverage to a one-dimensional score-CDF transfer problem with an exact finite-sample identity. ...
|
| 361 |
Super-Level-Set Regression: Conditional Quantiles via Volume Minimization
2605.06210
|
stat.ML cs.AI cs.LG stat.AP stat.ME
|
Sacha Braun, Michael I. Jordan, Francis Bach |
Constructing minimum-volume prediction regions that satisfy conditional coverage is a fundamental challenge in multivariate regression. Standard approaches rely on explicitly estimating the full conditional density and subsequently thresholding it. This two-step plug-in process is notoriously difficult, sensitive to estimation errors, and computationally expensive. One would like to instead optimize the region directly. Formulating a direct solution is challenging, however, because it requires m...
|
| 393 |
ConquerNet: Convolution-Smoothed Quantile ReLU Neural Networks with Minimax Guarantees
2605.06265
|
stat.ML cs.LG
|
Tianpai Luo, Fangwei Wu, Weichi Wu |
Quantile regression is a fundamental tool for distributional learning but poses significant optimization challenges for deep models due to the non-smoothness of the pinball loss. We propose ConquerNet, a class of \textbf{con}volution-smoothed \textbf{qu}antil\textbf{e} \textbf{R}eLU neural \textbf{net}works, which yield smooth objectives while preserving the underlying quantile structure. We establish general nonasymptotic risk bounds for ConquerNet under mild conditions, providing minimax guara...
|
| 407 |
Multimodal Deep Generative Model for Semi-Supervised Learning under Class Imbalance
2605.06289
|
stat.ML cs.AI cs.LG
|
Heegeon Yoon, Heeyoung Kim |
When modeling class-imbalanced data, it is crucial to address the imbalance, as models trained on such data tend to be biased towards the majority classes. This problem is amplified under partial supervision, where pseudo-labels for unlabeled data are predicted based on imbalanced labeled data, propagating the bias. While recent semi-supervised models address class imbalance, they typically assume single-modal input data. However, with the growing availability of multimodal data, it is essential...
|
| 420 |
End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems
2605.06315
|
stat.ML cs.LG
|
Carles Balsells-Rodas, Zhengrui Xiang, Xavier Sumba, Yingzhen Li |
Learning identifiable representations in deep generative models remains a fundamental challenge, particularly for sequential data with regime-switching dynamics. Existing approaches establish identifiability under restrictive assumptions, such as stationarity or limited emission models, and typically rely on variational autoencoder (VAE) estimators, which introduce approximation gaps that limit the recovery of the latent structure. In this work, we address both the theoretical and practical limi...
|
| 456 |
The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models
2605.06367
|
stat.ML cond-mat.dis-nn cs.LG
|
Flavio Nicoletti, Chenxiao Ma, Enrico Ventura, Luca Saglietti, Stefano Sarao Mannelli |
Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models, and potentially exacerbate disparities, remains poorly understood. While models typically transition from an initial phase of generalization to memorizing the training set, existing theory assumes homogeneous data, leaving open how class imbalance and heterogeneity reshape these dynamics. In this work, we develop a high-dimensional analy...
|
| 459 |
Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under $τ$-Mixing
2605.06373
|
stat.ML cs.LG
|
Leon Halgryn, Sophie Langer, Janusz M. Meylahn, E. Moritz Hahn |
Finite-sample analyses of deep Q-learning typically treat replayed data as independent, even though it is sampled from temporally dependent state-action trajectories. We study the Deep Q-networks (DQN) algorithm under explicit dependence by modelling the minibatches used for updating the network as $τ$-mixing. We show that this assumption holds under certain dependence conditions on the underlying trajectories and the mechanism used to sample minibatches. Building on this observation, we extend ...
|
| 477 |
Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
2605.06413
|
stat.ML cs.LG
|
Richard Bergna, Stefan Depeweg, José Miguel Hernández-Lobato |
Prior-Fitted Networks (PFNs) amortize Bayesian prediction by meta-learning over a synthetic task prior, but their standard output is a posterior predictive distribution over noisy observations. For sequential decision-making, such as active learning and Bayesian optimization, acquisition should prioritize epistemic uncertainty about the latent signal rather than irreducible aleatoric observation noise. We show that this epistemic-aleatoric split is not identifiable in general from the posterior...
|
| 485 |
Neural-Actuarial Longevity Forecasting: Anchoring LSTMs for Explainable Risk Management
2605.06438
|
stat.ML cs.LG q-fin.RM
|
Davide Rindori |
Traditional multi-population models, such as the Li-Lee framework, rely on the assumption of mean-reverting country-specific deviations. However, recent data from high-longevity clusters suggest a systemic break in this paradigm. We identify a stationarity paradox where mortality residuals in countries like Sweden and West Germany exhibit persistent unit roots, leading to a systematic mispricing of longevity risk in linear models. To address these non-linearities, we propose Hybrid-Lift, a neura...
|
| 508 |
Risk-Controlled Post-Processing of Decision Policies
2605.06479
|
stat.ML cs.LG math.ST
|
Sunay Joshi, Tao Wang, Hamed Hassani, Edgar Dobriban |
Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where swit...
|
| 552 |
Dynamic Treatment on Networks
2605.06564
|
stat.ML cs.LG
|
Bengusu Nar, Jiguang Li, Veronika Ročková, Panos Toulis |
In networks, effective dynamic treatment allocation requires deciding both whom to treat and also when, so as to amplify policy impact through spillovers. An early intervention at a well-connected node can trigger cascades that change which nodes are worth targeting in the next period. Existing treatment strategies under network interference are largely static while dynamic treatment frameworks typically ignore network structure altogether. We integrate these perspectives and propose Q-Ising, a ...
|
| 575 |
DARTS: Targeting Prognostic Covariates in Budget-Constrained Sequential Experiments
2605.06608
|
stat.ML cs.LG stat.ME
|
Kateryna Husar, Alexander Volfovsky |
Randomized controlled trials typically assume that prognostic covariates are known and available at no cost. In practice, obtaining high-dimensional pretreatment data is costly, forcing a trade-off between covariate-adaptive precision and a measurement budget. We introduce Dynamic Adaptive Rerandomization via Thompson Sampling (DARTS), which treats covariate acquisition as a sequential optimization problem embedded within a design-based causal inference task. A budgeted combinatorial Thompson sa...
|