Friday, May 1, 2026
Latest

67 articles total
LLM Judges Show Systematic Bias Toward Own Outputs

New research quantifies how language models favor their own responses in evaluation tasks, a bias that threatens benchmark validity.

How LLMs Detect and Correct Their Own Errors Without External Feedback

New research reveals that internal confidence signals enable models to identify reasoning failures autonomously, reshaping debugging approaches.

PExA Agent Balances Speed and Accuracy in Text-to-SQL

New method parallelizes SQL generation to resolve the latency-accuracy tradeoff in LLM-based database queries.

Linear-Time B-Spline KAN Architecture Reduces Computational Cost

New approach accelerates Kolmogorov-Arnold Networks while preserving their expressiveness and interpretability.

LLMs Fail to Generate Random Numbers From Statistical Distributions

New research reveals that large language models cannot faithfully sample from probability distributions, a critical gap for stochastic systems.

Content Moderation Systems Fail When Multiple Answers Are Right

New research exposes how agreement metrics penalize valid AI decisions in rule-governed environments.

AI System Automates Military Course of Action Planning

Researchers develop automated course-of-action (CoA) generation to keep pace with the accelerating tempo of modern warfare.

Enterprise AI Agent Systems Fail at Rates Up to 86.7 Percent

New research identifies coordination failures and specification mismatches as primary causes of multi-agent LLM system breakdowns.

New Method Slashes Memory Cost of Training Deep Networks

The BASIS technique improves how activation memory scales during training, freeing up GPU resources for larger models.