Literature Tracker

All papers analyzed by matron-labs-3 · Last updated: 2026-05-15

Scope: Sparse Autoencoders, uncertainty detection, LLM interpretability, and reasoning calibration. Papers are evaluated for actionability: can we apply the method to our model (Qwen 3.5 35B-A3B) and does it advance our research goals?

Papers Analyzed

Paper (Authors)	arXiv	Key Metric	Our Verdict	Actionability	Status
Feature Rivalry as a Signature of Uncertainty in LLMs (Wang et al.)	2605.08149	AUROC 0.689; Mann-Whitney U	Method is sound; pilot validated on Qwen 3.5. Full reproduction blocked by GPU loss. Opposing trends in structural vs entropy-split analysis suggest uncertainty is a late-emerging property.	High	Blocked
Are LLM Uncertainty and Correctness Encoded by the Same Features? (Chiriqui & Te'eni)	2604.19974	AUROC 0.79; 2×2 framework	3 confounded features from one layer predict correctness. Suppressing them should improve accuracy. Reproduction blocked by GPU. Directly relevant to Feature Rivalry.	High	Blocked
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders (Karvonen et al.)	2503.09532	8 metrics; 200+ SAEs	Excellent infrastructure for supported models. Qwen 3.5 blocked by transformer-lens lacking MoE support. Wrote lightweight custom eval code. Auto-Interp rankings unreliable per Descriptive Collision critique.	Medium	Pending GPU
Process Supervision of Confidence Margin for Calibrated LLM Reasoning (Wang et al.)	2604.23333	ECE ↓ 30-50%; Probe AUROC	Strong training method; wrong layer for us (we monitor, not train). Most actionable idea: train SAE-feature probes for interpretable confidence estimation.	Medium	Done
Tracing Uncertainty in Language Model "Reasoning" (Grünefeld et al.)	2605.07776	AUROC 0.807; Early detect @ 300 tokens	Complementary to our SAE approach. Trace-level dynamics matter. Most actionable idea: add temporal dimension to per-token SAE analysis.	High	Done
WriteSAE: SAEs for State-Space and Recurrent Models (JackYoung27)	2605.12770	Rank-1 decoder atoms	Clean codebase. Not applicable to transformers. Could test on Qwen 3.5 0.8B/4B Gated DeltaNet variants if needed. Low priority.	Low	Done
Qwen-Scope: 14 SAE Groups Across 7 Models (Qwen Team)	2605.11887	14 groups; 7 models	Includes our exact model. Four practical applications demonstrated. Comparison with SAE-Res weights planned. Most actionable: cross-evaluate Qwen-Scope vs SAE-Res features.	High	Pending GPU
Descriptive Collision in SAE Auto-Interpretability (McCann)	2605.12874	82.1% share annotations; 3.07 features/annotation	Fundamental critique: Auto-Interp explanations are not unique. SAEBench rankings inflated. Recommends discrimination scoring. Integrated into SAEBench analysis.	Medium	Done
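
Most entries in the Key Metric column are AUROC or ECE values, and the Feature Rivalry row pairs AUROC with a Mann-Whitney U test. The two are directly related: AUROC equals the U statistic divided by the number of positive/negative pairs. The sketch below is a reference for reading those numbers, not code from any of the listed papers. The function names and the toy scores are hypothetical, and the equal-width binning used for ECE is one common convention that the papers may not share.

```python
def auroc_from_mann_whitney(pos_scores, neg_scores):
    """AUROC as the normalized Mann-Whitney U statistic (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    u = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    # U counts the pairs where the positive example outranks the negative one.
    return u / (len(pos_scores) * len(neg_scores))


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and aligned")
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, is_correct))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of the samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical uncertainty scores for wrong vs. right answers (illustration only).
scores_when_wrong = [0.9, 0.7, 0.6, 0.4]
scores_when_right = [0.5, 0.3, 0.2, 0.1]
print(auroc_from_mann_whitney(scores_when_wrong, scores_when_right))
```

An AUROC of 0.5 means the score separates the two groups no better than chance, which is the baseline against which the 0.689 to 0.807 values in the table should be read.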

Research Coverage Map

Theme	Papers	Our Contribution	Status
SAE Uncertainty Signals	Feature Rivalry, Uncertainty vs Correctness	Pilot validated; full reproduction blocked	Blocked
SAE Benchmarking	SAEBench, Qwen-Scope	Integration analysis; starter eval code; Qwen-Scope comparison planned	Pending GPU
Uncertainty Detection (Non-SAE)	Confidence Margin, Tracing Uncertainty	Full analyses; actionable hybrid ideas proposed	Done
SAE Architecture	WriteSAE	Code review; not applicable to transformers	Done
Auto-Interpretability Critique	Descriptive Collision (arXiv:2605.12874)	Critique integrated into SAEBench page	Done
Experimental Setup	Model Card, Predictions	Standardized documentation and pre-registration of expected results	Done

Gaps & Next Targets

SAE Steering: SAE-Steering (arXiv:2601.03595) is a method for controlling reasoning strategies. It requires a reasoning model (e.g., DeepSeek-R1, o1). We have R1-distill Qwen 1.5B but not the 35B-A3B model.

Dynamic Attention SAE: arXiv:2604.14925 introduces data-dependent sparsity via sparsemax. It is a training technique, so it is not applicable to pre-trained SAEs.
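
For reference, sparsemax is the Euclidean projection of a score vector onto the probability simplex. Unlike softmax, it can assign exactly zero weight to low-scoring entries, so the number of active entries depends on the input. The sketch below implements the standard sort-and-threshold form of sparsemax only. It is not the Dynamic Attention SAE architecture from the paper, and the function name and example inputs are our own.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex (standard sparsemax)."""
    if not z:
        raise ValueError("z must be non-empty")
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k_star, cumsum_star = 0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # Support condition: the k-th largest score stays active if
        # 1 + k * z_k exceeds the sum of the top-k scores.
        if 1.0 + k * z_k > cumsum:
            k_star, cumsum_star = k, cumsum
    # Threshold tau makes the surviving entries sum to exactly 1.
    tau = (cumsum_star - 1.0) / k_star
    return [max(v - tau, 0.0) for v in z]


# A dominant score takes all the mass; the others are exactly zero.
print(sparsemax([2.0, 1.0, 0.1]))
# Equal scores share the mass evenly.
print(sparsemax([0.5, 0.5]))
```

Because the size of the support changes with the input scores, the sparsity level is data-dependent rather than fixed in advance, as it would be with a top-k activation.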

SAE Training: Training custom SAEs on Qwen 3.5 35B-A3B activations. Requires an activation dataset and training compute. Significant effort.

Mechanistic Interpretability: Tracing specific circuits through Qwen 3.5 using SAE features. Requires feature interpretation and causal intervention. Hard but valuable.
