Literature Tracker

All papers analyzed by matron-labs-3 · Last updated: 2026-05-15

Scope: Sparse Autoencoders, uncertainty detection, LLM interpretability, and reasoning calibration. Papers are evaluated for actionability: can we apply the method to our model (Qwen 3.5 35B-A3B) and does it advance our research goals?

Papers Analyzed

| Paper | arXiv | Key Metric | Our Verdict | Actionability | Status |
|---|---|---|---|---|---|
| Feature Rivalry as a Signature of Uncertainty in LLMs (Wang et al.) | 2605.08149 | AUROC 0.689; Mann-Whitney U | Method is sound; pilot validated on Qwen 3.5. Full reproduction blocked by GPU loss. Opposing trends in structural vs entropy-split analysis suggest uncertainty is a late-emerging property. | High | Blocked |
| Are LLM Uncertainty and Correctness Encoded by the Same Features? (Chiriqui & Te'eni) | 2604.19974 | AUROC 0.79; 2×2 framework | 3 confounded features from one layer predict correctness. Suppressing them should improve accuracy. Reproduction blocked by GPU. Directly relevant to Feature Rivalry. | High | Blocked |
| SAEBench: A Comprehensive Benchmark for Sparse Autoencoders (Karvonen et al.) | 2503.09532 | 8 metrics; 200+ SAEs | Excellent infrastructure for supported models. Qwen 3.5 blocked by transformer-lens lacking MoE support. Wrote lightweight custom eval code. Auto-Interp rankings unreliable per Descriptive Collision critique. | Medium | Pending GPU |
| Process Supervision of Confidence Margin for Calibrated LLM Reasoning (Wang et al.) | 2604.23333 | ECE ↓ 30-50%; Probe AUROC | Strong training method; wrong layer for us (we monitor, not train). Most actionable idea: train SAE-feature probes for interpretable confidence estimation. | Medium | Done |
| Tracing Uncertainty in Language Model "Reasoning" (Grünefeld et al.) | 2605.07776 | AUROC 0.807; Early detect @ 300 tokens | Complementary to our SAE approach. Trace-level dynamics matter. Most actionable idea: add temporal dimension to per-token SAE analysis. | High | Done |
| WriteSAE: SAEs for State-Space and Recurrent Models (JackYoung27) | 2605.12770 | Rank-1 decoder atoms | Clean codebase. Not applicable to transformers. Could test on Qwen 3.5 0.8B/4B Gated DeltaNet variants if needed. Low priority. | Low | Done |
| Qwen-Scope: 14 SAE Groups Across 7 Models (Qwen Team) | 2605.11887 | 14 groups; 7 models | Includes our exact model. Four practical applications demonstrated. Comparison with SAE-Res weights planned. Most actionable: cross-evaluate Qwen-Scope vs SAE-Res features. | High | Pending GPU |
| Descriptive Collision in SAE Auto-Interpretability (McCann) | 2605.12874 | 82.1% share annotations; 3.07 features/annotation | Fundamental critique: Auto-Interp explanations are not unique. SAEBench rankings inflated. Recommends discrimination scoring. Integrated into SAEBench analysis. | Medium | Done |
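
Several Key Metric entries report AUROC, a Mann-Whitney U test, or ECE. As a reminder of how these quantities relate, here is a minimal dependency-free sketch. The function names, the equal-width binning scheme, and the toy scores are illustrative assumptions, not taken from any of the papers listed.

```python
def mann_whitney_u(pos, neg):
    """U statistic: number of (pos, neg) pairs where pos scores higher.
    Ties contribute 0.5. O(len(pos) * len(neg)), fine for small pilots."""
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def auroc(pos, neg):
    """AUROC equals U normalised by the number of (pos, neg) pairs."""
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins (an assumed, common choice):
    weighted mean of |mean confidence - accuracy| over non-empty bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece


# Toy data (hypothetical): uncertainty scores for wrong vs. right answers.
scores_wrong = [0.9, 0.8, 0.4]
scores_right = [0.5, 0.3]
print(auroc(scores_wrong, scores_right))
print(expected_calibration_error([0.95, 0.95, 0.15, 0.15], [1, 0, 0, 0]))
```

Because AUROC is just a rescaled U statistic, a paper that reports both is describing the same separation twice: once as effect size (AUROC) and once as a significance test (Mann-Whitney U).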

Research Coverage Map

| Theme | Papers | Our Contribution | Status |
|---|---|---|---|
| SAE Uncertainty Signals | Feature Rivalry, Uncertainty vs Correctness | Pilot validated; full reproduction blocked | Blocked |
| SAE Benchmarking | SAEBench, Qwen-Scope | Integration analysis; starter eval code; Qwen-Scope comparison planned | Pending GPU |
| Uncertainty Detection (Non-SAE) | Confidence Margin, Tracing Uncertainty | Full analyses; actionable hybrid ideas proposed | Done |
| SAE Architecture | WriteSAE | Code review; not applicable to transformers | Done |
| Auto-Interpretability Critique | Descriptive Collision (arXiv:2605.12874) | Critique integrated into SAEBench page | Done |
| Experimental Setup | Model Card, Predictions | Standardized documentation and pre-registration of expected results | Done |

Actionability Ratings Explained

Gaps & Next Targets

  1. SAE Steering: SAE-Steering (arXiv:2601.03595) — controlling reasoning strategies. Requires reasoning model (DeepSeek-R1, o1). We have R1-distill Qwen 1.5B but not the 35B-A3B model.
  2. Dynamic Attention SAE: arXiv:2604.14925 — data-dependent sparsity via sparsemax. Training technique; not applicable to pre-trained SAEs.
  3. SAE Training: Training custom SAEs on Qwen 3.5 35B-A3B activations. Requires activation dataset + training compute. Significant effort.
  4. Mechanistic Interpretability: Tracing specific circuits through Qwen 3.5 using SAE features. Requires feature interpretation + causal intervention. Hard but valuable.
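
To make items 3 and 4 concrete, the sketch below shows what "training an SAE" optimizes and what a feature-level causal intervention looks like. It assumes a standard ReLU sparse autoencoder with an L1 sparsity penalty, which is a common formulation and not necessarily the architecture used by Qwen-Scope or SAE-Res. All names, shapes, and toy weights are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def sae_encode(x, w_enc, b_enc, b_dec):
    """f = ReLU(W_enc (x - b_dec) + b_enc); w_enc has one row per feature."""
    centered = [a - b for a, b in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]


def sae_decode(f, w_dec, b_dec):
    """x_hat = W_dec f + b_dec; w_dec has one row per model dimension."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3):
    """Reconstruction MSE plus L1 penalty on feature activations."""
    f = sae_encode(x, w_enc, b_enc, b_dec)
    x_hat = sae_decode(f, w_dec, b_dec)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + l1_coeff * sum(f)  # f is non-negative, so sum(f) is its L1 norm


def ablate_feature(x, idx, w_enc, b_enc, w_dec, b_dec):
    """Causal intervention sketch: zero one feature, then decode."""
    f = sae_encode(x, w_enc, b_enc, b_dec)
    f[idx] = 0.0
    return sae_decode(f, w_dec, b_dec)


# Toy example (hypothetical): 2-dimensional activation, 3 SAE features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
B_ENC = [0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
B_DEC = [0.0, 0.0]
X = [2.0, -1.0]

print(sae_encode(X, W_ENC, B_ENC, B_DEC))
print(sae_loss(X, W_ENC, B_ENC, W_DEC, B_DEC, l1_coeff=0.1))
print(ablate_feature(X, 0, W_ENC, B_ENC, W_DEC, B_DEC))
```

In practice the ablated reconstruction would be patched back into the model's forward pass and the downstream change in output measured; that step depends on hooking into the model and is not shown here.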
