Qwen-Scope: Turning Sparse Features into Development Tools

Paper: arXiv:2605.11887 · Authors: Deng et al. (Qwen Team) · Explored: 2026-05-15

Overview

Most SAE research treats sparse autoencoders as post-hoc analysis tools: train them, look at features, write descriptions, done. Qwen-Scope breaks from this pattern. It is an open-source suite of 14 SAE groups across 7 Qwen model variants (Qwen3 and Qwen3.5, covering dense and MoE architectures), and it demonstrates four practical applications where SAEs serve as development interfaces, not just diagnostic mirrors.

Core claim: SAEs can serve as reusable representation-level interfaces for diagnosing, controlling, evaluating, and improving LLMs. The paper shows this empirically across steering, evaluation, data classification, and post-training optimization.

What Qwen-Scope Provides

Model              Architecture   SAE Groups
Qwen3-0.6B         Dense          2
Qwen3-4B           Dense          2
Qwen3-8B           Dense          2
Qwen3.5-0.5B       MoE            2
Qwen3.5-1.5B       MoE            2
Qwen3.5-4B         MoE            2
Qwen3.5-35B-A3B    MoE            2

Each "group" contains SAEs for multiple layers. All weights are released on HuggingFace. The SAEs use a standard TopK architecture with JumpReLU gating and are trained on large-scale activation datasets.
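To make the TopK part concrete, here is a minimal pure-Python sketch of a generic TopK SAE forward pass. It is not taken from the Qwen-Scope code: the function name, the weight layout (one row per feature), and the toy numbers are all assumptions for illustration, and the JumpReLU gating mentioned above is not modeled.

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Generic TopK SAE forward pass (illustrative, not the Qwen-Scope code).

    x:     activation vector, length d
    W_enc: n_features rows of length d (assumed layout)
    W_dec: n_features rows of length d; row i is the direction of feature i
    """
    # Pre-activation score for every dictionary feature.
    pre = [
        sum(w * (xi - bd) for w, xi, bd in zip(row, x, b_dec)) + b
        for row, b in zip(W_enc, b_enc)
    ]
    # Keep only the k largest scores (clamped at zero); all others become 0.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    acts = [0.0] * len(pre)
    for i in top:
        acts[i] = max(pre[i], 0.0)
    # Reconstruction: decoder bias plus the weighted sum of active directions.
    recon = list(b_dec)
    for i in top:
        for j in range(len(recon)):
            recon[j] += acts[i] * W_dec[i][j]
    return acts, recon


# Toy dictionary with 4 features over a 3-dimensional activation space.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
acts, recon = topk_sae_forward(
    x=[1.0, 2.0, 0.5], W_enc=W, b_enc=[0.0] * 4, W_dec=W, b_dec=[0.0] * 3, k=2
)
```

With k=2 only the two strongest features stay active, so the third coordinate of the input is dropped from the reconstruction. That is the sparsity/fidelity trade-off the k parameter controls.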

Four Applications

1. Inference-time steering

SAE features are used as control vectors during inference. By adding or subtracting feature directions from the residual stream, the model's behavior changes without modifying any weights.

Case studies demonstrated:

The steering is feature-selective: rather than applying a blanket intervention to all activations, the authors identify the specific features responsible for the target behavior and intervene only on those. This is more precise than full-layer steering (e.g., Representation Engineering).
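The idea of adding or subtracting selected feature directions can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: `steer_residual`, the toy decoder matrix, and the scale `alpha` are hypothetical names and values, and a real setup would apply this to the residual stream inside a model forward pass.

```python
def steer_residual(h, decoder_dirs, feature_ids, alpha):
    """Add alpha * (decoder direction) for each selected feature to h.

    h:            residual-stream vector at one token position
    decoder_dirs: one direction per SAE feature (assumed row layout)
    feature_ids:  the few features chosen for intervention
    alpha:        positive to amplify a feature, negative to suppress it
    """
    steered = list(h)
    for f in feature_ids:
        for j, d in enumerate(decoder_dirs[f]):
            steered[j] += alpha * d
    return steered


# Toy example: 3 features in a 3-dimensional residual stream.
decoder_dirs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = [0.5, -1.0, 2.0]
steered = steer_residual(h, decoder_dirs, feature_ids=[1], alpha=4.0)
```

Only the coordinate aligned with feature 1 changes; the rest of the activation is left alone, which is the sense in which the intervention is feature-selective and the weights stay untouched.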

2. Evaluation analysis

SAE activations provide a representation-level proxy for what a benchmark is actually testing. The authors use this to:

3. Data-centric workflows

SAE features are used for data classification and synthesis:

4. Post-training optimization

SAE features are incorporated into training objectives:

Results: SAE-guided SFT reduces code-switching by 40% with minimal accuracy loss. SAE-guided RL reduces repetition by 25% without degrading helpfulness.
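These notes do not record how the features enter the objective. One plausible form, shown here purely as an assumption and not as the paper's method, is a penalty on the mean activation of the features tied to the unwanted behavior (for example, code-switching features). The names `sae_penalized_loss`, `penalized_ids`, and `lam` are hypothetical.

```python
def sae_penalized_loss(task_loss, feature_acts, penalized_ids, lam):
    """Task loss plus a penalty on selected SAE feature activations.

    Assumed form: loss = task_loss + lam * mean(activation of penalized features).
    This is an illustrative guess at the shape of an SAE-guided objective.
    """
    if not penalized_ids:
        return task_loss
    penalty = sum(feature_acts[i] for i in penalized_ids) / len(penalized_ids)
    return task_loss + lam * penalty


# Toy example: features 1 and 2 stand in for the unwanted behavior.
loss = sae_penalized_loss(
    task_loss=2.0, feature_acts=[0.0, 3.0, 1.0, 0.0], penalized_ids=[1, 2], lam=0.5
)
```

The weight `lam` trades off the original task loss against suppression of the targeted features, which is where a "minimal accuracy loss" constraint would be tuned in practice.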

Connections to Our Work

They have SAEs for our exact model

Qwen-Scope includes SAEs for Qwen3.5-35B-A3B — the exact model we are studying. Their SAEs may differ from the SAE-Res weights we are using (different training recipe, different hyperparameters). Comparing the two could reveal which features are robust across training runs and which are artifacts of the specific SAE configuration.
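A common way to run this comparison is to match features by the cosine similarity of their decoder directions. The sketch below assumes both SAEs were trained on the same layer's activations, so their directions live in the same space; `match_features`, the 0.9 threshold, and the toy matrices are illustrative choices, not part of either release.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_features(dec_a, dec_b, threshold=0.9):
    """For each feature in SAE A, find its nearest decoder direction in SAE B.

    Returns {index_in_a: (index_in_b, similarity)} for matches at or above
    the threshold. Unmatched features are candidates for recipe artifacts.
    """
    matches = {}
    for i, u in enumerate(dec_a):
        best_j, best_sim = max(
            ((j, cosine(u, v)) for j, v in enumerate(dec_b)), key=lambda p: p[1]
        )
        if best_sim >= threshold:
            matches[i] = (best_j, best_sim)
    return matches


# Toy decoders: A has 3 features, B has 2, in a 2-dimensional space.
dec_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec_b = [[0.0, 2.0], [3.0, 0.0]]
matches = match_features(dec_a, dec_b)
```

Here features 0 and 1 of A find exact counterparts in B (scale does not matter for cosine), while feature 2 has no close match. At real dictionary sizes the pairwise loop would be replaced by a batched matrix product.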

Uncertainty detection is unexplored in Qwen-Scope

Despite covering steering, evaluation, data, and post-training, Qwen-Scope does not address uncertainty detection or correctness prediction. This is a gap we can fill: apply their SAEs (or ours) to the uncertainty detection tasks explored in Feature Rivalry, Confidence Margin, and Tracing Uncertainty.

Feature steering vs feature rivalry

Qwen-Scope steers by amplifying/suppressing specific features. Feature Rivalry detects uncertainty by finding negatively correlated feature pairs. These are complementary: steering tells us features are causal; rivalry tells us features are in conflict. A model with high rivalry might be harder to steer (conflicting features resist unidirectional intervention).
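As a rough illustration of what "negatively correlated feature pairs" means operationally, the sketch below computes Pearson correlations between per-token activation series and keeps the pairs below a cutoff. The use of Pearson correlation, the -0.5 threshold, and the name `rival_pairs` are assumptions made for this example; the Feature Rivalry notes may define the statistic differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rival_pairs(acts, threshold=-0.5):
    """Return (feature_a, feature_b, r) for pairs with correlation <= threshold.

    acts maps a feature id to its activations over the same token positions.
    """
    ids = sorted(acts)
    pairs = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            r = pearson(acts[ids[a]], acts[ids[b]])
            if r <= threshold:
                pairs.append((ids[a], ids[b], r))
    return pairs


# Toy activations over 4 tokens: features 10 and 11 alternate in opposition.
acts = {
    10: [1.0, 0.0, 1.0, 0.0],
    11: [0.0, 1.0, 0.0, 1.0],
    12: [1.0, 1.0, 0.0, 0.0],
}
pairs = rival_pairs(acts)
```

A follow-up experiment in the spirit of the paragraph above would steer one member of a rival pair and check whether the other member's activation rises to compensate.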

Evaluation analysis and SAEBench

Qwen-Scope's evaluation analysis uses SAE features to assess benchmark redundancy. SAEBench evaluates SAEs on downstream tasks. These are inverse operations: Qwen-Scope uses SAEs to evaluate benchmarks; SAEBench uses benchmarks to evaluate SAEs. Combining both could yield a bidirectional understanding of which SAE features correspond to which capabilities.
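One simple way to think about benchmark redundancy at the feature level is to compare which SAE features each benchmark activates. The sketch below uses Jaccard overlap of active-feature sets as a stand-in; these notes do not record the paper's actual redundancy measure, and `active_features`, `min_act`, and the toy activation profiles are hypothetical.

```python
def active_features(mean_acts, min_act=0.1):
    """Indices of features whose mean activation on a benchmark exceeds min_act."""
    return {i for i, a in enumerate(mean_acts) if a > min_act}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy mean-activation profiles over a 4-feature dictionary.
bench_a = active_features([0.9, 0.0, 0.4, 0.0])
bench_b = active_features([0.8, 0.0, 0.5, 0.2])
bench_c = active_features([0.0, 0.7, 0.0, 0.0])

overlap_ab = jaccard(bench_a, bench_b)  # shares two of three features
overlap_ac = jaccard(bench_a, bench_c)  # shares none
```

High overlap suggests two benchmarks exercise similar internal features, while zero overlap suggests they probe different capabilities. Running the same comparison on SAEBench task data would connect the two directions described above.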

Our Assessment

What we like

What concerns us

Verdict

Important infrastructure, not a research breakthrough. Qwen-Scope is primarily an engineering contribution: a well-engineered, open-sourced SAE suite with practical demos. It does not introduce new methods or surprising findings. But it is extremely valuable for our work because (1) it gives us SAEs for our exact model, (2) it validates that SAEs work on MoE architectures, and (3) its steering and evaluation applications suggest new experiments we can run.

Most actionable next step: Load the Qwen-Scope SAEs for Qwen3.5-35B-A3B and compare them with the SAE-Res weights we are already using. Do they identify the same features? Do their uncertainty-related features overlap? This could be done in a single GPU session.
