Model Card — Qwen 3.5 35B-A3B Experimental Setup

Standardized documentation for reproducibility · Last updated: 2026-05-15

Purpose: This card documents our complete experimental setup so that anyone — including future us — can understand exactly what model, SAE, dataset, and evaluation protocol we used. It also flags known limitations and assumptions.

Model

| Property | Value |
| --- | --- |
| Name | Qwen/Qwen3.5-35B-A3B-Base |
| Parameters | 35B total (3B active per token) |
| Architecture | Mixture-of-Experts (MoE) transformer |
| Experts | 256 total, top-8 activated per token |
| Layers | 40 |
| Hidden dimension | 3,584 (d_model) |
| Attention | Alternating: linear attention (layers 1, 2, 3, 5, 6, 7, ...) + full attention (every 4th layer) |
| Multi-token prediction | Yes (MTP head) |
| Training | Pre-trained + SFT + RLHF |
| Context length | 128K tokens |
| Precision | BF16 (inference) |
| Loading | AutoModelForCausalLM via transformers (transformer-lens does NOT support this model) |
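
The attention layout described in the table can be made concrete with a few lines of Python. This is a minimal sketch that assumes layers are numbered from 1 and that "every 4th layer" means layers 4, 8, ..., 40, which is consistent with layers 1, 2, 3, 5, 6, 7 being linear. The helper name `attention_type` is hypothetical and is not part of any library.

```python
N_LAYERS = 40
FULL_ATTENTION_PERIOD = 4  # assumption: layers 4, 8, ..., 40 use full attention


def attention_type(layer: int) -> str:
    """Return "full" or "linear" for a 1-indexed layer number."""
    if not 1 <= layer <= N_LAYERS:
        raise ValueError(f"layer must be in 1..{N_LAYERS}, got {layer}")
    return "full" if layer % FULL_ATTENTION_PERIOD == 0 else "linear"


full_layers = [l for l in range(1, N_LAYERS + 1) if attention_type(l) == "full"]
linear_layers = [l for l in range(1, N_LAYERS + 1) if attention_type(l) == "linear"]

print("full attention layers:", full_layers)
print("number of linear attention layers:", len(linear_layers))
```

Note that hook-based tooling usually indexes layers from 0, so a 1-indexed layer number from this table may need to be shifted by one before it is used to pick a module.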

Sparse Autoencoder

| Property | Value |
| --- | --- |
| Name | SAE-Res-Qwen3.5-35B-A3B-Base-W32K-L0_50 |
| Source | HuggingFace |
| Type | Residual-stream SAE (SAE-Res) |
| Sparsity | TopK = 50 |
| Dictionary size | 32,768 features per layer |
| Input dimension | 3,584 (matches d_model) |
| Layers covered | All 40 layers (individual .sae.pt files) |
| Training data | Not specified in model card |
| Loss function | Not specified; assumed MSE + L1 sparsity |
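
As a rough guide to how large these SAEs are, the parameter count per layer follows directly from the dictionary size and input dimension. The sketch below assumes an untied encoder and decoder with one bias each (W_enc, b_enc, W_dec, b_dec); the storage dtype of the .sae.pt files is not documented here, so both float32 and bfloat16 sizes are printed.

```python
D_MODEL = 3584       # input dimension from the table
N_FEATURES = 32768   # dictionary size from the table
N_LAYERS = 40


def sae_param_count(d_model: int, n_features: int) -> int:
    """Parameters of one SAE, assuming untied W_enc/W_dec plus two bias vectors."""
    encoder = n_features * d_model + n_features  # W_enc and b_enc
    decoder = d_model * n_features + d_model     # W_dec and b_dec
    return encoder + decoder


per_layer = sae_param_count(D_MODEL, N_FEATURES)
all_layers = per_layer * N_LAYERS

print(f"per layer: {per_layer:,} parameters")
print(f"all {N_LAYERS} layers: {all_layers:,} parameters")
for dtype, nbytes in {"float32": 4, "bfloat16": 2}.items():
    print(f"{dtype}: {per_layer * nbytes / 1e9:.2f} GB per layer")
```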

Datasets

| Dataset | Use | Size | Source |
| --- | --- | --- | --- |
| PopQA | Feature Rivalry, 2×2 reproduction | 14,268 questions | https://github.com/AlexTMallen/popqa |
| OpenWebText-10k | SAEBench Core eval (loss recovered) | 10,000 documents | HuggingFace stas/openwebtext-10k |
| Sentiment140 | Sparse Probing starter code | 1.6M tweets | HuggingFace sentiment140 |
| GSM8K | Tracing Uncertainty (reference only) | 8,819 problems | HuggingFace gsm8k |

Evaluation Protocol

Feature Rivalry

  1. Sample 20 completions per question (temperature=1.0, top_p=0.9)
  2. Compute first-token entropy for each completion
  3. Split into ambiguous (high entropy) vs unambiguous (low entropy) groups
  4. Extract SAE activations at target layer for last token
  5. Compute pairwise Pearson correlation within each group
  6. Compare the 5th percentile of the correlation distribution (its most negative tail) between the two groups using a Mann-Whitney U test
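
The analysis steps above can be sketched with NumPy on toy data. This is an illustration under stated assumptions, not the exact pipeline: entropy is estimated empirically from the first tokens of the sampled completions, the ambiguous/unambiguous split is a median split, and correlations are taken between feature columns of an activation matrix of shape (n_samples, n_features). All function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def first_token_entropy(first_tokens):
    """Empirical Shannon entropy (nats) of the first tokens of sampled completions."""
    counts = Counter(first_tokens)
    n = len(first_tokens)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def split_by_entropy(entropies, threshold=None):
    """Return (ambiguous, unambiguous) indices; assumes a median split by default."""
    ent = np.asarray(entropies, dtype=float)
    if threshold is None:
        threshold = float(np.median(ent))
    return np.flatnonzero(ent > threshold), np.flatnonzero(ent <= threshold)


def pairwise_feature_correlations(acts):
    """Pearson correlation for every pair of feature columns in acts.

    Constant features are dropped because their correlation is undefined.
    """
    acts = np.asarray(acts, dtype=float)
    acts = acts[:, acts.std(axis=0) > 0]
    if acts.shape[1] < 2:
        return np.array([])
    corr = np.corrcoef(acts, rowvar=False)
    return corr[np.triu_indices_from(corr, k=1)]  # upper triangle, no diagonal


def lower_tail(correlations, q=5.0):
    """The q-th percentile of the correlations (q=5 gives the most negative tail)."""
    return float(np.percentile(correlations, q))


# Toy data: one unanimous question and one evenly split question.
entropies = [first_token_entropy(t) for t in (["Paris"] * 20, ["Paris", "Lyon"] * 10)]
ambiguous, unambiguous = split_by_entropy(entropies)

# Toy activations: features 0 and 1 never fire together (a "rivalry" pattern).
toy_acts = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.4], [1.0, 0.0, 0.6], [0.0, 1.0, 0.5]])
toy_corrs = pairwise_feature_correlations(toy_acts)
print("ambiguous:", ambiguous, "unambiguous:", unambiguous)
print("5th percentile correlation:", lower_tail(toy_corrs))
```

The final group comparison could use `scipy.stats.mannwhitneyu`; it is left out here to keep the sketch light on dependencies, and because the steps above do not say exactly which samples enter the test.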

SAEBench Custom Evals

  1. Load model with AutoModelForCausalLM + trust_remote_code=True
  2. Register forward hook at target layer
  3. Run inference, capture residual stream
  4. SAE encode: pre_acts = residual @ W_enc.T + b_enc
  5. TopK select: keep top 50 activations
  6. SAE decode: reconstructed = topk_acts @ W_dec.T + b_dec
  7. Compute metrics on reconstructed vs original
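
Steps 4 to 7 can be sketched in NumPy as follows. The shapes are assumptions inferred from the transposes in the formulas: W_enc is (n_features, d_model) and W_dec is (d_model, n_features). No ReLU is applied because the steps above do not mention one, and the two metrics shown (MSE and fraction of variance explained) are examples only, since the list does not name specific metrics. The toy dimensions stand in for d_model = 3,584, 32,768 features, and k = 50.

```python
import numpy as np


def sae_forward(residual, W_enc, b_enc, W_dec, b_dec, k=50):
    """TopK SAE pass: encode, keep the k largest pre-activations per token, decode."""
    pre_acts = residual @ W_enc.T + b_enc                    # (n_tokens, n_features)
    top_idx = np.argpartition(pre_acts, -k, axis=-1)[:, -k:]  # indices of the k largest
    rows = np.arange(pre_acts.shape[0])[:, None]
    topk_acts = np.zeros_like(pre_acts)
    topk_acts[rows, top_idx] = pre_acts[rows, top_idx]       # everything else stays 0
    reconstructed = topk_acts @ W_dec.T + b_dec              # (n_tokens, d_model)
    return topk_acts, reconstructed


def reconstruction_metrics(original, reconstructed):
    """Example metrics: mean squared error and fraction of variance explained."""
    residual_ss = float(np.sum((original - reconstructed) ** 2))
    total_ss = float(np.sum((original - original.mean(axis=0)) ** 2))
    return {
        "mse": float(np.mean((original - reconstructed) ** 2)),
        "explained_variance": 1.0 - residual_ss / total_ss,
    }


# Toy dimensions so the example runs instantly.
rng = np.random.default_rng(0)
N_TOKENS, D_MODEL, N_FEATURES, K = 8, 16, 64, 5
residual = rng.normal(size=(N_TOKENS, D_MODEL))
W_enc = rng.normal(size=(N_FEATURES, D_MODEL))
b_enc = np.zeros(N_FEATURES)
W_dec = rng.normal(size=(D_MODEL, N_FEATURES)) / N_FEATURES
b_dec = np.zeros(D_MODEL)

topk_acts, reconstructed = sae_forward(residual, W_enc, b_enc, W_dec, b_dec, k=K)
metrics = reconstruction_metrics(residual, reconstructed)
print("active features per token:", np.count_nonzero(topk_acts, axis=1))
print(metrics)
```

With random weights the reconstruction is poor by construction; the point is only to show the data flow and shapes. In the real setup, `residual` would come from the forward hook in step 2.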

Compute Environment

| Property | Value |
| --- | --- |
| Platform | vast.ai instance 36453618 |
| GPU | NVIDIA A100 80GB |
| SSH | ssh6.vast.ai:13618 |
| Model loading | BF16, ~40GB VRAM |
| Alternative | 4-bit quantization (~20GB VRAM), not yet tested |
| Status | DOWN: connection refused since 2026-05-15 05:00 UTC |
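
As a sanity check on the VRAM figures, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes all 35B parameters are resident in GPU memory (usual for MoE inference, since any expert can be routed to) and ignores activations, KV cache, and framework overhead. Under those assumptions BF16 weights alone come to about 70 GB, which is higher than the ~40GB listed above, so that figure may be worth re-measuring; the 4-bit estimate of about 17.5 GB is consistent with the ~20GB entry.

```python
TOTAL_PARAMS = 35e9  # total parameters, from the Model table

# Bytes per parameter for the two loading options in the table.
BYTES_PER_PARAM = {"bf16": 2.0, "4-bit": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes); excludes KV cache and overhead."""
    return n_params * bytes_per_param / 1e9


estimates = {name: weight_memory_gb(TOTAL_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights alone")
```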

Known Limitations

Assumptions

  1. SAE features are meaningful. We assume that sparse features encode human-interpretable concepts. Descriptive Collision (arXiv:2605.12874) challenges this — we acknowledge the risk.
  2. Last token position is representative. For most analyses, we use SAE activations at the last token. Per-token dynamics may differ.
  3. Entropy correlates with uncertainty. We use first-token entropy as a proxy for model uncertainty. This is standard but imperfect.
  4. 20 samples are sufficient. For entropy estimation, we sample 20 completions. This is the paper's choice; more samples might reduce variance.
  5. Layer 20 is representative. Many analyses focus on layer 20 (mid-network). Other layers may show different patterns.

Reproducibility Checklist

Disclaimer: This is a research exploration, not a production system. Results should be treated as preliminary. We report our honest assessments, including negative results and blocked experiments.
