Thought Anchors vs Attention Anchors

About this visualization: We compare two methods for identifying important tokens in Chain-of-Thought (CoT) reasoning. Thought Anchors (left, blue) measure chunk importance using black-box methods from the math-rollouts dataset. Available metrics: accuracy, resampling_importance, counterfactual_importance, forced_importance, overdeterminedness. Attention Anchors (right, red) measure importance by tracing attention flow from the final answer back through the reasoning chain — a white-box approach.

The formula: score = A^D · T, where A is the attention matrix (A[i,j] = attention from token i to token j, averaged across layers and heads), D is the path depth, and T is a one-hot target vector marking the answer tokens. A^D is the D-th matrix power of A: entry (A^D)[i,j] sums over all attention paths of length D from token i to token j, where each path's weight is the product of the attention weights along it. The matrix-vector product A^D · T therefore scores each token by how strongly it connects to the answer tokens through D-hop attention paths.
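As a minimal NumPy sketch of the score A^D · T described above (the function name and the toy attention matrix are illustrative, not from the visualization's actual code):

```python
import numpy as np

def attention_anchor_scores(A: np.ndarray, target: np.ndarray, depth: int) -> np.ndarray:
    """Score each token by D-hop attention flow into the target (answer) tokens.

    A[i, j] = attention from token i to token j, averaged over layers/heads.
    target   = one-hot (or multi-hot) vector over tokens marking the answer.
    """
    # A^D: entry (i, j) sums the products of attention weights over
    # all length-`depth` paths from token i to token j.
    return np.linalg.matrix_power(A, depth) @ target

# Toy example: 4 tokens with causal (lower-triangular) attention;
# the last token is the answer.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.2, 0.5, 0.3],
])
target = np.array([0.0, 0.0, 0.0, 1.0])

scores = attention_anchor_scores(A, target, depth=2)
```

With depth = 1 this reduces to the answer column of A itself; larger depths credit tokens that feed the answer only indirectly through intermediate tokens.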

Grey tokens are the target (answer) tokens.

Thought Anchors (selected metric - blue = high)

Attention Anchors (computed - red = high)