model_rewiring

Helper classes for rewiring, ablating, and intervening on model activations.

These helpers are intended to be inserted into a model to enable analysis of the causal impact of different model components. For instance, they can be used to ablate attention heads, to implement activation patching, or to linearize parts of a model for easier comparisons.

For an example of how to use these components, see the induction heads tutorial notebook.
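As a rough illustration of the activation patching idea mentioned above, the following dependency-free sketch runs a toy two-stage function on a "clean" and a "corrupted" input, then reruns the corrupted input with one intermediate activation replaced by its clean value. It does not use this module's classes. The names `layer_one`, `layer_two`, and `run` are hypothetical stand-ins for a real model.

```python
def layer_one(x):
    # Hypothetical first stage: produces two intermediate activations.
    return [2.0 * x, x + 1.0]


def layer_two(hidden):
    # Hypothetical second stage: combines the activations into one output.
    return hidden[0] * hidden[1]


def run(x, patch=None):
    """Runs the toy model, optionally overwriting one hidden activation.

    patch is an (index, value) pair, or None to run unmodified.
    Returns (output, hidden activations).
    """
    hidden = layer_one(x)
    if patch is not None:
        index, value = patch
        hidden = list(hidden)
        hidden[index] = value
    return layer_two(hidden), hidden


# Clean and corrupted runs without any intervention.
clean_out, clean_hidden = run(3.0)
corrupt_out, _ = run(1.0)

# Patched run: corrupted input, but activation 0 is taken from the clean run.
patched_out, _ = run(1.0, patch=(0, clean_hidden[0]))

print(clean_out, corrupt_out, patched_out)
```

How far the patched output moves from the corrupted output toward the clean one gives a rough measure of how much the patched activation matters for the difference between the two runs.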

Classes

From

A connection between two parallel computations.

KnockOutAttentionHeads

Layer that redirects masked-out heads to attend to the <BOS> token.

LinearizeAndAdjust

Linearizes and evaluates a model around two adjusted inputs.

RewireComputationPaths

Rewires computation across parallel model runs along a worlds axis.
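To make the head knock-out idea concrete, here is a minimal, dependency-free sketch of redirecting masked-out heads to the <BOS> token. It is a conceptual illustration only, not the implementation of `KnockOutAttentionHeads`. It assumes that attention weights are given after normalization as nested lists indexed `[head][query][key]`, and that <BOS> sits at key position 0. The function name `knock_out_heads` and the `head_mask` convention (True means keep) are hypothetical.

```python
def knock_out_heads(attn_weights, head_mask):
    """Redirects masked-out heads so every query attends only to key 0.

    attn_weights: nested lists indexed [head][query][key]; rows sum to 1.
    head_mask: one bool per head; True keeps the head, False knocks it out.
    Assumes (for this sketch) that the <BOS> token is at key position 0.
    """
    result = []
    for head_weights, keep in zip(attn_weights, head_mask):
        if keep:
            # Kept heads are copied unchanged.
            result.append([list(row) for row in head_weights])
        else:
            # Knocked-out heads put all attention mass on position 0.
            result.append(
                [[1.0] + [0.0] * (len(row) - 1) for row in head_weights]
            )
    return result


# Two heads, two query positions, two key positions.
weights = [
    [[1.0, 0.0], [0.3, 0.7]],
    [[1.0, 0.0], [0.5, 0.5]],
]
knocked = knock_out_heads(weights, head_mask=[True, False])
print(knocked)
```

Each row of a knocked-out head still sums to 1, so the head produces a valid attention pattern that carries no information about which other tokens are present.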