KnockOutAttentionHeads

class penzai.toolshed.model_rewiring.KnockOutAttentionHeads

Bases: Layer

Layer that redirects masked-out heads to attend to the <BOS> token.

This layer can be inserted into a transformer model’s attention layer immediately after the softmax operation, in order to ablate a subset of the attention heads. It assumes that a reasonable “default” behavior for a head is to attend to the <BOS> token, which is common for many attention heads. (This ablation may be less effective for heads that never attend to <BOS>.)
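Concretely, the rewiring is a per-head convex combination of the original post-softmax attention pattern with a one-hot pattern that attends only to position 0, where <BOS> conventionally sits. The actual layer operates on pz.nx.NamedArray values and resolves axes by name; the positional-axis function below is only an illustrative sketch of the arithmetic, not the implementation:

```python
import jax.numpy as jnp

def knock_out_heads_sketch(attn_weights, head_mask):
  """Illustrative sketch: blend post-softmax weights toward <BOS>.

  attn_weights: float array of shape [..., num_heads, query_len, kv_len],
    where each row is a probability distribution over key positions.
  head_mask: float array of shape [num_heads]; 1.0 keeps a head as-is,
    0.0 redirects it to position 0 (assumed to hold <BOS>).
  """
  kv_len = attn_weights.shape[-1]
  # One-hot distribution that places all attention on position 0.
  attend_to_bos = jnp.zeros(kv_len).at[0].set(1.0)
  # Broadcast the per-head mask over the query and key axes.
  mask = head_mask[:, None, None]
  # Each output row remains a valid probability distribution,
  # since it is a convex combination of two distributions.
  return mask * attn_weights + (1.0 - mask) * attend_to_bos
```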

Variables:

head_mask (pz.nx.NamedArray) – NamedArray with 1s for heads we want to keep, and 0s for heads that should be redirected to attend to <BOS>. Values between 0 and 1 smoothly interpolate between the two behaviors.
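A head_mask can be built with penzai’s named-axis helpers. A minimal sketch, assuming an 8-head model whose attention weights carry a head axis named "heads" (the axis name is an assumption; use whatever name your model actually uses):

```python
import jax.numpy as jnp
from penzai import pz

num_heads = 8  # hypothetical; match your model's head count
# Ablate heads 2 and 5, keep the rest.
mask_values = jnp.ones([num_heads]).at[jnp.array([2, 5])].set(0.0)
head_mask = pz.nx.wrap(mask_values).tag("heads")  # axis name is an assumption
```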

Methods

__init__(head_mask)

__call__(attn_weights, **_unused_side_inputs)
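One way to wire the layer into a model is with penzai’s selector API, assuming (as in penzai’s transformer implementations) that attention probabilities are produced by a pz.nn.Softmax layer; `model` is a placeholder for a loaded transformer and `head_mask` is a mask built as above:

```python
from penzai import pz
from penzai.toolshed import model_rewiring

# `model` and `head_mask` are placeholders (see the mask example above).
# Select every softmax inside the model and splice the knockout layer
# in immediately after it, producing a rewired copy of the model.
patched_model = (
    pz.select(model)
    .at_instances_of(pz.nn.Softmax)
    .insert_after(
        model_rewiring.KnockOutAttentionHeads(head_mask=head_mask)
    )
)
```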

Attributes

head_mask

Inherited Methods

attributes_dict()

Constructs a dictionary with all of the fields in the class.

bind_variables(variables[, allow_unused])

Convenience function to bind variables to a layer.

from_attributes(**field_values)

Directly instantiates a struct given all of its fields.

key_for_field(field_name)

Generates a JAX PyTree key for a given field name.

select()

Wraps this struct in a selection, enabling functional-style mutations.

stateless_call(variable_values, argument, /, ...)

Calls a layer with temporary variables, without modifying its state.

tree_flatten()

Flattens this tree node.

tree_flatten_with_keys()

Flattens this tree node with keys.

tree_unflatten(aux_data, children)

Unflattens this tree node.

treescope_color()

Computes a CSS color to display for this object in treescope.