KnockOutAttentionHeads

class penzai.toolshed.model_rewiring.KnockOutAttentionHeads

Bases: Layer
Layer that redirects masked-out heads to attend to the <BOS> token.

This layer can be inserted into a transformer model's attention layer immediately after the softmax operation, in order to ablate a subset of the attention heads. It assumes that a reasonable "default" behavior for the head is to attend to the <BOS> token, which is common for many attention heads. (This ablation may be less effective for heads that never attend to <BOS>.)

Variables:
    head_mask (pz.nx.NamedArray) – NamedArray with 1s for heads we want to keep, and 0s for heads that should be rewritten to point to <BOS>. Values between 0 and 1 will smoothly interpolate between them.
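For example, one can build a head mask and patch this layer into a model using penzai's selector API. This is a minimal sketch, not the library's documented recipe: it assumes a model whose attention blocks contain a pz.nn.Softmax layer, a head axis named "heads" of size 4, and a previously loaded model bound to `model`; adjust the axis name and selection target to match your model's structure.

    import jax.numpy as jnp
    from penzai import pz
    from penzai.toolshed import model_rewiring

    # Hypothetical mask: keep heads 0, 2, and 3; knock out head 1.
    # "heads" is an assumed axis name; use your model's head axis.
    head_mask = pz.nx.wrap(jnp.array([1.0, 0.0, 1.0, 1.0])).tag("heads")

    knockout = model_rewiring.KnockOutAttentionHeads(head_mask=head_mask)

    # Insert the layer immediately after each attention softmax.
    # `model` is assumed to be a loaded penzai transformer.
    patched_model = (
        pz.select(model)
        .at_instances_of(pz.nn.Softmax)
        .insert_after(knockout)
    )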
Methods

__init__(head_mask)

__call__(attn_weights, **_unused_side_inputs)

Attributes

head_mask
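Conceptually, __call__ interpolates per head between the original post-softmax weights and a one-hot distribution over the <BOS> position, matching the smooth interpolation described for head_mask above. Below is a minimal sketch of this rewiring, assuming the <BOS> token sits at key position 0 and the key axis is named "kv_seq" (both are assumptions about the surrounding model, not guarantees of this class):

    import jax.numpy as jnp
    from penzai import pz

    def knock_out_sketch(
        attn_weights: pz.nx.NamedArray,
        head_mask: pz.nx.NamedArray,
    ) -> pz.nx.NamedArray:
      # One-hot distribution over key positions, with all mass on
      # position 0 (assumed to be <BOS>).
      kv_len = attn_weights.named_shape["kv_seq"]
      attend_to_bos = pz.nx.wrap(
          jnp.zeros(kv_len).at[0].set(1.0)
      ).tag("kv_seq")
      # Per-head interpolation: heads with mask 1 keep their weights;
      # heads with mask 0 attend only to <BOS>.
      return head_mask * attn_weights + (1 - head_mask) * attend_to_bos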
Inherited Methods
attributes_dict()
    Constructs a dictionary with all of the fields in the class.

bind_variables(variables[, allow_unused])
    Convenience function to bind variables to a layer.

from_attributes(**field_values)
    Directly instantiates a struct given all of its fields.

key_for_field(field_name)
    Generates a JAX PyTree key for a given field name.

select()
    Wraps this struct in a selection, enabling functional-style mutations.

stateless_call(variable_values, argument, /, ...)
    Calls a layer with temporary variables, without modifying its state.

tree_flatten()
    Flattens this tree node.

tree_flatten_with_keys()
    Flattens this tree node with keys.

tree_unflatten(aux_data, children)
    Unflattens this tree node.

treescope_color()
    Computes a CSS color to display for this object in treescope.