KnockOutAttentionHeads

class penzai.deprecated.v1.toolshed.model_rewiring.KnockOutAttentionHeads
Bases: Layer
Layer that redirects masked-out heads to attend to the <BOS> token.

This layer can be inserted into a transformer model's attention layer immediately after the softmax operation, in order to ablate a subset of the attention heads. It assumes that a reasonable "default" behavior for the head is to attend to the <BOS> token, which is common for many attention heads. (This ablation may be less effective for heads that never attend to <BOS>.)

Variables:
head_mask (pz.nx.NamedArray) – NamedArray with 1s for heads we want to keep, and 0s for heads that should be rewritten to point to <BOS>. Values between 0 and 1 will smoothly interpolate between them.
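For intuition, here is a minimal sketch of the interpolation this layer performs, written with plain JAX arrays rather than penzai's pz.nx.NamedArray. The function name, the array shapes, and the assumption that <BOS> sits at key position 0 are illustrative, not penzai's actual implementation:

```python
import jax.numpy as jnp


def knock_out_heads_sketch(attn_weights, head_mask):
  """Blends each head between its real weights and a one-hot on <BOS>.

  Args:
    attn_weights: Post-softmax attention weights, shape [heads, queries, keys].
    head_mask: Shape [heads]; 1.0 keeps a head as-is, 0.0 rewires it to
      attend entirely to <BOS>, and intermediate values blend the two.
  """
  num_keys = attn_weights.shape[-1]
  # A distribution that puts all attention mass on key position 0 (<BOS>).
  bos_weights = jnp.zeros(num_keys).at[0].set(1.0)
  # Broadcast the per-head mask over the query and key axes and interpolate.
  mask = head_mask[:, None, None]
  return mask * attn_weights + (1.0 - mask) * bos_weights
```

Since the inputs are already normalized distributions and the one-hot is too, any convex combination of them is still a valid attention distribution, so no re-normalization is needed.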
Methods

__init__(head_mask)
__call__(attn_weights)

Attributes

head_mask
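As a usage sketch (hedged: the eight-head mask, the "heads" axis name, the presence of a pz.nn.Softmax sublayer to insert after, and the pre-existing `model` are assumptions about the surrounding model, not guarantees made by this class):

```python
import jax.numpy as jnp
from penzai.deprecated.v1 import pz
from penzai.deprecated.v1.toolshed import model_rewiring

# Keep heads 0-5 and knock out heads 6 and 7 of an assumed 8-head model;
# the axis name "heads" must match the model's attention-head axis name.
mask_values = jnp.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
head_mask = pz.nx.wrap(mask_values).tag("heads")

# `model` is assumed to be an already-loaded penzai v1 transformer whose
# attention layers compute their weights with a pz.nn.Softmax sublayer;
# we insert the knockout layer immediately after each such softmax.
patched_model = (
    model.select()
    .at_instances_of(pz.nn.Softmax)
    .insert_after(model_rewiring.KnockOutAttentionHeads(head_mask=head_mask))
)
```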
Inherited Methods

attributes_dict() – Constructs a dictionary with all of the fields in the class.
from_attributes(**field_values) – Directly instantiates a struct given all of its fields.
input_structure() – Returns the input structure of this layer.
key_for_field(field_name) – Generates a JAX PyTree key for a given field name.
output_structure() – Returns the output structure of this layer.
select() – Wraps this struct in a selection, enabling functional-style mutations.
tree_flatten() – Flattens this tree node.
tree_flatten_with_keys() – Flattens this tree node with keys.
tree_unflatten(aux_data, children) – Unflattens this tree node.
treescope_color() – Computes a CSS color to display for this object in treescope.