KnockOutAttentionHeads
- class penzai.toolshed.model_rewiring.KnockOutAttentionHeads
Bases: Layer

Layer that redirects masked-out heads to attend to the <BOS> token.

This layer can be inserted into a transformer model’s attention layer immediately after the softmax operation, in order to ablate a subset of the attention heads. It assumes that a reasonable “default” behavior for the head is to attend to the <BOS> token, which is common for many attention heads. (This ablation may be less effective for heads that never attend to BOS.)

- Variables:
head_mask (pz.nx.NamedArray) – NamedArray with 1s for heads we want to keep, and 0s for heads that should be rewritten to point to BOS. Values between 0 and 1 will smoothly interpolate between the two behaviors.
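The interpolation described for head_mask can be sketched with plain NumPy arrays. This is an illustrative sketch only, not the library's implementation: the function name `knock_out_heads_sketch`, the positional `(heads, query_len, key_len)` layout, and the assumption that <BOS> sits at key position 0 are all hypothetical choices made for the example. The real layer operates on named axes via pz.nx.NamedArray.

```python
import numpy as np


def knock_out_heads_sketch(attn_weights, head_mask, bos_index=0):
    """Blend post-softmax attention weights with a BOS-only pattern.

    attn_weights: shape (heads, query_len, key_len); each row sums to 1.
    head_mask: shape (heads,); 1.0 keeps a head, 0.0 redirects it to <BOS>.
    bos_index: assumed key position of the <BOS> token (hypothetical default).
    """
    attn_weights = np.asarray(attn_weights, dtype=float)
    head_mask = np.asarray(head_mask, dtype=float)
    key_len = attn_weights.shape[-1]
    # One-hot pattern over key positions: all attention placed on <BOS>.
    bos_only = (np.arange(key_len) == bos_index).astype(float)
    # Broadcast the per-head mask over the query and key axes.
    keep = head_mask[:, None, None]
    # Linear interpolation: mask=1 keeps the head, mask=0 points it at <BOS>.
    return keep * attn_weights + (1.0 - keep) * bos_only


# Two heads, two query positions, three key positions.
attn = np.array([
    [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]],
    [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]],
])
mask = np.array([1.0, 0.0])  # keep head 0, knock out head 1
out = knock_out_heads_sketch(attn, mask)
```

Because both patterns are probability distributions over key positions, any mask value between 0 and 1 yields rows that still sum to 1 in this sketch.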
Methods
__init__(head_mask)
__call__(attn_weights, **_unused_side_inputs)
Attributes
head_mask
Inherited Methods
attributes_dict() – Constructs a dictionary with all of the fields in the class.
bind_variables(variables[, allow_unused]) – Convenience function to bind variables to a layer.
from_attributes(**field_values) – Directly instantiates a struct given all of its fields.
key_for_field(field_name) – Generates a JAX PyTree key for a given field name.
select() – Wraps this struct in a selection, enabling functional-style mutations.
stateless_call(variable_values, argument, /, ...) – Calls a layer with temporary variables, without modifying its state.
tree_flatten() – Flattens this tree node.
tree_flatten_with_keys() – Flattens this tree node with keys.
tree_unflatten(aux_data, children) – Unflattens this tree node.
treescope_color() – Computes a CSS color to display for this object in treescope.