TransformerLM

TransformerLM#

class penzai.models.transformer.model_parts.TransformerLM[source]#

Top-level transformer decoder wrapper.

This class is a simple wrapper that holds configuration data and runs safety checks.

Variables:

body (pz.nn.Layer) – The implementation of the transformer.
metadata (TransformerMetadata) – The axis size and dtype info for the transformer.

Methods

`__init__`(body, metadata)
`__call__`(tokens, *[, token_positions])	Scores log-probabilities for the given inputs.

Attributes

`body`
`metadata`

Inherited Methods

(expand to view inherited methods)

`attributes_dict`()	Constructs a dictionary with all of the fields in the class.
`bind_variables`(variables[, allow_unused])	Convenience function to bind variables to a layer.
`from_attributes`(**field_values)	Directly instantiates a struct given all of its fields.
`key_for_field`(field_name)	Generates a JAX PyTree key for a given field name.
`select`()	Wraps this struct in a selection, enabling functional-style mutations.
`stateless_call`(variable_values, argument, /, ...)	Calls a layer with temporary variables, without modifying its state.
`tree_flatten`()	Flattens this tree node.
`tree_flatten_with_keys`()	Flattens this tree node with keys.
`tree_unflatten`(aux_data, children)	Unflattens this tree node.
`treescope_color`()	Computes a CSS color to display for this object in treescope.

__call__(tokens: pz.nx.NamedArray, *, token_positions: pz.nx.NamedArray | None = None, **side_inputs) → pz.nx.NamedArray[source]#

Scores log-probabilities for the given inputs.

Parameters:

tokens – Array of token IDs, as an integer named array with a “seq” axis and possibly batch axes. Usually starts with the beginning-of-sequence token.
token_positions – Array of token positions, as an integer named array with a “seq” axis and possibly batch axes. Usually starts with 0. Inferred to start from 0 and increment along the “seq” axis if not provided.
**side_inputs – Side inputs, which will be forwarded to the body.

Returns:

The final matrix of logits from the embedding decoding layer, which (in the normal configuration) will have axes “seq” and “vocabulary”.