gpt_neox

Transformer variant for GPT-NeoX models.

The GPT-NeoX architecture is used by the GPT-NeoX-20B model (Black et al., 2022) and the Pythia model scaling suite (Biderman et al., 2023).

Features of the architecture:

  • Full multi-head attention

  • Rotary positional embeddings (Su et al., 2021), applied to only a subset of each attention head's embedding dimensions

  • Parallel Transformer layer formulation (Wang & Komatsuzaki, 2021); see the sketch after this list

  • Biases for all kernels

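The following is a schematic sketch of the parallel layer formulation, not Penzai's actual implementation: both the attention and MLP branches read from the same layer input, and their outputs are added back to the residual stream in a single step, rather than running the attention and MLP sublayers sequentially.

    import jax.numpy as jnp

    def parallel_transformer_block(x, attn_norm, mlp_norm, attention, mlp):
        # Sequential formulation:  h = x + attention(norm1(x));  y = h + mlp(norm2(h))
        # Parallel formulation:    y = x + attention(norm1(x)) + mlp(norm2(x))
        return x + attention(attn_norm(x)) + mlp(mlp_norm(x))

    # Toy usage with placeholder callables standing in for the real sublayers:
    x = jnp.ones((4, 8))
    y = parallel_transformer_block(
        x,
        attn_norm=lambda v: v,
        mlp_norm=lambda v: v,
        attention=lambda v: 0.1 * v,
        mlp=lambda v: 0.2 * v,
    )
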
Classes

GPTNeoXTransformerConfig

Configuration parameters for a GPT-NeoX transformer.

Functions

build_gpt_neox_attention(name, ...)

Builds an attention block from a configuration.

build_gpt_neox_block(name, init_base_rng, config)

Builds a GPT-NeoX "parallel" transformer block from a configuration.

build_gpt_neox_feedforward(name, ...)

Creates a feedforward block.

build_gpt_neox_transformer(config[, ...])

Builds a GPT-NeoX transformer model from a configuration.

gpt_neox_from_huggingface_model(model[, ...])

Converts a HuggingFace GPT-NeoX model to a Penzai model.
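
A minimal usage sketch for the conversion function, assuming the HuggingFace transformers library is installed, that this module is importable as penzai.models.transformer.variants.gpt_neox, and that a small Pythia checkpoint is used as the source model (the checkpoint name is illustrative, not prescribed by this page):

    import transformers
    from penzai.models.transformer.variants import gpt_neox

    # Load a small GPT-NeoX-architecture checkpoint with HuggingFace Transformers.
    hf_model = transformers.GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

    # Convert it into an equivalent Penzai model.
    pz_model = gpt_neox.gpt_neox_from_huggingface_model(hf_model)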