gpt_neox

Transformer variant for GPT-NeoX models.

The GPT-NeoX architecture is used by the GPT-NeoX-20B model (Black et al., 2022) and the Pythia model scaling suite (Biderman et al., 2023).

Features of the architecture:

  • Full multi-head attention

  • Rotary positional embeddings (Su et al., 2021), applied to only a subset of each attention head's embedding dimensions

  • Parallel Transformer layer formulation (Wang & Komatsuzaki, 2021); see the sketch after this list

  • Biases for all kernels

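The following is a schematic sketch of the parallel layer formulation, not Penzai's actual implementation: both the attention and MLP branches read from the same layer input, and their outputs are added back to the residual stream in a single step, rather than running the attention and MLP sublayers sequentially.

    import jax.numpy as jnp

    def parallel_transformer_block(x, attn_norm, mlp_norm, attention, mlp):
        # Sequential formulation:  h = x + attention(norm1(x));  y = h + mlp(norm2(h))
        # Parallel formulation:    y = x + attention(norm1(x)) + mlp(norm2(x))
        return x + attention(attn_norm(x)) + mlp(mlp_norm(x))

    # Toy usage with placeholder callables standing in for the real sublayers:
    x = jnp.ones((4, 8))
    y = parallel_transformer_block(
        x,
        attn_norm=lambda v: v,
        mlp_norm=lambda v: v,
        attention=lambda v: 0.1 * v,
        mlp=lambda v: 0.2 * v,
    )
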
Classes

GPTNeoXTransformerConfig

Configuration parameters for a GPT-NeoX transformer.

Functions

build_gpt_neox_attention(name, ...)

Builds an attention block from a configuration.

build_gpt_neox_block(name, init_base_rng, config)

Builds a GPT-NeoX "parallel" transformer block from a configuration.

build_gpt_neox_feedforward(name, ...)

Creates a feedforward block.

build_gpt_neox_transformer(config[, ...])

Builds a GPT-NeoX transformer model from a configuration.

gpt_neox_from_huggingface_model(model[, ...])

Converts a HuggingFace GPT-NeoX model to a Penzai model.
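
A minimal usage sketch for the conversion function, assuming the HuggingFace transformers library is installed, that this module is importable as penzai.models.transformer.variants.gpt_neox, and that a small Pythia checkpoint is used as the source model (the checkpoint name is illustrative, not prescribed by this page):

    import transformers
    from penzai.models.transformer.variants import gpt_neox

    # Load a small GPT-NeoX-architecture checkpoint with HuggingFace Transformers.
    hf_model = transformers.GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

    # Convert it into an equivalent Penzai model.
    pz_model = gpt_neox.gpt_neox_from_huggingface_model(hf_model)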