gpt_neox
Transformer variant for GPT-NeoX models.
The GPT-NeoX architecture is used by the GPT-NeoX-20B model (Black et al., 2022) and the Pythia model scaling suite (Biderman et al., 2023).
Features of the architecture:
Full multi-head attention
Rotary positional embeddings (Su et al., 2021), applied to only a subset of each attention head's embedding dimensions
Parallel transformer layer formulation (Wang & Komatsuzaki, 2021); see the sketch after this list
Biases for all kernels
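To make the parallel formulation and the partial rotary embedding concrete, here is a minimal JAX sketch. It is illustrative only: the function names (`apply_partial_rotary`, `parallel_block`) and the `rotary_pct` default are assumptions for this example, not part of this module's API.

```python
import jax.numpy as jnp


def apply_partial_rotary(x, positions, rotary_pct=0.25, base=10000.0):
  """Applies rotary embeddings to only the first `rotary_pct` of head dims.

  x: array of shape [..., seq, head_dim]; positions: array of shape [seq].
  Illustrative sketch; rotary_pct=0.25 mirrors the GPT-NeoX-20B setting.
  """
  head_dim = x.shape[-1]
  rot_dim = int(head_dim * rotary_pct) // 2 * 2  # keep the rotated width even
  x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]
  # Rotary frequencies are computed over the rotated dimensions only.
  inv_freq = 1.0 / (base ** (jnp.arange(0, rot_dim, 2) / rot_dim))
  angles = positions[:, None] * inv_freq[None, :]          # [seq, rot_dim // 2]
  cos = jnp.concatenate([jnp.cos(angles), jnp.cos(angles)], axis=-1)
  sin = jnp.concatenate([jnp.sin(angles), jnp.sin(angles)], axis=-1)
  half = rot_dim // 2
  rotate_half = jnp.concatenate([-x_rot[..., half:], x_rot[..., :half]], axis=-1)
  rotated = x_rot * cos + rotate_half * sin
  # The remaining dimensions pass through without positional rotation.
  return jnp.concatenate([rotated, x_pass], axis=-1)


def parallel_block(x, ln_attn, ln_mlp, attention, mlp):
  """GPT-NeoX "parallel" block: attention and MLP both read the block input."""
  return x + attention(ln_attn(x)) + mlp(ln_mlp(x))
```

In the parallel formulation, the attention and MLP branches are computed from the same block input (each with its own layer norm) and summed into the residual stream, rather than applied sequentially as in the standard pre-norm transformer block.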
Classes
Configuration parameters for a GPT-NeoX transformer.
Functions
Builds an attention block from a configuration. |
Builds a GPT-NeoX "parallel" transformer block from a configuration. |
Creates a feedforward block. |
Builds a GPT-NeoX transformer model from a configuration.
Converts a GPT-NeoX model to a Penzai model. |
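As a usage sketch of the conversion path, one might load a Pythia checkpoint (a GPT-NeoX-architecture model) from HuggingFace and hand it to the converter. The Penzai import path and the `gpt_neox_from_huggingface_model` name below are assumptions inferred from this page, not a verified API.

```python
# Sketch only: the Penzai import path and converter name are assumptions
# inferred from this page, not verified against the actual API.
from transformers import AutoModelForCausalLM

from penzai.models.transformer.variants import gpt_neox  # assumed module path

# Pythia checkpoints use the GPT-NeoX architecture, so they are natural inputs here.
hf_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# Hypothetical converter name, corresponding to
# "Converts a GPT-NeoX model to a Penzai model." above.
pz_model = gpt_neox.gpt_neox_from_huggingface_model(hf_model)
```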