sampling_mode#
Sampling-mode adapters for TransformerLM models.
This file includes the kv-cache sampling mode of the base TransformerLM model.
This mode is intended to be hot-swapped for the main TransformerLM
implementation: you should generally start by loading a
model_parts.TransformerLM
and then converting it to a KVCachingTransformerLM
using KVCachingTransformerLM.from_uncached
.
The layers defined here follow the same conventions documented in the module
docstring for model_parts
. In addition:
Where applicable, “kv_token_positions” is the name of the side input that provides the position of each token for the purposes of positional embeddings.
Where applicable, “cache_end_index” is the name of the side input that identifies the current length of the key/value cache state.
Classes
Top-level transformer in (stateful) cached autoregressive sampling mode. |