sampling_mode

Sampling-mode adapters for the Gemma model.

This file includes the key-value (KV) caching sampling variant of the Gemma model. This variant is intended to be hot-swapped for the main Gemma variant: you should generally start by loading a `model_core.GemmaTransformer` and then convert it to a `GemmaKVCachingTransformer` using `GemmaKVCachingTransformer.from_uncached`.
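The hot-swap pattern described above can be illustrated with a toy, library-free sketch. Every name in it (`ToyAttention`, `ToyKVCachingAttention`, `ToyKVCache`, `step`) is hypothetical and exists only for illustration. It does not reproduce the real signature or return value of `GemmaKVCachingTransformer.from_uncached`. The sketch assumes a single attention head over scalar features, and shows only the general idea: the cached variant reuses the uncached layer's weights and carries a separate state holding previously computed keys and values.

```python
import math
from dataclasses import dataclass, field


def _attend(q: float, keys: list, vals: list) -> float:
    """Softmax attention of one query over the given keys and values."""
    scores = [q * k for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, vals)) / sum(weights)


@dataclass(frozen=True)
class ToyAttention:
    """Hypothetical uncached layer: recomputes keys/values for every position."""
    wq: float
    wk: float
    wv: float

    def __call__(self, xs: list) -> list:
        outs = []
        for i, x in enumerate(xs):
            prefix = xs[: i + 1]  # causal: attend only to positions <= i
            keys = [self.wk * t for t in prefix]
            vals = [self.wv * t for t in prefix]
            outs.append(_attend(self.wq * x, keys, vals))
        return outs


@dataclass(frozen=True)
class ToyKVCache:
    """Hypothetical sampling state: keys and values seen so far."""
    keys: tuple = ()
    vals: tuple = ()


@dataclass(frozen=True)
class ToyKVCachingAttention:
    """Hypothetical cached layer: same weights, one token per call."""
    wq: float
    wk: float
    wv: float

    @classmethod
    def from_uncached(cls, uncached: ToyAttention):
        # Reuse the uncached weights and start from an empty cache.
        return cls(uncached.wq, uncached.wk, uncached.wv), ToyKVCache()

    def step(self, x: float, cache: ToyKVCache):
        # Append this token's key/value, then attend over the whole cache.
        new_cache = ToyKVCache(
            keys=cache.keys + (self.wk * x,),
            vals=cache.vals + (self.wv * x,),
        )
        out = _attend(self.wq * x, list(new_cache.keys), list(new_cache.vals))
        return out, new_cache


# Load the uncached layer first, then convert it for sampling.
uncached = ToyAttention(wq=0.5, wk=-1.0, wv=2.0)
cached, cache = ToyKVCachingAttention.from_uncached(uncached)

tokens = [0.1, -0.3, 0.7]
stepwise = []
for x in tokens:
    y, cache = cached.step(x, cache)
    stepwise.append(y)

full = uncached(tokens)  # same outputs, computed without a cache
```

In this sketch the cache is returned as a new value on each step rather than mutated in place, which keeps the layer itself stateless. That is a design choice of the toy, not a statement about how the Gemma classes manage their state.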

The layers defined here follow the same conventions documented in the module docstring for model_core.

Classes

`GemmaKVCachingAttention`	Gemma-specific configuration of the key-value-caching attention layer.
`GemmaKVCachingInputs`	Input structure for the `GemmaKVCachingTransformer`.
`GemmaKVCachingState`	Sampling state for the key-value-caching Gemma variant.
`GemmaKVCachingTransformer`	Top-level Gemma transformer in cached autoregressive sampling mode.