sampling_mode
Sampling-mode adapters for the Gemma model.
This file includes the kv-cache sampling variant of the Gemma model. This variant is intended to be hot-swapped for the main Gemma variant: you should generally start by loading a model_core.GemmaTransformer and then converting it to a GemmaKVCachingTransformer using GemmaKVCachingTransformer.from_uncached.
The layers defined here follow the same conventions documented in the module docstring for model_core.
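The hot-swap pattern described above can be illustrated with a small, library-free sketch. Everything in it is a hypothetical stand-in: ToyUncached, ToyKVCaching, their scale_k and scale_v parameters, and the step method are invented for illustration and are not the Gemma classes or their API. In particular, the actual arguments of GemmaKVCachingTransformer.from_uncached are not shown on this page and are not reproduced here. The sketch only shows the general idea: build the cached variant from the uncached one so both share parameters, then check that token-by-token decoding with a key-value cache matches a full causal pass.

```python
import math
from dataclasses import dataclass, field


def attend(query, keys, values):
    """Single-head softmax attention of one query over lists of keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(dim)
    ]


@dataclass(frozen=True)
class ToyUncached:
    """Hypothetical uncached model: reprocesses the whole sequence each call."""
    scale_k: float
    scale_v: float

    def __call__(self, tokens):
        keys = [[self.scale_k * x for x in t] for t in tokens]
        values = [[self.scale_v * x for x in t] for t in tokens]
        # Causal attention: position i only attends to positions 0..i.
        return [
            attend(tokens[i], keys[: i + 1], values[: i + 1])
            for i in range(len(tokens))
        ]


@dataclass
class ToyKVCaching:
    """Hypothetical cached variant: same parameters plus a growing K/V cache."""
    scale_k: float
    scale_v: float
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    @classmethod
    def from_uncached(cls, uncached):
        # Reuse the uncached model's parameters and start with an empty cache.
        return cls(scale_k=uncached.scale_k, scale_v=uncached.scale_v)

    def step(self, token):
        # Append this token's key/value once, then attend over the cache.
        self.keys.append([self.scale_k * x for x in token])
        self.values.append([self.scale_v * x for x in token])
        return attend(token, self.keys, self.values)


tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
uncached = ToyUncached(scale_k=0.7, scale_v=1.3)
cached = ToyKVCaching.from_uncached(uncached)

full_outputs = uncached(tokens)                  # one pass over the sequence
step_outputs = [cached.step(t) for t in tokens]  # one token at a time
```

The toy cache here is a mutable Python list that grows with each step, chosen only to keep the sketch short. A real sampling-mode model would more plausibly carry a fixed-size cache as explicit state, which this example does not attempt to model.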
Classes
Gemma-specific configuration of the key-value-caching attention layer.
Input structure for the
Sampling state for the key-value-caching Gemma variant.
Top-level Gemma transformer in cached autoregressive sampling mode.