sampling_mode

Sampling-mode adapters for the Gemma model.

This module provides the key-value-caching (KV-cache) sampling variant of the Gemma model. The variant is designed to be hot-swapped for the main Gemma variant: you should generally start by loading a model_core.GemmaTransformer and then convert it to a GemmaKVCachingTransformer using GemmaKVCachingTransformer.from_uncached.
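The load-then-convert pattern can be illustrated with a minimal, library-free sketch. Everything in it is a hypothetical stand-in: ToyTransformer, ToyKVCachingTransformer, the toy projection, and the two-argument from_uncached signature are assumptions made for illustration and do not reproduce the real classes or the actual signature of GemmaKVCachingTransformer.from_uncached. The sketch only shows the general idea: the cached variant reuses the uncached model's parameters and, by storing past keys and values, reproduces the uncached outputs one token at a time.

```python
import math
from dataclasses import dataclass, field


def _attend(query, keys, values):
    """Softmax attention of one query vector over lists of key/value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


@dataclass
class ToyTransformer:
    """Hypothetical uncached model: recomputes attention over the full sequence."""
    scale: float  # stand-in for learned parameters

    def project(self, token):
        # Toy projection: query, key, and value are the same vector.
        return [self.scale * token, self.scale * (token + 1.0)]

    def __call__(self, tokens):
        vecs = [self.project(t) for t in tokens]
        # Causal attention: position i attends to positions 0..i.
        return [_attend(vecs[i], vecs[: i + 1], vecs[: i + 1])
                for i in range(len(vecs))]


@dataclass
class ToyKVCachingTransformer:
    """Hypothetical cached variant that shares the uncached model's parameters."""
    uncached: ToyTransformer
    cache_len: int
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    @classmethod
    def from_uncached(cls, uncached, cache_len):
        # Assumed signature, for illustration only.
        return cls(uncached=uncached, cache_len=cache_len)

    def step(self, token):
        if len(self.keys) >= self.cache_len:
            raise ValueError("cache is full")
        vec = self.uncached.project(token)
        # Append this token's key/value once; earlier entries are reused.
        self.keys.append(vec)
        self.values.append(vec)
        return _attend(vec, self.keys, self.values)


model = ToyTransformer(scale=0.5)
sampler = ToyKVCachingTransformer.from_uncached(model, cache_len=8)

tokens = [1.0, 2.0, 3.0]
full = model(tokens)                          # one pass over the whole sequence
stepwise = [sampler.step(t) for t in tokens]  # one token at a time, with cache
```

In this toy setting, stepwise matches full position by position, while each step call only computes the projection for the newest token.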

The layers defined here follow the same conventions documented in the module docstring for model_core.

Classes

GemmaKVCachingAttention

Gemma-specific configuration of the key-value-caching attention layer.

GemmaKVCachingInputs

Input structure for the GemmaKVCachingTransformer.

GemmaKVCachingState

Sampling state for the key-value-caching Gemma variant.

GemmaKVCachingTransformer

Top-level Gemma transformer in cached autoregressive sampling mode.