mistral_from_huggingface_model

penzai.models.transformer.variants.mistral.mistral_from_huggingface_model(model: MistralForCausalLM, upcast_activations_to_float32: bool = False, use_layer_stack: bool = False) -> model_parts.TransformerLM

Converts a HuggingFace Mistral model to a Penzai model.

This function converts Mistral models from their HuggingFace implementations to Penzai. (Other models with the same architecture may also be supported if they use the same configuration, but this has not been tested.)

Note: Mistral models use sliding-window attention. Penzai does not compute this attention mask automatically, so you must adjust the mask manually whenever an input is longer than Mistral’s sliding window. (Additionally, the KV-cache logic does not currently support sliding-window attention.)
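As an illustration of the mask adjustment described in the note above, the following is a minimal sketch of a causal sliding-window mask built with NumPy. The helper name `sliding_window_causal_mask` is hypothetical and not part of Penzai. The window convention is also an assumption: here each query attends to itself and the `window - 1` positions before it, so check the off-by-one against the reference implementation you are matching. How the resulting mask is supplied to the Penzai model is not covered here.

```python
import numpy as np


def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Returns a boolean [seq_len, seq_len] mask (hypothetical helper).

    Entry [q, k] is True when query position q may attend to key position k.
    Assumed convention: a query sees itself and the previous window - 1 tokens.
    """
    q = np.arange(seq_len)[:, None]  # query positions as a column
    k = np.arange(seq_len)[None, :]  # key positions as a row
    causal = k <= q                  # never attend to future positions
    in_window = (q - k) < window     # drop keys that fell out of the window
    return causal & in_window


mask = sliding_window_causal_mask(seq_len=6, window=3)
print(mask.astype(int))
```

For inputs no longer than the window, this reduces to an ordinary causal mask, which is why the adjustment only matters for longer sequences.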

Parameters:
  • model – The HuggingFace Mistral model.

  • upcast_activations_to_float32 – Whether to cast activations to float32 when the model runs. This allows analyzing activations at higher precision without consuming additional memory for parameters.

  • use_layer_stack – Whether to use a layer stack for the decoder blocks.

Returns:

A TransformerLM model containing the loaded parameters.