attention

Base attention dataflow combinators.

This module contains basic primitives for attention operations in Transformer neural networks. These primitives are intentionally as simple as possible: they do not include the actual initialization logic or attention weight computation. Instead, they capture the core dataflow patterns that are shared between training and kv-cache inference modes.
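
As a rough illustration of this "dataflow only" design, the sketch below wires caller-supplied projection callables and a boolean mask into a single attention pass, which is the kind of pattern the basic Attention combinator expresses. Everything here (the attention_dataflow function, its argument names, and the toy projections) is an illustrative assumption written in plain JAX, not this module's actual API.

    import jax
    import jax.numpy as jnp

    def attention_dataflow(x, project_query, project_key, project_value,
                           project_output, mask):
        # The combinator only routes data; the projection callables own any
        # parameters, and the caller decides which positions the mask allows.
        q = project_query(x)                     # [seq, heads, head_dim]
        k = project_key(x)
        v = project_value(x)
        logits = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(q.shape[-1])
        logits = jnp.where(mask, logits, -1e30)  # hide disallowed positions
        weights = jax.nn.softmax(logits, axis=-1)
        out = jnp.einsum("hqk,khd->qhd", weights, v)
        return project_output(out)

    # Example wiring with toy linear projections and a causal mask.
    seq, embed, heads, head_dim = 8, 16, 2, 4
    keys = jax.random.split(jax.random.PRNGKey(0), 5)
    wq, wk, wv = (jax.random.normal(k, (embed, heads, head_dim)) for k in keys[:3])
    wo = jax.random.normal(keys[3], (heads, head_dim, embed))
    x = jax.random.normal(keys[4], (seq, embed))
    causal_mask = jnp.tril(jnp.ones((seq, seq), dtype=bool))
    y = attention_dataflow(
        x,
        project_query=lambda a: jnp.einsum("se,ehd->shd", a, wq),
        project_key=lambda a: jnp.einsum("se,ehd->shd", a, wk),
        project_value=lambda a: jnp.einsum("se,ehd->shd", a, wv),
        project_output=lambda o: jnp.einsum("shd,hde->se", o, wo),
        mask=causal_mask,
    )

Because the combinator owns no parameters, the same dataflow can be reused with different projection implementations or masking strategies.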

Classes

ApplyAttentionMask

Applies an attention mask to its input logit array (see the masking sketch below the class list).

Attention

A basic attention combinator.

KVCachingAttention

Key/value caching variant of Attention (see the caching sketch below the class list).
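
For ApplyAttentionMask, here is a minimal sketch of the masking step on its own, assuming the mask is a boolean array broadcastable to the logits and that masked positions should be pushed to a large negative value before the softmax. The function name and the -1e30 fill value are assumptions made for illustration.

    import jax.numpy as jnp

    def apply_attention_mask(logits, mask, masked_out_value=-1e30):
        # Keep logits where the mask allows attention; everything else gets a
        # value that softmax turns into (approximately) zero weight.
        return jnp.where(mask, logits, masked_out_value)

    logits = jnp.zeros((2, 4, 4))                    # [heads, queries, keys]
    causal = jnp.tril(jnp.ones((4, 4), dtype=bool))  # each query sees its prefix
    masked_logits = apply_attention_mask(logits, causal)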
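
For KVCachingAttention, one common way to realize a key/value-caching variant is to keep preallocated key and value buffers, write each decoding step's projected key/value at the current position, and attend from the new query over the filled prefix. The sketch below shows that general pattern in plain JAX; the function, its signature, and the cache layout are assumptions, not the class's actual interface.

    import jax
    import jax.numpy as jnp

    def kv_caching_attention_step(q, new_k, new_v, cache_k, cache_v, index):
        # Write this step's key/value into the preallocated caches at `index`.
        cache_k = jax.lax.dynamic_update_slice(cache_k, new_k[None], (index, 0, 0))
        cache_v = jax.lax.dynamic_update_slice(cache_v, new_v[None], (index, 0, 0))
        # Attend from the single new query over every cached position...
        logits = jnp.einsum("hd,khd->hk", q, cache_k) / jnp.sqrt(q.shape[-1])
        # ...but hide cache slots that have not been written yet.
        valid = jnp.arange(cache_k.shape[0]) <= index
        logits = jnp.where(valid[None, :], logits, -1e30)
        weights = jax.nn.softmax(logits, axis=-1)
        out = jnp.einsum("hk,khd->hd", weights, cache_v)
        return out, cache_k, cache_v

    # One decoding step against an empty cache of length 8.
    heads, head_dim, max_seq = 2, 4, 8
    cache_k = jnp.zeros((max_seq, heads, head_dim))
    cache_v = jnp.zeros((max_seq, heads, head_dim))
    q = jnp.ones((heads, head_dim))
    out, cache_k, cache_v = kv_caching_attention_step(q, q, q, cache_k, cache_v, 0)

During training the full sequence is available at once, so no cache write is needed and the dataflow reduces to the basic pattern sketched under the module description above.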