attention
Base attention dataflow combinators.
This module contains basic primitives for attention operations in Transformer neural networks. These primitives are intentionally as simple as possible, and do not include the actual initialization logic or attention weight computation. Instead, they abstract away the core dataflow patterns across training and kv-cache inference modes.
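To make the "dataflow combinator" idea concrete, here is a minimal sketch in plain JAX of the pattern these primitives capture: the combinator owns the flow (compute logits, apply a mask, normalize, combine values) while the masking step is delegated to a caller-supplied piece. The function names and signatures below are illustrative only and are not part of this module's API.

```python
import jax
import jax.numpy as jnp


def attention_dataflow(query, key, value, mask_fn):
    # Scaled dot-product logits over the sequence axis.
    logits = jnp.einsum("qd,kd->qk", query, key) / jnp.sqrt(query.shape[-1])
    # Delegate masking to a caller-supplied step (causal, sliding-window,
    # or an explicit mask), mirroring how masking is factored out here.
    logits = mask_fn(logits)
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("qk,kd->qd", weights, value)


def causal_mask(logits):
    # Illustrative causal masking step: disallow attending to future positions.
    q_pos = jnp.arange(logits.shape[0])[:, None]
    k_pos = jnp.arange(logits.shape[1])[None, :]
    return jnp.where(k_pos <= q_pos, logits, -1e30)


# Tiny usage example: 5 tokens with 8-dimensional heads.
x = jax.random.normal(jax.random.PRNGKey(0), (5, 8))
out = attention_dataflow(x, x, x, causal_mask)
print(out.shape)  # (5, 8)
```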
Classes
Builds and applies a causal attention mask based on token positions.

Builds and applies a sliding-window attention mask based on token positions.

Applies an explicit attention mask to its input logit array.

A basic attention combinator.

Key/value caching variant of the basic attention combinator above (sketched below).
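The last entry above covers the kv-cache inference mode. As a rough sketch of that dataflow, under the assumption that the caching variant writes each new key/value pair into a preallocated cache and attends over the filled prefix during incremental decoding, one step might look like the following. The cache layout and names are illustrative, not the library's actual API.

```python
import jax
import jax.numpy as jnp


def kv_caching_step(query, new_key, new_value, cache, index):
    # Write the current token's key and value into the cache at `index`.
    keys = cache["k"].at[index].set(new_key)
    values = cache["v"].at[index].set(new_value)
    # Attend only over positions filled so far (<= index); this plays the
    # role of the causal mask during incremental decoding.
    logits = keys @ query / jnp.sqrt(query.shape[-1])
    valid = jnp.arange(keys.shape[0]) <= index
    logits = jnp.where(valid, logits, -1e30)
    weights = jax.nn.softmax(logits)
    return weights @ values, {"k": keys, "v": values}


# Usage: decode one token at a time into a fixed-size cache of length 16.
d, max_len = 8, 16
cache = {"k": jnp.zeros((max_len, d)), "v": jnp.zeros((max_len, d))}
q = k = v = jnp.ones((d,))
out, cache = kv_caching_step(q, k, v, cache, index=0)
print(out.shape)  # (8,)
```

The point of the caching variant is that training and decoding share the same attention dataflow; only the source of the keys and values changes.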