attention
Base attention dataflow combinators.
This module contains basic primitives for attention operations in Transformer neural networks. These primitives are intentionally as simple as possible and do not include the actual initialization logic or attention weight computation. Instead, they factor out the core dataflow patterns shared across training and kv-cache inference modes.
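One way to read "dataflow combinator": the attention primitive can be seen as a function that only wires its sub-computations together and owns none of them. The sketch below is a minimal illustration in plain JAX; the function and argument names are assumptions made for the example, not this module's actual interface.

    import jax
    import jax.numpy as jnp

    def attention_combinator(input_to_query, input_to_key, input_to_value,
                             query_key_to_weights, weights_value_to_output):
        """Wires attention sub-computations together; owns no parameters itself."""
        def run(x):
            q = input_to_query(x)                  # project input into query space
            k = input_to_key(x)                    # ... into key space
            v = input_to_value(x)                  # ... into value space
            weights = query_key_to_weights(q, k)   # e.g. scaled dot-product + softmax
            return weights_value_to_output(weights, v)  # mix values by the weights
        return run

    # Wiring it up with plain (unlearned) functions, just to show the dataflow:
    head_dim = 8
    attend = attention_combinator(
        input_to_query=lambda x: x,
        input_to_key=lambda x: x,
        input_to_value=lambda x: x,
        query_key_to_weights=lambda q, k: jax.nn.softmax(
            jnp.einsum("qd,kd->qk", q, k) / jnp.sqrt(head_dim), axis=-1),
        weights_value_to_output=lambda w, v: jnp.einsum("qk,kd->qd", w, v),
    )
    out = attend(jnp.ones((5, head_dim)))  # shape [sequence, head_dim]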
Classes
ApplyAttentionMask | Applies an attention mask to its input logit array.
Attention | A basic attention combinator.
KVCachingAttention | Key/value caching variant of Attention.
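The mask-application class in the first row above amounts to an element-wise select on the logit array. A minimal sketch, assuming a boolean mask and a large negative fill value for disallowed positions (both assumptions, not necessarily the class's real parameters):

    import jax.numpy as jnp

    def apply_attention_mask(logits, mask, masked_out_value=-1e30):
        # Keep logits where the mask allows attention; elsewhere substitute a very
        # negative value so those positions get ~zero weight after the softmax.
        return jnp.where(mask, logits, masked_out_value)

    # Example: a causal mask for query/key length 4.
    causal_mask = jnp.tril(jnp.ones((4, 4), dtype=bool))
    masked_logits = apply_attention_mask(jnp.zeros((4, 4)), causal_mask)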
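For the key/value caching variant, the essential dataflow change is that the keys and values live in a cache that is written one step at a time, while the query covers only the newest token. The following is a sketch under those assumptions; the function name, cache layout, and masking scheme are illustrative, not the class's actual design.

    import jax
    import jax.numpy as jnp

    def kv_caching_attention_step(q, new_k, new_v, cache_k, cache_v, index):
        # Write this step's key/value (shape [1, heads, dim]) into the preallocated
        # caches (shape [max_len, heads, dim]) at position `index`.
        cache_k = jax.lax.dynamic_update_slice_in_dim(cache_k, new_k, index, axis=0)
        cache_v = jax.lax.dynamic_update_slice_in_dim(cache_v, new_v, index, axis=0)
        # Attend from the single new query over everything cached so far; slots
        # beyond `index` are still empty and are masked out.
        logits = jnp.einsum("qhd,khd->hqk", q, cache_k) / jnp.sqrt(q.shape[-1])
        valid = jnp.arange(cache_k.shape[0]) <= index
        logits = jnp.where(valid[None, None, :], logits, -1e30)
        weights = jax.nn.softmax(logits, axis=-1)
        out = jnp.einsum("hqk,khd->qhd", weights, cache_v)
        return out, (cache_k, cache_v)

    # One decoding step with a cache of length 16, 2 heads, head dimension 8:
    cache_k = jnp.zeros((16, 2, 8))
    cache_v = jnp.zeros((16, 2, 8))
    q = new_k = new_v = jnp.ones((1, 2, 8))
    out, (cache_k, cache_v) = kv_caching_attention_step(q, new_k, new_v,
                                                        cache_k, cache_v, 0)

In training mode the basic combinator sees the whole sequence in one call, so none of this cache or index bookkeeping appears; the caching variant changes only how keys and values are routed, not the attention arithmetic itself.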