Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func Attention
func Attention(ctx ml.Context, query, key, value ml.Tensor, scale float64, cache kvcache.Cache) ml.Tensor
Attention implements scaled dot-product attention for transformer models:

	Attention(Q, K, V) = softmax(QK^T/√d_k)V
Parameters:
- ctx: Context for tensor operations
- query: Query tensor (Q) with shape [d_k, heads, seq_len_q]
- key: Key tensor (K) with shape [d_k, kv_heads, seq_len_k]; may be nil to read from the cache only
- value: Value tensor (V) with shape [d_v, kv_heads, seq_len_k]; may be nil to read from the cache only
- scale: Scaling factor, typically 1/√d_k where d_k is the key dimension
- cache: KV cache used to store the new key/value and retrieve past history; may be nil to use only the provided key/value
Returns:
Attention output with shape [d_v, heads, seq_len_q]
Types
type LinearBatch
Source Files