GroupedQueryAttentionΒΆ
Dot-product attention sharing keys and values across heads.
Abstract Signature:
GroupedQueryAttention(embed_dim: int, num_heads: int, num_kv_heads: int)
PyTorch
API:
βStrategy: Custom / Partial
Dot-product attention sharing keys and values across heads.
Abstract Signature:
GroupedQueryAttention(embed_dim: int, num_heads: int, num_kv_heads: int)
β