Attention

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if one is passed. This operator covers both the self- and cross-attention variants of the operation, distinguished by the sequence lengths of Q, K and V. For self-attention, kv_sequence_length equals q_sequence_length. …

Abstract Signature:

Attention(Q, K, V, attn_mask, past_key, past_value, nonpad_kv_seqlen: int, is_causal: int, kv_num_heads: int, q_num_heads: int, qk_matmul_output_mode: int, scale: float, softcap: float, softmax_precision: int)
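
As a reference for the semantics, the following is a minimal NumPy sketch of the core computation (softmax(Q @ K^T * scale + mask) @ V) for the self-attention case. It assumes 4D [batch, num_heads, seq, head_dim] inputs and an additive mask, and it omits past_key/past_value caching, grouped-query heads (kv_num_heads != q_num_heads), qk_matmul_output_mode and softmax_precision; it is an illustrative sketch, not the operator's reference implementation.

    import numpy as np

    def sdpa_sketch(Q, K, V, attn_mask=None, is_causal=False, scale=None, softcap=0.0):
        # Q: [batch, heads, q_seq, head_dim]; K, V: [batch, heads, kv_seq, head_dim]
        if scale is None:
            scale = 1.0 / np.sqrt(Q.shape[-1])            # default: 1/sqrt(head_dim)
        scores = (Q @ K.transpose(0, 1, 3, 2)) * scale    # [batch, heads, q_seq, kv_seq]
        if softcap > 0.0:
            scores = softcap * np.tanh(scores / softcap)  # soft capping of attention logits
        if is_causal:
            q_seq, kv_seq = scores.shape[-2:]
            causal = np.tril(np.ones((q_seq, kv_seq), dtype=bool))
            scores = np.where(causal, scores, -np.inf)    # mask out future positions
        if attn_mask is not None:
            scores = scores + attn_mask                   # additive mask (e.g. -inf on padding)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over kv_seq
        return weights @ V                                # [batch, heads, q_seq, head_dim]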

Keras

API: keras.layers.Attention
Strategy: Direct Mapping
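
keras.layers.Attention computes single-head dot-product attention over [batch, seq, dim] tensors passed as a list of [query, value] and, optionally, key. A brief usage sketch; the shapes and arguments below are assumed for illustration only:

    import numpy as np
    import keras

    # Illustrative shapes: batch=2, q_seq=4, kv_seq=6, dim=8
    query = np.random.rand(2, 4, 8).astype("float32")
    key = np.random.rand(2, 6, 8).astype("float32")
    value = np.random.rand(2, 6, 8).astype("float32")

    attn = keras.layers.Attention(use_scale=True)           # dot-product scores with a learned scale
    out = attn([query, value, key], use_causal_mask=False)  # set use_causal_mask=True for causal attention
    print(out.shape)  # (2, 4, 8)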

TensorFlow

API: tf.keras.layers.Attention
Strategy: Direct Mapping

Apple MLX

API: mlx.nn.layers.transformer.TransformerEncoderLayer.attention
Strategy: Direct Mapping
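
The attribute named above is, in current MLX versions, the mlx.nn.MultiHeadAttention module owned by TransformerEncoderLayer. A short self-attention sketch using that layer directly; the dimensions and head count are assumed for illustration:

    import mlx.core as mx
    import mlx.nn as nn

    # Illustrative shapes: batch=2, seq=5, model dims=32, 4 heads
    x = mx.random.normal((2, 5, 32))

    attn = nn.MultiHeadAttention(dims=32, num_heads=4)
    mask = nn.MultiHeadAttention.create_additive_causal_mask(5)  # additive mask, -inf above the diagonal
    out = attn(x, x, x, mask=mask)  # self-attention: queries = keys = values
    print(out.shape)  # (2, 5, 32)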