Transformer

A Transformer layer with multi-headed attention and a position-wise feed-forward network.

Abstract Signature:

Transformer(input_dims: int, hidden_dims: int, num_heads: int, dim_per_head: int)
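Reading the abstract parameters (these glosses follow the Praxis field names and are an interpretation, not part of any framework's docs): input_dims is the model (embedding) dimension, hidden_dims the feed-forward hidden width, num_heads the number of attention heads, and dim_per_head the dimension of each attention head.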

PyTorch

API: torch.nn.TransformerEncoderLayer
Strategy: Direct Mapping
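
A minimal sketch of the mapping, assuming input_dims corresponds to d_model and hidden_dims to dim_feedforward. Note that TransformerEncoderLayer fixes the per-head dimension at d_model // nhead, so dim_per_head has no independent counterpart here.

    import torch
    import torch.nn as nn

    input_dims, hidden_dims, num_heads = 512, 2048, 8

    # d_model must be divisible by nhead; the per-head dimension is
    # implicitly d_model // nhead.
    layer = nn.TransformerEncoderLayer(
        d_model=input_dims,
        nhead=num_heads,
        dim_feedforward=hidden_dims,
        batch_first=True,  # inputs as (batch, seq, feature)
    )

    x = torch.randn(2, 16, input_dims)  # (batch, seq_len, input_dims)
    y = layer(x)                        # output keeps the input shape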

Apple MLX

API: mlx.nn.Transformer
Strategy: Direct Mapping
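
A minimal sketch, assuming the dims / num_heads / mlp_dims constructor arguments and a positional (src, tgt, src_mask, tgt_mask, memory_mask) call signature. mlx.nn.Transformer is a full encoder-decoder stack rather than a single layer, so one layer per side is used here as the closest analogue of the abstract layer.

    import mlx.core as mx
    import mlx.nn as nn

    input_dims, hidden_dims, num_heads = 512, 2048, 8

    # The per-head dimension is dims // num_heads; dim_per_head cannot
    # be set independently.
    model = nn.Transformer(
        dims=input_dims,
        num_heads=num_heads,
        num_encoder_layers=1,  # closest match to a single layer
        num_decoder_layers=1,
        mlp_dims=hidden_dims,  # feed-forward width
    )

    src = mx.random.normal((2, 16, input_dims))  # (batch, seq, dims)
    tgt = mx.random.normal((2, 16, input_dims))
    out = model(src, tgt, None, None, None)      # masks left unset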

PaxML / Praxis

API: praxis.layers.Transformer
Strategy: Direct Mapping
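
A minimal configuration sketch, assuming the Fiddle-based Pax workflow (pax_fiddle.Config plus base_layer.instantiate). The field names match the abstract signature one-to-one, which is what makes this a direct mapping; unlike the other two frameworks, dim_per_head is set explicitly rather than derived from the model dimension.

    from praxis import base_layer, pax_fiddle
    from praxis import layers as pax_layers

    # Praxis layers are configured through pax_fiddle.Config rather than
    # constructed directly.
    layer_p = pax_fiddle.Config(
        pax_layers.Transformer,
        name='transformer',
        input_dims=512,
        hidden_dims=2048,
        num_heads=8,
        dim_per_head=64,
    )
    layer = base_layer.instantiate(layer_p)
    # The instantiated layer is a JAX/Flax-style module; running it goes
    # through layer.init(...) / layer.apply(...) with inputs, paddings,
    # and an attention mask built for the sequence.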