attention-based networks

Recurrent networks are rapidly being replaced by a newer kind of network based on the idea of attention [11]. Attention-based networks differ in that they operate on whole sequences rather than on one token at a time. They are built from a processing block known as a transformer block, which uses attention to give the network a mechanism for learning how each token in the input sequence influences the other tokens.
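
Since the note doesn't pin down any particular implementation, here is a minimal sketch of the core mechanism, scaled dot-product self-attention, in plain NumPy. The function name `self_attention`, the projection matrices `Wq`/`Wk`/`Wv`, and all dimensions are assumptions made up for this example, not code from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    tokens: (seq_len, d_model) embeddings for every token in the sequence.
    Wq, Wk, Wv: learned projection matrices (random here, for the demo).
    """
    Q = tokens @ Wq  # queries: what each token is looking for
    K = tokens @ Wk  # keys: what each token offers to others
    V = tokens @ Wv  # values: the content that gets mixed together
    d_k = K.shape[-1]
    # scores[i, j] says how strongly token i attends to token j.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output vector is a weighted mix of all tokens' values, so every
    # token's new representation is influenced by every other token.
    return weights @ V

# Demo with made-up sizes: a 4-token sequence of 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
tokens = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per input token
```

The attention-weight matrix is what lets the whole sequence be processed in one pass: every token contributes to every other token's updated representation, instead of information flowing one step at a time as in a recurrent network.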

Link:: The Little Learner

Backlinks