Tutorial 3

Tutorial title: xLSTM – Extended Long Short-Term Memory 

Abstract: Language models power transformative applications such as tool-using digital assistants, biological sequence modeling, and autonomous robotic systems. Currently, these language models are predominantly based on Transformer architectures. While Transformers have yielded impressive results, their quadratic runtime dependency on sequence length complicates their use for long sequences and hinders the democratization of these models. Recently, the recurrent xLSTM architecture has been shown to perform favorably compared to Transformers and modern state-space model (SSM) architectures in both natural and scientific domains. Similar to SSMs, xLSTMs have a linear runtime dependency on sequence length and allow for constant-memory decoding at inference time, which makes them prime candidates for modeling long-range dependencies.
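
To make the linear-runtime, constant-memory claim concrete, the following is a minimal NumPy sketch of an mLSTM-style recurrent decoding step. It is not the official xLSTM implementation; the function name, gate values, and dimensions are illustrative assumptions. The point it demonstrates: each decoding step only updates a fixed-size matrix state and normalizer, whereas softmax attention must keep a key-value cache that grows with sequence length.

```python
import numpy as np

def mlstm_decode_step(C, n, q, k, v, i_gate, f_gate):
    """One mLSTM-style recurrent step; state shapes are fixed regardless of t."""
    C = f_gate * C + i_gate * np.outer(v, k)   # matrix memory: accumulates key-value outer products
    n = f_gate * n + i_gate * k                # normalizer state
    h = C @ q / max(abs(n @ q), 1.0)           # readout queried by q
    return C, n, h

d = 8                                          # head dimension (illustrative)
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
for t in range(1_000):                         # per-step memory stays O(d^2), independent of t
    q, k, v = rng.normal(size=(3, d))
    C, n, h = mlstm_decode_step(C, n, q, k / np.sqrt(d), v, i_gate=1.0, f_gate=0.9)
print(h.shape)                                 # (8,) — constant-size output and state at every step
```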
This tutorial provides a comprehensive introduction to the xLSTM architecture, tracing its theoretical foundations from classical RNNs through modern linear attention mechanisms. We explore how xLSTMs achieve competitive performance while maintaining computational efficiency, examining empirical scaling laws that demonstrate performance comparable to Transformers at significantly reduced cost. The tutorial showcases xLSTM’s versatility through diverse applications: modeling biological sequences at genome scale, enabling low-latency robot control systems, and transferring knowledge efficiently via Transformer distillation. These case studies underscore the potential of xLSTM as a prime candidate for foundation models across scientific and engineering domains.
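
As a pointer to the theoretical thread from classical RNNs to linear attention, the schematic recurrence below shows how the LSTM’s scalar cell state generalizes to the matrix memory of xLSTM’s mLSTM cell. Notation is simplified for illustration; gate parameterizations and numerical stabilization are omitted.

```latex
% Schematic mLSTM recurrence (simplified; gates and stabilization omitted):
\begin{aligned}
C_t &= f_t\, C_{t-1} + i_t\, v_t k_t^{\top}
      &&\text{matrix memory (cf. the scalar LSTM cell state)}\\
n_t &= f_t\, n_{t-1} + i_t\, k_t
      &&\text{normalizer state}\\
h_t &= o_t \odot \frac{C_t\, q_t}{\max\!\left(\lvert n_t^{\top} q_t \rvert,\, 1\right)}
      &&\text{output, queried as in linear attention}
\end{aligned}
```

Because the update at step t depends only on the fixed-size state (C_{t-1}, n_{t-1}), runtime grows linearly with sequence length and decoding memory stays constant, which is the property exploited in the scaling-law and application case studies above.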