Details, Fiction and Mamba paper
Discretization has deep connections to continuous-time systems, which can endow the resulting models with additional properties such as resolution invariance and automatically ensuring that the model is properly normalized. Operating on byte-sized tokens, transformers scale poorly because every token must "attend" to every other token, leading to O(n²) scaling in sequence length; as a result ...
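To make the link between discretization and continuous-time systems concrete, here is a minimal sketch in Python. It assumes a diagonal state space model x'(t) = a·x(t) + b·u(t) and the zero-order-hold rule, which is one common discretization choice; the text above does not say which rule is meant. The function names and numeric values are hypothetical and chosen only for illustration. The sketch checks a simple form of resolution invariance: with a constant input, two steps at step size 0.1 land on the same state as one step at step size 0.2.

```python
import math

def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization of a diagonal system x' = a*x + b*u.

    a, b: lists of per-channel coefficients (entries of a must be nonzero).
    delta: step size. Returns (a_bar, b_bar) for x[k+1] = a_bar*x[k] + b_bar*u[k].
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar

def step(x, u, a_bar, b_bar):
    """Advance the discrete recurrence by one step with scalar input u."""
    return [ab * xi + bb * u for ab, xi, bb in zip(a_bar, x, b_bar)]

# Hypothetical parameters, for illustration only.
a = [-1.0, -0.5, -2.0]
b = [1.0, 1.0, 1.0]
x0 = [0.0, 0.0, 0.0]
u = 0.7  # input held constant over the whole interval

# Two fine steps of size 0.1 ...
a_f, b_f = zoh_discretize(a, b, 0.1)
x_fine = step(step(x0, u, a_f, b_f), u, a_f, b_f)

# ... versus one coarse step of size 0.2.
a_c, b_c = zoh_discretize(a, b, 0.2)
x_coarse = step(x0, u, a_c, b_c)

same = all(math.isclose(f, c, rel_tol=1e-9) for f, c in zip(x_fine, x_coarse))
print(same)
```

Because the discrete parameters are derived from the same underlying continuous system, changing the step size changes the sampling resolution but not the trajectory being sampled. This exact agreement holds here only because the input is constant between samples, which is the assumption zero-order hold makes.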