Generative AI has the potential to transform a wide range of industries. However, current state-of-the-art models such as transformers face significant computational and memory costs, especially when deployed on resource-constrained hardware. This PhD research aims to address these limitations by optimizing Mamba networks for hardware-aware applications. Mamba networks offer a promising alternative to transformers: they replace the quadratic-complexity self-attention mechanism with a selective state-space model whose recurrence scales linearly with sequence length and can be computed efficiently with a hardware-aware parallel scan. As a result, Mamba networks can generate high-quality outputs with significantly lower compute and memory demands. The research will focus on hardware-aware optimizations that further improve the efficiency of Mamba networks, making them suitable for real-time applications and edge devices. This includes reducing training and inference times, as well as exploring dedicated hardware acceleration. The goal is to advance the practical deployment of generative AI in resource-constrained domains, contributing to its broader adoption and impact.
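To make the complexity contrast concrete, the following is a minimal, illustrative Python sketch of a selective state-space recurrence of the kind Mamba builds on. The parameter names (A, B, C, delta) and the simplified discretization are assumptions for exposition, not the exact formulation or implementation used in the Mamba paper; the point is that the sequence is processed in a single O(L) pass, in contrast to the O(L^2) pairwise interactions of full self-attention.

    import numpy as np

    def selective_scan(x, A, B, C, delta):
        """Illustrative linear-time selective state-space recurrence.

        x:     (L, d)        input sequence
        A:     (d, d_state)  state transition parameters
        B, C:  (L, d_state)  input-dependent projection parameters
        delta: (L, d)        input-dependent step sizes
        Returns y: (L, d), computed in O(L) time.
        """
        L, d = x.shape
        d_state = A.shape[1]
        h = np.zeros((d, d_state))      # hidden state carried across time steps
        y = np.zeros_like(x)
        for t in range(L):
            # Discretize with the input-dependent step delta (simplified scheme)
            A_bar = np.exp(delta[t][:, None] * A)        # (d, d_state)
            B_bar = delta[t][:, None] * B[t][None, :]    # (d, d_state)
            h = A_bar * h + B_bar * x[t][:, None]        # state update
            y[t] = (h * C[t][None, :]).sum(axis=-1)      # readout
        return y

    # Toy usage: sequence length 64, model width 8, state size 4
    L, d, n = 64, 8, 4
    rng = np.random.default_rng(0)
    y = selective_scan(rng.standard_normal((L, d)), -np.ones((d, n)),
                       rng.standard_normal((L, n)), rng.standard_normal((L, n)),
                       np.full((L, d), 0.1))

The loop above is written sequentially for clarity; in practice the same recurrence is evaluated with a parallel scan kernel tuned to the memory hierarchy of the target accelerator, which is where the hardware-aware optimizations studied in this research apply.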