Title: Democratizing Open LLMs for Global Sovereign AI Applications
Abstract: As nations increasingly recognize the strategic importance of artificial intelligence, many efforts have emerged to develop Sovereign AI that is aligned with the priorities of different national environments. In this talk, I’ll describe Switzerland’s AI Initiative, its goals and organization, and opportunities to work with us to develop open AI models that can serve the rest of the world. Then, I’ll highlight some recent research projects related to the development of open LLMs for global contexts. First, I’ll discuss our studies on formalizing the data compliance gap, a measure of the performance cost of complying with data-related regulations that protect the authorship rights of content providers. Next, I’ll discuss challenges in curating multilingual pretraining corpora to train models that are capable in both high-resource and low-resource languages. Finally, I’ll discuss how the evaluation of multilingual LLMs must move beyond translated benchmarks to truly reflect the cultural specificities of the language environments where LLMs would be deployed.