Latam-GPT: The Free, Open Source, and Collaborative AI of Latin America

Latam-GPT is a new large language model under development in Latin America. The project, led by the non-profit Chilean National Center for Artificial Intelligence (CENIA), aims to help the region achieve technological independence by developing an open-source model trained on Latin American languages and contexts.
“This work cannot be undertaken by a single group or a single country in Latin America: it is a challenge that requires everyone’s participation,” explains Álvaro Soto, director of CENIA, in an interview with WIRED en Español. “Latam-GPT is a project that seeks to create an open, free and, above all, collaborative AI model. We have been working for two years in a very bottom-up process, bringing together citizens from different countries who want to collaborate.”
The project is distinguished by its collaborative spirit. “We are not trying to compete with OpenAI, DeepSeek or Google. We want a model specific to Latin America and the Caribbean, aware of the cultural requirements and the challenges this implies, such as understanding the different dialects, the history of the region and its unique cultural aspects,” explains Soto.
Thanks to 33 strategic partnerships with institutions in Latin America and the Caribbean, the project has assembled a corpus of more than eight terabytes of text, the equivalent of millions of books. This body of data made it possible to develop a language model with 50 billion parameters, a scale that makes it comparable to GPT-3.5 and gives it a medium-to-high capacity for complex tasks such as reasoning, translation and making associations.
Latam-GPT is trained on a regional database that compiles information from 20 countries in Latin America, plus Spain, for a total of 2,645,500 documents. The distribution of the data shows a strong concentration in the largest countries: Brazil leads with 685,000 documents, followed by Mexico with 385,000, Spain with 325,000, Colombia with 220,000 and Argentina with 210,000. These figures reflect the size of these markets, their level of digital development and the availability of structured content.
“Initially, we will launch a language model. We expect its performance on general tasks to be close to that of large commercial models, but with better performance on subjects specific to Latin America. The idea is that if we ask it about topics relevant to our region, its knowledge will be much deeper,” explains Soto.
This first model is the starting point for developing a family of more advanced technologies in the future, including models that handle images and video, and for building larger models. “As this is an open project, we want other institutions to use it. A group in Colombia could adapt it to the school education system, or one in Brazil could adapt it to the health sector. The idea is to open the door for different organizations to build specific models for particular areas such as agriculture, culture and others,” explains the CENIA director.




