AI Pioneer Kai-Fu Lee’s 01.AI Unveils Open-Source Models
Credit: Visual China
BEIJING, November 6 (TMTPOST) – 01.AI, the AI large-model startup founded by Dr. Kai-Fu Lee, officially released its open-source large model series, "Yi," last Sunday.
The first public release of the Yi series comprises two base models: Yi-6B, with 6 billion parameters, and Yi-34B, with 34 billion. Both are bilingual in English and Chinese and fully open-source. According to 01.AI, Yi-34B leads various global benchmark evaluations, and the company's infrastructure cut model-training costs by 40%; it says the same infrastructure can scale toward simulating trillion-parameter models at cost reductions of up to 50%. In benchmarks, Yi-34B outperforms large-scale open-source models such as LLaMA2-34B/70B, Falcon-180B, and Baichuan2-13B.
Lee, Chairman & CEO of Sinovation Ventures and CEO of 01.AI, said: "01.AI is determined to join the global top tier. From the first person we hired, the first line of code we wrote, and the first model we designed, we have always aimed at becoming 'the World's No.1.' We have assembled a team with the potential to compete with top-tier companies like OpenAI and Google. After nearly half a year of solid work, we have made globally competitive achievements. Yi-34B has lived up to expectations, making a remarkable debut."
"As the team came together, we wrote our first line of code in June, and in just four months we created a product we are proud of. We stayed out of the spotlight until we had something remarkable to show. Today we are still at the beginning, and we will continue to deliver more impressive results," Lee said.
The company also revealed that the Yi series will add models specializing in code and mathematics. In addition, 01.AI has already begun training models with 100 billion parameters, which it expects to release in the coming months, along with plans to introduce applications built on AI 2.0 technology.
On March 19, Lee announced he was entering the AI large-model arena with a new venture, Project AI 2.0, which he described as a global company dedicated to building a new AI 2.0 platform and AI-first productivity applications.
In early July, he founded the AI large-model startup 01.AI, the seventh company incubated by Sinovation Ventures, headquartered in Beijing. The company focuses on advancing model and pre-training framework technology across seven areas, aiming to build a new AI 2.0 model. Its technical experts come from multiple tech giants around the world.
Previously, at the 2023 Zhongguancun Forum, Lee said that the popularity of generative AI, represented by GPT-4 and other large models, has spread globally, indicating that the AI 2.0 era has arrived. He predicted that this new AI 2.0 platform would significantly boost productivity and create substantial economic and social value.
"The technical threshold for large models is high, and it requires an efficient team with both research and engineering capabilities to formulate and implement the company's technical and product strategies," said Lee, who believes 01.AI should focus on building a platform, since the AI 2.0 era will be defined by its application ecosystem.
According to public information, 01.AI's technical leadership includes Dai Zonghong, Vice President of Technology and head of AI Infrastructure, and Huang Wenhao, Vice President of Technology and head of Pretraining. Dai previously worked at Alibaba and Huawei, where he served as CTO of AI for Huawei Cloud. Huang holds a doctorate from Peking University and previously worked as a senior researcher at Microsoft Research and the Beijing Academy of Artificial Intelligence.
The newly released Yi-6B and Yi-34B were trained with a 4K sequence length, which can be extended to 32K during inference. Both models are fully open for academic research, with free commercial use available upon application.
The open-source Yi series has two headline features: the models outperform rivals with hundreds of billions of parameters while using far fewer, and they support what 01.AI calls the world's longest context window, reaching up to 400,000 Chinese characters.
An extended context window is a critical dimension that reflects the strength of large models. Having a longer context window allows them to process more information, generate more coherent and accurate text, and better handle tasks such as document summarization and question-answering. In many vertical applications of large models, such as finance, document processing capability is essential. GPT-4 can support 25,000 Chinese characters, and Claude 2 can support around 200,000 characters.
Yi-34B raises the context window for large models to a new level, extending it to 200K tokens and allowing it to process extremely long inputs of approximately 400,000 Chinese characters. That is enough to handle, for example, two volumes of The Three-Body Problem novels or more than 1,000 pages of PDF documents. With this, 01.AI says it has not only set a new industry record but also become the first large-model company to open an ultra-long context window to the open-source community.
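The relationship between the 200K-token window and the roughly 400,000-character capacity quoted above can be sketched with simple arithmetic, assuming (as the article's figures imply) that one token covers about two Chinese characters; the actual ratio depends on the tokenizer and is not specified here.

```python
# Rough capacity estimate for a long-context model.
# Assumption (implied by 200K tokens ≈ 400,000 characters, not a
# tokenizer spec from the article): ~2 Chinese characters per token.
CHARS_PER_TOKEN = 2.0

def approx_char_capacity(context_tokens: int,
                         chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate how many Chinese characters fit in a context window."""
    return int(context_tokens * chars_per_token)

def approx_pdf_pages(context_tokens: int, chars_per_page: int = 400) -> int:
    """Estimate PDF pages, assuming a nominal character count per page."""
    return approx_char_capacity(context_tokens) // chars_per_page

print(approx_char_capacity(200_000))  # 400000 characters
print(approx_pdf_pages(200_000))      # 1000 pages at 400 chars/page
```

At 400 characters per page (a hypothetical figure for dense Chinese text), a 200K-token window would indeed cover on the order of 1,000 pages, consistent with the scenarios the company describes.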
On the Hugging Face open-source pre-trained model leaderboard, Yi-34B ranks first globally with a score of 70.72, surpassing LLaMA-70B, which has twice its parameter count, and Falcon-180B, which has more than five times as many. It not only tops the rankings with a smaller model but delivers strong performance across dimensions, beating hundred-billion-parameter-scale models with a model in the tens of billions. Its lead over other large models is especially wide on the MMLU and TruthfulQA benchmarks.
Currently, Yi series models have been officially launched on three major global open-source community platforms: Hugging Face, ModelScope, and GitHub, with commercial applications open for developers.
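For developers who want to try the models, a minimal sketch of loading a Yi checkpoint through the Hugging Face `transformers` library might look like the following; the repository IDs and loading arguments follow standard Hugging Face conventions and are assumptions, not details confirmed by the article.

```python
# Hypothetical Hugging Face repository IDs; check the 01.AI
# organization page for the authoritative names.
MODEL_IDS = ["01-ai/Yi-6B", "01-ai/Yi-34B"]

def load_yi(model_id: str):
    """Load a Yi tokenizer/model pair via transformers.

    transformers is imported lazily so this sketch stays importable
    without the (large) dependency installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",       # spread weights across available GPUs
        trust_remote_code=True,  # assumption: repo ships custom model code
    )
    return tokenizer, model

# Example usage (downloads many GB of weights, so commented out):
# tokenizer, model = load_yi(MODEL_IDS[0])
# inputs = tokenizer("请用一句话介绍你自己。", return_tensors="pt")
# print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

The lazy import and commented-out usage reflect that Yi-34B in particular needs substantial GPU memory; the 6B variant is the practical starting point for individual developers.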
The key difference between the two is positioning: Yi-6B targets personal and research use, while Yi-34B suits a broader range of scenarios, addressing strong demand from the open-source community.
Regarding computing power, Lee said that 01.AI recognized the importance of GPU (Graphics Processing Unit) chips and has already leased a significant amount of computing power. They have collaborated with many Chinese cloud vendors and GPU cloud providers.
On commercialization, Lee said that monetization was weak in the AI 1.0 era, but that the AI 2.0 era offers far more commercial opportunities. He also revealed that, beyond completing the pre-training of Yi-34B, 01.AI has already begun training its next model at the hundred-billion-parameter scale.
According to Lee, 01.AI hopes more developers use the Yi series models to create their own "ChatGPT" scenarios, lead the next-generation innovations, and explore the path to a new AI era.
(This article was first published on the TMTPost App. Reporting by Lin Zhijia.)