speaker 1: Hey everyone, I'm Junyang from the Qwen team at Alibaba Group. Very happy to be here at AI Engineer World's Fair 25, and I'm very excited to share some progress about Qwen. I think all of you know about Qwen, and maybe you are developers, so I'm very happy to share some more things about Qwen with all of you. Qwen is a series of large language models and large multimodal models, and we have a dream of building a generalist model and agent.

Before I start, I would like to share some important links with all of you. Maybe you know our product, our chat interface, Qwen Chat, which is at chat.qwen.ai. It is very easy to use: you can try our latest models, you can use our multimodal models by uploading images and videos, and you can also interact with our omni models using voice chat and video chat. There are some important features like web dev and deep research. Welcome to enjoy it. And if you would like to know more about technical details, you can check our blog, which is qwenlm.github.io. Whenever we release something new, we usually release a blog post, so you will learn more about the technical details through our blog as we keep open sourcing. We have our code on GitHub and our checkpoints on Hugging Face, so you can download our checkpoints and play with our models. So feel free to check our websites and enjoy them.

This year, just before the Spring Festival, we released a very good instruction-tuned model, which I think is a very good basis for large language models: Qwen2.5-Max. It is a very large MoE model, and we found that on multiple benchmarks it achieves very competitive performance against the state-of-the-art models at that time, including Claude 3.5, GPT-4o, and DeepSeek-V3. But we believe there is more potential in a large language model: not just becoming an instruction-tuned model, it can get smarter and smarter with reinforcement learning. So we dived into research on reinforcement learning, and we found that it is really amazing to see that RL can increase performance, especially on reasoning tasks like math and coding, and it increases the performance quite consistently. For AIME 2024, with a 32-billion-parameter model, you can see its score starts from around 65 and keeps increasing until it reaches 80. So it was really amazing to build a reasoning model like QwQ. And on Chatbot Arena, its performance is also very competitive with even larger models, and it stayed in the top 15 for a long time.

So we combined all our efforts in research and development to build a stronger next generation of models. Very recently, we released Qwen3. Qwen3 is our latest family of large language models, and we have released multiple sizes of dense and MoE models. First of all, I would like to share the flagship model with all of you, which has 235 billion parameters in total but only activates 22 billion. It is an MoE model, so it is efficient, but it is very effective in comparison with top-tier models like o3-mini, and it is just lagging a little bit behind Gemini 2.5 Pro. Our largest dense model is very competitive as well. And we have a very fast MoE model. It is relatively small: although it has 30 billion parameters in total, it only activates 3 billion.
But if you compare its performance with QwQ-32B, this MoE model, which only activates 3 billion parameters, can even outcompete QwQ-32B on some tasks. And we have an even smaller model, only 4 billion parameters. This time we used a lot of distillation techniques, distilling knowledge from the large models into the small models, and we finally built a very good small model with thinking capabilities; it even shows competitiveness with the flagship model of our last iteration, Qwen2.5-72B. So it is a really good model, and it is really worth a try to play with our 4-billion-parameter model. You can even deploy it on mobile devices.

Qwen3 has some important features, and the most important one this time is the hybrid thinking mode. So what is hybrid thinking mode? It means that you can use thinking and non-thinking in a single model; we combine the two behaviors into one model. What is thinking mode? Before you answer the question with a detailed answer, you just start thinking: you reflect on yourself, explore possibilities, and finally, when you find you are ready to answer the question, you provide the answer. Models like o1 and DeepSeek-R1 have this thinking behavior. And the non-thinking mode is just the traditional instruction-tuned model, like a chatbot: without thinking, without lag, it just provides the answer, near instantly. This might be the first time in the open-source community that the two modes are combined into a single model, so you can use prompts or hyperparameters to control its behavior as you like.

And once we dived into the hybrid thinking mode, we found that we could create a feature called dynamic thinking budget. So what is a dynamic thinking budget? The thinking budget means the maximum number of thinking tokens. For example, say you have 32,000 tokens as your thinking budget. For a task which requires thinking, if you finish the thinking within, say, 8,000 tokens, that is below 32,000; okay, fine, you finish thinking, provide the answer, and it is good. But if you only have a thinking budget of 4,000 tokens and your thinking requires more than that, say 8,000 tokens to finish, you will stop at 4,000 tokens, which means your thinking process is truncated. So we can check the performance with larger and larger thinking budgets, and you will find that the performance increases quite well as the thinking budget grows. With a very small thinking budget, for example on AIME 24, the model only achieves just over 40; but if you have a large thinking budget like 32,000 tokens, you can even achieve more than 80. This is really amazing with the thinking capabilities.

So I hope you enjoy the hybrid thinking mode: you own a single model that achieves thinking and non-thinking, and you will find the setting that is good for you. For example, if your task requires only 95% accuracy, you might find that with just 8,000 tokens for your thinking budget you already achieve over 95%. That is quite good; you don't need to waste more tokens on thinking, so you can keep your thinking budget at 8,000 tokens. This is only an example.
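To make this concrete, here is a minimal sketch in Python, assuming the Hugging Face transformers chat-template interface for Qwen3. The enable_thinking flag is the documented switch between the two modes; the budget handling below is only an illustration of the truncated-thinking idea, not the official implementation, and the model name is just one example checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # example checkpoint; other Qwen3 sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
# enable_thinking=True -> thinking mode; False -> near-instant non-thinking mode.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

THINKING_BUDGET = 4000  # maximum thinking tokens, as in the example above
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=THINKING_BUDGET)
# Keep tags in the decode so we can detect whether the think block was closed.
partial = tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=False)

if "</think>" not in partial:
    # Budget exhausted mid-thought: force-close the think block and generate the answer.
    # (Illustrative only -- the production mechanism may differ.)
    resumed = prompt + partial + "\n... thinking budget reached ...\n</think>\n\n"
    inputs = tokenizer([resumed], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)

print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```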
We would like to explore the usage further. The next important feature is that Qwen3 supports over 119 languages and dialects. Qwen2.5 only supported 29 languages, but this time we support over 119 languages and dialects, and we have published the detailed names of the languages and dialects we support; you can check it. I think it will be really good for global applications. Previously, a lot of people, especially those using open-weight models, found that the open-weight models didn't support many languages very well, so now more people will be capable of using large language models in their domains and in their own languages.

And we have specifically increased the capabilities in agents and coding; in particular, we enhanced the support for MCP, which has become really popular recently. Here are two examples showing how our model uses tools during thinking. You will find that it can think while making function calls to use the tools; it gets the feedback from the environment and keeps thinking. That is a feature we really like: the model is capable of thinking, but it is also capable of interacting with the environment while continuing to think. It is really good for inference-time scaling. And here is another example of the model organizing the desktop: if it has access to the file system, it can do things like that. It thinks about which tools it should use, then uses the tools, gets the feedback, continues thinking once it has finished using the tools, finishes the task, and tells you that it has organized the desktop quite well. These are two very simple examples showing that we have provided better and better support for agent capabilities. We would not only like our model to be a simple chatbot; we would like it to be really productive in your work and life, to become a really productive agent.

So these are the three features of Qwen3. We have open-weighted a lot of sizes, including two MoE models: the small MoE model has 30 billion parameters in total and only activates 3 billion, and the other one has 235 billion in total and activates 22 billion. We also have six dense models. The smaller models you can use for testing, and you can also use them as draft models. The 4-billion-parameter model you can deploy on mobile devices. And the 32-billion-parameter model is the kind of model that many of you really prefer: it is strong, it shows competitiveness, you can use it for doing RL, and you can deploy it in your local environment as well. So we have open-weighted the dense models too, but we believe that this year, and maybe in the next years, the future trend belongs to MoE models. Later, we will release more MoE models for you to use, and there will be better support for MoE models in the open-source community, like third-party frameworks.

Besides building large language models, we are also building multimodal models. We have focused quite a lot on vision-language models. I think many of you were using Qwen2-VL and are now using Qwen2.5-VL, which was released this January. It achieves very competitive performance on vision-language benchmarks: understanding benchmarks like MMMU, benchmarks like MathVista, and a lot of general VQA benchmarks. It has achieved very good performance across benchmarks for vision-language understanding.
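As a quick illustration of that kind of vision-language understanding, here is a minimal sketch following the usage pattern from the Qwen2.5-VL model card on Hugging Face; the image URL is a placeholder, and qwen_vl_utils is the small companion package the model card recommends for preparing image and video inputs.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/chart.png"},  # placeholder URL
        {"type": "text", "text": "What does this chart show?"},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)  # downloads/decodes the visual inputs
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]  # drop the prompt tokens
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```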
We also explored thinking capabilities for vision-language models, so we built QVQ as well. And we found inference-time scaling with a larger maximum thinking length, which is equivalent to the thinking budget I talked about before: if you have a larger thinking budget, the model achieves better and better performance on reasoning tasks, especially mathematics. Even vision-language models show this behavior.

But for multimodal models, what we really would like to do is build an omni model, which accepts multiple modalities as input and is also capable of generating multiple modalities, like text, vision, and audio. This time it is not in a perfect state, but I think it is good. It is a relatively small model, but we are really proud of this attempt. It is based on a 7-billion-parameter large language model, and it is capable of accepting three modalities: text; vision, including images and videos; and audio. And this time it can generate text and audio. Maybe in the future, our models might be capable of generating high-quality images and videos as well; that would be a truly omni model. This omni model can be used in voice chat, video chat, and text chat. It achieves state-of-the-art performance on audio tasks for models of the same size, around 7 billion parameters. But what surprised us a little bit is that it can even achieve better performance on vision-language understanding tasks in comparison with Qwen2.5-VL-7B, which means that we can achieve very good performance in vision-language tasks with an omni model. One thing we have not done well, but believe we can do well, is recovering the performance drop in language tasks, especially its intelligence and its agent tasks. I think we can recover it by improving our data quality and improving our training methods, but for now there is still some room for improving the model's capabilities in different domains and different tasks. And this is the omni model.

Whatever models we build, we keep open sourcing them. We love open sourcing, because it really helps us quite a lot: developers give us feedback that helps us improve our models. The interaction with the open-source community makes us happy and encourages us to build more good models for all of you. We have a lot of popular models in the open-source community, including LLMs and also coder models. Qwen2.5-Coder is something that many people are using for local development, and I can tell you that we are now building Qwen3-Coder; I think you guys know about it. We have many model sizes because we believe that for each size there might be a lot of users, and actually there are a lot of users, whether of the very, very small models like the 0.6-billion-parameter one, or of the large ones, previously the 72-billion dense model and now the 235-billion MoE model. And they need quantized models, so we also provide models in different formats, including GGUF, GPTQ, AWQ, and MLX for Apple as well. We try to use Apache 2.0 for most models, so you can use them freely. You can use the models freely in your business; you don't need to worry about too many things, you don't need to ask for permission, you can just use them directly.
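If you want to try one of those quantized releases locally, here is a minimal sketch using llama-cpp-python; the repository id and quant filename are assumptions to check against what is actually published on Hugging Face.

```python
from llama_cpp import Llama  # pip install llama-cpp-python huggingface-hub

# Repo id and filename are assumptions -- check the actual GGUF releases on Hugging Face.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen3-4B-GGUF",
    filename="*Q4_K_M.gguf",  # a 4-bit quant, small enough for a laptop
    n_ctx=8192,
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}],
    max_tokens=128,
)
print(resp["choices"][0]["message"]["content"])
```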
We hope that large language models, large multimodal models, all our foundation models, can help you create good applications. This is what we would like to do. And as we have become more and more popular over these two years, Qwen models are now supported by most relevant third-party frameworks and API platforms as well.

We are also building products for you to interact with our models, and we are building agents as well. In Qwen Chat, as I mentioned before, we have some very good features. This is something that I really like, which is called Web Dev. After entering Web Dev, you just need to input a very simple prompt, like "create a Twitter website," and you will find that it just generates the code and renders it as an artifact, so you have a website and can see how it looks. You can also deploy it, get the URL, and share it with your friends to show how creative you are. You can also create a website for your product, for example. This is a very simple prompt, "create a sunscreen product introduction website," and you get a very good website; you can even click on the buttons. And it is not just for making websites; you can also make cards. It generates the card, and I really like the cards; we often use them on our Twitter. Our prompt is also simple: we give it a link, and based on the link it creates a nice-looking card, and if we provide more information, it builds the card based on that information. This is something I really, really like: Web Dev makes me more creative and helps me quite a lot to show our work to people all over the world.

We also have features like deep research. To do deep research, you just need to ask it to write a report on whatever you are interested in, maybe the healthcare industry, maybe artificial intelligence. Just give it a prompt, and it will ask you what you are going to focus on; you can tell it, or you can just say "as you like." It will start doing research by making a plan first, then searching step by step, writing parts of the report step by step, and it keeps searching and finally gives you a comprehensive report. You can download the report as a PDF. We are still improving its quality by doing reinforcement learning to build a fine-tuned model specifically for deep research, and we believe there is still much room in this field. It is really hard to do reinforcement learning for this from the start, but once you have built a good model for this product, it will be really productive for people in their working lives.

So in the future, we will do many things. There are still a lot of things for us to do to achieve AGI, to build a really good foundation model and foundation agent for all of you. And the first thing, which is really different from what other people think, is that we still believe there is much room in pretraining. I'm happy that you have shown your preference for our pretrained models, but we still find that there is a lot of good data we did not put into them, and a lot of data we did not clean quite well. We find that we can use multimodal data to make the models more capable in different tasks and different domains. And we also have synthetic data.
And maybe we will finally do something really, really different in training methods for pretraining, not just next-token prediction; maybe later we will use reinforcement learning in pretraining as well. So there is still much room in pretraining to get a very good basis for the chatbot or agent. That is the first thing.

The second thing is the scaling laws. There are some changes in the scaling laws: previously we were scaling the model sizes and the pretraining data, but now we need to scale the compute in reinforcement learning. We have a focus on long-horizon reasoning with environment feedback. If you train a model which is capable of interacting with the environment and keeps thinking, it will be something really competitive: it gets the feedback from the environment, keeps thinking, and becomes smarter and smarter with inference-time scaling.

You will find that it generates very long contexts, and you have very long contexts for your input, especially when you have memory, so you need to scale the context. Maybe, finally, we are moving towards infinite context. But for now, we need to solve the problems of 1 million tokens quite well, then we march to 10 million tokens, and then infinite context. We are going to scale the context to at least 1 million tokens this year for most of our models.

We are also going to scale modalities. Maybe scaling modalities doesn't increase intelligence, but if you scale modalities, you can make your models more capable and more productive, especially with vision-language understanding. If you have vision-language understanding capability, you can build something like a GUI agent; without vision capability, it is almost impossible to build a GUI agent and do things like computer use. And there is still much room in scaling modalities, in both inputs and outputs. We are going to unify understanding and generation, for example image understanding and image generation at the same time, just like GPT-4o, which generates very interesting and high-quality images. That is something we are going to do as well.

So based on the four things I mentioned, if you would like me to summarize what we are going to do this year and next year, I think we are moving from the era of training models to the era of training agents. We are actually training agents, not only scaling with pretraining but also scaling with RL, especially with the environment. So I think we can say that we are now entering the era of agents. That's all. Thank you very much for listening to my talk. And if you are interested in Qwen, shoot me an email and come talk to me. Yeah, thanks a lot.