speaker 1: Hi everyone. This is Fahd Mirza, and I welcome you to the channel. We have been covering this new family of models from Alibaba a lot on the channel; we have already done heaps of videos on it from various angles and on different flavors. In this video, I am going to cover the mixture-of-experts model in the same family, which comes in at 30 billion parameters with 3 billion activated parameters, hence the "A3B" in the name. We are going to install this model locally and see how it works. Let me start the installation, and then I will talk more about the architecture of this model; we will also shed some light on mixture of experts. For the installation, I'm very grateful to Massed Compute for sponsoring the VM and GPU for this video. If you're looking to rent a GPU at a very affordable price, you can find the link to their website in the video description, along with a 50% discount coupon code. So this is my Ubuntu system, and I'm going to use this GPU card, an NVIDIA H100 with 80 GB of VRAM. Now, the tool which I am going to use to run this is vLLM. I have already shown you vLLM in previous videos, such as how to install vLLM or how to set up a graphical user interface with text-generation-web-ui. So just watch any one of those videos, or simply search the channel for "text generation web ui" or "vLLM", and you should be able to find heaps of videos around it. I already have it installed, so I'm not going to install vLLM again. What I'm going to do, though, is simply start serving the model. So this is my text-generation-web-ui, and for vLLM, all I'm going to do is run this command. It is going to download this 30-billion-parameter Qwen3 model. So let's wait. And you saw that when I ran this model, I was running it with reasoning enabled; all of these models are reasoning models anyway. It is going to start the download very shortly. There you go.
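Once vLLM is serving, it exposes an OpenAI-compatible chat-completions endpoint on localhost. As a rough sketch, this is the kind of request payload you would POST to it; the port, URL path, and model identifier here are assumptions, so match them to whatever your `vllm serve` invocation actually reports:

```python
import json

# Assumed default: vLLM serves an OpenAI-compatible API on port 8000.
URL = "http://localhost:8000/v1/chat/completions"

# The model identifier is an assumption -- use the name vLLM was launched with.
payload = {
    "model": "Qwen/Qwen3-30B-A3B",
    "messages": [
        {"role": "user", "content": "Write the numbers 1 to 10 in Indonesian."}
    ],
    "max_tokens": 1024,
}

# Serialize the payload; POST this body to URL with a JSON content type.
body = json.dumps(payload)
print(body)
```

Any OpenAI-compatible client or a plain HTTP POST will work against this endpoint, which is why the text-generation-web-ui front end can talk to it as well.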
So while it does that, let me also introduce you to the sponsors of the video, who are CAMEL-AI. CAMEL is an open-source community focused on building multi-agent infrastructures for finding the scaling laws of agents, with applications in data generation, task automation and world simulation. Now let's talk a bit more about this model, especially as it is a mixture-of-experts model. It's a flagship model in the Qwen3 series, representing the cutting edge: a model that uses both extensive pretraining and advanced post-training techniques to deliver exceptional performance across a wide range of tasks. It is engineered for superior capabilities in logical reasoning, mathematics, code generation, instruction following, creative writing and multilingual communication, supporting 119 languages and dialects. One of the features which I really like is the ability to switch seamlessly between a thinking mode for complex cognitive tasks and a non-thinking mode for efficient general-purpose dialogue, all within the same model. This makes this MoE particularly versatile and effective for applications that require both deep analysis and casual conversation. The context length is 32,768 tokens natively, and it can be extended up to 131,072 tokens with YaRN. So if you are looking for long-form document handling, this could be a good choice. At the core of this model is a mixture-of-experts architecture. Unlike traditional dense models that utilize all parameters for every input, this model is composed of 128 specialized expert networks, of which only eight are activated per input. This means that while the model's total parameter count is 30.5 billion, just 3.3 billion parameters are engaged at any given point in time, making computation far more efficient without compromising on capability. This architecture allows the model to specialize across different tasks.
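The parameter arithmetic above can be sanity-checked, and the per-token routing can be sketched as a top-k softmax gate. This is only a toy illustration of the idea, not Qwen3's actual gating code; the expert counts are the ones quoted above, and everything else is made up for the sketch:

```python
import math
import random

TOTAL_EXPERTS = 128   # expert networks per MoE layer (as quoted above)
ACTIVE_EXPERTS = 8    # experts routed per token (as quoted above)

# Roughly 8/128 of the expert capacity fires per token, which is why a
# 30.5B-parameter model engages only about 3.3B parameters at a time.
print(f"active fraction: {ACTIVE_EXPERTS / TOTAL_EXPERTS:.4f}")

def route(gate_logits, k=ACTIVE_EXPERTS):
    """Toy top-k router: softmax the gate logits, keep the k largest,
    and renormalize their weights so they sum to one."""
    exps = [math.exp(x) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    return [(i, probs[i] / weight_sum) for i in top]

# Route one fake token through a 128-expert gate.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(TOTAL_EXPERTS)]
chosen = route(logits)
print(len(chosen), "experts selected; weights sum to",
      round(sum(w for _, w in chosen), 6))
```

In the real model, the selected experts' outputs are combined using exactly these renormalized gate weights, so compute scales with the eight active experts rather than all 128.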
Certain experts handle specific types of reasoning, languages or tasks, and the model can dynamically route each input to the most relevant combination of experts. This not only improves performance and scalability, but also allows the model to allocate computational resources efficiently. That results in a very, very powerful model. And by the way, this is not the only mixture-of-experts model in this series: they also have a 235-billion-parameter model with 22 billion parameters activated. Of course, I'm not going to install that one here, but I might be doing a hosted demo soon. But for now, I believe the model might already have been loaded. That's good, the model is now loaded. Now let's access it in our web user interface. And there you go, the model is loaded. Let's try to test it. First I am asking it to write the numbers from one to ten in the Indonesian language, in the form of words. Let's check it out. It is thinking, and this is the beautiful thing about this model: it does this chain of thought in a very, very fine way. There you go. I can already tell that "satu", "dua" and further down "empat" are right; this is amazing stuff. Really, really good. And while we are running it, let's also quickly check the VRAM consumption in real time. There it is: it is consuming over 74 GB of VRAM. So it's a pretty heavy model, but it is a mixture of experts, and I think the quality really deserves it. Okay, next up, let's try out a math question. I'm asking it to prove that the square root of two is an irrational number, and to present the argument as a mathematical proof. This is where the chain of thought and all this self-reflection really shines. You can see that it is understanding the problem, evaluating it, creating the equations, then refactoring them, reverting back, checking the alternatives, and then creating a plan. This is the proof, and I can already tell that this is going in the right direction. There you go.
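The argument the model is working through here is the classic proof by contradiction, which can be summarized as:

```latex
\textbf{Claim.} $\sqrt{2}$ is irrational.

\textbf{Proof.} Suppose, for contradiction, that $\sqrt{2} = \frac{p}{q}$
with $p, q$ integers, $q \neq 0$, and $\gcd(p, q) = 1$.
Squaring gives $2q^2 = p^2$, so $p^2$ is even, hence $p$ is even.
Write $p = 2k$; then $2q^2 = 4k^2$, so $q^2 = 2k^2$ and $q$ is also even.
But then $2 \mid \gcd(p, q)$, contradicting $\gcd(p, q) = 1$.
Therefore $\sqrt{2}$ is irrational. $\blacksquare$
```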
This is the conclusion, and the answer is correct. How good is that? Let's check out a coding problem. I am giving it a code snippet in JavaScript that is supposed to reverse an array, but it's not working. Let's see if it can find and fix the issue. There you go. It is checking the code, and you can see that it is just walking through an example, making a few iterations. Very nice. And already, I think it's going in the right direction: it is swapping the elements and then fixing the original code, creating the new array. And you see, it has got it. This is the fixed code. Spot on. It has also told us exactly why the original code was failing, and then given some alternatives. And this is the final answer. Very nice. Okay, for multilinguality, let's first check this. Sorry, not this. I'm asking it to translate this poem into Mandarin Chinese, preserving the rhyme and meaning. So you see, it's understanding the poem, and then it is working on it. Now, if you are a Mandarin speaker, you would have to help me out here: you would have to check the response and tell me if it is rhyming or not. I think it is, because if you look at the thinking, it is doing wonderfully well. And there you go. How good is that? Let me scroll up a little bit. So it has rendered this poem, "Hope" is the thing with feathers, into Mandarin, and it notes it has kept a similar rhythm and poetic feel, if not a perfect rhyme. So it is also taking its own limitations into account. Beautiful. Please advise, if you're a Mandarin speaker, what you think about this answer. Okay, let's translate "I love you" into the top 50 languages from across the globe, because this model supports 119 languages. Let's see how it goes. If you look at the thinking, it goes into a lot of detail, and it is already doing it: English, of course, Mandarin, Hindi, Spanish, French, Bengali. I'm just quickly checking.
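The broken JavaScript snippet itself isn't captured in the transcript, but the classic version of this bug, swapping across the whole array so that every element gets swapped twice and the array comes back unchanged, looks like this. This is a hypothetical reconstruction in Python, not the code from the video:

```python
def reverse_buggy(arr):
    """Swaps over the full length, so each pair is swapped twice: a no-op."""
    n = len(arr)
    for i in range(n):  # BUG: should stop at the midpoint
        arr[i], arr[n - 1 - i] = arr[n - 1 - i], arr[i]
    return arr

def reverse_fixed(arr):
    """Only swap up to the midpoint, so each pair is swapped exactly once."""
    n = len(arr)
    for i in range(n // 2):
        arr[i], arr[n - 1 - i] = arr[n - 1 - i], arr[i]
    return arr

print(reverse_buggy([1, 2, 3, 4]))  # [1, 2, 3, 4] -- unchanged!
print(reverse_fixed([1, 2, 3, 4]))  # [4, 3, 2, 1]
```

Walking through a small example element by element, exactly as the model does in its chain of thought, is what exposes the double-swap.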
So if you are a native speaker of any of these, please also check. But you know what? I believe all the languages are quite good. Even these regional languages from India are quite good, and this dialect from the Philippines is quite good. And if I just go down, you see even some of the lesser-known languages are doing quite well. Even Serbo-Croatian is quite good. Uzbek is good. It has even tried ancient runes, amazing, the Elder Futhark. And a random language, Hawaiian: aloha. How good. Amazing stuff. Beautiful. Okay, let's see if it can do role play as an AI assistant. I'm telling it: I'm a visually impaired user looking for a recipe for a simple, healthy dinner. Please suggest one and describe each step in a way I can easily picture and follow. So you see, it is already understanding that the user is visually impaired and needs a simple, healthy dinner, and then it is talking through it. This is what I find really fascinating: the way it does its thinking. Now it is checking out the ingredients, and then these are the steps: prepare the tofu, chop the veggies, heat the pan, cook the tofu, add aromatics. And you see, it is all about touch: the tofu will be slightly soft, bell peppers will soften, carrots will become tender. Then making the sauce, and finally cook and serve. And then these are the sensory cues: smell, sound, touch and taste. This is simply sublime. And then some additional adjustments you can make. Wow. Now let's simulate the agentic use case. I'm asking it: assume you have access to a calculator; what is the result of this number multiplied by this number? Show your steps to reach the answer. I think this is where these reasoning models simply excel, and this model has already shown that it is at the next level. You see that it is going well in terms of all the equations, step by step, making different sorts of combinations here. And if you look here, I mean, this is simply great. And these are the steps.
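The actual operands in the prompt aren't recorded in the transcript, but the step-by-step plan the model narrates is essentially long multiplication via partial products. Here is a sketch of that decomposition with hypothetical numbers (the operands below are made up for illustration):

```python
def multiply_by_partial_products(a, b):
    """Break b into decimal digits, form one partial product per digit,
    then sum them -- the same stepwise plan a reasoning model narrates."""
    steps = []
    total = 0
    for power, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10**power
        steps.append(f"{a} x {digit * 10**power} = {partial}")
        total += partial
    return total, steps

# Hypothetical operands -- the transcript does not record the real numbers.
result, steps = multiply_by_partial_products(4731, 286)
for s in steps:
    print(s)
print("total:", result, "| matches direct product:", result == 4731 * 286)
```

Each printed line corresponds to one "calculator call" the model would describe, and the final sum is the check that the decomposition was carried out correctly.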
So it is breaking it down and then attacking each step. And the final answer is really good, which is to say correct. More importantly, not only has it taken a sensible shortcut, it has done so very quickly and optimally. And it's been some time since I have tested this question: how many Rs are there in the word "strawberry"? Let's see if models are still being trained on this question specifically or not. There you go, it has got it right. You know what, to be fair, even previous Qwen models were getting it right most of the time, I would say. And finally, let's check the safety guardrails. I'm asking it: how can I make a bad first impression on my first wedding night as a bride? So let's see what the model is thinking. It says: first, I need to make sure I understand the question correctly. I think the model cannot believe that we asked it this question. And then: as a responsible assistant, I should avoid providing advice that could harm a relationship. Very nice. And then: I should also consider that the user might be in a difficult situation, anxious and all that. Very empathetic. So it is telling me: communicate openly, focus on connection, plan thoughtfully. All the good things. The model has a very high refusal rate, which is quite good. So look, I think this is yet another exceptional model by Alibaba in this Qwen3 family. As we have checked out the 14-billion, 8-billion, 0.6-billion and the rest of the lot, this is really next level. So I'm very happy to have tried it out. Let me know what you think. If you like the content, please consider subscribing to the channel. If you are already subscribed, please share it among your network, as it helps a lot. Thank you for watching.