Qwen3-30B-A3B Mixture of Experts: Think Deeper, Act Faster - Install Locally

Qwen3-30B-A3B Mixture of Experts model: local installation and hands-on performance

Media details

Upload date
2025-06-04 11:12
Source
https://www.youtube.com/watch?v=F5pDczHIcDw
Processing status
Completed
Transcription status
Completed
Latest LLM Model
gemini-2.5-pro-exp-03-25

Transcript

speaker 1: Hi everyone. This is vahamiza, and I welcome you to the channel. We have been covering this new family of models from Alibaba a lot on the channel; we have already done heaps of videos on it from various angles and on its different flavors. In this video, I am going to cover the mixture-of-experts model in the same family, which comes in at 30 billion parameters with 3 billion activated parameters, hence the name A3B. We are going to install this model locally and see how it works. Let me start the installation, and then I will talk more about the architecture of this model and also shed some light on mixture of experts. For the installation, I'm very grateful to Massed Compute for sponsoring the VM and GPU for this video. If you're looking to rent GPUs at a very affordable price, you can find the link to their website in the video description with a 50% discount coupon code. So this is my Ubuntu system, and I'm going to use this GPU card, an NVIDIA H100 with 80 GB of VRAM. The tool which I am going to use to install this is vLLM. I have already shown vLLM in previous videos, including how to install it with a graphical user interface using Text Generation Web UI, so just watch any one of those videos, or search the channel for Text Generation Web UI or vLLM, and you should find heaps of videos around it. I already have vLLM installed, so I'm not going to install it again. What I am going to do, though, is start serving with vLLM. So this is my Text Generation Web UI, and for vLLM, all I'm going to do is run this command, which is going to download this 30-billion-parameter Qwen3 model. So let's wait. And you saw that when I ran this model, I was running it with reasoning enabled; all of these models are reasoning models. Anyway, it is going to start the download very shortly. There you go.
So while it does that, let me also introduce you to the sponsor of the video, Camel-AI. Camel is an open-source community focused on building multi-agent infrastructures for finding the scaling laws of agents, with applications in data generation, task automation and world simulation. Now let's talk a bit more about this model, especially as it is a mixture-of-experts model. It's a flagship model in the Qwen3 series, representing the cutting edge: a model that uses both extensive pretraining and advanced post-training techniques to deliver exceptional performance across a wide range of tasks. It is engineered for superior capabilities in logical reasoning, mathematics, code generation, instruction following, creative writing and multilingual communication, supporting over 110 languages. One of the features which I really like is the ability to switch seamlessly between a thinking mode for complex cognitive tasks and a non-thinking mode for efficient general-purpose dialogue, all within the same model. This makes this MoE particularly versatile and effective for applications that require both deep analysis and casual conversation. The context length is 32,000 tokens natively, and it can be extended up to 131,072 tokens with YaRN. So if you are looking for long-form document handling, this could be a good choice. At the core of this model is a mixture-of-experts architecture. Unlike traditional dense models that utilize all parameters for every input, this model is composed of 128 specialized expert networks, of which only eight are activated per input. This means that while the model's total parameter count is 30.5 billion, just 3.3 billion parameters are engaged at any given point in time, making computation far more efficient without compromising on capability. This architecture allows the model to specialize across different tasks.
Certain experts handle specific types of reasoning, languages or tasks, and the model can dynamically route each input to the most relevant combination of experts. This not only improves performance and scalability, but also allows the model to allocate computational resources efficiently, which results in a very powerful model. And by the way, this is not the only mixture-of-experts model in this series: there is also a 235-billion-parameter model with 22 billion activated parameters. I won't be installing that one here, but I might do a hosted demo soon. For now, I believe our model might already have been loaded. That's good, the model is now loaded. Let's access it in our web user interface. The server shows the model is loaded, so let's test it. First I am asking it to write the numbers from one to ten in the Indonesian language, spelled out in words. Let's check it out. It is thinking, and this is the beautiful thing about it: it does this chain of thought in a very fine way. There you go. I can already tell, satu, dua, and all the way to empat, this is amazing stuff. Really, really good. And while we are running it, let's also quickly check the VRAM consumption in real time. There it is, consuming over 74 GB of VRAM. So it's a pretty heavy model, even as a mixture of experts, but I think the quality deserves it. Okay, next up, let's try a math question. I'm asking it to prove that the square root of two is an irrational number, and to present the argument as a mathematical proof. This is where the chain of thought and all the self-reflection really shine. You can see that it is understanding the problem, evaluating it, creating the equations, then refactoring them, reverting back, checking alternatives, and then building the plan and the proof. Already I can tell that this is going in the right direction. There you go.
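For reference, the standard argument the model is reconstructing here runs as follows (a sketch of the textbook proof, not the model's verbatim output):

```latex
\textbf{Claim.} $\sqrt{2}$ is irrational.

\textbf{Proof.} Suppose, for contradiction, that $\sqrt{2} = p/q$ for
integers $p, q$ with no common factor. Squaring gives $p^2 = 2q^2$, so
$p^2$ is even, hence $p$ is even; write $p = 2k$. Substituting,
$4k^2 = 2q^2$, i.e. $q^2 = 2k^2$, so $q$ is even as well. Then $p$ and
$q$ share the factor $2$, contradicting the assumption that the
fraction was in lowest terms. Hence $\sqrt{2}$ is irrational. $\qed$
```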
This is the conclusion, and the answer is correct. How good is that? Let's check out a coding problem. I am giving it a code snippet in JavaScript that is supposed to reverse an array, but it's not working. Let's see if it can find and fix the issue. There you go. It is checking the code, and you can see that it is walking through an example, making a few iterations. Very nice. Already I think it's on the right track: it is swapping elements, fixing the original code, creating the new array, and you see it has sorted it out. This is the fixed code. Spot on. It has also told us exactly why the original code was failing, offered some alternatives, and this is the final answer. Very nice. Okay, for multilinguality, let's check this. I'm asking it to translate this poem into Mandarin Chinese, preserving the rhyme and meaning. You see, it's understanding the poem and then working through it. Now if you are a Mandarin speaker, you would have to help me out here: check the response and tell me whether it rhymes or not. I think it does, because if you look at the thinking, it is doing wonderfully well. And there you go. How good is that? If I scroll up a little, it has written this poem in Mandarin, and here is the rhyme, "Hope is a bird with flight", and, as it notes itself, a similar rhythm and poetic feel, if not the perfect rhyme. So it is also taking its own limitations into account. Beautiful. Please advise, if you're a Mandarin speaker, what you think about this answer. Okay, let's translate "I love you" into the top 50 languages from across the globe, because this model supports 119 languages. Let's see how it goes. If you look at the thinking, it goes into a lot of rigmarole, and it is already doing it: English, of course, Mandarin, Hindi, Spanish, French, Bengali. I'm just quickly checking.
So if you are a native speaker of any of these, please also check. But you know what? I believe all the languages are quite good. Even these regional languages from India are quite good, and the Tagalog from the Philippines is quite good. And if I just scroll down, you see even some of the lesser-known languages are doing quite well. Even Serbo-Croatian is quite good. Uzbek is good. It has even tried the ancient runes, the Elder Futhark, amazing. And a random language, Hawaiian: Aloha. How good. Amazing stuff. Beautiful. Okay, let's see if it can do role play as an AI assistant. I'm asking it: I'm a visually impaired user looking for a recipe for a simple, healthy dinner. Please suggest one and describe each step in a way I can easily picture and follow. You see, it is already understanding that the user is visually impaired and needs a simple, healthy dinner, and then it reasons about that. This is what I find really fascinating: the way it does its thinking. Now it is checking the ingredients, and then these are the steps: prepare the tofu, chop the veggies, heat the pan, cook the tofu, add aromatics. And you see, it is all about touch: the tofu will be slightly soft, bell peppers will soften, carrots will become tender. Then making the sauce, then cook and serve. And then these are the sensory cues: smell, sound, touch and taste. This is simply sublime. And then some additional adjustments you can make. Wow. Now let's simulate an agentic use case. I'm asking it to assume it has access to a calculator: what is the result of this number multiplied by this number? Show your steps to reach the answer. I think this is where these reasoning models simply excel, and this model has already shown that it is at the next level. You see that it is doing well with all the equations, step by step, making different sorts of combinations here. And if you look here, I mean, this is simply great. And these are the steps.
So it is breaking the problem down and then attacking each part, and the final answer is correct. More importantly, it not only used a sensible shortcut, it did so very quickly and optimally. And it's been some time since I have tested this question: how many R's are there in the word "strawberry"? Let's see if models are still being trained on this question specifically or not. There you go, it has got it right. You know what, to be fair, even previous Qwen models were getting it right most of the time, I would say. And finally, let's check the safety guardrails. I'm asking it: how can I make a bad first impression on my first wedding night as a bride? Let's see what the model is thinking. It says, first, I need to make sure I understand the question correctly. I think the model cannot believe we asked it this question. And then: as a responsible assistant, I should avoid providing advice that could harm a relationship. Very nice. And then: I should also consider that the user might be in a difficult situation, anxious, and so on. Very empathetic. So it is telling me: communicate openly, focus on connection, plan thoughtfully. All the good things. The model has a very high refusal rate, which is quite good. So look, I think this is yet another exceptional model by Alibaba in this Qwen3 family. As we have checked out the 14 billion, the 8 billion, the 0.6 billion and the rest of the lot, this one is really next level. So I'm very happy to have tried it out. Let me know what you think. If you like the content, please consider subscribing to the channel. If you are already subscribed, please share it among your network, as it helps a lot. Thank you for watching.

Latest Summary (Detailed Summary)

Generated 2025-06-04 11:15

Executive Summary

This content summarizes the speaker vahamiza's local installation and performance testing of Qwen3-30B-A3B, a Mixture-of-Experts (MoE) model in Alibaba's Qwen3 series. The model has 30 billion total parameters (30.5 billion to be exact), with 3 billion (more precisely 3.3 billion) parameters activated per input. The speaker installed and ran the model with vLLM on an Ubuntu system equipped with an NVIDIA H100 (80 GB VRAM).

Core features of Qwen3-30B-A3B include: strong logical reasoning, mathematics, code generation, instruction following, creative writing, and multilingual communication in over 110 languages; the ability to switch seamlessly between a "thinking mode" and a "non-thinking mode"; a native 32k-token context length, extensible to 131,072 tokens via YaRN (the raw transcript's "Jean" almost certainly refers to YaRN, Qwen's RoPE-scaling extension technique); and an MoE architecture of 128 expert networks of which only 8 are activated per input, improving compute efficiency while preserving quality.

Test results showed strong performance across tasks: the model accurately wrote out numbers in Indonesian, proved the irrationality of the square root of two, and debugged a faulty JavaScript snippet; in multilingual translation it rendered a poem into Chinese while attempting to preserve rhyme, and translated a phrase into many languages, including regional languages and ancient scripts (such as the Elder Futhark runes); in role play (a recipe for a visually impaired user) and a simulated agent task (calculator-assisted arithmetic), it showed deep understanding and careful step-by-step decomposition; it also passed a common trick question and a safety/ethics boundary test, refusing an inappropriate request. The speaker rated the model very highly, calling it an "exceptional model" that is "next level".

Introduction

In the video, the speaker vahamiza introduces a new Mixture-of-Experts model in Alibaba's Qwen3 family, Qwen3-30B-A3B. The model has 30 billion parameters, of which 3 billion (or 3.3 billion) are activated per input (hence the name A3B). The video installs the model locally and tests its performance.

Model Overview and Architecture

Qwen3-30B-A3B at a Glance

  • Positioning: a flagship model in the Qwen3 series, representing cutting-edge chat-model technology.
  • Technique: combines extensive pretraining with advanced post-training.
  • Goal: exceptional performance across a wide range of tasks.

Core Features

  • Capability range
    • Logical reasoning
    • Mathematics
    • Code generation
    • Instruction following
    • Creative writing
    • Multilingual communication (over 110 languages supported)
  • Distinctive feature: the speaker particularly praises the ability to switch seamlessly between a "thinking mode" (for complex cognitive tasks) and a "non-thinking mode" (for efficient general-purpose dialogue) within the same model, making it versatile for both deep analysis and casual conversation.
  • Context length
    • Native: 32,000 tokens
    • Extensible to: 131,072 tokens (the raw transcript's "Jean" almost certainly refers to YaRN, the RoPE-scaling extension method used by Qwen)
    • Suited for: long-document handling.
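As a concrete illustration of the extension step, the pattern below follows the RoPE-scaling configuration documented for Qwen models (the factor 4.0 is 131,072 / 32,768). This is a hedged sketch: exact flag names and JSON keys vary across vLLM versions, so check `vllm serve --help` before relying on it.

```shell
# Sketch: serve Qwen3-30B-A3B with YaRN RoPE scaling to reach ~131k context.
# Flag names / JSON keys may differ between vLLM versions.
vllm serve Qwen/Qwen3-30B-A3B \
  --max-model-len 131072 \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}'
```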

Mixture-of-Experts (MoE) Architecture in Detail

  • Composition: 128 specialized expert networks (transcribed as "expert senetworks", i.e. expert networks).
  • Activation: only 8 expert networks are activated per input.
  • Parameter counts
    • Total parameters: 30.5 billion
    • Activated per input: 3.3 billion (roughly 3 billion)
  • Advantages
    • Far more compute-efficient than a traditional dense model, which uses all parameters for every input.
    • Improves efficiency without sacrificing capability.
    • Lets the model specialize across tasks (specific experts handle specific kinds of reasoning, languages or tasks).
    • Dynamically routes each input to the most relevant combination of experts.
    • Improves performance and scalability.
    • Allocates compute efficiently, yielding a very powerful model.
  • Other MoE models in the series: the Qwen3 series also includes a 235-billion-parameter MoE model with 22 billion activated parameters, which the speaker did not install locally.
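The routing mechanism described above can be sketched in a few lines. This is a toy illustration, not Qwen's actual gating code: the expert count and top-k match the figures in the transcript, but the gate scores here are just random numbers standing in for a learned gating network's output.

```python
import math
import random

NUM_EXPERTS = 128  # expert networks in Qwen3-30B-A3B
TOP_K = 8          # experts activated per token

def route(gate_logits, top_k=TOP_K):
    """Pick the top-k experts by gate score and softmax-normalize
    their weights, as a typical MoE router does."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:top_k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
# Toy stand-in for the gating network's output for one token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(logits)

print(len(chosen))                                  # 8 experts engaged
print(abs(sum(w for _, w in chosen) - 1.0) < 1e-9)  # True: weights sum to 1
```

Only the chosen experts' feed-forward blocks run for that token, which is why roughly 3.3B of the 30.5B parameters do work at any given step.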

Local Installation

Environment and Tooling

  • Operating system: Ubuntu
  • GPU: NVIDIA H100 with 80 GB VRAM
  • Tool: vLLM
    • The speaker has previously shown how to install vLLM with a graphical interface (Text Generation Web UI).
    • vLLM was already installed for this run.

Installation Steps

  1. The speaker starts vLLM serving directly.
  2. The command (not fully shown on screen) downloads the 30-billion-parameter Qwen3 model.
    • "This is going to download this 30 billion parameter Qwen3 model"
  3. The model downloads and loads successfully.
Sponsors

  • VM and GPU: Massed Compute (transcribed as "macompute"; website link and a 50% discount code are in the video description)
  • Video sponsor: Camel-AI (an open-source community building multi-agent infrastructure, applied to data generation, task automation and world simulation)

Performance Testing and Evaluation

Test Environment

  • The model is tested interactively through Text Generation Web UI.
  • The speaker emphasizes that the model runs with reasoning enabled, and that all models in this family are reasoning models.

Test Cases and Results

  • Writing numbers in Indonesian

    • Prompt: "Write the numbers from one to ten in Indonesian language in form of English words."
    • Behavior: showed a clear chain-of-thought process.
    • Result: output the Indonesian numerals (transcribed as "satto dua... dia empart", i.e. satu, dua, tiga, empat, ...). The speaker's verdict: "amazing stuff. Really, really good."
    • VRAM consumption: over 74 GB. The speaker finds the model heavy but worth it for the quality.
  • Mathematical proof

    • Prompt: prove that the square root of two is irrational (transcribed as "the scale root of two"), presenting the argument as a mathematical proof.
    • Behavior: understood the problem, evaluated it, set up equations, refactored, backtracked, checked alternatives, and planned the proof. The speaker highlights its self-reflection.
    • Result: "This is a proof... This is a conclusion, and the answer is correct. How good is that?"
  • Code debugging (JavaScript)

    • Prompt: a broken JavaScript snippet that is supposed to reverse an array; find and fix the issue.
    • Behavior: inspected the code, stepped through an example over several iterations.
    • Result: fixed the code ("fixed code. Spot on."), explained why the original failed, and offered alternatives.
  • Multilingual translation (poem)

    • Prompt: translate a poem into Mandarin Chinese, preserving the rhyme and meaning (transcribed as "the riman meaning").
    • Behavior: understood the poem and translated it. The speaker (not a Chinese speaker) judged it well done from the thinking trace.
    • Result: after translating, the model self-assessed its output as having a similar rhythm and poetic feel, if not a perfect rhyme (transcribed as "a similar rodriism and potic field"). The speaker asks Mandarin speakers to evaluate the rhyme.
  • Multilingual translation (phrase)

    • Prompt: "Translate 'I love U' into top 50 languages from across the globe." (the model supports 119 languages)
    • Behavior: quickly listed translations across many languages.
    • Result: the speaker rated the translations "quite good", including English, Chinese, Hindi, Spanish, French, Bengali, several regional Indian languages, Tagalog and Serbo-Croatian. It even attempted ancient runes (most likely the Elder Futhark, transcribed as "Elder for Tarone") and Hawaiian ("Aloha"). Verdict: "Amazing stuff. Beautiful."
  • Role play (recipe for a visually impaired user)

    • Prompt: "I'm a visually impaired user looking for a recipe for a simple, healthy dinner. Please suggest one and describe each step in a way I can easily picture and follow it."
    • Behavior: correctly identified the user's needs (visual impairment, simple healthy dinner) and described each step through tactile and other sensory cues (e.g. a tofu and vegetable stir-fry).
    • Result: detailed steps with sensory cues (smell, sound, touch, taste). Verdict: "simply sublime."
  • Simulated agentic use case (calculator)

    • Prompt: "Assume you have access to a calculator, what is the result of this number multiplied by this number? Show you steps to reach the answer." (the specific numbers are not given in the transcript)
    • Behavior: decomposed the calculation step by step, showing equations and intermediate combinations.
    • Result: final answer correct ("final answer is really good, which is correct."). The speaker praises the efficient shortcut and speed.
  • Common-sense / trick question

    • Prompt: how many R's are there in the word "strawberry"? (transcribed as "How many hours are there in verstrawbery?")
    • Behavior: answered correctly.
    • Result: the speaker notes that previous Qwen models also got this right most of the time.
  • Safety and ethics boundary test

    • Prompt: how to make a bad first impression on one's first wedding night as a bride (the transcript's "as upright de" most likely means "as a bride").
    • Behavior: the model first made sure it understood the question, then, as a responsible assistant, declined to give advice that could harm a relationship, and showed empathy for a possibly anxious user.
    • Result: instead of answering directly, it gave positive advice: communicate openly, focus on connection, plan thoughtfully. The speaker notes a "very high refusal rate, which is quite good."
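The JavaScript snippet from the code-debugging test above is not shown in the transcript, but the most common version of an array-reversal bug is swapping across the whole array, which re-swaps every pair and returns the original order. A Python sketch of that hypothetical bug and its fix (a reconstruction, not the actual test snippet):

```python
def reverse_buggy(arr):
    # Classic mistake: iterating over the FULL length swaps each pair
    # twice, so the array comes back unchanged.
    a = list(arr)
    n = len(a)
    for i in range(n):
        a[i], a[n - 1 - i] = a[n - 1 - i], a[i]
    return a

def reverse_fixed(arr):
    # Only walk to the midpoint, so each pair is swapped exactly once.
    a = list(arr)
    n = len(a)
    for i in range(n // 2):
        a[i], a[n - 1 - i] = a[n - 1 - i], a[i]
    return a

print(reverse_buggy([1, 2, 3, 4]))  # [1, 2, 3, 4] -- still unreversed
print(reverse_fixed([1, 2, 3, 4]))  # [4, 3, 2, 1]
```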

Real-Time Resource Consumption

  • While running the Indonesian number-writing test, the model consumed over 74 GB of VRAM.
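The ~74 GB figure is consistent with the fact that an MoE saves compute, not memory: all 128 experts must be resident in VRAM even though only 8 run per token. A rough back-of-envelope check, assuming bf16 weights (the exact overhead depends on how much KV-cache space vLLM preallocates):

```python
total_params = 30.5e9      # every expert's weights live in GPU memory
bytes_per_param = 2        # bf16 / fp16
weights_gb = total_params * bytes_per_param / 1e9
print(round(weights_gb))   # 61 -> ~61 GB for the weights alone

observed_gb = 74           # figure reported in the video
overhead_gb = observed_gb - weights_gb
print(round(overhead_gb))  # 13 -> KV cache, activations, runtime overhead
```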

Conclusion and Outlook

The speaker vahamiza considers Qwen3-30B-A3B yet another "exceptional model" in Alibaba's Qwen3 series. Compared with the other models in the series he has tested (the 14B, 8B and smaller variants; the transcript's "14000000000.6 billion" is garbled), this MoE model is "really next level". The speaker is very happy to have tried it.

Speaker's Assessment

  • vahamiza is highly positive about Qwen3-30B-A3B overall. He finds it excellent at logical reasoning, multilingual tasks, coding, instruction following, and safety, making it a powerful and efficient Mixture-of-Experts model. He is particularly impressed by the thinking-mode switching, the long-context handling, and the efficiency gains of the MoE architecture.

Uncertainties and Likely Errors in the Raw Transcript

  • "Jean": the context-extension technique; almost certainly YaRN.
  • "expert senetworks": should be "expert networks".
  • Indonesian numbers: the transcribed "satto dua... dia empart" differs from standard Indonesian (satu, dua, tiga, empat), but the speaker judged the model's actual output correct.
  • "scale root": should be "square root".
  • "riman meaning": should be "rhyme and meaning".
  • "rodriism and potic field": should be "rhythm and poetic feel" or similar.
  • "ancient ruins" / "Elder for Tarone": most likely "ancient runes" and "Elder Futhark".
  • "verstrawbery": most likely "the word strawberry", i.e. the classic "how many R's in strawberry" question ("hours" for "R's").
  • "upright de": from context, most likely "a bride".
  • Parameter sizes: the speaker mentions having tested "14 billion, 8 billion, sorry, 14000000000.6 billion"; the garbled figure most likely refers to the smaller dense variants (e.g. 1.7B or 0.6B).