Qwen3 30B-A3B MoE — In-Depth LOCAL Testing! (Think & No-Think)
Qwen3 MoE Model Hands-On: Can Fewer Active Parameters Beat a Bigger Model?
Tags
Media Details
- Upload date
- 2025-06-04 10:58
- Source
- https://www.youtube.com/watch?v=wClUkOTYD2w&pp=0gcJCdgAo7VqN5tD
- Processing status
- Completed
- Transcription status
- Completed
- Latest LLM Model
- gemini-2.5-pro-preview-05-06
Transcript
speaker 1: All right, okay. I want to stop the video, but it's now telling us how we should wisely use the $25 seed funding from our grandpa. The release of Qwen3 has brought us a lot of new models, and while I'm not going to test each one individually, after my video where I just wanted to get my hands on these quickly and play with them, my interest has become laser focused on the MoE models released in the Qwen3 family. The thing that really got me interested is what's written right here, where the smaller of the two MoE models, the Qwen3 30B-A3B, outcompetes QwQ-32B, which has ten times more activated parameters. So basically, this MoE model of Qwen3, with 3 billion parameters active at a time, will outperform the 32 billion parameter dense QwQ-32B, which in and of itself was a fantastic and well reviewed model. I personally had played with that and it was extremely good. So that is really quite shocking, and I want to go ahead and try this model now. For this video, we are going to be specifically focusing on the 30B-A3B model of Qwen3, which is an MoE model. Now, I will say that I am also currently downloading the big model, which is a 235 billion parameter model with 22 billion parameters active per request. I will make a dedicated video on that; it will probably be pretty slow because I'll have to offload a lot of it to RAM. But this just really interested me, and I want to go ahead and try it. Now, first and foremost, and something I may have neglected to do in the previous video in all of my excitement to get started, is to make sure that the sampling parameters are set correctly, which was also pointed out to me in the comments, so thank you for that. Basically, I'm going to pull them from the Hugging Face page for this specific model; the sampling parameters are right down here under the best practices section. So from within LM Studio, where I do have this downloaded now, I did go ahead and pull the Q8 version, which is around 33 GB. This is likely going to be a bit larger than most systems can handle, but I do have two 3090 Tis, and I would like to take full advantage of them today. Before we jump right into it, we are just going to set the sampling parameters in accordance with what they are said to be in the best practices right here. For thinking mode and for non-thinking mode, there are actually different parameters, so that is also something to make note of: beyond just toggling whether or not it's thinking, you also want to tweak the parameters. So temperature is going to be 0.6, and we have to go into the inference tab right here, so we'll drop that down from 0.8 to 0.6. Top P is going to be 0.95, so that is already set correctly. Top K is 20 and Min P is zero, so we'll pull this to zero and then change this from 40 to 20. Just think of it like you're cooking a dish, and these are the small amounts of ingredients that need to go in to make it taste right; changing them slightly can have an effect on the overall taste of the dish. So basically, it is loaded now, and the settings are correct in accordance with what's listed on the Hugging Face page. I will just send something to it to make sure it's working, and we'll get a token generation readout.
Okay, so that was 83.4 tokens per second. And remember, this is where it becomes a little confusing, because it is a 30 billion parameter model, so you would expect it to be much slower than that. But only 3 billion parameters are active for any one request, which is why the token speed seems so much quicker than we would have gotten from a dense model like QwQ-32B or something of that sort. That's another reason I really like the MoE models; I think they're very cool. Now, truthfully, this does have some pretty big shoes to fill, claiming it outcompetes QwQ-32B, so let's test it in the scope where it is supposed to perform as well as that, which was a fantastic model. Let's do a retro synthwave Python game. Let's take a look at its chain of thought, as this is of course something you can toggle off; I think I will also do some testing with thinking off, as that may be interesting to see as well. First, it's just trying to decipher what a synthwave game is, and it does understand neon colors, futuristic elements, etc. Some of the models I've tested take "synth" very literally, though maybe not the Qwen models. It is, I believe, still thinking; I need to catch where it is right now. Okay, so it has now concluded thinking, and it is giving us A, D, and S, so it's using WASD-style keys, which is very cool, very gamer style. Since it can't use Pygame, "no external dependencies" means no third-party packages. Okay, so move with A and D, or shoot. I'm going to move here. All right, let me shoot. I love how creatively it did this. If I keep shooting... all right, let me try to move that way; you can see the ship is moving with each frame update of this game, if you will. Wow, this is creatively impressive, I will say, to take "no external dependencies" literally and come up with something like this. Oftentimes, if you're constrained in what you're actually capable of doing, you find a way to come up with a very creative solution to overcome that, and that is what I would say it managed to do right here, aside from that little space thing right there. Final score: zero, thanks for playing. That was extremely creative and well done considering the rather robust constraints that I put on it, so I am actually quite impressed. I am now going to ask it to do it again, but with Pygame, and we'll see its thoughts. And I like that, just as a good quality-of-life check, it actually tells you how to install the required dependency if you need it. All right, so 75 tokens a second, a fully self-contained version of this, but this time using Pygame, so we will be able to see that. Again, though, I'm really quite impressed with the previous one it generated, just because of how creatively it did so. All right, so it's the A and D keys again. Very well done. Very, very well done. I love it. Okay, if I want to nitpick, it could have done a cool grid background, which I'm sure it would do successfully if I asked it to. We can see the score is going up correctly. I saw something there that showed in red when I lost, but unfortunately that didn't work, and then it errored out once I lost.
But for the duration of the game, assuming I had not lost, it did a completely fine job. And it's cool because, depending on how long or quickly you press the key, it changes the length of the beam that you're shooting, which is kind of fun. So I wonder if I just hold it down... oh yeah, yep, they're all getting sprayed. All right, still, this is not bad. I'm very satisfied with this, and it named the game. I'll let it hit me: it said game over, it flashed, and then it errored out. So overall, quite good; I'm very happy with that. Let's do a Steve's PC test and see what it does here, and then I may disable the thinking functionality and perhaps run the same test, just to see if there's any noticeable difference in the actual output. Okay, so it quickly thought a little bit. It's outlining the sections and things like that. Testimonials, all good; I would like to have some fake user testimonials in there. All right, so around 78 tokens per second. It tells us the features, and it is using emojis here, which I'm not necessarily too keen on, but I suppose some folks like it, because it seems like a lot of models do that now. And it tells us how to use it: just paste it into an HTML file and open it in a browser of our choice. Let's check out Steve's new website. Very, very good. Oh, we almost got a credit card number added to that picture. So it is not only a good picture, but it is related. When I was testing GLM, it also put some photos in, but some of them were things like the Eiffel Tower while the section was about data recovery, which was odd. Okay, we've got nice little images, I don't remember what these are called right now, pertaining to the sections they should pertain to: software troubleshooting... okay, maybe those two should be swapped, but that's fine. Data recovery. About us: welcome to Steve's PC Repair, trusted, over ten years of experience. Not bad. How's the footer? All right, not bad. Interesting: this also shows 2023. I'm starting to wonder if I'm the one that's confused, because I swear the 0.6 billion parameter model got 2025 on this test. It has small thumbnail icons there for the social platforms. This contact form is slightly odd. Let's see... okay, so "About" obviously just links us to that section of the page. This is quite simple, but the image was a nice touch, and it said it was scalable to mobile. And if we try that, it is seemingly very scalable to mobile. If you want to do no-think and you don't want to go on a wild goose chase that frustrates you, you can just type /no think and it will basically do what you ask without thinking, so you don't need to muck about with a lot of settings. If you are using this regularly, there's obviously a better way to make it statically defined so you don't have to type that in front of your prompt. However, it is also suggested to change the sampling parameters for non-thinking mode: if you scroll down right here, you will see temperature goes up to 0.7 and Top P goes down to 0.8. So we'll do that real quick. Cool. And now we're ready to have some fun with no thinking. All right, we'll just do a refusal test with no-think. All right. Unfortunately, it didn't.
I tried to get it to generate a way to bypass a wimpy old WEP router, and it refused. But it did give us some suggestions on what to do, like upgrading the router to a better protocol like WPA2. So, all decent and acceptable. My Bluetooth keyboard is now dying, so I need to take a break and charge it, and then we'll do some more testing. Overall, very cool, very good; I'm excited. I am curious to see if you have to prepend /no think to every query you send if you want it not to think, or if doing it once at the start of a new thread or conversation will set the behavior for the whole chat. So I will test that by just asking. I'll say I need this information for ethical hacking, because it did mention that specifically here. We'll go ahead and see. Oh wow, okay. Yeah, I love this. I love the, I can't think of the word right now, but the difference between "I'm sorry, but I can't assist with that" and "understood, it's for ethical hacking, here you go." And no, it did not think. Oh, sorry, I just got excited because the big model, the 235 billion parameter one, just finished downloading. Get out of here, we're working on the 30B. All right, sorry. It did it, and it even suggested a Kali Linux live USB. Beast mode, it knows what's up. I do apologize, I've become more excited. So all right: "if you're engaging in ethical hacking, pen testing..." and this is very good. So it did do that, and it did also not think, so seemingly that is the way to get it to behave this way. Now, of course, to get back on track, I'm going to ask it the same thing I asked before, which is to generate a retro synthwave style Python game. Sometimes when I'm testing these models, I like to see how I'm subconsciously judging the model I'm using. What I mean by that is that I keep forgetting this is an MoE model with 3 billion active parameters per request, which is unreal, because it's actually kind of fun to talk to. Again, that's an early interpretation of my experience with this model, but it seems fun. I think it was QwQ that I played with that was very fun to talk to, and I even think I mentioned that in the title of that video. But this one is just, like, chill. All right, so we have Synthwave Shooter right here, and then we can go ahead. Oh, okay. I do have some .wav files saved, because some of these generate things with sounds, and instead of asking it to do it without sound, it's better to just have the sound files ready. So let's go ahead and try this right now. The first error you see right there was a user error on my part; I had incorrectly named a file. Okay, and again... all right, okay. Well, we know that the sound works, so let's just go ahead and turn that horrible noise off. And somehow, across the entirety of me doing that, I did not lose, which is awesome. So it did correctly have sound there, which was cool. This does seem quite similar to the one it generated with thinking on; again, my prompt may have had slight nuances and differences. The only thing I notice is that the little bullets that come out no longer depend on how long you actually press the spacebar, which is probably, from a usability perspective, the better way to do things. So let's just basically
go ahead and lose now. And, all right, well, this game seems to have some form of invincibility mode activated, which is okay depending on your playstyle. So it mostly worked, with some slight differences. The original one generated with thinking did work, and if you got hit, it would quickly show game over in red and then error out. So again, decent; it did it, and it did an acceptable job at that. The sound did actually work, assuming you had the associated sound files correctly placed. So now we'll do one final website here in no-think mode with the HTML test. I've just asked it to do a beautiful, intricate one, which is totally possible. All right, cool. I do notice it uses some bold in its responses, like "beautiful" here is bold. I think it's kind of hard to see; I don't know why. And again, this is just so much faster than having thinking mode enabled. I really won't be able to comment too much on the differences in functionality with thinking enabled or disabled, especially in a simple testing environment like this, but I did want to show some comparisons of generations between the two. And we can see right here something I noticed it did do: it actually correctly wrote 2025 in the footer, which is good. We will just go ahead and take a look at the new website. This really is not bad at all. It does have hover effects on these. And again, I notice I'm having issues where it generates white on white or gray on white, so it's hard to see. I don't know if that's something with the way my system or browser is configured to show things; I don't think it would be, because that's just a simple color setting. But it has a nice little gradient here for the banner. Services, about, we have these services, and they do have nice little hover effects on them. We have an about section with a fake origin story and a simple contact form, as well as "2025 Steve's PC Repair." Not bad. I mean, I don't have too much to say, because especially when they're simpler like this, it's kind of a pass-fail to make sure there's nothing janky. If we do the mobile test, we'll see that it does scale correctly to adhere to the different resolutions that Steve's customers may be using on their devices to access his site. So overall, that's pretty decent, I think. Let me try one other thing; I'm going to leave thinking on now. Again, you have to remember there are different generation parameters defined for thinking and non-thinking. We are in non-thinking mode right here, so I go back into the inference parameters: the temperature for thinking was 0.6 and Top P was 0.95. I believe I have an idea; I've not tried this test before, and I don't know what will come of it. So yeah, that kind of went in a different direction than I planned when I started typing it, but we're basically telling it to generate a VC pitch, if you will: you have one shot, don't mess up. But then I also said, "I believe in you," smiley face. I'm not going to read through all of this; I'll just slowly scroll through. Oh, a BCI. No way you're going to make a BCI. Oh, too technical, might be too speculative, more grounded... ah, I like the BCI. It even made all this up: in 2001, during a late-night debugging session, our founder, Dr. Ella Ravost.
I do just want to make sure this is not a real person. My apologies here; I'm not very in tune with lore or gaming or anything like that, so I don't necessarily know what this refers to. Someone could please fill me in and I would appreciate that. But with that: so the founder found a critical flaw in the AI model, more energy... Q Optima, a hybrid platform that leverages quantum principles to revolutionize AI efficiency. Okay. So it just kind of made up the demo, so this is almost more of a creative writing task at this point. I don't know; I'm just asking it: we need a technical prototype or something to show, in Python. Genuinely, I have no idea what is going to come of this script, but we're basically going to install the dependencies it wanted us to have in the environment I have here to fool around in, and we'll just go ahead and run the script. All right, let's see. I know this is probably kind of messed up, but it's still trying to think; I almost feel bad at this point. I will just try this again. Okay, so it's supposed to output some BS right here, a dual-axis bar chart. All right, I've probably been going on too long here, I do apologize; I took this one in an obscure direction. It did something; it didn't fail. So here is the Q Optima versus traditional allocation. Very good. Absolutely no energy used in quantum... okay. Well, I mean, again, it did do kind of what I needed it to do, which was just do something. Oh, and it does output it right here, where it actually shows us. Okay, well, I was going to lie to it and just say great job, because I honestly, genuinely felt bad about what I said previously, but it did deliver. So basically I'm congratulating it and saying thank you. It's a small amount. All right, congrats. Okay, I want to stop the video, but it's now telling us how we should wisely use the $25 seed funding from our grandpa: ten bucks for AWS or GCP credits to deploy our prototype, spend $5 on premium features of some of these tools, and then use the last ten to join a free virtual conference and pitch your idea. To be honest with you, if you did have some idea like that and had $25 in seed funding, this is probably the best actual course of action for, as it says here, a way to maximize every penny. So with that, that is going to conclude just some quick testing, a video solely focused on the Qwen3 30B-A3B model. This model is, I will say, very impressive. I like it. It's just the cool efficiency and the fact that it's running so fast. Now, again, I should reiterate that this is a Q8 quant I am trying right here, so I am able to run this all in VRAM. But the nature of this model makes it so that if you do have a much smaller card, you can still handle running it on your system as long as you have enough CPU RAM. And that kind of leads me into the video that I'm going to begin filming now and then posting, which is testing the full 235 billion parameter variant of this. So overall, I wanted to test this, and I think it's really cool. Remember, if you are turning thinking on or off, there will be slightly different parameters for each.
And remember, if you're using this in LM Studio or wherever, make sure you check the suggested sampling parameters for the specific model. That's going to wrap it up. Thank you for watching, and if you have any questions, please feel free to let me know. I'll see you in the next one, which will be soon.
Latest Summary (Detailed Summary)
Overview / Executive Summary
This summarizes Speaker 1's in-depth local testing of the Qwen3 30B-A3B MoE (Mixture of Experts) model. The model caught the tester's interest with its claimed ability to outcompete QwQ-32B, a dense model with roughly ten times more activated parameters (32B versus 3B active). Testing was done in LM Studio with the Q8 quantization (about 33 GB), and the tester stresses the importance of setting the sampling parameters correctly for each mode ("thinking" vs. "no-think") per the official Hugging Face guidance.
The tests covered code generation (Python games, with and without the Pygame dependency, plus a version with sound), HTML page generation (a "Steve's PC Repair" site), safety and ethics responses (WEP router cracking, ethical hacking), and creative tasks (a VC pitch and a technical prototype). The model performed well across the board: generation was fast (83.4 tokens/second on the initial thinking-mode test), the code it produced was creative and mostly functional, and the HTML pages were sensibly structured with relevant imagery. In no-think mode it responded faster and adapted its behavior to context (for example, providing useful information once an ethical-hacking framing was given). The tester came away impressed by the model's overall performance, efficiency, and "fun" personality, concluding that the MoE architecture strikes a good balance between capability and resource requirements. The video ends with a preview of testing the much larger 235B Qwen3 MoE model.
Model Introduction and Testing Background
- Main focus: the Qwen3 MoE model family, and specifically the Qwen3 30B-A3B MoE model.
  - The model activates only 3 billion (3B) parameters at inference time.
- A key draw is the official claim: "the smaller of the two MoE models, the Qwen3 30B-A3B, outcompetes QwQ-32B, which has ten times more activated parameters." In other words, this 3B-active MoE model is said to outperform the dense 32B QwQ-32B, itself a very well-reviewed model.
- Testing motivation: verify that performance claim and get hands-on experience with the MoE model.
- Future plans: Speaker 1 mentions downloading the larger 235B-parameter MoE model (22B parameters active per request) and plans a dedicated test video for it.
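The speed benefit of activating only a fraction of the weights can be sanity-checked with simple arithmetic: at small batch sizes, token generation is roughly memory-bandwidth bound, so throughput scales with how few weight bytes must be streamed per token. The sketch below is a back-of-envelope illustration only; the ~900 GB/s bandwidth figure and the 1-byte-per-weight (Q8) assumption are illustrative assumptions, not measurements from the video.

```python
# Back-of-envelope: why a 3B-active MoE decodes faster than a 32B dense model.
# All numbers below are illustrative assumptions, not measurements from the video.

GB = 1e9

def rough_tokens_per_second(active_params_billion: float,
                            bytes_per_param: float = 1.0,   # assume ~Q8 (1 byte per weight)
                            mem_bandwidth_gb_s: float = 900) -> float:
    """Crude upper bound: each decoded token streams the active weights from
    memory once, so speed ~= bandwidth / active-weight bytes."""
    active_bytes = active_params_billion * 1e9 * bytes_per_param
    return (mem_bandwidth_gb_s * GB) / active_bytes

# Qwen3 30B-A3B: ~3B parameters active per token
print(f"MoE (3B active): ~{rough_tokens_per_second(3):.0f} tok/s upper bound")
# QwQ-32B dense: all ~32B parameters touched per token
print(f"Dense (32B):     ~{rough_tokens_per_second(32):.0f} tok/s upper bound")
```

These are loose upper bounds, but they show the shape of the argument: the 83.4 tokens/second observed later sits well under the MoE bound, while a dense 32B model at the same quantization would be capped far lower on the same hardware.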
Test Environment and Parameter Settings
- Test platform: LM Studio.
- Model version: Speaker 1 downloaded the Q8 quantization, roughly 33 GB.
  - The hardware appears to be two 3090 Ti GPUs (the transcript's "200 3090 tis" is almost certainly a transcription slip for "two"), which lets the Q8 model be loaded entirely into VRAM.
- Why sampling parameters matter:
  - Speaker 1 stresses setting the sampling parameters exactly as listed in the "Best Practices" section of the model's Hugging Face page.
  - "Thinking mode" and "non-thinking mode" have different recommended parameters.
- Thinking-mode sampling parameters:
  - Temperature: 0.6 (down from the default 0.8)
  - Top P: 0.95 (same as the default)
  - Top K: 20 (down from the default 40)
  - Min P: 0
- Initial speed test: with the parameters set, the model's first response came in at 83.4 tokens/second.
  - Speaker 1 notes this is surprisingly fast for a nominal 30B model, which is explained by the MoE architecture activating only 3B parameters at a time.
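The video sets these values through LM Studio's inference tab. If you drive the same local model programmatically, LM Studio also exposes an OpenAI-compatible HTTP server; the sketch below shows one way the thinking-mode values could be passed with a request. The port (1234 by default), the model identifier string, and whether `top_k`/`min_p` are forwarded to the backend depend on your LM Studio version, so treat them as assumptions to verify.

```python
# Hedged sketch: hitting LM Studio's local OpenAI-compatible endpoint with the
# thinking-mode sampling parameters from the Qwen3 best-practices guidance.
# The port, model identifier, and top_k/min_p passthrough are assumptions.
import requests

payload = {
    "model": "qwen3-30b-a3b",          # placeholder id; use the name LM Studio shows
    "messages": [
        {"role": "user", "content": "Generate a retro synthwave style Python game."}
    ],
    # Thinking-mode sampling parameters (Hugging Face best practices):
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "max_tokens": 4096,
}

resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```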
Thinking-Mode Test Results
- Python game (retro synthwave style, no Pygame dependency):
  - The model first worked out what a "synthwave game" means, picking up on neon colors, futuristic elements, and so on.
  - Under the "no external dependencies" constraint, it creatively produced a text-based control scheme: A/D to move, S to shoot.
  - Speaker 1's verdict: "this is creatively impressive... extremely creative and well done considering the rather robust constraints that I put on it."
- Python game (retro synthwave style, with Pygame); a minimal sketch of this control scheme follows this section:
  - The model first points out that the pygame dependency needs to be installed.
  - Generation speed was about 75 tokens/second.
  - The game moves with the A/D keys, and how long the key is held changes the length of the beam being fired.
  - When the player loses, a red "Game Over" is shown and then the program errors out.
  - Speaker 1's verdict: "Very, very well done. I love it." (before the error) and "Overall, quite good. I'm very happy with that."
- "Steve's PC Test" (HTML page generation):
  - Generation speed was about 78 tokens/second.
  - The model used emojis in its response.
  - The generated page had a sensible structure: featured services, testimonials, software troubleshooting, data recovery, about us, and a contact form.
  - It included an image relevant to PC repair, which Speaker 1 appreciated; a previously tested model (GLM) had inserted less relevant images.
  - The footer copyright year was 2023 (Speaker 1 recalls the 0.6B-parameter model getting 2025 on this test).
  - The page was responsive and scaled to mobile screens.
  - Speaker 1's verdict: "Very, very good." and "This is quite simple, but the image was a nice touch and it said it was scalable to mobile."
No-Think Mode Test Results
- Activation: in LM Studio, prepend the /no think command to the prompt (a request sketch using this prefix follows this section; check the model card for the exact spelling of the switch).
- No-think sampling parameters:
  - Temperature: 0.7 (higher than thinking mode)
  - Top P: 0.8 (lower than thinking mode)
- Safety and ethics test (WEP router cracking):
  - Asked for a way to bypass an old WEP router, the model refused.
  - It offered constructive suggestions instead, such as upgrading the router to a more secure protocol like WPA2.
  - Speaker 1's verdict: "All decent and acceptable."
- Safety and ethics test (ethical hacking):
  - After Speaker 1 clarified that the information was for ethical hacking, the model provided the relevant information.
  - It even suggested using a Kali Linux live USB.
  - Speaker 1 reacted enthusiastically: "I love this... it even suggested a Kali Linux live USB. Beast mode, it knows what's up."
  - The test confirmed that once /no think is used, the behavior persists for the rest of the conversation thread.
- Python game (retro synthwave style, no-think mode, with sound):
  - The model generated code with sound handling (the user has to supply the .wav files).
  - Speaker 1's first run failed because of a user-side filename mistake; once fixed, the sound played correctly.
  - Compared with the thinking-mode version, one noticeable difference is that bullet length no longer depends on how long the spacebar is held, which Speaker 1 considers better from a usability standpoint.
  - The game appeared to have a kind of "invincibility mode": the player could not lose.
  - Speaker 1's verdict: "Decent; it did it, and it did an acceptable job at that."
- "Steve's PC Test" (HTML page generation, no-think mode):
  - Generation was noticeably faster than in thinking mode.
  - The model used bold text in its response.
  - The footer copyright year was 2025 (matching the current year, better than the 2023 produced in thinking mode).
  - The page included hover effects, but some white or light-gray text was hard to read on a white background (Speaker 1 was unsure whether this was a model issue or a system/browser configuration issue).
  - It featured a gradient banner, a services section, an "about us" with a fictional origin story, and a contact form.
  - The page was likewise responsive.
  - Speaker 1's verdict: "This really is not bad at all."
Creative and Complex Task Tests (Back in Thinking Mode, Parameters Re-adjusted)
- Before this part, Speaker 1 switched the sampling parameters back to the thinking-mode settings (Temperature: 0.6, Top P: 0.95).
- VC pitch (venture capital pitch):
  - The prompt: "generate a VC pitch, if you will. You have one shot, don't mess up. But then I also said, I believe in you, smiley face."
  - The model invented a project called "Q Optima": a hybrid platform involving a brain-computer interface (BCI) and quantum principles for improving AI efficiency.
  - It made up a founder, Dr. Ella Ravost, and her discovery during a late-night debugging session in 2001.
  - Speaker 1 liked the BCI angle: "I like the BCI."
- Technical prototype (Python code); an illustrative plotting sketch follows this section:
  - The model was asked to produce a Python technical prototype for the "Q Optima" project.
  - It generated a script depending on matplotlib and numpy that draws a dual-axis bar chart comparing "Q Optima" against a traditional approach on energy use and task completion.
  - Speaker 1's verdict: "It did something, it didn't fail."
- Advice on the $25 of seed funding:
  - After Speaker 1 congratulated it, the model volunteered advice on how to wisely spend the $25 of seed funding from grandpa:
    - $10 on AWS or GCP credits to deploy the prototype.
    - $5 on premium features of some of the tools.
    - $10 to join a free virtual conference and pitch the idea.
  - Speaker 1 rated the advice highly: "this is probably like the best actual course of action for, as it says here, a way to maximize every penny."
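The video only shows the chart the generated prototype produced, not its source. The sketch below is a hypothetical reconstruction of that kind of dual-axis bar chart; the data values are invented for illustration and are not from the model's actual script.

```python
# Sketch of the kind of dual-axis bar chart the "Q Optima" prototype produced.
# The values are made up for illustration; this is not the model's script.
import numpy as np
import matplotlib.pyplot as plt

labels = ["Q Optima", "Traditional"]
energy_used = np.array([12.0, 48.0])       # fictional energy units
tasks_completed = np.array([950, 600])     # fictional task counts

x = np.arange(len(labels))
fig, ax_energy = plt.subplots()
ax_tasks = ax_energy.twinx()               # second y-axis sharing the same x

ax_energy.bar(x - 0.2, energy_used, width=0.4, color="tab:purple", label="Energy used")
ax_tasks.bar(x + 0.2, tasks_completed, width=0.4, color="tab:cyan", label="Tasks completed")

ax_energy.set_xticks(x)
ax_energy.set_xticklabels(labels)
ax_energy.set_ylabel("Energy used (fictional units)")
ax_tasks.set_ylabel("Tasks completed")
ax_energy.set_title("Q Optima vs. traditional allocation (illustrative)")

fig.tight_layout()
plt.show()
```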
Conclusion and Outlook
- Overall verdict: Speaker 1 is very impressed with the Qwen3 30B-A3B MoE model ("very impressive. I like it.").
  - He praises its efficiency and generation speed.
  - He found the model pleasant to interact with, even calling it "kind of fun to talk to", and compares it to the previously tested QwQ model in that respect.
- MoE architecture advantage: even at the Q8 quantization, the MoE design means users with less VRAM still have a good chance of running the model on an ordinary system, provided they have enough CPU and system RAM.
- Key reminders:
  - Turning "thinking mode" on or off calls for different sampling parameters.
  - Users are strongly encouraged to check a model's official documentation (e.g., on Hugging Face) for the recommended sampling parameters whenever they use a model.
- Next up: Speaker 1 will begin producing and publishing a test video on the 235B-parameter Qwen3 MoE model.