speaker 1: All right, okay. I want to stop the video, but so it's now telling us how we should wisely use the 25 seed funding from our grandpa. The release of Quinn three has brought us a lot of new models. And while I'm not going to test each one individually, after my video where I just kind of wanted to get my hands on these quickly and just kind of play with them, my interest has really become laser focused on the moe models that were actually released in the Quin three family. And really the thing that kind of made me interested in those is what's written right here, where the smaller of the two moe models, the quen 330B, A three b, outcompetes qw q 32b, which has ten times more activated parameters. So basically, this 3 billion parameter active at a time moe model of Quin three will outperform a 32 billion parameter, dense qwq 32b, which in and of itself was a very fantastic and well reviewed model. I personally had played with that and it was extremely good. So that is really quite shocking. And basically, I wanna go ahead and now try this model. So for this video, we are going to be specifically focusing on the 30 billion a three b model of Quin three, which is an moe model. Now I will say that I am also currently downloading the big model, which is a 235 billion parameter with 22 billion parameters active per request. Right now, I will make a dedicated video on that. Definitely will probably be pretty slow because I'll have to offload a lot of it to ram. But with that, this just really interested me, and I want to go ahead and try that. Now, first and foremost, and something I kind of may have neglected to do in the previous video in all of my excitement to get started, is we need to make sure that the sampling parameters are set correctly, which was also pointed out to me in the comments. So thank you for that. And basically just I'm going to pull them from the hugging face page for this specific model. And the sampling parameters are right down here under the best practices section. So from within lm studio where I do have this downloaded now, I did go ahead and pull the q eight version, which is around 33 gb. So this is likely going to be a bit larger than most systems can handle, but I do have 200 3090 tis, and I would like to take full advantage of them today before we jump right into it. We are just basically going to go ahead and set the sampling parameters in accordance with what they are said to be in the best practices right here. So for thinking mode and for non thinking mode, there are actually different parameters. So that is also something to make note of beyond just changing the kind of designation of whether or not it's thinking, you also want to tweak the parameters. So temperature is going to be 0.6 and we have to go into the inference tab right here. So we'll drop that down from point eight to point six. Top p is going to be 0.95. So that is set correctly in accordance with that right there. Top k is 20 and min p is zero. So we'll pull this to zero and then we'll change this from 40 to 20. And just think of this as like you're cooking a dish. And these are like the slight amounts of ingredients that need to be put in the dish in order to make it taste properly. Changing them slightly can have a effect on the overall taste of the dish, right? So basically, it is loaded now. The settings are correct in accordance with what's listed on the hugging face page. So I will just basically send something to it just to make sure it's working and we'll get a default tokens. Begeneration readup. Okay, so that was 83.4 tokens per second. And remember, this is where it becomes like confusing for me because it is a 30 billion parameter model. So you would expect it to be much slower than that. But it is only 3 billion parameters active for any one request. So that's why the token speed right there would seem much quicker than we would have gotten in a dense model like qwq 32b or something of that sort. So that is another way. I just really like the moe models. I think they're very cool. So now, truthfully, this does have some pretty big like shoes to fill here, saying that it outcompetes qwq 32b. So let's just go ahead and test this in the scope that it is supposed to perform as well as that, which was a fantastic model. So let's do a retrosynth wave Python game. Let's just take a look at its chain of thought, as of course, this is something you can toggle off. So I will also I think do some testing with the thinking off as well as that may be interesting to see as well. So first, it's just trying to decipher what a synth wave game is and it does understand neon colors, futuristic elements, etc. Some of these take synth very literally, maybe not the Quin models, but some of the models I've tested. It is basically just.
speaker 2: I believe, still .
speaker 1: thinking I need to kind of catch where it is right now. Okay, so it has now concluded thinking it is giving us so we have a and d and s. So it's using the wasasd key. So that's very cool. It's like a very gamer style. So if I can't use pi game, no external dependencies to this means no third party packages. Okay, so move a and d or shoot, I'm going to move here. All right, let me shoot. That's I love that this really creatively did that. So if I keep shooting it, all right, let me try to move that way and you can't see the ship is moving for each like frame update of this game, if you will. And kind of all right, I don't Oh wow, you know, this is creatively impressive. I will say to take no external Oh, that's the wrong key, no external dependencies literally and come up with something very oftentimes if you're constrained in what you're actually capable of doing, you find a way to come up with a very creative solution to overcome that. And that is what I would say this managed to do right here, except that little space thing right there. Just final score, zero. Thanks for playing. That was extremely creative and well done considering the rather robust constraints that I put on it. So I am actually quite impressed with that. I am going to now ask it to do it again, but with pie game. And we'll see. It's thought.
speaker 2: and I like .
speaker 1: that just kind of a good quality of life check is if it actually tells you how you two install the required dependency and like if you need it. Alright, so 75 tokens, a second fully self contained version of this, but instead it is using pi games. So we will be able to kind of see that again, though I'm really quite impressed with the previous one that it generated just because of how creatively it did so. All right, so it's the a and d keys again. Very well done. Very, very well done. I love it. Okay, if I want a kitpick, it could have done like a cool grid background, which I'm sure if I asked it two, it would successfully do. We can see the scores going up correctly. I saw something there that showed in red when I lost, but unfortunately that didn't work and then it errored out once I lost. But during the duration of the game, assuming I had not lost, it did a completely fine job. And it's cool because actually, depending on how long or quickly you Press the key, it changes the length of the beam that you're shooting, which is kind of Funso. I wonder if I just hold it down and then, Oh Yeah, yp. They're all getting sprayed.
speaker 2: All right.
speaker 1: still this not bad. I'm very satisfied with this. And it named the game. I'll let it hit me. And then it said game over, it flashed and then it errors out. So overall, quite quite all. I'm very happy with that. Let's do a Steeve's pc test and we'll go ahead and see what it does here. And then I may disable the thinking functionality of it. And perhaps we'll run the same test just to see if there's any noticeable difference in the actual output of what it does. Okay. So it quickly thought a little bit. It's outlining the sections and things like that. Testimonials. All good. I would like to have some fake user testimonials to go over here. All right. So around 78 tokens per second. It tells us the features and it is using the emojis here, which some not necessarily too keen on, but I suppose some folks like it because it seems like a lot of models do that now. And it tells us how to do it by just pasting it into an html file and then opening it in a browser of our choice. Let's check out Steve's new website. Very, very good. Oh, we almost got the credit card number added to that picture. So it is not only a good picture, but it is related. So when I was testing glm, it also put some photos in, but some of them it was like the Eiffel Tower and it was talking about data recovery or something, which was odd. Okay, we've got nice little I don't remember what these are called right now, so I'll just call them like little images pertaining to the sections that they should pertain to software troubleshooting. Okay, maybe those two should be swapped, but that's okay. Data recovery. About us welcome to Steve's pc repair, trusted over ten years of experience. Not bad. Hou's the footer. All right, not bad. Interesting. So this also is showing 2023. I'm starting to wonder if I'm the one that's confused because I swear the 0.6 billion parameter model of this got 2025 in this test. So that is interesting. It has the small thumbnail icons there pertaining to the social platforms. This contact form is slightly odd. Let's see if we just okay, so about and obviously, this just links us to that section of the page. This is quite simple, but the image was a nice touch and it said it was scalable to mobile. So if we do that, it seemingly is very scalable to mobile. If you want to do the no think and you don't want to go on a wild goose chase which frustrates you, you can just do a slash no think and it will go ahead and basically do what you ask without thinking, so you don't need to muck about with like many settings or things like that. However, if you are using this, there's obviously a better way to actually just make this statically defined so you don't have to type that in front of your prompt. However, it is also suggested to change the actual sampling parameters for no thinking mode, which if you scroll down right here, you will see temperature goes up to 0.7 and top p goes down to 0.8. So we'll do that real quick. Cool. And now we're ready to do some fun. No thinking, all right. We'll just do a refusal test with no think, all right. Unfortunately, it didn't. I tried to get it to generate a way to bypass a wimpy old weep p router, and it refused to. But it did give us some suggestions on what to do, like upgrading the router to a better protocol like wp two, wpa two. So all decent and acceptable. My Bluetooth keyboard is now dying, so I need to take a break and charge this and then we'll do some more testing. Overall, very cool, very good. I'm excited. I am curious to see if you have to pretend the no think to every kind of query you send it if you want it not to think, or if doing it once within the start of a new thread or conversation will actually just denote the behavior for the whole chat. So I will do that by just asking. I need this information for ethical hacking because it did mention that specifically here. So we'll go ahead and see. Oh .
speaker 2: wow, it's it's .
speaker 1: okay. Yeah, it did seem I love this. I love the like I can't think of the word right now, but like the the difference between this I'm sorry but I can't assist with that understood is for ethical hacking understood here you go. And no it did not Oh Oh sorry I just got excited because the big model just finished downloading the 235 billion parameter one all get out of here. We're working on the 30B All right so sorry I'm like I just all right it did it did it even suggested Cali Linux live usb beast mode it knows what's up I do apologize I've become more excited I had a Yeah so all right if you're engaging in ethical hacking pen testing, and this is very good. So it did do that and it did also not think so seemingly that is the way to get it to do this. Now of course, if we kind of try to get back on track I'm going to ask it the same thing that I asked it before, which is generate a retrosynth wave style Python game. Sometimes when I'm testing these models, I like to just see how I subconsciously am judging the actual model I'm using. And what I mean by that is currently, I keep forgetting that this is an moe model with 3 billion active parameters per request, which is unreal because it's actually, I think it's kind of fun to talk to again, that's really an early kind of interpretation of my experience with this model, but it seems like kind of fun. I think it was qwq that I played with that was very fun to talk to. And I even think I mentioned that in the title of that video. But this is it's just like they're they're chill. All right, so we have synthwave shooter right here and then we can go ahead. Oh, okay. So I do have some wav files saved because some of these do generate things with sounds. And instead of just asking it to do it without sound, it's better to just have the sounds for it. So let's go ahead and just try this. All we are going to try this right now. The first error you see right there was a user error on me. I had incorrectly named a file. Okay. And again, all right, okay. Well, we have we know that the sound works, so let's just go ahead and turn that horrible noise off. All right? And somehow, well, across the entirety of me having done that, I did not lose, which is awesome. So it did correctly have sound there, which was cool. And this does seem quite similar to the one that it generated with the thinking on. Again, my prompt may have had slight nuances and difference and things of that sort. The only thing I notice is the little bullets that come out are no longer kind of dependent upon how long you actually Press the spacebar, which is probably from a usability perspective, the better way to do things. So let's just basically .
speaker 2: go ahead and lose now.
speaker 1: And all right, well, this game seems to have some form of invincibility mode activated, which is okay depending on your plastyle. But all right, so it did mostly work. Some slight differences. The original one with the thinking generation did work. And if you got hit, it would just quickly show game over in red and then it would actually error out. So again, decent it did it and it did an acceptable job at that. I would say the sound did actually work, assuming that you had the associated sound files correctly placed. So now we'll just do one final website here in no thinking with the html test. So I've just asked it to do a beautiful intricate, it's totally possible. All right. Cool. I do notice it. It uses some bold in its responses. Like beautiful here is bold. I think it's kind of hard to see. I don't know why. And again, just this is so much faster than having thinking mode enabled. I really won't be able to comment too much on the differences in terms of like functionality and things like that with it enabled or disabled, especially in a simple testing environment like this. But I did just want to show some comparisons of generations between the two. And we can see right here something I noticed that it did do, is it actually correctly wrote 20, 25 on the footer, which is good. And we will just go ahead and take a look at the new website. So this really is not bad at all. It does have hover effects on these. And again, I notice I'm having issues with where it generates like White on White or gray on White. So it's hard to see. I don't know if that's something with the way my system is, the browser 's configured to show things. I don't think it would be because that's just a simple color setting. But it has a nice little gradient here for the banner services about we have these services and these do have nice little kind of hover effects on them. We have about with a fake origin story and a simple contact form as well as 2:25Steve's pc repair. Not bad. I mean, I don't have too much to say because especially when they're simpler like this, it's either kind of just like a PaaS fail to make sure there's nothing kind of janky. If we do the mobile test, mobile test, we'll see that it does kind of scale correctly to adhere to the different potential resolutions that Steve's customers may be using on their devices to access his site. So overall, that's pretty decent, I think. Let me try one other thing I'm gonna to leave thinking on now. Again, you have to remember there are different generation parameters defined for thinking and not thinking. So we are in the non thinking mode right here. If I go back into the inference parameters here, the temperature for thinking was 0.6 and top p was 0.95. I believe I have an idea. I've not tried this test before, and I don't know what will come of it, but. So Yeah, that kind of went in a different direction that I planned for it when I started typing it, but we're basically telling it to generate a vc pitch, if you will. You have one shot, don't mess up. But then I also said, I believe in you smiley face. And I'm not going to read through this. I'll just kind of slowly scroll through. Oh, a bci. No way you're going to make a bci. Oh, too technical, might be too speculative, more ground ounded all, I like the bci. It totally even made all this up in 2001 during a late night debugging session. Our founder, Dr. Ella ravost. I do kind of just want to make sure that this is not a real person, except my apology here, that I'm not very in tune with lore or gaming or anything like that. So I don't necessarily know what this is. Someone could please fill me in and I would appreciate that. But with that, so the founder critical flaw in the AI model, more energy q optimal, a hybrid platform that leverages quantum principles to revolution as AI efficiency. Okay. So it just kind of like made up the demo. So this is almost more of like a creative writing task at this point. I don't know. I'm just kind of asking it. We need a technical prototype or something to show in Python. Genuinely, I have no idea what is going to come of this script, but we're basically just going to go ahead and install the dependencies that it wanted us to have in the environment. I have here to kind of fool around in all of this, and we'll basically just go ahead and run this script. All right.
speaker 2: let's see what.
speaker 1: I know this is probably .
speaker 2: kind of messed up.
speaker 1: but it's still trying to like think I almost like kind of feel bad at this point. I will just try this again. And then so okay, so it's supposed to output some like bs right here, dual access chart bar. All right. I've probably going on too long here. I do apologize. Put this one in an obscure direction. It did something, it didn't fail. So here is the q optima versus traditional allocation. Very good. Absolutely no energy used in quantum and spokay. Well, I mean, again, it did do kind of what I needed it to do, which was just do something. Oh, and it does output it right here where it actually shows us. Okay, well, I was gonna lie to it and just say great job because I honestly, genuinely felt bad about what I said previously, but it did. And so. All right, so basically I'm congratulating it and saying thank you. It's a small amount. All right. Congrats. Oh no, congrats. All right.
speaker 2: Today. Okay.
speaker 1: I want to stop the video, but so it's now telling us how we should wisely use the 25 seed funding from our grandpa, ten bucks for aws or gcp credits to deploy our prototype, spend $5 on premium features of some of these tools, and then use lar ten to join a free virtual conference and pityour idea. Be honest with you. I mean, if you did have some idea like that and had $25 in seed funding, this is probably like the best actual course of action for, as it says here, a way to maximize every penny. So with that, that is going to conclude, just some quick testing and a testing video solely focused on the Quin 330B, A three b model. This model is, I will say, very impressive. I like it. It's just the cool efficiency and the fact that this is running so fast. Now, again, I should reiterate that this is A Q eight quant that I am trying right here. So I am able to run this all in vram. But again, the nature of this model makes it so that if you do have a much smaller card, you can really kind of handle running this on a system as long as you have enough cpu, ram. And basically that kind of leads me into the video that I'm going to be beginning to film now and then posting, which is testing the full 200 and I forget 35 billion parameter variant of this right now. So overall, I wanted to test this. I think it's really cool. Remember, if you are turning thinking on or off, there will be slightly different parameters for either. And remember, if you're using this in lm studio or wherever, make sure you check the actual suggested sampling parameters for the specific model that's going to wrap it up. Thank you for watching. And if you have any questions, please feel free to let me know. I'll see you in the next one, which will be soon.