speaker 1: Good morning. Thank you. Thank you all for coming. I want to talk today about Django and data science. So I started a new job this year as a developer advocate at JetBrains, as you can see here with the swag, working on the PyCharm IDE. And this means I get to focus on web tooling, which is fun, but also on data science, which I'm not going to say is less fun, but it's certainly less familiar to me. And like many of us, I've been head down in the web for years now, and there's plenty to keep us occupied, as we've seen with the talks today: lots of things going on with Django, double-digit PRs every day, new advancements. But it's clear that in the wider Python world, data science has taken over and is where the momentum is. So it's been very eye-opening for me, and really good to be surrounded by colleagues who focus more on data science, and to not just have an audience like this where we all agree with each other that Django is what we should focus on. JetBrains started with the Java IDE IntelliJ, and that's still our biggest product; PyCharm is one of the top ones, but it's definitely second fiddle. And so I come to this talk having spent time without a web focus, and without Python even being the main focus, so hopefully that's a bit of an outside perspective. The TL;DR version of this keynote is that it's surprisingly easy and quite fun to train our own machine learning model, and I'll walk you through that today in a Jupyter notebook. And then, while deploying a production-level machine learning model like ChatGPT takes a lot of resources and engineers, we can do it in Django very easily, and I'll show you how. I have a GitHub repo, so you don't have to take notes or anything, but I'll do it all in one talk just to prove I'm not doing the typical hand-waving thing. So I will do a bunch of talking, but there's also real code involved as well.
And then I want to talk about what data science even means these days, because it can often seem like everything but the web. Is it statistics? Is it AI? Is it machine learning? Is it data analysis? Is it scientific computing? We'll get into that. At the end of the day, it's all Django, excuse me, all Python under the hood, so it's not that foreign to us. So very briefly, this is me, in the slide, with three books in their fifth editions. If you go on LibGen, the pirated-book database used to train all the LLMs, there are double-digit versions of them, so I take that as some validation: there are more than three out there, more like fifteen. For the last six years I've co-hosted the Django Chat podcast alongside Carlton Gibson, as well as co-written the Django News newsletter with Jeff Triplett, who's on the board of Django. I spent much of last year building out learndjango.com, an online site for all my resources, and I want to do a future talk on building a payments site in Django from scratch. And then since January, I've been a developer advocate at PyCharm, focusing on the IDE. We just launched a ton of AI features last week, so I'm happy to talk to any of you outside this talk about that. But this morning the focus is on Django and data science. So let's try to define these terms a little bit. All right, I want to do a quick show of hands. Who here considers themselves a web developer, if you could raise your hand? Okay, almost everyone. All right, how about data science, data scientist? Okay, three. That's pretty standard. For the three of you: do you consider yourselves equally data scientists and web developers? Okay, I want to talk to you after this; I think that's pretty rare. Most of the time we have web developers and we have data scientists, you throw information back and forth over a wall, but there's very little actual overlap.
You know, data science can seem scary to us, and we'll talk about why, because there's lots of data and lots of maths, but data scientists are terrified of the web and Django. I gave a version of this talk in Boston before, to mainly data scientists, and I think we sometimes forget how intimidating websites and Django actually are, and how much knowledge they require. So we'll talk about that. JetBrains has run an annual Python survey for many years now, and if you look at the top two results, which you may not be able to read, the top one, at 44%, is data analysis for what Python people do, followed by web development at 42%. And these trends have continued over five or six years. So it's pretty clear, even back in 2017 when the survey started, that this is what Python people are doing: they're doing data science or they're doing web development. But again, what even is data science? This is an old Twitter account I followed back in the day, because it's easy to feel like data science is everything but the web, but in some sense it is kind of just statistics on a Mac. And it is; I don't know, nobody seems to use Windows these days. Just one more: what do you actually do? 80% of the time, prepare data; 20% of the time, complain about preparing data. This is kind of the real-world reality. If you study at university, you have these beautiful algorithms and nice clean data sets, then you go into the real world and you spend all your time cleaning the data, and it still doesn't fit. Even ten years ago I saw people with PhDs from top schools go into the real world, and it was fairly frustrating for them: they wanted to use all their academic training, and they were just cleaning data the whole time. But the basic point is we have lots of maths, and we have big data, which requires cleaning, and in a hand-wavy sense that's kind of what data science is. But the interesting thing is, the amount of data involved is just hard to conceptualize.
So we're just going to use LLMs as an example here, which is just one form of data science and AI, but this shows how much data models are trained on and how much public text is available in the world. GPT-3, which came out a couple of years ago, used on the order of 10^11 tokens. A token, as a simple explanation, is somewhere between a character and a word; a short word like "dog" would be a single token. We're now at around 10^15 tokens for all human-generated public text ever. Let me say that again: that's one quadrillion tokens, where a token is roughly somewhere between a character and a word. Google estimated in 2010, fifteen years ago, that 129 million books had been published; that's probably at least double now. There are over one billion websites and tens of trillions of indexed pages. Add in social media content, emails, forums, newspapers, and it's easily one to two quadrillion tokens available to train on. And the thing is, that's just text; we're not even talking about audio, video, or the real-world information that something like a self-driving car collects. It's hard for us to conceptualize when we talk about millions of rows in a database for Django. But ultimately, this is kind of what you're doing in data science, right? You're taking unimaginable amounts of data, and you're trying to focus it and extract insights you can use, using statistics, machine learning, and computer science. Common examples we see in everyday life: spam filters in email; mapping technologies, with Google Maps and Apple Maps; recommendation systems; healthcare and early detection; LLMs, like we talked about; finance and fraud detection; weather prediction for farming and crop yields; on and on and on. And I do want to make the point that today Python is almost the dominant programming language, and certainly very dominant in the web space and the data science space. But this was not always the case; it has risen in the last ten or fifteen years. If we go back all the way to 2010:
Django was only five years old. Flask had just been released, on April 1st, as an April Fools' joke about how small you could make a framework. Django REST Framework wasn't released until 2011. Starlette, the lightweight ASGI framework, also from Tom Christie, didn't arrive until 2018. FastAPI, which uses Starlette, didn't come out until 2019, and around then Django also rolled out its own asynchronous support. And then Django Ninja, which is quite popular for APIs, only came out in 2020. So it seems like Python is everywhere now, but certainly when I started programming, Python and Django were not dominant choices. And the same is true in data science, actually. So again, back in your time machine to 2010, R and MATLAB were much more dominant. pandas, which is now a default for data manipulation, only hit its 0.1 release in 2010. NumPy, for numerical computing with large arrays and matrices, became mainstream only really in recent years. Seaborn, used for visualization, which I'll show in the demo, came out in 2013. Jupyter notebooks only spun off from IPython in 2014. And then the machine learning tools that are common now and that we're going to use: scikit-learn first came out in 2010, TensorFlow in 2015, PyTorch in 2016. So this talk assumes everyone uses Python for everything, but it's important to say that even in my not-super-long career, that has not been the case. All right, so let's train a model. The source code is available on GitHub, so you'll see it all at the end; don't take notes or anything. I want to walk you through how you do this if you've never done it before. This is an intentionally simple example, but the general process applies to training any machine learning model. And machine learning means we give the computer the inputs and the outputs, and the computer, through algorithms, comes up with the reasoning on its own. That's what machine learning means. All right?
So if you start with machine learning, you learn there are basically two big data sets for classification problems. There's the Titanic data set, which is a bit morbid, but it's literally who lived and died. And there's the Iris data set, for which species of iris flower a sample is. Titanic is often considered the hello world of machine learning, because it's clean enough not to overwhelm beginners, but messy enough that you have to use a lot of core machine learning concepts: regressions, decision trees, random forests, these types of things. For Titanic, there's a total of 1,309 passengers, 891 of them with survival outcomes in the training set, and you get information like passenger class (first, second, third), sex, age, number of siblings, etcetera, and then you can make predictions and parse out the data from there. But the Iris data set, which is the one we're going to use, is even simpler. In this case there are only 150 rows: 50 sets of measurements for each of three species. There's no missing data, so we don't have to do any preprocessing and can focus just on building the model. So in short, if you're starting out, I would recommend starting with Iris and then moving to Titanic. We're skipping all of the data cleaning, which is a big part of machine learning, just to focus on the model itself. Okay, first step: a Jupyter notebook, right? There are multiple ways to do this: you can do it on the web at jupyter.org, you could use Anaconda, or you could use an editor; PyCharm and VS Code have their own versions as well. We're just going to use PyCharm here, but it doesn't really matter where you run your Jupyter notebook. I know you can't see this, but this is what you get if you create a new Jupyter notebook project in PyCharm. It comes with data and models folders, a requirements.txt file, a readme file, and a sample .ipynb file, which is where the notebook lives. It's not uncommon to train and retrain multiple models; that's why you have a models folder.
But we're just going to focus on one here for simplicity. So this is what we're working with. This is what the iris flower looks like. There are three different species, setosa, versicolor, and virginica, and they have different petal and sepal measurements for width and length. It's a balanced data set, which makes things a lot simpler for us. And the goal is to train a model so that we can enter our own petal and sepal information and it will predict for us which flower that is. This is just another look at the CSV file: on the far left you have the ID, and then you have sepal length in centimeters, sepal width, petal length, petal width, and then species. I would mention that if you do this yourself, there are actually different versions of this data set online, so you get slightly different CSV files, which blew up a couple of days of my life. Just be aware: this one comes from Kaggle, but they're all slightly different for some reason. But this data set is so common that it's included by default in machine learning libraries like scikit-learn, R, and MATLAB, because again, it's sort of where you start when you're getting your feet wet. All right, let's do a little code. So what do we do? Install two packages. First pandas, which is a library for data manipulation and analysis using DataFrames, that kind of thing, and then scikit-learn, which is a machine learning library for data mining, preprocessing, and model building. We'll use both here, and again, I'm making this as simple as I can. Okay, I'm going to try not to dryly talk through code, but there's a little bit of code. This is just one Jupyter notebook; again, the source code is on GitHub. But just to talk you through how you train a model: we import the libraries at the start. pandas, to load and manipulate the data set; train_test_split from scikit-learn, to split the data set into training and testing sets.
That's something you do in machine learning. Then accuracy_score, to evaluate our model and see how well it performs; SVC, that's support vector classification, to train an SVM, a support vector machine classifier, which I'll talk about in a sec; and then joblib, to save and load our model. A joblib file is a binary file that stores a serialized Python object, and joblib comes along with scikit-learn. The end result of what we're going to do is a joblib file that we can move over to Django and put in front of people who view the web page. So, an SVM classifier: supervised means trained on labeled data, which is what we have here, where the data is actually labeled. In the real world you often have unlabeled data, and that would be unsupervised. So we load the data set, creating a variable df here to read the Iris CSV file into a pandas DataFrame. We extract the features as columns and rows: there are four feature columns and three species labels. Then we split it into training and testing data: 80% training, 20% testing. 80/20 is a common split. You could use a different split, and in the real world you would, depending on your needs: 70/30 is more conservative if you really want to be accurate, and 90/10 works for large data sets where you don't need as many tests. Then we train the SVM model, and this is where all the action happens. We create an SVM classifier, setting gamma to auto. Gamma is the kernel coefficient, which controls how tightly our model fits around the data: a lower gamma is smoother and might underfit, a higher gamma is wiggly and might overfit. We can just use auto here and not worry about that. Then model.fit trains the SVM model on our training data. Then we make predictions to test the model and evaluate its accuracy. And then we save the model using joblib and reload it. So we train it once and then we have it; we don't have to retrain it every time we want to use it.
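The whole training flow described above fits in a short script. This is a minimal sketch with one assumption: it uses scikit-learn's bundled copy of the Iris data instead of the Kaggle CSV the talk loads, so the `pd.read_csv` step is replaced by `load_iris`; the file name `iris.joblib` follows the talk.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load Iris as a pandas DataFrame (the talk reads the Kaggle CSV with pd.read_csv)
iris = load_iris(as_frame=True)
X = iris.data    # four feature columns: sepal/petal length and width
y = iris.target  # species labels: 0, 1, 2

# 80/20 train/test split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the SVM classifier; gamma="auto" picks the kernel coefficient for us
model = SVC(gamma="auto")
model.fit(X_train, y_train)

# Evaluate on the held-out 20%
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")

# Serialize the trained model so we never have to retrain it,
# then reload it to prove the round trip works
joblib.dump(model, "iris.joblib")
reloaded = joblib.load("iris.joblib")
```

Iris is easy enough that this classifier typically scores well above 90% on the test split; in the Django section, only the resulting `iris.joblib` file travels along.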
All right, this is the last little bit. We get the user input for predictions, and I'll show you what this looks like in just a sec. There are prompts to enter the four values, sepal length and width and petal length and width, and then we try to predict the species based on the user's input. All right, let me just show you. Whoops. I have a live demo... where did it go? Well, that's too bad, huh? That's two hours of my life I don't get back. You'll just have to trust me on this; you can load it yourself, and if you hit run in the Jupyter notebook, you can enter the inputs and it will show a prediction. That is deeply unsatisfying. When I gave an earlier version of this talk, I was flipping between screens to the live Jupyter notebook, but from Tim's example and others, that never worked, so I thought loading a video would work. But okay, trust. Oh, it's not playing on mine. Oh, wild. Yeah, so you can see, this is in the Jupyter notebook: we enter our four values, scroll down to the bottom, and in this case it says versicolor, 97% accuracy. Thank you, Adam. Okay, so it does work, but it doesn't show here; that's very odd. Okay, but now we can do something cool. Now we want to visualize our data and our model, so we can install seaborn and matplotlib to help us do that. Then we add a new cell to our Jupyter notebook, import both, and run a basic pair plot. There could be a whole talk on what a pair plot is, but it's basically an easy enough default to get some visualizations, and I'll show you in a second. It creates a grid of scatter plots and histograms. It's a very basic visualization, but you can see the clumping of data, and this is actually important. In the top left is the ID, so those values are all the same and you can ignore that top row. But then you have visualizations for each of the four features, and you see sepal length and sepal width here.
And you can see that orange and green are clumped together a bit, whereas blue is separate. That's actually good, because it creates a challenge for our model; if they were all separate, there wouldn't be much for our machine learning model to do. So it makes our classifier work a little bit harder. And then at the bottom, the bottom two are for petal length and width, and again you can see clumpings for orange and green, which are versicolor and virginica, whereas setosa is on its own. The important thing is that we now have a trained model. It exists as an iris.joblib file, and now we can deploy it to Django as a web app. So now we get to the comfortable space. This is our game plan: create a new Django project, load the joblib file, add forms so the user can make predictions, store the user info in the database, and then maybe I'll show you deployment this time. In an earlier version of this talk I waited until the end, but if you pull out your phone or laptop right now, you can see what we're going to build at djangofordatascience.com, and that may help get you through the code that's coming, so I recommend you check it out. Is it going to work? This is what you will see. On the web page you can enter your information and make a prediction, and it varies depending on the type of flower. So this is basically our joblib file dropped into a Django website, and we're storing the information too. In a different version of this talk I would show you the live admin as you're typing things in, to prove this all works, but hopefully you trust me that it actually works. So that's what we're going to build, and this process will apply to any model you make, with a relatively basic Django website. Yep, let me get my thing over here. Okay, so I'm going to walk through the process for a new Django project.
I'm going to go a little bit fast, but I do want to show all the steps. I know many people in this audience are very familiar with Django, and this part is not as interesting as more technical deep dives, but for anyone watching who's new to Django: I remember being so frustrated when someone waved their hands and skipped a step. So I'm going to go through the steps. I might go a little fast, but all the code is in the repo, and I want to show how you do it, because it's not that many steps and it's the same thing again and again and again. So we're going to create a new Django project from scratch. If we did it in the terminal, you'd create a new directory on your computer, a new Python virtual environment, run django-admin startproject, then startapp to create a predict app, update INSTALLED_APPS in settings.py, and create a Git repo. If you're in PyCharm Pro, you can do it all from one screen, but again, it doesn't really matter how you do it. All right, so this is the layout of our new Django project, and it's one I would recommend in general for your projects. We have django_project, which is our project folder. People can call this anything; I like to just call it django_project, but that's a whole separate forum discussion. Then predict, which is our app and where we're going to put our focus, and templates for our template files. And I think I sometimes forget, but all of this structure exists for us as Django developers. Django doesn't care, the computer doesn't care. You can have one app, no apps; people in this room can and do do very different things with this. But if you're new to Django, or you're just doing a kind of vanilla version, I think this is as safe as it gets. Again, this is just one approach; Django doesn't care how we structure things, but I like to take advantage of the project and app separation, and we're going to do it that way.
So you go along, you create your project, run python manage.py runserver, and get the Django welcome page, which we all know and love. And then we just want the iris.joblib file, so we copy it over; you see it in the middle there, at the project level of our application. That's all we have to do. Again, if we had multiple models, we'd have a models directory, and real-world setups are more complicated, but for demonstration purposes we just pull over the one model we want to work with. You probably want to run python manage.py migrate to get rid of all the warnings about unapplied migrations. And then, as ever, we need URLs, views, and templates. The order really doesn't matter, and that really trips up beginners, but I like to start with URLs, so we're going to do that. We just have the project-level urls.py file here, with an empty string for the path because we're serving it at the home page, and we're including the predict app's URLs, which we'll define below. Again, I gave this talk previously to a data science crowd, so then I focused a lot on the Django piece, but I'm going to go a bit faster because we're a Django crowd here. Then we add a view, a function-based view. There could be whole talks on function-based versus class-based views, but we'll just do a function-based view, call it predict, and render a template called predict.html. Then we create the simple template. To step through this iteratively, we don't do anything other than have "hello" in it for right now. Run the server, and Bob's your uncle, you're good. Okay, so now we get to the interesting stuff. This is the views file, and it's not that scary; this is really where things are happening. At the top, we're going to install joblib and NumPy, and then we're going to load the model from our base directory.
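That model-loading and prediction logic might look roughly like the following. To keep it self-contained, the Django request handling is reduced to a plain helper function; `BASE_DIR` normally comes from `settings.py`, the file name `iris.joblib` is from the talk, and the helper names are illustrative, so treat this as a sketch rather than the repo's exact code:

```python
from pathlib import Path

import joblib
import numpy as np

# Stand-in for Django's settings.BASE_DIR
BASE_DIR = Path(".").resolve()

def load_model(path=None):
    """Deserialize the trained SVM once at import time, not on every request."""
    return joblib.load(path or BASE_DIR / "iris.joblib")

def predict_species(model, sepal_length, sepal_width, petal_length, petal_width):
    """Turn the four form inputs into a 2-D NumPy array and return the prediction."""
    features = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    return model.predict(features)[0]
```

Inside the actual view, the POST branch would pull the four values out of `request.POST`, cast them to floats, call something like `predict_species`, and pass the result to `predict.html` via the template context.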
We have a POST request for the form, four inputs to match the four measurements, we make a prediction using a NumPy array, and then we return that as a variable, prediction, that we send to our template. I should note that we also need to install scikit-learn if we want to load the joblib file, since scikit-learn created it. So in the new requirements.txt you're going to need scikit-learn as well; that code is in the repo, as you can see. Okay, we update our template file. This is a little bit hacky, but there's some basic CSS, and then here's the form where the user can put in their guesses, or their measurements. And then we get this: our basic form, where you enter in values for a prediction, and here are the results. If you entered one, two, three, four, you would get Iris virginica. Now, one, two, three, four are terrible values; that's not actually what the widths and lengths are for some of these measurements, which is why on the live site I put in some boundaries, but this is just for demonstration. All right, let's keep going in the view. Now we're going to add a dictionary called form_data to store user inputs, because it's nice to store what people entered. This is often how it goes when you make a machine learning model: you test it on users, you see how it works, and then you iterate on it. For example, if you had a recommendation engine, you would build it, test it on users, store that information, retrain the model, and repeat that feedback cycle. This is why adding storage in the database is important, and I thought it was kind of cool how easy it is to do; I'll show you in a second. So the dictionary is populated by values from the form and the request. We're storing them because Django clears form fields by default, and then we pass this form_data to the template context at the bottom of the file so it can be rendered on the page. All right. And then finally, we're adding the inputs and the form data... nope, that's not correct. Why am I showing this again?
We'll just skip that. All right, so, very ugly, but here's where we are. This is cool. Now let's talk about the models. Obviously, if we want to store data in a database, we need a models.py file. We're going to create a model here called IrisPrediction, use FloatFields for the four inputs, and also store the prediction. Just for the heck of it we'll add a created_at date, and then a string method to show the prediction and the date and time when it was made. If you were doing this step by step, you'd run makemigrations here, then migrate. And then, and this is almost the last slide with code, I promise, we update the view to save the prediction. We import the model at the top, and then we call IrisPrediction.objects.create to save the prediction to our database. And this is the last bit of code: we update the admin so we can view it. Again, if you were doing this from scratch, you'd create a superuser account and log into the admin, and it would look something like this, very vanilla, but very functional, and you can customize it as you want. All right. So I had a version of this where I was going to show you how to do deployment, but then I realized that's probably a 40-minute talk on its own, so I just want to give you the short version, which is the deployment checklist I think you can and should use. For example, I was able to take the live site and, in 15 minutes, put up the version you have now on a custom domain, because I've done this a jillion times. So I'll quickly talk through it. You can do it differently, but I think this is pretty much the bare basics for a not wildly insecure site. Configure static files and environment variables; a lot of people like django-environ, I'm partial to environs, but it really doesn't matter as long as you have environment variables. Create a .env file. Update your .gitignore file to ignore the .env file, otherwise what's the point? Update your settings, right?
So DEBUG, ALLOWED_HOSTS, SECRET_KEY, CSRF_TRUSTED_ORIGINS; update DATABASES to run PostgreSQL in production; install psycopg. If you install environs, there's a Django extra that will automatically handle the database URL and other goodies for you. Then a production WSGI server, Gunicorn, and a Procfile, because I'm on Heroku, though that would vary depending on your hosting provider. Update the requirements.txt file, and then just create a Heroku project, push the code, and start a dyno process. Again, it seems like a lot; I can never remember any of this, but that's why we have checklists, and I would strongly recommend this for a basic, not wildly insecure setup. Of course, you could do a million more things. This is the last slide, I promise. So here are the takeaways. Django is great for deploying machine learning models. I think most data scientists just want what I showed you: they want storage in a database, forms, all the basic features that Django gives you out of the box. There's often this sense that Django is hard to use, so they use Flask, or maybe FastAPI, just because they think Django is difficult. No disrespect to those frameworks; they have their uses, and if you know them, use them. But Django is built for this use case. A model, forms: Django gives you everything you need out of the box, and you can more or less follow the code here and apply it to almost any basic machine learning model. Iris is a great data set, and so is Titanic. It's really fun to train machine learning models. You don't have to know all the maths to use one; to truly understand it, you do, but you can go a long way just playing around and following tutorials. And then deploy it in the real world, right? If you have a machine learning model, there's no sense having it sit in a Jupyter notebook locally.
You can easily share it with friends and colleagues in a real-world setting. This is what data scientists want to do: they want to take their model, expose it to users, and do that iterative loop of retraining it. Okay, thank you for your patience. Happy to take any questions. speaker 2: Thanks, Will, that was a super talk. Can I just pick up on that point at the end: do you think we're failing to market Django to the data science community? speaker 1: 1000%? Yeah, yeah, yeah. I mean, I don't know that other web frameworks are doing a better job. But again, for us in this room, Django doesn't seem so scary and difficult, and I'm telling you, people with PhDs in machine learning are scared of web development and Django. It's batteries included; it could just be presented better. The whole point of this talk was to show that it's really not that much code to train a model or to do the Django bit, and the process is the same every time. So hopefully this helps market Django a bit better for that. And you don't have to install a third-party forms library; Django comes with most of what you need in a basic setting to deploy your model. Thanks, Will. In the example here, you trained the model and then provided it to users. Are there any gotchas you can think of if your users were able to train their own models to then show or provide to other users? Is there anything different that you would follow? Probably, but I can't speak to it off the top of my head. speaker 2: Well, you went through all the steps to set up, you know, showing off your model in Django and showed how accessible that actually is, and you didn't skip through anything. But then in the deployment checklist... speaker 1: Yeah, as you say... speaker 2: that's a quick process for you; you've done it a lot. But I think to someone new to web development, that process has actually got a lot of warts and a lot of hairy bits.
And is there anything to make that more accessible for new people? speaker 1: I mean, there are books; I've written a few. I would like to have a more in-depth step-by-step guide. The thing is, it depends on the project you have. Because you have so much flexibility with Django, you have to know what the project is before the steps you mention apply, and if one thing is off, a newcomer is going to get totally frazzled. So for example, in my Django for Beginners book, I show you how to do a bunch of projects, and I show you all the steps. But yeah, I do think about this. Django has a great deployment checklist, and I would like to make it more accessible, but I have seen that beginners' projects are different enough that they get tripped up. So unfortunately, it's difficult to just say this is exactly how you do it unless I know exactly what your project is. But I completely agree: read a book and you've got it. Thank you. Great. Just a wild idea slash suggestion: perhaps this could be a great official Django tutorial to live alongside the existing one, that you can send to data science folks and say, hey, it's not that hard, and this will help push Django to more people. That's all. And we could have a hello world tutorial that's simpler than polls while we're at it as well. Yeah. I mean, I'm not on the board or important anymore, so I'm happy to give it to Django... speaker 2: if they want it. I realize you glossed over it in the talk, but you spoke briefly about this django_project thing. Yeah, where should I look to learn more about that particular convention or that idea? Because I've seen it in a few places, and it looks like it solves one of the continual problems I come across myself. speaker 1: Any tutorial or book I've written on learndjango.com has that pattern. I mean, I feel less comfortable saying everyone should use it.
I use it because I see that people name their Django project all sorts of different things, and to me, I just want to know what the project and the apps are. You know, in a real-world repo, the structure is a bit different. But if you have six or ten apps, I just like to see the name and "project". So that's more of a personal thing. I used to call it config because my friend Jeff Triplett likes that approach. But other things are config too; in projects, I often have a config directory. So yeah, I'm partial to it. I don't feel like that's the one way to do it, but for me, having it called something_project is helpful. speaker 2: Thank you. Thanks, Will, great talk. This was a small model and you committed it in the repo, but most models are big and we don't want to commit them and make our Git repo giant, and we have to push updates. What would be the next steps you'd take for larger models? speaker 1: Yeah, I have the same question. I mean, I would love to know where the limit is with a Jupyter notebook, because I think it's actually a lot bigger than we think. In the broader world, data scientists think of themselves as not great programmers, like below web developers, who I think are relatively low on the spectrum ourselves, or at least not, like, nuclear-submarine programmers. I don't know exactly. I would love to know at what point, like, what is the limit of a Jupyter notebook? And then when do you write Python scripts, and how do you do all those other things? So yeah, if I do another version of the talk, or if somebody knows, please tell me. But that's a very good question; I have the same one. Okay, thanks. We actually have an online question, so I'll read it out: "If Django were to be marketed better to data scientists, can you imagine it becoming one framework to rule them all for data science and web development?" I'm old enough to say no, I don't think there's ever going to be one to rule them all.
But I do think what data scientists want is CRUD with auth, with guardrails, that just works. And so I would say if you're not comfortable with web development, Django's great for you because it just gives you batteries. It gives you things to do; it doesn't require you to be an expert. It doesn't ask you to make some of the decisions that other web frameworks do. So I think it should be the top default. I think it often is not, and that's probably related to just not having tutorials or ways to do it. I think also, again, there is this perception that because Django is batteries included, it's really hard to learn, whereas Flask is considered simpler. And, you know, the first part of Flask is simpler. And if you need more advanced stuff, maybe you want all the flexibility of Flask. But if you're right in the middle and you want CRUD and auth and forms and stuff that just works, I think Django should be more prominent. But of course, I'm biased. speaker 2: Thanks for the talk. Can you comment a bit about real-world examples of using the model? If the resulting file size is big, what about the performance of the whole thing? How much time would it take to actually train it, and how much time would it take to query the thing and get the results back? Thank you. speaker 1: Yeah, that's a great question. I don't have a great answer because I'm not a data scientist, but I'm spending this year learning a lot about data science. So hopefully maybe next year I'll have a better answer for that. Sorry, I don't have the answer. Thank you. speaker 2: Going a bit more in depth on long-term project maintenance: you used joblib here, and joblib uses pickle under the hood, or a replacement, or something like that. How do you ensure that the model you trained once will run on whatever future Python or scikit-learn version?
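To make the version-drift concern in this question concrete, here is a minimal sketch, not code from the talk: the talk used joblib for a scikit-learn model, but the same idea is shown here with stdlib pickle so the sketch stays dependency-free. The bundle records the environment it was saved under, so a loader can at least detect a mismatch; the warn-rather-than-fail policy is a hypothetical choice.

```python
import pickle
import platform
import warnings

def save_model(model, path):
    # Persist the object together with the Python version it was saved
    # under, so a later loader can detect environment drift. With a real
    # scikit-learn estimator you would also record sklearn.__version__
    # and likely use joblib.dump instead of pickle.
    with open(path, "wb") as f:
        pickle.dump({"model": model, "python": platform.python_version()}, f)

def load_model(path):
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    if bundle["python"] != platform.python_version():
        # Hypothetical policy: warn on mismatch rather than refuse to load.
        warnings.warn(
            f"Model saved under Python {bundle['python']}, "
            f"now running {platform.python_version()}; results may differ."
        )
    return bundle["model"]
```

This doesn't guarantee an old model will deserialize under a new library version, but it at least makes a mismatch visible instead of silent.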
speaker 1: So I didn't mention that you could use pickle or joblib; joblib is preferred for larger data sets. I don't know the answer to that question; that's a really good one. I mean, there's still so much of data science that's mysterious to me, to be honest. Most of what I know is up here. But yeah, I want to find out. I haven't found resources of people talking about deploying models outside of, you know, massive, massive scale, because I think people just don't do it that much. But yeah, that's a great question. I'll research it, but I don't know. Thank you. speaker 2: Thank you for the talk. In this demo, you used a CSV file as the source of data for training the model. In a Django project, you usually have a lot of data in your database. How easy is it to take, like, a queryset instead of a CSV file as a source of data? speaker 1: I don't know exactly. I could make predictions, but I haven't done it myself. So yeah, that's another great question. Again, I mean, even for me presenting this, I proposed this talk with the idea of, like, how hard can it be? I mean, to data scientists, the idea that you can just move the joblib file over was sort of mind-blowing, because they're used to big production scale. So that's a great question I have. I want to push it further and see, like, where is that limit? How much can we fit within the standard Django structure? I would love to do a demo retraining the model, because that was feedback I got from an earlier version: "Hey, that's what we do in the real world. We have the model, users add data to the database, we retrain the model." Maybe next year I'll have a demo showing that. speaker 2: You started by saying these data sets are the hello world of data science. Can you tell us a little bit more about what people in data science see as not just the hello-world data set, but the hello-world problem? For us, we know what it is.
It's to get a web page up that displays something we want from a database. How should we be thinking about data we could use, or data we could try to construct, for purposes like this? speaker 1: So Kaggle is one of the big places, with tons and tons of data sets you can use. I think it depends what you're trying to do. I mean, if you're a data scientist, again, Iris is good here just because it's easy; you don't have to preprocess it or clean it, and so much of what you do is around that. So I hope I'm answering your question correctly. Like, a lot of what you're doing is cleaning data and trying to get the prediction accuracy up, and, you know, how is the data clumped? Which classifier do you use? That's a lot of what data scientists do. I feel like I'm not answering your question directly, though; that's, you know, for me to go further on. I mean, there are a number of books, but the thing is, it can be overwhelming if you read an entire book on, you know, pandas, and an entire book on scikit-learn. I would recommend people go to Kaggle, follow tutorials, and learn the basic cleaning and training steps. speaker 2: Yeah, I just realized that actually hello world is the wrong metaphor. The right metaphor is the first thing we do in a tutorial, which is, for example, to make a to-do app or a polls app in Django. So what's the data science equivalent? speaker 1: That's basically what I showed. And again, I played around with a few; that's as simple as it gets while actually doing something. And the fact that two of the three species are clumped together means your model has to do some work, because, of course, if the data was completely separable, you wouldn't need a model for it. So yeah, Iris is as simple as it gets.
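The Iris "first tutorial" described here really is only a few lines with the usual scikit-learn API. This is a generic sketch, not the talk's actual notebook; the choice of a k-nearest-neighbors classifier is mine for simplicity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the classic 150-sample, 3-species Iris data set: already clean,
# no preprocessing needed, which is why it is the usual starting point.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to measure accuracy on unseen flowers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One of the simplest classifiers; the two "clumped" species are what
# give the model any work to do at all.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

The trained `clf` is the object you would then persist with joblib and load inside a Django view to serve predictions.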
And then most people focus on Titanic, because it's busy enough and big enough that you can get a taste of the problems you encounter as a data scientist without it being overwhelming. But it's so morbid, really. speaker 2: Thank you for the talk; it was a great talk. I'm going to put you a little bit on the spot, and I blame Carlton, who said we should do that today. You showed the results from the Django developers survey for the past few years. Do you know when we are going to get the results for 2024? speaker 1: Soon. I've seen it and reviewed it months ago. JetBrains does a lot of work across a lot of teams to put it out, but soon, I hope, and I hope that cycle improves. It's there; it's just inching its way through the process. But yeah, that's a good question. Thank you. I will say there wasn't anything crazy that changed; if there was, maybe we'd put out a code red, you know, on the responses. But for next year, I think we'll probably ask about uv for package management. And maybe, actually, I need to go on the forum; I want to get some more community involvement around the questions. I mean, the board runs it now, but I'm trying to help make sure we ask the right questions, because it does matter; it does help. We saw that Redis had a lot of support, so work was done to make that official for caching. It is kind of our only public way to get feedback. speaker 2: Thanks again for the talk. So this presentation used a bunch of pre-made data, and obviously, it's data science. But do you see value in there being some sample data sets for Django itself, so people can use, say, the book examples with real data to play with, and things like that, to demonstrate parts of the ORM a bit more easily? speaker 1: Yes. I mean, I think the challenge is you could just make up data; it's nicer if it's actually real-world.
But yes, especially if we had a tutorial or a sort of get-your-feet-wet resource, something beyond just Iris and Titanic would be great. Yeah, I don't know why not. Yes, we should do that. Okay, there are no more questions. Thank you, Will. Thank you, everyone.
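The earlier audience question about training from a queryset instead of a CSV file has a fairly direct answer worth sketching: a Django queryset's `.values()` yields plain dicts, so it can feed a training step the same way `csv.DictReader` rows can. The helper below is a minimal, framework-free illustration; the model and field names in the commented usage (`Flower`, `sepal_length`, `species`) are hypothetical.

```python
def queryset_to_xy(rows, feature_fields, target_field):
    """Turn an iterable of dicts (e.g. queryset.values() or DictReader
    rows) into a feature matrix X and a target list y."""
    X = [[row[f] for f in feature_fields] for row in rows]
    y = [row[target_field] for row in rows]
    return X, y

# In a real Django project this might be called as (hypothetical names):
# X, y = queryset_to_xy(
#     Flower.objects.values("sepal_length", "petal_length", "species"),
#     ["sepal_length", "petal_length"],
#     "species",
# )
```

From there, `X` and `y` can go straight into an estimator's `fit` method, no intermediate CSV required.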