speaker 1: Hi all, my name is Khanna. I'm the chief of curation at Caktus Consulting Group. As developers and tech professionals, I'm pretty sure you are all familiar with solving for unknown quantities, and as fate would have it, Tobias McNulty is out with a sore throat and unable to give his talk today. Cacti are known for withstanding some extreme and sometimes unpredictable conditions, and our team is a manifestation of that enduring quality: we adapt and we support each other. So Tobias has asked a few trusted people to step in and deliver his presentation, and they graciously answered the call. Co-presenters Keanya Phelps, a software engineer at Caktus and one of my teammates, and Ahmed Piti, a great friend to Caktus and a senior software engineer, will deliver today's talk on behalf of Tobias. Next I'm going to tell you a little bit about our co-founder and the creator of this presentation, and then pass the mic to them to get to the good stuff.

That's Tobias. He is one of the co-founders of Caktus and started using Django in 2007. Caktus is an employee-owned, Django-first consultancy based here in Durham. We support the full life cycle necessary to design, build, and maintain custom solutions for clients, with a special affinity and talent for Python, Django, and mobile app development. We are proud to own a building right around the corner from the convention center, and we'll be hosting sprints at that location starting tomorrow, so if you are still around, please do stop in and say hello. In Tobias's time working with Django, he's been a big advocate for contributing back to the community. He's also a member of the Django Ops team. He spends a lot of his time in our offices working, excuse me, most of his full time designing cloud-based and on-prem infrastructure for Caktus clients, and he also helps keep my crazy ideas grounded. In his spare time, he bakes amazing bread and grows vegetables, and if you're ever lucky enough to try some of his bread, you won't regret it. At this point, I'm going to turn the floor over to Keanya Phelps to get to the good stuff, talking about Celery.

speaker 2: All right, this is my first tech conference, and this is my first talk, so please be gentle. Okay, all right, so a couple of questions. Don't we all like polls? How many of you are familiar with Celery? Good, good. How many of you are actively using Celery on a project? How many of you are using Celery Canvas? And last question, how many of you are here for more celery cooking ideas? Excellent. So I saw quite a few hands go up. I'm a software developer with two and a half years of experience, and I have a quick story about whether I knew what Celery was, because I certainly didn't when I first started. I got to spend some time with a senior developer working in a legacy code base, and we were working on a ticket, and I was just so excited that he had the time and the patience to help me through some problem solving. You know, as a junior developer, I come prepared and I'm writing my notes down furiously, because he's talking and I'm like, I know what that means, I know what that means, don't know what that means, let me write everything down. And then, at the end, I think the issue was with Celery Beat, and I'm like, wait, did he say celery? And then he says something about a rabbit, and I'm like, okay, let me write these things down.
They don't sound technical, but I'm going to write them down, okay? And so he gets done and I'm like, okay, Joel, you said celery. Did you mean Celery? And he was like, oh, I absolutely meant Celery. I'm like, well, I don't know what that is. And so we did a deep dive, no pun intended, going down that rabbit hole on RabbitMQ, too. We did that, and it was actually amazing, because I felt like I understood; it's an easy concept to grasp, and I loved him for doing it. So I'm glad that you all knew what Celery was, because I certainly didn't. Just wanted to share that little story with you. Thank you so much for indulging me. Yeah, totally. I can't see my, oh, my note thing is not, yeah, but, okay, thank you.

So, our talk today, thank you, Ahmed, our talk today will be broken up into two parts. First, getting started with Celery. In part one, we'll talk about what Celery is and how to add it to a Django project, how to choose a message broker, how to choose a task scheduler, and how to run tasks on a predefined schedule. The last thing we'll talk about is how to choose a result store, and it won't be Target. Okay, okay. Then in part two, we're going to focus on a case study in which we can explore various ways of breaking down large tasks into smaller ones that scale more easily, what we are calling good tasks. We will also briefly summarize some of the Celery patterns and anti-patterns that we learned.

So, getting started with Celery. What is Celery anyway? I was supposed to insert a bad dad joke in here, but I don't have a bad dad joke about what celery is, so insert your own. The Celery home page says that it's a distributed message processing system with a focus on real-time processing. In a nutshell, that means Celery is a means for executing tasks in the background, potentially on a large number of servers in parallel. Some of Celery's major features include support for running code that is triggered by, but outside of, the normal request-response cycle in Django. It communicates with a message broker, which is responsible for handling the work, and it can be configured to run tasks on a periodic schedule, so it can act as a cron job replacement. Because it can run code outside of the request-response cycle, in its own workers, it can help make sure requests are timely. And it is reasonably good at distributed computing and high availability when it's configured with multiple workers on different servers.

Celery remains immensely popular. We've written several blog posts about it at Caktus over the years, and we're really pretty famous for our blog, you know, real famous. This consistently remains one of the highest-traffic pages on our site, sometimes by an order of magnitude. Okay, there are other options for message processing and background tasks with Django, but we think that Celery is a solid choice, and there's a good chance other Django developers like yourselves will have heard about it or used it in other projects, which we already know from the poll. Yeah, we know that you do.

All right, so some specific use cases where Celery is valuable: generating assets, for example different image sizes after uploading a large file; notifying users, for example when a survey opens at a specific time during the day; updating the search index for a large content site from time to time; or running maintenance tasks, so backups, or cleaning up old files or database records that aren't deleted on their own.
Adding Celery to a Django project is pretty straightforward, relatively simple. How many times have we heard that, right? I just said that. You need to install the package, okay? There is some boilerplate, so it should be, like I said, fairly simple. You can search for "Celery first steps with Django" in any of your favorite search engines. You need to import the app instance in your project's top-level __init__.py file. Remember, top level. I made the mistake of not doing that, and we don't want to go down that road, right? That's so it's configured every time Django runs.

Before we can use Celery to run any tasks, we also need to configure it to use a message broker. The message broker is a task queue that allows the workers to consume the tasks. These workers might all be running on the same machine or potentially on different servers. The two most common message brokers used with Celery are RabbitMQ and Redis. Tobias recommends using RabbitMQ. I technically don't have a preference, so I defer to what Tobias does, except for smaller projects where you already have Redis deployed and want to avoid adding a service dependency. RabbitMQ is purpose-built to be a message broker, and it excels at that task. Per Tobias, it's highly scalable and it's very efficient. It comes with a built-in admin dashboard, like Django itself, and it provides visibility into the different queues.

If you decide to use Redis as a broker, there are some caveats to be aware of, and these are also documented in the Celery documentation. Tasks that are not acknowledged within the visibility timeout will be redelivered to another worker, i.e., executed again. The default visibility timeout for Redis is one hour. While you shouldn't have a task that runs that long, if you do, you'll likely be quite surprised to see it getting executed again and again and again if you forget to increase the visibility timeout; that's going to be the outcome. Now, while you might be tempted to simply increase the visibility timeout, its purpose is to protect against task loss in the event a worker crashes or in case of a power loss. So increasing the visibility timeout has the downside of prolonging the time before interrupted tasks will be rerun, so it's really not a solution. You're just prolonging the terror, because it's going to be terrifying when you have this task go on and run, run, run. Other, more nuanced caveats include key eviction and group result ordering. If you would like to use Redis as a broker, we recommend familiarizing yourself closely with these pages in the docs. Really pay close attention to those pages; they're a wealth of knowledge. In summary, we recommend using RabbitMQ whenever possible, possibly falling back to Redis if it's already deployed and the caveats are acceptable on that particular project.

Once you've selected a broker, you can configure it in your project's settings.py file with the CELERY_BROKER_URL setting. As of Celery 5.3, we also recommend setting CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP. Now that we have Celery configured, we can start writing a task. A Celery task is a Python function that is wrapped with the task or shared_task decorator. Here we have a task defined called batch_password_reset_emails, and it takes a list of user primary keys to send password reset emails for. You might queue this task, for example, when an admin user selects user accounts to activate in a web interface, but you don't want to wait to send all of the emails before returning a response back to the user.
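Roughly, the setup just described might look like the sketch below. The module name myproject, the app name accounts, and the email contents are placeholders for illustration; only the setting names and the shared_task decorator come from the talk.

```python
# --- myproject/celery.py: the Celery app boilerplate ---
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery("myproject")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# --- myproject/__init__.py: import the app so it loads whenever Django starts ---
# from .celery import app as celery_app
# __all__ = ("celery_app",)

# --- myproject/settings.py: broker configuration ---
# CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"  # RabbitMQ
# CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True           # recommended as of Celery 5.3

# --- accounts/tasks.py: the task described above, simplified ---
from celery import shared_task
from django.contrib.auth import get_user_model
from django.core.mail import send_mail


@shared_task
def batch_password_reset_emails(user_pks):
    """Send a password reset email to each user in the given list of primary keys."""
    for user in get_user_model().objects.filter(pk__in=user_pks):
        send_mail(
            "Reset your password",
            "Follow the instructions in this email to reset your password.",
            "noreply@example.com",
            [user.email],
        )
```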
Here in our staff activate view, we might have a form that figures out which users the admin selected to deliver invites to. Then we obtain those primary keys and pass them into the delay function on our Celery task, batch_password_reset_emails. Say that really fast. Then we can use the messages framework to save a status message for the user and redirect back to the staff list page. When delay is called, the web process will deliver this task, with its arguments, to the message broker and then return immediately, so the status message can be queued and the HTTP response returned to the user almost immediately. In the Celery worker process, if there's an open worker, the task will be picked up and the emails will be sent out. Voila.

Here is a workflow diagram with the different processes and messages. In the Django lane, in orange, the HTTP request is received, then the Django view calls the task's delay function and returns the HTTP response. RabbitMQ, in teal, receives the AMQP message and then passes it off to a Celery worker running in another process. The Celery worker, in blue, then sends the necessary emails, or whatever the workload might be, and records the results if needed.

Queuing tasks manually covers a large number of use cases, but sometimes it's helpful to be able to queue tasks on a schedule as well. Celery Beat is the name of a feature in Celery that is responsible for scheduling tasks at predetermined, predefined times. It doesn't run tasks itself. Instead, it adds tasks to the queue so the workers can pick them up and execute them. It's worth noting that Celery Beat is optional, so don't go willy-nilly adding stuff; don't go adding it if you don't need it. If you don't need to run tasks at specific times of day, you don't need Celery Beat. We don't want senior developers, you know, getting on you. Okay, Celery Beat is built into Celery itself, but like a lot of packages, it comes with a couple of pain points. One of those is that it requires state. By default, this is kept in a flat file, which can be annoying to maintain in our modern container-based deployments. The second is that you should only ever have one Celery Beat process running at a time, otherwise you might end up with two copies of the same task getting executed, or scheduled, I'm sorry.

There is a separate app called django-celery-beat, recommended in the Celery docs, that allows customizing the Celery Beat configuration specifically for Django projects. It includes an optional task scheduler that allows you to use the Django ORM to track state. Django ORM, yes. And it also comes with a Django admin interface, so you can monitor Celery Beat tasks via the admin, and we love the admin. The installation steps for django-celery-beat are similar, and also straightforward: you install the package via pip, and those are the installation instructions. While django-celery-beat supports configuring task schedules in the database, Tobias still recommends storing those in your settings files, and I do too, whenever possible, with the CELERY_BEAT_SCHEDULE setting. Please note that some older versions of Celery supported a setting without the first underscore in this name, so be on the lookout for that if you're copying and pasting code from the Internet, please. To define a scheduled task, you give it a name, you point to the task or function to be called, and you define a schedule, either as a period in seconds or in crontab syntax.
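A rough sketch of the two pieces just described follows. The view, URL name, form handling, and the scheduled task paths are illustrative stand-ins rather than code from the talk; only delay, the messages framework, and the CELERY_BEAT_SCHEDULE setting come from the talk itself.

```python
# --- accounts/views.py: queue the task and return a response immediately ---
from django.contrib import messages
from django.shortcuts import redirect

from accounts.tasks import batch_password_reset_emails  # the task sketched earlier


def staff_activate(request):
    # In the talk a form determines which users the admin selected; here we simply
    # read a list of primary keys from the POST data for illustration.
    user_pks = request.POST.getlist("user_pks")
    batch_password_reset_emails.delay(user_pks)  # queued; this call returns immediately
    messages.success(request, "Password reset emails are being sent.")
    return redirect("staff-list")  # placeholder URL name


# --- myproject/settings.py: a Celery Beat schedule ---
from celery.schedules import crontab

# Older Celery versions spelled this CELERYBEAT_SCHEDULE (no first underscore).
CELERY_BEAT_SCHEDULE = {
    "nightly-cleanup": {
        "task": "accounts.tasks.cleanup_old_files",  # hypothetical maintenance task
        "schedule": crontab(hour=2, minute=0),       # crontab syntax...
    },
    "heartbeat-every-30-seconds": {
        "task": "accounts.tasks.heartbeat",          # hypothetical task
        "schedule": 30.0,                            # ...or a period in seconds
    },
}
```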
Once a schedule has been defined and you have a Celery worker and a beat process running, you will start to see the output from the debug task in the Celery worker logs. If you ever need to return values from a Celery task, it's also worth configuring a result store. The result store is responsible for keeping track of the return values from task functions so they can be retrieved asynchronously, either by other tasks or by web requests. For simple use cases, a better pattern is to just update the database at the end of the task, but for more complex workflows, you might need a result store to keep track of the results as you go. There are many potential result stores you could use, over 17 the last time Tobias checked, and I checked again this morning and it was still 17. But since we're using Django, we recommend starting with the Django ORM. Redis works quite well as a result store, so while we don't recommend using Redis as a broker, it can absolutely be used to store the task results. And just to talk about that a little bit, that was confusing to me as a beginner developer, because this particular technology is recommended for one thing and not the other. So you might want to dive a little deeper into the docs for that, because it took a couple of go-rounds for me to understand, just to point that out. Not to say that it would for you, but for me it did.

Like django-celery-beat, there is a django-celery-results reusable app you can add that includes a custom result and cache backend for Celery. It also uses the Django ORM and comes with a Django admin interface for observability. The installation instructions are similar and pretty straightforward: you need to install the package, add the app to INSTALLED_APPS, and configure the two settings in your settings file. Okay, so, full disclosure, this is something that Tobias has said, and I love this about him, he's being completely, just, you know, vulnerable: he's like, we didn't quite understand how these short strings work, "django-db" and "django-cache", the first time we saw them, but we believe that they are mapped behind the scenes, in the project setup, to entry points or classes within the django-celery-results source code. So don't come for us; we already said we're not sure how everything is working under the hood, but we know that it works. Now that we have a result store configured, we'll be able to fetch the results of our tasks that have finished executing. And that's my part. Now for the good stuff, good tasks. I want to introduce Ahmed, and he's going to finish it off.

speaker 3: My name is Ahmed Piti. I come all the way from Libya. I had to take four different flights just to be here. I'm really thankful to Caktus, who sent me the invitation. I was really happy, and I want to thank them for making this happen. So I'm here. This is my first DjangoCon and my first talk too, so we'll see how it goes, right? So, picking the right-sized task. One of the hardest parts about designing good tasks is fine-tuning the amount of work done in each task to best balance readability, scalability, and maintainability. Sometimes long-running tasks are easier to write, but they might be harder to debug when something goes wrong. On the flip side, really short tasks, for example less than one second, might come with an undue amount of message overhead if scheduled in high volume.
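For reference, the django-celery-results configuration described a moment ago might look roughly like this sketch; the INSTALLED_APPS ordering and the migrate step are ordinary Django housekeeping rather than details quoted from the talk.

```python
# --- myproject/settings.py: store task results via the Django ORM ---
INSTALLED_APPS = [
    # ... your existing apps ...
    "django_celery_results",
    "django_celery_beat",  # only if you are also using the database-backed scheduler
]

# The short strings below are mapped by django-celery-results to its
# result and cache backend classes behind the scenes.
CELERY_RESULT_BACKEND = "django-db"
CELERY_CACHE_BACKEND = "django-cache"

# Afterwards, create the new tables:
#   python manage.py migrate
```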
Ultimately, the goal is to design around units of work that are small, but not too small, and that can be executed simultaneously. Another good criterion to use for choosing task size is how graceful shutdown will be supported by your application. For Kubernetes, the default graceful shutdown period is 30 seconds, and for Supervisor it's 10 seconds. These can certainly be increased, but I would not increase them indefinitely. In any event, if your task does not wrap up its work, either by finishing what it set out to do or by somehow saving its place and queuing another task to pick up where it left off later, it will be terminated and might leave its work in an inconsistent state.

To help support building right-sized tasks, Celery supports a number of primitives for grouping and chaining tasks. These can be combined to support arbitrarily complex workflows. Before tasks can be chained or grouped together, however, we need a way to bundle a task function with its arguments so Celery knows how to call it. Celery calls this a signature, and it can be created with the s function, which takes as its arguments the arguments that you want to pass to the task when it's eventually executed by a worker. Typically, this would consist of a JSON representation of the function that needs to be called along with its arguments. Once you have task signatures, you can pass them into the chain function to execute them sequentially; the return value of the previous task will be passed to the next task in the chain, and so forth. Similarly, the group primitive can be used to queue tasks all at once and run them in parallel. In this case, the return values from the grouped tasks will be collected together in a list. A task group can be part of a larger chain of tasks, so you can, for example, chain the list of results from a group into another task to aggregate and report on the results. Finally, there is also a starmap function, which can be used to pass a list of arguments into a task, which will be executed sequentially for each set of arguments. We'll talk more about this in a specific example shortly. There are other primitives you can explore in the Celery docs linked here. As a general rule, try to avoid passing state from one task to another through task results. Just as you would pass an object's primary key into a task instead of the model object itself, try to minimize the size and importance of arguments passed along the chain. Mr. Flavio went into more detail on this in his Celery talk yesterday, and Tobias recommends checking out the video if you didn't have a chance to see it.

To help understand how these Celery primitives can work together, it's helpful to explore an example together in code. As some brief context before we do that: Caktus has been working as a software development team for the High National Election Commission in Libya since 2013, which is actually the same time that I met Tobias, so it's been about ten years now. Libyan citizens registered to vote by SMS, and the Caktus team built the voter registration system using Python, Django, and Celery. One function of this application is the generation of voter rolls. We think it's a good case study that we can use to explore how Celery primitives work. If you'd like to follow along in the companion repo, here is the link on GitHub, caktus slash celery dash code.
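A small illustration of these primitives, using made-up add and total tasks rather than anything from the talk; it assumes a configured Celery app, a running worker, and a result backend for the .get() calls.

```python
from celery import shared_task, chain, group


@shared_task
def add(x, y):
    return x + y


@shared_task
def total(values):
    return sum(values)


# A signature bundles a task with the arguments it should eventually be called with.
sig = add.s(2, 3)

# chain: run tasks sequentially; each return value is passed to the next task.
chain(add.s(2, 3), add.s(10))().get()         # (2 + 3) -> 5, then 5 + 10 -> 15

# group: queue tasks all at once and run them in parallel; results come back as a list.
group(add.s(i, i) for i in range(5))().get()  # [0, 2, 4, 6, 8]

# A group can feed its list of results into another task to aggregate them.
(group(add.s(i, i) for i in range(5)) | total.s())().get()  # 20

# starmap queues a single task that runs each call sequentially, not in parallel.
add.starmap([(1, 2), (3, 4)]).delay().get()   # [3, 7]
```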
While this code is based on the idea of voter rolls, I should mention it was written from the ground up for the purpose of this talk, so it's not actually used to run any elections. If you would like to see running code for elections, there's an open source project that we worked on at Caktus. It's called SmartElect; you can navigate to that, SmartElect. The general goal of the code is to create lists of all eligible voters for distribution on paper, ahead of time, to all of the 2,000-plus polling centers in Libya. In a nutshell, the code assigns voters within each polling center to groups of 500 to 600 voters, which is called a station. This is the largest number of people that could reasonably be handled by a single desk or table in a polling center. There might be anywhere between 2 and 30 stations in a center, depending on the population density around that area. After splitting voters into stations, the code generates PDFs, one for each station, with the list of eligible voters in that station.

There are a lot of different ways we could write this code, and Tobias wrote up several of them in the example repo. If you're looking for the code, it can be found in voter_roll/rollgen.py. So let's talk through each of these options. In option one, all the code is executed in a single Celery task. We iterate through all the polling centers in the database; for each center, we split the voters in the center into reasonably sized stations, then we write a voter list to a PDF file. Along the way, we keep track of the total number of pages written and print a summary at the end. This is short and easy to understand, but it's not ideal for a couple of reasons. All the work here is done sequentially; none of the work can be distributed to multiple workers in your configuration. And it takes longer than 30 seconds, even though we have given it a sample file of only 10,000 fake users, I'm sorry, fake voters.

A natural way to break up this work might be to go through each center one by one and write all the PDF files for that center in one task. This is option two here in the sample code. What it does is split the voters into stations only for that specific center, and then it writes the PDF files for each of those stations. At the end, it returns the total number of pages for all the stations within that center. Continuing with option two, this is the second task: we need a task or function to kick off all the individual center-level tasks. We can do that with the Celery primitive called group. The group takes an iterable of task signatures, what you get from the s function, which can in turn be queued as a group with the delay function to add a task for each center to the queue all at once. The work will be processed by the workers, up to the total number of workers, in parallel. You might be tempted then to wait for the tasks to complete and gather the results. Do not do this; it is not good practice. For one, it is not possible, or even advisable, to wait on a task within another task, and for two, you will hold the process calling join doing nothing; the join function here will be doing nothing until all the other tasks are complete. So instead, in option number three, we can chain, or pipe, the results of the center-level tasks into a final chord task that has the job of summing the page counts from the other tasks and reporting them.
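A simplified paraphrase of options one and two and the group kickoff, not the actual code from the companion repo; Center, split_into_stations, and write_station_pdf are stand-ins for the real models and helpers.

```python
from celery import shared_task, group


@shared_task
def generate_rolls_option_one():
    """Option 1: everything in one task, executed sequentially by a single worker."""
    total_pages = 0
    for center in Center.objects.all():                # stand-in model
        for station in split_into_stations(center):    # stand-in helper, ~500-600 voters each
            total_pages += write_station_pdf(station)  # stand-in helper, returns page count
    print(f"Wrote {total_pages} pages")


@shared_task
def generate_rolls_for_center(center_id):
    """Option 2: one task per polling center; returns that center's page count."""
    center = Center.objects.get(pk=center_id)
    return sum(write_station_pdf(s) for s in split_into_stations(center))


def kick_off_center_tasks():
    """Queue one center-level task per center, all at once, to run in parallel."""
    tasks = group(
        generate_rolls_for_center.s(pk)
        for pk in Center.objects.values_list("pk", flat=True)
    )
    result = tasks.delay()
    # Do NOT call result.join() or result.get() from inside another task; the calling
    # process would sit idle until every task in the group completes.
    return result
```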
This task is virtually instant. It queues all the tasks with group as before, and tells Celery to kick off the sum_pages task once the tasks in the group are complete. According to the Celery docs, the synchronization step is expensive, so this model should be used sparingly, but sometimes it is unavoidable, and it's still better than the alternative of waiting synchronously for all the tasks to complete. Note, you may also see the chord function used for this. It does the same thing as the pipe syntax, but we use the pipe syntax just to be more explicit and easier to understand. And this is the final function for option number three: it takes the list of results from all the tasks in the previous group, then calculates and prints the sum of those page counts.

Option four, as a brief tangent: instead of splitting voters into stations one center at a time, we could do all that work up front in parallel. This turns out to be a bad idea. Why? Because that work is relatively fast anyway. It might even be slower when highly parallelized, with all the Celery messaging overhead, and we don't want to pummel our database with too many queries at once. This is a database-intensive category of work that is often best done synchronously. If the code is too slow, time is probably better spent optimizing the underlying database queries rather than simply trying to do more things at once. In this case, trying to execute all the queries in parallel will almost certainly take more time rather than less.

In option five, we modify the task to accept two parameters, center ID and station ID, and the work it contains is limited to generating the list only for that single station in the polling center. In this demo, the running time for this task is between 7 and 8 seconds, which is ideal for a Celery task because it simplifies graceful shutdown. Now, you might be perusing the Celery docs and see a promising-looking helper function called starmap. If you're familiar with the Python multiprocessing library, or map, or other map-reduce types of workflows, you might think that Celery, being the distributed system that it is, would execute tasks in parallel when called with starmap. But this is actually not the case. As noted in the docs, starmap queues only a single task and runs each subtask sequentially within that task, instead of in parallel like the group function. So the performance of this version is really the same as what we did in option number one, where we did everything in one task, but the code is harder to read. Tobias wasn't sure what the use case is for starmap, so if anyone happens to know, we would love to hear about it during the Q&A.

Option five b, our last and final option: we return to using group to queue one task for each center ID and station ID pair. Then we pipe the results into the sum_pages chord task. This option gets the database work out of the way quickly and then parallelizes the CPU-intensive work. Again, anecdotally, this particular unit of work takes around seven to eight minutes, seven to eight seconds, sorry, which is a great length of time since it's less than the default graceful shutdown periods of Kubernetes and Supervisor. In other words, when a Celery worker is told to shut down, for example when new code is deployed, we can be confident that it will finish the task it's currently running and then exit before consuming any more tasks.
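A sketch of the chord from option three and the per-station grouping from option five b, continuing the stand-in names from the previous sketch rather than the repo's actual code.

```python
from celery import shared_task, group


@shared_task
def sum_pages(page_counts):
    """Chord callback: receives the list of page counts from the group and reports the total."""
    print(f"Wrote {sum(page_counts)} pages in total")


@shared_task
def generate_rolls_for_station(center_id, station_id):
    """Option 5b: one small task per (center, station) pair, roughly 7-8 seconds of work."""
    station = load_station(center_id, station_id)  # stand-in helper
    return write_station_pdf(station)              # stand-in helper, returns page count


@shared_task
def kick_off_station_tasks():
    """Do the fast database planning up front, then fan out the CPU-intensive work."""
    pairs = plan_stations()  # stand-in: quick DB work yielding (center_id, station_id) pairs
    workflow = group(
        generate_rolls_for_station.s(center_id, station_id)
        for center_id, station_id in pairs
    ) | sum_pages.s()
    # Queuing the chord returns immediately; Celery runs sum_pages once the group finishes.
    workflow.delay()
```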
If you'd like to try out the code, please check out the companion repo for this talk. You can use the included management commands to generate some test data and run the various tasks. Watching the Celery worker output will provide some insight into when and how tasks are run in parallel, when they're not, and how long the various options take to complete.

To summarize, some patterns we like to recommend for Celery on Django projects: first, use RabbitMQ as a broker whenever possible. When ORM objects are needed in a task, pass the primary keys and fetch the objects in the task. Use the django-celery-results and django-celery-beat database backends for visibility and ease of use, if you need those features. Divide the work into right-sized tasks; a good target is for all tasks to complete within the graceful shutdown period. The don'ts here are: don't make a task that waits synchronously for another task; don't write a task that takes longer than your graceful shutdown period to complete; and don't massively parallelize database operations or other work that derives no benefit from running in parallel.

If you'd like to learn more, Caktus has several popular posts about Celery on the Caktus blog at caktusgroup.com/blog. And last but certainly not least, yesterday at DjangoCon, Mr. Flavio gave a prerecorded talk on mixing reliability with Celery for delicious async tasks. His talk focuses on some more advanced use cases for Celery and suggests additional options for tracking state in more complex workflows, so we recommend watching the video after the conference if you did not happen to catch it yesterday. Thank you for coming to our talk about Celery, and we are happy to engage in conversations about it. You are also welcome to reach out directly to Mr. Tobias on LinkedIn or Fosstodon. Thank you.