speaker 1: Hey folks, I'm really happy to speak at DjangoCon US 2024. I'm going to talk a little bit about Django and Celery and their amazing love story, right? My name is Hugo. I'm a partner at Vinta Software, where I've been working for the past seven years. Some of you folks might have already crossed paths with Vinta before by using one of our open source packages for Django, like django-react-boilerplate, django-role-permissions, drf-rw-serializers, django-virtual-models, and the newest one, which is really fresh, which is django-ai-assistant. You should really have a look. We are a company that really cares about the open source world and especially the Python community. We contribute a lot of open source software. And also we've been present at most conferences all over the world, so that's something we really love. And we've been around for a while already. So about myself, I've been working with Django for the past ten years on very different kinds of projects, most of them using Celery. And that's what we're going to talk about today. To start this talk, I'd like to talk a little bit about Django. Django is almost 20 years old, and its philosophy has made many projects very successful so far. What I really love about Django is that it has batteries included. There's kind of a right way of doing things like authentication, authorization, interacting with data, security, things like that. Django has a school of good opinions on all these topics. That basically allows us to work together and build a whole ecosystem in an integrated way on top of Django, like using the same base. And because of that, we have tons of packages that are really useful. We have events, we have meetups. And especially, there is active development on the main framework, which introduces exciting new features in every new version. But there's a known problem with Django that still couldn't be fixed, and that's Django's performance, right?
Django uses Python, which is not known for being a fast language compared to Rust or C or Go. And on top of that, Django wasn't built to work with multithreading. Its asynchronous features are still being actively developed; they are still maturing. The ORM itself can be misleading sometimes with regard to performance. But there are workarounds to make Django a bit faster. On the database side, you can avoid N+1 queries, for instance, by being more cautious about the way you write queries. You can use caching, you can add smart indexes, you can use data denormalization, you can run operations in the background, right? All these techniques can make your application run smoothly and serve a ton of users simultaneously. That has already been done in the past by big companies, but I want to talk a little bit more about the last one: running operations in the background. Before that, let's have a look at how Django works behind the scenes, right? Basically, Django exposes a WSGI interface, or Web Server Gateway Interface. That's what a server like Gunicorn or uWSGI uses to run a Django process, right? They kind of connect to that WSGI file in your Django project. These servers themselves have a request load balancer. Basically, this part of the server receives a lot of requests and maps those requests to the processes it has, right? Each Django process has the whole Django environment loaded there in memory. Each of these processes picks one request at a time and only gets the next one when the previous one has finished being processed, right? So long-running requests are kind of expensive, right? Each Django process loads the whole framework core, so it's also expensive to have too many processes, because all of them have a cost to be loaded. And requests hold the Django process while being processed, so there's no way to run multiple requests in the same process, right?
There's no multithreading in Django processes, usually. There might be workarounds, but we're focusing on the main thing here today. So why run operations in the background, right? Requests should be processed quickly, so we don't hold the process for too long. We want to be able to process the next request as quickly as possible; otherwise the queue is going to get too long. And we also want to give feedback to the user as quickly as possible as well, so we can retain their attention, right? Attention is very valuable these days. We don't want the user to switch tabs while they wait for an operation they started on our app. We want to keep them in our tab and retain their attention on our platform, right? And Celery is a tool that can help with that. Celery is an asynchronous task queue, or job queue, which is based on distributed message passing. The most important word here is distributed: it separates async execution from your main application, so it doesn't run in the same process that Django runs in. It's a completely separate process that just worries about these asynchronous tasks. It is also very fast, and it's very well integrated with many tools, including Django, right? So with Celery, you have access to your models, your services, your functions, your classes and all that. And Celery loves Django, right? Celery has dedicated documentation for integrating with Django. Actually, there are ways to configure Celery using Django settings with no extra boilerplate. Django is allowed to be accessed within tasks, so you have full access to your ORM and your managers and your querysets and your service classes, etcetera. And there's also an important community around Celery and Django building packages that enhance this integration, that use the Django database for managing some of Celery's things, and so on.
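The "no extra boilerplate" integration mentioned here is the canonical `celery.py` module from Celery's Django documentation. A sketch, with "myproject" as a placeholder project name:

```python
# myproject/celery.py — canonical Django integration from Celery's docs;
# "myproject" is a placeholder for your actual project package.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery("myproject")
# Read every Django setting prefixed with CELERY_ as Celery configuration.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in all installed Django apps automatically.
app.autodiscover_tasks()
```

With this in place, tasks defined with `@shared_task` anywhere in your apps are picked up, and they run with full access to the ORM, settings, and services, as the talk describes.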
So there's a whole ecosystem that helps us integrate them both, right? And here's what it looks like to integrate Django and Celery. This is a very basic case; I'm just showing you how it can be done, how quickly you can integrate them. Basically, here I'm writing this process order view, right? It receives a request, it creates an order object based on the user, and it calls this calculate user score function. So, suppose this function, calculate user score, is very slow, but it isn't that important that it runs immediately after you create an order. It can tolerate some eventual lack of synchrony. So that's the perfect use case for us to move this function to another process, to run in the background, because otherwise it's just going to block the request, right? So here we are moving it to a task function that receives the user ID. It picks the user from the database and actually calculates the user's score. But before all that happens, we actually give a response to the user. This delay method here is just going to add the task to the queue; it's not going to execute it immediately. It's going to be executed in the background. So this is kind of a hello world for Celery, right? And what can this give us? Basically, what can async tasks be used for? You can delegate long-lasting jobs like the one we've just seen. We can execute remote API calls: APIs can fail, right? So we can use Celery to kind of wrap these API calls and have graceful retries running without blocking the request. We can also prepare and cache values to make queries a bit lighter. We could spread bulk database insertions over time, for instance, to avoid overloading our database. We can execute recurring jobs, things like that. There are many use cases for async tasks, right? And how does it work with Celery?
Basically, Django uses a Celery client that gives you that delay function, for instance, and it queues a message to the broker. A broker could be many platforms, many kinds of queue managers, right? It could be RabbitMQ, SQS, it could be Redis itself. So we can use multiple queue managers as a broker, but we choose one, and whenever we call this client, Django is going to send a message to this broker and put it on the queue, and Celery is going to be watching this queue. And it will mark a task as started — it kind of depends on your settings there; it may or may not mark it as started — and it marks it as succeeded after it finishes processing it. The second part is: after Celery finishes processing that message, that task, it will also deliver the result of that task to a kind of database we call the results backend. It basically stores task results, so you can query these results from other tasks or even from Django. You may want to wait for a task to run to get its result. The case that we showed may not make a lot of sense for that, but there are use cases. If you're just sending an email with Celery, for instance, you might not even want to wait for the result, so the results backend will kind of be useless in that situation, but there are situations where it can be used, right? So when you configure Celery, you actually have to give it a results backend so it can store the results there. It's just a normal database. So it's a good long story, right, Django and Celery? But it's not always rainbows and butterflies. We can say it's a tough love, right? There are many things that can go wrong whenever you're introducing a distributed system into your application. It makes things a lot more complex. All connections may fail, and there are a bunch of connections.
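Pointing Celery at a broker and a results backend is plain Django settings when you use the standard `CELERY_` namespace. The URLs below are placeholders, not recommendations, and the Redis backend is just one option (the community package django-celery-results can store results in the Django database instead):

```python
# settings.py — example values only; pick the broker/backend you actually run.
CELERY_BROKER_URL = "redis://localhost:6379/0"      # where .delay() messages go
CELERY_RESULT_BACKEND = "redis://localhost:6379/1"  # where task results are stored
CELERY_TASK_TRACK_STARTED = True                    # mark tasks STARTED, not just PENDING
```

The last setting is the "may or may not mark it as started" behavior mentioned above: by default Celery only reports PENDING and SUCCESS/FAILURE unless you opt in to tracking the started state.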
Now you have Django to the broker, the broker to Celery, Celery to the results backend, the results backend to Django. So there are many connections that may fail, right? You also have a lot of concurrency being added there, adding a lot of complexity, so we can have tasks that never finish, outdated data, issues that only happen in production. It kind of introduces a lot of noise, right? Let's say Celery has a strong personality. Celery is a distributed system, which means there are many points of failure. Here, we are going to focus on these four problems that I've been through in the past, in my whole experience with Celery. These things have always caught me, and I had to handle them correctly. The first one is outdated data, right? Sending complex data as parameters to Celery may result in unexpected stuff. In this example I'm giving here, we are passing the user model to a Celery task, and we are basically calculating the score and saving the user. And why is this tricky? It might not work as expected, because the user may change between the Celery task being scheduled and it actually being executed. As we are passing a model object, the model may be updated. Someone could even have deleted that user if, let's suppose, the task queue is full, right? It has a lot of tasks to be executed. The user may have been deleted, and whenever you run this, the user may not exist anymore, so you'd calculate the score of a nonexistent user, right? In this other case here, the right way to do this is basically passing a reference instead. I'm passing the ID of the user and getting the user by the ID. If the user doesn't exist, I just don't run the task anymore. But if I can find the user, I'll get the up-to-date object, and I will calculate the user's score accordingly.
So this is the first thing: you have to be very careful with what you pass to Celery tasks as parameters. Usually you should rely on references, not on complex objects. Serialization is necessary under the hood, and Celery can be configured to use pickle, which can actually pass complex objects, but that's far from ideal: you shouldn't rely on passing complex objects. You should pass references and re-fetch those objects in the task itself. Okay, the next problem is duplicate runs. Depending on your Celery setup and task configuration, it may not be guaranteed that your tasks are only going to run once. Multiple workers may pick the same task at the same time, for instance: you have multiple workers watching the same queue, and both can get the same task at the same time, depending on your configuration. A task may also be interrupted and re-queued: if a worker fails and the machine restarts, you may have a task that was interrupted and re-queued, and it will be executed again. So you have to ensure that your tasks are atomic and idempotent. In this example I'm showing here, we are storing a reference to the latest order we processed in the user model. You can see on line 25, whenever we update the user score on line 24, right after it we store the latest order ID and save the user. That means if we call this task again with the same user and no new order, we're just going to pick up that same last order ID again, and we're going to exclude that user by the reference to that order ID. So if there is a new order, we are going to recalculate, but if the order is the same as the last one, we are not going to do anything; we won't be able to pick the user, and we'll just return early, like on line 22.
This function also has this transaction.atomic decorator, so if something fails here, or if the function stops in the middle, it will just roll everything back and will not leave any data half-updated. So the other problem we can see in Celery projects is a little bit more complexity in error feedback, right? Because whenever you pass the baton on to a Celery worker, if it fails there, you may not be waiting anymore in Django, so you may not be able to return an error result to your user. The user may receive a success message from the request, but the whole operation may still fail, right? So whenever we are developing a flow that relies on sync stuff and async stuff, we need to take this possibility into consideration and give feedback — not a success feedback immediately to the user, but feedback saying that this is still being processed — and do some sort of polling to check whether it was completely successful or not. You may also need to be able to undo operations: if part has happened synchronously and the other part is happening asynchronously in the background, you may need to undo that first part that happened synchronously. So it adds a lot of complexity to error feedback and error handling in general. You need to take that into consideration when developing features that touch both sync and async at the same time, especially if user feedback is involved. Okay. So there's this fourth problem, which is conflicting operations. While a task hasn't run yet, another operation is triggered by the user, for instance. This operation conflicts with the one that's still pending and creates a state that is not predictable. So for instance, let's consider these use cases: a user can add notes to our system, a user can also delete a note, and a user can bulk create copies of a note.
You can choose a note and create a bunch of notes based on that existing note, right? So let's look at this following flow I have on the right. Let's suppose the user starts by creating a new note, and then triggers the creation of ten copies of the note. This will run in the background, because creating ten copies may take a little bit longer, and we don't want the user to be waiting on that. So it happens in the background: this task is queued, it goes to the queue, and it will be executed by a Celery worker. Then the user deletes the original note before the task to copy the ten notes runs, right? The task runs, but the original note isn't available anymore, so it cannot be copied. You cannot copy something that doesn't even exist anymore in the database, right? There are many solutions to this. You can implement, for instance, soft delete on the notes: whenever you delete a note, you don't actually delete it from the database, you just mark it as deleted. You can still find it in the database, and you are still able to create these copies. You can also cancel all pending tasks before deleting: whenever I try to delete some note, I query for pending tasks around that note — tasks that will do something to that note — and cancel all the pending ones. And another alternative is locking notes with pending operations: whenever we want to do something on an existing note, before creating the task we mark that note as locked, and whenever we try to delete a locked note we get an error, for instance, because it's still locked. So you have to wait for it to be unlocked before you can do other operations with it. So these are alternatives for these conflicting operations. This is just an example; it can be a lot more complex depending on your use case. But async may create a lot of issues.
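The soft-delete strategy above can be illustrated with a small in-memory sketch (a dict stands in for the notes table, and `bulk_copy_note` plays the role of the background task; all names here are hypothetical):

```python
# Illustrative in-memory sketch of the "soft delete" strategy: deleting a
# note only flags it, so a queued copy task can still read it.
NOTES = {}


def create_note(note_id, text):
    NOTES[note_id] = {"text": text, "deleted": False}


def delete_note(note_id):
    NOTES[note_id]["deleted"] = True  # soft delete: the row stays in the table


def bulk_copy_note(note_id, count):
    # The background task: copies still work even after a "delete",
    # because the original row was only flagged, not removed.
    original = NOTES.get(note_id)
    if original is None:
        return 0  # hard-deleted: nothing we can do
    for i in range(count):
        NOTES[f"{note_id}-copy-{i}"] = {"text": original["text"], "deleted": False}
    return count
```

With a hard delete, the queued copy task would find nothing; with the flag, the pending operation completes and the "deleted" note simply stays hidden from normal queries.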
You cannot just do a database rollback here; you don't have atomic operations, so it's hard to have atomicity between the sync part and the async part. Basically, you have to handle it by storing state or even cancelling pending stuff. Right, now let's talk a little bit about how to make things a little bit smoother with Celery and Django. There are ways to work around some of these issues that we talked about. There are some tips that can help — we're calling it couples therapy here — some tips to help the relationship flow between Django and Celery. If you're going to retry tasks, for instance, you should use exponential backoff. You don't want a fixed number of seconds or milliseconds between retries, because, for instance, if you are making a request to an API and the request fails, that API probably won't be back within a fixed number of seconds or milliseconds, right? You probably want to wait a bit longer between the later calls. So it's important to keep your backoff exponential between retries. Also, tasks shouldn't raise exceptions, the same way views are not allowed to raise exceptions, right? If an exception is raised in a Django view, it becomes a 500 error and the user won't know what happened. The same thing applies to Celery: if you raise an exception, it will just fail in an uncontrolled way. So ideally, you should handle all exceptions, and if there's nothing to be done in the code to adjust the state because of that exception, you can at least send a report through email or through a monitoring tool, something like that. Exceptions should not go unhandled in Celery tasks, the same way they should not go unhandled in Django views. Monitoring is also essential. Celery adds a lot of complexity.
You need to be able to see this complexity, to see how it's going. So there's Celery Flower. This is a package that can help: it lets you see the current state of your Celery setup. You can see the running tasks, you can see the tasks that have already run, so it can help a little bit for you to understand what's going on. There's also this flag called ALWAYS_EAGER. You can use it per task or you can set it globally for all Celery tasks, and this is very useful for development. Whenever this flag is active, the tasks run synchronously: whenever you call a task, it will run immediately instead of running in another process, which makes debugging a lot easier, right? If you actually want to use the remote setup, having a different process for Celery, rdb could be your best friend. rdb is Celery's remote debugger. Whenever you set a breakpoint, it will stop the execution there and open a connection, so you can have a terminal connected to it with telnet, and you can actually send debug signals. You can check variables' values, you can send a next signal or a continue signal, things like that. You can debug the whole thing even with remote workers that are not in the same process as your Django application. So rdb could be very helpful. It's a built-in thing, so you can import it from the Celery library; it's available within Celery, and you don't have to install anything external. So it could be a really good friend for you; it could really help. In the second part of the couples therapy, I would like to mention a good tip for long tasks. Long tasks may not work exactly as expected with Celery, because, imagine that you were the manager of this task: you cannot know if the task has, for instance, frozen, right? It may have frozen. It will never give any result; it's just locked in there.
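As an aside on the development tips above, the eager flag and rdb look roughly like this. The setting names assume Celery 4+ with the `CELERY_` settings namespace, and the rdb call is shown in a comment because it blocks waiting for a telnet connection:

```python
# settings.py — development only: run every task inline, in the same process.
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True  # re-raise task exceptions in the caller

# With real remote workers, Celery's built-in remote debugger can be used
# inside a task body (it listens on port 6899 by default):
#
#     from celery.contrib import rdb
#     rdb.set_trace()   # then connect with: telnet localhost 6899
```

With eager mode on, `.delay()` runs the task immediately and synchronously, so breakpoints, stack traces, and test assertions all behave like ordinary function calls.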
So in that case, you probably want to have a timeout. This is configurable, but you probably don't want tasks to run that long, like hours. So in that scenario, you probably want to split the task into smaller ones, to make sure that you have a sense of progress, so you know that things are running and there's no timeout killing your tasks while they are still running, right? Another thing about Celery is that the monitoring tools are very limited. I talked before about Celery Flower — or Flower, I don't know how you pronounce it. So you may need to implement some monitoring yourself. For instance, in the past I had to implement queue heartbeats to know if my queues were well balanced. I had multiple queues for different sorts of tasks, and some queues were very full while other queues were very empty. If a queue is full, having a heartbeat may be helpful, because if you're scheduling a new task, it may take too long for it to run. So this is something you might have to implement yourself — I had to in the past — but there are some paid monitoring tools that might also help there. Things like New Relic or Datadog may help a lot in monitoring your Celery tasks, having useful logs, understanding the state, and all that jazz. Another thing is that Celery is an excellent tool for simple async tasks and simple jobs, but for complex workflows it might not be very reliable. It has a lot of open issues and many complaints of lost tasks and unpredictable behaviors when you're running very complex stuff, so it may not be the best tool for that. But don't worry: the Celery and Django relationship is non-monogamous. Other tools can live together with Celery, right?
For instance, you could use Celery for doing simple stuff, because it has very little boilerplate, it's very easy to use, and the learning curve is small. But you could use Temporal.io to run more complex stuff. It has a more robust infrastructure; it was created to run these complex workflows, and it even includes better monitoring tools built in, so you can rerun tasks and do stuff like that. So for very complex stuff, you may want to look into other tools. But if you're dealing with simple stuff that needs to run async, Celery could be your best friend, right? Other than that, we also have this dev checklists site. Vinta has put that up, and it's an open source project as well. It has a bunch of dev checklists, and there is one specifically about Celery. My coworker Filipe Ximenes has put this up, and it has a lot of best practices you should follow when integrating Django and Celery. So basically, you should have a look at that. It can be very useful if you're studying an existing application or if you're configuring a new integration, right? And that's it, folks. Thank you for hearing me. I hope you enjoyed the presentation, and I'll probably be around at the event. If you want to talk about Celery or Django or anything related, I'll probably be there; if you want to talk, just ping me. Okay, this is my email, if you want to talk about it as well. And that's it. Thank you.