2020-09-09 | PyCon 2020 | "Taking Django's ORM Async" - Andrew Godwin
Challenges and Progress of Django's Async ORM
Tags
Media details
- Upload date: 2025-06-21 18:29
- Source: https://www.youtube.com/watch?v=ibAmA4QQDhs
- Processing status: Completed
- Transcription status: Completed
- Latest LLM Model: gemini-2.5-pro
Transcript
speaker 1: Hello everyone, and I hope you're having a good afternoon wherever you are. Now, I'm very excited to get to talk to you this afternoon about Django and async. But first, one small note: I'm doing this from the past, and not just the past in terms of time zones, but the actual, real past. What this means, of course, is that future me is in the chat to answer your questions. So if you have questions at any time during the talk, please pop in there and I'll try to answer as best I can. If you're watching this even further in the future, on a video on YouTube or another site, then unfortunately you can't quite do that, but I'm sure you can find me, and I'm always happy to answer questions. So with that, let's get on to talking about some async and some Django. Now, before I dive straight into talking about Django and async and the ORM, because it's all very exciting, let's start with a little bit about me first. For those who don't know me, I'm Andrew Godwin. I have been a Django core developer for, let's just say, quite a while. In that time I've done a few different projects, but probably the most notable ones are South, and then its sort-of successor, Django migrations. And then, of course, most recently, a lot of work on async Django. Now, one small fact is that I am, unfortunately, not currently in Australia. I am instead in the wonderful world of Colorado, in particular in Denver, Colorado. At some point, when you can get here, I encourage you all to come visit this beautiful place full of mountains. And it has distinctly less animals that can kill you than Australia. But until then, we have to just be content with what we have, where we are now. So let's talk first about the good news. In case you missed it with everything going on, async views have been released. Django 3.1 is out. You can download it and use it today. And in Django 3.1, you can simply write async def in front of a view rather than def, and bam, you have an async view.
Now, this is the result of a large amount of very complex and interwoven work. I have some previous talks, including some from previous PyCon AUs, that you can go and look at to learn about that kind of stuff, but essentially it is, to you, the end user, reasonably simple. And so my hope is that that's the good first stepping stone into a more async-friendly Django. And one thing to say up front here is that I've always said that the goal is not to make Django async-only. The goal is to make Django the place to have a hybrid async world, where you can have most of your code synchronous, because honestly it's easier and often it's perfectly good enough for what you need, and then just the performance-critical bits, or the things that do lots of I/O waiting, can be in async mode. So the goal has always been a hybrid: Django does both. And async views are a perfect example of that. You can put an async def next to a normal synchronous def for views, put them in the same urls.py file, and they serve off either synchronous or asynchronous servers, with some caveats in the asynchronous-on-synchronous-server case. But generally, it just goes pretty well. I'm really pleased with how it turned out. And interestingly, one of the very first things when I started talking about all this async stuff in Django, about, let's say, four or five years ago, probably, I think it was at a DjangoCon US, but don't quote me on that, was the ordering. This was always a big project. Taking Django async was always going to be difficult. And so even back then, I'd broken it down into three different phases. The first phase was basically developing and constructing a way for Django to run on some kind of native asynchronous platform. That was ASGI. It evolved from the Channels project, and that basically came around in Django 3.0. It wasn't very useful by itself. It was more of a foundational thing, but lots of hidden things you didn't see were included in Django 3.0.
For example, things like async safety for the ORM were built in from the start. Things like the ability for there to be more than one handler, so there's a WSGI handler and an ASGI handler, all that kind of groundlaying stuff that we need later on. And then, of course, phase two, async views, was made manifest in Django 3.1 and is now released. And so those first two phases honestly worked pretty much as I had planned them. There were many missteps along the way, and I didn't do things quite the way I'd want to in many cases, but they ended up looking and feeling the way I kind of intended them to, and that's the point. And so now we find ourselves facing the precipice of phase three. And in some ways, it's kind of unfair to call these one, two, three, because it implies equal weighting. That's really not the case when you come to these three things. The ORM is honestly most of Django. This is by lines of code. It's certainly by complexity and understanding. It is a big, hairy beast sitting in the center of what we know and love, and having to dive into it and add asynchronous support, that is no small task in itself. Part of the reason why I always put off the ORM until this part of the project, after the view stuff, of course, was the fact that having an async ORM without async views would be kind of pointless. Let's be real. But the other part is that it's just a lot of work, and I needed to get my own async abilities and skills up as part of that process. And I also needed to make sure that things were developing and the community was going forward. Async is still pretty young in Python's history, so we need to make sure that stuff is all in the right place. And fortunately, now I think it is. So let's look at what it takes to go into the ORM and change stuff. And first and foremost is Django's API design.
Now, I have long been a proponent of the idea that what makes Django is the way we design APIs for the developer in a hurry who also wants to have things work safely. It's always been one of the bywords of Django that if you just do things the way we suggest, they generally come out working pretty well, pretty stable and with very few security holes. That's kind of the idea behind Django: as you get bigger, as your site grows, and I've worked for many sites that do this, you can just remove parts of Django and replace them as you see fit. But the ghost, the spirit of Django still remains. The idea of that layout, and the idea that you can remove pieces progressively and have them all work properly, that's still there. And that's what I love about it. And part of that, honestly most of that, is in the API design. So let's look at some of the features of the Django ORM that you probably all know and are familiar with, that, when you come to async, end up being just a little bit different. In most cases you can get around it, but I want you to sort of appreciate some of the difficulties in the way that the async extensions were added to Python as a language design, and the implications those give us for Django and how we design the ORM. So when you come to async, there's a couple of key caveats. And I think the biggest one, of course, as before for async views, is that when you look at a function or a callable, you can't tell if it returns a coroutine. The way Python chose to do async is, rather than having a separate function or callable type, it says that an asynchronous coroutine function is a function that returns a coroutine object. If you've used generators and the yield keyword, this is very familiar, which is why Python core chose it. It's a very sensible move in that respect. In fact, async was based off generators originally. Now, the problem with this is that, of course, everything's a callable, everything's a function.
If it is written as async def in the actual source code, then there's a special flag on it that says, yes, this function object actually is a coroutine function. And there's a thing called iscoroutinefunction in asyncio that will tell you this, but that's not true of all functions. If you just write a standard def function that returns a coroutine, that's still a perfectly good coroutine function that we should be able to do async things with, but we can't tell it's one from the outside. You know, here on screen right now, these are both perfectly valid coroutine functions. They essentially return the same kind of thing. But with one of them you can tell from the outside, before you run it, what it does, and with the other one you can't. And what it comes down to at the end of this is that Python doesn't have a separate call and async call method. It just has a call method. And so what this means is that, as a function on the other end of that contract, you don't know if the thing calling you is expecting a coroutine back because it's in an asynchronous context, or if it's just a normal old Python function that wants a normal synchronous result. And so what this means is you can't have one function that does two things. So you have to namespace every function you're providing to have both a synchronous and an asynchronous variant. Let's take a look, for example, at the get method on managers in the ORM. Now, I would love to be able to say, oh, await Model.objects.get(id), but we can't do that, because the get method has to be only one of synchronous or asynchronous. And of course, we don't want to break anything here. So instead we have to namespace it. Now, this particular namespacing is, shall we say, pending. We're still not quite sure how to separate it out properly. It might be an a, it might be like get_async. Whatever this ends up being, it will be something like this.
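A minimal sketch of the distinction described above, using only the standard library. The function names `fetch_async` and `fetch_wrapped` are hypothetical, and the check uses `inspect.iscoroutinefunction` rather than anything Django-specific; it is an illustration, not a reconstruction of the slide.

```python
import asyncio
import inspect


async def fetch_async(pk):
    """Declared with `async def`, so Python flags it as a coroutine function."""
    await asyncio.sleep(0)
    return {"id": pk}


def fetch_wrapped(pk):
    """A plain `def` that returns a coroutine object; nothing marks it from outside."""
    return fetch_async(pk)


# Only the `async def` version is detectable before it is called.
print(inspect.iscoroutinefunction(fetch_async))    # True
print(inspect.iscoroutinefunction(fetch_wrapped))  # False


async def main():
    # Both can be awaited and produce the same result.
    return await fetch_async(1), await fetch_wrapped(1)


print(asyncio.run(main()))  # ({'id': 1}, {'id': 1})
```

Because a caller cannot reliably distinguish the two ahead of time, a library that wants to offer both behaviours has to expose them under different names.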
But the idea is that the async variants will be named differently, and you will have to access them differently. This is just something thrust upon us by the way Python's async is designed. There's a trade-off. It's a good design in many ways. This is one of the few unfortunate cons of it. So this is what we get in terms of functions. This is not too bad in many ways. I do like this, because it means that if you are calling an asynchronous variant, you can read through the code and make sure that, yes, we're definitely calling all the async functions. Because the other kind of weird side effect of Python is that if you call a synchronous function in an asynchronous context by mistake, then it just runs, and it runs really slowly and blocks your whole process, but it doesn't throw an error. And so it's actually quite nice to have an extra safety barrier of, oh, these are the definite set of asynchronous functions that definitely aren't going to block, because the failure mode of calling synchronous stuff is that your website gets mysteriously slow and no one knows why, until someone finally goes, hmm, this function, you've called get the wrong way. So that's one of the sort of trade-offs you have to deal with. Now, the other main one, and this is maybe the biggest blow in some ways to Django's design, is that you can't do asynchronous attribute access. Now, if you're not familiar, in Python you can override objects so that attribute access, where you do .name, is run through a custom function. And this is what Django does for a lot of its fields on models. In particular, when you have a foreign key field on a model, if you haven't loaded it, when you do .foreign_key.name, in the background Django will quickly go, oh, and go and talk to the database, query that thing for you as a single get query, pull it back, load the object into memory, and then serve you the result of that object.
Now, of course, the thing with async is that we're not allowed to do any I/O outside of an asynchronous context, and we can't do attribute access asynchronously. So what that means is that we are not allowed to do that background database fetch in asynchronous mode, because if you did .name, or say .foreign_key.name, in an asynchronous context and we left the old synchronous code in, Django would, again, block the whole process, run a synchronous query, and pull it back, and that's not what you want. So this stung me for a while, but then one of the nice things is that you probably shouldn't be doing this anyway. And so if you do a query like this, again, the way the async get function is written is, you know, still being decided, but imagine it was written like this: this would error. Basically, if you are trying to access a foreign key you didn't fetch, in future, if you are in async mode, and only in async mode, Django will throw an error saying, hey, you can't do implicit access during async mode, please use select_related. Because what you can do is this. And honestly, you probably should be doing this. There are very few edge cases where you don't want to select_related on a query where you're pulling foreign keys. So in many ways, this isn't too much of a problem to have, because we're kind of encouraging the right thing to do anyway. Now, of course, we'll still allow attribute access in synchronous mode; that will work perfectly fine. We can tell inside functions what mode the outside is in, at least. But in asynchronous mode, you kind of have to select_related. There are some very, very terminal edge cases where Django's ORM and query logic doesn't handle select_related right. And for those, I apologize: you may eventually have to break out and do your own separate query inside a for loop. But for 95% of all cases, I think you'll find that being forced to use select_related is probably a good thing.
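One way to picture the guard described above is a descriptor that lazily loads in synchronous code but refuses to do so while an event loop is running. This is only a sketch under stated assumptions: `LazyRelation`, `Post`, and `load_related` are hypothetical names, the "database" is a stub, and detecting async mode via `asyncio.get_running_loop()` is an assumption for illustration, not a description of Django's internals.

```python
import asyncio


class LazyRelation:
    """Hypothetical descriptor standing in for a lazily loaded foreign key."""

    def __init__(self, name):
        self.name = name
        self.cache_name = "_cached_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.cache_name in instance.__dict__:
            # Already fetched (the select_related case): no I/O needed.
            return instance.__dict__[self.cache_name]
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop in this thread: synchronous code, lazy load is fine.
            value = instance.load_related(self.name)
            instance.__dict__[self.cache_name] = value
            return value
        raise RuntimeError(
            f"Implicit fetch of '{self.name}' in async code; prefetch it instead."
        )


class Post:
    author = LazyRelation("author")

    def __init__(self, prefetched_author=None):
        if prefetched_author is not None:
            self.__dict__["_cached_author"] = prefetched_author

    def load_related(self, name):
        # Stand-in for a blocking database query.
        return f"loaded-{name}"


async def read_author(post):
    return post.author


print(Post().author)  # loaded-author (sync code may lazy-load)
print(asyncio.run(read_author(Post(prefetched_author="alice"))))  # alice
try:
    asyncio.run(read_author(Post()))
except RuntimeError as exc:
    print(exc)  # the implicit fetch is refused inside the event loop
```

The prefetched case never touches the stub query, which mirrors why fetching relations up front sidesteps the problem entirely.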
And in fact, I'm almost tempted to have this be an option we can just turn on all the time, where Django just doesn't let you use implicit attribute access at all and you always have to use select_related, because I can think of several projects I've worked on where I have made that mistake any number of times, just a huge number of times, and pulled up the Django debug toolbar and said, why are there 400 queries? Oh, I'm doing the query inside the loop. So that's kind of the thing we're going for there. So that's kind of the bad stuff. Let's go on to the good stuff. There are nice analogs, in the asynchronous world, of some of the things we're used to in the synchronous world. Maybe my favorite one of these is iteration. Now, when you call for, or any other sort of iteration primitive in Python, on an object, it again looks for a special method, in this case double underscore iter. And the nice thing is that there's an async version of this called double underscore aiter. And so on querysets, which of course are most of the things you're iterating over in the ORM, we can supply both. And when you do async for, which is the async version of for, it can call the aiter method rather than the iter method, and we can give you that lovely full asynchronous path to the database and not block the process. So what this means is that if you're writing asynchronous code, again, you can basically just take your synchronous for loop, make sure you put select_relateds in the main query, and then change for to async for. And generally, it'll mostly work like you want. And this is one of the really nice features. There's other things like async with, and other things like that too. But this is maybe the most important one for the ORM, because, thinking about it, again, API design is crucial, right? We want to make sure that the API is designed in a way that people appreciate. This is part of that.
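The dual iteration protocol mentioned above can be sketched with a small class that supplies both `__iter__` and `__aiter__`. `RowSet` and `collect` are hypothetical names, and the rows live in memory; the `asyncio.sleep(0)` is only a stand-in for awaiting a database.

```python
import asyncio


class RowSet:
    """Hypothetical stand-in for a query set; rows are held in memory here."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Used by a plain `for` loop.
        return iter(self._rows)

    async def __aiter__(self):
        # Used by `async for`; an async generator satisfies the protocol.
        for row in self._rows:
            await asyncio.sleep(0)  # stand-in for awaiting the database
            yield row


async def collect(rows):
    # The only change from the synchronous version is `async for`.
    return [row async for row in rows]


rows = RowSet(["a", "b", "c"])
print(list(rows))                  # ['a', 'b', 'c']
print(asyncio.run(collect(rows)))  # ['a', 'b', 'c']
```

The same object serves both kinds of caller, so no separate name is needed for iteration.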
And so I think it's really nice to have the core looping-through, iteration part of Django be honestly pretty similar between both worlds. So let's talk a bit about the plan to make all this work. And at the top of this, I want to put a big disclaimer. This is a plan. We, I mean, I and a few others, have started roughly planning this. Not much code at all has been written yet; I think none of it's been committed. This is all subject to change. But this is kind of based on what I've been poking around with and what I think the best path forward is. So again, let's look at some phases. You can tell I've been doing principal engineering for too long: I now do project planning in phases. Three phases here again, too. The first one is to have an asynchronous model API. So in my opinion, the key thing is to get the API design locked down so you can start writing apps using it, even if what's behind the scenes, as you'll see in a second, is not quite the same as being pure async. The idea is that you can write async views with async ORM code in them, and it all looks and works nicely and, crucially, is safe. We'll get on to safety a bit later, transactions especially. Secondly, once we've made that sort of top level async-friendly, we then go into the core guts of the Django ORM, which is the query internals and the compiler and other things like that, and we make those async-friendly. Now, the nice thing is that a lot of that stuff is CPU-bound. The compiler doesn't need to be told much about async at all. Mostly it's the path through the query, from the queryset into the connection, that we need to make async. I'll show you a diagram of that in a second. And then finally, once we've made that top part all asynchronous, we then have a look at the database adapters. Now, as you'll see in a bit, we don't have to make those asynchronous.
But if you want the best performance, and some databases already support this, you do want a native asynchronous database layer as well. So let's diagram how this looks. Phase one, as I said: you take those user-facing APIs, which are generally the base model class, the queryset class, and the manager class, or the base manager class that all things come off of, and you make those have async variants and async understanding. So things like aiter, like I said, for looping over in for loops, async get, async update, async create, basically async versions of all the operations that cause a change to the database. Now, for queryset, this is surprisingly few things, because whenever you do filter or other things in a queryset chain, they're all lazy. We don't have to change those. The only time a queryset does anything is when you finally iterate over it. So in fact, queryset mostly stays untouched. There's a few things in there that do cause an actual evaluation, I think values_list does it, for example, but most things don't, so we can keep those all the same. Not much need for async namespacing everywhere in queryset. So that's phase one. And as you can see here, the rest of Django below those initial user-facing things is running threaded. And essentially, this is kind of a version of what you can do today if you use async views. You have to basically take your ORM code and wrap it in the sync_to_async wrapper that we provide as part of Django. And this is going to be doing that in a more formalized, more safe way, where rather than having to make a separate inner function, decorate it, and feed it in properly, you can just call the ORM in the way you think is correct from an async world, and Django will handle the async handoffs, making sure the threads are good and making sure everything runs correctly.
It is, of course, still threaded behind the scenes, but if you're doing, say, ten database queries in parallel, Python's threading overhead isn't going to hurt you very much. You're still going to get massive increases. Imagine if what your site does is three or four medium-length complex queries that don't depend on each other. In the future, you can literally pull all of those and run them all in parallel in an async view, and then get back the result in a quarter of the time. And that's the kind of real benefit you can see. And that's honestly one of the reasons I think the ORM is the place where async is really going to shine. But let's talk about what we do after we've got this one running. So this is probably the first step, and it probably will be the first thing in a Django release by itself. And then we need to go in and make it a bit deeper. So we bring down the level where the async boundary lies and make it the other side of the query. So now all of Django's internal query logic and execution and mapping is done with asynchronous friendliness in mind. It may not be pure async, but it understands how things interact. And then the only thing that's threaded is the database adapters, the things you sort of type into your database settings, those bits of code like, oh, here's the MySQL adapter, here's the Postgres adapter, and so on. Those will still run sync. And then finally, we will pull the boundary all the way down and make the whole thing asynchronous. Now, there is a big asterisk on this one. The very bottom of this diagram right there is not controlled by Django. The database libraries are third party, and we have to make sure there are good enough, mature enough third-party database libraries we can write against to make this work. That's one of the reasons we know we are never going to be able to drop threaded support entirely.
In fact, in many ways, we may never get to phase three, and that's perfectly okay. Most of the performance gains you're going to see from the async ORM are still present when it's fully threaded. And I expect that we will get full async adapters in time, maybe not for Oracle, but certainly for Postgres and MySQL, and SQLite potentially, though its single-file threading is a little different. And one of the other reasons this is difficult, too, is that Python has a database API standard. It's called DB-API 2. It's been with us for a very long time now, and it's great. But the problem is that, like WSGI, it's not ready for an async world, and honestly, for very good reasons, most database library authors already have too much to do on their plates and don't want to have to talk about another kind of standard, then implement it, and then all agree on how it works. And so this is kind of what underlies all of this. So at least in the short to medium term, it's going to be Django's job to wrap that up for you. I do expect we'll probably ship, at least in the medium term, a pure asynchronous backend, probably on a different library from the ones we have right now. For example, for Postgres there's a couple of nicely maturing asynchronous libraries for talking to Postgres, but they're not psycopg2. And so we would have to go and change a lot of our code and the assumptions about how psycopg2 works. And if you have looked at Django's bug tracker for any length of time, you will know that most of the bugs are in weird database edge cases. It is our job to fix those, and we don't want to just throw away those fixes. That's kind of where it becomes, you know, interesting. But I want to reiterate this: databases via threads are not terrible. Threading isn't bad.
It's not going to scale indefinitely, but if you're just booting them up to run a query and shutting them down again, you can have a pool of ten to 20 threads that significantly speeds up your application's performance. And of course, in async mode, you can do queries and also do API calls. In the modern world of very heavy microservice architectures, having asynchronous HTTP or gRPC or other calls is incredibly useful as well. In my experience, most of the time in large software designs, if you are a big company, is spent waiting on other things. So as you get bigger, that gets more important. But even as a small shop, I think the ORM will really shine where you can do queries quickly, understand them, and see how they go. Now let's talk about the one final sort of fly in the ointment here, and that is transactions. Transactions are very tricky. And the reason for this is that transactions run on database cursors, and database cursors are one per thread. Part of Django is a big thread-local in the middle of it. If we are going to support transactions, we have to be able to support them all in one thread. And this is a problem when you're going from the async to the sync world internally. When you go from an asynchronous context to a synchronous context, Django boots up a separate thread, runs stuff in that separate thread, and then brings it back to the main asynchronous thread when it's done. What that means is that we can't share transactions across that boundary. So I'm not quite sure what the solution here is, but it's quite likely that when you use transaction.atomic, it won't cross over asynchronous/synchronous boundaries. And we're going to have to work out how to signal that so it's pretty safe if there's an error, maybe having to wrap it in a certain way, maybe a different version of atomic that is async-compatible. But that's kind of one of the key things: it's going to really suck unless we find a good way of doing it.
There is some promising research into getting around this from some other database libraries that I want to look into. But for now, I expect this will be maybe the main sticking point of the first part of that conversion, of making an async-friendly API. The other thing to mention is that some things just don't need async. I'm honestly very glad that migrations don't need it. They run once, in the background, synchronously. They do not need performance improvements in that sense; it all runs in serial anyway. So we're going to leave those bits well alone. Introspection is another thing. A lot of the internals of fields, like conversion to and from Python, are also the same. Honestly, the surface area that we have to make async isn't too bad. It is very tractable, or I wouldn't be here giving this talk saying it's totally possible, but it is still a lot. So thankfully, we can ignore some pieces. Again, you know, we can ignore things like forms right now and [unclear] views, but other things we have to tackle. So what's first? What is the first thing that we're working on here? Well, before those phases, we need to decide on what the async API design looks like. And I'm not a fan of designing in a vacuum. I want to do it with the real thing, with a concrete thing to play around with. As we do that, we also need to grapple with transactions. Again, that's kind of part of the API design, making sure it fits and works well. And finally, I need to figure out the thing that underlies transactions, which is async connection management. How do we pool things, and have multiple connections in one async thread, or multiple separate connections in separate threads? Do we share them across sync threads? There's a whole world of things. Some adapters don't like some of these things. SQLite is especially sensitive to this. It's kind of a pain. But on top of all of this, the thing to remember is that async is only important for I/O-bound code.
This is one of the reasons we're doing the ORM. It's not important for things that are CPU-bound, doing computation; Python is still putting them on a single core behind the scenes. But for things that are I/O-bound it is super important, and that's why the ORM is a huge part of this. Django is the ORM. I've used the ORM so much over my probably 15 years at this point of using Django, and it is a huge win for us to make it async, especially if it's easy: the ability to write a query quickly and easily that can just run in parallel, and it's safe, and you get the results back. To me, that's the promise of what async Django is. And so I'm hoping that's what we get. There's a lot to come. I'm not going to commit to anything being in 3.2, but I'm soft-aiming for at least that first phase to be in there. But of course, we live in interesting times, so we'll see. But yeah, thank you very much for listening to this. I hope you're as excited about async Django as I am, and hopefully I'll see you soon somewhere to talk about async Django. And if you're interested in helping out with this kind of stuff, please come to the Django forums. We're happy to discuss it. We have a whole async forum there to discuss things. But until then, thank you very much, and I'll see you around.
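Latest Summary (Detailed Summary)
Overview / Executive Summary
In his talk at PyCon Australia 2020, Django core developer Andrew Godwin laid out an ambitious plan for bringing asynchronous support to Django's object-relational mapper (ORM), the next key step after Django 3.1 shipped async views. The goal is not to turn Django into an async-only framework but to build a "hybrid" model in which synchronous and asynchronous code coexist in one project. Developers would use async only in performance-critical, I/O-bound paths (database queries, external API calls). For example, running four independent database queries in parallel could cut the response time to roughly a quarter.
Godwin proposed a three-phase rollout with clear priorities. First, ship a user-friendly async API (for example aget(), acreate(), and async for; the exact names were still undecided) whose underlying synchronous database operations run in threads, so that the API design can be locked down quickly and developers benefit early. Second, push async support into the ORM's internals, such as query construction and the compiler. Third, support native async database drivers for the best performance, which depends on the maturity of third-party libraries.
The talk highlighted several key technical challenges:
1. API namespacing: Because of how Python's async is designed, async operations need their own names (such as aget()); you cannot simply await the existing synchronous methods.
2. No implicit I/O: In async mode, lazy loading of related objects through attribute access (such as my_obj.foreign_key) will be disallowed, which forces developers to use select_related. This is both a technical constraint and, in Godwin's view, better practice.
3. Transaction management: Keeping a transaction atomic across the async/sync boundary is the hardest open design problem, because transactions are tied to thread-bound database connections.
Ultimately, the plan aims to keep the Django ORM safe and easy to use while letting developers run database queries in parallel, significantly improving application performance and responsiveness.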
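Background and Goals: Toward a "Hybrid" Async Django
With Django 3.1 released and supporting async views, the community's next major goal is to add async capabilities to the Django ORM. Andrew Godwin stressed that the effort is not about turning Django into a purely asynchronous framework but about creating a "hybrid async world".
- Core idea: Synchronous and asynchronous code coexist. Developers keep writing simpler, more direct synchronous code for most business logic and switch to async only for performance bottlenecks or code that spends a lot of time waiting on I/O.
- Expected performance gain: The value of an async ORM lies in handling I/O tasks in parallel. For example, four independent database queries that used to run one after another could run in parallel in an async view, in theory cutting the total time to about a quarter.
- Importance of the ORM: The ORM is the largest and most complex part of the Django codebase. Adding async support to it is the most challenging and also the most valuable step of the whole async plan.
“The ORM is honestly most of Django. This is by lines of code. It's certainly by complexity and understanding. It is a big, hairy beast sitting in the center of what we know and love.”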
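The "quarter of the time" estimate can be illustrated with a small standard-library sketch. `slow_query` is a hypothetical stand-in for a blocking database query (it only sleeps), and the thread hand-off uses `loop.run_in_executor` rather than any Django API, so the timing is illustrative only.

```python
import asyncio
import time


def slow_query(label, seconds=0.2):
    """Stand-in for a blocking database query."""
    time.sleep(seconds)
    return label


async def run_in_parallel(labels):
    loop = asyncio.get_running_loop()
    # Each blocking call goes to the default thread pool; gather awaits them together.
    futures = [loop.run_in_executor(None, slow_query, label) for label in labels]
    return await asyncio.gather(*futures)


def timed(labels):
    """Run all queries concurrently and report the results and wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(run_in_parallel(labels))
    return results, time.perf_counter() - start


results, elapsed = timed(["a", "b", "c", "d"])
print(results, round(elapsed, 2))
```

Run one after another, the four stub queries would take about 0.8 seconds; run concurrently, the elapsed time should be close to that of a single query, provided the queries really are independent.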
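Three-Phase Implementation Plan and Priorities
Godwin proposed an incremental three-phase plan whose core idea is to deliver the user-visible API first and deepen the underlying implementation afterwards.
- Phase 1 (highest priority): Asynchronous User-Facing API
  - Goal: Provide async versions of the ORM's common operations, such as aget(), acreate(), and aupdate(), plus async iteration via async for.
  - Implementation: The API is async at this stage, but the underlying database operations still run synchronously in separate threads through Django's built-in sync_to_async adapter.
  - Benefit: The API design can be locked down quickly, so developers can start writing and testing async ORM code right away and gain from parallel queries without waiting for the lower layers to become fully async.
- Phase 2: Asynchronous Query Internals
  - Goal: Push async support into the ORM's internals, including the logic that builds, compiles, and executes queries.
  - Implementation: Move the async/sync boundary down from the user API layer to the database adapter layer, so that most of the path from QuerySet to database connection is "async-friendly".
  - Effect: Further reduces thread-switching overhead and improves internal processing efficiency.
- Phase 3 (long-term goal): Native Asynchronous Database Adapters
  - Goal: Replace the thread-based synchronous drivers with native async database drivers to get a fully async end-to-end path and the best possible performance.
  - Challenges and dependencies: This phase depends heavily on the maturity of third-party async database libraries, such as asyncpg and similar libraries emerging for PostgreSQL. The main difficulty is that Python has no official async database API standard comparable to the synchronous DB-API 2, so each library differs and Django's adaptation work grows.
  - Practical considerations: Godwin made clear that even if phase 3 is never fully achieved for every database backend, completing the first two phases alone brings large performance gains. Thread-based database access is "not terrible" and is sufficient for most applications.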
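Phase 1 can be pictured as a thin async-named method that hands the existing synchronous call to a worker thread. The sketch below is an assumption-laden illustration: `ToyManager` is hypothetical, the `aget` name follows this summary (the talk says naming was still pending), and the standard library's `run_in_executor` stands in for Django's sync_to_async adapter.

```python
import asyncio
import functools


class ToyManager:
    """Hypothetical manager exposing a sync method and its async-named twin."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, pk):
        # Synchronous path: stand-in for the existing blocking ORM call.
        return self._rows[pk]

    async def aget(self, pk):
        # Async path: same logic, run on a worker thread so the event loop is not blocked.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, functools.partial(self.get, pk))


manager = ToyManager({1: "first"})
print(manager.get(1))                # first
print(asyncio.run(manager.aget(1)))  # first
```

Keeping the synchronous method as the single source of logic is what lets the later phases move the thread boundary lower without changing the public names.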
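Key Technical Challenges and Design Decisions
Retrofitting async support onto a large, carefully designed synchronous API runs into several challenges rooted in the language design and the framework's history.
- API namespacing: why aget() rather than await get()
  - Reason: Calling a Python async def function returns a coroutine object, whereas an ordinary function returns the actual result. A function cannot choose its return type based on whether its caller is synchronous or asynchronous.
  - Decision: To preserve backward compatibility, every async operation needs its own explicit name. The async version of Model.objects.get() would therefore be named something like Model.objects.aget() (the exact naming was still undecided).
  - Side benefit: Explicit names help code review. Async calls are easy to spot, which guards against the performance problems caused by accidentally calling a blocking synchronous function.
- No implicit I/O: mandatory select_related
  - Problem: The Django ORM lets you lazy-load related objects through attribute access (for example post.author.name). If the author object has not been preloaded, Django issues a new database query in the background. In an async context, this implicit synchronous I/O would block the whole event loop.
  - Solution: In async mode, accessing a related object that was not fetched in advance via select_related or prefetch_related will raise an error.
  - Impact: Developers must declare the data they need when they write the query, which is already a widely recommended practice and helps avoid the "N+1 queries" problem.
  “...in asynchronous mode, you kind of have to select related. ... I think you'll find that being forced [to use] select_related is probably a good thing.”
- Async iteration: graceful support for async for
  - Advantage: Django will use Python's native async iteration support (the async for syntax and the __aiter__ special method), which mirrors for and __iter__ in synchronous iteration.
  - Implementation: A Django QuerySet can implement both __iter__ and __aiter__, so iterating over query results looks very similar and natural in synchronous and asynchronous code.
- Transaction management complexity (Transactions)
  - Core difficulty: A database transaction is usually bound to a single database connection (or cursor), and in Django those connections are thread-local.
  - Direct consequence: When code switches from an async context to a sync context (by running in a new thread), the original transaction state is lost. The standard transaction.atomic therefore cannot span the async/sync boundary.
  - Possible approaches: The solution is still being explored. It may require a new transaction API designed for async, or a way to carry transaction state across threads. This is currently the hardest design challenge.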
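The thread-local problem behind the transaction challenge can be reproduced with the standard library alone. In this sketch, `state`, `begin`, and `in_transaction` are hypothetical names that only mimic per-thread connection state; no database or Django API is involved.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-thread connection state, similar in spirit to a thread-local.
state = threading.local()


def in_transaction():
    """Report whether the current thread believes a transaction is open."""
    return getattr(state, "in_transaction", False)


def begin():
    """Mark a transaction as open on the calling thread only."""
    state.in_transaction = True


begin()  # "open a transaction" on the main thread
with ThreadPoolExecutor(max_workers=1) as pool:
    seen_by_worker = pool.submit(in_transaction).result()

print(in_transaction())  # True on the thread that opened it
print(seen_by_worker)    # False on the worker thread
```

Any work handed to another thread starts without the transaction state, which is why an atomic block opened on one side of the async/sync boundary would not cover queries executed on the other.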
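Parts That Do Not Need Async
The speaker made clear that not every part of Django needs to become async.
- Django migrations: Migrations are one-off tasks that run serially in the background and do not need the performance gains async brings.
- Internal conversion of model fields, database introspection, and similar: These CPU-bound or infrequent I/O operations likewise need no changes.
Conclusion and Outlook
Adding async support to the Django ORM is a complex but important piece of engineering. The end goal is to let developers write high-performance web applications with little effort, cutting response times substantially by running database queries in parallel, safely and conveniently.
- Core value: The real strength of an async ORM is running database queries in parallel safely and conveniently, and this is where Django's move to async delivers the most value.
- Next steps: The immediate priorities are settling the async API design, solving transaction management, and designing async connection management.
- Timeline: Andrew Godwin said he was "soft aiming" to have the first phase in Django 3.2, while stressing that this was not a firm commitment.
- Community involvement: Andrew Godwin encouraged interested developers to visit the async section of the Django Forums, join the discussion, and help move Django's async ecosystem forward.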