2025-06-05 | DjangoCon Europe 2025 | How to get Foreign Keys horribly wrong in Django
Handling foreign keys and database migration optimization correctly in Django
Tags
Media details
- Upload date
- 2025-06-21 18:43
- Source
- https://www.youtube.com/watch?v=l1xi_yKnhbE
- Processing status
- Completed
- Transcription status
- Completed
- Latest LLM Model
- gemini-2.5-pro
Transcript
speaker 1: Don't be intimidated by the "1 out of 106" slide count. Okay, so first of all, I'm very happy to be here. I've been a member of the Django community for about eight or nine years, I've built successful products using Django and other open-source technologies, and I write about Django and talk about Django. So thank you all for coming. My name is Haki. I'm a software developer and a team lead, and today I want to talk to you about foreign keys. That's the topic, but as you'll see, I'm going to use it as an excuse to talk about other stuff as well. The basis of our talk is a simple Django app: a product catalog. The first model is a category, basically an ID and the name of a product category in our catalog. Next we have a product: an ID, a name, a description, and then a foreign key to the category, which is appropriate, because this is a talk about foreign keys. Next is the sort order of the product inside the category: when we display the catalog UI, we want featured products displayed first and less popular products at the bottom. To make sure we don't repeat a sort order within a category, we have a unique constraint at the top; you can see unique_together on category and sort_order. Next, we keep some audit information about the product, who created it and who last edited it; those are the two foreign keys at the bottom. Since this is a talk about foreign keys, I just want to highlight that we have three of them: one to the category and two to the Django user model. So what we're going to do now (yesterday they told us we need to do more code reviews) is run a code review of this very, very simple app, and in the process we'll see a lot of little things you need to pay attention to when building Django models. Just to get us warmed up: if any of you saw that unique_together and felt uncomfortable in your seat, it's for a good reason. If you go to the documentation, you'll see that unique_together is soon to be deprecated (I don't know when, but probably someday, hopefully soon) and that you should use UniqueConstraint instead. This is a simple change, so let's do it together: we go to our model, drop the unique_together, and replace it with a UniqueConstraint. It's basically the same thing. So the first takeaway, an easy one to start with: don't use unique_together; use UniqueConstraint. You get better control, and the documentation encourages it. Now let's have a look at our schema. This is our Postgres schema, psql's \d on the product table. Does anything stick out to you? I encourage you to answer; you can yell at me, it's fine, it's part of the process. Let me give you a hint... That's right: we have two indexes prefixed by "category". So when we query by category, Postgres can use either of the two indexes, which means maybe we can drop one of them. The first index is the unique constraint we just created, and dropping a unique constraint is not a good idea, because it's not used just for performance: it's also used to enforce a constraint.
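(For reference, a minimal sketch of the catalog models as described, with unique_together already replaced by a UniqueConstraint; the constraint name, on_delete choices, field sizes, and related_name values are assumptions, not from the talk.)

```python
from django.conf import settings
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)


class Product(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    sort_order = models.PositiveIntegerField()
    # Audit fields: who created and who last edited the product.
    created_by = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="products_created",
    )
    last_edited_by = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        related_name="products_edited",
    )

    class Meta:
        constraints = [
            # Replaces unique_together = ("category", "sort_order").
            models.UniqueConstraint(
                fields=["category", "sort_order"],
                name="unique_product_sort_order_per_category",
            ),
        ]
```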
So what about the next index? It's not that clear where it came from. It came from defining a foreign key. We also talked about this yesterday in Adam's talk: when you create a foreign key, Django implicitly adds an index on it. Most of the time this is a good idea; it's a good default, and if I had to make this decision I'd probably come to the same conclusion. But it's not always a good idea. If you continue reading, you can see that you can control this behavior by setting db_index to False. So let's go ahead and do that. We go back to our model and add db_index=False, and just to leave something for the next person, I also add a comment explaining that this field is covered by a unique constraint, which is the reason I decided to ditch the index. Now we generate the migration. And what comes next? Apply the migration? Stop right there, my friend. Whenever we apply a migration, we first need to look at the migration. This is the migration we just generated, and it looks harmless, very innocent: we just alter the field and set db_index to False. What can possibly go wrong with such a simple migration? So we take the next step and look at the SQL generated by this migration. What do you see? What do we actually want to do? We have a foreign key constraint and we want to drop the index. What is Django doing? Django is recreating the foreign key without the index. Is this what we want? Absolutely not. Why is this bad? First, dropping an index is a potentially blocking operation: if we have a very large index and we drop it, it blocks. Second, and possibly more problematic, recreating the foreign key requires validating the foreign key, and validating the foreign key is both blocking and potentially very, very slow. On huge tables this can take a lot of time, and we don't actually want to validate the foreign key; the foreign key is fine. We just want to drop the index. So let's actually do that, because apparently Django can't identify this situation by itself, so we need to help it. To understand how we can change the way Django executes things in the database (and I'm actually happy I'm not the first one to show you this), we need to understand how migrations work. When we run makemigrations, Django takes all your existing migrations, applies them in order, and generates a state. It compares that to the state of the models, the ones you have in your code, and generates migrations from the difference; these are the migration files containing the operations. Finally, when we run migrate, Django takes those operations, converts them to SQL, and applies them to the database. So if we want to provide different SQL than what Django generates, we can use a special operation called SeparateDatabaseAndState. You've seen it yesterday, so let's see how it works. This is the change we're going to make: instead of the AlterField Django suggested, we use SeparateDatabaseAndState. Let's break it down.
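(A sketch of what that hand-tuned migration might look like; the index and migration names are assumed. The reverse_sql shown here is the rollback path the talk adds a step later.)

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("catalog", "0001_initial")]

    operations = [
        migrations.SeparateDatabaseAndState(
            # State: exactly what `makemigrations` generated, so Django's
            # in-memory model state stays in sync.
            state_operations=[
                migrations.AlterField(
                    model_name="product",
                    name="category",
                    field=models.ForeignKey(
                        to="catalog.category",
                        on_delete=models.CASCADE,
                        db_index=False,
                    ),
                ),
            ],
            # Database: only the statement we actually want to run.
            database_operations=[
                migrations.RunSQL(
                    sql='DROP INDEX "catalog_product_category_id_idx";',
                    reverse_sql=(
                        'CREATE INDEX "catalog_product_category_id_idx" '
                        'ON "catalog_product" ("category_id");'
                    ),
                ),
            ],
        ),
    ]
```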
SeparateDatabaseAndState lets us execute one thing on the state and a different thing on the database. The state operations are the easy part: just take whatever Django generated and stick it in there, exactly as generated. Next come the database operations: this is what we actually want to execute in the database, and in our case all we want is to drop the index. So let's check the SQL generated by the migration now, and we can see it's exactly the SQL we said we wanted. So the next takeaway: always check the SQL generated by a migration before you execute it, because who knows what you'll find there. Maybe it's recreating foreign keys; I don't know. Now we apply the migration and get the SQL we want. But something is missing: what if we made a mistake and want to roll back? Django has a very nice feature that allows you to unapply a migration. If we try to unapply ours, we get a nasty error saying we can't, because the migration is not reversible. This is an easy fix. We used RunSQL, and its second argument is the reverse SQL. If the forward SQL drops the index, the reverse creates it. Where do we get that SQL from? From the migration that added the index, in this case the initial migration. If we reverse the migration now, we can see it works. Beautiful. So the next takeaway: whenever possible, provide reverse SQL for any operation; who knows when you'll need it. Now, what if we have a very busy system? We said that dropping an index is a blocking operation. On a busy system with a very big index, dropping the index can cause problems for live traffic. One thing we can do is drop the index concurrently, but there's a catch we need to understand. DROP INDEX CONCURRENTLY drops the index without obtaining exclusive locks, which means it doesn't interfere with ongoing traffic. It can take a bit more time, but I think it's worth it on a busy system. The next catch: Postgres implements something called transactional DDL, which is a great feature; it lets you run a bunch of schema changes (alter, drop, create) and commit them at the end. The thing is, concurrent operations cannot be executed inside an atomic block, so you need to turn atomic off, which means that if you have a lot of operations in your migration and something fails in the middle, you risk leaving your system in an inconsistent state. If your migration is just one operation, it's fine: set atomic = False, no worries. If it has many operations, it might be a good idea to split the non-atomic parts into a separate migration. So our next takeaway: if you have a busy system, use concurrent operations for dropping and creating indexes. And this is our final migration. It took some work, but we made it: it's safe, it's reversible, everything we need. So now that we're all foreign key experts: we have another foreign key, created_by, which points to the user. Do we need the index on this field? What do you think?
What if I told you that created_by is used mostly for audit purposes? Nobody asks "give me all the products that Haki created"; given a product, they want to know that Haki created it, but nobody runs SELECT * FROM product WHERE created_by = haki. So what do you say, can we drop this index? It was created implicitly by Django; we never made the decision explicitly anyway. Can we drop it? Okay, I see yes, I see no, you're all very quiet, so I'll just say: sure, let's drop it. Now, before we drop an index on a foreign key (and some of you caught this), we need to stop and think for a second: how exactly are indexes on foreign keys used? The first use is straightforward: when you run SELECT * FROM product WHERE created_by = haki, the index is used. We said that's not our case. The other occasion where a foreign key index is used is less obvious: when you delete the related object, a user for example, the database and Django need to make sure nobody references it, so they issue queries to check that nothing references the object. What you see right here is what we call "written in blood", because we hit this scenario in our system, in a process where we deleted users. Django made the unfortunate decision to implement registration on top of the user model, so if someone creates a user and never validates the email, you accumulate users with unvalidated emails. We didn't want that; it's a pretty big table and you don't want trash in there. So we started deleting users after a while: if you didn't validate your email within a week, we deleted the user. But then we started getting very strange errors: the delete commands started to time out. Why would they time out? It seems so simple: nobody ever used this user, it's indexed, what's the problem? When we investigated, we came to this conclusion. Let's see a demonstration. I created a user and called him haki. Nobody references this user. Now let's delete him. I turn SQL logging on and delete the user, and look how many things happen behind the scenes. First you can see two queries looking for products referencing the user via either created_by or last_edited_by; notice these are exactly the kind of queries that can use the index on our foreign key. Then a transaction opens and starts deleting rows from other tables we didn't even think about that also reference the user: internal Django tables for the admin log, user groups, permissions, and so on. And only at the end do we actually delete the user. Now I want you to see something interesting: look how long it took to commit this transaction, around 300 milliseconds. Why do you think the commit took so long? It might be that, but there's another reason. Django was nice enough to do all of these checks itself: if you delete something from the admin, you get a breakdown of what's going to be deleted, and when you delete something in Django, you get back a dictionary saying which objects got deleted.
But with all due respect to Django, when I delete something, Postgres also needs to verify all of these things, because Postgres is the one guaranteeing that the foreign key holds. So looking at the timings, I'd say what happens during commit is basically all of these checks again, on a warm cache. We do all of these checks twice when we delete something. So the next takeaway: indexes on foreign keys are also used indirectly when we delete related objects, in this case a user. Even if we never use them directly, they may be used indirectly when the related object is deleted. So we're going to keep this index, but we're going to do it explicitly, and add a comment explaining why we decided to keep it. The next takeaway, and I can't emphasize this enough: always explicitly set db_index on a foreign key, and add a comment explaining why. I went to the extent of adding a Django check to make sure db_index is always set explicitly; otherwise you fail CI and someone comes to yell at you. Now let's move on to the next index. What do you say, can we ditch this one? We have an index on last_edited_by. It's nullable, and we don't care about the users who edited a product; it's mostly for audit purposes. I see someone shaking their head; thanks for the participation, man, you're great. We don't filter by it, but we already know it's used when we delete the associated object, in this case a user. So we're keeping it. But we're not done with this index. Once again, a demonstration. This time I'm going to add some data to my tables: 100 users, 50 categories, and a million products. By the way, one cool thing I learned preparing this demonstration is that Django has a lorem ipsum generator that produces chunks of text, very useful for stuff like this. If you notice, I generate a million products, and then one in every 1,000 products has actually been edited. This mimics real life: most products are never edited, only some are. Adam, you know where this is going; I think you do. Now let's look at the index sizes in the database. There are actually two interesting things going on here. The first is a bit off topic, but I'll mention it because I think it's interesting and I have an extra ten minutes. We have an index on the primary key, an index on an integer, at 21 MB, and then the indexes on created_by and last_edited_by, also on integers, with the same number of rows. How come the primary key index is 21 MB while the others are just 6 MB? I'm sure at least some of you know; you can yell the answer. Exactly, my friend: starting with Postgres 13, Postgres implements an optimization called index deduplication that stores duplicate values in indexes much more efficiently. The main difference is that the primary key is unique, so it can't benefit from deduplication, but created_by and last_edited_by contain lots of repeating values: we have just 100 users and a million products, so lots of duplication.
Postgres can optimize those, hence the difference in size. But that's not the point here. What is the point? Let me give you a hint: we have two indexes on the user fields, and both are the same size. Yeah, just shout it out... no, no, don't tell him the answer, man. Okay, should I stop now? If we look at the actual data, we see a million values actually set in created_by: it's a non-null field, and we set it whenever we create a product. But most products are never edited, so in last_edited_by most values are NULL; we only have 1,000 values, a tenth of a percent. So if most of the rows don't actually contain a value, how come these indexes are the same size? This came as a surprise to me; I've been a DBA for like 20 years, and it was a shock when I transitioned from Oracle to Postgres: in Postgres, NULL values are in fact indexed. This is a big surprise for people coming from other databases; I never imagined NULLs would be indexed, because NULL effectively means "no value". But in Postgres, they are indexed. Most of the time we don't care about NULL values: nobody wants to search for all the products that have never been edited, and if they do, and that's 99% of the table, the index probably can't be used anyway; it doesn't make sense. So we can use something called a partial index, an index built over a subset of the table. Which subset, you ask? The one we actually care about. In this case, only the rows that actually have a value, the ones that are not NULL. So first we set db_index to False and add a comment saying the field is indexed in Meta (look it up and you'll see where). Then we add an index on last_edited_by with a condition: only where last_edited_by is not NULL, only where we actually have a value. Now if we create this index and compare the sizes again: look, this is amazing, from 6 MB down to 32 KB. You're probably asking: come on, man, I can spare 6 MB, I want to go to lunch, why all this work? Dude, don't be cheap; it's a fraction of a cent. But if you have very big tables, your B-trees can become huge, and the bigger they get, the harder it becomes to recreate these indexes. If you know in advance that you're not interested in a portion of the table, it's a good idea to use partial indexes early on. So the next takeaway: use partial indexes where they make sense. I've personally found that nullable foreign keys are usually great candidates. So, are we done? Did we forget something? What did I forget? Right: we always need to check the migration. We just created one and didn't look at the SQL. So let's look: we have an AlterField setting db_index to False, and then we create the index on the product. And in the SQL, what do you see? We've been down this road before: we set db_index to False and Django is recreating the foreign key, which is a bad thing; we just talked about this. So let's take care of that.
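(A sketch of the model change just described: db_index=False on the foreign key plus a partial index in Meta; the index name is assumed.)

```python
from django.conf import settings
from django.db import models


class Product(models.Model):
    # ... other fields as before ...

    # Indexed by the partial index in Meta.indexes below.
    last_edited_by = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        db_index=False,
    )

    class Meta:
        indexes = [
            # Only index rows that actually have a value; here that is
            # roughly 0.1% of a million rows.
            models.Index(
                fields=["last_edited_by"],
                name="product_last_edited_partial",
                condition=models.Q(last_edited_by__isnull=False),
            ),
        ]
```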
Once again, the exact same trick: SeparateDatabaseAndState, with the create and drop of the index as the database operations and the Django-generated operations moved into the state operations. We can also do one more thing. We said dropping and creating an index are blocking operations, so we want to create the index concurrently and not interfere with our system. This time, if you notice, I was able to use the built-in drop-in replacements Django provides for concurrent operations. Previously I couldn't use RemoveIndexConcurrently, because my index wasn't defined in Meta.indexes; it was implicitly created by the foreign key, and if you try RemoveIndexConcurrently on such an index, Django tells you it doesn't know it. But this time I can actually use AddIndexConcurrently and RemoveIndexConcurrently. So this is our complete migration. What do you say? This is a tough one, I'll give you that. Look at the migration: what would you change? You're the reviewer now. Split it up? Interesting thought. Any other ideas? I know what you mean, because it's atomic = False and you can get stuck, but I have a solution for that. Yeah. Thank you, Adam, you get a gold star. Look at the order of the operations: first we drop the full index, then we create the partial index. As you just said, since we're atomic = False, if we drop the full index at that point there is no index, and if creating the partial index takes some time, we leave our system with no index at all; that can hurt if the system is actually using it. So what would you do? That's right, my friend: we just change the order. First create the partial index (at that point we have two indexes, which is fine), and then drop the full index. Splitting the migrations, as suggested, is possible, but you reach the same situation where you're left without an index; simply changing the order is less risky. So let's make the change: we move AddIndexConcurrently from the bottom to the top, and now we have a safe migration. Another takeaway: adjust the order of operations in a migration to reduce the effect on a running system; usually the idea is to create first and drop second. Okay, no more migrations, I promise. So many talks about migrations; I guess it's a big deal in Django. Let's actually see some code. We'll add a function implementing the process of editing a product. These are our requirements: update the name and description, keep track of the user who last edited the product, and implement a business rule preventing non-superusers from editing products created by superusers. You can translate this to whatever permission checks you have in your own systems. The naive implementation is a simple instance method on a product: you pass the name, the description, and the user; you do the check against created_by; you set the values; you save. That's the naive implementation.
Now, the problem with instance methods (you'll have to work your imagination a bit here) is that they're not safe when executed by many concurrent transactions at the same time. The main issue is that the copy each process holds in memory is not necessarily what's currently in the database. This is a very silly example of a concurrency problem, but if you need to maintain things like counters, it becomes a real one. If we want to make this safe under concurrent transactions, there are two prominent approaches: pessimistic and optimistic. This talk is not about concurrency, so we'll just go with the pessimistic approach, which goes like this: obtain a lock on the row so nobody else can operate on it, do what you need to do, update the row, and release the lock. In Django this translates to select_for_update to obtain the lock, then save, then commit, which releases the lock. select_for_update adds the FOR UPDATE clause to the SELECT statement. So our refined implementation has a few changes. First, in an instance method we don't control how the object was fetched; if we want to control how it's fetched, and actually obtain a lock on it, we can't use an instance method, so we switch to a class method. Instead of operating on self, we receive the ID as an argument. Then we open a transaction, fetch the product using select_for_update to obtain the lock, do our work, and save. Now that I've convinced you to use a class method, what can we optimize? I gave you a big hint, and I'll also ask: how many round trips to the database do we have here? Three, that's right. We fetch the product, that's one; we save, that's two; and in the middle, when we check the superuser flag, we access created_by, and Django goes and fetches that object from the database. When you see something like this in code review (it's the lowest-hanging fruit of optimization in Django), what do you do? select_related, that's right. So we add select_related. Now, 99.9% of developers would stop here, but there's a catch. Imagine two sessions. The first starts our editing process: it opens a transaction and fetches the product with select_for_update and our select_related. A second session, completely unrelated, might as well be another system, tries to set user 97 inactive. When it tries to save, it blocks. Why? Super strange, right? Yes, mister, that's right; you should have given this talk. If you look at the SQL generated by our function, select_related added an INNER JOIN to the user table, and with select_for_update, the FOR UPDATE clause locks all the rows referenced in the query. Since we have a join, that includes the user. Do we want to lock the user? Is there any reason to? No, that's right: no reason.
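(A sketch of the refined implementation at this point in the talk: a class method, a transaction, select_for_update plus select_related. The method name and the exact permission check are assumptions. Note the comment on the lock: this version still over-locks, which is exactly what gets fixed next.)

```python
from django.db import models, transaction


class Product(models.Model):
    # ... fields as before ...

    @classmethod
    def edit(cls, product_id, *, name, description, user):
        with transaction.atomic():
            product = (
                cls.objects
                .select_related("created_by")  # saves the third round trip
                .select_for_update()           # locks the product row... and,
                                               # via the join, the user row!
                .get(id=product_id)
            )
            if product.created_by.is_superuser and not user.is_superuser:
                raise PermissionError(
                    "Non-superusers cannot edit a superuser's product."
                )
            product.name = name
            product.description = description
            product.last_edited_by = user
            product.save()
        return product
```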
So there is an argument to select_for_update that lets you specify what you actually want to lock. In this case I want to lock just the product, so I can use the special keyword "self". I'm especially proud of this feature, because it's a contribution from one of my teammates. select_for_update(of=("self",)) translates to SELECT ... FOR UPDATE OF "catalog_product", so I obtain a lock on the product and can still select_related the user without locking it. So our next takeaway: always explicitly state which tables you want to lock with select_for_update. I'd even go as far as saying: if you're not using select_related right now, add of=("self",) anyway, so nobody comes along later, adds select_related, and screws up the locking you intended. So this is our final model. There is no AI that can write this model; only we can, only humans. There's a lot of thought in it. Now, I don't know if you noticed, but throughout this talk (it did feel a bit like a class) we had a lot of takeaways. I won't go over them, don't worry, but if you're interested you can read about all of this. I'll just highlight the first one, about using the Django check framework to enforce some of this stuff: making sure db_index is set, making sure of is set on select_for_update, and so on. And that's it. That's me. Thank you all very much.
speaker 2: My question is going to be about a few of the migrations. You had the explicit SQL, the CREATE INDEX and DROP INDEX. Do you see any benefit in using the AddIndex operation there instead of writing the raw SQL?
speaker 1: Well, yes: if you use the operations provided by Django, you get portability across different databases, and it's probably safer and more future-proof. I usually find that when I can use them, I do; for example, when we wanted to remove an index concurrently, I used the built-in operation. Only when I absolutely can't use them do I resort to RunSQL. Good question.
speaker 2: Thank you for an interesting talk. The question is kind of twofold. You mentioned reordering the operations inside a single migration. I still think, and I guess Tim will agree, that if you hit Ctrl+C or something during the migration and abort it, you're still in a somewhat inconsistent state that requires manual intervention. How do you propose to handle that? And the other thing: how do you handle multiple databases in general? CREATE INDEX CONCURRENTLY is nice, but it only works in Postgres. How can you write migration files that work for multiple databases and are still efficient on each one?
speaker 1: Okay, long question. As to the first part: if you're using an atomic migration on a database that supports transactional DDL, then if you abort in the middle, you're not left in an inconsistent state. Yeah, I know: if you're using a non-atomic migration, you need to be more careful about the order of operations. In our case, if you hit Ctrl+C at any point in the migration, at worst you're either left in your initial state or with two indexes.
So I would say that in this case it's safe.
speaker 3: The problem with interrupting CREATE INDEX CONCURRENTLY is that it leaves behind the partially created index. And if you've written it as CREATE INDEX CONCURRENTLY IF NOT EXISTS, it won't fail, but it leaves an unusable...
speaker 1: ...an invalid index.
speaker 3: And if you come back and you haven't got IF NOT EXISTS, it attempts to create the index, the index already exists and is invalid, and at that point it fails again. We've encountered this situation. The way we've dealt with it is to first check whether the index exists and is not valid, and if so, delete it first.
speaker 1: So, drop if exists. Okay. Good tip. Seriously, good tip.
speaker 3: I don't think DROP IF EXISTS alone covers it; we had to check whether the index was invalid, because if it exists and is valid, that's fine. Say you got interrupted later on, after the index had been created, and you're running through again: if it exists and is valid, you want to leave it there and keep moving to the next thing you missed.
speaker 1: So the key takeaway is that non-atomic migrations are super dangerous and you should avoid them at any cost...
speaker 3: ...except that you do want CREATE INDEX CONCURRENTLY, because otherwise you lock out access to the table if it's very big and you're creating an index on it.
speaker 1: Yeah, I know. Thank you. And to your second question: I have the luxury of only having to deal with Postgres. I call it a luxury, but I pay for it in other places. Any other questions?
speaker 3: Thank you for the talk; I thought it was great. Do you have opinions on whether people should write tests for their migrations, especially when they're doing something slightly more advanced? Or are you more of the approach that, as you mentioned with checks, you agree on certain rules and those are what get tested? Any opinions about testing migrations?
speaker 1: Yes, that's a great question, and my approach is probably a bit different from what you'd expect. To make sure developers are aware of what's actually being executed, we created a GitHub Action that looks at the PR, and whenever there's a migration, it adds a comment with the actual SQL. This way, when you review a PR, you have to look at the SQL generated by the migration, and if there's a problem, you can spot it. If you're asking whether automated checks that identify problems automatically would be great: I think they would, but it's a question of how big your team is and whether it makes economic sense to invest that much time. I have a relatively small team, just ten people, so I can simply go talk to them if they're doing something unsafe in a migration. My approach is mostly to put the SQL in front of their eyes so they can't miss it. Okay, next question: "You said always look at the SQL. That sounds like great advice, but most of us use the ORM, so we never have to even think about SQL. Is the ORM just too leaky, and do we have to level up?" First of all, I'm kind of one of the SQL guys.
I'm not afraid of SQL; I'm firmly in camp SQL and believe everyone should know it. I'll tell you even more: I feel very comfortable with SQL, and still, a lot of the time I feel the ORM provides real value for me, and I'd much rather use the ORM. So to your question: I wouldn't ditch the ORM, but I would pay close attention to the places where the ORM struggles. And by the way, to be fair, a lot of what I showed here is about decisions Django makes for you. While I show the downsides of these decisions (I'm not even criticizing them), it's important to understand that Django, as a framework, needs to target the lowest common denominator. If something is correct 80% of the time, it makes sense to make it the default. So I don't think db_index on foreign keys is a bad default; I'm just saying that at some point you need to be aware that this is what's happening, and know the limitations, the defaults, and the implicit behaviors of the ORM, so you can adjust them and get the most out of them. There are plenty of other implicit behaviors in Django you should be aware of, but that's a topic for maybe next year. Any more questions?
speaker 4: Great talk, thank you very much. Two questions, not very related to each other. About the concurrent index: say we're adding an index concurrently in a migration. Just to clarify: the process that runs the migration will only exit when the concurrent index build is done, correct? If we have a very big table and it takes, say, five minutes to add the index concurrently (concurrently, so no locks, but still), the migration process...
speaker 1: ...will be there for five minutes, right. The migration process executes the command and waits until it finishes; migrations run serially.
speaker 4: So a potential caveat is that migration is usually part of the release process, and this holds the release process for five minutes. Sometimes you might want to do this separately from other things.
speaker 1: It really depends on how you do your deployment. If you stop the service, run the migration, and start it back up, then yes, you have a problem. But if you're careful and very aware of what your migration is doing, it's usually a good idea not to stop the service: run the migration while the system is running, and then restore the service and reload the code once it's done. There was actually a lightning talk yesterday where someone implemented pre- and post-deploy migrations; I think that's a very elegant solution. I need to talk to you about that.
speaker 4: Second question: would you recommend people fiddle with SQL without any abstraction, so they get a sense of what's happening and of what the abstraction is giving them, and, as you mentioned, what it's not taking away from them but deciding implicitly on their behalf? Would you recommend this, and what would the steps be? Simply open psql? Read a book on SQL? What would your recommendation be?
speaker 1: First of all, "read a book" is a great recommendation in general. Personally, I have a hard time understanding the fear of SQL.
I think that if you reach the point where the ORM can't do what you want, then yeah, go ahead and run SQL; it's fine. So to your question: if you need SQL, use SQL. The only place where this gets really challenging and dangerous is if you need to support different engines; then it becomes difficult. But as I said earlier, if you have the luxury of not having to deal with lots of different databases, I think it's a good investment to learn the capabilities of your specific database, so you can write safe migrations and get the most out of it. Migrations are a very hot topic, apparently, and so are concurrent operations.
speaker 5: Hi, thank you, this was a great talk. I want to incite a small battle. When you showed how you inspect the database, you used psql, the basic text utility that comes with Postgres, and yesterday Adam recommended several graphical tools to help you look at the database. I expect this expresses your preference; can you expand on that a little?
speaker 1: Well, while I think UI tools have their place, when you SSH into a server to solve a production problem, you can't use a UI. So I like to keep my hands dirty and stay used to the terminal. UI tools are nice, and pgMustard (I know Michael, great guy, great product), but I like my EXPLAIN ANALYZE right there in psql. That said, tools that print queries and explain them are great utilities, and I use them often when I don't want to open the psql terminal.
speaker 5: Thanks again for the talk. Starting from what you found, what do you think we could bring into Django core or the Django ORM? Could we add features to improve the migration story and avoid raw SQL, for example, so people on other databases can use it too? Maybe someone here would be interested in working on that.
speaker 1: Good question. I showed several limitations of the migration framework here. For example, you can't use RemoveIndexConcurrently and AddIndexConcurrently on indexes that were not defined through Meta.indexes, like the implicit indexes on foreign keys. I don't know the reason for that, but it would be nice to be able to use these operations on all the indexes Django creates. This is, by the way, also a problem for unique_together, I think, and part of the reason it's discouraged. Another limitation I saw here... oh yeah: the fact that Django recreates the foreign key when I just want to drop the index. That's unfortunate. I'm not entirely sure why that is, but I'm sure there's a good reason. No? Okay, a concurrency expert here says there's no reason. So, easy pickings: go implement that.
speaker 2: I just had a look at the check command, and it didn't seem to find what you reported. Is check supposed to find all of these automatically, or do you have to configure it?
speaker 1: Oh, you looked at the check command and couldn't see my personal rules. What I did is implement my own checks to enforce my own coding style, for example making sure db_index is explicitly set on foreign keys. I implemented a bunch of other rules too.
For example, always provide a verbose_name, stuff like that. If you're curious, check out the article to see how to implement custom checks. I think another speaker gave a talk yesterday that implemented some custom checks as well, so I guess I'm not the only one. Thank you. Yeah, thank you. Hello. Get off the stage.
Latest Summary (Detailed)
Executive Summary
In this talk, software developer Haki Benita digs into common pitfalls and advanced optimization techniques around foreign keys in Django. Framed as a "code review" of a simple product catalog app, it shows how Django ORM defaults (such as automatically creating an index for every foreign key) can cause performance bottlenecks, database lock contention, and unsafe migrations under load.
The talk stresses that developers must understand and review the SQL the ORM generates and optimize proactively. Core topics:
1. Index management: spot and remove the redundant index that appears when a UniqueConstraint and an implicit foreign key index cover the same column, and take explicit control of index creation with db_index=False.
2. Safe migrations: the default AlterField migration implicitly rebuilds the foreign key when its index changes, which can lock the table for a long time. Use SeparateDatabaseAndState and RunSQL for fine-grained, non-blocking control over migrations.
3. Advanced strategies: CREATE INDEX CONCURRENTLY avoids locking tables in production; the talk covers its risks in non-atomic migrations and how to mitigate them (e.g. by reordering operations). A partial index also shrinks the index on a nullable field from 6 MB to 32 KB.
4. Concurrency control: under concurrent updates, combining select_for_update with select_related unexpectedly locks the joined tables; select_for_update(of=['self']) avoids the unintended locks.
The conclusion: building robust, efficient Django applications requires critical thinking. Review and optimize database interactions, make decisions explicit (db_index), use advanced tools (concurrent operations), and add custom checks (the Django check framework) to compensate for the framework's defaults.
Introduction and Example Models
The talk is built around a simple Django product catalog app with two core models, Category and Product.
Core fields on the Product model:
- name, description: basic product information.
- category: foreign key to Category.
- sort_order: the product's ordering within its category.
- created_by, last_edited_by: foreign keys to the User model, for auditing.
- unique_together = ('category', 'sort_order'): keeps sort values unique per category.
The goal is to review this model step by step and surface the deeper foreign key and database issues hiding behind it.
Migrations and Index Optimization: From unique_together to UniqueConstraint
- Deprecating unique_together: unique_together is a meta option slated for deprecation; use models.UniqueConstraint instead, which offers more control and is the documented recommendation.
- Finding and handling the redundant index:
  - After switching to UniqueConstraint, inspecting the PostgreSQL table shows two indexes on category_id: one created by the UniqueConstraint and one created implicitly by Django for the foreign key.
  - Problem: the two indexes overlap in function; one is redundant.
  - Solution: set db_index=False on the foreign key definition to stop Django from creating the implicit index.
Dissecting Django Migrations: Risks and Manual Intervention
- The trap in the default migration:
  - After setting db_index=False on the foreign key, the AlterField migration Django generates looks harmless, but the SQL behind it does not simply drop the index.
  - Key problem: Django recreates the entire foreign key constraint, a very slow, blocking operation, especially on large tables, that can seriously hurt production.
- Solution: SeparateDatabaseAndState
  - For precise control of database operations, use migrations.SeparateDatabaseAndState.
  - How it works: it separates changes to Django's model state (state operations) from the SQL actually executed against the database (database operations).
  - Steps:
    - State operations: use the AlterField operation Django auto-generated, keeping the model state in sync.
    - Database operations: use migrations.RunSQL with a hand-written DROP INDEX statement, executing only what is actually needed.
  - Core advice: "Always inspect the SQL a migration generates before executing it."
Building Robust Migrations: Reversibility and Concurrent Operations
- Reversible migrations:
  - Give RunSQL a reverse_sql argument (e.g. the matching CREATE INDEX statement) so the migration can be rolled back safely.
- Concurrent operations (CONCURRENTLY):
  - On a busy production system, a plain DROP/CREATE INDEX takes an exclusive lock and blocks traffic.
  - Solution: PostgreSQL can create and drop indexes concurrently (CREATE/DROP INDEX CONCURRENTLY) without blocking ongoing reads and writes.
  - ⚠️ Preconditions and risks:
    - Concurrent operations cannot run inside a transaction block, so a migration containing them must set atomic = False.
    - If a non-atomic migration fails partway through, the database can be left in an inconsistent state; split such high-risk operations into their own migration files (see the sketch below).
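A minimal sketch of such a non-atomic concurrent drop, i.e. the concurrent variant of the earlier DROP INDEX (index and migration names assumed):

```python
from django.db import migrations


class Migration(migrations.Migration):
    # CONCURRENTLY cannot run inside a transaction block.
    atomic = False

    dependencies = [("catalog", "0002_previous_migration")]

    operations = [
        migrations.RunSQL(
            sql='DROP INDEX CONCURRENTLY "catalog_product_category_id_idx";',
            reverse_sql=(
                'CREATE INDEX CONCURRENTLY "catalog_product_category_id_idx" '
                'ON "catalog_product" ("category_id");'
            ),
        ),
    ]
```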
The Hidden Use of Foreign Key Indexes: The DELETE Performance Trap
- Question: created_by is rarely used in queries, so can its index be dropped?
- A key lesson from a real incident ("written in blood"):
  - When deleting an object referenced by foreign keys (such as a User), the database must check every table that may reference it in order to guarantee integrity.
  - If the referencing column (e.g. product.created_by) has no index, that check degrades into a full table scan, making deletes extremely slow or timing them out on large tables.
- Conclusion: foreign key indexes serve not only SELECT queries but also, indirectly, DELETEs of the related object; they are critical for performance.
- Best practices:
  - Always set db_index explicitly on foreign keys (db_index=True or False).
  - Add a comment explaining why the index is kept or removed, making the decision transparent. The speaker goes as far as enforcing this rule with a custom Django check (sketched below).
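One way such a check could look; this is a hypothetical sketch, not the speaker's actual code. Since an explicit db_index=True is indistinguishable from the implicit default on a plain ForeignKey, this version enforces a project-wide subclass that demands the decision:

```python
from django.apps import apps
from django.core import checks
from django.db import models


class ExplicitIndexForeignKey(models.ForeignKey):
    """Project-wide FK that forces an explicit db_index decision."""

    def __init__(self, *args, **kwargs):
        if "db_index" not in kwargs:
            raise TypeError(
                "Pass db_index=True/False explicitly and add a comment "
                "explaining the decision."
            )
        super().__init__(*args, **kwargs)


# Register e.g. in AppConfig.ready(); failures surface in `manage.py check`
# and therefore in CI.
@checks.register(checks.Tags.models)
def forbid_plain_foreign_keys(app_configs, **kwargs):
    errors = []
    for model in apps.get_models():
        if model._meta.app_label != "catalog":  # only check our own apps
            continue
        for field in model._meta.local_fields:
            if type(field) is models.ForeignKey:
                errors.append(checks.Error(
                    f"{model.__name__}.{field.name}: use "
                    "ExplicitIndexForeignKey and set db_index explicitly.",
                    obj=field,
                    id="catalog.E001",
                ))
    return errors
```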
Advanced Indexing: Partial Indexes for Nullable Fields
- Problem: last_edited_by is nullable (null=True) and mostly NULL, yet PostgreSQL indexes NULL values by default, bloating the index and wasting space.
- Solution: partial indexes
  - Adding a condition to models.Index creates an index covering only the data subset you care about (the non-null values).
  - Example: models.Index(fields=['last_edited_by'], condition=Q(last_edited_by__isnull=False))
  - Result: the index shrinks from 6 MB to 32 KB, saving space and improving efficiency.
Migration Operation Order and Safety
When replacing a full index with a partial one, especially in a non-atomic migration (atomic = False), the order of operations is critical.
- Wrong order: DROP the old index first, then CREATE the new one. Between the two operations there is a window with no index at all, exposing the system to performance risk.
- Right order: CREATE the new index first, then DROP the old one. The column keeps at least one usable index throughout the migration, minimizing the impact on a running system (see the sketch below).
Application-Level Concurrency: Using select_for_update Correctly
- Problem: under concurrency, pessimistic locking with select_for_update() is the usual way to keep data consistent.
- Hidden trap: combined with select_related(), select_for_update() by default locks the rows of every table involved in the query (including JOINed tables), which can cause unexpected lock contention.
- Precise locking:
  - Solution: select_for_update(of=['self']) tells Django to lock only the rows of the main model (e.g. Product), not the joined tables.
  - As a defensive practice, add of=['self'] even when there is no select_related yet, so a future code change cannot silently break the intended locking (sketch below).
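A minimal sketch of the precise-locking query (model and function names assumed):

```python
from django.db import transaction

from catalog.models import Product


def edit_product(product_id):
    with transaction.atomic():
        product = (
            Product.objects
            .select_related("created_by")      # joins the user table...
            .select_for_update(of=("self",))   # ...but locks only product rows
            .get(id=product_id)
        )
        # ... update fields and save while holding only the product lock ...
        return product
```

With of=("self",), the generated SQL ends in FOR UPDATE OF "catalog_product", so the joined user row is read but never locked.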
Q&A Highlights
- Django operations vs. raw SQL: prefer Django's built-in operations (e.g. AddIndex) for cross-database portability; fall back to RunSQL only when they cannot express what you need.
- Handling failed non-atomic migrations: an audience member points out that an interrupted CREATE INDEX CONCURRENTLY leaves behind an invalid index. Before re-running the migration, check for and drop any invalid leftover index (e.g. with DROP INDEX IF EXISTS).
- Testing migrations: the speaker's team uses a GitHub Action that posts each migration's generated SQL as a PR comment, forcing human review as the core quality gate (see the sketch after this list).
- Learning SQL: developers should learn SQL well enough to understand the ORM's defaults and limits, so they can intervene and optimize when needed.
- Suggested improvements to Django core: for example, make AlterField avoid rebuilding the foreign key when only the index changes, and let the concurrent index operations support indexes implicitly created by foreign keys.