2025-06-21 | DjangoCon 2023 | Postgres Performance: From Slow to Pro with Elizabeth Christensen

Getting Started with Postgres Performance: Database Tuning Tips, from Slow to Pro

Media Details

Upload date
2025-06-21 17:48
Source
https://www.youtube.com/watch?v=gpbpVheR3gM
Processing status
Completed
Transcription status
Completed
Latest LLM Model
gemini-2.5-pro

Transcript

speaker 1: Good morning. I'm so excited to be here. We're going to talk about Postgres performance, and we're going to talk about getting started — going from the very, very beginning to kind of getting up to speed. I am Elizabeth Christensen. I love Postgres. I work at a Postgres company called Crunchy Data, and I'll be happy to talk to you about that if you want to know more. I'm a volunteer for the United States PostgreSQL organization — I've been at the booth out here, and I do some other volunteer work for them — and I host a Postgres meetup in Kansas City. I'm also from Lawrence, Kansas, like a lot of people here; if you're in the area, we would love to have you. And I'm on various social media platforms as sqlliz. So even though the title of my talk is "from slow to pro," Postgres isn't actually slow. Really, if you're getting started with a database — with Postgres or any of the databases that work with the Django ORM — you're going to be just fine. You hook up your stuff, use your ORM, stuff goes in the database, and things just work. And as a sort of overarching idea, Postgres is very stable. It doesn't lose your data. It's got tons of options for data types and different things that you want to do with your application. It's an overall great database for running behind your front-end applications. So when you're talking about performance: you are building some kind of application, you need to do some things with your database, and you're probably going to be concerned about cost, right? Compute costs money, and databases are notoriously memory intensive. So what we want to do when we're talking about Postgres performance is really maximize what you're getting out of the database itself relative to how much you're spending on it. We want to keep things really fast for our users, because you are application developers, and if stuff is slow, it looks like your application is slow — we never want that to be the case. And we want to optimize as we go. So a little bit of a roadmap for today: we're going to talk about how data gets in and out of Postgres, some memory settings that you'll want to look at, and some other machine things like CPU and IOPS. We'll talk about connection usage, and then we'll talk about some query performance. Writing a performance talk is really challenging, because there are two big parts to database performance. One is the machine itself — how Postgres is doing things, what's going on inside the database. The other piece is how your application is interacting with the database. I will spend a lot more time on the database side, because that's a little bit easier to talk about to a general audience, and I'll give you some tips about how to find individual performance issues inside your application. But generally, once you're past "everything is good with the database," a lot of the remaining performance work happens on the application side, so I'll just touch on that. And if you're wondering, "but what about my hosting? I'm on Amazon RDS — how is this going to work?" — in general, the stuff I'm going to talk about today is pretty applicable to anybody hosting Postgres anywhere. If you're using a managed service like Crunchy Bridge or Amazon RDS, you might find your configuration menus in a different place, and some of the memory settings and other things may be pre-configured for you.
It's still a good idea to know what's going on under the hood, so even if some of it is done for you, we'll cover it. All right. So at the start, I just want to go over all of the layers of data when you're using an application and a database, and where lots and lots of caching happens. If you're getting started in performance, you're going to be talking a lot about caching data and having data at the ready. At a very high level, you've got your web stuff — people coming in from the outside world — and then you've got your application layer. Underneath that, you've got your database layer. Inside the database layer you may or may not have a connection pooler, and we'll talk about that in a little bit. The individual pieces that actually connect to the database are called client backends — when you're digging through the Postgres docs, that's the term. That individual connection coming from your application to read or write data goes into Postgres and into the shared buffer cache. If you're writing data, it goes through shared buffers and down into physical disk storage, because Postgres is stateful storage and the data ultimately has to live on disk. If you're reading data, some of your frequently accessed data will be in shared buffers, which means you don't have to go to the underlying disk to get it. So when you're talking about a database and about IOPS — input/output — every write to Postgres writes to the underlying disk, and any read that isn't in shared buffers reads from the disk. So when you're setting up your database memory for performance, you want to maximize what's in shared buffers: you want the data your application is querying to be in that buffer cache, so that when your application asks for it, it's right there. And you want to keep in mind — we'll get into specifics later — that the more IOPS you use and the more you read from disk, the slower your database and your queries are going to be for your users. So the first big-picture idea for getting your memory configured right is working on these shared buffers. In general, you can give about a quarter of the memory on your machine to shared buffers. I've got a few sample numbers in here for a really small, standard production machine — let's say eight gigs and a couple of cores. It's not a big huge machine, but you can run a decent production-size Postgres on eight gigs of memory. The stock Postgres setting of 128 megs is not going to do it — you're going to want to give yourself more shared buffers. The way you can check shared buffers in Postgres is with the cache hit ratio. Postgres records, when a query comes in, whether that query's data was already in the cache or whether it missed and had to be read from the underlying disk. So there's a query where you can just ask Postgres, "what is my cache hit ratio?" and Postgres will tell you. You want a number in the high 90s — then you're doing really well. If you're lower than that, you're not doing great.
Something to keep in mind here is that if this is new data — if you just added a lot of data, or you restarted recently — this number is going to look different. So this is the cache hit ratio assuming a steady state where you haven't loaded anything big recently; that's when you want to be in the high 90s. Now, each individual connection to Postgres — each of those client backends — also uses its own memory. Postgres defaults to four megs for each connection, and typically you'll want to make that a bit bigger. It used to be you could get an easy number for work memory just by knowing how big your machine is: it's eight gigs, I'll use a quarter of my memory for working memory, I've got 100 connections, so that's about 20 megs each. But all modern versions of Postgres have parallel queries, so a single backend can use a little more than it's allocated, and the math gets a little complicated. So you play around with it; in general, for people on an 8 GB machine, we've found that 16, 18, 20 megs — somewhere in that range — is probably good. Definitely the default is too small for a production Postgres use case. So your Postgres connections come in, and each connection is allocated a certain amount of working memory to go and get the data it needs from shared buffers or from disk. If a connection doesn't have enough memory, it will spill that query and that workload to a temporary file. We'll come back to this when we talk about logging, but if you're trying to figure out "am I good with working memory, or do I not have enough?", your clue is the Postgres temporary files and how many of them you're generating. There's also a maintenance work memory setting inside Postgres, which is for maintenance tasks like building indexes and the Postgres vacuum. The default is probably too small for any production use case; in general you can allocate 3 to 5% of your RAM, and do a little math to figure out a good number for your machine. You can also tell Postgres what your effective cache size is: having allocated that 25% to shared buffers, you tell it about the other 75% of system memory. Let's see here. And if you're thinking "this seems really complicated, I don't want to do all this math myself" — there's a developer from Ukraine who wrote a really nice tool called pgtune. You just put in the size of your database and roughly what you're doing with it, and it gives you some recommended settings. I'm going to move on and talk a little about things other than memory happening inside your Postgres machine. You'll probably have some kind of third-party tool — there are lots out there, open source and not — that monitors things like CPU and system load. If you want to know what's going on in Postgres and what's using CPU, you can use pg_stat_activity, and if you need to stop transactions or do something with them, you can get the process ID from there.
When you're peeking at CPU and thinking about things that might take your database and run away with it, one of those things is long-running statements. Postgres has a way to set a statement timeout, and it's something you can set by role. So it's a good idea to give your application role a statement timeout of a minute or two; that way, if some weird code update starts generating a bunch of long-running stuff, it doesn't completely take your whole database out — and you can obviously set a different timeout for yourself or for a reporting application or something like that. IOPS is another thing to keep an eye on when you're looking at your memory settings and how everything is going. IOPS is a great indicator of "I don't have enough working memory, I don't have enough shared buffers, Postgres is hitting the disk all the time to read and write queries." You should expect an IOPS spike if you're loading data every night — if you have an ETL tool that loads data nightly, that's obviously going to use a lot of IOPS because it's writing a bunch of stuff to disk. But what you don't want to see is huge IOPS use all the time while your application is just running queries, because we want a lot of that data to be in shared buffers. So this is another clue for getting your memory set up the way you want it. Another thing that's important to talk about with Postgres performance is table bloat. There are a lot of good, full-hour talks on table bloat, and if you're dealing with really large tables that are frequently updated, there's great information out there. At a super high level, so we don't spend a whole hour on it: when Postgres updates a row, because Postgres allows multiple versions to exist at a time — it allows many queries to run in the database concurrently — it keeps the old row around if something else is using it. So if you update one field in a row, Postgres leaves the old, dead row in place, and later comes in and vacuums up that dead row. What ends up happening is that if you have lots and lots of changes happening in the database, you get lots and lots of dead rows, and that's called table bloat. That counts toward the overall size of your disk, because those are rows stored on disk taking up disk space. There's a great query on the Postgres wiki to help you measure your table bloat, and some managed tools will do this for you too. If you're under 50% table bloat, that's fine — you're doing a good job. Once you start getting above 50%, you want to keep an eye on it, maybe figure out what's going on, and look at your application and whether you need to do some other things with vacuum. In general, Postgres has autovacuum built in and most of that is fine, so you don't need to do a ton with it until you get to a pretty active database. And the other thing, just for your underlying Postgres machine, is that the Postgres version you're on really matters for performance. This is a little benchmark that the EnterpriseDB company did across Postgres versions.
Every version of Postgres that comes out has a lot of things going on under the hood — not features you'll read about on Hacker News, but underlying changes that make queries faster and make the internal parts of the database work better. Postgres 16 just came out a couple of weeks ago and will probably have its first point release, 16.1, early next year; the next version is already open for commits. You should be running Postgres 13, 14, 15 or newer — 12 is reaching end of life, and versions older than that are out of support. Now let's talk a little bit about connections, because the individual pieces that come into the database affect how your database operates. Postgres has a maximum number of connections it will allow into the database, so you can run out of connections. The key is to make sure that, however your application servers connect to the database, you haven't over-committed connections: if you've got four web servers that each open 50 connections to the database and you left the default at 100, half the time connections are going to be refused. So keep an eye on your max connections and look at what you're actually using. It defaults to 100. Recently I've seen a few people say, "well, just set it to 3000 and it'll be fine, and then I'll never run out of connections" — and that's not a good idea, because those connections use memory. If a connection was made and is sitting idle, it's still holding whatever you gave it as working memory, and then the connections that come in don't have any memory. You can count the number of active connections you have in Postgres — you can just query that from the database directly — and you can ask Postgres which queries are using connections. Then, once you get past a couple hundred connections and you're at the point where you're scaling your application — several hundred users, things really happening — it's a good idea to think about a connection pooler. If you let individual connections come straight into Postgres, they open up a bunch of individual connections and then sit idle, so you end up with a lot of idle connections taking up memory and a lot of open things going on. A connection pooler pools those resources: it lets things keep open connections to the database without necessarily using tons of database resources to maintain them. PgBouncer is the de facto standard connection pooler for Postgres. There are several others out there; I haven't tested a bunch of them. We have a small Django application at Crunchy that provides some tools for our sales teams — I think a couple hundred people use it. Things were a little slow and we weren't loving the performance for end users, so we put PgBouncer in front of it, and it's been awesome. I haven't seen any big downsides to adding it, even to a pretty small application. Okay, so let's transition now.
I've talked a lot about the machine side; now let's talk a little more about looking for issues between the application and the database. There are a couple of different places where you can find things. There are application logs — Django has a lot of tools and perf instrumentation and lots of ways to get at what's happening inside Django. Postgres has a very robust logging system, and we'll talk about that. And then we'll talk about a couple of the query tools that will help you get data out of what's going on with your Postgres queries: pg_stat_statements and explain plans. One thing I thought would be helpful, if you're giving a talk about performance and queries, is to set a bar for what is fast and slow. This obviously depends on your application, and the standard engineering answer is "it depends," but in general, a quick query to your database should give you information in a millisecond — if you're not getting data that fast, there's probably an issue. For your bigger queries, where you're fetching a set of data, you want results in the hundreds of milliseconds. So if you're reading explain plans or looking through your tools: hundreds of milliseconds, good; thousands of milliseconds, generally bad. And if you're up in the couple-seconds range, above 5 seconds, that's going to be super obvious to your end users, and that's a query you need to work on, either in the database or from the application side. You can turn on debugging inside Django if you want to watch what's happening as your app runs, and it will show you the SQL being executed under the hood. The Django Debug Toolbar does something similar, except in a toolbar. I tested this in the tutorial application to see how it would work, and it just spits out the raw SQL queries for you and gives you some information about how they ran. There's also a Django Silk tool that I did a little testing with; it gives you even more query information and collects it up page by page. I found a nice tool on GitHub when I was — oh, I must not have a slide for that — for finding N+1 queries; I'll show you a slide for that later. And when you're digging into Postgres beyond the Django layer, which you'll probably need to do if you want to get serious about query performance tuning, you'll definitely want to look at the Postgres logs. If you're self-hosting and can get to your Postgres logs, you can start digging through them. Most people ship logs to some kind of third party that parses them and does something with them — there are a lot of APM tools out there; I know Scout is here, and a lot of our clients have had great luck with that tool. If you want to stay in the open-source, Postgres-only world, there's an open-source tool called pgBadger that ingests your logs and does a whole bunch of analysis to help you. You will have to set up a few things in Postgres for logging; out of the box, I think it just logs errors.
So if you want Postgres to log table modifications, or everything that's happening, you have to tell it to do that with the log statement setting. You can also tell Postgres how long an operation has to take before it gets logged. In general, if you start logging everything Postgres is doing, it's going to be really noisy and you'll fill up wherever you're storing logs, whether you're paying somebody to store them or storing them yourself. So you pick a duration at which you want to log something — one second is a good starting point, since anything under a second is probably running fine — and you can adjust from there to whatever works for your application. You can log the temp files created in Postgres, and this is a really good idea; you can also choose the minimum temp file size to log. A lot of people set this to whatever they set their working memory to, so you're only logging temp files that are a little bigger and not logging a ton of stuff. This is a good indicator to keep an eye on if you're working on the memory settings we talked about earlier. You can also tell Postgres what you want your logs to look like — you can set up a little log line prefix so they come out the way you want. And you can log locks. Postgres locks tables for certain operations, and — this is kind of like vacuum — table locking is one of those topics with tons of information, talks, and blog posts out there. Generally, if you see something waiting on a lock, it means whatever ran before it is holding that lock. So if you're logging table locks and trying to figure out what's happening, this is good to know: the query wasn't slow because the query itself is slow, it's slow because it's waiting on a lock and something else got there first. Another tool Postgres has — I call it query logging, but it's really a way Postgres keeps track of what's happening with your queries — is the pg_stat_statements extension. This ships with Postgres, but you have to load it and say you want to use it, because it does use a little bit of memory. It collects statistics on all the queries that come into the database until you reset it, which you can do with a reset command. And pg_stat_statements, in the application-developer world, is definitely the tool to make sure you're using. If you walk away from this talk with one thing you're not doing today: turn on pg_stat_statements and start seeing what's going on under the hood with your queries. Once it's set up, you can just query Postgres for information about your queries. You can say, "okay, give me my ten longest-running queries," and Postgres will spit those out for you. You can look at which queries run most often. So if you're picking what to work on performance-wise — "I want to fix my very slowest queries, and anything my users and my application hit all the time, I want to work on that too" — that's a good place to start.
You can also look at which queries use the most CPU and what they're doing to the actual machine, if that's a concern you want to work on. Once you start getting into pg_stat_statements and seeing which queries are coming in, you're probably going to want to know more about a specific query, and the way Postgres does this is with what's called an explain plan. Basically, EXPLAIN is something you run inside Postgres: you write EXPLAIN, put the query after it, and Postgres gives you a bunch of information about the query itself. EXPLAIN ANALYZE does the same thing and also gives you the execution time — how long the query actually took to run. Under the hood, Postgres does tons of work: when your query from your Django application comes in, Postgres parses it, finds the fastest path, generates a query plan, executes it, and returns the results to you. If you change the data you're working with, you'll get a different explain plan. So one of the little tricks: if you're working on making your queries faster, you have to do this on a copy of your actual production data. You can't really do it with a test data set or seed data, and you can't do it on a different version of Postgres — you have to keep things the same if you want to run explain plans and experiment. You can also run EXPLAIN with BUFFERS, which tells you things like what kind of scan it did and how long it took. Here's a little breakdown of what an explain plan looks like: it tells you "I did this kind of scan on this table," lots of information about loops and what's going on, whether it used shared buffers to get the data, whether it had to read from disk, and how much time it spent on I/O. So if you're starting to put this all together — making sure your queries run out of shared buffers and minimize I/O — this is how you do it: get the queries out of pg_stat_statements, run some explain plans, and figure out what's happening under the hood. When you're looking at scan types in EXPLAIN — especially if later you get into performance work like building indexes — you can check the explain plan to make sure Postgres is actually using your index. You can also see whether Postgres is doing joins and what type of join it's using in that query. auto_explain will log explain plans for you: it sends those plans I showed you into your logs, and you can set a minimum duration, so you can auto-explain every query longer than, say, 2 seconds. Our support team told me not to even put this slide in the presentation, because most people just run themselves out of log space with auto_explain. It's super noisy and it uses a lot of memory, because it has to calculate so much to produce that explain plan. So if you use it, use it carefully, and maybe on a test database. All right.
So I've talked a bit about how to gather all of the query information, how to find the problem areas, and how to start looking at what's inside the queries that are problems. Now I want to give you a few ideas about things you can do to improve those queries and the operations behind them. The first one is indexing. I'm sure a lot of you have heard of this, but if you haven't started adding indexes, they're super helpful for performance. Postgres has a lot of different kinds of indexes that group information together in different ways depending on the kind of data you have. If you have plain text data and the like, you'll be fine with a B-tree index. If you've got lots of date-range and time-series data, you'll want a BRIN index. There are some nice spatial-data indexes like GiST and the newer SP-GiST. And if you're using a lot of the newer Postgres JSON functionality, you can use a GIN index for that. For application developers, multi-column indexes can be super helpful — you can combine things you query together all the time, like names or other fields that obviously always come together. If you've just dipped your toe into indexing, multi-column indexing is definitely the next step for getting things moving really fast. Partial indexing is also really helpful, especially since, with web-application data, you probably have a lot of fields that are null or not relevant and that you're never going to pull out in a query. A partial index lets Postgres know it can just skip over that data — you don't need it readily available. I did a little experimenting with indexing performance just to show you: I have a little table of weather data, and if I run a query, you'll see it does a sequential scan and pulls the data back in 30 or so milliseconds. If I create a plain B-tree index on that weather data and run the exact same query, it runs an index scan and returns in three milliseconds. So even a simple B-tree index pulls data so much faster from the underlying data set. Indexing can get complicated, though, especially on a production database with a really big data set. Creating an index takes time; the indexes themselves are stored on disk, so they take up disk space; and every write that comes in has to be written to the index as well. So if you have a really write-heavy database, you want to be careful with what you're doing with indexes. Postgres has an extension for hypothetical indexing, which some of our clients use, especially when creating an index is really time-consuming — say it's going to take a couple of hours to build. You can create a hypothetical index, ask Postgres for the explain plan against it, and Postgres will give you an explain plan for an index that doesn't actually exist. That's a cool way to do it, and really helpful when you have big data sets and an index would take a long time to build. The other thing about indexes is that you don't want too many of them.
Like I said, indexes have to be stored on disk, and every write has to go into the index as well so the row can be pulled out when you query it again. You can just ask Postgres, with a query, which indexes are not being used, and Postgres will list your unused indexes; you can drop them and decide what to do from there. Another idea to take away is your data modeling. Obviously I can't give you an hour-long talk on data modeling, but a couple of helpful tips: keep your tables small, and don't create tables where you're updating tons of data at a time. An example our CTO likes to talk about: say you have a bunch of user contact data — address, email address, tons of information about this person — but you also record a timestamp every time they log into your application, which is like 45 times a day. Don't store that timestamp in the same place you store their contact information, because, like we discussed with table bloat, every time you update data Postgres keeps a dead row around, and you're adding a lot of write amplification. If you can keep your very frequently updated data separate from other data, it's much better for the underlying performance. Another thing to keep in the back of your mind is N+1 queries. This is primarily something you get straight from the ORM: you write your models, fetch a batch of data, and then want to print something related to it. Essentially, with an N+1 query, you write one query and then ask for something else per row, and that generates hundreds or thousands of individual queries instead of a single batched query. There's a nice project on GitHub that I found, "Django perf in you," with a little sample template site: you can spin up a little Django project, start clicking around, and generate N+1 queries. It shows you what that looks like in Django Silk, if you want to see an N+1 query without finding one in your own application. What it looks like is: you have that initial query at the bottom, and then hundreds of copies of the exact same query running right after it. Some of the APM tools also have N+1 detection built in — I know the Scout tool does, I've seen people use that — and there are probably other ways people solve this problem. In the Django world, you've got tools inside your ORM to select the related data or prefetch it first, so you query the whole batch from Postgres instead of one piece at a time, and those are really well documented in the Django docs. Okay. So I'm going to leave you with my final tips for Postgres performance. Stop runaway queries and set a statement timeout, especially for things that shouldn't have long-running queries. Set up pg_stat_statements, know what your slowest queries are, and if they're important, try to fix them, add indexes, or do other things to make them faster.
Add indexes for your most frequent queries. Check your cache hit ratio; tune your memory, add memory if you need to, and keep an eye on that. Make sure your connections have enough memory and that you have enough connections, and if you're scaling up, take a look at PgBouncer. Stay on top of your Postgres versions. And that is it — any questions? Thank you.

Latest Summary (Detailed Summary)

Generated 2025-06-21 17:53

Overview / Executive Summary

This talk, given by Elizabeth Christensen of Crunchy Data, is an introductory guide to PostgreSQL performance tuning for application developers, taking them from the basics to being able to solve common performance problems. Its central point is that PostgreSQL itself is not slow; performance problems usually come from poor configuration or from how the application interacts with the database. Christensen systematically covers the two sides of performance work: tuning the database server itself and optimizing how the application interacts with it.

On the server side, she stresses the importance of memory configuration, especially shared_buffers (recommended at about 25% of total memory, validated by a high cache hit ratio) and work_mem (the default is too small and should be increased to keep queries from spilling to disk). She also covers how to monitor key indicators such as CPU, IOPS, and table bloat, and recommends setting statement_timeout to keep runaway queries from dragging down the database.

On the application side, Christensen strongly recommends the pg_stat_statements extension for identifying the slowest and most frequent queries, combined with EXPLAIN ANALYZE for digging into query plans. She walks through how adding indexes (especially multi-column and partial indexes) can dramatically speed up queries, and gives practical advice on avoiding N+1 queries and improving data modeling. She also recommends a connection pooler such as pgbouncer as the application scales, and staying current on PostgreSQL versions to benefit from underlying performance improvements. Overall, the talk gives developers a clear, actionable roadmap for diagnosing and optimizing performance.

PostgreSQL Performance Fundamentals and Core Ideas

  • PostgreSQL is not inherently slow: Speaker Elizabeth Christensen notes that for beginners and typical applications, PostgreSQL used with the Django ORM is generally fast, stable, and feature-rich.
  • Goals of performance tuning:
    • Cost control: Databases are memory-intensive; tuning aims to maximize the return on hardware spend.
    • User experience: Keep the application responsive; a slow database makes the application look slow.
    • Continuous optimization: Optimize incrementally as the application grows.
  • The two sources of performance problems:
    1. The database server: machine-level concerns such as memory configuration, CPU usage, and IOPS. This talk focuses mostly on this part.
    2. Application interaction: how the application queries and uses the database, e.g. how queries are written and how the ORM is used.

Key Memory Parameters: Configuration and Tuning

PostgreSQL performance is closely tied to memory configuration; the core goal is to keep frequently used data in memory as much as possible and reduce disk I/O.

  • Layers of data caching: Data passes through several layers of caching between the physical disk and the application layer; the core layer is PostgreSQL's shared_buffers.
    • Writes: Written data passes through shared_buffers and ultimately lands on the physical disk.
    • Reads: If the data is in shared_buffers it can be read quickly; otherwise it must be read from disk, which generates I/O and is slower.
  • shared_buffers (shared buffer cache)
    • Purpose: PostgreSQL's core in-memory cache, holding frequently accessed data pages.
    • Recommended setting: Typically about 25% of total physical memory. The default 128MB is far too small for production.
    • Metric to watch: the cache hit ratio, which you can get with a SQL query (see the sketch after this list).
      > "You want that number in the high 90s. If you're lower than that, you're not doing great."
    • Caveat: Right after a restart, or after loading a lot of new data, the hit ratio will temporarily be lower.
  • work_mem (working memory)
    • Purpose: The private memory each database connection can use for operations such as sorts, hashes, and joins.
    • Problem: The default (4MB) is too small for production. If work_mem is insufficient, query operations spill to temporary files on disk, which hurts performance badly.
    • Sizing challenge: Because modern PostgreSQL supports parallel queries, a single query can use more than one work_mem allocation, which makes exact math tricky. On an 8GB machine, 16-20MB is usually a good starting point.
    • Metric to watch: Temporary-file creation in the PostgreSQL logs is the key clue for judging whether work_mem is large enough.
  • maintenance_work_mem (maintenance working memory)
    • Purpose: Used for maintenance tasks such as building indexes and running VACUUM.
    • Recommended setting: The default is too small; allocate roughly 3-5% of total RAM.
  • effective_cache_size
    • Purpose: Tells the PostgreSQL query planner how much memory is available as OS filesystem cache (roughly total memory minus shared_buffers), helping it estimate query costs more accurately.
  • Helper tools:
    • pgtune: A tool written by a Ukrainian developer that recommends sensible memory settings based on your database size and workload.
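
A minimal sketch of checking the cache hit ratio and applying the settings above from psql, assuming a hypothetical 8 GB machine like the one in the talk. The exact values are illustrative only, and a managed service may expose these knobs through its own configuration UI instead.

```sql
-- Rough cache hit ratio for user tables; aim for the high 90s on a warmed-up, steady-state system.
SELECT round(
         sum(heap_blks_hit)::numeric
         / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 4) AS cache_hit_ratio
FROM pg_statio_user_tables;

-- Example sizing for an 8 GB machine (illustrative, not prescriptive).
ALTER SYSTEM SET shared_buffers = '2GB';          -- ~25% of RAM; takes effect after a restart
ALTER SYSTEM SET work_mem = '16MB';               -- per sort/hash operation, per connection
ALTER SYSTEM SET maintenance_work_mem = '256MB';  -- ~3% of RAM for index builds and VACUUM
ALTER SYSTEM SET effective_cache_size = '6GB';    -- planner hint: the rest of RAM acts as OS cache
SELECT pg_reload_conf();                          -- reloads what it can; shared_buffers still needs a restart
```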

Server Resource Monitoring and Management

Beyond memory, other system resources are also critical to performance.

  • CPU usage:
    • Monitoring: Use a third-party monitoring tool, or query the pg_stat_activity view, to see what is currently consuming CPU.
    • Stopping runaway queries: Set statement_timeout, which can be configured per role; give the application role a sensible timeout (e.g. 1-2 minutes) so a misbehaving query cannot exhaust CPU (see the sketch after this list).
  • IOPS (input/output operations per second)
    • Why it matters: IOPS is an excellent indicator of whether memory is configured well. Sustained high IOPS usually means shared_buffers or work_mem is too small, forcing the database to read and write disk constantly.
    • Normal: IOPS spikes during ETL jobs or bulk data loads are expected.
    • Abnormal: You should not see sustained high IOPS during normal application queries.
  • Table bloat
    • Cause: PostgreSQL's MVCC (multi-version concurrency control) does not physically remove the old version of a row (a "dead tuple") when it is updated or deleted; it waits for VACUUM to reclaim it. Frequently updated tables accumulate many dead tuples, so their files waste a lot of disk space.
    • Measuring: Use the query published on the PostgreSQL Wiki; some managed tools report it as well.
    • Rule of thumb: Bloat under 50% is generally acceptable; above 50% deserves attention.
    • Remedy: autovacuum handles this automatically in most cases, but under heavy load its settings may need manual tuning or intervention.
  • PostgreSQL version:
    • Why it matters: New releases usually include many under-the-hood performance improvements, even when they are not headline features.
    • Recommendation: Run at least PostgreSQL 13, 14, 15 or newer; version 12 is nearing end of life.
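
A short sketch of the per-role statement timeout and the pg_stat_activity check described above; app_user is a placeholder role name and the timeout value is only an example.

```sql
-- Give the application role a ceiling on statement runtime (new sessions pick it up).
ALTER ROLE app_user SET statement_timeout = '2min';

-- What is running right now, and for how long?
SELECT pid, usename, state, now() - query_start AS runtime, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;

-- If something truly has to be stopped, cancel it by pid from the result above:
-- SELECT pg_cancel_backend(12345);
```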

Connection Management and Connection Pooling

  • max_connections:
    • The problem: Size this against the number of application servers and their connection settings. If the application asks for more connections than max_connections allows, connections get refused (see the sketch after this list).
    • A common mistake: Don't simply set max_connections very high (e.g. 3000); every connection, even an idle one, consumes work_mem, which can leave active connections short of memory.
  • Connection pooler
    • When it's needed: Once the application reaches a few hundred connections, a connection pooler is strongly recommended. It manages and reuses database connections instead of opening and closing one per request, cutting resource waste.
    • Recommended tool: pgbouncer is the de facto standard and best-practice choice.
    • Result: The speaker shared a success story: putting pgbouncer in front of a small internal Django application noticeably improved performance.
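
A quick way to see how connection capacity is being used, as a sketch of the checks mentioned above; it assumes nothing beyond a standard PostgreSQL install.

```sql
-- Configured ceiling vs. what is actually connected.
SHOW max_connections;

-- Connections grouped by state; lots of 'idle' rows is a hint that a pooler would help.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```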

Query Performance Analysis and Diagnostics

  • Defining "fast" and "slow":
    • Fast: A simple query should return within about 1 millisecond.
    • Moderate: Queries that fetch a data set should finish within a few hundred milliseconds.
    • Slow: Queries taking thousands of milliseconds (seconds) visibly hurt the user experience and need optimization.
  • Application-layer tools (Django):
    • Django Debug Toolbar: Shows, during development, the SQL queries and timings behind each page request.
    • Django Silk: A more powerful profiler that collects and displays query information per page.
  • PostgreSQL logs:
    • Configuration: By default only errors are logged. Configure log_statement, log_min_duration_statement (e.g. log every query over 1 second), and log_temp_files (to watch for work_mem spills).
    • Tools: Ship logs to an APM tool (such as Scout) or parse and visualize them with the open-source pgbadger.
  • The pg_stat_statements extension:
    • The core tool: The speaker strongly recommends enabling this extension. It collects cumulative statistics on every query that hits the database (see the sketch after this list).
    • What it can tell you:
      • The queries with the highest total execution time.
      • The most frequently executed queries.
      • The queries that consume the most CPU.
    • Usage: This is the first tool to reach for when locating bottlenecks. > "If you walk away from this talk with one thing you're not doing today, turn on pg_stat_statements."
  • EXPLAIN and EXPLAIN ANALYZE:
    • Purpose: EXPLAIN shows the execution plan the planner generates for a query; EXPLAIN ANALYZE actually runs the query and shows the plan along with real timings.
    • Key information: The scan type (e.g. sequential scan vs. index scan), the join type, whether shared_buffers were used, and the time spent on I/O.
    • Prerequisite: Run it against a database with the same data volume and PostgreSQL version as production, otherwise the results aren't meaningful.
  • auto_explain:
    • What it does: Automatically logs the EXPLAIN plan for any query slower than a configured duration.
    • Warning: It is resource-hungry and produces a lot of log output; use it very carefully, ideally only on a test database or for short-term debugging.
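
A sketch of the logging and pg_stat_statements workflow above. It assumes pg_stat_statements is already listed in shared_preload_libraries (that part needs a restart), uses the PostgreSQL 13+ column names, and the orders table in the EXPLAIN example is made up for illustration.

```sql
-- Log anything slower than 1 second and any temp files bigger than roughly work_mem.
ALTER SYSTEM SET log_min_duration_statement = '1s';
ALTER SYSTEM SET log_temp_files = '16MB';
SELECT pg_reload_conf();

-- Turn on query statistics collection in this database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten queries with the highest total execution time.
SELECT left(query, 60) AS query, calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Dig into one suspect with a real plan, timings, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```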

Query Optimization Strategies and Practices

  • Indexing
    • Basics: Adding indexes is the most direct and effective way to improve query performance.
    • Index types: B-tree (general purpose), BRIN (range and time-series data), GiST/SP-GiST (spatial data), GIN (complex types such as JSONB).
    • More advanced indexes:
      • Multi-column indexes: One combined index over columns that are frequently queried together.
      • Partial indexes: Index only the rows matching a condition (e.g. WHERE status = 'active'), shrinking index size and maintenance cost.
    • Before/after example: The speaker showed a query dropping from roughly 30 milliseconds to 3 milliseconds after adding a simple B-tree index.
    • Caveats: Indexes take disk space and add overhead to every write. Periodically query for unused indexes and drop them (see the sketch after this list).
    • The hypopg extension: Lets you create "hypothetical" indexes and evaluate their effect on query plans via EXPLAIN without actually building them; especially useful for large tables.
  • Data modeling
    • Principle: Keep tables small. Put frequently updated fields (such as a user's last-login timestamp) in a different table from relatively static fields (such as contact information) to reduce the write amplification and table bloat caused by updates.
  • N+1 queries
    • The problem: A common ORM performance trap: fetch a list (1 query), then issue a separate query for each object in that list inside a loop (N more queries).
    • Detecting it: APM tools (such as Scout) or tools like Django Silk can surface it; in the logs it shows up as a long run of structurally identical queries executed back to back.
    • Fixing it (Django): Use the ORM's select_related (for foreign-key and one-to-one relations) and prefetch_related (for many-to-many and reverse foreign-key relations) to collapse many queries into one or a few.
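
A sketch of the index ideas above; weather_readings and its columns are hypothetical stand-ins for the weather-data example from the talk.

```sql
-- Plain B-tree index on a single column.
CREATE INDEX idx_readings_city ON weather_readings (city);

-- Multi-column index for columns that are always filtered together.
CREATE INDEX idx_readings_city_date ON weather_readings (city, reading_date);

-- Partial index: skip rows the application never queries.
CREATE INDEX idx_readings_flagged ON weather_readings (reading_date)
  WHERE quality_flag IS NOT NULL;

-- Indexes never scanned since statistics were last reset -- candidates to drop.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;
```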

Key Takeaways and Final Recommendations

  1. Set statement_timeout: Keep runaway queries from dragging down the whole database.
  2. Enable pg_stat_statements: Know which queries are slowest and most frequent, and optimize those first.
  3. Add indexes for your most frequent queries: This is the most effective single performance lever.
  4. Check your cache hit ratio: Make sure shared_buffers is sized sensibly; aim for the high 90s.
  5. Tune memory and make sure connections are sufficient: Adjust work_mem and set max_connections sensibly.
  6. Use pgbouncer: Bring in a connection pooler as the application scales.
  7. Stay current on PostgreSQL versions: Keep collecting the free under-the-hood performance improvements.