speaker 1: Thank you so much for having me. I love jgochan. I do quite a bit of community stuff with postgres, and I copy all of the great ideas that you guys have. And you guys are real leaders in kind of how to build an awesome community. And thank you for inviting me back. I always love speaking here. Postgreinternals is, without a doubt, the most boring topic I could be talking about at the end of this conference. I'm going to try to make this entertaining painless, and if I don't succeed, you can buy me a drink later. And thatbe more entertaining than this. I work at a company called crunchy data. We do manage postgras on cloud. We have a new data warehouse product, and then we also do post grestuff and Kubernetes. This isn't really part of my talk, but postgres 17 is coming out tomorrow. I've been watching and everything's tagged and ready to go. If you are not on version 13 or above, you should be upgrading your postgres. There's not a ton of superherheadline awesome features in postgres 17, but there is some cool performance stuff, especially with betree indexes. I know those are used heavily by the jgo community, so there's some cool stuff in there. If you end up upgrading, I'm going to talk through a bunch of kind of little commands in postgres. I have some queries. Here's a qr code or a link to a just file. Nobody needs these slides because they're not that good. But if you need the sequel, it's in there. All right, so I kind of want to just go through at a high level. I know lots of you have tons of postgreexperience, some of you don't have a ton. So I'm gonna na start at the very beginning and then we'll get into some more complicated stuff. So if you've never gone into postgreor, into psel, there is a command line interface for it comes and installs with postgres itself. You can't really install it as a side piece. You're if you've got a local copy of postgress running, you're fine. You already have it. If you have like you know your company's postgres is running at Amazon rds, you will have to install postgres in order to work with the psql command line. It's not the biggest package in the world. Like it could be smaller, but it's not terrible. If you're like on a mac, you can just brew, install postgraphs. There's lots of other distributions that they provide as downloads that work pretty well. To get into the command line interface for psql, you'll just connect it to your host. If you're connecting to a remote host, like you have a database somewhere else that your company manages, you'll need to create a connection string with, you know, your user password, host port and all that stuff, so you know who you are. Once you're inside psql, you can ask it, you know, who you are and confirm all of that stuff. That's the slash con info command. You can do a slash du and find out who else is in this database. So who else has permissions, what other applications might be in here, all kinds of stuff like that. You can change the formatting with the backslash x auto. So if you've ever been in like in postgreand, you've queried through the command line. The queries just get really, really long, and they go out to the right and they're like impossible to read. So this is a way that itjust batch up all the records for you and it's easier to read. So I know that people hate clis. I hate clis. I had to learn the Amazon cli this year, and I told my boss, I'm not learning any more clis this year. So if you don't like clis, it's fine. You can do most of the stuff that I'm gonna na talk about today in a gui. Lots of people use pg admin. It's very popular. I personally use dbeaver. I think it's just a better interface. And then there's other pythonic, you know data gritools and other stuff like that. So Yeah. So now that we have kind of connected, I'm going to talk through some of the ways that we can kind of see what's going on inside our database. I'm going to start at the very, very beginning. So if you have a postgreinstallation, you may have more than one database inside of it. This is a nice way to experiment. Start test projects, do lots of stuff without Ching touching your other stuff. I like have probably like 100 test databases. This is a small sample so you can find out all the databases that are inside this postgres instance with the backslash l. If you want to change databases, you'll land often in the postgredatabase. But if you find other databases that you want to work inside, you can do the backslash c and you can go into another database. I don't know how many of you know this, but the entire postgress ecosystem runs on Jango. So all of our conference registration memberships, like the whole postgreworld, runs in Jango. So some of the screenshots in here are from the Jango database that comes with the postgreproperties. So backslash d plus will get you like a display of table information and show you you know what your table names are. This is at the point that you're kind of trying to figure out what you've got right? If you're exploring your own database or maybe someone's asking for help, this is a good place to start to see you know what you've got. If you do the backslash d plus thatshow you the size of each of the tables. And if you need to like you know look at storage size, this is like an easy way to do it. Once you're once you know what table you want to look at, you can do a described table and that's the backslash d and you can and then you'll add the table name to it. The backslash d plus is really, really handy, right? Because it's got tons of data about the table. So it's sort of describing for you what columns, data types, if you have primary keys, foreign keys, so if you're you know kind of messing around and you're you know your Jango code and you're trying to like figure out what's happening and what's been mapped to what and how it's displaying in the database, this is a great place to like confirm what's happening. Another just small plug for why I'm showing you this stuff in psql and I'm not showing you the stuff in queries, is that the backslash dt, the described tables for postgres is twelve separate sql queries. This is a small sample of it. So you can do this yourself. Like if you wanna get all of your internals described, you can totally find the queries. They're probably on stack overflow or in the postgres docs, and then you can run those in a gui. But psql does a good job of kind of collapsing all that for you. Another cool thing that you can do with psql and all of the stuff that I'm kind of talking about today is you can do an echo so you can have the commands that you're running be echoed, and then you can get the sel for them. So like let's say that you want na echo, you know, the describe table thing, because you want na change that query a little bit and you want na know specific stuff about tables. You can pull off the sequel and then kind of rearrange it yourself and decide what you want na do with it. And then once you get past the describing the databases, describing the tables, describing what's in the tables, what columns and rows there are, you have to switch to sql to actually see the data. There's no psql for like data. And then you can also, if you're mucking around and messing around in postgres, you can find all your settings. You can they're a little a select query. If if you're an admin, you can just find all of your postgres settings. This is super helpful if you're on a managed platform because you may not have all the bells and whistles to go into your underlying configuration files. And this is nice for people to just take a peek at what their settings are. If you're going through the postgres docs, one of the things I do at crunchy data is write like tutorials. Some of the stuff that I've showed you today is already inside tutorials that we have. Like I have one about basic psql. I have one about the echo stuff I was talking about. These all run in a web browser in like web assembly. So you don't need to install post graphs or do anything with your computer. You can just it even works on a phone. I was like showing somebody postgres business on a phone recently. So that's a learn dot crunchydata dot com if you want to mess around with that. All right. So let's kind of transition here into talking about what's happening in the database. So I know you guys are all really good at using like the Django tools to kind of see what's happening. As it happens, postgres has a lot of the same kinds of things, although they're stored in the database in a slightly different way. If you're using some kind of monitoring, this is not a replacement for that. So I'm going to talk about a couple of different things you could do. This is more just like things that are in the database if you want to dig in a little bit more, not a replacement for you know some kind of full application monitoring. So there's a table inside postgres called pg Stat activity that will tell you everything that's happening inside the database, how long it's been running for, when it started and what the state is. This is super helpful. Like if everything is not working and your database is completely not taking queries or you know things aren't working, you can go in and see what's running and then you can you can stop the pid that's actually stopping the database from working. You can also see in the pg stactivity who is doing work in the database. So you can see if your application has connections, how many connections that application has individual users, lots of stuff like that. So if you are kind of messing around trying to figure out how many connections your application has opened, I know I was talking a couple of people last night about just like the connection management, knowing how many connections are open or how many connections are idle is super helpful when you're trying to figure out what to do with your connection management. Several of the slides I have kind of sprinkled in here are like queries that have you know specific stuff kind of pulled out of them. And the reason for that is that these internal tables that I'm talking about are just huge. There's tons and tons of data in them and it's virtually impossible to kind of talk about if I don't show you a small sample of it. So you're if you're wondering if there's a lot more than there needs to be, this is postgres. So of course there is. Yeah so another cool table for kind of what's happening is the pg Stat database table that will show you all the transactions that are happening in individual databases. This is kind of the way that people measure transaction volume. If you run in this query now and you run it an hour from now, you will know what your transaction volume per hour is. And you could do that for days. But this is super helpful for just knowing how busy you are. Postgres holds a bunch of locks, which I'm sure you guys are super familiar with. You probably run a Jango migration and a locked your database and now you're not allowed to run Jango migrations. Maybe it's just me. Yeah. You know there's a lot of things, a couple things inside postgres that will actually lock postgres tables and those are table alteration commands. Like if you change a column or do something like that, there's a couple other things that will lock your tables a lot of times like I help on our support team and we spend a lot of time dealing with locks. It is kind of one of the unfortunate things about a relational database that's not trying to lose your data is that it needs a minute to like commit its transactions. But Yeah, so you can look at what is locked. Like there's a table where the locks are stored. One of my colleagues wrote this query because what happens with these locks is something will lock and then itbe sort of this you know cascading effect, right? And all these queries will be waiting for it and you're probably trying to find out like why is this query waiting? Like why is it not able to run? And then it's gonna na take you a minute to figure it out. So some one of my colleagues wrote this kind of locks query that just like has a cte at the beginning and then just kind of gives you the original thing that locked everything up. So that's super helpful if you're like, okay, I've got a in the process that's locking like 100 things. Postgres also has views for everything that's have so maintenance tasks that happen inside postgres, right? If you're super familiar with vacuum, I'm sorry, I have also spent a lot of time working on postgress vacuum. Again, it's kind of just an unfortunate remnant of the transaction system that you know needs to kind of keep dead stuff around. But you can check on things like if if you have a copy command, like if you're running some kind of etl process at night and loading a bunch of data, you can just check on the process of those maintenance tasks, which is pretty handy sometimes if you want to dig in more on vacuum, you can query post greand. Kind of ask it, you know how long ago did you vacuum and different things like that. All right. So we kind of covered what's inside of our database, what kind of data we have. We've covered what's happening. Postgres has this whole other world of cumulative stats that are really good. And there's been a lot of development even in the last couple versions of postgres to kind of build out some of this stuff. If you're using a postgres monitor, probably some of the monitoring tools are built off of these pieces, but you know they're kind of fun to just like get in and queer yourself too. So the pg Stat statements, if you're not familiar with it, is kind of the query tracking piece of postgres. Pg Stat statements ships with postgres, but it's not turned on. So you have to go and add it as an extension and then add it to your library. But it is like the most helpful thing you could do as an application developer to kind of find out like what's happening with your queries, how often good queries run. It's definitely the place to start when you're trying to do some query optimization and work, you know kind of if you're working backwards from the database to your application code. So here's a query to just find. Know your slowest ten queries and pg Stat statements surprising no one. The first slowest query in this database is a refreshing materialized view, which if you have any of those, they take like hours to run depending on how big they are. But you can go through it and you know some of the stuff you're obviously not gonna na be able to fix. And then you can just pick the things that you want to work on. There is a pg Stat io user table. There are a ton of io and memory related things inside postgraphs. Now if you're interested in memory usage, if you're concerned about memory usage, or you just love kind of that kind of piece of postgres, some of these internal tables have a ton of data in them. So one of the things that you can get out a pgstuser tables is a cash hit ratio. And so postgresses keeping track of how many things it served from the memory so that like shared buffer memory, and then how many things that it had to read from the underlying disk. So you know if you're familiar with kind of how the ideal postgress world works, is that you want the vast majority of your data in postgressmemory so that everything is super fast. So you're looking for a cash hit ratio in the nineties, but you can find out what it is. There's a pg stuser table that has a bunch of information about what's happening when tables are queried and scanned or indexed. And this will show you kind of this is a little query that somebody wrote that I think is kind of cute because they like did it up so that if there are more sequence scans than there are index scans, you're missing an index. So anyways, this you know this is one way that you can find out kind of what what is happening on tables in terms of like actual query behavior. And then pgstat user index is kind of similar to the one I was just talking about, has a ton of information about the indexes, how often they're used, what's going on with them. And then you can kind of decide like, okay, great. Like that's index is being used. Or you can be like, this index is never used and I should delete it because it's taking up space on my disk. You can reset all these Stat tables that I just talked about, which is super handy. So like if you go and you do a bunch of indexing and you do a bunch of great stuff with your queries and you don't want na like muck up all of your stats tables, you can reset it. I sat down and showed some of my slides last night and showed them the slide and they reset all of their stats, and I don't think they meant to do that. So maybe think about it before you copy something from one of my slides and run it on your database. Yeah. So I'm just going to kind of wrap up by like what I sort of really wanted people to get out of this talk and kind of people to walk away with is sort of you know that there's a bunch of data in your database, right? Like you know that postgrehas all of your users and all the data and all the settings, but there's actually a bunch of other information that is collected in the database you know for all of this activity and transactions, locks, connections and then all of this like kind of performance over time stuff that you have at your fingertips and you can just do whatever you want with I have a postgres meetup that's online called postgres meeup for all. It's pretty new or starting in October, if anybody feels like joining an online group of postgres people that are kind of loud and obnoxious. If you want to find me on social media, I'm sequel Liz, or as some people call it, sequel lies. And this is another chance if you need the just file and I'll take some, I'll take questions, I guess, in the hall so you guys can get your snacks. Ach so.