2019-10-19 | DjangoCon 2019 | Django REST Framework: Taking your API to the next level by Carlos Martinez
Advanced Django REST Framework Techniques and Performance Optimization
Media details
- Upload date
- 2025-06-21 19:00
- Source
- https://www.youtube.com/watch?v=695y8rdHsA4
- Latest LLM Model
- gemini-2.5-pro
Transcript
speaker 1: Thank you. Yes. Thank you, everyone, for joining this talk. Okay, let's begin our talk about Django REST Framework and how to take it to the next level. So first of all, who has used Django REST Framework, and who hasn't? Great, I don't need to do a review. So who am I? Carlos Martinez. I'm from Colombia, a backend developer at uit. You can follow me on Twitter and GitHub. Back in Colombia, the place I come from, I run the Python Bogotá group; we have a meetup of about 2,500 members right now. I also do photography from time to time. I have a daughter, and I do travel photography. So let's begin. We're going to start with a five-minute API: users, events, promoters for events, and tickets. While we build it, we'll think about five topics: how to display different data based on context (for instance, I don't want to get the same fields for an event nested inside a ticket), how to get better performance, how to filter information inside the API, how to configure permissions, and how to render results in different formats. Let me get out of here and take a look at what the first version of the API looks like. So this is our ticket endpoint. As you can see, on each ticket I'm sending a nested object for the event and also for the user. The user has only plain fields, with no nested objects. For the event, I get the same, except that the promoter is a nested object. So that's it: the API is working. The catch is that I only have about ten items right now, so it's not much of a workload. I get responses in about... speaker 2: Let's take a... speaker 1: ...look at how much time it's taking. It's about 100 milliseconds. But let's start to create a few more tickets. These kinds of things always happen.
Come on. Give me one second, please. Maybe I'm in the wrong branch. Yep, that was the reason. Okay, it happens sometimes. Right now I'm creating 5,000 new tickets for this API, including promoters and users, and it's going to keep creating a few more. What we're going to see is that the response times grow bigger and bigger every time. So our API is not done. First of all, something we can start thinking about when we have nested objects is reusing our serializers. Instead of having a separate serializer class for the representation of an event inside our tickets, we can use something called dynamic serializers. That way we keep our logic in one serializer and use it for different purposes in different sections of the API. This is a dynamic serializer class you can use. It's based on ModelSerializer from Django REST Framework, but the difference is that it has a different way of getting the fields, and that's what really does the trick. Then you can create, for instance, in this case, an event serializer with a way to get the representation you expect for this particular nested object: you define static methods that return those fields in this particular serializer. If everything works fine, it keeps creating more things. Let me continue there. Just... speaker 2: ...create too much. speaker 1: So that's the main thing you can do with dynamic serializers. You can, for instance, create a static method that returns the location fields, and another one that returns only ID and name, if you need to. That's how you create those particular methods. That way you save time, reuse logic within the serializer you already created, and avoid code duplication. So what about getting better performance? As you can imagine, we started pretty well. We started pretty fast.
There were no issues, but things started to get worse and worse as more users got involved in our application. Let's see if that has finished. Come on. Well, I'm going to stop this and take a look at how it's working now, in the right branch. So let me switch back to the scenario. There are now about 15,000 elements, and it's trying to render all of that information. And you can see here how Django is making far too many queries to the API, sorry, to the database, to get all the information I'm asking for in this particular endpoint. Until Django gets all that information, it's not going to respond. That's the reason we should start to use prefetch_related and select_related. What they really do is build a different query to the database that gets all the data we need, so the representation for our serializer is really easy, in just one query if that's the case. For instance, let's look at what happens when you do Ticket.objects.all(). It only creates a SELECT with all the fields from the ticket table. But when you add select_related, it also gets all the fields of the related model you have a relation to. So here is the same query, but now it gets not only the information about the ticket, but also the information about the user. So it's not going to go back to the database and make 100,000 queries; instead it's going to be just one. That small change is going to save you a lot of time and a lot of headaches. So the important thing is to look at how to get everything in just one query. prefetch_related is the other one, which is also useful; keep in mind that it's great for many-to-many relationships. In the previous one, as we can see, we join where the ticket's user ID equals users.id.
That's how it makes the relationship between them; prefetch_related works a little bit differently. But even so, it's going to be better than not prefetching at all. In this case in particular, it's going to do two queries: the first one to get the information about the tickets, and the second one to get the information about the events. And it selects only the IDs it needs to get the information it expects. For instance, if you filter tickets and get just a subset of them, that produces a different set of IDs to query the events table with, so that query is going to be way smaller. Something else you can do to improve things is to use the Prefetch object. What it really helps with is getting only the information you need for that particular representation in the API. If you look here inside prefetch_related, instead of the string naming the related field, I'm passing a Prefetch object with the attribute name, and I can create a new queryset and select the things I really need. With that .only(), I'm only asking for, in this case, ID and name. So the query is going to be a little smaller, and when you have a lot of data to fetch, it makes a difference. So let's check whether that has finished. Yep, it took two minutes and 5 seconds to complete. By then our user is gone; they're not going to stay with us. So prefetch_related and select_related are a must as our applications grow. Next, we can start filtering to get our information in a better way. One package that is really useful for this is django-url-filter. You can set it up two ways. The easy way is this one: you only add the DjangoFilterBackend from url_filter.integrations.drf, and you select the fields you expect to filter on.
And with this in particular, you'll be able to do something like this. You can go to the events endpoint and filter for a set of IDs if you need to, or filter on the name and make sure it contains a particular string. Or filter on whether a related object has some particular name, as in this case, and you can start to filter on even more complicated things. So it helps you express complex queries in the URL, but you can also do it more explicitly with a ModelFilterSet that you can reuse on different endpoints with the same logic. It's pretty similar to writing a serializer, a ModelSerializer. All right, we already did prefetching and a few things regarding filtering. What about cache? We can start to use the powerful Django cache that we have out of the box. To do it, you need to activate those middlewares, the ones marked with a red arrow, and make sure that CommonMiddleware is somewhere in the middle. And to make it work with Django REST Framework, you need two decorators: vary_on_cookie, wrapped in method_decorator, and cache_page. You can set the timeout you want if you need to. And I recommend setting a key prefix, so that way you won't forget how to invalidate... speaker 2: ...your cache. At this point... speaker 1: ...if you set this up and it's working, it's going to give you better performance. Please do not forget to invalidate the cache in some way. So drop these lines of code somewhere in your code to make sure that when something changes, the cache is invalidated at the right time and not only after the timeout expires. But there's something else we can do to improve performance using cache. One is a really good package called django-cacheops. What it helps with is caching all the ORM operations you're doing, and you can cache multiple querysets.
To use cacheops, you need to start working with Redis. As with many packages, you add cacheops to your installed apps and set up your Redis cache. Then you can use these settings to configure what you expect to cache, and cacheops starts caching for you. You can choose your own models and applications to be cached: you can define which operations are cached by default and what the default timeout is for different applications and different models, if you need to. You can also define a cache for everything, but that's not really recommended, because you may end up caching things you really don't want to. If you need to, you can also use cacheops to cache things yourself. As you can see, we have this get_queryset in a model viewset, and you define your query as usual. But at the end, you append .cache(), with the opening and closing parentheses. With that, the result is cached in Redis, and you get it not from your database but from Redis instead. So it's going to help with performance. But what about permissions? What can we do to improve our permissions? Django already has a great permission system, but we can try something called DRY Rest Permissions, which improves the experience. As with any other package, we install it with pip (dry-rest-permissions) and add it to our installed apps. Then, in our viewset, we add permission classes, this line over here, where you define whether it's a public endpoint or requires authentication, and we also add DRYPermissions. And now you can start doing something like this. In your models, you can define a set of global permissions. In these examples: has read permission, has write permission, or has create permission. You get access to the request.
At this point you can get the user and run some validation: you can require a permission, require an attribute on the user, or anything from the request. So you can define a different set of rules for each one. And you can also have object permissions. So it's not only global: you also have access to the instance itself and can run some validations. The first example checks that the user is related to that particular object. And you can also require, as I mentioned before, a particular permission already granted to the user making the request. But we can also add permissions not only to the object, but also to a field or an action. If you have a field you want to protect against writes, you can set up this method and protect that particular change. And if you define a different action in a viewset, you can use its name and put a restriction there. So all the permission logic lives only on each model, which makes it easier to find where the actual... speaker 2: ...logic is. speaker 1: But you can customize a little bit more. Maybe you want to share a set of permissions across different models. One way to do it is to create a custom permission. This is based on the REST framework permission class itself, and you can reuse it across viewsets, because you only need to implement the has_permission method, and there are some utilities. For instance, the safe methods are GET, HEAD, and OPTIONS. So in this case you grant access to requests using those methods, but any other method, such as PATCH, PUT, or POST, requires a permission before you grant access. That helps you add this permission to different viewsets, and then you protect each viewset you add that permission to in a different way. speaker 2: But how do you use DRY permissions, then,
speaker 1: ...when you're outside a model viewset? If you create an APIView, you'll see that it throws an error because there is no "request" key. So in those cases, it's important to create your serializer with a context that provides the request. Otherwise it will throw an error, because the model has no way to get the request and run all the logic you defined there. And the last thing I want to bring you to take your API to another level is renderers. Yesterday we saw this package working when using automagic. This is another example; it's very useful and very easy to implement. On your model viewset, you only need to add this particular class, XLSXFileMixin, and inside renderer_classes you add XLSXRenderer. With that, you can create Excel files just by calling the API, either by sending the appropriate header or by requesting the format explicitly in the URL. If you want to keep the browsable HTML API from Django REST Framework, don't forget to add BrowsableAPIRenderer. Otherwise it's going to be just JSON or Excel. speaker 2: All this code is going... speaker 1: ...to be available on this repo. I'll give you a couple of seconds to take a picture, and then let's look at how it works after we do all these things... speaker 2: ...to this API. speaker 1: So let's take a look. I'm going to change the branch right now. Our previous response took a little more than two minutes, and now it's only 124 milliseconds. Of course, it's paginated, but that's the idea: to get quicker responses. And we're getting the event and the user here. Remember last time I tried this: these were the logs of all the queries Django was running to get that representation. And now this is it: this is what it's really doing to get all that information. So it's a very small query.
Our development team is going to love us for doing this kind of stuff. But of course, you can ask for more, and it's going to take extra time to render, depending on how much data you want. In this case, I'm getting almost the whole dataset, now 10,000 items, which is 7 MB of data at this point, and it took about three minutes. So maybe that's not what you're expecting, not exactly what you need, but you can define limits using Django REST Framework pagination. So, well... speaker 2: ...that's it for now. speaker 1: I want to thank my team at uit and also the client I work with, Building Engines. That's where all the magic happens and where I learned all these things. So thank you.
Latest Summary (Detailed)
Overview / Executive Summary
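This is a summary of Carlos Martinez's DjangoCon 2019 talk on improving the performance and capabilities of Django REST Framework (DRF) APIs. The core point is that a simple DRF API runs into severe performance bottlenecks once its data grows (for example, after adding thousands of records), mainly because of unoptimized database queries (the N+1 problem). Using a live example, Carlos takes an API whose response time exceeded two minutes on a large dataset and, through query optimization and pagination, brings the first page down to 124 milliseconds.
To get there, he walks through six key techniques and practices: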
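1. Dynamic serializers: a custom base class adjusts the serialized fields to the context (for example, a nested object), so code is reused and unnecessary data is not sent.
2. Query optimization: select_related (for foreign keys) and prefetch_related (for many-to-many and reverse relations) remove the N+1 query problem at its root, and Prefetch objects give finer control over what is fetched.
3. Advanced filtering: libraries such as django-url-filter enable complex, URL-based filtering.
4. Layered caching: Django's built-in page cache is combined with the django-cacheops library, which caches ORM query results, with strong emphasis on having a cache invalidation mechanism.
5. Fine-grained permissions: dry-rest-permissions keeps permission logic directly in the models, enabling global, object-level, and field- or action-level control with clearer code.
6. Custom renderers: the API's output formats are extended, for example with Excel export via tools such as drf-renderer-xlsx.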
The talk argues that combining these strategies yields robust APIs that scale, perform well, and stay easy to maintain.
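Background: the API performance bottleneck
The speaker first builds a simple API with users, events, and tickets. With little data, it performs well. But as the data grows (for example, after adding 5,000+ records to the database), problems appear.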
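- Initial state: the API endpoints return nested objects (for example, ticket data includes the full event and user).
- Problem:
  - Once the data volume jumps, API responses become extremely slow.
  - The root cause is the N+1 query problem: when DRF serializes nested objects, it issues a separate database query for the related object of every top-level object, so the number of queries grows with the number of rows.
- Measured impact:
  - Before optimization: fetching an unoptimized list of about 15,000 records took "two minutes and 5s".
  - After optimization: with query optimization and pagination enabled, the first page returned in 124 milliseconds.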
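Solution 1: Dynamic serializers
To return different fields in different contexts (such as a list view vs. a nested representation) without duplicating code, the speaker recommends dynamic serializers.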
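- Goal: avoid creating several serializer classes for different representations of the same model (full view, nested view), and keep all serialization logic in one place.
- Implementation:
  - Create a custom base class that inherits from ModelSerializer and accepts a list of fields as an argument.
  - In each concrete serializer, define several static methods (staticmethod), each returning the tuple of fields needed in a particular context.
  - When instantiating the serializer, in a view or as a nested field, call the appropriate static method to decide which fields to serialize.
- Advantages:
  - Code reuse: all serialization logic for a model lives in one class.
  - Less redundancy: no new serializer class per view or nesting level.
  - Flexibility: the data structure each endpoint returns is easy to control.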
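The pattern above can be sketched without Django. The snippet below is a framework-free approximation, assuming plain dicts stand in for model instances; `DynamicFieldsSerializer`, `EventSerializer`, `nested_fields`, and `location_fields` are hypothetical names. In real DRF code the same filtering usually lives in a ModelSerializer subclass's `__init__`, which pops a `fields` keyword argument and removes the other entries from `self.fields` (the "dynamically modifying fields" example in the DRF serializer docs).

```python
from typing import Iterable, Optional


class DynamicFieldsSerializer:
    """Framework-free stand-in for a dynamic-fields serializer (hypothetical)."""

    all_fields: tuple = ()  # every field the subclass can output, in order

    def __init__(self, instance: dict, fields: Optional[Iterable[str]] = None):
        self.instance = instance
        if fields is None:
            self.fields = list(self.all_fields)
        else:
            allowed = set(fields)
            unknown = allowed - set(self.all_fields)
            if unknown:
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            # Keep the declared order and drop everything that was not requested.
            self.fields = [name for name in self.all_fields if name in allowed]

    @property
    def data(self) -> dict:
        return {name: self.instance[name] for name in self.fields}


class EventSerializer(DynamicFieldsSerializer):
    all_fields = ("id", "name", "location", "starts_at", "promoter")

    # One static method per context, each returning the tuple of fields it needs.
    @staticmethod
    def nested_fields() -> tuple:
        return ("id", "name")

    @staticmethod
    def location_fields() -> tuple:
        return ("id", "name", "location")


event = {"id": 1, "name": "Launch party", "location": "Bogotá",
         "starts_at": "2019-10-19", "promoter": 7}

full = EventSerializer(event).data
nested = EventSerializer(event, fields=EventSerializer.nested_fields()).data
print(nested)  # {'id': 1, 'name': 'Launch party'}
```

A ticket serializer would then embed the event with only the nested fields, while the events endpoint uses the full set, all from one class.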
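Solution 2: Query performance optimization
This is the core step for removing the bottleneck: it improves efficiency by reducing the number of database queries.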
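- select_related:
  - Use for: foreign key (ForeignKey) and one-to-one (OneToOneField) relations.
  - How it works: an SQL JOIN fetches the main objects and their related objects in a single query, merging many queries into one.
- prefetch_related:
  - Use for: many-to-many (ManyToManyField) and reverse foreign key relations.
  - How it works: instead of a JOIN, it runs one extra query per prefetched relation: after the query for the main objects, a second query fetches all related objects at once with a WHERE ... IN (...) clause built from the IDs collected from the main objects. This is far cheaper than N+1 queries.
- Prefetch objects:
  - Goal: finer control over prefetch_related.
  - Usage: pass a Prefetch instance to prefetch_related, specifying a custom QuerySet for that relation.
  - Advanced use: the custom QuerySet can call methods such as .only('id', 'name') to prefetch just the needed columns, further reducing data transfer and memory use. When prefetching a reverse foreign key, keep the foreign-key column in .only(), because Django needs it to attach each row to its parent and would otherwise load it with one extra query per row.
    queryset.prefetch_related(Prefetch('related_field', queryset=RelatedModel.objects.only('id', 'name')))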
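To make the difference concrete, the sketch below uses Python's built-in sqlite3 module instead of the Django ORM, with hypothetical `event` and `ticket` tables, and counts the SQL statements each access pattern sends: a per-row lookup (the N+1 pattern), a single JOIN (the shape select_related produces), and one extra `WHERE id IN (...)` query that selects only `id` and `name` (roughly what prefetch_related with a `Prefetch(..., queryset=....only('id', 'name'))` does for a foreign key). It illustrates the SQL shapes only; the queries Django generates differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, code TEXT,
                         event_id INTEGER REFERENCES event(id));
    """
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?)",
                 [(i, f"event {i}", "Bogotá") for i in range(1, 11)])
conn.executemany("INSERT INTO ticket VALUES (?, ?, ?)",
                 [(i, f"T{i}", i % 10 + 1) for i in range(1, 501)])
conn.commit()

# Record every SQL statement the connection runs from here on.
statements = []
conn.set_trace_callback(statements.append)


def n_plus_one():
    """One query for the tickets, then one query per ticket for its event."""
    tickets = conn.execute("SELECT id, code, event_id FROM ticket").fetchall()
    return [
        (code, conn.execute("SELECT name FROM event WHERE id = ?", (event_id,)).fetchone()[0])
        for _, code, event_id in tickets
    ]


def joined():
    """select_related-style: a single JOIN brings the event columns along."""
    return conn.execute(
        "SELECT ticket.code, event.name FROM ticket "
        "JOIN event ON ticket.event_id = event.id"
    ).fetchall()


def prefetched():
    """prefetch_related-style: a second query with WHERE id IN (...), only id and name."""
    tickets = conn.execute("SELECT id, code, event_id FROM ticket").fetchall()
    ids = sorted({event_id for _, _, event_id in tickets})
    placeholders = ", ".join("?" for _ in ids)
    names = dict(conn.execute(
        f"SELECT id, name FROM event WHERE id IN ({placeholders})", ids
    ).fetchall())
    # Attach the related rows in Python, as Django does after prefetching.
    return [(code, names[event_id]) for _, code, event_id in tickets]


counts = {}
for fn in (n_plus_one, joined, prefetched):
    statements.clear()
    fn()
    counts[fn.__name__] = len(statements)

print(counts)  # {'n_plus_one': 501, 'joined': 1, 'prefetched': 2}
```

All three return the same rows; only the number of round trips changes, which is why the gap widens as the ticket table grows.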
Solution 3: Advanced filtering
To let API consumers fetch exactly the data they need, the speaker recommends the django-url-filter library.
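- Feature: complex queries, including queries across related models, expressed as URL parameters.
- Examples:
  - Several IDs: ?id__in=1,2,3
  - Substring match on a name: ?name__contains=some_text
  - Filtering across a relation: ?promoter__name__icontains=promoter_name
- Integration:
  - Add the DjangoFilterBackend from url_filter.integrations.drf to the ViewSet's filter_backends.
  - List the filterable fields in filter_fields, or define a reusable ModelFilterSet class for more complex cases.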
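As a rough illustration of how a `field__lookup=value` query string turns into filters, here is a toy, dependency-free version working on in-memory dicts instead of a QuerySet. It is not django-url-filter's implementation: the `apply_filters` and `resolve` helpers and the four supported lookups (exact, in, contains, icontains) are hypothetical choices for the sketch. In the real library the parsed lookups become Django ORM filters, so relation paths such as promoter__name are resolved by the database.

```python
from urllib.parse import parse_qsl

# Each lookup receives the stored value and the raw string from the URL.
LOOKUPS = {
    "exact": lambda value, arg: str(value) == arg,
    "in": lambda value, arg: str(value) in arg.split(","),
    "contains": lambda value, arg: arg in str(value),
    "icontains": lambda value, arg: arg.lower() in str(value).lower(),
}


def resolve(record, path):
    """Follow a relation path such as ['promoter', 'name'] through nested dicts."""
    for key in path:
        record = record[key]
    return record


def apply_filters(records, query_string, allowed_fields):
    """Keep the records that satisfy every field__lookup=value pair."""
    conditions = []
    for key, arg in parse_qsl(query_string):
        parts = key.split("__")
        # A trailing known lookup name is the operator; otherwise default to exact.
        lookup = parts.pop() if len(parts) > 1 and parts[-1] in LOOKUPS else "exact"
        if parts[0] not in allowed_fields:
            raise ValueError(f"filtering on {key!r} is not allowed")
        conditions.append((parts, LOOKUPS[lookup], arg))
    return [
        record for record in records
        if all(test(resolve(record, path), arg) for path, test, arg in conditions)
    ]


events = [
    {"id": 1, "name": "PyCon Colombia", "promoter": {"name": "Python Bogotá"}},
    {"id": 2, "name": "Django Day", "promoter": {"name": "Acme Events"}},
    {"id": 3, "name": "Django Sprint", "promoter": {"name": "python bogotá"}},
]
allowed = {"id", "name", "promoter"}

print([e["id"] for e in apply_filters(events, "id__in=1,3", allowed)])  # [1, 3]
print([e["id"] for e in apply_filters(events, "name__contains=Django", allowed)])  # [2, 3]
print([e["id"] for e in apply_filters(events, "promoter__name__icontains=python", allowed)])  # [1, 3]
```

The whitelist check mirrors the idea of listing filterable fields explicitly: only declared fields can be queried from the URL.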
Solution 4: Layered caching
Caching is an effective way to speed up read-heavy APIs.
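- Django's built-in cache:
  - Setup: add UpdateCacheMiddleware and FetchFromCacheMiddleware to MIDDLEWARE, with UpdateCacheMiddleware first, FetchFromCacheMiddleware last, and CommonMiddleware in between.
  - Usage: decorate viewset methods with method_decorator(cache_page(...)) and method_decorator(vary_on_cookie) to cache whole responses; the speaker recommends setting a cache key prefix so the cache is easy to invalidate later.
  - Key warning: the speaker stresses "Please do not forget to invalidate cache in some way". When data changes (create, update, delete), something must clear the stale entries; otherwise users see outdated data until the timeout expires.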
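A minimal settings fragment for the setup described above, following the middleware order given in the Django cache framework documentation. The 60-second timeout, the "api" key prefix, and the EventViewSet name are placeholders; the viewset part, modeled on the DRF caching docs, is left as comments so the fragment stays importable without a configured Django project.

```python
# settings.py (fragment): per-site cache middleware.
# UpdateCacheMiddleware must come first and FetchFromCacheMiddleware last,
# so CommonMiddleware and the rest of the stack sit between them.
MIDDLEWARE = [
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of the project's middleware goes here ...
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_ALIAS = "default"
CACHE_MIDDLEWARE_SECONDS = 60         # placeholder timeout
CACHE_MIDDLEWARE_KEY_PREFIX = "api"   # placeholder prefix for this site's keys

# views.py (fragment):
# from django.utils.decorators import method_decorator
# from django.views.decorators.cache import cache_page
# from django.views.decorators.vary import vary_on_cookie
#
# class EventViewSet(viewsets.ModelViewSet):
#     @method_decorator(cache_page(60, key_prefix="api-events"))
#     @method_decorator(vary_on_cookie)
#     def list(self, request, *args, **kwargs):
#         return super().list(request, *args, **kwargs)
```

vary_on_cookie keeps responses for different logged-in users from being served to each other, which matters once permissions come into play.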
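- The django-cacheops library:
  - Dependency: requires Redis as the cache backend.
  - Feature: automatically caches ORM query results.
  - Configuration: settings.py can specify, per app or model, which operations (such as get and fetch) are cached by default and with what timeout.
  - Manual use: append .cache() to a queryset to cache that particular query's result, for example MyModel.objects.filter(...).cache()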
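The invalidation advice applies to both layers. The sketch below is a deliberately small, dependency-free model of a read-through query cache that is invalidated on write, assuming an in-memory dict instead of Redis; `QueryCache`, `cached_query`, and `invalidate_table` are hypothetical names. cacheops handles this bookkeeping for you, invalidating automatically when models are saved or deleted, and its real mechanism is more fine-grained than dropping a whole table's entries.

```python
import time
from collections import defaultdict


class QueryCache:
    """Toy read-through cache: results keyed by (table, query) with a timeout."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._store = {}                       # key -> (expires_at, value)
        self._keys_by_table = defaultdict(set)
        self.misses = 0

    def cached_query(self, table, query, run):
        key = (table, query)
        hit = self._store.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]
        # Miss or expired: run the real query and remember which table it read.
        self.misses += 1
        value = run()
        self._store[key] = (self.clock() + self.timeout, value)
        self._keys_by_table[table].add(key)
        return value

    def invalidate_table(self, table):
        """Call on every write to `table` so readers never wait for the timeout."""
        for key in self._keys_by_table.pop(table, set()):
            self._store.pop(key, None)


events = {1: "Launch party"}
cache = QueryCache(timeout=300)


def all_events():
    return sorted(events.values())


cache.cached_query("event", "all", all_events)   # miss: runs the query
cache.cached_query("event", "all", all_events)   # hit: served from the cache
events[2] = "Django sprint"                       # a write...
cache.invalidate_table("event")                   # ...followed by invalidation
print(cache.cached_query("event", "all", all_events), cache.misses)
# ['Django sprint', 'Launch party'] 2
```

Without the invalidate_table call, the third read would keep returning the one-event list for up to 300 seconds, which is exactly the stale-data problem the speaker warns about.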
Solution 5: Fine-grained permissions
To keep permission logic clear and centralized, the speaker recommends the dry-rest-permissions library.
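- Core idea: move permission checks out of the views and into the models.
- Implementation:
  - Add DRYPermissions to the viewset's permission_classes.
  - Define the specific permission methods directly on the model class.
- Permission types:
  - Global permissions: static methods such as has_read_permission(request) and has_write_permission(request).
  - Object-level permissions: instance methods such as has_object_read_permission(self, request), where the object being checked is self.
  - Field-level protection: the talk also shows a model method that guards writes to a specific field.
  - Action-level permissions: methods named after a ViewSet action (for example has_create_permission) restrict that action.
- Caveat: outside a ModelViewSet (for example in an APIView), you must pass the request in the serializer context yourself; otherwise the model's permission methods cannot access the request.
  serializer = MySerializer(data=..., context={'request': request})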
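To show how model-level permission methods can be dispatched, here is a simplified, framework-free imitation of the naming convention described above. It is not dry-rest-permissions' code: `is_allowed`, `_lookup`, the `Request` and `Ticket` classes, and the owner rule are hypothetical, and the real DRYPermissions class handles details this sketch skips. It assumes DRF's split, where GET, HEAD, and OPTIONS are safe (read) methods and everything else is a write, and it looks for an action-specific method before falling back to the generic read/write one.

```python
from dataclasses import dataclass

SAFE_METHODS = ("GET", "HEAD", "OPTIONS")  # the same tuple DRF uses


@dataclass
class Request:
    method: str
    user: str


class Ticket:
    def __init__(self, owner):
        self.owner = owner

    # Global permissions: static methods, no instance available.
    @staticmethod
    def has_read_permission(request):
        return True

    @staticmethod
    def has_write_permission(request):
        return request.user != "anonymous"

    # Object permissions: instance methods, so the object is `self`.
    def has_object_read_permission(self, request):
        return True

    def has_object_write_permission(self, request):
        return self.owner == request.user


def _lookup(target, prefix, action, kind):
    # Prefer an action-specific method, else the generic read/write one.
    specific = getattr(target, f"{prefix}{action}_permission", None)
    return specific or getattr(target, f"{prefix}{kind}_permission")


def is_allowed(model, request, action, obj=None):
    kind = "read" if request.method in SAFE_METHODS else "write"
    # Global check first (like has_permission), then the object check.
    if not _lookup(model, "has_", action, kind)(request):
        return False
    if obj is not None:
        return _lookup(obj, "has_object_", action, kind)(request)
    return True


alice_ticket = Ticket(owner="alice")
print(is_allowed(Ticket, Request("GET", "anonymous"), "list"))                        # True
print(is_allowed(Ticket, Request("POST", "anonymous"), "create"))                     # False
print(is_allowed(Ticket, Request("PATCH", "bob"), "partial_update", alice_ticket))    # False
print(is_allowed(Ticket, Request("PATCH", "alice"), "partial_update", alice_ticket))  # True
```

Adding a `has_create_permission` static method to Ticket would take precedence over `has_write_permission` for the create action only, which is the appeal of keeping these rules on the model.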
Solution 6: Custom renderers
To serve different kinds of clients, the API can offer additional output formats.
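- Example: generating Excel files.
- Tools: libraries such as drf-renderer-xlsx (the talk mentions XLSXFileMixin and XLSXRenderer).
- Implementation:
  - Inherit the corresponding mixin in the viewset.
  - Add the new renderer, such as XLSXRenderer, to the renderer_classes list.
- Important: to keep DRF's default browser-friendly interface, remember to keep BrowsableAPIRenderer in renderer_classes as well; otherwise the API only returns JSON or Excel.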
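Renderer selection can be sketched the same way. The snippet below is a hypothetical, dependency-free illustration of how a `?format=` query parameter or the Accept header picks one renderer from a list, much like DRF's `renderer_classes`. It uses the standard-library csv module as a stand-in for an Excel renderer, and its `select_renderer` helper ignores q-values, which real content negotiation takes into account.

```python
import csv
import io
import json


class JSONRenderer:
    media_type = "application/json"
    format = "json"

    def render(self, rows):
        return json.dumps(rows)


class CSVRenderer:
    """Stand-in for a spreadsheet renderer: a header row, then one row per record."""
    media_type = "text/csv"
    format = "csv"

    def render(self, rows):
        buffer = io.StringIO()
        if rows:
            writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        return buffer.getvalue()


def select_renderer(renderers, format_param=None, accept="*/*"):
    """An explicit ?format= wins; otherwise follow the Accept header in order."""
    if format_param:
        for renderer in renderers:
            if renderer.format == format_param:
                return renderer
        raise LookupError(f"no renderer for format {format_param!r}")
    accepted = [part.split(";")[0].strip() for part in accept.split(",")]
    for media_type in accepted:
        for renderer in renderers:
            if media_type in ("*/*", renderer.media_type):
                return renderer
    raise LookupError(f"no renderer matches Accept: {accept}")


renderers = [JSONRenderer(), CSVRenderer()]
rows = [{"id": 1, "event": "Launch party"}, {"id": 2, "event": "Django sprint"}]

print(select_renderer(renderers).format)                     # json
print(select_renderer(renderers, accept="text/csv").format)  # csv
print(select_renderer(renderers, format_param="csv").render(rows))
```

Because a browser asks for text/html, it only gets the HTML interface if a renderer for that media type (BrowsableAPIRenderer in DRF) is in the list; with a wildcard Accept header, the first listed renderer wins.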
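Conclusions and results
By combining these six techniques, the speaker turns an API with serious performance problems into an efficient, robust, feature-rich service.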
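- Performance: the optimized API avoids the very long response times (over two minutes) caused by loading huge amounts of data, and with pagination responds in milliseconds (124 ms).
- Query efficiency: the number of database queries drops sharply, and the query logs show much simpler SQL.
- Maintainability: centralizing serialization and permission logic makes the code easier to read and maintain.
- Final advice: pagination remains essential for large datasets even after all the other optimizations; in the demo, requesting almost the whole dataset (about 10,000 items, 7 MB) still took about three minutes.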
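Pagination can be switched on globally in settings. The fragment below uses DRF's built-in PageNumberPagination; the page size of 100 is a placeholder to tune per API.

```python
# settings.py (fragment): paginate every list endpoint by default.
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 100,  # placeholder; each page is one bounded query plus a COUNT
}
```

Clients then walk the data with ?page=2, ?page=3, and so on, and each response carries count, next, previous, and results, so no single request has to render the whole table.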