| Blog信息 |
|
blog名称: 日志总数:1304 评论数量:2242 留言数量:5 访问次数:7659972 建立时间:2006年5月29日 |

| |
|
[Django]Page Stats Middleware 软件技术
lhwork 发表于 2007/2/4 14:10:37 |
| Some websites have a nice feature where they will show you a summary
of the performance metrics measured during the generation of a page.
I thought this would be a nice feature for some Django-based web
applications I was developing at work. When a page takes a while to
render, I like to quickly diagnose where the problem might be.
500)this.width=500'>
I wanted to know the time spent generating the page in total as well
as the breakdown between Python (Django) and the database.
Additionally, I wanted to know how many times the database was queried
as too many queries can be a hint that you aren't accessing the
database efficiently.
Since I wanted the feature to work across any Django app, a custom
middleware class seemed like the right approach. Writing a custom
middleware is quite easy and, like so many other things in Django,
well
documented.
Calculating the Metrics
Luckily, the Django developers had already thought of the value of
collecting some basic database statistics, so I simply had to figure
out how to access them. When DEBUG mode is on (through your settings
file), a database backend class wraps a cursor in a
CursorDebugWrapper object which keeps a list of each SQL query
executed along with its execution time. I just had to make sure
DEBUG was enabled, if not already, and then add up the time of any
calls incurred during this page's invocation.
The total time to process the view is measured in the middleware's
process_view routine and then the time spent in Python (i.e Django)
was easy to back out by timing the entire view call and then
subtracting out the database time.
Viewing the Stats
The last part of the puzzle was how to put this information in the
page itself. Normally in Django you would use a custom template tag
for such a chore but I couldn't since I wouldn't have completed timing
the view call until the template had already been rendered.
I decided to follow the spirit of template substitution by creating a
special HTML comment that described the format of the output and then
replacing the placeholder with the formatted output as the middleware
returned the view to its caller.
The StatsMiddleware Class
The code is pretty straightforward. First, I save the state of the
debug setting and then enable debugging. This triggers the use of the
debug wrapper class in the database backend which will keep the stats
of the database part of the execution. Then I save the size of the
connection's queries where this debug info is stored so I will know
which queries were additionally called during this view's invocation.
Then I build the view, keeping track of its time
import refrom operator import addfrom time import timefrom django.db import connectionclass StatsMiddleware(object): def process_view(self, request, view_func, view_args, view_kwargs): # turn on debugging in db backend to capture time from django.conf import settings debug = settings.DEBUG settings.DEBUG = True # get number of db queries before we do anything n = len(connection.queries) # time the view start = time() response = view_func(request, *view_args, **view_kwargs) totTime = time() - start # compute the db time for the queries just run queries = len(connection.queries) - n if queries: dbTime = reduce(add, [float(q['time']) for q in connection.queries[n:]]) else: dbTime = 0.0 # and backout python time pyTime = totTime - dbTime # restore debugging setting settings.DEBUG = debug stats = { 'totTime': totTime, 'pyTime': pyTime, 'dbTime': dbTime, 'queries': queries, } # replace the comment if found if response and response.content: s = response.content regexp = re.compile(r'(?P<cmt><!--\s*STATS:(?P<fmt>.*?)-->)') match = regexp.search(s) if match: s = s[:match.start('cmt')] + \ match.group('fmt') % stats + \ s[match.end('cmt'):] response.content = s return responseAfter the view is called, I restore the debug setting and save the stats in
a dict which will be used for argument replacement during the output
formatting step. As I said earlier, I use a sort of poor-man's templating
to render the stats into the page on output. An HTML comment of the form
<!-- STATS: format_string -->
will be replaced with the output of format_string after formatting
against the stats dict. This blog uses the string:
<!-- STATS: Total: %(totTime).2f Python: %(pyTime).2f DB: %(dbTime).2f Queries: %(queries)d -->
I put this inside the footer <div> and format through the CSS style
sheet.
The replacement block in the code deserves a couple of comments. I
wanted to make sure the string formatting was not applied against the
entire content of the page so that random % symbols wouldn't cause
problems. So I search for the special comment tag and then apply the
formatting only to the subsection where it was found. The resulting
content is then assembled with the untouched bounding parts of the
original conent and the modified subsection.
Immediately Helpful
Soon after I got this working, it helped me identify a logic bug
in my code. I had created a view that showed the position of an
investment fund in a fairly large table (about 700 rows). The page
took a few seconds to build and render. The statistics at the bottom
of the page showed a whopping 2813 queries! Yikes!
The reason became immediately clear: each row in the table displayed a
Position object as well as fields from the four other models it had
ForeignKey relations with. Thus, each time a row was evaluated in the
template, there were four individual database queries to look up the
related data. I simply added select_related() to the end of my main
query and the query count dropped to 4 and the execution time was cut
to less than half the original 2.5 seconds. Nice.
|
|
|