Website: http://www.davidcramer.net/
Original page: http://www.davidcramer.net/work/curse/44/what-powers-curse.html
It's great that you published this info. You've got the highest volume Django installation I've heard of.
Hi David,
Thanks for the info! Could you please tell us in more detail what kind of issues you ran into with Django FastCGI?
Thanks,
Peter
How do you handle and process your web log files? Do you use a third party to process them, or does some other hardware collect them? With all of the extra caching, that would seem more difficult to track. I've used both Apache flat files and MySQL logging, but I've had my doubts about both approaches.
Regarding FastCGI: we ran into one issue where it threw random server errors due to some weakref references somewhere in Django's core. I believe this was related to the MySQLdb library, but even after we had pushed out a new version of it we were still having the problems. We fixed several other problems along the way, but it was one recurring problem after another.
After many, many attempts to find some solution other than mod_python (due to its memory overhead), we finally decided it wasn't worth the time or trouble to keep looking for solutions to problems that kept popping up.
As a reference point for discussion, a correctly compiled Python/mod_python should result in an Apache mod_python.so loadable module of at most about 400 kilobytes. When this gets loaded, the memory for the module itself should be shared, so one shouldn't see a hit in each Apache child process, just once for the whole system.
A problem with mod_python.so though is that a lot of Linux distributions don't provide a shared library for Python in the distribution. This means that when mod_python.so is built, the Python static library objects have to be embedded within mod_python.so. This can add an extra 1.5MB to the size of mod_python.so. Worse is that when this is loaded into memory, it is necessary for the loader to perform address relocations on some platforms and thus rather than mod_python.so being shared, it becomes private memory to every process.
So, first thing you should check is whether your Python provides a shared library and whether mod_python.so is actually using it. If it isn't, you have already used up about 1.5MB of memory per process more than you need to.
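The first half of that check can be sketched from Python itself. This is a minimal sketch, not part of the original comment: it only reports whether the running Python build provides a shared library, using the standard `sysconfig` module. The function name `python_shared_library_info` is made up for illustration. Whether mod_python.so actually links against that library still has to be checked separately, for example with `ldd` on Linux.

```python
import sysconfig


def python_shared_library_info():
    """Report whether this Python build provides a shared libpython."""
    # Py_ENABLE_SHARED is 1 when Python was configured with --enable-shared;
    # it may be 0 or None on other builds or platforms.
    enabled = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    # LDLIBRARY names the library an embedding module would link against;
    # it may be None on platforms that do not define it.
    library = sysconfig.get_config_var("LDLIBRARY")
    return {"shared": enabled, "library": library}


if __name__ == "__main__":
    info = python_shared_library_info()
    print("shared library build:", info["shared"])
    print("library name:", info["library"])
```

If `shared` comes back false, the roughly 1.5MB per-process cost described above is likely to apply.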
Now when mod_python is loaded and initialised, there is some minor memory overhead in relation to its retaining of configuration information and again some minor overhead from the creation of the initial Python interpreter instance. Both of these should only be a few hundred kilobytes at most.
Where mod_python appears though to chew up a lot of memory initially is that it preloads various Python modules that it requires in order to perform URL dispatch. These include modules such as 'cgi', 'httplib' and 'urllib2' as well as others. In the main though, these are actually modules which would typically be used in a web application anyway and so it isn't actually overhead specific to mod_python.
Now it has been identified and discussed on the mod_python mailing list that a number of these modules need not actually be loaded at all, as they are loaded for one specific function which could just as easily be duplicated in mod_python. Also, some modules such as 'pdb' shouldn't be loaded unless debugging is being done. In all, it was found that up to 1MB of memory could be saved at startup by eliminating the modules that didn't need to be loaded.
Although this sounds great, several of these modules would end up getting loaded by the web application anyway, so the memory saved might only amount to that taken up by 'pdb', which is a few hundred kilobytes. Once mod_python 3.3.1 is cleaned up and the old module importer, which is currently still available in parallel with the new one, is removed, it should be possible to save some more memory as well.
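To get a rough feel for how much a single preloaded module drags in, one can count the modules a fresh interpreter loads when importing it. This is only an illustrative sketch: it counts modules rather than measuring memory, it runs on a current Python rather than inside mod_python, and the helper name `modules_pulled_in` is hypothetical.

```python
import subprocess
import sys

# Code run in a fresh interpreter: record sys.modules before and after
# importing one module, then print the names that were newly loaded.
PROBE = (
    "import sys\n"
    "before = set(sys.modules)\n"
    "import {name}\n"
    "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
)


def modules_pulled_in(name):
    """Return the modules a fresh interpreter loads when importing `name`."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE.format(name=name)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    loaded = modules_pulled_in("pdb")
    print(len(loaded), "modules loaded by importing pdb")
```

Running the probe in a separate process matters, since a module that is already imported in the current process would show up as costing nothing.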
A problem though with these modules being preloaded is that if you don't run your application in the mod_python 'main_interpreter', you will double the amount of memory consumed by these modules. In other words, those loaded into the 'main_interpreter' will sit idle and not be used, thus wasting memory.
Thus, if you aren't specifically needing to run multiple applications separated into distinct Python interpreters, make sure you run your application in the 'main_interpreter' by setting the PythonInterpreter directive. This will avoid the memory overhead of an extra Python interpreter and the separate copies of these modules.
When a request actually comes in, this is where memory use starts to climb again. First off, if you are using the Apache worker MPM, the overhead of Apache creating all the threads does take a noticeable amount of memory.
More importantly, this is where your actual application gets loaded. For Django, the core takes up to 7MB of memory. On top of that, as requests come into specific parts of your application, memory use will keep growing as more and more Python modules get loaded for your customisations.
In summary, the bulk of memory used when using mod_python is still really the Python application itself that is being hosted. If you are not using a Python shared library with mod_python.so, you will waste about 1.5MB. If you don't run your web application in the 'main_interpreter' you can waste up to another 1MB. Since these figures are per process, it will all add up for the system as a whole.
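To see how those per-process figures add up, here is a small arithmetic sketch. The 1.5MB and 1MB values come from the summary above; the process count of 50 is purely an assumed example, and the function name `wasted_memory_mb` is made up for illustration.

```python
# Per-process figures quoted in the comment above (upper bounds, in MB).
MB_STATIC_LIBPYTHON = 1.5   # Python static library embedded in mod_python.so
MB_EXTRA_INTERPRETER = 1.0  # application not run in 'main_interpreter'


def wasted_memory_mb(processes, static_libpython=True, extra_interpreter=True):
    """Estimate avoidable memory across all Apache child processes, in MB."""
    per_process = 0.0
    if static_libpython:
        per_process += MB_STATIC_LIBPYTHON
    if extra_interpreter:
        per_process += MB_EXTRA_INTERPRETER
    return per_process * processes


if __name__ == "__main__":
    # 50 child processes is an assumption, not a figure from the thread.
    print(wasted_memory_mb(50))  # 125.0
```

With both problems present, 50 processes would carry up to about 125MB of avoidable memory, which is why the per-process figures matter more than they first appear.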
So, mod_python could itself be trimmed by eliminating the need to load certain modules for each interpreter created, but this is going to be at most 0.5MB, and some deprecated code could be removed. The bigger problems come about though through a poorly configured and installed version of Python and not understanding the consequences of using multiple Python interpreters.
Having read all that, and perhaps now having a better understanding of where memory gets used with mod_python, maybe you can relate your own experience. I would be particularly interested to hear whether you have a Python shared library being used by mod_python.so and whether you are running your application in the 'main_interpreter'.
I know this is a long post for a blog comment, so if you want to email me direct about it, or perhaps join the mod_python mailing list and share your experiences there instead then please do so.
Graham
@Graham: Next big issue I have with mod_python, I'm contacting you:)
Because mod_wsgi is tailored specifically to hosting WSGI applications and is not a general purpose way of writing Python applications in conjunction with Apache, it has both less memory overhead and less run time overhead than mod_python.
Maybe one day I might even be able to convince the Curse site maintainers to experiment with mod_wsgi and see if it better meets their needs. :-)
Graham
http://www.computerworld.com.au/index.php?id=76...
http://tweakers.net/reviews/661/7
http://forums.mysql.com/read.php?25,93181,93181
http://feedlounge.com/blog/2005/11/20/switched-...
http://www.postgresql.org/about/press/presskit8...
The last link has a couple of nice highlights of the most recent release:
"Performance improvements: version 8.2 improves performance around 20% overall in high-end OLTP (online transaction processing) system tests. Users can gain even more in data warehousing efficiency. The changes include faster in-memory and on-disk sorting, better multi-processor scaling, better planning of partitioned data queries, faster bulk loads and vastly accelerated outer joins.
Warm Standby Databases: through an extension to our Point in Time Recovery feature (introduced in version 8.0), administrators now can easily create a failover copy of your database cluster.
Online Index Builds: index builds can now occur while applications write to database tables, allowing performance tuning without downtime."
Is there an automatic failover between both database nodes in case of a failure? Are you using DRBD with Heartbeat?
http://spyced.blogspot.com/2006/12/benchmark-po...