David Cramer's Blog: Spaceless HTML in Django

  • miracle2k · 1 year ago
    I've considered this before, but was always unsure about the CPU cost involved.
  • Subsume · 1 year ago
    Pretty hot yet simple. Thanks Mistar Cramar.
  • Sam McDonald · 1 year ago
    I like it a lot. I have been wondering if there was an easy way to do this, and there is. I will definitely have use for this soon.
  • Honza · 1 year ago
    Is that really worth it?

    taking a file:
    index.html - 91565 bytes

    code:
    >>> # short() is the whitespace-stripping function from the post
    >>> html = open('index.html', encoding='utf-8').read()
    >>> f = open('index.short', 'w', encoding='utf-8')
    >>> f.write(short(html))
    >>> f.close()

    produces:
    index.short - 69915 bytes

    and my favorite - gzip:
    index.gz - 14709 bytes
    index.short.gz - 13319 bytes

    examples from zena.centrum.cz where we have some nasty whitespace in our HTML...

    Knowing this, I would take mod_deflate in (insert your favorite web server) over any python space stripping anytime...
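
    A comparison like the one above can be reproduced with a short script. This is only a sketch: strip_between_tags() is a hypothetical stand-in for the post's short() function (assumed here to remove whitespace between tags), and the sample markup is made up, so the numbers will differ from those quoted for index.html.

```python
import gzip
import re


def strip_between_tags(html: str) -> str:
    """Remove whitespace-only runs between a '>' and the next '<'."""
    return re.sub(r">\s+<", "><", html.strip())


def sizes(html: str) -> dict:
    """Return byte counts for raw and stripped markup, plain and gzipped."""
    raw = html.encode("utf-8")
    short = strip_between_tags(html).encode("utf-8")
    return {
        "raw": len(raw),
        "short": len(short),
        "raw_gz": len(gzip.compress(raw)),
        "short_gz": len(gzip.compress(short)),
    }


# Hypothetical sample: a heavily indented list standing in for a real page.
sample = "<ul>\n" + "".join(
    "        <li>\n            <a href='/item/%d'>Item %d</a>\n        </li>\n" % (i, i)
    for i in range(200)
) + "</ul>\n"

result = sizes(sample)
for name, size in result.items():
    print(name, size)
```

    Running it against a real page (read the file and pass its text to sizes()) shows how much of the saving survives once gzip is applied.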
  • David Cramer · 1 year ago
    Does mod_deflate allow removing the extra whitespace? In your example you are correct, that it's not a huge savings for that individual request, and gzip I would highly recommend, but the CPU time is negligible.
  • Dan · 1 year ago
    I've considered this before, but wondered if there was much benefit over gzip (which I would recommend using even with whitespace stripping). Does a whitespace-removed, gzipped response save enough space to justify the (admittedly reasonable) CPU cost of whitespace-removal?
  • Josh · 1 year ago
    Very interesting thoughts. I'll have to play around with it sometime. Thanks!
  • Tom · 1 year ago
    So... every third byte was whitespace? That's a bit hard to believe, even as a worst-case. Still, if you saw improvements, then my congratulations.

    I would suggest, though (as I see others have), that people facing the same problem have a look at simply gzipping their text-based output at the webserver level. Since the action's probably happening on the webhead either way, it's unlikely to be much more costly in CPU terms, and it's even simpler to implement than your solution (it typically just takes a line or two in .htaccess if you're using Apache). Gzipping will certainly remove any overhead that whitespace represents.
  • David Cramer · 1 year ago
    It's mostly indentation that actually contributed to the whitespace :)
  • Andreas · 1 year ago
    Clever trick! Debuggers could always use Firebug if they want it nicely indented. Everyone should do this + gzip + an Expires header where feasible.
  • mikl · 1 year ago
    Note that Content-Type will usually be 'text/html; charset=utf-8', not just 'text/html' (since Content-Type without a charset is unsafe).
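
    If the stripping code decides whether a response is HTML by comparing the header to 'text/html' exactly (an assumption about the post's code, which is not shown here), it would skip responses that carry a charset parameter. A minimal sketch of a check that ignores parameters, with is_html() as a hypothetical helper name:

```python
def is_html(content_type: str) -> bool:
    """True if the media type is text/html, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


print(is_html("text/html; charset=utf-8"))
print(is_html("application/json"))
```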
  • bartTC · 1 year ago
    The drawback is that it also strips the whitespace in pre and textarea tags.
  • David Cramer · 1 year ago
    Have you confirmed this? I believe the whitespace tag is designed to avoid that, but I haven't fully tested it.
  • bartTC · 1 year ago
    I had problems with highlighted (pygments) text in pre tags, which resulted in unindented code *cry*. But that's a very special condition.
  • Chris Kelly · 1 year ago
    The whitespace tag is really simple: it just finds the end of one tag and the beginning of another with only whitespace in between, and removes that whitespace. It doesn't appear to have any special cases, so it unfortunately affects textarea and pre tag content.

    see: http://code.djangoproject.com/browser/django/tr...
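
    The behaviour described above can be sketched in plain Python, assuming the function amounts to a regex substitution along the lines of r'>\s+<' (an assumption for illustration; strip_between_tags() below is a stand-in, not Django's own code). The highlighted snippet is a made-up example of pygments-style markup.

```python
import re


def strip_between_tags(html: str) -> str:
    """Remove whitespace-only runs between a '>' and the next '<'."""
    return re.sub(r">\s+<", "><", html.strip())


# Highlighted code: every token is wrapped in a span, so the indentation
# and the spaces between tokens sit directly between tags.
highlighted = (
    '<pre><span class="k">if</span> <span class="n">x</span><span class="p">:</span>\n'
    '    <span class="k">pass</span></pre>'
)
stripped = strip_between_tags(highlighted)
print(stripped)

# Plain text inside <pre> is left alone: the whitespace is next to text,
# not between two tags.
plain = "<pre>\n  a = 1\n</pre>"
print(strip_between_tags(plain) == plain)
```

    This is consistent with the pygments case being hit hardest: plain preformatted text keeps its layout, while span-wrapped tokens lose both indentation and the spaces between them.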
  • bartTC · 1 year ago
    Oops, I wasn't aware that there is a real "strip_spaces_between_tags" function. Sorry for the noise :D
  • Roger · 1 year ago
    The appropriate place for cached content to be affected would be at the start of the end? :-/
  • Roger · 1 year ago
    or the end*
  • Peter Bengtsson · 1 year ago
    I wrote this snippet:
    http://www.djangosnippets.org/snippets/1055/
    to (very) efficiently whitespace optimize inline CSS which might be interesting for you and your Jinja too.
  • Bartek · 1 year ago
    Very nice and simple trick. Thanks
  • Florent V. · 1 year ago
    Hello,

    I'm curious: why output whitespace-less HTML when you could just use gzip compression on the server for HTML, XML, CSS and JavaScript, and get better results (like from 300 kB -- not kb ;) -- to, say, 120 kB)? Did you do some performance testing regarding that?

    Unless gzipping your text output has a clear performance cost, I would keep my HTML code with whitespace (if not perfect, then at least decently readable code). Makes it easier to debug when you need to check the real thing (Firebug only shows what Firefox understands, not what it gets from the server).

    Do you happen to use this method, then gzip (through mod_deflate for instance)? If so, how much do you save compared to a gzipped version with all whitespace intact?
  • David Cramer · 1 year ago
    We gzip as well. Sadly, not everyone is on broadband yet, so even shaving 5 or 10k off the response can be quite useful (especially when the time it takes to do that is too small to measure).