Oh really, why is that?

web performance for the curious

+Ilya Grigorik, @igrigorik
Make The Web Fast, Google

black box, noun

  1. Any complex piece of equipment with contents that are mysterious to the user.

  • Black boxes suck.
  • Why? Because we can't ask why, by definition. QED.




  • WebKit comes in a white box - progress!
  • Astute observation: an open white box!
  • So why do we treat it as a closed box?

{Black, White} box theory, in theory







Given inputs X, Y, Z, the output is...

{Black, White} box theory, in practice




!@!#!$&@!



  • Personal theory: combination of humidity and wind direction

We give ourselves (way) too much credit...




  • Deterministic replay is, at best, a nascent area of research
  • Yes, we can create synthetic environments. That's about it.



Any sufficiently complicated system will produce inexplicable behaviors... And that's ok.

Leave the box at home!

New model: why not a magical creature instead? *




* Hard to spot, lots of myths, but at least there are sightings...

Mozilla: Gecko






A gecko fox, on fire... or something. It's cute, though!

Opera: Presto


  • A hyperactive plumber, who likes to create optimized tubes!

Personal rendition, does not reflect views of my employer, etc, etc...

Microsoft: Trident



Chrome: WebKit




  • Unofficial sighting on DeviantArt! A Chrome pony! A Chromy?

</detour>

So what is this WebKit breed anyway?

"WebKit is an open source web browser engine."

  • A browser consists of many components...
  • WebKit provides a platform API and an embedding API
  • WebKit is not a "browser" on its own


  • WebKit ships with many swappable component implementations (gray)
  • Some components must be provided
  • Some components can use defaults
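To make the "must be provided" vs. "can use defaults" split concrete, here is a minimal C++ sketch of an engine that receives its platform components through abstract interfaces. NetworkBackend, FontBackend, Engine, SoupNetwork and PangoFonts are hypothetical names chosen for illustration, not WebKit's actual platform API; the only point is that the embedder must supply some components and may override others.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical platform interfaces (illustrative names, not WebKit's API).
struct NetworkBackend {  // no default: the embedder must provide one
    virtual ~NetworkBackend() = default;
    virtual std::string name() const = 0;
};

struct FontBackend {  // ships with a default implementation
    virtual ~FontBackend() = default;
    virtual std::string name() const { return "default-fonts"; }
};

class Engine {
public:
    // Networking is required; fonts fall back to the default when omitted.
    explicit Engine(std::unique_ptr<NetworkBackend> net,
                    std::unique_ptr<FontBackend> fonts = nullptr)
        : net_(std::move(net)), fonts_(std::move(fonts)) {
        if (!fonts_) fonts_ = std::make_unique<FontBackend>();
    }

    std::string describe() const { return net_->name() + " + " + fonts_->name(); }

private:
    std::unique_ptr<NetworkBackend> net_;
    std::unique_ptr<FontBackend> fonts_;
};

// Hypothetical embedder-supplied implementations (a GTK-style port).
struct SoupNetwork : NetworkBackend {
    std::string name() const override { return "soup"; }
};

struct PangoFonts : FontBackend {
    std::string name() const override { return "pango"; }
};
```

Constructing Engine with only a SoupNetwork yields "soup + default-fonts"; passing a PangoFonts as well swaps in the port's own font backend.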

WebCore

  • Resource dispatch & loading
  • Parsing
  • DOM construction
  • Layout, style resolution, painting
  • Event handling
  • JavaScript bindings
  • ...


The fun, hard stuff.

JavaScript Engine

"WebKit's JavaScript engine, JavaScriptCore, based on KJS, is a framework separate from WebCore and WebKit, and is used on Mac OS X for applications other than web page JavaScript."



  • JavaScriptCore ships by default
  • Generational garbage collection
  • Interpreter + JIT
  • Can be swapped, e.g. V8

Platform APIs

  • Native widget implementations
  • Font rendering
  • Rendering / paint engine
  • Audio / Video codecs
  • Networking
  • JavaScript engine
  • ...


Performance, visual output, speed, security, capabilities all vary based on platform implementation.

WebKit Powered Browsers

Small sample of differences...

              Chrome (OSX)       WebKitGTK        Android Browser   Chrome for iOS
Rendering     Skia               Cairo            Android stack     CoreGraphics
Networking    Own Chrome stack   Soup             Android stack     Own Chrome stack
Fonts         Quartz             Pango            Android stack     Quartz
JavaScript    V8                 JavaScriptCore   V8                JavaScriptCore (no JIT) *
...

"UIWebView for rendering, no V8, and a single-process model... That said, there is a lot of code we do leverage, such as the network layer, the sync and bookmarks infrastructure, omnibox, metrics and crash reporting, ..." - Chrome for iOS

Dissecting a Browser

What does it take to render a page?

(Chrome) Networking Stack


An average page has grown to 1059kB (over 1MB!) and is now composed of 80+ subresources.


  • DNS prefetch - pre-resolve hostnames before we make the request
  • TCP preconnect - establish connection before we make the request
  • Pooling & re-use - leverage keep-alive, re-use existing connections (6 per host)
  • Caching - the fastest request is the one not made (sizing, validation, eviction, etc.)
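The pooling rule above (re-use kept-alive sockets, cap at 6 connections per host) can be sketched as follows. ConnectionPool and Acquire are hypothetical names; a real network stack also handles timeouts, priorities, proxies and socket errors, all of which are omitted here.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Acquire { Reused, Opened, Queued };

// Minimal per-host connection pool: re-use idle keep-alive sockets first,
// open new ones up to the per-host cap, otherwise make the request wait.
class ConnectionPool {
public:
    static constexpr int kMaxPerHost = 6;

    Acquire acquire(const std::string& host) {
        Host& h = hosts_[host];
        if (h.idle > 0) {              // warm socket: no new DNS/TCP setup
            --h.idle;
            ++h.active;
            return Acquire::Reused;
        }
        if (h.active < kMaxPerHost) {  // room for another connection
            ++h.active;
            return Acquire::Opened;
        }
        return Acquire::Queued;        // all 6 busy: wait for a release
    }

    // Request finished; keep-alive returns the socket to the idle set.
    void release(const std::string& host) {
        Host& h = hosts_[host];
        assert(h.active > 0);
        --h.active;
        ++h.idle;
    }

private:
    struct Host {
        int active = 0;  // sockets currently carrying a request
        int idle = 0;    // open sockets waiting to be re-used
    };
    std::map<std::string, Host> hosts_;
};
```

The seventh concurrent request to one host is queued, while a request to a different host opens its own connection; once a request finishes, the next one to that host re-uses the idle socket.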

For example, Chrome learns subresource domains:
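A rough sketch of how such learning might work: remember which hosts each page pulled subresources from, then pre-resolve or pre-connect the hosts that showed up often enough. SubresourcePredictor, the hit-rate confidence and the 0.5 default threshold are assumptions for illustration, not Chrome's actual predictor algorithm.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class SubresourcePredictor {
public:
    // Record one navigation to `page` and the distinct hosts it fetched from.
    void learn(const std::string& page, const std::vector<std::string>& hosts) {
        Entry& e = table_[page];
        ++e.navigations;
        for (const std::string& h : hosts) ++e.hits[h];
    }

    // Hosts seen on at least `threshold` of past navigations to `page`:
    // candidates for DNS prefetch / TCP preconnect on the next visit.
    std::vector<std::string> predict(const std::string& page,
                                     double threshold = 0.5) const {
        std::vector<std::string> out;
        auto it = table_.find(page);
        if (it == table_.end()) return out;
        const Entry& e = it->second;
        for (const auto& [host, count] : e.hits) {
            double confidence = static_cast<double>(count) / e.navigations;
            if (confidence >= threshold) out.push_back(host);
        }
        return out;  // ordered by host name (std::map iteration order)
    }

private:
    struct Entry {
        int navigations = 0;
        std::map<std::string, int> hits;  // host -> navigations that used it
    };
    std::map<std::string, Entry> table_;
};
```

A host used on every visit clears the threshold; one seen on a single visit out of three does not, unless the threshold is lowered.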




(Chrome) Networking Stack



  • chrome://predictors - omnibox predictor stats (tip: check 'Filter zero confidences')
  • chrome://net-internals#sockets - current socket pool status
  • chrome://net-internals#dns - Chrome's in-memory DNS cache
  • chrome://histograms/DNS - histograms of your DNS performance
  • chrome://dns - startup prefetch list and subresource host cache
enum ResolutionMotivation {
    MOUSE_OVER_MOTIVATED,       // Mouse-over link induced resolution.
    PAGE_SCAN_MOTIVATED,        // Scan of rendered page induced resolution.
    LINKED_MAX_MOTIVATED,       // enum demarkation above motivation from links.
    OMNIBOX_MOTIVATED,          // Omni-box suggested resolving this.
    STARTUP_LIST_MOTIVATED,     // Startup list caused this resolution.
    EARLY_LOAD_MOTIVATED,       // In some cases we use the prefetcher to warm up the connection
    STATIC_REFERAL_MOTIVATED,   // External database suggested this resolution.
    LEARNED_REFERAL_MOTIVATED,  // Prior navigation taught us this resolution.
    SELF_REFERAL_MOTIVATED,     // Guess about need for a second connection.
    // ...
};
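The LINKED_MAX_MOTIVATED comment suggests the enum is ordered so that link-driven motivations come first. A hypothetical helper relying on that ordering might look like this; isLinkMotivated is not a Chromium function, and the enum is trimmed to the values shown above.

```cpp
#include <cassert>

// Trimmed copy of the enum above (any values after these are omitted).
enum ResolutionMotivation {
    MOUSE_OVER_MOTIVATED,
    PAGE_SCAN_MOTIVATED,
    LINKED_MAX_MOTIVATED,  // demarcation: values above come from links
    OMNIBOX_MOTIVATED,
    STARTUP_LIST_MOTIVATED,
    EARLY_LOAD_MOTIVATED,
    STATIC_REFERAL_MOTIVATED,
    LEARNED_REFERAL_MOTIVATED,
    SELF_REFERAL_MOTIVATED,
};

// Hypothetical helper: relies only on the declaration order above.
bool isLinkMotivated(ResolutionMotivation m) {
    return m < LINKED_MAX_MOTIVATED;
}
```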
Best request is no request. Worst request is one that blocks the parser.
The networking stack attempts to hide network latency when it can.
<!doctype html>
<meta charset=utf-8>
<title>Awesome HTML5 page</title>

<script src=application.js></script>
<link href=styles.css rel=stylesheet />

<p>I'm awesome.

HTMLDocumentParser begins parsing the received data...

      HTML
        - HEAD
          - META charset="utf-8"
          - TITLE
            #text: Awesome HTML5 page
          - SCRIPT src="application.js"
             ** stop **

Stop. Dispatch request for application.js. Wait...

A <script> could call document.write, so the parser has to stop the world!

The script "async" and "defer" attributes are your only escape clauses.

Preload Scanner to the rescue!


if (isWaitingForScripts()) {
    ASSERT(m_tokenizer->state() == HTMLTokenizerState::DataState);
    if (!m_preloadScanner) {
        m_preloadScanner = adoptPtr(new HTMLPreloadScanner(document()));
        m_preloadScanner->appendToEnd(m_input.current());
    }
    m_preloadScanner->scan();
}

HTMLPreloadScanner forges ahead, looking for blocking resources...

if (m_tagName != imgTag
    && m_tagName != inputTag
    && m_tagName != linkTag
    && m_tagName != scriptTag
    && m_tagName != baseTag)
    return;
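The parser-blocking rule and the preload scanner's tag filter can be modeled together in a toy C++ sketch. It assumes the page is already tokenized into start tags with a single URL attribute; Token, ScanResult and scanAhead are hypothetical names, and the real HTMLPreloadScanner works on the raw input stream and resolves URLs against <base>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-tokenized start tag with a single URL attribute.
struct Token {
    std::string tag;     // "script", "link", "img", "p", ...
    std::string url;     // src or href, empty if none
    bool async = false;  // <script async>
    bool defer = false;  // <script defer>
};

struct ScanResult {
    std::size_t blockedAt;              // index of blocking script, or tokens.size()
    std::vector<std::string> preloads;  // URLs found ahead of the blocked parser
};

// A classic script may call document.write, so the parser must stop and wait.
bool blocksParser(const Token& t) {
    return t.tag == "script" && !t.async && !t.defer;
}

// Same tag filter as the HTMLPreloadScanner snippet above.
bool isPreloadCandidate(const std::string& tag) {
    return tag == "img" || tag == "input" || tag == "link" ||
           tag == "script" || tag == "base";
}

ScanResult scanAhead(const std::vector<Token>& tokens) {
    ScanResult r{tokens.size(), {}};
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (blocksParser(tokens[i])) { r.blockedAt = i; break; }
    }
    // While the parser waits on that script, scan the rest of the input.
    for (std::size_t j = r.blockedAt + 1; j < tokens.size(); ++j) {
        const Token& t = tokens[j];
        // <base> only changes how relative URLs resolve; nothing to fetch.
        if (t.tag == "base" || t.url.empty() || !isPreloadCandidate(t.tag)) continue;
        r.preloads.push_back(t.url);
    }
    return r;
}
```

Run against the sample page above, the parser stops at application.js while the scanner still discovers styles.css; marking the script async removes the block entirely.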

Flush early, flush often





  • The network predictor can run DNS prefetch & TCP preconnect
  • PreloadScanner can fetch resources while the parser is blocked
  • Early flush example: https://gist.github.com/3058839
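Flushing early means sending the top of the document before the server finishes computing the rest. In this sketch, Response, renderBody and servePage are hypothetical, and each flush() is treated as one "packet" (real TCP segmentation differs): the first chunk already names styles.css, so the browser can start fetching it while the body is still being generated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical response writer: each flush() emits the buffered bytes as
// one "packet" (a simplification of what actually goes on the wire).
class Response {
public:
    void write(const std::string& s) { buffer_ += s; }
    void flush() {
        if (!buffer_.empty()) {
            packets_.push_back(buffer_);
            buffer_.clear();
        }
    }
    const std::vector<std::string>& packets() const { return packets_; }

private:
    std::string buffer_;
    std::vector<std::string> packets_;
};

// Stand-in for slow backend work (database queries, templating, ...).
std::string renderBody() { return "<p>I'm awesome."; }

void servePage(Response& res) {
    // Send the head immediately so the browser can discover the stylesheet
    // while the server is still computing the body.
    res.write("<!doctype html><meta charset=utf-8>"
              "<title>Awesome HTML5 page</title>"
              "<link href=styles.css rel=stylesheet>");
    res.flush();
    res.write(renderBody());
    res.flush();
}
```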


# quick and dirty test... show me those packets!
$> tcpdump -i en0 -A -n -s0 -vv tcp
$> curl www.igvita.com

Let the browser do its job...

  • Tip: Flush early, flush often, flush smart
  • Time to first packet matters when...
  • Content of the first packet can tip off the parser

  • CSSPreloadScanner scans for @imports only
  • Scheduling resources from scripts is expensive
  • Tip: Don't hide resources from the parser!
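Because the CSS preload scanner only looks for @import, a sketch of it is small. scanImports is a hypothetical function that extracts the targets of @import url(...), @import url("...") and @import "..." rules; it ignores comments, escapes and media queries, and unlike a real scanner it does not stop at the first non-@import rule. Anything referenced through url() elsewhere, such as a background image, is not discovered early.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Minimal @import-only scanner (simplified: no comments, escapes or media).
std::vector<std::string> scanImports(const std::string& css) {
    std::vector<std::string> urls;
    const std::string kw = "@import";
    std::size_t pos = 0;
    while ((pos = css.find(kw, pos)) != std::string::npos) {
        pos += kw.size();
        while (pos < css.size() && std::isspace(static_cast<unsigned char>(css[pos]))) ++pos;
        bool isUrl = css.compare(pos, 4, "url(") == 0;
        if (isUrl) pos += 4;
        char quote = 0;  // '"' or '\'' if the target is quoted
        if (pos < css.size() && (css[pos] == '"' || css[pos] == '\'')) quote = css[pos++];
        if (!isUrl && !quote) continue;  // not a valid @import target
        // Read up to the closing quote, or ')' for an unquoted url(...).
        std::size_t end = pos;
        while (end < css.size() && css[end] != (quote ? quote : ')')) ++end;
        if (end > pos && end < css.size()) urls.push_back(css.substr(pos, end - pos));
        pos = end;
    }
    return urls;
}
```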

Let's build a Render tree...

Or, maybe an entire forest?

Welcome to the Render forest!





  • Some trees share objects
  • Independently constructed
  • Lazy construction, likes to defer work

RenderObject Tree
  • Owned by the DOM tree
  • Rendered content only
  • Responsible for layout & paint
  • Answers DOM API measurement requests

StyleObject Tree
  • Computed styles for all renderers
  • Owned by the RenderObject tree
  • RenderObjects share RenderStyles
  • RenderStyles share data members

RenderLayer Tree
  • "Helper" class for rendering
  • Used for <video>, <canvas>, ...
  • Some RenderLayers have GPU layers
  • ...


Tip: querying layout (e.g., offsetWidth, offsetHeight) forces a flush and breaks "lazy" evaluation, which is expensive.
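The lazy-evaluation point can be illustrated with a toy model: style writes only mark the document dirty, and a geometry read such as offsetWidth forces a synchronous layout if anything is dirty. Document, growInterleaved and growBatched are hypothetical names; the sketch only counts forced layouts for an interleaved read/write loop versus reading everything first and writing afterwards.

```cpp
#include <cassert>
#include <vector>

// Toy document: writes defer work, geometry reads force layout if dirty.
class Document {
public:
    explicit Document(int n) : widths_(n, 100), laidOut_(n, 100) {}

    void setWidth(int i, int px) { widths_[i] = px; dirty_ = true; }  // cheap

    int offsetWidth(int i) {  // reading geometry flushes pending layout
        if (dirty_) {
            laidOut_ = widths_;
            dirty_ = false;
            ++layouts_;
        }
        return laidOut_[i];
    }

    int size() const { return static_cast<int>(widths_.size()); }
    int layoutCount() const { return layouts_; }

private:
    std::vector<int> widths_, laidOut_;
    bool dirty_ = false;
    int layouts_ = 0;
};

// Read/write interleaved: every read after a write forces a fresh layout.
void growInterleaved(Document& d) {
    for (int i = 0; i < d.size(); ++i) d.setWidth(i, d.offsetWidth(i) + 10);
}

// Read everything first, then write: no layout is forced inside the loop.
void growBatched(Document& d) {
    std::vector<int> w(d.size());
    for (int i = 0; i < d.size(); ++i) w[i] = d.offsetWidth(i);
    for (int i = 0; i < d.size(); ++i) d.setWidth(i, w[i] + 10);
}
```

With five elements, the interleaved loop forces four layouts, while the batched loop forces none until the next read.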

jank, noun

  1. Choppy performance
    "Scrolling on this page feels janky"
  2. Discontinuous, surprising experience
    "What's with the jank on this page?"

butter, noun

  1. Smooth performance (60 FPS)
    "Scrolling on this page is butter smooth"
60FPS? That's for games and stuff, right?


Wrong.

60FPS applies to web pages also!
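The 60 FPS target translates into a per-frame budget of 1000 ms / 60 ≈ 16.7 ms for everything the browser does in that frame. A small sketch that flags frames exceeding the budget, given per-frame work durations (countJankyFrames is a hypothetical helper):

```cpp
#include <cassert>
#include <vector>

// At 60 FPS each frame gets 1000 ms / 60 ~= 16.7 ms for script, style,
// layout, paint and compositing combined.
constexpr double kFrameBudgetMs = 1000.0 / 60.0;

// Count frames whose total work exceeded the budget (i.e., likely jank).
int countJankyFrames(const std::vector<double>& frameWorkMs) {
    int janky = 0;
    for (double ms : frameWorkMs)
        if (ms > kFrameBudgetMs) ++janky;
    return janky;
}
```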

Hardware Acceleration 101




  • A RenderLayer can have a GPU backing store
  • Certain elements are GPU backed automatically (canvas, video, CSS3 animations, ...)
  • Forcing a GPU layer: -webkit-transform:translateZ(0)
  • GPU is really fast at compositing, matrix operations and alpha blends


  • (1) The object is painted to a buffer (texture)
  • (2) Texture is uploaded to GPU
  • (3) Send commands to GPU: apply X to texture Y


  • Minimize CPU-GPU interactions
  • Texture uploads are not free
  • No upload: position, size, opacity
  • Texture upload: everything else
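The upload rules above can be summarized as a lookup. Change and needsTextureUpload are hypothetical names, and "size" is read here as scaling via a transform, since changing an element's actual width or height triggers layout and repaint. Anything that changes the layer's pixels means repainting into the buffer (step 1) and uploading it again (step 2), while position, scale and opacity only change the commands sent in step 3.

```cpp
#include <cassert>

// Kinds of change to a GPU-backed layer (illustrative, not exhaustive).
enum class Change { Position, Size, Opacity, Color, Text, BackgroundImage };

// Position, size (as a transform) and opacity are applied by the GPU to
// the existing texture; anything that alters the pixels needs a repaint
// and a fresh texture upload.
bool needsTextureUpload(Change c) {
    switch (c) {
        case Change::Position:
        case Change::Size:
        case Change::Opacity:
            return false;
        default:
            return true;
    }
}
```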


Tip: CSS3 Animations are as close to "free lunch" as you can get **

** Work in progress, and assuming no texture re-uploads: the animation runs entirely on the GPU!
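<style>
  /* Give the div a box to paint: with no content and no size it is 0px tall.
     100px is an arbitrary example size. */
  .spin {
    width: 100px;
    height: 100px;
    background-size: contain;
  }

  .spin:hover {
    -webkit-animation: spin 2s infinite linear;
  }

  @-webkit-keyframes spin {
    0% { -webkit-transform: rotate(0deg); }
    100% { -webkit-transform: rotate(360deg); }
  }
</style>

<div class="spin" style="background-image: url(images/chrome-logo.png);"></div>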


  • Look ma, no JavaScript!
  • Performance: YMMV, but improving rapidly

If you only remember one thing...

Pencil, notepad, editor, ready?

http://code.google.com/p/chromium/source/search?q={query}






Create a bookmarklet, set up a shortcut, or just spend an evening with it.





Let's try it!

Claims:

  • All stylesheets block rendering (e.g., media=print)
  • CSS is always in the critical path


Really, why?

Quick recap



You can't ask "why" of a black box. Good news: the browser is not a black box.


  • WebKit is a browser engine
  • There are platform & vendor differences between "WebKit" browsers
  • Don't hide resources from the browser: network stack is your ally
  • Flush early, flush often, flush smart
  • Avoid style readback & JavaScript layouts
  • Stay within your 16ms frame budget
  • Leverage hardware acceleration where you can


Check the source.

Fin. Questions?





http://code.google.com/p/chromium/source/search?q={query}

www: www.igvita.com
g+: gplus.to/igrigorik
twitter: @igrigorik