Is it better to be fast or good?

In support, there are two gold standards for measuring an organization's quality: speed of response, and quality of response. Excelling at both of these areas will clearly result in a high level of amazement. Amazement is the currency of support. It's not the most important thing -- it's the only thing.

At the same time, it's impractical to focus on one at the expense of the other. If someone has to wait several days for a response, not even a modern-day poet can save the experience. On the other hand, an inept but immediate response will infuriate people.

So, how do you reconcile the two? The most obvious answer is to build a system where great support people -- people who can wield language like a painter wields a brush -- do their best work as quickly as they comfortably can.

Building a system like that is a holistic challenge. Few companies think about support beyond ticket counts and potential upselling, let alone providing a quality experience.

But, we got off track. Which is it? Is it better to be fast or good?

The secret is that it's not about the answer. If you're thinking about the question, you're already winning.

What is trust?

When you and I use a product, there's an inherent element of trust. We trust it will straight up work as advertised, we trust that the product won't put us at risk, and we trust that the product will improve our lives in some way.

It's a human connection to something that's often very inhuman. At best, most people ignore or gloss over the inherent nature of trust in the product/user relationship. At worst, they exploit it.

But, I can't recall a single moment where someone outright acknowledged it.

Thinking about the products I use on a daily basis, I can trace every use or habit back to an aspect of trust:

  • Gmail should be up and working at all times, and they shouldn't leak my email to the world.
  • Zendesk delivers my support responses immediately, and routes replies directly back to me.
  • GitHub keeps my code, serves my site, and syncs up with branches as I push them.
  • My MacBook saves things, has a long battery life, and doesn't impede my workflow.
  • Dropbox syncs my stuff and serves as a simple backup.
  • The internet works. Like magic.

Lo and behold, despite human beings operating these things, they work. The more I use these services, the more trust I place in them. They all will apologize for incidents that break this trust (downtime, security incidents, etc.), but no one addresses the core emotion.

Why is trust a third rail?

It doesn't have to be, and it shouldn't. When a person builds a personal relationship with something you make, don't ignore it. It's a chance to build a meaningful connection for once.

Fastly Benchmark

I got tired of looking at third-party benchmarks of Fastly built on closed source tools. They were black boxes, so I couldn't justify the numbers (good or bad) that they claimed. So, Simon and I knocked out the most mic-droppingly honest benchmark we could.

Take a look at the Gist here, but the script is (intentionally) so short that I can paste it in:

#!/bin/bash

# Dependencies: ApacheBench (ab), MTR

REQUESTS=100
CONCURRENCY=10
FASTLY='www.example.com.global.prod.fastly.net'
CURRENT='www.example.com'

# Hit each test path on both the current host and the Fastly service,
# appending the full ab report for every run to one log
for url in 'path/to/test' 'path/to/test2' 'path/to/test3'; do
  for host in $CURRENT $FASTLY; do
    ab -n $REQUESTS -c $CONCURRENCY "http://${host}/${url}" >> fastly.ab.log
  done
done

# Trace the network route to each host to compare raw latency
for host in $CURRENT $FASTLY; do
  mtr -c $REQUESTS -w -r $host >> fastly.mtr.log
done
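
Running it is just a matter of making the script executable and kicking it off (the filename here is my placeholder, not anything from the Gist):

chmod +x fastly-bench.sh
./fastly-bench.sh
less fastly.ab.log fastly.mtr.log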

That's it. No fancy whiz-bangery. No grandiose flash and flair. Just tried-and-true metrics tools that have been used in the industry for decades.

So, what's going on?

The script uses two benchmarking tools:

  • ApacheBench
  • [MTR](http://en.wikipedia.org/wiki/MTR_(software))

The user provides their existing domain and the Fastly-generated service domain. Note: the site and service need to be configured properly for caching for the test to be realistic.
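
A quick way to sanity check that -- my own habit, not part of the script -- is to request a test path twice through the Fastly hostname and watch the caching headers. Fastly services generally return X-Cache and Age headers, so the second request should read as a hit:

# First request warms the cache; the second should show a HIT and a non-zero Age
curl -s -o /dev/null -D - 'http://www.example.com.global.prod.fastly.net/path/to/test' | grep -iE 'x-cache|^age'
curl -s -o /dev/null -D - 'http://www.example.com.global.prod.fastly.net/path/to/test' | grep -iE 'x-cache|^age'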

From there, the script tests as many paths as the user desires with ab. The primary metrics to look for here are the connection times and the percentile results. These readouts are a good approximation of TTFB for that object.

# Other Apache health stats trimmed for brevity

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       86   92   3.2     91      98
Processing:   531  239 155.0    860    1234
Waiting:      234  640 145.8    649     962
Total:        675  931 155.3    945    1325

Percentage of the requests served within a certain time (ms)
  50%    955
  66%   1029
  75%   1053
  80%   1058
  90%   1112
  95%   1197
  98%   1216
  99%   1325
 100%   1325 (longest request)
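
Since every run appends to fastly.ab.log, a grep pass like this (my convenience, not part of the script) pulls those sections back out of all the runs at once:

# Show the connection time tables and percentile breakdowns from every run
grep -A 5 'Connection Times' fastly.ab.log
grep -A 9 'Percentage of the requests' fastly.ab.log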

The second command, for MTR, tests 100 cycles of network connection for each host. The readout should give you a picture of overall network performance. The performance you see here should correlate with the ab results, but it's a separate approach to verifying latency improvements.

HOST: ip-172-31-2-216                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. ec2-79-125-1-98.eu-west-1.compute.amazonaws.com  0.0%   100    0.8   1.2   0.6  27.8   3.2
  2. 178.236.0.220                                    0.0%   100    0.9   2.0   0.9  43.8   5.3
  3. 178.236.0.182                                    0.0%   100    1.6   1.6   1.2  18.0   1.8
  5. 178.236.3.52                                     0.0%   100   10.9  13.0  10.6  70.0   9.8
  6. 82.112.115.161                                   0.0%   100   11.6  11.5  11.1  12.8   0.2
  7. ae-13.r02.londen03.uk.bb.gin.ntt.net             0.0%   100   12.1  12.0  11.3  13.8   0.3
  8. te0-7-0-9.ccr21.lon02.atlas.cogentco.com         0.0%   100   12.4  12.0  11.3  13.8   0.5
  9. be2328.ccr21.lon01.atlas.cogentco.com            0.0%   100   11.5  11.8  11.5  13.9   0.4
 10. te2-1.ccr01.lon03.atlas.cogentco.com             0.0%   100   12.1  24.8  11.9 180.9  38.2
 11. ???                                            100.0%  100    0.0   0.0   0.0   0.0   0.0
 12. 185.31.18.185                                    0.0%   100   14.9  12.5  11.0  15.4   1.8
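
The number that matters most there is the Avg column on the final hop. Both traces land in the same fastly.mtr.log, so a little awk -- a sketch that assumes the report format above -- pulls the last hop of each trace out for comparison:

# Print the final hop of each trace: the line before each new HOST: header,
# plus the very last line of the file
awk '/^HOST:/ { if (prev != "") print prev } { prev = $0 } END { print prev }' fastly.mtr.log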

The rule going forward

We're not stopping here with benchmarking and evaluation options. We want more exhaustive tools that go deeper into edge cases, and don't require as much external configuration. Right now, you need to have ab and mtr on your machine, and you realistically need a few EC2 instances or other servers to test global performance. You need to have your Fastly service set up to cache properly, otherwise the tests will all be cache misses. Also, the ab test sucks on a Mac; it errors out every other benchmark. This is too much work for such a simple script. It needs to be improved upon.
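
For what it's worth, the dependencies themselves are usually a one-liner to install (package names below assume Debian/Ubuntu and Homebrew; macOS already ships with ab):

# Debian/Ubuntu: ab lives in the apache2-utils package
sudo apt-get install apache2-utils mtr
# macOS with Homebrew
brew install mtr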

But, the core principles of a simple, open, reproducible test are things that will be around in the future.

Things I wish I were better at

I am imperfect

I have many flaws, character and otherwise. Some are more critical to fix than others. But, here's a list of things I need to step up my game on:

  • I wish I were more patient.
  • I wish I were better at finishing projects.
  • I wish I took learning more seriously.
  • I wish I were more active in my community.
  • I wish I were better at forgiveness. Including forgiving myself.
  • I wish I were a better partner.

But, simply "wishing" does nothing. It's bullshit.

I won't be wishing anymore. I guess I'll be working.

Let's try this again.

Moving to a new layout

Let's hope to God this doesn't result in Holman drinking more.

I have a few older, long posts I'll port over. But, the goal here is short stuff. Let's see if that makes a difference.