Zoravur's Blog

Napkin Math 2: Applying It to Computers

In this post, I will be explaining the technique of estimating performance from first principles. It’s based heavily on the following talk: https://www.youtube.com/watch?v=IxkSlnrRFqc.

To recap, the purpose of napkin math is to provide a quick approximation to within an order of magnitude of the real answer, for the purposes of quickly testing the feasibility of the design of a system. Last time, we talked about Fermi estimation, which is (according to my definition) just napkin math where you’re primarily multiplying, and isn’t necessarily focus on computers.

But for our purposes, the goal has always been to eventually estimate the performance of computer systems. Take, for example, the following problem, also presented within the talk:

I had memorized some base rates / did some googling, before arriving at my answer. If you don’t have the base rates off the top of your head, I encourage you to do the same. Here was my line of reasoning:

At this point, I waited for the speaker’s multiple choice options. My thought was that it would be dominated by the ipc latency, since reading from memory was on the order of 100 ns, and ipc is a higher order of magnitude here. Post-hoc, I could additionally justify this choice by noting that Redis’s GET operation is O(1). The options:

I ended up picking 10us, because I wasn’t sure if it would use unix domain sockets or tcp/udp loopback as the transport. But the answer was 25 us, so I basically picked the correct order of magnitude.

If we didn’t know the speed of memory access and IPC, it would have been much much more difficult to arrive at the correct answer. Therefore, it’s essential that we know as many of these reference points for computers that we can manage, so that we can solve these types of problems. It won’t get us the whole way though; we also needed some qualitative understanding of Redis and its implementation in order to arrive at the correct value. This is why it’s important to also read blog posts and understand the larger systems design ecosystem.

So let’s learn some reference points.

OperationLatencyThroughput1 MiB1 GiB
Sequential Memory R/W (64 bytes)0.5 ns
├ Single Thread, No SIMD10 GiB/s100 μs100 ms
├ Single Thread, SIMD20 GiB/s50 μs50 ms
├ Threaded, No SIMD30 GiB/s35 μs35 ms
├ Threaded, SIMD35 GiB/s30 μs30 ms
Network Same-Zone10 GiB/s100 μs100 ms
├ Inside VPC10 GiB/s100 μs100 ms
├ Outside VPC3 GiB/s300 μs300 ms
Hashing, not crypto-safe (64 bytes)25 ns2 GiB/s500 μs500 ms
Random Memory R/W (64 bytes)50 ns1 GiB/s1 ms1s
Fast Serialization [8] [9]N/A1 GiB/s1 ms1s
Fast Deserialization [8] [9]N/A1 GiB/s1 ms1s
System Call500 nsN/AN/AN/A
Hashing, crypto-safe (64 bytes)100 ns1 GiB/s1 ms1s
Sequential SSD read (8 KiB)1 μs4 GiB/s200 μs200 ms
Context Switch [1] [2]10 μsN/AN/AN/A
Sequential SSD write, -fsync (8KiB)10 μs1 GiB/s1 ms1s
TCP Echo Server (32 KiB)10 μs4 GiB/s200 μs200 ms
Decompression [11]N/A1 GiB/s1 ms1s
Compression [11]N/A500 MiB/s2 ms2s
Sequential SSD write, +fsync (8KiB)1 ms10 MiB/s100 ms2 min
Sorting (64-bit integers)N/A200 MiB/s5 ms5s
Sequential HDD Read (8 KiB)10 ms250 MiB/s2 ms2s
Blob Storage GET, 1 conn50 ms500 MiB/s2 ms2s
Blob Storage GET, n conn (offsets)50 msNW limit
Blob Storage PUT, 1 conn50 ms100 MiB/s10 ms10s
Blob Storage PUT, n conn (multipart)150 msNW limit
Random SSD Read (8 KiB)100 μs70 MiB/s15 ms15s
Serialization [8] [9]N/A100 MiB/s10 ms10s
Deserialization [8] [9]N/A100 MiB/s10 ms10s
Proxy: Envoy/ProxySQL/Nginx/HAProxy50 μs???
Network within same region250 μs2 GiB/s500 μs500 ms
Premium network within zone/VPC250 μs25 GiB/s50 μs40 ms
{MySQL, Memcached, Redis, ..} Query500 μs???
Random HDD Read (8 KiB)10 ms0.7 MiB/s2 s30m
Network between regions [6][Varies][i]25 MiB/s40 ms40s
Network NA Central <-> East25 ms25 MiB/s40 ms40s
Network NA Central <-> West40 ms25 MiB/s40 ms40s
Network NA East <-> West60 ms25 MiB/s40 ms40s
Network EU West <-> NA East80 ms25 MiB/s40 ms40s
Network EU West <-> NA Central100 ms25 MiB/s40 ms40s
Network NA West <-> Singapore180 ms25 MiB/s40 ms40s
Network EU West <-> Singapore160 ms25 MiB/s40 ms40s
Taken from https://github.com/sirupsen/napkin-math.

Whoa! That’s a lot. I don’t know about you, but I don’t really feel like memorizing this table. It’s a lot of values that don’t really mean much to me right now. We can get around this by learning the table incrementally – we do a practice problem, looking up the numbers as we need them, and then we add them to our Anki deck. We review our Anki deck every day, and hopefully we have it in our heads the next time we see a similar problem.

My next step in my systems design journey is to in fact go through the problems accompanying sirupsen’s talk, where problem n can be found at https://sirupsen.com/napkin/problem-{n}. I highly encourage you to watch it, and then go through these problems yourself. But that’s all I have to say about systems design for now.