On Tuesday, November 21, 2006 Ryan Barrett wrote:
> Design 2: Issue reads in parallel:
>
> 10 ms/seek + 256K read / 30 MB/s = 18 ms
>
> (Ignores variance, so really more like 30-60 ms, probably)

Wonderful table. But keep in mind that this thingie above
is true only if you have [at least] 30 disks, and all your images
are evenly spread over these disks - the case of the big server
farm, more or less. Certainly true for Google, but try this on
a single machine, and you'll be back to 560 ms from Design 1.
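
A rough sketch of that dependence on disk count, in Python. The model is an assumption made for illustration, not something from the slides: images are spread evenly, each disk serves its own reads one after another, "256K" means 256 KiB, "30 MB/s" means 30 * 10^6 bytes/s, and there is no caching. The name page_latency_ms is hypothetical.

```python
import math

SEEK_MS = 10.0                 # from the table: disk seek ~10 ms
IMAGE_BYTES = 256 * 1024       # assumption: "256K" = 256 KiB
DISK_BYTES_PER_SEC = 30e6      # assumption: "30 MB/s" = 30 * 10^6 bytes/s


def page_latency_ms(num_images: int, num_disks: int) -> float:
    """Estimated time until the slowest disk finishes its share of the reads.

    Assumes images are evenly spread and each disk handles its reads serially.
    """
    per_read_ms = SEEK_MS + IMAGE_BYTES / DISK_BYTES_PER_SEC * 1000.0
    reads_on_busiest_disk = math.ceil(num_images / num_disks)
    return reads_on_busiest_disk * per_read_ms


if __name__ == "__main__":
    for disks in (1, 2, 5, 10, 30):
        print(f"{disks:2d} disk(s): {page_latency_ms(30, disks):6.1f} ms")
```

Under these assumptions one disk gives roughly the Design 1 figure and 30 disks give roughly the Design 2 figure; anything in between scales with the number of reads queued on the busiest disk.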
Best wishes -
S.Osokine.
21 Nov 2006.
-----Original Message-----
From: p2p-hackers-bounces@xxxxxxxxxxxxxxx [mailto:p2p-hackers-bounces@xxxxxxxxxxxxxxx]On Behalf Of Ryan Barrett
Sent: Tuesday, November 21, 2006 12:34 PM
To: p2p-hackers@xxxxxxxxxxxxxxx
Subject: [p2p-hackers] designing with back-of-the-envelope estimates

i caught a great talk the other day by jeff dean on using back-of-the-envelope
calculations to inform design decisions in large-scale systems. it's relevant
to the latency vs. bandwidth thread, and also just good to know in general, so
i figured i'd share a few of the more interesting slides.

no link for the talk, but jeff is http://labs.google.com/people/jeff/ .

(i'm sure we can dispute all of the numbers to varying degrees. the point
isn't to get them exactly right, just within an order of magnitude or so. :P)

Numbers Everyone Should Know

L1 cache reference                             0.5 ns
Branch mispredict                                5 ns
L2 cache reference                               7 ns
Mutex lock/unlock                              100 ns
Main memory reference                          100 ns
Compress 1K bytes with XXX                  10,000 ns
Send 2K bytes over 1 Gbps network           20,000 ns
Read 1 MB sequentially from memory         250,000 ns
Round trip within same datacenter          500,000 ns
Disk seek                               10,000,000 ns
Read 1 MB sequentially from network     10,000,000 ns
Read 1 MB sequentially from disk        30,000,000 ns
Send packet CA->Netherlands->CA        150,000,000 ns
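
One way to read the table is to turn the three "Read 1 MB sequentially" rows into throughput figures, which is where the ~4 GB/s memory and ~30 MB/s disk rates used in the calculations come from. The sketch below does that in Python; the dictionary and the helper name throughput_mb_per_s are made up for illustration, and "1 MB" is treated as a unit without worrying about 10^6 vs 2^20.

```python
# Latencies from the table, in nanoseconds (only the rows used below).
LATENCY_NS = {
    "main memory reference": 100,
    "read 1 MB sequentially from memory": 250_000,
    "disk seek": 10_000_000,
    "read 1 MB sequentially from network": 10_000_000,
    "read 1 MB sequentially from disk": 30_000_000,
}


def throughput_mb_per_s(ns_per_mb: float) -> float:
    """Convert 'nanoseconds to read 1 MB' into MB per second."""
    return 1e9 / ns_per_mb


if __name__ == "__main__":
    for source in ("memory", "network", "disk"):
        ns = LATENCY_NS[f"read 1 MB sequentially from {source}"]
        print(f"{source:8s} {throughput_mb_per_s(ns):8.1f} MB/s")

    # How many main memory references fit in the time of one disk seek.
    ratio = LATENCY_NS["disk seek"] / LATENCY_NS["main memory reference"]
    print(f"one disk seek ~ {ratio:,.0f} main memory references")
```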
Back of the Envelope Calculations
How long to generate image results page (30 thumbnails)?
Design 1: Read serially, thumbnail 256K images on the fly
30 seeks * 10 ms/seek + 30 * 256K / 30 MB/s = 560 ms
Design 2: Issue reads in parallel:
10 ms/seek + 256K read / 30 MB/s = 18 ms
(Ignores variance, so really more like 30-60 ms, probably)
Lots of variations:
  caching (single images? whole sets of thumbnails?)
  pre-computing thumbnails
  ...
Back of the envelope helps identify most promising...
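
The Design 1 and Design 2 arithmetic can be reproduced with a few lines of Python, split into seek time and transfer time. This is only a sketch: it assumes "256K" means 256 KiB and "30 MB/s" means 30 * 10^6 bytes/s, and the function names are invented for this example.

```python
SEEK_MS = 10.0                 # disk seek, from the table
IMAGE_BYTES = 256 * 1024       # assumption: "256K" = 256 KiB
DISK_BYTES_PER_SEC = 30e6      # assumption: "30 MB/s" = 30 * 10^6 bytes/s
NUM_IMAGES = 30

# Time to stream one image off disk once the head is in place.
TRANSFER_MS = IMAGE_BYTES / DISK_BYTES_PER_SEC * 1000.0


def design1_ms(num_images: int = NUM_IMAGES) -> float:
    """Serial reads: every image pays a seek plus its transfer time."""
    return num_images * SEEK_MS + num_images * TRANSFER_MS


def design2_ms() -> float:
    """Fully parallel reads: the page waits for one seek plus one transfer."""
    return SEEK_MS + TRANSFER_MS


if __name__ == "__main__":
    print(f"transfer per image: {TRANSFER_MS:.1f} ms")
    print(f"Design 1: {design1_ms():.0f} ms")
    print(f"Design 2: {design2_ms():.1f} ms")
```

With these assumptions Design 1 comes out near 560 ms and Design 2 just under 19 ms, matching the slide figures to within rounding.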
How long to quicksort 1 GB of 4 byte numbers?
Comparisons: lots of unpredictable branches
log2(2^28) = 28 passes over 2^28 numbers = ~2^33 comparisons
~1/2 will mispredict, so 2^32 mispredicts * 5 ns/mispredict = 21 secs
Memory bandwidth: mostly sequential streaming
2^30 bytes * 28 passes = 28 GB. Memory BW is ~4 GB/s, so ~7 secs
So, it should take ~30 seconds to sort 1 GB on one CPU
Estimate only! Ignores initial read from disk, parallelism, etc...
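
The quicksort estimate can be written out as a small Python sketch. It follows the same two terms (branch mispredicts and memory streaming) and assumes, as a simplification, that "4 GB/s" means 4 * 2^30 bytes/s so that it cancels cleanly against 2^30 bytes per pass. The function name and parameters are hypothetical.

```python
import math


def quicksort_estimate(total_bytes: int = 2**30,
                       elem_bytes: int = 4,
                       mispredict_ns: float = 5.0,
                       mispredict_rate: float = 0.5,
                       mem_bytes_per_sec: float = 4 * 2**30):
    """Return (branch_seconds, memory_seconds) for the estimate above."""
    n = total_bytes // elem_bytes            # 2^28 numbers
    passes = math.log2(n)                    # 28 passes
    comparisons = passes * n                 # 28 * 2^28, a bit under 2^33
    branch_s = comparisons * mispredict_rate * mispredict_ns * 1e-9
    memory_s = total_bytes * passes / mem_bytes_per_sec
    return branch_s, memory_s


if __name__ == "__main__":
    branch_s, memory_s = quicksort_estimate()
    print(f"branch mispredicts: {branch_s:.1f} s")
    print(f"memory streaming:   {memory_s:.1f} s")
    print(f"total:              {branch_s + memory_s:.1f} s")
    # The slide rounds 28 * 2^28 up to 2^33 comparisons, which gives
    # 2^32 mispredicts * 5 ns, i.e. the 21 s figure quoted above.
    print(f"slide's rounded branch term: {2**32 * 5e-9:.1f} s")
```

Without the rounding the branch term is a little under 19 s rather than 21 s, which still lands within the same order of magnitude as the ~30 second conclusion.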
-Ryan
--
http://snarfed.org/