
Re: Zero copy transmit: msg#00263

Subject: Re: Zero copy transmit
On Tue, Apr 29, 2003 at 10:39:46PM +0200, Andi Kleen wrote:
> > Don't get me wrong, we would certainly drop any notions of this if we 
> > found that it was slower and I will be glad to post any results. The 
> > goal is to take advantage of the hardware to make things faster.
> 
> You have no hardware to make the remote TLB flushes fast ;)
> 
> I'm sure you can show it being an advantage with a single threaded process.
> But when you run it on a multithreaded application just with two threads
> it may look very different.
> 
Last time I checked, the IA64 processor provides a ptc.g instruction for
exactly this.  The only penalty we pay for using it is that Intel allows
only one ptc.g to be outstanding machine-wide, which software enforces with
a global spinlock.  I would love to convince Intel to change this,
but that probably will not happen any time soon.
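A minimal user-space sketch of that serialization, assuming a single global spinlock guards every global purge so that at most one is ever in flight. The names `fake_ptc_g`, `global_tlb_purge` and `run_purge_demo` are hypothetical and do not come from any real kernel. Only privileged code can issue the real ptc.g, so here it is a stub that simply counts how many purges overlap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical global lock: at most one global purge in flight machine-wide. */
static atomic_flag ptcg_lock = ATOMIC_FLAG_INIT;

/* Instrumentation used to check the "one outstanding purge" invariant. */
static atomic_int in_flight;
static atomic_int max_in_flight;
static atomic_long purges_done;

static void spin_lock(atomic_flag *l) {
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait until the current holder releases the lock */
}

static void spin_unlock(atomic_flag *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for ptc.g: records how many purges are running concurrently. */
static void fake_ptc_g(unsigned long addr, unsigned long size) {
    (void)addr;
    (void)size;
    int now = atomic_fetch_add(&in_flight, 1) + 1;
    int prev = atomic_load(&max_in_flight);
    while (now > prev &&
           !atomic_compare_exchange_weak(&max_in_flight, &prev, now))
        ; /* retry until max_in_flight >= now */
    atomic_fetch_sub(&in_flight, 1);
}

/* Serialize every global purge behind the single lock. */
static void global_tlb_purge(unsigned long addr, unsigned long size) {
    spin_lock(&ptcg_lock);
    fake_ptc_g(addr, size);
    spin_unlock(&ptcg_lock);
    atomic_fetch_add(&purges_done, 1);
}

static int per_thread_purges;

static void *worker(void *arg) {
    unsigned long base = (unsigned long)(size_t)arg << 20;
    for (int i = 0; i < per_thread_purges; i++)
        global_tlb_purge(base + ((unsigned long)i << 12), 4096);
    return NULL;
}

/* Run nthreads CPUs-worth of purges; returns the peak number in flight. */
static int run_purge_demo(int nthreads, int per_thread, long *done) {
    pthread_t t[64];
    assert(nthreads > 0 && nthreads <= 64);
    atomic_store(&in_flight, 0);
    atomic_store(&max_in_flight, 0);
    atomic_store(&purges_done, 0);
    per_thread_purges = per_thread;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, (void *)(size_t)i);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    *done = atomic_load(&purges_done);
    return atomic_load(&max_in_flight);
}
```

Calling `run_purge_demo(4, 10000, &done)` from a `main` (build with `-pthread`) should report a peak of one purge in flight and 40000 completed purges. The lock keeps the invariant cheap to state, but every CPU that wants to purge waits on the same cache line.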

I will concede that ptc.g takes a considerable amount of time on our
64-processor machines, but that is because a single ptc.g must update a
large number of local TLB coherence domains.

I believe there is a similar instruction for x86.  Could someone verify
this?
