logo       

Re: In kernel PIC support: kernel patch: msg#00642

emulators.kvm.devel

Subject: Re: In kernel PIC support: kernel patch

On Thu, 2007-06-21 at 13:31 +0000, Gregory Haskins wrote:
> On Thu, 2007-06-21 at 21:02 +0800, Dong, Eddie wrote:
> >
> > Achitectually not only that. A premature IRR->ISR will cause guest OS
> > confuse in many place. A guest (say BIOS) may turn from interrupt
> > enabled mode to polling mode which polls IRR to detect if
> > there is pending IRQ to handle. In this case we have trouble.
>
> I completely agree. But what I am saying is that I can eliminate the
> premature IRR->ISR with the change I am proposing.

After thinking about this some more, I think I finally understand your
point. I was thinking that you were objecting to the multiple unacked
vectors in ISR. But really this could be a problem for *any* unacked
vectors? Hmm..if this is the case, you were right and I was wrong. My
bad :)

So assuming this newly enlightened position is true, I think this means
we have a choice:

1) Drop support for mixed "level-1" mode and move the PIC to the kernel
now as Eddie is doing

2) Keep the level-1/2 distinction, and add support for making sure that
once a vector is acked in the PIC, it truely gets put into service
immediately.

I can think of a really simple interface for (2). All we really need to
do is

a) go back to synchronous injection (as previously suggested)
b) promote kvm_cpuirq_extint to be higher than nmis and localints.

This will ensure the vector is immediately placed into service even over
in-kernel sources. Deferred interrupts would still take a higher
priority, but we would never intack the pic if deferred interrupts were
pending since irq.pending would be true.

Thoughts?
-Greg

>
> >
> >
> > > However, as I mentioned we can fix that issue with just a few new
> > > lines of
> > > code and by
> > > reverting the userspace injection model to the one used in prior
> > > releases without having to implement an entire in-kernel PIC to do it.
> >
> > In-kernel PIC gives us a full in kernel irqchip solution and performance
> > gain. Some OSes may use PIC only.
>
> Agreed. I will answer this down below with the level-0/1/2 question
> since they are related.
>
>
> >
> > >
> > > I think moving the PIC into the kernel has the advantage of allowing
> > > us to move the PIT into the kernel too (which is huge, IMHO), but
> >
> > Not very big, I just want to do one by one. we have done the code
> > in Xen already.
> >
> > > that is its sole advantage. Don't get me wrong: I am in favor of
> > > doing it. However, I wanted to point out that this particular
> > > solution to the problem you found essentially is invalidating
> > > "level-1" mode by only supporting level-0 or level-2, not fixing it.
> >
> > Not exactly understand level-0/1/2 meaning? Do u mean we mixed
> > irqchip mode is a feature requirement? I didn't see the necessity,
> > maybe I am wrong. But isn't both PIC and APIC in kernel much
> > nature and simple for us?
>
> At one point there was a debate between moving just the LAPIC, or moving
> everything (LAPIC/PIC/IOAPIC/PIT). Avi suggested that we start with
> just the LAPIC and see where we were at. Because we need to remain
> forward and backwards ABI compatible, I added an ioctl for setting the
> in-kernel-pic level. "0" is backwards compatible mode (everything in
> userspace). "1" is LAPIC only. "2" is presumable everything in the
> kernel, but today only 0 and 1 are the only two defined so there could
> be more than 3 levels someday if we wanted.
>
> >
> > If you really think supporting mixed irqchip mode is a must
>
> I will leave it to Avi to decide since he implicitly suggested it. But
> suffice to say, if we *dont* want to support it we will need to get the
> other in-kernel stuff into the lapic branch in its entirety before it
> can be merged so we dont create an ABI issue.
>
>
> > , we
> > need to introduce an intack I/F to notify user level irqchip if the irq
> > is really ACKed or not. We can do that but I doubt the necessity.
>
> This is where we disagree, as I know you have mentioned this before.
> The solution I am proposing (which reverts the userspace injection
> method) would not require this ACK that you mention. The reason is that
> the synchronous nature of the injection takes care of things naturally.
> Here is how:
>
> With sync-injection, we can only move a vector from the IRR->ISR if
> RFLAGS.IF and !irq.pending. Once we inject, the vector will move to ISR
> and get queued in kernel (thereby setting an irq.pending bit). It may
> not inject to the guest immediately, but thats ok because no new vectors
> will be acked in the PIC until that previous vector is processed
> (because irq.pending will remain set). Therefore, we will never
> prematurely move IRR->ISR until the previous vector really is
> "in-service". Does this make sense?
>
> My suggestion assumes that a single vector can stay in ISR without
> actually being in service without ill effect (Todays v9 code could
> presumably have multiple vectors in ISR without being in service, which
> I admit is bogus). If this assumption is false, then I agree with you
> that we need extra intack I/F. What are you thoughts here?
>
>
> >
> > > Perhaps we are "ok" with that, but I was under the distinct
> > > impression from Avi et. al. that these variable levels of support
> > > were a design goal for debugging purposes.
> > >
> > > I would prefer that we just fix level-1 mode with the changes I
> > > mentioned instead of just disabling it (in addition to adding level-2
> > > mode as Eddie is working on).
> > >
> > >>
> > >> This patch fixes this by introducing new VM level IOCTL
> > >> KVM_SET_ISA_IRQ_LEVEL & KVM_CREATE_PIC to avoid the ABI hole. The
> > >> original KVM_INTERRUPT is still there to be backward compatible.
> > >>
> > >> WIth this patch, old Qemu, new qemu (after this patch), old kvm, new
> > >> kvm can work in any combination. Both user level code and kernel
> > >> code will automatically decide the irq source base on irqchip
> > >> location (user or kernel).
> > >>
> > >> Some known issues (TODO):
> > >> 1: SVM support is on going.
> > >> The code for VMX to inject irq is same with Xen now since the
> > >> situation is same now (irqchip in hyprevisor), SVM code should be
> > >> able to directly reuse from Xen too.
> > >> 2: live migration break.
> > >> 3: Temply APIC support is removed in CPUID to wait for the merge
> > >> of in kernel APIC patch
> > >
> > > Note that you will need more than just the APIC patch. My patch only
> > > moves the LAPIC down. The IOAPIC and 8259s are still in userspace.
> > > Your patch only moves the 8259s down. This means there is a gap where
> > > the IOAPIC used to be that still needs to be addressed.
> >
> > Why do u think I/O APIC must be moved down too? A new IOCTL like
> > KVM_IOAPIC_MESSAGE can solve the interface issue IMO and quit
> > natual.
>
> I suppose, but it somewhat defeats the purpose IMO. Every pin in the
> 8259 that gets tickled implicitly means an IOAPIC pin was tickled also.
> Do we really want to go to userspace for that? Essentially what happens
> then is the PIT ends up having to go through userspace for every tick at
> that point. On the flip side, the IOAPIC model is very simple so I
> think it makes more sense to move it one to one with the PIC.
>
> Regards,
> -Greg
>
>


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/


<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise