osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: TextMessage.getText returning null


Art,

Concurrent store and dispatch should be off by default for topics (which I
believe the test case is using).  Also, it wouldn't hit the store since
it's a topic and not a durable subscription so should just be in memory.
That's why this issue is weird as the messages shouldn't be concurrently
touched by any thread since they get copied.  (but there still seems to be
something going on that i am missing off the top of my head involving
multiple threads touching the message)

Clearing the text field is important because it prevents memory waste.  The
explanation is explained more in https://issues.apache.org/
jira/browse/AMQ-5857 If you don't clear it then extra memory is used that
isn't tracked and can lead to OOM errors (it's why the
reduceMemoryFootprintFlag exists in the broker).  So I think that change is
probably ok but a side effect is causing this other issue to pop up.

When I was doing some testing today, I tried a couple variations of
synchronization (using volatile, adding mutexes in places, etc) and some
worked and some didn't.  One thing that was interesting was that when I
synchronized on the copy() method of ActiveMQText message the issue was
fixed.  That is weird however because we shouldn't need to sync on copy.
So either there is a problem with copy() itself or adding the
synchronization is throwing off the timing just enough to hide the real
issue.

Ultimately I think adding synchronization in a test (like what you did)
will be useful to track down the problem since I think it's a race
condition issue but as a permanently fix I would avoid it.  The reason is
that it shouldn't be needed...the objects are designed to only be accessed
by one thread at a time and copied if necessary which makes sense to me.
However if adding synchronization does fix the issue then hopefully it will
at least be a clue as to what the real issue is.

Chris

On Thu, May 31, 2018 at 5:41 PM, Arthur Naseef <art@xxxxxxxxxx> wrote:

> Ahh, I bet that is a result of using the VM transport.  The broker is
> likely doing the "beforeMarshal()" call that clears the text just before
> the client is calling "getText()" on the same message - or just before
> "copy()" is called to make a copy of the message.
>
> Yeah, looking at the code carefully - the storeContent() method is handling
> "text != null" with "content = null" by converting text to a byte sequence
> (
> org.apache.activemq.util.ByteSequence) and stuffing that into "content".
>
> If all this happens concurrent with a copy() or a call to getText(), then
> it's entirely possible to get this race condition.  So, disabling
> "concurrent store and dispatch" could solve this problem.  Also, using
> critical sections in "storeContentAndClear()" as well as "getText()" might
> eliminate the race condition.
>
> With all that said, I don't see the big win of clearing the text field in
> storeContentAndClear().
>
> @Chris - can you try the patch here and see if it eliminates the problem?
> https://gist.github.com/artnaseef/d518c0431a34cd3d49e8d6e045a719e2
>
> I may have missed some paths that access both "text" and "content", but I
> think I got the important ones.  If that patch works, that increases
> confidence that you found the right race condition.
>
> Art
>
>
>
> On Thu, May 31, 2018 at 12:55 PM, Christopher Shannon <
> christopher.l.shannon@xxxxxxxxx> wrote:
>
> > Found the culprit: Seems to be related to
> > https://issues.apache.org/jira/browse/AMQ-5857
> >
> > Specifically, this commit:
> > https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=
> > blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/
> > ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929f1f81a1902;hp=
> > 347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=
> > de86f473f776fe3a8ca32d7f929ecec7993c458b
> >
> > Changing the storeContentAndClear() call back to storeContent() fixes it,
> > however it was changed for a reason to prevent memory waste so need to
> see
> > if/what a permanent fix is
> >
> >
> > On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 <steven@xxxxxxxxxxxxxx>
> > wrote:
> >
> > > Ok, thank you...my thought also was that performance impact of using
> tcp
> > > wouldn't be enough to hurt us in any substantial way, even though it
> > > *feels*
> > > really wasteful to be marshaling the message and using TCP when we're
> in
> > > the
> > > same JVM.
> > >
> > > In response to your first message, I changed the transport from nio://
> to
> > > tcp:// on all the non-vm threads and the behavior still reproduces.
> Using
> > > Java and raw AMQ libs is a much tougher problem...which I can do...but
> I
> > > probably won't be able to get to that for a few days.
> > >
> > >
> > >
> > > --
> > > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> > > f2368404.html
> > >
> >
>