
Re: Different query results for 0.12.2 and 0.10.1

Hi Samarth,

The doubleSum difference is likely due to the fact that before 0.11.0,
Druid read values out of columns as 32 bit floats and then cast them to 64
bit doubles. Now it can read them directly as 64 bit doubles. And actually,
it can _store_ floating point values as 64 bit doubles too, although this
won't be enabled by default until 0.13.0 (see
for how to enable it today).
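
To make the 32 bit vs. 64 bit effect concrete, here is a minimal sketch in plain Python. It is not Druid's code path, just an illustration of what narrowing a value to a 32 bit float does. The helper name through_float32 is made up for this example, and the value is the one from your message, treated here as a single number rather than a sum.

```python
import struct


def through_float32(value: float) -> float:
    """Round-trip a 64 bit Python float through a 32 bit float."""
    # Packing as "<f" narrows to IEEE 754 single precision;
    # unpacking widens it back to a double without restoring lost bits.
    return struct.unpack("<f", struct.pack("<f", value))[0]


exact = 616346.0208094628            # value reported by 0.12.x in the question
narrowed = through_float32(exact)    # what survives a 32 bit float

print(narrowed)          # 616346.0
print(exact - narrowed)  # the fractional part that was lost
```

At this magnitude a 32 bit float can only represent steps of 0.0625, so the fractional part is rounded away, which is consistent with the 616346.0 you saw on 0.10.1.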

Some thoughts on specific query types:

- The ordering of select results can vary due to differing choices about
which segments to read first. The results will stay in time order, but two
results with the same timestamp might swap positions. Btw, if you don't
need the strict time ordering guarantees, consider Scan queries (
http://druid.io/docs/latest/querying/scan-query.html) which are much
lighter in terms of memory usage.
- The exact ranking and values of TopN results can also vary, since topNs
are approximate and their results can vary based on which segments are
processed in which order and on which servers.
- GroupBy I would not expect to vary: what kinds of differences are you
seeing there?
- Search I'm not familiar with enough to think of a reason why it should or
shouldn't vary.
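
For the select case, one way to keep a test harness from flagging rows that merely swapped places within the same timestamp is to normalize both result sets before comparing them. This is only a sketch: the flattened row shape and the "page" field are hypothetical, and it assumes timestamps are ISO-8601 strings in one consistent format so that they sort correctly as text.

```python
import json


def normalize(rows):
    """Sort rows by timestamp, breaking ties with a canonical JSON form."""
    return sorted(
        rows,
        key=lambda row: (row["timestamp"], json.dumps(row, sort_keys=True)),
    )


# Hypothetical rows from two clusters: same data, tie order swapped.
old_rows = [
    {"timestamp": "2018-08-01T00:00:00.000Z", "page": "a"},
    {"timestamp": "2018-08-01T00:00:00.000Z", "page": "b"},
    {"timestamp": "2018-08-01T01:00:00.000Z", "page": "c"},
]
new_rows = [old_rows[1], old_rows[0], old_rows[2]]

print(old_rows == new_rows)                        # False
print(normalize(old_rows) == normalize(new_rows))  # True
```

This only helps when both result sets contain the same rows. If a page threshold cuts through a group of rows that share a timestamp, the two clusters may return different rows altogether.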

One thing you can do to try to get more consistent results for comparison
is to add "bySegment" : true to your query context. This will skip the
merging step and just return sub-results for each segment individually. Most
of the potential variation is introduced in the merging step, so this should
give you more consistent results, with the caveat that you won't be testing
the merging step.
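
As a rough sketch of what that looks like in a test harness, the snippet below adds the flag to a query before it is serialized. The helper name with_by_segment, the data source name, the interval, and the field names are all placeholders for illustration; only the "bySegment" context key comes from the suggestion above.

```python
import json


def with_by_segment(query: dict) -> dict:
    """Return a copy of the query with bySegment enabled in its context."""
    context = dict(query.get("context", {}))  # keep any existing context keys
    context["bySegment"] = True
    return {**query, "context": context}


# Placeholder timeseries query; substitute your own data source and fields.
query = {
    "queryType": "timeseries",
    "dataSource": "example_datasource",
    "granularity": "all",
    "intervals": ["2018-08-01/2018-08-02"],
    "aggregations": [
        {"type": "doubleSum", "name": "total", "fieldName": "value"}
    ],
}

body = json.dumps(with_by_segment(query), indent=2)
print(body)  # JSON body to POST to the broker
```

The helper copies the query rather than mutating it, so the same base query can be sent both with and without the flag.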

On Sun, Aug 5, 2018 at 10:55 PM Samarth Jain <samarth.jain@xxxxxxxxx> wrote:

> I have an internal test harness setup that I am using for testing version
> upgrade from Druid 0.10.1 to 0.12.2. As part of the testing, I noticed that
> executing the same query against the same data sources (on different Druid
> clusters) gives slightly different results for 0.10.1 and 0.12.2. I have
> seen this happen for search, group by, top n, select query types. The
> common part in all such queries is that they have a paging spec with
> descending set to false.
> "pagingSpec": {"pagingIdentifiers": {}, "threshold": 5000}
> "descending": false
> My guess is that the data is distributed slightly differently within the two
> clusters, which combined with the paging spec is causing this mismatch. Is my
> guess correct? If so, is there a way to make this kind of testing
> deterministic?
> The other thing that I observed is that with doubleSum aggregation type,
> 0.10.1 is returning values with lower precision (ex - 616346.0) as opposed
> to 0.12.1 (ex - 616346.0208094628). Did something change to cause this
> change in precision?