They do outperform parallel individual statements, but at the cost of higher pressure on the coordinators, which leads to more blocked native transport requests and dropped mutations.
Actually, I think the 10-20% better write performance plus 20-30% less CPU usage on the client machines that you get from batches spanning multiple partitions is not worth it. We care far less about the client machines than about the cluster machines, and the cluster nodes need to stay less busy so they can serve read queries, compactions, repairs, etc.
I assume that in most scenarios the client machines are on the same network as the Cassandra cluster, so is it still faster?
Did you compare cluster performance, including blocked native transport requests, dropped mutations, 95th-percentile latencies, cluster CPU usage, etc., in the two scenarios (batch vs. single)?
Although 500M per day is not that much for an 8-node cluster (if the node specs comply with the DataStax recommendations) and async single statements could handle it (they just demand more CPU on the client machine), the impact of such things (non-compliant batch statements straining the cluster) would only show up after some weeks, when suddenly a lot of cluster tasks need to run simultaneously: one or two big compactions are running on most of the nodes, some hinted handoffs are in progress, and the cluster cannot keep up and becomes slower and slower.
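To make "async single statements" a bit more concrete, here is a minimal sketch of sending one write per row while capping how many are in flight at once, so the client does not flood the coordinators. It uses only `asyncio`; `write_one` is a hypothetical placeholder for whatever async write call your driver gives you, and `max_in_flight=128` is an arbitrary starting value, not a recommendation.

```python
import asyncio


async def write_all(rows, write_one, max_in_flight=128):
    """Send one write per row with at most max_in_flight pending at a time.

    write_one is a hypothetical coroutine function (row -> None) standing in
    for the real driver call. Returns a list of (row, exception) pairs for
    the writes that failed, so the caller can retry or log them.
    """
    pending_rows = iter(rows)  # shared by all workers; each row is taken once
    failures = []

    async def worker():
        # Each worker pulls the next row only after its previous write
        # finished, so the number of workers bounds the in-flight writes.
        for row in pending_rows:
            try:
                await write_one(row)
            except Exception as exc:
                failures.append((row, exc))

    await asyncio.gather(*(worker() for _ in range(max_in_flight)))
    return failures
```

You would call it as `failures = asyncio.run(write_all(rows, write_one))`. Because the rows are pulled lazily from an iterator, `rows` can be a generator, which matters at hundreds of millions of rows per day.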
The way to prevent it early is to keep the error counters as low as possible: things like blocked native transport requests, dropped mutations, errors, hinted handoffs, latencies, etc.
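One simple way to watch those counters, sketched here under the assumption that they are cumulative per node: take a snapshot periodically and flag anything that grew since the previous one. The counter names below are made up for illustration, and how you collect the snapshot (nodetool, JMX, your metrics system) is up to you.

```python
def rising_counters(previous, current):
    """Return {name: increase} for every counter that grew between snapshots.

    Both arguments are plain dicts of counter name -> cumulative value.
    A counter that went down (for example after a node restart) is ignored.
    """
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value > previous.get(name, 0)
    }


# Hypothetical snapshots taken a few minutes apart on one node.
before = {"blocked_native_requests": 3, "dropped_mutations": 10, "hints_in_progress": 0}
after = {"blocked_native_requests": 3, "dropped_mutations": 14, "hints_in_progress": 2}

for name, increase in rising_counters(before, after).items():
    print(f"{name} grew by {increase}")
```

If the batch and single-statement runs are compared this way under the same load, the difference in cluster pressure shows up directly instead of weeks later.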