[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Not all files are processed? Stream source with ContinuousFileMonitoringFunction

Dear flinksters,

I'm using the class `ContinuousFileMonitoringFunction` as a source to monitor a folder for new incoming files. I have the problem that not all the files that are sent to the folder get processed / triggered by the function. Specific details of my workflow is that I send up to 1k files per minute, and that I consume the stream as a `AsyncDataStream`.

I myself raised an unrelated issue with the `ContinuousFileMonitoringFunction` class some time ago (https://issues.apache.org/jira/browse/FLINK-8046): if two or more files shared the very same timestamp, only the first one (non-deterministically chosen) would be processed. However, I patched the file myself to fix that problem by using a LinkedHashMap to remember which files had been really processed before or not. My patch is working fine as far as I can tell.

The problem seems to be rather that some files (when many are sent at once to the same folder) do not even get triggered/activated/registered by the class.

Am I properly explaining my problem?

Any hints to solve this challenge would be greatly appreciated ! ❤ THANK YOU

Juanmi, CEO and co-founder @ 🍃tagtog.net

    Follow tagtog updates on 🐦 Twitter: @tagtog_net