Re: 2 tier input


Both solutions mean that I cannot use the Beam IO classes that would
give me the distribution; instead I would have to fetch the data myself
in a ParDo. Is this something that will change in the future? I
understand that Spark has a push-down mechanism that passes the filter
down to the next level of queries.
chaim
On Mon, Oct 22, 2018 at 4:02 PM Jeff Klukas <jklukas@xxxxxxxxxxx> wrote:
>
> Chaim - If the full list of IDs is able to fit comfortably in memory and the Mongo collection is small enough that you can read the whole collection, you may want to fetch the IDs into a Java collection using the BigQuery API directly, then turn them into a Beam PCollection using Create.of(collection_of_ids). You could then use MongoDbIO.read() to read the entire collection, but throw out rows based on the side input of IDs.
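
The per-document check behind that side-input suggestion is a set membership test. Here is a minimal plain-Java sketch of the check alone, with no Beam or Mongo dependency. The class name `IdMembershipFilter`, the method `keepWanted`, and the representation of a document as a `Map<String, String>` keyed by `_id` are all assumptions made for illustration; in a real pipeline the same test would sit inside a DoFn that reads the ID set from the side input.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class IdMembershipFilter {

    /** Keeps only the documents whose "_id" value appears in wantedIds. */
    static List<Map<String, String>> keepWanted(
            List<Map<String, String>> documents, Set<String> wantedIds) {
        return documents.stream()
                // A hash set gives constant-time lookups per document.
                .filter(doc -> wantedIds.contains(doc.get("_id")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical IDs standing in for the BigQuery result.
        Set<String> wantedIds = new HashSet<>(List.of("a1", "c3"));

        // Hypothetical documents standing in for the full Mongo collection.
        List<Map<String, String>> documents = List.of(
                Map.of("_id", "a1", "name", "first"),
                Map.of("_id", "b2", "name", "second"),
                Map.of("_id", "c3", "name", "third"));

        System.out.println(keepWanted(documents, wantedIds));
    }
}
```

The whole collection is still read in this approach; the set only decides which rows survive, which is why it suits the case where both the ID list and the collection are small.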
>
> If the list of IDs is particularly small, you could fetch the collection into memory and parse that into a string filter that you pass to MongoDbIO.read() to specify which documents to fetch, avoiding the need for a side input.
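
As a rough illustration of turning a small in-memory ID list into a string filter, the sketch below builds a Mongo-style `$in` query in plain Java. It assumes the IDs are plain strings stored in `_id`; ObjectId values would need a different JSON encoding. The class name `MongoIdFilter` and the method `buildInFilter` are hypothetical, and how the resulting string is handed to MongoDbIO.read() depends on your Beam version, so check its Javadoc.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MongoIdFilter {

    /** Builds a filter such as {"_id": {"$in": ["a1", "b2"]}} from string IDs. */
    static String buildInFilter(List<String> ids) {
        String quoted = ids.stream()
                // Escape backslashes first, then double quotes, so the
                // result stays valid JSON even for unusual ID values.
                .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(", "));
        return "{\"_id\": {\"$in\": [" + quoted + "]}}";
    }

    public static void main(String[] args) {
        // Hypothetical IDs standing in for the BigQuery result.
        System.out.println(buildInFilter(List.of("a1", "b2", "c3")));
    }
}
```

This only makes sense for a short list, since the entire filter travels as one query string.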
>
> Otherwise, if it's a large number of IDs, you may need to use Beam's BigQueryIO to create a PCollection for the IDs, and then pass that into a ParDo with a custom DoFn that issues Mongo queries for a batch of IDs. I'm not very familiar with Mongo APIs, but you'd need to give the DoFn a connection to Mongo that's serializable. You could likely look at the implementation of MongoDbIO for inspiration there.
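
The "batch of IDs" part of that suggestion can be sketched independently of Beam and Mongo. The plain-Java helper below splits a list of IDs into fixed-size batches, each of which could then back a single Mongo query. The class name `IdBatcher`, the method `partition`, and the batch size are assumptions for illustration; inside a custom DoFn the equivalent logic would typically buffer incoming elements and issue a query whenever the buffer fills, plus once more for the leftover elements.

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical IDs; each printed batch would map to one Mongo query.
        List<String> ids = List.of("a1", "b2", "c3", "d4", "e5");
        for (List<String> batch : partition(ids, 2)) {
            System.out.println(batch);
        }
    }
}
```

Batching trades fewer round trips to Mongo against larger individual queries, so the batch size is something to tune against the actual collection.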
>
> On Sun, Oct 21, 2018 at 5:18 AM Chaim Turkel <chaim@xxxxxxxxxx> wrote:
>>
>> Hi,
>>   I have the following flow I need to implement.
>> From BigQuery I run a query and get a list of IDs; then I need to
>> load from Mongo all the documents matching these IDs and export them
>> as an XML file.
>> How do you suggest I go about doing this?
>>
>> chaim
>>
>> --
>>
>>
>> Loans are funded by
>> FinWise Bank, a Utah-chartered bank located in Sandy,
>> Utah, member FDIC, Equal
>> Opportunity Lender. Merchant Cash Advances are
>> made by Behalf. For more
>> information on ECOA, click here
>> <https://www.behalf.com/legal/ecoa/>. For important information about
>> opening a new
>> account, review Patriot Act procedures here
>> <https://www.behalf.com/legal/patriot/>.
>> Visit Legal
>> <https://www.behalf.com/legal/> to
>> review our comprehensive program terms,
>> conditions, and disclosures.
