
Subject: [jira] Commented: (HIVE-587) Duplicate result from multiple TIPs of the same task - msg#00600

List: hive-dev-hadoop-apache


[
https://issues.apache.org/jira/browse/HIVE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725333#action_12725333
]

Namit Jain commented on HIVE-587:
---------------------------------

I did not understand the changes in ScriptOperator.

> Duplicate result from multiple TIPs of the same task
> ----------------------------------------------------
>
> Key: HIVE-587
> URL: https://issues.apache.org/jira/browse/HIVE-587
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.3.0, 0.3.1
> Reporter: Zheng Shao
> Priority: Blocker
> Attachments: HIVE-587.1.patch
>
>
> On our cluster we found a job that committed duplicate output from different
> TIPs of the same task (from FileSinkOperator).
> The reason is that FileSinkOperator.commit can be called by multiple TIPs of
> the same task.
> FileSinkOperator.jobClose() (which is called on the Hive client side) should
> do one of the following:
> A. Get all successful TIPs and move only the output files of those TIPs to
> the output directory.
> B. Ignore the TIPs from the JobInProgress, and move only one file out of
> potentially several output files for each task.
> B is preferred because A might be slow (if the job finished and was
> immediately moved out of the JobTracker's memory). Since we control the file
> names ourselves, we know exactly what they are.
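
A minimal sketch of option B, for illustration only and not taken from HIVE-587.1.patch. It assumes a hypothetical file-naming scheme of `<taskId>_<attemptNumber>` (for example `000001_0` and `000001_1` for two attempts of the same task). The class and method names are invented for this example. The idea is that, because the client knows the naming scheme, it can keep one file per task without asking the JobTracker which TIPs succeeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of option B: keep exactly one output file per task.
// ASSUMPTION: files are named "<taskId>_<attemptNumber>"; this naming
// scheme is hypothetical and may differ from what Hive actually uses.
class DuplicateOutputFilter {

    // Returns the task ID portion of a name such as "000003_1" -> "000003".
    // Names without an underscore are treated as their own task ID.
    static String taskIdOf(String fileName) {
        int idx = fileName.lastIndexOf('_');
        return idx < 0 ? fileName : fileName.substring(0, idx);
    }

    // Keeps the first file seen for each task ID and drops later duplicates.
    // Input order is preserved in the result.
    static List<String> selectOnePerTask(List<String> fileNames) {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (String name : fileNames) {
            chosen.putIfAbsent(taskIdOf(name), name);
        }
        return new ArrayList<>(chosen.values());
    }
}
```

Given the names `000000_0`, `000001_0`, `000001_1`, `000002_0`, `selectOnePerTask` would keep `000000_0`, `000001_0`, and `000002_0`. Which duplicate to keep (first seen, largest, newest) is a design choice this sketch does not try to settle; a real jobClose() would then move only the selected files to the output directory.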

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
