Subject: [jira] Commented: (HIVE-587) Duplicate result from multiple TIPs of the same task - msg#00600
List: hive-dev-hadoop-apache
[ https://issues.apache.org/jira/browse/HIVE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725333#action_12725333 ]
Namit Jain commented on HIVE-587:
---------------------------------
I did not understand the changes in ScriptOperator.
> Duplicate result from multiple TIPs of the same task
> ----------------------------------------------------
>
> Key: HIVE-587
> URL: https://issues.apache.org/jira/browse/HIVE-587
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.3.0, 0.3.1
> Reporter: Zheng Shao
> Priority: Blocker
> Attachments: HIVE-587.1.patch
>
>
> On our cluster we found a job committed with duplicate output from different
> TIPs of the same Task (from FileSinkOperator).
> The reason is that FileSinkOperator.commit can be called at multiple TIPs of
> the same task.
> FileSinkOperator.jobClose() (which is called at the Hive Client side) should
> do either:
> A. Get all successful TIPs and only move the output files of those TIPs to
> the output directory
> B. Ignore TIPs from the JobInProgress, but only move one file out of
> potentially several output files
> B is preferred because A might be slow (if the job finished and immediately
> got moved out of the JobTracker memory). Since we control the file name by
> ourselves, we know exactly what the file names are.
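Option B in the description above can be illustrated with a minimal sketch. It is not the code from HIVE-587.1.patch. It assumes a hypothetical file-naming convention of "<taskId>_<attemptId>" (for example "000000_1"), and the class and method names are invented for illustration. Given the file names found in the temporary output directory, it keeps one file per task and ignores the rest, without asking the JobTracker which TIPs succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: names and the file-naming convention are assumptions,
// not taken from the actual Hive patch.
class DuplicateOutputFilter {

    // Assumed convention: "<taskId>_<attemptId>", e.g. "000003_1".
    // Everything before the last underscore identifies the task.
    static String taskIdOf(String fileName) {
        int sep = fileName.lastIndexOf('_');
        return sep < 0 ? fileName : fileName.substring(0, sep);
    }

    // Keep exactly one file per task: the first one seen for each task id.
    // The result is ordered by task id because a TreeMap is used.
    static List<String> selectOnePerTask(List<String> fileNames) {
        Map<String, String> chosen = new TreeMap<>();
        for (String name : fileNames) {
            chosen.putIfAbsent(taskIdOf(name), name);
        }
        return new ArrayList<>(chosen.values());
    }

    public static void main(String[] args) {
        // Two attempts of task 000000 both committed output; only one is kept.
        List<String> files = Arrays.asList("000000_0", "000000_1", "000001_0");
        System.out.println(selectOnePerTask(files)); // prints [000000_0, 000001_0]
    }
}
```

In a real jobClose() the surviving files would then be moved to the output directory and the remaining duplicates deleted. Which attempt to keep is a design choice; this sketch simply keeps the first one encountered.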
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-587) Duplicate result from multiple TIPs of the same task
[
https://issues.apache.org/jira/browse/HIVE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Namit Jain updated HIVE-587:
----------------------------
Resolution: Fixed
Fix Version/s: 0.4.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks Zheng