Hive query stops with the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec"

I am running Hive (on YARN) installed from CDH-5.14.2-1, and I created a database that stores purchase history. The purchase-history table contains 1,000,000,000 tuples.

I ran the following query to measure Hive's performance:

SELECT c.gender, 
       g.NAME, 
       i.NAME, 
       Sum(b.num) 
FROM   customers c 
       JOIN boughts_bil b 
         ON ( c.id = b.cus_id 
              AND b.id < $var ) 
       JOIN items i 
         ON ( i.id = b.item_id ) 
       JOIN genres g 
         ON ( g.id = i.gen_id ) 
GROUP  BY c.gender, 
          g.NAME, 
          i.NAME; 
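
A side note for anyone reproducing this: EXPLAIN shows how Hive compiles the statement into the MapReduce stages (Stage-1, Stage-4) that appear in the terminal log further down. This is plain HiveQL, with $var written as a hivevar placeholder (how it is supplied is sketched below):

EXPLAIN
SELECT c.gender, g.name, i.name, Sum(b.num)
FROM   customers c
       JOIN boughts_bil b ON (c.id = b.cus_id AND b.id < ${hivevar:var})
       JOIN items i ON (i.id = b.item_id)
       JOIN genres g ON (g.id = i.gen_id)
GROUP BY c.gender, g.name, i.name;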

By the way, since I want to test it without any optimization, I did not create any partitions.

When I set $var = 30,000,000, the query fails with "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec". I actually ran the same query three months ago, and it worked fine back then.
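
(How $var gets substituted: the query is parameterized before it runs. A minimal sketch using Hive's built-in variable substitution, where the file name query.sql is hypothetical:

hive --hivevar var=30000000 -f query.sql

and inside query.sql the predicate is written as b.id < ${hivevar:var}.)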

I checked the HistoryServer, and it reported the following:

Diagnostics: 
Application failed due to failed ApplicationMaster. 
Only partial information is available; some values may be inaccurate.

Cloudera's plan was "Express" when everything was working fine, but now the plan has become "Enterprise only". Is that the cause?

Or is there some other cause, such as an out-of-memory error?

Please share your wisdom.

Thank you.

Update

Even after I changed $var = 50, the job still failed. The day before yesterday I ran it with $var = 1,000,000 and the job completed successfully.

So I now think the cause is not the data or the query but the servers.

The terminal output is below:

Query ID = ..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c
Total jobs = 2
Stage-1 is selected by condition resolver.
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 557
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1534123434864_0480, Tracking URL = http://...:8088/proxy/application_1534123434864_0480/
Kill Command = /.../hadoop job  -kill job_1534123434864_0480
Hadoop job information for Stage-1: number of mappers: 140; number of reducers: 557
2018-08-13 11:11:49,795 Stage-1 map = 0%,  reduce = 0%
2018-08-13 11:12:39,732 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 159.56 sec
2018-08-13 11:12:40,808 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 428.58 sec
2018-08-13 11:12:41,884 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 649.73 sec
2018-08-13 11:12:42,965 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 945.71 sec
2018-08-13 11:12:44,040 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 1089.56 sec
2018-08-13 11:12:45,112 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 1154.54 sec
2018-08-13 11:12:46,197 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 1163.98 sec
2018-08-13 11:12:48,336 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 1195.79 sec
2018-08-13 11:12:50,465 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 1221.94 sec
2018-08-13 11:12:51,529 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 1243.78 sec
2018-08-13 11:12:52,628 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 1250.12 sec
2018-08-13 11:12:54,755 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 1258.72 sec
2018-08-13 11:12:55,818 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 1310.93 sec
2018-08-13 11:12:56,878 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 1402.61 sec
2018-08-13 11:12:57,936 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 1440.37 sec
2018-08-13 11:12:58,994 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 1514.14 sec
2018-08-13 11:13:00,049 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 1545.1 sec
2018-08-13 11:13:02,163 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 1603.52 sec
2018-08-13 11:13:03,228 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 1657.94 sec
2018-08-13 11:13:04,283 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 1717.53 sec
2018-08-13 11:13:05,339 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1730.17 sec
2018-08-13 11:13:11,744 Stage-1 map = 100%,  reduce = 1%, Cumulative CPU 1752.35 sec
2018-08-13 11:13:13,882 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 1755.7 sec
2018-08-13 11:13:14,947 Stage-1 map = 100%,  reduce = 3%, Cumulative CPU 1772.27 sec
2018-08-13 11:13:16,005 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 1818.92 sec
2018-08-13 11:13:17,067 Stage-1 map = 100%,  reduce = 7%, Cumulative CPU 1846.94 sec
2018-08-13 11:13:19,191 Stage-1 map = 100%,  reduce = 9%, Cumulative CPU 1885.14 sec
2018-08-13 11:13:20,251 Stage-1 map = 100%,  reduce = 10%, Cumulative CPU 1909.41 sec
2018-08-13 11:13:21,312 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 1922.64 sec
2018-08-13 11:13:25,546 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 1956.43 sec
2018-08-13 11:13:26,614 Stage-1 map = 100%,  reduce = 15%, Cumulative CPU 1995.36 sec
2018-08-13 11:13:27,683 Stage-1 map = 100%,  reduce = 17%, Cumulative CPU 2027.25 sec
2018-08-13 11:13:28,749 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 2066.51 sec
2018-08-13 11:13:29,819 Stage-1 map = 100%,  reduce = 20%, Cumulative CPU 2093.91 sec
2018-08-13 11:13:30,884 Stage-1 map = 100%,  reduce = 21%, Cumulative CPU 2100.15 sec
2018-08-13 11:13:31,947 Stage-1 map = 100%,  reduce = 23%, Cumulative CPU 2136.57 sec
2018-08-13 11:13:33,017 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 2168.52 sec
2018-08-13 11:13:34,076 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 2210.15 sec
2018-08-13 11:13:38,326 Stage-1 map = 100%,  reduce = 28%, Cumulative CPU 2226.99 sec
2018-08-13 11:13:39,389 Stage-1 map = 100%,  reduce = 29%, Cumulative CPU 2246.71 sec
2018-08-13 11:13:40,447 Stage-1 map = 100%,  reduce = 31%, Cumulative CPU 2281.74 sec
2018-08-13 11:13:41,511 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2319.49 sec
2018-08-13 11:13:42,570 Stage-1 map = 100%,  reduce = 35%, Cumulative CPU 2350.72 sec
2018-08-13 11:13:45,746 Stage-1 map = 100%,  reduce = 36%, Cumulative CPU 2371.35 sec
2018-08-13 11:13:46,809 Stage-1 map = 100%,  reduce = 37%, Cumulative CPU 2391.87 sec
2018-08-13 11:13:48,924 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 2428.84 sec
2018-08-13 11:13:49,982 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 2461.64 sec
2018-08-13 11:13:51,030 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 2492.05 sec
2018-08-13 11:13:52,075 Stage-1 map = 100%,  reduce = 43%, Cumulative CPU 2512.36 sec
2018-08-13 11:13:53,138 Stage-1 map = 100%,  reduce = 46%, Cumulative CPU 2551.82 sec
2018-08-13 11:13:54,200 Stage-1 map = 100%,  reduce = 48%, Cumulative CPU 2598.15 sec
2018-08-13 11:13:55,262 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 2626.53 sec
2018-08-13 11:13:56,322 Stage-1 map = 100%,  reduce = 51%, Cumulative CPU 2644.72 sec
2018-08-13 11:13:57,362 Stage-1 map = 100%,  reduce = 52%, Cumulative CPU 2654.88 sec
2018-08-13 11:14:10,109 Stage-1 map = 100%,  reduce = 53%, Cumulative CPU 2670.23 sec
2018-08-13 11:14:11,167 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 2679.96 sec
2018-08-13 11:14:14,342 Stage-1 map = 100%,  reduce = 56%, Cumulative CPU 2709.52 sec
2018-08-13 11:14:28,034 Stage-1 map = 100%,  reduce = 57%, Cumulative CPU 2728.34 sec
2018-08-13 11:14:35,427 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 2747.36 sec
2018-08-13 11:14:39,652 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 2772.93 sec
2018-08-13 11:14:41,763 Stage-1 map = 100%,  reduce = 60%, Cumulative CPU 2788.89 sec
2018-08-13 11:14:48,042 Stage-1 map = 100%,  reduce = 61%, Cumulative CPU 2813.88 sec
2018-08-13 11:14:49,097 Stage-1 map = 100%,  reduce = 62%, Cumulative CPU 2826.24 sec
2018-08-13 11:14:53,335 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 2847.18 sec
2018-08-13 11:14:56,501 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 2868.39 sec
2018-08-13 11:14:58,614 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 2889.34 sec
2018-08-13 11:14:59,673 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 2889.52 sec
2018-08-13 11:15:01,785 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 2903.53 sec
2018-08-13 11:15:02,844 Stage-1 map = 100%,  reduce = 68%, Cumulative CPU 2909.72 sec
2018-08-13 11:15:03,903 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 2915.11 sec
2018-08-13 11:15:04,962 Stage-1 map = 100%,  reduce = 70%, Cumulative CPU 2939.29 sec
2018-08-13 11:15:06,022 Stage-1 map = 100%,  reduce = 73%, Cumulative CPU 2991.99 sec
2018-08-13 11:15:07,088 Stage-1 map = 100%,  reduce = 74%, Cumulative CPU 3008.11 sec
2018-08-13 11:15:08,147 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 3023.01 sec
2018-08-13 11:15:09,196 Stage-1 map = 100%,  reduce = 76%, Cumulative CPU 3029.96 sec
2018-08-13 11:15:12,359 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 3053.28 sec
2018-08-13 11:15:14,471 Stage-1 map = 100%,  reduce = 78%, Cumulative CPU 3074.76 sec
2018-08-13 11:15:16,585 Stage-1 map = 100%,  reduce = 79%, Cumulative CPU 3087.69 sec
2018-08-13 11:15:18,709 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 3104.28 sec
2018-08-13 11:15:20,824 Stage-1 map = 100%,  reduce = 81%, Cumulative CPU 3126.94 sec
2018-08-13 11:15:21,931 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 3166.12 sec
2018-08-13 11:15:22,979 Stage-1 map = 100%,  reduce = 85%, Cumulative CPU 3209.21 sec
2018-08-13 11:15:24,039 Stage-1 map = 100%,  reduce = 87%, Cumulative CPU 3245.82 sec
2018-08-13 11:15:25,096 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 3259.57 sec
2018-08-13 11:15:27,211 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 3275.9 sec
2018-08-13 11:15:29,326 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 3291.91 sec
2018-08-13 11:15:30,386 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 3318.16 sec
2018-08-13 11:15:31,441 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 3357.23 sec
2018-08-13 11:15:32,496 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 3382.19 sec
2018-08-13 11:15:33,548 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 3407.15 sec
2018-08-13 11:15:34,598 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 3419.89 sec
2018-08-13 11:15:37,755 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 3442.94 sec
2018-08-13 11:15:39,871 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 3449.41 sec
2018-08-13 11:15:45,128 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3475.74 sec
MapReduce Total cumulative CPU time: 57 minutes 55 seconds 740 msec
Ended Job = job_1534123434864_0480
Execution log at: /.../..._20180813111111_92d8a1f2-4614-49c6-8833-d7b2e709c79c.log
2018-08-13 11:15:51 Starting to launch local task to process map join;  maximum memory = 1908932608
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 24 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable (902 bytes)
2018-08-13 11:15:52 Dump the side-table for tag: 1 with group count: 3500 into file: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable
2018-08-13 11:15:52 Uploaded 1 File to: file:/.../c33533aa-7637-4034-a3d1-2e8b857c2820/hive_2018-08-13_11-11-38_070_2752807246292956243-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile11--.hashtable (107794 bytes)
2018-08-13 11:15:52 End of local task; Time Taken: 1.54 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1534123434864_0536, Tracking URL = http://...:8088/proxy/application_1534123434864_0536/
Kill Command = /.../hadoop job  -kill job_1534123434864_0536
Hadoop job information for Stage-4: number of mappers: 4; number of reducers: 1
2018-08-13 11:16:23,048 Stage-4 map = 0%,  reduce = 0%
2018-08-13 11:16:44,240 Stage-4 map = 25%,  reduce = 0%, Cumulative CPU 2.28 sec
2018-08-13 11:16:46,330 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 5.06 sec
2018-08-13 11:16:49,473 Stage-4 map = 75%,  reduce = 0%, Cumulative CPU 9.58 sec
2018-08-13 11:16:50,520 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 15.14 sec
2018-08-13 11:17:12,471 Stage-4 map = 0%,  reduce = 0%
2018-08-13 11:17:42,680 Stage-4 map = 25%,  reduce = 0%, Cumulative CPU 2.2 sec
2018-08-13 11:17:44,779 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 5.25 sec
2018-08-13 11:17:46,873 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 15.0 sec
2018-08-13 11:18:12,006 Stage-4 map = 0%,  reduce = 0%
MapReduce Total cumulative CPU time: 15 seconds 0 msec
Ended Job = job_1534123434864_0536 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 140  Reduce: 557   Cumulative CPU: 3475.74 sec   HDFS Read: 37355213704 HDFS Write: 56143 SUCCESS
Stage-Stage-4: Map: 4  Reduce: 1   Cumulative CPU: 15.0 sec   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 58 minutes 10 seconds 740 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
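
A note on the log above: Job 2 (Stage-4) is the map-join stage. The successful "local task" builds hash tables from the small side tables, and it is this second job whose map tasks keep restarting and finally fail. One thing worth trying (a sketch using standard Hive session settings, not a confirmed fix) is to fall back to a plain shuffle join, or to lower the size threshold below which Hive attempts the map-join conversion:

set hive.auto.convert.join=false;
-- or keep map-joins but only for smaller tables (threshold in bytes):
set hive.mapjoin.smalltable.filesize=10000000;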

I also checked yarn logs -applicationId application_1534123434864_0480, and container_1534123434864_0480_02_000001 shows several kinds of errors.

(1)ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
Container complete event for unknown container container_1534123434864_0480_02_000143


(2)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1534123434864_0480_r_000014_1000: 
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

(3)INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
Diagnostics report from attempt_1534123434864_0480_r_000041_1000:
Container exited with a non-zero exit code 154

(4)ERROR [ContainerLauncher #1] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
Container launch failed for container_1534123434864_0480_02_000241 : 
java.io.IOException: Failed on local exception: java.io.IOException: java.io.IOException: 
Connection reset by peer; Host Details : local host is: "node3"; destination host is: "node2":8041; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy40.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy41.startContainers(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Connection reset by peer
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 15 more
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:370)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
    ... 18 more
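
(For completeness, this is roughly how a single container's log can be pulled with the yarn CLI; on older Hadoop 2.x releases, -nodeAddress <host:port> may also be required alongside -containerId:

yarn logs -applicationId application_1534123434864_0480 -containerId container_1534123434864_0480_02_000001
)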

These errors appeared more than once. I suspect the node2 server is the problem.
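
Exit code 137 means the container was killed with SIGKILL, typically by the NodeManager enforcing its memory limit or by the OS OOM killer, so an out-of-memory cause would fit as well as a sick node. If memory is the culprit, raising the container sizes is one lever to try; a sketch with hypothetical values that would have to be tuned to the cluster:

set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m;
set mapreduce.reduce.memory.mb=8192;
set mapreduce.reduce.java.opts=-Xmx6553m;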

"Начало работы = job_1533884886452_0204"> вы проверяли логи YARN через HistoryServer или через команду yarn logs -applicationId application_1533884886452_0204?
Samson Scharfrichter 10.08.2018 21:53

Thank you. I have added the YARN log. It looks like the containers were stopped, so I thought the cause was the state of the servers.

tbt 11.08.2018 17:32

I would expect some ERROR with an Exception to show up somewhere in stderr, and some panic in the AppMaster container (usually *_000001) when it detects that one of its worker containers was DOA. Side note: that is a different job #, and from the look of it the log excerpt comes from a second attempt.

Samson Scharfrichter 11.08.2018 23:25

I see. The job IDs I quoted in my question were different, so I changed them to the same job. And I checked the log; there are errors. I now think the cause is a network problem between node3 and node2.

tbt 13.08.2018 04:50

Answers: 1

I uninstalled Cloudera Manager, following the "cloudera manager uninstall" guide.

Then I reinstalled Cloudera Manager, and Hive works fine again.
