Re: Network partition and OTP (msg#00419, lang.erlang.general)
In the AXD301 product this problem is handled by setting the kernel flag 'dist_auto_connect' to 'once'. Why, you may ask? Because the ugliest (most perverse?) thing that can happen is when one uses the automatic node reconnect feature and has flapping communication (down->up->down and so on). This can really screw up the distributed applications (global, dist_ac). Before we changed to 'dist_auto_connect' = 'once' we saw some systems (at customer sites) whose dist_ac state was totally screwed up. We could only pray that this disastrous situation would escalate to a node restart so that everything would clear up.

What happens, then, if a connection can only be set up once? Well, we have implemented a simple resolve protocol that is activated between the two nodes that lose the connection (a UDP port on each Erlang node is always ready to receive messages). Both involved nodes decide which node is more important and select the less important one. After some minor handshaking, one of the nodes (the lower-priority one) is restarted. When it comes up again it reconnects.

This solution has worked quite well and has been enhanced as we found more ugly cases. We even try to discover which of the two nodes is the "guilty" party. For example, if one node loses connection to more than one node it "must" be guilty. Such a case can happen, for instance, if some huge garbage collection takes up all execution time. In that case only the "guilty" node is restarted and the other involved nodes are unharmed.

I feel that the automatic node reconnect feature might be nice for small systems with very few applications, but even there it will be a lot of work to handle the reconnect case correctly. I'm not sure, but I think that very few have thought about handling this error case. I haven't seen anything in the OTP documentation about this, but then I seldom read all documentation that carefully...
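For anyone who wants to try the first measure, here is a minimal sketch of how the flag might be set in a release's sys.config, assuming a standard OTP release layout. It only shows the kernel setting; the AXD301 resolve protocol (UDP ports, priority handshake, "guilty" node detection) is not part of it and is not reproduced here.

```erlang
%% sys.config: a minimal sketch, not the actual AXD301 configuration.
%% With dist_auto_connect set to once, a connection to a given node is
%% set up automatically only the first time. After that connection goes
%% down it is not re-established implicitly; it has to be reconnected
%% explicitly, e.g. with net_kernel:connect_node/1.
[
 {kernel, [
   {dist_auto_connect, once}
 ]}
].
```

The same setting can also be given on the command line, e.g. `erl -sname a -kernel dist_auto_connect once`. Some application-level logic, such as a process subscribed via net_kernel:monitor_nodes(true) that reacts to {nodedown, Node} messages, is still needed to decide what to do once the link is lost.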
Asko Husso
E-mail: etxhua@xxxxxxxxxxxxxxx
Ericsson AB, Phone: +46 8 7192324
Varuvägen 9B, S-126 25 Stockholm-Älvsjö, Sweden