Cardano's incident between blocks 8300569 and 8300570

About the incident between blocks 8300569 and 8300570, which caused approximately 50% of Cardano nodes to disconnect and restart.

... Analysis of 'someone'

I’ve been debugging this for a few hours now and have a fairly strong opinion that the issue is indeed a bug in the balanced tree algorithm implementation of the Haskell containers package. Specifically, a logic error that presumes a subtree bin can never have an empty left or right sub-subtree when the parent is being rebalanced. This is probably a condition that arises when a Map is aggressively purged of values in successive calls, resulting in an unbalanced branch and a violation of the invariant assumed here:

https://github.com/haskell/containers/blob/v0.6.5.1/containers/src/Data/Map/Internal.hs#L4154 

Rather than attempting to recreate the conditions by capturing another sequence of TXs, it might be easier to just code a stress harness around the Map container of the haskell/containers package. The harness would randomly mutate a Map, with a modulated bias toward runs of successive insert or remove operations, until the suspected conditions are repeated.
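A minimal sketch of such a harness, under a few assumptions of mine: keys are plain Int values drawn from a small key space, the "bias" is modelled as random-length bursts of only inserts or only deletes, and the invariant check is the `valid` function exported by Data.Map.Strict. The names `stress`, `Op`, `nextSeed`, `keySpace` and `maxBurst` are hypothetical, and the random numbers come from a tiny hand-rolled linear congruential generator so that only base and containers are needed. It does not reproduce the node's real workload.

```haskell
module Main (main) where

import Data.Bits (shiftR)
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical operation type: the harness only inserts and deletes keys.
data Op = Insert Int | Delete Int
  deriving (Eq, Show)

-- Tiny linear congruential generator (arithmetic wraps modulo 2^64).
nextSeed :: Word64 -> Word64
nextSeed s = s * 6364136223846793005 + 1442695040888963407

applyOp :: Map.Map Int () -> Op -> Map.Map Int ()
applyOp m (Insert k) = Map.insert k () m
applyOp m (Delete k) = Map.delete k m

-- Run bursts of same-kind operations until 'Map.valid' fails or the step
-- budget is used up. Returns the step and operation after which the
-- balance/order/size invariants no longer held, if that ever happens.
stress :: Word64 -> Int -> Maybe (Int, Op)
stress seed0 budget = go seed0 0 Map.empty
  where
    keySpace = 512 :: Word64  -- assumption: a small key space forces collisions
    maxBurst = 64 :: Word64   -- assumption: upper bound on a burst's length

    -- Pick the direction and length of the next burst.
    go seed step m
      | step >= budget = Nothing
      | otherwise =
          let s1 = nextSeed seed
              s2 = nextSeed s1
              deleting = odd (s1 `shiftR` 33)
              burstLen = 1 + fromIntegral ((s2 `shiftR` 33) `mod` maxBurst) :: Int
          in burst (nextSeed s2) step burstLen deleting m

    -- Apply one operation at a time, validating the map after each.
    burst seed step 0 _ m = go seed step m
    burst seed step n deleting m =
      let key = fromIntegral ((seed `shiftR` 33) `mod` keySpace) :: Int
          op = if deleting then Delete key else Insert key
          m' = applyOp m op
      in if Map.valid m'
           then burst (nextSeed seed) (step + 1) (n - 1 :: Int) deleting m'
           else Just (step, op)

main :: IO ()
main = case stress 2023 100000 of
  Nothing -> putStrLn "no invariant violation found within the step budget"
  Just (step, op) ->
    putStrLn ("invariant violated at step " ++ show step ++ " after " ++ show op)
```

Checking `valid` after every single operation is slow but pinpoints the first offending call. A faster variant could validate only at the end of each burst and replay the burst when it fails.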

There’s a lot of voodoo in that library that attempts to optimise performance by preventing unboxing and preserving box addresses for comparison purposes too. It all makes the logic flow very difficult to follow. But the error condition raised seems unambiguous in that either a (Bin _ _ _ _ _, Tip) or a (Tip, Bin _ _ _ _ _) pair is being matched.
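To make the shape of that failure concrete, here is a toy model, not the library's code. In Data.Map the `Bin` constructor carries five fields (size, key, value, left, right) and `Tip` carries none. The sketch below assumes a hypothetical rebalancing step, `rebalanceL`, that inspects the pair of grandchildren of a left-heavy node and only knows how to rotate when both are `Bin`. It returns `Left` rather than calling `error`, so the unexpected case can be observed.

```haskell
module Main (main) where

-- Toy size-annotated tree with the same constructor shapes as Data.Map.
data Tree k a
  = Bin Int k a (Tree k a) (Tree k a)
  | Tip
  deriving (Eq, Show)

size :: Tree k a -> Int
size Tip = 0
size (Bin n _ _ _ _) = n

-- Smart constructor that recomputes the cached size.
bin :: k -> a -> Tree k a -> Tree k a -> Tree k a
bin k x l r = Bin (size l + size r + 1) k x l r

-- Hypothetical rebalancing step for a left-heavy node. It matches on the
-- pair of grandchildren (ll, lr) and reports the mixed cases as failures.
rebalanceL :: k -> a -> Tree k a -> Tree k a -> Either String (Tree k a)
rebalanceL k x (Bin _ lk lx ll lr) r = case (ll, lr) of
  (Bin{}, Bin{}) -> Right (bin lk lx ll (bin k x lr r)) -- single right rotation
  (Bin{}, Tip)   -> Left "unexpected (Bin, Tip) grandchildren"
  (Tip, Bin{})   -> Left "unexpected (Tip, Bin) grandchildren"
  (Tip, Tip)     -> Right (bin k x (bin lk lx Tip Tip) r) -- nothing to rotate
rebalanceL k x Tip r = Right (bin k x Tip r)

main :: IO ()
main = do
  let leaf v = bin v () Tip Tip
      report = either id (const "rebalanced")
  -- Both grandchildren present: the rotation goes through.
  putStrLn (report (rebalanceL (4 :: Int) () (bin 2 () (leaf 1) (leaf 3)) Tip))
  -- Right grandchild missing: the assumed invariant does not hold.
  putStrLn (report (rebalanceL (3 :: Int) () (bin 2 () (leaf 1) Tip) Tip))
```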

The fact that approximately 50% of nodes were affected is probably just because the other half were shielded from the offending condition: the nodes that crashed dropped their connections before propagating it. That kind of makes sense given that most people are probably running with the default +RTS -N2 -RTS, which effectively serialises network operations. Unless you're a maverick like me who runs -N8 and probably killed a significantly greater number of peers than most.

