This resolves a minor issue where neighbors responses containing less
than 16 nodes would bump the failure counter, removing the node. One
situation where this can happen is a private deployment where the total
number of extant nodes is less than 16.
Issue found by @jsying.
This PR is replacing the metrics.influxdb.host.tag cmd-line flag with metrics.influxdb.tags - a comma-separated key/value tags, that are passed to the InfluxDB reporter, so that we can index measurements with multiple tags, and not just one host tag.
This will be useful for Swarm, where we want to index measurements not just with the host tag, but also with bzzkey and git commit version (for long-running deployments).
(cherry picked from commit 21acf0bc8d4f179397bb7d06d6f36df3cbee4a8e)
* swarm: Reinstate Pss Protocol add call through swarm service
* swarm: Even less self
(cherry picked from commit d88c6ce6b058ccd04b03d079d486b1d55fe5ef61)
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3c5863f8612d25c246bcfad339f36b7,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
(cherry picked from commit 50b872bf05b8644f14b9bea340092ced6968dd59)
swarm/network/stream: remove netstore internal wg
swarm/network/stream: run individual tests with t.Run
(cherry picked from commit 3ee09ba03511ad9a49e37c58f0c35b9c9771dd6f)
* swarm/network: new saturation for implementation
* swarm/network: re-added saturation func in Kademlia as it is used elsewhere
* swarm/network: saturation with higher MinBinSize
* swarm/network: PeersPerBin with depth check
* swarm/network: edited tests to pass new saturated check
* swarm/network: minor fix saturated check
* swarm/network/simulations/discovery: fixed renamed RPC call
* swarm/network: renamed to isSaturated and returns bool
* swarm/network: early depth check
(cherry picked from commit 2af24724dd5f3ab1994001854eb32c6a19f9f64a)
* swarm/network/stream: newStreamerTester cleanup only if err is nil
* swarm/network/stream: raise newStreamerTester waitForPeers timeout
* swarm/network/stream: fix data races in GetPeerSubscriptions
* swarm/storage: prevent data race on LDBStore.batchesC
https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461775049
* swarm/network/stream: fix TestGetSubscriptionsRPC data race
https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461768477
* swarm/network/stream: correctly use Simulation.Run callback
https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461783804
* swarm/network: protect addrCountC in Kademlia.AddrCountC function
https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462273444
* p2p/simulations: fix a deadlock calling getRandomNode with lock
https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462317407
* swarm/network/stream: terminate disconnect goruotines in tests
* swarm/network/stream: reduce memory consumption when testing data races
* swarm/network/stream: add watchDisconnections helper function
* swarm/network/stream: add concurrent counter for tests
* swarm/network/stream: rename race/norace test files and use const
* swarm/network/stream: remove watchSim and its panic
* swarm/network/stream: pass context in watchDisconnections
* swarm/network/stream: add concurrent safe bool for watchDisconnections
* swarm/storage: fix LDBStore.batchesC data race by not closing it
(cherry picked from commit 3fd6db2bf63ce90232de445c7f33943406a5e634)