Lightning node reported as offline by WebODM, but is actually up?


#1

WebODM installed here on Docker, on an Arch64 host, accessed using Chrome on Windows 7 (x64) on the local network.

The default Node1 shows up, is reported as online, and works (it returns results when starting tasks).

I signed up for the Lightning node service and pasted my token into the Lightning UI in WebODM. It reported success, added the Lightning node to the list of available nodes, and showed it as online.

I didn’t test it at that stage.

A couple of hours later, I logged back in to ‘my’ WebODM, and the Lightning node was reported as ‘down’, even though I could still access it directly from the Lightning webpage.

The ‘resync’ and ‘refresh balance’ buttons seem to work, but the node is still reported as ‘down’.

webodm.sh down && webodm.sh start doesn’t seem to help.

I can’t see anything ominous in the command line output on the server.


#2

Mm, if you click the Lightning Node from the “Processing Nodes” menu from the left side, does it show an error message at the top of the page? It takes ~25-30 seconds for a node to be reported online/offline.


#3

Yes, that’s right - it comes up with the “spark1.webodm.net:80 seems to be offline.” message. The icon for this node is also red, and is not available to select when starting a new task.


#4

Aha, I think I have it.

The host that WebODM etc. runs on is also the WAN gateway, and it also runs dnsmasq.

Dnsmasq was providing name resolution for the machines within the network, but of course Docker is on its own network, so its containers just get a copy of the host’s /etc/resolv.conf (less any local IPs).

Unfortunately, I had some flaky nameservers listed in the host’s resolv.conf. This didn’t affect the rest of the network, as dnsmasq provided its own upstream nameservers, but Docker stripped out the local IPs and was left with only the two flaky nameservers.

It appears they were working when I added the Lightning node, but fell over some time later. Subsequent attempts by WebODM to reach the node then failed, because the only nameservers available to it were down and it could no longer resolve the node’s hostname. Reaching the node from other machines on the network still worked, as dnsmasq handled their DNS lookups.
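For anyone chasing a similar problem, a quick way to confirm this failure mode is a plain hostname-resolution check run both on the host and inside the WebODM container. The snippet below is just an illustration (the `can_resolve` helper is mine, and the container name you `docker exec` into will depend on your setup — check `docker ps`):

```python
import socket

def can_resolve(host):
    """Return True if the current resolver can look up this hostname."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        # Resolution failed: nameservers down, unreachable, or NXDOMAIN
        return False

if __name__ == "__main__":
    # Run this on the host, then inside the container, e.g.:
    #   docker exec -ti <webodm-container> python3 resolve_check.py
    # If the host says True and the container says False, the
    # container's copy of resolv.conf is the likely culprit.
    print("spark1.webodm.net:", can_resolve("spark1.webodm.net"))
```

Comparing the two results pinpoints whether DNS is broken only from inside Docker, which is exactly what happened here.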

This could also have been resolved by configuring dnsmasq to listen on the Docker bridge network, but removing the offending nameservers from the host’s resolv.conf (and restarting the Docker service and WebODM) appears to have fixed my issue.
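For reference, the dnsmasq-side fix would look roughly like this. This is only a sketch, not my actual config: it assumes the default Docker bridge `docker0` at 172.17.0.1, and your interface name and address may differ (check `ip addr` on the host).

```
# /etc/dnsmasq.conf (sketch)
# Also listen on the Docker bridge so containers can query dnsmasq
interface=docker0
listen-address=172.17.0.1
bind-interfaces
```

Containers would then need to be pointed at the bridge address, e.g. via `"dns": ["172.17.0.1"]` in /etc/docker/daemon.json, followed by a restart of the Docker daemon.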

Thanks for your assistance!