Our application involves connecting to a LabJack T7 from Ubuntu 16.04 over ethernet, using the LJM library. The LabJack T7 is also plugged into the computer via USB for power. Occasionally, usually after several days of uptime, the LabJack will persistently fail to connect. While it's in this state, the USB connection is still alive and being recognized by the OS, as seen in the output of lsusb. The LabJack is still pingable by its static IP address. The only action that fixes it is unplugging and replugging the T7 from USB. Unfortunately, gaining physical access to these devices every few days is not feasible for our application, so we need another way to get out of this state, or avoid getting into it in the first place.
Things we tried that don't help:
- Restarting the host computer
- Unloading and reloading the USB kernel module. e.g. sudo modprobe -r usbhid; sudo modprobe usbhid
- Unbinding the USB device. e.g. echo '1-8' > /sys/bus/usb/drivers/usb/unbind
  - Causes the USB device to disconnect and reconnect, but no change to Ethernet connectivity.
- One report of the issue persisting even after power was fully removed from the computer and LabJack.
- The USB port can't be unpowered
lsusb output:
Bus 001 Device 008: ID 0cd5:0007 LabJack Corporation
The LabJack is opened with the following call:
LJM_Open(LJM_dtT7, LJM_ctETHERNET, "LJM_idANY", &_labjack_handle);
But it persistently fails with LJME_DEVICE_NOT_FOUND. There is exactly 1 LabJack connected to the computer.
How can we get the LabJack connecting again without having to physically replug the USB?
It sounds like the issue could be due to the TCP socket not getting released properly; there can only be two TCP connections to the LabJack at a time. Please ensure you always close the device connection handle using LJM_Close. I would also recommend upgrading firmware to 1.0292 (release) if you are not up to date.
Since you can ping the device even under the failure, you may be able to connect via UDP and reset the Ethernet or entire device remotely that way. "ETHERNET_UDP" can be specified in the open call to open a UDP connection:
https://labjack.com/support/software/api/ljm/function-reference/ljmopens
To reset Ethernet you can write a 0 to the POWER_ETHERNET register. This is described in the power section of our Ethernet documentation:
https://labjack.com/support/datasheets/t-series/ethernet
You can do a full software reset using the SYSTEM_REBOOT register as described on the following page:
https://labjack.com/support/datasheets/t-series/hardware-overview
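As a rough sketch of the SYSTEM_REBOOT write in C: this assumes the register expects the key 0x4C4A in the upper 16 bits and a delay in 50 ms ticks in the lower 16 bits, which is how I read the hardware overview page, so please confirm it there for your firmware. The names write_name_fn, system_reboot_value, request_reboot and dry_run_write are hypothetical, made up for this illustration. write_name_fn is shaped like LJM_eWriteName so the real function can be passed in on Linux once LabJackM.h is included.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same shape as LJM_eWriteName(int Handle, const char *Name, double Value). */
typedef int (*write_name_fn)(int handle, const char *name, double value);

/* Assumed encoding (verify against the datasheet): 0x4C4A key in the upper
 * 16 bits, delay before the reboot in 50 ms ticks in the lower 16 bits. */
uint32_t system_reboot_value(uint16_t delay_ticks)
{
    return 0x4C4A0000u | (uint32_t)delay_ticks;
}

/* Hypothetical stand-in for LJM_eWriteName: prints the write instead of
 * sending it, so the call can be exercised without a device. */
int dry_run_write(int handle, const char *name, double value)
{
    printf("handle %d: %s <- %.0f\n", handle, name, value);
    return 0; /* 0 is LJME_NOERROR */
}

/* Ask the device to reboot after delay_ticks * 50 ms.
 * Returns whatever error code the write function returns. */
int request_reboot(write_name_fn write_name, int handle, uint16_t delay_ticks)
{
    assert(write_name != NULL);
    return write_name(handle, "SYSTEM_REBOOT",
                      (double)system_reboot_value(delay_ticks));
}
```

With the real library, request_reboot(LJM_eWriteName, handle, 0) on an open handle would request an immediate reboot. The connection drops when the device restarts, so the handle should be closed and the device reopened afterwards.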
Thanks for the tips. I tried several different connection types: ETHERNET (what we’re currently using), ETHERNET_UDP, NETWORK_UDP, NETWORK_ANY. I tried specifying the static IP address. No change in the resulting error, LJME_DEVICE_NOT_FOUND.
I then attempted to connect over USB. With the connection type ANY or USB, I received this error instead: LJME_DEVICE_CURRENTLY_CLAIMED_BY_ANOTHER_PROCESS.
This supports the theory that we aren’t properly closing connections to the LabJack. But netstat does not indicate any open TCP connections to the LabJack, and I still can’t connect via any method to reset the LabJack.
Any further things to try to get a connection or release the claim on the LabJack?
I tried some testing with a LabJack unit in a good state, under very similar conditions, to see if I could reproduce the issue. I wasn't able to. Here's what I tried, going on the theory that the issue is due to improperly closed streams or device handles. Each connection attempt was made with connection type ETHERNET, to match what we typically use.
1. Repeatedly running an example program without closing the LabJack handle.
2. The same, but instead of letting the program terminate normally, interrupt it with SIGINT.
3. Repeatedly running an example program without stopping the stream and without closing the LabJack handle.
4. The same, but instead of letting the program terminate normally, interrupt it with SIGINT.
5. The same, but instead of letting the program terminate normally, interrupt it with SIGKILL. The next connection attempt succeeds, but opening the stream fails with STREAM_IS_ACTIVE. Closing the stream before attempting to open a new one solves the issue.
6. Repeatedly running our production program, and ending it with SIGINT.
7. The same, but with SIGTERM.
8. The same, but with SIGKILL. Similar results to 5.
9. Running multiple example programs at once. One succeeds and the rest fail with STREAM_IS_ACTIVE.
Any other ideas for reproducing this issue?
Another option to restart the device if communication fails is to use the software watchdog. Somehow this slipped my mind previously, sorry about that:
https://labjack.com/support/datasheets/t-series/watchdog
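A minimal sketch of a watchdog setup in C, assuming the register names below match the watchdog page and that the usual order applies (disable, configure, then re-enable). The 60 second timeout is a placeholder, and reg_write, watchdog_steps, apply_writes and dry_run_write are hypothetical names for this illustration. write_name_fn is shaped like LJM_eWriteName so the real function can be passed in on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same shape as LJM_eWriteName(int Handle, const char *Name, double Value). */
typedef int (*write_name_fn)(int handle, const char *name, double value);

typedef struct {
    const char *name;
    double value;
} reg_write;

/* Assumed register names and order; check them against the watchdog page. */
const reg_write watchdog_steps[] = {
    {"WATCHDOG_ENABLE_DEFAULT",        0.0}, /* disable while reconfiguring */
    {"WATCHDOG_TIMEOUT_S_DEFAULT",    60.0}, /* placeholder: 60 s without communication */
    {"WATCHDOG_RESET_ENABLE_DEFAULT",  1.0}, /* reset the device on timeout */
    {"WATCHDOG_ENABLE_DEFAULT",        1.0}, /* re-enable with the new settings */
};
const size_t watchdog_step_count =
    sizeof watchdog_steps / sizeof watchdog_steps[0];

/* Hypothetical stand-in for LJM_eWriteName: prints instead of writing. */
int dry_run_write(int handle, const char *name, double value)
{
    printf("handle %d: %s <- %.0f\n", handle, name, value);
    return 0; /* 0 is LJME_NOERROR */
}

/* Apply the writes in order; stop at and return the first nonzero error. */
int apply_writes(write_name_fn write_name, int handle,
                 const reg_write *steps, size_t count)
{
    assert(write_name != NULL && steps != NULL);
    for (size_t i = 0; i < count; ++i) {
        int err = write_name(handle, steps[i].name, steps[i].value);
        if (err != 0) {
            return err;
        }
    }
    return 0;
}
```

With a device open, apply_writes(LJM_eWriteName, handle, watchdog_steps, watchdog_step_count) would send the sequence. The timeout should be longer than the longest gap your application normally leaves between communications, otherwise the device will reset during normal operation.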
The watchdog should be all you need to reset the device. If you would like to continue troubleshooting what is happening that requires the reset, I think the best course of action is for you to run until failure and collect more debug information. I would recommend using Wireshark to check the port activity while the device is failing to reconnect, and enabling the LJM debug logger, as it will likely capture additional information about the connection failure:
https://labjack.com/support/software/api/ljm/function-reference/debuggin...
As some additional information, LJM has a signal handler that will try to close the connection automatically upon any signal that ends the process. Ping working most likely indicates that the Ethernet stack is up and packets are getting through to the device, but there is also a chance that some other device is using the same IP (so it was not the LabJack responding). To ensure that cannot happen, the static IP should be placed outside of the DHCP range. UDP not working under the failure state tells us that either UDP does not work on your network, so it was never going to work, or something more than a socket issue is going on, since UDP is connectionless.
Breakthrough!
I was able to make a connection with these problematic devices by connecting with type ETHERNET and specifying the IP address instead of ANY. (I thought I had tried this earlier, but I guess I hadn't).
There were two LabJacks attached to two different computers I was able to connect to with this method. But connecting to them once didn't resolve the issue of not being able to connect with ANY.
On Unit 1, I was able to write to the ETHERNET_APPLY_SETTINGS address and then subsequent attempts to connect with ETHERNET/ANY succeeded.
On Unit 2, ETHERNET_APPLY_SETTINGS did not fix the issue. I then rebooted the unit with a write to SYSTEM_REBOOT, and then subsequent attempts to connect with ETHERNET/ANY succeeded. I collected the debug logs from Unit 2 to see if there might be any useful clues when comparing the pre- and post-reboot connection attempts. Nothing stood out to me; the only difference is that the successful run progresses past
[Nov 05 03:05:45 2021] INFO - Initiating ETHERNET discovery.
Pre-reboot failure to connect with ETHERNET/ANY: labjack_debug_failed_ethernet_any.log
Successful connection and reboot with ETHERNET/IP: labjack_debug_ethernet_ip_reboot_success.log
Post-reboot success with ETHERNET/ANY: labjack_debug_ethernet_any_post_reboot_success.log
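For reference, the escalation that worked on the two units can be sketched in C as below. This is only an outline under assumptions: the value written to ETHERNET_APPLY_SETTINGS (1) and the SYSTEM_REBOOT value (0x4C4A0000) are my assumptions rather than something confirmed in the logs, and ljm_api, write_by_ip, can_open_any, recover and the dry_* functions are hypothetical names. The function pointers are shaped like LJM_OpenS, LJM_eWriteName and LJM_Close so the real functions can be plugged in on Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Function pointers shaped like LJM_OpenS, LJM_eWriteName and LJM_Close. */
typedef struct {
    int (*open_s)(const char *device, const char *connection,
                  const char *identifier, int *handle);
    int (*write_name)(int handle, const char *name, double value);
    int (*close)(int handle);
} ljm_api;

typedef enum { RECOVERED_BY_APPLY, REBOOT_REQUESTED, UNREACHABLE } recovery_result;

/* Open by IP over ETHERNET, write one register, close. */
int write_by_ip(const ljm_api *api, const char *ip, const char *name, double value)
{
    int handle = 0;
    int err = api->open_s("T7", "ETHERNET", ip, &handle);
    if (err != 0) {
        return err;
    }
    err = api->write_name(handle, name, value);
    api->close(handle);
    return err;
}

/* Returns 0 if a discovery-based open (identifier "ANY") succeeds. */
int can_open_any(const ljm_api *api)
{
    int handle = 0;
    int err = api->open_s("T7", "ETHERNET", "ANY", &handle);
    if (err == 0) {
        api->close(handle);
    }
    return err;
}

/* Try the lighter fix first (Unit 1), reboot only if ANY still fails (Unit 2). */
recovery_result recover(const ljm_api *api, const char *ip)
{
    assert(api != NULL && ip != NULL);
    if (write_by_ip(api, ip, "ETHERNET_APPLY_SETTINGS", 1.0) != 0) { /* assumed value */
        return UNREACHABLE;
    }
    if (can_open_any(api) == 0) {
        return RECOVERED_BY_APPLY;
    }
    if (write_by_ip(api, ip, "SYSTEM_REBOOT", (double)0x4C4A0000u) != 0) { /* assumed key */
        return UNREACHABLE;
    }
    return REBOOT_REQUESTED;
}

/* Hypothetical dry-run backend: "ANY" opens fail while any_is_broken is set. */
int any_is_broken = 1;
int apply_fixes_any = 0; /* 1 models Unit 1, 0 models Unit 2 */

int dry_open(const char *device, const char *connection,
             const char *identifier, int *handle)
{
    (void)device;
    (void)connection;
    if (any_is_broken && strcmp(identifier, "ANY") == 0) {
        return 1; /* any nonzero value stands in for an LJM error code */
    }
    *handle = 1;
    return 0;
}

int dry_write(int handle, const char *name, double value)
{
    printf("handle %d: %s <- %.0f\n", handle, name, value);
    if (strcmp(name, "SYSTEM_REBOOT") == 0 ||
        (apply_fixes_any && strcmp(name, "ETHERNET_APPLY_SETTINGS") == 0)) {
        any_is_broken = 0;
    }
    return 0;
}

int dry_close(int handle)
{
    (void)handle;
    return 0;
}
```

Against real hardware the struct would be filled with LJM_OpenS, LJM_eWriteName and LJM_Close, and the IP would be the unit's static address. A real version would also need to wait for the Ethernet interface or the device to come back before the next open attempt, and a write that restarts the interface may report an error even when it took effect; both are left out here.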
One thing I noticed is that warnings about an unknown type in ljm_constants.json appear in all the logs. The only appearances of UINT64 are for the fields ETHERNET_MAC and WIFI_MAC. We use an unmodified ljm_constants.json file. Any reason this might interfere with the operation of ethernet or wifi?
[Nov 05 01:58:27 2021] WARNING - Unknown type in /usr/local/share/LabJack/LJM/ljm_constants.json: UINT64
[Nov 05 01:58:27 2021] WARNING - Unknown type in /usr/local/share/LabJack/LJM/ljm_constants.json: UINT64
Thanks so much for the help! We will definitely change to connect by IP address instead of ANY, and implement the watchdog.
If it is reasonable, I would recommend using a static IP and opening the device by IP rather than using the "Open any" call.
The "Open any" call failure suggests a UDP issue. One potential solution is to use the LJM specific IPs file as described on the following page:
https://labjack.com/support/software/api/ljm/constants/SpecificIPsConfigs
Enabling UDP discovery-only mode could also help if you are getting a lot of traffic on port 52362 and that is causing issues:
https://labjack.com/support/datasheets/t-series/ethernet#udp-discovery-o...
The "Unknown type" warning will not affect operation.