
Server dies regularly with "remote_endpoint: Bad file descriptor"



Hi,
 
I noticed that the server dies every 2 days or so.
The trace log shows these lines every 10 seconds:
2016-04-23 22:51:39 Error: NetworkModule [Thread 7fbab4ff9700]: remote_endpoint: Bad file descriptor
2016-04-23 22:51:49 Error: NetworkModule [Thread 7fbab77fe700]: remote_endpoint: Bad file descriptor
2016-04-23 22:51:49 Error: NetworkModule [Thread 7fbab4ff9700]: remote_endpoint: Bad file descriptor
 
The last entries in last-error.html:
 
 

Scope	Time	Text
NetworkModule	2016-Apr-22 17:09:32	remote_endpoint: Bad file descriptor
SchedulerModule	2016-Apr-22 17:09:35	Received message: RegisterSleepEvent
NetworkModule	2016-Apr-22 17:09:42	No descriptors available. Active session count:2
NetworkModule	2016-Apr-22 17:09:42	remote_endpoint: Bad file descriptor
NetworkModule	2016-Apr-22 17:09:52	No descriptors available. Active session count:2
NetworkModule	2016-Apr-22 17:09:52	remote_endpoint: Bad file descriptor
CCleanupModule	2016-Apr-22 17:09:55	Initiating calculation of status snapshots
CCleanupModule	2016-Apr-22 17:09:55	Finished calculation of status snapshots
SchedulerModule	2016-Apr-22 17:09:55	Received message: RegisterSleepEvent
AutomationModule	2016-Apr-22 17:10:00	ReportManager: ProcessWatchdogCheck: Watchdog reports normal behaviour.
ConsoleApiModule	2016-Apr-22 17:10:00	Session data cleanup timeout.
CReplicationModule	2016-Apr-22 17:10:00	CStepProcessor: Server state changed to OK
NetworkModule	2016-Apr-22 17:10:02	No descriptors available. Active session count:2
NetworkModule	2016-Apr-22 17:10:02	remote_endpoint: Bad file descriptor
NetworkModule	2016-Apr-22 17:10:02	Socket accepted. Remote ip address: 178.191.54.94 remote port: 55989
NetworkModule	2016-Apr-22 17:10:02	Resolving ip address: 178.191.54.94
NetworkModule	2016-Apr-22 17:10:02	Receiving ip address: 178.191.54.94 from cache
NetworkModule	2016-Apr-22 17:10:02	Successfully received ip address: 178.191.54.94 from cache
NetworkModule	2016-Apr-22 17:10:02	remote_endpoint: Transport endpoint is not connected
NetworkModule	2016-Apr-22 17:10:02	Socket accepted. Remote ip address: 213.47.170.12 remote port: 54350
NetworkModule	2016-Apr-22 17:10:02	Resolving ip address: 213.47.170.12
NetworkModule	2016-Apr-22 17:10:02	Receiving ip address: 213.47.170.12 from cache
NetworkModule	2016-Apr-22 17:10:02	Successfully received ip address: 213.47.170.12 from cache
NetworkModule	2016-Apr-22 17:10:02	Socket connection (isClientConnection:0) established for id 32342
NetworkModule	2016-Apr-22 17:10:02	Socket connection (isClientConnection:0) established for id 32343
NetworkModule	2016-Apr-22 17:10:02	Connection closed by remote peer for session id 32342
NetworkModule	2016-Apr-22 17:10:02	Connection closed by remote peer for session id 32343
NetworkModule	2016-Apr-22 17:10:02	Forcibly closing sessionId:32342, isClosing:0
NetworkModule	2016-Apr-22 17:10:02	Removing session 32342
NetworkModule	2016-Apr-22 17:10:02	Closing connection , session id:32342
NetworkModule	2016-Apr-22 17:10:02	Forcibly closing sessionId:32343, isClosing:0
NetworkModule	2016-Apr-22 17:10:02	Removing session 32343
NetworkModule	2016-Apr-22 17:10:02	Closing connection , session id:32343
NetworkModule	2016-Apr-22 17:10:03	No descriptors available. Active session count:2
NetworkModule	2016-Apr-22 17:10:03	remote_endpoint: Bad file descriptor
NetworkModule	2016-Apr-22 17:10:03	Socket accepted. Remote ip address: 213.47.170.12 remote port: 55358
NetworkModule	2016-Apr-22 17:10:03	Resolving ip address: 213.47.170.12
NetworkModule	2016-Apr-22 17:10:03	Receiving ip address: 213.47.170.12 from cache
NetworkModule	2016-Apr-22 17:10:03	Successfully received ip address: 213.47.170.12 from cache
NetworkModule	2016-Apr-22 17:10:03	remote_endpoint: Transport endpoint is not connected
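The repeated "No descriptors available" entries, each followed by "remote_endpoint: Bad file descriptor", look consistent with the server process running out of file descriptors: `accept()` fails with `EMFILE`, and asking the never-opened socket for its peer address then fails with `EBADF`. That reading is an assumption; the log does not say it outright. The minimal Python sketch below (Linux, standard library only) reproduces the first half in isolation by temporarily lowering its own soft `RLIMIT_NOFILE` and opening sockets until the kernel refuses. The function name and `headroom` parameter are hypothetical.

```python
import errno
import os
import resource
import socket


def sockets_until_emfile(headroom: int = 8) -> int:
    """Open TCP sockets until socket() fails with EMFILE; return how many opened.

    Hypothetical demo helper (Linux): temporarily lowers this process's soft
    RLIMIT_NOFILE so exhaustion happens quickly, then restores the old limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Descriptors already in use (the listing briefly counts one of its own).
    in_use = len(os.listdir("/proc/self/fd"))
    limit = in_use + headroom
    if soft != resource.RLIM_INFINITY:
        limit = min(limit, soft)  # only ever lower the limit, never raise it
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    opened = []
    try:
        while True:
            try:
                opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    return len(opened)
                raise
    finally:
        # Release every socket and restore the original limits.
        for sock in opened:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("sockets opened before EMFILE:", sockets_until_emfile())
```

A long-running server that leaks sockets reaches the same wall, only slowly, which would fit a crash every couple of days rather than immediately.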

 
I also see many of the following warnings and errors in the trace log:
 
2016-04-21 16:32:04 Warning: NetworkModule [Thread 7fbab7fff700]: The connection will be closed due to timeout. Resolved endpoint is NULL
2016-04-21 16:32:05 Error: NetworkModule [Thread 7fbab6ffd700]: Error reported by JobScheduler[Name:Dns job scheduler for not network operation]. Error message is:resolve: Host not found
 
Installed ERA Server 6.3.148.0
Agent on Server 6.3.148.0
Debian (64-bit), Version 8.4
 
I hope you can help me...
 
Kind regards

Edited by tobiasperschon
ESET Staff

You seem to have two (possibly related) issues. The second one you mention is a problem with reverse DNS resolution, which may lead to client connections being rejected. It is a known issue, and you can use the fixed libraries available from here (the package also contains fixes for the other issue, so I would suggest replacing both libraries).
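The "resolve: Host not found" errors come from reverse (PTR) lookups of client addresses. As a generic illustration only, not ESET's code or the content of the fixed libraries, here is a minimal Python sketch, assuming the hostname is needed just for logging, that treats a failed reverse lookup as non-fatal and falls back to the bare IP address. `peer_label` is a hypothetical helper name.

```python
import socket


def peer_label(ip: str) -> str:
    """Return the peer's reverse-DNS name, or the bare IP if lookup fails.

    Hypothetical helper: a missing PTR record ("Host not found") should not
    be a reason to reject the connection, so we fall back to the address.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return ip


if __name__ == "__main__":
    print(peer_label("127.0.0.1"))
```

`socket.gethostbyaddr` blocks and takes no timeout argument, so a server doing this per connection would normally keep it off the accept path, for example in a worker thread.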

 

The other problems are caused by exhaustion of network sockets on your system. I would suggest using the lsof command to analyze which process, and which type of file descriptors/sockets, is causing this. Since it crashes after about 2 days, I would wait at least a day before running the command, so that a possible leak is visible.
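`lsof -p <pid>` lists every descriptor a process holds. On Linux the same information is exposed under `/proc/<pid>/fd`, so a small Python sketch can group descriptors by kind and make a slow socket leak easy to spot when run a day apart. The function name, script name and output format are hypothetical; reading another process's `/proc/<pid>/fd` generally requires running as the same user or as root.

```python
import os
import sys
from collections import Counter


def fd_summary(pid: int) -> Counter:
    """Count a process's open descriptors by kind, using Linux /proc.

    Hypothetical helper: roughly what `lsof -p <pid>` shows, but grouped.
    """
    fd_dir = f"/proc/{pid}/fd"
    counts: Counter = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # descriptor was closed while we were iterating
        if target.startswith("socket:"):
            kind = "socket"
        elif target.startswith("pipe:"):
            kind = "pipe"
        elif target.startswith("anon_inode:"):
            kind = target  # e.g. "anon_inode:[eventpoll]"
        else:
            kind = "file/device"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Usage: fd_summary.py [pid]   (defaults to this script's own PID)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for kind, n in fd_summary(pid).most_common():
        print(f"{n:6d}  {kind}")
```

For example, run `sudo python3 fd_summary.py <server-pid>` today and again tomorrow; a socket count that keeps climbing while the number of connected agents stays the same would point to a leak.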

Edited by MartinK

I applied the fix. I will check whether the second problem is gone, and I will also watch whether the first problem occurs again. Interestingly, the server had been running fine for over a year. I upgraded it with apt-get many times during that year, but the first problem I mentioned only started recently. (I did not add more clients...)

 

Anyway, we'll see, and I will get back to you! Thanks for the advice so far!

 

Update:

So far no more errors... let's see how long the server keeps running.

 

Update2:

The patch seems to have fixed both issues. Server is running fine now.

Edited by tobiasperschon
This topic is now closed to further replies.