
Agent troubleshooting and repair



It seems this topic comes up a lot and I still haven't seen a good troubleshooting guide and easy steps to repair the agent.

I have a number of OS X clients that work perfectly fine for a while and at some point the agent quits working properly. Status.html is all green and replication is occurring regularly.  Last-error.html shows nothing noteworthy and the trace.log does not indicate any glaring errors.

Yet the client does not show up in the ERA console by either name or IP address, and the client does not pull the correct policies. It does, however, update definitions properly.  I have three MacBooks next to me all doing the same thing. They are all running the latest version of the agent.

It seems the only option is to remove the agent and reinstall it. But this can't be done via ERA console.

Are there any guides to determining what is causing the issue and how to repair the client easily?

Thank you.


I can confirm this behavior in my environment with the Windows agent.  We have only about 100 Macs, but we have ~6,000 PCs, and I am seeing it happen on roughly a hundred of those.

The weird thing is that the machines continue to get the policy changes I make, but I cannot find them anywhere.

 

  • ESET Staff

Hello, has this been reported to ESET Support in any way? (Could you send me your respective ticket numbers?)

I have briefly discussed your issues with the developers. They suggested analyzing your database, if you are willing to provide it, ideally alongside some screenshots or logs from the affected computers showing what should be reported but is not (a computer that is installed and connecting, but not visible in Remote Administrator).  Are you sure that the computer was not renamed (a task that is executed automatically on Lost & Found) or placed in a different group? Did you use search, or were you filtering by means of dynamic groups (for the Mac computers)?

Also, when you tried to troubleshoot, did you remove the agent completely (uninstall and install again), or were you always doing repairs? Does it appear as a new computer after reinstalling? You can contact me via private message if you have any other input or details.

 

10 minutes ago, MichalJ said:

Are you sure that the computer was not renamed (a task that is executed automatically on Lost & Found) or placed in a different group? Did you use search, or were you filtering by means of dynamic groups (for the Mac computers)?

Also, when you tried to troubleshoot, did you remove the agent completely (uninstall and install again), or were you always doing repairs? Does it appear as a new computer after reinstalling?

Thanks for the reply.

In my case, I had three identical MacBooks next to me that had the latest agent installed. They had been powered off for several weeks. I powered them on today to do some testing and found that none were getting the latest policies. I searched 'All devices' including subgroups by both name and IP address and found none of the three.  Agents on two were still communicating with the ERA server. The third is not communicating at all. All three are configured the same (same model even).

When I've encountered this in the past with Windows machines, uninstalling and reinstalling the agent has always worked. But this is not a solution. I'm not aware of a repair option for OS X. Re-deploying the agent yields a successful task, but changes/fixes nothing.

The biggest issue is that when the agents quit functioning we have no way of knowing; they just appear to be another offline system and they no longer receive updated policies.

18 minutes ago, j-gray said:

Thanks for the reply.

In my case, I had three identical MacBooks next to me that had the latest agent installed. They had been powered off for several weeks. I powered them on today to do some testing and found that none were getting the latest policies. I searched 'All devices' including subgroups by both name and IP address and found none of the three.  Agents on two were still communicating with the ERA server. The third is not communicating at all. All three are configured the same (same model even).

When I've encountered this in the past with Windows machines, uninstalling and reinstalling the agent has always worked. But this is not a solution. I'm not aware of a repair option for OS X. Re-deploying the agent yields a successful task, but changes/fixes nothing.

The biggest issue is that when the agents quit functioning we have no way of knowing; they just appear to be another offline system and they no longer receive updated policies.

I spoke with ESET Business Support on Monday regarding this issue.  We were able to get the machine working by reinstalling, after which it showed back up.  But as mentioned above, this is not a workable solution, as there really isn't a method for knowing when they stop working.  I am able to determine this, however, through the AD sync task, by going through the tedious process of looking through the objects and finding which ones are not "managed".  I was a little disappointed that the rep did not really want to look into the root cause.  However, I have been in communication with someone from your top business support on a few other issues, which he has been very helpful in solving, so I was going to bring it up with him after the holidays.

I do have logs from one of the machines that I can provide.  The error showing in the last-error HTML file seems to be consistent across all of the machines that have the issue.  The ticket number from when I spoke with support this week was 1502830.  A sample from the last-error log:

CEssConnectorModule 2016-Aug-31 10:25:22 Requesting protection status log from product
CEssConnectorModule 2016-Aug-31 10:25:22 Protection status content: 









 
 
CEssConnectorModule 2016-Aug-31 10:25:22 Protection status log deserialized and published
CSystemConnectorModule 2016-Aug-31 10:25:24 StatusLog_PERFORMANCE_USER_STATUS: "Rows":[{"symbols":[{"symbol_type":453,"symbol_data":{"val_int":[1]}},{"symbol_type":447,"symbol_data":{"val_uuid":[{"uuid":"442b1cd1-77ce-40f7-a9c7-94a0cbed839f"}]}},{"symbol_type":454,"symbol_data":{"val_time_date":[{"year":2016,"month":8,"day":31,"hour":10,"minute":25,"second":24}]}},{"symbol_type":456,"symbol_data":{"val_res_id":[508906757892866568]}}]}]
AutomationModule 2016-Aug-31 10:25:24 Trigger: Tick ALLOWED [UUID=00000000-0000-0000-7006-000000000001, TYPE=REPLICATION].
CDataMinersModule 2016-Aug-31 10:25:24 Machine is not idle because user is not idle
SchedulerModule 2016-Aug-31 10:25:24 Received message: RegisterSleepEvent
AutomationModule 2016-Aug-31 10:25:24 Task: Executing task [UUID=00000000-0000-0000-7005-000000000001, TYPE=Replication, CONFIG=scenarioType: REGULAR linkData { dataLimit: 1024 isDisabled: false connections { host: "[REMOVED]" port: [REMOVED] } }].
SchedulerModule 2016-Aug-31 10:25:24 Received message: GetRemainingTimeByUserDataRequest
CReplicationModule 2016-Aug-31 10:25:24 CReplicationManager: Processing client replication task message
CReplicationModule 2016-Aug-31 10:25:24 CReplicationManager: Initiating replication connection to 'host: "[REMOVED]" port: [REMOVED]' (scenario: Regular, data limit: 1024KB)
NetworkModule 2016-Aug-31 10:25:24 Received message: CreateConnectionRequest
NetworkModule 2016-Aug-31 10:25:24 Attempting to connect to endpoint: [REMOVED]
NetworkModule 2016-Aug-31 10:25:24 Forcibly closing sessionId:60, isClosing:0
NetworkModule 2016-Aug-31 10:25:24 Removing session 60
NetworkModule 2016-Aug-31 10:25:24 Closing connection , session id:60
NetworkModule 2016-Aug-31 10:25:24 Sending message: ConnectionFailure
CReplicationModule 2016-Aug-31 10:25:26 CReplicationManager: Replication (network) connection to 'host: "[REMOVED]" port: [REMOVED]' failed with: (0x2751), A socket operation was attempted to an unreachable host

 

The date/times are all different on each of the ones that I have had to fix so there is no common theme there.
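
For anyone comparing these logs across several machines, a short script can pull out just the replication failures. This is only a rough sketch, not an ESET tool: it assumes the line layout seen in the sample above (module name, timestamp, message), and the function name `find_replication_failures` is made up for illustration. Month abbreviations are parsed with `%b`, so it also assumes an English locale.

```python
import re
from datetime import datetime

# Layout inferred from the log sample in this thread, not from ESET documentation:
#   <Module> <YYYY-Mon-DD HH:MM:SS> <message ... failed with: <reason>>
FAILURE_RE = re.compile(
    r"^(?P<module>\S+) "
    r"(?P<stamp>\d{4}-[A-Za-z]{3}-\d{1,2} \d{2}:\d{2}:\d{2}) "
    r".*failed with: (?P<reason>.+)$"
)


def find_replication_failures(log_text):
    """Return a list of (timestamp, reason) for every 'failed with:' line."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group("stamp"), "%Y-%b-%d %H:%M:%S")
            failures.append((stamp, match.group("reason")))
    return failures
```

Running this over the text of the last-error log from each affected machine would make it easier to see whether the failure reason is the same everywhere, even when the timestamps differ.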

Edited by kingoftheworld

FWIW, I have the exact same two warnings in yellow. The error I get in red is the same, except it's "failed with: connection closed by remote peer for session id xx"  'xx' being a specific session number.

I have not opened a case on this yet, as I'm still spending a lot of time on the OS X startup delay issue.

2 minutes ago, j-gray said:

FWIW, I have the exact same two warnings in yellow. The error I get in red is the same, except it's "failed with: connection closed by remote peer for session id xx"  'xx' being a specific session number.

I have not opened a case on this yet, as I'm still spending a lot of time on the OS X startup delay issue.

Have you opened a case for the OSX issue?  I have been working with support since roughly June on the OSX startup delay.  They provided an early release of 6.4 for OSX that doesn't appear to resolve the issue.

Just now, kingoftheworld said:

Have you opened a case for the OSX issue?  I have been working with support since roughly June on the OSX startup delay.  They provided an early release of 6.4 for OSX that doesn't appear to resolve the issue.

Oh yes. We've had a case open since August for the startup issue. I've tried the RC as well and felt a small improvement, but not enough to make it usable.

Just now, j-gray said:

Oh yes. We've had a case open since August for the startup issue. I've tried the RC as well and felt a small improvement, but not enough to make it usable.

The GUI is the cause of the issue.  We have found that forcing the GUI not to launch corrects the issue.  I believe our OSX engineer modified one of the plist files that points to the location of the GUI, which causes the GUI launch to fail immediately.  The A/V component appears to function normally at that point, but we are hoping for an actual fix.

  • ESET Staff

It seems both of you have almost the same problem. I will add a few questions/notes:

  1. Computer/Device is missing from Webconsole:
    • We currently remove computers only during execution of the task "Delete not connecting computers" and when a user explicitly requests removal of a computer. Are you using this task (it is a server task)?
    • If a computer is removed from the console and then connects, it will be re-created with its original name, but most probably in the Lost&Found group.
    • Removal of a computer should be traced in the Audit log. Could you please check for entries mentioning the missing computer by name?
    • Could you check the results of this DB query:
      SELECT c.computer_name, a.computer_connected, s.group_name, c.removed, c.muted, c.computer_id from tbl_computers c JOIN tbl_computers_aggr a ON c.computer_id=a.computer_id JOIN tbl_static_groups s ON s.group_id=c.parent_id;

      for the computers you are missing (not seeing in the console)? The result of this query should contain a list of all computers, including their last connection time and the name of their static group.

  2. AGENT is not connecting:
    • When checking status.html and related logs, please make sure the dates/timestamps are current. It is possible that the AGENT is not even running and status.html still shows all green, but with old timestamps.
    • Does restarting the system (or only the AGENT service, if possible) help to resolve the connection problems?
    • The previously mentioned error "A socket operation was attempted to an unreachable host" indicates a network problem; maybe this computer was not even connected to the network when the error occurred. Was this error resolved by reinstalling?
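
If the query output is large, checking it by eye for a handful of missing machines gets tedious. Below is a minimal sketch of one way to do it, assuming the query result has been exported to CSV with a header row that uses the selected column names (`computer_name`, `group_name`, and so on). The function name `check_missing` and the export step are assumptions for illustration, not part of ERA.

```python
import csv
import io


def check_missing(csv_text, wanted_names):
    """Split wanted_names into rows present in the export and names with no row.

    csv_text: the query result exported as CSV, with a header row
              (assumed export format, not something ERA produces by itself).
    Returns (found_rows, absent_names); name matching ignores case.
    """
    by_name = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_name.setdefault(row["computer_name"].lower(), []).append(row)

    found_rows, absent_names = [], []
    for name in wanted_names:
        rows = by_name.get(name.lower())
        if rows:
            found_rows.extend(rows)
        else:
            absent_names.append(name)
    return found_rows, absent_names
```

For every machine that comes back in `found_rows`, the `group_name`, `computer_connected`, and `removed` values show where the server thinks it lives and when it last connected; a name in `absent_names` has no matching row in the query result at all.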
5 hours ago, MartinK said:

It seems both of you have almost the same problem. I will add a few questions/notes:

  1. Computer/Device is missing from Webconsole:
    • We currently remove computers only during execution of the task "Delete not connecting computers" and when a user explicitly requests removal of a computer. Are you using this task (it is a server task)?
    • If a computer is removed from the console and then connects, it will be re-created with its original name, but most probably in the Lost&Found group.
    • Removal of a computer should be traced in the Audit log. Could you please check for entries mentioning the missing computer by name?
    • Could you check the results of this DB query:
      SELECT c.computer_name, a.computer_connected, s.group_name, c.removed, c.muted, c.computer_id from tbl_computers c JOIN tbl_computers_aggr a ON c.computer_id=a.computer_id JOIN tbl_static_groups s ON s.group_id=c.parent_id;

      for the computers you are missing (not seeing in the console)? The result of this query should contain a list of all computers, including their last connection time and the name of their static group.

  2. AGENT is not connecting:
    • When checking status.html and related logs, please make sure the dates/timestamps are current. It is possible that the AGENT is not even running and status.html still shows all green, but with old timestamps.
    • Does restarting the system (or only the AGENT service, if possible) help to resolve the connection problems?
    • The previously mentioned error "A socket operation was attempted to an unreachable host" indicates a network problem; maybe this computer was not even connected to the network when the error occurred. Was this error resolved by reinstalling?

  • We run the 'Delete not connecting computers' server task. We also use AD sync to remove disabled/deleted workstations.
  • The workstations appear by name due to AD sync, but they appear offline/unmanaged with no information.
  • What Audit log are you referring to specifically? Under the 'Last Connection' report in the ERA console, they show up in the 'Never' subsection for last connection. This is not accurate.
  • The DB query lists all computers, as I imagine is expected. The three broken ones show up in the AD OU where they would be synced, but with no information. What should I look for in the query results?
  • Timestamps in the logs for two of the three are current. The third apparently is not connecting.
  • Multiple reboots have not resolved the issue. Manual uninstall and reinstall of the agent is successful, but these systems still do not appear in the console. The only way I've gotten it to work is to 1) uninstall the agent, 2) manually delete the com.eset.remoteadministrator.agent folder, and 3) install the agent again.
  • ESET Staff
  • There is a special audit log report available in the Reports section. It lists operations that were performed in your ERA environment, especially management of various objects and configuration. It should also list modifications made by server tasks (such as deleting not connecting computers).
  • The result of the DB query should list all computers/devices that are visible in the Webconsole. Does the number of listed entries correspond to what you see in the console? Could you also check the content of table tbl_computers and compare it with the results of the previous query? Are there more entries?
  • You mentioned that one computer does not have current timestamps in status.html. Are there at least errors with a current time? At the bottom of status.html there is the time the log was last updated: is it almost current? If the last update of status.html was not performed within the last minute, it most probably means that this AGENT is not even running, and thus not connecting. Can you see the process "ERAAgent" running on this machine?
  • Regarding reinstallation of the AGENT on Mac OS X, it seems to me that you are uninstalling the AGENT by dragging and dropping it to the trash, which is unfortunately not the proper way and results in a broken state where the AGENT is still running and not all files are removed. The proper way to uninstall the AGENT in an OS X environment is to use the script:
    /Applications/ESET Remote Administrator Agent.app/Contents/Scripts/Uninstall.command

    which will stop the running services and remove all installed files.

 

From my point of view, it seems that the AGENT stops connecting for some reason, which results in its removal from the Webconsole (delete not connecting computers). Unfortunately, we do not know the reason for this behavior. There is also a chance that the AGENT crashed and generated a minidump. Could you check the directory:

/Library/Application Support/com.eset.remoteadministrator.agent//Dumps/

for .dmp files?
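
If there are many machines to check, the dump lookup can be scripted. This is just a sketch under the assumption that crash dumps are plain files with a `.dmp` extension sitting directly in the Dumps directory named above; the function name `list_dumps` is made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def list_dumps(dump_dir):
    """Return (file name, size in bytes, modification time) per .dmp file, newest first.

    Returns an empty list when the directory does not exist.
    """
    directory = Path(dump_dir)
    if not directory.is_dir():
        return []
    dumps = [p for p in directory.iterdir()
             if p.is_file() and p.suffix.lower() == ".dmp"]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size, datetime.fromtimestamp(p.stat().st_mtime))
            for p in dumps]
```

Pass it the Dumps directory from the post above. A dump whose modification time is close to the moment the agent stopped reporting would be the interesting one to send to support.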

1 hour ago, MartinK said:
  • You mentioned that one computer does not have current timestamps in status.html. Are there at least errors with a current time? At the bottom of status.html there is the time the log was last updated: is it almost current? If the last update of status.html was not performed within the last minute, it most probably means that this AGENT is not even running, and thus not connecting. Can you see the process "ERAAgent" running on this machine?

I am out of the office for the rest of the week, but I can probably gain access to some of the logs next week.  However, the "Last Error Log" showed a date/time well in the past.  For the post above, it was "Generated at 2016-Aug-31 10:25:26 (2016-Aug-31 06:25:26 local time)".

Obviously, we are well into December now, but I am not sure if this is generated at the last time an entry was added to the log or the current time of the machine when I viewed the file.

  • ESET Staff
On 23. 12. 2016 at 11:15 PM, kingoftheworld said:

I am out of the office for the rest of the week, but I can probably gain access to some of the logs next week.  However, the "Last Error Log" showed a date/time well in the past.  For the post above, it was "Generated at 2016-Aug-31 10:25:26 (2016-Aug-31 06:25:26 local time)".

Obviously, we are well into December now, but I am not sure if this is generated at the last time an entry was added to the log or the current time of the machine when I viewed the file.

The log should contain the local time at the moment the specific information was logged (plus the time of last modification at the bottom of status.html). If status.html (and possibly all other logs) has not been modified for this long, it is almost certain that the AGENT is not running...
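
The freshness check can be automated along these lines. This is a sketch only: it assumes the footer has the "Generated at ..." form quoted earlier in this thread, that the first timestamp in it is UTC (the footer does not say so explicitly), and that month names are in English. The names `log_age` and `looks_stale` and the five-minute default are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Footer format taken from the example quoted in this thread:
#   Generated at 2016-Aug-31 10:25:26 (2016-Aug-31 06:25:26 local time)
GENERATED_RE = re.compile(
    r"Generated at (\d{4}-[A-Za-z]{3}-\d{1,2} \d{2}:\d{2}:\d{2})"
)


def log_age(page_text, now):
    """Return now minus the first 'Generated at' timestamp, or None if absent.

    `now` must be a naive datetime in the same time zone as the footer's
    first timestamp (assumed here to be UTC).
    """
    match = GENERATED_RE.search(page_text)
    if match is None:
        return None
    generated = datetime.strptime(match.group(1), "%Y-%b-%d %H:%M:%S")
    return now - generated


def looks_stale(page_text, now, limit=timedelta(minutes=5)):
    """True when the footer is missing or older than `limit`."""
    age = log_age(page_text, now)
    return age is None or age > limit
```

A page that `looks_stale` flags is a hint that the agent process is not running, which is then worth confirming directly on the machine.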

To be sure, please verify whether the process C:\Program Files\ESET\RemoteAdministrator\Agent\ERAAgent.exe is running. If not, try to start its service manually. You may also try to run ERAAgent.exe from an administrator console (cmd.exe), where it may print more detailed information, i.e. the reason why it is not starting. Most of the problems we have seen recently with the AGENT not being able to run were caused by changes in the Microsoft Visual C++ 2010 Redistributable Package installation: it may have been uninstalled, or a third-party application may have made changes that broke the AGENT. My recommendation is to try installing the runtime manually (https://www.microsoft.com/en-us/download/details.aspx?id=5555 or https://www.microsoft.com/en-us/download/details.aspx?id=13523 depending on platform).

 

 

This topic is now closed to further replies.