GreenEnvy22 (Posted November 7, 2022)

We just noticed an issue with our ESET Protect server. None of our 500+ agents have reported in since Nov 1st. As far as we're aware, no changes have been made to the server in weeks, other than it possibly auto-updating itself. Some agents are inside the firewall, others outside, but none of them are reporting in.

I checked the agent logs and see messages like:

2022-11-07 14:32:24 Error: CReplicationModule [Thread 22c0]: CAgentReplicationManager: Replication finished unsuccessfully with message: Deadline Exceeded (code: 4) for request Era.Common.Services.Replication.GetStaticObjectsRequest (id: 2cef4920-d643-4a11-ae85-ba4bf536c88f) on connection to 'host: "eset.DOMAIN.org" port: 2222', Task: CStaticObjectMetadataTask, Scenario: Automatic replication (REGULAR), Connection: eset.DOMAIN.org:2222, Connection established: true, Replication inconsistency detected: true, Server busy state detected: false, Realm change detected: false, Realm uuid: 356af7a2-24c8-42d7-ac8e-061bb6fe9e5c, Sent logs: 0, Cached static objects: 162, Cached static object groups: 10, Static objects to save: 0, Static objects to delete: 0, Modified static objects: 0
2022-11-07 14:32:24 Error: CSystemConnectorModule [Thread 22d4]: CDeviceSnapshotWinLoaderUtils::ReadDiskDriveIDs: calling funtion=[CreateFile] for volume=[\\?\Volume{2c7bdea2-5dad-11ed-9f17-047bcb29463b}] failed, error=[5]
2022-11-07 14:35:54 Error: CReplicationModule [Thread 22c0]: InitializeConnection: Replication connection problem: Deadline Exceeded (code: 4) for request Era.Common.Services.Replication.CheckReplicationConsistencyRequest (id: c00aabdc-43e7-4ebd-8d85-9f34f9871cd0) on connection to 'host: "eset.DOMAIN.org" port: 2222'
2022-11-07 14:35:54 Warning: CReplicationModule [Thread 22c0]: InitializeConnection: Not possible to establish any connection (Attempts: 1) [RequestId: c00aabdc-43e7-4ebd-8d85-9f34f9871cd0]
2022-11-07 14:35:54 Error: CReplicationModule [Thread 22c0]: InitializeFailOverScenario: Skipping fail-over scenario (stored replication link is the same as current) [RequestId: c00aabdc-43e7-4ebd-8d85-9f34f9871cd0]
2022-11-07 14:35:54 Error: CReplicationModule [Thread 22c0]: CAgentReplicationManager: Replication finished unsuccessfully with message: Replication connection problem: Deadline Exceeded (code: 4) for request Era.Common.Services.Replication.CheckReplicationConsistencyRequest (id: c00aabdc-43e7-4ebd-8d85-9f34f9871cd0) on connection to 'host: "eset.DOMAIN.org" port: 2222', Task: CReplicationConsistencyTask, Scenario: Automatic replication (REGULAR), Connection: eset.DOMAIN.org:2222, Connection established: false, Replication inconsistency detected: false, Server busy state detected: false, Realm change detected: false, Realm uuid: 356af7a2-24c8-42d7-ac8e-061bb6fe9e5c, Sent logs: 0, Cached static objects: 162, Cached static object groups: 10, Static objects to save: 0, Static objects to delete: 0, Modified static objects: 0
2022-11-07 14:35:54 Error: CSystemConnectorModule [Thread 22d4]: CDeviceSnapshotWinLoaderUtils::ReadDiskDriveIDs: calling funtion=[CreateFile] for volume=[\\?\Volume{2c7bdea2-5dad-11ed-9f17-047bcb29463b}] failed, error=[5]

And Status.htm:

Error: Replication connection problem: Deadline Exceeded (code: 4) for request Era.Common.Services.Replication.CheckReplicationConsistencyRequest (id: c00aabdc-43e7-4ebd-8d85-9f34f9871cd0) on connection to 'host: "eset.DOMAIN.org" port: 2222'
Task: CReplicationConsistencyTask
Scenario: Automatic replication (REGULAR)
Connection: eset.DOMAIN.org:2222
Connection established: false
Replication inconsistency detected: false
Server busy state detected: false
Realm change detected: false
Realm uuid: 356af7a2-24c8-42d7-ac8e-061bb6fe9e5c
Sent logs: 0
Cached static objects: 162
Cached static object groups: 10
Static objects to save: 0
Static objects to delete: 0
Modified static objects: 0
All replication attempts: 285

On the Protect server (Windows VM,
running Protect 9.1.1295, and web console 9.1.292.0):

2022-11-07 17:46:14 Error: NetworkModule [Thread 14d8]: Protocol failure for session id 9589, error:Receive: NodSslWriteEncryptedData: Internal error in the underlying implementations.
2022-11-07 17:47:02 Error: CReplicationModule [Thread 14a4]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer c4d5ec23-dbe2-4563-bea8-81eeed0cffd2 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:47:16 Error: CReplicationModule [Thread 19ec]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer 09bfa9a4-c73c-408f-89d1-5836d8803d7e with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:47:17 Error: NetworkModule [Thread 1ba8]: Verify user failed for all computers: grd-tvcart1.DOMAIN.lan: NodVerifyCertificateChain failed: NodVerifyTrustResult: 42, NVT_NotTrusted, X509ChainStatus: 0x1, X509CSF_NotTimeValid, certificate: [Subject='CN=Agent certificate for host *, OU=IT, O=DOMAIN, L=MY_CITY, S=MI, C=US', Issuer='CN=Certificate Authority for DOMAIN ESET, OU=IT, O=DOMAIN, L=MY_CITY, S=GR, C=US', NotBefore=2015-Jul-23 04:00:00, NotAfter:2020-Jul-24 03:59:59, Serial=REMOVED, SHA256=REMOVED, SubjectKeyIdentifier=51fee92de54bc02db2e2805ddefade8259e20adb, AuthorityKeyIdentifier=REMOVED],10.2.12.61: NodVerifyCertificateChain failed: NodVerifyTrustResult: 42, NVT_NotTrusted, X509ChainStatus: 0x1, X509CSF_NotTimeValid, certificate: [Subject='CN=Agent certificate for host *, OU=IT, O=DOMAIN, L=MY_CITY, S=MI, C=US', Issuer='CN=Certificate Authority for DOMAIN ESET, OU=IT, O=DOMAIN, L=MY_CITY, S=GR, C=US', NotBefore=2015-Jul-23 04:00:00, NotAfter:2020-Jul-24 03:59:59, Serial=REMOVED, SHA256=REMOVED SubjectKeyIdentifier=REMOVED, AuthorityKeyIdentifier=REMOVED]
2022-11-07 17:47:17 Error: NetworkModule [Thread 1ba8]: Receive: NodSslWriteEncryptedData: Internal error in the underlying implementations., ResolvedIpAddress:10.2.12.61, ResolvedHostname:grd-tvcart1.DOMAIN.lan, ResolvedPort:51383
2022-11-07 17:47:17 Error: NetworkModule [Thread 1ba8]: Protocol failure for session id 9753, error:Receive: NodSslWriteEncryptedData: Internal error in the underlying implementations.
2022-11-07 17:47:30 Error: CReplicationModule [Thread 19dc]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer a28b25cd-9ef8-4248-82a4-8296824b0409 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:47:33 Error: NetworkModule [Thread 14d8]: Error code:121;The semaphore timeout period has expired; SessionId:9582
2022-11-07 17:47:43 Error: CReplicationModule [Thread 15f0]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer 10d2c848-ecca-41f3-98ea-7476baeb7495 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:47:57 Error: CReplicationModule [Thread df0]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer 6d5de554-9d37-4e1c-a9bf-1c35cf24c887 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:48:06 Error: NetworkModule [Thread 14d8]: Error code:10053;An established connection was aborted by the software in your host machine; SessionId:9729
2022-11-07 17:48:06 Error: NetworkModule [Thread 1bac]: Error in SendCallback: An established connection was aborted by the software in your host machine;Error code: 10053; SessionId:9729
2022-11-07 17:48:10 Error: CReplicationModule [Thread 1cf4]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer 67156841-d3c4-4fbb-ae50-17943ffc0eaf with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:48:25 Error: CReplicationModule [Thread 1a4c]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer 02a35abd-da18-4be2-a974-60dd0dd1c882 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:48:38 Error: CReplicationModule [Thread 1508]: RpcGetStaticObjectsHandler: Failed to load static objects list for peer fd6bd6f9-88e1-4e46-8d31-932dadde7720 with error: Object b5a1a0b4-1e4e-4a4d-99f5-dd5f0178cf28 was not found (LoadUser: User state deserialization failed)
2022-11-07 17:49:00 Error: NetworkModule [Thread 14d8]: Error code:121;The semaphore timeout period has expired; SessionId:9802
2022-11-07 17:49:18 Error: NetworkModule [Thread 1bac]: Error code:121;The semaphore timeout period has expired; SessionId:9674
2022-11-07 17:50:12 Error: NetworkModule [Thread 1bb0]: Error code:121;The semaphore timeout period has expired; SessionId:9992

I see some old certificates mentioned there that expired in 2020. None of those are in use anywhere in the Protect interface; all of ours show valid until 2025 for both server and agents. The server's status.html shows all green. It does note that the certificate is using an obsolete signature algorithm, but it is still listed as OK.
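As a quick triage aid for logs like the above, here is a small, purely illustrative Python sketch that scrapes Subject/NotAfter pairs out of a log excerpt and flags the certificates that are already expired. The regex and the sample string are assumptions based solely on the format of the lines quoted above, not an ESET-provided tool:

```python
import re
from datetime import datetime

# Matches certificate blocks as they appear in the NetworkModule errors,
# e.g. "[Subject='CN=...', ..., NotAfter:2020-Jul-24 03:59:59, ...]".
CERT_RE = re.compile(
    r"Subject='([^']*)'[^\]]*?NotAfter[:=]"
    r"(\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})"
)

def expired_certs(log_text, now):
    """Return (subject, not_after) for every certificate in the excerpt
    whose NotAfter timestamp is already in the past."""
    hits = []
    for subject, stamp in CERT_RE.findall(log_text):
        not_after = datetime.strptime(stamp, "%Y-%b-%d %H:%M:%S")
        if not_after < now:
            hits.append((subject, not_after))
    return hits

# Shortened sample line in the same shape as the log above.
sample = (
    "certificate: [Subject='CN=Agent certificate for host *, OU=IT, O=DOMAIN', "
    "NotBefore=2015-Jul-23 04:00:00, NotAfter:2020-Jul-24 03:59:59, Serial=REMOVED]"
)
print(expired_certs(sample, datetime(2022, 11, 7)))
```

This only narrows down which of the quoted certificates are stale; it says nothing about why the server is still presenting them.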
Status page details (Scope / Time (UTC) / Text):

Server performance (2022-Nov-07 18:01:30): Overall performance status is: OK
Synchronization status (2022-Nov-07 16:45:53): Idle - OK. Last synchronization of Licenses was successful at 2022-Nov-07 16:45:53 (1 seat pools synchronized)
Peer certificate (2022-Nov-07 16:45:45): OK. Server peer certificate with subject 'CN=Server certificate for host eset.DOMAIN.org, OU=IT, O=DOMAIN, L=MY_CITY, S=Michigan, C=US' issued by 'CN=Certificate Authority for MYDOMAIN ESET, OU=IT, O=MYDOMAIN, L=MY_CITY, S=GR, C=US' with serial number 'REMOVED' is signed with obsolete signature algorithm and it is still valid for next 30 days
Replication Throttling (2022-Nov-07 16:45:44): Current throttling state is: OK. Max logs count is: 14000. Max logs KB is: 1048576 KB. Max agents is: 280
NetworkModule statistics (2022-Nov-07 18:01:45): NetworkModule per minute statistics. Number of SocketContainers: 167
Log duration statistics (2022-Nov-07 18:01:32): Logs Replication statistics for last 60 minutes, updated at least every 15 seconds:
  PERFORMANCE_SERVER_EVENT: Total time: 601 miliseconds, count: 121, one process time: 4
  QOS_DATABASE_EVENT: Total time: 277 miliseconds, count: 60, one process time: 4
  QOS_NETWORK_EVENT: Total time: 208 miliseconds, count: 60, one process time: 3
  TASK_CLIENT_EVENT: Total time: 86 miliseconds, count: 106, one process time: 0
  FUNCTIONALITY_COMPUTER_STATUS: Total time: 35 miliseconds, count: 1, one process time: 35
  LIVEGRID_STATUS: Total time: 35 miliseconds, count: 8, one process time: 4
  APPLIEDPOLICYPRODUCTS_STATUS: Total time: 29 miliseconds, count: 1, one process time: 29
  FUNCTIONALITY_PRODUCTS_STATUS: Total time: 28 miliseconds, count: 2, one process time: 14
  FUNCTIONALITY_PROBLEMSDETAILS_STATUS: Total time: 25 miliseconds, count: 2, one process time: 12
  EPNS_STATUS: Total time: 25 miliseconds, count: 1, one process time: 25
  Total time: 1 seconds, Total count: 362
Performance details (2022-Nov-07 18:01:30): Detailed performance statistics: I/O reads: 0 KB/s, I/O writes: 1 KB/s, I/O others: 1027 KB/s, Logs latency: 0 s, Pending logs: 0, Database size: 335 MB
Received logs (2022-Nov-07 17:37:11): Received logs statistics: Received in last minute: 125 (2.08 /s). Received in last hour: 125 (0.03 /s). History of received logs in last hour (group by 10 seconds)
Replication Statistics (2022-Nov-07 17:03:39): Logs Replication statistics per minute. Number of throttled logs replications: 0. Number of succesful logs replications: 0
NetworkModule (2022-Nov-07 16:45:45): Network module configuration: Socket timeout is set to 21600 seconds. Reverse DNS resolving is enabled. Dns servers update interval is set to 1200 seconds
Product (2022-Nov-07 16:45:29): Product install configuration: Product type: Server. Product version: 9.1.1295.0. Product locale: en_US

Performance indicators: Up time: 01:16:22. Memory private usage: 1184 MB. Available physical memory: 1842 MB

Any thoughts here before I submit a ticket to support?
GreenEnvy22 (Posted November 8, 2022, Author)

As a follow-up: we restored the VM from a backup made the day before the issue began, and all agents started reporting in again. I haven't determined the cause for certain, but it may have been database corruption due to low disk space; the Windows logs showed some Windows updates failing because of space issues. There was still about 6 GB free when I looked at it today, but perhaps it got critically low during an update. I gave the restored VM more disk space to avoid a repeat going forward.
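Since the leading suspicion at this point is free space dipping during an update, a tiny watchdog along these lines could alert before the database is starved. This is a generic sketch: the path and the 20 GB floor are illustrative assumptions, not ESET-documented requirements:

```python
import shutil

def free_space_gb(path="."):
    """Whole GiB free on the volume that holds `path`."""
    return shutil.disk_usage(path).free // 1024**3

def space_ok(path=".", min_free_gb=20):
    """True while the volume still has at least `min_free_gb` GiB free."""
    return free_space_gb(path) >= min_free_gb

# Run from a scheduled task against the drive holding the database,
# and raise an alert (email, event log, etc.) when this prints False.
print(free_space_gb("."), space_ok("."))
```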
GreenEnvy22 (Posted November 8, 2022, Author)

Follow-up #2: it broke again at 3 AM today. I'm not sure why; there were no Windows updates and no server reboots. Agents just stopped reporting in several hours after the restore. The server has plenty of space, but I still saw an error message in the MySQL logs about not being able to write a SERERNAME-bin.00001 file because there was no space, even though the drive has 25 GB free. Something seems messed up with the database.

I backed the database up and restored it to a new server, but the new server complains the DB isn't configured correctly. I also tried restoring it to a new appliance, but it says that version number (blank) isn't compatible with 9.1.xxxx. I did a repair install of the server, which completed but did not fix the issue; the MySQL section gave me no trouble. Next I tried uninstalling, leaving the DB in place, and reinstalling. Now I get the message about the database not being configured properly.

My next step is to restore the VM from backups again, back up that DB before it breaks, and restore it to a new machine to see if that works.
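For anyone else chasing a similar "-bin.NNNNNN" write failure: those files are MySQL binary logs, and their disk usage can be inspected and trimmed from the MySQL side. These are standard MySQL administration statements, not anything ESET-specific, and the seven-day retention window is an arbitrary example; check that nothing depends on the binlogs (replication, point-in-time recovery) before purging:

```sql
-- List the binary logs and their sizes (the SERVERNAME-bin.NNNNNN files)
SHOW BINARY LOGS;

-- Drop binary logs older than a week to reclaim space (example window)
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;
```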
MartinK, ESET Staff (Posted December 13, 2022)

Any chance you found out what was going on? From the logs I can confirm that the original problem was caused by a problem with the database: for some reason it ended up in a state that is inconsistent in terms of our data. This can possibly be resolved with the help of ESET support, though maybe not quickly, depending on the impact, so reverting the database to a previous backup was the recommended step. But it is crucial to resolve the source of the problem, as it will most probably occur again, especially if there is an issue with the underlying storage or disk size limitations.

Regarding MySQL configuration: there are a few parameters that have to be set properly to use a MySQL server with ESET PROTECT. Were they set prior to deploying PROTECT over the new database?
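For readers hitting the same "database not configured properly" wall: the MySQL parameters alluded to above live in my.ini (my.cnf on Linux) under [mysqld]. The values below are recalled from ESET's install prerequisites and should be treated as a starting point only; verify them against the official ESET PROTECT documentation for your exact server version before relying on them:

```ini
[mysqld]
# ESET PROTECT sends large replication payloads; the documented
# minimum packet size is on the order of 33M.
max_allowed_packet=33M

# The InnoDB redo log must be sized generously, roughly
# innodb_log_file_size * innodb_log_files_in_group >= 200 MB.
innodb_log_file_size=100M
innodb_log_files_in_group=2
```

A mismatch here is a common reason a restored ESET PROTECT database refuses to come up on a freshly installed MySQL instance.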