Log_reuse_wait_desc showing as Availability_Replica



Recently in one my Always On Availability replica environment I got an issue where one of the availability replica database log file has grown completely, and we got a request to shrink the issue. When we run the below command we found that “log_reuse_wait_desc” showing as “Availability_Replica
What this means is this indicates that the logged changes in the availability database at the primary replica that have not arrived and that these changes were applied to the availability database at one of the secondary replicas. Until logged changes arrive and are applied, the changes cannot be truncated from the availability database log at the primary replica.

Troubleshooting:
"Log Send Queue" and "Redo Queue" are measurable data points during availability database synchronization. You can monitor these data points in order to determine whether an availability database log cannot be truncated because of the log uses type AVAILABILITY_REPLICA.
·   Log Send Queue
When a transaction is performed at the primary, the logged blocks must be delivered and hardened to the database log file at the secondary. Any delay will prevent truncation of those logged changes in the database at the primary replica. A common reasons for a sustained log send queue is latency in the network or during hardening (write to disk) of the log blocks on the secondary.
As soon as it is hardened to the secondary database log file, a dedicated redo thread applies the log records. If the redo operation cannot keep up with the transaction log that is generated, log growth may occur. If the secondary replica redo operation is behind in applying those changes to a corresponding secondary database, the primary will truncate the transaction log.


· Redo Queue
As soon as it is hardened to the secondary database log file, a dedicated redo thread applies the log records. If the redo operation cannot keep up with the transaction log that is generated, log growth may occur. If the secondary replica redo operation is behind in applying those changes to a corresponding secondary database, the primary will truncate the transaction log

Identifying the secondary database that is delaying log truncation:
If there is more than one secondary, there may be a particular secondary that is responsible for the log truncation issue. To identify the secondary database that is delaying log truncation, log on to the primary replica, query the truncation_lsn column of the sys.dm_hadr_database_replica_statesdynamic management view (DMV), and then review the log_send_queue and redo_queue data points to help diagnose the issue.
The AlwaysOn dashboard and the sys.dm_hadr_database_replica_states DMV help monitor the redo progress. The following table lists some key fields:


FieldDescription
log_send_queue_sizeSize of log records that did not arrive at the secondary replica.
log_send_rateRate at which log records are being sent to the secondary databases.
redo_queue_sizeSize of log records in the log files of the secondary replica that has not yet been redone, in kilobytes (KB).
redo_rateRate at which the log records are being redone on a given secondary database, in KB per second.
last_redone_lsnActual log sequence number of the last log record that was redone on the secondary database. The last_redone_lsn value is always less than the last_hardened_lsn value.
last_received_lsnLog block ID that identifies the point up to which all log blocks are received by the secondary replica that hosts this secondary database. This field reflects a log-block ID that is padded with zeros. It is not an actual log sequence number.

For example, execute the following query against the primary replica in order to report the log_send_queue and redo_queue values for each secondary availability database.

select ar.replica_server_name, drs.truncation_lsn, drs.log_send_queue_size, drs.redo_queue_size
from sys.dm_hadr_database_replica_states drs join sys.availability_replicas ar
on drs.replica_id=ar.replica_id
where drs.is_local=0

Workaround:
After you identify the secondary database that makes this occur, try one or more of the following methods to work around this issue temporarily:
  • Take the database out of the availability group for the offending secondary
  • Note This method will result in the loss of the High Availability/Disaster Recovery scenario for the secondary. You may have to set up the Availability Group again in the future.
  • Add more log space or log files.




Comments

Popular posts from this blog

System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out

Pre login Handshake or Connection Timeout Period

Transparent Data Encryption(TDE) with Master Key and Certificate in SQL Server