Configuring secondary database in Azure SQL DB - Bug found

Hi All

Last week we had an issue with a secondary DB that was part of geo-replication and a failover group.

To make a long story short, we had to delete the secondary and recreate it from scratch.

And now let me tell you the full story. We built a DB in the P6 tier: very powerful, expensive and highly available.

Then we added a geo-replication copy via the portal, as shown here.



This is taken from the MSFT documentation:

https://docs.microsoft.com/en-us/azure/azure-sql/database/active-geo-replication-overview

It is written: 

"Both primary and secondary databases are required to have the same service tier. It is also strongly recommended that the secondary database is created with the same backup storage redundancy and compute size (DTUs or vCores) as the primary. If the primary database is experiencing a heavy write workload, a secondary with lower compute size may not be able to keep up with it. That will cause redo lag on the secondary, and potential unavailability of the secondary. To mitigate these risks, active geo-replication will throttle the primary's transaction log rate if necessary to allow its secondaries to catch up.

Another consequence of an imbalanced secondary configuration is that after failover, application performance may suffer due to insufficient compute capacity of the new primary. In that case, it will be necessary to scale up database service objective to the necessary level, which may take significant time and compute resources, and will require a high availability failover at the end of the scale up process.

If you decide to create the secondary with lower compute size, the log IO percentage chart in Azure portal provides a good way to estimate the minimal compute size of the secondary that is required to sustain the replication load. For example, if your primary database is P6 (1000 DTU) and its log write percent is 50%, the secondary needs to be at least P4 (500 DTU). To retrieve historical log IO data, use the sys.resource_stats view. To retrieve recent log write data with higher granularity that better reflects short-term spikes in log rate, use sys.dm_db_resource_stats view."
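The sizing rule in the quoted example can be sketched as a small calculation. This is only a rough sketch: it assumes the linear "DTU times log write percent" rule of thumb from the example, and the function name `min_secondary_tier` is made up for illustration. The P4 and P6 DTU values come from the quote; the other tiers are assumptions to verify against the resource limits page.

```python
# Premium tier DTUs. P4 (500) and P6 (1000) are from the quoted documentation;
# the other values are assumptions to double-check against the limits page.
PREMIUM_DTUS = {"P1": 125, "P2": 250, "P4": 500, "P6": 1000, "P11": 1750, "P15": 4000}


def min_secondary_tier(primary_tier, log_write_percent):
    """Return the smallest Premium tier whose DTUs cover the primary's log load.

    Follows the rule of thumb from the quoted example:
    required DTUs = primary DTUs * log write percent / 100.
    """
    required_dtus = PREMIUM_DTUS[primary_tier] * log_write_percent / 100.0
    # Walk the tiers from smallest to largest and stop at the first that fits.
    for tier, dtus in sorted(PREMIUM_DTUS.items(), key=lambda item: item[1]):
        if dtus >= required_dtus:
            return tier
    raise ValueError("no Premium tier is large enough for this log load")


if __name__ == "__main__":
    # The example from the documentation: P6 primary at 50% log write.
    print(min_secondary_tier("P6", 50))
```

Note that this covers the log rate only. It says nothing about other per-tier limits, such as In-Memory OLTP storage.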

So we decided to create the secondary with a lower compute size. It's not recommended, but we can live with it.

We created a P4 secondary and all went very well.

We used an in-memory table in those DBs, and lately it has been growing. One day we saw in the monitoring that the secondary was stuck. So, as cloud lovers, we just clicked to scale it up. Nothing happened. We clicked again... nothing happened.

So we opened a ticket, and they also tried to understand what went wrong.

They found the root cause but did not find a way to solve it.

The in-memory table in the primary had grown larger than the size permitted on the secondary. The scale-up stayed stuck the whole time.

Nothing helped, and we needed the secondary urgently, so we decided to remove it from geo-replication, delete it and build a new secondary.


These are the limits:

https://docs.microsoft.com/en-us/azure/azure-sql/database/resource-limits-dtu-single-databases
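To make the trap concrete, here is a minimal sketch of the check we should have done before picking a smaller secondary. The In-Memory OLTP storage numbers below are my reading of the limits page for the Premium tiers and should be treated as assumptions to verify there; the function names are made up for illustration. The current usage on the primary can be read from the xtp_storage_percent column of sys.dm_db_resource_stats.

```python
# In-Memory OLTP storage per Premium tier, in GB.
# Assumed values: verify them against the resource limits page linked above.
XTP_STORAGE_GB = {"P1": 1, "P2": 2, "P4": 4, "P6": 8, "P11": 14, "P15": 32}


def in_memory_used_gb(tier, xtp_storage_percent):
    """Convert an xtp_storage_percent reading on a given tier into GB."""
    return XTP_STORAGE_GB[tier] * xtp_storage_percent / 100.0


def secondary_can_hold(primary_tier, xtp_storage_percent, secondary_tier):
    """True if the primary's in-memory data fits within the secondary's limit."""
    used_gb = in_memory_used_gb(primary_tier, xtp_storage_percent)
    return used_gb <= XTP_STORAGE_GB[secondary_tier]


if __name__ == "__main__":
    # Hypothetical reading: a P6 primary at 60% in-memory storage (4.8 GB)
    # no longer fits a P4 secondary under the assumed 4 GB limit.
    print(secondary_can_hold("P6", 60, "P4"))
```

With these assumed limits, a P6 primary is safe on a P4 secondary only while its in-memory storage stays at or below 50%, so that is the number worth alerting on.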




I know we need to follow the recommendations, but I am sure this scenario has just now been added to the automation QA :-).


