From DBA to Data Engineer in Azure

I recently changed roles.

After working as a DBA manager, responsible for the operational databases,

I moved to managing the data engineering group.

So what exactly is the difference between the two functions?

DBA - Production Databases:

SQL / NoSQL databases running 24x7 on powerful servers, on premises or in the cloud, managed or semi-managed. Security tasks, high performance as a target, multi-region deployments, and HA as the top priority.
Developers use microservices, so we have many applications, many services, and many, many databases.
Many kinds of DBs, such as cloud IaaS and PaaS.
Securing and auditing the data is a must.
The clusters must have as much uptime as we can achieve.
Data modeling is very important too.

Challenges and problems in database systems

Lots of DBs
Lots of creators / no standards
Lots of consumers (queries, tools, SLAs)
Raw data
Lots of data resources
Data silos

In data engineering we have other challenges. For example, we have data lakes and data warehouses:

Batch processing
Stream processing
Many data sources
ETL and EL
Data quality / governance / catalog
New skill sets
Lots of consumers with different skill sets
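
As a rough illustration of the difference between batch and stream processing in the list above, here is a minimal Python sketch. The record shape (a source name and an amount) and the function names are hypothetical and not tied to any Azure service; the point is only that a batch job produces one result after reading everything, while a stream job updates its result as each event arrives.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (source name, amount)
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch: read the full data set, then return one result at the end."""
    totals: Dict[str, float] = defaultdict(float)
    for source, amount in events:
        totals[source] += amount
    return dict(totals)


def stream_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Stream: update the running state per event and emit a snapshot each time."""
    totals: Dict[str, float] = defaultdict(float)
    for source, amount in events:
        totals[source] += amount
        yield dict(totals)


if __name__ == "__main__":
    sample = [("orders", 10.0), ("billing", 5.0), ("orders", 2.5)]
    print(batch_totals(sample))
    for snapshot in stream_totals(sample):
        print(snapshot)
```

Once every event has been seen, the last snapshot from the stream version equals the batch result. The difference is when consumers can see the numbers.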

It is a new and fascinating world with its own challenges, and its importance in organizations is growing by the day.

Where once only analysts or data scientists consumed information from the data lake, today applications pull information from it as well.

Where once knowing SQL was enough to access the information, today you need to know much more. At the same time, the data engineer's role is to make the information accessible to those who need it in the simplest and easiest way.

The interesting challenges will be presented in the following posts.

Stay tuned.



SQL Azure for DBAs and Data Engineers
