In this post we will find out how to use PolyBase in SQL Server 2019 to extract data from text files stored in Azure Data Lake Gen2 storage. In the second part of this post we will see how to add databases that hosts PolyBase external tables into an Always On Availability Group.
Scenario
- Text files in Azure Data Lake Storage
- On premises SQL Server 2019
- PolyBase to extract data from text files in Azure and store into a table in an on-premises SQL Server
Solution
PolyBase in SQL Server 2019 allows querying a wide variety of external data sources including Azure Blob Storage, click here for details.
-
- Install PolyBase Engine if it is not already installed. After installation, following three databases will be added to your SQL Server instance depending on the options you have selected during the installation.
- Enable PolyBase with the following statement and restart SQL Server service. Restarting SQL Server restarts these services as well
-
- SQL Server PolyBase Engine
- SQL Server PolyBase Data Movement Service
12345EXEC sp_configure @configname='PolyBase enabled',@configvalue=1GORECONFIGUREGO -
- Set “hadoop connectivity” to Azure Blob Storage provider using sp_configure. This setting is required in order to create an external data source for external table.
12345EXEC sp_configure @configname='hadoop connectivity',@configvalue=7GORECONFIGUREGO - Create an empty database
123456CREATE DATABASE [MAQDB]CONTAINMENT = NONEON PRIMARY( NAME = N'MAQDB', FILENAME = N'F:\data\MAQDB.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )LOG ON( NAME = N'MAQDB_log', FILENAME = N'G:\log\MAQDB_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB ) - PolyBase uses database scoped credentials to create and access objects from external sources. We will first create a master key in the context of MAQDB using the following query.
12USE [MAQDB]CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Pa55w.rd@123'; - Create a database scoped credential named “AzureStorageCredential” with the following details. Remember to replace the secret with your storage account key.
12345USE [MAQDB]CREATE DATABASE SCOPED CREDENTIAL [AzureStorageCredential]WITHIDENTITY = 'MAQID',SECRET = 'KWdL5mFnUvYabHV+R1piGbONm8iAO2y0yPr7HW4ykPrKrzWM1cHe7fzhPhVwpNQu4KzXn5EEX2Zw42N6MupnQA=='; - Create external data source using the following query. Please replace <yourBlobContainerName> and <yourStorageAccountName> with your details.
1234567CREATE EXTERNAL DATA SOURCE AzureStorageWITH(TYPE = HADOOP,LOCATION = 'wasbs://@ .blob.core.windows.net',CREDENTIAL = AzureStorageCredential);NOTE:Make sure you have entered the LOCATION correctly without any typos, otherwise you will get following nasty error whilst creating EXTERNAL table.
Msg 105019, Level 16, State 1, Line 35 EXTERNAL TABLE access failed due to internal
error: ‘Java exception raised on call to HdfsBridge_IsDirExist. Java exception
message: com.microsoft.azure.storage.StorageException: The server encountered an
unknown failure: : Error [com.microsoft.azure.storage.StorageException:
The server encountered an unknown failure: ] occurred while accessing external file.’You can verify LOCATION by accessing the following link in any browser, replace <storageAccountName> and <BlobContainerName> with your details
https://<yourStorageAccountName>.blob.core.windows.net/<yourBlobContainerName>You will get following response from the above URL
Convert https link to the following and use it in your query as LOCATION
wasbs://<yourBlobContainerName>@<yourStorageAccountName>.blob.core.windows.netAt this point we have a database, database scoped credential and an external data source
- Create an external file format named TextFile with a formattype of DelimitedText and a field terminator of comma as I am using a comma separated Text file.
12345CREATE EXTERNAL FILE FORMAT TextFileWITH (FORMAT_TYPE = DelimitedText,FORMAT_OPTIONS (FIELD_TERMINATOR = ','));
- Create an external table using the following query. Please change the LOCATION, DATA_SOURCE and FILE_FORMAT with your own values.
1234567891011121314CREATE EXTERNAL TABLE dbo.MAQExternal ([Date] datetime2(3) NULL,[DateKey] decimal(38, 0) NULL,[MonthKey] decimal(38, 0) NULL,[Month] nvarchar(100) NULL,[Quarter] nvarchar(100) NULL,[Year] decimal(38, 0) NULL,[Year-Quarter] nvarchar(100) NULL)WITH (LOCATION='/MAQDate.txt',DATA_SOURCE=AzureStorage,FILE_FORMAT=TextFile);
- Test that the table is created and populated by running the following statement
1SELECT * FROM dbo.MAQExternal;
In part 2 of this post we will add “MAQDB” database to an Always On Availability Group.
- Install PolyBase Engine if it is not already installed. After installation, following three databases will be added to your SQL Server instance depending on the options you have selected during the installation.
Leave a Reply
Be the First to Comment!