Azure Data Lake Catalog (U-SQL)

The Azure Data Lake Catalog (U-SQL) is one of the ADLA component by which U-SQL organizes data and code for sharing and re-use. The Catalog stores databases, tables, views, stored procedures, table-valued functions (TVFs), schemas, assemblies, external data sources and all other code-related items. Basically catalog are useful when there is a requirement of code sharing and performance enhance in Azure Data Lake Analytics.

Followings are the few key points:-

Every Data Lake Analytics account has a catalog associated with it.
Catalog can not be deleted.
One master database will be available by default, which can not be deleted.
Every catalog can have one or more databases .
Catalog stores the U-SQL code in the form of views, stored proc and table valued functions(TVFs).
Catalog database data gets stored in the Azure data lake stores in the catalog folder.
Catalog data file extension is .ss .

To Start with database creation and other entities follow the next steps:

You need one ADLA account which will have catalog by default.
To create any entity in catalog you will have to write U-SQL code and run it in ADLA jobs.
Create database

CREATE DATABASE TestDB;
// with IF NOT EXISTS
CREATE DATABASE IF NOT EXISTS TestDB;

4. Create Schema

USE DATABASE TestDB;
 CREATE SCHEMA NewSchema;

5. Create Table

It can be created in two ways, you can directly define the schema and structure and then load the data later, like below.

DROP TABLE IF EXISTS TestDB.NewSchema.NewTable;

CREATE TABLE TestDB.NewSchema.NewTable
( 
DimID int,
DimName string, 
RoleDescription string,
isActive bool,
INDEX idx 
CLUSTERED(DimID ASC)
DISTRIBUTED BY HASH(DimID)
);

Other way to create the table is, using the RowSet extracted from the file, but the table must have the index defined. As below:

@input = 
SELECT * FROM 
(VALUES
("Val1", 1 ),
("Val2", 2 ),
("Val3", 3 )
) AS T( Col1, Id );

DROP TABLE IF EXISTS [TestDB].[NewSchema].[NewTable];

CREATE TABLE [TestDB].[NewSchema].[NewTable]
( 
INDEX idx 
CLUSTERED(col1 ASC)
DISTRIBUTED BY HASH(col1) 
) AS SELECT * FROM @input;

6. Create View

DROP VIEW IF EXISTS [TestDB].[NewSchema].[VW_TestView];

CREATE VIEW [TestDB].[NewSchema].[VW_TestView]
AS
SELECT 3000000 + ROW_NUMBER() OVER ( ORDER BY R.DimID ASC) AS [NewId] 
, "Source" AS [DataSource] 
, Convert.ToString((Convert.ToString(R.RID)) ?? "N/A") AS [R_Id] 
, R.DimName AS [OrgName] 
, R.ZipCode AS [PostalCode] 
, G.ISOCCode AS [C_Code] 
,B.Name ?? "UNKNOWN" AS [BCode] 

FROM [TestDB].[NewSchema].Dim_R AS R
INNER JOIN 
(SELECT DISTINCT ISOCCode, Name
FROM [TestDB].[NewSchema].[DimC]) AS G
ON G.Name == R.Code
LEFT JOIN [TestDB].[NewSchema].[Dim_B] AS B
ON R.ID == B.ID;

7. Create Stored Procedure

DROP PROCEDURE IF EXISTS [TestDB].[NewSchema].[CatalogFirstStoredProc];
CREATE PROCEDURE [TestDB].[NewSchema].[CatalogFirstStoredProc]
(
@Param1 string = "Test",
@Param2 int = 0
) AS
BEGIN

DECLARE @Var1 int= 8000000;

INSERT INTO [TestDB].[NewSchema].[NewTable]
(
Col1
,Col2
,Col3
,Col4
)
SELECT Convert.ToInt32(@Var1 + ROW_NUMBER() OVER(ORDER BY R.Column1 ASC)) AS Col1
,B.col2
,B.col3
,R.col4
FROM [TestDB].[NewSchema].Dim_R AS R
INNER JOIN 
(SELECT DISTINCT ISOCCode, Name FROM [TestDB].[NewSchema].[DimC]) AS G
ON G.Name == R.Code
LEFT JOIN [TestDB].[NewSchema].[Dim_B] AS B
ON R.ID == B.ID;

@result = 
SELECT * 
FROM [TestDB].[NewSchema].[NewTable]; 

OUTPUT @result 
TO "/Output/Procedure/CatalogFirstStoredProc.csv" 
USING Outputters.Csv(); 

END;

8. Call Stored Procedure

// This example will call the procedure CatalogFirstStoredProc: 
[TestDB].[NewSchema].[CatalogFirstStoredProc](); 

// This example will call the procedure CatalogFirstStoredProc With Parameters: 
[TestDB].[NewSchema].[CatalogFirstStoredProc]
 ( 
400,
DateTime.Parse("12/24/2018")
 );

Some of the content has been taken from the MSDN sites, i have consolidated and put it together to understand it better. I hope, you now have a better idea about Azure Data Lake Catalog. For any clarification please comment.

Feedback will be appreciated. Cheers!

Azure Data Lake Catalog (U-SQL)

Create database

4. Create Schema

5. Create Table

6. Create View

7. Create Stored Procedure

8. Call Stored Procedure

Leave a Reply Cancel reply

PraYaG

Create database

4. Create Schema

5. Create Table

6. Create View

7. Create Stored Procedure

8. Call Stored Procedure

Leave a Reply Cancel reply

PraYaG

Related Posts