Shalini's Knowledge Share

28th Aug 2023

Loops in Databricks or Parallel Processing

Loops and conditions are unavoidable in both application development and data processing. For a data engineer, however, they become a nightmare if not handled correctly. I will not dive deep into how loops are handled in Databricks. In short, the problem is that the same block of code is processed one… Continue reading

Databricks

Parallel Processing, Spark cluster, Databricks Loops

14th Apr 2023

Databricks – Processing Geographic Data for Australia

A little bit about shapefiles: one way of storing geographic data is the shapefile format. The shapefile format was created by ESRI and holds vector data. The ESRI technical documentation describes the shapefile as something that stores nontopological geometry and attribute information for the spatial features in a data set. Geometry for a feature is… Continue reading

Databricks

Geographic Data, GeoPandas, KML, Shape Files

11th Nov 2022

Azure – Databricks Integration with GIT Devops

The writeup below demonstrates how you can use Git source control for your Databricks code. Azure Databricks has its own facility for maintaining your code, no doubt. However, if you are using multiple technologies (especially in a BI project, where you're bound to use ADF, MS SQL, Azure SQL Server and so on), it… Continue reading

Databricks

Databricks Repository, DevOps, GIT

30th May 2022

Access Blob Storage from Azure Databricks using Azure Key Vault

To access Blob Storage from Databricks, you need to add a secret scope. A secret scope is a collection of secrets identified by a name. There are two types of secret scopes. Below is a step-by-step guide on how to access Blob Storage from Databricks. Assumptions: Azure Databricks and Azure storage… Continue reading

Databricks

Azure Blob Storage, Azure Key Vault

29th Mar 2022

Azure – Process and extract excel files

Excel is still widely used by people across all industries to maintain their data. This data then needs to be joined with application-level data held in their relational databases to create various reports. These Excel files may arrive through various media: email, SharePoint… Continue reading

ADF, Databricks

12th May 2021

How to use the transformations in a Dataflow to optimize the performance

In SSIS, we have many different kinds of transformations that we can use to cleanse and restructure our data the way we want before sending it to the destination. However, the key thing to note is that, behind the designer, these transformations do not all behave the same way. Each has its own way of… Continue reading

SSIS

12th May 2021

SSIS Variables Vs Parameters

Many people are quite confused about how parameters and variables work. To demonstrate the differences clearly, you need to deploy a package to SQL Server. Parameters. There are two types of parameters: package parameters and project parameters. The only thing that differentiates the two types is their scope. Project parameters… Continue reading

SSIS

SSIS Variables

25th Sep 2020

Accessing FTPS using SSIS

Although native SSIS has an FTP task, it lacks both SFTP and FTPS tasks. There are three ways to access FTPS: 1. Using third-party components, e.g. ZappySys, COZYROC etc. 2. Purely code based. 3. Using command-line tools. Using a third-party component is pretty straightforward if you have an understanding… Continue reading

SSIS

5th Jun 2020

Surrogate and Business Keys

Following on from my previous write-up on the purpose of a BI solution, there are four main requirements that can be expected of a dimension table: They should obviously be linked to the facts or the business process. Since we are getting data from so many different back-end source systems, we need to keep track of… Continue reading

Uncategorized

5th Jun 2020

Purpose of having a separate BI solution.

Several questions and answers that I have not written down previously, partly because I was not interested in writing about what I know, what I feel, and my viewpoints. What is the purpose of having BI? To analyse data: aggregate, slice and dice, and identify trends and correspondences that will enable people to make decisions, from simple marketing… Continue reading

Uncategorized