Databricks
-
Loops in Databricks or Parallel Processing
Loops and conditions are unavoidable in both application development and data processing. For a data engineer, however, they become a nightmare if not handled correctly. I will not dive deep into how loops are handled in Databricks. In short, the problem is that the same block of code is processed one Continue reading
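As a rough illustration of the contrast this excerpt hints at, the sketch below runs the same function over several inputs, first in a plain loop and then through a thread pool. It assumes the concern is sequential execution. The function `process_table` and the table names are hypothetical stand-ins rather than anything taken from the post; in a notebook the body might launch a Spark job instead.

```python
from concurrent.futures import ThreadPoolExecutor


def process_table(name: str) -> str:
    # Hypothetical unit of work; stands in for a per-table job.
    return f"processed {name}"


def run_sequential(names):
    # Plain loop: each item waits for the previous one to finish.
    return [process_table(n) for n in names]


def run_parallel(names, max_workers=4):
    # Thread pool: items are submitted together; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_table, names))


tables = ["customers", "orders", "payments"]  # assumed example names
sequential_results = run_sequential(tables)
parallel_results = run_parallel(tables)
```

Threads tend to suit this pattern when each call mostly waits on the cluster or on I/O; the worker count here is an arbitrary choice.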
-
Databricks – Processing Geographic Data for Australia
A little bit about shapefiles: one way of storing geographic data is the shapefile format. The shapefile format was created by ESRI and consists of vector data. ESRI's technical documentation describes the shapefile as something that stores nontopological geometry and attribute information for the spatial features in a data set. Geometry for a feature is Continue reading
-
Azure – Databricks Integration with GIT Devops
The writeup below demonstrates how you can put your Databricks code under source control using Git. Azure Databricks has its own facility where you can maintain your code, no doubt. However, if you are using multiple technologies (especially in a BI project, where you're bound to use ADF, MS SQL, Azure SQL Server and so on), it Continue reading
-
Access Blob Storage from Azure Databricks using Azure Key Vault
To access blob storage from Databricks, you need to add a secret scope. A secret scope is a collection of secrets identified by a name. There can be 2 types of secret scopes: Below is the step-by-step guide on how to access blob storage from Databricks: Assumptions: Azure Databricks and Azure storage Continue reading
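A minimal sketch of how a secret from a scope might be wired into a blob storage read, assuming the account key is stored as a secret. The scope, key, account, and container names are all hypothetical placeholders. The notebook calls are left as comments because `dbutils` and `spark` only exist inside a Databricks notebook.

```python
def account_key_conf(storage_account: str) -> str:
    # Spark configuration key that holds the storage account access key.
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def wasbs_url(container: str, storage_account: str, path: str = "") -> str:
    # wasbs:// URL for a path inside a blob container.
    return (
        f"wasbs://{container}@{storage_account}"
        f".blob.core.windows.net/{path.lstrip('/')}"
    )


# Inside a Databricks notebook (all names below are placeholders):
# key = dbutils.secrets.get(scope="my-scope", key="storage-key")
# spark.conf.set(account_key_conf("mystorageacct"), key)
# df = spark.read.csv(wasbs_url("mycontainer", "mystorageacct", "data/file.csv"))
```

Keeping the key in a secret scope means it never appears in the notebook source.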
-
Azure – Process and extract excel files
Excel is still widely used across all industries to maintain data. This data often needs to be joined with other application-level data held in relational databases to create various reports. These Excel files may arrive through various media: email, SharePoint Continue reading