Shalini's Knowledge Share


  • 28th Aug 2023

    Loops in Databricks or Parallel Processing

    Loops and conditions are unavoidable in both application development and data processing. For a data engineer, however, they can become a nightmare if not handled correctly. I will not dive deep into how loops are handled in Databricks. In short, the problem is that the same block of code is processed one…

    Databricks
    Parallel Processing, Spark cluster, Databricks Loops
  • 14th Apr 2023

    Databricks – Processing Geographic Data for Australia

    A little bit about shapefiles: one way of storing geographic data is the shapefile format. The shapefile format was created by ESRI and holds vector data. The ESRI technical documentation describes a shapefile as something that stores nontopological geometry and attribute information for the spatial features in a data set. Geometry for a feature is…

    Databricks
    Geographic Data, GeoPandas, KML, Shape Files
  • 11th Nov 2022

    Azure – Databricks Integration with GIT Devops

    The write-up below demonstrates how to put your Databricks code under source control with Git. Azure Databricks has its own facility for maintaining your code, no doubt. However, if you are using multiple technologies (especially in a BI project, where you are bound to use ADF, MS SQL, Azure SQL Server and so on), it…

    Databricks
    Databricks Repository, DevOps, GIT
  • 30th May 2022

    Access Blob Storage from Azure Databricks using Azure Key Vault

    To access Blob Storage from Databricks, you need to add a secret scope. A secret scope is a collection of secrets identified by a name. There are two types of secret scopes: Below is a step-by-step guide on how to access Blob Storage from Databricks: Assumptions: Azure Databricks and Azure Storage…

    Databricks
    Azure Blob Storage, Azure Key Vault
  • 29th Mar 2022

    Azure – Process and extract excel files

    Excel is still in widespread use, with people across all industries relying on it to maintain their data. This data then needs to be joined with other application-level data held in their relational databases to create various reports. These Excel files may arrive through various mediums: email, SharePoint…

    ADF, Databricks
  • 12th May 2021

    How to use the transformations in a Dataflow to optimize the performance

      In SSIS, we have many different kinds of transformations that we can use to cleanse and restructure our data the way we want before sending it to our destination. However, the key thing to note is that, behind the designer, these transformations do not all behave the same way. Each has its own way of…

    SSIS
  • 12th May 2021

    SSIS Variables Vs Parameters

      Many people are confused about how parameters and variables work. To demonstrate the differences clearly, however, you need to deploy a package to SQL Server. Parameters: There are two types of parameters: package parameters and project parameters. The only thing that differentiates these two types is the scope. Project parameters…

    SSIS
    SSIS Variables
  • 25th Sep 2020

    Accessing FTPS using SSIS

      Although native SSIS includes an FTP task, it lacks both SFTP and FTPS tasks. There are three ways to access them: 1. Using third-party components, e.g. ZappySys, COZYROC, etc. 2. Purely code-based. 3. Using command-line tools. Using a third-party component is pretty straightforward if you have an understanding…

    SSIS
  • 5th Jun 2020

    Surrogate and Business Keys

    Following on from my previous write-up on the purpose of a BI solution, there are four main requirements that can be expected of a dimension table: They should obviously be linked to the facts or the business process. Since we are getting data from so many different back-end source systems, we need to keep track of…

    Uncategorized
  • 5th Jun 2020

    Purpose of having a separate BI solution.

    Here are several questions and answers that I have not written down before, partly because I was not interested in writing about what I know, what I feel, and my viewpoints. What is the purpose of having BI? To analyse data, aggregate, slice and dice, and identify trends and correspondence that will enable people to make decisions, from simple marketing…

    Uncategorized
