Databricks

Databricks is a unified data analytics platform that provides a collaborative environment for data scientists, engineers, and business analysts. It is built on top of Apache Spark and provides a fully managed, scalable, and secure cloud-based platform for big data analytics.

🔧 Setup Guide for Databricks in Weld

Weld connects to Databricks through its direct integration with Unity Catalog. To establish the connection, Weld authenticates as a service principal using machine-to-machine (M2M) OAuth.

Step 1: Configure Databricks service principal

  1. In your Databricks deployment, go to Settings -> Identity and access -> Service principals -> Manage.

  2. Click Add service principal and select Add new. Choose Databricks managed, enter a name (e.g. weld-service-principal), and click Add.

  3. Open the page of the service principal you just added, go to Secrets, and generate a new secret. Note down the Client ID and Secret; you will need them during setup in Weld.
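
The Client ID and Secret from this step are what an M2M OAuth client exchanges for a short-lived access token. As a rough illustration of that exchange (not Weld's actual implementation), the sketch below builds a client-credentials token request against the workspace token endpoint `/oidc/v1/token` with the `all-apis` scope, following the Databricks OAuth M2M documentation. The hostname and credentials are placeholders, and the helper name `build_token_request` is hypothetical. The request is only constructed, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    server_hostname: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Workspace-level token endpoint; the hostname is supplied by the caller.
    url = f"https://{server_hostname}/oidc/v1/token"
    # Form-encoded body for the client-credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The Client ID and Secret are sent via HTTP Basic authentication.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own hostname, Client ID, and Secret.
    request = build_token_request(
        "example.cloud.databricks.com", "my-client-id", "my-secret"
    )
    print(request.get_method(), request.full_url)
```

Passing the returned request to `urllib.request.urlopen` would perform the exchange; the JSON response is expected to carry the token in an `access_token` field.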

Step 2: Set up the SQL Warehouse

  1. In the Databricks console, go to SQL -> Create -> SQL Warehouse.

  2. We recommend starting with 2X-Small size and scaling up as your workload increases.

  3. Set the timeout to 5 minutes and choose the warehouse type you want.

  4. Go to the connection details and note down the Server hostname and HTTP path. You will need those when configuring the connection in Weld.

  5. Go to Permissions, add your weld-service-principal, and set its permission to Can use.
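
Copied connection details often pick up a stray `https://` prefix or trailing slash. The sketch below is one way to tidy the Server hostname and HTTP path before entering them in Weld. It assumes the HTTP path has the usual SQL warehouse shape `/sql/1.0/warehouses/<warehouse-id>`; the function name and the returned dictionary keys are hypothetical and not part of any Weld or Databricks API.

```python
import re

# Assumed shape of a SQL warehouse HTTP path: /sql/1.0/warehouses/<warehouse-id>
_HTTP_PATH_PATTERN = re.compile(r"^/sql/1\.0/warehouses/([0-9A-Za-z]+)$")


def normalize_connection_details(server_hostname: str, http_path: str) -> dict:
    """Clean up copied connection details and extract the warehouse ID."""
    # Drop surrounding whitespace, an accidental URL scheme, and trailing slashes.
    hostname = re.sub(r"^https?://", "", server_hostname.strip()).rstrip("/")
    if not hostname or "/" in hostname:
        raise ValueError(f"Unexpected server hostname: {server_hostname!r}")

    path = http_path.strip()
    match = _HTTP_PATH_PATTERN.match(path)
    if match is None:
        raise ValueError(f"Unexpected HTTP path: {http_path!r}")

    return {
        "server_hostname": hostname,
        "http_path": path,
        "warehouse_id": match.group(1),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    details = normalize_connection_details(
        "https://example.cloud.databricks.com/",
        "/sql/1.0/warehouses/1234567890abcdef",
    )
    print(details)
```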

Step 3: Set up the Unity Catalog

  1. In the Databricks console, go to Catalog -> + -> Create a catalog.
  2. Enter a name for the catalog (e.g. weld) and choose a storage location.
  3. Go to your newly created catalog and set the following permissions for the weld-service-principal:
    1. Prerequisite: USE SCHEMA
    2. Create: CREATE VOLUME
    3. Edit: WRITE VOLUME
    4. Read: EXECUTE, READ VOLUME, SELECT
  4. Create a new schema.
  5. Go to the newly created schema and create a new volume.
  6. Copy the volume path shown under the Description. It will look something like this: /Volumes/weld_databricks/weld_catalog/weld_volume
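
The permissions from step 3 can also be written as a Unity Catalog `GRANT` statement instead of being set in the UI. The sketch below only generates the SQL text; it does not run anything. It assumes the catalog is named `weld` and that the service principal is referenced by its application ID (the Client ID from Step 1) in backticks, so check the statement against your own workspace before executing it. The helper names are hypothetical.

```python
# Privileges listed in step 3, with the UI group each one belongs to.
CATALOG_PRIVILEGES = [
    "USE SCHEMA",     # Prerequisite
    "CREATE VOLUME",  # Create
    "WRITE VOLUME",   # Edit
    "EXECUTE",        # Read
    "READ VOLUME",    # Read
    "SELECT",         # Read
]


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_grant_statement(catalog: str, principal: str) -> str:
    """Return a GRANT statement giving the principal the step 3 privileges."""
    privileges = ", ".join(CATALOG_PRIVILEGES)
    return (
        f"GRANT {privileges} ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)}"
    )


if __name__ == "__main__":
    # The application ID below is a made-up placeholder.
    print(build_grant_statement("weld", "00000000-0000-0000-0000-000000000000"))
```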

You now have all the settings you need to set up Weld with Databricks and start syncing your data.
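
Before entering the values in Weld, it can help to sanity-check the volume path copied in Step 3. Unity Catalog volume paths follow the pattern `/Volumes/<catalog>/<schema>/<volume>`, so a minimal check, with the hypothetical helper `parse_volume_path`, might look like this:

```python
from typing import NamedTuple


class VolumePath(NamedTuple):
    catalog: str
    schema: str
    volume: str


def parse_volume_path(path: str) -> VolumePath:
    """Split a /Volumes/<catalog>/<schema>/<volume> path into its parts."""
    parts = path.strip().rstrip("/").split("/")
    # A valid path splits into ["", "Volumes", catalog, schema, volume].
    if (
        len(parts) != 5
        or parts[0] != ""
        or parts[1] != "Volumes"
        or not all(parts[2:])
    ):
        raise ValueError(
            f"Expected /Volumes/<catalog>/<schema>/<volume>, got {path!r}"
        )
    return VolumePath(*parts[2:])


if __name__ == "__main__":
    # Example path taken from Step 3 of this guide.
    print(parse_volume_path("/Volumes/weld_databricks/weld_catalog/weld_volume"))
```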
