Integrations - Databricks (push)

Written by Optimize Team
Updated today

This article describes our approach and general guidance for getting your Experiment Data, such as Views and Metrics, into Databricks.

Overview

Webtrends Optimize can export data directly into Azure Blob Storage, from where it can be ingested into your Databricks environment. Setting up the export is handled by the Webtrends Optimize support team - contact us to get this configured.

Once the data is flowing into Blob Storage, the steps below will get it into Databricks.

What gets exported

Webtrends Optimize writes data as files into a container in Azure Blob Storage. Files are typically named with a date, for example data_2026-04-07.csv.

Each file represents a day's worth of experiment and interaction data.
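If you need to compute the expected filename in code, a small helper like the one below works. This is a sketch assuming the date-stamped data_YYYY-MM-DD.csv pattern shown above; confirm the exact pattern for your account with the Webtrends Optimize support team.

```python
from datetime import date


def export_filename(day: date) -> str:
    # Build the expected export filename for a given day,
    # following the data_YYYY-MM-DD.csv pattern described above.
    return f"data_{day.isoformat()}.csv"


# Example: the file for 7 April 2026
print(export_filename(date(2026, 4, 7)))  # data_2026-04-07.csv
```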

Where to set up your Workflow

Head over to Jobs & Pipelines

Create a new Job or ETL pipeline, depending on how you plan to use our data.

Everyone's journey from here will differ somewhat, depending on where you wish to save the data and what you intend to do with it.

Workflow type

There are a few options for how to set up your Workflow.

1. Scheduled Job with dynamic pathname (simple)

A Job lets you run a notebook or Python script on a schedule, and is the right mechanism for reading files from external storage and landing them into Databricks.

You would schedule a job to run at a predictable time, typically after 6am UTC.

The code would do something like the following - reading the data and appending it to your table.

(python)

from datetime import date

# Build the path to today's export file. Replace <container> and
# <storage-account> with your own Azure Blob Storage details.
today = date.today().strftime("%Y-%m-%d")
path = f"abfss://<container>@<storage-account>.dfs.core.windows.net/data_{today}.csv"

# Read the CSV, letting Spark infer the schema from the header row
df = spark.read.csv(path, header=True, inferSchema=True)

# Append the day's data to your target table
df.write.mode("append").saveAsTable("my_database.my_table")

2. File Arrival triggers

This is a poll-based watcher in Databricks that you can configure to watch a Blob Storage path. Databricks does the checking for you.

If you use this, Databricks will check every minute for new files.

This is a heavy-handed approach for most users. However, in some cases we post incremental updates throughout the day rather than a single file at the end of each day - for those users, file arrival triggers are the right approach.
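With a file arrival trigger, the job can fire for any day's file, so it is safer to derive the date from the arriving filename rather than assuming today's date. A sketch of a hypothetical helper, assuming the data_YYYY-MM-DD.csv naming pattern described earlier:

```python
import re
from datetime import date
from typing import Optional


def date_from_filename(name: str) -> Optional[date]:
    # Extract the date from a filename like data_2026-04-07.csv.
    # Returns None if the name does not match the expected pattern.
    m = re.search(r"data_(\d{4})-(\d{2})-(\d{2})\.csv$", name)
    if not m:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))


print(date_from_filename("data_2026-04-07.csv"))  # 2026-04-07
```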

Format note

Files are exported as CSV.

If you intend to query the data heavily inside Databricks, you may want to convert it to Delta format after ingestion - Delta is Databricks' native table format and offers significantly better query performance and reliability for analytical workloads.
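One way to convert after ingestion is a CREATE OR REPLACE TABLE ... AS SELECT into a Delta table. The sketch below builds the SQL; the table names are examples only, and the statement would be run with spark.sql on your cluster.

```python
def delta_convert_statement(src_table: str, delta_table: str) -> str:
    # Build a CTAS statement that copies a table into Delta format.
    # Delta is the default format on Databricks, but we name it
    # explicitly here for clarity.
    return (
        f"CREATE OR REPLACE TABLE {delta_table} "
        f"USING DELTA AS SELECT * FROM {src_table}"
    )


# In the notebook (hypothetical names):
# spark.sql(delta_convert_statement("my_database.my_table",
#                                   "my_database.my_table_delta"))
```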
