πŸ” Casper's Kitchens

Casper's Kitchens is a ghost kitchen and food-delivery platform built entirely in Databricks by the Developer Relations team. It brings together Lakeflow (ingestion, Spark Declarative Pipelines), AI/BI dashboards with Genie, Agent Bricks, and Apps powered by Lakebase (Postgres), all stitched into one cohesive live demo.

Casper's isn't just a showcase. It's a playground for exploring, experimenting, and practicing a bit of creative misuse. We're pushing the platform past its comfort zone to see what's possible. Everything is designed to be easy to:

  1. πŸš€ Deploy: Spin up and tear down the entire environment in minutes with just a couple of commands.
  2. 🎬 Demo: Run only the stages you need. Every component is powered by live, streaming data.
  3. πŸ§‘β€πŸ’» Develop: Add new pipelines, agents, or apps with minimal setup. Everything is built to extend.

We intentionally build only with Databricks, even when it’s inconvenient, so that Casper’s can serve as a shared sandbox for demos, onboarding, learning, and experimentation.

Deploy

Casper's Kitchens uses Databricks Asset Bundles (DABs) to make deployment easy. To get started, clone this repository to your local machine and run the following command from the root of the project:

databricks bundle deploy

This creates the main job, Casper's Initializer, which orchestrates the Casper's universe, and makes all of Casper's assets available in your workspace at /Workspace/Users/<youruser@email.com>/caspers-kitchens-demo. Databricks Asset Bundles use your existing CLI configuration by default; to configure more targets and learn more about DABs, see databricks.yml and the documentation.

This assumes you have installed the Databricks CLI on your local machine and authenticated to your workspace. You can authenticate interactively via OAuth with databricks auth login.

Note that you can also deploy via the UI in your Databricks workspace: clone this repo as a Git folder in your workspace, navigate to the folder, and click the Deploy button.

Deploy from the UI

You're ready to demo!

Demo

Stages

Casper's is organized into tasks within the Casper's Initializer job. Each task handles a specific part of the system: data generation, pipelines, agents, apps, etc.

You can control which tasks run through the Databricks Jobs UI or with Databricks Asset Bundles (DABs); examples of both follow. For example, if you only need raw data, run the data generation task; if you want agents and apps as well, run the full chain of dependent tasks as shown in the DAG above.

The easiest way to control Casper's is the Databricks Jobs UI, but the DAB commands below let you drive everything from the CLI.

1. Run the full demo (recommended)

To spin up the complete Casper's Kitchens environment as pictured above (this runs every task in the DAG):

databricks bundle run caspers 

By default, this uses the catalog caspersdev. You can override it like so:

databricks bundle run caspers --params "CATALOG=mycatalog"

Make sure you have permission to create catalogs, or that the specified catalog already exists.

To shut down and clean up all of Casper's assets (add --params "CATALOG=mycatalog" to the cleanup command if you overrode the default catalog):

databricks bundle run cleanup
databricks bundle destroy

2. Run individual stages

You can also run only specific stages of Casper's. The --only flag makes Databricks Asset Bundles execute exactly the tasks you list, so if a stage depends on others, you'll need to include them too. To simplify this, load the helper that resolves dependencies automatically:

source <(python utils/resolve_tasks.py)

Once loaded, you can use the same task names with a $ prefix to automatically include all dependencies (for example, $Refund_Recommender_Agent will also run Spark_Declarative_Pipeline and Raw_Data), as sketched below.
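Conceptually, the helper expands each task name into the full comma-separated list of the tasks it depends on, plus itself, and emits shell variable definitions you can source. A minimal sketch, where the dependency map and output format are illustrative rather than the real implementation (see utils/resolve_tasks.py for that):

```python
# Illustrative sketch of dependency resolution (see utils/resolve_tasks.py
# for the real implementation). Expands each task to its transitive
# dependencies plus itself, then prints shell variable definitions.
DEPS = {  # hypothetical task -> direct dependencies
    "Raw_Data": [],
    "Spark_Declarative_Pipeline": ["Raw_Data"],
    "Refund_Recommender_Agent": ["Spark_Declarative_Pipeline"],
}

def expand(task: str) -> list[str]:
    ordered: list[str] = []
    def walk(t: str) -> None:
        for dep in DEPS[t]:
            walk(dep)
        if t not in ordered:
            ordered.append(t)
    walk(task)
    return ordered

for task in DEPS:
    # e.g. Refund_Recommender_Agent="Raw_Data,Spark_Declarative_Pipeline,Refund_Recommender_Agent"
    print(f'{task}="{",".join(expand(task))}"')
```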

Please be aware that the tasks are not currently idempotent, so run with care. When in doubt, destroy the environment and re-run the stage you want with $ prepended, as in the commands below:

πŸ“Š Raw Data

Generates a realistic stream of orders and GPS coordinates.

databricks bundle run caspers --only $Raw_Data

πŸ”„ Lakeflow (Spark Declarative Pipeline)

Processes and curates order data using a medallion architecture (Bronze β†’ Silver β†’ Gold tables).

databricks bundle run caspers --only $Spark_Declarative_Pipeline
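For orientation, here is a minimal sketch of what one bronze-to-silver hop in a declarative pipeline can look like. The table names, columns, and volume path below are illustrative, not the repo's actual pipeline code:

```python
# Minimal Spark Declarative Pipelines (DLT) sketch: bronze ingests raw JSON
# events with Auto Loader; silver keeps only delivered orders. Runs inside a
# pipeline, where `spark` is predefined. Names and paths are illustrative.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw order events ingested from the simulator volume")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/caspers/simulator/events")
    )

@dlt.table(comment="Cleaned stream of delivered orders")
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .where(F.col("event_type") == "delivered")
        .withColumn("delivered_at", F.to_timestamp("timestamp"))
    )
```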

πŸ€– Refund Recommender Agent

Scores each delivery for refund eligibility.

databricks bundle run caspers --only $Refund_Recommender_Agent

⚑ Refund Recommender Stream

Streams completed orders into the refund-scoring agent in real time.

databricks bundle run caspers --only $Refund_Recommender_Stream

πŸ” Lakebase Reverse ETL

Syncs refund-scored orders from the Lakehouse into Lakebase.

databricks bundle run caspers --only $Lakebase_Reverse_ETL
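Once the sync is running, the refund-scored data can be queried from Lakebase like any Postgres database. A minimal sketch, with connection details and the table name refund_scores as placeholders rather than the actual schema:

```python
# Query refund-scored orders from Lakebase (Postgres). The host, database,
# credentials, and table name below are placeholders; use your instance's values.
import psycopg2

conn = psycopg2.connect(
    host="<lakebase-instance-host>",
    dbname="<database>",
    user="<your-user>",
    password="<oauth-token>",
    sslmode="require",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT order_id, refund_score FROM refund_scores "
        "ORDER BY refund_score DESC LIMIT 10;"
    )
    for order_id, score in cur.fetchall():
        print(order_id, score)
```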

πŸ“± Refund Manager App

Launches a Databricks app for manual refund review and approval.

databricks bundle run caspers --only $Databricks_App_Refund_Manager

πŸ’¬ Complaint Agent

LLM agent that classifies and responds to customer complaints.

databricks bundle run caspers --only $Complaint_Agent
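Once deployed, an agent endpoint like this can be queried from Python with the MLflow Deployments client. The endpoint name and payload shape below are assumptions for illustration, not the repo's actual configuration:

```python
# Sketch of invoking a deployed agent endpoint. The endpoint name
# "caspers-complaint-agent" and the payload shape are assumptions.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
response = client.predict(
    endpoint="caspers-complaint-agent",
    inputs={
        "messages": [
            {"role": "user", "content": "My order arrived cold and two hours late."}
        ]
    },
)
print(response)
```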

🌊 Complaint Generator Stream

Simulates a live stream of customer complaint events.

databricks bundle run caspers --only $Complaint_Generator_Stream

βš™οΈ Complaint Agent Stream

Runs the complaint-handling agent continuously on live events.

databricks bundle run caspers --only $Complaint_Agent_Stream

πŸ—„οΈ Complaint Lakebase

Stores complaint data in Lakebase (Postgres) for downstream use.

databricks bundle run caspers --only $Complaint_Lakebase

πŸ“Š Generated Event Types

The data generator produces the following realistic events for each order, written to the volume caspers.simulator.events:

| Event | Description | Data included |
| --- | --- | --- |
| order_created | Customer places order | Customer location (lat/lon), delivery address, ordered items with quantities |
| gk_started | Kitchen begins preparing food | Timestamp when prep begins |
| gk_finished | Kitchen completes food preparation | Timestamp when food is ready |
| gk_ready | Order ready for pickup | Timestamp when driver can collect |
| driver_arrived | Driver arrives at kitchen | Timestamp of driver arrival |
| driver_picked_up | Driver collects order | Full GPS route to customer, estimated delivery time |
| driver_ping | Driver location updates during delivery | Current GPS coordinates, delivery progress percentage |
| delivered | Order delivered to customer | Final delivery location coordinates |

Each event includes order ID, sequence number, timestamp, and location context. The system models realistic timing between events based on configurable service times, kitchen capacity, and real road network routing via OpenStreetMap data.
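To poke at the raw events yourself, you can read the volume directly with Spark. A quick sketch that assumes the events land as JSON files and that a field like event_type names the event type (run it in a Databricks notebook, where spark is predefined):

```python
# Peek at recent driver pings straight from the Unity Catalog volume.
# Assumes JSON files; the field names here follow the table above but are
# assumptions. Adjust the path if you overrode the default catalog.
events = spark.read.json("/Volumes/caspers/simulator/events")
(
    events.where(events.event_type == "driver_ping")
    .select("order_id", "sequence_number", "timestamp")
    .show(5, truncate=False)
)
```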

🎯 Use Cases

  • πŸ“š Learning Databricks: Complete end-to-end platform experience
  • πŸŽ“ Teaching: Consistent narrative across different Databricks features
  • πŸ§ͺ CUJ Testing: Run critical user journeys in realistic environment
  • 🎨 UX Prototyping: Fully loaded platform for design iteration
  • 🎬 Demo Creation: Unified narrative for new feature demonstrations

πŸ™Œ Why This Matters

Most demos show just one slice of Databricks. Casper's Kitchens shows how it all connects: ingestion, curation, analytics, and AI apps working together. Use it to learn, demo to customers, or build your own extensions.

License

Β© 2025 Databricks, Inc. All rights reserved. The source in this repository is provided subject to the Databricks License (https://databricks.com/db-license-source). All included or referenced third-party libraries are subject to the licenses set forth below.

| library | description | license | source |
| --- | --- | --- | --- |
